
Samuel Kong, a Sondrel Engineering Consultant and alumnus of Imperial College, attended this year's British Machine Vision Conference, held at his old university.

In his first blog, Samuel gave a summary explanation of Image Convolution, Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN). In the second of this series, he looked at innovative projects presented at the event. In this final post he looks at three more projects, presented by Google and Microsoft Research:

  • Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylisation Network
  • PixColour: Pixel Recursive Colourisation
  • HoloLens: Computer Vision Meets Mixed Reality


Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylisation Network

Presented by Golnaz Ghiasi of Google Inc.

The project presented a method to combine any pair of content and style images. A content image could be a photo of a landmark or a person, whereas a style image could be a painting in a certain style, such as Van Gogh's Starry Night. Figure 22 below shows examples of different styles combined with four different content images.


The program uses a pair of neural networks, a Style Prediction Network and a Style Transfer Network, to combine the two images. The project presents a new algorithm that can perform the combination in real time on content and style images that have never been seen before. Figure 23 below shows the architecture used to combine the content image and the style image. By calculating the style loss and the content loss as error functions, the networks can be trained to compromise between the two. Figure 24 shows how the content image and the style image sit at two ends of a scale, and it is down to the training process to strike a balance between them.
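The compromise between the two losses can be sketched in a few lines of numpy. This is a minimal illustration of the general style-transfer loss idea, not the paper's exact configuration: the Gram-matrix style loss, the feature layers, and the weighting constants `alpha` and `beta` are all illustrative assumptions.

```python
import numpy as np

def gram_matrix(features):
    """Channel-wise Gram matrix: correlations between feature maps,
    commonly used to summarise the 'style' of an image layer."""
    c = features.shape[-1]
    f = features.reshape(-1, c)          # flatten spatial dims: (H*W, C)
    return f.T @ f / f.shape[0]          # (C, C)

def style_loss(gen_feats, style_feats):
    """Mean squared difference between Gram matrices."""
    return np.mean((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)

def content_loss(gen_feats, content_feats):
    """Mean squared difference between the raw feature maps."""
    return np.mean((gen_feats - content_feats) ** 2)

def total_loss(gen, content, style, alpha=1.0, beta=10.0):
    """Weighted compromise between content and style; the weights
    (hypothetical here) set where on Figure 24's scale training lands."""
    return alpha * content_loss(gen, content) + beta * style_loss(gen, style)
```

Training drives `total_loss` down, so a larger `beta` pulls the output towards the style end of the scale and a larger `alpha` towards the content end.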



For more information, refer to the white paper titled: Exploring the structure of a real-time, arbitrary neural artistic stylization network [17].



PixColour: Pixel Recursive Colourisation

Presented by Sergio Guadarrama of Google.

The aim of this project was to take greyscale images and reproduce them with plausible colour, such that the result would appear realistic to a human observer. Figure 25 below shows a greyscale image of a fish and three recolourisation results from the proposed architecture. The image on the far right is the original photograph from which the greyscale image was derived. The example shows how a single greyscale image can yield many different recolourised results.


The key contribution of this project was the Pixel Recursive Colorization architecture, which is shown in Figure 26 below.
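The "recursive" part of the idea can be illustrated with a toy sampling loop: each low-resolution chroma pixel is drawn conditioned on the greyscale input and on the chroma pixels already sampled, in raster order. Everything here is a stand-in sketch; `predict_logits` is a hypothetical placeholder for the trained networks, and the resolutions and bin count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_logits(grey, chroma_so_far, y, x, ch, n_bins):
    """Hypothetical stand-in for the trained network: a real model
    would return learned logits over colour bins for this pixel."""
    return np.zeros(n_bins)  # uniform placeholder

def sample_chroma(grey, n_bins=32, low_res=(7, 7)):
    """Toy pixel-recursive chroma sampler over a low-resolution grid."""
    h, w = low_res
    chroma = np.zeros((h, w, 2), dtype=np.int64)  # two chrominance channels
    for y in range(h):
        for x in range(w):
            for ch in range(2):
                logits = predict_logits(grey, chroma, y, x, ch, n_bins)
                p = np.exp(logits - logits.max())
                p /= p.sum()
                # Sampling (rather than taking the argmax) is what lets
                # one greyscale input yield many different colourisations.
                chroma[y, x, ch] = rng.choice(n_bins, p=p)
    return chroma
```

Because each pixel is sampled stochastically, re-running the loop on the same greyscale image produces a different plausible colourisation, which is exactly the diversity shown in Figure 25.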


To test the performance of their solution, a Visual Turing Test was carried out: a panel of human judges compared the output of the algorithm against the ground-truth image. The judges were shown each image for one second and had to choose which one they thought was real and which was generated. The same test was applied to images generated by similar work from other research groups. The results showed that the proposed solution was picked as the real image more often than the related works, albeit with only a 38.3% selection rate (a 50% rate would mean the generated images were indistinguishable from the real ones).
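The metric behind that 38.3% figure is simple to state in code. The tally below is hypothetical and chosen only to reproduce the reported rate; the actual number of trials was not given in the talk.

```python
def fool_rate(choices):
    """Fraction of trials where the judge picked the generated image
    as the real one; 0.5 would mean indistinguishable from ground truth."""
    return sum(choices) / len(choices)

# Hypothetical tally: 383 of 1000 pairwise trials fooled the judge,
# matching the reported 38.3% selection rate.
trials = [1] * 383 + [0] * 617
print(fool_rate(trials))  # 0.383
```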

For more information, refer to the white paper titled: PixColor: Pixel Recursive Colorization [18].



HoloLens: Computer Vision Meets Mixed Reality

Presented by Jamie Shotton of Microsoft Research.

This workshop keynote was held by Microsoft Research to present their HoloLens product. The product is similar to Google Glass, overlaying Augmented Reality (AR) content onto the scenery around the user. Figure 27 below shows the Microsoft HoloLens that was the focus of the workshop keynote.

It is worth mentioning that the keynote was very software oriented. The main coverage was of the applications, algorithms and breakthroughs in detecting and modelling hand movements and gestures. There was only a single slide on the hardware used in the HoloLens, so the bulk of the hardware details covered in this section comes from online research outside of the conference.


The idea behind the HoloLens is a solution for Mixed Reality in which users can share their vision through Augmented Reality regardless of where they are located. The example applications presented during the conference included using the HoloLens as a portable, virtual computer, where all the computation is carried out within the glasses and the user interface is created via AR. The examples also included multiple users wearing HoloLenses for collaborative work, such as in workshops or factories. The HoloLens can also connect two distant users through a Skype-like conference mechanism in which the HoloLens projects the other participant into the scenery. Figure 28 below shows two example uses for the HoloLens: one as an interface to a virtual speaker, the other as a collaborative tool.


In order to make the software that drives those interactions a reality, Microsoft have been working on dedicated hardware to carry out the computationally intensive processing. Microsoft have designed a Holographic Processing Unit (HPU) that acts as a co-processor to the SoC, aiding the processing required to generate the 3D holographic models for AR. The HPU is responsible for taking the bulk sensor data, processing it and packaging it in a structure that is easy for the SoC to consume. Figure 29 below shows the architecture of the HoloLens, which features an Intel Cherry Trail SoC based on Intel Atom processors, the HPU, custom made specifically for the HoloLens, and a variety of smaller off-the-shelf IPs. Figure 30 below shows the architecture of the Intel Cherry Trail SoC and Figure 31 shows the floorplan of the custom-made HPU.




During the conference, it was mentioned that Microsoft is currently working on the next-generation HPU. The HPU 2.0 is to feature an AI co-processor custom designed to implement DNNs natively and flexibly. There is currently little detail available on the internals of the HPU 2.0, but it is to be featured in the next iteration of the HoloLens.

For more information, refer to the presentation titled: Mixed Reality: Dalle demo a un prodotto ("From demos to a product") [20], given at a conference in 2016.


[17]    G. Ghiasi, H. Lee and M. Kudlur, “Exploring the structure of a real-time, arbitrary neural artistic stylization network,” in British Machine Vision Conference 2017, London, 2017.

[18]    S. Guadarrama, R. Dahl, D. Bieber, M. Norouzi, J. Shlens and K. Murphy, "PixColor: Pixel Recursive Colorization," in British Machine Vision Conference 2017, London, 2017.

[19]    Microsoft, “Microsoft Hololens,” [Online]. Available: [Accessed 17 September 2017].

[20]    M. Valoriani, "Mixed Reality: Dalle demo a un prodotto," in Disruptive Technologies Conference, Torino, 2016.

[21]    C. Angelini, “Tom's Hardware,” 30 August 2016. [Online]. Available:,news-53783.html. [Accessed 17 September 2017].

[22]    Biomedical Image Analysis Group, “BioMedIA,” 2017. [Online]. Available: [Accessed 17 September 2017].

[23]    B. Hou, A. Alansary, S. McDonagh, A. Davidson, M. Rutherford, J. V. Hajnal, D. Rueckert, B. Glocker and B. Kainz, “Predicting Slice-to-Volume Transformation in Presence of Arbitrary Subject Motion,” in Medical Image Computing and Computer Assisted Intervention 2017, Quebec, 2017.

[24]    Imperial College London, “Personal Robotics Lab,” [Online]. Available: [Accessed 17 September 2017].

[25]    H. Wang and B. Raj, "A survey: Time travel in deep learning space: An introduction to deep learning models and how deep learning models evolved from the initial ideas," 2015.

[26]    S. Raschka, “What is the Role of the Activation Function in a Neural Network?,” KD nuggets, August 2016. [Online]. Available: [Accessed 11 September 2017].