Faster, More Effective Image Segmentation & Image-To-Image Translation Through Deep Learning Research
Neural network and machine learning research is advancing at a rapid pace, capturing the public's attention with the promise of applications in self-driving cars, medicine, and enterprise software that will impact businesses globally.
The NVIDIA Research team, a group of more than 120 researchers, works at the forefront of AI to enable a deeper understanding of our world, building advanced training models that make machine learning and deep learning algorithms smarter and faster, and applying them to computer vision, graphics, self-driving cars, and more.
GPUs are at the heart of high-precision computing, and the research papers featured here are a sample of the work the NVIDIA Research team is pioneering.
Spatial Propagation Network
Understanding images quickly and effectively is key to any number of real-world situations, from telling one face from another to identifying the objects a self-driving car might encounter on the road.
The paper Learning Affinity via Spatial Propagation Networks proposes a novel approach to image segmentation, called the spatial propagation network.
The method generates more accurate results quickly, as can be seen in the difference between the "coarse mask" and the "refined result" in Figure 1.
How it works
Image segmentation assigns a label to each pixel, identifying the specific class it belongs to within the picture. As can be seen below, pixels are grouped into semantically named categories such as "bicycle tire," "bike frame," or "tree."
By itself, this produces the image you see in the "coarse mask" example. To get more detailed results, the researchers developed a three-way connection model. In Figure 1 you can see how the blurry black-and-white mask becomes sharper through the use of an affinity matrix.
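To make the pixel-labeling step concrete, here is a minimal sketch in PyTorch. The one-layer model and image size are illustrative stand-ins, not the paper's network; any model that produces per-class score maps yields a coarse mask the same way.

```python
import torch
import torch.nn as nn

# Toy "segmentation head": a real model is a deep network, but anything
# producing a per-class score map works the same way (illustrative only).
num_classes = 21  # e.g., 20 object classes plus background (assumed)
model = nn.Conv2d(in_channels=3, out_channels=num_classes, kernel_size=1)

image = torch.randn(1, 3, 256, 256)   # one RGB image, batch size 1
scores = model(image)                 # shape: (1, num_classes, 256, 256)

# The "coarse mask": each pixel takes the class with the highest score.
coarse_mask = scores.argmax(dim=1)    # shape: (1, 256, 256), integer labels
print(coarse_mask.shape)
```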
What is affinity?
Affinity describes the relationships between pixels: which elements are closer together, physically or categorically. For example, you can see two categorically (or semantically) similar clusters in the photo below, both identified as "eye."
The affinity matrix refines the results by layering in information about associated pixels on all sides, updating each pixel's value to sharpen the image.
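As a rough illustration, the sketch below implements a simplified, one-way version of this kind of linear propagation, blending each pixel with an already-refined neighbor. In the actual paper, the affinity weights are predicted by a guidance network and propagation runs in four directions with three-way connections; the constant weights here are an assumption for demonstration.

```python
import torch

def propagate_left_to_right(x, p):
    """Simplified one-way linear propagation across image columns.

    x: coarse per-pixel values, shape (height, width)
    p: affinity weights in [0, 1], same shape; a large p means
       "trust my left neighbor," a small p means "keep my own value."
    """
    h = x.clone()
    for col in range(1, x.shape[1]):
        # Blend each pixel with its already-refined left neighbor.
        h[:, col] = (1 - p[:, col]) * x[:, col] + p[:, col] * h[:, col - 1]
    return h

# Toy example: smooth a noisy coarse mask along its rows.
coarse = torch.rand(4, 6)
affinity = torch.full((4, 6), 0.8)  # strong, uniform affinity (illustrative)
refined = propagate_left_to_right(coarse, affinity)
print(refined)
```

Because each row (or column) of the scan is independent of the others, this propagation parallelizes naturally across a GPU, which is where the speedup below comes from.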
In this way, the spatial propagation network far outpaces traditional methods, running up to 100x faster on GPUs using the CUDA parallel programming model.
Generative Adversarial Networks (GANs)
This second research paper demonstrates a neural network's ability to take a winter scene and generate a rendition of the same image in summer. Similarly, it can take a sunny scene and generate a rainy version:
The full paper, "Unsupervised Image-to-Image Translation Networks," describes how this was achieved, primarily through a technique called unsupervised learning.
What is the difference between supervised and unsupervised learning?
In order to generate such an image, the machine first needs to understand what exactly "winter" means from a visual perspective.
This is traditionally achieved through a process called supervised learning, where the machine is trained on data that is a one-to-one match: it is presented with a sunny image and a corresponding rainy image of the exact same angle, location, and overall context, which requires a great deal of capture and labeling work by the trainer.
Unsupervised learning, by contrast, presents the machine with two groups of images, one sunny and the other rainy, which are otherwise not similar in terms of context.
Comparing the two methods: supervised learning is easier on the machine but tedious, painstaking work for the trainer, while unsupervised learning is lower effort for the trainer but higher effort for the machine.
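To make the contrast concrete, here is a sketch of the two data layouts; the file paths are purely illustrative.

```python
# Supervised (paired): every sunny photo has a pixel-aligned rainy counterpart,
# so the model trains on explicit (input, target) pairs. Paths are hypothetical.
paired_data = [
    ("photos/street_01_sunny.jpg", "photos/street_01_rainy.jpg"),
    ("photos/park_02_sunny.jpg", "photos/park_02_rainy.jpg"),
]

# Unsupervised (unpaired): two independent collections with no correspondence;
# the model must discover the sunny-to-rainy relationship on its own.
sunny_images = ["photos/sunny/beach.jpg", "photos/sunny/market.jpg"]
rainy_images = ["photos/rainy/harbor.jpg", "photos/rainy/forest.jpg"]
```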
For us humans, therefore, unsupervised learning is clearly the more desirable method, but it cannot solve this weather-translation problem on its own; additional information is required to narrow down the possibilities.
That additional information comes from a process called generative modeling.
What is generative modeling?
A good way to understand the concept of generative modeling is to understand what it’s not.
Its opposite, discriminative modeling, takes inputs x (for example, sunny images) and outputs y (rainy images) and learns to predict the single most likely output for a given input.
Generative modeling, by contrast, models the data itself: it can create many plausible rainy images to match an original sunny one (or vice versa), resulting in many more possibilities.
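A minimal sketch of the contrast, using toy PyTorch networks (the architectures and sizes are assumptions for illustration, not the paper's models):

```python
import torch
import torch.nn as nn

# Discriminative: map an input x directly to one prediction about y.
# Here, a toy classifier scoring "sunny vs. rainy" for a flattened image.
discriminative = nn.Sequential(
    nn.Linear(3 * 64 * 64, 128), nn.ReLU(),
    nn.Linear(128, 2),             # two classes: sunny, rainy
)
x = torch.randn(1, 3 * 64 * 64)
class_scores = discriminative(x)   # one answer per input

# Generative: map a random code z to a new sample, so different draws of z
# yield many different plausible images.
generative = nn.Sequential(
    nn.Linear(100, 128), nn.ReLU(),
    nn.Linear(128, 3 * 64 * 64), nn.Tanh(),
)
z = torch.randn(5, 100)            # five different random codes...
samples = generative(z)            # ...give five different candidate images
```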
How it works
For the purposes of this research, the team used two networks, specifically generative adversarial networks, or GANs: one trained on images from each domain, for example, one on the sunny images and the other on the rainy images.
Because the two GANs were trained on different image collections, the team introduced a shared latent space assumption. The latent space, a hidden space that captures the structure and similarity of the data, associates the two GANs by grouping similar elements closer together. In this way, the two domains can be mapped to one another, narrowing the many possible results.
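A minimal sketch of the shared-latent-space idea follows; the linear encoders and decoders and the dimensions are assumptions for illustration, while in the paper both sides are deep networks trained adversarially under this constraint.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 3 * 64 * 64  # illustrative sizes

# One encoder/decoder pair per domain; both pass through the SAME latent space.
encode_sunny = nn.Linear(image_dim, latent_dim)
encode_rainy = nn.Linear(image_dim, latent_dim)
decode_sunny = nn.Linear(latent_dim, image_dim)
decode_rainy = nn.Linear(latent_dim, image_dim)

def sunny_to_rainy(x):
    # Shared-latent-space assumption: a sunny image and its rainy counterpart
    # map to the same hidden code z, so translation is "encode in one
    # domain, decode in the other."
    z = encode_sunny(x)
    return decode_rainy(z)

x_sunny = torch.randn(1, image_dim)
x_rainy_hat = sunny_to_rainy(x_sunny)
print(x_rainy_hat.shape)  # same size as an input image
```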
The method can be used not only to generate rainy images from sunny images but also, as can be seen below, to generate lion, cheetah, and other animal images from images of cats:
The methodology can be applied to a broad spectrum of image types, and the possibilities are vast. What is clear is that this improved process is faster and more efficient, requiring much less effort from the person training the machine.
- Read about Learning Affinity via Spatial Propagation Networks.
- Learn more about Unsupervised Image-to-Image Translation Networks.
- Visit the NVIDIA Research Website to read further on similar research.