The people in the image below are not real. They were generated by a Generative Adversarial Network (GAN) trained on 30,000 celebrity photos.
GANs are one of the Deep Learning-based generative methods that can produce novel samples from high-dimensional data distributions, such as images or LiDAR data. In the figure above, the model learned to generate entirely new images that mimic the appearance of real photos, so it produced photos of people who do not exist. Machine Learning models can be divided into two types: generative models and discriminative models. To understand the difference, assume you have input data X and you want to classify it into labels Y. A generative model learns the joint probability distribution P(X,Y), whereas a discriminative model learns the conditional probability distribution P(Y|X), which is read as "the probability of Y given X". As a simple example, if you have data in the form of (X,Y), here is what each model tries to estimate:
Generative models provide a way to learn data representations without extensively annotated training data. They can learn in supervised, unsupervised, or semi-supervised modes. The distribution P(Y|X) is the more natural one for classifying a given example X into a class Y; however, the generative distribution P(X,Y) can be transformed into P(Y|X) by applying Bayes' rule and then also be used for classification.
For example, suppose we have two classes of animals, elephant (Y=1) and dog (Y=0), and X is a picture of the animal. Given a training dataset, a discriminative model tries to find a multidimensional decision boundary that separates the elephant images from the dog images. Then, to classify a new animal image as either an elephant or a dog, it checks on which side of the decision boundary the image falls and makes its prediction accordingly. A generative model, in contrast, looks at the elephant images and builds a model of what elephants look like, then looks at the dog images and builds a separate model of what dogs look like. Finally, to classify a new animal image, we match it against the elephant model and the dog model to see whether it looks more like the elephants or more like the dogs seen in the training dataset.
The advantage of generative models is that they can also use P(X,Y) to generate likely (X,Y) pairs, because they have learned the training data distribution. However, discriminative models generally outperform generative models in classification tasks.
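The step from P(X,Y) to P(Y|X) via Bayes' rule can be made concrete with a small sketch. The dataset below is hypothetical and exists only for illustration: X is reduced to a single coarse feature ("large" or "small") and Y is the label (1 = elephant, 0 = dog). The function names are likewise assumptions, not part of any library.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy dataset of (x, y) pairs; y = 1 is elephant, y = 0 is dog.
data = [
    ("large", 1), ("large", 1), ("large", 1), ("large", 0),
    ("small", 0), ("small", 0), ("small", 0), ("small", 1),
]


def joint_distribution(pairs):
    """Estimate P(X, Y) by counting: what a generative model learns."""
    counts = Counter(pairs)
    total = len(pairs)
    return {xy: Fraction(n, total) for xy, n in counts.items()}


def conditional_from_joint(joint, x):
    """Turn P(X, Y) into P(Y | X = x) using P(Y|X) = P(X,Y) / P(X)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}


joint = joint_distribution(data)
posterior = conditional_from_joint(joint, "large")
print(joint)       # the joint table P(X, Y)
print(posterior)   # the conditional P(Y | X = "large")
```

A discriminative model would estimate the second table directly and never build the first one, which is why it cannot be used to generate new (X,Y) pairs.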
Generative models can learn from data of any type: speech, LiDAR point clouds, text, video, and so on. The data does not have to be images. Generative models fall mainly into two types:
1) Density Estimation models:
These learn a Probability Density Function (PDF) of the training dataset that can then be used to generate data similar to what was seen in that dataset. The goal is to estimate a PDF that is as close as possible to the distribution of the training data.
2) Sample Generation models:
These learn a model that directly generates data very close to the training dataset. There is no need to explicitly learn a PDF of the training data.
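The difference between the two types can be illustrated with a one-dimensional sketch. Everything here is an assumption made for illustration: the training data is synthetic, the density model is a single Gaussian, and the `generator` function uses hand-set parameters where a real sample generation model such as a GAN would learn them.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(0)
# Hypothetical 1-D training dataset.
training_data = [rng.gauss(5.0, 2.0) for _ in range(1000)]

# 1) Density estimation: fit an explicit PDF (here a single Gaussian).
density = NormalDist(mean(training_data), stdev(training_data))
print(density.pdf(5.0))                    # the PDF can be evaluated anywhere
new_from_density = density.samples(5, seed=1)


# 2) Sample generation: map noise straight to samples.
#    The code never writes down or evaluates a PDF.
def generator(noise, scale=2.0, shift=5.0):
    # scale and shift are hand-set here; a GAN would learn such parameters.
    return scale * noise + shift


new_from_generator = [generator(rng.gauss(0.0, 1.0)) for _ in range(5)]
print(new_from_density)
print(new_from_generator)
```

Both routes produce new samples, but only the first can answer the question "how likely is this data point?".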
Examples of Density Estimation models are Deep Belief Networks (DBNs) and Restricted Boltzmann Machines (RBMs), while Generative Adversarial Networks (GANs) fall into the second category, Sample Generation models. The name “adversarial” refers to how the network is trained: it is shown generated (not real) examples alongside real ones and learns to tell them apart, while a second network is trained to produce examples that defeat it. Typically, a GAN consists of two Neural Networks: a generator (G) and a discriminator, or critic (D).
Real data samples X (for example, images) are drawn from the training dataset and given as input to D, and D is trained so that D(X) is close to 1 (D is differentiable). On the other side, a noise vector N is fed to G, which generates a fake image G(N); that image is then given as input to D, and D is trained so that D(G(N)) is close to 0. Over time, the discriminator thus learns to tell real data from fake. To make the generator better at producing realistic samples, D is stacked after G, and G is trained so that D(G(N)) is close to 1 (G is differentiable, and D is held fixed during this step). Over time, the generator thus becomes smarter and generates more realistic samples. The following figure describes the learning process. Typically, the generator is of main interest; the discriminator acts as an adaptive loss function that is discarded once the generator has been trained.
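The two alternating training steps can be sketched in a few lines of plain Python. This is a deliberately tiny toy under stated assumptions, not a practical GAN: the data is one-dimensional, the generator is a learned shift G(n) = n + b, the discriminator is a logistic unit D(x) = sigmoid(w * x + c), and the gradients are written out by hand. The class and function names are hypothetical.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


class ToyGAN:
    """1-D toy GAN: G(n) = n + b and D(x) = sigmoid(w * x + c)."""

    def __init__(self):
        self.b = 0.0   # generator parameter (a learned shift)
        self.w = 0.0   # discriminator weight
        self.c = 0.0   # discriminator bias

    def G(self, n):
        return n + self.b

    def D(self, x):
        return sigmoid(self.w * x + self.c)

    def discriminator_step(self, real, noise, lr):
        """Push D(X) toward 1 and D(G(N)) toward 0; G is not updated."""
        grad_w = grad_c = 0.0
        for x in real:                      # loss term: -log D(x)
            err = self.D(x) - 1.0
            grad_w += err * x / len(real)
            grad_c += err / len(real)
        for n in noise:                     # loss term: -log(1 - D(G(n)))
            fake = self.G(n)
            err = self.D(fake)
            grad_w += err * fake / len(noise)
            grad_c += err / len(noise)
        self.w -= lr * grad_w
        self.c -= lr * grad_c

    def generator_step(self, noise, lr):
        """Push D(G(N)) toward 1; D's parameters are held fixed."""
        grad_b = 0.0
        for n in noise:                     # loss term: -log D(G(n))
            grad_b += (self.D(self.G(n)) - 1.0) * self.w / len(noise)
        self.b -= lr * grad_b


def train(gan, steps=200, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate one discriminator step and one generator step."""
    rng = random.Random(seed)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        gan.discriminator_step(real, noise, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        gan.generator_step(noise, lr)
    return gan
```

Calling `train(ToyGAN())` runs the alternating loop on synthetic real data centred at 4. Each generator step moves the shift b in the direction the current discriminator scores as more real, but alternating updates of this kind can oscillate rather than settle, so the sketch makes no claim about where b ends up.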
If D makes the right prediction, G updates its parameters in order to generate better fake samples that fool D. If D’s prediction is incorrect, D tries to learn from its mistake to avoid similar mistakes in the future. This process continues until an equilibrium is established. Let’s discuss this equilibrium through an example. Assume G is a money forger and D is the police. The police want to let people with real money spend it safely without being punished, and to catch forged money. At the same time, the money forger wants to fool the police and successfully use the forged money. Over time, both players get better and better at their jobs. If both players have unlimited capabilities, the “Nash equilibrium” (from game theory) corresponds to G generating perfect samples that come from the same distribution as the training data. Hence, for any image it receives (real or generated), D will say it is 50% real and 50% fake. The goal of training a GAN is to establish an equilibrium between the errors of the generator and of the discriminator. In other words, the learning phase ends when the generator is “smart” enough to fool the discriminator in 50% of cases.
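The 50% figure follows from the standard formulation of the GAN game in the original GAN paper (Goodfellow et al., 2014), written here with this article's symbols X, N, D, and G. The names p_data (distribution of the real data), p_N (distribution of the noise), and p_G (distribution of the generated samples) are introduced only for this block.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{X \sim p_{\text{data}}}\big[\log D(X)\big]
  + \mathbb{E}_{N \sim p_N}\big[\log\big(1 - D(G(N))\big)\big]

% For a fixed generator, the best possible discriminator is
D^{*}(X) = \frac{p_{\text{data}}(X)}{p_{\text{data}}(X) + p_G(X)}

% so a perfect generator leaves the discriminator guessing:
p_G = p_{\text{data}} \;\Longrightarrow\; D^{*}(X) = \tfrac{1}{2}
```

When the generated distribution matches the data distribution exactly, the two densities in the fraction are equal for every X, and even the best discriminator can do no better than output 1/2.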
The generator’s random noise vector determines which image is generated; each generated image corresponds to one input noise vector. One interesting observation about GANs is that arithmetic operations on that input noise vector turn out to be meaningful. For example, if you subtract the vector that generates a “man” image from the vector that generates a “man with glasses” image, and then add the vector that generates a “woman” image, the resulting vector causes the GAN to generate a “woman with glasses” image. It works like an arithmetic operation: “man with glasses” - “man” + “woman” = “woman with glasses”.
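In code, this arithmetic is plain element-wise addition and subtraction on the noise vectors. The sketch below uses made-up four-dimensional vectors purely for illustration; real noise vectors typically have many more dimensions, and the values here do not come from any trained model.

```python
def add(u, v):
    """Element-wise sum of two noise vectors."""
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    """Element-wise difference of two noise vectors."""
    return [a - b for a, b in zip(u, v)]


# Hypothetical noise vectors; in this toy the last dimension encodes "glasses".
z_man_with_glasses = [0.5, -0.25, 1.0, 2.0]
z_man = [0.5, -0.25, 1.0, 0.0]
z_woman = [-0.5, 0.75, 1.0, 0.0]

# "man with glasses" - "man" isolates the glasses direction,
# which is then added to the "woman" vector.
glasses_direction = sub(z_man_with_glasses, z_man)
z_woman_with_glasses = add(glasses_direction, z_woman)
print(z_woman_with_glasses)
```

Feeding the resulting vector to the trained generator would then be expected to produce the combined image.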
Generative Adversarial Networks are an interesting development: they give us a new way to do unsupervised learning and allow machines to generate data and art, such as the following painting of Van Gogh, which was generated by Convolutional Neural Networks. The figure also shows bedrooms generated with GANs.
One key advantage of GANs is that they bridge the gap between simulation and reality. The task of autonomous driving requires collecting a huge amount of real data to achieve acceptable generalization. With GANs, translating simulated (synthetic) data into realistic data becomes feasible. In the same context, we are currently investigating in CDA/CDV the ability to generate ScaLa LiDAR data that corresponds to camera images, and the other way around.
One big open problem with GANs is that their performance is hard to evaluate. In the image domain, it is quite easy to at least look at the generated samples to judge the quality of the model, although this is obviously not a satisfying solution: it is not automated, and the human factor makes it a qualitative evaluation. Evaluation becomes an even bigger problem for non-image data, for example a GAN that generates ScaLa-like LiDAR data.