Final Project Report

Jiaheng Hu (jiahengh)

Overview

For this project, I explored conditional GANs. In the first part, I reimplemented "cGANs with Projection Discriminator" and ran it on a few datasets of different resolutions. Please see my code submission for implementation details. In the second part, I used my reimplemented network to achieve image-to-image translation, show success and failure cases, and propose potential solutions for the failure cases.

Part 1: cGANs with Projection Discriminator
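For reference, here is a minimal sketch of how a projection discriminator scores an image and label pair: the output is an unconditional term on the image features plus the inner product between a learned class embedding and those same features. This is an illustration only and is not taken from the code submission. The feature vector stands in for the output of a convolutional feature extractor, the unconditional head is assumed to be linear, and all names and numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def projection_logit(features, class_embedding, psi_weights, psi_bias=0.0):
    """Discriminator output for one sample.

    features        -- phi(x), the feature vector of the image (assumed given)
    class_embedding -- the embedding row for the label y
    psi_weights     -- weights of an assumed linear unconditional head psi
    """
    unconditional = dot(psi_weights, features) + psi_bias  # psi(phi(x))
    conditional = dot(class_embedding, features)           # embed(y) . phi(x)
    return unconditional + conditional


# Toy values, chosen only for illustration.
features = [0.5, -1.0, 2.0]
embeddings = {0: [1.0, 0.0, 0.0], 1: [0.0, 0.0, 1.0]}
psi_weights = [0.1, 0.1, 0.1]

logit_class0 = projection_logit(features, embeddings[0], psi_weights)
logit_class1 = projection_logit(features, embeddings[1], psi_weights)
```

The label only enters through the inner product, so the same image features can score differently under different labels.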

1.1 CIFAR-10

First, I trained on the 32×32 CIFAR-10 dataset. I show results for a few different labels below.

Planes

Cars

Birds

Cats

We can see that the network produces images of reasonably good quality.

1.2 Dogs

Next, I ran it on the 64×64 Stanford Dogs dataset and got the following results (species 1 - 6):

1.3 Image Blending

Another very interesting finding is that we can blend between two classes by averaging the conditional batch normalization parameters of the corresponding classes. Here I show two blending results:

Blending species 1 with species 2:

Blending species 3 with species 4:
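The blending idea can be sketched as follows, assuming each class owns a per-channel scale (gamma) and shift (beta) in every conditional batch normalization layer. The tables, values, and function names are hypothetical and only illustrate the averaging step; a real generator would apply this in every conditional layer.

```python
def blend_params(params_a, params_b, alpha=0.5):
    """Linearly interpolate two per-channel parameter lists.

    alpha=0.5 is a plain average of the two classes.
    """
    return [(1 - alpha) * a + alpha * b for a, b in zip(params_a, params_b)]


def conditional_batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """Normalize each channel, then apply the class-dependent scale and shift."""
    return [
        g * (xi - m) / (v + eps) ** 0.5 + b
        for xi, m, v, g, b in zip(x, mean, var, gamma, beta)
    ]


# Hypothetical per-class parameters for a two-channel layer.
gamma_table = {1: [1.0, 2.0], 2: [3.0, 4.0]}
beta_table = {1: [0.0, 1.0], 2: [2.0, 1.0]}

# Blend class 1 with class 2 by averaging their parameters.
blended_gamma = blend_params(gamma_table[1], gamma_table[2])
blended_beta = blend_params(beta_table[1], beta_table[2])

# Apply the blended parameters to one activation vector.
output = conditional_batch_norm(
    x=[1.0, 3.0], mean=[1.0, 1.0], var=[1.0, 4.0],
    gamma=blended_gamma, beta=blended_beta,
)
```

Varying alpha between 0 and 1 would give a gradual transition from one class to the other instead of a single midpoint.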

1.4 Failure Case

I also found that the network struggles to train on the Tiny ImageNet dataset, producing results such as this:

This is likely because there is too little training data for each class (500 images), which suggests that our algorithm cannot really handle a lack of training data.

Part 2: cGAN for image-to-image translation

One interesting thing I found is that if we sample with the same latent code "z", the images we get for different labels have similar structure, as shown below:

2.1 Same Latent Visualization

Therefore, I came up with the idea that we can simply take an image and label pair, optimize for its z value (similar to hw 5), and then change the label to achieve unsupervised image-to-image translation. I implemented this idea and got mixed results.
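A minimal sketch of this procedure is given below. It is a toy stand-in rather than the actual implementation: the generator is assumed to be a small linear map with a per-class bias so that the gradient can be written by hand, whereas a real cGAN would need backpropagation through the network. All names, shapes, and hyperparameters are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def generate(z, label, weights, class_bias):
    """Toy conditional generator: G(z, y) = W z + b_y."""
    return [dot(row, z) + b for row, b in zip(weights, class_bias[label])]


def project_to_latent(target, label, weights, class_bias, dim, steps=500, lr=0.1):
    """Gradient descent on z to minimize 0.5 * ||G(z, label) - target||^2."""
    z = [0.0] * dim
    for _ in range(steps):
        output = generate(z, label, weights, class_bias)
        residual = [o - t for o, t in zip(output, target)]
        # For the linear toy generator the gradient w.r.t. z is W^T residual.
        grad = [
            sum(weights[i][j] * residual[i] for i in range(len(residual)))
            for j in range(dim)
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Toy generator parameters (illustrative only).
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_bias = {0: [0.0, 0.0, 0.0], 1: [1.0, 1.0, 1.0]}

# Pretend this is the queried image with label 0.
true_z = [0.5, -0.25]
queried = generate(true_z, 0, weights, class_bias)

# Step 1: recover a latent code for the (image, label) pair.
z_hat = project_to_latent(queried, 0, weights, class_bias, dim=2)

# Step 2: keep z fixed and swap the label to "translate" the image.
translated = generate(z_hat, 1, weights, class_bias)
```

In this toy setting the projection is a convex problem, so gradient descent recovers z exactly. With a real generator the objective is non-convex, and the quality of the translation depends on how well the first step reconstructs the queried image.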

2.2 Success Case

Queried Image:

Translated Images:

2.3 Failure Case

Queried Image:

Translated Images:

2.4 Failure Case Analysis and Future Work

What happened in the failure case? It turns out that the problem lies in the reconstruction:

Reconstructed Image:

We can clearly see that while the reconstruction gets the position of the dog right, it fails to find the correct location of the mouth and nose, and as a result produces poor-quality images. In fact, as pointed out in "Transforming and Projecting Images into Class-conditional Generative Networks" by Huh et al., "Purely gradient-based optimizations fail to find good solutions for projection with conditional GAN". I believe that in order to improve the reconstruction, special care needs to be taken with the projection step. This is beyond the scope of my project, but it is something I believe future work could look into.

Lastly, I'd like to thank Professor Zhu and the TAs for amazing lectures and supportive feedback, through which I learned a lot. All the best!