16-726 SP22 Assignment #3 - When Cats meet GANs

Run

Please run the code according to the instructions on the assignment page.

To get the results that use diff_augment.py, add the flag --use-diffaug True to the command.

Introduction

This assignment requires us to implement two GANs: DCGAN and CycleGAN. The first generates cat images from noise, and the second transfers style from one group of pictures to another.

Part1: DCGAN

Explanation

There are two kinds of data augmentation in this assignment. The first is in data_loader.py and is enabled with -data_preprocessing 'deluxe'. The second is Differentiable Augmentation, which is applied during training and enabled with --use-diffaug True.

For the padding value, I use the formula from the official PyTorch documentation for nn.Conv2d:

$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$

Substituting kernel_size = 4, stride = 2, dilation = 1 and the input and output image sizes (the output is half the input size) into this formula gives a padding value of 1.

For the first layer of the generator, which does not use up_conv, substituting its values into the same formula gives a padding value of 3.
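The padding values above can be checked with a few lines of Python. This is only a sketch: the helper names `conv_out_size` and `find_padding` are made up for illustration, the 64 -> 32 sizes stand in for any layer that halves the resolution, and the 1x1 -> 4x4 shape with stride 1 for the first generator layer is an assumption that is not stated in this report.

```python
def conv_out_size(h_in, kernel_size, stride, padding, dilation=1):
    """Output size along one dimension, following the nn.Conv2d formula."""
    return (h_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def find_padding(h_in, h_out, kernel_size, stride, max_padding=8):
    """Return the smallest padding that maps h_in to h_out, or None if none fits."""
    for padding in range(max_padding + 1):
        if conv_out_size(h_in, kernel_size, stride, padding) == h_out:
            return padding
    return None


# Downsampling layer: kernel_size=4, stride=2, output half the input size.
print(find_padding(64, 32, kernel_size=4, stride=2))  # 1

# First generator layer (assumed shapes): 1x1 input to 4x4 output, stride=1.
print(find_padding(1, 4, kernel_size=4, stride=1))  # 3
```

Searching over small padding values avoids rearranging the floor expression by hand and makes it easy to try other layer shapes.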

Result

In the basic and DiffAug-only modes, the network overfits the dataset in later iterations, and the loss does not appear to converge. With the deluxe augmentation, the results are better.

When DiffAug and the deluxe augmentation are both enabled, the details become clearer over the iterations and the cats' features become recognizable. In the early stages only vague, illusion-like shapes are generated, but later the face and eyes are generated as well.

(*: Curves smoothed with a factor of 0.9 in TensorBoard)

Part2: CycleGAN

Generally speaking, more iterations give better results. Adding the cycle consistency loss makes the output image more stable and lets it retain more features from the source image. Take this as an example: with cycle loss, the brown fur is kept and the output looks more like the original picture of Grumpy cat.

Without cycle loss

With cycle loss
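For reference, the cycle consistency term compared above can be written as follows. This is the standard CycleGAN form, assuming an L1 reconstruction penalty. The generator names $G_{X \to Y}$ and $G_{Y \to X}$ and the weight $\lambda$ are notation introduced here, and the exact norm and weight used in the code may differ.

$\mathcal{L}_{cyc} = \lambda \left( \mathbb{E}_{x}\left[ \lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1 \right] + \mathbb{E}_{y}\left[ \lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1 \right] \right)$

Because an image must be recoverable after a round trip through both generators, each generator is discouraged from discarding content such as fur color or background.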

As for the choice of discriminator, it is hard to tell which one is better. My observation is that results generated with the Patch Discriminator usually have the expected shape and look more like real pictures. The DC Discriminator can produce very clear facial expressions, but the cat as a whole is sometimes deformed. The only difference between the two discriminators is that the output of the Patch Discriminator is 4x4 while that of the DC Discriminator is 1x1. This suggests that the Patch Discriminator captures the general features well, while the DC Discriminator focuses more on the detailed features of the images.

One interesting observation is that the X->Y direction usually gives better results than the Y->X direction.
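To illustrate how the two output sizes can arise, the sketch below traces the spatial size through two hypothetical layer stacks. The 64x64 input, the four stride-2 layers, and the final 4x4 convolution are assumptions chosen only so that the outputs come out as 4x4 and 1x1. They are not taken from the actual model code.

```python
def conv_out_size(h_in, kernel_size, stride, padding):
    """Output size along one dimension of a convolution with dilation 1."""
    return (h_in + 2 * padding - (kernel_size - 1) - 1) // stride + 1


def trace_sizes(h_in, layers):
    """Return the spatial size after each (kernel_size, stride, padding) layer."""
    sizes = [h_in]
    for kernel_size, stride, padding in layers:
        sizes.append(conv_out_size(sizes[-1], kernel_size, stride, padding))
    return sizes


# Hypothetical layer stacks, each entry is (kernel_size, stride, padding).
downsample = (4, 2, 1)
patch_layers = [downsample] * 4         # stops at a 4x4 map, one score per patch
dc_layers = patch_layers + [(4, 1, 0)]  # one more conv collapses 4x4 to 1x1

print(trace_sizes(64, patch_layers))  # [64, 32, 16, 8, 4]
print(trace_sizes(64, dc_layers))     # [64, 32, 16, 8, 4, 1]
```

With a 4x4 output, each score depends only on a region of the input image, whereas the 1x1 output is a single score for the whole image.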

Cat Dataset

The same holds for the Apple-orange dataset. With cycle loss, the background is preserved and the edges of the apples look great.

Without cycle loss

With cycle loss