The objective of this project is to apply GAN models and conduct experiments on two datasets: grumpifyCat and apple2orange. I utilized a Deep Convolutional GAN (DCGAN) model to generate images from noise and a CycleGAN model to learn mappings between unpaired image domains. In addition, I incorporated a diffusion model to generate images from noise.
I experimented with different settings, including a basic transformation (resizing and normalization) and a deluxe transformation (resizing, random crop, random horizontal flip, and normalization). Moreover, to improve the quality of the generated images, I used differentiable data augmentation (DiffAugment) and a patch discriminator.
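As a concrete reference, the two pipelines can be written with torchvision transforms roughly as follows. This is a minimal sketch: the 64x64 crop size and the slightly larger pre-crop resize are assumptions, not values taken from this report.

```python
import torchvision.transforms as T

load_size, crop_size = 70, 64  # assumed sizes; adjust to the dataset

basic = T.Compose([
    T.Resize((crop_size, crop_size)),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # scale to [-1, 1]
])

deluxe = T.Compose([
    T.Resize((load_size, load_size)),   # resize slightly larger...
    T.RandomCrop(crop_size),            # ...then take a random crop
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```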
A DCGAN (Deep Convolutional Generative Adversarial Network) is a type of generative adversarial network (GAN).
The architecture of a DCGAN model typically consists of a generator network and a discriminator network. The generator network takes random noise as input and upsamples it through a series of transposed convolutional layers to produce an image. The discriminator network, on the other hand, takes an image as input and performs a series of convolutions to classify it as real or fake.
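A minimal PyTorch sketch of such a generator/discriminator pair, assuming 64x64 RGB images and a 100-dimensional noise vector (both assumptions; the layer widths used in this project may differ):

```python
import torch
import torch.nn as nn

nz = 100  # assumed noise dimension

# Generator: upsample a (nz, 1, 1) noise tensor to a 64x64 RGB image.
G = nn.Sequential(
    nn.ConvTranspose2d(nz, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),   # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),    # 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),     # 32x32
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                          # 64x64
)

# Discriminator: downsample an image to a single real/fake logit.
D = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),                           # 32x32
    nn.Conv2d(32, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),      # 16x16
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),    # 8x8
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),   # 4x4
    nn.Conv2d(256, 1, 4, 1, 0),                                             # 1x1 logit
)

z = torch.randn(8, nz, 1, 1)
print(G(z).shape, D(G(z)).shape)  # [8, 3, 64, 64] and [8, 1, 1, 1]
```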


The samples produced with Basic augmentation look better than those produced with Deluxe augmentation. However, with Deluxe augmentation the discriminator and generator losses converge to lower values than with Basic augmentation. Moreover, using DiffAugment leads to much better results.
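DiffAugment helps because the same differentiable augmentations are applied to both real and fake batches before they reach the discriminator, so the discriminator cannot memorize the small training set while gradients still flow back to the generator. A simplified sketch of the idea (this toy `diff_augment` is illustrative, not the actual DiffAugment implementation):

```python
import torch
import torch.nn.functional as F

def diff_augment(x, ratio=0.125):
    """Simplified differentiable augmentation: brightness jitter + random translation."""
    # Brightness: add a random per-sample offset; plain tensor ops stay differentiable.
    x = x + (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    # Translation: zero-pad, then take a randomly shifted crop of the original size.
    b, c, h, w = x.shape
    sh, sw = int(h * ratio), int(w * ratio)
    ty = int(torch.randint(0, 2 * sh + 1, (1,)))
    tx = int(torch.randint(0, 2 * sw + 1, (1,)))
    x = F.pad(x, [sw, sw, sh, sh])
    return x[:, :, ty:ty + h, tx:tx + w]

# Both real and generated images are augmented before the discriminator:
# d_real = D(diff_augment(real_images))
# d_fake = D(diff_augment(G(z)))
```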
[Figure: DCGAN samples at iteration 200 and iteration 4800, with D/total loss and G/total loss curves]

[Figure: DCGAN samples at iteration 200 and iteration 4800, with D/total loss and G/total loss curves]

[Figure: DCGAN samples at iteration 200 and iteration 5800]

[Figure: DCGAN samples at iteration 200 and iteration 5600]
CycleGAN is a type of generative adversarial network (GAN) that aims to learn a mapping between two different image domains in an unsupervised manner. Unlike traditional supervised learning methods, CycleGAN can learn mappings between two domains without the need for paired training data.
The architecture of CycleGAN comprises two generator networks and two discriminator networks. Each generator transforms images from one domain to the other, while each discriminator judges whether images in its domain are real or generated. The discriminator can be a DCGAN discriminator or a PatchGAN discriminator.
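A sketch of the generator objective for one training step, assuming generators `G_XtoY`, `G_YtoX` and discriminators `D_X`, `D_Y` already exist (the names are hypothetical) and a least-squares GAN loss; the weight 10 on the cycle term is a common choice, not necessarily the one used here:

```python
import torch.nn.functional as F

def generator_step(real_X, real_Y, G_XtoY, G_YtoX, D_X, D_Y, lambda_cycle=10.0):
    # Adversarial terms: each generator tries to make its discriminator output 1.
    fake_Y = G_XtoY(real_X)
    fake_X = G_YtoX(real_Y)
    adv = ((D_Y(fake_Y) - 1) ** 2).mean() + ((D_X(fake_X) - 1) ** 2).mean()
    # Cycle-consistency: translating to the other domain and back should
    # recover the original input.
    cycle = F.l1_loss(G_YtoX(fake_Y), real_X) + F.l1_loss(G_XtoY(fake_X), real_Y)
    return adv + lambda_cycle * cycle
```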


First, the translation from Blue cats to Grumpy cats gives better results, likely because the Grumpy domain contains more training images than the Blue domain.
Second, using the cycle-consistency loss improves the results.
Finally, the patch discriminator works better than the DCGAN discriminator.
The patch discriminator works better because it learns the local structure of images: each of its outputs judges only a patch of the input, so the generator is pushed to get local texture right (a sketch follows). The cycle-consistency loss improves the results because it constrains the model to mappings that preserve the content and style of the input image.
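A minimal PatchGAN-style discriminator sketch (the layer widths are assumptions): instead of a single scalar per image, it outputs a grid of logits, each with a limited receptive field over one local patch:

```python
import torch
import torch.nn as nn

patch_D = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.InstanceNorm2d(128), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, 2, 1), nn.InstanceNorm2d(256), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 4, 1, 1),  # one logit per overlapping patch, not per image
)

x = torch.randn(1, 3, 64, 64)
print(patch_D(x).shape)  # torch.Size([1, 1, 7, 7]): a 7x7 grid of patch decisions
```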
[Figures: CycleGAN samples at iteration 100 and iteration 10000, six panels]

[Figure: D/XY/fake and D/XY/real loss curves]

[Figure: D/YX/fake and D/YX/real loss curves]

[Figure: G/XY/fake and G/XY/real loss curves]

[Figure: G/YX/fake and G/YX/real loss curves]

[Figures: CycleGAN samples at iteration 100 and iteration 10000, six panels]
For Bells & Whistles, I implemented the following:
A diffusion model is a generative model. In the forward process, it iteratively adds Gaussian noise to the input image over T time steps. In the reverse process, it learns the reverse distribution in order to restore the image from noise.
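In closed form, the forward process gives x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I), so training can sample any step directly. A minimal sketch, assuming a linear beta schedule and a noise-prediction network `model(x_t, t)` (both assumptions; the schedule and network used in this project may differ):

```python
import torch
import torch.nn.functional as F

T_steps = 1000
betas = torch.linspace(1e-4, 0.02, T_steps)      # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product abar_t

def q_sample(x0, t, eps):
    """Forward process in closed form: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    ab = alphas_bar[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

def diffusion_loss(model, x0):
    """Train the reverse process by predicting the added noise (DDPM objective)."""
    t = torch.randint(0, T_steps, (x0.size(0),))
    eps = torch.randn_like(x0)
    return F.mse_loss(model(q_sample(x0, t, eps), t), eps)
```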

[Figure: diffusion model samples at iteration 200 and iteration 13600]