Overview

The objective of this project is to apply GAN models and conduct experiments on two datasets: grumpifyCat and apple2orange. I used a Deep Convolutional GAN (DCGAN) to generate images from noise and a CycleGAN to learn mappings between unpaired image domains. In addition, I implemented a diffusion model to generate images from noise.

I experimented with different settings, including a basic transformation (resizing and normalization) and a deluxe transformation (resizing, random crop, random horizontal flip, and normalization). Moreover, to improve the quality of the generated images, I used differentiable data augmentation (DiffAug) and a patch discriminator.

DCGAN

DCGAN (Deep Convolutional Generative Adversarial Network) is a type of generative adversarial network (GAN) that uses convolutional layers in both the generator and the discriminator.

The architecture of a DCGAN model typically consists of a generator network and a discriminator network. The generator network takes random noise as input and upsamples it through a series of transposed convolutional layers to produce an image. The discriminator network, on the other hand, takes an image as input and performs a series of convolutions to classify it as real or fake.
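The architecture described above can be sketched in PyTorch as follows. This is a minimal version for 64x64 images; the channel widths (`ngf`, `ndf`) and noise dimension (`nz=100`) are assumptions rather than the exact values used in this project.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Upsamples a noise vector to a 64x64 image via transposed convolutions."""
    def __init__(self, nz=100, ngf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0), nn.BatchNorm2d(ngf * 8), nn.ReLU(True),       # 4x4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1), nn.BatchNorm2d(ngf * 4), nn.ReLU(True),  # 8x8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1), nn.BatchNorm2d(ngf * 2), nn.ReLU(True),  # 16x16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1), nn.BatchNorm2d(ngf), nn.ReLU(True),          # 32x32
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1), nn.Tanh(),                                         # 64x64
        )

    def forward(self, z):
        # Reshape the flat noise vector into a (B, nz, 1, 1) feature map.
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Downsamples an image with strided convolutions to one real/fake score."""
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2, True),                                     # 32x32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),      # 16x16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),  # 8x8
            nn.Conv2d(ndf * 4, 1, 8, 1, 0),                                                          # 1x1 score
        )

    def forward(self, x):
        return self.net(x).view(-1)
```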

DCGAN Results

The results with Basic augmentation appear better than those with Deluxe augmentation, although with Deluxe augmentation the discriminator and generator losses converge to lower values. Moreover, using DiffAug leads to much better results.

Basic Augmentation

iteration 200

iteration 4800

D/total loss

G/total loss

Deluxe Augmentation

iteration 200

iteration 4800

D/total loss

G/total loss

Basic and Differentiable Augmentation

iteration 200

iteration 5800

Deluxe and Differentiable Augmentation

iteration 200

iteration 5600

CycleGAN

CycleGAN is a type of generative adversarial network (GAN) that aims to learn a mapping between two different image domains in an unsupervised manner. Unlike traditional supervised learning methods, CycleGAN can learn mappings between two domains without the need for paired training data.

The architecture of CycleGAN comprises two generator models and two discriminator models. Each generator is responsible for transforming images from one domain to the other, while each discriminator is tasked with determining the authenticity of the generated images. The discriminator can be a DCGAN discriminator or a PatchGAN discriminator.
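A PatchGAN discriminator differs from the DCGAN one in that it outputs a grid of real/fake scores, one per receptive-field patch, instead of a single scalar. A minimal sketch (channel widths assumed) for 64x64 inputs:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Outputs a score map: each cell judges one local patch of the input."""
    def __init__(self, ndf=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2, True),                                        # 32x32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.InstanceNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),      # 16x16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.InstanceNorm2d(ndf * 4), nn.LeakyReLU(0.2, True),  # 8x8
            nn.Conv2d(ndf * 4, 1, 4, 1, 1),  # per-patch real/fake score map
        )

    def forward(self, x):
        return self.net(x)
```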

CycleGAN Results

First, the results of the translation from Blue cats to Grumpy cats are better, because the Grumpy domain contains more images than the Blue domain.
Second, adding the cycle-consistency loss improves the results.
Finally, the patch discriminator works better than the DCGAN discriminator.

The patch discriminator works better because it learns the local structure of images better. Moreover, the cycle-consistency loss improves the results because it constrains the model to map images with similar content and style.
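The cycle-consistency constraint can be written as a loss term: translating X to Y and back should reproduce the original image, and likewise for Y. A sketch follows; the generator names (`G_XtoY`, `G_YtoX`) and the weight `lambda_cyc=10` are assumptions.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lambda_cyc=10.0):
    """L1 penalty on the round-trip reconstructions X->Y->X and Y->X->Y."""
    rec_X = G_YtoX(G_XtoY(real_X))  # X -> Y -> X
    rec_Y = G_XtoY(G_YtoX(real_Y))  # Y -> X -> Y
    return lambda_cyc * (F.l1_loss(rec_X, real_X) + F.l1_loss(rec_Y, real_Y))
```

With identity generators the round trip is exact, so the loss is zero; any deviation from perfect reconstruction is penalized linearly.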

Patch Disc, Deluxe Aug

Cat Dataset X-Y

iteration 100

iteration 10000

Cat Dataset Y-X

iteration 100

iteration 10000

Apple/Orange Dataset X-Y

iteration 100

iteration 10000

Apple/Orange Dataset Y-X

iteration 100

iteration 10000

Patch Disc, Deluxe Aug, with Cycle-Consistency Loss

Cat Dataset X-Y

iteration 100

iteration 10000

Cat Dataset Y-X

iteration 100

iteration 10000

D/XY/fake and D/XY/real loss

D/YX/fake and D/YX/real loss

G/XY/fake and G/XY/real loss

G/YX/fake and G/YX/real loss

Apple/Orange Dataset X-Y

iteration 100

iteration 10000

Apple/Orange Dataset Y-X

iteration 100

iteration 10000

DC Disc, Deluxe Aug, with Cycle-Consistency Loss

Cat Dataset X-Y

iteration 100

iteration 10000

Cat Dataset Y-X

iteration 100

iteration 10000

Apple/Orange Dataset X-Y

iteration 100

iteration 10000

Apple/Orange Dataset Y-X

iteration 100

iteration 10000

Bells & Whistles

For Bells & Whistles, I implemented the following:

  1. Diffusion Model

Diffusion Model

A diffusion model is a generative model. In the forward process, it iteratively adds Gaussian noise to the input image over t time steps. In the backward process, it learns the reverse distribution to denoise and restore the image.
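The forward process has a closed form: x_t can be sampled directly from x_0 as sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the cumulative product of (1 - beta). A sketch under common assumptions (T = 1000 steps, linear beta schedule from 1e-4 to 0.02), which may differ from the settings used here:

```python
import torch

T_STEPS = 1000
# Linear noise schedule (assumed values) and its cumulative signal fraction.
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise=None):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps directly."""
    if noise is None:
        noise = torch.randn_like(x0)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)  # broadcast over image dims
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
```

At t close to T, abar_t is nearly zero, so x_t is almost pure Gaussian noise; the model is trained to invert this corruption step by step.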

Diffusion Results

iteration 200

iteration 13600