Assignment 3 - When Cats Meet GANs

16-726 | Qin Han | qinh@andrew.cmu.edu

Overview

The aim of this assignment is to explore and implement two types of Generative Adversarial Networks: a DCGAN and a CycleGAN. In the first part, we implement the Deep Convolutional GAN (DCGAN) architecture and use it to synthesize realistic images of grumpy cats from random noise. In the second part, we implement the CycleGAN, which is known for performing image-to-image translation between distinct domains without paired examples. We test the CycleGAN on translating between two kinds of cats, grumpy and happy, as well as between apples and oranges.

Part 1: Deep Convolutional GAN

Training Curves

dcgan_loss
Loss of the discriminator

The Discriminator loss curve should decrease initially, indicating that the Discriminator is improving at distinguishing real images from fake ones. As training progresses and the Generator produces more realistic images, the loss stabilizes, fluctuating around a certain value: the Discriminator finds it increasingly difficult to tell real from fake, approaching the roughly 50% accuracy of random guessing. Here, the Discriminator loss curve with deluxe data preprocessing is higher than the one without it, indicating that the Discriminator is harder to train under deluxe preprocessing. Likewise, the Discriminator loss curve with differentiable augmentation is higher than the one without it.
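The Discriminator objective described above can be sketched as follows. This is a minimal sketch assuming the least-squares GAN loss, one common choice for DCGAN training (binary cross-entropy is another); `d_real` and `d_fake` are assumed to be the Discriminator's outputs on real and generated batches.

```python
import torch

def d_loss(d_real, d_fake):
    # Least-squares GAN discriminator loss (an assumption; BCE is also common):
    # push D(real) toward 1 and D(fake) toward 0.
    return 0.5 * torch.mean((d_real - 1) ** 2) + 0.5 * torch.mean(d_fake ** 2)
```

Note that when the Discriminator outputs 0.5 everywhere (pure guessing), this loss settles at 0.25, which matches the plateau behavior seen in the curve.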

dcgan_loss
Loss of the generator

The Generator loss curve typically starts high, reflecting the Generator's initially poor ability to create convincing images, and decreases as the Generator improves. Unlike the Discriminator loss, it may not stabilize; instead it can keep trending downward or fluctuating, reflecting the Generator's ongoing effort to fool the Discriminator by continuously improving the quality of the generated images. Here, the Generator loss curve with deluxe data preprocessing is lower than the one without it, indicating that the Generator produces better images under deluxe preprocessing. Likewise, the Generator loss curve with differentiable augmentation is lower than the one without it.
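The corresponding Generator objective can be sketched in the same style, again assuming the least-squares formulation: the Generator is rewarded when the Discriminator scores its fakes close to 1.

```python
import torch

def g_loss(d_fake):
    # Least-squares GAN generator loss (an assumption): the generator
    # wants D(fake) -> 1, i.e. it wants its fakes classified as real.
    return torch.mean((d_fake - 1) ** 2)
```

This loss is 0 when the Discriminator is fully fooled, which is why it trends downward as the generated images improve.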

Generated Images

Here are the generated images from the DCGAN model:

Basic Data Processing Deluxe Data Processing
Without Data Augmentation
With Data Augmentation

The images produced by the DCGAN after 6,400 iterations are presented, using both basic and deluxe data preprocessing, with and without differentiable augmentation. The results indicate that differentiable augmentation improves the quality of the generated images under both preprocessing schemes: artifacts are reduced and the images look more lifelike, demonstrating the efficacy of differentiable augmentation in improving image realism.
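The key idea behind differentiable augmentation is that the same random, differentiable transform is applied to both real and fake batches before the Discriminator sees them, so gradients still flow back to the Generator. A minimal sketch with a single brightness transform (the actual implementation applies several transforms; the function name and parameter here are illustrative):

```python
import torch

def diff_augment(x, brightness=0.5):
    # Differentiable augmentation (sketch): a random per-image brightness
    # shift, applied identically to real and fake batches. The operation
    # is differentiable, so the generator still receives gradients.
    shift = (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * brightness
    return x + shift
```

Because the Discriminator only ever sees augmented images, it cannot simply memorize the small training set, which is why augmentation stabilizes training on limited data.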

iteration=200 iteration=6400

We also provide samples generated with deluxe preprocessing and differentiable augmentation from both early and later stages of training. Images from later iterations exhibit greater realism and detail; throughout training, both the quality and the diversity of the generated images improve, increasingly resembling the distribution of real images.

Part 2: CycleGAN

Generated images

Here are the generated images from the CycleGAN model with deluxe data preprocessing, a patch discriminator, and cycle-consistency loss at 10,000 iterations:

XtoY patch & cycle-consistency (10000 iterations)
YtoX patch & cycle-consistency (10000 iterations)
XtoY patch & cycle-consistency (10000 iterations)
YtoX patch & cycle-consistency (10000 iterations)

Ablation Study

Effect of cycle-consistency loss

Here are the generated images from the CycleGAN model with deluxe data preprocessing and a patch discriminator, but without cycle-consistency loss:

XtoY patch (1000 iterations)
YtoX patch (1000 iterations)
XtoY patch (1000 iterations)
YtoX patch (1000 iterations)

Here are the generated images from the CycleGAN model with deluxe data preprocessing, a patch discriminator, and cycle-consistency loss:

XtoY patch & cycle-consistency (1000 iterations)
YtoX patch & cycle-consistency (1000 iterations)
XtoY patch & cycle-consistency (1000 iterations)
YtoX patch & cycle-consistency (1000 iterations)

The results indicate that the CycleGAN model with cycle-consistency loss generates more realistic images than the one without it. Cycle-consistency loss requires that an image translated to the other domain and back reconstructs the original, which forces the model to preserve the content, key attributes, and textures of the input rather than producing an arbitrary image in the target domain. The outputs are therefore more coherent, detailed, and faithful to the input's domain characteristics, narrowing the gap between generated and real images.
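The cycle-consistency term described above can be sketched as an L1 penalty between the input and its round-trip reconstruction; the weight `lam` is an illustrative hyperparameter (10 is a common choice, not necessarily the value used in this assignment):

```python
import torch

def cycle_consistency_loss(real, reconstructed, lam=10.0):
    # L1 distance between x and the round-trip G_YtoX(G_XtoY(x)),
    # weighted by lambda. A symmetric term is added for the Y domain.
    return lam * torch.mean(torch.abs(real - reconstructed))
```

The L1 norm is typically preferred over L2 here because it penalizes reconstruction error without over-smoothing textures.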

Effect of different Discriminator

Here are the generated images from the CycleGAN model with deluxe data preprocessing and cycle-consistency loss, but using the DC discriminator instead of the patch discriminator:

XtoY DC & cycle-consistency (10000 iterations)
YtoX DC & cycle-consistency (10000 iterations)
XtoY DC & cycle-consistency (10000 iterations)
YtoX DC & cycle-consistency (10000 iterations)

Compared with the patch discriminator, the DC discriminator generates less realistic images. The patch discriminator classifies each local patch of the image as real or fake: by looking at small portions of the image at a time, it focuses on fine-grained details and textures, giving the generator a denser, more localized training signal. The DC discriminator, in contrast, evaluates the image as a whole and outputs a single score. While this can be effective for capturing the overall structure and coherence of the image, it may overlook the finer details and textures that contribute most to the perception of realism.
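The structural difference is that a patch discriminator is fully convolutional: its final layer is a convolution that preserves spatial extent, so the output is a grid of per-patch scores rather than one scalar. A minimal sketch (layer widths and depths here are illustrative, not the assignment's exact architecture):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    # Fully-convolutional discriminator: the final conv keeps spatial
    # dimensions, so the output is a grid of per-patch real/fake scores
    # instead of a single scalar per image.
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base * 2, 1, 4, stride=1, padding=1),  # patch scores
        )

    def forward(self, x):
        return self.net(x)
```

Each output score has a limited receptive field over the input, which is exactly why the loss concentrates on local texture realism; a DC-style discriminator would instead flatten and reduce to a single score per image.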

Bells & Whistles

I implemented a diffusion model for generating cat images. Here are 10 images sampled from the model after 2,000 training steps:

As shown in the generated images, the diffusion model is capable of generating realistic cat images.
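The diffusion training step can be sketched as the standard DDPM noise-prediction objective: noise a clean image to a random timestep and train the network to predict that noise. This is a sketch under the assumption of a DDPM-style model; `model` and the `alphas_cumprod` noise schedule are placeholders for the actual network and schedule used.

```python
import torch

def ddpm_loss(model, x0, alphas_cumprod):
    # DDPM training objective (sketch): sample a random timestep t per
    # image, form the noised image x_t, and regress the injected noise.
    t = torch.randint(0, alphas_cumprod.size(0), (x0.size(0),), device=x0.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)          # cumulative alpha_bar_t
    noise = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise      # forward diffusion
    return torch.mean((model(xt, t) - noise) ** 2)   # predict the noise
```

At sampling time the process is reversed, iteratively denoising pure Gaussian noise into an image, which is how the cat samples above are produced.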