Overview

This assignment explores image manipulation techniques that operate in the latent space of Generative Adversarial Networks (GANs). First, I inverted a pre-trained generator to obtain a latent variable that closely reconstructs a given real image. Next, I experimented with interpolating between two images in the latent space for image editing. Finally, I used latent optimization to generate images that correspond to hand-drawn sketches.

Inverting the Generator

Natural images lie on a low-dimensional manifold, and the output manifold of a well-trained generator should approximate it closely. Inverting the generator is a non-convex optimization problem: find the latent code z that minimizes the difference, under a chosen loss function L, between the generator output G(z) and the real image x, i.e. z* = argmin_z L(G(z), x).

Because standard Lp losses alone do not work well for image synthesis tasks, I used a weighted combination of a pixel loss and a perceptual loss. The objective is non-convex but differentiable, so any first-order or quasi-Newton optimization method applies; I chose L-BFGS.
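A minimal sketch of the optimization loop with `torch.optim.LBFGS`. A tiny linear layer stands in for the pre-trained generator so the sketch is self-contained; the real loop calls the actual generator and adds the perceptual term to the closure.

```python
import torch

torch.manual_seed(0)
G = torch.nn.Linear(4, 16, bias=False)   # stand-in for the pre-trained generator
for p in G.parameters():
    p.requires_grad_(False)              # the generator stays frozen during inversion
x = G(torch.randn(4))                    # target "image" to reconstruct

z = torch.zeros(4, requires_grad=True)   # the latent code being optimized
opt = torch.optim.LBFGS([z], max_iter=100)

def closure():
    opt.zero_grad()
    # pixel term only in this toy; the full objective also adds the perceptual loss
    loss = torch.nn.functional.mse_loss(G(z), x)
    loss.backward()
    return loss

opt.step(closure)
```

L-BFGS calls the closure repeatedly to re-evaluate the loss and gradient, which is why the loop lives inside `closure` rather than a plain `for` loop.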

To compute the perceptual loss, I passed both the synthesized and the target image through a pre-trained VGG network, measured the difference between their feature maps at multiple layers, and summed these per-layer losses to obtain the total perceptual loss.

Ablation Study

I explored how the relative weights of the pixel and perceptual losses affect the results. A perceptual loss weight of 0.7 with a pixel loss weight of 0.3 gives the most detailed reconstruction of content and style (e.g., the flower, the eyes, the facial fur, and the green background).

I also compared generator models. StyleGAN gives noticeably better reconstructions than the vanilla GAN, although the vanilla GAN runs significantly faster.

I explored StyleGAN's different latent spaces (z, w, and w+). Both w and w+ outperform z and capture more of the image's style; w+, which allows a different style vector per layer, in turn outperforms w.
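The relationship between the three spaces can be sketched by their shapes. The real mapping network is an 8-layer MLP and the number of style layers depends on output resolution; a single linear layer and 14 style layers are assumed here purely for illustration.

```python
import torch

latent_dim, n_style_layers = 512, 14                 # assumed sizes for illustration
mapping = torch.nn.Linear(latent_dim, latent_dim)    # stand-in for StyleGAN's mapping MLP

z = torch.randn(1, latent_dim)                       # z-space: sampled from the Gaussian prior
w = mapping(z)                                       # w-space: one vector shared by all style layers
w_plus = w.unsqueeze(1).repeat(1, n_style_layers, 1) # w+-space: an independent vector per layer
```

Optimizing in w+ gives each layer its own degrees of freedom, which is why it reconstructs style details that a single shared w cannot.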

Loss Weights

Latent Vector: w+
Model: StyleGAN

| Perc W | L1 W | Original | Iter 250 | Iter 2500 | Iter 5000 |
|--------|------|----------|----------|-----------|-----------|
| 0      | 1    | (image)  | (image)  | (image)   | (image)   |
| 1      | 0    | (image)  | (image)  | (image)   | (image)   |
| 0.3    | 0.7  | (image)  | (image)  | (image)   | (image)   |
| 0.7    | 0.3  | (image)  | (image)  | (image)   | (image)   |
| 0.1    | 10   | (image)  | (image)  | (image)   | (image)   |
| 10     | 0.1  | (image)  | (image)  | (image)   | (image)   |
| 0.01   | 10   | (image)  | (image)  | (image)   | (image)   |
| 10     | 0.01 | (image)  | (image)  | (image)   | (image)   |

Models

Latent Vector: z
Perc W: 0.7
L1 W: 0.3

| Model    | Original | Iter 250 | Iter 2500 | Iter 5000 |
|----------|----------|----------|-----------|-----------|
| StyleGAN | (image)  | (image)  | (image)   | (image)   |
| Vanilla  | (image)  | (image)  | (image)   | (image)   |

Latent Vector

Model: StyleGAN
Perc W: 0.7
L1 W: 0.3

| Latent Vector | Original | Iter 250 | Iter 2500 | Iter 5000 |
|---------------|----------|----------|-----------|-----------|
| z             | (image)  | (image)  | (image)   | (image)   |
| w             | (image)  | (image)  | (image)   | (image)   |
| w+            | (image)  | (image)  | (image)   | (image)   |

Interpolation

I interpolated between generated images linearly in latent space over θ ∈ [0, 1]. I first found the inverted latent vectors z1 and z2 for two given images x1 and x2, then formed the new latent vector z′ = θz1 + (1−θ)z2 for each value of θ, and generated the corresponding intermediate frame with the generator G.
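The interpolation step can be sketched as follows; each returned latent code would be decoded with the generator to produce one frame of the animation.

```python
import numpy as np

def interpolate(z1, z2, n_frames=8):
    """Linearly interpolate latent codes via z' = theta*z1 + (1-theta)*z2."""
    thetas = np.linspace(0.0, 1.0, n_frames)  # theta=0 gives z2, theta=1 gives z1
    return [theta * z1 + (1.0 - theta) * z2 for theta in thetas]
```

The same formula applies in any of the latent spaces; for w+ the interpolation simply acts on the whole per-layer stack of vectors.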

I tested this interpolation with different latent spaces (z, w, and w+) and different perceptual and pixel loss weights. A perceptual loss weight of 0.7, a pixel loss weight of 0.3, and the w+ latent space produce the best-looking results.

Loss Weights

Latent Vector: w+
Model: StyleGAN

| Perc W | L1 W | Src     | Dest    | Gif   |
|--------|------|---------|---------|-------|
| 0      | 1    | (image) | (image) | (gif) |
| 1      | 0    | (image) | (image) | (gif) |
| 0.3    | 0.7  | (image) | (image) | (gif) |
| 0.7    | 0.3  | (image) | (image) | (gif) |
| 0.1    | 10   | (image) | (image) | (gif) |
| 10     | 0.1  | (image) | (image) | (gif) |
| 0.01   | 10   | (image) | (image) | (gif) |
| 10     | 0.01 | (image) | (image) | (gif) |

Latent Vector

Model: StyleGAN
Perc W: 0.7
L1 W: 0.3

| Latent Vector | Src     | Dest    | Gif   |
|---------------|---------|---------|-------|
| z             | (image) | (image) | (gif) |
| w             | (image) | (image) | (gif) |
| w+            | (image) | (image) | (gif) |

Scribble to Image

I explored generating images that follow hand-drawn scribbles while remaining realistic. The process solves the same non-convex optimization to produce realistic-looking images of cats, but applies a foreground mask derived from the sketch so that the objective only penalizes the difference between the generated image and the sketch inside the drawn region; this pushes the output to emulate the sketch's structure, color, and detail. The results below use StyleGAN, the w+ latent space, perceptual/pixel loss weights of 0.01/10, and 5000 iterations.
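The masked objective can be sketched as below, assuming `mask` is 1 where the user drew and 0 elsewhere; only the drawn region constrains the generated image, leaving the generator free everywhere else.

```python
import torch
import torch.nn.functional as F

def sketch_loss(generated, sketch, mask, w_perc=0.01, w_l1=10.0, perc_loss=None):
    """Masked objective: compare generated image and sketch only where mask == 1."""
    l1 = F.l1_loss(generated * mask, sketch * mask)   # pixel term on the drawn region
    loss = w_l1 * l1
    if perc_loss is not None:                         # optionally add the VGG perceptual term
        loss = loss + w_perc * perc_loss(generated * mask, sketch * mask)
    return loss
```

This is the same loss used for inversion, just restricted by the mask; the optimization over z (or w+) proceeds exactly as before.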

All Sketches

Model: StyleGAN
Perc W: 0.01
L1 W: 10
Latent Vector: w+

| Sketch  | Mask    | Iter 250 | Iter 2500 | Iter 5000 |
|---------|---------|----------|-----------|-----------|
| (image) | (image) | (image)  | (image)   | (image)   |

Loss Weights

Latent Vector: w+
Model: StyleGAN

| Perc W | L1 W | Original | Iter 250 | Iter 2500 | Iter 5000 |
|--------|------|----------|----------|-----------|-----------|
| 0      | 1    | (image)  | (image)  | (image)   | (image)   |
| 1      | 0    | (image)  | (image)  | (image)   | (image)   |
| 0.3    | 0.7  | (image)  | (image)  | (image)   | (image)   |
| 0.7    | 0.3  | (image)  | (image)  | (image)   | (image)   |
| 0.1    | 10   | (image)  | (image)  | (image)   | (image)   |
| 0.01   | 10   | (image)  | (image)  | (image)   | (image)   |
| 10     | 0.01 | (image)  | (image)  | (image)   | (image)   |