This assignment explores image manipulation techniques in the latent space of Generative Adversarial Networks (GANs). First, I inverted a pre-trained generator to find a latent variable that closely reconstructs a given real image. Next, I interpolated between two images in the latent space for image editing. Finally, I used latent variables to generate realistic images that match hand-drawn sketches.
Natural images lie on a low-dimensional manifold, so the goal of inversion is to bring the output of a trained generator as close as possible to that manifold. Concretely, this is a non-convex optimization problem: find the latent code z that minimizes the difference, under a chosen loss function L, between the generator output G(z) and the real image x, i.e. z∗=argmin_z L(G(z), x).
Standard Lp losses do not work well for image synthesis tasks, so I used a combination of a pixel loss and a perceptual loss. Because the objective is differentiable, this non-convex problem can be attacked with any first-order or quasi-Newton optimization method; I chose L-BFGS.
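Below is a minimal sketch of the projection loop, assuming `G` is a pre-trained generator taking a (1, 512) latent code, `x` is the target image tensor, and `perc_loss` is the perceptual loss module described next; the names, shapes, and weights are illustrative.

```python
import torch
import torch.nn.functional as F

# Sketch of latent-code projection with L-BFGS.
# Assumed names: G (pre-trained generator), x (target image), perc_loss.
z = torch.randn(1, 512, requires_grad=True)        # latent code to optimize
optimizer = torch.optim.LBFGS([z], lr=1.0, max_iter=20)
w_perc, w_pix = 0.7, 0.3                           # loss weights explored below

def closure():
    # L-BFGS re-evaluates the objective several times per step,
    # so the forward/backward pass lives in a closure.
    optimizer.zero_grad()
    y = G(z)
    loss = w_pix * F.l1_loss(y, x) + w_perc * perc_loss(y, x)
    loss.backward()
    return loss

for _ in range(250):                               # outer optimization steps
    optimizer.step(closure)
```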
To compute the perceptual loss, I passed both the synthesized and target images through a pre-trained VGG network, measured the difference between their feature maps at several layers, and summed these per-layer differences to obtain the total perceptual loss.
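As a sketch of this computation, assuming a frozen VGG-19 from torchvision and illustrative layer indices (not necessarily the exact ones from my experiments):

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """Sum of feature-map MSE losses at a few VGG-19 layers (illustrative indices)."""
    def __init__(self, layers=(3, 8, 17, 26)):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)                  # VGG stays frozen
        self.vgg = vgg
        self.layers = set(layers)

    def forward(self, synth, target):
        loss, y = 0.0, torch.cat([synth, target])    # one pass for both images
        for i, block in enumerate(self.vgg):
            y = block(y)
            if i in self.layers:
                f_synth, f_target = y.chunk(2)       # split the batch back apart
                loss = loss + F.mse_loss(f_synth, f_target)
        return loss
```

Batching the synthesized and target images through VGG together, then splitting the features, avoids a second forward pass per iteration.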
I explored the effect of varying the pixel and perceptual loss weights. A perceptual loss weight of 0.7 with a pixel loss weight of 0.3 gives the most detailed content and style reconstruction (e.g., the flower, eyes, facial fur, and green background).
I also compared models: StyleGAN produces better reconstructions than VanillaGAN, although VanillaGAN runs significantly faster.
Finally, I compared the latent spaces z, w, and w+ in StyleGAN. Both w and w+ outperform z and capture more of the image's style information, and w+ in turn performs better than w; a sketch of how the three spaces relate follows.
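The `mapping` attribute and `num_layers` count below are illustrative assumptions about a StyleGAN-like model, not the exact API of the implementation I used:

```python
import torch

# z is drawn from a Gaussian prior; w is the mapping network's output;
# w+ stacks one w per synthesis layer so each layer can be styled separately.
z = torch.randn(1, 512)                              # z-space code
w = G.mapping(z)                                     # w-space code, shape (1, 512)
w_plus = w.unsqueeze(1).repeat(1, G.num_layers, 1)   # w+ code, shape (1, L, 512)
```

Optimizing in w+ lets every layer's style vector move independently during projection, which is consistent with the extra detail it recovers.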
Latent vector: w+
Model: StyleGAN
| Perc W | L1 W | Original | Iter 250 | Iter 2500 | Iter 5000 |
|---|---|---|---|---|---|
| 0 | 1 | | | | |
| 1 | 0 | | | | |
| 0.3 | 0.7 | | | | |
| 0.7 | 0.3 | | | | |
| 0.1 | 10 | | | | |
| 10 | 0.1 | | | | |
| 0.01 | 10 | | | | |
| 10 | 0.01 | | | | |
Latent vector: z
Perc W: 0.7
L1 W: 0.3
| Model | Original | Iter 250 | Iter 2500 | Iter 5000 |
|---|---|---|---|---|
| StyleGAN | | | | |
| VanillaGAN | | | | |
Model: StyleGAN
Perc W: 0.7
L1 W: 0.3
| Latent Vector | Original | Iter 250 | Iter 2500 | Iter 5000 |
|---|---|---|---|---|
| z | | | | |
| w | | | | |
| w+ | | | | |
I interpolated between generated images using linear interpolation over a range of values θ∈[0,1]. I first found the inverted latent vectors z1 and z2 for the two given images x1 and x2, then, for each θ, formed a new latent vector z′=θz1+(1−θ)z2 and decoded it with the generator G to produce an intermediate frame.
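A minimal sketch of the interpolation loop, assuming the inverted codes `z1` and `z2` are already available from the projection step:

```python
import torch

frames = []
for theta in torch.linspace(0, 1, steps=30):
    z_interp = theta * z1 + (1 - theta) * z2   # z' = θ·z1 + (1−θ)·z2
    frames.append(G(z_interp))                 # intermediate frame from the generator
# The frames can then be stitched together into a GIF.
```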
I tested this interpolation method with different latent spaces (z, w, and w+) and different perceptual and pixel loss weights. The setting with a perceptual loss weight of 0.7, a pixel loss weight of 0.3, and the w+ latent vector produces the best-looking results.
Latent vector: w+
Model: StyleGAN
| Perc W | L1 W | Src | Dest | GIF |
|---|---|---|---|---|
| 0 | 1 | | | |
| 1 | 0 | | | |
| 0.3 | 0.7 | | | |
| 0.7 | 0.3 | | | |
| 0.1 | 10 | | | |
| 10 | 0.1 | | | |
| 0.01 | 10 | | | |
| 10 | 0.01 | | | |
Model: StyleGAN
Perc W: 0.7
L1 W: 0.3
| Latent Vector | Src | Dest | GIF |
|---|---|---|---|
| z | | | |
| w | | | |
| w+ | | | |
I explored generating realistic images that follow scribbled sketches. The process solves the same non-convex optimization problem to produce realistic-looking cat images, but applies a foreground mask derived from the sketch so that, within the masked region, the generated image emulates the sketch's structure, color, and detail. The objective minimizes the difference between the masked generated image and the masked sketch. The results below use StyleGAN, the w+ latent space, perceptual/pixel loss weights of 0.01/10, and 5000 iterations.
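A minimal sketch of the masked objective, assuming `sketch` and `mask` are image-sized tensors and `criterion` is the same weighted pixel + perceptual loss used for projection (all names here are illustrative):

```python
import torch

def masked_loss(G, w_plus, sketch, mask):
    # Constrain only the scribbled region; the generator's prior fills in
    # the rest of the image with realistic content.
    y = G(w_plus)
    return criterion(y * mask, sketch * mask)   # criterion: pixel + perceptual
```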
Model: StyleGAN
Perc W: 0.01
L1 W: 10
Latent vector: w+
| Sketch | Mask | Iter 250 | Iter 2500 | Iter 5000 |
|---|---|---|---|---|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |