Assignment 4 – Neural Style Transfer – Image Synthesis Project 2


EXPLANATION

In this assignment, we implement neural style transfer, which renders specific content in a particular artistic style, for example, generating images of cats in the Ukiyo-e style. The algorithm takes three inputs: a content image, a style image, and an input image. The input image is optimized so that it is close to the content image in content space and close to the style image in style space.

In the first part of the assignment, we start from random noise and optimize it in content space only. In the second part, we set content aside and optimize only for style, which generates textures. This builds intuition for the connection between style-space distance and the Gram matrix. Finally, we combine these pieces to perform neural style transfer.

Part 1: Content Reconstruction

We implement a content-space loss and optimize random noise with respect to the content loss only. The content loss between two images is defined below:

Figure 1: Content Image Loss
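A minimal sketch of this loss, assuming the standard formulation from Gatys et al. (the mean squared error between the feature maps of the input image and the content image at one layer). NumPy arrays of shape (channels, height, width) stand in for the CNN activations; in the actual pipeline these would come from a pretrained network, which is not shown here.

```python
import numpy as np

def content_loss(input_feats: np.ndarray, content_feats: np.ndarray) -> float:
    """Mean squared error between two feature maps of identical shape (C, H, W)."""
    if input_feats.shape != content_feats.shape:
        raise ValueError("feature maps must have the same shape")
    diff = input_feats - content_feats
    return float(np.mean(diff ** 2))

# Toy feature maps standing in for one conv layer's activations.
rng = np.random.default_rng(0)
content = rng.standard_normal((4, 8, 8))
print(content_loss(content, content))        # identical maps give zero loss
print(content_loss(content + 1.0, content))  # a constant offset of 1 gives a loss close to 1
```

Optimizing random noise against this loss alone pulls the noise toward an image whose activations at the chosen layer match the content image.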

Results

Figure 2: Image Reconstruction using content loss

Figure 3: Content Loss

Experiments using conv_4, conv_7, and conv_9 for the content loss

Side-by-side comparison

Content Image

Using conv_4

Using conv_4, conv_7, conv_9
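Using several layers amounts to adding up the per-layer losses. The sketch below assumes equal weights for every layer and stores the features in a dictionary keyed by layer name. Both are illustrative choices, and the feature shapes are hypothetical (deeper layers get more channels and a lower resolution).

```python
import numpy as np

def multi_layer_content_loss(input_feats: dict, content_feats: dict,
                             layers=("conv_4", "conv_7", "conv_9")) -> float:
    """Sum of per-layer MSE content losses over the selected layers (equal weights)."""
    total = 0.0
    for name in layers:
        diff = input_feats[name] - content_feats[name]
        total += float(np.mean(diff ** 2))
    return total

rng = np.random.default_rng(1)
# Hypothetical (C, H, W) shapes for the three layers.
shapes = {"conv_4": (128, 32, 32), "conv_7": (256, 16, 16), "conv_9": (512, 8, 8)}
content = {k: rng.standard_normal(s) for k, s in shapes.items()}
noise = {k: rng.standard_normal(s) for k, s in shapes.items()}
print(multi_layer_content_loss(noise, content))
```

Deeper layers encode more abstract structure, so adding them constrains the reconstruction less tightly at the pixel level than conv_4 alone.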

Part 2: Texture Synthesis

To perform texture synthesis, we compute the style loss on layers conv_1 through conv_5, since together they capture most of the texture in the style image. As in Part 1, the input is initialized with random noise.

Style Loss
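The style loss compares Gram matrices (channel-by-channel correlations of the feature maps) rather than the feature maps themselves. Here is a NumPy sketch. It assumes normalization by the number of spatial positions and an MSE between Gram matrices summed over the chosen layers. The normalization used for the results here is not stated, so treat it as an assumption.

```python
import numpy as np

def gram_matrix(feats: np.ndarray) -> np.ndarray:
    """Channel-by-channel correlations of a (C, H, W) map, normalized by H*W."""
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(input_feats: dict, style_feats: dict, layers) -> float:
    """Sum over layers of the MSE between input and style Gram matrices."""
    total = 0.0
    for name in layers:
        diff = gram_matrix(input_feats[name]) - gram_matrix(style_feats[name])
        total += float(np.mean(diff ** 2))
    return total

rng = np.random.default_rng(2)
style = {"conv_1": rng.standard_normal((8, 16, 16))}
# Shuffling spatial positions destroys the layout but keeps the Gram matrix.
flat = style["conv_1"].reshape(8, -1)
shuffled = {"conv_1": flat[:, rng.permutation(flat.shape[1])].reshape(8, 16, 16)}
print(style_loss(shuffled, style, ["conv_1"]))  # close to 0 (floating-point rounding only)
```

Because the Gram matrix sums over all spatial positions, rearranging the pixels of a feature map leaves it unchanged. This is why optimizing this loss from noise yields textures rather than a copy of the style image.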

Results

Using ‘conv_1’, ‘conv_2’, ‘conv_3’, ‘conv_4’, ‘conv_5’ as style layers

Using ‘conv_1’, ‘conv_2’, ‘conv_4’, ‘conv_5’, ‘conv_6’, ‘conv_9’, ‘conv_11’ as style layers

Side-by-side comparison

Style Image

‘conv_1’, ‘conv_2’, ‘conv_3’, ‘conv_4’, ‘conv_5’

‘conv_1’, ‘conv_2’, ‘conv_4’, ‘conv_5’, ‘conv_6’, ‘conv_9’, ‘conv_11’

Part 3: Style Transfer

The left plot shows the time taken when the input is initialized with the content image, and the right plot shows the time taken when the input is initialized with noise. With noise initialization, the content loss is very high at first and the loss oscillates, whereas with content-image initialization it stays fairly stable, as expected. Note that these runs were carried out on CPUs; training was much faster on GPUs.
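To illustrate why initialization matters, here is a toy sketch, not the actual pipeline. A fixed random 1x1 linear layer (hypothetical) stands in for the pretrained CNN, and images are flattened to (channels, pixels) arrays. The input is optimized by gradient descent on a weighted sum of content and style losses. The weights `alpha` and `beta`, the learning rate, the noise mean and standard deviation, and the backtracking step are all illustrative assumptions. Starting from the content image, the content term begins at zero. Starting from noise, it begins large.

```python
import numpy as np

rng = np.random.default_rng(0)
C_IN, C_FEAT, N_PIX = 3, 8, 64  # toy sizes: RGB channels, feature channels, pixels

# Hypothetical stand-in for a pretrained CNN: a fixed random 1x1 linear layer.
W = rng.standard_normal((C_FEAT, C_IN)) / np.sqrt(C_IN)

def features(x):
    """Map a (C_IN, N_PIX) image to (C_FEAT, N_PIX) toy features."""
    return W @ x

def gram(f):
    return f @ f.T / f.shape[1]

def loss_and_grad(x, fc, gs, alpha, beta):
    """alpha * content MSE + beta * Gram MSE, with its analytic gradient w.r.t. x."""
    f = features(x)
    dc = f - fc              # content residual
    dg = gram(f) - gs        # style (Gram) residual, symmetric
    loss = alpha * np.mean(dc ** 2) + beta * np.mean(dg ** 2)
    # d/dF of mean(dc^2) is 2*dc/size; d/dF of mean(dg^2) is 2 * (2*dg/size) @ F / N.
    grad_f = alpha * 2 * dc / dc.size + beta * 2 * (2 * dg / dg.size) @ f / f.shape[1]
    return loss, W.T @ grad_f

def stylize(content, style, init="content", alpha=1.0, beta=10.0,
            noise_mean=0.0, noise_std=1.0, steps=200, lr=0.1):
    fc, gs = features(content), gram(features(style))
    if init == "content":
        x = content.copy()
    else:
        x = rng.normal(noise_mean, noise_std, size=content.shape)
    loss, grad = loss_and_grad(x, fc, gs, alpha, beta)
    history = [loss]
    for _ in range(steps):
        step = lr
        # Backtracking: halve the step until the loss decreases, so the toy stays stable.
        for _ in range(30):
            candidate = x - step * grad
            new_loss, new_grad = loss_and_grad(candidate, fc, gs, alpha, beta)
            if new_loss < loss:
                x, loss, grad = candidate, new_loss, new_grad
                break
            step /= 2
        history.append(loss)
    return x, history

content_img = rng.standard_normal((C_IN, N_PIX))
style_img = 2.0 * rng.standard_normal((C_IN, N_PIX)) + 0.5
_, hist_c = stylize(content_img, style_img, init="content")
_, hist_n = stylize(content_img, style_img, init="noise")
print(f"content init: {hist_c[0]:.3f} -> {hist_c[-1]:.3f}")
print(f"noise init:   {hist_n[0]:.3f} -> {hist_n[-1]:.3f}")
```

The backtracking guard makes the toy loss curve monotone by construction. A real run with a plain optimizer has no such guard, which is one reason its loss can oscillate.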

Performance Analysis

Noise input with both content loss and style loss applied

NST on CPU

Results

We tuned hyperparameters including the mean and standard deviation of the noise images, the number of epochs, and the convolution layers used for the style and content losses. The best results are reported below. Quality improved significantly when we added the histogram loss and trained for longer, and training ran faster on GPUs than on CPUs.

NST Matrix
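The tuning described above can be organized as a grid search. In this sketch, the candidate values for the noise mean, noise standard deviation, and epoch count are hypothetical, since the values actually tried are not reported. The layer sets are the ones from Parts 1 and 2. `score` is a hypothetical callable, for example the final total loss of a run.

```python
from itertools import product

# Hypothetical search space; only the layer sets come from the experiments above.
SEARCH_SPACE = {
    "noise_mean": [0.0, 0.5],
    "noise_std": [0.1, 1.0],
    "epochs": [300, 1000],
    "style_layers": [("conv_1", "conv_2", "conv_3", "conv_4", "conv_5"),
                     ("conv_1", "conv_2", "conv_4", "conv_5", "conv_6", "conv_9", "conv_11")],
    "content_layers": [("conv_4",), ("conv_4", "conv_7", "conv_9")],
}

def grid(space):
    """Yield every combination of the candidate values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def best_config(space, score):
    """Return the configuration with the lowest score."""
    return min(grid(space), key=score)
```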

BELLS & WHISTLES

Poisson Blending + NST

Content

Style

NST

Using images from previous homework assignments

Preserve Luminance of the Content Image

Content

Content + Luminance
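Following the luminance-only idea in Gatys et al. (https://arxiv.org/pdf/1611.07865.pdf), one simple way to keep the content image's colors is to combine the luminance of the stylized output with the chrominance of the content image. The sketch below is a simplified post-processing variant. It assumes the YIQ color space and float RGB images in [0, 1], and it is not necessarily how the results above were produced.

```python
import numpy as np

# Approximate NTSC RGB -> YIQ matrix; the inverse is computed rather than hard-coded.
RGB_TO_YIQ = np.array([[0.299, 0.587, 0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523, 0.312]])
YIQ_TO_RGB = np.linalg.inv(RGB_TO_YIQ)

def rgb_to_yiq(img):
    """Convert an (H, W, 3) RGB image to YIQ, pixel by pixel."""
    return img @ RGB_TO_YIQ.T

def yiq_to_rgb(img):
    return img @ YIQ_TO_RGB.T

def preserve_content_color(stylized, content):
    """Keep the stylized luminance (Y) but restore the content image's chrominance (I, Q)."""
    out = rgb_to_yiq(content).copy()
    out[..., 0] = rgb_to_yiq(stylized)[..., 0]
    return np.clip(yiq_to_rgb(out), 0.0, 1.0)
```

Swapping only the Y channel keeps the brush strokes and shading of the stylized result while the hues come from the content photo.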

Histogram Loss

Content

Style

NST + Histogram Loss

Content Image

Content + Luminance

Content + Luminance + Histogram Loss
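The histogram loss of Risser et al. (https://arxiv.org/pdf/1701.08893.pdf) first remaps each channel of the output activations so that its value distribution matches the style activations. It then penalizes the distance to that remapped target. Below is a simplified NumPy sketch. It assumes exact rank-based histogram matching on flattened (channels, positions) features and a mean rather than summed squared error. In an autograd framework, the remapped target would be treated as a constant.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap each channel of `source` (C, N) so its values follow `reference` (C, M)."""
    matched = np.empty_like(source)
    src_q = np.linspace(0.0, 1.0, source.shape[1])     # quantile of each source rank
    ref_q = np.linspace(0.0, 1.0, reference.shape[1])
    for c in range(source.shape[0]):
        order = np.argsort(source[c])                   # ranks of the source values
        ref_sorted = np.sort(reference[c])
        # The k-th smallest source value receives the reference value at the same quantile.
        matched[c, order] = np.interp(src_q, ref_q, ref_sorted)
    return matched

def histogram_loss(feats, style_feats):
    """Squared distance between activations and their histogram-matched remap."""
    target = match_histogram(feats, style_feats)
    return float(np.mean((feats - target) ** 2))
```

Unlike the Gram loss, this term pins down the full distribution of activation values per channel, which the paper uses to reduce instabilities that second-order statistics alone leave unconstrained.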

Mixed Transfer

Style 1

Style 2

Mixed Style

Content

Mixed Style

NST
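The method behind these results is not described. One common way to mix two styles, assumed here, is to use a weighted blend of the two style images' Gram matrices as the style target.

```python
import numpy as np

def gram_matrix(feats):
    """Channel correlations of a (C, H, W) map, normalized by H*W."""
    c, h, w = feats.shape
    f = feats.reshape(c, h * w)
    return f @ f.T / (h * w)

def mixed_style_target(style1_feats, style2_feats, weight=0.5):
    """Blend the Gram matrices of two style images; weight=1 uses only style 1."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * gram_matrix(style1_feats) + (1.0 - weight) * gram_matrix(style2_feats)

def mixed_style_loss(input_feats, target_gram):
    """MSE between the input's Gram matrix and the blended target."""
    diff = gram_matrix(input_feats) - target_gram
    return float(np.mean(diff ** 2))
```

Because Gram matrices are C x C regardless of image size, the two style images do not need to share a resolution.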

Super Resolution

Content Image

Style Image

Low-Resolution Image

High-Resolution Image from Low-Resolution Image

Controlling Perceptual Factors in Neural Style Transfer

Content Image

Mask

Style Image

Mask

Unguided

Guided

Content Image

Style Image

NST Image
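Gatys et al. (https://arxiv.org/pdf/1611.07865.pdf) guide the transfer by computing Gram matrices only over masked regions. Each region of the content image then takes its style from a chosen region of the style image. Below is a NumPy sketch of such guided Gram matrices. It assumes masks already resized to the feature resolution and normalization by the mask area, both of which are assumptions.

```python
import numpy as np

def masked_gram(feats, mask):
    """Gram matrix of (C, H, W) features weighted by an (H, W) mask in [0, 1]."""
    c = feats.shape[0]
    f = (feats * mask[None, :, :]).reshape(c, -1)
    return f @ f.T / max(float(mask.sum()), 1e-8)

def guided_style_loss(input_feats, style_feats, content_masks, style_masks):
    """Match each content region's Gram matrix to its paired style region's."""
    total = 0.0
    for cm, sm in zip(content_masks, style_masks):
        diff = masked_gram(input_feats, cm) - masked_gram(style_feats, sm)
        total += float(np.mean(diff ** 2))
    return total
```

With a single all-ones mask for both images this reduces to the unguided style loss, which corresponds to the Unguided result above.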

NST on Videos

We applied NST to videos frame by frame and then applied temporal smoothing as described in this paper. Note that we used a pre-trained model for faster computation.
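Flow-based temporal smoothing requires optical flow and does not fit in a short example. As a much simpler stand-in, and explicitly not the method from the paper, the sketch below blends each stylized frame with the running average of earlier outputs. `alpha` is a hypothetical blending weight. This reduces frame-to-frame flicker but ignores motion, so it will smear fast-moving objects.

```python
import numpy as np

def ema_smooth(frames, alpha=0.7):
    """Exponential moving average over stylized frames (no motion compensation)."""
    smoothed = []
    prev = None
    for frame in frames:
        cur = frame.astype(np.float64)
        # Weight the current frame by alpha and the running average by (1 - alpha).
        prev = cur if prev is None else alpha * cur + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```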

CREDITS

https://arxiv.org/abs/1604.08610

https://arxiv.org/pdf/1701.08893.pdf

https://arxiv.org/pdf/1805.04487.pdf

https://arxiv.org/pdf/1611.07865.pdf