Assignment 4: Neural Style Transfer

Part 1: Content Reconstruction

Reconstructions from deeper layers are noisier: the deeper the layer used for the content loss, the more noise is visible in the reconstructed image.

[Figure: the original wally image next to its reconstructions from Conv_1, Conv_3, Conv_4 and Conv_5.]
[Figure: panels Noise 1, Noise 2, Reconstructed 1 (Conv_2) and Reconstructed 2 (Conv_2); image labels: noise1, noise2, wally, dancing.]
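
The content reconstruction in this part optimizes an image so that its features at a chosen layer match those of the original image. A minimal sketch of the per-layer loss follows. It assumes a mean-squared-error content loss, which the report does not state explicitly, and it uses flat lists of floats as hypothetical stand-ins for the feature maps a real network would produce.

```python
def content_loss(input_features, target_features):
    """Mean squared error between two equally sized flat feature lists.

    Assumption: the content loss is an MSE; the lists stand in for the
    flattened activations of one layer.
    """
    if not input_features or len(input_features) != len(target_features):
        raise ValueError("feature lists must be non-empty and of equal length")
    squared_errors = [(a - b) ** 2 for a, b in zip(input_features, target_features)]
    return sum(squared_errors) / len(squared_errors)


# Identical features give zero loss; any mismatch is penalized quadratically.
print(content_loss([1.0, 2.0], [1.0, 2.0]))
print(content_loss([0.0, 0.0], [1.0, 3.0]))
```

In an actual run the lists would be replaced by the activations of the chosen conv layer, and the loss would be minimized with respect to the input image.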

Part 2: Texture Synthesis

Texture synthesis varies a lot depending on which layers are used for reconstruction. I found that conv layers 1, 2, 5, 9 and 13 give the best result.

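[Figure: the original picasso image next to textures synthesized from ConvLayers 1 to 5, Layers 1,6,14, Layers 1,2,5,9,13 and ConvLayers 15 to 19.]
Selected layers: 1,2,5,9,13
[Figure: picasso texture synthesized with the selected layers.]
[Figure: two noise inputs (noise, noise 2) and the picasso texture synthesized from each.]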
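
Texture synthesis of this kind is usually driven by Gram matrices of the feature maps at the selected layers. The sketch below shows one way to compute them. The data layout (a list of channels, each a flat list of activations) and the normalization by the number of feature-map elements are assumptions made for illustration, not details taken from the report.

```python
def gram_matrix(features):
    """Gram matrix of a feature map given as C channels of N activations each.

    Assumption: entries are divided by C * N, the number of feature-map
    elements; other normalizations are possible.
    """
    num_channels = len(features)
    num_positions = len(features[0])
    norm = num_channels * num_positions
    return [
        [sum(a * b for a, b in zip(channel_i, channel_j)) / norm
         for channel_j in features]
        for channel_i in features
    ]


def style_loss(input_features, target_features):
    """Mean squared difference between the two Gram matrices."""
    gram_in = gram_matrix(input_features)
    gram_target = gram_matrix(target_features)
    squared = [
        (a - b) ** 2
        for row_in, row_target in zip(gram_in, gram_target)
        for a, b in zip(row_in, row_target)
    ]
    return sum(squared) / len(squared)


# Swapping spatial positions leaves the Gram matrix unchanged, so the
# style loss between the two feature maps is zero.
original = [[1.0, 2.0], [3.0, 4.0]]
shuffled = [[2.0, 1.0], [4.0, 3.0]]
print(style_loss(original, shuffled))
```

Because the Gram matrix sums over spatial positions, it records which channels fire together but not where, which is why it captures texture rather than layout.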

Part 3.1: Hyperparameter tuning

The style loss is also normalized by the number of feature layers used to compute it. The best results came from style_weight = 1000000 and content_weight = 1.

[Figure: content image tubingen and style image starry_night.]
[Figure: results for style_weight = 10000, 100000 and 1000000 with content_weight = 1, and for style_weight = 1000000 with content_weight = 2.]
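
The way the two weights interact with the per-layer normalization can be sketched as follows. The function name `total_loss` and the choice to sum (rather than average) the content losses are assumptions for illustration; only the averaging of the style loss over feature layers and the weight values come from the text above.

```python
def total_loss(style_losses, content_losses,
               style_weight=1_000_000, content_weight=1):
    """Weighted sum of style and content terms.

    style_losses: one loss value per style layer; averaged over layers so
    that adding more layers does not change the scale of the style term.
    content_losses: one loss value per content layer (summed; an assumption).
    """
    if not style_losses:
        raise ValueError("at least one style layer is required")
    style_term = sum(style_losses) / len(style_losses)
    content_term = sum(content_losses)
    return style_weight * style_term + content_weight * content_term


# Two style layers and one content layer, with small illustrative weights.
print(total_loss([2.0, 4.0], [0.5], style_weight=10, content_weight=2))
```

With this normalization, the ratio style_weight / content_weight keeps roughly the same meaning when the set of style layers changes, which makes the weight settings compared in this part easier to interpret.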

Part 3.2: Two content images, each optimized with its own style image

[Figure: content image wally and style image the_scream.]
[Figure: results for style_weight = 10000, 100000 and 1000000 with content_weight = 1, and for style_weight = 1000000 with content_weight = 2.]
[Figure: content image phipps and style image picasso.]
[Figure: results for style_weight = 10000, 100000 and 1000000 with content_weight = 1, and for style_weight = 1000000 with content_weight = 2.]

Results differ across combinations of the content-loss and style-loss weights. style_weight = 1000000 with content_weight = 1 or 2 seems to give the best output. The style loss is normalized by the number of terms in the Gram matrix and by the number of feature layers used for the loss.

Part 3.3: Noise vs Content Initialization

[Figure: noise initialization versus content initialization for three pairs: phipps with starry_night, wally with picasso, and wally with the_scream.]

Content initialization seems to give better-looking results.
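
The two starting points compared in this part differ only in how the image being optimized is initialized. A minimal sketch, assuming pixel values in [0, 1) stored in a flat list and a hypothetical helper named `init_image`:

```python
import random


def init_image(content_pixels, mode="content", seed=None):
    """Return the starting image for the optimization.

    mode="content": start from a copy of the content image.
    mode="noise":   start from uniform random noise of the same size.
    """
    if mode == "content":
        return list(content_pixels)  # copy so the content image is not modified
    if mode == "noise":
        rng = random.Random(seed)
        return [rng.random() for _ in content_pixels]
    raise ValueError("mode must be 'content' or 'noise'")


content = [0.1, 0.5, 0.9]
print(init_image(content, mode="content"))
print(len(init_image(content, mode="noise", seed=0)))
```

Starting from the content image means the content loss is already zero at the first step, so the optimizer only has to add style, which is one plausible reason for the cleaner results.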

Part 3.4: Additional Synthesis

[Figure: content image doberman, style image style1, result dobermanstyle1.]
[Figure: content image doberman, style image blended_02, result dobermanstyle2.]

Part 4: Additional Synthesis on Poisson Blending

[Figure: content image blended_02, style image style1, result blended_02style1.]
[Figure: content image blended_02, style image blended_02, result blended_02style2.]