Assignment 4: Neural Style Transfer
Part 1: Content Reconstruction
Reconstructions from deeper layers are noisier: the deeper the layer used for reconstruction, the more noise is visible in the reconstructed image.

| Original Image | Conv_1 | Conv_3 | Conv_4 | Conv_5 |
| --- | --- | --- | --- | --- |
| ![wally wally 1](./images/content/wally.jpg) | ![1wally wally 1](./part1/1wally.png) | ![6wally wally 1](./part1/6wally.png) | ![8wally wally 1](./part1/8wally.png) | ![11wally wally 1](./part1/11wally.png) |
| ![fallingwater wally 1](./images/content/fallingwater.png) | ![1fallingwater wally 1](./part1/1fallingwater.png) | ![3fallingwater wally 1](./part1/3fallingwater.png) | ![11fallingwater wally 1](./part1/11fallingwater.png) | ![8fallingwater wally 1](./part1/8fallingwater.png) |
| ![dancing wally 1](./images/content/dancing.jpg) | ![1dancing wally 1](./part1/1dancing.png) | ![6dancing wally 1](./part1/6dancing.png) | ![8dancing wally 1](./part1/8dancing.png) | ![11dancing wally 1](./part1/11dancing.png) |
| ![phipps wally 1](./images/content/phipps.jpeg) | ![1phipps wally 1](./part1/1phipps.png) | ![6phipps wally 1](./part1/6phipps.png) | ![8phipps wally 1](./part1/8phipps.png) | ![11phipps wally 1](./part1/11phipps.png) |
| ![tubingen wally 1](./images/content/tubingen.jpeg) | ![1tubingen wally 1](./part1/1tubingen.png) | ![6tubingen wally 1](./part1/6tubingen.png) | ![8tubingen wally 1](./part1/8tubingen.png) | ![11tubingen wally 1](./part1/11tubingen.png) |

| Noise 1 | Noise 2 | Reconstructed 1 (Conv_2) | Reconstructed 2 (Conv_2) |
| --- | --- | --- | --- |
| ![noise1 noise1 1](./part2/noise1.jpg) | ![noise2 noise2 1](./part2/noise2.jpg) | ![2wally wally 1](./part2/features.2wally.png) | ![2wally dancing 1](./part1/3wally.png) |
| ![noise1 noise1 1](./part2/noise1.jpg) | ![noise2 noise2 1](./part2/noise2.jpg) | ![2tubingen wally 1](./part2/features.2tubingen.png) | ![2tubingen dancing 1](./part1/3tubingen.png) |
| ![noise1 noise1 1](./part2/noise1.jpg) | ![noise2 noise2 1](./part2/noise2.jpg) | ![2fallingwater wally 1](./part2/features.2fallingwater.png) | ![2fallingwater dancing 1](./part1/3fallingwater.png) |
| ![noise1 noise1 1](./part2/noise1.jpg) | ![noise2 noise2 1](./part2/noise2.jpg) | ![2dancing wally 1](./part2/features.2dancing.png) | ![2dancing dancing 1](./part1/1dancing.png) |

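
The reconstructions in Part 1 can be thought of as minimizing a content loss between the feature map of the image being optimized and the feature map of the original. The sketch below is only an illustration under stated assumptions: it uses NumPy instead of a deep learning framework, the feature maps are random stand-ins rather than network activations, and the names `content_loss`, `target`, and `noise` are hypothetical. It also optimizes the feature map directly, whereas a real run would backpropagate through the network to the pixels.

```python
import numpy as np

def content_loss(features, target_features):
    """Mean squared error between two feature maps of identical shape."""
    diff = features - target_features
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
# Hypothetical (channels, height, width) feature maps; not real activations.
target = rng.standard_normal((4, 8, 8))
noise = rng.standard_normal((4, 8, 8))

loss_same = content_loss(target, target)   # identical features give zero loss
loss_noise = content_loss(noise, target)   # a noise start gives a positive loss

# Plain gradient descent on the feature map itself (a simplification).
x = noise.copy()
lr = 0.1 * x.size  # scaled by element count because the loss is a mean
for _ in range(50):
    grad = 2.0 * (x - target) / x.size     # gradient of the mean squared error
    x -= lr * grad

loss_final = content_loss(x, target)       # far smaller than loss_noise
```

Because the loss compares activations rather than pixels, a deeper layer constrains the image less tightly, which is consistent with the noisier reconstructions reported above.
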
Part 2: Texture Synthesis
Texture synthesis results vary a lot depending on which layers are used for reconstruction. I found that conv layers 1, 2, 5, 9, 13 give the best result.

| Original Image | ConvLayers 1 to 5 | Layers: 1,6,14 | Layers: 1,2,5,9,13 | ConvLayers 15 to 19 |
| --- | --- | --- | --- | --- |
| ![original picasso picasso](./part3/picasso.jpg) | ![1to5picasso picasso](./part3/1to5picasso.png) | ![1_6_14picasso picasso](./part3/1_6_14picasso.png) | ![1_2_5_9_13picasso picasso](./part3/1_2_5_9_13picasso.png) | ![15to19picasso picasso](./part3/15to19picasso.png) |

| Selected layers: 1,2,5,9,13 |
| --- |
| ![1_2_5_9_13picasso picasso](./part3/1_2_5_9_13picasso.png) |

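
Texture synthesis of this kind is usually driven by Gram matrices of the feature maps at the chosen layers. The following is a minimal NumPy sketch of that statistic, not the code used for the results above. The function name `gram_matrix` and the tiny example array are assumptions made for illustration.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel inner products of a (channels, height, width) map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)   # one row per channel
    return flat @ flat.T                # shape (channels, channels)

# Tiny hypothetical feature map: 2 channels, 2x3 spatial grid.
feat = np.arange(12, dtype=float).reshape(2, 2, 3)
gram = gram_matrix(feat)

# Shuffling spatial positions (identically for every channel) leaves the
# Gram matrix unchanged, which is why it captures texture but not layout.
rng = np.random.default_rng(0)
perm = rng.permutation(feat.shape[1] * feat.shape[2])
shuffled = feat.reshape(2, -1)[:, perm].reshape(feat.shape)
gram_shuffled = gram_matrix(shuffled)
```

Since the Gram matrix discards spatial arrangement, the choice of layers mainly controls the scale of the patterns that get matched, which fits the strong dependence on layer selection noted above.
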
Part 3.1: Hyperparameter tuning
The style loss is also normalized by the number of feature layers used to compute it. The best results came from style_weight = 1000000 and content_weight = 1.

| Content Image | Style Image |
| --- | --- |
| ![tubingen tubingen](./images/content/tubingen.jpeg) | ![starry_night starry_night](./images/style/starry_night.jpeg) |

| style_weight = 10000 and content_weight = 1 | style_weight = 100000 and content_weight = 1 |
| --- | --- |
| ![style_weight = 10000 and content_weight = 1 style_weight = 10000 and content_weight = 1](./tune/0tubingenstarry_night.png) | ![style_weight = 100000 and content_weight = 1 style_weight = 100000 and content_weight = 1](./tune/10tubingenstarry_night.png) |

| style_weight = 1000000 and content_weight = 1 | style_weight = 1000000 and content_weight = 2 |
| --- | --- |
| ![style_weight = 1000000 and content_weight = 1 style_weight = 1000000 and content_weight = 1](./tune/210tubingenstarry_night.png) | ![style_weight = 1000000 and content_weight = 2 style_weight = 1000000 and content_weight = 2](./tune/3210tubingenstarry_night.png) |

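
One plausible way to write the objective being tuned in this part is as a weighted sum of the two losses, with the style term averaged over the layers used. The notation below is introduced here for illustration only: $w_c$ stands for content_weight, $w_s$ for style_weight, and $S$ for the set of style layers. The exact form in the actual implementation may differ.

```latex
\mathcal{L}_{\text{total}}
  = w_c \, \mathcal{L}_{\text{content}}
  + w_s \cdot \frac{1}{|S|} \sum_{l \in S} \mathcal{L}_{\text{style}}^{(l)}
```

Under this form only the relative size of the two terms matters for the optimum, so style_weight = 1000000 with content_weight = 2 weights style half as strongly, relative to content, as style_weight = 1000000 with content_weight = 1.
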
Part 3.2: Two content images optimized with two style images

| Content Image | Style Image |
| --- | --- |
| ![wally wally](./images/content/wally.jpg) | ![the_scream the_scream](./images/style/the_scream.jpeg) |

| style_weight = 10000 and content_weight = 1 | style_weight = 100000 and content_weight = 1 |
| --- | --- |
| ![style_weight = 10000 and content_weight = 1 style_weight = 10000 and content_weight = 1](./tune/0wallythe_scream.png) | ![style_weight = 100000 and content_weight = 1 style_weight = 100000 and content_weight = 1](./tune/10wallythe_scream.png) |

| style_weight = 1000000 and content_weight = 1 | style_weight = 1000000 and content_weight = 2 |
| --- | --- |
| ![style_weight = 1000000 and content_weight = 1 style_weight = 1000000 and content_weight = 1](./tune/210wallythe_scream.png) | ![style_weight = 1000000 and content_weight = 2 style_weight = 1000000 and content_weight = 2](./tune/3210wallythe_scream.png) |

| Content Image | Style Image |
| --- | --- |
| ![phipps phipps](./images/content/phipps.jpeg) | ![picasso picasso](./images/style/picasso.jpg) |

| style_weight = 10000 and content_weight = 1 | style_weight = 100000 and content_weight = 1 |
| --- | --- |
| ![style_weight = 10000 and content_weight = 1 style_weight = 10000 and content_weight = 1](./tune/0phippspicasso.png) | ![style_weight = 100000 and content_weight = 1 style_weight = 100000 and content_weight = 1](./tune/10phippspicasso.png) |

| style_weight = 1000000 and content_weight = 1 | style_weight = 1000000 and content_weight = 2 |
| --- | --- |
| ![style_weight = 1000000 and content_weight = 1 style_weight = 1000000 and content_weight = 1](./tune/210phippspicasso.png) | ![style_weight = 1000000 and content_weight = 2 style_weight = 1000000 and content_weight = 2](./tune/3210phippspicasso.png) |

Results differ across combinations of the content and style loss weights. style_weight = 1000000 with content_weight = 1 or 2 seems to give the best output. The style loss is normalized by the number of entries in the Gram matrix and by the number of feature layers used for the loss.
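
The two normalizations mentioned above can be sketched as follows. This is a hedged NumPy illustration rather than the actual implementation: `gram_matrix` and `style_loss` are hypothetical names, the feature maps are random stand-ins, and whether the Gram matrix itself is additionally scaled by the feature map size is not stated in the text, so it is left unscaled here.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel inner products of a (channels, height, width) map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T

def style_loss(feature_maps, style_feature_maps):
    """Style loss averaged over Gram entries and over layers."""
    per_layer = []
    for feats, style_feats in zip(feature_maps, style_feature_maps):
        diff = gram_matrix(feats) - gram_matrix(style_feats)
        # np.mean divides by the number of entries in the Gram matrix.
        per_layer.append(np.mean(diff ** 2))
    # The outer mean divides by the number of layers used.
    return float(np.mean(per_layer))

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4, 4))   # hypothetical generated-image features
b = rng.standard_normal((3, 4, 4))   # hypothetical style-image features

one_layer = style_loss([a], [b])
two_layers = style_loss([a, a], [b, b])   # same value despite twice the layers
```

With both normalizations in place, adding or removing style layers does not by itself rescale the loss, so one style_weight setting stays comparable across layer choices.
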
Part 3.3: Noise vs Content Initialization

| Noise Initialization | Content Initialization |
| --- | --- |
| ![phippsstarry_night phippsstarry_night](./part4/output_from_noise/phippsstarry_night.png) | ![phippsstarry_night phippsstarry_night](./part4/output_from_content/phippsstarry_night.png) |
| ![wallypicasso wallypicasso](./part4/output_from_noise/wallypicasso.png) | ![wallypicasso wallypicasso](./part4/output_from_content/wallypicasso.png) |
| ![wallythe_scream wallythe_scream](./part4/output_from_noise/wallythe_scream.png) | ![wallythe_scream wallythe_scream](./part4/output_from_content/wallythe_scream.png) |

Content initialization seems to give better-looking results.
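
The two starting points compared above differ only in how the image to be optimized is initialized. Below is a minimal sketch of that choice, assuming images stored as NumPy arrays with pixel values in [0, 1] and uniform noise. The function `init_image`, its `mode` argument, and the noise distribution are all hypothetical and may not match what was actually used.

```python
import numpy as np

def init_image(content_image, mode, seed=0):
    """Return the starting image for the optimization.

    mode="content" starts from a copy of the content image;
    mode="noise" starts from uniform random pixels of the same shape.
    """
    if mode == "content":
        return content_image.copy()
    if mode == "noise":
        rng = np.random.default_rng(seed)
        return rng.uniform(0.0, 1.0, size=content_image.shape)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical 3-channel 8x8 content image filled with mid-gray.
content = np.full((3, 8, 8), 0.5)
start_from_content = init_image(content, "content")
start_from_noise = init_image(content, "noise")
```

Starting from the content image means the content loss is already zero at the first step, so the optimizer only has to add style, which may explain why those results look cleaner.
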
Part 3.4: Additional Synthesis

| Content Image | Style Image | Result |
| --- | --- | --- |
| ![doberman doberman](./part4/new_img/doberman.jpeg) | ![style1 style1](./part4/new_img/style1.png) | ![dobermanstyle1 dobermanstyle1](./part4/new_img/dobermanstyle1.png) |
| ![doberman doberman](./part4/new_img/doberman.jpeg) | ![style2.jpeg blended_02](./part4/new_img/style2.jpeg) | ![dobermanstyle2 dobermanstyle2](./part4/new_img/dobermanstyle2.png) |

Part 4: Additional Synthesis on a Poisson-Blended Image

| Content Image | Style Image | Result |
| --- | --- | --- |
| ![blended_02 blended_02](./part4/new_img/blended_02.png) | ![style1 style1](./part4/new_img/style1.png) | ![blended_02style1 blended_02style1](./part4/new_img/blended_02style1.png) |
| ![blended_02 blended_02](./part4/new_img/blended_02.png) | ![style2.jpeg blended_02](./part4/new_img/style2.jpeg) | ![blended_02style2 blended_02style2](./part4/new_img/blended_02style2.png) |