**16-726 Assignment 4: Neural Style Transfer**
**Eileen Li (chenyil)**
Submission Date: April 2, 2023
Overview
===============================================================================
In this assignment we will implement neural style transfer, combining a content image
with a style image such that we preserve the content of the first while adopting the style of the latter.
In the first section, we will start from a random input and optimize it in
content space to reconstruct a content image. In the second section, we will ignore content
and optimize only to generate textures from the style image. In the third section, we will combine these
two for the final neural style transfer result, initializing the input image both from random noise and
from the content image.
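Concretely, the content objective in the first section is just a feature-space MSE. Below is a minimal sketch of that loss (not my exact assignment code), assuming `input_feats` and `content_feats` are activations of the optimized image and the content image at a chosen VGG layer:

```python
import torch
import torch.nn.functional as F

def content_loss(input_feats, content_feats):
    # MSE between feature maps of the optimized image and the content image
    # at a chosen VGG layer; the content features are detached so gradients
    # only flow into the optimized input image.
    return F.mse_loss(input_feats, content_feats.detach())
```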
Training Details
-------------------------------------------------------------------------------
For Parts 1 and 2, I train for 300 steps locally on CPU with the L-BFGS optimizer (lr=1.0).
For the final style transfer results in Part 3, I train on GPU.
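For context, here is a minimal sketch of the optimization loop these settings refer to, where the pixels of the input image are the parameters being optimized. `model_with_losses` is a hypothetical stand-in for a VGG network with inserted loss layers that returns per-layer content and style losses; the loss bookkeeping differs in my actual implementation.

```python
import torch
import torch.optim as optim

def run_optimization(input_img, model_with_losses, num_steps=300,
                     style_weight=1000.0, content_weight=1.0):
    # Optimize the pixels of input_img directly; L-BFGS with lr=1.0 and
    # 300 steps matches the settings used for Parts 1 and 2.
    input_img.requires_grad_(True)
    optimizer = optim.LBFGS([input_img], lr=1.0)
    step = [0]  # mutable counter so the closure can update it

    while step[0] < num_steps:
        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)  # keep pixel values in a valid range
            optimizer.zero_grad()
            content_losses, style_losses = model_with_losses(input_img)
            loss = (content_weight * sum(content_losses)
                    + style_weight * sum(style_losses))
            loss.backward()
            step[0] += 1
            return loss
        optimizer.step(closure)

    return input_img.detach().clamp(0, 1)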
Bells & Whistles
===============================================================================
In addition to optimizing an input image as required by the assignment, I also try using a feedforward network
to directly 1) output the style transfer image and 2) synthesize textures from a particular style image.
I also tried applying the trained MLPs to multiple content images, but the results were not very good.
Train MLP to output style transfer image directly [8 pts]
-------------------------------------------------------------------------------
Ref: [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](https://cs.stanford.edu/people/jcjohns/papers/eccv16/JohnsonECCV16.pdf)
Inspired by the paper above, I use my `CycleGenerator` from Assignment 3 as the network. During training, I use the `Adam` optimizer
to learn the weights of the network, minimizing the same content and style losses as in the assignment. Note that unlike the
assignment, which optimizes the input image directly, here I compute the style transfer image with `output_img = mlp(content_img)`.
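A minimal sketch of this training loop is below; `perceptual_losses` is a hypothetical placeholder for the VGG-based content and style losses used in the main assignment, `mlp` is the `CycleGenerator` repurposed as a feedforward stylizer, and the learning rate shown is illustrative rather than the exact value I used.

```python
import torch.optim as optim

def train_feedforward(mlp, perceptual_losses, content_img, style_img,
                      num_steps=5000, style_weight=1000.0, content_weight=1.0):
    # Learn the network weights with Adam instead of optimizing the image itself;
    # stylization then becomes a single forward pass: output_img = mlp(content_img).
    optimizer = optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(num_steps):
        optimizer.zero_grad()
        output_img = mlp(content_img)
        c_loss, s_loss = perceptual_losses(output_img, content_img, style_img)
        loss = content_weight * c_loss + style_weight * s_loss
        loss.backward()
        optimizer.step()
    return mlp
```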
Below I share some results after 5000 steps of training:
For the texture synthesis experiments, I found training to be much less stable than in Part 1, often producing `nan` losses and requiring tuning of the `style_weight`
hyperparameter. I tried `style_weight = {10, 100, 1000, ..., 1M}` and kept the best results. Below are the results of using different conv layer(s) to generate texture;
I experiment with a single layer as well as a range of layers (e.g., conv1_1~conv5_1).
Note: I later realized that the training instability disappears on the GPU.
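Before those results, here is a minimal sketch of the Gram-matrix style loss these experiments optimize over the selected conv layers; the normalization and layer handling follow the standard Gatys-style formulation and are not necessarily identical to my implementation.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feats):
    # feats: (batch, channels, height, width) activations from one conv layer.
    b, c, h, w = feats.size()
    flat = feats.view(b * c, h * w)
    # Normalize by the number of elements so the loss scale is comparable
    # across layers with different spatial resolutions.
    return flat @ flat.t() / (b * c * h * w)

def style_loss(input_feats_per_layer, style_feats_per_layer):
    # Sum of MSE between Gram matrices over the chosen layers
    # (a single layer, or a range such as conv1_1~conv5_1).
    loss = 0.0
    for f_in, f_style in zip(input_feats_per_layer, style_feats_per_layer):
        loss = loss + F.mse_loss(gram_matrix(f_in), gram_matrix(f_style).detach())
    return loss
```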
I explained my choice of layers in the previous parts. The `style_weight` is the main hyperparameter to tune:
when it is higher, we see more style influence, and vice versa. Below I show the results of varying
`style_weight` with noise initialization, keeping the other hyperparameters from the table above. Based on these results, I pick
`style_weight = 1000` when initializing from noise.
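For completeness, a minimal sketch of the two initialization options compared here and in the next paragraph (a hypothetical helper; exact preprocessing omitted):

```python
import torch

def init_input(content_img, mode="noise"):
    # Start either from uniform random noise or from a copy of the content
    # image; with noise init I use style_weight = 1000, while content init
    # needs a larger style_weight (10000) for comparable style influence.
    if mode == "noise":
        return torch.rand_like(content_img).requires_grad_(True)
    return content_img.clone().requires_grad_(True)
```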
We can also initialize the input image from the content image. In this setting, we need much more style influence to obtain
a comparable result, so I increase `style_weight` to 10000. Unlike initializing from noise, initializing from the content image
did not run the risk of the content getting "washed out" by too large a `style_weight`. I personally prefer the quality of
this setting. I also did not notice a significant difference in running time between the two settings.
Below I show some results for content initialization with the hyperparameters above: