Max Grebinskiy's Home Page for 16-726 Project 3

Project Overview

For this project, we explore generative adversarial networks for image synthesis. We first implement a vanilla GAN that generates images of grumpy cats from random noise, examining how a deluxe data loader and differentiable augmentation affect training. We then implement a CycleGAN that translates images between two domains without paired training data (Russian Blue cats to grumpy cats, and apples to oranges), and study the effect of the cycle consistency loss and the choice of discriminator on the results.

Relating Padding, Kernel Size, and Stride

Given that we have an n by n input image, a kernel of size k, a stride of length s, and we want our output image to be m by m, what should our padding p be in our convolution? To find this, we will consider the width only, as the height is symmetric. We have the following information:

- the padded input image has width n + 2p;
- the kernel has size k;
- the stride has length s;
- the desired output width is m.

Given the first three points of information, the width of the image output by the convolution can be calculated as

\[ m = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1, \]

which is equivalent to \( m = \left\lfloor \frac{n + 2p - k + s}{s} \right\rfloor \). The reasoning behind this is as follows: consider the top right corner of the kernel during the convolution. It starts at position k and continues to k + cs, where c is some positive integer, as we go from left to right. We continue until the condition k + cs > n + 2p is satisfied. The number of iterations is precisely the floor term from above, and we add 1 to account for the starting position.

Given this formula, we can now proceed to our actual implementation. We have that our input width is 2w, our desired output width is w, our kernel size is 4, and our stride length is 2. Combining this all together, we have that

\[ w = \left\lfloor \frac{2w + 2p - 4}{2} \right\rfloor + 1 = w + p - 1, \]

where the floor disappears because 2w + 2p - 4 is even; solving gives p = 1.
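As a quick sanity check, we can confirm the result numerically. Below is a minimal, hypothetical PyTorch snippet (not the project code itself) showing that a convolution with kernel size 4, stride 2, and padding 1 halves the spatial resolution:

```python
import torch
import torch.nn as nn

# Downsampling convolution: kernel size 4, stride 2, padding 1.
# For an input of width 2w, the output width is
# floor((2w + 2*1 - 4) / 2) + 1 = w.
conv = nn.Conv2d(in_channels=3, out_channels=32,
                 kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 3, 64, 64)   # 2w = 64
y = conv(x)
print(y.shape)                  # torch.Size([1, 32, 32, 32]), i.e. w = 32
```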

Vanilla GAN

Below are the generated images with the deluxe dataloader and differentiable augmentation enabled, at 200 iterations and 6300 iterations respectively.


At 200 iterations, only a very faint, blurry outline of the grumpy cat can be made out. Additionally, the color scheme is somewhat incorrect: the center of the picture, where the cat should be, is filled with red patches, although the fringes seem appropriate. At 6300 iterations, the generated image is much clearer, and it is recognizably a cat. It is now possible to see the whiskers of the grumpy cat, and the color style is correct as well. There is still significant room for improvement, as many of the generated images have lopsided, uneven, or malformed eyes.
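For context, the deluxe dataloader augments the training images with standard transforms so that each epoch sees varied crops and flips of the small dataset. A minimal sketch follows; the exact sizes and transform list are assumptions in the style of torchvision, not necessarily the project's exact settings:

```python
from torchvision import transforms

# Deluxe-style augmentation: upscale slightly, take a random crop,
# and randomly flip, so the GAN rarely sees the exact same image twice.
deluxe_transform = transforms.Compose([
    transforms.Resize(70),             # assumed: slightly larger than the crop
    transforms.RandomCrop(64),         # assumed 64x64 training resolution
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])
```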

Training Plots

Below are the training loss curves for the discriminator and the generator in four different modes: basic dataset, deluxe dataset, basic dataset with differentiable augmentations, and deluxe dataset with differentiable augmentations. We should expect that with differentiable augmentations and/or the deluxe dataset, the generator's loss converges toward 0 faster while the discriminator's loss stays higher. This is because better training data lets the generator produce more realistic images to fool the discriminator, so the generator performs better while the discriminator has a harder time distinguishing fake from real images. This is accurately reflected in the graphs below; a sketch of the training step that produces these losses follows the plots.

Basic Dataset

Basic Dataset with Differentiable Augmentations

Deluxe Dataset

Deluxe Dataset with Differentiable Augmentations
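To make the interaction between the two losses concrete, here is a minimal sketch of a single training step, assuming a least-squares GAN objective and a diff_augment function in the style of Zhao et al.'s differentiable augmentation; the names and details are illustrative rather than the project's exact code:

```python
def train_step(G, D, g_opt, d_opt, real, noise, diff_augment):
    # --- Discriminator update: push real scores toward 1, fakes toward 0 ---
    fake = G(noise).detach()  # detach so this step does not update G
    d_loss = 0.5 * (((D(diff_augment(real)) - 1) ** 2).mean()
                    + (D(diff_augment(fake)) ** 2).mean())
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # --- Generator update: push D's score on fresh fakes toward 1 ---
    fake = G(noise)
    g_loss = ((D(diff_augment(fake)) - 1) ** 2).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Note that the augmentation is applied to both real and fake images before the discriminator, and because it is differentiable, gradients flow through it back to the generator.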

CycleGAN

For the grumpifyCat dataset, here are the resulting images:

Patch Discriminator, No Cycle Consistency Loss:
1000 iterations, Russian to Grumpy
10000 iterations, Russian to Grumpy
1000 iterations, Grumpy to Russian
10000 iterations, Grumpy to Russian

Patch Discriminator, Cycle Consistency Loss:
1000 iterations with cycle consistency loss, Russian to Grumpy
10000 iterations with cycle consistency loss, Russian to Grumpy
1000 iterations with cycle consistency loss, Grumpy to Russian
10000 iterations with cycle consistency loss, Grumpy to Russian

DC Discriminator, Cycle Consistency Loss:
1000 iterations with cycle consistency loss, DC Discriminator, Russian to Grumpy
10000 iterations with cycle consistency loss, DC Discriminator, Russian to Grumpy
1000 iterations with cycle consistency loss, DC Discriminator, Grumpy to Russian
10000 iterations with cycle consistency loss, DC Discriminator, Grumpy to Russian


When comparing the generated images without cycle consistency loss versus those with cycle consistency loss (using a patch discriminator), the cats generated with cycle consistency loss come out better. There is generally greater detail and clarity in these cats, and the eyes are more pronounced. Without cycle consistency loss, the eyes are typically just yellow blobs (in Grumpy to Russian), whereas with cycle consistency loss there are visible pupils. Overall, cycle consistency preserves the features of the original domain better. One thing that seems to be worse with cycle consistency loss, however, is the color: in the 10000-iteration results from Grumpy Cat to Russian Blue, there is a brownish/orange hue to the generated image, whereas it should be a lighter grey, as is seen in the generated images without cycle consistency loss.
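For reference, the cycle consistency term penalizes the reconstruction error after a round trip through both generators. Below is a minimal sketch with hypothetical names, assuming an L1 penalty weighted by a coefficient lam:

```python
def cycle_consistency_loss(G_XtoY, G_YtoX, real_X, real_Y, lam=10.0):
    # X -> Y -> X should reproduce the original X ...
    recon_X = G_YtoX(G_XtoY(real_X))
    # ... and Y -> X -> Y should reproduce the original Y.
    recon_Y = G_XtoY(G_YtoX(real_Y))
    loss = (real_X - recon_X).abs().mean() + (real_Y - recon_Y).abs().mean()
    return lam * loss
```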

When we compare the generated images of the DC discriminator to those of the patch discriminator, the quality is significantly worse. The features are nowhere near as well preserved, and many of the output images are missing visible eyes. This is likely because the DC discriminator attends to the image as a whole, rather than to local patches and structures, so small features like the eyes receive little attention.
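The difference between the two discriminators shows up directly in their output shapes: a patch discriminator emits a grid of per-patch scores, each with a limited receptive field, while the DC discriminator reduces the whole image to a single score. The layer sizes below are illustrative assumptions, not the project's exact architectures:

```python
import torch
import torch.nn as nn

# Patch discriminator: stops downsampling early, producing a score map
# in which each entry judges one local patch of the input.
patch_D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, stride=2, padding=1),
)

# DC discriminator: keeps downsampling until one global score remains.
dc_D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 4, stride=2, padding=0),
)

x = torch.randn(1, 3, 32, 32)
print(patch_D(x).shape)  # torch.Size([1, 1, 4, 4]): 16 per-patch scores
print(dc_D(x).shape)     # torch.Size([1, 1, 1, 1]): one global score
```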

For the apples2oranges dataset, here are the resulting images:

Patch Discriminator, No Cycle Consistency Loss:
1000 iterations, Apples to Oranges
10000 iterations, Apples to Oranges
1000 iterations, Oranges to Apples
10000 iterations, Oranges to Apples

Patch Discriminator, Cycle Consistency Loss:
1000 iterations with cycle consistency loss, Apples to Oranges
10000 iterations with cycle consistency loss, Apples to Oranges
1000 iterations with cycle consistency loss, Oranges to Apples
10000 iterations with cycle consistency loss, Oranges to Apples

DC Discriminator, Cycle Consistency Loss:
1000 iterations with cycle consistency loss, DC Discriminator, Apples to Oranges
10000 iterations with cycle consistency loss, DC Discriminator, Apples to Oranges
1000 iterations with cycle consistency loss, DC Discriminator, Oranges to Apples
10000 iterations with cycle consistency loss, DC Discriminator, Oranges to Apples


When comparing the generated images without cycle consistency loss to those with cycle consistency loss (using patch discriminators), cycle consistency seems to perform worse here. The coloring is inconsistent and incoherent, as can be seen in Oranges to Apples (10000 iterations) in the bottom row: we generate apples that are not fully red but are instead splotched with greyish patches. Additionally, when whole or sliced oranges appear (as in the third row), we generate apples with strange colors. Though the colors are a more uniform red without cycle consistency loss, the visual clarity there is also significantly better. Additionally, people get "orangified" when using cycle consistency loss, whereas this does not happen without it. This likely occurs because people rarely appear in the training set, so at the level of an individual patch it is better to orangify them, whereas when looking at the whole image, the fruit occupies only a small portion of it.

When comparing the generated images using the DC discriminator to those with the patch discriminator, the DC discriminator seems to perform better on this dataset. There are instances of sliced oranges that are properly converted into apple-looking slices without any strange discolorations, while still using cycle consistency loss for both discriminators. The resolution of the images is surprisingly similar to that of the patch discriminator, and neither is blurrier than the other.