The goal of this project is to take the exposures Prokudin-Gorskii captured through 3 different plates of tinted glass and stack them together to create one RGB image.
This is easier said than done, since the input to our algorithm is a single scanned image containing all 3 original exposures stacked on top of one another.
So, to begin, we first crop the scan roughly into thirds, one per plate. This generally works well, but simply stacking the three crops as color channels is not enough,
since they are misaligned with one another.
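For concreteness, here is a minimal sketch of the split step, assuming the scan has been loaded as a 2D PyTorch tensor and that the exposures are stacked blue, green, red from top to bottom (the function and variable names are mine, not taken from the original code):

```python
import torch

def split_plates(scan: torch.Tensor):
    """Split an (H, W) scan into three roughly equal plates.

    Assumes the exposures are stacked blue, green, red from top to bottom,
    as in the Prokudin-Gorskii scans.
    """
    h = scan.shape[0] // 3
    blue = scan[:h]
    green = scan[h:2 * h]
    red = scan[2 * h:3 * h]
    return blue, green, red
```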
To compute the right alignments, we iterate over possible offsets (within a small range) and score each one with either the sum of squared differences (SSD) or normalized cross-correlation (NCC). These metrics tell us how "aligned"
two channels are by comparing their pixel values. To get this to work well, I cropped 10% off each side before computing these metrics,
since this discards the wrapped-around region that has no counterpart in the other channel and also avoids unnecessarily comparing border regions, which may differ across color channels.
Both metrics work well, but I found NCC to be significantly slower than SSD.
This approach works well for most images (except for those whose channels differ in brightness, which required me to compare gradients instead of raw pixel values).
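A rough sketch of this brute-force search is below. The helper names, the default search radius, and the 10% interior crop are illustrative choices, not a copy of the original code:

```python
import torch

def crop_interior(img: torch.Tensor, frac: float = 0.1) -> torch.Tensor:
    """Drop `frac` of the image on every side before scoring."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ssd(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Sum of squared differences (smaller is better)."""
    return ((a - b) ** 2).sum()

def ncc(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Negated normalized cross-correlation, so smaller is better like SSD."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return -(a * b).mean()

def gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    """Finite-difference gradient magnitude, for channels that differ in brightness."""
    gx = img[:-1, 1:] - img[:-1, :-1]
    gy = img[1:, :-1] - img[:-1, :-1]
    return torch.sqrt(gx ** 2 + gy ** 2)

def align(moving: torch.Tensor, fixed: torch.Tensor, radius: int = 15,
          metric=ssd):
    """Exhaustively try offsets in [-radius, radius]^2 and return the best (dy, dx)."""
    best_score, best_offset = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = torch.roll(moving, shifts=(dy, dx), dims=(0, 1))
            score = metric(crop_interior(shifted), crop_interior(fixed))
            if best_score is None or score < best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset
```

To align on gradients instead of pixel values, `gradient_magnitude` can be applied to both channels before calling `align`.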
The issue is that this is quite slow (several minutes per image). To remedy this, I constructed an image pyramid by repeatedly downsampling (and blurring, thanks to PyTorch's antialiasing option)
the initial image until it reached a small enough base level (in this case my stopping condition was when one dimension was smaller than 64 pixels). From there, we compute the alignment
on the smallest image (which is very cheap since the image is small), then propagate the result up a level to the next-smallest image by taking that result and multiplying it by 2
(since we downsample by a factor of 2 each time) and then searching around this new estimate. We continue doing this until we reach the original image. By the time we reach the original size,
we already have a fairly good guess of how we should align, which means our search space can be narrowed down significantly.
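Here is a sketch of that coarse-to-fine procedure, assuming the `align` helper from the sketch above and the parameters mentioned below (a 64-pixel base level and a search radius of 8 per level); `F.interpolate` with `antialias=True` provides the blur-and-downsample step:

```python
import torch
import torch.nn.functional as F

def pyramid_align(moving: torch.Tensor, fixed: torch.Tensor,
                  min_size: int = 64, radius: int = 8):
    """Coarse-to-fine alignment over an image pyramid; returns the best (dy, dx)."""
    # Build the pyramid from full resolution down to the base level,
    # halving each level with an antialiased (blurred) downsample.
    levels = [(moving, fixed)]
    while min(levels[-1][0].shape) > min_size:
        smaller = []
        for img in levels[-1]:
            img4d = img[None, None].float()            # (1, 1, H, W) for interpolate
            img4d = F.interpolate(img4d, scale_factor=0.5,
                                  mode="bilinear", antialias=True)
            smaller.append(img4d[0, 0])
        levels.append(tuple(smaller))

    # Align at the coarsest level, then double the estimate and refine at each
    # finer level until we are back at the original resolution.
    dy, dx = 0, 0
    for mov, fix in reversed(levels):
        dy, dx = dy * 2, dx * 2
        shifted = torch.roll(mov, shifts=(dy, dx), dims=(0, 1))
        ddy, ddx = align(shifted, fix, radius=radius)
        dy, dx = dy + ddy, dx + ddx
    return dy, dx
```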
These are the results of my algorithm run on the example images provided. Note that the times are for running on my MacBook Pro with an M1 Pro, using the SSD metric, a minimum image resolution of 64x64 for the image pyramid, and a search offset of 8 pixels per level. Also, the "Emir" image could not be aligned successfully using just the pixel values. So, I instead operated on the image gradients, which yielded a longer run time but a much better result.
Similar to the above, the third image here did not align well with just raw pixel values, so I instead computed the alignment using the image gradients.
I completely removed any library calls that operated on the image and replaced them with their PyTorch equivalents. This (not surprisingly) made my code significantly faster, since PyTorch operations are highly vectorized.
I noticed that the left and right edges of the images were heavily populated with streaks of black and white, which are easy to detect. So, I check whether the average intensity of any row or column is greater than 235 or less than 15 and, if so, remove it. This worked fairly well (see below). However, I could have spent more time on finer-grained filtering to remove stray streaks of other colors as well.
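Below is a rough sketch of that cleanup, assuming an 8-bit (0-255) single-channel image; as a design choice in this sketch it only trims rows and columns from the outer edges inward, so the interior of the image is never torn apart (the function name and default thresholds are mine):

```python
import torch

def trim_borders(img: torch.Tensor, low: float = 15.0, high: float = 235.0) -> torch.Tensor:
    """Trim leading/trailing rows and columns whose mean is nearly black or white.

    Assumes 8-bit intensities (0-255); for an RGB result, apply the same crop to
    every channel (or compute the means on a channel average).
    """
    def keep_range(means: torch.Tensor):
        bad = (means < low) | (means > high)
        start, end = 0, len(bad)
        while start < end and bad[start]:
            start += 1
        while end > start and bad[end - 1]:
            end -= 1
        return start, end

    top, bottom = keep_range(img.float().mean(dim=1))   # row means
    left, right = keep_range(img.float().mean(dim=0))   # column means
    return img[top:bottom, left:right]
```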
Before
After