Colorizing the Prokudin-Gorskii Photo Collection

11-747 Learning-based Image Synthesis Manuel Rodriguez Ladron de Guevara


Harvesters. Animated process of the pyramid alignment algorithm.

Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii glass plate images and produce a color image using image processing techniques, aiming to have as few visual artifacts as possible.

The input to the process is a digitized glass plate image that consists of 3 separate channels, ordered from top to bottom as BGR, as shown in Figure 1. Our job is to separate the 3 channels and align them as well as possible. The question is how to align them efficiently. The straightforward but inefficient approach is to search over all possible displacements until we find the best alignment, which is prohibitively expensive for large images (images with one dimension larger than 3000px).
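Since the three exposures are stacked vertically, separating the channels amounts to cutting the scan into equal thirds. A minimal sketch (the function name and NumPy usage are my own, not the original implementation):

```python
import numpy as np

def split_plate(plate):
    """Cut a vertically stacked BGR glass-plate scan into its three channels."""
    h = plate.shape[0] // 3          # height of one exposure
    b = plate[:h]                    # top third: blue
    g = plate[h:2 * h]               # middle third: green
    r = plate[2 * h:3 * h]           # bottom third: red
    return b, g, r
```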

Figure 1. Cathedral. Digitized glass plate.

Search Methods

Each candidate alignment gets a score using an image matching metric. For this assignment, I explored Sum of Squared Differences (SSD), Normalized Cross Correlation (NCC), and Peak Signal-to-Noise Ratio (PSNR). The objective is to find the displacement that minimizes SSD or maximizes NCC or PSNR.

Sum of Squared Differences SSD is calculated based on the following equation: $$ SSD = \sum_{x,y} (I_{x,y} - H_{x,y})^2 $$ where $I$ and $H$ are the two arrays and $x,y$ index the height and width dimensions, respectively.
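A direct translation of the SSD equation into NumPy (a sketch; the cast to float guards against overflow with integer images):

```python
import numpy as np

def ssd(i_img, h_img):
    """Sum of squared differences; lower is better (0 for identical arrays)."""
    diff = i_img.astype(float) - h_img.astype(float)
    return np.sum(diff ** 2)
```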

Normalized Cross Correlation NCC is calculated based on the following equation: $$ NCC = \frac{\sum_{x,y} (I_{x,y} - \mu_I) (H_{x,y} - \mu_H) }{\sqrt {\sum_{x,y} (I_{x,y} - \mu_I)^2} \sqrt{\sum_{x,y} (H_{x,y} - \mu_H)^2}} $$ where $\mu_I$ and $\mu_H$ are the mean intensities of the two images.
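The NCC equation can be sketched the same way; note that subtracting the means makes the score invariant to brightness and contrast differences between channels:

```python
import numpy as np

def ncc(i_img, h_img):
    """Normalized cross correlation; higher is better (1 for a perfect match)."""
    i = i_img.astype(float) - i_img.mean()
    h = h_img.astype(float) - h_img.mean()
    denom = np.sqrt(np.sum(i ** 2)) * np.sqrt(np.sum(h ** 2))
    return np.sum(i * h) / denom
```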

Peak Signal-to-Noise Ratio PSNR is calculated based on the following equation: $$ PSNR = 20 \log_{10} { \frac{p_m}{\sqrt{\sigma_{IH}}}} $$ where $\sigma_{IH}$ is the mean squared error between the two images and $p_m$ is the maximum pixel intensity.
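PSNR follows directly from the mean squared error (a sketch; $p_m = 255$ assumes 8-bit intensities):

```python
import numpy as np

def psnr(i_img, h_img, p_max=255.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((i_img.astype(float) - h_img.astype(float)) ** 2)
    return 20 * np.log10(p_max / np.sqrt(mse))
```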

To make the algorithm efficient, the assignment suggests using an image pyramid (downsampling the image n times by a factor of 2) and doing the search over the resulting set of lower-resolution images. With the image pyramid algorithm as the backbone of this project, I propose two different approaches besides exhaustively searching the whole pixel space.
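The pyramid itself can be built by repeated subsampling. A minimal sketch (the function name and the `min_size` stopping rule are my own; in practice one may blur before subsampling to reduce aliasing):

```python
import numpy as np

def build_pyramid(img, min_size=32):
    """Downsample by 2 until the next level would drop below min_size pixels
    on its smallest side; return the levels ordered coarse to fine."""
    levels = [img]
    while min(levels[-1].shape[:2]) // 2 >= min_size:
        levels.append(levels[-1][::2, ::2])   # plain subsampling by 2
    return levels[::-1]
```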

Exhaustive Search Exhaustive search is a viable solution only for small images, since the time complexity of the search is $O(P^2)$, where $P$ is the number of pixels: we iterate over candidate row and column displacements and, for each one, compute a metric comparing the two array configurations.
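A sketch of the exhaustive loop, scoring with SSD over a fixed window of displacements (the window size and use of circular `np.roll` shifts are my own assumptions):

```python
import numpy as np

def exhaustive_align(ref, mov, window=15):
    """Score every integer displacement in [-window, window]^2 with SSD and
    return the (dy, dx) shift of the moving channel that minimizes it."""
    best, best_shift = np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(np.roll(mov, dy, axis=0), dx, axis=1)
            score = np.sum((ref.astype(float) - shifted) ** 2)
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift
```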

Image Pyramid Search This is an efficient search algorithm that generally yields good results in under 60 seconds for high-resolution images (up to 9000 pixels). As mentioned above, this algorithm scales the image down by a factor of 2 n times, where n is an automated parameter based on the input image resolution. Given an initial search window, the algorithm searches from coarse to fine images within the reduced window, whose size decreases by a factor of 0.8 as the images become larger. The window size is a hyperparameter.
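The coarse-to-fine procedure can be sketched as follows. The 0.8 window decay follows the text; the concrete window size, stopping rule, and SSD scoring are illustrative assumptions:

```python
import numpy as np

def pyramid_align(ref, mov, window=8, shrink=0.8, min_size=32):
    """Coarse-to-fine SSD alignment: estimate the shift on the coarsest
    level, then double it and refine at each finer level while the
    search window shrinks by the given factor."""
    refs, movs = [ref], [mov]
    while min(refs[-1].shape[:2]) // 2 >= min_size:   # build both pyramids
        refs.append(refs[-1][::2, ::2])
        movs.append(movs[-1][::2, ::2])
    shift = np.zeros(2, dtype=int)
    w = float(window)
    for r, m in zip(refs[::-1], movs[::-1]):          # coarse to fine
        base = shift * 2                              # a shift doubles per level
        best = np.inf
        for dy in range(-int(w), int(w) + 1):
            for dx in range(-int(w), int(w) + 1):
                cand = base + np.array([dy, dx])
                s = np.roll(np.roll(m, cand[0], axis=0), cand[1], axis=1)
                score = np.sum((r.astype(float) - s) ** 2)
                if score < best:
                    best, shift = score, cand
        w = max(2.0, w * shrink)                      # shrink the window
    return tuple(int(v) for v in shift)
```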

Evolutionary Search I propose a novel approach to searching for the best alignment, inspired by Darwinian theory. This black-box optimization algorithm consists of treating the search space as a multivariate Gaussian with varying mean and covariance matrix. The algorithm starts with a population $P$, initial $\mu=0$, and initial covariance $100I$. We select elites by keeping a fraction of the population of $10\%$ at each iteration, and noise $\epsilon I = 0.25I$ is added to the covariance at each step. We optimize the different metrics mentioned above. We let the algorithm run for 10 iterations each time and set an initial Gaussian standard deviation of $\sigma = 10$, which decreases by a factor of $0.9$ at each pyramid level. We optimize using the following equations: $$ \mu_{t+1} \leftarrow \frac{1}{el_s}\sum_{i=1}^{el_s} \theta_t^{(i)}, \;\;\; \Sigma_{t+1} \leftarrow \mathrm{Cov} (\theta_t^{(1)}, ..., \theta_t^{(el_s)}) + \epsilon I, $$ where $el_s$ is the elite size, $\theta_t^{(i)}$ denotes the $i$-th best parameters from the previous iteration, and $\epsilon \in \mathbb{R}$ is a small constant.
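A sketch of this sampling loop over 2D displacements, in the style of the cross-entropy method. Population size, elite fraction, noise $\epsilon$, and initial $\sigma$ follow the text above; the SSD objective and rounding of samples to integer shifts are my own assumptions:

```python
import numpy as np

def evolutionary_align(ref, mov, pop=50, elite_frac=0.1, iters=10,
                       eps=0.25, sigma=10.0, seed=0):
    """Evolutionary (cross-entropy-style) search over 2D displacements."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)
    cov = (sigma ** 2) * np.eye(2)            # initial covariance 100 I
    el_s = max(2, int(pop * elite_frac))      # elite size
    best, best_shift = np.inf, (0, 0)
    for _ in range(iters):
        samples = rng.multivariate_normal(mu, cov, size=pop)
        samples = samples.round().astype(int)  # integer pixel shifts
        scores = []
        for dy, dx in samples:
            s = np.roll(np.roll(mov, dy, axis=0), dx, axis=1)
            scores.append(np.sum((ref - s) ** 2))
        order = np.argsort(scores)
        if scores[order[0]] < best:            # track the best shift seen
            best = scores[order[0]]
            best_shift = tuple(int(v) for v in samples[order[0]])
        elites = samples[order[:el_s]]
        mu = elites.mean(axis=0)               # refit mean to the elites
        cov = np.cov(elites.T) + eps * np.eye(2)   # refit covariance + noise
    return best_shift
```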

Bells & Whistles

We introduce various improvements over our baselines.

Automated Edge Cropping Glass plates naturally have artifacts along their borders. While for the baselines we do not touch the edges of the images, here we detect edges automatically by applying an intensity threshold to rows and columns near the image limits. To be robust to a potential tilt of the edges, instead of looking at values of single rows or columns, we look at several at a time and compute the threshold over that band.
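The band-averaged thresholding can be sketched as below. All parameter values (the search fraction, band width, and intensity cutoffs) are illustrative assumptions, not the values used in the experiments:

```python
import numpy as np

def auto_crop(img, frac=0.08, band=3, low=0.1, high=0.9):
    """Trim border rows/columns whose mean intensity over a small band is
    near-black or near-white (plate artifacts). Averaging over `band`
    rows/cols at a time makes the test robust to slightly tilted edges.
    Assumes intensities in [0, 1]."""
    h, w = img.shape[:2]

    def artifact(strip):
        m = strip.mean()
        return m < low or m > high

    top, bottom, left, right = 0, h, 0, w
    while top < int(frac * h) and artifact(img[top:top + band]):
        top += 1
    while bottom > h - int(frac * h) and artifact(img[bottom - band:bottom]):
        bottom -= 1
    while left < int(frac * w) and artifact(img[:, left:left + band]):
        left += 1
    while right > w - int(frac * w) and artifact(img[:, right - band:right]):
        right -= 1
    return img[top:bottom, left:right]
```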

PyTorch Implementation We re-implement the algorithm in PyTorch, this time making use of functions such as the MSE loss and efficient built-in operations such as convolutions.

Automatic Contrasting We implement automatic contrast stretching to improve the perceived quality of the image.
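The write-up does not specify the contrast method; a common choice is a percentile-based stretch, sketched here under that assumption:

```python
import numpy as np

def auto_contrast(img, p_low=2, p_high=98):
    """Percentile-based contrast stretch: map the p_low..p_high percentile
    range to [0, 1] and clip. Percentile values are illustrative."""
    lo, hi = np.percentile(img, [p_low, p_high])
    if hi <= lo:                       # flat image: nothing to stretch
        return img
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```

Using percentiles rather than the raw min/max keeps a few extreme pixels (dust, scratches) from dominating the stretch.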

Results and Discussion

We show here the results obtained by different algorithms and metrics.

Pyramid algorithm

The following images are aligned using the pyramid algorithm baseline. Unless otherwise stated, there is no prior automatic cropping; we show how, in some cases, prior cropping helps alignment. Likewise, and unless otherwise stated, the alignment is calculated using the SSD metric.

Cathedral. Original alignment
Cathedral. Pyramid alignment. CPU time: 0.40s
Cathedral. Pyramid alignment PSNR. CPU time: 0.466s
Emir. Original alignment
Emir. Pyramid alignment. CPU time: 54.32s. Vector R: [-9,-89] Vector G: [-17, -79]
Emir. Pyramid alignment PSNR. CPU time: 48.02s
Icon. Original alignment
Icon. Pyramid alignment. CPU time: 47.09s
Icon. Pyramid alignment with prior 0.8 cropping. CPU time: 35.323s
Lady. Original alignment
Lady. Pyramid alignment. CPU time: 49.24s
Lady. Pyramid alignment PSNR. CPU time: 48.59s
Turkmen. Original alignment
Turkmen. Pyramid alignment. CPU time: 46.72s
Turkmen. Pyramid alignment PSNR. CPU time: 49.45s
Harvesters. Original alignment
Harvesters. Pyramid alignment. CPU time: 46.42s
Harvesters. Pyramid alignment PSNR. CPU time: 46.86s

Evolutionary Approach with Pyramid Algorithm

The following images are aligned using the evolutionary approach described above, integrated into our pyramid algorithm. This process treats the space of displacements as a Gaussian distribution, and the portion of the population that best fits the objective passes on its traits through the mean and covariance of the next generation's distribution.

Due to time constraints, we show the not-so-good results obtained by this approach so far; I will keep working on it, since the remaining step is a grid search over the hyperparameters.

Turkmen. Evolutionary algorithm.
Cathedral. Evolutionary algorithm.

Bells and Whistles

Results on automatic contrasting

Turkmen. Baseline.
Turkmen. Automatic Contrasting.
Harvesters. Baseline. Vector R: [38, -96] Vector G: [-25, -58]
Harvesters. Automatic Contrasting. Vector R: [38, -96] Vector G: [-25, -58]

Results on automatic cropping.

Three Generations. Baseline. Vector R: [1, -100] Vector G: [-34, -8]
Three Generations. Automatic Cropping. Vector R: [1, -100] Vector G: [-34, -8]

All images + Vector Displacements

The rest of the images using SSD + automatic fixed post-cropping:

Harvesters. Vector R: [11 20], Vector G: [ 9 -10]
Cathedral. Vector R: [4 4], Vector G: [ 3 6]
Emir. Vector R: [13 7], Vector G: [ 9 13]
Three-generations. Vector R: [10 14], Vector G: [ 9 10]
Icon. Vector R: [16 12], Vector G: [13 4]
Village. Vector R: [11 20], Vector G: [ 7 19]
Portrait. Vector R: [14 -6], Vector G: [ 8 16]
Turkmen. Vector R: [10 14], Vector G: [ 9 -2]
Train. Vector R: [10 17], Vector G: [ 7 14]
Lady. Vector R: [10 16], Vector G: [ 7 11]