16-726 | Qin Han | qinh@andrew.cmu.edu
Using PyTorch, I implemented an algorithm that aligns the three color channels of the digitized Prokudin-Gorskii glass plate images so that they form a single RGB color image. The algorithm (1) computes the norm of each channel's gradient to extract edges; (2) builds a pyramid of edge features for each channel; (3) estimates the shift between channels coarse-to-fine over the pyramids; (4) stacks the shifted channels into an RGB color image; and (5) applies automatic border cropping, automatic white balancing, and automatic contrasting to the raw result to improve perceived image quality.
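For orientation, here is a minimal sketch of how these steps could be chained together. The helper names (edge_features, pyramid_align, auto_crop, white_balance, sigmoid_contrast) are placeholders for the steps sketched in the sections below, not the actual function names, and blue is assumed to be the reference channel.

```python
import torch

def reconstruct(plate: torch.Tensor) -> torch.Tensor:
    # plate: the grayscale glass plate with the B, G, R exposures
    # stacked vertically; split it into thirds.
    h = plate.shape[0] // 3
    b, g, r = plate[:h], plate[h:2 * h], plate[2 * h:3 * h]
    # Align green and red to blue on edge features (steps 1-3).
    dyg, dxg = pyramid_align(edge_features(g), edge_features(b))
    dyr, dxr = pyramid_align(edge_features(r), edge_features(b))
    g = torch.roll(g, shifts=(dyg, dxg), dims=(0, 1))
    r = torch.roll(r, shifts=(dyr, dxr), dims=(0, 1))
    # Stack into an RGB image (step 4) and post-process (step 5).
    rgb = torch.stack([r, g, b])
    return sigmoid_contrast(white_balance(auto_crop(rgb)))
```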
Below is an example of the full pipeline, where the images show (1) alignment using raw pixel values, (2) alignment using edge features, (3) automatic cropping, (4) automatic white balancing, and (5) automatic contrasting, respectively.
To align two channels, the algorithm searches over a fixed-size window of shifts and picks the best one according to an L2 or NCC score: the L2 score is the L2 norm of the difference between the two channels, and the NCC score is their normalized cross-correlation. I also crop the borders before scoring, which improves matching stability. The matching can operate on any per-channel feature; in my implementation, I tried both raw pixel values and edge features.
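Below is a minimal sketch of the exhaustive search with the NCC score; the window size and border margin are illustrative values, not the report's actual settings.

```python
import torch

def ncc(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Normalized cross-correlation between two flattened channels.
    a = a.flatten() - a.mean()
    b = b.flatten() - b.mean()
    return torch.dot(a, b) / (a.norm() * b.norm())

def best_shift(channel, reference, window=15, margin=20):
    # Exhaustive search over a (2*window+1)^2 grid of shifts,
    # scoring each candidate with NCC on border-cropped interiors.
    best, best_score = (0, 0), -float("inf")
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = torch.roll(channel, shifts=(dy, dx), dims=(0, 1))
            score = ncc(shifted[margin:-margin, margin:-margin],
                        reference[margin:-margin, margin:-margin])
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best
```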
For high-resolution images, searching over a very large shift window is inefficient. Instead, I construct a pyramid of image features and estimate the shift coarse-to-fine: the pyramid is built by repeatedly downsampling the features, and the algorithm starts at the coarsest level, then at each finer level upsamples the shift from the previous level and refines it within a small window. The total search cost then grows logarithmically with image size instead of quadratically.
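A sketch of the coarse-to-fine recursion, reusing best_shift from above; the average-pooling downsampling, the minimum pyramid size, and the one-pixel refinement window are assumptions, not the report's stated settings.

```python
import torch
import torch.nn.functional as F

def pyramid_align(channel, reference, min_size=64, window=15):
    # Base case: the image is small enough for a full window search.
    if min(channel.shape) <= min_size:
        return best_shift(channel, reference, window)
    # Recurse on a half-resolution pyramid level.
    down = lambda x: F.avg_pool2d(x[None, None], 2)[0, 0]
    dy, dx = pyramid_align(down(channel), down(reference), min_size, window)
    # Upsample the coarse estimate (x2) and refine it locally.
    coarse = torch.roll(channel, shifts=(2 * dy, 2 * dx), dims=(0, 1))
    rdy, rdx = best_shift(coarse, reference, window=1)
    return 2 * dy + rdy, 2 * dx + rdx
```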
Below are the results of the algorithm, where the images show (1) alignment using raw pixel values, (2) alignment using edge features, (3) automatic cropping, (4) automatic white balancing, and (5) automatic contrasting, respectively.
cathedral: shift green [5, 2], red [12, 3]
emir: shift green [49, 23], red [107, 40]
harvesters: shift green [60, 17], red [123, 13]
icon: shift green [41, 16], red [90, 23]
lady: shift green [56, 9], red [119, 13]
self_portrait: shift green [78, 29], red [175, 37]
three_generations: shift green [53, 13], red [113, 11]
train: shift green [43, 8], red [86, 33]
turkmen: shift green [56, 22], red [117, 29]
village: shift green [64, 11], red [137, 22]
All of the algorithms are implemented using PyTorch.
First, the algorithm convolves the image with gradient kernels to obtain the image gradient. Second, it computes the norm of the gradient and applies a threshold to it, keeping the resulting edge map as the new matching feature. Here is an example of the extracted edge features of the cathedral image.
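A minimal sketch of the edge extraction, assuming Sobel kernels and a threshold relative to the maximum gradient norm; the report only specifies "gradient kernels" and "a threshold".

```python
import torch
import torch.nn.functional as F

def edge_features(img: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    # img: (H, W) grayscale channel. Sobel kernels and the relative
    # threshold value are assumptions for illustration.
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    x = img[None, None]                          # (1, 1, H, W)
    gx = F.conv2d(x, kx[None, None], padding=1)  # horizontal gradient
    gy = F.conv2d(x, ky[None, None], padding=1)  # vertical gradient
    mag = torch.sqrt(gx ** 2 + gy ** 2)[0, 0]    # gradient norm
    return (mag > threshold * mag.max()).float() # binary edge map
```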
The algorithm crops the borders based on the observation that pixel values near the plate borders are either very dark or very bright. It therefore iteratively computes the mean intensity of the outermost rows and columns and removes them while that mean is too small or too large.
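A sketch of that border-peeling loop; the darkness/brightness thresholds are assumed values, as the report does not state them.

```python
import torch

def auto_crop(img: torch.Tensor, lo: float = 0.05, hi: float = 0.95) -> torch.Tensor:
    # img: (3, H, W) with values in [0, 1]. Repeatedly peel off an
    # outermost row or column whose mean intensity is nearly black
    # or nearly white; lo/hi are illustrative thresholds.
    def bad(strip: torch.Tensor) -> bool:
        m = strip.mean().item()
        return m < lo or m > hi

    top, bottom = 0, img.shape[1]
    left, right = 0, img.shape[2]
    changed = True
    while changed and bottom - top > 2 and right - left > 2:
        changed = False
        if bad(img[:, top, left:right]):
            top, changed = top + 1, True
        if bad(img[:, bottom - 1, left:right]):
            bottom, changed = bottom - 1, True
        if bad(img[:, top:bottom, left]):
            left, changed = left + 1, True
        if bad(img[:, top:bottom, right - 1]):
            right, changed = right - 1, True
    return img[:, top:bottom, left:right]
```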
The algorithm first computes the average of each channel and takes it as an estimate of that channel's illumination (the gray-world assumption). It then computes a per-channel scale factor that makes each channel's average equal to the average over the three channels. Finally, it clips the scaled values back to the valid intensity range.
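A sketch of this gray-world balancing step, assuming intensities in [0, 1]:

```python
import torch

def white_balance(img: torch.Tensor) -> torch.Tensor:
    # img: (3, H, W). Scale each channel so its mean matches the
    # mean over all three channels, then clip to [0, 1].
    means = img.mean(dim=(1, 2), keepdim=True)  # per-channel means
    scale = means.mean() / means
    return (img * scale).clamp(0.0, 1.0)
```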
The algorithm uses non-linear sigmoid-based contrast stretching to improve the perceived image quality. The sigmoid function is defined as \(f(x) = \frac{1}{1 + e^{-\alpha(x - \mu)}}\), where \(\mu\) is the midpoint parameter of the sigmoid curve, controlling where the midpoint of the sigmoid function falls in the range of intensity values, and \(\alpha\) is a hyperparameter controlling the slope of the sigmoid function.
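A sketch of the contrast stretch applying the formula above; \(\alpha = 8\) and \(\mu\) set to the mean intensity are illustrative choices, not the report's stated parameters.

```python
import torch

def sigmoid_contrast(img: torch.Tensor, alpha: float = 8.0,
                     mu: float = None) -> torch.Tensor:
    # f(x) = 1 / (1 + exp(-alpha * (x - mu))): alpha controls the
    # slope, mu the midpoint; defaulting mu to the mean intensity
    # is one reasonable choice of midpoint.
    if mu is None:
        mu = img.mean().item()
    return torch.sigmoid(alpha * (img - mu))
```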
Below are extra examples, where the images show (1) alignment using raw pixel values, (2) alignment using edge features, (3) automatic cropping, (4) automatic white balancing, and (5) automatic contrasting, respectively.