16726 HW1 Zihang Lai

Overview of the project

In this project, we colorize the Prokudin-Gorskii Photo Collection. Each photograph in the collection is stored as three separate single-channel images, one for each of the R, G, and B color channels. Since the three images are not perfectly aligned, we search for the offsets that best align them. Once the channels are aligned, we stack them to produce a color image.
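As a rough illustration of the split-and-stack step (before any alignment), here is a minimal sketch. It assumes the scan stores the B, G, and R exposures stacked from top to bottom; `split_and_stack` is a hypothetical helper and not part of `hw1_utils`.

```python
import numpy as np

def split_and_stack(plate):
    """Split a vertically stacked plate scan into three channels and stack them as RGB.

    Assumption: the plate holds the B, G, R exposures from top to bottom.
    """
    h = plate.shape[0] // 3          # height of one exposure
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return np.dstack([r, g, b])      # (h, w, 3) in RGB order, still unaligned

# Tiny synthetic plate: three 2x2 blocks filled with 0.1 (B), 0.5 (G), 0.9 (R).
plate = np.concatenate([np.full((2, 2), v) for v in (0.1, 0.5, 0.9)])
rgb = split_and_stack(plate)
print(rgb.shape)
```

Stacking without alignment is what produces the color fringes that the rest of the notebook removes.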

Method

To align image A with image B, the general idea is to shift image A around and compare each shifted version with image B. If a shift aligns the images well, a loss computed between the shifted image A and image B should be small. However, applying this approach naively to large images is expensive, because the search space is large (there are many possible shifts of image A). We therefore use a coarse-to-fine scheme, in which the images are first aligned at a coarse scale. Because images at a coarse scale have a limited search space, this alignment is fast. We then iteratively increase the resolution of the input images and refine the offset computed at the previous scale. Because each step only refines the previous, coarser alignment, the search space stays small even at high resolution. In this way, we achieve fast computation (~1 min) even for 10M-pixel images. For more details, please refer to the attached code.
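The coarse-to-fine idea can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code in `hw1_utils`: the names `ssd`, `downsample2`, `search`, and `align_pyramid`, the 2x2 average-pooling pyramid, the wrap-around shifts via `np.roll`, and the search radius of 2 are all assumptions made for the example.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; smaller means better aligned."""
    return float(((a - b) ** 2).sum())

def downsample2(im):
    """Halve the resolution by averaging 2x2 blocks (odd rows/columns are cropped)."""
    h, w = im.shape[0] // 2 * 2, im.shape[1] // 2 * 2
    im = im[:h, :w]
    return im.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(a, b, center, radius, metric):
    """Try every shift of `a` within `radius` of `center`; return the best (dy, dx)."""
    best, best_loss = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            loss = metric(np.roll(a, (dy, dx), axis=(0, 1)), b)
            if loss < best_loss:
                best, best_loss = (dy, dx), loss
    return best

def align_pyramid(a, b, scales=3, radius=2, metric=ssd):
    """Coarse-to-fine alignment of `a` onto `b`; returns the shift (dy, dx) for `a`."""
    if scales == 1:
        return search(a, b, (0, 0), radius, metric)
    # Solve the half-resolution problem first, then double its offset and refine.
    dy, dx = align_pyramid(downsample2(a), downsample2(b), scales - 1, radius, metric)
    return search(a, b, (2 * dy, 2 * dx), radius, metric)

# Synthetic check: displace an image by a known amount and recover the shift.
# The displacement is a multiple of 4 so it is exactly representable at all 3 scales.
rng = np.random.default_rng(0)
b = rng.random((64, 64))
a = np.roll(b, (-4, 8), axis=(0, 1))
offset = align_pyramid(a, b, scales=3)
print(offset)
```

With `scales=3` and `radius=2`, each level evaluates only 25 candidate shifts, yet the coarsest level already covers displacements of up to 8 pixels at full resolution. A single-scale search over the same range would need 17 x 17 = 289 evaluations on the full-size image.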

In [56]:
%matplotlib inline
from hw1_utils import *
import matplotlib.pyplot as plt

Main results

In [31]:
# name of the input file
imname = 'data/cathedral.jpg'
im_out, offsetg, offsetr = load_and_align(imname, metric_fn=loss_grad, scales=3)
print(f'G offset: {offsetg}, R offset: {offsetr}')
# display the image
skio.imshow(im_out)
skio.show()
G offset: (2, 5), R offset: (3, 12)
In [34]:
# name of the input file
imname = 'data/emir.tif'
im_out, offsetg, offsetr = load_and_align(imname, metric_fn=loss_grad, scales=6)
print(f'G offset: {offsetg}, R offset: {offsetr}')
# display the image
skio.imshow(im_out)
skio.show()
G offset: (23, 50), R offset: (41, 106)
In [35]:
# name of the input file
imname = 'data/lady.tif'
im_out, offsetg, offsetr = load_and_align(imname, metric_fn=loss_grad, scales=6)
print(f'G offset: {offsetg}, R offset: {offsetr}')
# display the image
skio.imshow(im_out)
skio.show()
G offset: (9, 57), R offset: (13, 120)

Extra examples from Prokudin-Gorskii collection

See more examples in the folder

In [36]:
# name of the input file
imname = 'data_ours/monastery.tif'
im_out, offsetg, offsetr = load_and_align(imname, metric_fn=loss_grad, scales=6)
print(f'G offset: {offsetg}, R offset: {offsetr}')
# display the image
skio.imshow(im_out)
skio.show()
G offset: (-3, 48), R offset: (-16, 104)

Bells & Whistles

1. Use image features (gradient) rather than color values

When measuring the distance between one image and the shifted other image, the naive approach is to compare their pixel values directly. However, because the images record different colors, corresponding pixels do not necessarily have the same value. The result can therefore be unsatisfactory when losses such as SSD (sum of squared differences) and SAD (sum of absolute differences) are applied to raw pixel values. Instead, we can compare edges, as captured by the image gradient. Although pixel values can differ across color channels, the edges are usually consistent.
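To make the difference concrete, here is a small sketch comparing a raw-intensity loss with a gradient-based one. The functions `loss_ssd_sketch` and `loss_grad_sketch` are hypothetical stand-ins; the actual `loss_ssd` and `loss_grad` in `hw1_utils` may be implemented differently. The example also assumes the two channels differ only by a uniform brightness offset, which is simpler than real data.

```python
import numpy as np

def loss_ssd_sketch(a, b):
    """SSD on raw intensities."""
    return float(((a - b) ** 2).sum())

def loss_grad_sketch(a, b):
    """SSD on finite-difference gradients instead of raw intensities."""
    loss = 0.0
    for axis in (0, 1):
        diff = np.diff(a, axis=axis) - np.diff(b, axis=axis)
        loss += float((diff ** 2).sum())
    return loss

# Two "channels" of the same scene: identical edges, different brightness.
scene = np.zeros((8, 8))
scene[2:6, 2:6] = 0.5
blue = scene
red = scene + 0.4   # same structure, uniformly brighter

print(loss_ssd_sketch(blue, red))    # large, although the channels are aligned
print(loss_grad_sketch(blue, red))   # essentially zero: the edges coincide
print(loss_grad_sketch(np.roll(blue, 1, axis=1), red))  # grows once misaligned
```

The raw SSD penalizes the brightness difference even at the correct alignment, while the gradient loss responds only to where the edges are.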

In [49]:
# name of the input file
imname = 'data/cathedral.jpg'
im_out, _, _ = load_and_align(imname, metric_fn=loss_grad, scales=3)
im_out2, _, _ = load_and_align(imname, metric_fn=loss_ssd, scales=3)

# display the image
plt.figure(figsize=(12,8))
plt.subplot(1,2,1)
plt.imshow(im_out)
plt.title('With image gradient')
plt.subplot(1,2,2)
plt.imshow(im_out2)
plt.title('Without image gradient')
plt.show()
plt.figure(figsize=(12,8))
plt.subplot(1,2,1)
plt.imshow(im_out[50:200,150:300])
plt.title('With image gradient (details)')
plt.subplot(1,2,2)
plt.imshow(im_out2[50:200,150:300])
plt.title('Without image gradient (details)')
plt.show()

2. Automatic cropping

To remove the excess border on the four sides of the image, we can use the image gradient as an indicator of where the border ends. Specifically, a high image gradient marks the border edge. See the following figure for an example.

In [85]:
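def trim_border(im_out, delta=12):
    """Crop the border by cutting at the strongest gradient line on each side."""
    h, w, c = im_out.shape
    # gradient magnitude summed per row (vertical differences) and per column (horizontal differences)
    row_profile = abs(im_out[:-1] - im_out[1:]).sum(2).sum(1)
    col_profile = abs(im_out[:, :-1] - im_out[:, 1:]).sum(2).sum(0)
    # strongest edge in each half of the image, moved inward by delta pixels
    top = row_profile[:h//2].argmax() + delta
    bottom = row_profile[h//2:].argmax() + h//2 - delta
    left = col_profile[:w//2].argmax() + delta
    right = col_profile[w//2:].argmax() + w//2 - delta
    return im_out[top:bottom, left:right]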
In [86]:
# im_out is the aligned cathedral image from the gradient comparison above; to recompute it:
# im_out, _, _ = load_and_align(imname, metric_fn=loss_grad, scales=3)
im_out3 = trim_border(im_out)

# display the image
plt.figure(figsize=(12,8))
plt.subplot(1,2,1)
plt.imshow(im_out3)
plt.title('With automatic border removal')
plt.subplot(1,2,2)
plt.imshow(im_out)
plt.title('Without automatic border removal')
plt.show()

My own Bells & Whistles

1. Color transfer

We can apply the color transfer technique described in Color Transfer between Images (Reinhard et al., 2001). In the following example, we transfer the color style of Van Gogh's Starry Night to our cathedral image.
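Written out, the per-channel operation used in the cells below is a mean and standard deviation match. The symbols mirror the variable names in the code (`mean_ori`, `std_ori`, `mean_vg`, `std_vg`). Note that Reinhard et al. perform this matching in the lαβ color space, whereas the code here applies it directly to the RGB channels.

```latex
I^{\text{styled}}_c \;=\; \left(I^{\text{ori}}_c - \mu^{\text{ori}}_c\right)\,
\frac{\sigma^{\text{vg}}_c}{\sigma^{\text{ori}}_c} \;+\; \mu^{\text{vg}}_c,
\qquad c \in \{R, G, B\}
```

Here $\mu_c$ and $\sigma_c$ are the mean and standard deviation of channel $c$ over all pixels of the respective image. The result is clipped to $[0, 1]$ afterwards.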

In [87]:
im_van_gogh = skio.imread('Starry Night.jpg').astype(np.float32)/255
mean_vg = im_van_gogh.reshape(-1,3).mean(0)
std_vg = im_van_gogh.reshape(-1,3).std(0)

im_ori = im_out3
mean_ori = im_ori.reshape(-1,3).mean(0)
std_ori = im_ori.reshape(-1,3).std(0)
In [89]:
im_styled = (im_ori - mean_ori) * std_vg / std_ori + mean_vg
im_styled = np.clip(im_styled,0,1)
plt.figure(figsize=(14,6))
plt.subplot(1,3,1)
skio.imshow(im_van_gogh)
plt.title("Starry Night")
plt.subplot(1,3,2)
skio.imshow(im_ori)
plt.title("Original image")
plt.subplot(1,3,3)
skio.imshow(im_styled)
plt.title("Color transferred image")
skio.show()

2. Something PyTorch

Although I didn't reimplement the code with PyTorch, we can play around with some very useful PyTorch image processing and visualization functions.

In [95]:
import torch
import torchvision.utils
import torch.nn.functional as F
In [122]:
def show(img,figsize=(12,36)):
    plt.figure(figsize=figsize)
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1,2,0)), interpolation='nearest')
In [123]:
# 1. Make Grid
h,w,c = im_out3.shape
images = torch.tensor(np.stack([im_out3]*10).reshape(10,h,w,3)).permute(0,3,1,2)
show(torchvision.utils.make_grid(images,nrow=5))
In [125]:
# 2. Affine transformation
image_tensor = torch.tensor(im_out3).reshape(1,h,w,3).permute(0,3,1,2).float()
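# rotate by 45 degrees (cos = sin = 0.707) with a 0.2 shift along x in normalized coordinates
grid = F.affine_grid(torch.tensor([[[0.707,-0.707,0.2],[0.707,0.707,0]]]),(1,3,h,w),align_corners=False)
im_resample = F.grid_sample(image_tensor, grid, align_corners=False)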
show(im_resample[0],(6,6))
In [127]:
# 3. Interpolation (resize)
image_large1 = F.interpolate(image_tensor, (h*2,w*2), mode='bilinear', align_corners=False)
image_large2 = F.interpolate(image_tensor, (h*2,w*2), mode='nearest')
show(image_large1[0])
plt.title('Bilinear interpolation')
show(image_large2[0])
plt.title('Nearest interpolation')