HW2: Single-view Reconstruction

Question 1: Camera Matrix \( \mathbf{P} \) from 2D-3D Correspondences

Part (a): Stanford Bunny

Overview

In this part, we compute the camera matrix \( \mathbf{P} \) using provided 2D-3D point correspondences from an image of the Stanford Bunny. We then project additional 3D points onto the image using the computed \( \mathbf{P} \) to visualize the results.

Algorithm Steps

  1. Data Loading: Load the image data/q1/bunny.jpeg, the 2D-3D correspondences from data/q1/bunny.txt, surface points from data/q1/bunny_pts.npy, and bounding box edges from data/q1/bunny_bd.npy.
  2. Construct Matrix \( \mathbf{A} \): Using the Direct Linear Transformation (DLT) method, construct matrix \( \mathbf{A} \) such that \( \mathbf{A} \mathbf{p} = \mathbf{0} \), where \( \mathbf{p} \) is a vectorized form of the camera matrix \( \mathbf{P} \).
  3. Solve for \( \mathbf{P} \): Perform Singular Value Decomposition (SVD) on \( \mathbf{A} \) and extract \( \mathbf{p} \) corresponding to the smallest singular value. Reshape \( \mathbf{p} \) into the \( 3 \times 4 \) camera matrix \( \mathbf{P} \).
  4. Project 3D Points: Use \( \mathbf{P} \) to project the surface points and bounding box edges onto the image plane.
  5. Visualization: Plot the original image, annotated 2D points, projected surface points, and the bounding box overlaid on the image.

Relevant Equations

For each 2D-3D correspondence \( (\mathbf{x}, \mathbf{X}) \), where \( \mathbf{x} = [x, y, 1]^\top \) and \( \mathbf{X} = [X, Y, Z, 1]^\top \), the projection equation holds up to scale:

\[ \mathbf{x} \sim \mathbf{P} \mathbf{X} \]

This leads to the following equations for each correspondence:

\[ \begin{cases} X p_{11} + Y p_{12} + Z p_{13} + p_{14} - x(X p_{31} + Y p_{32} + Z p_{33} + p_{34}) = 0 \\ X p_{21} + Y p_{22} + Z p_{23} + p_{24} - y(X p_{31} + Y p_{32} + Z p_{33} + p_{34}) = 0 \end{cases} \]

These equations are used to fill the matrix \( \mathbf{A} \) for all correspondences.

Implementation Details

In the code, we read the correspondences and build the matrix \( \mathbf{A} \) as follows:

A = []
for i in range(num_points):
    X, Y, Z = points_3d[i]
    x, y = points_2d[i]
    # Two rows per correspondence, derived from x ~ PX written as Ap = 0
    A.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
    A.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])

We then solve for \( \mathbf{P} \) using SVD:

# The right-singular vector for the smallest singular value is p (up to scale)
_, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
P = Vt[-1].reshape(3, 4)
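
The projection step itself (step 4) is not shown above; a minimal sketch of it, where the helper name `project` is our own, not from the original code:

import numpy as np

def project(P, points):
    """Project an N x 3 array of 3D points to N x 2 pixel coordinates."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogenize
    proj = (P @ pts_h.T).T                                  # apply P
    return proj[:, :2] / proj[:, 2:3]                       # dehomogenize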

Quantitative Result

The calculated camera matrix P is shown in the following figure.

Camera Matrix P

Visualization

The projected surface points and bounding box edges are overlaid on the original image to visualize the correctness of the computed camera matrix \( \mathbf{P} \).
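
A minimal sketch of how such an overlay can be drawn, reusing the `project` helper above; the variable names `img`, `surface_pts`, and `bbox_edges` are assumptions:

import matplotlib.pyplot as plt

plt.imshow(img)
pts2d = project(P, surface_pts)
plt.scatter(pts2d[:, 0], pts2d[:, 1], s=1, c='g')
for edge in bbox_edges:                  # assumed: each row holds two 3D endpoints
    e2d = project(P, edge.reshape(2, 3))
    plt.plot(e2d[:, 0], e2d[:, 1], 'b-')
plt.show()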

Stanford Bunny Projection Result

Part (b): Cuboid

Overview

In this part, we capture an image of a cuboid, define a 3D coordinate system based on the cuboid's dimensions, and annotate 2D-3D point correspondences. We then compute the camera matrix \( \mathbf{P} \) and project the edges of the cuboid onto the image.

Algorithm Steps

  1. Data Acquisition: Capture or find an image of a cuboid.
  2. Define 3D Coordinates: Assign 3D coordinates to key points on the cuboid, measuring relative dimensions.
  3. Annotate Correspondences: Annotate the corresponding 2D points on the image.
  4. Compute \( \mathbf{P} \): Use the DLT method as in Part (a) to compute the camera matrix.
  5. Project Cuboid Edges: Define the cuboid's vertices and edges, project them using \( \mathbf{P} \), and overlay the edges on the image.

Implementation Details

We define the 3D coordinates of the cuboid's vertices and annotate the corresponding 2D points interactively:

# Define the 3D coordinates of the annotated cuboid corners
points_3d = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
])
num_points = len(points_3d)

# Annotate the corresponding 2D points by clicking on the image
plt.imshow(img_rgb)
points_2d = np.array(plt.ginput(num_points, timeout=0))

We then compute \( \mathbf{P} \) and project the cuboid's edges onto the image.
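
A minimal sketch of the edge projection and overlay; the variable names `vertices` (N x 3 corner coordinates) and `edges` (pairs of vertex indices) are assumptions, not the original code's names:

# Project all cuboid vertices with the estimated P
verts_h = np.hstack([vertices, np.ones((len(vertices), 1))])
proj = (P @ verts_h.T).T
proj = proj[:, :2] / proj[:, 2:3]

# Draw each edge between its two projected endpoints
plt.imshow(img_rgb)
for i, j in edges:
    plt.plot([proj[i, 0], proj[j, 0]], [proj[i, 1], proj[j, 1]], 'r-')
plt.show()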

Quantitative Result

The calculated camera matrix P is shown in the following figure.

Camera Matrix P

Visualization

The projected cuboid edges are overlaid on the original image:

Cuboid Projection Result

Question 2: Camera Calibration \( \mathbf{K} \) from Annotations

Part (a): Calibration from Vanishing Points

Overview

We compute the intrinsic camera matrix \( \mathbf{K} \) from a triad of orthogonal vanishing points, assuming zero skew and square pixels. This involves annotating pairs of parallel lines in the image and using properties of the resulting vanishing points. The annotations can be made by hand or loaded from the provided .npy file.

Algorithm Steps

  1. Data Loading: Load the image data/q2a.png and annotations from data/q2/q2a.npy.
  2. Compute Vanishing Points: For each pair of parallel lines, compute the vanishing point by intersecting the lines.
  3. Set Up Equations: Formulate equations based on the orthogonality of directions corresponding to the vanishing points.
  4. Solve for \( \omega \): Solve the linear system to find the image of the absolute conic \( \omega \).
  5. Compute \( \mathbf{K} \): Extract the intrinsic parameters from \( \omega \) and compute \( \mathbf{K} \).

Relevant Equations

For vanishing points \( \mathbf{v}_i \) and \( \mathbf{v}_j \) corresponding to orthogonal directions, the following relationship holds:

\[ \mathbf{v}_i^\top \omega \mathbf{v}_j = 0 \]

Assuming zero skew and square pixels, \( \omega \) can be parameterized as:

\[ \omega = \begin{bmatrix} \omega_0 & 0 & \omega_1 \\ 0 & \omega_0 & \omega_2 \\ \omega_1 & \omega_2 & \omega_3 \end{bmatrix} \]

Implementation Details

We compute the vanishing points and set up the system \( \mathbf{A} \mathbf{w} = \mathbf{0} \):

import itertools

# Compute vanishing points (assumes each annotation entry holds the
# endpoints of two parallel segments: (a1, b1) and (a2, b2))
vanishing_points = []
for (a1, b1), (a2, b2) in annotations:
    # A homogeneous line through two points is their cross product
    l1 = np.cross([*a1, 1.0], [*b1, 1.0])
    l2 = np.cross([*a2, 1.0], [*b2, 1.0])
    vp = np.cross(l1, l2)          # intersection of the two parallel lines
    vp = vp / vp[2]
    vanishing_points.append(vp)

# Set up one equation v_i^T ω v_j = 0 per orthogonal pair of vanishing points
A = []
for vp1, vp2 in itertools.combinations(vanishing_points, 2):
    A.append([
        vp1[0]*vp2[0] + vp1[1]*vp2[1],
        vp1[0]*vp2[2] + vp1[2]*vp2[0],
        vp1[1]*vp2[2] + vp1[2]*vp2[1],
        vp1[2]*vp2[2]
    ])

We solve for \( \omega \) and compute \( \mathbf{K} \):

# Solve for omega: the null vector of A, defined up to an arbitrary sign
_, _, Vt = np.linalg.svd(np.asarray(A))
omega = Vt[-1]
Omega = construct_omega_matrix(omega)
if Omega[0, 0] < 0:      # ω must be positive definite for Cholesky
    Omega = -Omega

# Compute K: since ω = K^{-T} K^{-1}, the Cholesky factor of ω is K^{-T}
K_inv_T = np.linalg.cholesky(Omega)
K = np.linalg.inv(K_inv_T.T)
K = K / K[2, 2]
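
As a quick sanity check, the recovered \( \mathbf{K} \) can be used to rebuild \( \omega \) and verify the orthogonality constraints; a sketch, reusing `itertools` and `vanishing_points` from above:

# ω is proportional to K^{-T} K^{-1} = (K K^T)^{-1}
omega_check = np.linalg.inv(K @ K.T)
for vp1, vp2 in itertools.combinations(vanishing_points, 2):
    print(vp1 @ omega_check @ vp2)   # should be near zero for each pair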

Quantitative Result

The calculated K is shown in the following figure.

K

Visualization

We visualize the annotated lines, vanishing points, and the computed principal point:

Vanishing Points and Principal Point

Part (b): Calibration from Metric Planes

Overview

In this part, we compute the intrinsic camera matrix \( \mathbf{K} \) from an image of three squares, without making additional assumptions about \( \mathbf{K} \). We use the homography from each square to the image, together with the image of the absolute conic, to solve for \( \mathbf{K} \). The annotations can be made by hand or loaded from the provided .npy file.

Algorithm Steps

  1. Data Loading: Load the image data/q2b.png and annotations from data/q2/q2b.npy.
  2. Compute Homographies: For each square, compute the homography mapping the square to the image plane.
  3. Set Up Equations: Use the homographies to set up equations involving the intrinsic parameters.
  4. Solve for \( \mathbf{B} \): Solve the system to find the symmetric matrix \( \mathbf{B} = \mathbf{K}^{-\top} \mathbf{K}^{-1} \).
  5. Compute \( \mathbf{K} \): Extract the intrinsic parameters from \( \mathbf{B} \) and compute \( \mathbf{K} \).
  6. Evaluate Angles: Compute the angles between each pair of planes to assess the correctness of the calibration.

Relevant Equations

For each square, the homography \( \mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3] \) maps the metric plane to the image, and the orthonormality of the first two columns of the rotation gives:

\[ \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_2 = 0, \qquad \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_1 = \mathbf{h}_2^\top \mathbf{B} \mathbf{h}_2 \]

Stacking these constraints over all three squares yields a homogeneous system \( \mathbf{V} \mathbf{b} = \mathbf{0} \), where \( \mathbf{b} \) contains the six distinct elements of the symmetric matrix \( \mathbf{B} \).
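
Each constraint is linear in \( \mathbf{b} = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^\top \): writing \( \mathbf{h}_i = [h_{i1}, h_{i2}, h_{i3}]^\top \) for the \( i \)-th column of \( \mathbf{H} \), we have \( \mathbf{h}_i^\top \mathbf{B} \mathbf{h}_j = \mathbf{v}_{ij}^\top \mathbf{b} \) with

\[ \mathbf{v}_{ij} = \begin{bmatrix} h_{i1} h_{j1} \\ h_{i1} h_{j2} + h_{i2} h_{j1} \\ h_{i2} h_{j2} \\ h_{i3} h_{j1} + h_{i1} h_{j3} \\ h_{i3} h_{j2} + h_{i2} h_{j3} \\ h_{i3} h_{j3} \end{bmatrix} \]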

Implementation Details

We compute the homographies and set up the matrix \( \mathbf{V} \):

def v_ij(H, i, j):
    # Row of V such that h_i^T B h_j = v_ij^T b
    return np.array([
        H[0, i]*H[0, j],
        H[0, i]*H[1, j] + H[1, i]*H[0, j],
        H[1, i]*H[1, j],
        H[2, i]*H[0, j] + H[0, i]*H[2, j],
        H[2, i]*H[1, j] + H[1, i]*H[2, j],
        H[2, i]*H[2, j],
    ])

# Compute one homography per annotated square
# (img_pts_list[i]: annotated corners of square i; variable name assumed)
homographies = []
for i in range(3):
    H, _ = cv2.findHomography(world_pts, img_pts_list[i])
    homographies.append(H)

# Two equations per square: h1^T B h2 = 0 and h1^T B h1 = h2^T B h2
V = []
for H in homographies:
    V.append(v_ij(H, 0, 1))
    V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))

We solve for \( \mathbf{b} \) and compute \( \mathbf{K} \):

# Solve for b: the null vector of V
_, _, VT = np.linalg.svd(np.asarray(V))
b = VT[-1]

# Symmetric B from b = [B11, B12, B22, B13, B23, B33]
B = np.array([[b[0], b[1], b[3]],
              [b[1], b[2], b[4]],
              [b[3], b[4], b[5]]])
if B[0, 0] < 0:   # fix the arbitrary sign of the null vector
    B = -B
# B = K^{-T} K^{-1}: Cholesky gives K^{-T}; invert, transpose, normalize
K = np.linalg.inv(np.linalg.cholesky(B).T)
K = K / K[2, 2]

Evaluate Angles Between Planes

We compute the normals of each plane and calculate the angles between them:

# Compute plane normals
normals = []
for i in range(3):
    n = compute_plane_normal(annotations[i], K)
    normals.append(n)

# Compute angles
for i in range(3):
    for j in range(i+1, 3):
        angle_rad = np.arccos(np.clip(np.dot(normals[i], normals[j]), -1.0, 1.0))
        angle_deg = np.degrees(angle_rad)
        print(f"Angle between Plane {i+1} and Plane {j+1}: {angle_deg:.2f} degrees")

Quantitative Result

The calculated K and the angles between each pair of planes are shown in the following figure.

K

Visualization

Annotated squares on the image:

Annotated Image

Part (c): Calibration from Rectangles with Known Sizes

Overview

In this part, we run the code from Question 2(b) on our own image. Since no .npy annotation file is provided here, the image must be annotated by hand.

Quantitative Result

The calculated K and the angles between each pair of planes are shown in the following figure.

K

Visualization

Annotated squares on the image:

Annotated Image

Question 3: Single View Reconstruction

Overview

This question involves reconstructing a colored point cloud from a single image using plane annotations. We compute the intrinsic matrix \( \mathbf{K} \), plane normals, and perform ray-plane intersection to obtain 3D coordinates of points in the image.

Algorithm Steps

  1. Load Data and Annotations: Load the image data/q3.png and the plane annotations from data/q3/q3.npy.
  2. Visualization of Annotations: Plot the annotated planes over the image to confirm correctness.
  3. Compute the Intrinsic Matrix \( \mathbf{K} \): Estimate \( \mathbf{K} \) by computing the focal length \( f \) from vanishing points, assuming the principal point is at the image center.
  4. Compute Plane Normals and Equations: For each plane, compute the plane normal and equation using the vanishing points and a reference point on the plane.
  5. Compute 3D Coordinates via Ray-Plane Intersection: For each pixel within a plane, compute the 3D coordinates by intersecting the ray from the camera center through the pixel with the plane.
  6. Visualization of the Reconstruction: Plot the reconstructed 3D point cloud from different viewpoints.

Relevant Equations

Focal Length Estimation:

\[ f^2 = - ((x_i - u_0)(x_j - u_0) + (y_i - v_0)(y_j - v_0)) \]

Intrinsic Matrix \( \mathbf{K} \):

\[ \mathbf{K} = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \]

Ray-Plane Intersection: for a plane \( \mathbf{n}^\top \mathbf{X} + d = 0 \) and a ray \( \mathbf{X} = \lambda \mathbf{r} \) through the camera center,

\[ \lambda = -\frac{d}{\mathbf{n}^\top \mathbf{r}}, \qquad \mathbf{X} = \lambda \mathbf{r} \]

Implementation Details

We compute the intrinsic matrix \( \mathbf{K} \) by estimating the focal length \( f \):

# Principal point assumed at the image center
u0 = w / 2
v0 = h / 2

# Each pair of orthogonal vanishing points gives one estimate of f
f_values = []
for i in range(len(vanishing_points)):
    for j in range(i + 1, len(vanishing_points)):
        xi, yi = vanishing_points[i][:2]
        xj, yj = vanishing_points[j][:2]
        f_sq = -((xi - u0)*(xj - u0) + (yi - v0)*(yj - v0))
        f_values.append(np.sqrt(f_sq))
f = np.mean(f_values)

We compute plane normals and equations:

# Back-project the plane's two vanishing points into 3D direction vectors
d1 = K_inv @ vp1
d2 = K_inv @ vp2

# The plane normal is orthogonal to both in-plane directions
n = np.cross(d1, d2)
n = n / np.linalg.norm(n)

# Plane offset d from a known reference point X0 on the plane
d = -n @ X0

We perform ray-plane intersection for each pixel in the plane:

# Back-project all pixels (homogeneous coordinates, 3xN) into ray directions
rays = K_inv @ pixels

# Depth along each ray, from n^T (λ r) + d = 0
lambdas = -d / (n @ rays)

# 3D points: scale each ray direction by its depth
Xs = rays * lambdas
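
A minimal sketch of the point-cloud visualization, assuming `Xs` is \( 3 \times N \) and `colors` holds the corresponding per-pixel RGB values in \( [0, 1] \):

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(Xs[0], Xs[1], Xs[2], c=colors, s=1)   # colored 3D point cloud
ax.view_init(elev=-60, azim=-90)                 # hypothetical viewpoint
plt.show()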

Visualization

Reconstructed 3D point cloud from two different viewpoints:

Annotations

Reconstruction

How to Run My Code

You can run the code like this:

Notice that for q2a and q2b, you can change the variable "use_existing_annotations" to decide whether to load the .npy annotation file or to annotate by hand.

You can also run the code in this way: