In this part, we compute the camera matrix \( \mathbf{P} \) using provided 2D-3D point correspondences from an image of the Stanford Bunny. We then project additional 3D points onto the image using the computed \( \mathbf{P} \) to visualize the results.
We use the image data/q1/bunny.jpeg, the 2D-3D correspondences from data/q1/bunny.txt, surface points from data/q1/bunny_pts.npy, and bounding box edges from data/q1/bunny_bd.npy.
For each 2D-3D correspondence \( (\mathbf{x}, \mathbf{X}) \), where \( \mathbf{x} = [x, y, 1]^\top \) and \( \mathbf{X} = [X, Y, Z, 1]^\top \), the projection equation is:
\[ \mathbf{x} \sim \mathbf{P} \mathbf{X} \]
Since this equality holds only up to scale, eliminating the unknown scale factor yields two linear equations per correspondence:
\[ \begin{cases} X p_{11} + Y p_{12} + Z p_{13} + p_{14} - x(X p_{31} + Y p_{32} + Z p_{33} + p_{34}) = 0 \\ X p_{21} + Y p_{22} + Z p_{23} + p_{24} - y(X p_{31} + Y p_{32} + Z p_{33} + p_{34}) = 0 \end{cases} \]
Stacking these two equations for all correspondences fills the matrix \( \mathbf{A} \). In the code, we read the correspondences and build \( \mathbf{A} \) as follows:
for i in range(num_points):
    X, Y, Z = points_3d[i]
    x, y = points_2d[i]
    A.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
    A.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
We then solve for \( \mathbf{P} \) using SVD; the last row of \( V^\top \) is the right singular vector with the smallest singular value, which minimizes \( \|\mathbf{A}\mathbf{p}\| \) subject to \( \|\mathbf{p}\| = 1 \):
_, _, Vt = np.linalg.svd(A)
P = Vt[-1].reshape(3, 4)
The calculated camera matrix P is shown in the following figure.
The projected surface points and bounding box edges are overlaid on the original image to verify the correctness of the computed camera matrix \( \mathbf{P} \).
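A minimal sketch of the projection used for the overlay, assuming surface_pts is the (N, 3) array loaded from data/q1/bunny_pts.npy and img is the loaded image:
import numpy as np
import matplotlib.pyplot as plt

# Homogenize the 3D surface points: (N, 3) -> (N, 4)
pts_h = np.hstack([surface_pts, np.ones((len(surface_pts), 1))])
# Project with the estimated camera matrix, then dehomogenize
proj = (P @ pts_h.T).T            # (N, 3) homogeneous image points
proj = proj[:, :2] / proj[:, 2:]  # divide by the third coordinate
plt.imshow(img)
plt.scatter(proj[:, 0], proj[:, 1], s=1, c='g')
plt.show()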
In this part, we capture an image of a cuboid, define a 3D coordinate system based on the cuboid's dimensions, and annotate 2D-3D point correspondences. We then compute the camera matrix \( \mathbf{P} \) and project the edges of the cuboid onto the image.
We define the 3D coordinates of the cuboid's vertices and annotate the corresponding 2D points interactively:
# Define the 3D coordinates of the annotated cuboid vertices
points_3d = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
])
# Annotate the corresponding 2D points by clicking on the image
plt.imshow(img_rgb)
points_2d = np.array(plt.ginput(num_points, timeout=0))
We then compute \( \mathbf{P} \) and project the cuboid's edges onto the image.
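A minimal sketch of the edge overlay, assuming proj holds the dehomogenized 2D projections of points_3d (computed as in Q1(a)) and edges is a hypothetical list of vertex-index pairs for the annotated cuboid:
# Hypothetical edge list connecting the six annotated vertices
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (2, 5), (3, 5)]
plt.imshow(img_rgb)
for i, j in edges:
    plt.plot([proj[i, 0], proj[j, 0]], [proj[i, 1], proj[j, 1]], 'r-')
plt.show()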
The calculated camera matrix P is shown in the following figure.
The projected cuboid edges are overlaid on the original image:
We compute the intrinsic camera matrix \( \mathbf{K} \) from a triad of orthogonal vanishing points, assuming zero skew and square pixels. This involves annotating pairs of parallel lines in the image and using the properties of the resulting vanishing points. The image can either be annotated by hand or the annotations can be loaded from the provided .npy file.
We use the image data/q2a.png and annotations from data/q2/q2a.npy.
For vanishing points \( \mathbf{v}_i \) and \( \mathbf{v}_j \) corresponding to orthogonal directions, the following relationship holds:
\[ \mathbf{v}_i^\top \omega \mathbf{v}_j = 0 \]
where \( \omega = \mathbf{K}^{-\top} \mathbf{K}^{-1} \) is the image of the absolute conic. Assuming zero skew and square pixels, \( \omega \) can be parameterized as:
\[ \omega = \begin{bmatrix} \omega_0 & 0 & \omega_1 \\ 0 & \omega_0 & \omega_2 \\ \omega_1 & \omega_2 & \omega_3 \end{bmatrix} \]
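Writing \( \mathbf{v}_i = [a_i, b_i, c_i]^\top \), each orthogonality constraint expands into one linear equation in \( \mathbf{w} = [\omega_0, \omega_1, \omega_2, \omega_3]^\top \):
\[ (a_i a_j + b_i b_j)\,\omega_0 + (a_i c_j + c_i a_j)\,\omega_1 + (b_i c_j + c_i b_j)\,\omega_2 + c_i c_j\,\omega_3 = 0 \]
Three orthogonal vanishing points give three such equations, enough to solve for \( \mathbf{w} \) up to scale.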
We compute the vanishing points and set up the system \( \mathbf{A} \mathbf{w} = \mathbf{0} \):
# Compute a vanishing point for each annotated pair of parallel lines
for lines in annotations:
    # l1 and l2 are the two parallel lines in homogeneous coordinates
    vp = np.cross(l1, l2)   # parallel lines meet at the vanishing point
    vp = vp / vp[2]         # normalize so that vp = [x, y, 1]
    vanishing_points.append(vp)

# Each orthogonal pair (vp1, vp2) contributes one row of A
A.append([
    vp1[0]*vp2[0] + vp1[1]*vp2[1],
    vp1[0]*vp2[2] + vp1[2]*vp2[0],
    vp1[1]*vp2[2] + vp1[2]*vp2[1],
    vp1[2]*vp2[2]
])
We solve for \( \omega \) and compute \( \mathbf{K} \):
# Solve A w = 0: the last right singular vector minimizes ||A w||
_, _, Vt = np.linalg.svd(A)
omega = Vt[-1]
Omega = construct_omega_matrix(omega)
# Cholesky gives Omega = L L^T with L = K^{-T}, so K = inv(L^T)
K_inv_T = np.linalg.cholesky(Omega)
K = np.linalg.inv(K_inv_T.T)
K = K / K[2, 2]
The calculated K is shown in the following figure.
We visualize the annotated lines, vanishing points, and the computed principal point:
In this part, we compute the intrinsic camera matrix \( \mathbf{K} \) from an image of three squares, without making additional assumptions about \( \mathbf{K} \). We use the homography from each square's plane to the image to solve for \( \mathbf{K} \). The image can either be annotated by hand or the annotations can be loaded from the provided .npy file.
We use the image data/q2b.png and annotations from data/q2/q2b.npy.
For each homography \( \mathbf{H} = [\mathbf{h}_1 \; \mathbf{h}_2 \; \mathbf{h}_3] \) mapping a square to the image, the orthogonality and equal length of the square's sides give:
\[ \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_2 = 0, \qquad \mathbf{h}_1^\top \mathbf{B} \mathbf{h}_1 - \mathbf{h}_2^\top \mathbf{B} \mathbf{h}_2 = 0, \]
where \( \mathbf{B} = \mathbf{K}^{-\top} \mathbf{K}^{-1} \) is the image of the absolute conic.
This leads to a system of equations that can be expressed as \( \mathbf{V} \mathbf{b} = \mathbf{0} \), where \( \mathbf{b} \) contains the elements of \( \mathbf{B} \).
We compute the homographies and set up the matrix \( \mathbf{V} \):
# Compute a homography from the unit square to each annotated square
for i in range(3):
    # world_pts: corners of the unit square; img_pts: the i-th square's corners
    H, status = cv2.findHomography(world_pts, img_pts)
    homographies.append(H)

# Two constraint rows per homography (the elided v_ij are expanded below)
for H in homographies:
    v12 = ...
    v11 = ...
    v22 = ...
    V.append(v12)
    V.append(v11 - v22)
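The elided v_ij rows follow Zhang's formulation. A minimal sketch, using a hypothetical helper v(H, i, j) that builds the 6-vector satisfying \( \mathbf{h}_i^\top \mathbf{B} \mathbf{h}_j = \mathbf{v}_{ij}^\top \mathbf{b} \) with \( \mathbf{b} = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^\top \):
def v(H, i, j):
    # hi, hj are the i-th and j-th columns of H (0-indexed)
    hi, hj = H[:, i], H[:, j]
    return np.array([
        hi[0]*hj[0],
        hi[0]*hj[1] + hi[1]*hj[0],
        hi[1]*hj[1],
        hi[2]*hj[0] + hi[0]*hj[2],
        hi[2]*hj[1] + hi[1]*hj[2],
        hi[2]*hj[2],
    ])

v12 = v(H, 0, 1)
v11 = v(H, 0, 0)
v22 = v(H, 1, 1)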
We solve for \( \mathbf{b} \) and compute \( \mathbf{K} \):
# Solve for b
_, _, VT = np.linalg.svd(V)
b = VT[-1]
# Construct B and compute K
B = np.array([...])
K = compute_K_from_B(B)
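compute_K_from_B can mirror the Cholesky step from Q2(a); a minimal sketch of one such implementation (the sign flip handles the sign ambiguity of the SVD solution):
def compute_K_from_B(B):
    # b from the SVD is defined only up to sign; make B positive definite
    if B[0, 0] < 0:
        B = -B
    L = np.linalg.cholesky(B)   # B = L L^T with L = K^{-T}
    K = np.linalg.inv(L.T)      # hence K = (L^T)^{-1}
    return K / K[2, 2]          # fix the overall scale so K[2, 2] = 1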
We compute the normals of each plane and calculate the angles between them:
# Compute the normal of each square's plane
normals = []
for i in range(3):
    n = compute_plane_normal(annotations[i], K)
    normals.append(n)

# Compute the pairwise angles between the plane normals
for i in range(3):
    for j in range(i+1, 3):
        angle_rad = np.arccos(np.clip(np.dot(normals[i], normals[j]), -1.0, 1.0))
        angle_deg = np.degrees(angle_rad)
        print(f"Angle between Plane {i+1} and Plane {j+1}: {angle_deg:.2f} degrees")
The calculated \( \mathbf{K} \) and the angles between each pair of planes are shown in the following figure.
Annotated squares on the image:
In this part, we run the code from Question 2(b) on our own image. Here the image can only be annotated by hand.
The calculated \( \mathbf{K} \) and the angles between each pair of planes are shown in the following figure.
Annotated squares on the image:
This question involves reconstructing a colored point cloud from a single image using plane annotations. We compute the intrinsic matrix \( \mathbf{K} \), plane normals, and perform ray-plane intersection to obtain 3D coordinates of points in the image.
We use the image data/q3.png and the plane annotations from data/q3/q3.npy.
Focal Length Estimation:
\[ f^2 = - ((x_i - u_0)(x_j - u_0) + (y_i - v_0)(y_j - v_0)) \]
Intrinsic Matrix \( \mathbf{K} \):
\[ \mathbf{K} = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix} \]
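The focal-length formula follows from the orthogonality constraint \( \mathbf{v}_i^\top \omega \mathbf{v}_j = 0 \) applied with this \( \mathbf{K} \):
\[ (\mathbf{K}^{-1}\mathbf{v}_i)^\top (\mathbf{K}^{-1}\mathbf{v}_j) = \frac{(x_i - u_0)(x_j - u_0) + (y_i - v_0)(y_j - v_0)}{f^2} + 1 = 0 \]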
Ray-Plane Intersection:
\[ \lambda = -\frac{d}{\mathbf{n}^\top \mathbf{r}} \] \[ \mathbf{X} = \lambda \mathbf{r} \]
We compute the intrinsic matrix \( \mathbf{K} \) by estimating the focal length \( f \):
# Use the image center as the principal point
u0 = w / 2
v0 = h / 2
# Estimate f from each pair of orthogonal vanishing points;
# (xi, yi) and (xj, yj) are the coordinates of the two vanishing points
for i in range(len(vanishing_points)):
    for j in range(i+1, len(vanishing_points)):
        f_sq = -((xi - u0)*(xj - u0) + (yi - v0)*(yj - v0))
        f_values.append(np.sqrt(f_sq))
f = np.mean(f_values)
We compute plane normals and equations:
# Direction vectors toward the plane's two vanishing points
d1 = K_inv @ vp1
d2 = K_inv @ vp2
# The plane normal is orthogonal to both directions
n = np.cross(d1, d2)
n = n / np.linalg.norm(n)
# Plane offset from a known 3D point X0 on the plane (n^T X + d = 0)
d = -n @ X0
We perform ray-plane intersection for each pixel in the plane:
# Rays from the camera center through all pixels of the plane (3 x N)
rays = K_inv @ pixels
# Depth along each ray from the plane equation n^T (lambda * r) + d = 0
lambdas = -d / (n @ rays)
# Back-project each pixel to its 3D point
Xs = rays * lambdas
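To color the point cloud, each reconstructed point keeps the RGB value of its source pixel. A minimal sketch, assuming pixels stores homogeneous coordinates [u, v, 1] per column and img_rgb is the loaded image:
# Look up colors at the source pixels (u = column, v = row)
us = pixels[0].astype(int)
vs = pixels[1].astype(int)
colors = img_rgb[vs, us] / 255.0
# Stack into an (N, 6) colored point cloud: XYZ + RGB
cloud = np.hstack([Xs.T, colors])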
Reconstructed 3D point cloud from two different viewpoints:
You can run the code like this:
python q1a.py
python q1b.py
python q2a.py
python q2b.py
python q2c.py
python q3a.py
Note that for q2a and q2b, you can set the variable use_existing_annotations to choose between loading the .npy annotation file and annotating by hand.
Alternatively, you can run the code through main.py:
python main.py --question q1a
python main.py --question q1b
python main.py --question q2a --use_existing_annotations [True/False]
python main.py --question q2b --use_existing_annotations [True/False]
python main.py --question q2c
python main.py --question q3a