This is my webpage submission for 16822 assignment 2.

Q1: Camera matrix P from 2D-3D correspondences

Q1a: Stanford bunny

Using the provided 2D-3D correspondences, as well as 2D-3D correspondences that I captured myself, I compute camera matrices P. To solve for this 3x4 matrix, I construct a matrix A that stacks the following two constraints for each 2D-3D correspondence:

$$\begin{pmatrix} \mathbf{0}^T & -w_i\mathbf{X}_i^T & y_i\mathbf{X}_i^T \\ w_i\mathbf{X}_i^T & \mathbf{0}^T & -x_i\mathbf{X}_i^T \end{pmatrix} \begin{pmatrix} \mathbf{P}^1 \\ \mathbf{P}^2 \\ \mathbf{P}^3 \end{pmatrix} = \mathbf{0}$$

Here, the lowercase $x_i$, $y_i$, and $w_i$ represent the 2D homogeneous coordinates of the pixels, and the uppercase $\mathbf{X}_i$ represents the 3D homogeneous coordinate (a 4-vector). The $\mathbf{P}^j$ are the rows of the camera matrix we are estimating: $\mathbf{P}^1$ is the first row, $\mathbf{P}^2$ the second, and so on. Each row of my A matrix has 12 elements, and the P matrix it multiplies has been "unrolled" into a 12x1 vector. Using the 8 provided correspondences therefore gives me a 16x12 matrix A. I solve for P as the null vector of A via SVD and use this matrix to project 3D points into the image frame, producing my outputs with surface points and bounding boxes.
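As an illustrative sketch of this DLT solve (function names here are my own; in practice, normalizing the 2D and 3D coordinates first improves conditioning, which I omit for brevity):

```python
import numpy as np

def estimate_P(pts2d, pts3d):
    """DLT estimate of the 3x4 camera matrix.

    pts2d: (n, 3) homogeneous pixel coordinates (x, y, w)
    pts3d: (n, 4) homogeneous world coordinates
    """
    rows = []
    for (x, y, w), X in zip(pts2d, pts3d):
        zero = np.zeros(4)
        # Two constraints per correspondence (a third exists but is linearly dependent)
        rows.append(np.concatenate([zero, -w * X, y * X]))
        rows.append(np.concatenate([w * X, zero, -x * X]))
    A = np.stack(rows)                  # (2n, 12)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)         # null vector of A, reshaped into P

def project(P, pts3d):
    """Project homogeneous 3D points and dehomogenize to pixel coordinates."""
    x = (P @ pts3d.T).T
    return x[:, :2] / x[:, 2:3]
```

With 8 correspondences, A is 16x12 as described above, and P is recovered up to scale as the right singular vector associated with the smallest singular value.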


Original image
Annotated 2D points
Surface points
Bounding box

Q1b: Cuboid

As a second exercise, I captured an image of a cube and computed the camera matrix P for this image. The constraints were the same as above, and I used 6 correspondences to generate a 12x12 matrix A. Since 3D points weren't provided, I constructed my own world coordinate system, setting the cube corner at the bottom-left of the image (annotated at pixel (693, 2212)) as the world origin (0,0,0). From there, I assigned each edge of the cube a length of 1, which allowed me to assign 3D points such as (0,1,0) and (1,1,0) to the corners and to establish my correspondences.


Original image
Annotated 2D points
Block with edges

Q2: Camera calibration K from annotations

Q2a: Camera calibration from vanishing points

I compute K using three orthogonal vanishing points, each calculated from a pair of parallel lines (the three directions being mutually orthogonal). For this question, I assume that the camera has zero skew and that the pixels are square. This tells me that my K matrix will ultimately be of the form $$K = \begin{pmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{pmatrix}$$

Since I am given points, I first compute each line as $l = \mathbf{x} \times \mathbf{x}'$, giving me pairs of parallel lines. Next, I compute each vanishing point as the intersection of a pair of parallel lines in $\mathbb{P}^2$: $v = \mathbf{l} \times \mathbf{l}'$. The constraint I use when constructing my A matrix is $\mathbf{v}^T W \mathbf{v}' = 0$, which provides one equation for each pair of vanishing points. Note that I first solve for $W$ (omega), the image of the absolute conic. Under the assumption of zero skew and square pixels, it has the form $$W = \begin{pmatrix} W_1 & 0 & W_2 \\ 0 & W_1 & W_3 \\ W_2 & W_3 & W_4 \end{pmatrix}$$ By pairing vanishing points 1 and 2, 1 and 3, and 2 and 3, I obtain three constraints, organized into a 3x4 matrix A over the four unknowns.

Once I've computed W, I obtain the camera intrinsics K from the Cholesky factorization $W = LL^T$: since $W = (KK^T)^{-1}$, the lower-triangular factor satisfies $L = K^{-T}$, so $K = L^{-T}$, which I normalize so its bottom-right entry is 1. The principal point $(p_x, p_y)$ can then simply be read off the last column of K.
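This recovery step can be sketched as follows (the function name is my own):

```python
import numpy as np

def K_from_omega(W):
    """Recover the upper-triangular intrinsics K from W = (K K^T)^(-1)."""
    L = np.linalg.cholesky(W)     # W = L L^T, with L = K^(-T) lower triangular
    K = np.linalg.inv(L).T
    return K / K[2, 2]            # normalize so the bottom-right entry is 1
```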


Original image
Annotated parallel lines
Vanishing points and principal point

Q2b: Camera calibration from metric planes

In order to compute the camera intrinsics from three metric planes, I must again compute the image of the absolute conic first. Since we are not making assumptions about zero skew or square pixels, however, omega now has the general symmetric form $$W = \begin{pmatrix} W_1 & W_2 & W_4 \\ W_2 & W_3 & W_5 \\ W_4 & W_5 & W_6 \end{pmatrix}$$

I first compute homographies between each square plane's corner points and the corners of an ideal square defined by the points (0,1,1), (1,1,1), (1,0,1), (0,0,1) in $\mathbb{P}^2$. As a test of my resulting homographies, I verify that the point (0.5, 0.5, 1) maps to the center of each square in image space. The advantage of going through this extra effort to compute homographies is that each homography H gives me two constraints, one of the form $\mathbf{h}_1^T W \mathbf{h}_2 = 0$ and the other of the form $\mathbf{h}_1^T W \mathbf{h}_1 = \mathbf{h}_2^T W \mathbf{h}_2$, where $\mathbf{h}_1$ and $\mathbf{h}_2$ are the first two columns of H. With three homographies, we obtain the six constraints necessary to solve for all components of W, again using SVD. Similar to Q2a, I obtain the camera intrinsics K as the inverse of the Cholesky factor of W.
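The constraint assembly might look like this sketch (helper names are my own; each entry of the symmetric W is parameterized as in the matrix above):

```python
import numpy as np

def omega_row(u, v):
    """Coefficients of u^T W v in the six parameters [W1..W6] of the symmetric W."""
    return np.array([u[0] * v[0],                     # W1
                     u[0] * v[1] + u[1] * v[0],       # W2
                     u[1] * v[1],                     # W3
                     u[0] * v[2] + u[2] * v[0],       # W4
                     u[1] * v[2] + u[2] * v[1],       # W5
                     u[2] * v[2]])                    # W6

def omega_from_homographies(Hs):
    """Two constraints per plane-to-image homography: h1^T W h2 = 0 and h1^T W h1 = h2^T W h2."""
    A = []
    for H in Hs:
        h1, h2 = H[:, 0], H[:, 1]
        A.append(omega_row(h1, h2))
        A.append(omega_row(h1, h1) - omega_row(h2, h2))
    w = np.linalg.svd(np.array(A))[2][-1]             # null vector of the 6x6 A
    W = np.array([[w[0], w[1], w[3]],
                  [w[1], w[2], w[4]],
                  [w[3], w[4], w[5]]])
    return W if w[0] > 0 else -W                      # fix the SVD sign ambiguity
```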


Original image
Annotated square 1
Annotated square 2
Annotated square 3
Homography test (0.5,0.5,1) maps to center

Angles computed using the intrinsics and vanishing lines:

| Planes | Angle (degrees) |
| --- | --- |
| Plane 1 & Plane 2 | 67.28 |
| Plane 1 & Plane 3 | 92.20 |
| Plane 2 & Plane 3 | 94.71 |

Q2c: Camera calibration from rectangles with known sizes

Using the same algorithm and constraints as in Q2b, I annotated rectangles and derived an intrinsics matrix from them. Instead of mapping to the ideal unit square as in the previous exercise, however, I mapped to an ideal rectangle whose height and width matched those of the real-life rectangles that I captured. For example, the laptop screen and keyboard planes each measured 9.5 x 14 inches, so the ideal rectangle for these planes was (0, 9.5, 1), (14, 9.5, 1), (14, 0, 1), (0, 0, 1). As in the last exercise, I derive constraints and produce the images below. Unfortunately, I encountered an issue with my omega matrix not being positive definite, which prevented me from solving for the K matrix at the end of the algorithm.
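One possible cause worth checking: the SVD null vector is only determined up to sign, so the recovered omega can come out negative definite even when the calibration itself is fine; trying both signs before giving up is a cheap fix (a sketch with my own helper name; if neither sign works, the annotations are likely too noisy):

```python
import numpy as np

def safe_cholesky(W):
    """Cholesky-factor W, flipping its sign if needed.

    The SVD null vector is defined only up to sign, so -W is an equally valid
    solution of the linear system; if neither sign is positive definite, the
    annotated correspondences themselves are probably the problem.
    """
    for cand in (W, -W):
        try:
            return np.linalg.cholesky(cand)
        except np.linalg.LinAlgError:
            continue
    raise ValueError("omega is indefinite; recheck the annotations")
```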

Original image
Annotated square 1
Annotated square 2
Annotated square 3
Homography test maps to center

Q3: Single view reconstruction

Q3a: Provided image

Here, I reconstruct a colored point cloud from a single image. This task requires many annotations, which in this first part are provided for me. Using these annotations, I compute vanishing points and a K matrix with my method from Q2a. With this, I compute plane normals for the 5 planes as $n = K^T l_v$, where $l_v$ is the vanishing line obtained from the plane's two vanishing points.
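The normal computation follows the standard relation $n = K^T l_v$ and can be sketched as (the helper name is my own):

```python
import numpy as np

def plane_normal(K, v1, v2):
    """Plane normal from the plane's two vanishing points (homogeneous 3-vectors)."""
    l_v = np.cross(v1, v2)    # vanishing line of the plane
    n = K.T @ l_v             # back-project the line into a camera-frame direction
    return n / np.linalg.norm(n)
```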

I choose a reference point where three of the image's planes meet: the bottom-right of the first plane, the bottom-left of the second, and the top-right of the third. I assign this point an arbitrary depth of 1 in the world frame and compute all other point depths relative to it.

Next, I iterate through each plane, solving for the plane offset $a$ in $n^T X = a$ from a point on the plane whose depth is already known, and then recovering the depth of every annotated pixel by intersecting its backprojected ray $K^{-1}\mathbf{x}$ with the plane.
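The per-plane depth recovery can be sketched as follows (a minimal version assuming homogeneous pixel inputs and a unit normal; the function name is my own):

```python
import numpy as np

def plane_depths(K, n, X_ref, pixels):
    """Intersect backprojected pixel rays with the plane through X_ref with unit normal n.

    pixels: (m, 3) homogeneous pixel coordinates; returns (m, 3) 3D points.
    """
    a = n @ X_ref                            # plane offset in n . X = a
    rays = (np.linalg.inv(K) @ pixels.T).T   # ray directions K^(-1) x
    lam = a / (rays @ n)                     # per-pixel depth along each ray
    return lam[:, None] * rays
```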

Original image
Annotations
Reconstruction view 1
Reconstruction view 2
Reconstruction gif

Q3b: Captured images

I collected and annotated planes on three additional objects, then reconstructed them using the same algorithm as in Q3a.

Original image
GIF
Original image
GIF
Original image
GIF