This is my webpage submission for 16822 assignment 2.

Q1: Camera matrix P from 2D-3D correspondences

Q1a: Stanford bunny

Using the provided 2D-3D correspondences, as well as 2D-3D correspondences that I captured myself, I compute camera matrices P. To solve for this 3x4 matrix, I construct a matrix A that stacks the following two constraints for each 2D-3D correspondence:

$$\begin{pmatrix} \mathbf{0}^T & -w_i\mathbf{X}_i^T & y_i\mathbf{X}_i^T \\ w_i\mathbf{X}_i^T & \mathbf{0}^T & -x_i\mathbf{X}_i^T \end{pmatrix} \begin{pmatrix} \mathbf{P}^1 \\ \mathbf{P}^2 \\ \mathbf{P}^3 \end{pmatrix} = \mathbf{0}$$

Here, the lowercase $x_i$, $y_i$, and $w_i$ represent the 2D homogeneous coordinates of the pixels, and the uppercase $\mathbf{X}_i$ represents the 3D homogeneous coordinate (a 4-vector). The $\mathbf{P}^j$ are the rows of the camera matrix we are estimating: $\mathbf{P}^1$ is the first row, $\mathbf{P}^2$ the second, and so on. Each row of my A matrix has 12 elements, and the P matrix it multiplies has been "unrolled" into a 12x1 vector. Using the 8 provided correspondences therefore gives me a 16x12 matrix A. I solve for P as the null vector of A via SVD and use this matrix to project 3D points into the image frame, producing my outputs with surface points and bounding boxes.
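As an illustrative sketch of this DLT solve (function names here are my own; in practice, normalizing the 2D and 3D coordinates first improves conditioning, which I omit for brevity):

```python
import numpy as np

def estimate_P(pts2d, pts3d):
    """DLT estimate of the 3x4 camera matrix.

    pts2d: (n, 3) homogeneous pixel coordinates (x, y, w)
    pts3d: (n, 4) homogeneous world coordinates
    """
    rows = []
    for (x, y, w), X in zip(pts2d, pts3d):
        zero = np.zeros(4)
        # Two constraints per correspondence (a third exists but is linearly dependent)
        rows.append(np.concatenate([zero, -w * X, y * X]))
        rows.append(np.concatenate([w * X, zero, -x * X]))
    A = np.stack(rows)                  # (2n, 12)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)         # null vector of A, reshaped into P

def project(P, pts3d):
    """Project homogeneous 3D points and dehomogenize to pixel coordinates."""
    x = (P @ pts3d.T).T
    return x[:, :2] / x[:, 2:3]
```

With 8 correspondences, A is 16x12 as described above, and P is recovered up to scale as the right singular vector associated with the smallest singular value.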


Original image
Annotated 2D points
Surface points
Bounding box

Q1b: Cuboid

As a second exercise, I captured an image of a cube and computed the camera matrix P for this image. The constraints were the same as above, and I used 6 correspondences to generate a 12x12 matrix A. Since 3D points weren't provided, I constructed my own world coordinate system, setting the cube corner at the bottom-left of the image (annotated at pixel (693, 2212)) as the world origin (0,0,0). From there, I assigned each edge of the cube a length of 1, which allowed me to assign 3D points such as (0,1,0) and (1,1,0) to the corners and to establish my correspondences.


Original image
Annotated 2D points
Block with edges

Q2: Camera calibration K from annotations

Q2a: Camera calibration from vanishing points

I compute K using three orthogonal vanishing points, each calculated from a pair of parallel lines (the three directions being mutually orthogonal). For this question, I assume that the camera has zero skew and that the pixels are square. This tells me that my K matrix will ultimately be of the form $$K = \begin{pmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{pmatrix}$$

Since I am given points, I first compute each line as $l = \mathbf{x} \times \mathbf{x}'$, giving me pairs of parallel lines. Next, I compute each vanishing point as the intersection of a pair of parallel lines in $\mathbb{P}^2$: $v = \mathbf{l} \times \mathbf{l}'$. The constraint I use when constructing my A matrix is $\mathbf{v}^T W \mathbf{v}' = 0$, which provides one equation for each pair of vanishing points. Note that I first solve for $W$ (omega), the image of the absolute conic. Under the assumption of zero skew and square pixels, it has the form $$W = \begin{pmatrix} W_1 & 0 & W_2 \\ 0 & W_1 & W_3 \\ W_2 & W_3 & W_4 \end{pmatrix}$$ By pairing vanishing points 1 and 2, 1 and 3, and 2 and 3, I obtain three constraints, organized into a 3x4 matrix A over the four unknowns.

Once I've computed W, I obtain the camera intrinsics K from the Cholesky factorization $W = LL^T$: since $W = (KK^T)^{-1}$, the lower-triangular factor satisfies $L = K^{-T}$, so $K = L^{-T}$, which I normalize so its bottom-right entry is 1. The principal point $(p_x, p_y)$ can then simply be read off the last column of K.
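This recovery step can be sketched as follows (the function name is my own):

```python
import numpy as np

def K_from_omega(W):
    """Recover the upper-triangular intrinsics K from W = (K K^T)^(-1)."""
    L = np.linalg.cholesky(W)     # W = L L^T, with L = K^(-T) lower triangular
    K = np.linalg.inv(L).T
    return K / K[2, 2]            # normalize so the bottom-right entry is 1
```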


Original image
Annotated parallel lines
Vanishing points and principal point

Q2b: Camera calibration from metric planes

In order to compute the camera intrinsics from three metric planes, I must again compute the image of the absolute conic first. Since we are not making assumptions about zero skew or square pixels, however, omega now has the general symmetric form $$W = \begin{pmatrix} W_1 & W_2 & W_4 \\ W_2 & W_3 & W_5 \\ W_4 & W_5 & W_6 \end{pmatrix}$$

I first compute homographies between each square plane's corner points and the corners of an ideal square defined by the points (0,1,1), (1,1,1), (1,0,1), (0,0,1) in $\mathbb{P}^2$. As a test of my resulting homographies, I verify that the point (0.5, 0.5, 1) maps to the center of each square in image space. The advantage of going through this extra effort to compute homographies is that each homography H gives me two constraints, one of the form $\mathbf{h}_1^T W \mathbf{h}_2 = 0$ and the other of the form $\mathbf{h}_1^T W \mathbf{h}_1 = \mathbf{h}_2^T W \mathbf{h}_2$, where $\mathbf{h}_1$ and $\mathbf{h}_2$ are the first two columns of H. With three homographies, we obtain the six constraints necessary to solve for all components of W, again using SVD. Similar to Q2a, I obtain the camera intrinsics K as the inverse of the Cholesky factor of W.
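The constraint assembly might look like this sketch (helper names are my own; each entry of the symmetric W is parameterized as in the matrix above):

```python
import numpy as np

def omega_row(u, v):
    """Coefficients of u^T W v in the six parameters [W1..W6] of the symmetric W."""
    return np.array([u[0] * v[0],                     # W1
                     u[0] * v[1] + u[1] * v[0],       # W2
                     u[1] * v[1],                     # W3
                     u[0] * v[2] + u[2] * v[0],       # W4
                     u[1] * v[2] + u[2] * v[1],       # W5
                     u[2] * v[2]])                    # W6

def omega_from_homographies(Hs):
    """Two constraints per plane-to-image homography: h1^T W h2 = 0 and h1^T W h1 = h2^T W h2."""
    A = []
    for H in Hs:
        h1, h2 = H[:, 0], H[:, 1]
        A.append(omega_row(h1, h2))
        A.append(omega_row(h1, h1) - omega_row(h2, h2))
    w = np.linalg.svd(np.array(A))[2][-1]             # null vector of the 6x6 A
    W = np.array([[w[0], w[1], w[3]],
                  [w[1], w[2], w[4]],
                  [w[3], w[4], w[5]]])
    return W if w[0] > 0 else -W                      # fix the SVD sign ambiguity
```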


Original image
Annotated square 1
Annotated square 2
Annotated square 3
Homography test (0.5,0.5,1) maps to center

Angles computed using the intrinsics and vanishing lines:

| Planes | Angle (degrees) |
| --- | --- |
| Plane 1 & Plane 2 | 67.28 |
| Plane 1 & Plane 3 | 92.20 |
| Plane 2 & Plane 3 | 94.71 |

Q2c: Camera calibration from rectangles with known sizes

Using the same algorithm and constraints as in Q2b, I annotated rectangles and derived an intrinsics matrix from them. Instead of mapping to the ideal unit square as in the previous exercise, however, I mapped to an ideal rectangle whose height and width matched those of the real-life rectangles that I captured. For example, the laptop screen and keyboard planes each measured 9.5 x 14 inches, so the ideal rectangle for these planes was (0, 9.5, 1), (14, 9.5, 1), (14, 0, 1), (0, 0, 1). As in the last exercise, I derive constraints and produce the images below. Unfortunately, I encountered an issue with my omega matrix not being positive definite, which prevented me from solving for the K matrix at the end of the algorithm.
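One possible cause worth checking: the SVD null vector is only determined up to sign, so the recovered omega can come out negative definite even when the calibration itself is fine; trying both signs before giving up is a cheap fix (a sketch with my own helper name; if neither sign works, the annotations are likely too noisy):

```python
import numpy as np

def safe_cholesky(W):
    """Cholesky-factor W, flipping its sign if needed.

    The SVD null vector is defined only up to sign, so -W is an equally valid
    solution of the linear system; if neither sign is positive definite, the
    annotated correspondences themselves are probably the problem.
    """
    for cand in (W, -W):
        try:
            return np.linalg.cholesky(cand)
        except np.linalg.LinAlgError:
            continue
    raise ValueError("omega is indefinite; recheck the annotations")
```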

Original image
Annotated square 1
Annotated square 2
Annotated square 3
Homography test maps to center

Q3: Single view reconstruction

Q3a: Provided image

Here, I reconstruct a colored point cloud from a single image. This task requires many annotations, which in this first part are provided for me. Using these annotations, I compute vanishing points and a K matrix with my method from Q2a. With this, I compute plane normals for the 5 planes as $n = K^T l_v$, where $l_v$ is the vanishing line obtained from the plane's two vanishing points.
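The normal computation follows the standard relation $n = K^T l_v$ and can be sketched as (the helper name is my own):

```python
import numpy as np

def plane_normal(K, v1, v2):
    """Plane normal from the plane's two vanishing points (homogeneous 3-vectors)."""
    l_v = np.cross(v1, v2)    # vanishing line of the plane
    n = K.T @ l_v             # back-project the line into a camera-frame direction
    return n / np.linalg.norm(n)
```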

I choose a reference point where three of the image's planes meet: the bottom-right of the first plane, the bottom-left of the second, and the top-right of the third. I assign this point an arbitrary depth of 1 in the world frame and compute all other point depths relative to it.

Next, I iterate through each plane, solving for the plane offset $a$ in $n^T X = a$ from a point on the plane whose depth is already known, and then recovering the depth of every annotated pixel by intersecting its backprojected ray $K^{-1}\mathbf{x}$ with the plane.
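The per-plane depth recovery can be sketched as follows (a minimal version assuming homogeneous pixel inputs and a unit normal; the function name is my own):

```python
import numpy as np

def plane_depths(K, n, X_ref, pixels):
    """Intersect backprojected pixel rays with the plane through X_ref with unit normal n.

    pixels: (m, 3) homogeneous pixel coordinates; returns (m, 3) 3D points.
    """
    a = n @ X_ref                            # plane offset in n . X = a
    rays = (np.linalg.inv(K) @ pixels.T).T   # ray directions K^(-1) x
    lam = a / (rays @ n)                     # per-pixel depth along each ray
    return lam[:, None] * rays
```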

Original image
Annotations
Reconstruction view 1
Reconstruction view 2
Reconstruction gif

Q3b: Captured images

I collected and annotated planes on three additional objects, then reconstructed them using the same algorithm as in Q3a.

Original image
GIF
Original image
GIF
Original image
GIF