16-822: Geometry-based Methods in Vision
Assignment 2 - Single-view Reconstruction
Long Vân Tran Ha (ltranha)
Tuesday, October 8th, 2024
1. Camera matrix from 2D-3D correspondences
(a) Stanford Bunny
Input Image | Annotated 2D points | Surface Points | Bounding Box |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
Brief explanation of the implementation:
- $P$ should satisfy $PX \times \mathrm x = 0$ for each 2D-3D correspondence $(\mathrm x, X)$.
- Writing $\mathrm x = (x, y, w)^T$, each correspondence gives 2 independent equations: $$\begin{pmatrix}0_4^T & -wX^T & yX^T \\ wX^T & 0_4^T & -xX^T\end{pmatrix} p = 0_2,$$ where $p \in \mathbb{R}^{12}$ stacks the rows of $P$.
- We concatenate the equations from all correspondences and solve for $p$ as the right null vector of the resulting matrix using SVD (see the sketch below).
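A minimal numpy sketch of this DLT solve, assuming pixel coordinates are used directly (i.e. $w = 1$); the function and variable names are illustrative, not the actual assignment code:

```python
import numpy as np

def compute_camera_matrix(pts_2d, pts_3d):
    """Estimate P (3x4) from n >= 6 2D-3D correspondences via DLT.

    pts_2d: (n, 2) pixel coordinates, pts_3d: (n, 3) world coordinates.
    """
    A = []
    for (x, y), X in zip(pts_2d, pts_3d):
        Xh = np.append(X, 1.0)             # homogeneous 3D point, shape (4,)
        zeros = np.zeros(4)
        # Two independent rows per correspondence (w = 1 for pixel coordinates).
        A.append(np.concatenate([zeros, -Xh, y * Xh]))
        A.append(np.concatenate([Xh, zeros, -x * Xh]))
    A = np.stack(A)                         # (2n, 12)
    # p is the right singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    return P / P[-1, -1]                    # normalize so that P[2, 3] = 1
```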
Resulting camera matrix: $$P = \begin{pmatrix} 6431.69 & -2948.44 & 1146.58 & 2227.24 \\ -934.82 & -6754.86 & 2029.49 & 1822.19 \\ 0.58 & -1.42 & -0.74 & 1 \end{pmatrix}$$
(b) Cuboid
Input Image | Annotated 2D points | Example Result |
---|---|---|
![]() | ![]() | ![]() |
Brief explanation of the implementation:
- Similar to the bunny, we solve for $P$ using the annotated 2D-3D correspondences.
- Because the object is (approximately) a cube, we use its canonical corner coordinates as the 3D points: (0,0,1), (0,1,1), etc.
Resulting camera matrix:
$$P = \begin{pmatrix} 437.65 & -243.31 & -173.98 & 668.17 \\ -54.50 & -76.14 & -536.06 & 875.76 \\ 0.27 & 0.15 & -0.29 & 1 \end{pmatrix}$$
2. Camera calibration from annotations
(a) Camera calibration from vanishing points
We assume that the camera has zero skew and square pixels. This means the intrinsic matrix $K$ is of the form:
$$K = \begin{pmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{pmatrix}$$
and the image of the absolute conic (IAC) is of the form:
$$\omega = K^{-T}K^{-1} = \begin{pmatrix} \omega_1 & 0 & \omega_2 \\ 0 & \omega_1 & \omega_3 \\ \omega_2 & \omega_3 & 1 \end{pmatrix}$$
Here are the results from the provided annotations:
Input Image | Annotated Parallel Lines | Vanishing Points and Principal Point |
---|---|---|
![]() | ![]() | ![]() |
Computed intrinsic matrix:
$$K = \begin{pmatrix} 1154.18 & 0 & 575.07 \\ 0 & 1154.18 & 431.94 \\ 0 & 0 & 1 \end{pmatrix}$$
Here are the results from my own annotations:
Input Image | Annotated Parallel Lines | Vanishing Points and Principal Point |
---|---|---|
![]() | ![]() | ![]() |
Computed intrinsic matrix:
$$K = \begin{pmatrix} 1114.22 & 0 & 537.91 \\ 0 & 1114.22 & 184.07 \\ 0 & 0 & 1 \end{pmatrix}$$
One can observe that annotating the three pairs of mutually orthogonal parallel lines is delicate: a slight error in the annotations can lead to a significant shift in the computed principal point and intrinsic matrix $K$.
Brief explanation of the implementation:
- Compute the lines from the pairs of points: $\ell = p_1 \times p_2$.
- Compute the vanishing points from the pairs of lines: $v = \ell_1 \times \ell_2$.
- Compute the image of the absolute conic $\omega$ from the vanishing points using the orthogonality constraints $v_i^T \omega v_j = 0$ for each pair $i \ne j$. With 3 vanishing points we get 3 equations, which we stack into a homogeneous system $A\,\boldsymbol\omega = 0$ in the distinct entries of $\omega$ and solve up to scale using SVD (normalizing so that the last entry is 1).
- Compute the intrinsic matrix $K$ using the Cholesky decomposition: $\omega = LL^T$ and $K = L^{-T}$.
- In this context, the principal point is simply the third column of $K$ (read as a point in $\mathbb{P}^2$). A minimal sketch of this pipeline is given after this list.
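The sketch below assumes the three mutually orthogonal vanishing points have already been computed from the annotated line pairs; names are illustrative, and the Cholesky step requires the recovered $\omega$ to be positive definite.

```python
import numpy as np

def intrinsics_from_vanishing_points(v1, v2, v3):
    """K from 3 mutually orthogonal vanishing points (zero skew, square pixels).

    v1, v2, v3: homogeneous image points, shape (3,).
    """
    def row(vi, vj):
        # v_i^T w v_j = 0 with w = [[w1, 0, w2], [0, w1, w3], [w2, w3, w4]]
        return np.array([vi[0]*vj[0] + vi[1]*vj[1],    # coefficient of w1
                         vi[0]*vj[2] + vi[2]*vj[0],    # coefficient of w2
                         vi[1]*vj[2] + vi[2]*vj[1],    # coefficient of w3
                         vi[2]*vj[2]])                 # coefficient of w4

    A = np.stack([row(v1, v2), row(v1, v3), row(v2, v3)])
    _, _, Vt = np.linalg.svd(A)            # null vector of the 3x4 system
    w1, w2, w3, w4 = Vt[-1]
    omega = np.array([[w1, 0, w2],
                      [0, w1, w3],
                      [w2, w3, w4]])
    omega /= omega[2, 2]                   # fix scale (and sign) so omega[2, 2] = 1
    # omega = K^{-T} K^{-1}; Cholesky gives omega = L L^T, hence K = L^{-T}.
    L = np.linalg.cholesky(omega)
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```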
(b) Camera calibration from metric planes
Input Image | Provided Annotations | Own Annotations |
---|---|---|
![]() | ![]() | ![]() |
Angles between planes:
Planes | Angle (°, Provided Annotations) | Angle (°, Own Annotations) |
---|---|---|
Plane 1 (lime) & Plane 2 (cyan) | 67.60 | 66.86 |
Plane 1 (lime) & Plane 3 (yellow) | 92.25 | 90.25 |
Plane 2 (cyan) & Plane 3 (yellow) | 94.81 | 93.49 |
Computed intrinsic matrix:
Provided Annotations | Own Annotations |
---|---|
$$\begin{pmatrix}1085.88 & -14.73 & 520.56 \\ 0 & 1079.23 & 401.79 \\ 0 & 0 & 1\end{pmatrix}$$ | $$\begin{pmatrix}1121.69 & -22.25 & 524.85 \\ 0 & 1108.18 & 405.77 \\ 0 & 0 & 1\end{pmatrix}$$ |
The computed intrinsic matrices are quite similar for both sets of annotations, which suggests that calibration from three metric squares is fairly robust to slight annotation errors.
Brief explanation of the implementation:
- Compute the homography between the canonical square and each annotated square. A detailed explanation can be found in Assignment 1 - Projective Geometry and Homography.
- Compute the image of the absolute conic $\omega$ from the homographies. Each homography $H = [h_1, h_2, h_3]$ gives us two equations: $h_1^T\omega h_2 = 0$ and $h_1^T\omega h_1 - h_2^T\omega h_2 = 0$. We get 6 equations from 3 homographies and solve for $\omega$ using SVD.
- Compute the intrinsic matrix $K$ using the Cholesky decomposition: $\omega = LL^T$ and $K = L^{-T}$.
- To compute the normal of a plane, we first compute the vanishing line $\ell$ from the two vanishing points given by the lines of the plane. The normal is given by: $n = K^T\ell$.
- The angle between two planes is given by the angle between their normals: $\displaystyle \theta = \operatorname{arccos}\left(\frac{n_1\cdot n_2}{\|n_1\|\|n_2\|}\right)$. A minimal sketch of the constraint construction and the angle computation is given after this list.
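The sketch below shows the two IAC constraint rows contributed by one square's homography (here $\omega$ is a general symmetric matrix with six distinct entries, since zero skew is not assumed), together with the plane-angle computation; names are illustrative, not the actual assignment code.

```python
import numpy as np

def iac_rows_from_homography(H):
    """Two constraint rows on omega = [[w1,w2,w4],[w2,w3,w5],[w4,w5,w6]]
    from the homography H (3x3 array) mapping a canonical square to the image."""
    def quad(hi, hj):
        # coefficients of (w1..w6) in hi^T omega hj
        return np.array([hi[0]*hj[0],
                         hi[0]*hj[1] + hi[1]*hj[0],
                         hi[1]*hj[1],
                         hi[0]*hj[2] + hi[2]*hj[0],
                         hi[1]*hj[2] + hi[2]*hj[1],
                         hi[2]*hj[2]])
    h1, h2 = H[:, 0], H[:, 1]
    return np.stack([quad(h1, h2),                   # h1^T omega h2 = 0
                     quad(h1, h1) - quad(h2, h2)])   # equal norms under omega

def plane_angle_deg(K, vps_a, vps_b):
    """Angle (degrees) between two planes, each given by two vanishing points
    of directions lying in the plane (homogeneous 3-vectors)."""
    def normal(vp1, vp2):
        l = np.cross(vp1, vp2)        # vanishing line of the plane
        n = K.T @ l                   # n = K^T l
        return n / np.linalg.norm(n)
    n1, n2 = normal(*vps_a), normal(*vps_b)
    return np.degrees(np.arccos(np.clip(n1 @ n2, -1.0, 1.0)))
```

Stacking these rows for the three squares gives a $6 \times 6$ homogeneous system that is solved for $\omega$ with SVD, followed by the same Cholesky step as in part (a).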
(c) Camera calibration from rectangles with known sizes
Input Image | Annotated Planes |
---|---|
![]() | ![]() |
Angles between planes:
Planes | Angle (°) |
---|---|
Plane 1 (lime) & Plane 2 (cyan) | 87.54 |
Plane 1 (lime) & Plane 3 (yellow) | 92.81 |
Plane 2 (cyan) & Plane 3 (yellow) | 75.22 |
Computed intrinsic matrix:
$$K = \begin{pmatrix} 680.44 & -31.95 & 494.04 \\ 0 & 693.68 & 389.42 \\ 0 & 0 & 1 \end{pmatrix}$$
Brief explanation of the implementation:
- We follow the same steps as in part (b) to compute the intrinsic matrix $K$. The only difference is the first step: instead of mapping the canonical square to each annotated square, we compute the homography between each rectangle, measured in real-world units, and its annotated image corners (see the sketch below).
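A minimal sketch of this modified first step, reusing the DLT homography from Assignment 1; the argument names are hypothetical, and the rectangle's measured width and height can be in any consistent real-world unit.

```python
import numpy as np

def rectangle_homography(corners_img, width, height):
    """Homography mapping a metric rectangle (width x height) to its four
    annotated image corners, ordered consistently
    (e.g. top-left, top-right, bottom-right, bottom-left)."""
    src = [(0, 0), (width, 0), (width, height), (0, height)]
    A = []
    for (x, y), (u, v) in zip(src, corners_img):
        # Two rows of the homography DLT (cf. Assignment 1).
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

The columns of each such $H$ then feed the same IAC constraints as in part (b).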
3. Single-view Reconstruction
Reconstruction
Input Image | Annotations |
---|---|
![]() | ![]() |
Here are the 3D reconstructed points:
View 1 | View 2 | View 3 |
---|---|---|
![]() | ![]() | ![]() |
Brief explanation of the implementation:
- Compute the camera intrinsic matrix $K$ from three pairs of parallel lines along mutually orthogonal directions, as described in 2.(a) Camera calibration from vanishing points, assuming zero skew and square pixels.
- Compute the normals of the planes as described in 2.(b) Camera calibration from metric planes.
- Get the equation of each plane from its normal $\mathrm n$ and a point on the plane. To do this, we use a reference point and set its depth to 1. Let $\mathrm x_0$ be the 2D reference point in the image. The direction of the back-projected ray is $\mathrm u_0 = K^{-1}\mathrm x_0$, and the plane is $\pi = (n_1, n_2, n_3, -\mathrm n \cdot\mathrm u_0)^T$, where $\mathrm n = (n_1, n_2, n_3)$ is the normal of the plane.
- For each plane, we create a grid of points delimited by its four annotated corners, store their image colors (using bilinear interpolation), and compute the directions of the rays from the camera center through these points: $\mathrm u = K^{-1}\mathrm x$ (in a vectorized manner). The depth of each point on the plane is $d = -\frac{\pi_4}{\mathrm n\cdot\mathrm u}$, and its 3D coordinates are $\mathrm X = d\,\mathrm u$.
- We repeat the above steps for all planes. A minimal sketch of the back-projection step is given after this list.
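The sketch below covers the back-projection step for a single plane, assuming $K$, the plane normal, the reference pixel, and the pixel coordinates inside the plane polygon are already available; names are illustrative.

```python
import numpy as np

def backproject_plane(K, n, x0, xs):
    """Lift pixels xs lying on a plane with normal n to 3D.

    n:  (3,) plane normal in the camera frame (n = K^T l, see 2.(b)).
    x0: (2,) reference pixel whose depth is fixed to 1.
    xs: (m, 2) pixels inside the plane polygon.
    Returns (m, 3) 3D points in the camera frame.
    """
    xs = np.asarray(xs, dtype=float)
    K_inv = np.linalg.inv(K)
    u0 = K_inv @ np.append(x0, 1.0)              # ray through the reference pixel
    pi4 = -n @ u0                                # plane offset: pi = (n, -n.u0)
    xs_h = np.hstack([xs, np.ones((len(xs), 1))])
    U = xs_h @ K_inv.T                           # rays u = K^{-1} x, one per row
    d = -pi4 / (U @ n)                           # depth: d = -pi_4 / (n.u)
    return d[:, None] * U                        # X = d * u
```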
Animated Reconstruction
Input Image | ![]() | ![]() | ![]() |
---|---|---|---|
Annotations | ![]() | ![]() | ![]() |
Animated Reconstruction | ![]() | ![]() | ![]() |