16-822 Geometry Based Methods for Vision

Q1: Camera matrix P from 2D-3D correspondences

The goal is to estimate the 3x4 projection matrix P that maps homogeneous 3D world coordinates X to homogeneous 2D pixel coordinates x. The two are related linearly, up to scale, by the equation: x = P · X, where P is the projection matrix, X represents the 3D world coordinates, and x represents the 2D image coordinates.

1. Formulating the Problem
Given a set of point correspondences between 3D world points and 2D image points, we can express the relationship as a homogeneous system of equations. For each point correspondence, we have:

x = P · X

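Because the equality holds only up to scale, x and P · X are parallel vectors, so x × (P · X) = 0. Collecting these equations over all correspondences gives the form A · p = 0, where A is a matrix constructed from the point correspondences, and p is the 12-element column vector of unknowns representing the entries of the projection matrix P.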

2. Constructing the Matrix A
For each point correspondence between a 3D point X = [X Y Z W] and a 2D point x = [x y w], the matrix A is constructed similarly to the homography case. Each point correspondence provides two constraints, resulting in the following form for A:

[0    -w X.T   y X.T]
[w X.T   0    -x X.T]

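Here 0 denotes a 1x4 row of zeros. Stacking these two rows for each of the n point correspondences yields a 2n x 12 matrix A that encodes the linear constraints on the projection matrix P. Since P has 11 degrees of freedom (12 entries up to scale), at least 6 correspondences are needed.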

3. Solving for the Projection Matrix Using SVD
Once the matrix A is constructed, we solve for the vector p using Singular Value Decomposition (SVD). The solution for p is the right singular vector corresponding to the smallest singular value of A. This vector is reshaped row by row into the 3x4 projection matrix P. The matrices reported below are scaled so that the last entry is 1.

By using point correspondences, we can thus easily compute the projection matrix P following a similar procedure to the one used for planar homography.
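The three steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the code used to produce the results: the function name `estimate_projection_matrix` and the input layout (inhomogeneous points, so W = 1 and w = 1) are assumptions made for the example.

```python
import numpy as np

def estimate_projection_matrix(pts3d, pts2d):
    """Estimate a 3x4 camera matrix P from n >= 6 correspondences (DLT).

    pts3d: (n, 3) world points, pts2d: (n, 2) pixel points.
    Both are inhomogeneous, so W = 1 and w = 1 (an assumption of this sketch).
    """
    pts3d = np.asarray(pts3d, dtype=float)
    pts2d = np.asarray(pts2d, dtype=float)
    n = pts3d.shape[0]
    X = np.hstack([pts3d, np.ones((n, 1))])      # homogeneous 3D points, (n, 4)
    x = pts2d[:, 0:1]                            # (n, 1) column of x coordinates
    y = pts2d[:, 1:2]                            # (n, 1) column of y coordinates
    zeros = np.zeros_like(X)
    # Two rows per correspondence, with w = 1:
    #   [ 0     -X.T    y X.T ]
    #   [ X.T    0     -x X.T ]
    rows_a = np.hstack([zeros, -X, y * X])
    rows_b = np.hstack([X, zeros, -x * X])
    A = np.vstack([rows_a, rows_b])              # (2n, 12)
    # p is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)                     # row-major reshape
    # Fix the arbitrary scale so the last entry is 1 (assumes P[2, 3] != 0).
    return P / P[2, 3]
```

With noisy annotations, normalizing the 2D and 3D points before building A usually improves conditioning; that step is omitted here to keep the sketch short.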

(a) Stanford Bunny

P = [ 6431.69  -2948.44   1146.58   2227.24]
    [ -934.82  -6754.86   2029.49   1822.19]
    [    0.58     -1.42     -0.74      1.00]

(b) Cuboid

P = [-23.27   -0.39    2.65  315.86]
    [ -0.29  -16.87   -2.42  288.92]
    [ -0.00   -0.00    0.01    1.00]

Q2 (a): Camera calibration K from annotations

The goal of this question is to compute the intrinsic matrix K from a set of three orthogonal vanishing points, assuming zero skew and square pixels. The intrinsic matrix K can be estimated by leveraging the image of the absolute conic (IAC), denoted as ω. In the case of square pixels, ω takes the form:

        ω = [ w1  0   w2  ]
            [ 0   w1  w3  ]
            [ w2  w3  w4  ]

1. Relation Between Vanishing Points and ω
Given two orthogonal vanishing points v_i and v_j, we know that they are related by the constraint:

v_i.T ω v_j = 0

That is, the vanishing points of two orthogonal directions are conjugate with respect to ω: the bilinear form defined by ω vanishes when evaluated on them.

2. Formulating the Problem as a Linear System
We can use the constraint between orthogonal vanishing points to form a linear system of equations. For each pair of vanishing points, we construct a row of the matrix A from their coordinates. Let v = [v1 v2 v3] and u = [u1 u2 u3] be the two vanishing points of a pair. Expanding v.T ω u = 0 in terms of the unknowns w1, w2, w3, w4, the corresponding row of A is:

[v1 * u1 + v2 * u2, v1 * u3 + v3 * u1, v2 * u3 + v3 * u2, v3 * u3]

Each pair of vanishing points provides one such row, and the three orthogonal pairs of vanishing points generate a 3x4 matrix A:

A · w = 0

Here, w = [w1 w2 w3 w4].T is the vector of unknowns representing the elements of the IAC ω.

3. Solving for ω Using SVD
Once the matrix A is constructed from the vanishing points, we solve for the vector w using Singular Value Decomposition (SVD). The vector w corresponds to the null vector of A, which gives us the elements of the matrix ω.

4. Computing the Intrinsic Matrix K
Once ω is determined, the intrinsic matrix K follows from the relationship ω = (K K^T)^-1 = K^-T K^-1. Cholesky factorization gives ω = L L^T with L lower triangular, so K = L^-T, the inverse of the transposed Cholesky factor. K is then rescaled so that its bottom-right entry is 1.
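A minimal NumPy sketch of steps 1 to 4 is given below, assuming the three vanishing points have already been computed from the annotated lines. The function name `calibrate_from_vanishing_points` and the sign flip before the Cholesky step are choices made for this example, not details taken from the original implementation.

```python
import numpy as np

def calibrate_from_vanishing_points(vps):
    """Recover K (zero skew, square pixels) from three orthogonal vanishing points.

    vps: (3, 3) array whose rows are homogeneous vanishing points.
    """
    vps = np.asarray(vps, dtype=float)
    rows = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:        # the three orthogonal pairs
        v, u = vps[i], vps[j]
        rows.append([v[0] * u[0] + v[1] * u[1],
                     v[0] * u[2] + v[2] * u[0],
                     v[1] * u[2] + v[2] * u[1],
                     v[2] * u[2]])
    A = np.array(rows)                           # (3, 4)
    # Null vector of A: last row of Vt from the full SVD.
    _, _, Vt = np.linalg.svd(A)
    w1, w2, w3, w4 = Vt[-1]
    omega = np.array([[w1, 0.0, w2],
                      [0.0, w1, w3],
                      [w2, w3, w4]])
    # The null vector has an arbitrary sign; Cholesky needs omega
    # to be positive definite, so flip it if necessary.
    if omega[0, 0] < 0:
        omega = -omega
    L = np.linalg.cholesky(omega)                # omega = L L^T, L lower triangular
    K = np.linalg.inv(L.T)                       # omega = K^-T K^-1, so K = L^-T
    return K / K[2, 2]
```

If the annotations are noisy enough that the estimated ω is not positive definite, `np.linalg.cholesky` raises `LinAlgError`; the sketch does not handle that case.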

Output plots of the vanishing points and the principal point

Camera calibration using vanishing points

K = [1154.18    0.   575.07] 
    [   0.   1154.18 431.94] 
    [   0.      0.     1.  ]

Q2 (b): Camera calibration from metric planes

In this question, we compute the camera intrinsic matrix K from the images of three squares. Unlike in Q2 (a), we place no restrictions on K beyond the general projective camera model, so skew and non-square pixels are allowed.

1. Computing Homography for Each Square
For each square, we compute the homography H that maps the corner points (0, 0), (1, 0), (0, 1), and (1, 1) to their corresponding imaged points. The circular points of the square's plane are (1, ±i, 0)^T, so writing the homography as H = [h₁, h₂, h₃], the imaged circular points are given by h₁ ± ih₂.

2. Fitting a Conic ω
We fit a conic ω to the six imaged circular points. The constraint that the imaged circular points lie on ω is expressed as two real constraints. If h₁ ± ih₂ lies on ω, then the following must hold:

    (h₁ ± ih₂)^T ω (h₁ ± ih₂) = 0

The real and imaginary parts of this expression give us two constraints:

    h₁^T ω h₂ = 0
    h₁^T ω h₁ = h₂^T ω h₂

3. Solving for ω Using SVD
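Each square contributes these two equations, which are linear in the six distinct entries of the symmetric matrix ω. The three squares therefore give six equations, and ω is recovered up to scale as the null vector of the stacked system using Singular Value Decomposition (SVD).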

4. Computing the Calibration Matrix K
Finally, we compute the camera calibration matrix K from the conic ω using the relationship ω = (K K^T)^-1, and we obtain K via Cholesky factorization of ω followed by inversion.
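The procedure above can be sketched in NumPy as follows. The helper names (`homography_dlt`, `conic_row`, `calibrate_from_homographies`) and the ordering of the unknowns (w11, w12, w13, w22, w23, w33) are assumptions made for this illustration and are not taken from the original code.

```python
import numpy as np

def homography_dlt(src, dst):
    """Homography H with dst ~ H src from >= 4 point pairs ((n, 2) arrays)."""
    rows = []
    for (X, Y), (x, y) in zip(src, dst):
        p = np.array([X, Y, 1.0])
        rows.append(np.concatenate([np.zeros(3), -p, y * p]))
        rows.append(np.concatenate([p, np.zeros(3), -x * p]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 3)

def conic_row(a, b):
    """Coefficients of a^T omega b in (w11, w12, w13, w22, w23, w33)."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])

def calibrate_from_homographies(Hs):
    """Recover a general upper-triangular K from >= 3 plane homographies."""
    rows = []
    for H in Hs:
        h1, h2 = H[:, 0], H[:, 1]
        rows.append(conic_row(h1, h2))                      # h1^T w h2 = 0
        rows.append(conic_row(h1, h1) - conic_row(h2, h2))  # h1^T w h1 = h2^T w h2
    _, _, Vt = np.linalg.svd(np.array(rows))
    w11, w12, w13, w22, w23, w33 = Vt[-1]
    omega = np.array([[w11, w12, w13],
                      [w12, w22, w23],
                      [w13, w23, w33]])
    if omega[0, 0] < 0:            # null vector sign is arbitrary
        omega = -omega
    L = np.linalg.cholesky(omega)  # omega = L L^T
    K = np.linalg.inv(L.T)         # omega = (K K^T)^-1, so K = L^-T
    return K / K[2, 2]
```

For a unit square, `src` would be the corners (0, 0), (1, 0), (0, 1), (1, 1) and `dst` the four annotated image corners in the same order; the three resulting homographies are then passed as a list to `calibrate_from_homographies`.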

Output visualizations of the annotations

K = [1076.93    -4.53   511.57]
    [   0.00  1076.27   395.53]
    [   0.00     0.00     1.00]

Planes                 Angle between planes (degrees)
plane 1 and plane 2    67.28
plane 1 and plane 3    92.20
plane 2 and plane 3    94.71

Q2 (c): Camera calibration from rectangles with known sizes

This task uses annotated rectangles of known size: each homography maps a rectangle with its known metric dimensions to the corresponding annotated corners in the image. In this example, I used three notes measuring 28 cm by 21 cm and computed the intrinsic camera matrix K from the resulting homographies.

Output visualizations of the annotations

K = [1137.31  -264.01  1141.66]
    [   0.00  1209.27   959.21]
    [   0.00     0.00     1.00]

Planes                 Angle between planes (degrees)
plane 1 and plane 2    59.35
plane 1 and plane 3    81.67
plane 2 and plane 3    86.00

Q3: Single View Reconstruction

The process of reconstructing a colored point cloud from a single image involves several key steps:

1. Intrinsic Matrix Calculation:
The intrinsic matrix K is computed from three mutually orthogonal sets of parallel lines in the image, whose intersections give three orthogonal vanishing points. We compute these vanishing points and then determine K with the algorithm from Question 2a.

2. Plane Normal Calculation:
For each plane in the image, we have two pairs of parallel lines in perpendicular directions. Each pair yields a vanishing point v, and the corresponding 3D direction d lying in the plane is computed using the formula:

d = K^-1 · v

The normal to the plane is obtained by taking the cross product of these two perpendicular directions.

3. Plane Equation:
After computing the normal to the plane, a reference point is selected. The plane equation is then determined using the formula:

n · X + a = 0

Here, n is the plane normal and X is any point on the plane. Substituting the known reference point X_ref for X gives a = -n · X_ref.

4. 3D Point Calculation:
Once the plane equation is known, the 3D point for each pixel x on the plane is computed by back-projecting a ray from the camera center through the pixel, with direction K^-1 · x, and determining its intersection with the plane.
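Steps 2 to 4 can be sketched in NumPy as follows, assuming K and the vanishing points are already available. The function names are hypothetical, and how the reference point X_ref is chosen (for example, a back-projected reference pixel placed at an arbitrary depth) is left to the caller, since the text above does not fix it.

```python
import numpy as np

def plane_normal(K, v1, v2):
    """Unit normal of a plane from two vanishing points of directions in it."""
    K_inv = np.linalg.inv(K)
    d1 = K_inv @ np.asarray(v1, dtype=float)     # d = K^-1 v
    d2 = K_inv @ np.asarray(v2, dtype=float)
    n = np.cross(d1, d2)                         # normal is perpendicular to both
    return n / np.linalg.norm(n)

def plane_offset(n, X_ref):
    """Offset a in n . X + a = 0 for a plane passing through X_ref."""
    return -float(np.dot(n, X_ref))

def backproject_to_plane(K, n, a, pixels):
    """Intersect the camera rays through pixels (m, 2) with n . X + a = 0."""
    pixels = np.asarray(pixels, dtype=float)
    homog = np.hstack([pixels, np.ones((len(pixels), 1))])
    rays = homog @ np.linalg.inv(K).T            # each row is K^-1 x
    # A point on the ray is X = t * r; solve n . (t r) + a = 0 for t.
    t = -a / (rays @ n)
    return rays * t[:, None]                     # (m, 3) points in the camera frame
```

The colored point cloud then follows by pairing each returned 3D point with the color of its source pixel. Rays that are parallel to the plane give a zero denominator and are not handled in this sketch.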

Output visualizations of the annotations and reconstructions