16-822 Geometry-based Methods in Vision

Assignment 3: 3D Reconstruction

Qitao Zhao (qitaoz), Fall 2024

Q1: 8-point and 7-point algorithm

(A1) F matrix using 8-point algorithm

The 8-point algorithm estimates the fundamental matrix ( F ) from eight or more point correspondences between two images. Each correspondence between points (x, y) in the first image and (x', y') in the second image contributes one row to the matrix ( A ), as shown below:

Given correspondences:

(x_i, y_i): coordinates in the first image
(x'_i, y'_i): coordinates in the second image

The matrix ( A ) is constructed as follows:

\begin{matrix} (1) & \begin{matrix} A = [\begin{matrix} x_{1}^{'} x_{1} & x_{1}^{'} y_{1} & x_{1}^{'} & y_{1}^{'} x_{1} & y_{1}^{'} y_{1} & y_{1}^{'} & x_{1} & y_{1} & 1 \\ x_{2}^{'} x_{2} & x_{2}^{'} y_{2} & x_{2}^{'} & y_{2}^{'} x_{2} & y_{2}^{'} y_{2} & y_{2}^{'} & x_{2} & y_{2} & 1 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ x_{n}^{'} x_{n} & x_{n}^{'} y_{n} & x_{n}^{'} & y_{n}^{'} x_{n} & y_{n}^{'} y_{n} & y_{n}^{'} & x_{n} & y_{n} & 1 \end{matrix}] \end{matrix} \end{matrix}

Each row of ( A ) $\times$ 9 ) $\geq$ 8 ) (the number of point correspondences).

Steps in the Algorithm

Linear Solution: Solve for ( F ) by finding the null space of ( A ), specifically the vector ( f ) corresponding to ( A )'s smallest singular value. Reshape ( f ) $\times$ 3 ) matrix, yielding an initial estimate of ( F ).
Constraint Enforcement: Enforce the rank-2 constraint on ( F ) by adjusting it to its nearest rank-2 approximation using Singular Value Decomposition (SVD). This ensures that ( F ) represents a valid fundamental matrix.

For the bench example:
[[ 1.19197327e-07 2.53110054e-06 -2.07464874e-04]
[ 3.96237477e-07 -2.26965824e-07 -2.48724802e-02]
[ 1.06829351e-03 2.29919977e-02 -9.99425607e-01]]
For the remote example:
[[-7.18941087e-07 1.90565734e-07 -2.36193611e-03]
[ 1.88141885e-06 8.60432205e-07 8.45608352e-03]
[ 3.98021997e-03 -9.46413878e-03 -9.99908748e-01]]

(A2) E matrix using 8-point algorithm

Given the fundamental matrix we computed above, the essential matrix is given by:
$\begin{matrix} (2) & E = K^{' T} F K \end{matrix}$
For the bench example:
[[ 0.18517716 3.93215204 0.95903963]
[ 0.6155686 -0.35259924 -30.80735191]
[ 1.60047095 30.53609736 -0.44594919]]
For the remote example:
[[ -1.01658402 0.2694603 -2.94588746]
[ 2.66032971 1.21665272 11.37159011]
[ 5.82700359 -10.61597778 -0.40184261]]

(B) 7-point algorithm

Steps in the 7-point Algorithm

Normalization of Points $\sqrt{2}$ ) from the origin. The normalization matrices are denoted by ( T ) and ( T' ) for each image.
Construct Matrix ( A ): Use the normalized points to construct the matrix ( A ). For each correspondence pair ((x, y)) and ((x', y')), create a row in ( A ) as the Kronecker product of the points:
$\begin{matrix} (3) & \begin{matrix} A = [\begin{matrix} x_{1}^{'} x_{1} & x_{1}^{'} y_{1} & x_{1}^{'} & y_{1}^{'} x_{1} & y_{1}^{'} y_{1} & y_{1}^{'} & x_{1} & y_{1} & 1 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ x_{7}^{'} x_{7} & x_{7}^{'} y_{7} & x_{7}^{'} & y_{7}^{'} x_{7} & y_{7}^{'} y_{7} & y_{7}^{'} & x_{7} & y_{7} & 1 \end{matrix}] \end{matrix} \end{matrix}$
Singular Value Decomposition (SVD): Perform SVD on ( A ) to find the fundamental matrices ( F_1 ) and ( F_2 ), which correspond to the last two columns of the null space of ( A ) (since ( A ) $\times$ 9 )).
Linear Combination: Compute the fundamental matrix ( F ) as a linear combination of ( F_1 ) and ( F_2 ):
$\begin{matrix} (4) & F = α F_{1} + (1 - α) F_{2} \end{matrix}$
$\alpha$ ) $\det(F)$ = 0 ) $\alpha$ ), which can have up to three real roots. Choose the real root that satisfies the rank-2 constraint on ( F ).
Unnormalization: Convert ( F ) back to the original coordinate system by unnormalizing it:
$\begin{matrix} (5) & F = T^{' ⊤} F T \end{matrix}$
where ( T ) and ( T' ) are the normalization matrices for the first and second image, respectively.

This process yields up to several possible solutions for ( F ), from which the most suitable one is chosen based on additional criteria, if available. The 7-point algorithm is especially useful when exactly seven point correspondences are available, providing a reliable estimate of the fundamental matrix.

Fundamental matrix:
- For the hydrant example:
[[ 1.06836663e-07 -3.30854241e-06 -1.37397509e-03]
[ 4.79268901e-06 1.63305018e-07 -1.67456794e-02]
[ 1.12974017e-03 1.64136613e-02 1.00000000e+00]]
- For the ball example:
[[-3.35078059e-09+0.j -2.91545005e-06+0.j -6.74891733e-03+0.j]
[ 3.51682683e-06+0.j -8.84237457e-07+0.j -1.50044163e-02+0.j]
[ 9.09398538e-03+0.j 1.48748649e-02+0.j 1.00000000e+00+0.j]]

Essential matrix:
- For the hydrant example:
[[ 0.15857475 -4.91077927 -4.16191946]
[ 7.1136576 0.24238918 -18.26768512]
[ 5.06442012 18.65933802 1.11386097]]
- For the ball example:
[[-1.29362504e-02+0.j -1.12555838e+01+0.j -1.86746229e+01+0.j]
[ 1.35772998e+01+0.j -3.41374700e+00+0.j -2.74530807e+01+0.j]
[ 2.43843346e+01+0.j 2.45414016e+01+0.j 1.64166858e+00+0.j]]

Q2: RANSAC with 7-point and 8-point algorithm

Steps in the Algorithm:

Random Sampling:
- Randomly sample 7 or 8 point correspondences to compute a candidate fundamental matrix ( F ) using either the 7-point or 8-point algorithm.
Reprojection and Inlier Counting:
- Use ( F ) to reproject points from the second image onto the first (or vice versa), calculating the epipolar line ( l_2 ).
- $\text{pts2}$ ) to the epipolar line ( l_2 ) and count inliers where the sum of distances is below a threshold (e.g., 2 pixels).
Inlier Ratio Evaluation:
- Compute the ratio of inliers to total correspondences. If the ratio is higher than any previous iteration, update ( F ) as the best estimate.
Iterate and Select Best Model:
- Repeat this process, retaining the ( F ) that maximizes the inlier ratio, providing a robust estimate of the fundamental matrix that is less sensitive to outliers.

Here we use Hydrant as an example.

7-point Algorithm

The estimated fundamental matrix:

[[ 1.30066947e-07 -2.67447440e-06 -1.73480584e-03]

[ 4.08684223e-06 2.46604403e-07 -1.62214550e-02]

[ 1.54457266e-03 1.57896599e-02 1.00000000e+00]]

The estimated essential matrix:

[[ 0.19305482 -3.96964942 -4.10784484]

[ 6.06598847 0.36602819 -17.86717574]

[ 5.04148418 18.23486105 1.09006968]]

8-point Algorithm

The estimated fundamental matrix:

[[-1.29431630e-07 2.43170768e-06 1.76940192e-03]

[-3.81790712e-06 -2.81330608e-07 1.57887795e-02]

[-1.61273104e-03 -1.52890588e-02 -9.99755584e-01]]

The estimated essential matrix:

[[ -0.19211184 3.6093174 3.96475579]

[ -5.66681543 -0.41757135 17.42778055]

[ -4.91945338 -17.7564847 -1.06676404]]

Q3: Triangulation

Steps in the Triangulation Algorithm

Input Data:
- Use the projection matrices ( P_1 ) and ( P_2 ) of the two cameras, along with matched 2D points ( x_1 ) and ( x_2 ) in each image.
Formulate Equations:
- For each matched pair of points ( x_1 ) and ( x_2 ), define the corresponding 3D point ( X ) using the relationships ( P_1 X = x_1 ) and ( P_2 X = x_2 ).
- $\times$ x_1 = 0 ) $\times$ x_2 = 0 ), which express that ( X ) should lie on the epipolar lines induced by the projections.
Stack Equations into Matrix ( A ):
- Combine these equations into a single matrix equation ( A X = 0 ), where ( A ) is constructed as:
  $\begin{matrix} (6) & \begin{matrix} A = [\begin{matrix} x_{1}^{T} P_{1} (3, :) - P_{1} (1, :) \\ x_{1}^{T} P_{1} (3, :) - P_{1} (2, :) \\ x_{2}^{T} P_{2} (3, :) - P_{2} (1, :) \\ x_{2}^{T} P_{2} (3, :) - P_{2} (2, :) \end{matrix}] \end{matrix} \end{matrix}$
Here, ( P_1(i, :) ) and ( P_2(i, :) ) represent the ( i )-th row of the respective projection matrices, and each row in ( A ) corresponds to one of the projection constraints.
Solve with Singular Value Decomposition (SVD):
- Perform SVD on ( A ), and obtain the 3D point ( X ) as the last column of the matrix ( V ) (from the decomposition ( A = U \Sigma V^T )), which minimizes the reprojection error.

The output of this system is as follows:

Q4: Reconstruct your own scene!

Here we tried COLMAP on two scenes:

Hydrant:

Due to the limited space, we only show 4 of the input images (~50 in total)

Loading animation

Teddybear:

Due to the limited space, we only show 6 of the input images (~50 in total)

Loading animation

Viewpoint 1 (Points)	Viewpoint 2 (Epipolar Lines)

Viewpoint 1 (Points)	Viewpoint 2 (Epipolar Lines)