16-822 Geometry Based Methods for Vision

HW3: 3D reconstruction

Sivvani Muthusamy (Smuthusa)

Q1: (A1) Fundamental Matrix using the 8-Point Algorithm


The 8-point algorithm was implemented to estimate the fundamental matrix F. The procedure begins by normalizing the input point correspondences. The transformation matrix T is computed as follows:

T = [s, 0, -s * u0] 
[0, s, -s * v0]
[0, 0, 1 ]

where u0 and v0 are the mean of the points, and s = √2 / davg, where davg is the mean distance of all points from the mean. Given the equation x' F x = 0, we expand it to form a homogeneous system of equations A f = 0, where:

A = [u'u, u'v, u'w, v'u, v'v, v'w, w'u, w'v, w'w]

Here, (u, v, w) represents the coordinates of the x points, and (u', v', w') represents the x' points.

With the A matrix constructed, we apply Singular Value Decomposition (SVD) to solve for F. To enforce the rank-2 constraint on F, we recompute F by setting the smallest singular value to zero. Finally, we denormalize F using the transformation:

F = T'.T F T

This yields the fundamental matrix F using the 8-point algorithm.

Output images for the 8-point algorithm
Output Image 1
Epipolar lines visualization: Bench
Output Image 2
Epipolar lines visualization: Remote


Q1: (A2) Essential Matrix using the 8-Point Algorithm


The 8-point algorithm was implemented to estimate the fundamental matrix F. The procedure begins by normalizing the input point correspondences. The transformation matrix T is computed as follows:

E = K'.T F K

Below are the computed Fundamental matrix F and Essential matrix E:

Bench - F & E matrices

$$ F_{bench} = \begin{bmatrix} -1.19265833 \times 10^{-7} & -2.53255522 \times 10^{-6} & 2.07584109 \times 10^{-4} \\ -3.96465204 \times 10^{-7} & 2.27096267 \times 10^{-7} & 2.48867750 \times 10^{-2} \\ -1.06890748 \times 10^{-3} & -2.30052117 \times 10^{-2} & 1.00000000 \end{bmatrix} $$

$$ E_{bench} = \begin{bmatrix} -0.18528359 & -3.93441194 & -0.95959081 \\ -0.61592238 & 0.35280189 & 30.82505761 \\ -1.60139078 & -30.55364717 & 0.44620549 \end{bmatrix} $$

Remote - F & E matrices

$$ F_{remote} = \begin{bmatrix} 7.19006698 \times 10^{-7} & -1.90583125 \times 10^{-7} & 2.36215166 \times 10^{-3} \\ -1.88159055 \times 10^{-6} & -8.60510729 \times 10^{-7} & -8.45685523 \times 10^{-3} \\ -3.98058321 \times 10^{-3} & 9.46500248 \times 10^{-3} & 1.00000000 \end{bmatrix} $$

$$ E_{remote} = \begin{bmatrix} 1.01667679 & -0.26948489 & 2.9461563 \\ -2.66057249 & -1.21676375 & -11.37262789 \\ -5.82753537 & 10.6169466 & 0.40187929 \end{bmatrix} $$

Q1: (B) Estimating the Fundamental Matrix using the 7-Point Algorithm


The 7-point algorithm was implemented to estimate the fundamental matrix F using 7 annotatedcorrespondences. The process involves the following steps:

1. Normalization of Input Points:
Similar to the 8-point algorithm, we normalize the input correspondences using the transformation matrix T.

2. Constructing Matrix A:
We form the matrix A by arranging the point correspondences, and solve for f similar to the 8-point algorithm

3. Solving for F:
From the SVD of A, we obtain two linearly independent solutions, f1 and f2, which correspond to the last two singular vectors of A. We reshape f1 and f2 into F1 and F2 respectively, and represent the general solution for F as:

F = λ F1 + (1 - λ) F2

4. Enforcing the Determinant Constraint:
We ensure that the determinant of F equals zero, leading to a constraint on λ:

λ  s.t.  det(λ F1 + (1 - λ) F2) = 0

This results in a 3rd-degree polynomial equation, which we solve to find the values of λ. We use only the real roots for λ to estimate the candidate F matrices and then manually select the best F based on inspection.

Output images for the 7-point algorithm
Output Image 1
Epipolar lines visualization: Hydrant
Output Image 2
Epipolar lines visualization: Ball
Output Image 2
Epipolar lines visualization: Remote

Q2: RANSAC with 7-point and 8-point algorithm


This section describes the RANSAC (Random Sample Consensus) approach used to estimate the fundamental matrix F robustly from noisy point correspondences. The algorithm works as follows:

1. Random Sampling:
From the set of noisy point correspondences, we randomly sample either 8 points or 7 points, depending on whether we are using the 8-point or 7-point algorithm. These sampled points are used to compute a candidate fundamental matrix F.

2. Epipolar Line Calculation:
For each point correspondence, we compute the epipolar line l' for the point x using the equation:

l' = F x

3. Inlier Criterion:
To determine if a point correspondence is an inlier, we calculate the distance of the corresponding point x' from the epipolar line l'. If the distance is less than or equal to a threshold of 1 pixel, we classify the correspondence as an inlier.

4. Iterative Optimization:
We repeat this process for 10,000 iterations, each time keeping track of the F matrix that yields the maximum number of inliers. The best fundamental matrix F, denoted as best_F, is saved and used for further calculations, such as computing the epipolar lines.

Parameters Used:

Output 1: Epipolar lines visualization for bench dataset
Output Image 1
Output Image 1 - 8pt algorithm
Epipolar lines visualization using 8-point-algorithm: Bench
Output Image 1 - 7pt algorithm
Epipolar lines visualization using 7-point-algorithm: Bench

$$ F_{bench}^{8pt} = \begin{bmatrix} -8.01338303 \times 10^{-8} & -2.02014022 \times 10^{-6} & 3.38034847 \times 10^{-5} \\ -7.87234223 \times 10^{-7} & 4.16647155 \times 10^{-7} & 2.41760982 \times 10^{-2} \\ -9.13501992 \times 10^{-4} & -2.25066758 \times 10^{-2} & 1.00000000 \end{bmatrix} $$

$$ F_{bench}^{7pt} = \begin{bmatrix} -1.02686240 \times 10^{-7} & -2.35062669 \times 10^{-6} & 1.22586017 \times 10^{-4} \\ -5.02808132 \times 10^{-7} & 3.16711407 \times 10^{-7} & 2.45016212 \times 10^{-2} \\ -9.89684030 \times 10^{-4} & -2.27450007 \times 10^{-2} & 1.00000000 \end{bmatrix} $$



Output 2: Epipolar lines visualization for hydrant dataset
Output Image 2
Output Image 2 - 8pt algorithm
Epipolar lines visualization using 8-point-algorithm: Hydrant
Output Image 2 - 7pt algorithm
Epipolar lines visualization using 7-point-algorithm: Hydrant

$$ F_{hydrant}^{8pt} = \begin{bmatrix} 1.37245430 \times 10^{-7} & -2.67984464 \times 10^{-6} & -1.64254208 \times 10^{-3} \\ 4.14019202 \times 10^{-6} & 2.42918632 \times 10^{-7} & -1.60289551 \times 10^{-2} \\ 1.39698658 \times 10^{-3} & 1.55837236 \times 10^{-2} & 1.00000000 \end{bmatrix} $$

$$ F_{hydrant}^{7pt} = \begin{bmatrix} 9.90189221 \times 10^{-8} & -4.09103811 \times 10^{-6} & -1.28069038 \times 10^{-3} \\ 5.60723612 \times 10^{-6} & 6.54345054 \times 10^{-9} & -1.70458261 \times 10^{-2} \\ 9.97658582 \times 10^{-4} & 1.68424641 \times 10^{-2} & 1.00000000 \end{bmatrix} $$



Output 3: Epipolar lines visualization for ball dataset
Output Image 3
Output Image 3 - 8pt algorithm
Epipolar lines visualization using 8-point-algorithm: Ball
Output Image 3 - 7pt algorithm
Epipolar lines visualization using 7-point-algorithm: Ball

$$ F_{ball}^{8pt} = \begin{bmatrix} -3.50863651 \times 10^{-8} & -3.30393908 \times 10^{-6} & -6.80665335 \times 10^{-3} \\ 3.90470836 \times 10^{-6} & -1.01075778 \times 10^{-6} & -1.53874564 \times 10^{-2} \\ 9.18494513 \times 10^{-3} & 1.53662610 \times 10^{-2} & 1.00000000 \end{bmatrix} $$

$$ F_{ball}^{7pt} = \begin{bmatrix} -8.53620583 \times 10^{-8} & -3.25132628 \times 10^{-6} & -6.13973562 \times 10^{-3} \\ 3.83271522 \times 10^{-6} & -9.59729593 \times 10^{-7} & -1.47205563 \times 10^{-2} \\ 8.39020149 \times 10^{-3} & 1.47113120 \times 10^{-2} & 1.00000000 \end{bmatrix} $$


Q3: Triangulation


In the triangulation process, we aim to estimate the 3D point X given two image points x and x' in two views, with their respective projection matrices P and P'. The relationship between the image points and the 3D point X is expressed by the equations:

x = P X    and    x' = P' X

Using the cross product, we can express each equation as shown in the following form:

[x]× P X = 0    and    [x']× P' X = 0

Expanding this, we get two sets of equations for each view:

 [x]×  P X = 0  
[x']× P' X = 0

These two sets of equations provide us with six equations in total, but only four of them are linearly independent (two from each projection). Therefore, we can combine them into a single matrix A:

A = [ [x]× P, [x']× P' ]

Each pair of constraints from [x]× P yields two linearly independent equations, specifically:

-x₁ P₃ + P₁   and   y₁ P₃ - P₂

where P₁, P₂, and P₃ are the rows of the projection matrix P. Using these equations, we construct the A matrix. By applying Singular Value Decomposition (SVD) on A, we obtain the solution for the 3D point X.

Colored Point cloud using Triangulation
Triangulation

Q4: Reconstruction of own scene


Scene Reconstruction Output 1
Dataset Image
The dataset used for 3D reconstruction
3D Reconstruction of the Scene
Animated GIF showing the 3D reconstruction process of the scene
Scene Reconstruction Output 2
Dataset Image
The dataset used for 3D reconstruction
3D Reconstruction of the Scene
Animated GIF showing the 3D reconstruction process of the scene

Q5: Bonus 1 - Fundamental Matrix Estimation on Custom Images


In this bonus task, I implemented the SIFT (Scale-Invariant Feature Transform) detector to identify key features in pairs of custom images. The process involved the following steps:

1. Feature Detection using SIFT:
Using OpenCV's SIFT detector, I detected keypoints and computed descriptors for each image. This process allowed us to extract distinct, scale-invariant features that could be matched between image pairs.

2. RANSAC and Fundamental Matrix Estimation:
After detecting the features, I used RANSAC along with the 8-point algorithm to estimate the fundamental matrix F based on the inlier correspondences. This helped to eliminate outliers and provided a robust estimate of F.

3. Epipolar Lines Calculation:
Using the estimated fundamental matrix F, I computed the epipolar lines corresponding to the inlier points in each image. This provides a visual representation of the geometric relationship between the two views.

Output 1: Epipolar lines for own image
SIFT Inliers
Inliers detected using SIFT feature detector
Epipolar Lines for Inliers
Epipolar lines corresponding to the inliers
Output 2: Epipolar lines for own image
SIFT Inliers
Inliers detected using SIFT feature detector
Epipolar Lines for Inliers
Epipolar lines corresponding to the inliers

Q6: Bonus 2 - Stress Test of COLMAP Hyperparameters


In this section, I conducted two experiments with the COLMAP parameters to observe their impact on the 3D reconstruction quality:

1. Reducing the Number of Input Images:
For the Milan dataset, I reduced the number of input images from 50 to 24. This led to a much sparser point cloud in the reconstructed model, with the number of observed points decreasing to about half of the original. This demonstrates that fewer input images can significantly reduce the density and completeness of the reconstructed scene.

Sparse Reconstruction with Reduced Images
Sparse Reconstruction with Reduced Images
Point cloud after reducing input images to 24 (Milan dataset)

2. Increasing Feature Matching Tolerance:
I increased the feature matching tolerance by raising the `max_error` parameter during the exhaustive matching stage. This allowed more feature matches between images, potentially increasing the number of points in the reconstruction. However, it also introduces a risk of including more incorrect matches, which can degrade the quality of the point cloud by adding noise or inaccuracies.
Updated Parameters: sift_octaves=4, sift_edge_threshold=10, max_error=4.0

Reconstruction with Increased Feature Matching Tolerance
Reconstruction with Increased Tolerance
Point cloud with increased feature matching tolerance (Building dataset)