HW3: 3D reconstruction
Sivvani Muthusamy (Smuthusa)
The 8-point algorithm was implemented to estimate the fundamental matrix F. The procedure begins by normalizing the input point correspondences. The transformation matrix T is computed as follows:
$$ T = \begin{bmatrix} s & 0 & -s\,u_0 \\ 0 & s & -s\,v_0 \\ 0 & 0 & 1 \end{bmatrix} $$
where $(u_0, v_0)$ is the centroid of the points and $s = \sqrt{2} / d_{avg}$, with $d_{avg}$ the mean distance of the points from the centroid. Given the epipolar constraint $x'^T F x = 0$, we expand it into a homogeneous system of equations $A f = 0$, where each correspondence contributes one row of the form:
$$ A_i = \begin{bmatrix} u'u & u'v & u'w & v'u & v'v & v'w & w'u & w'v & w'w \end{bmatrix} $$
Here, $(u, v, w)$ are the homogeneous coordinates of the points $x$, and $(u', v', w')$ those of the points $x'$.
With the A matrix constructed, we apply Singular Value Decomposition (SVD) and take the right singular vector corresponding to the smallest singular value of A as f. To enforce the rank-2 constraint on F, we recompute F after setting its smallest singular value to zero. Finally, we denormalize F using the normalization transforms T and T' of the two views:
$$ F = T'^{\,T} F_{norm}\, T $$
This yields the fundamental matrix F using the 8-point algorithm.
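The procedure above can be sketched in NumPy as follows (a minimal sketch; function and variable names are my own, and error handling is omitted):

```python
import numpy as np

def normalize_points(pts):
    """Translate points to zero mean and scale so the mean distance
    from the centroid becomes sqrt(2); returns Nx3 points and T."""
    mean = pts.mean(axis=0)
    d_avg = np.linalg.norm(pts - mean, axis=1).mean()
    s = np.sqrt(2) / d_avg
    T = np.array([[s, 0, -s * mean[0]],
                  [0, s, -s * mean[1]],
                  [0, 0, 1]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    """Estimate F from >= 8 correspondences (Nx2 arrays, x in image 1)."""
    x1, T1 = normalize_points(pts1)   # x  -> (u, v, w)
    x2, T2 = normalize_points(pts2)   # x' -> (u', v', w')
    u, v, w = x1[:, 0], x1[:, 1], x1[:, 2]
    up, vp, wp = x2[:, 0], x2[:, 1], x2[:, 2]
    # one row [u'u, u'v, u'w, v'u, v'v, v'w, w'u, w'v, w'w] per match
    A = np.stack([up*u, up*v, up*w, vp*u, vp*v, vp*w, wp*u, wp*v, wp*w], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # f = last right singular vector
    # enforce rank 2 by zeroing the smallest singular value of F
    U, S, Vt2 = np.linalg.svd(F)
    S[2] = 0
    F = U @ np.diag(S) @ Vt2
    F = T2.T @ F @ T1                 # denormalize: F = T'^T F_norm T
    return F / F[2, 2]                # fix scale as in the report (F33 = 1)
```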
Given the fundamental matrix F and the camera intrinsic matrices K and K' of the two views, the essential matrix E is recovered as:
$$ E = K'^{\,T} F K $$
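As a sketch, this conversion is a single matrix product (assuming `K1` and `K2` are the 3×3 intrinsics of the unprimed and primed views):

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K'^T F K, with K and K' the intrinsics of the two views."""
    return K2.T @ F @ K1
```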
Below are the computed fundamental and essential matrices for the bench and remote image pairs:
$$ F_{bench} = \begin{bmatrix} -1.19265833 \times 10^{-7} & -2.53255522 \times 10^{-6} & 2.07584109 \times 10^{-4} \\ -3.96465204 \times 10^{-7} & 2.27096267 \times 10^{-7} & 2.48867750 \times 10^{-2} \\ -1.06890748 \times 10^{-3} & -2.30052117 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ E_{bench} = \begin{bmatrix} -0.18528359 & -3.93441194 & -0.95959081 \\ -0.61592238 & 0.35280189 & 30.82505761 \\ -1.60139078 & -30.55364717 & 0.44620549 \end{bmatrix} $$
$$ F_{remote} = \begin{bmatrix} 7.19006698 \times 10^{-7} & -1.90583125 \times 10^{-7} & 2.36215166 \times 10^{-3} \\ -1.88159055 \times 10^{-6} & -8.60510729 \times 10^{-7} & -8.45685523 \times 10^{-3} \\ -3.98058321 \times 10^{-3} & 9.46500248 \times 10^{-3} & 1.00000000 \end{bmatrix} $$
$$ E_{remote} = \begin{bmatrix} 1.01667679 & -0.26948489 & 2.9461563 \\ -2.66057249 & -1.21676375 & -11.37262789 \\ -5.82753537 & 10.6169466 & 0.40187929 \end{bmatrix} $$
The 7-point algorithm was implemented to estimate the fundamental matrix F using 7 annotated correspondences. The process involves the following steps:
1. Normalization of Input Points:
Similar to the 8-point algorithm, we normalize the input correspondences using the transformation matrix T.
2. Constructing Matrix A:
We form the matrix A by arranging the point correspondences, just as in the 8-point algorithm.
3. Solving for F:
From the SVD of A, we obtain two linearly independent solutions, f1 and f2, which correspond to the last two singular vectors of A. We reshape f1 and f2 into F1 and F2 respectively, and represent the general solution for F as:
$$ F = \lambda F_1 + (1 - \lambda) F_2 $$
4. Enforcing the Determinant Constraint:
We ensure that the determinant of F equals zero, leading to a constraint on λ:
$$ \det(\lambda F_1 + (1 - \lambda) F_2) = 0 $$
This yields a cubic polynomial in λ. We solve it, keep only the real roots, reconstruct the corresponding candidate F matrices, and select the best one by inspection.
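Steps 3 and 4 can be sketched as follows, assuming `F1` and `F2` have already been reshaped from the last two right singular vectors of A (a minimal sketch; the cubic is recovered by interpolating the determinant at four sample values of λ):

```python
import numpy as np

def seven_point_candidates(F1, F2):
    """Return candidate F matrices from the two null-space solutions.

    Solves det(lam*F1 + (1-lam)*F2) = 0, a cubic in lam, and keeps
    only the real roots (1 or 3 candidates, up to scale).
    """
    # fit the cubic exactly from its values at four sample points
    lams = np.array([0.0, 1.0, 2.0, 3.0])
    dets = [np.linalg.det(l * F1 + (1 - l) * F2) for l in lams]
    coeffs = np.polyfit(lams, dets, 3)       # highest degree first
    candidates = []
    for root in np.roots(coeffs):
        if abs(root.imag) < 1e-8:            # discard complex roots
            lam = root.real
            candidates.append(lam * F1 + (1 - lam) * F2)
    return candidates
```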
This section describes the RANSAC (Random Sample Consensus) approach used to estimate the fundamental matrix F robustly from noisy point correspondences. The algorithm works as follows:
1. Random Sampling:
From the set of noisy point correspondences, we randomly sample either 8 points or 7 points, depending on whether we are using the 8-point or 7-point algorithm. These sampled points are used to compute a candidate fundamental matrix F.
2. Epipolar Line Calculation:
For each point correspondence, we compute the epipolar line l' for the point x using the equation:
$$ l' = F x $$
3. Inlier Criterion:
To determine if a point correspondence is an inlier, we calculate the distance of the corresponding point x' from the epipolar line l'. If the distance is less than or equal to a threshold of 1 pixel, we classify the correspondence as an inlier.
4. Iterative Optimization:
We repeat this process for 10,000 iterations, each time keeping track of the F matrix that yields the maximum number of inliers. The best fundamental matrix F, denoted as best_F, is saved and used for further calculations, such as computing the epipolar lines.
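The loop above can be sketched as follows (a minimal sketch; `estimate_F` stands for either the 8-point or 7-point solver, and all names are my own):

```python
import numpy as np

def ransac_fundamental(pts1, pts2, estimate_F, n_samples=8,
                       n_iters=10000, thresh=1.0):
    """RANSAC wrapper around a minimal F estimator.

    estimate_F: callable mapping two (n_samples x 2) arrays to a 3x3 F.
    Returns the F with the most inliers and the inlier mask.
    """
    rng = np.random.default_rng(0)
    n = len(pts1)
    x1 = np.hstack([pts1, np.ones((n, 1))])   # homogeneous x
    x2 = np.hstack([pts2, np.ones((n, 1))])   # homogeneous x'
    best_F, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, n_samples, replace=False)
        F = estimate_F(pts1[idx], pts2[idx])
        lines = x1 @ F.T                      # epipolar lines l' = F x
        # perpendicular distance of each x' from its epipolar line l'
        d = np.abs(np.sum(x2 * lines, axis=1)) \
            / np.linalg.norm(lines[:, :2], axis=1)
        inliers = d <= thresh                 # 1-pixel threshold by default
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```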
Parameters used: 10,000 RANSAC iterations with an inlier threshold of 1 pixel. The resulting fundamental matrices for each image pair are:
$$ F_{bench}^{8pt} = \begin{bmatrix} -8.01338303 \times 10^{-8} & -2.02014022 \times 10^{-6} & 3.38034847 \times 10^{-5} \\ -7.87234223 \times 10^{-7} & 4.16647155 \times 10^{-7} & 2.41760982 \times 10^{-2} \\ -9.13501992 \times 10^{-4} & -2.25066758 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{bench}^{7pt} = \begin{bmatrix} -1.02686240 \times 10^{-7} & -2.35062669 \times 10^{-6} & 1.22586017 \times 10^{-4} \\ -5.02808132 \times 10^{-7} & 3.16711407 \times 10^{-7} & 2.45016212 \times 10^{-2} \\ -9.89684030 \times 10^{-4} & -2.27450007 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{hydrant}^{8pt} = \begin{bmatrix} 1.37245430 \times 10^{-7} & -2.67984464 \times 10^{-6} & -1.64254208 \times 10^{-3} \\ 4.14019202 \times 10^{-6} & 2.42918632 \times 10^{-7} & -1.60289551 \times 10^{-2} \\ 1.39698658 \times 10^{-3} & 1.55837236 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{hydrant}^{7pt} = \begin{bmatrix} 9.90189221 \times 10^{-8} & -4.09103811 \times 10^{-6} & -1.28069038 \times 10^{-3} \\ 5.60723612 \times 10^{-6} & 6.54345054 \times 10^{-9} & -1.70458261 \times 10^{-2} \\ 9.97658582 \times 10^{-4} & 1.68424641 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{ball}^{8pt} = \begin{bmatrix} -3.50863651 \times 10^{-8} & -3.30393908 \times 10^{-6} & -6.80665335 \times 10^{-3} \\ 3.90470836 \times 10^{-6} & -1.01075778 \times 10^{-6} & -1.53874564 \times 10^{-2} \\ 9.18494513 \times 10^{-3} & 1.53662610 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{ball}^{7pt} = \begin{bmatrix} -8.53620583 \times 10^{-8} & -3.25132628 \times 10^{-6} & -6.13973562 \times 10^{-3} \\ 3.83271522 \times 10^{-6} & -9.59729593 \times 10^{-7} & -1.47205563 \times 10^{-2} \\ 8.39020149 \times 10^{-3} & 1.47113120 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
In the triangulation process, we aim to estimate the 3D point X given two image points x and x' in two views, with their respective projection matrices P and P'. The relationship between the image points and the 3D point X is expressed by the equations:
$$ x = P X \quad \text{and} \quad x' = P' X $$
Since x and P X are parallel homogeneous vectors, each equation can be rewritten using the cross product:
$$ [x]_\times P X = 0 \quad \text{and} \quad [x']_\times P' X = 0 $$
These give six equations in total, but only four are linearly independent (two from each projection). We therefore stack them into a single matrix A:
$$ A = \begin{bmatrix} [x]_\times P \\ [x']_\times P' \end{bmatrix} $$
Each cross product $[x]_\times P$ contributes two linearly independent rows, namely:
$$ u\,P_3 - P_1 \quad \text{and} \quad v\,P_3 - P_2 $$
where $P_1$, $P_2$, and $P_3$ are the rows of the projection matrix P and $(u, v)$ are the image coordinates of x. Stacking these rows gives the A matrix; applying Singular Value Decomposition (SVD) to A, the 3D point X is the right singular vector corresponding to the smallest singular value.
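The construction above can be sketched as (a minimal sketch with hypothetical names):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one point pair.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image points.
    Stacks the two independent rows of [x]_x P per view, solves by SVD.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],    # u  P3  - P1
        x1[1] * P1[2] - P1[1],    # v  P3  - P2
        x2[0] * P2[2] - P2[0],    # u' P3' - P1'
        x2[1] * P2[2] - P2[1],    # v' P3' - P2'
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # smallest-singular-value direction
    return X[:3] / X[3]           # dehomogenize to a 3D point
```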
In this bonus task, I implemented the SIFT (Scale-Invariant Feature Transform) detector to identify key features in pairs of custom images. The process involved the following steps:
1. Feature Detection using SIFT:
Using OpenCV's SIFT detector, I detected keypoints and computed descriptors for each image. This allowed me to extract distinct, scale-invariant features that could be matched between image pairs.
2. RANSAC and Fundamental Matrix Estimation:
After detecting the features, I used RANSAC along with the 8-point algorithm to estimate the fundamental matrix F based on the inlier correspondences. This helped to eliminate outliers and provided a robust estimate of F.
3. Epipolar Lines Calculation:
Using the estimated fundamental matrix F, I computed the epipolar lines corresponding to the inlier points in each image. This provides a visual representation of the geometric relationship between the two views.
In this section, I conducted two experiments with the COLMAP parameters to observe their impact on the 3D reconstruction quality:
1. Reducing the Number of Input Images:
For the Milan dataset, I reduced the number of input images from 50 to 24. This led to a much sparser point cloud in the reconstructed model, with the number of observed points decreasing to about half of the original. This demonstrates that fewer input images can significantly reduce the density and completeness of the reconstructed scene.
2. Increasing Feature Matching Tolerance:
I increased the feature matching tolerance by raising the `max_error` parameter during the exhaustive matching stage. This allowed more feature matches between images, potentially increasing the number of points in the reconstruction. However, it also introduces a risk of including more incorrect matches, which can degrade the quality of the point cloud by adding noise or inaccuracies.
Updated Parameters: sift_octaves=4, sift_edge_threshold=10, max_error=4.0