HW3: 3D reconstruction
Sivvani Muthusamy (Smuthusa)
The 8-point algorithm was implemented to estimate the fundamental matrix F. The procedure begins by normalizing the input point correspondences. The transformation matrix T is computed as follows:
$$ T = \begin{bmatrix} s & 0 & -s\,u_0 \\ 0 & s & -s\,v_0 \\ 0 & 0 & 1 \end{bmatrix} $$
where $(u_0, v_0)$ is the centroid of the points and $s = \sqrt{2} / d_{avg}$, with $d_{avg}$ the mean distance of the points from the centroid. Given the epipolar constraint $x'^T F x = 0$, we expand it into a homogeneous system of equations $A f = 0$, where each correspondence contributes one row of the form:
$$ A_i = \begin{bmatrix} u'u & u'v & u'w & v'u & v'v & v'w & w'u & w'v & w'w \end{bmatrix} $$
Here, $(u, v, w)$ are the homogeneous coordinates of the points $x$, and $(u', v', w')$ those of the points $x'$.
With the A matrix constructed, we apply Singular Value Decomposition (SVD) and take the right singular vector corresponding to the smallest singular value of A as f. To enforce the rank-2 constraint on F, we recompute F after setting its smallest singular value to zero. Finally, we denormalize F using the normalization transforms T and T' of the two views:
$$ F = T'^{\,T} F_{norm}\, T $$
This yields the fundamental matrix F using the 8-point algorithm.
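The procedure above can be sketched in NumPy as follows (a minimal sketch; function and variable names are my own, and error handling is omitted):

```python
import numpy as np

def normalize_points(pts):
    """Translate points to zero mean and scale so the mean distance
    from the centroid becomes sqrt(2); returns Nx3 points and T."""
    mean = pts.mean(axis=0)
    d_avg = np.linalg.norm(pts - mean, axis=1).mean()
    s = np.sqrt(2) / d_avg
    T = np.array([[s, 0, -s * mean[0]],
                  [0, s, -s * mean[1]],
                  [0, 0, 1]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    """Estimate F from >= 8 correspondences (Nx2 arrays, x in image 1)."""
    x1, T1 = normalize_points(pts1)   # x  -> (u, v, w)
    x2, T2 = normalize_points(pts2)   # x' -> (u', v', w')
    u, v, w = x1[:, 0], x1[:, 1], x1[:, 2]
    up, vp, wp = x2[:, 0], x2[:, 1], x2[:, 2]
    # one row [u'u, u'v, u'w, v'u, v'v, v'w, w'u, w'v, w'w] per match
    A = np.stack([up*u, up*v, up*w, vp*u, vp*v, vp*w, wp*u, wp*v, wp*w], axis=1)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # f = last right singular vector
    # enforce rank 2 by zeroing the smallest singular value of F
    U, S, Vt2 = np.linalg.svd(F)
    S[2] = 0
    F = U @ np.diag(S) @ Vt2
    F = T2.T @ F @ T1                 # denormalize: F = T'^T F_norm T
    return F / F[2, 2]                # fix scale as in the report (F33 = 1)
```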
Given the fundamental matrix F and the camera intrinsic matrices K and K' of the two views, the essential matrix E is recovered as:
$$ E = K'^{\,T} F K $$
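As a sketch, this conversion is a single matrix product (assuming `K1` and `K2` are the 3×3 intrinsics of the unprimed and primed views):

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K'^T F K, with K and K' the intrinsics of the two views."""
    return K2.T @ F @ K1
```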
Below are the computed fundamental and essential matrices for the bench and remote image pairs:
$$ F_{bench} = \begin{bmatrix} -1.19265833 \times 10^{-7} & -2.53255522 \times 10^{-6} & 2.07584109 \times 10^{-4} \\ -3.96465204 \times 10^{-7} & 2.27096267 \times 10^{-7} & 2.48867750 \times 10^{-2} \\ -1.06890748 \times 10^{-3} & -2.30052117 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ E_{bench} = \begin{bmatrix} -0.18528359 & -3.93441194 & -0.95959081 \\ -0.61592238 & 0.35280189 & 30.82505761 \\ -1.60139078 & -30.55364717 & 0.44620549 \end{bmatrix} $$
$$ F_{remote} = \begin{bmatrix} 7.19006698 \times 10^{-7} & -1.90583125 \times 10^{-7} & 2.36215166 \times 10^{-3} \\ -1.88159055 \times 10^{-6} & -8.60510729 \times 10^{-7} & -8.45685523 \times 10^{-3} \\ -3.98058321 \times 10^{-3} & 9.46500248 \times 10^{-3} & 1.00000000 \end{bmatrix} $$
$$ E_{remote} = \begin{bmatrix} 1.01667679 & -0.26948489 & 2.9461563 \\ -2.66057249 & -1.21676375 & -11.37262789 \\ -5.82753537 & 10.6169466 & 0.40187929 \end{bmatrix} $$
The 7-point algorithm was implemented to estimate the fundamental matrix F using 7 annotated correspondences. The process involves the following steps:
1. Normalization of Input Points:
Similar to the 8-point algorithm, we normalize the input correspondences using the transformation matrix T.
2. Constructing Matrix A:
We form the matrix A by arranging the point correspondences, just as in the 8-point algorithm.
3. Solving for F:
From the SVD of A, we obtain two linearly independent solutions, f1 and f2, which correspond to the last two singular vectors of A. We reshape f1 and f2 into F1 and F2 respectively, and represent the general solution for F as:
$$ F = \lambda F_1 + (1 - \lambda) F_2 $$
4. Enforcing the Determinant Constraint:
We ensure that the determinant of F equals zero, leading to a constraint on λ:
$$ \det(\lambda F_1 + (1 - \lambda) F_2) = 0 $$
This yields a cubic polynomial in λ. We solve it, keep only the real roots, reconstruct the corresponding candidate F matrices, and select the best one by inspection.
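Steps 3 and 4 can be sketched as follows, assuming `F1` and `F2` have already been reshaped from the last two right singular vectors of A (a minimal sketch; the cubic is recovered by interpolating the determinant at four sample values of λ):

```python
import numpy as np

def seven_point_candidates(F1, F2):
    """Return candidate F matrices from the two null-space solutions.

    Solves det(lam*F1 + (1-lam)*F2) = 0, a cubic in lam, and keeps
    only the real roots (1 or 3 candidates, up to scale).
    """
    # fit the cubic exactly from its values at four sample points
    lams = np.array([0.0, 1.0, 2.0, 3.0])
    dets = [np.linalg.det(l * F1 + (1 - l) * F2) for l in lams]
    coeffs = np.polyfit(lams, dets, 3)       # highest degree first
    candidates = []
    for root in np.roots(coeffs):
        if abs(root.imag) < 1e-8:            # discard complex roots
            lam = root.real
            candidates.append(lam * F1 + (1 - lam) * F2)
    return candidates
```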
This section describes the RANSAC (Random Sample Consensus) approach used to estimate the fundamental matrix F robustly from noisy point correspondences. The algorithm works as follows:
1. Random Sampling:
From the set of noisy point correspondences, we randomly sample either 8 points or 7 points, depending on whether we are using the 8-point or 7-point algorithm. These sampled points are used to compute a candidate fundamental matrix F.
2. Epipolar Line Calculation:
For each point correspondence, we compute the epipolar line l' for the point x using the equation:
$$ l' = F x $$
3. Inlier Criterion:
To determine if a point correspondence is an inlier, we calculate the distance of the corresponding point x' from the epipolar line l'. If the distance is less than or equal to a threshold of 1 pixel, we classify the correspondence as an inlier.
4. Iterative Optimization:
We repeat this process for 10,000 iterations, each time keeping track of the F matrix that yields the maximum number of inliers. The best fundamental matrix F, denoted as best_F, is saved and used for further calculations, such as computing the epipolar lines.
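The loop above can be sketched as follows (a minimal sketch; `estimate_F` stands for either the 8-point or 7-point solver, and all names are my own):

```python
import numpy as np

def ransac_fundamental(pts1, pts2, estimate_F, n_samples=8,
                       n_iters=10000, thresh=1.0):
    """RANSAC wrapper around a minimal F estimator.

    estimate_F: callable mapping two (n_samples x 2) arrays to a 3x3 F.
    Returns the F with the most inliers and the inlier mask.
    """
    rng = np.random.default_rng(0)
    n = len(pts1)
    x1 = np.hstack([pts1, np.ones((n, 1))])   # homogeneous x
    x2 = np.hstack([pts2, np.ones((n, 1))])   # homogeneous x'
    best_F, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, n_samples, replace=False)
        F = estimate_F(pts1[idx], pts2[idx])
        lines = x1 @ F.T                      # epipolar lines l' = F x
        # perpendicular distance of each x' from its epipolar line l'
        d = np.abs(np.sum(x2 * lines, axis=1)) \
            / np.linalg.norm(lines[:, :2], axis=1)
        inliers = d <= thresh                 # 1-pixel threshold by default
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```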
Parameters used: 10,000 RANSAC iterations with an inlier threshold of 1 pixel. The resulting fundamental matrices for each image pair are:
$$ F_{bench}^{8pt} = \begin{bmatrix} -8.01338303 \times 10^{-8} & -2.02014022 \times 10^{-6} & 3.38034847 \times 10^{-5} \\ -7.87234223 \times 10^{-7} & 4.16647155 \times 10^{-7} & 2.41760982 \times 10^{-2} \\ -9.13501992 \times 10^{-4} & -2.25066758 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{bench}^{7pt} = \begin{bmatrix} -1.02686240 \times 10^{-7} & -2.35062669 \times 10^{-6} & 1.22586017 \times 10^{-4} \\ -5.02808132 \times 10^{-7} & 3.16711407 \times 10^{-7} & 2.45016212 \times 10^{-2} \\ -9.89684030 \times 10^{-4} & -2.27450007 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{hydrant}^{8pt} = \begin{bmatrix} 1.37245430 \times 10^{-7} & -2.67984464 \times 10^{-6} & -1.64254208 \times 10^{-3} \\ 4.14019202 \times 10^{-6} & 2.42918632 \times 10^{-7} & -1.60289551 \times 10^{-2} \\ 1.39698658 \times 10^{-3} & 1.55837236 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{hydrant}^{7pt} = \begin{bmatrix} 9.90189221 \times 10^{-8} & -4.09103811 \times 10^{-6} & -1.28069038 \times 10^{-3} \\ 5.60723612 \times 10^{-6} & 6.54345054 \times 10^{-9} & -1.70458261 \times 10^{-2} \\ 9.97658582 \times 10^{-4} & 1.68424641 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{ball}^{8pt} = \begin{bmatrix} -3.50863651 \times 10^{-8} & -3.30393908 \times 10^{-6} & -6.80665335 \times 10^{-3} \\ 3.90470836 \times 10^{-6} & -1.01075778 \times 10^{-6} & -1.53874564 \times 10^{-2} \\ 9.18494513 \times 10^{-3} & 1.53662610 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
$$ F_{ball}^{7pt} = \begin{bmatrix} -8.53620583 \times 10^{-8} & -3.25132628 \times 10^{-6} & -6.13973562 \times 10^{-3} \\ 3.83271522 \times 10^{-6} & -9.59729593 \times 10^{-7} & -1.47205563 \times 10^{-2} \\ 8.39020149 \times 10^{-3} & 1.47113120 \times 10^{-2} & 1.00000000 \end{bmatrix} $$
In the triangulation process, we aim to estimate the 3D point X given two image points x and x' in two views, with their respective projection matrices P and P'. The relationship between the image points and the 3D point X is expressed by the equations:
$$ x = P X \quad \text{and} \quad x' = P' X $$
Since x and P X are parallel homogeneous vectors, each equation can be rewritten using the cross product:
$$ [x]_\times P X = 0 \quad \text{and} \quad [x']_\times P' X = 0 $$
These give six equations in total, but only four are linearly independent (two from each projection). We therefore stack them into a single matrix A:
$$ A = \begin{bmatrix} [x]_\times P \\ [x']_\times P' \end{bmatrix} $$
Each cross product $[x]_\times P$ contributes two linearly independent rows, namely:
$$ u\,P_3 - P_1 \quad \text{and} \quad v\,P_3 - P_2 $$
where $P_1$, $P_2$, and $P_3$ are the rows of the projection matrix P and $(u, v)$ are the image coordinates of x. Stacking these rows gives the A matrix; applying Singular Value Decomposition (SVD) to A, the 3D point X is the right singular vector corresponding to the smallest singular value.
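The construction above can be sketched as (a minimal sketch with hypothetical names):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation of one point pair.

    P1, P2: 3x4 projection matrices; x1, x2: (u, v) image points.
    Stacks the two independent rows of [x]_x P per view, solves by SVD.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],    # u  P3  - P1
        x1[1] * P1[2] - P1[1],    # v  P3  - P2
        x2[0] * P2[2] - P2[0],    # u' P3' - P1'
        x2[1] * P2[2] - P2[1],    # v' P3' - P2'
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # smallest-singular-value direction
    return X[:3] / X[3]           # dehomogenize to a 3D point
```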
In this bonus task, I implemented the SIFT (Scale-Invariant Feature Transform) detector to identify key features in pairs of custom images. The process involved the following steps:
1. Feature Detection using SIFT:
Using OpenCV's SIFT detector, I detected keypoints and computed descriptors for each image. This allowed me to extract distinct, scale-invariant features that could be matched between image pairs.
2. RANSAC and Fundamental Matrix Estimation:
After detecting the features, I used RANSAC along with the 8-point algorithm to estimate the fundamental matrix F based on the inlier correspondences. This helped to eliminate outliers and provided a robust estimate of F.
3. Epipolar Lines Calculation:
Using the estimated fundamental matrix F, I computed the epipolar lines corresponding to the inlier points in each image. This provides a visual representation of the geometric relationship between the two views.
In this section, I conducted two experiments with the COLMAP parameters to observe their impact on the 3D reconstruction quality:
1. Reducing the Number of Input Images:
For the Milan dataset, I reduced the number of input images from 50 to 24. This led to a much sparser point cloud in the reconstructed model, with the number of observed points decreasing to about half of the original. This demonstrates that fewer input images can significantly reduce the density and completeness of the reconstructed scene.
2. Increasing Feature Matching Tolerance:
I increased the feature matching tolerance by raising the `max_error` parameter during the exhaustive matching stage. This allowed more feature matches between images, potentially increasing the number of points in the reconstruction. However, it also introduces a risk of including more incorrect matches, which can degrade the quality of the point cloud by adding noise or inaccuracies.
Updated Parameters: sift_octaves=4, sift_edge_threshold=10, max_error=4.0