Assignment 3 - prabhdes - 16-822: Geometry-based Methods in Vision

Q1: 8-point and 7-point algorithm

(A1) F matrix using 8-point algorithm

Explanation:

The 8-point algorithm estimates the fundamental matrix F, which relates corresponding points x (in the first image) and x' (in the second image) through the epipolar constraint x'^T F x = 0. Here is a concise description of the steps:

  1. Normalize Points: Scale the points in both images to have zero mean and unit variance using transformation matrices T and T'. This normalization improves numerical stability.
  2. Construct Matrix A: For each correspondence (x,x'), form a linear equation and stack these equations to create matrix A, which relates F to the normalized points.
  3. Solve for F: Use Singular Value Decomposition (SVD) on A and take the right singular vector corresponding to the smallest singular value; reshaped to 3x3, it forms the initial estimate of F.
  4. Enforce Rank-2 Constraint: To ensure F has rank 2, perform SVD on F and set the smallest singular value to zero.
  5. Denormalize: Reverse the normalization by applying the transformations, yielding the final fundamental matrix F = T'^T F_norm T, where F_norm is the estimate computed from the normalized points.
  6. Scale-Invariant Normalization: Normalize F so that F[2, 2] = 1 (a sketch of these steps is given after this list).
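
To make these steps concrete, here is a minimal NumPy sketch of the 8-point estimator. It is a sketch under stated assumptions, not the submitted implementation: function names such as normalize_points and eight_point are illustrative, and the normalization scales points so their mean distance from the origin is sqrt(2), a common variant of the zero-mean normalization in step 1.

```python
import numpy as np

def normalize_points(pts):
    """Shift points to zero mean and scale them so the mean distance from the
    origin is sqrt(2); returns homogeneous points and the 3x3 transform T."""
    pts = np.asarray(pts, dtype=float)
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    """Estimate F from >= 8 correspondences; pts1, pts2 are (N, 2) arrays."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)

    # One row of A per correspondence, from x2^T F x1 = 0 with F flattened row-major.
    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1, _), (u2, v2, _) in zip(x1, x2)])

    # Initial F: right singular vector of A with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt

    # Denormalize (F = T2^T F_norm T1) and fix the scale so F[2, 2] = 1.
    F = T2.T @ F @ T1
    return F / F[2, 2]
```
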
Results per sequence (the Viewpoint 1 images show the selected points and the Viewpoint 2 images show the corresponding epipolar lines):

Bench (Fundamental Matrix):
[[-1.19265833e-07 -2.53255522e-06 2.07584109e-04]
 [-3.96465204e-07 2.27096267e-07 2.48867750e-02]
 [-1.06890748e-03 -2.30052117e-02 1.00000000e+00]]

Remote (Fundamental Matrix):
[[ 7.19006698e-07 -1.90583125e-07 2.36215166e-03]
 [-1.88159055e-06 -8.60510729e-07 -8.45685523e-03]
 [-3.98058321e-03 9.46500248e-03 1.00000000e+00]]

(A2) E matrix using 8-point algorithm

Explanation:

We obtain the essential matrix as E = K2^T F K1, where K1 and K2 are the intrinsic matrices of the first and second cameras.
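
A minimal sketch of this step, assuming F, K1, and K2 are NumPy arrays (the function name is illustrative):

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """Compute E = K2^T F K1; like F, E is only defined up to scale."""
    return K2.T @ F @ K1
```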

Essential matrices per sequence:

Bench:
[[ -0.18528359 -3.93441194 -0.95959081]
 [ -0.61592238 0.35280189 30.82505761]
 [ -1.60139078 -30.55364717 0.44620549]]

Remote:
[[ 1.01667679 -0.26948489 2.9461563 ]
 [ -2.66057249 -1.21676375 -11.37262789]
 [ -5.82753537 10.6169466 0.40187929]]

(B) 7-point algorithm

Explanation:

The 7-point algorithm estimates the fundamental matrix F using only seven point correspondences, leveraging the fact that F has seven degrees of freedom. Here’s a streamlined explanation of the approach:

  1. Normalize Points: Scale points in each image to have zero mean and unit variance using transformation matrices T and T'. This enhances numerical stability.
  2. Construct Matrix A: For each correspondence, set up a linear equation based on the relationship x'^T F x = 0. Stack these equations to form the 7x9 matrix A, representing the constraints on F.
  3. Solve for F: Use Singular Value Decomposition (SVD) on A; the two right singular vectors corresponding to the two smallest singular values span its null space and give two candidate matrices F1 and F2.
  4. Linear Combination: Since the true F lies in this null space, express it as F = alpha * F1 + (1 - alpha) * F2. Solve for alpha by imposing the singularity constraint det(F) = 0, which results in a cubic equation in alpha.
  5. Select Real Root(s): Solve the cubic equation, retaining only the real root(s) for alpha, and construct the corresponding fundamental matrix F.
  6. Enforce Rank-2 Constraint: Ensure F has rank 2 by performing SVD on F and setting the smallest singular value to zero.
  7. Denormalize: Reverse the normalization step by applying the transformation matrices, yielding the final fundamental matrix F = T'^T F_norm T, where F_norm is the estimate computed from the normalized points.
  8. Scale-Invariant Normalization: Normalize F so that F[2, 2] = 1 (see the sketch after this list).
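
A minimal sketch of the 7-point estimator under the same assumptions as the 8-point sketch above. It reuses the illustrative normalize_points helper, and it recovers the cubic in alpha by sampling the determinant at four values and fitting a degree-3 polynomial, one of several ways to obtain the coefficients.

```python
import numpy as np

def seven_point(pts1, pts2):
    """Estimate F from exactly 7 correspondences; returns up to 3 real solutions."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)

    A = np.array([[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
                  for (u1, v1, _), (u2, v2, _) in zip(x1, x2)])

    # A is 7x9, so its null space is two-dimensional: the last two right
    # singular vectors give the two candidate matrices F1 and F2.
    _, _, Vt = np.linalg.svd(A)
    F1 = Vt[-1].reshape(3, 3)
    F2 = Vt[-2].reshape(3, 3)

    # det(alpha*F1 + (1 - alpha)*F2) = 0 is a cubic in alpha; recover its
    # coefficients by sampling the determinant at four points and fitting.
    det = lambda a: np.linalg.det(a * F1 + (1.0 - a) * F2)
    samples = np.array([0.0, 1.0, 2.0, 3.0])
    coeffs = np.polyfit(samples, [det(a) for a in samples], 3)

    solutions = []
    for alpha in np.roots(coeffs):
        if abs(alpha.imag) > 1e-8:
            continue                       # keep only the real roots
        F = alpha.real * F1 + (1.0 - alpha.real) * F2
        # det(F) = 0 already holds up to numerical error; a final SVD-based
        # rank-2 projection (step 6) can be added here if desired.
        F = T2.T @ F @ T1                  # denormalize
        solutions.append(F / F[2, 2])      # scale so F[2, 2] = 1
    return solutions
```
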
Results per sequence (the Viewpoint 1 images show the selected points and the Viewpoint 2 images show the corresponding epipolar lines; two fundamental matrices are reported per sequence):

Bench, Fundamental Matrix 1:
[[-3.35078059e-09 -2.91545005e-06 -6.74891733e-03]
 [ 3.51682683e-06 -8.84237457e-07 -1.50044163e-02]
 [ 9.09398538e-03 1.48748649e-02 1.00000000e+00]]

Bench, Fundamental Matrix 2:
[[-1.29362504e-02 -1.12555838e+01 -1.86746229e+01]
 [ 1.35772998e+01 -3.41374700e+00 -2.74530807e+01]
 [ 2.43843346e+01 2.45414016e+01 1.64166858e+00]]

Remote, Fundamental Matrix 1:
[[ 1.06836663e-07 -3.30854241e-06 -1.37397509e-03]
 [ 4.79268901e-06 1.63305018e-07 -1.67456794e-02]
 [ 1.12974017e-03 1.64136613e-02 1.00000000e+00]]

Remote, Fundamental Matrix 2:
[[ 0.15857475 -4.91077927 -4.16191946]
 [ 7.1136576 0.24238918 -18.26768512]
 [ 5.06442012 18.65933802 1.11386097]]

Q2: RANSAC with 7-point and 8-point algorithm

Explanation:

The RANSAC (Random Sample Consensus) algorithm is used to robustly estimate the fundamental matrix F by iteratively sampling correspondences and identifying the best model based on inliers. Here’s an overview of the process:

  1. Random Sampling: For each iteration, randomly select 7 or 8 point correspondences to estimate F using either the 7-point or 8-point algorithm.
  2. Fundamental Matrix Estimation: Estimate F from the sampled correspondences; F maps each point in one image to an epipolar line in the other.
  3. Inlier Counting: For each estimated F, compute the distances of all correspondences to the epipolar lines defined by F. Points are considered inliers if their distance from the epipolar line is within a predefined threshold.
  4. Threshold Adaptation: Adjust the inlier threshold based on dataset characteristics. For most datasets, a threshold of 2 pixels is used, except for high-noise datasets, where it is increased to 15 pixels.
  5. Iterative Refinement: Repeat steps 1-4 over 10,000 iterations, keeping track of the model with the highest inlier count.
  6. Final Model Selection: Select the fundamental matrix with the maximum inliers, providing the most robust estimate of F based on the data (the loop is sketched below).
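
A minimal sketch of this RANSAC loop (illustrative, not the submitted implementation; it assumes the eight_point and seven_point functions sketched earlier, with seven_point returning a list of candidate matrices):

```python
import numpy as np

def ransac_F(pts1, pts2, estimator, sample_size, n_iters=10000, thresh=2.0):
    """RANSAC wrapper: `estimator` is eight_point (sample_size=8) or
    seven_point (sample_size=7); `thresh` is the point-to-epipolar-line
    distance in pixels (2 by default, ~15 for the noisier sequences)."""
    pts1, pts2 = np.asarray(pts1, float), np.asarray(pts2, float)
    n = len(pts1)
    x1 = np.hstack([pts1, np.ones((n, 1))])
    x2 = np.hstack([pts2, np.ones((n, 1))])

    best_F, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(n, sample_size, replace=False)
        candidates = estimator(pts1[idx], pts2[idx])
        if not isinstance(candidates, list):   # eight_point returns one matrix
            candidates = [candidates]
        for F in candidates:
            # Distance of each image-2 point to its epipolar line l' = F x.
            lines = x1 @ F.T                   # (n, 3) epipolar lines in image 2
            dist = np.abs(np.sum(lines * x2, axis=1)) / np.linalg.norm(lines[:, :2], axis=1)
            inliers = dist < thresh
            if inliers.sum() > best_inliers.sum():
                best_F, best_inliers = F, inliers
    return best_F, best_inliers
```

A call such as ransac_F(pts1, pts2, eight_point, sample_size=8) would correspond to the Eightpoint setting below, and swapping in seven_point with sample_size=7 to the Sevenpoint setting.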

Bench:

(For each sequence, the Viewpoint 1 images show the points and the Viewpoint 2 images show the corresponding epipolar lines.)

Eightpoint:
[[-1.17732480e-08 -5.96492484e-07 -9.25459194e-05]
 [-2.11574661e-06 7.33247733e-07 2.38472741e-02]
 [-8.40950559e-04 -2.23150919e-02 1.00000000e+00]]

Sevenpoint:
[[-6.00278924e-08 -1.89901309e-06 -1.01789158e-04]
 [-8.35050242e-07 6.35592554e-07 2.34660431e-02]
 [-7.75197644e-04 -2.20073479e-02 1.00000000e+00]]

Remote:

Eightpoint:
[[ 3.31549792e-07 1.83719791e-06 1.65131980e-03]
 [-4.99599151e-06 -1.96519828e-06 -1.09521744e-02]
 [-3.16482952e-03 1.34098309e-02 1.00000000e+00]]

Sevenpoint:
[[ 9.07903785e-07 9.52425153e-06 2.80375262e-03]
 [-2.37640159e-05 -1.12324145e-05 -3.28032408e-02]
 [-5.52592264e-03 4.79981947e-02 1.00000000e+00]]

Ball:

Eightpoint:
[[ 3.71042537e-07 -2.25290945e-06 -1.29575359e-02]
 [ 2.83902820e-06 -9.48009479e-07 -2.08477087e-02]
 [ 1.68824336e-02 2.03637835e-02 1.00000000e+00]]

Sevenpoint:
[[-1.12736128e-08 -2.69063603e-06 -5.47932562e-03]
 [ 3.29220226e-06 -7.78788925e-07 -1.30644532e-02]
 [ 7.31943242e-03 1.29175883e-02 1.00000000e+00]]

Hydrant:

Eightpoint:
[[ 1.15137573e-07 -3.55180795e-06 -1.44531518e-03]
 [ 5.07452803e-06 8.62427635e-08 -1.69450956e-02]
 [ 1.18864639e-03 1.66666913e-02 1.00000000e+00]]

Sevenpoint:
[[ 8.03014968e-08 -4.41774279e-06 -1.17500281e-03]
 [ 5.91955998e-06 -6.29390198e-08 -1.74127183e-02]
 [ 9.10060610e-04 1.72900940e-02 1.00000000e+00]]

Q3: Triangulation

Explanation:

The triangulation algorithm reconstructs 3D points from 2D correspondences in two images, given the projection matrices of each camera. Here’s a concise summary of the approach:

  1. Set up Constraints: For each pair of corresponding points, x in image 1 and x' in image 2, use the projection matrices P1 and P2 to establish linear constraints from the relations x ~ P1 X and x' ~ P2 X (equality up to scale), where X is the desired 3D point; the cross products x × (P1 X) = 0 and x' × (P2 X) = 0 each give two independent linear equations.
  2. Construct Matrix A: From the constraints, form a matrix A such that AX=0, representing a system of equations to solve for X.
  3. Solve Using SVD: Perform Singular Value Decomposition (SVD) on A and obtain X as the last column of V (from the decomposition A=USV^T), corresponding to the smallest singular value.
  4. Repeat for All Points: Apply this process to all pairs of corresponding points, obtaining a 3D point X for each pair.
  5. Color Assignment: Optionally, assign colors to each 3D point by averaging the colors of the corresponding points in both images.
  6. Visualization: Use the resulting set of 3D points to create a point cloud, allowing visualization of the reconstructed scene (the linear solve in steps 1-4 is sketched below).
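
A minimal sketch of the linear (DLT) triangulation in steps 1-4, assuming P1 and P2 are 3x4 NumPy projection matrices and pts1, pts2 are (N, 2) pixel coordinates (the function name is illustrative; color assignment and visualization are omitted):

```python
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    """Linear (DLT) triangulation: returns an (N, 3) array of 3D points."""
    X_all = []
    for (u1, v1), (u2, v2) in zip(np.asarray(pts1, float), np.asarray(pts2, float)):
        # Two rows per image from the cross-product constraint x × (P X) = 0.
        A = np.stack([u1 * P1[2] - P1[0],
                      v1 * P1[2] - P1[1],
                      u2 * P2[2] - P2[0],
                      v2 * P2[2] - P2[1]])
        # X is the right singular vector with the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        X_all.append(X[:3] / X[3])        # dehomogenize
    return np.array(X_all)
```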

Q4: Reconstruct your own scene!

I used the Mast3R demo, as it was easier to set up. I tried COLMAP, but since I am on a Mac it was difficult to get working: the function "pycolmap.patch_match_stereo" requires COLMAP to be built with CUDA. Since the assignment allowed flexibility in the setup, I chose Mast3R.

Sequences (multi-view input images and reconstructed outputs are shown as images):
  Room Near Robolounge
  NSH 1st Floor