16-822 Assignment 3

Q1: 8-point and 7-point algorithm (40 points)

(A1) F matrix using 8-point algorithm (15 points) and (A2) E matrix using 8-point algorithm (5 points)

Implementation Steps

Step 1: Normalization of Points

Normalization helps improve the numerical stability of the algorithm. I compute a normalization matrix $ T $ for each set of points so that the points are centered and have an average distance of $\sqrt{2}$ from the origin. The normalization matrix $ T $ is defined as follows:

\[ T = \begin{bmatrix} \frac{\sqrt{2}}{\text{avg_distance}} & 0 & -\frac{\sqrt{2}}{\text{avg_distance}} \cdot \text{mean_x} \\ 0 & \frac{\sqrt{2}}{\text{avg_distance}} & -\frac{\sqrt{2}}{\text{avg_distance}} \cdot \text{mean_y} \\ 0 & 0 & 1 \end{bmatrix} \]

Step 2: Constructing the Matrix A

I then create matrix $ A $ using the normalized point correspondences. Each row of $ A $ is constructed using the elements of the normalized points from the two images as follows:

\[ A = \begin{bmatrix} x_2' \cdot x_1' & x_2' \cdot y_1' & x_2' & y_2' \cdot x_1' & y_2' \cdot y_1' & y_2' & x_1' & y_1' & 1 \end{bmatrix} \]

Step 3: Computing the Fundamental Matrix (F)

Using Singular Value Decomposition (SVD), I decompose $ A $ and extract the last row of $ V^T $ (from SVD of $ A $), which corresponds to the solution for $ F $. To ensure that $ F $ is a rank-2 matrix, I impose the rank-2 constraint by setting the smallest singular value to zero:

\[ F_{\text{normalized}} = U \cdot \text{diag}(s_1, s_2, 0) \cdot V^T \]

Finally, I denormalize $ F $ by applying the normalization matrices $ T_1 $ and $ T_2 $:

\[ F = T_2^T \cdot F_{\text{normalized}} \cdot T_1 \]

Step 4: Computing the Essential Matrix (E)

Once I have $ F $, I can compute the Essential matrix $ E $ using the intrinsic camera matrices $ K_1 $ and $ K_2 $:

\[ E = K_2^T \cdot F \cdot K_1 \]

Similarly, I enforce the rank-2 constraint on $ E $ by setting the smallest singular value to zero.

Step 5: Drawing Epipolar Lines

To visualize the result, I draw the epipolar lines on the second image based on points from the first image. The epipolar line equation in the second image for a point $ x $ in the first image is:

\[ l = F \cdot x \]

(B) 7-point algorithm (20 points)

Brief Explanation of the Implementation

The seven-point algorithm is used to estimate the fundamental matrix $ F $ between two views, given seven point correspondences. Since this is an underdetermined system, the algorithm generates multiple solutions for $ F $ by imposing the rank-2 constraint and solving a cubic polynomial.

Step-by-Step Implementation

Step 1: Normalizing the Points

To improve the stability of the algorithm, I normalize the points. The normalization matrix $ T $ centers the points and scales them so that they have an average distance of $ \sqrt{2} $ from the origin:

Step 2: Constructing Matrix $ A $

Using the normalized points, I construct matrix $ A $ for the seven-point algorithm. Each row of $ A $ is formed using the coordinates of corresponding points in the two images:

\[ A = \begin{bmatrix} x_2 \cdot x_1 & x_2 \cdot y_1 & x_2 & y_2 \cdot x_1 & y_2 \cdot y_1 & y_2 & x_1 & y_1 & 1 \end{bmatrix} \]

Step 3: Singular Value Decomposition (SVD)

I perform SVD on $ A $, which gives two candidate fundamental matrices, $ F_1 $ and $ F_2 $. These matrices represent possible solutions to the seven-point system.

Step 4: Solving for Polynomial Roots

I combine $ F_1 $ and $ F_2 $ using a scalar $ \alpha $ and compute the determinant of the combination:

\[ \det(\alpha F_1 + (1 - \alpha) F_2) = 0 \]

This results in a cubic polynomial, and I find the real roots of this polynomial to generate potential solutions for $ F $.

Step 5: Rank-2 Constraint and Denormalization

For each real root $ \alpha $, I construct a candidate $ F $ matrix, apply the rank-2 constraint by setting the smallest singular value to zero, and denormalize the result by applying the inverse normalization matrices $ T_1 $ and $ T_2 $:

\[ F = T_2^T \cdot F_{\text{normalized}} \cdot T_1 \]

Step 6: Epipolar Distance Calculation

I calculate the epipolar error for each candidate $ F $ by measuring the distance of corresponding points from the epipolar lines. The matrix with the lowest average error is selected as the best estimate for $ F $.

Epipolar Line Visualization

Finally, I visualize the epipolar lines on the second image. For each point in the first image, the corresponding epipolar line in the second image is computed using the fundamental matrix $ F $:

\[ l = F \cdot x \]

The lines are drawn on the image to help verify the correctness of $ F $ visually.

Output

The output images are saved under the directory output/q1/q1_b with the epipolar points and lines visualized for each dataset.

Q2: Metric Rectification Results by Category

Brief Explanation of the RANSAC Implementation

In this implementation, RANSAC algorithm is used to estimate the fundamental matrix $ F $ for each image set using both the 7-point and 8-point algorithms. RANSAC helps deal with noisy correspondences by iteratively selecting a subset of points, estimating $ F $, and determining inliers based on a defined threshold.

Step 1: Initialization

The parameters for RANSAC, including the threshold and the maximum number of iterations (max_iterations), are set. The threshold controls the maximum allowable distance for a point to be considered an inlier.

Step 2: Sampling and Fundamental Matrix Estimation

In each iteration, a random subset of correspondences is chosen (7 or 8 points based on the algorithm used). The chosen points are normalized to improve numerical stability. The fundamental matrix $ F $ is estimated using either the 7-point or 8-point method:

- For the 8-point algorithm, a single matrix $ F $ is computed.
- For the 7-point algorithm, multiple candidate matrices $ F $ are generated by solving for the real roots of a cubic polynomial.

Step 3: Counting Inliers

For each candidate $ F $, I calculate the epipolar distance for all points. The epipolar distance $ d $ between a point and its corresponding epipolar line is calculated as follows:

\[ d = \frac{|p_2^T F p_1|}{\sqrt{(F p_1)_1^2 + (F p_1)_2^2 + (F^T p_2)_1^2 + (F^T p_2)_2^2}} \]

Points with a distance $ d $ below the threshold are considered inliers.

Step 4: Updating the Best Model

If the number of inliers for the current $ F $ exceeds the previous maximum, this $ F $ is stored as the best estimate, along with the inlier set. The inlier percentage for each iteration is also logged to track the effectiveness of the RANSAC iterations.

Criteria for Considering Inliers

A correspondence is considered an inlier if the epipolar distance $ d $ is less than the specified threshold. This distance indicates how closely a point aligns with its corresponding epipolar line in the other image, providing a measure of geometric consistency.

Output

The best fundamental matrix $ F $ and the corresponding inliers are plotted. Additionally, a graph showing the percentage of inliers vs. the number of RANSAC iterations is generated to illustrate the convergence behavior of the algorithm for both the 7-point and 8-point methods.

Bench

Hydrant

Ball

Remote

Q3: Triangulation (20 points)

Brief Explanation of the Triangulation Implementation

In this task, I aim to compute the 3D coordinates of points from two sets of 2D correspondences by using two camera projection matrices. The triangulation algorithm allows us to reconstruct a 3D scene from these two 2D views.

1. Construct Matrix A

For each pair of 2D correspondences $(x_1, y_1)$ from the first image and $(x_2, y_2)$ from the second image, I construct a matrix $ A $ that incorporates information from both camera matrices ($P_1$ and $P_2$). Each correspondence generates two equations per camera matrix, resulting in a $4 \times 4$ matrix $ A $:

$$ A = \begin{bmatrix} x_1 P_{13} - P_{11} \\ y_1 P_{13} - P_{12} \\ x_2 P_{23} - P_{21} \\ y_2 P_{23} - P_{22} \end{bmatrix} $$

where $ P_{ij} $ represents elements from the camera matrices. This matrix $ A $ is essential for the triangulation process.

2. Triangulating the 3D Points

For each pair of corresponding points, I use Singular Value Decomposition (SVD) to solve $ A \cdot X = 0 $, where $ X $ represents the homogeneous 3D coordinates of the point.
The last row of $ V^T $ (from the SVD of $ A $) provides the solution in homogeneous coordinates. I normalize this solution to convert it to 3D Cartesian coordinates.

3. Visualizing Correspondences

A visualization of the correspondences between the two images is created by drawing lines between matching points on a combined image. This helps verify that the points in both images correspond correctly.

4. 3D Point Cloud Visualization

The reconstructed 3D points are visualized as a colored point cloud, providing a spatial representation of the scene captured from the multi views.

Q4: Reconstruct your own scene! (20 points)

3D object visualization and the extrinsics after running COLMAP is shown in below GIF.

I choose ETH3D facade dataset (76 images) like below, I just pick 12 examples below:

Q5: Bonus 1 - Fundamental matrix estimation on your own images. (10 points)

Brief Explanation of the Fundamental Matrix Estimation Implementation

In this task, I estimate the fundamental matrix $ F $ for a pair of images by detecting keypoints, matching them, and applying RANSAC to filter out outliers. I then visualize the epipolar lines using the estimated matrix $ F $.

1. Keypoint Detection and Description

I utilize the SIFT (Scale-Invariant Feature Transform) feature detector to find distinctive keypoints in each image and compute descriptors for these points. This allows us to uniquely identify and match points across the two images.

 
    sift_detector = cv2.SIFT_create()
    keypoints_photo1, descriptors_photo1 = sift_detector.detectAndCompute(photo1, None)
    keypoints_photo2, descriptors_photo2 = sift_detector.detectAndCompute(photo2, None)

2. Keypoint Matching

I use the FLANN-based matcher with KNN (k-nearest neighbors) to find potential matches between the two sets of descriptors. I apply a ratio test to filter out poor matches, keeping only those that pass the threshold.

The ratio test is defined as:

\[ \text{valid_matches} = \{ m | m.\text{distance} < \frac{n.\text{distance}}{1.5} \} \]

3. Fundamental Matrix Estimation with RANSAC

Using the matched points, I estimate the fundamental matrix $ F $ using the RANSAC (Random Sample Consensus) algorithm. RANSAC helps us eliminate outlier matches by iteratively fitting the model and maximizing the number of inliers:

\[ F, \text{ inliers } = \text{cv2.findFundamentalMat(pts_photo1, pts_photo2, cv2.FM_RANSAC)} \]

4. Epipolar Line Calculation

With the estimated fundamental matrix $ F $, I calculate the epipolar lines for a subset of inlier points. For points in one image, the epipolar line in the other image is given by:

\[ l' = F \cdot p \]

where $ p $ is a point in the first image and $ l' $ is the corresponding epipolar line in the second image.

5. Visualization of Epipolar Lines

Shown in below:/p>

Estimated F is shown in below:

Q6: Bonus 2 - Stress test the hyperparameters of COLMAP (10 points)

Question 1: What happens if I reduce the number of input images?

To test this, I used only 1/6 of the original dataset (origin includes 76 images), significantly reducing the number of input images.

The resulting 3D reconstruction was noticeably sparse, with far fewer details and missing portions of the structure. This sparsity is likely due to insufficient overlap between the images, which limited COLMAP's ability to identify corresponding points and generate a detailed 3D model. The effect is shown in below GIF:

Question 2: Under what scenario and conditions does the reconstruction pipeline break?

The reconstruction pipeline might break if there are very few feature matches between images due to either a low number of images, poor quality, or lack of distinct features. I expect that reducing the distinctiveness of features or having very few images with little overlap would cause the COLMAP pipeline to fail during feature matching or pose estimation.

Question 3: What happens if I play with some tolerance parameters?

Increasing tolerance parameters (e.g., reprojection error threshold) may allow more matches to be accepted, but this could reduce accuracy. Conversely, decreasing the tolerance might improve accuracy but reduce the number of valid points in the reconstruction.A high tolerance setting might yield a denser but potentially noisier reconstruction, while a low tolerance setting would yield a sparse but accurate model, depending on the balance between noise and detail in the dataset.