HW3: 3D Reconstruction

Q1: 8-point and 7-point Algorithm

(A1) Fundamental Matrix using 8-point Algorithm

Implementation

The implementation follows these steps:

Normalization: Normalize the point coordinates in both images to improve numerical stability. This involves computing a similarity transformation that translates the centroid of the points to the origin and scales the points so that the average distance from the origin is \( \sqrt{2} \).
Construct Matrix A: For each pair of corresponding normalized points \( (x_1, y_1) \) and \( (x_2, y_2) \), construct a row in matrix \( A \) using the equation derived from the epipolar constraint \( x'_i F x_i = 0 \).
Solve for F: Compute the Singular Value Decomposition (SVD) of \( A \) and extract the fundamental matrix \( F \) from the last column of \( V \) (or row of \( V^T \)).
Enforce Rank-2 Constraint: Since the fundamental matrix is rank 2, we enforce this by setting the smallest singular value of \( F \) to zero and reconstructing \( F \).
Denormalization: Transform the normalized fundamental matrix back to the original coordinate system using the similarity transformations.

Results

Below are the images showing the selected points in the first image and the corresponding epipolar lines in the second image.

(A2) Essential Matrix using 8-point Algorithm

Implementation

Given the estimated fundamental matrix \( F \) and the camera intrinsic matrices \( K_1 \) and \( K_2 \), the essential matrix \( E \) is computed using the relation \( E = K_2^\top F K_1 \).

The steps are:

Compute the initial essential matrix \( E_{\text{initial}} = K_2^\top F K_1 \).
Enforce the essential matrix constraints by setting the singular values to \( (1, 1, 0) \). This is done by performing SVD on \( E_{\text{initial}} \) and reconstructing \( E \) with the adjusted singular values.

Results

The estimated essential matrix \( E \) for the "bench" dataset is:

\[ E = \begin{bmatrix} -0.006006 & -0.12753473 & -0.03110523 \\ -0.01996524 & 0.01143614 & 0.99920025 \\ -0.05190939 & -0.99040242 & 0.01446384 \end{bmatrix} \]

The estimated essential matrix \( E \) for the "remote" dataset is:

\[ E = \begin{bmatrix} -0.08377958 & 0.02220699 & -0.24277898 \\ 0.21924535 & 0.10026782 & 0.93716513 \\ 0.48021994 & -0.87489293 & -0.03311699 \end{bmatrix} \]

(B) Fundamental Matrix using 7-point Algorithm

Implementation

The 7-point algorithm computes the fundamental matrix using only seven point correspondences. This involves solving a cubic polynomial to enforce the rank-2 constraint.

The implementation steps are:

Normalization: Similar to the 8-point algorithm, normalize the point coordinates in both images.
Construct Matrix A: Build the matrix \( A \) using the seven correspondences.
Solve for F: Compute the SVD of \( A \) to obtain two possible solutions \( F_1 \) and \( F_2 \).
Compute Determinant Polynomial: The fundamental matrix must satisfy \( \text{det}(F) = 0 \). We express \( F \) as a linear combination \( F = \lambda F_1 + (1 - \lambda) F_2 \) and solve for \( \lambda \) using the determinant constraint.
Select Correct Solution: From the real solutions of \( \lambda \), select the one that best fits the data or meets additional criteria.
Denormalization: Transform the fundamental matrix back to the original coordinate system.

Results

Below are the images showing the selected points in the first image and the corresponding epipolar lines in the second image.

Q2: RANSAC with 7-point and 8-point Algorithm

Implementation

Due to the presence of noise and outliers in real-world data, RANSAC (Random Sample Consensus) is used to robustly estimate the fundamental matrix by iteratively selecting random subsets of correspondences and computing \( F \) that best fits the majority of the data.

The RANSAC algorithm is implemented as follows:

Random Sampling: Randomly select a minimal subset of correspondences (7 or 8 points) required for estimating \( F \).
Estimate \( F \): Use the selected correspondences to compute the fundamental matrix using either the 7-point or 8-point algorithm.
Compute Inliers: For all correspondences, compute the epipolar constraint error and classify points as inliers if the error is below a threshold.
Iterate: Repeat the above steps for a fixed number of iterations or until a satisfactory model is found.
Select Best Model: Choose the \( F \) with the highest number of inliers.

Inlier Criteria

A correspondence is considered an inlier if the Sampson distance (or a similar error metric) is less than a predefined threshold (e.g., 1.0).

Results

Below is an example of the inlier ratio over iterations for the "bench" dataset using both the 7-point and 8-point algorithms and the epipolar lines computed using the best fundamental matrix found by RANSAC.

Q3: Triangulation

Implementation

Given the camera projection matrices \( P_1 \) and \( P_2 \), and corresponding points in two images, the goal is to reconstruct the 3D positions of the points using triangulation.

The steps for triangulation are:

For each point correspondence \( (x_1, y_1) \) and \( (x_2, y_2) \), set up a linear system based on the projection equations: \[ \begin{cases} (x_1 P_{1,3}^\top - P_{1,1}^\top) X = 0 \\ (y_1 P_{1,3}^\top - P_{1,2}^\top) X = 0 \\ (x_2 P_{2,3}^\top - P_{2,1}^\top) X = 0 \\ (y_2 P_{2,3}^\top - P_{2,2}^\top) X = 0 \end{cases} \] where \( X \) is the homogeneous coordinate of the 3D point.
Stack the equations into a matrix \( A \) and solve \( A X = 0 \) using SVD.
Extract the 3D point \( X \) from the last column of \( V \) and convert it to inhomogeneous coordinates.

Results

The reconstructed 3D points are visualized in a 3D scatter plot with colors extracted from the first image. We also visualize the 2D point correspondences.

Q4: Reconstruct Your Own Scene

Implementation

In this task, we use COLMAP to reconstruct our own scenes from multiple images. We capture images of a scene from different viewpoints and process them through COLMAP to obtain a 3D reconstruction and camera poses.

Results and Visualization

Below are the input images and the corresponding reconstruction outputs for scene 1 (wine). For better visualization, we use python code q4.py to visualize the point cloud again.

Similarly, below are the input images and the corresponding reconstruction outputs for scene 2 (medicine). For better visualization, we also use python code q4.py to visualize the point cloud again.

Q5: Bonus 1 - Fundamental Matrix Estimation on Your Own Images

Implementation

We capture or find our own pairs of images and estimate the fundamental matrix \( F \) using SIFT feature extraction, feature matching, and RANSAC.

The steps are:

Feature Extraction: Use SIFT to detect keypoints and compute descriptors in both images.
Feature Matching: Match features using a brute-force matcher with the ratio test to find good matches.
Estimate \( F \): Use the matched points to estimate the fundamental matrix using RANSAC to handle outliers.
Epipolar Lines: Compute and draw the epipolar lines on both images using the estimated \( F \).

Results

Below are the images with the epipolar lines drawn for two datasets: "wine" and "medicine".

Dataset: Wine

Dataset: Medicine

Q6: Bonus 2 - Stress Test the Hyperparameters of COLMAP

Question 1: Effect of Reducing Number of Input Images

We tested how reducing the number of input images affects the reconstruction quality in COLMAP. With fewer images, the reconstructed point cloud becomes sparser, and certain areas of the scene may not be reconstructed at all due to insufficient overlap and feature matches.

For example, if we reduce the input number of dataset wine from 6 to 3, the result would be sparser and less precise as shown below.

Question 2: Under what scenario and conditions does the reconstruction pipeline break?

The COLMAP reconstruction pipeline can encounter issues under certain conditions, especially when handling a large number of images or poor-quality data. Key factors include:

Too Many Input Images: When the number of images is very high, the feature matching step in COLMAP becomes computationally demanding, leading to memory constraints or long processing times that can break the pipeline.
Low Image Quality or Insufficient Overlap: For accurate reconstruction, images must have clear, repeatable features with good overlap. Blurry or poorly lit images and lack of overlap can lead to insufficient matches, causing the reconstruction to fail or result in incomplete structures.

Limiting the image count to only essential views and ensuring high-quality, well-overlapped images can help stabilize COLMAP’s reconstruction process.