**16-822
3D Reconstruction**
**Anisha Jain (anishaja)**
# Overview
In this assignment, we implement methods to estimate the fundamental matrix from corresponding points in two images. We then use the fundamental matrix and the calibrated intrinsics to compute the essential matrix. We also implement RANSAC to make the fundamental matrix estimation robust to outliers, and compute a 3D metric reconstruction of the scene from 2D correspondences using triangulation.
# 8-point algorithm
## Fundamental Matrix Estimation
We use the 8-point algorithm to estimate the fundamental matrix from corresponding points in two images. The fundamental matrix is a 3x3 matrix that relates corresponding points in two images through the epipolar constraint:
$$ x'^T F x = 0 $$
where $x$ and $x'$ are corresponding points in the two images and $F$ is the fundamental matrix.
We implement the algorithm as follows:
1. Normalize the points to have zero mean and unit variance using the transformation matrices.
$$ x = x_{org} T^T $$
$$ x' = x'_{org} T'^T $$
2. Construct the matrix $A$ from the normalized points, with one row per correspondence:
$$ A[i] = \begin{bmatrix} x_i' x_i & x_i' y_i & x_i' & y_i' x_i & y_i' y_i & y_i' & x_i & y_i & 1 \end{bmatrix} $$
3. Solve for the fundamental matrix $F$ by computing the SVD of $A$ and selecting the right singular vector corresponding to the smallest singular value.
$$ U, S, V = svd(A) $$
$$ F = V[:, -1].\text{reshape}(3, 3) $$
4. Enforce the rank-2 constraint on $F$ by setting the smallest singular value to zero.
$$ U, S, V = svd(F) $$
$$ S[-1] = 0 $$
$$ F = U diag(S) V $$
5. Denormalize the fundamental matrix using the transformation matrices.
$$ F = T'^T F T $$
6. Fix the overall scale of the fundamental matrix (which is defined only up to scale) by normalizing its last entry to 1.
$$ F = F / F[-1, -1] $$
7. Using the fundamental matrix, we can compute the epipolar lines in the second image corresponding to points in the first image.
$$ l' = F x $$
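The steps above can be sketched in NumPy. This is a minimal illustration rather than the assignment code verbatim; the normalization uses the standard Hartley choice (zero mean, average point distance $\sqrt{2}$), a slight variant of the unit-variance normalization described in step 1.

```python
import numpy as np

def normalize_points(pts):
    # Hartley normalization: zero mean, average distance sqrt(2)
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def eight_point(pts1, pts2):
    x, T = normalize_points(pts1)
    xp, Tp = normalize_points(pts2)
    # One row of A per correspondence:
    # [x'x, x'y, x', y'x, y'y, y', x, y, 1]
    A = np.column_stack([xp[:, 0] * x[:, 0], xp[:, 0] * x[:, 1], xp[:, 0],
                         xp[:, 1] * x[:, 0], xp[:, 1] * x[:, 1], xp[:, 1],
                         x[:, 0], x[:, 1], np.ones(len(x))])
    # Right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value
    U, S, Vt2 = np.linalg.svd(F)
    S[-1] = 0.0
    F = U @ np.diag(S) @ Vt2
    # Denormalize and fix the scale
    F = Tp.T @ F @ T
    return F / F[2, 2]
```

Epipolar lines in the second image are then `F @ [x, y, 1]` for each point in the first image.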
$$F_{bench} = \begin{bmatrix}-1.19265833 \times 10^{-7} & -2.53255522 \times 10^{-6} & 2.07584109 \times 10^{-4} \\
-3.96465204 \times 10^{-7} & 2.27096267 \times 10^{-7} & 2.48867750 \times 10^{-2} \\
-1.06890748 \times 10^{-3} & -2.30052117 \times 10^{-2} & 1.00 \end{bmatrix}$$
  
$$F_{remote} = \begin{bmatrix}7.19006698 \times 10^{-7} & -1.90583125 \times 10^{-7} & 2.36215166 \times 10^{-3} \\
-1.88159055 \times 10^{-6} & -8.60510729 \times 10^{-7} & -8.45685523 \times 10^{-3} \\
-3.98058321 \times 10^{-3} & 9.46500248 \times 10^{-3} & 1.00 \end{bmatrix}$$
  
## Essential Matrix Estimation
We can compute the essential matrix from the fundamental matrix and the intrinsic matrices of the two cameras. The essential matrix is defined as:
$$ E = K'^T F K $$
where $K$ and $K'$ are the intrinsic matrices of the two cameras.
| Bench | Remote |
| --- | --- |
| $\begin{bmatrix} -0.41524274 & -8.81748894 & -2.15055808 \\ -1.3803559 & 0.79067134 & 69.08265036 \\ -3.58890876 & -68.47438701 & 1.00 \end{bmatrix}$ | $\begin{bmatrix} 2.5298064 & -0.67056178 & 7.33094837 \\ -6.62032749 & -3.02768466 & -28.29861665 \\ -14.50071092 & 26.41824781 & 1.00 \end{bmatrix}$ |
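The conversion itself is a single product. A minimal sketch, assuming `K1` and `K2` are the two intrinsic matrices and the result is rescaled so its last entry is 1, matching the matrices reported above:

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    # E = K'^T F K, rescaled so that E[2, 2] = 1
    E = K2.T @ F @ K1
    return E / E[2, 2]
```

In general $E$ is defined only up to scale; dividing by $E[2, 2]$ simply matches the normalization used in the table above.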
# 7-point algorithm
## Fundamental Matrix Estimation
Since the fundamental matrix only has 7 degrees of freedom, we can use the 7-point algorithm to estimate the fundamental matrix. We implement the algorithm as follows:
1. Normalize the points to have zero mean and unit variance using the transformation matrices.
$$ x = x_{org} T^T $$
$$ x' = x'_{org} T'^T $$
2. Construct the matrix $A$ from the normalized points, with one row per correspondence:
$$ A[i] = \begin{bmatrix} x_i' x_i & x_i' y_i & x_i' & y_i' x_i & y_i' y_i & y_i' & x_i & y_i & 1 \end{bmatrix}$$
3. Compute the SVD of $A$ and take the two right singular vectors corresponding to the two smallest singular values, which span the two-dimensional null space of $A$.
$$U, S, V = svd(A)$$
$$F_1 = V[:, -1].\text{reshape}(3, 3)$$
$$F_2 = V[:, -2].\text{reshape}(3, 3)$$
4. Since $F$ must be singular, the constraint $\det(F) = 0$ gives a cubic equation in $a$:
$$ f(a) = \det(a \, F_1 + (1 - a) \, F_2) = 0 $$
5. Solve the cubic equation to obtain its roots.
6. Compute the fundamental matrices using the real roots of the equation.
$$ F = a_{soln} \, F_1 + (1 - a_{soln}) \, F_2 $$
7. Enforce the rank-2 constraint on $F$ by setting the smallest singular value to zero.
$$ U, S, V = svd(F) $$
$$ S[-1] = 0 $$
$$ F = U diag(S) V $$
8. Denormalize the fundamental matrix using the transformation matrices.
$$ F = T'^T F T $$
9. Make the fundamental matrix scale-invariant by normalizing it.
$$ F = F / F[-1, -1] $$
10. Using the fundamental matrix, we can compute the epipolar lines in the second image corresponding to points in the first image.
$$ l' = F x $$
11. We choose the solution whose epipolar lines are consistent in both images: for these image pairs the correct solution produces roughly parallel epipolar lines, while the spurious roots do not.
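Steps 4–6 (finding the roots of the determinant cubic) can be sketched as follows. Sampling the cubic at four values of $a$ and fitting a degree-3 polynomial recovers its coefficients exactly; this is one convenient way to build the cubic, not necessarily how the assignment code expands the determinant.

```python
import numpy as np

def singular_roots(F1, F2):
    # det(a*F1 + (1-a)*F2) is a cubic in a; sampling it at four values
    # of a and fitting a degree-3 polynomial recovers its coefficients
    samples = np.array([0.0, 1.0, 2.0, 3.0])
    vals = [np.linalg.det(a * F1 + (1.0 - a) * F2) for a in samples]
    coeffs = np.polyfit(samples, vals, 3)
    # Keep only the real roots (a cubic has one or three of them)
    roots = np.roots(coeffs)
    return [r.real for r in roots if abs(r.imag) < 1e-6]
```

Each real root gives one candidate $F = a F_1 + (1 - a) F_2$, and the best candidate is selected as described in step 11.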
  
Epipolar lines using the other solutions for the fundamental matrix are as follows:
 
  
# RANSAC with 8-point and 7-point algorithms
We use RANSAC to improve the robustness of the fundamental matrix estimation. We implement the algorithm as follows:
1. Randomly sample 8 or 7 points from the correspondences.
2. Estimate the fundamental matrix using the 8-point or 7-point algorithm.
3. Count the inliers by computing the distance from each point to its corresponding epipolar line, in both images.
4. A correspondence is an inlier if this distance is below a threshold. Since the datasets have different noise levels, the threshold differs per dataset: we use 2 pixels for all datasets except one, where we use 15 pixels.
5. We repeat the process for 10000 iterations and select the fundamental matrix with the maximum number of inliers.
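A minimal NumPy sketch of this loop. It is illustrative only: `estimate_fn` stands in for the 8- or 7-point solver from the previous sections, and the symmetric point-to-line distance is one reasonable choice of inlier metric.

```python
import numpy as np

def point_line_dist(F, pts1, pts2):
    # Distance (pixels) from each point in image 2 to its epipolar line l' = F x
    x1 = np.column_stack([pts1, np.ones(len(pts1))])
    x2 = np.column_stack([pts2, np.ones(len(pts2))])
    lines = (F @ x1.T).T
    return np.abs(np.sum(lines * x2, axis=1)) / np.linalg.norm(lines[:, :2], axis=1)

def ransac_fundamental(pts1, pts2, estimate_fn, n_sample=8,
                       thresh=2.0, iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    best_F, best_inliers = None, np.zeros(len(pts1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pts1), n_sample, replace=False)
        F = estimate_fn(pts1[idx], pts2[idx])
        # Symmetric criterion: distance to the epipolar line in both images
        d = np.maximum(point_line_dist(F, pts1, pts2),
                       point_line_dist(F.T, pts2, pts1))
        inliers = d < thresh
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```

With `n_sample=7` and a 7-point solver (which can return multiple candidate matrices), each candidate would be scored the same way.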
**In the below visualizations we only show 100 inliers and the corresponding epipolar lines, even though the total number of inliers is much higher.**
| Dataset | F using eight point | F using seven point |
| --- | --- | --- |
| Bench |$\begin{bmatrix} -8.45015772 \times 10^{-8} & -2.05782051 \times 10^{-6} & 6.50425857 \times 10^{-5} \\ -7.88847267 \times 10^{-7} & 3.66548146 \times 10^{-7} & 2.45073361 \times 10^{-2} \\ -9.44057712 \times 10^{-4} & -2.27662232 \times 10^{-2} & 1.00 \end{bmatrix}$ | $\begin{bmatrix} -1.20603454 \times 10^{-7} & -2.69074305 \times 10^{-6} & 1.57724877 \times 10^{-4} \\ -1.80978957 \times 10^{-7} & 1.54962834 \times 10^{-7} & 2.45520745 \times 10^{-2} \\ -1.01564597 \times 10^{-3} & -2.27163372 \times 10^{-2} & 1.00 \end{bmatrix}$ |
  
  
  
| Dataset | F using eight point | F using seven point |
| --- | --- | --- |
|Ball | $\begin{bmatrix} -1.26313354 \times 10^{-7} & -3.22476462 \times 10^{-6} & -7.56133422 \times 10^{-3} \\ 3.73854433 \times 10^{-6} & -9.35412726 \times 10^{-7} & -1.66172875 \times 10^{-2} \\ 1.04947433 \times 10^{-2} & 1.64757520 \times 10^{-2} & 1.00 \end{bmatrix}$ | $\begin{bmatrix} 8.36590565 \times 10^{-8} & -2.97984016 \times 10^{-6} & -8.78560405 \times 10^{-3} \\ 3.56516012 \times 10^{-6} & -1.00003803 \times 10^{-6} & -1.72524240 \times 10^{-2} \\ 1.17318459 \times 10^{-2} & 1.70888129 \times 10^{-2} & 1.00 \end{bmatrix}$ |
  
  
  
| Dataset | F using eight point | F using seven point |
| --- | --- | --- |
| Hydrant | $\begin{bmatrix} 1.17620829 \times 10^{-7} & -3.08977710 \times 10^{-6} & -1.45534219 \times 10^{-3} \\ 4.58113806 \times 10^{-6} & 2.16239040 \times 10^{-7} & -1.64215457 \times 10^{-2} \\ 1.20808403 \times 10^{-3} & 1.60184701 \times 10^{-2} & 1.00 \end{bmatrix}$ | $\begin{bmatrix} 1.04382932 \times 10^{-7} & -3.81173676 \times 10^{-6} & -1.39768854 \times 10^{-3} \\ 5.27975558 \times 10^{-6} & 5.28258478 \times 10^{-8} & -1.67458929 \times 10^{-2} \\ 1.13817004 \times 10^{-3} & 1.64903785 \times 10^{-2} & 1.00 \end{bmatrix}$ |
  
  
  
| Dataset | F using eight point | F using seven point |
| --- | --- | --- |
| Remote | $\begin{bmatrix} 9.23435898 \times 10^{-7} & 5.98723435 \times 10^{-6} & 2.71697163 \times 10^{-3} \\ -1.25206967 \times 10^{-5} & -5.80700527 \times 10^{-6} & -1.57105815 \times 10^{-2} \\ -4.64980227 \times 10^{-3} & 2.22244568 \times 10^{-2} & 1.00 \end{bmatrix}$ | $\begin{bmatrix} 2.27700130 \times 10^{-6} & -3.99672422 \times 10^{-6} & 3.98485015 \times 10^{-3} \\ -1.03667781 \times 10^{-6} & -1.10188070 \times 10^{-6} & -1.16600481 \times 10^{-2} \\ -6.08066567 \times 10^{-3} & 1.41274583 \times 10^{-2} & 1.00 \end{bmatrix}$ |
  
  
  
# Triangulation
We can compute the 3D metric reconstruction of the scene from 2D correspondences using triangulation. We implement the algorithm as follows:
1. For each pair of corresponding points, compute the 3D point via linear (DLT) triangulation.
2. With projection matrices $P$ and $P'$ (with rows $P_1, P_2, P_3$ and $P_1', P_2', P_3'$) and a correspondence $(x_1, x_2) \leftrightarrow (x_1', x_2')$, the 3D point is the null vector of:
$$ A = \begin{bmatrix} x_1 P_3 - P_1 \\ x_2 P_3 - P_2 \\ x_1' P_3' - P_1' \\ x_2' P_3' - P_2' \end{bmatrix} $$
$$ U, S, V = svd(A) $$
$$ X = V[-1] / V[-1, -1] $$
3. We repeat this for all corresponding points.
4. Each 3D point is colored with the average of the colors at its corresponding pixels in the two images.
5. The 3D points are visualized using a point cloud.
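A minimal sketch of the DLT step, assuming `P1` and `P2` are the two 3x4 projection matrices and `pts1`, `pts2` are the matched pixel coordinates:

```python
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    # Linear (DLT) triangulation: stack the rows x*P[2] - P[0] and
    # y*P[2] - P[1] from both cameras and take the null vector of A
    X = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        A = np.stack([x1 * P1[2] - P1[0],
                      y1 * P1[2] - P1[1],
                      x2 * P2[2] - P2[0],
                      y2 * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        v = Vt[-1]
        X.append(v[:3] / v[3])  # dehomogenize
    return np.array(X)
```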

# Reconstruction of own scene using COLMAP
## Smith Hall
I use the below [images](https://drive.google.com/drive/folders/1P1J9cEWSSFCL_XmHSVYfuWgtmcgAWprB) of the outside of Smith Hall for the reconstruction.
    
    
    
    
    

## Taj Mahal
I use the below images of the Taj Mahal from the internet ([Wikicommons](https://commons.wikimedia.org/wiki/Category:Taj_Mahal_Complex_%E2%80%94_Tomb)) for the reconstruction.
    
    
    
 

# $F$ matrix estimation using own images
We follow the below steps to estimate the fundamental matrix using our own images:
1. Run SIFT to detect keypoints and extract descriptors from the images.
2. Match the keypoints using FLANN on the descriptors.
3. Use RANSAC to estimate the fundamental matrix using the 8-point algorithm as detailed in the previous sections.
4. Visualize the epipolar lines and the corresponding points in the two images.
**We only show 100 inliers and the corresponding epipolar lines, even though the total number of inliers is much higher.**
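For illustration, the matching step can be written as a brute-force nearest-neighbour search with Lowe's ratio test over the SIFT descriptors; FLANN approximates exactly this search, and the `ratio` value of 0.75 is a common default rather than a value taken from the assignment.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.75):
    # Brute-force stand-in for FLANN: for each descriptor in image 1,
    # find its two nearest neighbours in image 2 and keep the match
    # only if the best distance clearly beats the second best
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc1))
    keep = d[rows, best] < ratio * d[rows, second]
    return [(int(i), int(best[i])) for i in np.flatnonzero(keep)]
```

The resulting index pairs give the putative correspondences fed into the RANSAC loop of the previous section.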
 
 
 

 
 
 

# Stress Testing COLMAP hyperparameters
We stress test the COLMAP hyperparameters by varying the parameters below:
## Number of images
Compared to the 18 images used earlier for the Taj Mahal reconstruction, we use just 6 images.
  
  
With 6 images, COLMAP performs the reconstruction in under a minute but recovers only 4760 points, compared to a few minutes and about 15000 points with 18 images.

## Absolute pose max error
We use all 18 images but reduce the absolute pose max error threshold from the default of 12 pixels to 5 pixels. The reconstruction is still successful with the tighter threshold but has fewer points (about 14000, compared to 15000 with the default threshold).

This shows that with enough images covering the scene from different angles, COLMAP can still perform a successful reconstruction under a reduced error threshold.