# Assignment 3: 3D Reconstruction
## Q1-A: 8-point Algorithm
### A1: F matrix using 8-point algorithm
To find the fundamental matrix F based on corresponding points $(x, x^{'})$ in two images using 8 points algorithm, we can convert this problem to $Ax = 0$ where $x$ is the 9 elements of the fundamental matrix. For the data preprocessing, we first need to convert the annotation to the homogeneous coordinates.
First, we need to find the transformation matrix $T$ which normalizes the points and makes the points have zero mean and unit variance. So, $T$ can be represented as:
$$T =
\begin{bmatrix}
s & 0 & -sx_0 \\
0 & s & -sy_0 \\
0 & 0 & 1
\end{bmatrix}$$
where $s$ is the scaling factor and $(x_0, y_0)$ is the mean of the points. Therfore, $s$ can be calculated as:
$$
s = \sqrt{2} / d_{avg} \\
d_{avg} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2}
$$
Then, the normalized points can be calculated as:
$$
\hat{x} = Tx \\
\hat{x}^{'} = T^{'}x^{'}
$$
Second, we use the constraints $x^{'}Fx = 0$ to convert the problem to $A_if = 0$. The matrix $A_i$ can be calculated as:
$$
A_i =
\begin{bmatrix}
x^{'}_i x_i & x^{'}_i y_i & x^{'}_i z_i & y^{'}_i x_i & y^{'}_i y_i & y^{'}_i z_i & z^{'}_i x_i & z^{'}_i y_i & z^{'}_i z_i
\end{bmatrix}
$$
After using SVD, we have $F = V[-1]$, which is the last column of $V$.
Third, we need to enforce the rank-2 constraints on $F$. Therefore, we ignore the smallest singular value and set it to zero. If $F = UDV^T$, then $F = Udiag(d_1, d_2, 0)V^T$. And $F$ is divided by the last element $F[2, 2]$ to make it scale-invariant.
Finally, we need to denormalize the fundamental matrix $F$ by $F = T^{'T}FT$.
#### Results of bench
$$
F =
\begin{bmatrix}
-1.19265833e-07 & -2.53255522e-06 & 2.07584109e-04 \\
-3.96465204e-07 & 2.27096267e-07 & 2.48867750e-02 \\
-1.06890748e-03 & -2.30052117e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
|--------------------------|------------------------------|
| |
|
#### Results of remote
$$
F =
\begin{bmatrix}
7.19006698e-07 & -1.90583125e-07 & 2.36215166e-03 \\
-1.88159055e-06 & -8.60510729e-07 & -8.45685523e-03 \\
-3.98058321e-03 & 9.46500248e-03 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
|--------------------------|------------------------------|
|
|
|
### A2: E matrix using 8-point algorithm
By given the intrinsic matrix $K_1$ and $K_2$, and the fundamental matrix $F$ from the previous step, we can calculate the essential matrix $E$ by $E = K_2^TFK_1$. And $E$ is divided by the last element $E[2, 2]$ to make it scale-invariant. Moreover, if we do not use the $F$ matrix, we can find $E$ matrix using the following steps:
First, we can find the normalized points $\hat{x}$ and $\hat{x}^{'}$ by $x = K_1^{-1}x$ and $x^{'} = K_2^{-1}x^{'}$.
Second, using the same method to convert the problem to $Ax=0$, we can solve $E$ using SVD.
Finally, we need to enforce the rank-2 constraints for $E$. Therefore, if $E = UDV^T$, then $E = Udiag(d, d, 0)V^T$, where $d = (d_1 + d_2) / 2$. And $E$ is divided by the last element $E[2, 2]$ to make it scale-invariant.
#### Results of bench
$$
E =
\begin{bmatrix}
-0.41524274 & -8.81748894 & -2.15055808 \\
-1.3803559 & 0.79067134 & 69.08265036 \\
-3.58890876 & -68.47438701 & 1.00000000
\end{bmatrix}
$$
#### Results of remote
$$
E =
\begin{bmatrix}
2.5298064 & -0.67056178 & 7.33094837 \\
-6.62032749 & -3.02768466 & -28.29861665 \\
-14.50071092 & 26.41824781 & 1.00000000
\end{bmatrix}
$$
## Q1-B: 7-point Algorithm
To find the fundamental matrix F based on corresponding points $(x, x^{'})$ in two images using 7 points algorithm, we can also convert the problem to $Ax = 0$ using the same method as 8 points algorithm.
The different between 7 points and 8 points algorithm is that in 7 points we will find two linearly independent solutions $F_1$ and $F_2$ for $F$. The general solution for $F$ can be represented as:
$$
F = \alpha F_1 + (1 - \alpha)F_2
$$
Then, we need to enforce the rank-2 constraints on this general solution. It is a 3--degree cubic polynomial, so for the code implementation, I use the numpy polyfit function and root function to find the roots of this polynomial, which has 3 solutions. Then, we need to denormalize the fundamental matrix $F$ by $F = T^{'T}FT$.
Finally, to find the correct $F$ from the 3 solutions, we first remove all the invalid solutions, which is complex. Then we can visualize the results and manually select the correct one.
#### Results of hydrant
$$
F =
\begin{bmatrix}
1.06836663e-07 & -3.30854241e-06 & -1.37397509e-03 \\
4.79268901e-06 & 1.63305018e-07 & -1.67456794e-02 \\
1.12974017e-03 & 1.64136613e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
|--------------------------|------------------------------|
|
|
|
#### Results of ball
$$
F =
\begin{bmatrix}
-3.35078059e-09 & -2.91545005e-06 & -6.74891733e-03 \\
3.51682683e-06 & -8.84237457e-07 & -1.50044163e-02 \\
9.09398538e-03 & 1.48748649e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
|--------------------------|------------------------------|
|
|
|
## Q2: RANSAC with 7-point and 8-point algorithm
To find the fundamental matrix $F$ using RANSAC with 7-point or 8-point algorithm, we need a lot of corresponding points $(x, x^{'})$ in two images, and set up the number of iteration and the threshold of inlier selection to find the best $F$. For each iteration, we do the following steps:
First, we randomly select 7 or 8 points and use the 7-point or 8-point algorithm to find the fundamental matrix $F$.
Then, based on this $F$, we need to calculate the number of inliers which error is less than the threshold. The error is defined as the unit distance between the point $x^{'}$ and the epipolar line $l = Fx$. Therefore, we calculate the distance $d = |x^{'}l| / \sqrt{l_1^2 + l_2^2}$, which is in the pixel unit.
Finally, if the number of inliers is larger than the previous best inliers, we update the best $F$ and the best inliers.
To better visualize the results, I randomly select 20 inliers and plot the corresponding points and epipolar lines.
#### Results of bench
$$
F =
\begin{bmatrix}
-1.10298327e-07 & -2.54144323e-06 & 1.25368129e-04 \\
-2.92526744e-07 & 1.83055493e-07 & 2.46869125e-02 \\
-9.91266343e-04 & -2.28763202e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (8-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
$$
F =
\begin{bmatrix}
-1.15942355e-07 & -2.53922954e-06 & 1.88462295e-04 \\
-4.15597655e-07 & 1.79895760e-07 & 2.49239085e-02 \\
-1.03947640e-03 & -2.30293429e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (7-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
| Percentage of Inliers |
|--------------------------|
|
|
#### Results of remote
$$
F =
\begin{bmatrix}
9.16500167e-07 & -1.63178302e-06 & 2.63724198e-03 \\
-2.90171728e-07 & -3.72629293e-07 & -8.64245812e-03 \\
-4.39122095e-03 & 9.37145626e-03 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (8-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
$$
F =
\begin{bmatrix}
3.62803400e-07 & -2.86063327e-07 & 1.92308999e-03 \\
-8.08066346e-07 & -1.98757326e-07 & -7.46229202e-03 \\
-3.37572019e-03 & 7.62959396e-03 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (7-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
| Percentage of Inliers |
|--------------------------|
|
|
#### Results of hydrant
$$
F =
\begin{bmatrix}
1.09735377e-07 & -2.86150660e-06 & -1.45481484e-03 \\
4.29618678e-06 & 2.25849198e-07 & -1.66342262e-02 \\
1.24438581e-03 & 1.62626068e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (8-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
$$
F =
\begin{bmatrix}
1.46105346e-07 & -2.40477017e-06 & -1.70891899e-03 \\
3.84105786e-06 & 2.74610148e-07 & -1.55723285e-02 \\
1.46137546e-03 & 1.50828525e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (7-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
| Percentage of Inliers |
|--------------------------|
|
|
#### Results of ball
$$
F =
\begin{bmatrix}
-3.05949514e-08 & -2.94362551e-06 & -7.05278039e-03 \\
3.50347901e-06 & -8.84607240e-07 & -1.54493628e-02 \\
9.56484820e-03 & 1.53144433e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (8-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
$$
F =
\begin{bmatrix}
-3.78962167e-08 & -2.91968486e-06 & -6.76390614e-03 \\
3.48343112e-06 & -8.84517474e-07 & -1.52026022e-02 \\
9.22221690e-03 & 1.50956328e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (7-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
| Percentage of Inliers |
|--------------------------|
|
|
## Q3: Triangulation
To find the 3D points $X$ based on the corresponding points $(x, x^{'})$ in two images and the projection matrix $P$ and $P^{'}$, we can also convert this problem to serveral $AX = 0$ problem for each point. The matrix $A$ is created using two constraints: $x \times PX = [x]_{\times}PX = 0$ and $x^{'} \times P^{'}X = [x^{'}]_{\times}P^{'}X = 0$. Therefore, $A$ can be represented as:
$$
A =
\begin{bmatrix}
[x]_{\times}P \\
[x^{'}]_{\times}P^{'}
\end{bmatrix}
$$
Then, we can solve $X$ using SVD.
After that, we need to obtain the color of the 3D points. For each 3D point, we first get the color from both of its corresponding 2D points pair. Then, we use the average color as the color of the 3D point.
#### Result
| Colored 3D Points |
|--------------------------|
|
|
## Q4: Reconstruct your own scene
### Results of scene1
For this scene, I use 40 images as input.
| Multi-view images | Output |
|-------------------|--------|
|
|
|
### Results of scene2
For this scene, I use 22 images as input.
| Multi-view images | Output |
|-------------------|--------|
|
|
|
## Q5: Bonus 1 - Fundamental matrix estimation on your own images.
To estimate the fundamental matrix on our own images, we need to first find many correpsonding points $(x, x^{'})$ in two images using feature extraction and matching methods.
Followed the hint, first I use the SIFT feature extraction method to capture the keypoints and descriptors of the two images. Then, I use the Fast Library for Approximate Nearest Neighbors (FLANN) method to match the descriptors. In order to improve the matching quality, I use a threshold to filter the good matches. Finally, I use both 8-point and 7-point algorithm with RANSAC to estimate the fundamental matrix $F$.
For better visualization, I randomly select 25 inliers to plot the corresponding points and epipolar lines.
#### Results of scene1
$$
F =
\begin{bmatrix}
2.85676640e-08 & 8.16807703e-08 & -2.70214498e-04 \\
7.73562020e-07 & -1.18475595e-08 & -6.90451346e-03 \\
-2.29913842e-04 & 5.67618950e-03 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (8-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
$$
F =
\begin{bmatrix}
2.95595094e-08 & 9.40340450e-08 & -2.81635740e-04 \\
7.29470965e-07 & -1.26820694e-08 & -6.51452355e-03 \\
-2.23285567e-04 & 5.32723830e-03 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (7-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
| All matching Points |
|-----------------|
|
|
#### Results of scene2
$$
F =
\begin{bmatrix}
1.11400214e-07 & -2.28128910e-07 & -8.79454586e-04 \\
-1.31336344e-06 & 1.78767507e-07 & 1.40938853e-02 \\
1.54856560e-03 & -1.20843395e-02 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (8-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
$$
F =
\begin{bmatrix}
7.68908632e-08 & 9.34272317e-08 & -8.13304837e-04 \\
-1.07629532e-06 & 1.14161738e-07 & 9.43763507e-03 \\
1.18208751e-03 & -8.42666246e-03 & 1.00000000e+00
\end{bmatrix}
$$
| Viewpoint 1 (Points) | Viewpoint 2 (7-point Algorithm) |
|--------------------------|------------------------------|
|
|
|
| All matching Points |
|-----------------|
|
|
## Q6: Bonus 2 - Stress test the hyperparameters of COLMAP
Below are some questions that I have experimented with COLMAP.
#### Question 1: What happens if we reduce number of input images?
I test it using the first scene. The original reconsturction contains 40 images. If I reduce the number of images to 20, the point cloud become less dense and the accuracy of the extrinsic reconstruction slightly decreases. If I reduce the number of images to 10, the point cloud become more sparse and the accuracy of the extrinsic reconstruction decreases more. The results are shown below.
| 20 images | 10 images |
|-----------|-----------|
|
|
|
#### Question 2: Under what scenario and conditions does the reconstruction pipeline breaks?
I have tried many times, and mainly met two conditions that break the reconstruction pipeline. First, the feature extraction and matching process may fail if the object we want to reconstruct is too smooth or the texture is too similar. The results are shown below.
| Example image | Reconstruction |
|--------------|----------------|
|
|
|
Second, the bundle adjustment process may fail if there is not enough overlap between the images or the images are too far apart. Example result is shown below. From the colmap output, we can see that although there are 8 images successfully triangulated, the 9th image broke the bundle adjustment process, which make COLMAP to reinitialize.
| COLMAP Output |
|---------------|
|
|
## How to run
- main.py:
- python main.py [-q] [0, 1, 2, 3] [-r] [root_path] [-sq] [a, b, None]
- When you choose q=0, it will run all the questions, otherwise, it will run the question you choose. When sq is None, it will run all the sub-questions, otherwise, it will run the sub-question you choose.
- For the root path, it should be structured like this:
- root_path
- q1
- ...
- q3
- ...
- q5
- ...
- annotation
- ...