Q1: 8-point and 7-point algorithm (40 points)

(A1) F matrix using 8-point algorithm (15 points)

First, normalize the input point coordinates.

For each corresponding point pair $(x_1, y_1) \leftrightarrow (x_2, y_2)$, we get one row of the matrix A: $\begin{bmatrix} x_1 x_2 & x_1 y_2 & x_1 & y_1 x_2 & y_1 y_2 & y_1 & x_2 & y_2 & 1 \end{bmatrix}$

Next, apply SVD to the matrix A. The solution vector $f$ is the last column of V (the singular vector corresponding to the smallest singular value of A), reshaped into a 3×3 matrix. F must have rank 2, so we enforce the singularity condition by zeroing the smallest singular value of the estimated F.

Finally, unscale (undo the normalization transforms) to get the output F.

pic1

pic1

(A2) E matrix using 8-point algorithm (5 points)

We get E from $E = K_2^{T} F K_1 $

$E = \begin{bmatrix} -0.41524274 & -8.81748894 & -2.15055808 \\ -1.3803559 & 0.79067134 & 69.08265036 \\ -3.58890876 & -68.47438701 & 1 \end{bmatrix}$

(B) 7-point algorithm (20 points)

In the 7-point algorithm, we build the matrix A the same way as in the 8-point algorithm and apply SVD to it. Then, we take the last two columns of V (the two null-space solutions f1 and f2).

The singularity constraint turns into a cubic polynomial equation: for $F = a f_1 + (1-a) f_2$, require $\det(F) = 0$. We use np.polynomial.polynomial.polyroots to solve for the roots, which yields up to three real solutions for F. We then singularize and unscale each candidate F as in the 8-point algorithm. Finally, we compute the sum of squared distances between the corresponding points and the estimated epipolar lines to select the most appropriate F.
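One common way to get the cubic's coefficients is to evaluate the determinant at four values of $a$ and interpolate; the sketch below does that and is not necessarily how the assignment code proceeds (`f1`, `f2` are the two null-space vectors):

```python
import numpy as np

def seven_point_candidates(f1, f2):
    """All F = a*F1 + (1-a)*F2 with det(F) = 0, for null vectors f1, f2."""
    F1, F2 = f1.reshape(3, 3), f2.reshape(3, 3)
    # det(a*F1 + (1-a)*F2) is a cubic in a: recover its coefficients
    # exactly by fitting a degree-3 polynomial through 4 samples.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [np.linalg.det(a * F1 + (1 - a) * F2) for a in xs]
    coeffs = np.polynomial.polynomial.polyfit(xs, ys, 3)
    roots = np.polynomial.polynomial.polyroots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-8].real   # keep real roots only
    return [a * F1 + (1 - a) * F2 for a in real]
```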

pic1

pic1

Q2: RANSAC with 7-point and 8-point algorithm (20 points)

In each iteration, we first randomly choose the minimal number of points (eight for the 8-point algorithm, seven for the 7-point algorithm) and estimate a candidate matrix F from them. To determine whether a point pair is an inlier or an outlier, we use the epipolar error, which measures how well a point $x'$ in the second image aligns with the epipolar line of the corresponding point $x$ in the first image. We compute the error the same way as in the 7-point algorithm, that is, the sum of squared distances between the corresponding points and the estimated epipolar lines.

Set a threshold on the error value to decide whether a point pair is an inlier or an outlier (the output images below use a tolerance of 1.5). Each iteration keeps the candidate F only if it yields more inliers than the previous best; otherwise the F matrix is not updated.
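The loop can be sketched as follows, assuming the symmetric squared point-to-epipolar-line distance as the error and the $x_1^T F x_2 = 0$ convention; `solver` stands in for either the 8-point or 7-point routine, and all names are illustrative:

```python
import numpy as np

def epipolar_errors(F, pts1, pts2):
    """Symmetric squared point-to-epipolar-line distance per pair."""
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    l1 = h2 @ F.T                       # epipolar lines in image 1: F x2
    l2 = h1 @ F                         # epipolar lines in image 2: F^T x1
    d1 = np.sum(h1 * l1, axis=1) ** 2 / (l1[:, 0] ** 2 + l1[:, 1] ** 2)
    d2 = np.sum(h2 * l2, axis=1) ** 2 / (l2[:, 0] ** 2 + l2[:, 1] ** 2)
    return d1 + d2

def ransac_f(pts1, pts2, solver, n_sample, iters=10000, tol=1.5, seed=0):
    """Keep the F with the most inliers over `iters` random samples."""
    rng = np.random.default_rng(seed)
    best_F, best_in = None, np.zeros(len(pts1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(pts1), n_sample, replace=False)
        F = solver(pts1[idx], pts2[idx])        # e.g. the 8-point routine
        inliers = epipolar_errors(F, pts1, pts2) < tol
        if inliers.sum() > best_in.sum():       # only update on improvement
            best_F, best_in = F, inliers
    return best_F, best_in
```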

After 10000 iterations with a tolerance of 1.5 (in the picture below, the plotted points are randomly selected from the inliers), we get $F = \begin{bmatrix} 1.33115615 \times 10^{-7} & -2.19777589 \times 10^{-6} & -1.79075692 \times 10^{-3} \\ 3.55640870 \times 10^{-6} & 3.30194540 \times 10^{-7} & -1.59158838 \times 10^{-2} \\ 1.62019125 \times 10^{-3} & 1.54097757 \times 10^{-2} & 1 \end{bmatrix}$

pic1

pic1

Q3: Triangulation (20 points)

Given C1, pts1, C2, and pts2 (the point pairs obtained by projecting the 3D points through the camera matrices C1 and C2), for each correspondence $(x_1, y_1) \leftrightarrow (x_2, y_2)$ we build $A = \begin{bmatrix} y_2 C_2^{(3)} - C_2^{(2)} \\ C_2^{(1)} - x_2 C_2^{(3)} \\ y_1 C_1^{(3)} - C_1^{(2)} \\ C_1^{(1)} - x_1 C_1^{(3)} \end{bmatrix}$, where $C_i^{(k)}$ denotes the $k$-th row of $C_i$.

Then the corresponding 3D point is the last row of $V^T$ from the SVD of A (the right singular vector with the smallest singular value), divided by its homogeneous coordinate.
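The construction above amounts to linear (DLT) triangulation; a minimal sketch, assuming `C1`, `C2` are 3×4 camera matrices and `pts1`, `pts2` are N×2 arrays:

```python
import numpy as np

def triangulate(C1, pts1, C2, pts2):
    """Linear (DLT) triangulation: one 3D point per correspondence."""
    out = np.zeros((len(pts1), 3))
    for i, ((x1, y1), (x2, y2)) in enumerate(zip(pts1, pts2)):
        # Each image point contributes two rows built from rows of C.
        A = np.array([y2 * C2[2] - C2[1],
                      C2[0] - x2 * C2[2],
                      y1 * C1[2] - C1[1],
                      C1[0] - x1 * C1[2]])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]                   # homogeneous 3D point (last row of V^T)
        out[i] = X[:3] / X[3]        # dehomogenize
    return out
```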

pic1

Q4: Reconstruct your own scene! (20 points)

pic1

gif1

pic1

gif1

Q5: Bonus 1 - Fundamental matrix estimation on your own images. (10 points)

Apply the SIFT detector to compute keypoints and descriptors. Then use BFMatcher to compute candidate matches based on Euclidean distance. Use the matched indices to get the corresponding points in image1 and image2.

Finally, use RANSAC and eight-point algorithm to get matrix F and inliers. The epipolar lines from the estimated F are as follows.

pic1

pic1

Q6: Bonus 2 - Stress test the hyperparameters of COLMAP (10 points)

(1) What happens if we reduce number of input images?

The following gifs show the reconstruction of the toolbox from 2, 4, and 10 input images. With very few input images, the reconstruction shows only the camera positions, with no 3D points; with a few more images, it reconstructs a very sparse set of 3D points; as the number of images grows further, the reconstructed object becomes clearer and more detailed.

gif1

gif1

gif1

(2) What happens if we play with various edge threshold parameters?

If we reduce the edge_threshold parameter in the feature extraction stage, fewer features are detected.

For example, with the ten-image input:

| edge_threshold | features per image | feature matching time |
| --- | --- | --- |
| 1.0 | 0 | 0.002 s |
| 2.0 | 7522–9773 | 8.934 s |
| 4.0 | 9508–11878 | 10.224 s |
| 8.0 | 8391–13053 | 12.879 s |

We can see that when edge_threshold is too small, almost no features are detected; as it increases, more features are detected until the count settles into a relatively stable range.
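For reference, COLMAP exposes this parameter on the command line as `SiftExtraction.edge_threshold`; a sweep like the one above can be scripted roughly as follows (the database and image paths are placeholders, not the actual project layout):

```shell
# Sweep the SIFT edge threshold over the values tested above.
# db_*.db and images/ are placeholder paths.
for t in 1.0 2.0 4.0 8.0; do
  colmap feature_extractor \
      --database_path "db_${t}.db" \
      --image_path images/ \
      --SiftExtraction.edge_threshold "${t}"
done
```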

pic1

When edge_threshold=1.0, COLMAP cannot reconstruct any 3D points; when edge_threshold=2.0, the output is the first gif below; when edge_threshold=8.0, the output is the second gif below. The number of reconstructed points increases with the threshold.

gif1

gif1