In this assignment, you will implement methods to estimate the fundamental matrix from corresponding points in two images. Using the fundamental matrix and calibrated intrinsics (provided), you will compute the essential matrix. Then, you will implement RANSAC to improve your algorithm and compute a 3D metric reconstruction from 2D correspondences using triangulation.
\[ F = \begin{bmatrix} -1.19265833 \times 10^{-7} & -2.53255522 \times 10^{-6} & 2.07584109 \times 10^{-4} \\ -3.96465204 \times 10^{-7} & 2.27096267 \times 10^{-7} & 2.48867750 \times 10^{-2} \\ -1.06890748 \times 10^{-3} & -2.30052117 \times 10^{-2} & 1.00000000 \end{bmatrix} \]
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
\[ F = \begin{bmatrix} 7.19006698 \times 10^{-7} & -1.90583125 \times 10^{-7} & 2.36215166 \times 10^{-3} \\ -1.88159055 \times 10^{-6} & -8.60510729 \times 10^{-7} & -8.45685523 \times 10^{-3} \\ -3.98058321 \times 10^{-3} & 9.46500248 \times 10^{-3} & 1.00000000 \end{bmatrix} \]
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
Brief explanation of your implementation:
Start with at least 8 pairs of corresponding points \( \mathbf{x}_1^i = (x_1^i, y_1^i, 1) \) from the first image and \( \mathbf{x}_2^i = (x_2^i, y_2^i, 1) \) from the second image, each of which must satisfy the epipolar constraint \( \mathbf{x}_2^T F \mathbf{x}_1 = 0 \). Expanding this gives one linear equation per point pair: \( x_2^i x_1^i f_{11} + x_2^i y_1^i f_{12} + x_2^i f_{13} + y_2^i x_1^i f_{21} + y_2^i y_1^i f_{22} + y_2^i f_{23} + x_1^i f_{31} + y_1^i f_{32} + f_{33} = 0 \). Stacking the equations for all \( N \) pairs gives \( A \mathbf{f} = 0 \), where \( A \) is an \( N \times 9 \) matrix and \( \mathbf{f} \) is the row-major vectorized form of \( F \). Solve this system using SVD: the solution is the right singular vector of \( A \) corresponding to the smallest singular value. Finally, impose the rank-2 constraint by taking the SVD of the reshaped \( F \), setting its smallest singular value to zero, and recomposing, which gives the final fundamental matrix \( F \).
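The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the submitted code: the function name `eight_point` and the `(N, 2)` array layout of `pts1` and `pts2` are assumptions, and no coordinate normalization is applied because none is described above.

```python
import numpy as np

def eight_point(pts1, pts2):
    """Estimate F from N >= 8 correspondences so that x2^T F x1 = 0.

    pts1, pts2: (N, 2) arrays of image coordinates (assumed layout).
    """
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    # One row per correspondence, ordered as f11, f12, f13, f21, ..., f33.
    A = np.stack([x2 * x1, x2 * y1, x2,
                  y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones_like(x1)], axis=1)
    # Solution of A f = 0: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Rank-2 constraint: zero the smallest singular value of F and recompose.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    # F is only defined up to scale; unit Frobenius norm is used here.
    return F / np.linalg.norm(F)
```

Dividing by `F[2, 2]` instead of the Frobenius norm would reproduce the scaling of the matrices shown in this report, where the bottom-right entry is 1, but it fails when that entry is zero.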
Submission: Provide your estimated essential matrix \( E \).
E for bench: \[ \begin{bmatrix} -0.18528359 & -3.93441194 & -0.95959081 \\ -0.61592238 & 0.35280189 & 30.82505761 \\ -1.60139078 & -30.55364717 & 0.44620549 \end{bmatrix} \]
E for remote: \[ \begin{bmatrix} 1.01667679 & -0.26948489 & 2.9461563 \\ -2.66057249 & -1.21676375 & -11.37262789 \\ -5.82753537 & 10.6169466 & 0.40187929 \end{bmatrix} \]
Brief explanation of your implementation.
Given the fundamental matrix \( F \) and the intrinsic camera matrices \( K_1 \) and \( K_2 \), the essential matrix \( E \) is computed using the relation \( E = K_2^T F K_1 \).
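As a one-line sketch of this relation (the function name `essential_from_fundamental` is hypothetical, and `F` is assumed to follow the convention \( \mathbf{x}_2^T F \mathbf{x}_1 = 0 \) in pixel coordinates):

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """Return E = K2^T F K1 for 3x3 arrays F, K1, K2."""
    return K2.T @ F @ K1
```

An ideal essential matrix has two equal singular values and one zero singular value; inspecting `np.linalg.svd(E, compute_uv=False)` is a quick sanity check on an estimated \( E \).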
\[ F = \begin{bmatrix} -3.35078059 \times 10^{-9} & -2.91545005 \times 10^{-6} & -6.74891733 \times 10^{-3} \\ 3.51682683 \times 10^{-6} & -8.84237457 \times 10^{-7} & -1.50044163 \times 10^{-2} \\ 9.09398538 \times 10^{-3} & 1.48748649 \times 10^{-2} & 1.00000000 \end{bmatrix} \]
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
\[ F = \begin{bmatrix} 1.06836663 \times 10^{-7} & -3.30854241 \times 10^{-6} & -1.37397509 \times 10^{-3} \\ 4.79268901 \times 10^{-6} & 1.63305018 \times 10^{-7} & -1.67456794 \times 10^{-2} \\ 1.12974017 \times 10^{-3} & 1.64136613 \times 10^{-2} & 1.00000000 \end{bmatrix} \]
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
Brief explanation of your implementation:
The 7-point algorithm estimates the fundamental matrix \( F \) using exactly 7 pairs of corresponding points \( \mathbf{x}_1^i \) and \( \mathbf{x}_2^i \) from two images. Writing the epipolar constraint \( \mathbf{x}_2^T F \mathbf{x}_1 = 0 \) for each pair gives the linear system \( A \mathbf{f} = 0 \), where \( A \) is a \( 7 \times 9 \) matrix and \( \mathbf{f} \) is the vectorized form of \( F \). Since the system is underdetermined, its null space is two-dimensional, spanned by two matrices \( F_1 \) and \( F_2 \), and the solution is a linear combination \( F = \alpha F_1 + (1 - \alpha) F_2 \). Imposing the rank-2 constraint \( \det(F) = 0 \) results in a cubic polynomial in \( \alpha \), which has up to 3 real solutions. Each real solution gives a candidate fundamental matrix \( F \); I calculate the reprojection error of each candidate and take the \( F \) with the least error as the answer.
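A minimal sketch of the candidate-generation step, assuming `(7, 2)` input arrays and a hypothetical function name `seven_point`. The cubic's coefficients are recovered here by evaluating the determinant at four values of \( \alpha \) and fitting a degree-3 polynomial exactly, which is one convenient option rather than necessarily what the submitted code does. Selecting among the candidates by error is left to the caller.

```python
import numpy as np

def seven_point(pts1, pts2):
    """Return the real candidate F matrices (up to 3) from exactly 7 pairs."""
    x1, y1 = pts1[:, 0], pts1[:, 1]
    x2, y2 = pts2[:, 0], pts2[:, 1]
    A = np.stack([x2 * x1, x2 * y1, x2,
                  y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones(7)], axis=1)
    # A is 7x9, so the last two right singular vectors span its null space.
    _, _, Vt = np.linalg.svd(A)
    F1 = Vt[-1].reshape(3, 3)
    F2 = Vt[-2].reshape(3, 3)

    def det_at(a):
        return np.linalg.det(a * F1 + (1 - a) * F2)

    # det(a*F1 + (1-a)*F2) is a cubic in a: four samples determine it exactly.
    samples = np.array([-1.0, 0.0, 1.0, 2.0])
    coeffs = np.polyfit(samples, [det_at(a) for a in samples], 3)
    candidates = []
    for root in np.roots(coeffs):
        if abs(root.imag) < 1e-8:  # keep real roots only
            F = root.real * F1 + (1 - root.real) * F2
            candidates.append(F / np.linalg.norm(F))
    return candidates
```

Every candidate satisfies the seven epipolar equations by construction, so the candidates have to be compared on additional correspondences or on another error measure.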
Brief explanation of your RANSAC implementation:
In each iteration, randomly select a minimal subset of point correspondences (7 for the 7-point algorithm, 8 for the 8-point algorithm) and compute the fundamental matrix \( F \) from it. Count the inliers by evaluating the Sampson distance of every correspondence with respect to \( F \): a point pair is considered an inlier if its Sampson distance is below a threshold of 1. After all iterations, select the fundamental matrix \( F \) with the largest number of inliers.
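The loop can be sketched as below. Several details are assumptions made for illustration: the names `fit_linear`, `sampson_distance`, and `ransac_fundamental`, the iteration count, the fixed seed, and the use of the squared first-order Sampson error as the "distance". A compact linear 8-point solver is included only so the block runs on its own.

```python
import numpy as np

def fit_linear(pts1, pts2):
    """Compact linear solver for x2^T F x1 = 0 with rank-2 enforcement."""
    x1, y1, x2, y2 = pts1[:, 0], pts1[:, 1], pts2[:, 0], pts2[:, 1]
    A = np.stack([x2 * x1, x2 * y1, x2, y2 * x1, y2 * y1, y2,
                  x1, y1, np.ones(len(x1))], axis=1)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt

def sampson_distance(F, pts1, pts2):
    """Squared first-order (Sampson) error of each correspondence."""
    h1 = np.column_stack([pts1, np.ones(len(pts1))])
    h2 = np.column_stack([pts2, np.ones(len(pts2))])
    Fx1 = h1 @ F.T   # row i is F @ x1_i
    Ftx2 = h2 @ F    # row i is F^T @ x2_i
    num = np.sum(h2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def ransac_fundamental(pts1, pts2, estimate=fit_linear, sample_size=8,
                       n_iters=500, threshold=1.0, seed=0):
    """Return the F with the most inliers and the boolean inlier mask."""
    rng = np.random.default_rng(seed)
    best_F = None
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(pts1), size=sample_size, replace=False)
        candidates = estimate(pts1[idx], pts2[idx])
        # An 8-point solver returns one matrix; a 7-point solver may return a list.
        if isinstance(candidates, np.ndarray):
            candidates = [candidates]
        for F in candidates:
            inliers = sampson_distance(F, pts1, pts2) < threshold
            if inliers.sum() > best_inliers.sum():
                best_F, best_inliers = F, inliers
    return best_F, best_inliers
```

Passing a 7-point solver as `estimate` with `sample_size=7` reuses the same loop, since each returned candidate is scored separately.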
8-point algorithm
\[ F = \begin{bmatrix} -1.10021407 \times 10^{-7} & -2.44234265 \times 10^{-6} & 1.69374514 \times 10^{-4} \\ -4.88960046 \times 10^{-7} & 2.70052239 \times 10^{-7} & 2.47787290 \times 10^{-2} \\ -1.02184587 \times 10^{-3} & -2.29491003 \times 10^{-2} & 1.00000000 \end{bmatrix} \]
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
7-point algorithm
\[ F = \begin{bmatrix} -8.02091319 \times 10^{-8} & -1.96766042 \times 10^{-6} & 5.60523818 \times 10^{-5} \\ -8.69043798 \times 10^{-7} & 4.77473931 \times 10^{-7} & 2.42694279 \times 10^{-2} \\ -9.34278636 \times 10^{-4} & -2.25996403 \times 10^{-2} & 1.00000000 \end{bmatrix} \]
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
Brief explanation of your implementation.
Given 2D correspondences and camera matrices, triangulation recovers the 3D point \( \mathbf{X} \) that satisfies \( \mathbf{x}_1 \approx P_1 \mathbf{X} \) and \( \mathbf{x}_2 \approx P_2 \mathbf{X} \). This gives a linear system \( A \mathbf{X} = 0 \), where \( A \) is built from the projection equations. Solving this via Singular Value Decomposition (SVD) gives \( \mathbf{X} \) as the right singular vector corresponding to the smallest singular value. To get the 3D point in non-homogeneous coordinates, divide \( \mathbf{X} = (X, Y, Z, W) \) by \( W \), yielding \( \mathbf{X}_{\text{3D}} = \left( \frac{X}{W}, \frac{Y}{W}, \frac{Z}{W} \right) \).
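A minimal sketch of this linear triangulation, assuming `3 x 4` projection matrices and `(N, 2)` point arrays; the function name `triangulate` is hypothetical. Each image point contributes two rows to \( A \), obtained by eliminating the unknown scale from \( \mathbf{x} \approx P \mathbf{X} \).

```python
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    """Return an (N, 3) array of 3D points from two views."""
    X = np.zeros((len(pts1), 3))
    for i, ((x1, y1), (x2, y2)) in enumerate(zip(pts1, pts2)):
        # Rows follow from x * (P[2] @ X) = P[0] @ X and y * (P[2] @ X) = P[1] @ X.
        A = np.stack([x1 * P1[2] - P1[0],
                      y1 * P1[2] - P1[1],
                      x2 * P2[2] - P2[0],
                      y2 * P2[2] - P2[1]])
        # Homogeneous solution: right singular vector of the smallest singular value.
        _, _, Vt = np.linalg.svd(A)
        Xh = Vt[-1]
        X[i] = Xh[:3] / Xh[3]  # divide by W
    return X
```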
Submit multi-view input images and a gif of the reconstructed 3D structure and camera locations using COLMAP.
Example Multi-view images | Output |
---|---|
Show epipolar lines and briefly explain your implementation.
The script first reads two images, converts them to grayscale, and applies the SIFT detector to extract keypoints and descriptors. It uses BFMatcher with knnMatch to find descriptor matches and retains good matches via the ratio test (a match is kept only if its best distance is less than 0.75 times the second-best distance). The matched keypoints \( \text{pts1} \) and \( \text{pts2} \) are passed to the function `ransacF_8`, which estimates the fundamental matrix \( F \) using the 8-point algorithm with RANSAC to remove outliers.
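The ratio test can be illustrated without OpenCV. The sketch below is a NumPy stand-in for brute-force 2-nearest-neighbour matching followed by the 0.75 filter, not the code used for the results; `ratio_test_matches`, `desc1`, and `desc2` are hypothetical names, and `desc2` is assumed to contain at least two descriptors.

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.75):
    """Return an (M, 2) array of index pairs (i, j) that pass the ratio test."""
    # Pairwise Euclidean distances between all descriptors, shape (N1, N2).
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    # Indices of the two nearest descriptors in desc2 for each row of desc1.
    nearest = np.argsort(dists, axis=1)[:, :2]
    rows = np.arange(len(desc1))
    best = dists[rows, nearest[:, 0]]
    second = dists[rows, nearest[:, 1]]
    # Keep a match only if it is clearly better than the runner-up.
    keep = best < ratio * second
    return np.column_stack([rows[keep], nearest[keep, 0]])
```

The surviving index pairs select the keypoint coordinates that form \( \text{pts1} \) and \( \text{pts2} \).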
Viewpoint 1 (Points) | Viewpoint 2 (Epipolar Lines) |
---|---|
![]() | ![]() |
![]() | ![]() |