Normalization helps improve the numerical stability of the algorithm. I compute a normalization matrix \( T \) for each set of points so that the points are centered and have an average distance of \(\sqrt{2}\) from the origin. The normalization matrix \( T \) is defined as follows:
\[ T = \begin{bmatrix} \frac{\sqrt{2}}{\text{avg_distance}} & 0 & -\frac{\sqrt{2}}{\text{avg_distance}} \cdot \text{mean_x} \\ 0 & \frac{\sqrt{2}}{\text{avg_distance}} & -\frac{\sqrt{2}}{\text{avg_distance}} \cdot \text{mean_y} \\ 0 & 0 & 1 \end{bmatrix} \]
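As a minimal sketch (assuming the points are given as an \( N \times 2 \) NumPy array; the helper name is illustrative), this matrix can be computed as:

```python
import numpy as np

def normalization_matrix(points):
    # Center the points and scale so the average distance to the origin is sqrt(2)
    mean_x, mean_y = points.mean(axis=0)
    avg_distance = np.linalg.norm(points - [mean_x, mean_y], axis=1).mean()
    s = np.sqrt(2) / avg_distance
    return np.array([[s, 0, -s * mean_x],
                     [0, s, -s * mean_y],
                     [0, 0, 1]])
```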
I then build the matrix \( A \) from the normalized point correspondences; each correspondence \( (x_1', y_1') \leftrightarrow (x_2', y_2') \) contributes one row of the form:
\[ \begin{bmatrix} x_2' \cdot x_1' & x_2' \cdot y_1' & x_2' & y_2' \cdot x_1' & y_2' \cdot y_1' & y_2' & x_1' & y_1' & 1 \end{bmatrix} \]
Using Singular Value Decomposition (SVD), I decompose \( A \) and take the last row of \( V^T \) (the right singular vector associated with the smallest singular value), reshaped into a \( 3 \times 3 \) matrix, as the solution for \( F \). To ensure that \( F \) is a rank-2 matrix, I impose the rank-2 constraint by setting the smallest singular value to zero:
\[ F_{\text{normalized}} = U \cdot \text{diag}(s_1, s_2, 0) \cdot V^T \]
Finally, I denormalize \( F \) by applying the normalization matrices \( T_1 \) and \( T_2 \):
\[ F = T_2^T \cdot F_{\text{normalized}} \cdot T_1 \]
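Putting these steps together, a sketch of the normalized eight-point estimation could look like the following (it reuses the `normalization_matrix` helper sketched above; all names are illustrative, not the exact implementation):

```python
def eight_point(pts1, pts2):
    T1, T2 = normalization_matrix(pts1), normalization_matrix(pts2)

    # Normalize the points (homogeneous coordinates)
    ones = np.ones((len(pts1), 1))
    p1 = (T1 @ np.hstack([pts1, ones]).T).T
    p2 = (T2 @ np.hstack([pts2, ones]).T).T

    # One row of A per correspondence
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(pts1))])

    # Solution: right singular vector associated with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F_normalized = Vt[-1].reshape(3, 3)

    # Enforce rank 2 by zeroing the smallest singular value
    U, s, Vt = np.linalg.svd(F_normalized)
    F_normalized = U @ np.diag([s[0], s[1], 0]) @ Vt

    # Denormalize
    return T2.T @ F_normalized @ T1
```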
Once I have \( F \), I can compute the Essential matrix \( E \) using the intrinsic camera matrices \( K_1 \) and \( K_2 \):
\[ E = K_2^T \cdot F \cdot K_1 \]
Similarly, I enforce the rank-2 constraint on \( E \) by setting the smallest singular value to zero.
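For instance, assuming `K1` and `K2` are \( 3 \times 3 \) NumPy arrays, this step might look like:

```python
E = K2.T @ F @ K1

# Enforce rank 2 on E by zeroing its smallest singular value
U, s, Vt = np.linalg.svd(E)
E = U @ np.diag([s[0], s[1], 0]) @ Vt
```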
To visualize the result, I draw the epipolar lines on the second image based on points from the first image. The epipolar line equation in the second image for a point \( x \) in the first image is:
\[ l = F \cdot x \]
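One possible way to draw such a line on the second image (a sketch; the helper name, color, and the assumption that the line is not vertical are all illustrative):

```python
import cv2
import numpy as np

def draw_epipolar_line(img2, F, point1):
    # l = F x, with x the homogeneous point from the first image
    a, b, c = F @ np.array([point1[0], point1[1], 1.0])

    # Intersect the line a*u + b*v + c = 0 with the left and right image borders
    h, w = img2.shape[:2]
    p_left = (0, int(round(-c / b)))
    p_right = (w - 1, int(round(-(c + a * (w - 1)) / b)))
    cv2.line(img2, p_left, p_right, (0, 255, 0), 1)
```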
The seven-point algorithm is used to estimate the fundamental matrix \( F \) between two views, given seven point correspondences. Since this is an underdetermined system, the algorithm generates multiple solutions for \( F \) by imposing the rank-2 constraint and solving a cubic polynomial.
To improve the stability of the algorithm, I normalize the points. The normalization matrix \( T \) centers the points and scales them so that they have an average distance of \( \sqrt{2} \) from the origin:
\[ T = \begin{bmatrix} \frac{\sqrt{2}}{\text{avg_distance}} & 0 & -\frac{\sqrt{2}}{\text{avg_distance}} \cdot \text{mean_x} \\ 0 & \frac{\sqrt{2}}{\text{avg_distance}} & -\frac{\sqrt{2}}{\text{avg_distance}} \cdot \text{mean_y} \\ 0 & 0 & 1 \end{bmatrix} \]
Using the normalized points, I construct the matrix \( A \) for the seven-point algorithm; each pair of corresponding points contributes one row of the form:
\[ \begin{bmatrix} x_2 \cdot x_1 & x_2 \cdot y_1 & x_2 & y_2 \cdot x_1 & y_2 \cdot y_1 & y_2 & x_1 & y_1 & 1 \end{bmatrix} \]
I perform SVD on \( A \); because seven correspondences leave a two-dimensional null space, the last two rows of \( V^T \), reshaped to \( 3 \times 3 \), give two candidate fundamental matrices, \( F_1 \) and \( F_2 \), that span the space of possible solutions to the seven-point system.
I combine \( F_1 \) and \( F_2 \) using a scalar \( \alpha \) and compute the determinant of the combination:
\[ \det(\alpha F_1 + (1 - \alpha) F_2) = 0 \]
This results in a cubic polynomial, and I find the real roots of this polynomial to generate potential solutions for \( F \).
For each real root \( \alpha \), I construct a candidate \( F \) matrix, apply the rank-2 constraint by setting the smallest singular value to zero, and denormalize the result using the normalization matrices \( T_1 \) and \( T_2 \):
\[ F = T_2^T \cdot F_{\text{normalized}} \cdot T_1 \]
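A sketch of how the real roots can be recovered and turned into denormalized candidates: since \( \det(\alpha F_1 + (1 - \alpha) F_2) \) is a cubic in \( \alpha \), its coefficients can be obtained by evaluating the determinant at four values of \( \alpha \) and fitting a degree-3 polynomial (a common implementation trick; names are illustrative):

```python
def seven_point_candidates(F1, F2, T1, T2):
    # det(a*F1 + (1-a)*F2) is a cubic in a: fit its coefficients from four samples
    alphas = np.array([0.0, 1.0, 2.0, 3.0])
    dets = [np.linalg.det(a * F1 + (1.0 - a) * F2) for a in alphas]
    coeffs = np.polyfit(alphas, dets, 3)

    candidates = []
    for a in np.roots(coeffs):
        if abs(np.imag(a)) > 1e-8:        # keep only the real roots
            continue
        a = np.real(a)
        F_normalized = a * F1 + (1.0 - a) * F2

        # Enforce rank 2 (each real root should already be nearly rank-deficient)
        U, s, Vt = np.linalg.svd(F_normalized)
        F_normalized = U @ np.diag([s[0], s[1], 0]) @ Vt

        # Denormalize with the two normalization matrices
        candidates.append(T2.T @ F_normalized @ T1)
    return candidates
```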
I calculate the epipolar error for each candidate \( F \) by measuring the distance of corresponding points from the epipolar lines. The matrix with the lowest average error is selected as the best estimate for \( F \).
Finally, I visualize the epipolar lines on the second image. For each point in the first image, the corresponding epipolar line in the second image is computed using the fundamental matrix \( F \):
\[ l = F \cdot x \]
The lines are drawn on the image to help verify the correctness of \( F \) visually.
The output images, with the epipolar points and lines visualized for each dataset, are saved under the directory output/q1/q1_b.
In this implementation, the RANSAC algorithm is used to estimate the fundamental matrix \( F \) for each image set using both the 7-point and 8-point algorithms. RANSAC handles noisy correspondences by iteratively selecting a subset of points, estimating \( F \), and determining inliers based on a defined threshold.
The parameters for RANSAC, including the threshold and the maximum number of iterations (max_iterations), are set. The threshold controls the maximum allowable distance for a point to be considered an inlier.
In each iteration, a random subset of correspondences is chosen (7 or 8 points based on the algorithm used). The chosen points are normalized to improve numerical stability. The fundamental matrix \( F \) is estimated using either the 7-point or 8-point method:
- For the 8-point algorithm, a single matrix \( F \) is computed.
- For the 7-point algorithm, multiple candidate matrices \( F \) are generated by solving for the real roots of a cubic polynomial.
For each candidate \( F \), I calculate the epipolar distance for all points. The epipolar distance \( d \) between a point and its corresponding epipolar line is calculated as follows:
\[ d = \frac{|p_2^T F p_1|}{\sqrt{(F p_1)_1^2 + (F p_1)_2^2 + (F^T p_2)_1^2 + (F^T p_2)_2^2}} \]
Points with a distance \( d \) below the threshold are considered inliers.
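A vectorized sketch of this distance and the inlier test, assuming `pts1` and `pts2` are \( N \times 2 \) arrays and `threshold` is the RANSAC threshold (the helper name is illustrative):

```python
def epipolar_distances(F, pts1, pts2):
    ones = np.ones((len(pts1), 1))
    p1 = np.hstack([pts1, ones])          # homogeneous points in image 1
    p2 = np.hstack([pts2, ones])          # homogeneous points in image 2

    Fp1 = p1 @ F.T                        # rows are F @ p1_i (lines in image 2)
    Ftp2 = p2 @ F                         # rows are F.T @ p2_i (lines in image 1)

    numerator = np.abs(np.sum(p2 * Fp1, axis=1))          # |p2^T F p1|
    denominator = np.sqrt(Fp1[:, 0]**2 + Fp1[:, 1]**2 +
                          Ftp2[:, 0]**2 + Ftp2[:, 1]**2)
    return numerator / denominator

inlier_mask = epipolar_distances(F, pts1, pts2) < threshold
```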
If the number of inliers for the current \( F \) exceeds the previous maximum, this \( F \) is stored as the best estimate, along with the inlier set. The inlier percentage for each iteration is also logged to track the effectiveness of the RANSAC iterations.
A correspondence is considered an inlier if the epipolar distance \( d \) is less than the specified threshold. This distance indicates how closely a point aligns with its corresponding epipolar line in the other image, providing a measure of geometric consistency.
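A skeleton of the full loop under these rules (assuming an estimator such as the `eight_point` sketch above, or a seven-point variant returning a list of candidates, together with the `epipolar_distances` helper):

```python
def ransac_fundamental(pts1, pts2, estimator, sample_size=8,
                       threshold=1.0, max_iterations=1000):
    rng = np.random.default_rng(0)
    best_F, best_inliers = None, np.zeros(len(pts1), dtype=bool)
    inlier_percentages = []               # logged per iteration

    for _ in range(max_iterations):
        idx = rng.choice(len(pts1), size=sample_size, replace=False)
        candidates = estimator(pts1[idx], pts2[idx])
        if not isinstance(candidates, list):
            candidates = [candidates]     # the 8-point estimator returns a single matrix

        for F in candidates:
            inliers = epipolar_distances(F, pts1, pts2) < threshold
            if inliers.sum() > best_inliers.sum():
                best_F, best_inliers = F, inliers

        inlier_percentages.append(100.0 * best_inliers.sum() / len(pts1))

    return best_F, best_inliers, inlier_percentages
```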
The best fundamental matrix \( F \) and the corresponding inliers are plotted. Additionally, a graph showing the percentage of inliers vs. the number of RANSAC iterations is generated to illustrate the convergence behavior of the algorithm for both the 7-point and 8-point methods.
Results are shown for the Bench, Hydrant, Ball, and Remote datasets.
In this task, I aim to compute the 3D coordinates of points from two sets of 2D correspondences by using two camera projection matrices. The triangulation algorithm allows us to reconstruct a 3D scene from these two 2D views.
For each pair of 2D correspondences \((x_1, y_1)\) from the first image and \((x_2, y_2)\) from the second image, I construct a matrix \( A \) that incorporates information from both camera matrices (\(P_1\) and \(P_2\)). Each correspondence generates two equations per camera matrix, resulting in a \(4 \times 4\) matrix \( A \):
$$ A = \begin{bmatrix} x_1 P_{13} - P_{11} \\ y_1 P_{13} - P_{12} \\ x_2 P_{23} - P_{21} \\ y_2 P_{23} - P_{22} \end{bmatrix} $$
where \( P_{ij} \) denotes the \( j \)-th row of camera matrix \( P_i \). The homogeneous 3D point is then recovered as the right singular vector of \( A \) associated with its smallest singular value and dehomogenized by dividing by its last coordinate.
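A sketch of this construction, solving each \( 4 \times 4 \) system with SVD and dehomogenizing the result (the function name is illustrative):

```python
def triangulate(P1, P2, pts1, pts2):
    points_3d = []
    for (x1, y1), (x2, y2) in zip(pts1, pts2):
        # Two rows per camera, built from the rows of its projection matrix
        A = np.vstack([x1 * P1[2] - P1[0],
                       y1 * P1[2] - P1[1],
                       x2 * P2[2] - P2[0],
                       y2 * P2[2] - P2[1]])

        # Homogeneous 3D point: right singular vector of the smallest singular value
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        points_3d.append(X[:3] / X[3])    # dehomogenize
    return np.array(points_3d)
```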
A visualization of the correspondences between the two images is created by drawing lines between matching points on a combined image. This helps verify that the points in both images correspond correctly.
The reconstructed 3D points are visualized as a colored point cloud, providing a spatial representation of the scene captured from the two views.
In this task, I estimate the fundamental matrix \( F \) for a pair of images by detecting keypoints, matching them, and applying RANSAC to filter out outliers. I then visualize the epipolar lines using the estimated matrix \( F \).
I utilize the SIFT (Scale-Invariant Feature Transform) feature detector to find distinctive keypoints in each image and compute descriptors for these points. This allows us to uniquely identify and match points across the two images.
```python
import cv2

# Detect SIFT keypoints and compute their descriptors in both images
sift_detector = cv2.SIFT_create()
keypoints_photo1, descriptors_photo1 = sift_detector.detectAndCompute(photo1, None)
keypoints_photo2, descriptors_photo2 = sift_detector.detectAndCompute(photo2, None)
```
I use the FLANN-based matcher with KNN (k-nearest neighbors) to find potential matches between the two sets of descriptors. I apply a ratio test to filter out poor matches, keeping only those that pass the threshold.
The ratio test is defined as:
\[ \text{valid_matches} = \{ m | m.\text{distance} < \frac{n.\text{distance}}{1.5} \} \]
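A sketch of this matching step with OpenCV's FLANN matcher (the index and search parameters shown are common defaults for SIFT and may not be the exact values used here):

```python
# KD-tree index for SIFT descriptors (algorithm 1 = FLANN_INDEX_KDTREE)
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

# k-nearest-neighbor matching with k = 2, followed by the ratio test
knn_matches = flann.knnMatch(descriptors_photo1, descriptors_photo2, k=2)
valid_matches = [m for m, n in knn_matches if m.distance < n.distance / 1.5]
```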
Using the matched points, I estimate the fundamental matrix \( F \) using the RANSAC (Random Sample Consensus) algorithm. RANSAC helps us eliminate outlier matches by iteratively fitting the model and maximizing the number of inliers:
```python
F, inliers = cv2.findFundamentalMat(pts_photo1, pts_photo2, cv2.FM_RANSAC)
```
With the estimated fundamental matrix \( F \), I calculate the epipolar lines for a subset of inlier points. For points in one image, the epipolar line in the other image is given by:
\[ l' = F \cdot p \]
where \( p \) is a point in the first image and \( l' \) is the corresponding epipolar line in the second image.
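These lines can also be obtained for a batch of inlier points with OpenCV, for example (drawing a subset with the same border-intersection idea as before; variable names are assumptions):

```python
# Lines in the second image for points of the first image; each line is [a, b, c]
lines2 = cv2.computeCorrespondEpilines(pts_photo1.reshape(-1, 1, 2), 1, F).reshape(-1, 3)

h, w = photo2.shape[:2]
for a, b, c in lines2[:20]:               # draw a subset of the epipolar lines
    p_left = (0, int(-c / b))
    p_right = (w - 1, int(-(c + a * (w - 1)) / b))
    cv2.line(photo2, p_left, p_right, (0, 255, 0), 1)
```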
The resulting epipolar lines are shown below.
To test this, I used only 1/6 of the original dataset (which contains 76 images), significantly reducing the number of input images.