HW3: 3D reconstruction

Q1: 8-point and 7-point algorithms

(A1) F matrix using 8-point algorithm

[Bench] Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines)
[Remote] Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines)

(Left) Viewpoint 1 and (right) Viewpoint 2. The points and their corresponding epipolar lines share the same color. Each epipolar line passes through the corresponding point in the image from the other viewpoint.
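For a point \(x\) in one viewpoint, the corresponding epipolar line in the other viewpoint is \(l' = Fx\). A minimal sketch of how a drawn line can be computed (the point coordinates u, v and the image width w are placeholders; the line is assumed not to be vertical):

x1 = np.array([u, v, 1.0])  # annotated point in viewpoint 1 (homogeneous)
a, b, c = F @ x1            # epipolar line a*x + b*y + c = 0 in viewpoint 2
p_left = (0, int(round(-c / b)))             # intersection with the left image border
p_right = (w, int(round(-(a * w + c) / b)))  # intersection with the right image border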

  1. Step 1: Normalize the Points. Normalize the points using similarity transforms \(T\) and \(T'\) to ensure zero mean and unit variance:
pts1_normalized, T1 = normalize_points(pts1)
pts2_normalized, T2 = normalize_points(pts2)
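The helper normalize_points is not shown above; a minimal sketch of one possible implementation, assuming Hartley-style normalization (zero centroid, mean distance \(\sqrt{2}\) from the origin):

def normalize_points(pts):
    # Translate so the centroid is at the origin, then scale so the
    # mean distance from the origin is sqrt(2).
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - centroid, axis=1))
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (pts_h @ T.T)[:, :2], T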
  2. Step 2: Construct the Constraint Matrix A. Construct matrix \(A\) from the normalized points, where each correspondence provides one constraint (one row).
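Concretely, each normalized correspondence \((x, y) \leftrightarrow (x', y')\) contributes one row \([x'x, \; x'y, \; x', \; y'x, \; y'y, \; y', \; x, \; y, \; 1]\); a sketch:

x1, y1 = pts1_normalized[:, 0], pts1_normalized[:, 1]
x2, y2 = pts2_normalized[:, 0], pts2_normalized[:, 1]

# One row per correspondence, encoding x2^T F x1 = 0 with f = vec(F) row-major.
A = np.stack([x2 * x1, x2 * y1, x2,
              y2 * x1, y2 * y1, y2,
              x1, y1, np.ones_like(x1)], axis=1)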

  3. Step 3: Solve for F using SVD. Use SVD to solve \(Af = 0\) and obtain the fundamental matrix in the normalized space:

U, S, Vt = np.linalg.svd(A)
F_normalized = Vt[-1].reshape(3, 3)
  4. Step 4: Enforce the Rank-2 Constraint on F. Set the smallest singular value to zero to enforce rank 2:
U, S, Vt = np.linalg.svd(F_normalized)
S[2] = 0
F_normalized = U @ np.diag(S) @ Vt
  5. Step 5: Transform F Back to the Original Space. Transform F back to the original pixel space:
F = T2.T @ F_normalized @ T1  # undo the normalization: F = T'^T F_n T
F /= F[-1, -1]                # fix the overall scale so that F[2, 2] = 1
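As a quick sanity check, the recovered F should make the epipolar constraint nearly vanish on the annotated correspondences; a sketch:

pts1_h = np.hstack([pts1, np.ones((pts1.shape[0], 1))])
pts2_h = np.hstack([pts2, np.ones((pts2.shape[0], 1))])
residuals = np.abs(np.sum((pts2_h @ F) * pts1_h, axis=1))
print(residuals.max())  # should be close to zero for clean annotations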

(A2) E matrix using 8-point algorithm

Bench:

\(E_1 = \begin{bmatrix} -0.41524274 & -8.81748894 & -2.15055808 \\ -1.3803559 & 0.79067134 & 69.08265036 \\ -3.58890876 & -68.47438701 & 1. \end{bmatrix}\)

Remote:

\(E_2 = \begin{bmatrix} 2.52980651 & -0.67056175 & 7.33094879 \\ -6.62032788 & -3.02768481 &-28.29861829 \\ -14.50071107 & 26.41824808 & 1. \end{bmatrix}\)
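E follows from F and the camera intrinsics. A minimal sketch, assuming the intrinsics K1 and K2 of the two viewpoints are provided with the dataset:

# E = K'^T F K, rescaled so the bottom-right entry is 1 to match the
# matrices reported above.
E = K2.T @ F @ K1
E /= E[-1, -1]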

(B) 7-point algorithm

[Ball] Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines)
[Hydrant] Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines)

(Left) Viewpoint 1 and (right) Viewpoint 2. Points and their corresponding epipolar lines are shown in matching colors. Notice how each epipolar line passes through the corresponding point in the image from the other viewpoint.

  1. Step 1: Normalize the Points. Normalize the points using similarity transforms \(T\) and \(T'\) to ensure zero mean and unit variance, as in the 8-point algorithm.

  2. Step 2: Construct the Constraint Matrix A. Construct matrix \(A\) from the normalized points, where each correspondence provides one constraint. In this case, a \(7 \times 9\) matrix is constructed since there are 7 correspondences.

  3. Step 3: Solve for F using SVD. Use SVD to solve \(Af = 0\) and obtain the fundamental matrix in the normalized space. Since the null space is two-dimensional, two basis vectors \(f_1\) and \(f_2\) can be found:

U, S, Vt = np.linalg.svd(A)
F1 = Vt[-1].reshape(3, 3)
F2 = Vt[-2].reshape(3, 3)

The general solution for F is given by:

\(F = \lambda F_1 + (1 - \lambda) F_2\)

  4. Step 4: Enforce the Rank-2 Constraint on F. To ensure that F has rank 2, enforce the determinant condition \(\det(F) = 0\). This determinant is a third-order polynomial in \(\lambda\):

\(a_3 \lambda^3 + a_2 \lambda^2 + a_1 \lambda + a_0 = 0\)

The coefficients can be recovered by evaluating the determinant at a few values of \(\lambda\):

def det_polynomial(alpha):
    F = alpha * F1 + (1 - alpha) * F2
    return np.linalg.det(F)

# det(F) is a cubic in alpha; recover its coefficients by evaluating
# the polynomial at alpha = 0, 1, -1, 2 and solving the small system.
p0, p1, pm1, p2 = det_polynomial(0), det_polynomial(1), det_polynomial(-1), det_polynomial(2)
a0 = p0
a2 = (p1 + pm1) / 2 - p0
a3 = (p2 - 2 * p1 + p0 - 2 * a2) / 6
a1 = (p1 - pm1) / 2 - a3
coefficients = [a3, a2, a1, a0]  # highest degree first, as np.roots expects

Using np.roots(coefficients), the candidate values of \(\lambda\) are found. Only the real-valued roots are used.
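A minimal sketch of the root selection (the imaginary-part tolerance is an assumption):

roots = np.roots(coefficients)
lambdas = roots[np.abs(roots.imag) < 1e-8].real  # keep only the real roots

# One candidate F per real root, still in the normalized space.
Fs = [lam * F1 + (1 - lam) * F2 for lam in lambdas]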

  5. Step 5: Transform F Back to the Original Space. Transform each candidate F back to the original pixel space using the similarity transforms from Step 1.

  6. Step 6: Choose the Best Solution. Since multiple solutions for F may exist, choose the one that minimizes the epipolar error \(\sum_i |x_i'^{\top} F x_i|\):

errors = np.zeros(len(Fs))
for j, F in enumerate(Fs):
    error = 0
    for i in range(pts1.shape[0]):
        x1 = np.array([pts1[i][0], pts1[i][1], 1])
        x2 = np.array([pts2[i][0], pts2[i][1], 1])
        error += np.abs(x2.T @ F @ x1)
    errors[j] = error

F = Fs[np.argmin(errors)]

Q2: RANSAC with 7-point and 8-point algorithm

In all the results, the points and their corresponding epipolar lines are depicted in matching colors. The four visualized points are randomly sampled from the annotations, which may include some noisy labels. Note that only the valid, noise-free correspondences are matched accurately by the computed fundamental matrix: the epipolar lines pass through only the correct annotations.

Ball

Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (8pt algorithm)
Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (7pt algorithm)
% of inliers vs. # of RANSAC iterations (blue: 8pt, orange: 7pt)

\(F = \begin{bmatrix} -2.09107219e-08 & -3.02470117e-06 & -6.89123821e-03\\ 3.60956323e-06 & -9.30216369e-07 & -1.53286060e-02\\ 9.33512247e-03 & 1.52377635e-02 & 1.00000000e+00\\ \end{bmatrix}\)

Bench

Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (8pt algorithm)
Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (7pt algorithm)
% of inliers vs. # of RANSAC iterations (blue: 8pt, orange: 7pt)

\(F = \begin{bmatrix} 7.29040371e-07 & 8.32382175e-06 & -3.65908188e-03 \\ -8.52448262e-06 & 4.33718835e-06 & 7.11822616e-03 \\ 2.38509499e-03 & -9.74101756e-03 & 1.00000000e+00 \\ \end{bmatrix}\)

Hydrant

Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (8pt algorithm)
Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (7pt algorithm)
% of inliers vs. # of RANSAC iterations (blue: 8pt, orange: 7pt)

\(F = \begin{bmatrix} 6.35918894e-07 & 7.79037556e-06 & -6.51790205e-03\\ -7.26010653e-06 & 1.71216453e-06 & 7.68984491e-04\\ 6.34554497e-03 & -3.55478347e-03 & 1.00000000e+00\\ \end{bmatrix}\)

Remote

Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (8pt algorithm)
Viewpoint 1 & Viewpoint 2 (Both points and epipolar lines) (7pt algorithm)
% of inliers vs. # of RANSAC iterations (blue: 8pt, orange: 7pt)

\(F = \begin{bmatrix} 1.71791593e-06 & -8.04922051e-06 & 3.07539900e-03 \\ 7.84457044e-06 & 3.34304187e-06 & -4.19533878e-03 \\ -4.71734082e-03 & 1.33775959e-03 & 1.00000000e+00 \\ \end{bmatrix}\)

  1. Draw a Minimum Set of Correspondences: Randomly sample the minimum required number of points (8 for the 8-point algorithm, 7 for the 7-point algorithm).

  2. Fit the Fundamental Matrix: Estimate the fundamental matrix \(F\) from the sampled points using the methods described in Q1.

  3. Count Inliers: Count the correspondences whose errors with respect to the fitted epipolar lines fall below the threshold thr; these are considered inliers.
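For scoring, the annotated points can be homogenized once up front; as in the refinement code below, the algebraic residual \(|x'^{\top} F x|\) stands in for the geometric point-to-line distance (a sketch):

pts1_h = np.hstack([pts1, np.ones((pts1.shape[0], 1))])
pts2_h = np.hstack([pts2, np.ones((pts2.shape[0], 1))])

errors = np.abs(np.sum((pts2_h @ F_candidate) * pts1_h, axis=1))
inliers = np.where(errors < thr)[0]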

  4. Refine Estimate if Enough Inliers: If the ratio of inliers to total points exceeds d_min, re-estimate F using all inliers to obtain a more accurate model:

if (len(inliers) / num_points) > d_min:
    inlier_pts1 = pts1[inliers]
    inlier_pts2 = pts2[inliers]
    F_candidate = estimate_fundamental_matrix(inlier_pts1, inlier_pts2)

    # Recompute the inliers using the updated fundamental matrix
    errors = np.abs(np.sum(pts2_h @ F_candidate * pts1_h, axis=1))
    inliers = np.where(errors < thr)[0]

The threshold was manually set to either \(thr = 0.001\) or \(thr = 0.01\), since it is sensitive to the image size (when coordinates are not normalized) and to the proportion of inliers. Additionally, d_min was set to 0.3 based on the provided hint.

  5. Select the Best Fundamental Matrix: Finally, choose the fundamental matrix with the highest number of inliers as the output.
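Putting the steps together, the outer loop might look like the following sketch (num_iters and the sample size k are assumptions; estimate_fundamental_matrix is the Q1 solver used above):

best_F, best_inliers = None, []
for _ in range(num_iters):
    # Steps 1-2: draw a minimal sample (k = 8 or 7) and fit a candidate F.
    sample = np.random.choice(pts1.shape[0], size=k, replace=False)
    F_candidate = estimate_fundamental_matrix(pts1[sample], pts2[sample])
    # Step 3: count inliers via the epipolar residual.
    errors = np.abs(np.sum((pts2_h @ F_candidate) * pts1_h, axis=1))
    inliers = np.where(errors < thr)[0]
    # Step 5: keep the candidate with the most inliers (step 4's refinement
    # runs here when the inlier ratio exceeds d_min, as shown above).
    if len(inliers) > len(best_inliers):
        best_F, best_inliers = F_candidate, inliers

F = best_F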

Q3: Triangulation

Due to the computational burden, I randomly sampled 1/5 of the total points for visualization.

The triangulation problem is based on the relationship:

\(x \times P X = 0, \quad x' \times P' X = 0.\)

This leads to the constraint matrix formulation \([x]_{\times} P X = 0\). Each cross product contributes only two independent equations, so stacking the two views gives a \(4 \times 4\) system per point, implemented as follows:

A = np.zeros((num_points, 4, 4))
A[:, 0] = pts1[:, 1, np.newaxis] * P1[2] - P1[1]
A[:, 1] = P1[0] - pts1[:, 0, np.newaxis] * P1[2]
A[:, 2] = pts2[:, 1, np.newaxis] * P2[2] - P2[1]
A[:, 3] = P2[0] - pts2[:, 0, np.newaxis] * P2[2]

The 3D point is then recovered by solving this system \(AX = 0\) using SVD:

_, _, Vt = np.linalg.svd(A)  # batched SVD over the (num_points, 4, 4) stack
X = Vt[:, -1]                # last right-singular vector = homogeneous 3D point
X = X[:, :3] / X[:, 3:]      # dehomogenize to obtain the 3D coordinates

Q4: Reconstruct your own scene!

1. Indoor Scene.

I used COLMAP for Structure from Motion (SfM). I recorded a short video of an indoor scene consisting of a table and chairs. Thanks to the variety of textures present, the scene was reconstructed successfully.

Input images (Video) Reconstructed Points

2. Low-textured Cup.

Following that, I wondered whether an object with minimal texture could also be reconstructed. To test this, I recorded a short video of a black cup with little texture. The textureless black regions could not be reconstructed, while the red text and the object boundaries were reconstructed successfully.

Input images (Video) Reconstructed Points

Q5: Bonus 1 - Fundamental matrix estimation on your own images.

For feature matching, I used OpenCV. Specifically, the SIFT algorithm was used to extract features, and cv2.FlannBasedMatcher was applied for matching. From all the matches, only the best ones were kept via a ratio test. The feature matching results are shown below.


import cv2

# Apply the SIFT detector (gray_1 and gray_2 are the grayscale input images)
sift = cv2.SIFT_create()
kp_1, des_1 = sift.detectAndCompute(gray_1, None)
kp_2, des_2 = sift.detectAndCompute(gray_2, None)

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=100)  # Increase the number of checks for more accurate matching
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des_1, des_2, k=2)

# Apply a stricter ratio test to find good matches
good_matches = []
for m, n in matches:
    if m.distance < 0.68 * n.distance:
        good_matches.append(m)

Subsequently, the RANSAC variant of the 8-point algorithm was used to estimate the fundamental matrix. The results are shown below.
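The matched keypoints can be converted into the point arrays expected by the estimator roughly as follows (a sketch; ransac_eight_point is a stand-in name for the Q2 implementation):

# Pixel coordinates of the good matches in each image.
pts1 = np.float32([kp_1[m.queryIdx].pt for m in good_matches])
pts2 = np.float32([kp_2[m.trainIdx].pt for m in good_matches])

F = ransac_eight_point(pts1, pts2)  # stand-in for the Q2 RANSAC estimator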

1. Indoor Scene with a table and chairs.

Input images 1 Input images 2 Feature Matching
Results

2. Grand Mosque in Abu Dhabi

The pictures were taken in Abu Dhabi, which I visited for IROS last week.

Input images 1 Input images 2 Feature Matching
Results