Assignment 2

Task 1: Camera matrix \(P\) from 2D-3D correspondences

Stanford Bunny

Figures: Input image | Annotated image | Surface points | Bounding box

The camera matrix is \[P = \begin{bmatrix} 6.1247e-01 & -2.8077e-01 & 1.0919e-01 & 2.1209e-01 \\ -8.9020e-02 & -6.4324e-01 & 1.9326e-01 & 1.7352e-01 \\ 5.5165e-05 & -1.3559e-04 & -7.0017e-05 & 9.5227e-05 \end{bmatrix} \]


Cuboid

Figures: Input image | Annotated image | Edges

The camera matrix is \[P = \begin{bmatrix} -2.7898e-02 & 3.9829e-02 & 3.0938e-03 & -8.1320e-01 \\ -8.8099e-03 & -9.0682e-03 & 4.7887e-02 & -5.7781e-01 \\ 1.8027e-05 & 1.5059e-05 & 1.0668e-05 & -2.4203e-03 \end{bmatrix} \]
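
The report gives only the resulting \(P\) matrices and does not describe how they were estimated. One common way to obtain \(P\) from 2D-3D correspondences is the Direct Linear Transform (DLT); the sketch below is an illustration under that assumption, not the author's code. The function name `estimate_camera_matrix` and its argument layout are hypothetical. The solution is the SVD null vector, so it has unit Frobenius norm, which appears consistent with the scale of the matrices reported above.

```python
import numpy as np

def estimate_camera_matrix(pts_3d, pts_2d):
    """DLT sketch: estimate a 3x4 camera matrix P (up to scale and sign).

    pts_3d: (N, 3) world points, pts_2d: (N, 2) image points, N >= 6.
    Each correspondence x ~ P X contributes two linear equations in the
    12 entries of P, obtained from the cross product x × (P X) = 0.
    """
    pts_3d = np.asarray(pts_3d, dtype=float)
    pts_2d = np.asarray(pts_2d, dtype=float)
    zero = np.zeros(4)
    rows = []
    for (X, Y, Z), (x, y) in zip(pts_3d, pts_2d):
        Xh = np.array([X, Y, Z, 1.0])                      # homogeneous 3D point
        rows.append(np.concatenate([zero, -Xh, y * Xh]))   # -P2.X + y P3.X = 0
        rows.append(np.concatenate([Xh, zero, -x * Xh]))   #  P1.X - x P3.X = 0
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)   # null vector of A, unit Frobenius norm
```

With noisy annotations, normalizing the 2D and 3D points before building `A` usually improves conditioning; that step is omitted here for brevity.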


Task 2: Camera calibration \(K\) from annotations

Camera calibration from vanishing points

Figures: Input image | Annotated parallel lines | Vanishing points and principal point

The camera intrinsic matrix is \[K = \begin{bmatrix} 1.1542e+03 & 0 & 5.7507e+02\\ 0 & 1.1542e+03 & 4.3194e+02\\ 0 & 0 & 1 \\ \end{bmatrix} \]


Implementation

  1. For each pair of annotated points \(p_1\) and \(p_2\) (in homogeneous coordinates), we compute the line \(l\) through them. \[ l = p_1 \times p_2 \] For each pair of parallel lines \(l_1\) and \(l_2\), we compute the vanishing point \(v\). \[ v = l_1 \times l_2 \] Since we have 3 pairs of parallel lines, we get 3 vanishing points, which form 3 pairs.
  2. Since we assume the camera has zero skew and square pixels, the camera intrinsic matrix is \[ K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix} \implies \omega = K^{-T} K^{-1} = \begin{bmatrix} \omega_0 & 0 & \omega_1 \\ 0 & \omega_0 & \omega_2 \\ \omega_1 & \omega_2 & \omega_3 \end{bmatrix} \] Vanishing points of orthogonal directions satisfy \[ v_1^T \omega v_2 = 0 \] so each pair of vanishing points gives 1 linear constraint. Since \(\omega\) has 4 unknowns defined up to scale, the 3 pairs of vanishing points provide the 3 constraints we need.
  3. For each pair of vanishing points \(v_1\) and \(v_2\), we have \[ A_{sub} \omega' = 0 \quad \quad \text{where} \quad \quad A_{sub} = \begin{bmatrix} v_1[0] * v_2[0] + v_1[1] * v_2[1] & v_1[0] * v_2[2] + v_1[2] * v_2[0] & v_1[1] * v_2[2] + v_1[2] * v_2[1] & v_1[2] * v_2[2] \end{bmatrix} \] and \(\omega' = (\omega_0, \omega_1, \omega_2, \omega_3)^T\). We have 3 pairs of \(v_1\) and \(v_2\), so we stack the 3 rows \(A_{sub}\) into a \(3 \times 4\) matrix \(A\).
  4. Compute the SVD \(A = U \Sigma Vt\) and take the last row of \(Vt\) as the solution, \(\omega' = Vt[-1]\); this is the right singular vector for the smallest singular value.
  5. Rebuild \(\omega\) from \(\omega'\) and apply Cholesky decomposition, \(\omega = L L^T\), which gives \(L = K^{-T}\) up to scale. Then \(K = L^{-T}\), normalized so that its bottom-right entry is 1.
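
The five steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the code used for the report: the function names are hypothetical, the vanishing points are rescaled to unit length for conditioning (not mentioned in the report), and \(\omega\) is sign-flipped when needed because the SVD null vector has an arbitrary sign while Cholesky requires a positive-definite matrix.

```python
import numpy as np

def line_through(p1, p2):
    """Homogeneous line through two image points (x, y): l = p1 × p2."""
    return np.cross(np.append(p1, 1.0), np.append(p2, 1.0))

def vanishing_point(seg_a, seg_b):
    """Vanishing point of two segments that are parallel in 3D: v = l1 × l2.

    Each segment is a pair of image points ((x1, y1), (x2, y2)).
    """
    v = np.cross(line_through(*seg_a), line_through(*seg_b))
    return v / np.linalg.norm(v)

def calibrate_from_vanishing_points(vps):
    """K (zero skew, square pixels) from 3 mutually orthogonal vanishing points."""
    vps = [np.asarray(v, dtype=float) / np.linalg.norm(v) for v in vps]
    rows = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        v1, v2 = vps[i], vps[j]
        rows.append([
            v1[0] * v2[0] + v1[1] * v2[1],   # coefficient of omega_0
            v1[0] * v2[2] + v1[2] * v2[0],   # coefficient of omega_1
            v1[1] * v2[2] + v1[2] * v2[1],   # coefficient of omega_2
            v1[2] * v2[2],                   # coefficient of omega_3
        ])
    A = np.array(rows)                        # 3 x 4
    _, _, Vt = np.linalg.svd(A)
    w0, w1, w2, w3 = Vt[-1]                   # null vector of A
    omega = np.array([[w0, 0.0, w1],
                      [0.0, w0, w2],
                      [w1, w2, w3]])
    if omega[0, 0] < 0:                       # fix the arbitrary sign of the null vector
        omega = -omega
    L = np.linalg.cholesky(omega)             # omega = L L^T, so L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

With noisy annotations \(\omega\) may fail to be positive definite, in which case `np.linalg.cholesky` raises `LinAlgError`; re-annotating the lines is usually the simplest remedy.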


Camera calibration from metric planes

Figures: Input image | Annotated Square 1 | Annotated Square 2 | Annotated Square 3
Angle between planes (degrees)
Plane 1 & Plane 2        67.28
Plane 1 & Plane 3        92.20
Plane 2 & Plane 3        94.71

The camera intrinsic matrix is \[K = \begin{bmatrix} 1.0769e+03 & -4.5264e+00 & 5.1157e+02\\ 0 & 1.0763e+03 & 3.9553e+02\\ 0 & 0 & 1 \\ \end{bmatrix} \]


Implementation

  1. For each square plane, we compute the homography \(H\) that maps the unit-square corners to the annotated image corners \(p_0, \dots, p_3\), \[ \begin{bmatrix} (0, 1) & (1, 1)\\ (1, 0) & (0, 0) \end{bmatrix} \overset{H}{\implies} \begin{bmatrix} p_0 & p_1 \\ p_2 & p_3 \end{bmatrix} \] Since we have 3 square planes, we get 3 different homographies \(H\).
  2. Since nothing is assumed about the camera (skew may be nonzero and pixels non-square), the camera intrinsic matrix is \[ K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \implies \omega = K^{-T} K^{-1} = \begin{bmatrix} \omega_0 & \omega_1 & \omega_3 \\ \omega_1 & \omega_2 & \omega_4 \\ \omega_3 & \omega_4 & \omega_5 \end{bmatrix} \] Denote the first 2 columns of \(H\) as \(h_1\) and \(h_2\). They satisfy \[ h_1^T \omega h_2 = 0 \quad \quad h_1^T \omega h_1 = h_2^T \omega h_2\] so each \(H\) gives 2 linear constraints, and the 3 homographies give 6 constraints on the 6 entries of \(\omega\), which are defined up to scale.
  3. For each pair of \(h_1\) and \(h_2\), we have \[ A_{sub} \omega' = 0 \quad \quad \text{where} \quad \quad A_{sub} = \begin{bmatrix} h_1[0] * h_2[0] & h_1[0] * h_2[1] + h_1[1] * h_2[0] & h_1[1] * h_2[1] & h_1[0] * h_2[2] + h_1[2] * h_2[0] & h_1[1] * h_2[2] + h_1[2] * h_2[1] & h_1[2] * h_2[2] \\ h_1[0]^2 - h_2[0]^2 & 2 * (h_1[0] * h_1[1] - h_2[0] * h_2[1]) & h_1[1]^2 - h_2[1]^2 & 2 * (h_1[0] * h_1[2] - h_2[0] * h_2[2]) & 2 * (h_1[1] * h_1[2] - h_2[1] * h_2[2]) & h_1[2]^2 - h_2[2]^2 \end{bmatrix} \] and \(\omega' = (\omega_0, \dots, \omega_5)^T\). We have 3 pairs of \(h_1\) and \(h_2\), so we stack the 3 blocks \(A_{sub}\) into a \(6 \times 6\) matrix \(A\).
  4. Compute the SVD \(A = U \Sigma Vt\) and take the last row of \(Vt\) as the solution, \(\omega' = Vt[-1]\); this is the right singular vector for the smallest singular value.
  5. Rebuild \(\omega\) from \(\omega'\) and apply Cholesky decomposition, \(\omega = L L^T\), which gives \(L = K^{-T}\) up to scale. Then \(K = L^{-T}\), normalized so that its bottom-right entry is 1.
  6. For each plane, the surface normal is the cross product of the 2 direction vectors \(d = K^{-1} v\) obtained from the plane's 2 vanishing points \(v\). The angle between two planes is then the angle between their normals.
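
A minimal NumPy sketch of steps 1 to 6 follows. It is an illustration under stated assumptions, not the report's implementation: all function names are hypothetical, the homography is estimated with a plain (unnormalized) DLT, and the two vanishing points of each plane are taken to be the first two columns of \(H\), which is one way to obtain them since \(H\) maps the square's two edge directions to those image points.

```python
import numpy as np

# Unit-square corners in the order p0, p1, p2, p3 used in step 1.
UNIT_SQUARE = np.array([[0, 1], [1, 1], [1, 0], [0, 0]], dtype=float)

def homography_dlt(src, dst):
    """Homography H (up to scale) with dst ~ H src, from 4 or more point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def omega_rows(H):
    """The two rows of A_sub contributed by one homography (step 3)."""
    h1, h2 = H[:, 0], H[:, 1]

    def row(a, b):  # coefficients of (omega_0..omega_5) in a^T omega b
        return np.array([a[0] * b[0], a[0] * b[1] + a[1] * b[0], a[1] * b[1],
                         a[0] * b[2] + a[2] * b[0], a[1] * b[2] + a[2] * b[1],
                         a[2] * b[2]])

    return np.stack([row(h1, h2), row(h1, h1) - row(h2, h2)])

def calibrate_from_homographies(Hs):
    """General K (with skew) from 3 or more plane homographies (steps 2 to 5)."""
    A = np.vstack([omega_rows(H) for H in Hs])
    _, _, Vt = np.linalg.svd(A)
    w = Vt[-1]
    omega = np.array([[w[0], w[1], w[3]],
                      [w[1], w[2], w[4]],
                      [w[3], w[4], w[5]]])
    if omega[0, 0] < 0:                  # null vector sign is arbitrary
        omega = -omega
    L = np.linalg.cholesky(omega)        # omega = L L^T, L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]

def plane_normal(K, H):
    """Unit normal from the directions d = K^{-1} v of the plane's two vanishing points."""
    d1 = np.linalg.solve(K, H[:, 0])
    d2 = np.linalg.solve(K, H[:, 1])
    n = np.cross(d1, d2)
    return n / np.linalg.norm(n)

def angle_between_planes_deg(n1, n2):
    """Angle between two unit normals, in degrees."""
    return np.degrees(np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0)))
```

A typical call sequence would be `Hs = [homography_dlt(UNIT_SQUARE, corners) for corners in annotated_squares]`, then `K = calibrate_from_homographies(Hs)`, where `annotated_squares` is a hypothetical list of \(4 \times 2\) corner arrays.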


Camera calibration from rectangles with known sizes

Figures: Input image | Annotated Rectangle 1 | Annotated Rectangle 2 | Annotated Rectangle 3
Width : Height    Rectangle 1: 38:22    Rectangle 2: 69:39    Rectangle 3: 40:25
Angle between planes (degrees)
Plane 1 & Plane 2        64.85
Plane 1 & Plane 3        58.46
Plane 2 & Plane 3        82.52

The camera intrinsic matrix is \[K = \begin{bmatrix} 2.7437e+03 & 8.0061e+01& 1.7676e+03\\ 0 & 2.5318e+03 & 1.4482e+03\\ 0 & 0 & 1 \\ \end{bmatrix} \]


Implementation

The only difference from the metric-plane method is how we find the homography \(H'\). For each rectangular plane, denote \(w\) as the width and \(h\) as the height; we compute the homography \(H'\) that maps the rectangle's corners to the annotated image corners, \[ \begin{bmatrix} (0, h) & (w, h)\\ (w, 0) & (0, 0) \end{bmatrix} \overset{H'}{\implies} \begin{bmatrix} p_0 & p_1 \\ p_2 & p_3 \end{bmatrix} \]
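
As a small illustration of this change, the sketch below builds the rectangle corners in the same order as the matrix above and fits \(H'\) with a plain DLT. The function names and the image corners in the usage lines are hypothetical. Only the ratio \(w : h\) affects the calibration constraints, since scaling both by the same factor rescales the first two columns of \(H'\) equally.

```python
import numpy as np

def rectangle_corners(w, h):
    """Model-plane corners in the order p0, p1, p2, p3: (0,h), (w,h), (w,0), (0,0)."""
    return np.array([[0, h], [w, h], [w, 0], [0, 0]], dtype=float)

def homography_dlt(src, dst):
    """Homography H (up to scale) with dst ~ H src, from 4 or more point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

# Hypothetical annotated corners p0..p3 for a rectangle with width:height = 38:22.
image_corners = np.array([[100, 50], [400, 80], [380, 300], [120, 280]], dtype=float)
H_rect = homography_dlt(rectangle_corners(38, 22), image_corners)
```

The resulting `H_rect` can then be used in place of \(H\) in the constraint equations of the metric-plane method.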


Task 3: Single View Reconstruction

Figures: Input image | Annotations for \(K\) | Vanishing points and principal point
Figures: Annotations for reconstruction | Reconstruction view 1 | Reconstruction view 2

Implementation

  1. Compute \(K\) from the annotated parallel lines shown above.
  2. Compute the normal \(n\) of each of the 5 planes, in the same way as for the metric planes in Task 2.
  3. Use cv2.fillPoly to create a mask for each plane, so that we can extract the 2D coordinates and color of each point \(p\) inside it.
  4. Compute ray \(d\) for each point \(p\) on the image according to \(d = K^{-1} p\).
  5. Use the point \((519, 245)\), shared by planes 0, 1, 3, and 4, as the reference \(\tilde{p}\) for those planes; then use the point \((514, 401)\), shared by planes 0, 1, and 2, as the reference \(\tilde{p}\) for plane 2.
  6. For each plane, compute 3D coordinate of all points on the plane via ray-plane intersection.
    • The plane equation is \[n^T \tilde{p} + a = 0 \quad \quad \text{where} \quad \quad a \ \text{is a constant}\]
    • Every point \(p = t d\) on the ray that lies on the plane satisfies \[n^T (t d) + a = 0 \quad \quad \text{where} \quad \quad t \ \text{is the scale along the ray}\]
    • Substituting \(a = -n^T \tilde{p}\) from the first equation into the second and solving for \(t\) gives \[p = td = \frac{n^T \tilde{p}}{n^T d} d\]
  7. Visualize the reconstruction.
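
Steps 4 and 6 can be sketched as below. This is an illustration with hypothetical function names, assuming the pixel coordinates of each plane have already been extracted from its mask and that the 3D reference point \(\tilde{p}\) is known. For the first reference pixel, one option (an assumption, not stated in the report) is to place \(\tilde{p}\) on its ray at an arbitrary depth, which fixes the overall scale of the reconstruction.

```python
import numpy as np

def pixel_rays(K, pixels):
    """Back-project pixels (N, 2) to ray directions d = K^{-1} [u, v, 1]^T, shape (N, 3)."""
    pixels = np.asarray(pixels, dtype=float)
    homog = np.column_stack([pixels, np.ones(len(pixels))])
    return np.linalg.solve(K, homog.T).T

def intersect_plane(rays, n, p_ref):
    """Ray-plane intersection p = (n^T p_ref / n^T d) d for every ray d.

    n is the plane normal and p_ref a known 3D point on the plane; rays that
    are parallel to the plane (n^T d = 0) are not handled here.
    """
    t = (n @ p_ref) / (rays @ n)     # scale along each ray
    return t[:, None] * rays
```

For example, with `p_ref = 1.0 * pixel_rays(K, [(519, 245)])[0]` (depth 1 chosen arbitrarily), `intersect_plane(pixel_rays(K, plane_pixels), n, p_ref)` returns the 3D points of one plane, where `plane_pixels` is the hypothetical array of pixel coordinates under that plane's mask.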