Task 1: Camera matrix \(P\) from 2D-3D correspondences
Stanford Bunny
Figures: input image, annotated image, surface points, bounding box.
The camera matrix is
\[P = \begin{bmatrix}
6.1247e-01 & -2.8077e-01 & 1.0919e-01 & 2.1209e-01 \\
-8.9020e-02 & -6.4324e-01 & 1.9326e-01 & 1.7352e-01 \\
5.5165e-05 & -1.3559e-04 & -7.0017e-05 & 9.5227e-05
\end{bmatrix}
\]
Cuboid
Figures: input image, annotated image, edges.
The camera matrix is
\[P = \begin{bmatrix}
-2.7898e-02 & 3.9829e-02 & 3.0938e-03 & -8.1320e-01 \\
-8.8099e-03 & -9.0682e-03 & 4.7887e-02 & -5.7781e-01 \\
1.8027e-05 & 1.5059e-05 & 1.0668e-05 & -2.4203e-03
\end{bmatrix}
\]
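The report gives the resulting matrices but does not say how \(P\) was estimated. One common approach, assumed here rather than taken from the report, is the direct linear transform (DLT): each 2D-3D correspondence contributes two linear equations in the 12 entries of \(P\), and the solution is the right singular vector with the smallest singular value. The function name `estimate_camera_matrix` and the use of NumPy are illustrative choices.

```python
import numpy as np


def estimate_camera_matrix(points_3d, points_2d):
    """Estimate a 3x4 camera matrix P with the DLT (assumed method).

    points_3d: (N, 3) world points, points_2d: (N, 2) pixel coordinates,
    with N >= 6. P is returned up to scale (unit Frobenius norm).
    """
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    zeros = np.zeros(4)
    rows = []
    for Xi, (u, v) in zip(X, points_2d):
        # u = (p1 . X) / (p3 . X)  ->  p1 . X - u * (p3 . X) = 0
        rows.append(np.concatenate([Xi, zeros, -u * Xi]))
        # v = (p2 . X) / (p3 . X)  ->  p2 . X - v * (p3 . X) = 0
        rows.append(np.concatenate([zeros, Xi, -v * Xi]))
    A = np.array(rows)
    # Null vector of A: last row of Vt.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

Because \(P\) is only defined up to scale, two estimates should be compared after dividing by a common entry or by their norm.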
Task 2: Camera calibration \(K\) from annotations
Camera calibration from vanishing points
Figures: input image, annotated parallel lines, vanishing points and principal point.
The camera intrinsic matrix is
\[K = \begin{bmatrix}
1.1542e+03 & 0 & 5.7507e+02\\
0 & 1.1542e+03 & 4.3194e+02\\
0 & 0 & 1 \\
\end{bmatrix}
\]
Implementation
-
For each pair of points \(p_1\) and \(p_2\), we compute the line \(l\).
\[
l = p_1 \times p_2
\]
For each pair of parallel lines \(l_1\) and \(l_2\), we compute the vanishing point \(v\).
\[
v = l_1 \times l_2
\]
Since we have 3 pairs of parallel lines, we get 3 vanishing points, which form 3 pairs of vanishing points.
-
Since the camera has zero skew and square pixels, the camera intrinsic matrix \(K\) and the image of the absolute conic \(\omega\) have the form
\[
K = \begin{bmatrix}
f & 0 & c_x \\
0 & f & c_y \\
0 & 0 & 1
\end{bmatrix}
\implies
\omega = K^{-T} K^{-1} = \begin{bmatrix}
\omega_0 & 0 & \omega_1 \\
0 & \omega_0 & \omega_2 \\
\omega_1 & \omega_2 & \omega_3
\end{bmatrix}
\]
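Two vanishing points \(v_1\) and \(v_2\) of orthogonal directions satisfy \[ v_1^T \omega v_2 = 0 \]
Each pair of vanishing points gives 1 constraint. Since \(\omega\) has 4 unknowns defined up to scale, the 3 pairs of vanishing points give the 3 constraints we need.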
-
For each pair of vanishing points \(v_1\) and \(v_2\), we have
\[
A_{sub} \omega' = 0
\quad \quad \text{where} \quad \quad
A_{sub} = \begin{bmatrix}
v_1[0] * v_2[0] + v_1[1] * v_2[1] & v_1[0] * v_2[2] + v_1[2] * v_2[0] & v_1[1] * v_2[2] + v_1[2] * v_2[1] & v_1[2] * v_2[2]
\end{bmatrix}
\]
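We have 3 pairs of \(v_1\) and \(v_2\), so we stack the 3 \(A_{sub}\) rows into a \(3 \times 4\) matrix \(A\), where \(\omega' = [\omega_0, \omega_1, \omega_2, \omega_3]^T\) collects the unknown entries of \(\omega\).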
-
Compute the SVD \(A = U \Sigma Vt\) and take the last row of \(Vt\), the right singular vector with the smallest singular value, as the solution \(\omega' = Vt[-1]\).
-
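Apply Cholesky decomposition to \(\omega = L L^T\), which gives \(L = K^{-T}\) up to scale. We then transpose and invert \(K^{-T}\) to recover \(K\), and normalize it so that its bottom-right entry is 1.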
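The steps above can be sketched in NumPy as follows. This is a minimal sketch and not the report's own code: the function names are hypothetical, the three vanishing points are assumed to come from mutually orthogonal directions, and the sign flip before the Cholesky step is an added detail (the SVD returns \(\omega'\) with an arbitrary sign, while Cholesky needs a positive definite matrix).

```python
import numpy as np


def line_through(p1, p2):
    """Homogeneous line through two homogeneous image points."""
    return np.cross(p1, p2)


def vanishing_point(l1, l2):
    """Homogeneous intersection of two image lines."""
    return np.cross(l1, l2)


def calibrate_from_vanishing_points(v1, v2, v3):
    """K (zero skew, square pixels) from 3 orthogonal vanishing points."""
    A = []
    for a, b in [(v1, v2), (v1, v3), (v2, v3)]:
        # One row of A per pair, from a^T omega b = 0.
        A.append([a[0] * b[0] + a[1] * b[1],
                  a[0] * b[2] + a[2] * b[0],
                  a[1] * b[2] + a[2] * b[1],
                  a[2] * b[2]])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    w0, w1, w2, w3 = Vt[-1]
    omega = np.array([[w0, 0.0, w1],
                      [0.0, w0, w2],
                      [w1, w2, w3]])
    if omega[0, 0] < 0:
        omega = -omega  # assumed fix: make omega positive definite
    L = np.linalg.cholesky(omega)  # omega = L L^T, L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

With noisy annotations the estimated \(\omega\) may fail to be positive definite, in which case `np.linalg.cholesky` raises `LinAlgError`; re-annotating the lines is the usual remedy.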
Camera calibration from metric planes
Figures: input image, annotated square 1, annotated square 2, annotated square 3.
Plane pair | Angle between planes (degrees)
Plane 1 & Plane 2 | 67.28
Plane 1 & Plane 3 | 92.20
Plane 2 & Plane 3 | 94.71
The camera intrinsic matrix is
\[K = \begin{bmatrix}
1.0769e+03 & -4.5264e+00 & 5.1157e+02\\
0 & 1.0763e+03 & 3.9553e+02\\
0 & 0 & 1 \\
\end{bmatrix}
\]
Implementation
-
For each annotated square, we compute the homography \(H\) that maps the corners of the unit square to the annotated image corners \(p_0, \dots, p_3\),
\[
\begin{bmatrix}
(0, 1) & (1, 1)\\
(1, 0) & (0, 0)
\end{bmatrix} \overset{H}{\implies} \begin{bmatrix}
p_0 & p_1 \\
p_2 & p_3
\end{bmatrix}
\]
Since we have 3 squares, we get 3 different homographies \(H\).
-
Since nothing is assumed about the camera, the intrinsic matrix \(K\) is a general upper-triangular matrix and \(\omega\) is a general symmetric matrix
\[
K = \begin{bmatrix}
f_x & s & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
\implies
\omega = K^{-T} K^{-1} = \begin{bmatrix}
\omega_0 & \omega_1 & \omega_3 \\
\omega_1 & \omega_2 & \omega_4 \\
\omega_3 & \omega_4 & \omega_5
\end{bmatrix}
\]
Denote the first 2 columns of \(H\) as \(h_1\) and \(h_2\). They satisfy \[ h_1^T \omega h_2 = 0 \quad \quad h_1^T \omega h_1 = h_2^T \omega h_2\]
Each \(H\) gives 2 constraints. Since \(\omega\) has 6 unknowns defined up to scale, 5 constraints suffice, and the 3 homographies give 6.
-
For each pair of \(h_1\) and \(h_2\), we have
\[
A_{sub} \omega' = 0
\quad \quad \text{where} \quad \quad
A_{sub} = \begin{bmatrix}
h_1[0] * h_2[0] & h_1[0] * h_2[1] + h_1[1] * h_2[0] & h_1[1] * h_2[1] & h_1[0] * h_2[2] + h_1[2] * h_2[0] & h_1[1] * h_2[2] + h_1[2] * h_2[1] & h_1[2] * h_2[2] \\
h_1[0]^2 - h_2[0]^2 & 2 * (h_1[0] * h_1[1] - h_2[0] * h_2[1]) & h_1[1]^2 - h_2[1]^2 & 2 * (h_1[0] * h_1[2] - h_2[0] * h_2[2]) & 2 * (h_1[1] * h_1[2] - h_2[1] * h_2[2]) & h_1[2]^2 - h_2[2]^2
\end{bmatrix}
\]
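We have 3 pairs of \(h_1\) and \(h_2\), so we stack the 3 \(A_{sub}\) blocks into a \(6 \times 6\) matrix \(A\), where \(\omega' = [\omega_0, \omega_1, \omega_2, \omega_3, \omega_4, \omega_5]^T\) collects the unknown entries of \(\omega\).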
-
Compute the SVD \(A = U \Sigma Vt\) and take the last row of \(Vt\), the right singular vector with the smallest singular value, as the solution \(\omega' = Vt[-1]\).
-
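Apply Cholesky decomposition to \(\omega = L L^T\), which gives \(L = K^{-T}\) up to scale. We then transpose and invert \(K^{-T}\) to recover \(K\), and normalize it so that its bottom-right entry is 1.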
-
The surface normal of each plane is the cross product of 2 direction vectors \(d = K^{-1} v\), one for each of the plane's 2 vanishing points \(v\). The angle between two planes is then the angle between their normals.
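The steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the report's code: all function names are hypothetical, and the sign flip before Cholesky is an added detail. It also assumes that the two vanishing points of each plane are taken as the columns \(h_1\) and \(h_2\) of its homography (the images of the plane's two axis directions at infinity); the report does not state where its vanishing points come from.

```python
import numpy as np


def _row(a, b):
    """Coefficients of (w0, ..., w5) in a^T omega b."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[1] * b[1],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])


def constraint_rows(H):
    """The two rows of A_sub contributed by one homography."""
    h1, h2 = H[:, 0], H[:, 1]
    return np.vstack([_row(h1, h2), _row(h1, h1) - _row(h2, h2)])


def calibrate_from_homographies(Hs):
    """General upper-triangular K from 3 or more plane homographies."""
    A = np.vstack([constraint_rows(np.asarray(H, dtype=float)) for H in Hs])
    _, _, Vt = np.linalg.svd(A)
    w = Vt[-1]
    omega = np.array([[w[0], w[1], w[3]],
                      [w[1], w[2], w[4]],
                      [w[3], w[4], w[5]]])
    if omega[0, 0] < 0:
        omega = -omega  # assumed fix: make omega positive definite
    L = np.linalg.cholesky(omega)  # omega = L L^T, L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]


def plane_normal(K, H):
    """Unit normal from d = K^{-1} v, with v = h1 and v = h2 (assumption)."""
    d1 = np.linalg.solve(K, H[:, 0])
    d2 = np.linalg.solve(K, H[:, 1])
    n = np.cross(d1, d2)
    return n / np.linalg.norm(n)


def angle_between_planes(K, Ha, Hb):
    """Angle in degrees between the normals of two planes."""
    c = np.clip(np.dot(plane_normal(K, Ha), plane_normal(K, Hb)), -1.0, 1.0)
    return np.degrees(np.arccos(c))
```

The direction of each normal depends on the corner ordering of the annotation, so the returned angle may be the supplement (180 degrees minus the value) of the one intended.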
Camera calibration from rectangles with known sizes
Figures: input image, annotated rectangle 1, annotated rectangle 2, annotated rectangle 3.
Rectangle | Width : Height
Rectangle 1 | 38:22
Rectangle 2 | 69:39
Rectangle 3 | 40:25
Plane pair | Angle between planes (degrees)
Plane 1 & Plane 2 | 64.85
Plane 1 & Plane 3 | 58.46
Plane 2 & Plane 3 | 82.52
The camera intrinsic matrix is
\[K = \begin{bmatrix}
2.7437e+03 & 8.0061e+01& 1.7676e+03\\
0 & 2.5318e+03 & 1.4482e+03\\
0 & 0 & 1 \\
\end{bmatrix}
\]
Implementation
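The only difference from the metric-plane method above is how we find the homography \(H'\). For each rectangle, denote \(w\) as its width and \(h\) as its height; we compute the homography \(H'\) that maps the rectangle's corners to the annotated image corners,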
\[
\begin{bmatrix}
(0, h) & (w, h)\\
(w, 0) & (0, 0)
\end{bmatrix} \overset{H'}{\implies} \begin{bmatrix}
p_0 & p_1 \\
p_2 & p_3
\end{bmatrix}
\]
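The corner mapping above can be turned into a homography with a four-point DLT. The sketch below assumes that approach, since the report does not say how \(H'\) is solved; the function name `homography_from_rectangle` is hypothetical. Setting `w = h = 1` reproduces the unit-square case used for the metric planes.

```python
import numpy as np


def homography_from_rectangle(corners, w=1.0, h=1.0):
    """Homography mapping a w-by-h rectangle to 4 annotated image corners.

    corners: (4, 2) array [p0, p1, p2, p3], matched in order to the
    rectangle points (0, h), (w, h), (w, 0), (0, 0).
    """
    src = np.array([[0.0, h], [w, h], [w, 0.0], [0.0, 0.0]])
    dst = np.asarray(corners, dtype=float)
    zeros = np.zeros(3)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        X = np.array([x, y, 1.0])
        # Two linear equations per correspondence, from (u, v, 1) ~ H X.
        rows.append(np.concatenate([X, zeros, -u * X]))
        rows.append(np.concatenate([zeros, X, -v * X]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)
    # H[2, 2] is nonzero because (0, 0) maps to the finite image point p3.
    return H / H[2, 2]
```

For example, `homography_from_rectangle(corners, 38, 22)` would correspond to rectangle 1, given its four annotated corners in the order above.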