Geometry-based Methods for Vision

Assignment 2

Aviral Agrawal (avirala)

Q1 (a). Camera matrix P from 2D-3D correspondences : Stanford Bunny

The projection matrix computed from the 2D-3D correspondences is:
\[ P_{normalized} = \begin{bmatrix} 6.43360168 \times 10^{3} & -2.94894365 \times 10^{3} & 1.14605707 \times 10^{3} & 2.22742084 \times 10^{3} \\ -9.34882718 \times 10^{2} & -6.75530536 \times 10^{3} & 2.02900218 \times 10^{3} & 1.82289160 \times 10^{3} \\ 5.79221703 \times 10^{-1} & -1.42341186 \times 10^{0} & -7.35313344 \times 10^{-1} & 1.00000000 \times 10^{0} \end{bmatrix} \]
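
For completeness, a minimal sketch of the direct linear transform (DLT) used to estimate \(P\) from the correspondences; the function name is mine, and any pre-conditioning/normalization of the coordinates is omitted here for brevity.

import numpy as np

def estimate_P_dlt(pts2d, pts3d):
    # pts2d: (n, 2) pixel coordinates, pts3d: (n, 3) world coordinates, n >= 6.
    A = []
    for (x, y), (X, Y, Z) in zip(pts2d, pts3d):
        Xh = np.array([X, Y, Z, 1.0])
        A.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        A.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    # The right singular vector with the smallest singular value holds the 12 entries of P.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)
    return P / P[2, 3]   # scale so the bottom-right entry is 1, as reported above
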
Table 1. Q1 (a) results on provided image
Input Image | Annotated points on input image | Points projected on input image | Bounding box projected on input image
[image panels]


Q1 (b). Camera matrix P from 2D-3D correspondences : Cuboid

The projection matrix computed from the 2D-3D correspondences is:
\[ P_{normalized} = \begin{bmatrix} 8.67876810 \times 10^{1} & -9.92854783 \times 10^{0} & -2.17699611 \times 10^{2} & 5.53873518 \times 10^{1} \\ 7.28494609 \times 10^{1} & -1.24764799 \times 10^{2} & 3.60569451 \times 10^{1} & 3.61126278 \times 10^{2} \\ -9.12720191 \times 10^{-2} & -6.17049797 \times 10^{-2} & -9.60038073 \times 10^{-2} & 1.00000000 \times 10^{0} \end{bmatrix} \]
Table 2. Q1 (b) results on collected image
Input Image | Annotated points on input image | Bounding box projected on input image
[image panels]




Q2 (a). Camera calibration from vanishing points

In this question, we use three pairs of parallel lines to compute the camera intrinsic matrix using the following algorithm:
  1. Given a pair of imaged parallel lines, \(l\) and \(m\), in homogeneous coordinates, we compute the vanishing point as their intersection: \(v = l \times m\)
  2. As per the assignment, the camera has zero skew and the pixels are square. Hence, the camera matrix can be defined as follows : \[K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}\]
  3. We know that \(\omega\) is given by \(\omega = K^{-T} K^{-1}\)
  4. Thus, \(\omega\) will be of the form \[ \begin{bmatrix} a & 0 & b \\ 0 & a & c \\ b & c & d \end{bmatrix} \]
  5. Assuming the three annotated directions are mutually orthogonal, each pair of vanishing points gives one constraint: \[v_i^T \omega v_j = 0\]
  6. Note that three such equations suffice, since \(\omega\) in this form has only 3 degrees of freedom (four unknowns up to scale)
  7. We stack the constraints from step 5 into a homogeneous system \(Ac = 0\) with \(c = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}\) and solve it using SVD
  8. We use Cholesky decomposition to obtain the matrix \(K\) from \(\omega\) (since \(\omega = K^{-T} K^{-1}\), the Cholesky factor of \(\omega\) is \(K^{-T}\)); a short sketch of this pipeline is given after this list
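
Below is a minimal sketch of these steps, assuming the three pairs of annotated parallel lines correspond to mutually orthogonal scene directions (function names are mine, not from the assignment starter code).

import numpy as np

def vanishing_point(p1, p2, q1, q2):
    # (p1, p2) and (q1, q2) are image endpoints of two parallel scene lines;
    # the vanishing point is the intersection of the two image lines.
    l = np.cross([*p1, 1.0], [*p2, 1.0])
    m = np.cross([*q1, 1.0], [*q2, 1.0])
    return np.cross(l, m)

def K_from_vanishing_points(v1, v2, v3):
    # Each pair of orthogonal vanishing points gives one equation v_i^T w v_j = 0,
    # linear in the unknowns (a, b, c, d) of w = [[a, 0, b], [0, a, c], [b, c, d]].
    rows = []
    for vi, vj in [(v1, v2), (v1, v3), (v2, v3)]:
        rows.append([vi[0] * vj[0] + vi[1] * vj[1],
                     vi[0] * vj[2] + vi[2] * vj[0],
                     vi[1] * vj[2] + vi[2] * vj[1],
                     vi[2] * vj[2]])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    a, b, c, d = Vt[-1]
    w = np.array([[a, 0.0, b], [0.0, a, c], [b, c, d]])
    if a < 0:                          # fix the overall sign so w is positive definite
        w = -w
    # w = K^{-T} K^{-1}, so the lower-triangular Cholesky factor of w is K^{-T}.
    L = np.linalg.cholesky(w)
    K = np.linalg.inv(L).T
    return K / K[2, 2]
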


The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 1.15417805 \times 10^{3} & 0.00000000 & 5.75066022 \times 10^{2} \\ 0.00000000 & 1.15417805 \times 10^{3} & 4.31939101 \times 10^{2} \\ 0.00000000 & 0.00000000 & 1.00000000 \end{bmatrix} \]

The principal point is : \[ \begin{bmatrix} 5.75066022 \times 10^{2}, 4.31939101 \times 10^{2} \end{bmatrix} \]

Table 3. Q2 (a) results on provided images
Input Image | Annotated parallel lines on input image | Vanishing points and principal point
[image panels]

Q2 (b). Camera calibration from metric planes

In this question, we use three squares of known metric structure to compute the camera intrinsic matrix using the following algorithm:
  1. Given a square with annotated points, we assume rectified coordinates : \((0, 0)\), \((1, 0)\), \((1, 1)\), \((0, 1)\)
  2. We compute the homography that maps these rectified points to the annotated image points (i.e., the metric un-rectification), using SVD.
  3. We write the homography matrix \(H\) in terms of its columns: \(H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix}\)
  4. For \(\omega\) we get the following two constraints: \[h_1^T \omega h_2 = 0\] and \[h_1^T \omega h_1 = h_2^T \omega h_2\]
  5. Thus, each square yields 2 constraints, so the three squares give 6 constraints in total. Since \(\omega\) has only 5 degrees of freedom, the system is overdetermined and we solve for \(\omega\) in a least-squares sense using SVD.
  6. The \(\omega\) matrix is of the form \[ \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \]
  7. We stack the constraints into a homogeneous system \(Ac = 0\) with \(c = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}\)
  8. We use Cholesky decomposition to obtain the matrix \(K\) from \(\omega\), as in Q2 (a); a short sketch is given after this list
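
A sketch of steps 3-8, assuming the per-square homographies have already been estimated with a standard four-point DLT (the function names are mine).

import numpy as np

def constraint_rows(H):
    # Columns h1, h2 of H give two linear constraints on the symmetric matrix
    # w = [[a, b, c], [b, d, e], [c, e, f]]:  h1^T w h2 = 0  and  h1^T w h1 = h2^T w h2.
    h1, h2 = H[:, 0], H[:, 1]
    def row(u, v):
        # coefficients of (a, b, c, d, e, f) in u^T w v
        return np.array([u[0] * v[0],
                         u[0] * v[1] + u[1] * v[0],
                         u[0] * v[2] + u[2] * v[0],
                         u[1] * v[1],
                         u[1] * v[2] + u[2] * v[1],
                         u[2] * v[2]])
    return np.stack([row(h1, h2), row(h1, h1) - row(h2, h2)])

def K_from_square_homographies(Hs):
    # Three squares give a 6 x 6 system A c = 0; solve it in the least-squares
    # sense with SVD, rebuild w, and recover K by Cholesky as in Q2 (a).
    A = np.concatenate([constraint_rows(H) for H in Hs])
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d, e, f = Vt[-1]
    w = np.array([[a, b, c], [b, d, e], [c, e, f]])
    if a < 0:
        w = -w
    L = np.linalg.cholesky(w)          # w = K^{-T} K^{-1}, so L = K^{-T}
    K = np.linalg.inv(L).T
    return K / K[2, 2]
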


The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 1.0858790 \times 10^{3} & -1.4729402 \times 10^{1} & 5.2055530 \times 10^{2} \\ 0.0000000 & 1.0792269 \times 10^{3} & 4.0178989 \times 10^{2} \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]

The principal point is : \[ \begin{bmatrix} 5.2055530 \times 10^{2}, 4.0178989 \times 10^{2} \end{bmatrix} \]

Table 4. Q2 (b) results on provided images
Input Image | Square Annotations | Annotated Square 1 | Annotated Square 2 | Annotated Square 3
[image panels]


Table 5. Q2 (b) Angles between planes (degrees)
Planes | Angle (degrees)
Plane 1 & Plane 2 | 67.60315628
Plane 1 & Plane 3 | 92.25011409
Plane 2 & Plane 3 | 94.80551827
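
For reference, one way the angles above can be obtained (an assumption on my part, not necessarily the exact code used): take each plane's normal from its square homography, \(n \propto (K^{-1} h_1) \times (K^{-1} h_2)\), and measure the angle between the normals.

import numpy as np

def plane_normal(K, H):
    # With H ~ K [r1 r2 t], the first two columns back-project to the in-plane
    # directions, so the plane normal (camera frame) is parallel to r1 x r2.
    Kinv = np.linalg.inv(K)
    n = np.cross(Kinv @ H[:, 0], Kinv @ H[:, 1])
    return n / np.linalg.norm(n)

def angle_between_planes_deg(K, H1, H2):
    # The normal orientation follows the corner annotation order, so the result
    # may come out as the angle or its supplement.
    cosang = np.clip(np.dot(plane_normal(K, H1), plane_normal(K, H2)), -1.0, 1.0)
    return np.degrees(np.arccos(cosang))
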



Q2 (c). Camera calibration from metric planes

In this question, we apply the same algorithm as Q2 (b) to a custom image. The only change is the choice of metrically rectified points: \[(0, 0), (1.6, 0), (1.6, 1), (0, 1)\] for the laptop screens and \[(0, 0), (1.755, 0), (1.755, 1), (0, 1)\] for the TV. These values are based on the width-to-height ratios of the rectangles.
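
As a small illustration of this change, the rectified corners are simply scaled along the width axis by the rectangle's aspect ratio (hypothetical helper, not from the assignment code):

def rectified_corners(ratio):
    # Rectified corners of a rectangle with width:height ratio `ratio`,
    # e.g. 1.6 for the laptop screens and 1.755 for the TV.
    return [(0.0, 0.0), (ratio, 0.0), (ratio, 1.0), (0.0, 1.0)]
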

The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 2.7673762 \times 10^{3} & -3.8915108 \times 10^{1} & 2.0566846 \times 10^{3} \\ 0.0000000 & 2.7130137 \times 10^{3} & 1.6046228 \times 10^{3} \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]

The principal point is : \[ \begin{bmatrix} 2.0566846 \times 10^{3}, 1.6046228 \times 10^{3} \end{bmatrix} \]

Table 6. Q2 (c) results on collected image
Input Image | Rectangle Annotations | Annotated Rectangle 1 | Annotated Rectangle 2 | Annotated Rectangle 3
[image panels]


Table 7. Q2 (c) Angles between planes (degrees)
Planes | Angle (degrees)
Plane 1 & Plane 2 | 82.69671731
Plane 1 & Plane 3 | 39.72447461
Plane 2 & Plane 3 | 49.44409936



Q3 (a). Single View Reconstruction

In this question, we use the following algorithm:
  1. We use the given plane annotations to obtain three pairs of parallel lines and compute \(K\) using the algorithm described in Q2 (a)
  2. For each plane, we compute the plane normal as \(n = K^T l\), where \(l\) is the vanishing line of the plane
  3. Next, we compute the rays (along with their colors) for all the pixels on the plane using \[r = K^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\] and normalize each ray to unit norm
  4. We choose a reference point on the plane; its 3D position fixes the offset \(a\) in the plane equation \(n^T X + a = 0\)
  5. We assign the reference point a depth of 1 along its ray
  6. Next, we find the intersection of the remaining rays with the plane, which gives the 3D coordinates of the rest of the points on the plane
  7. Finally, we pick a point shared with an adjacent plane, use its recovered 3D position as the new reference point, and repeat the above steps
  8. We repeat these steps for all the planes (a sketch of the per-plane backprojection is given after this list)
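
A minimal sketch of the per-plane backprojection in steps 3-6 (the function name and arguments are mine); chaining to an adjacent plane, as in step 7, amounts to calling it again with the shared point's recovered distance along its ray as ref_depth.

import numpy as np

def backproject_plane(K, normal, pixels, ref_pixel, ref_depth=1.0):
    # Lift pixels lying on a plane with the given normal (camera frame) to 3D,
    # fixing the reference pixel at depth ref_depth along its unit-norm ray.
    Kinv = np.linalg.inv(K)
    n = np.asarray(normal, dtype=float)
    r_ref = Kinv @ np.array([*ref_pixel, 1.0])
    X_ref = ref_depth * r_ref / np.linalg.norm(r_ref)
    a = -n @ X_ref                      # plane offset in n^T X + a = 0
    points = []
    for u, v in pixels:
        r = Kinv @ np.array([u, v, 1.0])
        r = r / np.linalg.norm(r)
        t = -a / (n @ r)                # intersection of X = t r with the plane
        points.append(t * r)
    return np.asarray(points)
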


The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 808.5202361 & 0.0000000 & 510.71188676 \\ 0.0000000 & 808.5202361 & 363.63611542 \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]

Table 8. Q3 (a) Provided images
Input Image | Annotations
[image panels]


Table 9. Q3 (a) Reconstruction views
Reconstruction view 1 | Reconstruction view 2 | Reconstruction view 3 | Reconstruction view 4
[image panels]

