Geometry-based Methods for Vision

Assignment 2

Aviral Agrawal (avirala)

Q1 (a). Camera matrix P from 2D-3D correspondences : Stanford Bunny

The projection matrix computed from the 2D-3D correspondences is:
\[ P_{normalized} = \begin{bmatrix} 6.43360168 \times 10^{3} & -2.94894365 \times 10^{3} & 1.14605707 \times 10^{3} & 2.22742084 \times 10^{3} \\ -9.34882718 \times 10^{2} & -6.75530536 \times 10^{3} & 2.02900218 \times 10^{3} & 1.82289160 \times 10^{3} \\ 5.79221703 \times 10^{-1} & -1.42341186 \times 10^{0} & -7.35313344 \times 10^{-1} & 1.00000000 \times 10^{0} \end{bmatrix} \]
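
For completeness, a minimal sketch of the direct linear transform (DLT) used to estimate \(P\) from the correspondences; the function name is mine, and any pre-conditioning/normalization of the coordinates is omitted here for brevity.

import numpy as np

def estimate_P_dlt(pts2d, pts3d):
    # pts2d: (n, 2) pixel coordinates, pts3d: (n, 3) world coordinates, n >= 6.
    A = []
    for (x, y), (X, Y, Z) in zip(pts2d, pts3d):
        Xh = np.array([X, Y, Z, 1.0])
        A.append(np.concatenate([Xh, np.zeros(4), -x * Xh]))
        A.append(np.concatenate([np.zeros(4), Xh, -y * Xh]))
    # The right singular vector with the smallest singular value holds the 12 entries of P.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)
    return P / P[2, 3]   # scale so the bottom-right entry is 1, as reported above
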
Table 1. Q1 (a) results on provided image
Input Image | Annotated points on input image | Points projected on input image | Bounding box projected on input image
[image panels]


Q1 (b). Camera matrix P from 2D-3D correspondences : Cuboid

The projection matrix computed from the 2D-3D correspondences is:
\[ P_{normalized} = \begin{bmatrix} 8.67876810 \times 10^{1} & -9.92854783 \times 10^{0} & -2.17699611 \times 10^{2} & 5.53873518 \times 10^{1} \\ 7.28494609 \times 10^{1} & -1.24764799 \times 10^{2} & 3.60569451 \times 10^{1} & 3.61126278 \times 10^{2} \\ -9.12720191 \times 10^{-2} & -6.17049797 \times 10^{-2} & -9.60038073 \times 10^{-2} & 1.00000000 \times 10^{0} \end{bmatrix} \]
Table 2. Q1 (b) results on collected image
Input Image | Annotated points on input image | Bounding box projected on input image
[image panels]




Q2 (a). Camera calibration from vanishing points

In this question, we use three pairs of parallel lines to compute the camera intrinsic matrix using the following algorithm:
  1. Given a pair of imaged parallel lines, \(l\) and \(m\), in homogeneous coordinates, we compute the vanishing point as their intersection: \(v = l \times m\)
  2. As per the assignment, the camera has zero skew and the pixels are square. Hence, the camera matrix can be defined as follows : \[K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}\]
  3. We know that \(\omega\) is given by \(\omega = K^{-T} K^{-1}\)
  4. Thus, \(\omega\) will be of the form \[ \begin{bmatrix} a & 0 & b \\ 0 & a & c \\ b & c & d \end{bmatrix} \]
  5. Assuming the three annotated directions are mutually orthogonal, each pair of vanishing points gives one constraint: \[v_i^T \omega v_j = 0\]
  6. Note that three such equations suffice, since \(\omega\) in this form has only 3 degrees of freedom (four unknowns up to scale)
  7. We stack the constraints from step 5 into a homogeneous system \(Ac = 0\) with \(c = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}\) and solve it using SVD
  8. We use Cholesky decomposition to obtain the matrix \(K\) from \(\omega\) (since \(\omega = K^{-T} K^{-1}\), the Cholesky factor of \(\omega\) is \(K^{-T}\)); a short sketch of this pipeline is given after this list
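
Below is a minimal sketch of these steps, assuming the three pairs of annotated parallel lines correspond to mutually orthogonal scene directions (function names are mine, not from the assignment starter code).

import numpy as np

def vanishing_point(p1, p2, q1, q2):
    # (p1, p2) and (q1, q2) are image endpoints of two parallel scene lines;
    # the vanishing point is the intersection of the two image lines.
    l = np.cross([*p1, 1.0], [*p2, 1.0])
    m = np.cross([*q1, 1.0], [*q2, 1.0])
    return np.cross(l, m)

def K_from_vanishing_points(v1, v2, v3):
    # Each pair of orthogonal vanishing points gives one equation v_i^T w v_j = 0,
    # linear in the unknowns (a, b, c, d) of w = [[a, 0, b], [0, a, c], [b, c, d]].
    rows = []
    for vi, vj in [(v1, v2), (v1, v3), (v2, v3)]:
        rows.append([vi[0] * vj[0] + vi[1] * vj[1],
                     vi[0] * vj[2] + vi[2] * vj[0],
                     vi[1] * vj[2] + vi[2] * vj[1],
                     vi[2] * vj[2]])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    a, b, c, d = Vt[-1]
    w = np.array([[a, 0.0, b], [0.0, a, c], [b, c, d]])
    if a < 0:                          # fix the overall sign so w is positive definite
        w = -w
    # w = K^{-T} K^{-1}, so the lower-triangular Cholesky factor of w is K^{-T}.
    L = np.linalg.cholesky(w)
    K = np.linalg.inv(L).T
    return K / K[2, 2]
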


The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 1.15417805 \times 10^{3} & 0.00000000 & 5.75066022 \times 10^{2} \\ 0.00000000 & 1.15417805 \times 10^{3} & 4.31939101 \times 10^{2} \\ 0.00000000 & 0.00000000 & 1.00000000 \end{bmatrix} \]

The principal point is : \[ \begin{bmatrix} 5.75066022 \times 10^{2}, 4.31939101 \times 10^{2} \end{bmatrix} \]

Table 3. Q2 (a) results on provided images
Input Image | Annotated parallel lines on input image | Vanishing points and principal point
[image panels]

Q2 (b). Camera calibration from metric planes

In this question, we use three squares of known metric structure to compute the camera intrinsic matrix using the following algorithm:
  1. Given a square with annotated points, we assume rectified coordinates : \((0, 0)\), \((1, 0)\), \((1, 1)\), \((0, 1)\)
  2. We compute the homography that maps these rectified points to the annotated image points (i.e., the metric un-rectification), using SVD.
  3. We write the homography matrix \(H\) in terms of its columns: \(H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix}\)
  4. For \(\omega\) we get the following two constraints: \[h_1^T \omega h_2 = 0\] and \[h_1^T \omega h_1 = h_2^T \omega h_2\]
  5. Thus, each square yields 2 constraints, so the three squares give 6 constraints in total. Since \(\omega\) has only 5 degrees of freedom, the system is overdetermined and we solve for \(\omega\) in a least-squares sense using SVD.
  6. The \(\omega\) matrix is of the form \[ \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \]
  7. We stack the constraints into a homogeneous system \(Ac = 0\) with \(c = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}\)
  8. We use Cholesky decomposition to obtain the matrix \(K\) from \(\omega\), as in Q2 (a); a short sketch is given after this list
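
A sketch of steps 3-8, assuming the per-square homographies have already been estimated with a standard four-point DLT (the function names are mine).

import numpy as np

def constraint_rows(H):
    # Columns h1, h2 of H give two linear constraints on the symmetric matrix
    # w = [[a, b, c], [b, d, e], [c, e, f]]:  h1^T w h2 = 0  and  h1^T w h1 = h2^T w h2.
    h1, h2 = H[:, 0], H[:, 1]
    def row(u, v):
        # coefficients of (a, b, c, d, e, f) in u^T w v
        return np.array([u[0] * v[0],
                         u[0] * v[1] + u[1] * v[0],
                         u[0] * v[2] + u[2] * v[0],
                         u[1] * v[1],
                         u[1] * v[2] + u[2] * v[1],
                         u[2] * v[2]])
    return np.stack([row(h1, h2), row(h1, h1) - row(h2, h2)])

def K_from_square_homographies(Hs):
    # Three squares give a 6 x 6 system A c = 0; solve it in the least-squares
    # sense with SVD, rebuild w, and recover K by Cholesky as in Q2 (a).
    A = np.concatenate([constraint_rows(H) for H in Hs])
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d, e, f = Vt[-1]
    w = np.array([[a, b, c], [b, d, e], [c, e, f]])
    if a < 0:
        w = -w
    L = np.linalg.cholesky(w)          # w = K^{-T} K^{-1}, so L = K^{-T}
    K = np.linalg.inv(L).T
    return K / K[2, 2]
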


The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 1.0858790 \times 10^{3} & -1.4729402 \times 10^{1} & 5.2055530 \times 10^{2} \\ 0.0000000 & 1.0792269 \times 10^{3} & 4.0178989 \times 10^{2} \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]

The principal point is : \[ \begin{bmatrix} 5.2055530 \times 10^{2}, 4.0178989 \times 10^{2} \end{bmatrix} \]

Table 4. Q2 (b) results on provided images
Input Image | Square Annotations | Annotated Square 1 | Annotated Square 2 | Annotated Square 3
[image panels]


Table 5. Q2 (b) Angles between planes (degrees)
Planes | Angle (degrees)
Plane 1 & Plane 2 | 67.60315628
Plane 1 & Plane 3 | 92.25011409
Plane 2 & Plane 3 | 94.80551827
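
For reference, one way the angles above can be obtained (an assumption on my part, not necessarily the exact code used): take each plane's normal from its square homography, \(n \propto (K^{-1} h_1) \times (K^{-1} h_2)\), and measure the angle between the normals.

import numpy as np

def plane_normal(K, H):
    # With H ~ K [r1 r2 t], the first two columns back-project to the in-plane
    # directions, so the plane normal (camera frame) is parallel to r1 x r2.
    Kinv = np.linalg.inv(K)
    n = np.cross(Kinv @ H[:, 0], Kinv @ H[:, 1])
    return n / np.linalg.norm(n)

def angle_between_planes_deg(K, H1, H2):
    # The normal orientation follows the corner annotation order, so the result
    # may come out as the angle or its supplement.
    cosang = np.clip(np.dot(plane_normal(K, H1), plane_normal(K, H2)), -1.0, 1.0)
    return np.degrees(np.arccos(cosang))
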



Q2 (c). Camera calibration from metric planes

In this question, we apply the same algorithm as Q2 (b) to a custom image. The only change is the choice of metrically rectified points: \[(0, 0), (1.6, 0), (1.6, 1), (0, 1)\] for the laptop screens and \[(0, 0), (1.755, 0), (1.755, 1), (0, 1)\] for the TV. These values are based on the width-to-height ratios of the rectangles.
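
As a small illustration of this change, the rectified corners are simply scaled along the width axis by the rectangle's aspect ratio (hypothetical helper, not from the assignment code):

def rectified_corners(ratio):
    # Rectified corners of a rectangle with width:height ratio `ratio`,
    # e.g. 1.6 for the laptop screens and 1.755 for the TV.
    return [(0.0, 0.0), (ratio, 0.0), (ratio, 1.0), (0.0, 1.0)]
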

The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 2.7673762 \times 10^{3} & -3.8915108 \times 10^{1} & 2.0566846 \times 10^{3} \\ 0.0000000 & 2.7130137 \times 10^{3} & 1.6046228 \times 10^{3} \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]

The principal point is : \[ \begin{bmatrix} 2.0566846 \times 10^{3}, 1.6046228 \times 10^{3} \end{bmatrix} \]

Table 6. Q2 (c) results on collected image
Input Image | Rectangle Annotations | Annotated Rectangle 1 | Annotated Rectangle 2 | Annotated Rectangle 3
[image panels]


Table 7. Q2 (c) Angles between planes (degrees)
Planes | Angle (degrees)
Plane 1 & Plane 2 | 82.69671731
Plane 1 & Plane 3 | 39.72447461
Plane 2 & Plane 3 | 49.44409936



Q3 (a). Single View Reconstruction

In this question, we use the following algorithm:
  1. We use the given plane annotations to obtain three pairs of parallel lines and compute \(K\) using the algorithm described in Q2 (a)
  2. For each plane, we compute the plane normal as \(n = K^T l\), where \(l\) is the vanishing line of the plane
  3. Next, we compute the rays (along with their colors) for all the pixels on the plane using \[r = K^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\] and normalize each ray to unit norm
  4. We choose a reference point on the plane; its 3D position fixes the offset \(a\) in the plane equation \(n^T X + a = 0\)
  5. We assign the reference point a depth of 1 along its ray
  6. Next, we find the intersection of the remaining rays with the plane, which gives the 3D coordinates of the rest of the points on the plane
  7. Finally, we pick a point shared with an adjacent plane, use its recovered 3D position as the new reference point, and repeat the above steps
  8. We repeat these steps for all the planes (a sketch of the per-plane backprojection is given after this list)
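
A minimal sketch of the per-plane backprojection in steps 3-6 (the function name and arguments are mine); chaining to an adjacent plane, as in step 7, amounts to calling it again with the shared point's recovered distance along its ray as ref_depth.

import numpy as np

def backproject_plane(K, normal, pixels, ref_pixel, ref_depth=1.0):
    # Lift pixels lying on a plane with the given normal (camera frame) to 3D,
    # fixing the reference pixel at depth ref_depth along its unit-norm ray.
    Kinv = np.linalg.inv(K)
    n = np.asarray(normal, dtype=float)
    r_ref = Kinv @ np.array([*ref_pixel, 1.0])
    X_ref = ref_depth * r_ref / np.linalg.norm(r_ref)
    a = -n @ X_ref                      # plane offset in n^T X + a = 0
    points = []
    for u, v in pixels:
        r = Kinv @ np.array([u, v, 1.0])
        r = r / np.linalg.norm(r)
        t = -a / (n @ r)                # intersection of X = t r with the plane
        points.append(t * r)
    return np.asarray(points)
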


The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 808.5202361 & 0.0000000 & 510.71188676 \\ 0.0000000 & 808.5202361 & 363.63611542 \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]

Table 8. Q3 (a) Provided images
Input Image | Annotations
[image panels]


Table 9. Q3 (a) Reconstruction views
Reconstruction view 1 | Reconstruction view 2 | Reconstruction view 3 | Reconstruction view 4
[image panels]

