Geometry-based Methods for Vision
Assignment 2
Aviral Agrawal (avirala)
Q1 (a). Camera matrix P from 2D-3D correspondences : Stanford Bunny
The projection matrix computed from the 2D-3D correspondences (normalized so that its last entry is 1) is: \[ P_{normalized} = \begin{bmatrix} 6.43360168 \times 10^{3} & -2.94894365 \times 10^{3} & 1.14605707 \times 10^{3} & 2.22742084 \times 10^{3} \\ -9.34882718 \times 10^{2} & -6.75530536 \times 10^{3} & 2.02900218 \times 10^{3} & 1.82289160 \times 10^{3} \\ 5.79221703 \times 10^{-1} & -1.42341186 \times 10^{0} & -7.35313344 \times 10^{-1} & 1.00000000 \times 10^{0} \end{bmatrix} \]
Input Image | Annotated points on input image | Points projected on input image | Bounding box projected on input image |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
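The estimation procedure for \(P\) is not spelled out above. One common way to recover a camera matrix from 2D-3D correspondences is the Direct Linear Transform (DLT), and the sketch below shows that approach under the assumption that at least six non-coplanar correspondences are available. The function name `estimate_projection_matrix` is hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_projection_matrix(pts_2d, pts_3d):
    """Estimate a 3x4 camera matrix from n >= 6 correspondences (DLT sketch).

    pts_2d: (n, 2) image points, pts_3d: (n, 3) world points.
    """
    pts_2d = np.asarray(pts_2d, dtype=float)
    pts_3d = np.asarray(pts_3d, dtype=float)
    zeros = np.zeros(4)
    rows = []
    for (x, y), X in zip(pts_2d, pts_3d):
        Xh = np.append(X, 1.0)  # homogeneous 3D point
        # x = (p1 . X) / (p3 . X)  ->  p1 . X - x * (p3 . X) = 0
        rows.append(np.concatenate([Xh, zeros, -x * Xh]))
        # y = (p2 . X) / (p3 . X)  ->  p2 . X - y * (p3 . X) = 0
        rows.append(np.concatenate([zeros, Xh, -y * Xh]))
    A = np.stack(rows)
    # The right singular vector of the smallest singular value solves A p = 0.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # Normalize so the last entry is 1, matching P_normalized above.
    return P / P[2, 3]
```

Once `P` is available, projecting a homogeneous 3D point `Xh` is `x = P @ Xh` followed by division by `x[2]`, which is how the projected points and bounding box in the table could be drawn.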
Q1 (b). Camera matrix P from 2D-3D correspondences : Cuboid
The projection matrix computed from the 2D-3D correspondences (normalized so that its last entry is 1) is: \[ P_{normalized} = \begin{bmatrix} 8.67876810 \times 10^{1} & -9.92854783 \times 10^{0} & -2.17699611 \times 10^{2} & 5.53873518 \times 10^{1} \\ 7.28494609 \times 10^{1} & -1.24764799 \times 10^{2} & 3.60569451 \times 10^{1} & 3.61126278 \times 10^{2} \\ -9.12720191 \times 10^{-2} & -6.17049797 \times 10^{-2} & -9.60038073 \times 10^{-2} & 1.00000000 \times 10^{0} \end{bmatrix} \]
Input Image | Annotated points on input image | Bounding box projected on input image |
---|---|---|
![]() | ![]() | ![]() |
Q2 (a). Camera calibration from vanishing points
In this question, we use three pairs of parallel lines to compute the camera intrinsic matrix using the following algorithm:
- Given a pair of parallel lines, \(l\) and \(m\), in homogeneous coordinates, we compute their vanishing point \(v\) as the cross product \(v = l \times m\)
- As per the assignment, the camera has zero skew and square pixels. Hence, the intrinsic matrix has the form: \[K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix}\]
- We know that \(\omega\) is given by \(\omega = K^{-T} K^{-1}\)
- Thus, \(\omega\) will be of the form \[ \begin{bmatrix} a & 0 & b \\ 0 & a & c \\ b & c & d \end{bmatrix} \]
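- Since the three line directions are mutually orthogonal, each pair of vanishing points gives one constraint: \[v_i^T \omega v_j = 0, \quad i \neq j\]
- Note that we only need three equations, since \(\omega\) has only 3 degrees of freedom (four unknowns \(a, b, c, d\), defined up to scale)
- We stack the three constraints \(v_i^T \omega v_j = 0\) into a homogeneous system \(A\mathbf{x}=0\) with \(\mathbf{x} = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}\) and solve it using SVD
- We use Cholesky decomposition to obtain the matrix \(K\): factoring \(\omega = L L^T\) gives \(L = K^{-T}\) up to scale, so \(K = L^{-T}\), normalized so that its bottom-right entry is 1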
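The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code used for the results: the function names `vanishing_point` and `calibrate_from_vanishing_points` are hypothetical, each annotated line is assumed to be given as two image points, and the three vanishing points are assumed to come from mutually orthogonal directions.

```python
import numpy as np


def vanishing_point(line_a, line_b):
    """Intersect two image lines, each given as ((x1, y1), (x2, y2))."""
    def to_line(p, q):
        # Homogeneous line through two points is their cross product.
        return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
    return np.cross(to_line(*line_a), to_line(*line_b))


def calibrate_from_vanishing_points(v1, v2, v3):
    """Recover K (zero skew, square pixels) from three orthogonal vanishing points."""
    rows = []
    for u, v in [(v1, v2), (v1, v3), (v2, v3)]:
        # Coefficients of (a, b, c, d) in u^T omega v,
        # with omega = [[a, 0, b], [0, a, c], [b, c, d]].
        rows.append([
            u[0] * v[0] + u[1] * v[1],
            u[0] * v[2] + u[2] * v[0],
            u[1] * v[2] + u[2] * v[1],
            u[2] * v[2],
        ])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    a, b, c, d = Vt[-1]  # null vector of the 3x4 system
    omega = np.array([[a, 0.0, b], [0.0, a, c], [b, c, d]])
    if omega[0, 0] < 0:
        omega = -omega  # the SVD sign is arbitrary; omega must be positive definite
    L = np.linalg.cholesky(omega)  # omega = L L^T with L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

With noisy annotations the recovered \(\omega\) may fail to be positive definite, in which case `np.linalg.cholesky` raises `LinAlgError`; re-annotating the lines is usually the simplest remedy.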
The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 1.15417805 \times 10^{3} & 0.00000000 & 5.75066022 \times 10^{2} \\ 0.00000000 & 1.15417805 \times 10^{3} & 4.31939101 \times 10^{2} \\ 0.00000000 & 0.00000000 & 1.00000000 \end{bmatrix} \]
The principal point is : \[ \begin{bmatrix} 5.75066022 \times 10^{2}, 4.31939101 \times 10^{2} \end{bmatrix} \]
Input Image | Annotated parallel lines on input image | Vanishing points and principal point |
---|---|---|
![]() | ![]() | ![]() |
Q2 (b). Camera calibration from metric planes
In this question, we use three annotated squares, whose metric (rectified) shape is known, to compute the camera intrinsic matrix using the following algorithm:
- Given a square with annotated corners, we assume the rectified coordinates \((0, 0)\), \((1, 0)\), \((1, 1)\), \((0, 1)\)
- We compute the homography \(H\) that maps the assumed rectified corners to the annotated image corners (i.e. the metric un-rectification). We compute this homography using SVD.
- We write the homography in terms of its columns as \[ H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \]
- For \(\omega\) we get the following two constraints: \[h_1^T \omega h_2 = 0\] and \[h_1^T \omega h_1 = h_2^T \omega h_2\]
- Thus, each square gives 2 constraints, and the three squares give 6 constraints in total. Since \(\omega\) has 5 degrees of freedom (six unknowns, defined up to scale), we can solve for \(\omega\) using SVD.
- The \(\omega\) matrix is of the form \[ \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \]
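- We stack the constraints into a homogeneous system \(A\mathbf{x}=0\) with \(\mathbf{x} = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}\)
- We use Cholesky decomposition to obtain the matrix \(K\): factoring \(\omega = L L^T\) gives \(L = K^{-T}\) up to scale, so \(K = L^{-T}\), normalized so that its bottom-right entry is 1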
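A minimal NumPy sketch of the procedure above, under the assumption that each square is annotated as four corners in the same order as the rectified coordinates. The names `homography_from_points`, `omega_row` and `calibrate_from_homographies` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def homography_from_points(src, dst):
    """DLT homography mapping src -> dst, each an array of four (x, y) points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)


def omega_row(p, q):
    """Coefficients of (a, b, c, d, e, f) in p^T omega q for symmetric omega."""
    return np.array([
        p[0] * q[0],
        p[0] * q[1] + p[1] * q[0],
        p[0] * q[2] + p[2] * q[0],
        p[1] * q[1],
        p[1] * q[2] + p[2] * q[1],
        p[2] * q[2],
    ])


def calibrate_from_homographies(homographies):
    """Recover K from >= 3 plane-to-image homographies of metric squares."""
    rows = []
    for H in homographies:
        h1, h2 = H[:, 0], H[:, 1]
        rows.append(omega_row(h1, h2))                      # h1^T w h2 = 0
        rows.append(omega_row(h1, h1) - omega_row(h2, h2))  # h1^T w h1 = h2^T w h2
    _, _, Vt = np.linalg.svd(np.array(rows))
    a, b, c, d, e, f = Vt[-1]
    omega = np.array([[a, b, c], [b, d, e], [c, e, f]])
    if omega[0, 0] < 0:
        omega = -omega  # the SVD sign is arbitrary; omega must be positive definite
    L = np.linalg.cholesky(omega)  # omega = L L^T with L = K^{-T} up to scale
    K = np.linalg.inv(L.T)
    return K / K[2, 2]
```

Because \(\omega\) is not constrained to zero skew here, the recovered \(K\) can contain a small non-zero skew term and slightly different focal lengths along the two axes.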
The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 1.0858790 \times 10^{3} & -1.4729402 \times 10^{1} & 5.2055530 \times 10^{2} \\ 0.0000000 & 1.0792269 \times 10^{3} & 4.0178989 \times 10^{2} \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]
The principal point is : \[ \begin{bmatrix} 5.2055530 \times 10^{2}, 4.0178989 \times 10^{2} \end{bmatrix} \]
Input Image | Square Annotations | Annotated Square 1 | Annotated Square 2 | Annotated Square 3 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
Planes | Angle between planes (degrees) |
---|---|
Plane 1 & Plane 2 | 67.60315628 |
Plane 1 & Plane 3 | 92.25011409 |
Plane 2 & Plane 3 | 94.80551827 |
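How the angles in this table were computed is not stated above. One plausible way, sketched below, is to take each plane's normal in camera coordinates as the cross product of the back-projected homography columns, \(n \propto (K^{-1} h_1) \times (K^{-1} h_2)\), and measure the angle between normals. The function names are hypothetical.

```python
import numpy as np


def plane_normal_from_homography(K, H):
    """Unit normal of the plane whose metric square maps to the image via H."""
    K_inv = np.linalg.inv(K)
    # K^{-1} h1 and K^{-1} h2 are the in-plane axis directions (up to scale).
    n = np.cross(K_inv @ H[:, 0], K_inv @ H[:, 1])
    return n / np.linalg.norm(n)


def angle_between_planes(K, H1, H2):
    """Angle in degrees between the normals of two planes."""
    n1 = plane_normal_from_homography(K, H1)
    n2 = plane_normal_from_homography(K, H2)
    cos_theta = np.clip(n1 @ n2, -1.0, 1.0)  # guard against rounding
    return np.degrees(np.arccos(cos_theta))
```

The sign of each normal depends on the order in which the corners were annotated, so this sketch may return either \(\theta\) or \(180^\circ - \theta\) for the same pair of planes.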
Q2 (c). Camera calibration from metric planes
In this question we use the same algorithm as Q2 (b), but on a custom image. The only change in the algorithm is the assumed metrically rectified corner coordinates: \[(0, 0), (1.6, 0), (1.6, 1), (0, 1)\] for the laptops and \[(0, 0), (1.755, 0), (1.755, 1), (0, 1)\] for the TV. These values are based on the width-to-length ratio of each rectangle.
The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 2.7673762 \times 10^{3} & -3.8915108 \times 10^{1} & 2.0566846 \times 10^{3} \\ 0.0000000 & 2.7130137 \times 10^{3} & 1.6046228 \times 10^{3} \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]
The principal point is : \[ \begin{bmatrix} 2.0566846 \times 10^{3}, 1.6046228 \times 10^{3} \end{bmatrix} \]
Input Image | Square Annotations | Annotated Square 1 | Annotated Square 2 | Annotated Square 3 |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
Planes | Angle between planes (degrees) |
---|---|
Plane 1 & Plane 2 | 82.69671731 |
Plane 1 & Plane 3 | 39.72447461 |
Plane 2 & Plane 3 | 49.44409936 |
Q3 (a). Single View Reconstruction
In this question, we use the following algorithm:
- We use the given plane annotations to obtain 3 pairs of parallel lines and compute \(K\) using the algorithm described for Q2 (a)
- For each plane we compute the plane normal as \(n = K^T v\), where \(v\) is the vanishing line of the plane (the line through two of its vanishing points)
- Next, we compute the rays (along with the colors) for all the points on the plane using the following equation: \[r = K^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}\] and then normalize the rays with their norm
- We choose a reference point on the plane and assign it a depth of 1, which fixes its 3D position \(X_{ref}\) along its ray
- Substituting \(X_{ref}\) into the plane equation \(n^T X + a = 0\) gives the offset \(a\)
- Next, we intersect the remaining rays with the plane, which gives the 3D coordinates of the rest of the points on the plane
- Finally, we choose a point that is common to another plane, choose that as the new reference point and repeat the above steps
- We repeat these steps for all the planes
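The per-plane reconstruction above can be sketched as a ray-plane intersection. This is an illustrative sketch only: `plane_normal` and `backproject_plane` are hypothetical names, and "depth" is assumed here to mean distance along the unit-length ray.

```python
import numpy as np


def plane_normal(K, vanishing_line):
    """Unit plane normal n = K^T l from the plane's vanishing line l."""
    n = K.T @ np.asarray(vanishing_line, dtype=float)
    return n / np.linalg.norm(n)


def backproject_plane(K, pixels, normal, ref_pixel, ref_depth=1.0):
    """Intersect the viewing rays of `pixels` with the plane through the reference point.

    pixels: (n, 2) image points on the plane; returns (n, 3) points in camera coordinates.
    """
    K_inv = np.linalg.inv(K)

    # Fix the plane offset a from the reference point: n^T X_ref + a = 0.
    ref_ray = K_inv @ np.array([ref_pixel[0], ref_pixel[1], 1.0])
    ref_ray /= np.linalg.norm(ref_ray)
    X_ref = ref_depth * ref_ray
    a = -normal @ X_ref

    # Unit rays r = K^{-1} [x, y, 1]^T for every pixel on the plane.
    pixels = np.asarray(pixels, dtype=float)
    rays = (K_inv @ np.column_stack([pixels, np.ones(len(pixels))]).T).T
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)

    # A point t * r lies on the plane when n^T (t r) + a = 0, so t = -a / (n^T r).
    t = -a / (rays @ normal)
    return rays * t[:, None]
```

For the next plane, `ref_pixel` would be a pixel shared with an already reconstructed plane, and `ref_depth` the norm of its recovered 3D point, so that adjacent planes meet along their common edge.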
The camera matrix computed from the algorithm is: \[ K = \begin{bmatrix} 808.5202361 & 0.0000000 & 510.71188676 \\ 0.0000000 & 808.5202361 & 363.63611542 \\ 0.0000000 & 0.0000000 & 1.0000000 \end{bmatrix} \]
Input Image | Annotations |
---|---|
![]() | ![]() |
Reconstruction view 1 | Reconstruction view 2 | Reconstruction view 3 | Reconstruction view 4 |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |