Assignment 3: Vision


To run the code, run run.m with the path of the RGB photo and the path of the depth array.

In this assignment, we find the position and orientation of a block of wood, given RGBD images taken by a Kinect-like device. Our program uses both the RGB photos and the depth arrays to locate the blocks and calculate their orientations. We transform each depth array into a 2D black-and-white picture, with white representing the shallowest spots and black representing the deepest spots. We also transform each RGB picture into an intensity array.
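The two conversions described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code in run.m. The function names are hypothetical. Drawing missing (0) depth readings as black and using a plain channel average for intensity are both assumptions made for the example.

```python
def depth_to_gray(depth):
    """Map a 2D depth array to gray levels 0..255: nearest -> 255, farthest -> 0."""
    valid = [d for row in depth for d in row if d > 0]
    if not valid:
        return [[0 for _ in row] for row in depth]
    near, far = min(valid), max(valid)
    span = far - near
    gray = []
    for row in depth:
        out = []
        for d in row:
            if d <= 0:
                out.append(0)  # assumption: missing readings are drawn as black
            elif span == 0:
                out.append(255)  # every valid reading is at the same depth
            else:
                out.append(round(255 * (far - d) / span))
        gray.append(out)
    return gray


def rgb_to_intensity(rgb):
    """Average the three channels of each (r, g, b) pixel (assumed weighting)."""
    return [[round((r + g + b) / 3) for (r, g, b) in row] for row in rgb]


if __name__ == "__main__":
    print(depth_to_gray([[800, 900], [0, 1200]]))
    print(rgb_to_intensity([[(30, 60, 90), (255, 255, 255)]]))
```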

First, to minimize noise and other factors that influence the picture, we filter the RGB images with a 25x25 2nd degree Gaussian mask. This blurs the pictures, so small dots and gaps are filtered out. After that, we transform the image matrices into binary pictures in which 1 represents the block and 0 represents empty space and other objects that are not the block. However, because of the quality of the pictures and the depth arrays, both binary pictures have gaps within the blocks as well as noise that is mistaken for the block. For example, shadows on the block are often mistaken for the background, and light spots are picked up as part of the block. We solve these problems and generate a cleaner binary image by using the depth array picture to fill gaps in the RGB image and to filter out far spots that were picked up as part of the block. We then dilate the image to further reduce the errors.
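The blur-then-threshold step above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code in run.m. The function names, the small 5x5 kernel, the sigma of 1.0, the threshold of 128, and the replicated border are all assumptions chosen to keep the example short; the report itself uses a 25x25 mask.

```python
import math


def gaussian_kernel(size, sigma):
    """Build a size x size Gaussian kernel whose weights sum to 1."""
    half = size // 2
    kernel = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def blur(image, kernel):
    """Convolve a 2D intensity array with the kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, krow in enumerate(kernel):
                for kx, kv in enumerate(krow):
                    yy = min(max(y + ky - half, 0), h - 1)
                    xx = min(max(x + kx - half, 0), w - 1)
                    acc += kv * image[yy][xx]
            out[y][x] = acc
    return out


def threshold(image, level):
    """Binary picture: 1 where the intensity reaches the level, else 0."""
    return [[1 if v >= level else 0 for v in row] for row in image]


if __name__ == "__main__":
    # A single bright speck on a dark background is removed by the blur.
    speck = [[0] * 9 for _ in range(9)]
    speck[4][4] = 255
    mask = threshold(blur(speck, gaussian_kernel(5, 1.0)), 128)
    print(sum(sum(row) for row in mask))
```

A larger kernel removes larger specks but also rounds off the corners of the block, which matters for the orientation step later on.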

We calculate the centroid of the block by averaging the locations of all the white spots in the binary image. The centroid represents the position of the block in X-Y image coordinates.
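The centroid calculation above amounts to averaging the coordinates of every pixel equal to 1. A minimal Python sketch follows; it is not the MATLAB code in run.m, and the function name is hypothetical.

```python
def centroid(mask):
    """Return the (x, y) mean position of all 1-pixels, or None if there are none."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    print(centroid(mask))  # (2.0, 1.5)
```

Because every white pixel carries equal weight, a hole or a stray blob shifts the result in proportion to its area, which is why the mask is cleaned before this step.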

We then calculate the orientation of the block. First, we find the four corners by finding the four spots that are furthest from the centroid. Then we calculate the angles formed by the edges of the rectangle.
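The corner search and angle calculation above can be sketched as follows. This is a minimal Python illustration, not the MATLAB code in run.m. The function names are hypothetical. Two details are assumptions that the text does not specify: a minimum separation between accepted corners (so that several pixels near one corner are not all chosen), and reporting the angle of the longest edge.

```python
import math


def find_corners(mask, min_separation=2.0):
    """Return (centroid, corners): up to four 1-pixels farthest from the centroid."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None, []
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    ranked = sorted(points,
                    key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2,
                    reverse=True)
    corners = []
    for p in ranked:
        # Assumed guard: skip points that sit too close to a corner already chosen.
        if all(math.dist(p, q) >= min_separation for q in corners):
            corners.append(p)
        if len(corners) == 4:
            break
    return (cx, cy), corners


def edge_angle_degrees(center, corners):
    """Angle of the longest edge in degrees, in [0, 180), in image coordinates."""
    cx, cy = center
    # Order the corners around the centroid so consecutive corners share an edge.
    ordered = sorted(corners, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    edges = [(ordered[i], ordered[(i + 1) % len(ordered)])
             for i in range(len(ordered))]
    (x0, y0), (x1, y1) = max(edges, key=lambda e: math.dist(e[0], e[1]))
    # Image y grows downward, so the sign is mirrored relative to a math plot.
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


if __name__ == "__main__":
    mask = [[1 if 2 <= x <= 7 and 1 <= y <= 3 else 0 for x in range(10)]
            for y in range(5)]
    center, corners = find_corners(mask)
    print(center, corners, edge_angle_degrees(center, corners))
```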

Overall, the centroid we calculate is mostly accurate, because we apply filters and use both the depth arrays and the RGB pictures to minimize errors and remove noise. The rotation we calculate is less accurate, because the arctan function is sensitive to errors: small amounts of noise in the images can change the result considerably.
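To illustrate the sensitivity mentioned above, the following Python sketch computes how far the angle of an edge moves when one of its endpoints is misplaced by a single pixel. The edge lengths of 20 and 100 pixels are illustrative values, not measurements from our images, and the function name is hypothetical.

```python
import math


def angle_error_degrees(edge_length, shift=1.0):
    """Angle change when one end of a horizontal edge moves `shift` pixels sideways."""
    true_angle = math.atan2(0.0, edge_length)
    noisy_angle = math.atan2(shift, edge_length)
    return abs(math.degrees(noisy_angle - true_angle))


if __name__ == "__main__":
    for length in (20, 100):
        print(length, round(angle_error_degrees(length), 2))
```

Under these assumptions the shorter edge changes by close to 3 degrees and the longer one by roughly half a degree, so the same one-pixel corner error hurts more when the block appears small in the image.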

1) Estimate the accuracy of your measurements, ideally in MKS units. What aspects of the measurements are likely to be accurate? What aspects of the measurements are likely to be inaccurate?

We do not make any calculations in MKS units. All our positions are calculated as (x, y) coordinates relative to the pictures given. We calculate the position and the rotation of the blocks by first filtering the image and finding out which part of the image is the block, then calculating the rotation by finding the corners of the block and the angles of the edges they form. Our measurements are very accurate when the block in the input image is clearly defined, but we have trouble when the background has lighter spots or when a shadow falls on parts of the block, as this throws off the intensity-filtered image and our algorithm. To find the centroid, our method averages all the points that appear in white in the filtered image. Then, to find the corners, it searches the same set of points for those with the maximum distances from the determined centroid. When the block’s coloration is clearly distinct from the background, we can get a more correct set of “white points”, and, thus, a more accurate set of measurements.

2) What is the effect of the missing (0) depth values?

Missing depth values in the depth arrays result in holes in the block. The holes affect our calculations because they are not considered part of the block, so the centroid is skewed. If the missing values are at the edges of the block, they affect the locations of the corners and hence the angles as well. We address this issue by using the RGB image and the depth arrays to fill in gaps for each other and to filter out outliers that are far from where the block is located.
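One way to combine the two sources as described above is sketched below in Python. It is not the MATLAB code in run.m. The function name and the `near`/`far` depth band are hypothetical, and the rule of trusting a valid depth reading over the RGB mask is an assumption made for the example.

```python
def fuse_masks(rgb_mask, depth, near, far):
    """Combine an RGB-derived binary mask with a depth array.

    A depth of 0 is treated as missing, so the RGB mask decides that pixel.
    A valid depth inside [near, far] marks the pixel as block, filling RGB gaps.
    A valid depth outside the band marks it as background, removing far spots.
    """
    fused = []
    for mask_row, depth_row in zip(rgb_mask, depth):
        out = []
        for m, d in zip(mask_row, depth_row):
            if d == 0:
                out.append(1 if m else 0)
            elif near <= d <= far:
                out.append(1)
            else:
                out.append(0)
        fused.append(out)
    return fused


if __name__ == "__main__":
    rgb_mask = [[1, 0, 1],
                [0, 1, 1]]
    depth = [[0, 900, 2000],
             [0, 0, 950]]
    print(fuse_masks(rgb_mask, depth, near=800, far=1000))
```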

The table demonstrates our program. The first picture is the original picture, the second is the binary representation, and the third is the rotated image that represents the orientation of the block. The red dot is the centroid, and the red crosses are the corners found.