Lab 2

Vision Algorithms

We need to find the distance from which a picture of a specified object was taken. Pictures of the object are attached below.

Big Straight       Small Straight

The one on the left is a 15" square plank with tennis balls attached on all 4 corners. The one on the right is a 5" square plank with tennis balls attached at the corners.



There are a couple of things to consider when doing this lab. First, what if the pictures are taken at an angle? The pictures above are taken relatively straight on, but not all pictures will be like that. To investigate this problem, I also took photos from the left and right of the boards, as seen below.

Big2b         small 2b         big 2c         Small 2c

Also, because a measuring tape was used to measure the distance from the plank, it shows up in the image. The specific problem with this measuring tape is that it's yellow, and so are the tennis balls. So we need to make sure we filter the data properly so that the tape does not affect the result.



Also, there are multiple ways of approaching this problem. Some I've considered are:

I'm sure there are other ways of approaching this problem, but these are the ones I brainstormed.



There were a couple of changes after I started programming. These were mostly due to unexpected problems arising, so I had to find ways to correct them. The code is broken down into segments, so I'll explain the segments individually in the report below.


Photo issue:

I had to get new photos because the measuring tape was really interfering with the image, as it is yellow as well. If you have the Parrot quadrotor that can track an object of a specific colour using the same technology and a PID loop, it has the same problem when another object of the same colour is in frame. So for the purpose of this assignment, I took new photos with the measuring tape removed. Maybe I'll tackle the problem again with the tape in frame when I have the time.

Matlab also constantly yells at us since our phones take pretty amazing (big) pictures, so I put it through photoshop and reduced the image size so that it stops yelling at me in bright orange messages. :) I set it to 1000 pixels in height.

Also, I'll be using the small square at 4 feet for the example in this report. Then I'll show the algorithm being used on the other images at the end.


Hue, Saturation and Value

Upon receiving the image, we first notice that when we imshow each of its RGB channels, we simply get 3 grayscale images. This is to be expected, as these are the intensities of the 3 colour channels, but it's not really useful for what we want to do. After converting to the hue, saturation and value channels, the balls show up pretty well in the hue channel, so I used the hue channel as the base image.

hue, Saturation, Value



So the next step in the process is thresholding the hue image.


As you can see above, the hue channel has 4 grey dots that represent the balls. We need those to really stand out in the image, and we need to remove the rest of the 'unimportant' or irrelevant information. So I decided to change it into a binary image by thresholding it with a band-pass filter, i.e. setting a ceiling and a floor threshold for the hue. This leaves me with the image on the right above. Pretty good, and I can clearly see where the balls are, but there is still some noise in the image that should be removed.
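
A minimal Python sketch of the band-pass idea (not the Matlab code used for this report). The function name hue_band_mask and the floor and ceiling values 0.15 and 0.25 are assumptions picked for illustration; the actual thresholds were tuned by hand and are not listed here.

```python
import colorsys

# Hypothetical band limits for tennis-ball yellow-green; hue is in [0, 1).
HUE_FLOOR = 0.15
HUE_CEILING = 0.25


def hue_band_mask(rgb_image, floor=HUE_FLOOR, ceiling=HUE_CEILING):
    """Return a binary mask: 1 where the pixel hue lies in [floor, ceiling], else 0.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            hue, _sat, _val = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(1 if floor <= hue <= ceiling else 0)
        mask.append(mask_row)
    return mask


image = [
    [(200, 230, 40), (30, 30, 200)],   # yellow-green, blue
    [(220, 40, 40), (190, 220, 60)],   # red, yellow-green
]
mask = hue_band_mask(image)  # only the two yellow-green pixels pass
```

With these assumed limits, a pure yellow pixel such as (255, 255, 0) also passes the band, which is exactly the measuring tape problem described earlier.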


Noise removal!

So. I started off by thinking, hey if I can filter everything out but the balls, then I don't have to spend too much time segmenting, since the only thing there will be the balls. Turns out it's not as easy as that. -.-

Regardless, removing the noise is important.


So I made my own noise removal algorithm. I first used imopen, which removes the small isolated dots from the picture. Then I went through the whole image pixel by pixel: I look at the border (the neighbourhood) around each pixel, and if more than 1/3 of the pixels in that border are black, I change the pixel to black. There was some tweaking of the border size and the ratio involved, but after some testing these turned out to be the best numbers: border = 6 px, black:white = 1:2.
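
A rough Python sketch of the neighbourhood vote (the imopen step is left out). It assumes the "border" is the square window of that radius around the pixel, clipped at the image edges, and that the vote reads from the original mask rather than the partly cleaned one. Both are guesses at details not spelled out above, and remove_noise is a made-up name.

```python
def remove_noise(mask, border=6, max_black_fraction=1 / 3):
    """Blacken white pixels whose neighbourhood has too many black pixels.

    mask is a list of rows of 0 (black) / 1 (white). The neighbourhood is
    assumed to be the square window of radius `border` around the pixel,
    clipped at the image edges, excluding the pixel itself.
    """
    height, width = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]  # read from `mask`, write to `cleaned`
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0:
                continue
            black = total = 0
            for ny in range(max(0, y - border), min(height, y + border + 1)):
                for nx in range(max(0, x - border), min(width, x + border + 1)):
                    if (ny, nx) == (y, x):
                        continue
                    total += 1
                    black += mask[ny][nx] == 0
            if total and black / total > max_black_fraction:
                cleaned[y][x] = 0
    return cleaned


# 9x9 test image: a 5x5 white block plus one lone white pixel in a corner.
noisy = [[0] * 9 for _ in range(9)]
for row in range(2, 7):
    for col in range(2, 7):
        noisy[row][col] = 1
noisy[0][8] = 1
cleaned = remove_noise(noisy, border=1)
```

In this small example the lone pixel is removed and the middle of the block survives, but the outer rim of the block is eaten away too, so a vote like this also shrinks the balls a little.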



This is the really exciting part (and painful), but things are about to get good. SEGMENT EVERYTHING. So we could flood fill everything, but matlab hates us (and doesn't have enough stack space) so recursion beyond 500 levels doesn't work and the picture is about 1000 by 1000 pixels. (I think.) You could create a queue to remember where things are but why do that. Also many times, the balls are not one full object. They are in bits and pieces. So flood fill is probably not the most ideal solution.

Either way, I made my own segmentation fill (not flood fill, segmentation fill). It goes through the image pixel by pixel, looking at a border around each pixel. If the pixel is not black, it looks within the border for other non-black pixels that already have a unique id. If it finds one, the pixel takes that id. If there are no pixels around with a unique id, it increments the id and sets the pixel to the new id. Then it continues. There is one... (not bug) but improvement that could be made, which I'll leave for the evaluation at the end.
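
Here is a small Python sketch of that idea, as one possible reading of the description rather than the actual Matlab code. The name segment_fill, the square window and the border of 2 are assumptions for the example.

```python
def segment_fill(mask, border=2):
    """Give every non-black pixel a segment id in a single raster pass.

    A pixel copies the first id found inside the square window of radius
    `border` around it; if the window holds no id yet, a new id is started.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]  # 0 means background
    next_id = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0:
                continue
            found = 0
            for ny in range(max(0, y - border), min(height, y + border + 1)):
                for nx in range(max(0, x - border), min(width, x + border + 1)):
                    if labels[ny][nx]:
                        found = labels[ny][nx]
                        break
                if found:
                    break
            if not found:
                next_id += 1
                found = next_id
            labels[y][x] = found
    return labels


mask = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
]
labels = segment_fill(mask)
```

The column at x = 3 is separated from the first blob by a black column, yet it ends up with the same id because it falls inside the window. The blob in the bottom right is too far away and gets its own id.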

Also, note that by setting them to a unique id and using imshow, we get a grayscale image. So that becomes pretty cool because it shows the different segments in different shades of grey.

Let me explain why I implemented this instead of raster. In raster, you only join the pieces up if they are next to each other. Given the case where the parts are not next to each other but are part of the same object (due to shadows and whatnot), what then? You could reduce the threshold and play with the numbers, but my method is better here because you're looking at the border of the pixel you're analysing. This almost seems like it's blurring the threshold. But then you say, blurring the image means losing information, right? Wrong. In this case, we want to blur the image because we need to group the items together. In fact, my 'blur' is a little more sophisticated than that because earlier, I implemented a weighting system to cancel out the noise already. So really it's like we Gaussian blurred the image (kinda) and then thresholded it, which is usually helpful in computer vision.

Below I have a close up of the segmented balls. See how the gray shows the 4 segments of the ball? Of course the noise that was not eliminated also gets segmented, but in its own colour.


So at this point, we only have an image with unique ids. The next step is to find the positions of the segments. I created a list of the segments by finding the bottom left and top right points of each one. (Why those two? Well, I just kinda typed into the computer and screwed up some math and got that instead of top left and bottom right, and then thought, hey, that's all the information I need so I'll fix it in the code later.) Also, you can see a better greyscale of the image when viewed with imagesc (bottom picture).
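
A short Python sketch of how the corner points can be pulled out of an id image. It assumes the ids are stored as a list of rows with 0 as background, and segment_boxes is a made-up name. Since rows count downwards in an image, "bottom" is the largest row index.

```python
def segment_boxes(labels):
    """Map each segment id to ((left, bottom), (right, top)) in pixel coordinates.

    Row index grows downwards, so `bottom` is the largest row and `top` the smallest.
    """
    extents = {}
    for y, row in enumerate(labels):
        for x, seg_id in enumerate(row):
            if seg_id == 0:
                continue
            left, right, top, bottom = extents.get(seg_id, (x, x, y, y))
            extents[seg_id] = (min(left, x), max(right, x), min(top, y), max(bottom, y))
    return {
        seg_id: ((left, bottom), (right, top))
        for seg_id, (left, right, top, bottom) in extents.items()
    }


labels = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
boxes = segment_boxes(labels)
```

Either pair of opposite corners gives the same information: the width is right - left + 1 and the height is bottom - top + 1.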


So now the segment's positions and indexes are in a list. I decided to be awesome and draw boxes around them so I could visualize them better, and it makes things look cool.

Ball detection

Ok. Now notice how there are segments there that are false positives? Kinda. They are segments, but they aren't the ones that we care about. Also notice how the ball segments are roughly the same size? Cool. Let's try to get those.


So I did a least squares difference to find the segments that were the closest in area. What that means is that I sorted the segments by area. Then I took a group of 4 neighbours in the sorted list (because there are 4 balls) and found their average area. Then I took each segment's difference from that average, squared it, and summed the squares over the group. The set of 4 with the smallest sum should contain the balls. I tested it with the other distances and it worked. YAY! (There was some tweaking involved.) Also, the code is modular, so you can choose the size of the window of balls you're looking for. YAY! Also, since it doesn't take the positions of the balls into account when deciding which segments they are, you can probably take the picture at an angle and it should be close to fine. This is what I get.
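
The window search can be sketched in a few lines of Python. This is an illustration under my own naming (pick_similar_segments is hypothetical) with made-up areas, not the tuned Matlab code.

```python
def pick_similar_segments(areas, window=4):
    """Return the indices of the `window` segments whose areas are most alike.

    Slides a window over the areas in sorted order and keeps the group with
    the smallest sum of squared differences from the group mean.
    """
    if len(areas) < window:
        raise ValueError("need at least `window` segments")
    order = sorted(range(len(areas)), key=lambda i: areas[i])
    best_score, best_group = None, None
    for start in range(len(order) - window + 1):
        group = order[start:start + window]
        mean = sum(areas[i] for i in group) / window
        score = sum((areas[i] - mean) ** 2 for i in group)
        if best_score is None or score < best_score:
            best_score, best_group = score, group
    return sorted(best_group)


# Made-up segment areas in pixels: four ball-sized segments plus noise.
areas = [12, 410, 95, 395, 402, 3, 420, 1500]
balls = pick_similar_segments(areas)  # indices of 410, 395, 402 and 420
```

Because the score is in absolute pixel units, four tiny noise segments with nearly identical areas could score lower than the balls, so the filtering before this step still matters.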


I think that's pretty cool.


Finding the distances between balls

So now that we know which ones are balls, we need to find the distances between them. What I did first was find the position of the center of each detected ball, since I already knew the width and height of each ball's box.

Then I found the distances between all of the 4 points. This returns 6 distances: 4 for the sides of the board and 2 for the diagonals. I sorted the list and dropped the largest 2. Then I averaged the list to get a mean. This was the mean I used to do my regression plot. I went into Excel and plotted the pixels I found against the target distance. Then I found the best fit line to the points plotted. I did the big board and the small board separately. This gave me an equation that fit the trend pretty well. I will leave the data and graphs of this experimentation at the end of this section. After I found the equations, I tested them with the length of each side of the board to be sure. They all had less than 15% inaccuracy. So, success!
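
A small Python sketch of the side-length step, assuming each ball is given as a ((left, bottom), (right, top)) corner pair. The function name mean_side_length is made up for the example.

```python
from itertools import combinations
from math import dist


def mean_side_length(boxes):
    """Average the four shortest centre-to-centre distances (the board's sides).

    boxes is a list of four ((left, bottom), (right, top)) corner pairs.
    """
    centres = [((l + r) / 2, (b + t) / 2) for (l, b), (r, t) in boxes]
    lengths = sorted(dist(p, q) for p, q in combinations(centres, 2))
    sides = lengths[:-2]  # drop the two longest, i.e. the diagonals
    return sum(sides) / len(sides)


# Four 10 px boxes whose centres form a 100 px square.
boxes = [
    ((5, 15), (15, 5)),
    ((105, 15), (115, 5)),
    ((5, 115), (15, 105)),
    ((105, 115), (115, 105)),
]
side = mean_side_length(boxes)
```

Sorting and dropping the two longest distances means the code never has to work out which ball sits in which corner.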


Experimenting stage and Data collection

So, let me explain the process of my data collection to come up with the equations. There were 2 methods to approach finding the distance from the target. One was to use the equation learnt in class with image projection. That is:

h/d = h'/f

(Here h is the real size of the object, d is its distance from the camera, h' is its size in the image and f is the focal length.) The other method is through regression, which was the method I chose. The main reason for choosing this method was that I didn't know the focal length of the camera, and upon searching the web, different sources gave numbers anywhere from 3.8 mm to 30 mm, so I decided to just do some experiments with the information I know.

So, I found the average number of pixels between 2 'adjacent' corner balls on the board and treated that as the target. Below lists the data:

Small Board (5 in wide)

Pixels (px)   Distance (ft)
251.8         2
136           4
63.9          8
33            16


Small Regression

Big Board (15 in wide)

Pixels (px)   Distance (ft)
614.2         2
379.2         4
194.8         8
100.2         16

Big Regression

Notice how there is a best fit line in both of the graphs. Both are power-type trend lines, which means the pixel count is raised to a power and then multiplied by a scalar to get the distance. I then used those equations in the code.
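
For anyone who wants to redo the fit without a spreadsheet, here is a minimal Python sketch that fits distance = a * pixels^b by ordinary least squares on the logs of the small board data above. The function name fit_power_law is made up, and the coefficients it returns are not necessarily the exact ones used in my code. Under the projection equation h/d = h'/f, the pixel length is inversely proportional to the distance, so an exponent close to -1 is what you would expect.

```python
from math import exp, log


def fit_power_law(pixels, distances):
    """Least-squares fit of distance = a * pixels**b, done as a line in log-log space."""
    xs = [log(p) for p in pixels]
    ys = [log(d) for d in distances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope of the log-log line = exponent
    a = exp(mean_y - b * mean_x)     # intercept of the log-log line = log(a)
    return a, b


# Small board data from the table above.
small_px = [251.8, 136, 63.9, 33]
small_ft = [2, 4, 8, 16]
a, b = fit_power_law(small_px, small_ft)
estimate_ft = a * 136 ** b  # distance estimate for a 136 px side length
```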


Experiment Data Test and Evaluation

So now that we have found the trend lines, we should test them on each side of the board and find the best estimate from each picture. Each table below lists the 4 side lengths in pixels, the distance estimated from each side, and its error:

Small 2 feet

Side length (px)          234.1    237.6    266.6    268.9
Estimated distance (ft)   2.21     2.18     1.94     1.92
Error                     10.47%   8.82%    3.13%    3.96%

Best Estimate: 2.06 feet

Error: 3.05%

Small 4 feet

Side length (px)          123.5    135.5    141.6    143.6
Estimated distance (ft)   4.21     3.84     3.67     3.62
Error                     10.74%   8.10%    16.45%   19.03%

Best Estimate: 3.84 feet

Error: 4.11%

Small 8 feet

Side length (px)          58.5     63.5     66       67.6
Estimated distance (ft)   8.96     8.25     7.94     7.75
Error                     48.23%   12.60%   3.18%    12.67%

Best Estimate: 8.22 feet

Error: 2.81%

Small 16 feet

Side length (px)          30.5     32.5     34.5     34.5
Estimated distance (ft)   17.31    16.23    15.28    15.28
Error                     8.17%    1.45%    4.49%    4.49%

Best Estimate: 16.03 feet

Error: 0.16%


Big 2 feet

Side length (px)          580.4    592.7    603      680.5
Estimated distance (ft)   2.28     2.23     2.19     1.91
Error                     14.19%   11.52%   9.37%    4.59%

Best Estimate: 2.14 feet

Error: 7.00%

Big 4 feet

Side length (px)          369.7    373.5    370      375.2
Estimated distance (ft)   3.80     3.76     3.80     3.74
Error                     9.99%    12.18%   10.17%   13.14%

Best Estimate: 3.77 feet

Error: 5.68%

Big 8 feet

Side length (px)          192.6    193.5    194      199.1
Estimated distance (ft)   7.93     7.89     7.87     7.64
Error                     3.27%    5.36%    6.50%    17.87%

Best Estimate: 7.83 feet

Error: 2.13%

Big 16 feet

Side length (px)          99       99.5     100.5    102
Estimated distance (ft)   16.82    16.72    16.54    16.26
Error                     5.13%    4.53%    3.36%    1.64%

Best Estimate: 16.58 feet

Error: 3.62%


Note that all the final errors are within 7%, mostly in the 3% range.

The error seems to get reduced the further away the ball is as well, perhaps showing that the bulk of the error probably arises from the width and height of the ball. I'm sure I could have accounted for this somewhere in my code, but I didn't because this was good enough.


Outputs from each stage

Small Board


Big Board


Potential Errors

Let's return to the brainstorm I had at the start of the assignment:

I'll now explain why I chose, or did not choose, to do the above ideas. The problem with counting the number of pixels in the thresholded balls is that sometimes not the whole ball passes the threshold (due to shadowing). This means the ball is not completely filled in, so the pixel count doesn't give a good representation of the area of the ball, and that method shouldn't be used.

We used the number of pixels between the balls because the positions of the balls seemed to be the most reliable points on the image. Also, by averaging all the distances and using all 4 points, we reduced the error that could occur.

We also did create a bounding box for the balls when we did ball detection. Really it was for visually seeing where the balls were, but we also used some heuristics on their area to determine which ones were balls and which weren't.

We didn't use edge triangulation because I couldn't find a reliable source for the focal length of my camera.

The measuring tape was a potential error that I removed by taking new photos without the measuring tape in the picture.

The noise removal step could have caused potential errors because if the numbers were tuned wrongly, the balls themselves may get removed, or the areas of the balls may become so small that they throw off the heuristic that later determines which segments are balls.

The segmentation had a potential error: if the border it looked at wasn't big enough, it could fail to join 2 segments that are actually next to each other. That can be solved by going through the image and doing a raster, but I managed to make the border big enough that that wasn't necessary. On the other hand, making the border too big may join two segments that don't belong together, which again will throw the heuristic off later.

The ball detection heuristic has a potential problem if the ball segments and the noise segments are too close together in size. There is unfortunately no better way to solve it than by filtering it better.


Evaluation of data and error

So as mentioned before, the best estimate of the distance to the target was never off by more than 7%. Most had an error of 2 - 3%, which is certainly not bad. The errors of 2 of the individual sides were more than 15% though. That being said, this didn't change the error of the best estimate much because it was compensated for by the opposite side of the board. This is the reason why we averaged all the lengths. The reason this happens (one side being longer and the other shorter) is probably that the pictures were taken at more of an angle than expected.

Also, you'll notice that the smaller the area of the ball in the picture, the smaller the error. This is mainly because the area of the ball creates the most uncertainty: the thresholding and the noise removal mean the segment can't really stay true to the size of the ball. Getting the center positions of the balls was therefore much more reliable, and the smaller the balls are in the image, the more reliable the result tended to be. Furthermore, with a bigger area, the box that bounds the ball tends to be more rectangular than square, which offsets the center position of the box more than it should.

That being said, again averaging everything out tends to cancel out the anomalies such that it gives pretty reliable results.