Learning for 3D Vision: Assignment 5


1. Classification Model

1.1. Test Accuracy

The test accuracy obtained was 96.54%.

1.2. Visualization

Correct Classification Examples
[Figure: a chair, a vase, and a lamp, each classified correctly.]

Incorrect Classification Examples
[Figure: a lamp predicted as a chair (the only incorrect prediction into the chair class), two lamps predicted as vases, and two vases predicted as lamps.]

Interpretation
The accuracy for classifying chairs is quite high, as is also evident in the confusion matrix below. The only object misclassified as a chair is a lamp, and it is possible to see why the model could be confused by that example: the lamp does vaguely resemble a high chair. Such shapes are, however, relatively uncommon.

The accuracy for classifying vases is also quite high. As the examples above show, the lamps misclassified as vases do somewhat resemble vases; in other words, a vase could plausibly exist in the shape of either of these lamps. The absence of a plant, however, should have helped the model recognize that these objects were more likely to be lamps.

The reverse confusion, of vases being classified as lamps, also occurs. For these objects, it is easy to see why the model is confused: the first object somewhat resembles an inverted open lamp, while the plant in the second point cloud can easily be mistaken for a lamp on a stand.
Confusion Matrix

             Predicted: Chair   Predicted: Vase   Predicted: Lamp
GT: Chair                 617                 0                 0
GT: Vase                    0                90                12
GT: Lamp                    1                20               213
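
For reference, the matrix above can be tallied from per-example predictions as in the minimal sketch below; the label-to-index mapping (0 = Chair, 1 = Vase, 2 = Lamp) is an assumption for illustration.

    import numpy as np

    def confusion_matrix(gt_labels, pred_labels, num_classes=3):
        # Rows are ground-truth classes, columns are predictions,
        # matching the table above. Assumed mapping: 0=Chair, 1=Vase, 2=Lamp.
        cm = np.zeros((num_classes, num_classes), dtype=int)
        for g, p in zip(gt_labels, pred_labels):
            cm[g, p] += 1
        return cm

    # Example: confusion_matrix([0, 1, 2, 2], [0, 1, 1, 2])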

2. Segmentation Model

2.1. Test Accuracy

The test accuracy obtained was 87.39%.
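
Segmentation accuracy is presumably measured per point; below is a minimal sketch of the assumed metric, i.e. the fraction of points whose predicted part label matches the ground truth, averaged over the test set.

    import numpy as np

    def per_point_accuracy(pred_labels, gt_labels):
        # pred_labels, gt_labels: (B, N) integer part labels per point.
        # Assumed metric: fraction of correctly labelled points.
        return float((np.asarray(pred_labels) == np.asarray(gt_labels)).mean())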

2.2. Visualizations

Good Prediction Examples
[Figures: ground-truth and predicted segmentations for each example.]

Accuracy   Interpretation
94.5%      Most of the chair is segmented accurately, with minor inaccuracies at the intersection of the blue and red regions.
97.05%     The model is able to identify distinct regions of the chair even with significant differences in shape, e.g. the seat, the legs, and the armrests.
85.15%     The model identifies distinct types of armrests as well, but is inaccurate in determining the boundaries of the red region.
72.69%     Similar to the above, most of the chair is segmented accurately, with inaccuracies mainly at the boundaries of the red region at the back.
Bad Prediction Examples
[Figures: ground-truth and predicted segmentations for each example.]

Accuracy   Interpretation
47.02%     The model entirely misses the headrest region of the point cloud. A likely reason is that the model has seen a lot of data without headrests, so it classifies the entire upright portion as a backrest. Since the headrest is usually much smaller than the backrest, the model is probably not heavily penalized for missing it.
58.79%     As the examples above show, the model has learned to segment the "base" or "legs" of the chair in blue. In this example, however, it does not consider the entire bottom half of the chair to be the base, only the very bottom, resulting in incorrect segmentation.

3. Robustness Analysis

3.1. Experiment 1: Effect of Rotation

3.1.1. Procedure

The trained classification and segmentation models were tested on rotated versions of the test data in order to determine the effect on accuracy. The entire test set was evaluated at varying amounts of rotation, i.e. 10°, 20°, 30°, 45°, 90°, and 180°, generated as in the sketch below.
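
A minimal sketch of how such rotated copies can be generated; the rotation axis and the NumPy-based formulation are assumptions for illustration, not details taken from the experiment itself.

    import numpy as np

    def rotate_point_cloud(points, angle_deg, axis="z"):
        # points: (N, 3) array; rotates every point about one coordinate
        # axis (the choice of axis is an assumption).
        t = np.deg2rad(angle_deg)
        c, s = np.cos(t), np.sin(t)
        if axis == "z":
            R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        elif axis == "y":
            R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        else:  # axis == "x"
            R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
        return points @ R.T

    # Example: a 30° rotated copy of one test cloud
    # rotated = rotate_point_cloud(cloud, 30)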

3.1.2. Classification Results

Overall Accuracy
Rotation   None     10°      20°      30°      45°      90°      180°
Accuracy   96.54%   94.54%   93.07%   89.50%   71.45%   33.05%   55.40%
Failure Cases Visualization
Each row lists rotation, ground truth, and the predictions before and after rotation (the rotated point clouds are shown as figures):
- 10°, GT: Vase, Vase → Lamp. The model identifies the object as a lamp, possibly because the plant now appears vertical and more closely resembles the top of a lamp. This indicates that the orientation of the point cloud affects the classification output.
- 10°, GT: Lamp, Lamp → Vase. Here the opposite of the previous scenario occurs: a slight rotation causes the model to believe that the lamp is a vase, possibly because the tilted top of the lamp can be interpreted as a plant in a vase.
- 20°, GT: Chair, Chair → Vase. At about 20° of rotation, the accuracy of the chair class also starts to drop, and objects such as this one are identified as vases, possibly because the bottom no longer resembles a chair at all.
- 30°, GT: Lamp, Lamp → Chair. At about 30° of rotation, lamps are also misclassified as chairs. Looking back at the confusion matrix in Q1 shows that this was very rarely the case without rotation.
- 45°, GT: Lamp, Lamp → Chair. At about 45° of rotation, we see the most significant drop in accuracy, most likely because objects stop looking like themselves. For example, this object no longer looks much like a lamp. This implies that the orientation of the input makes a difference to the model.
- 180°, GT: Lamp, Lamp → Vase. Since the model depends on the orientation of the input, rotations of 90° and 180° are extremely difficult for it to classify correctly. In this case, the inverted lamp ends up looking like a vase.

3.1.3. Segmentation Results

Accuracy Change with Rotation
[Figures: predicted segmentations at each rotation, per example.]

Example 1
Accuracy: 94.5% (0°), 92% (10°), 85% (20°), 78% (30°), 66% (45°), 47% (90°)
Interpretation: Looking at the segmentation results as the object is rotated, the model shows a bias towards detecting horizontal separation boundaries, so accuracy decreases as rotation increases.

Example 2
Accuracy: 97.05% (0°), 95.35% (10°), 78% (20°), 76.98% (30°), 78.69% (45°), 58.16% (90°)
Interpretation: This example shows less bias towards detecting horizontal boundaries, so performance at higher rotations is better than in the previous example.

Example 3
Accuracy: 85.15% (0°), 84.3% (10°), 75.77% (20°), 64.87% (30°), 47.47% (45°), 45.85% (90°)
Interpretation: Accuracy around the legs of the chair remains high even at large rotations; however, accuracy in other regions falls quickly.

3.1.4. Conclusion

Both the classification and segmentation models show a significant dependence on the orientation of the input point cloud, but are relatively unaffected by minor rotations of up to 15-20° as shown above. However, when the input point cloud is rotated further, a significant loss in accuracy is seen.

3.2. Experiment 2: Effect of Number of Points

3.2.1. Procedure

The trained classification and segmentation models were tested on undersampled versions of the test data (i.e. with a reduced number of input points) in order to determine the effect on accuracy. The entire test set was evaluated at varying numbers of sampled points, i.e. 100, 500, 750, 1000, and 5000, out of the full 10000 points per cloud; the sampling is sketched below.
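
A minimal sketch of the undersampling, assuming uniform random sampling without replacement (the actual sampling scheme used in the experiment is an assumption):

    import numpy as np

    def sample_points(points, num_samples, seed=0):
        # points: (N, 3) array; picks num_samples indices uniformly at
        # random without replacement. For segmentation, the per-point
        # labels would be indexed with the same idx.
        rng = np.random.default_rng(seed)
        idx = rng.choice(points.shape[0], size=num_samples, replace=False)
        return points[idx]

    # Example: reduce a 10000-point test cloud to 500 points
    # sparse = sample_points(cloud, 500)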

3.2.2. Classification Results

Overall Accuracy
Points     100      500      750      1000     5000     10000
Accuracy   88.56%   94.4%    95.2%    96.01%   96.32%   96.54%
Failure Cases Visualization
Each row lists the number of points sampled, ground truth, and the predictions before and after sampling (the sampled point clouds are shown as figures):
- 100 points, GT: Chair, Chair → Lamp. With only 100 points sampled, there is not enough global information for the classifier to determine the class accurately. Note: the overall accuracy drop is still only ~8%.
- 500 points, GT: Vase, Vase → Lamp. The global information provided by this subset of points is still not enough for accurate classification of the object.
- 750 points, GT: Lamp, Lamp → Vase. Due to the relatively high similarity between the vase and lamp classes, 750 sampled points are not enough to distinguish between them with confidence.
- 1000 points, GT: Lamp, Lamp → Vase. The same issue as in the above case remains even at 1000 sampled points.
- 5000 points, GT: Vase, Vase → Lamp. The few cases in which differences are still observed show that when more detail is present, in terms of point cloud density, the model can perform better.

3.2.3. Segmentation Results

Accuracy Change with Points Sampled
[Figures: predicted segmentations at each sampling level, per example.]

Example 1
Accuracy: 82% (100), 95.4% (500), 94.5% (750), 94.9% (1000), 94.54% (5000), 94.5% (10000)
Interpretation: The results in this example show good performance down to 500 points (5% sampling), with a drop-off in performance below that.

Example 2
Accuracy: 95% (100), 97% (500), 96.2% (750), 97.5% (1000), 97.08% (5000), 97.05% (10000)
Interpretation: This example shows good performance even at 100 points. I believe this is because every part of the object is already thin, so further thinning does not affect performance much.

Example 3
Accuracy: 53% (100), 70.6% (500), 75.4% (750), 74.9% (1000), 71.86% (5000), 72.69% (10000)
Interpretation: In wider objects such as this one, the effect of sparsity on the model is clearer, and there is a significant drop in performance as the total number of points is reduced.

3.2.4. Conclusion

Both the classification and segmentation models show good resilience to a reduction in the number of points (down to about 5% of the total points). This could be because, even though the total number of points is smaller, the remaining points still represent the shape of the object reasonably well. In cases where the sampling causes a significant loss of structural information, we do see a significant drop in performance.