I had to use 1000 points for the evaluation because of GPU out-of-memory errors. The test accuracy was 0.9759.
GT label is chair, predicted label is chair.
GT label is vase, predicted label is vase.
GT label is lamp, predicted label is lamp.
GT label is chair, predicted label is lamp. This was the only misclassification for the chair class, so the model performs well on chairs. When visualizing this point cloud, the chair does not have as clear a 3D structure as the previous chair visualization: the seat is clearly defined in the correctly classified example, but not in this sample.
GT label is vase, predicted label is lamp. This may be due to the smaller number of points I was forced to use because of GPU constraints. In the point cloud, a significant part of the object is missing, which may cause the model to make an incorrect classification.
GT label is lamp, predicted label is vase. The lamp in this example is very long and thin and lacks the defining cone-like features of the previous lamp example, which had a clear overall shape. The root cause of this failure may be that the global features are not distinctive enough for the model to identify it as a lamp.
Due to GPU constraints, I ran with num_points=900. The test accuracy was 0.8937.
The following visualizations show GT first, prediction second.
Example 1 (good)
Accuracy: 0.8932. The segmentation results are accurate.

Example 2 (good)
Accuracy: 0.8920. The segmentation results are accurate.

Example 3 (good)
Accuracy: 0.8938. The segmentation results are accurate.

Example 4 (bad)
Accuracy: 0.8789. The qualitative results show that, in the prediction, the back of the chair (cyan) spreads into the bottom. This may be because this sample contains more part classes than the previous three, making it more complex to predict.

Example 5 (bad)
Accuracy: 0.8821. In this sample, the red class bleeds into the bottom of the chair, which causes the poor performance. This may again be caused by the larger number of part classes, as in the previous example.

I rotated the input point clouds by 15, 30, 60, and 90 degrees counterclockwise about the z-axis inside eval_cls.py and ran the evaluation (a sketch of the rotation is shown below). Quantitative and qualitative results follow.
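A minimal sketch of the z-axis rotation applied to the evaluation inputs (the helper name rotate_z and the exact variable handling inside eval_cls.py are assumptions for illustration):

```python
import numpy as np
import torch

def rotate_z(points, degrees):
    """Rotate a point cloud counterclockwise about the z-axis.
    points: (..., N, 3) tensor; degrees: rotation angle."""
    theta = np.deg2rad(degrees)
    rot = torch.tensor([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ], dtype=points.dtype, device=points.device)
    return points @ rot.T  # apply the same rotation to every point

# e.g., evaluate the classifier on inputs rotated by 30 degrees:
# test_data = rotate_z(test_data, 30)
```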
Quantitative Metrics:
original test accuracy: 0.9759
15-degree test accuracy: 0.9192
30-degree test accuracy: 0.6631
60-degree test accuracy: 0.3431
90-degree test accuracy: 0.2339
Qualitative Metrics (chair example):
15-degree visualization. The GT label was chair, and the predicted label was chair.
30-degree visualization. The GT label was chair, and the predicted label was chair.
60-degree visualization. The GT label was chair, and the predicted label was vase.
90-degree visualization. The GT label was chair, and the predicted label was vase.
Qualitative Metrics (lamp example):
15-degree visualization. The GT label was lamp, and the predicted label was lamp.
30-degree visualization. The GT label was lamp, and the predicted label was lamp.
60-degree visualization. The GT label was lamp, and the predicted label was vase.
90-degree visualization. The GT label was lamp, and the predicted label was vase.
Based on these results, it is clear that the initially trained model is not robust to rotation: as the magnitude of the rotation increases, the overall test accuracy decreases. With slight rotations, the model still correctly classifies the first example in the test set (chair), but with 60- and 90-degree rotations it makes incorrect classifications.
From the training data, the model learned orientation-specific features. We can mitigate this performance regression with data augmentation: by randomly rotating the training point clouds, we can train the model to be more robust to rotations (a sketch is shown below). We could also inject rotation-invariant features, such as relative distances and angles between points; if the model learns how points are arranged with respect to each other, it may make more accurate predictions on rotated inputs.
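A minimal sketch of how random z-axis rotation augmentation could be added to the training loop (this is not what the current training code does; rotate_z is the helper sketched earlier, and the loop variable names are hypothetical):

```python
import torch

def augment_with_random_rotation(points):
    """Apply a random z-axis rotation to a batch of point clouds (sketch)."""
    degrees = torch.rand(1).item() * 360.0  # uniform angle in [0, 360)
    return rotate_z(points, degrees)

# Hypothetical training loop:
# for points, labels in train_loader:
#     points = augment_with_random_rotation(points)
#     logits = model(points)
#     ...
```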
The procedure for rotating input point clouds for the segmentation task is the same as described above: we rotate the input point clouds by 15, 30, 60, and 90 degrees counterclockwise about the z-axis inside eval_seg.py. I report quantitative and qualitative metrics below.
Quantitative and Qualitative Metrics (sample 1):
Original test accuracy: 0.8932
Original GT vs Pred:

15-deg test accuracy: 0.8293
15-deg GT vs Pred:

30-deg test accuracy: 0.7245
30-deg GT vs Pred:

60-deg test accuracy: 0.5035
60-deg GT vs Pred:

90-deg test accuracy: 0.3615
90-deg GT vs Pred:

Quantitative and Qualitative Metrics (sample 2):
Original test accuracy: 0.8925
Original GT vs Pred:

15-deg test accuracy: 0.8309
15-deg GT vs Pred:

30-deg test accuracy: 0.7253
30-deg GT vs Pred:

60-deg test accuracy: 0.5053
60-deg GT vs Pred:

90-deg test accuracy: 0.3623
90-deg GT vs Pred:

Based on these results, we can see that as the magnitude of rotation increases, the test accuracy decreases for each sample. Additionally, for the 60-degree and 90-degree examples, the model begins to predict several different incorrect classes along the same part of the chair. This degradation occurs because the model relies on orientation-dependent spatial features, and rotating the point cloud disrupts the geometric relationships learned during training. Since rotation invariance is not inherently built into the network architecture, the model struggles to generalize to unseen orientations. To improve robustness, future work could incorporate random 3D rotation augmentation during training and apply alignment modules such as T-Nets to make the model more robust to rotations.
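For reference, a minimal sketch of a PointNet-style input T-Net that predicts a 3x3 alignment matrix (this follows the idea from the original PointNet paper and is not part of my current model; layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Predicts a 3x3 transform applied to the input points (sketch)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 9),
        )

    def forward(self, points):                        # points: (B, N, 3)
        x = self.mlp(points.transpose(1, 2))          # (B, 1024, N)
        x = torch.max(x, dim=2).values                # global max pool -> (B, 1024)
        mat = self.fc(x).view(-1, 3, 3)
        # bias toward the identity so the transform starts as a no-op
        mat = mat + torch.eye(3, device=points.device).unsqueeze(0)
        return points @ mat                           # aligned points: (B, N, 3)
```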
Due to GPU constraints, the maximum number of points I can use for classification is 1000. Therefore, I test the model with 1000, 750, 500, 250, and 50 points, and report quantitative and qualitative metrics below. I run these experiments on two different samples from the test set.
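A minimal sketch of the point subsampling used for these evaluations (random sampling without replacement; the exact indexing inside eval_cls.py may differ):

```python
import torch

def subsample_points(points, num_points):
    """Randomly keep num_points points from each (B, N, 3) cloud (sketch)."""
    idx = torch.randperm(points.shape[1])[:num_points]
    return points[:, idx, :]

# e.g., evaluate the classifier with 250 points per object:
# test_data = subsample_points(test_data, 250)
```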
Quantitative Metrics:
original (1000 points) test accuracy: 0.9759
750 points test accuracy: 0.9706
500 points test accuracy: 0.9653
250 points test accuracy: 0.9664
50 points test accuracy: 0.9517
Qualitative Metrics (chair example):
For all the visualizations below, the GT label is a chair, and the predicted label is a chair.
1000 points:
750 points:
500 points:
250 points:
50 points:
Qualitative Metrics (lamp example):
For all the visualizations below, the GT label is a lamp, and the predicted label is a lamp.
1000 points:
750 points:
500 points:
250 points:
50 points:
Based on the quantitative and qualitative metrics, we can see that the test accuracy does not decrease drastically as we decrease the number of input points; it only decreases slightly. This suggests that the model has learned to capture the global geometric structure of objects effectively, rather than relying heavily on dense point sampling. In other words, PointNet's use of shared MLPs and global max pooling allows it to extract informative global features even when few points are available, so it performs well even with sparse inputs.
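To make this concrete, here is a simplified sketch of the shared-MLP plus global max-pooling structure (the layer sizes are illustrative, not necessarily those of my trained model):

```python
import torch
import torch.nn as nn

class PointNetClsSketch(nn.Module):
    """Simplified PointNet classifier: shared MLP + global max pool (sketch)."""
    def __init__(self, num_classes=3):  # chair, vase, lamp
        super().__init__()
        # "shared MLP": the same weights are applied to every point independently
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.ReLU(),
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, points):                              # (B, N, 3)
        feats = self.point_mlp(points.transpose(1, 2))      # (B, 1024, N)
        # max pooling keeps the strongest response per channel, so the global
        # descriptor changes little when many points are dropped
        global_feat = torch.max(feats, dim=2).values        # (B, 1024)
        return self.classifier(global_feat)                 # class logits
```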
Due to GPU constraints, the maximum number of points I can use for segmentation is 900. Therefore, I test the model with 900, 750, 500, 250, and 50 points, and report quantitative and qualitative metrics below. I run these experiments on two different samples from the test set.
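For segmentation, the per-point labels must be subsampled with the same indices as the points; a minimal sketch, reusing the idea from the classification experiment (variable names are hypothetical):

```python
import torch

def subsample_points_and_labels(points, labels, num_points):
    """Keep the same random subset of points (B, N, 3) and labels (B, N)."""
    idx = torch.randperm(points.shape[1])[:num_points]
    return points[:, idx, :], labels[:, idx]

# e.g., evaluate segmentation with 500 points per object:
# test_data, test_labels = subsample_points_and_labels(test_data, test_labels, 500)
```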
Quantitative and Qualitative Metrics (sample 1):
Original (900 points) test accuracy: 0.8922
Original GT vs Pred:

750 points test accuracy: 0.8900
750 points GT vs Pred:

500 points test accuracy: 0.8809
500 points GT vs Pred:

250 points test accuracy: 0.8605
250 points GT vs Pred:

50 points test accuracy: 0.7873
50 points GT vs Pred:

Quantitative and Qualitative Metrics (sample 2):
Original (900 points) test accuracy: 0.8932
Original GT vs Pred:

750 points test accuracy: 0.8905
750 points GT vs Pred:

500 points test accuracy: 0.8825
500 points GT vs Pred:

250 points test accuracy: 0.8605
250 points GT vs Pred:

50 points test accuracy: 0.7933
50 points GT vs Pred:

Similarly, the segmentation model also performs well when the number of input points is reduced. Since PointNet aggregates features globally through max pooling, it can maintain meaningful per-point feature representations even with fewer input points. From the visualizations and quantitative metrics, we can see that the classification model is more robust to reduced point counts than the segmentation model. Still, the segmentation model is considerably more robust to sparse inputs than to rotated inputs.
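For reference, a simplified sketch of how a PointNet-style segmentation head combines per-point features with the pooled global feature (layer sizes and the number of part classes are illustrative, not necessarily those of my model):

```python
import torch
import torch.nn as nn

class PointNetSegSketch(nn.Module):
    """Simplified PointNet segmentation head: local + global features (sketch)."""
    def __init__(self, num_part_classes=6):  # illustrative part count
        super().__init__()
        self.local_mlp = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU())
        self.global_mlp = nn.Sequential(nn.Conv1d(64, 1024, 1), nn.ReLU())
        # the per-point head sees each local feature concatenated with the global one
        self.seg_head = nn.Sequential(
            nn.Conv1d(64 + 1024, 256, 1), nn.ReLU(),
            nn.Conv1d(256, num_part_classes, 1),
        )

    def forward(self, points):                                       # (B, N, 3)
        local = self.local_mlp(points.transpose(1, 2))               # (B, 64, N)
        feats = self.global_mlp(local)                               # (B, 1024, N)
        global_feat = torch.max(feats, dim=2, keepdim=True).values   # (B, 1024, 1)
        global_feat = global_feat.expand(-1, -1, local.shape[2])     # (B, 1024, N)
        combined = torch.cat([local, global_feat], dim=1)            # (B, 1088, N)
        return self.seg_head(combined)                               # (B, C, N) per-point logits
```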