16-825: Learning for 3D Vision - Assignment 5 (Point Cloud Processing)

Vaibhav Parekh | Fall 2025

Q1. Classification Model

Accuracy: 97.90%

Class   Ground Truth   Prediction   Predicted Class
Chair   (GIF)          (GIF)        Chair   (no incorrect predictions for this class)
Vase    (GIF)          (GIF)        Vase
Vase    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Vase

Interpretation: The classification model performs well, achieving high accuracy. It fails in cases where the extracted features align too closely with an incorrect class: the model is trained only on point coordinates, so it learns the geometric structure of a shape without any semantic knowledge of what that shape actually is. In the GIFs above, vases are misclassified as lamps and vice versa, because a dominant structural feature of one can resemble the other. The Chair class, in contrast, has no incorrect classifications, since chairs are structurally distinct from both vases and lamps, which supports this hypothesis.

Q2. Segmentation Model

Accuracy: 90.28%

Ground Truth   Prediction   Accuracy
(GIF)          (GIF)        96.20%
(GIF)          (GIF)        98.76%
(GIF)          (GIF)        96.94%
(GIF)          (GIF)        41.70%
(GIF)          (GIF)        49.46%

Interpretation: The segmentation model performs reasonably well and is generally effective at learning the distinction between different features, successfully segmenting them in most cases. However, it struggles when boundaries between regions appear merged or structurally blended, leading to ambiguity in separation and making class differentiation challenging.
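The per-object accuracies in the table above can be read as the fraction of points whose predicted part label matches ground truth. A minimal sketch of that metric (an assumption about the evaluation; the actual evaluation code is not shown here):

```python
def seg_accuracy(pred_labels, gt_labels):
    # Fraction of points whose predicted part label equals the ground-truth label.
    correct = sum(p == g for p, g in zip(pred_labels, gt_labels))
    return correct / len(gt_labels)

# Toy example: 3 of 4 points labeled correctly.
print(seg_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # → 0.75
```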

Q3. Robustness Analysis

Experiment 1

Procedure: To analyse classification robustness, I rotate the input point cloud about the x-axis by increasing angles — 15°, 30°, 45°, 60°, 75°, and 90° — and measure the effect on the model's classification accuracy. The baseline for this experiment is 0° rotation, which can be visualized in Q1.

import torch
import pytorch3d.transforms

test_dataloader = get_data_loader(args=args, train=False)
rot = torch.tensor([1.5708, 0.0, 0.0])  # 1.5708 rad ≈ 90°; swap in each test angle here
R = pytorch3d.transforms.euler_angles_to_matrix(rot, 'XYZ')
# Rotate every cloud about the x-axis: (B, N, 3) -> (B, 3, N), rotate, transpose back
test_dataloader.dataset.data = (R @ test_dataloader.dataset.data.transpose(1, 2)).transpose(1, 2)

test_data = test_dataloader.dataset.data
test_label = test_dataloader.dataset.label
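The snippet above hard-codes 1.5708 rad (the 90° case). The x-axis rotation it applies can be sketched in pure Python as a sanity check — a hypothetical stand-in for the pytorch3d call, not the evaluation code itself (sign conventions may differ slightly from pytorch3d's):

```python
import math

def rot_x(deg):
    # Standard 3x3 rotation matrix about the x-axis, angle given in degrees.
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0],
            [0, c, -s],
            [0, s, c]]

def apply(R, p):
    # Multiply a 3x3 matrix by a 3-vector.
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

# A 90° rotation about x maps the +y axis onto +z.
print([round(v, 6) for v in apply(rot_x(90), [0.0, 1.0, 0.0])])  # → [0.0, 0.0, 1.0]
```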

Rotation   Accuracy   Ground Truth   Prediction
15°        95.80%     (GIF)          (GIF)
30°        84.37%     (GIF)          (GIF)
45°        50.99%     (GIF)          (GIF)
60°        40.50%     (GIF)          (GIF)
75°        33.37%     (GIF)          (GIF)
90°        26.55%     (GIF)          (GIF)

Interpretation: The classification model remains robust to small rotation angles but exhibits a notable decline in performance under larger rotations. This is because rotational invariance is not inherently captured within the current architecture.

Experiment 2

Procedure: Here I test the robustness of the model on both the classification and segmentation tasks as the number of points is decreased. To do this, I simply pass the desired number of points as an argument when executing the code; for example, --num_points 7500. I do this for 2500, 5000, and 7500 points. The baseline is 10,000 points, whose performance can be visualized in Q1 and Q2.
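Reducing the point count amounts to subsampling each cloud. A minimal sketch of that step, assuming the dataloader samples points uniformly at random (the actual sampling code is not shown here):

```python
import random

def subsample(points, num_points):
    # Keep a random subset of num_points points from a cloud given as (x, y, z) tuples.
    idx = random.sample(range(len(points)), num_points)
    return [points[i] for i in idx]

# Toy 10,000-point cloud reduced to 2,500 points, as in the --num_points 2500 run.
cloud = [(random.random(), random.random(), random.random()) for _ in range(10000)]
print(len(subsample(cloud, 2500)))  # → 2500
```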

Num. of Points   Accuracy (cls)   GT (cls)   Pred (cls)   Accuracy (seg)   GT (seg)   Pred (seg)
7500             97.69%           (GIF)      (GIF)        90.28%           (GIF)      (GIF)
5000             97.58%           (GIF)      (GIF)        90.27%           (GIF)      (GIF)
2500             97.37%           (GIF)      (GIF)        90.25%           (GIF)      (GIF)

Interpretation: The accuracy decreases only slightly as the number of points in the point cloud is reduced. However, both classification and segmentation remain feasible with a moderate number of points, indicating that the model retains robustness even with reduced input density.

Q4. Locality

Model implemented: DGCNN

Accuracy: 96.75%

Description: The architecture leverages graph-based convolution operations to extract expressive features from the point cloud data. It relies on utilities such as knn to determine k-nearest neighbors and get_graph_feature to build graph representations. The cls_model module is composed of convolutional and fully connected layers, employing batch normalization, LeakyReLU activations, and dropout, culminating in classification scores for the input point clouds.
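To illustrate the locality the model exploits, here is a toy, brute-force version of what a knn utility computes for each point (the real DGCNN implementation is batched on the GPU and feeds its output to get_graph_feature; this is only a sketch):

```python
def knn(points, k):
    # For each point, return the indices of its k nearest neighbors
    # by squared Euclidean distance, excluding the point itself.
    out = []
    for i, p in enumerate(points):
        order = sorted(range(len(points)),
                       key=lambda j: sum((p[a] - points[j][a]) ** 2 for a in range(3)))
        out.append([j for j in order if j != i][:k])
    return out

# Three points clustered near the origin plus one outlier.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
print(knn(pts, 2))  # → [[1, 2], [0, 2], [0, 1], [1, 2]]
```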

Class   Ground Truth   Prediction   Predicted Class
Chair   (GIF)          (GIF)        Chair
Chair   (GIF)          (GIF)        Lamp
Vase    (GIF)          (GIF)        Vase
Vase    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Vase