16-825: Learning for 3D Vision - Assignment 5 (Point Cloud Processing)

Vaibhav Parekh | Fall 2025

Q1. Classification Model

Accuracy: 97.90%

Class   Ground Truth   Prediction   Predicted Class
Chair   (GIF)          (GIF)        Chair   (no incorrect predictions for this class)
Vase    (GIF)          (GIF)        Vase
Vase    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Vase

Interpretation: The classification model performs well, achieving high accuracy. It fails in cases where the extracted features align too closely with an incorrect class: the model is trained only on point coordinates, so it learns the geometric structure of a shape without any semantic knowledge of what that shape actually is. In the GIFs above, vases are misclassified as lamps and vice versa, because a dominant structural feature of one can resemble the other. The Chair class, in contrast, has no incorrect classifications, since chairs are structurally distinct from both vases and lamps, which supports this hypothesis.

Q2. Segmentation Model

Accuracy: 90.28%

Ground Truth   Prediction   Accuracy
(GIF)          (GIF)        96.20%
(GIF)          (GIF)        98.76%
(GIF)          (GIF)        96.94%
(GIF)          (GIF)        41.70%
(GIF)          (GIF)        49.46%

Interpretation: The segmentation model performs reasonably well and is generally effective at learning the distinction between different features, successfully segmenting them in most cases. However, it struggles when boundaries between regions appear merged or structurally blended, leading to ambiguity in separation and making class differentiation challenging.
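The per-object accuracies in the table above can be read as the fraction of points whose predicted part label matches ground truth. A minimal sketch of that metric (an assumption about the evaluation; the actual evaluation code is not shown here):

```python
def seg_accuracy(pred_labels, gt_labels):
    # Fraction of points whose predicted part label equals the ground-truth label.
    correct = sum(p == g for p, g in zip(pred_labels, gt_labels))
    return correct / len(gt_labels)

# Toy example: 3 of 4 points labeled correctly.
print(seg_accuracy([0, 1, 1, 2], [0, 1, 2, 2]))  # → 0.75
```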

Q3. Robustness Analysis

Experiment 1

Procedure: To analyse classification robustness, I rotate the input point cloud about the x-axis by increasing angles — 15°, 30°, 45°, 60°, 75°, and 90° — and measure the effect on the model's classification accuracy. The baseline for this experiment is 0° rotation, which can be visualized in Q1.

import torch
import pytorch3d.transforms

test_dataloader = get_data_loader(args=args, train=False)
rot = torch.tensor([1.5708, 0.0, 0.0])  # 1.5708 rad ≈ 90°; swap in each test angle here
R = pytorch3d.transforms.euler_angles_to_matrix(rot, 'XYZ')
# Rotate every cloud about the x-axis: (B, N, 3) -> (B, 3, N), rotate, transpose back
test_dataloader.dataset.data = (R @ test_dataloader.dataset.data.transpose(1, 2)).transpose(1, 2)

test_data = test_dataloader.dataset.data
test_label = test_dataloader.dataset.label
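The snippet above hard-codes 1.5708 rad (the 90° case). The x-axis rotation it applies can be sketched in pure Python as a sanity check — a hypothetical stand-in for the pytorch3d call, not the evaluation code itself (sign conventions may differ slightly from pytorch3d's):

```python
import math

def rot_x(deg):
    # Standard 3x3 rotation matrix about the x-axis, angle given in degrees.
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0],
            [0, c, -s],
            [0, s, c]]

def apply(R, p):
    # Multiply a 3x3 matrix by a 3-vector.
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

# A 90° rotation about x maps the +y axis onto +z.
print([round(v, 6) for v in apply(rot_x(90), [0.0, 1.0, 0.0])])  # → [0.0, 0.0, 1.0]
```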

Rotation   Accuracy   Ground Truth   Prediction
15°        95.80%     (GIF)          (GIF)
30°        84.37%     (GIF)          (GIF)
45°        50.99%     (GIF)          (GIF)
60°        40.50%     (GIF)          (GIF)
75°        33.37%     (GIF)          (GIF)
90°        26.55%     (GIF)          (GIF)

Interpretation: The classification model remains robust to small rotation angles but exhibits a notable decline in performance under larger rotations. This is because rotational invariance is not inherently captured within the current architecture.

Experiment 2

Procedure: Here I test the robustness of the model on both the classification and segmentation tasks as the number of points is decreased. To do this, I simply pass the desired number of points as an argument when executing the code; for example, --num_points 7500. I do this for 2500, 5000, and 7500 points. The baseline is 10,000 points, whose performance can be visualized in Q1 and Q2.
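Reducing the point count amounts to subsampling each cloud. A minimal sketch of that step, assuming the dataloader samples points uniformly at random (the actual sampling code is not shown here):

```python
import random

def subsample(points, num_points):
    # Keep a random subset of num_points points from a cloud given as (x, y, z) tuples.
    idx = random.sample(range(len(points)), num_points)
    return [points[i] for i in idx]

# Toy 10,000-point cloud reduced to 2,500 points, as in the --num_points 2500 run.
cloud = [(random.random(), random.random(), random.random()) for _ in range(10000)]
print(len(subsample(cloud, 2500)))  # → 2500
```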

Num. of Points   Accuracy (cls)   GT (cls)   Pred (cls)   Accuracy (seg)   GT (seg)   Pred (seg)
7500             97.69%           (GIF)      (GIF)        90.28%           (GIF)      (GIF)
5000             97.58%           (GIF)      (GIF)        90.27%           (GIF)      (GIF)
2500             97.37%           (GIF)      (GIF)        90.25%           (GIF)      (GIF)

Interpretation: The accuracy decreases only slightly as the number of points in the point cloud is reduced. However, both classification and segmentation remain feasible with a moderate number of points, indicating that the model retains robustness even with reduced input density.

Q4. Locality

Model implemented: DGCNN

Accuracy: 96.75%

Description: The architecture leverages graph-based convolution operations to extract expressive features from the point cloud data. It relies on utilities such as knn to determine k-nearest neighbors and get_graph_feature to build graph representations. The cls_model module is composed of convolutional and fully connected layers, employing batch normalization, LeakyReLU activations, and dropout, culminating in classification scores for the input point clouds.
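To illustrate the locality the model exploits, here is a toy, brute-force version of what a knn utility computes for each point (the real DGCNN implementation is batched on the GPU and feeds its output to get_graph_feature; this is only a sketch):

```python
def knn(points, k):
    # For each point, return the indices of its k nearest neighbors
    # by squared Euclidean distance, excluding the point itself.
    out = []
    for i, p in enumerate(points):
        order = sorted(range(len(points)),
                       key=lambda j: sum((p[a] - points[j][a]) ** 2 for a in range(3)))
        out.append([j for j in order if j != i][:k])
    return out

# Three points clustered near the origin plus one outlier.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 5, 5)]
print(knn(pts, 2))  # → [[1, 2], [0, 2], [0, 1], [1, 2]]
```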

Class   Ground Truth   Prediction   Predicted Class
Chair   (GIF)          (GIF)        Chair
Chair   (GIF)          (GIF)        Lamp
Vase    (GIF)          (GIF)        Vase
Vase    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Lamp
Lamp    (GIF)          (GIF)        Vase