Assignment5

Guying Lin (guyingl)

Q1. Classification Model (40 points)

Epoch: 249 | train loss: 11.1515 | test accuracy: 0.9224

Successful Cases

Class: vase

Class: lamp

Class: chair

Failure Cases

Predicted label: vase | GT label: chair

Predicted label: lamp | GT label: vase

Predicted label: chair | GT label: lamp

Misclassifications usually arise when the input point cloud has ambiguous geometry: either it strongly resembles shapes from another category, or it represents an uncommon structure that is poorly covered by the training data. That said, distinctive fine-grained patterns do exist in these examples. Since the current model does not explicitly encode such local geometric cues, strengthening its ability to extract local features would likely reduce these errors.

Q2. Segmentation Model

test accuracy: 0.8927

Successful Cases

Accuracy: 0.953 (Left: pred | Right: GT)

Accuracy: 0.983 (Left: pred | Right: GT)

Accuracy: 0.979 (Left: pred | Right: GT)

Failure Cases

Accuracy: 0.536 (Left: pred | Right: GT)

Accuracy: 0.479 (Left: pred | Right: GT)

Segmentation errors mostly occur when the chair geometry deviates from the common patterns in the training set, or when certain parts have no clear boundary between them. Given such ambiguous structures, these mislabelings are understandable.

Q3. Robustness Analysis

I conducted two robustness analyses. The first examines how performance changes with the number of input points: for both tasks, I evaluated the models at point counts of 10000, 5000, 2500, 1000, 500, and 100 and recorded the corresponding accuracies. The second adds Gaussian noise to the input: I tested noise levels with standard deviations of 0, 0.05, 0.1, and 0.2, applying zero-mean Gaussian perturbations independently to the position of each point.
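The two perturbations can be sketched as follows. This is a minimal sketch; `subsample_points` and `add_gaussian_noise` are hypothetical helper names, not functions from the assignment code:

```python
import numpy as np

def subsample_points(points, n, seed=None):
    """Randomly keep n of the N input points from an (N, 3) coordinate array."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]

def add_gaussian_noise(points, std, seed=None):
    """Perturb each point position with independent zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    return points + rng.normal(0.0, std, size=points.shape)
```

Both operate directly on the (N, 3) coordinate array before it is fed to the network, so the same checkpoint can be evaluated under every perturbation level.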

Number of Points

The following table summarizes the accuracy variations of both models under different input point counts. Both models remained quite stable under point reduction — even with only 100 points, the performance barely dropped.

Number of Points   10000    5000     2500     1000     500      100
Classification     0.9213   0.9171   0.9223   0.9234   0.9192   0.8982
Segmentation       0.8927   0.8925   0.8921   0.8871   0.8779   0.8183

I visualize some of the examples on both tasks.

Classification

Correct prediction for chair. From left to right, the point counts are 100, 1000, and 10000.

Correct prediction for vase. From left to right, the point counts are 100, 1000, and 10000.

Segmentation

Accuracy: 0.91 | Chair | 100 points | Left is GT, right is prediction.

Accuracy: 0.95 | Chair | 10000 points | Left is GT, right is prediction.

Accuracy: 0.49 | Chair | 100 points | Left is GT, right is prediction.

Accuracy: 0.54 | Chair | 10000 points | Left is GT, right is prediction.

Gaussian Noise

The following table summarizes the accuracy of both models under different levels of Gaussian noise. In contrast to point reduction, Gaussian noise affects the models much more severely: as the noise level increases, both classification and segmentation accuracy degrade noticeably. This happens because both models rely on max-pooling to extract a stable global feature, which still works when we simply use fewer points. But once noise is injected into every point, the geometric structure becomes distorted and the features extracted by the network are no longer reliable.

Noise std        0        0.05     0.1      0.2
Classification   0.9213   0.9014   0.8520   0.6002
Segmentation     0.8927   0.8110   0.6779   0.5356
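A toy experiment illustrates why max-pooling is stable under subsampling: if per-point features come from a fixed map and the global feature is a channel-wise max, dropping points can only lower (never raise) each pooled channel, so the global feature stays close to the full-cloud one. This is a minimal sketch with a random linear map standing in for the shared MLP, not the trained model:

```python
import numpy as np

# Per-point feature = fixed random linear map (stand-in for the shared MLP);
# global feature = channel-wise max over all points, as in PointNet.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))

def global_feature(points):
    return (points @ W).max(axis=0)

pts = rng.normal(size=(1000, 3))
sub = pts[rng.choice(1000, 100, replace=False)]

full_feat = global_feature(pts)
sub_feat = global_feature(sub)

# Subsampling can only lower each max-pooled channel; channels whose
# maximizing point survives the subsampling are unchanged.
assert np.all(sub_feat <= full_feat + 1e-12)
```

No such bound holds once Gaussian noise is added to every point, which matches the sharper accuracy drop in the table above.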

Classification

Left: noise std 0.2, predicted label: lamp | Right: no noise, predicted label: chair | GT label: chair

Left: noise std 0.1, predicted label: chair | Right: no noise, predicted label: chair | GT label: chair

Segmentation

Accuracy: 0.18 | Chair | noise std = 0.2 | Left is GT, right is prediction.

Accuracy: 0.83 | Chair | noise std = 0 | Left is GT, right is prediction.

Accuracy: 0.75 | Chair | noise std = 0.2 | Left is GT, right is prediction.

Accuracy: 0.94 | Chair | noise std = 0 | Left is GT, right is prediction.

Q4. Expressive Architectures

I implemented PointNet++ in models.py. Due to time constraints, I compared PointNet and PointNet++ using checkpoints trained for 30 epochs.

Metric                    PointNet   PointNet++
Classification Accuracy   0.9013     0.9706
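The first step of a PointNet++ set abstraction layer is farthest point sampling, which picks well-spread centroids around which local neighborhoods are grouped. A minimal sketch of that step (not the exact models.py implementation):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=None):
    """Greedily pick k well-spread points from an (N, 3) cloud.

    Each iteration adds the point farthest from everything chosen so far,
    which is the centroid-selection step of a PointNet++ set abstraction layer.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]          # random starting point
    dist = np.full(n, np.inf)                # distance to nearest chosen point
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)
        chosen.append(int(dist.argmax()))    # farthest remaining point
    return points[np.array(chosen)]
```

Each centroid is then paired with a ball query or k-NN grouping and a small shared MLP, which gives the local feature extraction that plain PointNet lacks and plausibly explains the accuracy gap in the table above.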

One example where PointNet++ succeeds but PointNet fails is shown below: PointNet predicts lamp, while PointNet++ correctly predicts chair.