Assignment 5
Name: Xinyu Liu
Q1. Classification Model
Test Accuracy: 98.2%
Success Predictions:
- Label: Chair

- Label: Vase

- Label: Lamp

Failure Predictions + Interpretation:
Note that there are no wrong predictions for Class 0 (Chair) in the test set. The model learns the Chair class best, which suggests that chairs have a more distinctive shape than vases or lamps.
- Predicted: Lamp; Ground Truth: Vase. This misclassification likely stems from the shape similarity between this vase and a lamp: the model may interpret the flower as a lampshade and the vase body as a lamp base.

- Predicted: Vase; Ground Truth: Lamp. This misclassification again likely stems from shape similarity, with the lamp base resembling a vase body.

Q2. Segmentation Model
Test Accuracy: 90.3%
Good Predictions:
Ground Truth #1:

Predicted #1:

Ground Truth #2:

Predicted #2:

Ground Truth #3:

Predicted #3:

Bad Predictions + Interpretation:
The predicted segmentation is imperfect because this chair has an unusual shape compared to a typical chair. It has only a very small seat area, but the model predicts a much broader region as the seat, which the ground truth labels instead as part of the legs.
Ground Truth #4:

Predicted #4:

This chair is a recliner, so it has a large seat area that extends past the backrest. This shape is rare in the dataset, causing the model to misinterpret that extended seat region as part of the legs.
Ground Truth #5:

Predicted #5:

Q3. Robustness Analysis
Q3.1 Rotation
Procedure:
To test the robustness of the model under rotation, I applied a random 3D rotation matrix to every test point cloud during evaluation. Specifically, for each batch of test samples, I generated a rotation matrix using random_rotation_matrix() and multiplied each point cloud by this matrix before feeding it to the model.
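A minimal sketch of this evaluation step, assuming point clouds stored as (B, N, 3) PyTorch tensors; random_rotation_matrix() is re-derived here for illustration and may differ from the helper actually used:

```python
import torch

def random_rotation_matrix() -> torch.Tensor:
    # Sample a random 3D rotation via QR decomposition of a Gaussian matrix.
    A = torch.randn(3, 3)
    Q, R = torch.linalg.qr(A)
    Q = Q * torch.sign(torch.diagonal(R))  # make the factorization unique
    if torch.det(Q) < 0:                   # ensure a proper rotation (det = +1)
        Q[:, 0] = -Q[:, 0]
    return Q

def rotate_batch(points: torch.Tensor) -> torch.Tensor:
    # points: (B, N, 3); apply one random rotation to every cloud in the batch.
    R = random_rotation_matrix().to(points.device)
    return points @ R.T
```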
Q 3.1.1 Classification Model
Test Accuracy: 30.8%
Success Predictions:
Note that the visualizations below show the original orientation of the test point clouds before applying any rotations.
The successful examples tend to have clear, more symmetric shapes, so a rotation alters their apparent geometry less. As a result, the model is still able to predict the correct label.
- Label: Chair

- Label: Vase

- Label: Lamp

Failure Predictions:
Note that the visualizations below show the original orientation of the test point clouds before applying any rotations.
We observe more failure cases after applying rotations. For example, previously there were no misclassifications for the class chair, but now some chairs are incorrectly predicted.
- Predicted: Lamp; Ground Truth: Chair

- Predicted: Lamp; Ground Truth: Vase

- Predicted: Chair; Ground Truth: Lamp

Comparison:
The accuracy drops noticeably when random rotations are applied. This behavior is expected because the model is a PointNet-based architecture without a T-Net (transformation block), so it does not learn a canonical alignment of the input point cloud. As the input orientation changes, the learned features are no longer stable, leading to misclassification.
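For context, here is a minimal sketch (my own illustration, not the assignment's code) of the kind of input T-Net that is missing: a small network that predicts a per-cloud 3x3 transform and applies it before feature extraction, which is what would let the model learn a canonical alignment.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    # Minimal input-transform network: predicts a 3x3 matrix per point cloud.
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 256, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 9))

    def forward(self, x):                          # x: (B, 3, N)
        f = self.mlp(x).max(dim=2).values          # (B, 256) global feature
        m = self.fc(f).view(-1, 3, 3)              # predicted transform
        m = m + torch.eye(3, device=x.device)      # bias toward the identity
        return torch.bmm(m, x)                     # aligned points, (B, 3, N)
```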
Q 3.1.2 Segmentation Model
Test Accuracy: 31.0%
Predictions:
Note that the visualizations below show the original orientation of the test point clouds before applying any rotations.
Ground Truth #1:

Predicted #1:

Ground Truth #2:

Predicted #2:

Ground Truth #3:

Predicted #3:

Comparison:
After rotation, the segmentation results degrade significantly. Points that are actually at the bottom are predicted as legs, points above are predicted as seat, and so on. This happens because the model relies heavily on the absolute spatial positions of points learned during training, rather than fully rotation-invariant features. Without a T-Net or rotation augmentation, the network cannot correctly align unusual orientations, leading to widespread mislabeling across parts.
Q3.2 num_points = 3000
Procedure:
To test the robustness of the model to input point density, I randomly sampled 3,000 points from each test point cloud, which originally contained 10,000 points. The models were then evaluated on these subsampled inputs.
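A minimal sketch of the subsampling step, under the same (B, N, 3) tensor assumption; for simplicity it draws one shared index set per batch, though sampling per cloud would also work:

```python
import torch

def subsample(points: torch.Tensor, num_points: int = 3000) -> torch.Tensor:
    # points: (B, N, 3); randomly keep num_points of the N points.
    idx = torch.randperm(points.shape[1], device=points.device)[:num_points]
    return points[:, idx, :]
```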
Q 3.2.1 Classification Model
Test Accuracy: 98.2%
Success Predictions:
- Label: Chair

- Label: Vase

- Label: Lamp

Failure Predictions:
Note that there are no wrong predictions for Class 0 (Chair) in the test set.
- Predicted: Lamp; Ground Truth: Vase

- Predicted: Vase; Ground Truth: Lamp

Comparison:
The performance of the classification model remained largely unchanged compared to using the full 10,000 points. This is because the model relies on global shape features aggregated via PointNet's symmetric max-pooling operation. Even with fewer points, the overall geometry and distinctive structures of the objects are preserved, allowing the model to correctly predict the object class.
Q 3.2.2 Segmentation Model
Test Accuracy: 90.2%
Good Predictions:
Ground Truth #1:

Predicted #1:

Ground Truth #2:

Predicted #2:

Ground Truth #3:

Predicted #3:

Bad Predictions:
Ground Truth #4:

Predicted #4:

Ground Truth #5:

Predicted #5:

Comparison:
The performance of the segmentation model remained largely unchanged compared to using the full 10,000 points. This is because the segmentation network extracts local features for each point but also leverages global context through shared MLPs and pooling. As long as the subsampled points adequately cover the object’s key parts (e.g., chair legs, seat, backrest), the model can still assign correct part labels. Minor reductions in point density do not significantly affect the learned local or global features, which explains the robustness.
Q4. Bonus Question - Locality
Q4.1 Classification Model
Model Implemented:
I use EdgeConv layers with get_graph_feature() and k-nearest neighbors (KNN) to extract local neighborhood features. For each point, the network concatenates the point’s features with differences to its neighbors (neighbors - x) before applying convolutions. This allows the network to capture local geometric relationships (e.g., edges, corners) in addition to global shape.
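The sketch below follows the standard DGCNN-style get_graph_feature(); the exact channel sizes and the value of k in my implementation may differ:

```python
import torch

def knn(x: torch.Tensor, k: int) -> torch.Tensor:
    # x: (B, C, N) -> indices of the k nearest neighbors, (B, N, k).
    inner = -2 * torch.matmul(x.transpose(2, 1), x)
    xx = torch.sum(x ** 2, dim=1, keepdim=True)
    neg_dist = -xx - inner - xx.transpose(2, 1)   # negative squared distances
    return neg_dist.topk(k=k, dim=-1).indices

def get_graph_feature(x: torch.Tensor, k: int = 20) -> torch.Tensor:
    # x: (B, C, N) -> edge features (B, 2C, N, k): concat of (neighbor - x) and x.
    B, C, N = x.shape
    idx = knn(x, k) + torch.arange(B, device=x.device).view(-1, 1, 1) * N
    flat = x.transpose(2, 1).reshape(B * N, C)
    neighbors = flat[idx.view(-1)].view(B, N, k, C)
    center = x.transpose(2, 1).unsqueeze(2).expand(B, N, k, C)
    return torch.cat((neighbors - center, center), dim=3).permute(0, 3, 1, 2)
```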
Test Accuracy: 98.2%
Success Predictions:
- Label: Chair

- Label: Vase

- Label: Lamp

Failure Predictions:
- Predicted: Lamp; Ground Truth: Chair

- Predicted: Lamp; Ground Truth: Vase

- Predicted: Vase; Ground Truth: Lamp

Comparison:
For the classification model, the accuracy remains roughly the same after switching to the EdgeConv-based architecture, likely because global shape information is already sufficient for distinguishing the three classes.
Q4.2 Segmentation Model
Model Implemented:
I stack multiple EdgeConv layers so that local features are aggregated hierarchically, allowing the network to encode both local patterns (small structures) and global context (overall shape); a sketch follows the list below.
- conv1 extracts low-level local features (32-dim per point).
- conv2 extracts higher-level local features (64-dim per point).
- conv3 produces a 128-dim global feature by max pooling over all points, which is then concatenated back to each point.
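A sketch of how these layers could fit together for per-point prediction, reusing get_graph_feature() from Q4.1; the channel widths follow the list above, while the number of part labels (num_parts = 6) is an assumption for illustration:

```python
import torch
import torch.nn as nn

class SegEdgeNet(nn.Module):
    # Hierarchical EdgeConv stack mirroring the bullet list above.
    def __init__(self, num_parts: int = 6, k: int = 20):  # num_parts is assumed
        super().__init__()
        self.k = k
        self.conv1 = nn.Sequential(nn.Conv2d(2 * 3, 32, 1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(2 * 32, 64, 1), nn.ReLU())
        self.conv3 = nn.Sequential(nn.Conv1d(32 + 64, 128, 1), nn.ReLU())
        self.head = nn.Conv1d(32 + 64 + 128, num_parts, 1)

    def forward(self, x):                                   # x: (B, 3, N)
        n = x.shape[2]
        f1 = self.conv1(get_graph_feature(x, self.k)).max(dim=-1).values   # (B, 32, N)
        f2 = self.conv2(get_graph_feature(f1, self.k)).max(dim=-1).values  # (B, 64, N)
        local = torch.cat((f1, f2), dim=1)                                 # per-point features
        g = self.conv3(local).max(dim=-1, keepdim=True).values             # (B, 128, 1) global
        return self.head(torch.cat((local, g.expand(-1, -1, n)), dim=1))   # (B, num_parts, N)
```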
Test Accuracy: 91.5%
Predictions:
Ground Truth #1:

Prediction #1 by PointNet:

Prediction #1 by EdgeConv:

Ground Truth #2:

Prediction #2 by PointNet:

Prediction #2 by EdgeConv:

Ground Truth #3:

Prediction #3 by PointNet:

Prediction #3 by EdgeConv:

Comparison:
For the segmentation model, the results show noticeable improvements in certain object parts. For example, the chair seat area is more accurate (example #1), the chair backrest is more clearly segmented (example #2), and other fine structures are better distinguished. This demonstrates that the EdgeConv model's hierarchical local feature extraction benefits per-point labeling, even though overall classification accuracy does not change significantly.