Q1. Classification Model (40 points)¶
Best Accuracy Score: 98.22%
For better visualization, I use the following color mapping to indicate the predicted class:
- Red = Chair
- Blue = Lamp
- Green = Vase
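The color mapping above can be applied in code when rendering predictions; a minimal sketch (the label indices 0/1/2 are assumptions, as the actual dataset ordering may differ):

```python
import numpy as np

# Assumed label ordering; adjust to match the dataset's class ids.
CLASS_COLORS = {
    0: (1.0, 0.0, 0.0),  # Chair -> red
    1: (0.0, 0.0, 1.0),  # Lamp  -> blue
    2: (0.0, 1.0, 0.0),  # Vase  -> green
}

def colorize(pred_labels):
    """Map an array of predicted class ids to RGB colors for plotting."""
    return np.array([CLASS_COLORS[int(c)] for c in pred_labels])

print(colorize([0, 1, 2]))
```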
Success Cases¶
| Chair | Lamp | Vase |
|---|---|---|
Failure Cases¶
- Chair interpretation: The trained model did not misclassify any chair instance.
| Visualization | Ground Truth | Prediction |
|---|---|---|
| | Lamp | Vase |
| | Lamp | Vase |
| | Lamp | Vase |
- Lamp interpretation: Many of the lamp samples misclassified as vases have a smooth, cylindrical appearance with fewer protrusions, which confuses the model given the shape similarity to vases.
| Visualization | Ground Truth | Prediction |
|---|---|---|
| | Vase | Lamp |
| | Vase | Lamp |
| | Vase | Chair |
Q2. Segmentation Model (40 points)¶
Best Accuracy Score: 90.24%
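Assuming the reported accuracy is the standard per-point metric (the fraction of points whose predicted part label matches the ground truth), it can be computed as:

```python
import numpy as np

def seg_accuracy(pred, gt):
    """Per-point segmentation accuracy: fraction of points whose
    predicted part label matches the ground-truth label."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float((pred == gt).mean())

# Toy example: 3 of 4 points correct -> 0.75
print(seg_accuracy([0, 1, 2, 2], [0, 1, 2, 0]))
```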
Good Cases¶
| Ground truth segmentation | Predicted segmentation | Accuracy |
|---|---|---|
| | | 99% |
| | | 99% |
| | | 99% |
Bad Cases¶
| Ground truth segmentation | Predicted segmentation | Accuracy | Interpretation |
|---|---|---|---|
| | | 51% | The model confuses neighboring regions—yellow armrest and side-panel areas are incorrectly split into red and blue. Backrest and seat predictions are partially correct but suffer from noisy boundaries, reducing accuracy. These artifacts suggest the model has difficulty with overlapping structures. |
| | | 56% | The model captures major regions but produces poorly defined boundaries. Side and back panels are confused, with label bleeding across segments, indicating difficulty handling spatial discontinuities and geometrically similar parts. |
Q3. Robustness Analysis (20 points)¶
Rotation Around Z Axis¶
Procedure: We evaluated classification and segmentation performance after rotating each input point cloud around the z-axis by 0°, 5°, 45°, and 90°. This tests the model's sensitivity to orientation, which matters because PointNet is not inherently rotation-invariant.
Comparison: Classification accuracy drops from 98.22% at 0° to 22.25% at 90°; segmentation accuracy drops from 90.24% to 38.61% over the same range.
Interpretation: For classification, the model tends to collapse to the dominant class under heavy rotation, suggesting limited rotational generalization. For segmentation, the model appears to overfit to canonical orientations and lacks robust feature extraction for rotated geometries.
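The z-axis rotation used in this experiment can be sketched as follows (`rotate_z` is an illustrative helper, not the evaluation script's actual function):

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud around the z-axis by the given angle."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

pts = np.array([[1.0, 0.0, 0.5]])
# A point on the x-axis rotates onto the y-axis; z is unchanged.
print(rotate_z(pts, 90))
```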
Classification¶
| Rotation Degree | Accuracy | Ground Truth Chair (RED) | Ground Truth Lamp (BLUE) | Ground Truth Vase (GREEN) | Observation |
|---|---|---|---|---|---|
| 0 | 98.22% | | | | |
| 5 | 97.27% | | | | |
| 45 | 52.36% | | | | |
| 90 | 22.25% | | | | |
Segmentation¶
object_id=0
| Rotation Degree | Accuracy | Original GT | Original Pred | Transformed Pred |
|---|---|---|---|---|
| 0 | 94.29% | | | |
| 5 | 94.83% | | | |
| 45 | 75.25% | | | |
| 90 | 40.28% | | | |
object_id=50
| Rotation Degree | Accuracy | Original GT | Original Pred | Transformed Pred |
|---|---|---|---|---|
| 0 | 90.22% | | | |
| 5 | 90.65% | | | |
| 45 | 69.24% | | | |
| 90 | 35.86% | | | |
object_id=100
| Rotation Degree | Accuracy | Original GT | Original Pred | Transformed Pred |
|---|---|---|---|---|
| 0 | 90.22% | | | |
| 5 | 90.65% | | | |
| 45 | 62.38% | | | |
| 90 | 34.32% | | | |
Number of Points¶
Procedure: We varied the number of input points at evaluation time via the `--num_points` flag, testing 10,000, 1,000, 100, and 10 points. Point clouds were randomly downsampled to simulate sparse input conditions.
Comparison: Classification accuracy drops from 98.22% to 26.02% as the number of points falls from 10,000 to 10; segmentation accuracy drops from 90.24% to 68.33% over the same range.
Interpretation: Both tasks are fairly robust to moderate downsampling: classification stays above 92% with as few as 100 points, but accuracy collapses in the extreme 10-point regime, where too little geometry remains for the global feature to be discriminative.
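The downsampling step can be sketched as simple random subsampling (the evaluation code may use a different sampling scheme):

```python
import numpy as np

def downsample(points, num_points, seed=0):
    """Randomly subsample an (N, 3) point cloud to num_points points,
    mimicking the --num_points evaluation setting."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_points, replace=False)
    return points[idx]

cloud = np.random.rand(10000, 3)
for n in (10000, 1000, 100, 10):
    print(n, downsample(cloud, n).shape)
```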
Classification¶
| Points# | Accuracy | Ground Truth Chair (RED) | Ground Truth Lamp (BLUE) | Ground Truth Vase (GREEN) | Observation |
|---|---|---|---|---|---|
| 10000 | 98.22% | | | | |
| 1000 | 97.59% | | | | |
| 100 | 92.03% | | | | |
| 10 | 26.02% | | | | |
Segmentation¶
object_id=0
| Points# | Accuracy | Original GT | Original Pred | Transformed Pred |
|---|---|---|---|---|
| 10000 | 94.29% | | | |
| 1000 | 95.10% | | | |
| 100 | 98.00% | | | |
| 10 | 70.00% | | | |
object_id=50
| Points# | Accuracy | Original GT | Original Pred | Transformed Pred |
|---|---|---|---|---|
| 10000 | 94.29% | | | |
| 1000 | 92.00% | | | |
| 100 | 91.00% | | | |
| 10 | 90.00% | | | |
object_id=100
| Points# | Accuracy | Original GT | Original Pred | Transformed Pred |
|---|---|---|---|---|
| 10000 | 94.29% | | | |
| 1000 | 94.60% | | | |
| 100 | 72.00% | | | |
| 10 | 60.00% | | | |
Q4. Bonus Question - Locality (20 points)¶
Classification¶
Best Accuracy Score¶
- PointNet: 98.22%
- PointNet++: 98.95%
- Red = Chair
- Blue = Lamp
- Green = Vase
| GT Class | PointNet Prediction | PointNet++ Prediction |
|---|---|---|
| Vase | | |
| Vase | | |
| Lamp | | |
| Lamp | | |
Rotation
| Network Type | Rotation Degree | Accuracy | Ground Truth Chair (RED) | Ground Truth Lamp (BLUE) | Ground Truth Vase (GREEN) | Observation |
|---|---|---|---|---|---|---|
| PointNet | 45 | 52.36% | | | | |
| PointNet++ | 45 | 80.06% | | | | |
| PointNet | 90 | 22.25% | | | | |
| PointNet++ | 90 | 50.05% | | | | |
Number of Points
| Network Type | Points# | Accuracy | Ground Truth Chair (RED) | Ground Truth Lamp (BLUE) | Ground Truth Vase (GREEN) | Observation |
|---|---|---|---|---|---|---|
| PointNet | 1000 | 97.59% | | | | |
| PointNet++ | 1000 | 91.40% | | | | |
Interpretation:
PointNet++ demonstrates significant improvements over PointNet in several key aspects, while also revealing some trade-offs:
Overall Accuracy: PointNet++ achieves a slightly higher accuracy (98.95% vs 98.22%) on the standard test set, indicating that the hierarchical feature extraction with locality awareness helps capture more discriminative features. The hierarchical sampling and grouping mechanism allows the model to learn multi-scale features, which can better distinguish subtle differences between classes.
Rotation Robustness: PointNet++ shows dramatically better performance under rotation transformations. At 45° rotation, PointNet++ maintains 80.06% accuracy compared to PointNet's 52.36%, and at 90° rotation, PointNet++ achieves 50.05% versus PointNet's 22.25%. This improvement can be attributed to PointNet++'s local neighborhood processing: by aggregating features within local regions at multiple scales, the model learns more rotation-invariant representations. The hierarchical structure allows the network to capture geometric relationships that are less dependent on absolute orientation, whereas PointNet's global max-pooling is more sensitive to point cloud orientation.
Point Density Sensitivity: Interestingly, PointNet++ shows noticeably lower performance (91.40% vs 97.59%) when evaluated with 1,000 points compared to PointNet. This suggests that PointNet++'s hierarchical sampling strategy requires sufficient point density to form effective local neighborhoods. With fewer points, the farthest point sampling and ball query operations may not capture representative local structures, leading to degraded performance. PointNet's simpler architecture, which directly processes all points, is more robust to moderate point density reductions.
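The farthest point sampling step mentioned above can be sketched as follows (a plain NumPy version for illustration; PointNet++ implementations typically run this on the GPU):

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: starting from a random point, repeatedly select the
    point farthest from all previously chosen centroids. PointNet++'s
    set abstraction uses this to pick local-region centers."""
    n = len(points)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]
    dist = np.full(n, np.inf)  # distance to nearest selected centroid
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[selected[-1]], axis=1)
        dist = np.minimum(dist, d)
        selected.append(int(dist.argmax()))
    return np.array(selected)

pts = np.random.rand(2048, 3)
centroids = farthest_point_sampling(pts, 128)
print(centroids.shape)  # (128,)
```

With very sparse inputs, the same greedy procedure is forced to pick nearly every point, so the "local" neighborhoods it anchors stop being representative, which is consistent with the degradation observed above.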
Conclusion: PointNet++'s hierarchical architecture with locality provides substantial benefits for rotation robustness and overall accuracy, making it more suitable for real-world applications where objects may appear in various orientations. However, it requires sufficient point density to leverage its hierarchical sampling effectively, which is an important consideration for sparse point cloud scenarios.
Segmentation¶
Best Accuracy Score¶
- PointNet: 90.24%
- PointNet++: 91.97%
| Network Type | Accuracy | Ground Truth | Prediction |
|---|---|---|---|
| PointNet | 94.29% | | |
| PointNet++ | 95.61% | | |
Rotation
| Network Type | Rotation Degree | Accuracy | Ground Truth | Prediction |
|---|---|---|---|---|
| PointNet | 45 | 75.25% | | |
| PointNet++ | 45 | 72.86% | | |
| PointNet | 90 | 40.28% | | |
| PointNet++ | 90 | 42.16% | | |
Number of Points
| Network Type | Points# | Accuracy | Ground Truth | Prediction |
|---|---|---|---|---|
| PointNet | 1000 | 95.10% | | |
| PointNet++ | 1000 | 95.70% | | |
Segmentation Rotation Robustness Interpretation:
Unlike classification, PointNet++ segmentation does not show significant improvement over PointNet under large rotations (45°: 72.86% vs 75.25%, 90°: 42.16% vs 40.28%).
Conclusion: The hierarchical architecture that makes PointNet++ superior for classification (where a single global representation is needed) becomes a limitation for segmentation under rotation, where precise per-point spatial alignment is critical. This highlights that architectural choices must be carefully considered for the specific task and robustness requirements.