16-825 Learning for 3D Vision
We implemented a PointNet-based architecture for classifying point clouds into three object categories: chairs, vases, and lamps.
Model Performance: The classification model achieves 97.38% accuracy on the test set, demonstrating effective learning of geometric features across different object categories.
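A minimal sketch of such a classifier in PyTorch (layer widths follow the standard PointNet design, and the input/feature transform nets are omitted; the actual course implementation may differ):

```python
# Minimal PointNet classifier sketch (hypothetical layer sizes).
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Shared per-point MLP, applied independently to every point.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature.
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> (B, 3, N) for Conv1d.
        x = self.point_mlp(points.transpose(1, 2))
        # Symmetric max pooling over points gives permutation invariance.
        x = x.max(dim=2).values          # (B, 1024)
        return self.head(x)              # (B, num_classes) logits
```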
Random test point clouds with their predicted classes:
Chair - Correctly Classified
Chair - Correctly Classified
Chair - Correctly Classified
Vase - Correctly Classified
Vase - Correctly Classified
Vase - Correctly Classified
Lamp - Correctly Classified
Lamp - Correctly Classified
Lamp - Correctly Classified
Analysis of model failure cases across the three object categories:
Predicted: Chair | Actually: Vase
The base of this vase is distinctly rectangular and resembles the seat or base of many chair types, which likely led the model to misclassify it.
Predicted: Vase | Actually: Chair
This is a folded chair whose geometry differs significantly from typical chairs. When folded, its compact, cylindrical form resembles a vase more than a traditional chair structure, leading to the misclassification.
Predicted: Lamp | Actually: Chair
This chair has an unusually tall backrest that creates a vertical, post-like structure. This distinctive geometry caused the model to mistake it for a lamp rather than recognize it as a chair.
We implemented a PointNet-based architecture for semantic segmentation of chair point clouds into six parts.
Per-Object Statistics: Mean accuracy: 90.11% | Std: 9.61% | Min: 34.08% | Max: 99.74%
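A sketch of the segmentation variant, assuming the standard PointNet design in which each point's local feature is concatenated with the pooled global feature before per-point classification (widths are illustrative):

```python
# PointNet segmentation sketch (hypothetical widths): per-point features
# are fused with the global feature so each point sees both local and
# global context.
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    def __init__(self, num_parts: int = 6):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.global_ = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(
            nn.Conv1d(64 + 1024, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, num_parts, 1),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        x = points.transpose(1, 2)                 # (B, 3, N)
        local = self.local(x)                      # (B, 64, N)
        glob = self.global_(local).max(2).values   # (B, 1024)
        glob = glob.unsqueeze(2).expand(-1, -1, local.size(2))
        fused = torch.cat([local, glob], dim=1)    # (B, 1088, N)
        return self.seg_head(fused).transpose(1, 2)  # (B, N, num_parts)
```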
Ground truth vs predicted segmentations for chair point clouds:
Accuracy: 99.74% (GT | Pred)
Accuracy: 99.47% (GT | Pred)
Accuracy: 99.41% (GT | Pred)
Accuracy: 99.36% (GT | Pred)
Accuracy: 99.27% (GT | Pred)
Examples where the segmentation model struggled:
Accuracy: 34.08% (GT | Pred)
Interpretation: This chair has an unusual lounger design that doesn't conform to the typical chair structure of legs, a back, and a seat. Its protruding side pieces and atypical base make it difficult for the model to interpret and segment correctly.
Accuracy: 47.24% (GT | Pred)
Interpretation: This chair has an oddly blocky, elongated design with what appears to be an ottoman or extension at the bottom, which is uncommon in the dataset. That extension and the surface beneath it are poorly segmented, as visible in the prediction. While the armrests and back of the chair are segmented well, it is the long, unusually shaped extension piece that the model mislabels.
The segmentation model achieves strong overall performance with a mean accuracy of 90.11%. The model excels at segmenting well-defined chair structures with clear boundaries between parts.
However, performance degrades significantly on chairs with unconventional geometry, such as lounger-style designs, atypical bases, and uncommon extensions like ottomans.
The wide standard deviation (9.61%) and the presence of outliers (min: 34.08%) indicate that while the model generalizes well to common chair types, it struggles with edge cases that deviate from the training distribution.
We tested model robustness to sparse point cloud inputs by randomly sampling different numbers of points from the original 10,000-point objects: 100, 500, 1,000, 2,000, 5,000, and 10,000 points (1%, 5%, 10%, 20%, 50%, and 100% of the original density). The same trained model was evaluated at each density without any retraining or fine-tuning.
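A sketch of how this evaluation might look (the `eval_at_density` helper and the `cls_model`/`test_set` names are ours, not the actual course code):

```python
# Sparsity experiment sketch: randomly subsample each test cloud and
# evaluate the frozen classifier at the reduced density.
import torch

@torch.no_grad()
def eval_at_density(model, test_set, num_points, device="cuda"):
    """Evaluate a frozen classifier on randomly subsampled clouds."""
    model.eval()
    correct = total = 0
    for points, label in test_set:                      # points: (10000, 3)
        idx = torch.randperm(points.size(0))[:num_points]   # random subset
        logits = model(points[idx].unsqueeze(0).to(device))
        correct += int(logits.argmax(dim=1).item() == label)
        total += 1
    return correct / total

# for n in [100, 500, 1000, 2000, 5000, 10000]:
#     print(n, eval_at_density(cls_model, test_set, n))
```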
Classification accuracy at each point density:
| Number of Points | % of Original | Test Accuracy | Accuracy Drop |
|---|---|---|---|
| 10,000 (baseline) | 100% | 97.38% | - |
| 5,000 | 50% | 97.17% | -0.21% |
| 2,000 | 20% | 96.43% | -0.95% |
| 1,000 | 10% | 96.22% | -1.16% |
| 500 | 5% | 94.86% | -2.52% |
| 100 | 1% | 92.44% | -4.94% |
Segmentation accuracy at each point density:
| Number of Points | % of Original | Test Accuracy | Accuracy Drop |
|---|---|---|---|
| 10,000 (baseline) | 100% | 90.11% | - |
| 5,000 | 50% | 90.09% | -0.02% |
| 2,000 | 20% | 89.98% | -0.13% |
| 1,000 | 10% | 89.17% | -0.94% |
| 500 | 5% | 87.91% | -2.20% |
| 100 | 1% | 80.21% | -9.90% |
Classification: Chair (100 points)
Demonstrates classification with only 1% of original point density
Classification: Vase (100 points)
Model maintains 92.44% accuracy despite extreme sparsity
Segmentation GT (100 points)
Ground truth segmentation with sparse input
Segmentation Pred (100 points)
Prediction maintains 80.21% accuracy with limited points
Both models demonstrate strong robustness to sparse point clouds, with classification being slightly more resilient than segmentation.
Classification: Even with only 100 points (1% of the original), the model maintains 92.44% accuracy, a drop of only 4.94 points. This remarkable performance suggests PointNet's global max pooling effectively captures essential shape features from very sparse data, as illustrated below.
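One illustrative way to probe this intuition (reusing the hypothetical `PointNetCls` sketch above): compare the max-pooled global feature of the full cloud with that of a 100-point subsample; because max pooling is driven by a small set of critical points, the two typically remain close.

```python
# Illustrative check (hypothetical helper; assumes the PointNetCls sketch
# above): the max-pooled global feature changes little under subsampling.
import torch

@torch.no_grad()
def global_feature(model, points):                     # points: (N, 3)
    x = model.point_mlp(points.unsqueeze(0).transpose(1, 2))
    return x.max(dim=2).values                         # (1, 1024)

# full   = global_feature(cls_model, cloud)            # all 10,000 points
# sparse = global_feature(cls_model, cloud[torch.randperm(cloud.size(0))[:100]])
# print(torch.cosine_similarity(full, sparse).item())  # typically near 1
```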
Segmentation: Shows similar robustness down to 500 points (87.91%, -2.20% drop), but degrades more at 100 points (80.21%, -9.90% drop). This makes sense since segmentation requires finer-grained local feature discrimination, which needs sufficient point density to accurately label individual points.
Key insight: Both tasks can operate effectively with 500-1000 points (5-10% of original density) with minimal accuracy loss, making PointNet practical for real-world scenarios with limited sensor resolution or computational constraints.
We tested the model's robustness to 3D rotations by rotating input point clouds around the z-axis (vertical axis) by various angles: 0°, 45°, 90°, and 180°. The rotation was applied to the entire test set using 3D rotation matrices, and the same trained model (without rotation augmentation during training) was evaluated on the rotated inputs.
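A sketch of the rotation setup; the `rotate_z` helper below is ours, built from the standard z-axis rotation matrix:

```python
# Rotation experiment sketch: a fixed z-axis (yaw) rotation is applied
# to every test cloud before evaluating the unmodified model.
import math
import torch

def rotate_z(points: torch.Tensor, degrees: float) -> torch.Tensor:
    """Rotate (..., N, 3) point clouds about the vertical z-axis."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]],
                     dtype=points.dtype, device=points.device)
    return points @ R.T        # row-vector convention: p' = p R^T

# rotated = rotate_z(test_points, 45.0)   # then evaluate as at baseline
```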
Classification accuracy under rotation:
| Rotation Angle | Test Accuracy | Accuracy Drop |
|---|---|---|
| 0° (baseline) | 97.38% | - |
| 45° | 32.53% | -64.85% |
| 90° | 53.20% | -44.18% |
| 180° | 62.12% | -35.26% |
Segmentation accuracy under rotation:
| Rotation Angle | Test Accuracy | Accuracy Drop |
|---|---|---|
| 0° (baseline) | 90.11% | - |
| 45° | 56.32% | -33.79% |
| 90° | 35.95% | -54.16% |
| 180° | 32.06% | -58.05% |
Classification: Chair (45° rotation)
Rotated input causes severe performance degradation
Classification: Vase (45° rotation)
Accuracy drops to 32.53% at 45° rotation
Segmentation GT (45° rotation)
Ground truth for rotated input
Segmentation Pred (45° rotation)
Prediction quality severely degraded by rotation
Both models show severe sensitivity to rotations, confirming that PointNet is NOT rotation-invariant.
Classification: Performance catastrophically degrades, with 45° rotation being worst (32.53%, -64.85% drop). Interestingly, accuracy partially recovers at 90° (53.20%) and 180° (62.12%), suggesting the model may exploit certain symmetries in the training data. Chairs and lamps may have some rotational symmetry that helps at orthogonal angles, but arbitrary rotations still cause severe failures.
Segmentation: Shows even worse degradation, dropping from 90.11% to 32.06% at 180° rotation (a 58-point drop); the 45° rotation alone costs roughly 34 points. Unlike classification, segmentation degrades monotonically with rotation angle, as per-point labeling requires precise spatial feature alignment.
This behavior is expected for vanilla PointNet, which processes raw (x, y, z) coordinates directly: the learned features are tied to the canonical orientation seen during training. Without rotation augmentation or rotation-invariant feature extraction (e.g., pairwise distances, which are unchanged by rotation), the model cannot generalize to rotated inputs.
Key insight: For real-world deployment where object orientations are arbitrary, this model would require either (1) rotation augmentation during training (sketched below), (2) canonical pose normalization at inference, or (3) a rotation-invariant or rotation-equivariant architecture such as Vector Neurons; note that PointNet++, while more expressive locally, is not rotation-invariant.
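As a concrete illustration of remedy (1), a minimal augmentation sketch reusing the hypothetical `rotate_z` helper from the rotation experiment above:

```python
# Minimal sketch of remedy (1): random z-rotation augmentation during
# training (assumes the rotate_z helper defined earlier).
import random

def augment_batch(points):                  # points: (B, N, 3)
    angle = random.uniform(0.0, 360.0)      # random yaw for this batch
    return rotate_z(points, angle)

# In the training loop:
#     logits = model(augment_batch(batch_points))
#     loss = criterion(logits, batch_labels)
```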