Q1: Classification Model (40 points)
Overview
Implemented a PointNet-based architecture for classification of point clouds into 3 classes: chairs, vases, and lamps.
Results
Test Accuracy: 98.0%
Per-Class Accuracy:
- Chairs: 99.84% (616/617)
- Vases: 91.18% (93/102)
- Lamps: 96.15 (225/234)
Training Time: 1h 53 min
Visualizations
Sample Predictions
Prediction: Chair
Ground Truth: Chair
Prediction: Lamp
Ground Truth: Lamp
Prediction: Vase
Ground Truth: Vase
Failure Analysis
Chair Failures
Predicted: Lamp
Ground Truth: Chair
Interpretation: This is likely predicted as a lamp because of its thin, elongated backrest and two visible legs, whereas most chairs have four legs.
Vase Failures
Predicted: Lamp
Ground Truth: Vase
Interpretation: The point cloud resembles a chandelier lamp, especially in its lower part.
Lamp Failures
Predicted: Vase
Ground Truth: Lamp
Interpretation: Since this model was trained without locality, the overall shape, ignoring the base legs, resembles a vase in some respects.
Metrics & Detailed performance analysis
Precision, Recall, F1-Score:
- Chairs: Precision: 0.9984, Recall: 0.9984, F1-Score: 0.9984
- Vases: Precision: 0.9118, Recall: 0.9118, F1-Score: 0.9118
- Lamps: Precision: 0.9615, Recall: 0.9615, F1-Score: 0.9615
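For reference, these metrics follow directly from the confusion matrix below; a minimal sketch (function name and layout are illustrative):

```python
import numpy as np

def per_class_metrics(cm):
    """Derive per-class precision, recall, and F1 from a confusion matrix.

    cm[i, j] = number of samples of true class i predicted as class j.
    """
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # column sums = counts predicted as each class
    recall = tp / cm.sum(axis=1)      # row sums = ground-truth counts per class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Note that whenever precision equals recall for a class (as happens for all three classes here), the F1-score equals that same value.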
Confusion Matrix:
Analysis:
- The chair class has the highest accuracy, indicating that the model is very effective at identifying chairs. The training set contains far more chair samples (4489) than vases (741) or lamps (1554); this data imbalance likely explains why the chair accuracy is so high. A class-weighted loss (sketched below) is one common mitigation.
- The confusion matrix shows that the chair class has been learned very well, while vases and lamps are confused with each other roughly equally, which is expected given their structural similarity.
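As flagged in the first bullet, one common way to counteract this imbalance (not used in this run) is to weight the loss inversely to class frequency; a minimal sketch using the training counts reported above:

```python
import torch
import torch.nn as nn

# Training-set class counts reported above: chairs, vases, lamps.
counts = torch.tensor([4489.0, 741.0, 1554.0])

# Inverse-frequency ("balanced") weights: rarer classes get larger weights.
weights = counts.sum() / (len(counts) * counts)

# Rare classes (vases) now contribute more to the gradient per sample.
criterion = nn.CrossEntropyLoss(weight=weights)
```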
Q2: Segmentation Model (40 points)
Overview
Implemented a PointNet-based architecture for semantic segmentation of chair point clouds into 6 semantic classes.
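For concreteness, a minimal sketch of a PointNet-style segmentation network of this shape; the layer widths are illustrative assumptions, not the exact configuration used:

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Minimal PointNet-style segmentation head: per-point features are
    concatenated with a max-pooled global feature, then classified per point."""

    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.local = nn.Sequential(              # shared per-point MLP
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(               # per-point classifier on local+global
            nn.Conv1d(128 + 1024, 256, 1), nn.ReLU(),
            nn.Conv1d(256, num_classes, 1),
        )

    def forward(self, xyz):                      # xyz: (B, N, 3)
        x = xyz.transpose(1, 2)                  # (B, 3, N)
        local = self.local(x)                    # (B, 128, N)
        glob = self.global_mlp(local).max(dim=2, keepdim=True).values  # (B, 1024, 1)
        glob = glob.expand(-1, -1, local.size(2))                      # tile over N points
        return self.head(torch.cat([local, glob], dim=1))              # (B, C, N) logits
```

The per-point classifier sees both the local point feature and the max-pooled global descriptor, which is what lets a single shared MLP assign a part label to every point.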
Results
Test Accuracy: 89.91%
Segmentation Visualizations
Below are segmentation results for 6 objects (including 3 failure cases):
Object 1 - Good Prediction
Ground Truth
Prediction
Accuracy: 90.46%
Object 2 - Good Prediction
Ground Truth
Prediction
Accuracy: 93.80%
Object 3 - Good Prediction
Ground Truth
Prediction
Accuracy: 97.28%
Object 4 - Failure Case (< 60%)
Ground Truth
Prediction
Accuracy: 50.83%
Object 5 - Failure Case (< 60%)
Ground Truth
Prediction
Accuracy: 58%
Object 6 - Failure Case (< 60%)
Ground Truth
Prediction
Accuracy: 51.26%
Analysis: The model struggles around smooth transitions between parts and with fine details, leading to lower accuracy on these objects. Part boundaries also vary across the training data from chair to chair, since there is inherent ambiguity in what counts as, say, the armrest versus the back.
Distribution of segmentation accuracies
Analysis: The distribution indicates that while a significant number of objects achieve high segmentation accuracy, there is a notable tail of objects with lower accuracy. This suggests that while the model performs well on many chairs, it struggles with certain geometries or configurations, leading to a wider spread in performance.
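A minimal sketch of how such a distribution can be computed and plotted; `all_preds` and `all_gts` are hypothetical stand-ins for the collected per-object predictions and labels:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins: one (pred, gt) label-array pair per test object.
rng = np.random.default_rng(0)
all_preds = [rng.integers(0, 6, 10000) for _ in range(600)]
all_gts = [rng.integers(0, 6, 10000) for _ in range(600)]

# Per-object accuracy = fraction of points labeled correctly.
accs = [float((p == g).mean()) for p, g in zip(all_preds, all_gts)]

plt.hist(accs, bins=20, range=(0.0, 1.0), edgecolor="black")
plt.xlabel("Per-object segmentation accuracy")
plt.ylabel("Number of objects")
plt.show()
```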
Q3: Robustness Analysis (20 points)
Experiment 1: Rotation Robustness
Procedure
Rotated input point clouds by varying degrees (0°, 45°, 78.75°, 112.5°, 146.25°) around the Y-axis and evaluated classification and segmentation accuracy.
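A minimal sketch of applying such a Y-axis rotation to a point cloud (the cloud here is a random stand-in for a real test sample):

```python
import numpy as np

def rotate_y(points, degrees):
    """Rotate an (N, 3) point cloud about the Y-axis by `degrees`."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T

cloud = np.random.rand(10000, 3)  # stand-in for a real test point cloud
for angle in (0.0, 45.0, 78.75, 112.5, 146.25):
    rotated = rotate_y(cloud, angle)  # evaluate the model on `rotated`
```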
Classification Results
| Rotation Angle | Accuracy (%) | Δ vs Baseline (pp) |
|---|---|---|
| 0° (Baseline) | 98.01 | — |
| 45° | 35.05 | -62.96 |
| 78.75° | 24.66 | -73.35 |
| 112.5° | 36.73 | -61.28 |
| 146.25° | 68.84 | -29.17 |
Sample Visualizations for classification robustness
| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() | ![]() |
| Prediction | Vase | Lamp | Lamp | Lamp | Vase |

| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() | ![]() |
| Prediction | Chair | Lamp | Vase | Lamp | Chair |

| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() | ![]() |
| Prediction | Lamp | Lamp | Lamp | Vase | Vase |
Data points where rotation did not affect classification result
| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() | ![]() |
| Prediction | Lamp | Lamp | Lamp | Lamp | Lamp |
| Point Cloud | ![]() | ![]() | ![]() | ![]() | ![]() |
Segmentation Results
| Rotation Angle | Accuracy (%) |
|---|---|
| 0° (Baseline) | 89.91 |
| 45° | 61.38 |
| 78.75° | 29.77 |
| 112.5° | 23.16 |
| 146.25° | 27.42 |
| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() | ![]() |
| Accuracy | 0.781 | 0.619 | 0.405 | 0.310 | 0.184 |
| Prediction | ![]() | ![]() | ![]() | ![]() | ![]() |

| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() | ![]() |
| Accuracy | 0.928 | 0.616 | 0.491 | 0.221 | 0.074 |
| Prediction | ![]() | ![]() | ![]() | ![]() | ![]() |

| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() | ![]() |
| Accuracy | 0.909 | 0.715 | 0.326 | 0.235 | 0.124 |
| Prediction | ![]() | ![]() | ![]() | ![]() | ![]() |

| Rotation Degrees | 0° | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() | ![]() |
| Accuracy | 0.933 | 0.570 | 0.227 | 0.160 | 0.231 |
| Prediction | ![]() | ![]() | ![]() | ![]() | ![]() |
Analysis
- The PointNet model is severely vulnerable to rotation and is decidedly not rotation-invariant: classification accuracy plummets from 98.01% at baseline to just 24.66% at 78.75°, a drop of 73.35 percentage points. Segmentation degrades just as dramatically, from 89.91% to 23.16% at 112.5°, indicating that the model relies heavily on the canonical object orientations seen during training rather than learning orientation-agnostic features.
- Interestingly, performance partially recovers at 146.25° (68.84% for classification, 27.42% for segmentation), suggesting that some objects look similar to the model under near-180° rotations, though accuracy remains far below baseline.
- This fundamental limitation shows that PointNet lacks architectural mechanisms for handling viewpoint variation; without rotation augmentation during training or pose normalization of the inputs, it is unsuitable for real-world settings where objects appear in arbitrary orientations. A training-time augmentation sketch follows.
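As mentioned in the last bullet, one standard remedy is to randomize object orientation during training; a minimal sketch, reusing `rotate_y()` from the procedure sketch above:

```python
import numpy as np

def augment_rotation(batch):
    """Apply an independent random Y-axis rotation to every cloud in a batch.

    `batch` is a (B, N, 3) array; rotate_y() is defined in the earlier sketch.
    """
    return np.stack([rotate_y(cloud, np.random.uniform(0.0, 360.0))
                     for cloud in batch])
```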
Experiment 2: Point Density Robustness
Procedure
Evaluated the model with reduced numbers of input points (4000, 1000, 500) against the 10,000-point baseline to test sensitivity to point cloud density.
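A minimal sketch of the subsampling step (the cloud here is a random stand-in for a real test sample):

```python
import numpy as np

def subsample(points, n):
    """Randomly keep `n` points from an (N, 3) cloud, without replacement."""
    idx = np.random.choice(len(points), size=n, replace=False)
    return points[idx]

cloud = np.random.rand(10000, 3)      # stand-in for a real test point cloud
for n in (10000, 4000, 1000, 500):
    reduced = subsample(cloud, n)     # evaluate the model on `reduced`
```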
Classification Results
| Number of Points | Accuracy (%) |
|---|---|
| 10000 (Baseline) | 98.01 |
| 4000 | 98.22 |
| 1000 | 97.17 |
| 500 | 96.54 |
Sample Visualizations for classification robustness
| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() |
| Prediction | Chair | Chair | Lamp | Lamp |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() |
| Prediction | Chair | Chair | Chair | Chair |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() |
| Prediction | Vase | Vase | Vase | Vase |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() |
| Prediction | Vase | Vase | Vase | Vase |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | ![]() | ![]() | ![]() | ![]() |
| Prediction | Chair | Chair | Chair | Chair |
Segmentation Results
| Number of Points | Accuracy (%) |
|---|---|
| 10000 (Baseline) | 89.91 |
| 4000 | 89.88 |
| 1000 | 89.55 |
| 500 | 88.39 |
Sample Visualizations
| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() |
| Accuracy (%) | 78.1 | 80.1 | 80.1 | 81.6 |
| Prediction | ![]() | ![]() | ![]() | ![]() |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() |
| Accuracy (%) | 92.8 | 93.1 | 93.7 | 95.2 |
| Prediction | ![]() | ![]() | ![]() | ![]() |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() |
| Accuracy (%) | 90.9 | 91.5 | 90.7 | 8.4 |
| Prediction | ![]() | ![]() | ![]() | ![]() |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() |
| Accuracy (%) | 93.3 | 93.1 | 93.9 | 94.6 |
| Prediction | ![]() | ![]() | ![]() | ![]() |

| Point Density | 10000 | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | ![]() | ![]() | ![]() | ![]() |
| Accuracy (%) | 83.9 | 81.9 | 80.8 | 77.6 |
| Prediction | ![]() | ![]() | ![]() | ![]() |
Analysis
- The PointNet model is strongly robust to reduced point-cloud density: classification accuracy declines only 1.47 percentage points when dropping from 10,000 to 500 points (98.01% → 96.54%), and segmentation accuracy only 1.52 points under the same reduction (89.91% → 88.39%).
- Interestingly, reducing points from 10,000 to 4,000 slightly improves classification performance, suggesting the baseline resolution contains some redundancy. The model evidently captures the essential geometric information from relatively few points and could be deployed on computationally constrained devices without significant accuracy loss.
Q4: Bonus - Locality (20 points)
Overview
Implemented an advanced architecture that incorporates local geometric structure: PointNet++.
Model Architecture
Architecture Choice: PointNet++
Motivation: PointNet++ addresses the limitation of PointNet by capturing local geometric features through hierarchical feature learning, which improves performance on complex shapes.
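The key ingredient behind this locality is sampling and grouping inside PointNet++'s set-abstraction layers; below is a minimal farthest-point-sampling sketch (illustrative, not the exact implementation used):

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Pick k well-spread centroids from an (N, 3) cloud.

    PointNet++ uses such centroids as the centers of local neighborhoods,
    whose points are then grouped and encoded by a shared mini-PointNet.
    """
    n = len(points)
    selected = [np.random.randint(n)]            # start from a random point
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        # Track each point's distance to its nearest already-selected centroid.
        d = np.linalg.norm(points - points[selected[-1]], axis=1)
        dist = np.minimum(dist, d)
        selected.append(int(np.argmax(dist)))    # pick the farthest remaining point
    return points[np.array(selected)]
```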
Results Comparison
| Model | Classification Acc. (%) | Segmentation Acc. (%) | Training Time |
|---|---|---|---|
| PointNet | 98.0 | 89.91 | 1h 53 min |
| PointNet++ (with locality) | 98.64 | 90.75 | 3h 25 min |
Performance Improvement
Classification: +0.64 percentage points over PointNet
Segmentation: +0.84 percentage points over PointNet
Classification Results Deep Dive
Per Class Accuracy Comparison
| Model | Chairs Acc. (%) | Vases Acc. (%) | Lamps Acc. (%) |
|---|---|---|---|
| PointNet | 99.84 | 91.18 | 96.15 |
| PointNet++ | 99.84 | 93.14 | 97.86 |
Confusion Matrix Comparison
PointNet
PointNet++
Visualizations
Classification Examples
| Input Point Cloud | Ground Truth | PointNet | PointNet++ (with locality) |
|---|---|---|---|
| ![]() | Lamp | Vase | Lamp |
| ![]() | Vase | Lamp | Vase |
Segmentation Examples
Ground Truth vs Prediction Comparison
Ground Truth
Prediction (without locality): 53.45% accuracy
Prediction (with locality): 92.75% accuracy
Analysis
Test accuracy comparison for the segmentation task:
- PointNet++ leverages local neighborhoods to capture fine-grained geometric details, leading to improved accuracy, especially in complex shapes.
- The model shows performance gains on classes with intricate structures, such as lamps and vases, where local context is crucial.
- However, the increased complexity results in longer training times and higher computational costs.
- I initially tried Point Transformer on an H100 GPU, but it ran out of memory, indicating very heavy compute requirements, so I switched to PointNet++.