Assignment 5: PointNet Classification and Segmentation

AndrewID: kpullala

Course: 16-825 Learning for 3D Vision

Q1: Classification Model (40 points)

Overview
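
Implemented a PointNet-based architecture for classifying point clouds into 3 classes: chairs, vases, and lamps.

For reference, a minimal sketch of a vanilla PointNet classifier of this kind (layer widths follow the original paper, input/feature transform networks omitted; illustrative, not necessarily the exact submitted implementation):

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Vanilla PointNet: shared per-point MLP -> symmetric max-pool -> MLP head."""
    def __init__(self, num_classes=3):
        super().__init__()
        # Shared MLP applied independently to every point, implemented as 1x1 convs
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):       # points: (B, N, 3)
        x = points.transpose(1, 2)   # (B, 3, N) for Conv1d
        x = self.point_mlp(x)        # per-point features (B, 1024, N)
        x = x.max(dim=2).values      # order-invariant max-pool over points -> (B, 1024)
        return self.head(x)          # class logits (B, num_classes)
```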

Results

Test Accuracy: 98.0%

Per-Class Accuracy:

  • Chairs: 99.84% (616/617)
  • Vases: 91.18% (93/102)
  • Lamps: 96.15% (225/234)

Training Time: 1h 53 min

Visualizations

Sample Predictions

Sample prediction 1

Prediction: Chair
Ground Truth: Chair

Sample prediction 2

Prediction: Lamp
Ground Truth: Lamp

Sample prediction 3

Prediction: Vase
Ground Truth: Vase

Failure Analysis

Chair Failures
Chair failure

Predicted: Lamp
Ground Truth: Chair

Interpretation: This chair is likely predicted as a lamp because of its thin, elongated backrest and its two legs, whereas most chairs in the training set have four.

Vase Failures
Vase failure

Predicted: Lamp
Ground Truth: Vase

Interpretation: The point cloud resembles a chandelier-style lamp, especially in its lower portion.

Lamp Failures
Lamp failure

Predicted: Vase
Ground Truth: Lamp

Interpretation: Since this model was trained without any notion of locality, the lamp's body (ignoring the base legs) resembles a vase in overall shape.

Metrics & Detailed Performance Analysis

Precision, Recall, F1-Score:

  • Chairs: Precision: 0.9984, Recall: 0.9984, F1-Score: 0.9984
  • Vases: Precision: 0.9118, Recall: 0.9118, F1-Score: 0.9118
  • Lamps: Precision: 0.9615, Recall: 0.9615, F1-Score: 0.9615
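
Since precision equals recall for every class here, each F1-score (their harmonic mean) necessarily equals the same value. For reference, a minimal sketch of how these per-class metrics fall out of the confusion matrix (assuming rows are ground truth and columns are predictions):

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples of ground-truth class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)  # column sums = totals predicted per class
    recall = tp / cm.sum(axis=1)     # row sums = ground-truth totals per class
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```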

Confusion Matrix:

Confusion Matrix

Analysis:

  • The chair class has the highest accuracy, indicating that the model is very effective at identifying chairs. It was trained on a large number of chair samples (4489), while the vase and lamp classes had only 741 and 1554 samples respectively. This class imbalance is the most likely reason chair accuracy is so much higher than the others.
  • The confusion matrix shows that the chair class is learned very well, while vases and lamps are confused with each other roughly equally, which is expected given the structural similarity between the two.

Q2: Segmentation Model (40 points)

Overview

Implemented a PointNet-based architecture for semantic segmentation of chair point clouds into 6 part classes.
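
The key difference from the classification model is that the pooled global feature is concatenated back onto every per-point local feature before a shared per-point prediction head. A minimal sketch of that structure (dimensions follow the original paper; illustrative, not necessarily the exact submitted implementation):

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """PointNet segmentation: fuse per-point local features with the global feature."""
    def __init__(self, num_seg_classes=6):
        super().__init__()
        self.local_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(
            nn.Conv1d(64 + 1024, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, num_seg_classes, 1),
        )

    def forward(self, points):                        # (B, N, 3)
        x = points.transpose(1, 2)                    # (B, 3, N)
        local = self.local_mlp(x)                     # (B, 64, N)
        glob = self.global_mlp(local).max(2).values   # (B, 1024) global descriptor
        glob = glob.unsqueeze(2).expand(-1, -1, local.shape[2])
        fused = torch.cat([local, glob], dim=1)       # (B, 1088, N)
        return self.seg_head(fused).transpose(1, 2)   # per-point logits (B, N, 6)
```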

Results

Test Accuracy: 89.91%

Segmentation Visualizations

Below are segmentation results for 6 objects (including 3 failure cases):

Object 1 - Good Prediction

Object 1 ground truth

Ground Truth

Object 1 prediction

Prediction
Accuracy: 90.46%

Object 2 - Good Prediction

Object 2 ground truth

Ground Truth

Object 2 prediction

Prediction
Accuracy: 93.80%

Object 3 - Good Prediction

Object 3 ground truth

Ground Truth

Object 3 prediction

Prediction
Accuracy: 97.28%

Object 4 - Failure Case (< 60%)

Object 4 ground truth

Ground Truth

Object 4 prediction

Prediction
Accuracy: 50.83%

Object 5 - Failure Case (< 60%)

Object 5 ground truth

Ground Truth

Object 5 prediction

Prediction
Accuracy: 58%

Object 6 - Failure Case (< 60%)

Object 6 ground truth

Ground Truth

Object 6 prediction

Prediction
Accuracy: 51.26%

Analysis: The model struggles around smooth transitions between parts and around fine details, which lowers accuracy on these objects. Part boundaries also vary across the training data from chair to chair, since there is inherent ambiguity in, for example, where an armrest ends and the back begins.

Distribution of segmentation accuracies

Segmentation Accuracy Distribution

Analysis: The distribution indicates that while a significant number of objects achieve high segmentation accuracy, there is a notable tail of objects with lower accuracy. This suggests that while the model performs well on many chairs, it struggles with certain geometries or configurations, leading to a wider spread in performance.
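
For reference, a sketch of how such a per-object accuracy histogram can be produced (`pred_labels` and `gt_labels` are hypothetical `(num_objects, N)` tensors of per-point part labels):

```python
import matplotlib.pyplot as plt

def per_object_accuracy(pred_labels, gt_labels):
    """Fraction of correctly labelled points for each object."""
    return (pred_labels == gt_labels).float().mean(dim=1)

accs = per_object_accuracy(pred_labels, gt_labels)  # hypothetical tensors
plt.hist(accs.numpy(), bins=20)
plt.xlabel("Per-object segmentation accuracy")
plt.ylabel("Number of objects")
plt.savefig("seg_accuracy_distribution.png")
```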

Q3: Robustness Analysis (20 points)

Experiment 1: Rotation Robustness

Procedure

Rotated the input point clouds by varying angles (0°, 45°, 78.75°, 112.5°, 146.25°) around the Y-axis and evaluated classification and segmentation accuracy at each angle.
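
A minimal sketch of the Y-axis rotation applied to each test cloud (the `evaluate` call in the usage comment is hypothetical):

```python
import math
import torch

def rotate_y(points, deg):
    """Rotate a point cloud tensor of shape (N, 3) about the Y-axis by `deg` degrees."""
    theta = math.radians(deg)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]], dtype=points.dtype)
    return points @ R.T  # row-vector convention: p' = R p

# Usage at the angles tested above:
# for deg in [0, 45, 78.75, 112.5, 146.25]:
#     acc = evaluate(model, rotate_y(test_points, deg), test_labels)
```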

Classification Results

| Rotation Angle | Accuracy (%) | Change vs Baseline (pp) |
|---|---|---|
| 0° (Baseline) | 98.01 | 0.00 |
| 45° | 35.05 | -62.96 |
| 78.75° | 24.66 | -73.35 |
| 112.5° | 36.73 | -61.28 |
| 146.25° | 68.84 | -29.17 |

Sample Visualizations for classification robustness

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Vase | Lamp | Lamp | Lamp | Vase |
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Chair | Lamp | Vase | Lamp | Chair |
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Lamp | Lamp | Lamp | Vase | Vase |

Data points where rotation did not affect classification result

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Lamp | Lamp | Lamp | Lamp | Lamp |
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

Segmentation Results

| Rotation Angle | Accuracy (%) |
|---|---|
| 0° (Baseline) | 89.91 |
| 45° | 61.38 |
| 78.75° | 29.77 |
| 112.5° | 23.16 |
| 146.25° | 27.42 |
Sample Visualizations for segmentation robustness

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.781 | 0.619 | 0.405 | 0.310 | 0.184 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.928 | 0.616 | 0.491 | 0.221 | 0.074 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.909 | 0.715 | 0.326 | 0.235 | 0.124 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.933 | 0.570 | 0.227 | 0.160 | 0.231 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

Analysis

  • The PointNet model is severely vulnerable to rotations and is decidedly not rotation-invariant: classification accuracy plummets from 98.01% at baseline to just 24.66% at 78.75°, a drop of 73.35 percentage points.
  • Segmentation degrades just as dramatically, from 89.91% at baseline to 23.16% at 112.5°, indicating that the model relies heavily on the canonical object orientations seen during training rather than learning orientation-agnostic features.
  • Interestingly, performance partially recovers at 146.25° (68.84% for classification, 27.42% for segmentation), suggesting some symmetry in how the model perceives rotated objects, though it remains far below baseline.
  • This is a fundamental architectural limitation: PointNet has no mechanism for handling viewpoint variation, so it is unreliable when objects appear in arbitrary orientations unless rotation augmentation or pose normalization is added (a minimal augmentation sketch follows this list).
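
Reusing the `rotate_y` helper sketched earlier, a training-time rotation augmentation (not used in this submission) could be as simple as:

```python
import random

def augment_random_y_rotation(points):
    """Rotate each training cloud by a random angle about Y before feeding the model."""
    return rotate_y(points, random.uniform(0.0, 360.0))  # rotate_y defined above
```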

Experiment 2: Point Density Robustness

Procedure

Evaluated the model with different numbers of input points (10000 baseline; 4000, 1000, and 500) to test sensitivity to point cloud density.
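
A minimal sketch of this kind of subsampling (random, without replacement; for segmentation, the per-point labels must be gathered with the same indices):

```python
import torch

def subsample(points, n):
    """Randomly keep n of the points in an (N, 3) cloud, without replacement."""
    idx = torch.randperm(points.shape[0])[:n]
    return points[idx], idx  # idx lets per-point labels be subsampled identically
```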

Classification Results

| Number of Points | Accuracy (%) |
|---|---|
| 10000 (Baseline) | 98.01 |
| 4000 | 98.22 |
| 1000 | 97.17 |
| 500 | 96.54 |

Sample Visualizations for classification robustness

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Chair | Chair | Lamp | Lamp |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Chair | Chair | Chair | Chair |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Vase | Vase | Vase | Vase |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Vase | Vase | Vase | Vase |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Chair | Chair | Chair | Chair |

Segmentation Results

| Number of Points | Accuracy (%) |
|---|---|
| 10000 (Baseline) | 89.91 |
| 4000 | 89.88 |
| 1000 | 89.55 |
| 500 | 88.39 |

Sample Visualizations

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 78.1 | 80.1 | 80.1 | 81.6 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 92.8 | 93.1 | 93.7 | 95.2 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 90.9 | 91.5 | 90.7 | 8.4 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 93.3 | 93.1 | 93.9 | 94.6 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 83.9 | 81.9 | 80.8 | 77.6 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

Analysis

  • The PointNet model demonstrates strong robustness to reduced point cloud density: classification accuracy declines by only 1.47 percentage points when dropping from 10,000 to 500 points (98.01% → 96.54%), and segmentation accuracy by just 1.52 percentage points under the same reduction (89.91% → 88.39%).
  • Interestingly, reducing points from 10,000 to 4,000 slightly improves classification performance, suggesting the baseline resolution contains some redundancy that can be effectively compressed. This indicates the model captures essential geometric information efficiently and could be deployed on computationally constrained devices without significant accuracy loss.

Q4: Bonus - Locality (20 points)

Overview

Implemented an advanced architecture incorporating local geometric structure: PointNet++

Model Architecture

Architecture Choice: PointNet++

Motivation: PointNet++ addresses PointNet's inability to capture local geometric structure by learning features hierarchically over nested local neighbourhoods, which improves performance on complex shapes.
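
The core building block is the set-abstraction level: sample a subset of centroids with farthest point sampling, group a local neighbourhood around each centroid, and encode each group with a small shared PointNet. A minimal sketch of the sampling and grouping steps (using kNN grouping for brevity where the paper uses a radius ball query; illustrative, not the exact submitted code):

```python
import torch

def farthest_point_sample(xyz, m):
    """Iterative farthest point sampling: pick m well-spread centroid indices from (N, 3)."""
    N = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)
    min_d2 = torch.full((N,), float("inf"))
    farthest = 0                                    # start from an arbitrary point
    for i in range(m):
        idx[i] = farthest
        min_d2 = torch.minimum(min_d2, ((xyz - xyz[farthest]) ** 2).sum(dim=1))
        farthest = int(min_d2.argmax())             # next centroid: farthest from chosen set
    return idx

def group_knn(xyz, centroid_idx, k):
    """Gather the k nearest neighbours of each centroid, in centroid-relative coordinates."""
    centers = xyz[centroid_idx]                                      # (m, 3)
    knn = torch.cdist(centers, xyz).topk(k, largest=False).indices   # (m, k)
    return xyz[knn] - centers.unsqueeze(1)                           # (m, k, 3) local patches
```

Each level then runs a shared per-point MLP plus max-pool inside every patch, so features describe local geometry; stacking levels yields the hierarchy, and for segmentation the features are interpolated back to the full point set.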

Results Comparison

| Model | Classification Acc. (%) | Segmentation Acc. (%) | Training Time |
|---|---|---|---|
| PointNet | 98.0 | 89.91 | 1h 53 min |
| PointNet++ (with locality) | 98.64 | 90.75 | 3h 25 min |

Performance Improvement

Classification: +0.64 percentage points over PointNet

Segmentation: +0.84 percentage points over PointNet

Classification Results Deep Dive

Per Class Accuracy Comparison

| Model | Chairs Acc. (%) | Vases Acc. (%) | Lamps Acc. (%) |
|---|---|---|---|
| PointNet | 99.84 | 91.18 | 96.15 |
| PointNet++ | 99.84 | 93.14 | 97.86 |

Confusion Matrix Comparison

PointNet Confusion Matrix

PointNet

PointNet++ Confusion Matrix

PointNet++

Visualizations

Classification Examples

| Input Point Cloud | Ground Truth | PointNet | PointNet++ (with Locality) |
|---|---|---|---|
| 10000 points | Lamp | Vase | Lamp |
| 10000 points | Vase | Lamp | Vase |

Segmentation Examples

Ground Truth vs Prediction Comparison
Segmentation ground truth 1

Ground Truth

Segmentation prediction 1

Prediction (without locality)

Accuracy: 53.45%

Segmentation additional view

Prediction (with locality)

Accuracy: 92.75%

Analysis

WandB Training Curves

Test accuracy comparison for the segmentation task

  • PointNet++ leverages local neighborhoods to capture fine-grained geometric details, leading to improved accuracy, especially in complex shapes.
  • The model shows performance gains on classes with intricate structures, such as lamps and vases, where local context is crucial.
  • However, the increased complexity results in longer training times and higher computational costs.
  • I initially tried Point Transformer on an H100 GPU, but it ran out of memory, indicating prohibitive compute requirements, so I switched to PointNet++.