Assignment 5: PointNet Classification and Segmentation

AndrewID: kpullala

Course: 16-825 Learning for 3D Vision

Q1: Classification Model (40 points)

Overview
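
Implemented a PointNet-based architecture for classifying point clouds into 3 classes: chairs, vases, and lamps.

For reference, a minimal sketch of a vanilla PointNet classifier of this kind (layer widths follow the original paper, input/feature transform networks omitted; illustrative, not necessarily the exact submitted implementation):

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Vanilla PointNet: shared per-point MLP -> symmetric max-pool -> MLP head."""
    def __init__(self, num_classes=3):
        super().__init__()
        # Shared MLP applied independently to every point, implemented as 1x1 convs
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):       # points: (B, N, 3)
        x = points.transpose(1, 2)   # (B, 3, N) for Conv1d
        x = self.point_mlp(x)        # per-point features (B, 1024, N)
        x = x.max(dim=2).values      # order-invariant max-pool over points -> (B, 1024)
        return self.head(x)          # class logits (B, num_classes)
```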

Results

Test Accuracy: 98.0%

Per-Class Accuracy:

  • Chairs: 99.84% (616/617)
  • Vases: 91.18% (93/102)
  • Lamps: 96.15% (225/234)

Training Time: 1h 53 min

Visualizations

Sample Predictions

Sample prediction 1

Prediction: Chair
Ground Truth: Chair

Sample prediction 2

Prediction: Lamp
Ground Truth: Lamp

Sample prediction 3

Prediction: Vase
Ground Truth: Vase

Failure Analysis

Chair Failures
Chair failure

Predicted: Lamp
Ground Truth: Chair

Interpretation: This chair is likely predicted as a lamp because of its thin, elongated backrest and its two legs, whereas most chairs in the training set have four.

Vase Failures
Vase failure

Predicted: Lamp
Ground Truth: Vase

Interpretation: The point cloud resembles a chandelier-style lamp, especially in its lower portion.

Lamp Failures
Lamp failure

Predicted: Vase
Ground Truth: Lamp

Interpretation: Since this model was trained without any notion of locality, the lamp's body (ignoring the base legs) resembles a vase in overall shape.

Metrics & Detailed Performance Analysis

Precision, Recall, F1-Score:

  • Chairs: Precision: 0.9984, Recall: 0.9984, F1-Score: 0.9984
  • Vases: Precision: 0.9118, Recall: 0.9118, F1-Score: 0.9118
  • Lamps: Precision: 0.9615, Recall: 0.9615, F1-Score: 0.9615
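
Since precision equals recall for every class here, each F1-score (their harmonic mean) necessarily equals the same value. For reference, a minimal sketch of how these per-class metrics fall out of the confusion matrix (assuming rows are ground truth and columns are predictions):

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples of ground-truth class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)  # column sums = totals predicted per class
    recall = tp / cm.sum(axis=1)     # row sums = ground-truth totals per class
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```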

Confusion Matrix:

Confusion Matrix

Analysis:

  • The chair class has the highest accuracy, indicating that the model is very effective at identifying chairs. It was trained on a large number of chair samples (4489), while the vase and lamp classes had only 741 and 1554 samples respectively. This class imbalance is the most likely reason chair accuracy is so much higher than the others.
  • The confusion matrix shows that the chair class is learned very well, while vases and lamps are confused with each other roughly equally, which is expected given the structural similarity between the two.

Q2: Segmentation Model (40 points)

Overview

Implemented a PointNet-based architecture for semantic segmentation of chair point clouds into 6 part classes.
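
The key difference from the classification model is that the pooled global feature is concatenated back onto every per-point local feature before a shared per-point prediction head. A minimal sketch of that structure (dimensions follow the original paper; illustrative, not necessarily the exact submitted implementation):

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """PointNet segmentation: fuse per-point local features with the global feature."""
    def __init__(self, num_seg_classes=6):
        super().__init__()
        self.local_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(
            nn.Conv1d(64 + 1024, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, num_seg_classes, 1),
        )

    def forward(self, points):                        # (B, N, 3)
        x = points.transpose(1, 2)                    # (B, 3, N)
        local = self.local_mlp(x)                     # (B, 64, N)
        glob = self.global_mlp(local).max(2).values   # (B, 1024) global descriptor
        glob = glob.unsqueeze(2).expand(-1, -1, local.shape[2])
        fused = torch.cat([local, glob], dim=1)       # (B, 1088, N)
        return self.seg_head(fused).transpose(1, 2)   # per-point logits (B, N, 6)
```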

Results

Test Accuracy: 89.91%

Segmentation Visualizations

Below are segmentation results for 6 objects (including 3 failure cases):

Object 1 - Good Prediction

Object 1 ground truth

Ground Truth

Object 1 prediction

Prediction
Accuracy: 90.46%

Object 2 - Good Prediction

Object 2 ground truth

Ground Truth

Object 2 prediction

Prediction
Accuracy: 93.80%

Object 3 - Good Prediction

Object 3 ground truth

Ground Truth

Object 3 prediction

Prediction
Accuracy: 97.28%

Object 4 - Failure Case (< 60%)

Object 4 ground truth

Ground Truth

Object 4 prediction

Prediction
Accuracy: 50.83%

Object 5 - Failure Case (< 60%)

Object 5 ground truth

Ground Truth

Object 5 prediction

Prediction
Accuracy: 58%

Object 6 - Failure Case (< 60%)

Object 6 ground truth

Ground Truth

Object 6 prediction

Prediction
Accuracy: 51.26%

Analysis: The model struggles around smooth transitions between parts and around fine details, which lowers accuracy on these objects. Part boundaries also vary across the training data from chair to chair, since there is inherent ambiguity in, for example, where an armrest ends and the back begins.

Distribution of segmentation accuracies

Segmentation Accuracy Distribution

Analysis: The distribution indicates that while a significant number of objects achieve high segmentation accuracy, there is a notable tail of objects with lower accuracy. This suggests that while the model performs well on many chairs, it struggles with certain geometries or configurations, leading to a wider spread in performance.
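
For reference, a sketch of how such a per-object accuracy histogram can be produced (`pred_labels` and `gt_labels` are hypothetical `(num_objects, N)` tensors of per-point part labels):

```python
import matplotlib.pyplot as plt

def per_object_accuracy(pred_labels, gt_labels):
    """Fraction of correctly labelled points for each object."""
    return (pred_labels == gt_labels).float().mean(dim=1)

accs = per_object_accuracy(pred_labels, gt_labels)  # hypothetical tensors
plt.hist(accs.numpy(), bins=20)
plt.xlabel("Per-object segmentation accuracy")
plt.ylabel("Number of objects")
plt.savefig("seg_accuracy_distribution.png")
```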

Q3: Robustness Analysis (20 points)

Experiment 1: Rotation Robustness

Procedure

Rotated the input point clouds by varying angles (0°, 45°, 78.75°, 112.5°, 146.25°) around the Y-axis and evaluated classification and segmentation accuracy at each angle.
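
A minimal sketch of the Y-axis rotation applied to each test cloud (the `evaluate` call in the usage comment is hypothetical):

```python
import math
import torch

def rotate_y(points, deg):
    """Rotate a point cloud tensor of shape (N, 3) about the Y-axis by `deg` degrees."""
    theta = math.radians(deg)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]], dtype=points.dtype)
    return points @ R.T  # row-vector convention: p' = R p

# Usage at the angles tested above:
# for deg in [0, 45, 78.75, 112.5, 146.25]:
#     acc = evaluate(model, rotate_y(test_points, deg), test_labels)
```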

Classification Results

| Rotation Angle | Accuracy (%) | Change vs Baseline (pp) |
|---|---|---|
| 0° (Baseline) | 98.01 | 0.00 |
| 45° | 35.05 | -62.96 |
| 78.75° | 24.66 | -73.35 |
| 112.5° | 36.73 | -61.28 |
| 146.25° | 68.84 | -29.17 |

Sample Visualizations for classification robustness

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Vase | Lamp | Lamp | Lamp | Vase |
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Chair | Lamp | Vase | Lamp | Chair |
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Lamp | Lamp | Lamp | Vase | Vase |

Data points where rotation did not affect classification result

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Prediction | Lamp | Lamp | Lamp | Lamp | Lamp |
| Point Cloud | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

Segmentation Results

| Rotation Angle | Accuracy (%) |
|---|---|
| 0° (Baseline) | 89.91 |
| 45° | 61.38 |
| 78.75° | 29.77 |
| 112.5° | 23.16 |
| 146.25° | 27.42 |
Sample Visualizations for segmentation robustness

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.781 | 0.619 | 0.405 | 0.310 | 0.184 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.928 | 0.616 | 0.491 | 0.221 | 0.074 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.909 | 0.715 | 0.326 | 0.235 | 0.124 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

| Rotation Angle | 0° (Baseline) | 45° | 78.75° | 112.5° | 146.25° |
|---|---|---|---|---|---|
| Ground Truth | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |
| Accuracy | 0.933 | 0.570 | 0.227 | 0.160 | 0.231 |
| Prediction | 0 degree rotation | 45 degree rotation | 78.75 degree rotation | 112.5 degree rotation | 146.25 degree rotation |

Analysis

  • The PointNet model is severely vulnerable to rotations and is decidedly not rotation-invariant: classification accuracy plummets from 98.01% at baseline to just 24.66% at 78.75°, a drop of 73.35 percentage points.
  • Segmentation degrades just as dramatically, from 89.91% at baseline to 23.16% at 112.5°, indicating that the model relies heavily on the canonical object orientations seen during training rather than learning orientation-agnostic features.
  • Interestingly, performance partially recovers at 146.25° (68.84% for classification, 27.42% for segmentation), suggesting some symmetry in how the model perceives rotated objects, though it remains far below baseline.
  • This is a fundamental architectural limitation: PointNet has no mechanism for handling viewpoint variation, so it is unreliable when objects appear in arbitrary orientations unless rotation augmentation or pose normalization is added (a minimal augmentation sketch follows this list).
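
Reusing the `rotate_y` helper sketched earlier, a training-time rotation augmentation (not used in this submission) could be as simple as:

```python
import random

def augment_random_y_rotation(points):
    """Rotate each training cloud by a random angle about Y before feeding the model."""
    return rotate_y(points, random.uniform(0.0, 360.0))  # rotate_y defined above
```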

Experiment 2: Point Density Robustness

Procedure

Evaluated the model with different numbers of input points (10000 baseline; 4000, 1000, and 500) to test sensitivity to point cloud density.
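
A minimal sketch of this kind of subsampling (random, without replacement; for segmentation, the per-point labels must be gathered with the same indices):

```python
import torch

def subsample(points, n):
    """Randomly keep n of the points in an (N, 3) cloud, without replacement."""
    idx = torch.randperm(points.shape[0])[:n]
    return points[idx], idx  # idx lets per-point labels be subsampled identically
```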

Classification Results

| Number of Points | Accuracy (%) |
|---|---|
| 10000 (Baseline) | 98.01 |
| 4000 | 98.22 |
| 1000 | 97.17 |
| 500 | 96.54 |

Sample Visualizations for classification robustness

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Chair | Chair | Lamp | Lamp |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Chair | Chair | Chair | Chair |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Vase | Vase | Vase | Vase |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Vase | Vase | Vase | Vase |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Point Cloud | 10000 points | 4000 points | 1000 points | 500 points |
| Prediction | Chair | Chair | Chair | Chair |

Segmentation Results

| Number of Points | Accuracy (%) |
|---|---|
| 10000 (Baseline) | 89.91 |
| 4000 | 89.88 |
| 1000 | 89.55 |
| 500 | 88.39 |

Sample Visualizations

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 78.1 | 80.1 | 80.1 | 81.6 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 92.8 | 93.1 | 93.7 | 95.2 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 90.9 | 91.5 | 90.7 | 8.4 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 93.3 | 93.1 | 93.9 | 94.6 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

| Point Density | 10000 (Baseline) | 4000 | 1000 | 500 |
|---|---|---|---|---|
| Ground Truth | 10000 points | 4000 points | 1000 points | 500 points |
| Accuracy (%) | 83.9 | 81.9 | 80.8 | 77.6 |
| Prediction | 10000 points | 4000 points | 1000 points | 500 points |

Analysis

  • The PointNet model demonstrates strong robustness to reduced point cloud density: classification accuracy declines by only 1.47 percentage points when dropping from 10,000 to 500 points (98.01% → 96.54%), and segmentation accuracy by just 1.52 percentage points under the same reduction (89.91% → 88.39%).
  • Interestingly, reducing points from 10,000 to 4,000 slightly improves classification performance, suggesting the baseline resolution contains some redundancy that can be effectively compressed. This indicates the model captures essential geometric information efficiently and could be deployed on computationally constrained devices without significant accuracy loss.

Q4: Bonus - Locality (20 points)

Overview

Implemented an advanced architecture incorporating local geometric structure: PointNet++

Model Architecture

Architecture Choice: PointNet++

Motivation: PointNet++ addresses PointNet's inability to capture local geometric structure by learning features hierarchically over nested local neighbourhoods, which improves performance on complex shapes.
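
The core building block is the set-abstraction level: sample a subset of centroids with farthest point sampling, group a local neighbourhood around each centroid, and encode each group with a small shared PointNet. A minimal sketch of the sampling and grouping steps (using kNN grouping for brevity where the paper uses a radius ball query; illustrative, not the exact submitted code):

```python
import torch

def farthest_point_sample(xyz, m):
    """Iterative farthest point sampling: pick m well-spread centroid indices from (N, 3)."""
    N = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)
    min_d2 = torch.full((N,), float("inf"))
    farthest = 0                                    # start from an arbitrary point
    for i in range(m):
        idx[i] = farthest
        min_d2 = torch.minimum(min_d2, ((xyz - xyz[farthest]) ** 2).sum(dim=1))
        farthest = int(min_d2.argmax())             # next centroid: farthest from chosen set
    return idx

def group_knn(xyz, centroid_idx, k):
    """Gather the k nearest neighbours of each centroid, in centroid-relative coordinates."""
    centers = xyz[centroid_idx]                                      # (m, 3)
    knn = torch.cdist(centers, xyz).topk(k, largest=False).indices   # (m, k)
    return xyz[knn] - centers.unsqueeze(1)                           # (m, k, 3) local patches
```

Each level then runs a shared per-point MLP plus max-pool inside every patch, so features describe local geometry; stacking levels yields the hierarchy, and for segmentation the features are interpolated back to the full point set.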

Results Comparison

| Model | Classification Acc. (%) | Segmentation Acc. (%) | Training Time |
|---|---|---|---|
| PointNet | 98.0 | 89.91 | 1h 53 min |
| PointNet++ (with locality) | 98.64 | 90.75 | 3h 25 min |

Performance Improvement

Classification: +0.64 percentage points over PointNet

Segmentation: +0.84 percentage points over PointNet

Classification Results Deep Dive

Per Class Accuracy Comparison

| Model | Chairs Acc. (%) | Vases Acc. (%) | Lamps Acc. (%) |
|---|---|---|---|
| PointNet | 99.84 | 91.18 | 96.15 |
| PointNet++ | 99.84 | 93.14 | 97.86 |

Confusion Matrix Comparison

PointNet Confusion Matrix

PointNet

PointNet++ Confusion Matrix

PointNet++

Visualizations

Classification Examples

| Input Point Cloud | Ground Truth | PointNet | PointNet++ (with Locality) |
|---|---|---|---|
| 10000 points | Lamp | Vase | Lamp |
| 10000 points | Vase | Lamp | Vase |

Segmentation Examples

Ground Truth vs Prediction Comparison
Segmentation ground truth 1

Ground Truth

Segmentation prediction 1

Prediction (without locality)

Accuracy: 53.45%

Segmentation additional view

Prediction (with locality)

Accuracy: 92.75%

Analysis

WandB Training Curves

Test accuracy comparison for the segmentation task

  • PointNet++ leverages local neighborhoods to capture fine-grained geometric details, leading to improved accuracy, especially in complex shapes.
  • The model shows performance gains on classes with intricate structures, such as lamps and vases, where local context is crucial.
  • However, the increased complexity results in longer training times and higher computational costs.
  • I initially tried Point Transformer on an H100 GPU, but it ran out of memory, indicating prohibitive compute requirements, so I switched to PointNet++.