16-825 Learning for 3D Vision
We implemented a PointNet-based architecture for classifying point clouds into three object categories: chairs, vases, and lamps.
Model Performance: The classification model achieves 97.38% accuracy on the test set, demonstrating effective learning of geometric features across different object categories.
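A minimal sketch of such a classifier in PyTorch (layer widths follow the standard PointNet design, and the input/feature transform nets are omitted; the actual course implementation may differ):

```python
# Minimal PointNet classifier sketch (hypothetical layer sizes).
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Shared per-point MLP, applied independently to every point.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature.
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> (B, 3, N) for Conv1d.
        x = self.point_mlp(points.transpose(1, 2))
        # Symmetric max pooling over points gives permutation invariance.
        x = x.max(dim=2).values          # (B, 1024)
        return self.head(x)              # (B, num_classes) logits
```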
Random test point clouds with their predicted classes:
Chair - Correctly Classified
Chair - Correctly Classified
Chair - Correctly Classified
Vase - Correctly Classified
Vase - Correctly Classified
Vase - Correctly Classified
Lamp - Correctly Classified
Lamp - Correctly Classified
Lamp - Correctly Classified
Analysis of model failure cases across the three object categories:
Predicted: Chair | Actually: Vase
The base of this vase is distinctly rectangular and resembles the seat or base of many chair types, which likely led the model to misclassify it.
Predicted: Vase | Actually: Chair
This is a folded chair whose geometry differs significantly from typical chairs. When folded, its compact, cylindrical form resembles a vase more than a traditional chair structure, leading to the misclassification.
Predicted: Lamp | Actually: Chair
This chair has an unusually tall backrest that creates a vertical, post-like structure. This distinctive geometry caused the model to mistake it for a lamp rather than recognize it as a chair.
We implemented a PointNet-based architecture for semantic segmentation of chair point clouds into six parts.
Per-Object Statistics: Mean accuracy: 90.11% | Std: 9.61% | Min: 34.08% | Max: 99.74%
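A sketch of the segmentation variant, assuming the standard PointNet design in which each point's local feature is concatenated with the pooled global feature before per-point classification (widths are illustrative):

```python
# PointNet segmentation sketch (hypothetical widths): per-point features
# are fused with the global feature so each point sees both local and
# global context.
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    def __init__(self, num_parts: int = 6):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.global_ = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(
            nn.Conv1d(64 + 1024, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, num_parts, 1),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        x = points.transpose(1, 2)                 # (B, 3, N)
        local = self.local(x)                      # (B, 64, N)
        glob = self.global_(local).max(2).values   # (B, 1024)
        glob = glob.unsqueeze(2).expand(-1, -1, local.size(2))
        fused = torch.cat([local, glob], dim=1)    # (B, 1088, N)
        return self.seg_head(fused).transpose(1, 2)  # (B, N, num_parts)
```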
Ground truth vs predicted segmentations for chair point clouds:
Accuracy: 99.74% (GT | Pred)
Accuracy: 99.47% (GT | Pred)
Accuracy: 99.41% (GT | Pred)
Accuracy: 99.36% (GT | Pred)
Accuracy: 99.27% (GT | Pred)
Examples where the segmentation model struggled:
Accuracy: 34.08% (GT | Pred)
Interpretation: This chair has an unusual lounger design that doesn't conform to the typical chair structure of legs, a back, and a seat. Its protruding side pieces and atypical base make it difficult for the model to interpret and segment correctly.
Accuracy: 47.24% (GT | Pred)
Interpretation: This chair has an oddly blocky, elongated design with what appears to be an ottoman or extension at the bottom, which is uncommon in the dataset. That extension and the surface beneath it are poorly segmented, as visible in the prediction. While the armrests and back of the chair are segmented well, it is the long, unusually shaped extension piece that the model mislabels.
The segmentation model achieves strong overall performance with a mean accuracy of 90.11%. The model excels at segmenting well-defined chair structures with clear boundaries between parts.
However, performance degrades significantly on chairs with unconventional geometry, such as lounger-style designs, atypical bases, and uncommon extensions like ottomans.
The wide standard deviation (9.61%) and the presence of outliers (min: 34.08%) indicate that while the model generalizes well to common chair types, it struggles with edge cases that deviate from the training distribution.
We tested model robustness to sparse point cloud inputs by randomly sampling different numbers of points from the original 10,000-point objects: 100, 500, 1,000, 2,000, 5,000, and 10,000 points (1%, 5%, 10%, 20%, 50%, and 100% of the original density). The same trained model was evaluated at each density without any retraining or fine-tuning.
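A sketch of how this evaluation might look (the `eval_at_density` helper and the `cls_model`/`test_set` names are ours, not the actual course code):

```python
# Sparsity experiment sketch: randomly subsample each test cloud and
# evaluate the frozen classifier at the reduced density.
import torch

@torch.no_grad()
def eval_at_density(model, test_set, num_points, device="cuda"):
    """Evaluate a frozen classifier on randomly subsampled clouds."""
    model.eval()
    correct = total = 0
    for points, label in test_set:                      # points: (10000, 3)
        idx = torch.randperm(points.size(0))[:num_points]   # random subset
        logits = model(points[idx].unsqueeze(0).to(device))
        correct += int(logits.argmax(dim=1).item() == label)
        total += 1
    return correct / total

# for n in [100, 500, 1000, 2000, 5000, 10000]:
#     print(n, eval_at_density(cls_model, test_set, n))
```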
Classification accuracy at each point density:
| Number of Points | % of Original | Test Accuracy | Accuracy Drop |
|---|---|---|---|
| 10,000 (baseline) | 100% | 97.38% | - |
| 5,000 | 50% | 97.17% | -0.21% |
| 2,000 | 20% | 96.43% | -0.95% |
| 1,000 | 10% | 96.22% | -1.16% |
| 500 | 5% | 94.86% | -2.52% |
| 100 | 1% | 92.44% | -4.94% |
Segmentation accuracy at each point density:
| Number of Points | % of Original | Test Accuracy | Accuracy Drop |
|---|---|---|---|
| 10,000 (baseline) | 100% | 90.11% | - |
| 5,000 | 50% | 90.09% | -0.02% |
| 2,000 | 20% | 89.98% | -0.13% |
| 1,000 | 10% | 89.17% | -0.94% |
| 500 | 5% | 87.91% | -2.20% |
| 100 | 1% | 80.21% | -9.90% |
Classification: Chair (100 points)
Demonstrates classification with only 1% of original point density
Classification: Vase (100 points)
Model maintains 92.44% accuracy despite extreme sparsity
Segmentation GT (100 points)
Ground truth segmentation with sparse input
Segmentation Pred (100 points)
Prediction maintains 80.21% accuracy with limited points
Both models demonstrate strong robustness to sparse point clouds, with classification being slightly more resilient than segmentation.
Classification: Even with only 100 points (1% of the original), the model maintains 92.44% accuracy, a drop of only 4.94 points. This remarkable performance suggests PointNet's global max pooling effectively captures essential shape features from very sparse data, as illustrated below.
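One illustrative way to probe this intuition (reusing the hypothetical `PointNetCls` sketch above): compare the max-pooled global feature of the full cloud with that of a 100-point subsample; because max pooling is driven by a small set of critical points, the two typically remain close.

```python
# Illustrative check (hypothetical helper; assumes the PointNetCls sketch
# above): the max-pooled global feature changes little under subsampling.
import torch

@torch.no_grad()
def global_feature(model, points):                     # points: (N, 3)
    x = model.point_mlp(points.unsqueeze(0).transpose(1, 2))
    return x.max(dim=2).values                         # (1, 1024)

# full   = global_feature(cls_model, cloud)            # all 10,000 points
# sparse = global_feature(cls_model, cloud[torch.randperm(cloud.size(0))[:100]])
# print(torch.cosine_similarity(full, sparse).item())  # typically near 1
```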
Segmentation: Shows similar robustness down to 500 points (87.91%, -2.20% drop), but degrades more at 100 points (80.21%, -9.90% drop). This makes sense since segmentation requires finer-grained local feature discrimination, which needs sufficient point density to accurately label individual points.
Key insight: Both tasks can operate effectively with 500-1000 points (5-10% of original density) with minimal accuracy loss, making PointNet practical for real-world scenarios with limited sensor resolution or computational constraints.
We tested the model's robustness to 3D rotations by rotating input point clouds around the z-axis (vertical axis) by various angles: 0°, 45°, 90°, and 180°. The rotation was applied to the entire test set using 3D rotation matrices, and the same trained model (without rotation augmentation during training) was evaluated on the rotated inputs.
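A sketch of the rotation setup; the `rotate_z` helper below is ours, built from the standard z-axis rotation matrix:

```python
# Rotation experiment sketch: a fixed z-axis (yaw) rotation is applied
# to every test cloud before evaluating the unmodified model.
import math
import torch

def rotate_z(points: torch.Tensor, degrees: float) -> torch.Tensor:
    """Rotate (..., N, 3) point clouds about the vertical z-axis."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]],
                     dtype=points.dtype, device=points.device)
    return points @ R.T        # row-vector convention: p' = p R^T

# rotated = rotate_z(test_points, 45.0)   # then evaluate as at baseline
```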
Classification accuracy under rotation:
| Rotation Angle | Test Accuracy | Accuracy Drop |
|---|---|---|
| 0° (baseline) | 97.38% | - |
| 45° | 32.53% | -64.85% |
| 90° | 53.20% | -44.18% |
| 180° | 62.12% | -35.26% |
Segmentation accuracy under rotation:
| Rotation Angle | Test Accuracy | Accuracy Drop |
|---|---|---|
| 0° (baseline) | 90.11% | - |
| 45° | 56.32% | -33.79% |
| 90° | 35.95% | -54.16% |
| 180° | 32.06% | -58.05% |
Classification: Chair (45° rotation)
Rotated input causes severe performance degradation
Classification: Vase (45° rotation)
Accuracy drops to 32.53% at 45° rotation
Segmentation GT (45° rotation)
Ground truth for rotated input
Segmentation Pred (45° rotation)
Prediction quality severely degraded by rotation
Both models show severe sensitivity to rotations, confirming that PointNet is NOT rotation-invariant.
Classification: Performance catastrophically degrades, with 45° rotation being worst (32.53%, -64.85% drop). Interestingly, accuracy partially recovers at 90° (53.20%) and 180° (62.12%), suggesting the model may exploit certain symmetries in the training data. Chairs and lamps may have some rotational symmetry that helps at orthogonal angles, but arbitrary rotations still cause severe failures.
Segmentation: Shows even worse degradation, dropping from 90.11% to 32.06% at 180° rotation (a 58-point drop); the 45° rotation alone costs roughly 34 points. Unlike classification, segmentation degrades monotonically with rotation angle, as per-point labeling requires precise spatial feature alignment.
This behavior is expected for vanilla PointNet, which processes raw (x, y, z) coordinates directly: the learned features are tied to the canonical orientation seen during training. Without rotation augmentation or rotation-invariant feature extraction (e.g., pairwise distances, which are unchanged by rotation), the model cannot generalize to rotated inputs.
Key insight: For real-world deployment where object orientations are arbitrary, this model would require either (1) rotation augmentation during training (sketched below), (2) canonical pose normalization at inference, or (3) a rotation-invariant or rotation-equivariant architecture such as Vector Neurons; note that PointNet++, while more expressive locally, is not rotation-invariant.
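As a concrete illustration of remedy (1), a minimal augmentation sketch reusing the hypothetical `rotate_z` helper from the rotation experiment above:

```python
# Minimal sketch of remedy (1): random z-rotation augmentation during
# training (assumes the rotate_z helper defined earlier).
import random

def augment_batch(points):                  # points: (B, N, 3)
    angle = random.uniform(0.0, 360.0)      # random yaw for this batch
    return rotate_z(points, angle)

# In the training loop:
#     logits = model(augment_batch(batch_points))
#     loss = criterion(logits, batch_labels)
```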