Assignment 5: Point Cloud Classification and Segmentation

16-825 Learning for 3D Vision

Question 1: Classification Model (40 Points)

Model Architecture

We implemented a PointNet-based architecture for classifying point clouds into three object categories: chairs, vases, and lamps.

Architecture Details:
  • SharedMLP1: 3 → 64 → 64
  • SharedMLP2: 64 → 64 → 128 → 1024
  • Max pooling across all points to obtain global features
  • FinalMLP: 1024 → 512 → 256 → 3 (classification head)
Test Accuracy: 97.38%
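The layer stack above can be sketched in PyTorch as follows (a minimal sketch: class and method names are illustrative, and training details such as batch norm and dropout are omitted; shared MLPs are realized as 1x1 convolutions over the point axis):

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Sketch of the classification network described above."""
    def __init__(self, num_classes=3):
        super().__init__()
        # SharedMLP1: 3 -> 64 -> 64 (applied independently to every point)
        self.mlp1 = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.ReLU(),
        )
        # SharedMLP2: 64 -> 64 -> 128 -> 1024
        self.mlp2 = nn.Sequential(
            nn.Conv1d(64, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        # FinalMLP: 1024 -> 512 -> 256 -> 3 (classification head)
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):           # points: (B, N, 3)
        x = points.transpose(1, 2)       # (B, 3, N)
        x = self.mlp2(self.mlp1(x))      # (B, 1024, N) per-point features
        x = x.max(dim=2).values          # global max pool -> (B, 1024)
        return self.head(x)              # (B, num_classes) logits
```

The max pool makes the model invariant to the ordering of input points, which is the core PointNet idea.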

Model Performance: The classification model achieves 97.38% accuracy on the test set, demonstrating effective learning of geometric features across different object categories.

Classification Results

Random test point clouds with their predicted classes:

  • Chair: 3 examples shown, all correctly classified
  • Vase: 3 examples shown, all correctly classified
  • Lamp: 3 examples shown, all correctly classified

Failure Case Analysis

Analysis of model failure cases across the three object categories:

Chair Failure

Predicted: Chair | Actually: Vase

The base of this vase has a very rectangular shape which could be mistaken for the seat or base of many types of chairs, causing the model to misclassify it.

Vase Failure

Predicted: Vase | Actually: Chair

This is a folded chair with geometry that differs significantly from typical chairs. The compact, cylindrical form when folded resembles a vase more than a traditional chair structure, leading to the misclassification.

Lamp Failure

Predicted: Lamp | Actually: Chair

This chair has an unusually tall backrest that creates a vertical, post-like structure. This distinctive geometry caused the model to mistake it for a lamp rather than recognize it as a chair.

Question 2: Segmentation Model (40 Points)

Model Architecture

We implemented a PointNet-based architecture for semantic segmentation of chair point clouds into six parts.

Architecture Details:
  • SharedMLP1: 3 → 64 → 64
  • SharedMLP2: 64 → 64 → 128 → 1024
  • Max pooling to obtain global features (1024 dims)
  • Concatenate local (64 dims) and global features (1024 dims) → 1088 dims per point
  • ConcatSharedMLP: 1088 → 512 → 256 → 128
  • FinalMLP: 128 → 128 → 6 (per-point segmentation)
Test Accuracy: 90.11%
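The key difference from the classifier is the concatenation of per-point local features with a broadcast copy of the global feature, which can be sketched in PyTorch as below (an illustrative sketch; names are not from the actual implementation, and batch norm/dropout are omitted):

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Sketch of the segmentation network described above."""
    def __init__(self, num_parts=6):
        super().__init__()
        self.mlp1 = nn.Sequential(          # SharedMLP1: 3 -> 64 -> 64
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.ReLU(),
        )
        self.mlp2 = nn.Sequential(          # SharedMLP2: 64 -> 64 -> 128 -> 1024
            nn.Conv1d(64, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.concat_mlp = nn.Sequential(    # ConcatSharedMLP: 1088 -> 512 -> 256 -> 128
            nn.Conv1d(1088, 512, 1), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.ReLU(),
            nn.Conv1d(256, 128, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(          # FinalMLP: 128 -> 128 -> 6
            nn.Conv1d(128, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_parts, 1),
        )

    def forward(self, points):                        # (B, N, 3)
        x = points.transpose(1, 2)                    # (B, 3, N)
        local = self.mlp1(x)                          # (B, 64, N) local features
        global_feat = self.mlp2(local).max(2).values  # (B, 1024) global feature
        # Broadcast the global feature to every point, then fuse: 64 + 1024 = 1088
        global_rep = global_feat.unsqueeze(2).expand(-1, -1, x.shape[2])
        fused = torch.cat([local, global_rep], dim=1)   # (B, 1088, N)
        logits = self.head(self.concat_mlp(fused))      # (B, 6, N)
        return logits.transpose(1, 2)                   # (B, N, num_parts)
```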

Per-Object Statistics: Mean accuracy: 90.11% | Std: 9.61% | Min: 34.08% | Max: 99.74%

Segmentation Color Scheme:
  • Class 0 (White): [1.0, 1.0, 1.0]
  • Class 1 (Magenta): [1.0, 0.0, 1.0]
  • Class 2 (Cyan): [0.0, 1.0, 1.0]
  • Class 3 (Yellow): [1.0, 1.0, 0.0]
  • Class 4 (Blue): [0.0, 0.0, 1.0]
  • Class 5 (Red): [1.0, 0.0, 0.0]
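This scheme maps directly to an array indexed by class label, which is how the renders can be colored (a small NumPy sketch; the names are illustrative):

```python
import numpy as np

# Color lookup table matching the scheme above: one RGB triple per class.
SEG_COLORS = np.array([
    [1.0, 1.0, 1.0],  # class 0: white
    [1.0, 0.0, 1.0],  # class 1: magenta
    [0.0, 1.0, 1.0],  # class 2: cyan
    [1.0, 1.0, 0.0],  # class 3: yellow
    [0.0, 0.0, 1.0],  # class 4: blue
    [1.0, 0.0, 0.0],  # class 5: red
])

def labels_to_colors(labels):
    """Map per-point integer labels (N,) to RGB colors (N, 3) for rendering."""
    return SEG_COLORS[labels]
```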

Segmentation Results (5 Examples)

Ground truth vs predicted segmentations for chair point clouds:

  • Example 1: ground truth vs. prediction, accuracy 99.74%
  • Example 2: ground truth vs. prediction, accuracy 99.47%
  • Example 3: ground truth vs. prediction, accuracy 99.41%
  • Example 4: ground truth vs. prediction, accuracy 99.36%
  • Example 5: ground truth vs. prediction, accuracy 99.27%

Bad Predictions (2 Examples)

Examples where the segmentation model struggled:

Bad Example 1: ground truth vs. prediction, accuracy 34.08%

Interpretation: This chair has an unusual lounger design that does not conform to the typical leg/back/seat structure. Its protruding side pieces and atypical base make it difficult for the model to segment correctly.

Bad Example 2: ground truth vs. prediction, accuracy 47.24%

Interpretation: This chair has an unusually long, blocky design with what appears to be an ottoman or extension at the bottom, a shape that is uncommon in the dataset. The model segments the armrests and back well, but the extension piece and the region beneath it are poorly segmented, as visible in the prediction.

Interpretation

The segmentation model achieves strong overall performance with a mean accuracy of 90.11%. The model excels at segmenting well-defined chair structures with clear boundaries between parts.

However, performance degrades significantly on chairs with:

  • Complex geometries: Chairs with unconventional designs or merged structural elements confuse the part boundaries
  • Thin structures: Delicate or sparse components may lack sufficient point density for reliable segmentation
  • Ambiguous boundaries: Parts that blend together or share similar local geometric features lead to mislabeling

The wide standard deviation (9.61%) and the presence of outliers (min: 34.08%) indicate that while the model generalizes well to common chair types, it struggles with edge cases that deviate from the training distribution.

Question 3: Robustness Analysis (20 Points)

Experiment 1: Varying Number of Points

Procedure:

We tested model robustness to sparse point cloud inputs by randomly sampling different numbers of points from the original 10,000-point objects. We tested with 100, 500, 1000, 2000, 5000, and 10000 points (1%, 5%, 10%, 20%, 50%, and 100% of the original density). The same trained model was evaluated on these varying point densities without any retraining or fine-tuning.
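The subsampling step can be sketched as follows (NumPy; the function name and fixed seed are illustrative, not taken from the assignment code):

```python
import numpy as np

def subsample(points, num_points, rng=None):
    """Randomly keep `num_points` of the original points, without replacement.
    `points` is an (N, 3) array; returns a (num_points, 3) array."""
    rng = rng if rng is not None else np.random.default_rng(0)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]
```

For segmentation, the same index array would also be applied to the per-point labels so points and labels stay aligned.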

Classification Results:

Number of Points     % of Original    Test Accuracy    Accuracy Drop
10,000 (baseline)    100%             97.38%           -
5,000                50%              97.17%           -0.21%
2,000                20%              96.43%           -0.95%
1,000                10%              96.22%           -1.16%
500                  5%               94.86%           -2.52%
100                  1%               92.44%           -4.94%

Segmentation Results:

Number of Points     % of Original    Test Accuracy    Accuracy Drop
10,000 (baseline)    100%             90.11%           -
5,000                50%              90.09%           -0.02%
2,000                20%              89.98%           -0.13%
1,000                10%              89.17%           -0.94%
500                  5%               87.91%           -2.20%
100                  1%               80.21%           -9.90%
Qualitative results at 100 points (1% of the original density): chair and vase classification examples (the model maintains 92.44% accuracy despite the extreme sparsity), and a segmentation ground-truth/prediction pair (80.21% accuracy with the limited input).

Interpretation:

Both models demonstrate strong robustness to sparse point clouds, with classification being slightly more resilient than segmentation.

Classification: Even with only 100 points (1% of original), the model maintains 92.44% accuracy, a drop of only 4.94 percentage points. This remarkable performance suggests PointNet's global max pooling effectively captures essential shape features from very sparse data.

Segmentation: Shows similar robustness down to 500 points (87.91%, -2.20% drop), but degrades more at 100 points (80.21%, -9.90% drop). This makes sense since segmentation requires finer-grained local feature discrimination, which needs sufficient point density to accurately label individual points.

Key insight: Both tasks can operate effectively with 500-1000 points (5-10% of original density) with minimal accuracy loss, making PointNet practical for real-world scenarios with limited sensor resolution or computational constraints.

Experiment 2: Rotation Invariance

Procedure:

We tested the model's robustness to 3D rotations by rotating input point clouds around the z-axis (vertical axis) by various angles: 0°, 45°, 90°, and 180°. The rotation was applied to the entire test set using 3D rotation matrices, and the same trained model (without rotation augmentation during training) was evaluated on the rotated inputs.
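The rotation can be sketched as below (NumPy; the function name is illustrative):

```python
import numpy as np

def rotate_z(points, angle_deg):
    """Rotate an (N, 3) point cloud about the vertical (z) axis by angle_deg degrees."""
    t = np.deg2rad(angle_deg)
    # Standard z-axis rotation matrix: x and y are mixed, z is unchanged.
    R = np.array([
        [np.cos(t), -np.sin(t), 0.0],
        [np.sin(t),  np.cos(t), 0.0],
        [0.0,        0.0,       1.0],
    ])
    return points @ R.T
```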

Classification Results:

Rotation Angle    Test Accuracy    Accuracy Drop
0° (baseline)     97.38%           -
45°               32.53%           -64.85%
90°               53.20%           -44.18%
180°              62.12%           -35.26%

Segmentation Results:

Rotation Angle    Test Accuracy    Accuracy Drop
0° (baseline)     90.11%           -
45°               56.32%           -33.79%
90°               35.95%           -54.16%
180°              32.06%           -58.05%
Qualitative results at 45° rotation: chair and vase classification examples (accuracy drops to 32.53%), and a segmentation ground-truth/prediction pair showing severely degraded prediction quality.

Interpretation:

Both models show severe sensitivity to rotations, confirming that PointNet is NOT rotation-invariant.

Classification: Performance catastrophically degrades, with 45° rotation being worst (32.53%, -64.85% drop). Interestingly, accuracy partially recovers at 90° (53.20%) and 180° (62.12%), suggesting the model may exploit certain symmetries in the training data. Chairs and lamps may have some rotational symmetry that helps at orthogonal angles, but arbitrary rotations still cause severe failures.

Segmentation: Shows even worse degradation, dropping from 90.11% to 32.06% at 180° rotation (a 58.05-point drop); the 45° rotation alone costs 33.79 points. Unlike classification, segmentation degrades monotonically with rotation angle, as per-point labeling requires precise spatial feature alignment.

This behavior is expected for vanilla PointNet, which processes raw (x,y,z) coordinates directly. The learned features are tied to the canonical orientation seen during training. Without rotation augmentation or rotation-invariant feature extraction (like using relative distances or normals), the model cannot generalize to rotated inputs.

Key insight: For real-world deployment where object orientations are arbitrary, this model would require either: (1) rotation augmentation during training, (2) canonical pose normalization at inference, or (3) a rotation-invariant feature representation (e.g., one built on pairwise distances rather than raw coordinates). Note that PointNet++ adds hierarchical local feature learning but is not rotation-invariant on its own.
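Option (1) could be implemented as a simple training-time transform, sketched below (NumPy; an illustrative sketch, not the code used in this assignment):

```python
import numpy as np

def augment_with_random_rotation(points, rng=None):
    """Apply a random z-axis rotation to an (N, 3) training sample so the
    model sees all orientations during training (option 1 above)."""
    rng = rng if rng is not None else np.random.default_rng()
    t = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([
        [np.cos(t), -np.sin(t), 0.0],
        [np.sin(t),  np.cos(t), 0.0],
        [0.0,        0.0,       1.0],
    ])
    return points @ R.T
```

Applied to every sample each epoch, this removes the model's dependence on a single canonical orientation at the cost of a harder learning problem.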