16-825 Learning for 3D Vision • Fall 2025
Name: Haejoon Lee (andrewid: haejoonl)

Assignment 5: PointNet for Classification and Segmentation

Table of Contents

Q1. Classification Model (40 points)

Implemented a PointNet-based classification model to classify point clouds into three categories: chairs, vases, and lamps.

Model Architecture

The classification model follows the PointNet architecture: a shared per-point MLP lifts each input point to a high-dimensional feature, a permutation-invariant global max-pooling layer aggregates the per-point features into a single global descriptor, and a fully connected head maps that descriptor to the three class logits.
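As a concrete sketch of this forward pass, the snippet below runs a PointNet-style classifier in NumPy with random, untrained weights. The layer widths (64, 1024, 256) are illustrative assumptions, not the exact dimensions of the submitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_layer(x, w, b):
    """Shared linear layer + ReLU, applied identically to every point."""
    return np.maximum(x @ w + b, 0.0)

def pointnet_classify(points, num_classes=3):
    """Shared per-point MLP -> global max pool -> classifier head.

    points: (N, 3) array. Widths are illustrative only.
    """
    # Shared per-point MLP lifts each point to a 1024-dim feature.
    h = mlp_layer(points, rng.normal(size=(3, 64)), np.zeros(64))
    h = mlp_layer(h, rng.normal(size=(64, 1024)), np.zeros(1024))
    # Symmetric max pooling over the point dimension -> global feature.
    global_feat = h.max(axis=0)                                 # (1024,)
    # Classifier head maps the global feature to class logits.
    g = mlp_layer(global_feat, rng.normal(size=(1024, 256)), np.zeros(256))
    return g @ rng.normal(size=(256, num_classes))              # (num_classes,)

logits = pointnet_classify(rng.normal(size=(500, 3)))
print(logits.shape)  # (3,)
```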

Test Accuracy

Final Test Accuracy: 98.22% (0.9822)

The model achieved excellent performance on the test set, correctly classifying 98.22% of the point cloud objects.

Q2. Segmentation Model (40 points)

Implemented a PointNet-based segmentation model to perform per-point semantic segmentation on chair point clouds with 6 semantic classes.

Model Architecture

The segmentation model uses an encoder-decoder architecture: a shared per-point MLP encodes a feature for each point, global max pooling produces a shape-level feature that is concatenated back onto every per-point feature, and a decoder MLP predicts one of the 6 semantic classes for each point.
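A minimal NumPy sketch of this per-point pipeline is below, using random untrained weights; the feature widths (64, 128) are illustrative assumptions rather than the submitted model's exact dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def pointnet_segment(points, num_classes=6):
    """Per-point features + broadcast global feature -> per-point logits.

    points: (N, 3). Widths are illustrative only.
    """
    n = points.shape[0]
    # Shared per-point MLP -> a local feature for every point.
    local = relu(points @ rng.normal(size=(3, 64)))           # (N, 64)
    local = relu(local @ rng.normal(size=(64, 128)))          # (N, 128)
    # Global context: max pool, then tile back onto every point.
    global_feat = np.tile(local.max(axis=0), (n, 1))          # (N, 128)
    fused = np.concatenate([local, global_feat], axis=1)      # (N, 256)
    # Decoder head predicts one of num_classes labels per point.
    h = relu(fused @ rng.normal(size=(256, 128)))
    return h @ rng.normal(size=(128, num_classes))            # (N, 6)

pred = pointnet_segment(rng.normal(size=(1000, 3))).argmax(axis=1)
print(pred.shape)  # (1000,)
```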

Test Accuracy

Final Test Accuracy: 90.45% (0.9045)

The model correctly segments 90.45% of all points across all test objects.

Segmentation Results

Visualized segmentation results for six objects (Objects 0-5), including a failure case (Object 4):

Object 0
GT Object 0

Ground Truth

Pred Object 0

Prediction

Object 1
GT Object 1

Ground Truth

Pred Object 1

Prediction

Object 2
GT Object 2

Ground Truth

Pred Object 2

Prediction

Object 3
GT Object 3

Ground Truth

Pred Object 3

Prediction

Object 4 (Failure case on armrests)
GT Object 4

Ground Truth

Pred Object 4

Prediction

Object 5
GT Object 5

Ground Truth

Pred Object 5

Prediction

Interpretation

The PointNet segmentation model achieves good overall performance (90.45% accuracy) by combining local point features with global context. Key observations:

  • Good Cases (Objects 0, 1, 2, 3, 5): The model correctly segments most parts of the chairs, with clear boundaries between different semantic regions (seat, backrest, legs, etc.). The global feature provides useful context for disambiguating similar local geometries.
  • Failure Case - Object 4: This chair exhibits a notable segmentation error where the model incorrectly predicts parts of the flat chair seat as chair arms (armrests). This failure can be attributed to two main factors:
    • Dataset Bias: The training data likely contains a majority of armchairs (as seen in Objects 0-3), causing the model to develop a bias toward predicting arm structures on the sides of chairs.
    • Poor Local Information Extraction: PointNet's reliance on global max pooling limits its ability to capture fine-grained local geometric differences between chair seats and armrests. The slightly distinguished local geometry of these regions is not adequately captured by the model's point-wise features, leading to confusion between semantically different but geometrically similar regions.
  • Limitations: PointNet's lack of explicit local neighborhood modeling means it may miss fine-grained details at region boundaries. The max pooling operation aggregates information globally but may not preserve important local spatial relationships needed for precise segmentation, particularly when distinguishing between parts with subtle geometric differences.

Q3. Robustness Analysis (20 points)

Conducted two experiments to analyze the robustness of the learned models: rotation robustness and sensitivity to the number of input points.

Experiment 1: Rotation Robustness

Tested model performance when input point clouds are randomly rotated around all three axes (X, Y, Z) at different angles: 0°, 30°, 60°, 90°, and 180°.
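The rotated inputs can be generated with standard 3x3 rotation matrices. The NumPy sketch below applies the same angle about each axis in X→Y→Z order; the exact axis-order protocol is an assumption for illustration.

```python
import numpy as np

def rotation_matrix(axis, theta):
    """3x3 rotation matrix for angle theta (radians) about 'x', 'y', or 'z'."""
    c, s = np.cos(theta), np.sin(theta)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # "z"

def rotate_cloud(points, angle_deg):
    """Rotate an (N, 3) cloud by the same angle about X, then Y, then Z."""
    t = np.deg2rad(angle_deg)
    R = rotation_matrix("z", t) @ rotation_matrix("y", t) @ rotation_matrix("x", t)
    return points @ R.T

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
rotated = rotate_cloud(cloud, 30)
```

Because the composed matrix is orthogonal, the rotation preserves all distances to the origin; only the coordinate values the network sees change.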

Classification Results

Rotation Angle Accuracy Accuracy Drop
0° (baseline) 98.22% 0.00%
30° 78.28% -19.94%
60° 33.58% -64.64%
90° 29.07% -69.15%
180° 30.95% -67.26%

Segmentation Results

Rotation Angle Accuracy Accuracy Drop
0° (baseline) 90.45% 0.00%
30° 75.10% -15.35%
60° 55.46% -34.99%
90° 36.52% -53.92%
180° 29.23% -61.22%

Interpretation

Key Findings:

  • Severe Rotation Sensitivity: Both models show significant performance degradation with rotation. Even a 30° rotation causes a ~20% drop in classification accuracy and ~15% drop in segmentation accuracy.
  • Root Cause: PointNet is not rotation-invariant. The model learns features in the original coordinate system, and rotations change the absolute positions of points, breaking the learned feature representations. The max pooling operation is permutation-invariant but not rotation-invariant.
  • Implications: This demonstrates a major limitation of vanilla PointNet - it requires data augmentation with rotations during training, or the use of rotation-invariant features, to handle rotated inputs effectively.
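The asymmetry between permutation invariance and rotation invariance is easy to verify numerically. The toy NumPy check below uses a single fixed random linear feature layer followed by max pooling: shuffling the points leaves the pooled feature unchanged, while a 30° rotation shifts it.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(256, 3))
W = rng.normal(size=(3, 16))  # one fixed per-point "feature layer"

def global_feature(pts):
    """Per-point linear features + ReLU, then global max pooling."""
    return np.maximum(pts @ W, 0.0).max(axis=0)

f = global_feature(points)

# Permutation invariance: reordering points cannot change the max.
perm = rng.permutation(len(points))
print(np.allclose(global_feature(points[perm]), f))  # True

# No rotation invariance: a 30 deg rotation about z shifts the feature.
t = np.deg2rad(30.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])
f_rot = global_feature(points @ Rz.T)
print(np.allclose(f_rot, f))  # almost surely False for random W
```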

Segmentation Visualization: Effect of Rotation

Below are visualizations showing how rotation affects segmentation quality. Each pair shows ground truth (top) and prediction (bottom) at different rotation angles:

0° (No Rotation)
GT 0 degrees

Ground Truth

Pred 0 degrees

Prediction - Accuracy: 90.45%

30° Rotation
GT 30 degrees

Ground Truth (same object)

Pred 30 degrees

Prediction - Accuracy: 75.10%

60° Rotation
GT 60 degrees

Ground Truth (same object)

Pred 60 degrees

Prediction - Accuracy: 55.46%

90° Rotation
GT 90 degrees

Ground Truth (same object)

Pred 90 degrees

Prediction - Accuracy: 36.52%

180° Rotation
GT 180 degrees

Ground Truth (same object)

Pred 180 degrees

Prediction - Accuracy: 29.23%

Experiment 2: Number of Points

Tested model performance with different numbers of points per object: 10000, 5000, 2000, 1000, and 500. Points were sampled using nested subsets to ensure fair comparison.
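One simple way to realize such nested subsets (a sketch of the idea, not necessarily the exact sampling code used here) is to draw a single random permutation and take prefixes of it, so each smaller sample is a strict subset of every larger one:

```python
import numpy as np

def nested_subsets(num_points, sizes, seed=0):
    """One permutation, many prefixes: subsets[500] is contained in
    subsets[1000], which is contained in subsets[2000], and so on."""
    order = np.random.default_rng(seed).permutation(num_points)
    return {k: np.sort(order[:k]) for k in sizes}

subsets = nested_subsets(10000, [10000, 5000, 2000, 1000, 500])
```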

Classification Results

Number of Points Accuracy Accuracy Drop
10000 (baseline) 98.22% 0.00%
5000 98.11% -0.10%
2000 97.48% -0.73%
1000 97.38% -0.84%
500 96.96% -1.26%

Segmentation Results

Number of Points Accuracy Accuracy Drop
10000 (baseline) 90.45% 0.00%
5000 90.39% -0.05%
2000 90.22% -0.23%
1000 89.74% -0.71%
500 88.55% -1.89%

Interpretation

Key Findings:

  • Robust to Point Reduction: Both models show remarkable robustness to reducing the number of input points. Even with only 500 points (5% of original), classification maintains 96.96% accuracy and segmentation maintains 88.55% accuracy.
  • Why It Works: PointNet's max pooling operation is particularly effective here - it extracts the most salient features regardless of how many points contribute. As long as the key discriminative points are present, the model can make accurate predictions.
  • Practical Implications: This robustness is valuable for real-world applications where point cloud density may vary, or where computational efficiency requires downsampling.
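This intuition can be checked directly: since a downsampled cloud contributes a subset of the candidate values to each max, its pooled feature is elementwise bounded by the full cloud's feature, and with enough points it typically stays close to it. A small NumPy demonstration with a random fixed feature layer:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(10000, 3))
W = rng.normal(size=(3, 32))  # fixed random per-point feature layer

def pooled(pts):
    """Per-point ReLU features followed by global max pooling."""
    return np.maximum(pts @ W, 0.0).max(axis=0)

full = pooled(points)
sub = pooled(points[rng.permutation(10000)[:500]])  # 5% of the points

# The subset feature can never exceed the full feature elementwise.
print((sub <= full).all())  # True
```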

Segmentation Visualization: Effect of Number of Points

Below are visualizations showing how segmentation quality is maintained even with significantly fewer points. Each pair shows ground truth (top) and prediction (bottom) at different point counts:

10000 Points (Baseline)
GT 10000 points

Ground Truth

Pred 10000 points

Prediction - Accuracy: 90.45%

5000 Points
GT 5000 points

Ground Truth (same object)

Pred 5000 points

Prediction - Accuracy: 90.39%

2000 Points
GT 2000 points

Ground Truth (same object)

Pred 2000 points

Prediction - Accuracy: 90.22%

1000 Points
GT 1000 points

Ground Truth (same object)

Pred 1000 points

Prediction - Accuracy: 89.74%

500 Points
GT 500 points

Ground Truth (same object)

Pred 500 points

Prediction - Accuracy: 88.55%

Q4. Bonus Question - Locality (20 points)

Implemented a simplified version of PointNet++.

Model Implemented: PointNet++

PointNet++ addresses PointNet's lack of local structure modeling by applying PointNet hierarchically to local neighborhoods: points are grouped around sampled centers, features are extracted within each group, and the process repeats at progressively coarser resolutions to build multi-scale local features.

Architecture Details

Classification Model (PointNet++):

  • SA1: Samples 512 centers, groups k=32 neighbors, outputs 128-dim features
  • SA2: Samples 128 centers, groups k=32 neighbors, outputs 512-dim features
  • Global max pooling + MLP classifier (512→256→128→3)

Segmentation Model (PointNet++):

  • Per-point MLP: 3→64→64
  • Local aggregation: k-NN (k=16) with relative coordinates
  • Global feature concatenation: local(128) + global(128) = 256
  • Decoder: 256→256→128→6
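The local aggregation step (k-NN grouping with relative coordinates) can be sketched as follows; this is a NumPy illustration of the grouping geometry only, not the trained model code.

```python
import numpy as np

def knn_group(points, k=16):
    """For each point, gather its k nearest neighbors (including itself)
    and express them in coordinates relative to the center point."""
    # Pairwise Euclidean distances, (N, N).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]            # (N, k) neighbor indices
    groups = points[idx] - points[:, None, :]     # (N, k, 3) relative coords
    return idx, groups

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3))
idx, groups = knn_group(pts, k=16)
```

Subtracting the center makes each group translation-invariant, so the per-group feature MLP sees local shape rather than absolute position.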

Comparison Results

Classification Task

Model Accuracy Improvement
PointNet 98.22% -
PointNet++ 98.64% +0.42% absolute (+0.43% relative)

Segmentation Task

Model Accuracy Improvement
PointNet 90.45% -
PointNet++ 88.44% -2.01% absolute (-2.22% relative)

Analysis

Classification Results:

  • PointNet++ achieves a modest improvement (+0.42%) over PointNet for classification. The hierarchical local feature learning helps capture more discriminative features, especially for objects with complex geometric structures.
  • The improvement is relatively small because PointNet already performs very well (98.22%), leaving little room for improvement. However, PointNet++'s local aggregation may help with edge cases.

Segmentation Results:

  • Our implementation of PointNet++ performs slightly worse (-2.01%) than PointNet for segmentation. This is unexpected and can be attributed to several factors:
  • Simplified Architecture: The implemented PointNet++ segmentation model differs significantly from the original paper (Qi et al., 2017). While the classification model follows the hierarchical Set Abstraction architecture from the paper, the segmentation model uses a simplified approach that does NOT implement the full feature propagation (FP) architecture.
  • Missing Components: The original PointNet++ segmentation uses an hourglass architecture: SA↓ → SA↓ → SA↓ → FP↑ → FP↑ → FP↑. Our implementation instead uses: per-point MLP → local k-NN aggregation → global feature → decoder. This means we're missing:
    • Hierarchical downsampling with Set Abstraction layers
    • Feature Propagation (upsampling) layers with interpolation
    • Skip connections between encoder and decoder
    • Multi-scale grouping at different resolutions
    • Why This Simplification? The full PointNet++ segmentation pipeline is complex to implement without custom CUDA kernels for ball query and efficient farthest point sampling (FPS). Our simplified version captures the "spirit" of locality through k-NN grouping but lacks the hierarchical structure.
  • Training Issues: The more complex architecture with local neighborhoods requires more careful hyperparameter tuning. The model may not have been fully optimized for this specific task.
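For reference, farthest point sampling itself is short to express without CUDA kernels; the O(N·m) NumPy sketch below is a correct but slow stand-in, which is exactly the practical obstacle noted above.

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from all points
    chosen so far. points: (N, 3); returns m indices."""
    n = points.shape[0]
    chosen = np.empty(m, dtype=int)
    chosen[0] = np.random.default_rng(seed).integers(n)  # random seed point
    # dist[j] = distance from point j to its nearest chosen center.
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, m):
        chosen[i] = dist.argmax()  # farthest remaining point
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i]], axis=1))
    return chosen

rng = np.random.default_rng(1)
centers = farthest_point_sampling(rng.normal(size=(1000, 3)), m=128)
```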

Key Takeaway: The classification PointNet++ follows the paper's architecture closely and shows improvement, while the segmentation PointNet++ is a simplified "local-enhanced PointNet" rather than true hierarchical PointNet++, explaining its lower performance.

Visualization Comparison: PointNet vs. PointNet++

Below are side-by-side comparisons showing qualitative differences between PointNet and PointNet++ segmentation:

Object 1

Ground Truth
GT Object 1

Ground truth segmentation

PointNet
PointNet Object 1

PointNet prediction

PointNet++
PointNet++ Object 1

PointNet++ prediction

Object 3

Ground Truth
GT Object 3

Ground truth segmentation

PointNet
PointNet Object 3

PointNet prediction

PointNet++
PointNet++ Object 3

PointNet++ prediction

Object 4

Ground Truth
GT Object 4

Ground truth segmentation

PointNet
PointNet Object 4

PointNet prediction

PointNet++
PointNet++ Object 4

PointNet++ prediction