Q1. Classification Model (40 points)
Implemented a PointNet-based classification model to classify point clouds into three categories: chairs, vases, and lamps.
Model Architecture
The classification model follows the PointNet architecture:
- Shared MLP: Three 1D convolutional layers (3→64→128→1024) with BatchNorm and ReLU
- Global Feature: Max pooling over all points to extract global feature vector
- Classifier: Fully connected layers (1024→512→256→3) with BatchNorm, ReLU, and Dropout
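The layer sizes above can be sketched in PyTorch as follows. This is a minimal version: the input/feature transform T-Nets from the original PointNet paper are omitted, and the dropout rate is an assumed value.

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Minimal PointNet classifier: shared MLP -> global max pool -> FC head."""

    def __init__(self, num_classes=3):
        super().__init__()
        # Shared per-point MLP as 1x1 convolutions: (B, 3, N) -> (B, 1024, N)
        self.feat = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classifier head on the 1024-dim global feature (dropout rate assumed)
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                 # x: (B, N, 3) point clouds
        x = self.feat(x.transpose(1, 2))  # (B, 1024, N)
        x = x.max(dim=2).values           # symmetric max pool -> (B, 1024)
        return self.head(x)               # (B, num_classes) logits
```

The max pool is the key design choice: it makes the global feature invariant to the ordering of the input points.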
Test Accuracy
Final Test Accuracy: 98.22% (0.9822)
The model achieved excellent performance on the test set, correctly classifying 98.22% of the point cloud objects.
Q2. Segmentation Model (40 points)
Implemented a PointNet-based segmentation model to perform per-point semantic segmentation on chair point clouds with 6 semantic classes.
Model Architecture
The segmentation model uses an encoder-decoder architecture:
- Encoder: Three 1D convolutional layers (3→64→128→1024) to extract point features
- Global Feature: Max pooling to get global context (1024-dim vector)
- Feature Concatenation: Concatenate local features (64-dim) with global feature (1024-dim) → 1088-dim
- Decoder: MLP layers (1088→512→256→128→6) to predict per-point class labels
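A minimal PyTorch sketch of this encoder-decoder (layer sizes follow the description above; any T-Nets are omitted):

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Minimal PointNet segmentation model: each point's 64-dim local feature
    is concatenated with the repeated 1024-dim global feature (1088-dim)."""

    def __init__(self, num_classes=6):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.encoder = nn.Sequential(
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(1088, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, num_classes, 1),
        )

    def forward(self, x):                      # x: (B, N, 3)
        n = x.shape[1]
        local = self.local(x.transpose(1, 2))  # (B, 64, N) per-point features
        glob = self.encoder(local).max(dim=2, keepdim=True).values  # (B, 1024, 1)
        fused = torch.cat([local, glob.expand(-1, -1, n)], dim=1)   # (B, 1088, N)
        return self.decoder(fused).transpose(1, 2)  # (B, N, num_classes) logits
```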
Test Accuracy
Final Test Accuracy: 90.45% (0.9045)
The model correctly segments 90.45% of all points across all test objects.
Segmentation Results
Visualized segmentation results for six objects (Objects 0–5), including the failure case on Object 4:
(Image pairs: ground-truth and predicted segmentations for Objects 0–5; Object 4 is the failure case on the armrests.)
Interpretation
The PointNet segmentation model achieves good overall performance (90.45% accuracy) by combining local point features with global context. Key observations:
- Good Cases (Objects 0, 1, 2, 3, 5): The model correctly segments most parts of the chairs, with clear boundaries between different semantic regions (seat, backrest, legs, etc.). The global feature provides useful context for disambiguating similar local geometries.
- Failure Case - Object 4: This chair exhibits a notable segmentation error: the model mislabels parts of the flat chair seat as chair arms (armrests). Two main factors contribute:
  - Dataset Bias: The training data likely contains a majority of armchairs (as seen in Objects 0–3), biasing the model toward predicting arm structures on the sides of chairs.
  - Weak Local Feature Extraction: PointNet's reliance on global max pooling limits its ability to capture fine-grained local geometric differences between chair seats and armrests. The subtle geometric distinction between these regions is not well captured by the model's point-wise features, so semantically different but geometrically similar regions are confused.
- Limitations: PointNet's lack of explicit local neighborhood modeling means it may miss fine-grained details at region boundaries. The max pooling operation aggregates information globally but may not preserve important local spatial relationships needed for precise segmentation, particularly when distinguishing between parts with subtle geometric differences.
Q3. Robustness Analysis (20 points)
Conducted two experiments to analyze the robustness of the learned models: rotation robustness and sensitivity to the number of input points.
Experiment 1: Rotation Robustness
Tested model performance when input point clouds are randomly rotated around all three axes (X, Y, Z) at different angles: 0°, 30°, 60°, 90°, and 180°.
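A sketch of this perturbation in NumPy (the exact convention is an assumption here: the experiment may draw a random angle per axis up to the bound rather than use a fixed angle, and the Z·Y·X composition order is ours):

```python
import numpy as np

def rotate_xyz(points, angle_deg):
    """Rotate an (N, 3) point cloud by angle_deg about X, then Y, then Z."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    Rx = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    Ry = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    Rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    # Compose the three axis rotations and apply to every point
    return points @ (Rz @ Ry @ Rx).T
```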
Classification Results
| Rotation Angle | Accuracy | Accuracy Drop |
|---|---|---|
| 0° (baseline) | 98.22% | 0.00% |
| 30° | 78.28% | -19.94% |
| 60° | 33.58% | -64.64% |
| 90° | 29.07% | -69.15% |
| 180° | 30.95% | -67.26% |
Segmentation Results
| Rotation Angle | Accuracy | Accuracy Drop |
|---|---|---|
| 0° (baseline) | 90.45% | 0.00% |
| 30° | 75.10% | -15.35% |
| 60° | 55.46% | -34.99% |
| 90° | 36.52% | -53.92% |
| 180° | 29.23% | -61.22% |
Interpretation
Key Findings:
- Severe Rotation Sensitivity: Both models degrade sharply under rotation. Even a 30° rotation costs roughly 20 percentage points of classification accuracy and 15 points of segmentation accuracy.
- Root Cause: PointNet is not rotation-invariant. The model learns features in the original coordinate system, and rotations change the absolute positions of points, breaking the learned feature representations. The max pooling operation is permutation-invariant but not rotation-invariant.
- Implications: This demonstrates a major limitation of vanilla PointNet: handling rotated inputs effectively requires either rotation augmentation during training or rotation-invariant features.
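The contrast between permutation invariance and rotation sensitivity can be seen even in a toy NumPy example that max-pools raw coordinates in place of learned features:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((100, 3))
pooled = pts.max(axis=0)  # stand-in "global feature": coordinate-wise max

# Permuting the points leaves the max-pooled vector unchanged...
perm = pts[rng.permutation(len(pts))]
assert np.allclose(perm.max(axis=0), pooled)

# ...but a 30 degree rotation about Z changes it.
t = np.deg2rad(30)
Rz = np.array([[np.cos(t), -np.sin(t), 0],
               [np.sin(t),  np.cos(t), 0],
               [0, 0, 1]])
assert not np.allclose((pts @ Rz.T).max(axis=0), pooled)
```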
Segmentation Visualization: Effect of Rotation
Below are visualizations showing how rotation affects segmentation quality. Each pair shows ground truth (top) and prediction (bottom) at different rotation angles:
(Image pairs: ground truth and prediction for the same object at each rotation angle. Prediction accuracy: 90.45% at 0°, 75.10% at 30°, 55.46% at 60°, 36.52% at 90°, 29.23% at 180°.)
Experiment 2: Number of Points
Tested model performance with different numbers of points per object: 10000, 5000, 2000, 1000, and 500. Points were sampled using nested subsets to ensure fair comparison.
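The nested-subset protocol can be sketched as follows (a minimal version; `nested_subsets` and its seed handling are our own naming, not the experiment's exact code). Because each smaller subset is a prefix of the same random permutation, every 500-point input is contained in the corresponding 1000-point input, and so on, which keeps the comparison fair:

```python
import numpy as np

def nested_subsets(points, sizes=(10000, 5000, 2000, 1000, 500), seed=0):
    """Return nested subsets of an (N, 3) cloud: each smaller subset is a
    prefix of one shared random permutation, so subsets are strictly nested."""
    perm = np.random.default_rng(seed).permutation(len(points))
    return {n: points[perm[:n]] for n in sizes if n <= len(points)}
```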
Classification Results
| Number of Points | Accuracy | Accuracy Drop |
|---|---|---|
| 10000 (baseline) | 98.22% | 0.00% |
| 5000 | 98.11% | -0.10% |
| 2000 | 97.48% | -0.73% |
| 1000 | 97.38% | -0.84% |
| 500 | 96.96% | -1.26% |
Segmentation Results
| Number of Points | Accuracy | Accuracy Drop |
|---|---|---|
| 10000 (baseline) | 90.45% | 0.00% |
| 5000 | 90.39% | -0.05% |
| 2000 | 90.22% | -0.23% |
| 1000 | 89.74% | -0.71% |
| 500 | 88.55% | -1.89% |
Interpretation
Key Findings:
- Robust to Point Reduction: Both models show remarkable robustness to reducing the number of input points. Even with only 500 points (5% of original), classification maintains 96.96% accuracy and segmentation maintains 88.55% accuracy.
- Why It Works: PointNet's max pooling operation is particularly effective here - it extracts the most salient features regardless of how many points contribute. As long as the key discriminative points are present, the model can make accurate predictions.
- Practical Implications: This robustness is valuable for real-world applications where point cloud density may vary, or where computational efficiency requires downsampling.
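A toy NumPy check of the max-pooling intuition, using raw coordinates in place of learned features: the pooled vector of a random 5% subset stays close to that of the full cloud, because the pooled value depends only on a handful of extreme points.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((10000, 3))                # stand-in for a point cloud
sub = pts[rng.permutation(len(pts))[:500]]  # keep a random 5% subset

# Coordinate-wise max pooling barely moves when 95% of points are dropped.
assert np.max(np.abs(pts.max(axis=0) - sub.max(axis=0))) < 0.05
```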
Segmentation Visualization: Effect of Number of Points
Below are visualizations showing how segmentation quality is maintained even with significantly fewer points. Each pair shows ground truth (top) and prediction (bottom) at different point counts:
(Image pairs: ground truth and prediction for the same object at each point count. Prediction accuracy: 90.45% at 10000 points, 90.39% at 5000, 90.22% at 2000, 89.74% at 1000, 88.55% at 500.)
Q4. Bonus Question - Locality (20 points)
Implemented a simplified PointNet++.
Model Implemented: PointNet++
PointNet++ addresses PointNet's limitation of lacking local structure modeling by:
- Hierarchical Feature Learning: Uses Set Abstraction (SA) layers that sample representative points and group local neighborhoods
- Local Aggregation: For each sampled point, aggregates features from k nearest neighbors using PointNet-style MLPs
- Multi-Scale Processing: Processes point clouds at multiple scales (e.g., 10000→512→128 points) to capture both local and global features
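The sample-and-group step that Set Abstraction relies on can be sketched in NumPy. This is illustrative only: the function names are ours, we use k-NN in place of the paper's ball query, and a real implementation would run FPS on the GPU:

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set.
    O(n_samples * N) NumPy version; custom CUDA kernels do this much faster."""
    rng = np.random.default_rng(seed)
    idx = np.zeros(n_samples, dtype=int)
    idx[0] = rng.integers(len(points))
    dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for i in range(1, n_samples):
        idx[i] = np.argmax(dist)  # farthest remaining point
        dist = np.minimum(dist, np.linalg.norm(points - points[idx[i]], axis=1))
    return idx

def knn_group(points, centers, k):
    """For each center, gather its k nearest neighbors in relative coordinates,
    giving (M, k, 3) local patches for a PointNet-style MLP to process."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=2)  # (M, N)
    nn = np.argsort(d, axis=1)[:, :k]                                     # (M, k)
    return points[nn] - centers[:, None, :]  # centered local neighborhoods
```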
Architecture Details
Classification Model (PointNet++):
- SA1: Samples 512 centers, groups k=32 neighbors, outputs 128-dim features
- SA2: Samples 128 centers, groups k=32 neighbors, outputs 512-dim features
- Global max pooling + MLP classifier (512→256→128→3)
Segmentation Model (PointNet++):
- Per-point MLP: 3→64→64
- Local aggregation: k-NN (k=16) with relative coordinates
- Global feature concatenation: local (128-dim) + global (128-dim) → 256-dim
- Decoder: 256→256→128→6
Comparison Results
Classification Task
| Model | Accuracy | Improvement |
|---|---|---|
| PointNet | 98.22% | - |
| PointNet++ | 98.64% | +0.42 pp (+0.43% relative) |
Segmentation Task
| Model | Accuracy | Improvement |
|---|---|---|
| PointNet | 90.45% | - |
| PointNet++ | 88.44% | -2.01 pp (-2.22% relative) |
Analysis
Classification Results:
- PointNet++ achieves a modest improvement (+0.42%) over PointNet for classification. The hierarchical local feature learning helps capture more discriminative features, especially for objects with complex geometric structures.
- The improvement is relatively small because PointNet already performs very well (98.22%), leaving little room for improvement. However, PointNet++'s local aggregation may help with edge cases.
Segmentation Results:
- Our implementation of PointNet++ performs slightly worse (-2.01 pp) than PointNet for segmentation. This is unexpected and can be attributed to several factors:
- Simplified Architecture: The implemented PointNet++ segmentation model differs significantly from the original paper (Qi et al., 2017). While the classification model follows the paper's hierarchical Set Abstraction architecture, the segmentation model uses a simplified approach that does not implement the full feature propagation (FP) pipeline.
- Missing Components: The original PointNet++ segmentation uses an hourglass architecture: SA↓ → SA↓ → SA↓ → FP↑ → FP↑ → FP↑. Our implementation instead uses: per-point MLP → local k-NN aggregation → global feature → decoder. It is therefore missing:
  - Hierarchical downsampling with Set Abstraction layers
  - Feature Propagation (upsampling) layers with interpolation
  - Skip connections between encoder and decoder
  - Multi-scale grouping at different resolutions
- Why This Simplification? The full PointNet++ segmentation pipeline is complex to implement without custom CUDA kernels for ball query and efficient farthest point sampling. Our simplified version captures the spirit of locality through k-NN grouping but lacks the hierarchical structure.
- Training Issues: The more complex architecture with local neighborhoods requires more careful hyperparameter tuning, and the model may not have been fully optimized for this task.
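For reference, the core of the missing Feature Propagation step is conceptually simple: features at the sparse level are interpolated back onto the dense points with inverse-distance weights over the three nearest sparse neighbors. A NumPy sketch (function and argument names are ours; the real FP layer also concatenates skip features and applies an MLP afterward):

```python
import numpy as np

def fp_interpolate(xyz_dense, xyz_sparse, feat_sparse, k=3, eps=1e-8):
    """Interpolate (M, C) sparse-level features onto (N, 3) dense points
    using inverse-distance weights over each dense point's k nearest
    sparse neighbors. Returns an (N, C) feature array."""
    d = np.linalg.norm(xyz_dense[:, None, :] - xyz_sparse[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]                    # (N, k) neighbor ids
    w = 1.0 / (np.take_along_axis(d, nn, axis=1) + eps)  # inverse-distance weights
    w = w / w.sum(axis=1, keepdims=True)                 # normalize per point
    return (feat_sparse[nn] * w[:, :, None]).sum(axis=1)
```

A dense point that coincides with a sparse point recovers (essentially) that point's feature, since its zero distance dominates the weighting.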
Key Takeaway: The classification PointNet++ follows the paper's architecture closely and shows improvement, while the segmentation PointNet++ is a simplified "local-enhanced PointNet" rather than true hierarchical PointNet++, explaining its lower performance.
Visualization Comparison: PointNet vs. PointNet++
Below are side-by-side comparisons showing qualitative differences between PointNet and PointNet++ segmentation:
(Image triplets for Objects 1, 3, and 4: ground-truth segmentation, PointNet prediction, and PointNet++ prediction.)