16-825: Learning for 3D Vision — Assignment 5

Manyung Emma Hon · mehon · Fall 2025

Q1. Classification Model

Overall Test Accuracy: 98.32%

Per-Class Performance

Class | Accuracy
Chair | 99.84%
Vase | 91.18%
Lamp | 97.44%
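
For reference, per-class accuracy restricts the prediction/label comparison to each class; a minimal sketch of the metric (tensor names and class ordering are my assumptions, not the assignment's starter code):

```python
import torch

def per_class_accuracy(preds, labels, class_names=("chair", "vase", "lamp")):
    # preds, labels: 1D integer tensors of predicted / ground-truth class ids
    results = {}
    for cls_idx, name in enumerate(class_names):
        mask = labels == cls_idx                       # objects of this class
        correct = (preds[mask] == labels[mask]).sum()  # correctly classified
        results[name] = correct.item() / mask.sum().item()
    return results
```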

Correct Predictions

[Figure: chair correctly predicted as chair]
[Figure: vase correctly predicted as vase]
[Figure: lamp correctly predicted as lamp]

Failure Cases

Visualization | True Class | Predicted Class | Analysis
[figure] | Chair | Lamp | This chair has a vertical structure with thin legs and a tall back, which may resemble lamp-like features to the model.
[figure] | Vase | Lamp | The narrow shape of this vase creates ambiguity with lamp structures, particularly given the vertical symmetry.
[figure] | Lamp | Vase | This lamp has a wide base that shares geometric similarities with vase shapes, confusing the classifier.

Interpretation

The PointNet classification model achieves excellent overall performance (98.32%), with particularly strong results on chairs (99.84%). The model struggles most with vases (91.18%), likely because of their high shape variability and geometric similarity to lamps.

Common failure modes include:

- Chairs with thin legs and tall backs that present lamp-like vertical structures
- Narrow, vertically symmetric vases that are confused with lamps
- Wide-based lamps that share geometry with vases

Q2. Segmentation Model

Overall Test Accuracy: 90.25%
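
Segmentation is scored point-wise rather than object-wise, so each object's accuracy is the fraction of its points labeled correctly; a minimal sketch of this metric, assuming integer part labels per point:

```python
import torch

def seg_accuracy(pred, gt):
    # pred, gt: integer part labels, shape (num_points,) for one object
    # or (num_objects, num_points) for the whole test set; accuracy is
    # simply the fraction of points whose predicted label matches.
    return (pred == gt).float().mean().item()
```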

Challenging Cases

[Figures: Object 26, ground truth vs. prediction]
Accuracy: 44.40%

Analysis: This chair has complex, difficult-to-distinguish structures, and the model struggles to maintain consistent part boundaries.

[Figures: Object 351, ground truth vs. prediction]
Accuracy: 45.67%

Analysis: Ambiguous part boundaries and uneven point density contribute to segmentation errors.

High-Quality Segmentations

[Figures: Object 397, ground truth vs. prediction] Accuracy: 99.37%
[Figures: Object 600, ground truth vs. prediction] Accuracy: 99.43%
[Figures: Object 471, ground truth vs. prediction] Accuracy: 99.62%

Interpretation

The segmentation model achieves strong overall performance (90.25%), with a clear distinction between easy and challenging cases. Best predictions (99%+ accuracy) occur on chairs with clear geometric boundaries between parts. The model successfully segments well-separated components.

Challenging predictions (44-46% accuracy) reveal common difficulties:

- Complex structures whose part boundaries are hard to distinguish, producing inconsistent labels across parts
- Ambiguous part boundaries combined with uneven point density

Q3. Robustness Analysis

Experiment 1: Rotation Robustness

Procedure: Rotated point clouds around the z-axis by varying angles (0°, 45°, 90°) and evaluated segmentation accuracy.
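
The rotation itself is a standard z-axis rotation matrix applied to every point; a minimal sketch of the perturbation (function and variable names are mine):

```python
import numpy as np

def rotate_z(points, angle_deg):
    # points: (N, 3) array; rotate every point about the z-axis by angle_deg
    theta = np.deg2rad(angle_deg)
    rot = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    return points @ rot.T  # (R @ p)^T for each row p, i.e. p @ R^T

# e.g. evaluate the trained model on rotate_z(test_points, 45)
```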

Results - Segmentation Task

Rotation Angle | Accuracy | Change from 0°
0° (Baseline) | 90.25% | -
45° | 63.19% | -27.06%
90° | 38.12% | -52.13%

Visual Comparison

[Figure: 0° rotation, Acc: 90.25%]
[Figure: 45° rotation, Acc: 63.19%]
[Figure: 90° rotation, Acc: 38.12%]

Interpretation

The model shows significant degradation with rotation, losing over 50 percentage points of accuracy at 90° rotation. This indicates the model has learned orientation-dependent features rather than rotation-invariant representations.

Potential improvements: adding data augmentation with random rotations during training, or using rotation-invariant features (e.g., local reference frames, DGCNN's edge convolutions).

Experiment 2: Point Density Robustness

Procedure: Varied the number of points per object (500, 2,500, 10,000) to test how point cloud sparsity affects segmentation performance.
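
Sparser inputs can be produced by randomly subsampling each test cloud; a minimal sketch, assuming sampling without replacement:

```python
import numpy as np

def subsample(points, num_points, seed=0):
    # points: (N, 3) array with N >= num_points; keep a random subset
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]

# e.g. evaluate on subsample(test_points, 500) vs. the full 10,000 points
```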

Results - Segmentation Task

Number of Points | Accuracy | Change from 10,000
10,000 (Baseline) | 90.25% | -
2,500 | 90.22% | -0.03% (negligible)
500 | 89.21% | -1.04%

Visual Comparison

[Figure: 500 points, Acc: 89.21%]
[Figure: 2,500 points, Acc: 90.22%]
[Figure: 10,000 points, Acc: 90.25%]

Interpretation

Key Findings:

- Accuracy is essentially unchanged when reducing from 10,000 to 2,500 points (-0.03%)
- Even at 500 points (a 20x reduction), accuracy drops by only about 1 percentage point
- Segmentation performance is therefore highly robust to point cloud sparsity

Contrast with Experiment 1: Unlike rotation (a 52 percentage point accuracy drop at 90°), point density reduction has minimal impact. This highlights that PointNet's learned features are far more dependent on orientation than on dense sampling.

Q4. Bonus Question - Locality with DGCNN

Classification Results: DGCNN vs PointNet

Model | Overall Accuracy | Chair | Vase | Lamp
PointNet (Q1) | 98.32% | 99.84% | 91.18% | 97.44%
DGCNN (Q4) | 97.59% | 99.84% | 82.35% | 98.29%

Segmentation Results: DGCNN vs PointNet

Model | Overall Accuracy | Best Case | Worst Case
PointNet (Q2) | 90.25% | 99.62% | 44.40%
DGCNN (Q4) | 91.43% | 100.00% | 41.02%

Visual Comparisons

Segmentation Examples

High-quality case: Both models perform well, but DGCNN achieves near-perfect segmentation. Note that I had to reduce the number of points for DGCNN due to memory constraints.

[Figures: Object 397: PointNet 99.37% vs. DGCNN 99.80%]

Challenging case: Both models struggle with complex geometry.

[Figures: Object 351: PointNet 45.67% vs. DGCNN 41.02%]

Classification Examples

Correct predictions: DGCNN successfully classifies objects across all categories.

[Figures: DGCNN correct predictions: chair, vase, lamp]

Analysis and Interpretation

Classification Performance

Unexpectedly, DGCNN performs slightly worse than PointNet (97.59% vs. 98.32%). The gap is driven almost entirely by vases (91.18% → 82.35%), while chair accuracy is identical and lamp accuracy slightly improves.

Segmentation Performance

DGCNN shows improvement: a 1.18 percentage point gain (90.25% → 91.43%) demonstrates the value of local geometric features for part-level tasks.
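
The gain is plausibly explained by DGCNN's EdgeConv, which builds a k-nearest-neighbor graph over the points and learns features from each point's local neighborhood instead of treating points independently. A simplified sketch of one EdgeConv layer (hyperparameters and layer details are illustrative, not the exact architecture used here):

```python
import torch
import torch.nn as nn

class EdgeConv(nn.Module):
    def __init__(self, in_dim, out_dim, k=20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * in_dim, out_dim, kernel_size=1),
            nn.BatchNorm2d(out_dim),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        # x: (B, N, C) point features
        B, N, C = x.shape
        dist = torch.cdist(x, x)                        # (B, N, N) pairwise distances
        idx = dist.topk(self.k, largest=False).indices  # (B, N, k) nearest neighbors
                                                        # (includes the point itself)
        neighbors = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C)
        )                                               # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(neighbors)
        # Edge feature: concatenate the center point with relative offsets
        edge = torch.cat([center, neighbors - center], dim=-1)  # (B, N, k, 2C)
        edge = edge.permute(0, 3, 1, 2)                 # (B, 2C, N, k) for Conv2d
        # Max-pool over the k neighbors to get one feature per point
        return self.mlp(edge).max(dim=-1).values.permute(0, 2, 1)  # (B, N, out_dim)
```

The (N, N) distance matrix is also why memory became a constraint in this experiment: kNN graph construction scales quadratically with the number of points, which is what forced the reduced point count for DGCNN noted above.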