PointNet for Classification and Segmentation
Q1. Classification of Point Clouds
For the classification task, I implemented a PointNet-based model that processes unordered point sets using shared MLP layers and a global max-pooling operation. The network takes 10,000 3D points per object and predicts one of three classes: chair, vase, or lamp. The best checkpoint achieves 97.90% test accuracy.
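The core idea (a shared per-point transform followed by a symmetric max-pooling) can be sketched in a few lines. This is an illustrative NumPy toy, not the trained network; the weight shapes and feature size are made up, and a single linear+ReLU layer stands in for the full shared MLP.

```python
import numpy as np

def shared_mlp_maxpool(points, W, b):
    """Apply the same linear map + ReLU to every point, then max-pool.

    points: (N, 3) array of xyz coordinates
    W: (3, F) shared weight matrix, b: (F,) bias
    Returns a single (F,) global feature, independent of point order.
    """
    per_point = np.maximum(points @ W + b, 0.0)  # shared "MLP" layer
    return per_point.max(axis=0)                 # symmetric aggregation

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
W, b = rng.normal(size=(3, 16)), rng.normal(size=16)

feat = shared_mlp_maxpool(pts, W, b)
feat_shuffled = shared_mlp_maxpool(rng.permutation(pts), W, b)
assert np.allclose(feat, feat_shuffled)  # point order does not change the feature
```

Because max over a set ignores ordering, the global feature is permutation-invariant by construction, which is exactly what makes PointNet suitable for unordered point sets.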
Representative Correct Predictions
Below are a few randomly sampled test objects along with their predicted labels.
The large circular dish with a thin supporting rod is distinctive. The global max-pooled feature in PointNet captures this geometry well.
The L-shaped backrest and seat form a clear structure shared across many training chairs, which the network recognizes robustly.
Vases tend to be smooth, symmetric volumes. The classifier leverages these global shape properties rather than fine details.
Despite variations in stem length and dish shape, the global geometry is consistent enough for PointNet to classify these as lamps.
Failure Cases
While overall accuracy is high, some shapes are ambiguous under global pooling. Here are two representative misclassifications, one for lamp and one for vase.
This lamp has a large, hollow cylindrical outer shell that resembles a vase. The thin inner support and light source contribute little to the max-pooled feature, so the model confuses it with the vase class.
The tall, narrow shape with a pronounced indentation looks similar to some lamp reflectors. Without explicit local reasoning, the model relies on global geometry and swaps these two categories.
Overall, Q1 highlights a core tradeoff in PointNet: global pooling gives invariance to permutation and sampling, but can lose subtle structural cues needed to disambiguate unusual shapes.
Q2. Semantic Segmentation of Chair Parts
For segmentation, I extended the PointNet backbone to predict a semantic label for each point in a chair point cloud. The network first extracts a per-point feature and a global feature via max-pooling, then concatenates them and applies additional shared MLP layers to output per-point logits over 6 chair part classes.
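The feature-fusion step described above amounts to tiling the global feature across all points and concatenating. The sketch below uses random arrays and assumed feature sizes (64 local, 1024 global) purely to illustrate the shapes involved; it is not the actual segmentation head.

```python
import numpy as np

N, F_local, F_global, num_parts = 10000, 64, 1024, 6

local_feat = np.random.rand(N, F_local)   # per-point features from the backbone
global_feat = np.random.rand(F_global)    # max-pooled global feature

# Tile the global feature and concatenate it with each point's local feature.
fused = np.concatenate(
    [local_feat, np.broadcast_to(global_feat, (N, F_global))], axis=1
)
assert fused.shape == (N, F_local + F_global)

# A shared head then maps each fused feature to logits over the 6 part classes.
W_head = np.random.rand(F_local + F_global, num_parts)
logits = fused @ W_head
assert logits.shape == (N, num_parts)
```

Every point thus sees both its own local descriptor and the same summary of the whole shape, which is what lets the per-point classifier reason about where a point sits within the chair.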
Evaluated on the test set, the model reaches a per-point accuracy of 90.33%. Performance is strong on canonical chairs but degrades on complex or ambiguous designs where parts overlap or share similar geometry.
High-Accuracy Chairs
Below are three objects with the highest per-object accuracies. Each example shows a 360° visualization of ground-truth versus predicted segmentation.
A very canonical chair with simple, well-separated parts. The model segments backrest, seat, and legs almost perfectly.
Thin legs and rectangular surfaces are labeled accurately, indicating that the network is robust to small-scale structures on otherwise simple shapes.
Even with a slightly rounded backrest, the model correctly distinguishes all chair components, showing good generalization to mild geometric variation.
Failure Cases
The worst-performing chairs reveal typical failure modes for PointNet segmentation: ambiguity at part boundaries and confusion across geometrically similar components.
This chair has complex overlapping geometry and thick supports. The network merges several parts and mislabels regions, likely because global max-pooling discards fine spatial relationships.
Thin vertical supports and closely spaced bars create ambiguous part boundaries. Without explicit neighborhood reasoning (as in PointNet++ or DGCNN), the model often confuses legs, supports, and backrest regions.
Overall, Q2 confirms that PointNet is strong at per-point labeling on simple shapes, but struggles when local geometric context becomes crucial. This motivates more local architectures (PointNet++, graph-based networks, transformers) explored in the bonus question.
Q3. Robustness Analysis
To evaluate how well the trained PointNet models generalize beyond the test distribution, I performed two robustness experiments: (1) rotation perturbations and (2) varying the number of input points. Both classification and segmentation models were evaluated.
Experiment 1 — Robustness to Rotation
I rotated every test point cloud about the z-axis by 0°, 30°, 60°, and 90°, and measured classification and segmentation accuracy at each angle.
| Rotation (°) | Classification Accuracy | Segmentation Accuracy |
|---|---|---|
| 0° | 0.9790 | 0.9033 |
| 30° | 0.7251 | 0.6446 |
| 60° | 0.2529 | 0.4519 |
| 90° | 0.2204 | 0.3886 |
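The rotation perturbation itself is a standard z-axis rotation matrix applied to every point. A minimal sketch (not the exact evaluation code):

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud about the z-axis by the given angle."""
    t = np.radians(degrees)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    return points @ R.T

pts = np.array([[1.0, 0.0, 0.5]])
assert np.allclose(rotate_z(pts, 90), [[0.0, 1.0, 0.5]])   # x-axis maps to y-axis
assert np.allclose(rotate_z(rotate_z(pts, 30), -30), pts)  # rotation is invertible
```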
Visualization — Classification
0° Rotation — Failure Case
90° Rotation — Corrected Prediction
Visualization — Segmentation
Ground Truth
0° Prediction
90° Prediction
Interpretation: PointNet is not rotation-invariant — accuracy drops sharply after 30°, especially for classification. Since PointNet processes raw coordinates without canonical alignment or rotation augmentation, rotated shapes appear as different objects. Segmentation is slightly more robust but still degrades significantly.
Experiment 2 — Robustness to Number of Points
I randomly subsampled each test object to 1024, 2048, and 4096 points, and also evaluated on the full 10,000-point clouds.
| # Points | Classification Accuracy | Segmentation Accuracy |
|---|---|---|
| 1024 | 0.9738 | 0.8996 |
| 2048 | 0.9790 | 0.9028 |
| 4096 | 0.9790 | 0.9035 |
| 10000 | 0.9790 | 0.9033 |
Interpretation: Performance is essentially unchanged down to 1024 points. The max-pooled global feature depends only on the points that achieve the per-dimension maxima, and random subsampling usually retains enough of these critical points. PointNet is therefore highly robust to point sparsity, in sharp contrast to its sensitivity to rotation.
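One way to implement the random subsampling used in this experiment (a sketch under the assumption of uniform sampling without replacement, not the exact evaluation code):

```python
import numpy as np

def subsample(points, k, rng):
    """Randomly keep k of the N input points, without replacement."""
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]

rng = np.random.default_rng(0)
cloud = rng.normal(size=(10000, 3))  # stand-in for one test object
for k in (1024, 2048, 4096):
    sub = subsample(cloud, k, rng)
    assert sub.shape == (k, 3)
```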
Q4. Bonus — Incorporating Locality (DGCNN)
For the bonus question, I implemented a Dynamic Graph CNN (DGCNN) to introduce local geometric reasoning using k-nearest-neighbor graphs. Unlike PointNet, which processes each point independently before global pooling, DGCNN captures local surface structures via edge features.
Model Implemented
- DGCNN for classification: uses EdgeConv layers + global pooling
- DGCNN for segmentation: per-point features + global context fusion
- kNN graph recomputed at every layer (dynamic graph)
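The EdgeConv input can be sketched as follows: for each point x_i and each of its k nearest neighbors x_j, the edge feature concatenates the center point with the relative offset (x_i, x_j − x_i). This NumPy toy uses illustrative sizes and a brute-force distance matrix; a real implementation runs on the GPU and recomputes the graph in feature space at every layer.

```python
import numpy as np

def knn_edge_features(points, k):
    """Build EdgeConv-style inputs: (x_i, x_j - x_i) for the k nearest x_j."""
    # Pairwise squared distances (N, N); exclude self-matches via the diagonal.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbor indices
    centers = np.repeat(points[:, None, :], k, axis=1)  # x_i repeated per neighbor
    offsets = points[nbrs] - centers                    # relative offsets x_j - x_i
    return np.concatenate([centers, offsets], axis=-1)  # (N, k, 2 * 3)

pts = np.random.default_rng(1).normal(size=(50, 3))
ef = knn_edge_features(pts, k=8)
assert ef.shape == (50, 8, 6)
```

The relative offsets are what give EdgeConv its sensitivity to local surface structure, which PointNet's purely pointwise layers lack.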
Performance Comparison
| Model | Classification Accuracy | Segmentation Accuracy |
|---|---|---|
| PointNet (Q1/Q2) | 97.90% | 90.33% |
| DGCNN (Bonus) | 98.85% | 90.70% |
DGCNN improves classification accuracy by 0.95 percentage points and segmentation accuracy by 0.37 points, consistent with the benefit of explicit local geometric reasoning.
DGCNN — Representative Visualizations
Below are sample predictions from the DGCNN classification model.
Correct — lamp
Correct — chair
Correct — chair
Failure — vase → lamp
Failure — vase → lamp
Failure — lamp → vase
Most failures occur on extremely sparse or elongated shapes, where local neighborhoods become unstable and kNN edges fail to reflect true structure. However, DGCNN generally shows better robustness than PointNet.