PointNet for Classification and Segmentation

Implementing a PointNet-style architecture for object classification (chairs, vases, lamps) and semantic segmentation of chair parts, using raw point clouds as input.

Q1. Classification of Point Clouds

Overall test accuracy 97.90%

For the classification task, I implemented a PointNet-based model that processes unordered point sets using shared MLP layers and a global max-pooling operation. The network takes 10,000 3D points per object and predicts one of three classes: chair, vase, or lamp. The best checkpoint achieves 97.90% test accuracy.
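The core design can be illustrated with a minimal NumPy sketch. The weights and layer sizes below are toy placeholders, not the trained model; the point is that a shared per-point transform followed by a symmetric max-pool makes the logits independent of point ordering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "shared MLP": the same weights are applied to every point independently,
# so the network cannot exploit point ordering. Layer sizes are illustrative.
W1 = rng.normal(size=(3, 64))
W2 = rng.normal(size=(64, 3))   # 3 output classes: chair, vase, lamp

def pointnet_logits(points):
    """points: (N, 3) array of raw xyz coordinates -> (3,) class logits."""
    per_point = np.maximum(points @ W1, 0.0)   # shared MLP + ReLU, per point
    global_feat = per_point.max(axis=0)        # symmetric max-pool over N points
    return global_feat @ W2                    # classification head

cloud = rng.normal(size=(10_000, 3))
shuffled = cloud[rng.permutation(len(cloud))]

# Max-pooling makes the logits invariant to any permutation of the points.
assert np.allclose(pointnet_logits(cloud), pointnet_logits(shuffled))
```

The same invariance holds for any symmetric aggregation (mean, sum), but max-pooling lets a handful of "critical" points dominate the global feature, which matters for the robustness results in Q3.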

Representative Correct Predictions

Below are a few randomly sampled test objects along with their predicted labels.

Lamp – correct prediction
Sample idx = 807 • GT: lamp • Pred: lamp
Lamp correct classification

The large circular dish with a thin supporting rod is distinctive. The global max-pooled feature in PointNet captures this geometry well.

Chair – correct prediction
Sample idx = 86 • GT: chair • Pred: chair
Chair correct classification

The L-shaped backrest and seat form a clear structure shared across many training chairs, which the network recognizes robustly.

Vase – correct prediction
Sample idx = 683 • GT: vase • Pred: vase
Vase correct classification

Vases tend to be smooth, symmetric volumes. The classifier leverages these global shape properties rather than fine details.

Lamp variants – correct
Samples idx = 725, 829 • GT: lamp • Pred: lamp
Lamp correct 725
Sample 725
Lamp correct 829
Sample 829

Despite variations in stem length and dish shape, the global geometry is consistent enough for PointNet to classify these as lamps.

Failure Cases

While overall accuracy is high, some shapes are ambiguous under global pooling. Here are two representative misclassifications, one for lamp and one for vase.

Lamp misclassified as Vase
FAILURE • idx = 721 • GT: lamp • Pred: vase
Lamp misclassified as vase

This lamp has a large, hollow cylindrical outer shell that resembles a vase. The thin inner support and light source contribute little to the max-pooled feature, so the model confuses it with the vase class.

Vase misclassified as Lamp
FAILURE • idx = 618 • GT: vase • Pred: lamp
Vase misclassified as lamp

The tall, narrow shape with a pronounced indentation looks similar to some lamp reflectors. Without explicit local reasoning, the model relies on global geometry and swaps these two categories.

Overall, Q1 highlights a core tradeoff in PointNet: global pooling gives invariance to permutation and sampling, but can lose subtle structural cues needed to disambiguate unusual shapes.


Q2. Semantic Segmentation of Chair Parts

Per-point test accuracy 90.33%

For segmentation, I extended the PointNet backbone to predict a semantic label for each point in a chair point cloud. The network first extracts a per-point feature and a global feature via max-pooling, then concatenates them and applies additional shared MLP layers to output per-point logits over 6 chair part classes.
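The fusion step can be sketched as follows. This is a simplified NumPy version with made-up layer sizes, not the actual network: each point's local feature is concatenated with the broadcast global feature before the per-point classification head.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C_LOCAL, C_GLOBAL, N_PARTS = 2048, 64, 128, 6   # hypothetical sizes

W_local  = rng.normal(size=(3, C_LOCAL))
W_global = rng.normal(size=(C_LOCAL, C_GLOBAL))
W_seg    = rng.normal(size=(C_LOCAL + C_GLOBAL, N_PARTS))

def seg_logits(points):
    """points: (N, 3) -> (N, 6) per-point logits over chair part classes."""
    local = np.maximum(points @ W_local, 0.0)                    # per-point features
    global_feat = np.maximum(local @ W_global, 0.0).max(axis=0)  # (C_GLOBAL,)
    # Broadcast the single global feature back to every point and concatenate,
    # so each point sees both its own geometry and the whole-shape context.
    fused = np.concatenate([local, np.tile(global_feat, (len(points), 1))], axis=1)
    return fused @ W_seg

logits = seg_logits(rng.normal(size=(N, 3)))
assert logits.shape == (N, N_PARTS)
```

Note that the only cross-point communication happens through the single pooled global vector, which is exactly why part boundaries (a local property) are the dominant failure mode below.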

Evaluated on the test set, the model reaches a per-point accuracy of 90.33%. Performance is strong on canonical chairs but degrades on complex or ambiguous designs where parts overlap or share similar geometry.

High-Accuracy Chairs

Below are three objects with the highest per-object accuracies. Each example shows a 360° visualization of ground-truth versus predicted segmentation.

Object 397 – 99.59% accuracy
GT segmentation 397
Ground Truth
Pred segmentation 397
Prediction

A very canonical chair with simple, well-separated parts. The model segments backrest, seat, and legs almost perfectly.

Object 471 – 99.33% accuracy
GT segmentation 471
Ground Truth
Pred segmentation 471
Prediction

Thin legs and rectangular surfaces are labeled accurately, indicating that the network is robust to small-scale structures on otherwise simple shapes.

Object 616 – 99.27% accuracy
GT segmentation 616
Ground Truth
Pred segmentation 616
Prediction

Even with a slightly rounded backrest, the model correctly distinguishes all chair components, showing good generalization to mild geometric variation.

Failure Cases

The worst-performing chairs reveal typical failure modes for PointNet segmentation: ambiguity at part boundaries and confusion across geometrically similar components.

Object 26 – 44.33% accuracy
GT segmentation 26
Ground Truth
Pred segmentation 26
Prediction

This chair has complex overlapping geometry and thick supports. The network merges several parts and mislabels regions, likely because global max-pooling discards fine spatial relationships.

Object 351 – 47.23% accuracy
GT segmentation 351
Ground Truth
Pred segmentation 351
Prediction

Thin vertical supports and closely spaced bars create ambiguous part boundaries. Without explicit neighborhood reasoning (as in PointNet++ or DGCNN), the model often confuses legs, supports, and backrest regions.

Overall, Q2 confirms that PointNet is strong at per-point labeling on simple shapes, but struggles when local geometric context becomes crucial. This motivates more local architectures (PointNet++, graph-based networks, transformers) explored in the bonus question.


Q3: Robustness Analysis

To evaluate how well the trained PointNet models generalize beyond the test distribution, I performed two robustness experiments: (1) rotation perturbations and (2) varying the number of input points. Both classification and segmentation models were evaluated.

Experiment 1 — Robustness to Rotation

I rotated every test point cloud around the z-axis by different angles (0°, 30°, 60°, 90°) and measured the resulting classification and segmentation accuracy.

Rotation (°)    Classification Accuracy    Segmentation Accuracy
0°              0.9790                     0.9033
30°             0.7251                     0.6446
60°             0.2529                     0.4519
90°             0.2204                     0.3886
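The perturbation itself is a standard z-axis rotation applied to every point cloud; a minimal sketch:

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud about the z-axis by the given angle."""
    t = np.radians(degrees)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    return points @ R.T

# A 90° z-rotation maps (1, 0, z) to (0, 1, z); height is unchanged.
cloud = np.array([[1.0, 0.0, 0.5]])
assert np.allclose(rotate_z(cloud, 90), [[0.0, 1.0, 0.5]], atol=1e-9)
```

Since rotation is an isometry, only the coordinate frame changes; any accuracy drop is attributable to the model, not the data.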

Visualization — Classification

0° Rotation — Failure Case

90° Rotation — Corrected Prediction

Visualization — Segmentation

Ground Truth

0° Prediction

90° Prediction

Interpretation: PointNet is not rotation-invariant — accuracy drops sharply after 30°, especially for classification. Since PointNet processes raw coordinates without canonical alignment or rotation augmentation, rotated shapes appear as different objects. Segmentation is slightly more robust but still degrades significantly.

Experiment 2 — Robustness to Number of Points

I randomly subsampled each test object to different point counts: 1024, 2048, 4096, and the full 10,000 points.

# Points    Classification Accuracy    Segmentation Accuracy
1024        0.9738                     0.8996
2048        0.9790                     0.9028
4096        0.9790                     0.9035
10000       0.9790                     0.9033
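The subsampling used here is uniform random selection without replacement; a short sketch of the procedure:

```python
import numpy as np

def subsample(points, n, seed=0):
    """Randomly keep n of the input points (without replacement)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n, replace=False)
    return points[idx]

cloud = np.random.default_rng(0).normal(size=(10_000, 3))
for n in (1024, 2048, 4096):
    assert subsample(cloud, n).shape == (n, 3)
```

Fixing the seed per object keeps the evaluation deterministic across the different point-count settings.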

Interpretation: Performance stays stable even with as few as 1024 points, because PointNet’s max-pooled global feature depends only on the most informative points, which survive uniform subsampling with high probability. PointNet is therefore highly robust to point sparsity, in contrast to its sensitivity to rotation.

Q4. Bonus — Incorporating Locality (DGCNN)

For the bonus question, I implemented a Dynamic Graph CNN (DGCNN) to introduce local geometric reasoning using k-nearest-neighbor graphs. Unlike PointNet, which processes each point independently before global pooling, DGCNN captures local surface structures via edge features.
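The key DGCNN ingredient is the edge feature: for each point x_i, its k nearest neighbours x_j are found and the pair [x_i, x_j - x_i] is fed to a shared MLP (EdgeConv). A minimal NumPy sketch of the edge-feature construction (on raw coordinates; a full DGCNN recomputes the graph in feature space at every layer):

```python
import numpy as np

def knn_edge_features(points, k=4):
    """Build DGCNN-style edge features [x_i, x_j - x_i] from a kNN graph.

    points: (N, 3) -> (N, k, 6) array of per-edge inputs for EdgeConv.
    """
    # Pairwise squared distances, (N, N); self-loops excluded.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn_idx = np.argsort(d2, axis=1)[:, :k]           # (N, k) nearest neighbours
    x_i = np.repeat(points[:, None, :], k, axis=1)   # centre point, repeated
    x_j = points[nn_idx]                             # its k neighbours
    # Concatenate absolute position and relative offset along the last axis.
    return np.concatenate([x_i, x_j - x_i], axis=-1)

edges = knn_edge_features(np.random.default_rng(2).normal(size=(100, 3)), k=4)
assert edges.shape == (100, 4, 6)
```

The relative term x_j - x_i encodes local surface orientation, which is precisely the information PointNet's pointwise MLPs never see; the quadratic distance matrix here is fine for a sketch but real implementations use batched top-k on the GPU.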

Model Implemented

Performance Comparison

Model               Classification Accuracy    Segmentation Accuracy
PointNet (Q1/Q2)    97.90%                     90.33%
DGCNN (Bonus)       98.85%                     90.70%

DGCNN improves classification accuracy by about one percentage point (97.90% to 98.85%) and segmentation slightly (90.33% to 90.70%), consistent with stronger local geometric reasoning.


DGCNN — Representative Visualizations

Below are sample predictions from the DGCNN classification model.

Correct — lamp

lamp correct

Correct — chair

chair correct

Correct — chair

chair correct

Failure — vase → lamp

failure vase lamp

Failure — vase → lamp

failure vase lamp

Failure — lamp → vase

failure lamp vase

Most failures occur on extremely sparse or elongated shapes, where local neighborhoods become unstable and kNN edges fail to reflect true structure. However, DGCNN generally shows better robustness than PointNet.