16-825 Assignment 5

ylchen

Q1 — Classification Model

Test Accuracy

default values

test accuracy: 0.8992654774396642

Visualized Test Examples

Below I show a few random test point clouds (rendered as rotating point-cloud GIFs) together with the predicted class. At least one failure example per class is included. All GIFs are rendered using the evaluation script and saved under ./data/. Color key: class 0 = chair, rendered red [1.0, 0.0, 0.0]; class 1 = vase, rendered green [0.0, 1.0, 0.0]; class 2 = lamp, rendered blue [0.0, 0.0, 1.0].
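For reference, a minimal sketch of how this label-to-color mapping might be applied when rendering the GIFs (the dictionary and helper names are illustrative, not the exact evaluation-script code):

```python
import numpy as np

# class index -> RGB color used to tint the rendered point cloud
CLASS_COLORS = {
    0: [1.0, 0.0, 0.0],  # chair -> red
    1: [0.0, 1.0, 0.0],  # vase  -> green
    2: [0.0, 0.0, 1.0],  # lamp  -> blue
}

def colorize(points, pred_class):
    """Attach one RGB color per point based on the predicted class label."""
    rgb = np.tile(CLASS_COLORS[int(pred_class)], (points.shape[0], 1))  # (N, 3)
    return np.concatenate([points, rgb], axis=1)                        # (N, 6): xyz + rgb
```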

Correctly classified chair
Correctly classified chair
Correctly classified chair
Correctly classified vase
Correctly classified lamp

Failure Cases

lamp misclassified as vase
Failure (lamp). The model misclassifies this as a vase.
lamp misclassified
Failure (lamp). The model misclassifies this as a vase. The blobby shape and short height of this object likely led to the misclassification.
vase misclassified
Failure (vase). The model misclassifies this as a chair. Likely due to the boxy shape of the object.

Q2 — Segmentation Model

Test Accuracy

ran with default flags

test accuracy: 0.6381837925445705

Per-object Segmentation Results

I visualize segmentation for three representative test chairs (indices 0-2). For each object I show the ground-truth part labels and the predicted labels.
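The per-point accuracy quoted in the captions below is simply the fraction of points whose predicted part label matches the ground truth; a one-line sketch (tensor names are illustrative):

```python
# pred_labels, gt_labels: (N,) integer part labels for one object
per_point_acc = (pred_labels == gt_labels).float().mean().item()
```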

Ground truth segmentation, object 0
Object 0 — Ground truth
Predicted segmentation, object 0
Object 0 — Prediction (per-point accuracy ≈ 0.64)
Ground truth segmentation, object 1
Object 1 — Ground truth
Predicted segmentation, object 1
Object 1 — Prediction (per-point accuracy ≈ 0.64)
Ground truth segmentation, object 2
Object 2 — Ground truth
Predicted segmentation, object 2
Object 2 — Prediction (per-point accuracy ≈ 0.64)

Most errors concentrate near part boundaries (e.g., between seat and back, or between legs and base). Since the model uses only global pooling, with no explicit local neighborhoods, it sometimes smooths labels over these ambiguous regions.
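To make the "global pooling only" point concrete, here is a minimal sketch of a PointNet-style segmentation head in the spirit of this model (layer widths and names are my own, not the exact architecture in the assignment code): every point gets a feature from a shared per-point MLP, one max-pooled vector summarizes the whole cloud, and each point is classified from the concatenation of the two.

```python
import torch
import torch.nn as nn

class SegHeadSketch(nn.Module):
    """Illustrative PointNet-style segmentation head: per-point MLP + one global max-pool."""

    def __init__(self, num_parts):
        super().__init__()
        # shared MLP applied to every point independently (no neighborhood information)
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.global_mlp = nn.Linear(128, 1024)
        # per-point classifier on [per-point feature, global feature]
        self.head = nn.Sequential(
            nn.Linear(128 + 1024, 256), nn.ReLU(),
            nn.Linear(256, num_parts),
        )

    def forward(self, pts):                                    # pts: (B, N, 3)
        local = self.point_mlp(pts)                            # (B, N, 128)
        glob = self.global_mlp(local).max(dim=1).values        # (B, 1024): one vector per cloud
        glob = glob.unsqueeze(1).expand(-1, pts.shape[1], -1)  # broadcast back to every point
        return self.head(torch.cat([local, glob], dim=-1))     # (B, N, num_parts) logits
```

Because the only shared context is that single global vector, two points straddling a seat/back boundary receive nearly identical inputs, which is consistent with the boundary errors described above.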

Additional “Bad” Segmentations

The following visualizations highlight cases where segmentation is noticeably worse, either due to aggressive subsampling or large rotations (also reused in the robustness experiments).

GT segmentation, 1024 points
Subsampled to 1024 points — ground truth
Pred segmentation, 1024 points
Subsampled to 1024 points — prediction.
Fine details on thin legs are lost; the model tends to over-smooth.
GT segmentation, rotated 90 degrees
Chair rotated 90° around z-axis — ground truth
Pred segmentation, rotated 90 degrees
Chair rotated 90° — prediction.
Rotation breaks some orientation cues; several seat/back points are mislabeled.

Q3 — Robustness Analysis

I ran two main robustness experiments: (1) varying the number of input points per object, and (2) rotating the point clouds around the z-axis. All runs use the same trained checkpoints (model_epoch_0.pt for both tasks). More detailed results are outlined in writeup.md.

Experiment 1 — Varying Number of Points

For both classification and segmentation, I subsampled each test point cloud from the original 10k points down to {4096, 2048, 1024} points using random sampling without replacement (via --num_points in eval_cls.py and eval_seg.py). The model weights are kept fixed; only the input density changes.
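A minimal sketch of this subsampling step (index selection without replacement; the function and variable names are illustrative, not the exact eval_cls.py / eval_seg.py code):

```python
import numpy as np

def subsample_cloud(points, num_points, seed=0):
    """Keep a random subset of `num_points` points, sampled without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx], idx  # indices returned so per-point seg labels can be subsampled identically

# e.g. reduce one 10k-point test object to 1024 points
# sub_pts, idx = subsample_cloud(full_pts, num_points=1024)
# sub_seg = seg_labels[idx]  # segmentation only; classification labels are per object
```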

Classification

| # points / object | Test accuracy |
|---|---|
| 10 000 | 0.8993 |
| 4096 | 0.8982 |
| 2048 | 0.9014 |
| 1024 | 0.8993 |

The classification accuracy is very stable as long as a reasonable number of points are kept. This matches the intuition that PointNet aggregates features over the whole set; global max-pooling is fairly robust to moderate subsampling.
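A toy numerical illustration of that intuition (random features, unrelated to the trained network): the per-dimension maximum of a large set changes very little when the set is randomly thinned, so the globally pooled descriptor stays almost the same.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.random((10000, 1024))                  # fake per-point features for one object (N x D)
full_pool = feats.max(axis=0)                      # global max-pool over all 10 000 points

keep = rng.choice(10000, size=1024, replace=False)
sub_pool = feats[keep].max(axis=0)                 # max-pool over a random 1024-point subset

print(np.abs(full_pool - sub_pool).mean())         # small: the pooled descriptor barely moves
```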

Segmentation

| # points / object | Test accuracy |
|---|---|
| 10 000 | 0.6382 |
| 4096 | 0.6314 |
| 2048 | 0.6231 |
| 1024 | 0.6125 |

For segmentation, performance degrades more noticeably as the number of points decreases. Thin structures like legs become sparsely sampled, so the model has fewer points to correctly label those regions. Qualitatively, the legs and back edges become noisier and more fragmented in the low-point GIFs (see gt_points*.gif and pred_points*.gif).

Experiment 2 — Rotations Around the z-axis

In the second experiment I fixed the number of points at 1024 and rotated each test point cloud around the z-axis by {45°, 90°, 180°, 270°}, then evaluated the trained models on the rotated sets. The scripts report both the (unrotated) baseline test accuracy and the accuracy on the rotated data (rotation test accuracy).
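A minimal sketch of the rotation applied to each cloud (assuming z is the third coordinate; the eval scripts may implement this differently):

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud about the z-axis by `degrees`."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

# evaluate the fixed checkpoints at each tested angle
# for deg in (45, 90, 180, 270):
#     rotated_pts = rotate_z(test_pts, deg)
```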

Classification (1024 points)

| Rotation | Baseline acc | Rotation acc |
|---|---|---|
| 0° | 0.8993 | – |
| 45° | 0.8993 | 0.7020 |
| 90° | 0.9003 | 0.3820 |
| 180° | 0.8982 | 0.4753 |
| 270° | 0.8972 | 0.3148 |

The classification model is not rotation-invariant: accuracy drops sharply for large rotations, especially at 90° and 270°. Since the model was trained on a fixed canonical orientation, the global features change noticeably under rotation, causing chairs to look more like other classes (vases or lamps) from unusual viewpoints.

Segmentation (1024 points)

| Rotation | Baseline acc | Rotation acc |
|---|---|---|
| 0° | ≈0.613 | – |
| 45° | 0.6140 | 0.5985 |
| 90° | 0.6129 | 0.4023 |
| 180° | 0.6133 | 0.1504 |
| 270° | 0.6130 | 0.3521 |

Segmentation accuracy is even more sensitive to rotation, and 180° rotation is particularly destructive. Many points, especially on the seat/back interface, flip to incorrect labels. In the corresponding GIFs (gt_rot*.gif vs pred_rot*.gif) you can see entire parts miscolored when the chair is upside-down or sideways.

Qualitative Rotation Visuals

GT segmentation, rot45
Segmentation GT, 45° rotation
Pred segmentation, rot45
Segmentation prediction, 45° rotation (moderate quality)
GT segmentation, rot180
Segmentation GT, 180° rotation
Pred segmentation, rot180
Segmentation prediction, 180° rotation — severe confusion between parts, consistent with the large quantitative accuracy drop.

Overall, both tasks are robust to moderate point subsampling but quite sensitive to large unseen rotations. This matches the design of PointNet, which is permutation-invariant but not inherently rotation-invariant.