16-825 Assignment 5

ylchen

Q1 — Classification Model

Test Accuracy

default values

test accuracy: 0.8992654774396642

Visualized Test Examples

Below I show a few random test point clouds (rendered as rotating point-cloud GIFs) together with the predicted class. At least one failure example per class is included. All GIFs are rendered using the evaluation script and saved under ./data/. Color key: class 0 = chair, rendered red [1.0, 0.0, 0.0]; class 1 = vase, rendered green [0.0, 1.0, 0.0]; class 2 = lamp, rendered blue [0.0, 0.0, 1.0].
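For reference, a minimal sketch of how this label-to-color mapping might be applied when rendering the GIFs (the dictionary and helper names are illustrative, not the exact evaluation-script code):

```python
import numpy as np

# class index -> RGB color used to tint the rendered point cloud
CLASS_COLORS = {
    0: [1.0, 0.0, 0.0],  # chair -> red
    1: [0.0, 1.0, 0.0],  # vase  -> green
    2: [0.0, 0.0, 1.0],  # lamp  -> blue
}

def colorize(points, pred_class):
    """Attach one RGB color per point based on the predicted class label."""
    rgb = np.tile(CLASS_COLORS[int(pred_class)], (points.shape[0], 1))  # (N, 3)
    return np.concatenate([points, rgb], axis=1)                        # (N, 6): xyz + rgb
```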

Correctly classified chair
Correctly classified chair
Correctly classified chair
Correctly classified vase
Correctly classified lamp

Failure Cases

lamp misclassified as vase
Failure (lamp). The model misclassifies this as a vase.
lamp misclassified
Failure (lamp). The model misclassifies this as a vase. The blobby shape and short height of this object likely led to the misclassification.
vase misclassified
Failure (vase). The model misclassifies this as a chair. Likely due to the boxy shape of the object.

Q2 — Segmentation Model

Test Accuracy

ran with default flags

test accuracy: 0.6381837925445705

Per-object Segmentation Results

I visualize segmentation for three representative test chairs (indices 0-2). For each object I show the ground-truth part labels and the predicted labels.
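The per-point accuracy quoted in the captions below is simply the fraction of points whose predicted part label matches the ground truth; a one-line sketch (tensor names are illustrative):

```python
# pred_labels, gt_labels: (N,) integer part labels for one object
per_point_acc = (pred_labels == gt_labels).float().mean().item()
```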

Ground truth segmentation, object 0
Object 0 — Ground truth
Predicted segmentation, object 0
Object 0 — Prediction (per-point accuracy ≈ 0.64)
Ground truth segmentation, object 1
Object 1 — Ground truth
Predicted segmentation, object 1
Object 1 — Prediction (per-point accuracy ≈ 0.64)
Ground truth segmentation, object 2
Object 2 — Ground truth
Predicted segmentation, object 2
Object 2 — Prediction (per-point accuracy ≈ 0.64)

Most errors concentrate near part boundaries (e.g., between seat and back, or between legs and base). Since the model uses only global pooling, with no explicit local neighborhoods, it sometimes smooths labels over these ambiguous regions.
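To make the "global pooling only" point concrete, here is a minimal sketch of a PointNet-style segmentation head in the spirit of this model (layer widths and names are my own, not the exact architecture in the assignment code): every point gets a feature from a shared per-point MLP, one max-pooled vector summarizes the whole cloud, and each point is classified from the concatenation of the two.

```python
import torch
import torch.nn as nn

class SegHeadSketch(nn.Module):
    """Illustrative PointNet-style segmentation head: per-point MLP + one global max-pool."""

    def __init__(self, num_parts):
        super().__init__()
        # shared MLP applied to every point independently (no neighborhood information)
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.global_mlp = nn.Linear(128, 1024)
        # per-point classifier on [per-point feature, global feature]
        self.head = nn.Sequential(
            nn.Linear(128 + 1024, 256), nn.ReLU(),
            nn.Linear(256, num_parts),
        )

    def forward(self, pts):                                    # pts: (B, N, 3)
        local = self.point_mlp(pts)                            # (B, N, 128)
        glob = self.global_mlp(local).max(dim=1).values        # (B, 1024): one vector per cloud
        glob = glob.unsqueeze(1).expand(-1, pts.shape[1], -1)  # broadcast back to every point
        return self.head(torch.cat([local, glob], dim=-1))     # (B, N, num_parts) logits
```

Because the only shared context is that single global vector, two points straddling a seat/back boundary receive nearly identical inputs, which is consistent with the boundary errors described above.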

Additional “Bad” Segmentations

The following visualizations highlight cases where segmentation is noticeably worse, either due to aggressive subsampling or large rotations (also reused in the robustness experiments).

GT segmentation, 1024 points
Subsampled to 1024 points — ground truth
Pred segmentation, 1024 points
Subsampled to 1024 points — prediction.
Fine details on thin legs are lost; the model tends to over-smooth.
GT segmentation, rotated 90 degrees
Chair rotated 90° around z-axis — ground truth
Pred segmentation, rotated 90 degrees
Chair rotated 90° — prediction.
Rotation breaks some orientation cues; several seat/back points are mislabeled.

Q3 — Robustness Analysis

I ran two main robustness experiments: (1) varying the number of input points per object, and (2) rotating the point clouds around the z-axis. All runs use the same trained checkpoints (model_epoch_0.pt for both tasks). More detailed results are outlined in writeup.md.

Experiment 1 — Varying Number of Points

For both classification and segmentation, I subsampled each test point cloud from the original 10k points down to {4096, 2048, 1024} points using random sampling without replacement (via --num_points in eval_cls.py and eval_seg.py). The model weights are kept fixed; only the input density changes.
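A minimal sketch of this subsampling step (index selection without replacement; the function and variable names are illustrative, not the exact eval_cls.py / eval_seg.py code):

```python
import numpy as np

def subsample_cloud(points, num_points, seed=0):
    """Keep a random subset of `num_points` points, sampled without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx], idx  # indices returned so per-point seg labels can be subsampled identically

# e.g. reduce one 10k-point test object to 1024 points
# sub_pts, idx = subsample_cloud(full_pts, num_points=1024)
# sub_seg = seg_labels[idx]  # segmentation only; classification labels are per object
```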

Classification

| # points / object | Test accuracy |
|---|---|
| 10 000 | 0.8993 |
| 4096 | 0.8982 |
| 2048 | 0.9014 |
| 1024 | 0.8993 |

The classification accuracy is very stable as long as a reasonable number of points are kept. This matches the intuition that PointNet aggregates features over the whole set; global max-pooling is fairly robust to moderate subsampling.
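A toy numerical illustration of that intuition (random features, unrelated to the trained network): the per-dimension maximum of a large set changes very little when the set is randomly thinned, so the globally pooled descriptor stays almost the same.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.random((10000, 1024))                  # fake per-point features for one object (N x D)
full_pool = feats.max(axis=0)                      # global max-pool over all 10 000 points

keep = rng.choice(10000, size=1024, replace=False)
sub_pool = feats[keep].max(axis=0)                 # max-pool over a random 1024-point subset

print(np.abs(full_pool - sub_pool).mean())         # small: the pooled descriptor barely moves
```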

Segmentation

| # points / object | Test accuracy |
|---|---|
| 10 000 | 0.6382 |
| 4096 | 0.6314 |
| 2048 | 0.6231 |
| 1024 | 0.6125 |

For segmentation, performance degrades more noticeably as the number of points decreases. Thin structures like legs become sparsely sampled, so the model has fewer points to correctly label those regions. Qualitatively, the legs and back edges become noisier and more fragmented in the low-point GIFs (see gt_points*.gif and pred_points*.gif).

Experiment 2 — Rotations Around the z-axis

In the second experiment I fixed the number of points at 1024 and rotated each test point cloud around the z-axis by {45°, 90°, 180°, 270°}, then evaluated the trained models on the rotated sets. The scripts report both the (unrotated) baseline test accuracy and the accuracy on the rotated data (rotation test accuracy).
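A minimal sketch of the rotation applied to each cloud (assuming z is the third coordinate; the eval scripts may implement this differently):

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud about the z-axis by `degrees`."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

# evaluate the fixed checkpoints at each tested angle
# for deg in (45, 90, 180, 270):
#     rotated_pts = rotate_z(test_pts, deg)
```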

Classification (1024 points)

| Rotation | Baseline acc | Rotation acc |
|---|---|---|
| 0° | 0.8993 | – |
| 45° | 0.8993 | 0.7020 |
| 90° | 0.9003 | 0.3820 |
| 180° | 0.8982 | 0.4753 |
| 270° | 0.8972 | 0.3148 |

The classification model is not rotation-invariant: accuracy drops sharply for large rotations, especially at 90° and 270°. Since the model was trained on a fixed canonical orientation, the global features change noticeably under rotation, causing chairs to look more like other classes (vases or lamps) from unusual viewpoints.

Segmentation (1024 points)

| Rotation | Baseline acc | Rotation acc |
|---|---|---|
| 0° | ≈0.613 | – |
| 45° | 0.6140 | 0.5985 |
| 90° | 0.6129 | 0.4023 |
| 180° | 0.6133 | 0.1504 |
| 270° | 0.6130 | 0.3521 |

Segmentation accuracy is even more sensitive to rotation, and 180° rotation is particularly destructive. Many points, especially on the seat/back interface, flip to incorrect labels. In the corresponding GIFs (gt_rot*.gif vs pred_rot*.gif) you can see entire parts miscolored when the chair is upside-down or sideways.

Qualitative Rotation Visuals

GT segmentation, rot45
Segmentation GT, 45° rotation
Pred segmentation, rot45
Segmentation prediction, 45° rotation (moderate quality)
GT segmentation, rot180
Segmentation GT, 180° rotation
Pred segmentation, rot180
Segmentation prediction, 180° rotation — severe confusion between parts, consistent with the large quantitative accuracy drop.

Overall, both tasks are robust to moderate point subsampling but quite sensitive to large unseen rotations. This matches the design of PointNet, which is permutation-invariant but not inherently rotation-invariant.