ylchen
Trained and evaluated with the default settings.
Test accuracy: 0.8993 (raw: 0.8992654774396642)
Below I show a few random test point clouds (rendered as rotating point-cloud GIFs)
together with the predicted class. At least one failure example per class is included.
All GIFs are rendered using the evaluation script and saved under ./data/.
Color key (used in all classification GIFs): red = chair, green = vase, blue = lamp.
0: [1.0, 0.0, 0.0], # chair
1: [0.0, 1.0, 0.0], # vase
2: [0.0, 0.0, 1.0], # lamp
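To make the rendering concrete, here is a minimal sketch of how a rotating, class-colored GIF could be produced with matplotlib + imageio. This is an illustration only, not the actual evaluation-script code; the function name `render_rotating_gif`, the frame count, and the figure settings are my own choices.

```python
import io
import imageio
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)

CLASS_COLORS = {0: [1.0, 0.0, 0.0],   # chair -> red
                1: [0.0, 1.0, 0.0],   # vase  -> green
                2: [0.0, 0.0, 1.0]}   # lamp  -> blue

def render_rotating_gif(points, pred_class, path, n_frames=36):
    """points: (N, 3) array; pred_class: predicted label in {0, 1, 2}; path: output .gif."""
    frames = []
    for i in range(n_frames):
        fig = plt.figure(figsize=(4, 4))
        ax = fig.add_subplot(111, projection="3d")
        ax.scatter(points[:, 0], points[:, 1], points[:, 2],
                   c=[CLASS_COLORS[pred_class]], s=1)
        ax.view_init(elev=20, azim=i * 360.0 / n_frames)  # spin the camera one full turn
        ax.set_axis_off()
        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=100)           # rasterize this frame
        plt.close(fig)
        buf.seek(0)
        frames.append(imageio.imread(buf))
    imageio.mimsave(path, frames)                         # e.g. a GIF under ./data/
```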
Ran with the default flags.
Test accuracy: 0.6382 (raw: 0.6381837925445705)
I visualize segmentation for three representative test chairs (indices 0-2). For each object I show the ground-truth part labels and the predicted labels.
Most errors concentrate near part boundaries (e.g., between seat and back or between legs and base). Since the model only uses global pooling and no explicit local neighborhoods, it sometimes smooths labels over these ambiguous regions.
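As a side note on why this smoothing happens, below is a hedged sketch (my paraphrase of a global-pooling-only design, not the actual model code; `GlobalOnlySegHead`, `feat_dim`, and `num_parts` are illustrative names) of a segmentation head that sees only each point's own feature plus one shared global vector. Two points on opposite sides of a part boundary then receive nearly identical inputs, so the head has little to distinguish them.

```python
import torch
import torch.nn as nn

class GlobalOnlySegHead(nn.Module):
    """Per-point classifier fed with [point feature, global max-pooled feature] only."""
    def __init__(self, feat_dim=64, num_parts=6):  # num_parts: number of part labels (assumed)
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_parts),
        )

    def forward(self, point_feats):
        # point_feats: (B, N, feat_dim) features from a shared per-point MLP
        B, N, F = point_feats.shape
        global_feat = point_feats.max(dim=1, keepdim=True).values      # (B, 1, feat_dim)
        fused = torch.cat([point_feats, global_feat.expand(B, N, F)], dim=-1)
        return self.classifier(fused)                                  # (B, N, num_parts) logits
```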
The following visualizations highlight cases where segmentation is noticeably worse, either due to aggressive subsampling or large rotations (also reused in the robustness experiments).
I ran two main robustness experiments:
(1) varying the number of input points per object, and
(2) rotating the point clouds around the z-axis.
All runs use the same trained checkpoints (model_epoch_0.pt for both tasks).
More detailed results are outlined in writeup.md.
For both classification and segmentation, I subsampled each test point cloud
from the original 10k points down to {4096, 2048, 1024} points using random
sampling without replacement (via --num_points in
eval_cls.py and eval_seg.py). The model weights are
kept fixed; only the input density changes.
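For reference, the subsampling itself is just index selection without replacement; a minimal sketch follows (the actual logic lives behind --num_points in the eval scripts, so the names here are illustrative):

```python
import numpy as np

def subsample(points, seg_labels=None, num_points=1024, seed=0):
    """points: (N, 3) array; seg_labels: optional (N,) per-point labels (segmentation only)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    if seg_labels is None:                   # classification: only the points change
        return points[idx], None
    return points[idx], seg_labels[idx]      # segmentation: labels follow the same indices
```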
| #points / object | Classification test accuracy |
|---|---|
| 10 000 | 0.8993 |
| 4096 | 0.8982 |
| 2048 | 0.9014 |
| 1024 | 0.8993 |
The classification accuracy is very stable as long as a reasonable number of points are kept. This matches the intuition that PointNet aggregates features over the whole set; global max-pooling is fairly robust to moderate subsampling.
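A toy numeric illustration of that intuition (random features, not the trained network): the global feature is a per-channel max, so subsampling only moves it by the gap between the overall maximum and the best surviving point in each channel.

```python
import torch

torch.manual_seed(0)
N, C = 10_000, 1024
feats = torch.randn(N, C)                    # stand-in for per-point features before pooling

full_pool = feats.max(dim=0).values          # global feature from all 10k points
keep = torch.randperm(N)[:1024]              # keep a random 1024-point subset
sub_pool = feats[keep].max(dim=0).values     # global feature from the subsample

rel_change = (full_pool - sub_pool).norm() / full_pool.norm()
print(f"relative change in the pooled feature: {rel_change:.3f}")  # modest, even at ~10% of the points
```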
| #points / object | Segmentation test accuracy |
|---|---|
| 10 000 | 0.6382 |
| 4096 | 0.6314 |
| 2048 | 0.6231 |
| 1024 | 0.6125 |
For segmentation, performance degrades more noticeably as the number of points
decreases. Thin structures like legs become sparsely sampled, so the model has
fewer points to correctly label those regions. Qualitatively, the legs and
back edges become noisier and more fragmented in the low-point GIFs
(see gt_points*.gif and pred_points*.gif).
In the second experiment I fixed the number of points at 1024 and rotated each
test point cloud around the z-axis by {45°, 90°, 180°, 270°}, then evaluated the
trained models on the rotated sets. The scripts report both the unrotated
baseline test accuracy ("Baseline acc") and the accuracy on the rotated data
("Rotation acc"); the first table below is classification, the second segmentation.
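The rotation itself is a standard z-axis rotation matrix applied to every point; a minimal sketch (the exact flag/plumbing in the eval scripts is omitted here):

```python
import numpy as np

def rotate_z(points, angle_deg):
    """Rotate an (N, 3) point cloud about the z-axis by angle_deg degrees."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

# e.g. the 90° setting in the tables below:
# rotated_points = rotate_z(test_points, 90)
```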
| Rotation | Baseline acc | Rotation acc |
|---|---|---|
| 0° | 0.8993 | — |
| 45° | 0.8993 | 0.7020 |
| 90° | 0.9003 | 0.3820 |
| 180° | 0.8982 | 0.4753 |
| 270° | 0.8972 | 0.3148 |
The classification model is not rotation-invariant: accuracy drops sharply for large rotations, especially at 90° and 270°. Since the model was trained on a fixed canonical orientation, the global features change substantially under rotation, so rotated chairs start to look more like the other classes (vases or lamps).
| Rotation | Baseline acc | Rotation acc |
|---|---|---|
| 0° | ≈0.613 | — |
| 45° | 0.6140 | 0.5985 |
| 90° | 0.6129 | 0.4023 |
| 180° | 0.6133 | 0.1504 |
| 270° | 0.6130 | 0.3521 |
Segmentation accuracy is even more sensitive to rotation, and 180° rotation
is particularly destructive. Many points, especially on the seat/back interface,
flip to incorrect labels. In the corresponding GIFs
(gt_rot*.gif vs pred_rot*.gif) you can see entire
parts miscolored when the chair is upside-down or sideways.
Overall, both tasks are robust to moderate point subsampling but quite sensitive to large unseen rotations. This matches the design of PointNet, which is permutation-invariant but not inherently rotation-invariant.