Author: Minghao Xu — Andrew ID: mxu3
Model: PointNet-like classifier (custom lightweight implementation)
Training: 30 epochs, batch size 16, Adam optimizer (lr=0.001)
Best test accuracy: 0.9738 (97.38%) — checkpoint: ./checkpoints/cls/best_model.pt
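For reference, a minimal sketch of a PointNet-style classifier consistent with the description above (a shared per-point MLP followed by symmetric max-pooling and a small classification head); the layer widths and class count are illustrative assumptions, not the exact configuration used:

```python
import torch
import torch.nn as nn

class PointNetClassifier(nn.Module):
    """Sketch of a PointNet-style classifier: shared per-point MLP,
    order-invariant max-pooling for a global feature, then an MLP head.
    Layer widths and num_classes are illustrative assumptions."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # Shared MLP applied independently to every point: (B, N, 3) -> (B, N, 1024)
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3)
        feats = self.point_mlp(points)          # (B, N, 1024)
        global_feat = feats.max(dim=1).values   # symmetric pooling -> (B, 1024)
        return self.head(global_feat)           # (B, num_classes) logits
```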
The implemented PointNet-like model reaches high accuracy (~97.4%) after 30 epochs. Visual inspection of misclassified examples points to common causes: silhouettes shared across classes (e.g., vase vs. lamp), sparse sampling in some views, and high intra-class shape variability. Robustness could be improved with data augmentation (random rotations, jitter), model ensembles, or architectures with stronger local features (PointNet++, DGCNN); a sketch of such an augmentation follows.
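A hedged sketch of the augmentation idea (random z-rotation plus Gaussian jitter, applied per training batch); the angle range and noise scale are illustrative assumptions:

```python
import torch

def augment(points: torch.Tensor, jitter_std: float = 0.01) -> torch.Tensor:
    """Random z-rotation plus Gaussian jitter for a batch of point clouds.
    points: (B, N, 3); jitter_std is an illustrative assumption."""
    B = points.shape[0]
    theta = torch.rand(B, device=points.device) * 2 * torch.pi  # one angle per cloud
    c, s = torch.cos(theta), torch.sin(theta)
    # Batched rotation matrices about the vertical (z) axis: (B, 3, 3)
    rot = torch.zeros(B, 3, 3, device=points.device)
    rot[:, 0, 0], rot[:, 0, 1] = c, -s
    rot[:, 1, 0], rot[:, 1, 1] = s, c
    rot[:, 2, 2] = 1.0
    rotated = torch.bmm(points, rot.transpose(1, 2))  # (B, N, 3)
    return rotated + jitter_std * torch.randn_like(rotated)
```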
Model: PointNet-style segmentation network (shared MLP + global feature concatenation)
Training: 30 epochs, batch size 8, Adam optimizer (lr=0.001)
Overall test accuracy: 0.8830 (88.30%) — checkpoint: ./checkpoints/seg/best_model.pt
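A minimal sketch of the "shared MLP + global feature concatenation" design named above: the max-pooled global feature is tiled and concatenated onto each point's local feature before a shared per-point head predicts a part label. The widths and part count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Sketch of the segmentation variant: per-point local features are
    concatenated with the tiled global feature, then a shared MLP predicts
    a part label per point. Widths and num_parts are assumptions."""
    def __init__(self, num_parts: int = 6):
        super().__init__()
        self.local_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Linear(128, 1024), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(
            nn.Linear(128 + 1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_parts),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3)
        local = self.local_mlp(points)                           # (B, N, 128)
        global_feat = self.global_mlp(local).max(dim=1).values   # (B, 1024)
        tiled = global_feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.seg_head(torch.cat([local, tiled], dim=-1))  # (B, N, num_parts)
```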
Below are five example objects (the first two are low-accuracy failures; the remaining three are high-accuracy examples). For each object we show the ground-truth segmentation and the model prediction (animated GIFs), with per-object accuracy reported under each pair.
The segmentation model achieves reasonably high overall point-wise accuracy (~88%). The two low-accuracy examples show common failure modes: heavy class imbalance across parts (small semantic parts are easily missed), and ambiguous local geometry where the model confuses adjacent part labels. The high-accuracy examples demonstrate the model's ability to correctly capture both global shape and local part boundaries in many cases.
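The per-object accuracy reported under each pair can be computed with a small helper like this (the tensor shapes are assumptions about the evaluation code):

```python
import torch

def per_object_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Point-wise accuracy per object, as reported under each GIF pair.
    logits: (B, N, num_parts); labels: (B, N). Returns (B,) accuracies."""
    pred = logits.argmax(dim=-1)               # (B, N) predicted part labels
    return (pred == labels).float().mean(dim=1)
```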
I ran two experiments to probe robustness for both tasks: (A) Vary the number of input points; (B) Rotate point clouds around the vertical (z) axis. Below I present compact, side-by-side GIF comparisons arranged in tables so each variant is directly comparable to the original.
Procedure: for each test object we subsample the point cloud to 10000, 2048, 1024, or 512 points (with a fixed random seed) and measure test accuracy. The table below shows classification and segmentation accuracy; for segmentation we also show one example object (idx=0) with ground-truth and prediction GIFs side by side. A sketch of the subsampling step is given after the table.
| Num Points | Classification Acc | Segmentation Acc | Seg GT (idx=0) | Seg Pred (idx=0) |
|---|---|---|---|---|
| 10000 | 0.9738 | 0.8830 | (GIF) | (GIF) |
| 2048 | 0.9685 | 0.8818 | (GIF) | (GIF) |
| 1024 | 0.9706 | 0.8791 | (GIF) | (GIF) |
| 512 | 0.9601 | 0.8719 | (GIF) | (GIF) |
Reducing the point count from 10k down to a few hundred only slightly degrades performance on both tasks (see the table above). The GIFs illustrate that small local details can be missed at lower sampling densities, while global shape cues are still often preserved.
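The seeded subsampling referenced in the procedure can be sketched as follows (the tensor shapes and seed value are assumptions about the evaluation code):

```python
import torch

def subsample(points: torch.Tensor, num_points: int, seed: int = 0) -> torch.Tensor:
    """Deterministically subsample a batch of clouds (B, N, 3) to num_points.
    The same seeded permutation is reused across the batch for reproducibility."""
    gen = torch.Generator().manual_seed(seed)
    idx = torch.randperm(points.shape[1], generator=gen)[:num_points]
    return points[:, idx, :]
```

Each row in the table then corresponds to re-running the standard evaluation loop on `subsample(test_points, n)` for n in {10000, 2048, 1024, 512}.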
Procedure: we rotate the test point clouds about the z axis by 0°, 15°, 30°, 45°, and 60° and measure accuracy. Each row below reports the numeric accuracy and shows one example object (idx=0) with rotated ground-truth and prediction GIFs side by side; the GIFs are aligned on the same canvas size, so rotation-only effects are visible without apparent shape-scaling artifacts.
| Angle | Classification Acc | Segmentation Acc | Seg GT (idx=0) | Seg Pred (idx=0) |
|---|---|---|---|---|
| 0° | 0.9748 | 0.8826 | (GIF) | (GIF) |
| 15° | 0.9622 | 0.8701 | (GIF) | (GIF) |
| 30° | 0.9202 | 0.7844 | (GIF) | (GIF) |
| 45° | 0.7261 | 0.7086 | (GIF) | (GIF) |
| 60° | 0.6211 | 0.6370 | (GIF) | (GIF) |
Accuracy degrades mildly up to 15°, then falls off steeply from 30° onward for both tasks, consistent with architectures that have no built-in rotation invariance. As noted above, the GIFs are rendered on fixed-size canvases with axis limits kept consistent across variants, so the animations show orientation changes only, without perceived shape scaling.
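The fixed-angle rotation used in this experiment can be sketched as follows (a minimal helper; the function name is mine):

```python
import math
import torch

def rotate_z(points: torch.Tensor, angle_deg: float) -> torch.Tensor:
    """Rotate a batch of point clouds (B, N, 3) about the vertical (z) axis."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]], dtype=points.dtype, device=points.device)
    return points @ rot.T  # row-vector convention: x' = x R^T
```

Evaluation then loops over the five angles, applying `rotate_z` to the test set before the forward pass.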
Model: DGCNN-style EdgeConv (local neighborhood feature aggregation via KNN)
Training (classification): 40 epochs, batch size 16, subsampled to 2048 points for graph construction. Best test accuracy: 0.9832 — checkpoint: ./checkpoints/cls/dgcnn/best_model.pt
Training (segmentation): 30 epochs, batch size 8, subsampled to 2048 points. Segmentation test accuracy (DGCNN): 0.9135 — checkpoint: ./checkpoints/seg/dgcnn/best_model.pt
Below are five example objects from the test set. Each row shows the ground-truth animated view (top) and the DGCNN prediction (bottom).
DGCNN improves both tasks: classification rose from ~97.38% to ~98.32%, and point-wise segmentation accuracy improved from ~88.30% to ~91.35%. Visual comparisons show DGCNN resolving local part boundaries more cleanly in many cases, thanks to EdgeConv-style local neighborhood aggregation. However, DGCNN requires subsampling to 2048 points to keep KNN graph construction feasible, a trade-off between local detail and memory/compute.
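For reference, a minimal sketch of the EdgeConv operation at the core of DGCNN: a k-nearest-neighbor graph is built in feature space, each edge is described by the feature `[x_i, x_j - x_i]`, a shared MLP transforms it, and edge features are max-aggregated per point. The value of k and the layer width are illustrative assumptions:

```python
import torch
import torch.nn as nn

def knn(x: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of each point. x: (B, N, C)."""
    dist = torch.cdist(x, x)                                  # (B, N, N) pairwise distances
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self -> (B, N, k)

class EdgeConv(nn.Module):
    """Sketch of DGCNN's EdgeConv: a shared MLP on edge features
    [x_i, x_j - x_i], max-aggregated over each point's k neighbors."""
    def __init__(self, in_dim: int, out_dim: int, k: int = 20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        idx = knn(x, self.k)                                   # (B, N, k)
        neighbors = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C), dim=2,
            index=idx.unsqueeze(-1).expand(B, N, self.k, C))   # (B, N, k, C)
        center = x.unsqueeze(2).expand(B, N, self.k, C)
        edge = torch.cat([center, neighbors - center], dim=-1) # (B, N, k, 2C)
        return self.mlp(edge).max(dim=2).values                # (B, N, out_dim)
```

Because the graph is rebuilt from the current features at each layer, memory scales with the (B, N, N) distance matrix, which is why the point clouds are subsampled to 2048 points before graph construction.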