Visualization of random test point clouds with predicted classes:
| Point Cloud 1 | Point Cloud 2 | Point Cloud 3 |
|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp |
Visualization of failure predictions for each class with interpretation:
| Failure Case |
|---|
| Predicted: Lamp |

| Failure Case |
|---|
| Predicted: Vase |
Interpretation: The model sometimes confuses lamps and vases because the two classes share rounded, hollow shapes. In this case, the lamp's rounded base closely resembles a typical vase silhouette.
Visualization of segmentation results for at least 5 objects (including 2 bad predictions) with corresponding ground truth:
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 95.6%
Interpretation: The model performs very well on this chair, correctly identifying and separating parts such as the legs, seat, and back.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 52.61%
Interpretation: The model has trouble telling where the base ends and the main body begins on this round object. The boundary between these parts is blurry, which leads to mistakes.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 79.39%
Interpretation: The model performs well overall but makes mistakes where the seat and backrest meet; points in that transition region look alike, so the model confuses the two parts.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 42.46%
Interpretation: This is a failure case. The model can't tell the difference between the seat and the legs, so it incorrectly labels many leg points as part of the seat.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 98.03%
Interpretation: The overall accuracy is very high, but if you look closely, there are small mistakes where the lamp shade connects to the base. The model has trouble with thin connecting parts.
Procedure: We rotated the input point clouds by 15 degrees and evaluated the model's performance on both classification and segmentation tasks to test robustness to geometric transformations.
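A minimal NumPy sketch of the perturbation used in this test, assuming the rotation is applied about the z-axis (the axis choice and helper name here are illustrative, not taken from the actual evaluation script):

```python
import numpy as np

def rotate_z(points: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud about the z-axis by `degrees`."""
    theta = np.radians(degrees)
    c, s = np.cos(theta), np.sin(theta)
    # Standard 3x3 rotation matrix about z, applied to row-vector points.
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

# Rotate a cloud by 15 degrees before feeding it to the model.
cloud = np.random.default_rng(0).random((10000, 3))
rotated = rotate_z(cloud, 15.0)
```

Because rotation is rigid, point norms (and all pairwise distances) are preserved; only the orientation the model sees changes.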
| Sample 1 | Sample 2 | Sample 3 |
|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp |
| Object 66 | Object 92 | Object 351 |
|---|---|---|
| Ground Truth<br>Prediction | Ground Truth<br>Prediction | Ground Truth<br>Prediction |
Interpretation: The model stays accurate even when objects are rotated by 15 degrees. It still correctly identifies and segments objects, showing that it learned features that don't change much with small rotations.
Procedure: We evaluated the model's performance with a different number of points per object (5000 points instead of the default 10000) to test robustness to point cloud density variations.
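A quick sketch of the density reduction used here, randomly keeping 5000 of the 10000 points without replacement (the helper name is illustrative):

```python
import numpy as np

def subsample(points: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Randomly keep n of the original points, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]

cloud = np.random.default_rng(0).random((10000, 3))
sparse = subsample(cloud, 5000)  # 5000 points instead of the default 10000
```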
| Sample 1 | Sample 2 | Sample 3 |
|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp |
| Object 66 | Object 92 | Object 351 |
|---|---|---|
| Ground Truth<br>Prediction | Ground Truth<br>Prediction | Ground Truth<br>Prediction |
Interpretation: The model performs essentially as well with 5000 points as with the default 10000, still classifying and segmenting objects correctly, because PointNet summarizes the cloud through its most salient features regardless of how many points are sampled. The primary reason for this sampling robustness is the combination of global max pooling and shared per-point weights: the global descriptor depends only on the strongest per-point activations, so removing points rarely changes it. The partial rotation robustness, in turn, comes from the T-Net, which predicts an alignment transform that normalizes the input's orientation before feature extraction.
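As a toy illustration of why a shared per-point transform followed by a symmetric max pooling yields an order-independent (and largely sampling-robust) global feature, here is a minimal NumPy sketch; the weights and dimensions below are made up for illustration and are not the actual network's:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Shared weights": one weight matrix applied identically to every point.
W = rng.standard_normal((3, 64))

def global_feature(points: np.ndarray) -> np.ndarray:
    """Per-point features (shared weights + ReLU), then symmetric max pooling."""
    per_point = np.maximum(points @ W, 0.0)
    return per_point.max(axis=0)  # max over points: order-independent

cloud = rng.standard_normal((1024, 3))
shuffled = cloud[rng.permutation(1024)]
```

Shuffling the points permutes the rows of the per-point feature matrix, but the column-wise max is unchanged, so the global feature is identical.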
Implemented Model: PointNet++
Architecture Details: PointNet++ works by dividing the point cloud into smaller groups, then applying PointNet to each group to understand local patterns. This process is repeated at different scales to capture both local details and overall structure.
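A rough NumPy sketch of the grouping step described above: farthest point sampling picks well-spread centroids, and a ball query collects each centroid's local neighbourhood, to which a small PointNet is then applied. The radius, counts, and function names are illustrative, not the values used in this implementation:

```python
import numpy as np

def farthest_point_sample(points: np.ndarray, k: int) -> np.ndarray:
    """Pick k well-spread centroid indices (the sampling step of a
    PointNet++ set-abstraction layer). Starts from point 0."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)
    for i in range(1, k):
        # Distance of every point to the nearest already-chosen centroid.
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i - 1]], axis=1))
        chosen[i] = int(dist.argmax())  # farthest point joins the set
    return chosen

def ball_group(points: np.ndarray, centroids: np.ndarray, radius: float):
    """Ball query: indices of all points within `radius` of each centroid."""
    d = np.linalg.norm(points[None, :, :] - points[centroids][:, None, :], axis=2)
    return [np.nonzero(row <= radius)[0] for row in d]

cloud = np.random.default_rng(1).random((2048, 3))
centers = farthest_point_sample(cloud, 128)
groups = ball_group(cloud, centers, radius=0.2)
```

Repeating this sample-group-PointNet pattern with progressively fewer centroids and larger radii is what lets PointNet++ capture local detail at the fine levels and overall structure at the coarse ones.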
| Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
|---|---|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp | Predicted: Lamp | Predicted: Lamp |
Comparison: PointNet++ is better at understanding local details, which helps it distinguish between different object parts. However, it still has trouble with objects that look very similar (like lamps and vases), just like the baseline model. Overall, it's more confident when objects have clearly different shapes.
Comparison of PointNet++ (with locality) vs PointNet (baseline) for all available objects:
| Object | Ground Truth | Prediction (PointNet++) | Baseline Prediction (PointNet) |
|---|---|---|---|
| Object 92 | ![]() | ![]() | ![]() |
| Object 351 | ![]() | ![]() | ![]() |
| Object 402 | ![]() | ![]() | ![]() |
| Object 426 | ![]() | ![]() | ![]() |
| Object 512 | ![]() | ![]() | ![]() |
Comparison: PointNet++ creates much cleaner boundaries between different parts compared to the baseline PointNet (see Object 426 for example). Because it learns from local patterns, it's better at figuring out where one part ends and another begins, even in tricky areas.