Classification test accuracy: 0.9790
| Image | Predicted class | Ground truth class |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Chair |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
Failure cases: the last three rows show misclassified examples.
Segmentation test accuracy: 0.9044
| ID | GT | Pred | Accuracy |
|---|---|---|---|
| 0 | ![]() | ![]() | 0.9601 |
| 1 | ![]() | ![]() | 0.9876 |
| 2 | ![]() | ![]() | 0.9102 |
| 20 | ![]() | ![]() | 0.9816 |
| 200 | ![]() | ![]() | 0.7496 |
| 300 | ![]() | ![]() | 0.9582 |
| 500 | ![]() | ![]() | 0.7669 |
Bad cases: objects 200 and 500 have noticeably lower segmentation accuracy (0.7496 and 0.7669) than the others.
Procedure: the test point clouds are rotated by the same angle around all three axes, with the angle increasing from 0.0 to 0.6 rad.
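A minimal sketch of how such a rotation could be applied (assuming (N, 3) NumPy arrays; the helper names are mine, not taken from the training code):

```python
import numpy as np

def rotation_matrix(angle: float) -> np.ndarray:
    """Rotation by the same angle around the x, y, and z axes, composed as Rz @ Ry @ Rx."""
    c, s = np.cos(angle), np.sin(angle)
    rx = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    ry = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return rz @ ry @ rx

def rotate_point_cloud(points: np.ndarray, angle: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud by `angle` radians around all three axes."""
    return points @ rotation_matrix(angle).T
```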
| Angle (rad) | Classification accuracy | Segmentation accuracy |
|---|---|---|
| 0.0 | 0.9790 | 0.9044 |
| 0.1 | 0.9654 | 0.8722 |
| 0.3 | 0.8342 | 0.7320 |
| 0.6 | 0.2455 | 0.4922 |
Classification visualization at 0.6 rad:
| Image | Predicted class | Ground truth class |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Chair |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
Segmentation visualization at 0.6 rad:
| ID | GT (unrotated) | Pred (unrotated) | Pred (rotated) | Accuracy (unrotated) | Accuracy (rotated) |
|---|---|---|---|---|---|
| 0 | ![]() | ![]() | ![]() | 0.9601 | 0.5481 |
| 1 | ![]() | ![]() | ![]() | 0.9876 | 0.6355 |
| 2 | ![]() | ![]() | ![]() | 0.9102 | 0.3823 |
| 20 | ![]() | ![]() | ![]() | 0.9816 | 0.5734 |
| 200 | ![]() | ![]() | ![]() | 0.7496 | 0.4502 |
| 300 | ![]() | ![]() | ![]() | 0.9582 | 0.5124 |
| 500 | ![]() | ![]() | ![]() | 0.7669 | 0.6411 |
As expected, accuracy drops for both classification and segmentation as the rotation grows larger. This is because the model has no built-in rotation invariance (the T-Net was removed) while the training data contain only upright poses.
The segmentation visualizations reveal something interesting under rotation: the predicted segmentation looks as if the object had not been rotated at all. This suggests that the model learned a direct mapping from upright-pose coordinates to part labels rather than the true semantic parts of the objects.
Procedure: I evaluated five different numbers of points per cloud, from 10,000 (the original sampling) down to 100.
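A minimal sketch of how each evaluation cloud could be reduced to a fixed number of points (assuming uniform random sampling without replacement; the function name is hypothetical):

```python
import numpy as np
from typing import Optional

def subsample_point_cloud(points: np.ndarray, num_points: int,
                          labels: Optional[np.ndarray] = None):
    """Keep a random subset of `num_points` points from an (N, 3) cloud,
    selecting the matching per-point labels when they are provided."""
    idx = np.random.choice(points.shape[0], num_points, replace=False)
    if labels is None:
        return points[idx]
    return points[idx], labels[idx]
```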
| Number of points | Classification accuracy | Segmentation accuracy |
|---|---|---|
| 10000 | 0.9790 | 0.9044 |
| 5000 | 0.9769 | 0.9042 |
| 1000 | 0.9748 | 0.8943 |
| 500 | 0.9664 | 0.8731 |
| 100 | 0.8909 | 0.8145 |
Classification visualization at 100 points:
| Image | Predicted class | Ground truth class |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Chair |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
Segmentation visualization at 100 points:
| ID | GT (10k) | Pred (10k) | Pred (100) | Accuracy (10k) | Accuracy (100) |
|---|---|---|---|---|---|
| 0 | ![]() | ![]() | ![]() | 0.9601 | 0.9 |
| 1 | ![]() | ![]() | ![]() | 0.9876 | 0.99 |
| 2 | ![]() | ![]() | ![]() | 0.9102 | 0.89 |
| 20 | ![]() | ![]() | ![]() | 0.9816 | 0.96 |
| 200 | ![]() | ![]() | ![]() | 0.7496 | 0.64 |
| 300 | ![]() | ![]() | ![]() | 0.9582 | 0.96 |
| 500 | ![]() | ![]() | ![]() | 0.7669 | 0.8 |
The results show that the model is fairly robust to the number of points for both tasks, even though accuracy generally drops as the point count decreases. The degradation comes from important details of the shape disappearing when fewer points are sampled; as long as enough detail remains, the model largely holds its quality. This robustness likely comes from the pooling operation, which makes the model focus on the global shape rather than on individual points.
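To illustrate, here is a minimal PyTorch-style sketch of a shared per-point MLP followed by a max pool (an assumed architecture for illustration, not the exact model evaluated here); the pooled global feature has the same size whether 10,000 or 100 points are fed in:

```python
import torch
import torch.nn as nn

class PointFeatureEncoder(nn.Module):
    """Shared per-point MLP followed by a max pool over the point dimension."""

    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        per_point = self.mlp(points)          # (B, N, 3) -> (B, N, feat_dim)
        return per_point.max(dim=1).values    # pool over the N points -> (B, feat_dim)

enc = PointFeatureEncoder()
print(enc(torch.randn(2, 10000, 3)).shape)  # torch.Size([2, 1024])
print(enc(torch.randn(2, 100, 3)).shape)    # torch.Size([2, 1024])
```

As long as the surviving points still cover the distinctive parts of the shape, the pooled feature changes little, which matches the gradual accuracy drop in the table above.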