HW 5¶
Q1. Classification Model (40 points)¶
Correct¶
| chair | vase | lamp |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Incorrect¶
| gt:chair, pred:lamp | gt:vase, pred:lamp | gt:lamp, pred:vase |
|---|---|---|
| ![]() | ![]() | ![]() |
The misclassified chair was folded. The misclassified vase was probably taken for a lamp because its flower was mistaken for the arm of a desk lamp. The misclassified lamp had an ambiguous shape.
Q3. Robustness Analysis (20 points)¶
Classification¶
Rotations¶
Procedure: Rotate each test point cloud by 0/45/90 degrees around vec(0,0,1); 0 degrees is the Q1 reference.
Interpretation: PointNet learns from absolute global coordinates, so accuracy drops quickly once the input is rotated.
| rotation | overall acc |
|---|---|
| 0, vec(0,0,1) | 0.974 |
| 45, vec(0,0,1) | 0.512 |
| 90, vec(0,0,1) | 0.205 |
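The rotation used in this procedure can be sketched as follows. This is only a minimal illustration, assuming point clouds are stored as NumPy arrays of shape (N, 3) and that the rotation is counter-clockwise about the z-axis; the helper name `rotate_about_z` is hypothetical and not taken from the homework code.

```python
import numpy as np

def rotate_about_z(points, degrees):
    """Rotate an (N, 3) point cloud about the axis vec(0,0,1)."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c,  -s,  0.0],
                  [s,   c,  0.0],
                  [0.0, 0.0, 1.0]])
    # Points are row vectors, so the rotation matrix is applied as its transpose.
    return points @ R.T

points = np.array([[1.0, 0.0, 0.5],
                   [0.0, 2.0, -1.0]])
for angle in (0, 45, 90):
    print(angle, np.round(rotate_about_z(points, angle), 3))
```

The rotated clouds would then be passed to the trained model unchanged, so any accuracy drop comes from the orientation alone.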
0¶
| Rotation | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 0, vec(0,0,1) | ![]() | chair | chair |
| 45, vec(0,0,1) | ![]() | chair | chair |
| 90, vec(0,0,1) | ![]() | chair | lamp |
1¶
| Rotation | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 0, vec(0,0,1) | ![]() | chair | chair |
| 45, vec(0,0,1) | ![]() | chair | lamp |
| 90, vec(0,0,1) | ![]() | chair | lamp |
617¶
| Rotation | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 0, vec(0,0,1) | ![]() | vase | vase |
| 45, vec(0,0,1) | ![]() | vase | vase |
| 90, vec(0,0,1) | ![]() | vase | vase |
618¶
| Rotation | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 0, vec(0,0,1) | ![]() | vase | vase |
| 45, vec(0,0,1) | ![]() | vase | vase |
| 90, vec(0,0,1) | ![]() | vase | chair |
719¶
| Rotation | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 0, vec(0,0,1) | ![]() | lamp | lamp |
| 45, vec(0,0,1) | ![]() | lamp | lamp |
| 90, vec(0,0,1) | ![]() | lamp | vase |
720¶
| Rotation | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 0, vec(0,0,1) | ![]() | lamp | lamp |
| 45, vec(0,0,1) | ![]() | lamp | lamp |
| 90, vec(0,0,1) | ![]() | lamp | lamp |
Number of points¶
Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q1 reference.
Interpretation: Because of its global max pooling, PointNet stays fairly robust as long as there are around 100 points, which is enough to define the rough structure of the object so that its class can still be determined.
| num of points | overall acc |
|---|---|
| 10 | 0.654 |
| 100 | 0.942 |
| 1000 | 0.969 |
| 10000 | 0.974 |
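The subsampling step could look like the sketch below. It assumes uniform random sampling without replacement from an (N, 3) NumPy array; the report does not say which sampling scheme was used (taking the first N points would be another option), and the name `subsample` is hypothetical.

```python
import numpy as np

def subsample(points, num_points, seed=0):
    """Pick num_points distinct points uniformly at random from an (N, 3) cloud."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_points, replace=False)
    return points[idx]

# Stand-in cloud; a real test object would be loaded from the dataset instead.
cloud = np.random.default_rng(1).normal(size=(10000, 3))
for n in (10, 100, 1000, 10000):
    print(n, subsample(cloud, n).shape)
```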
0¶
| Num of points | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 10 | ![]() | chair | chair |
| 100 | ![]() | chair | chair |
| 1000 | ![]() | chair | chair |
| 10000 | ![]() | chair | chair |
1¶
| Num of points | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 10 | ![]() | chair | lamp |
| 100 | ![]() | chair | chair |
| 1000 | ![]() | chair | chair |
| 10000 | ![]() | chair | chair |
617¶
| Num of points | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 10 | ![]() | vase | lamp |
| 100 | ![]() | vase | vase |
| 1000 | ![]() | vase | vase |
| 10000 | ![]() | vase | vase |
618¶
| Num of points | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 10 | ![]() | vase | lamp |
| 100 | ![]() | vase | lamp |
| 1000 | ![]() | vase | vase |
| 10000 | ![]() | vase | vase |
719¶
| Num of points | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 10 | ![]() | lamp | lamp |
| 100 | ![]() | lamp | lamp |
| 1000 | ![]() | lamp | lamp |
| 10000 | ![]() | lamp | lamp |
720¶
| Num of points | Pointcloud | GT label | Predicted label |
|---|---|---|---|
| 10 | ![]() | lamp | lamp |
| 100 | ![]() | lamp | lamp |
| 1000 | ![]() | lamp | lamp |
| 10000 | ![]() | lamp | lamp |
Segmentation¶
Rotation¶
Procedure: Rotate each test point cloud by 0/45/90 degrees around vec(0,0,1); 0 degrees is the Q2 reference.
Interpretation: As in classification, PointNet is not rotation invariant, and performance decreases as test examples deviate further from the training orientations.
| rotation | overall acc |
|---|---|
| 0, vec(0,0,1) | 0.902 |
| 45, vec(0,0,1) | 0.598 |
| 90, vec(0,0,1) | 0.462 |
0¶
| rotation | gt | pred | acc |
|---|---|---|---|
| 0,vec(0,0,1) | ![]() | ![]() | 0.954 |
| 45,vec(0,0,1) | ![]() | ![]() | 0.766 |
| 90,vec(0,0,1) | ![]() | ![]() | 0.523 |
1¶
| rotation | gt | pred | acc |
|---|---|---|---|
| 0,vec(0,0,1) | ![]() | ![]() | 0.988 |
| 45,vec(0,0,1) | ![]() | ![]() | 0.602 |
| 90,vec(0,0,1) | ![]() | ![]() | 0.523 |
2¶
| rotation | gt | pred | acc |
|---|---|---|---|
| 0,vec(0,0,1) | ![]() | ![]() | 0.876 |
| 45,vec(0,0,1) | ![]() | ![]() | 0.551 |
| 90,vec(0,0,1) | ![]() | ![]() | 0.389 |
Number of points¶
Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q2 reference.
Interpretation: As in classification, PointNet stays fairly robust (>0.8 accuracy) as long as there are around 100 points.
| num of points | overall acc |
|---|---|
| 10 | 0.565 |
| 100 | 0.819 |
| 1000 | 0.896 |
| 10000 | 0.902 |
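For segmentation, the per-point labels have to be subsampled with the same indices as the points, and the accuracy is then taken over the remaining points. The sketch below assumes that "acc" means the fraction of points whose predicted part label matches the ground truth; the helper names, the random stand-in data, and the choice of 6 part classes are hypothetical.

```python
import numpy as np

def subsample_with_labels(points, labels, num_points, seed=0):
    """Subsample points and their per-point labels with the same indices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_points, replace=False)
    return points[idx], labels[idx]

def point_accuracy(pred, gt):
    """Fraction of points whose predicted label equals the ground truth."""
    return float(np.mean(pred == gt))

rng = np.random.default_rng(0)
points = rng.normal(size=(10000, 3))
labels = rng.integers(0, 6, size=10000)  # hypothetical: 6 part classes

pts10, lab10 = subsample_with_labels(points, labels, 10)
pred10 = lab10.copy()
pred10[:3] = (pred10[:3] + 1) % 6  # corrupt 3 of the 10 predictions
print(point_accuracy(pred10, lab10))
```

With only 10 points, the accuracy can only be a multiple of 0.1, which is consistent with per-object values such as 0.700 and 0.400 in the tables that follow.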
0¶
| num of points | gt | pred | acc |
|---|---|---|---|
| 10 | ![]() | ![]() | 0.700 |
| 100 | ![]() | ![]() | 0.920 |
| 1000 | ![]() | ![]() | 0.960 |
| 10000 | ![]() | ![]() | 0.954 |
1¶
| num of points | gt | pred | acc |
|---|---|---|---|
| 10 | ![]() | ![]() | 0.400 |
| 100 | ![]() | ![]() | 0.970 |
| 1000 | ![]() | ![]() | 0.988 |
| 10000 | ![]() | ![]() | 0.988 |
2¶
| num of points | gt | pred | acc |
|---|---|---|---|
| 10 | ![]() | ![]() | 0.700 |
| 100 | ![]() | ![]() | 0.920 |
| 1000 | ![]() | ![]() | 0.869 |
| 10000 | ![]() | ![]() | 0.876 |
Q4. Bonus Question - Locality (20 points)¶
Classification¶
Rotations¶
Procedure: Rotate each test point cloud by 0/45/90 degrees around vec(0,0,1); 0 degrees is the Q1 reference.
Interpretation: PointNet learns from absolute global coordinates, so accuracy drops quickly once the input is rotated. PointTransformer copes better with moderate rotation since it uses relative positions between neighbors, but it still degrades at 90 degrees, probably because vertical relationships are turned into horizontal ones, which are semantically different to the network.
| rotation | overall acc - PointNet | overall acc - PointTransformer |
|---|---|---|
| 0, vec(0,0,1) | 0.974 | 0.963 |
| 45, vec(0,0,1) | 0.512 | 0.855 |
| 90, vec(0,0,1) | 0.205 | 0.643 |
0¶
| Rotation | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 0, vec(0,0,1) | ![]() ![]() | chair | chair | chair |
| 45, vec(0,0,1) | ![]() ![]() | chair | chair | chair |
| 90, vec(0,0,1) | ![]() ![]() | chair | lamp | lamp |
1¶
| Rotation | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 0, vec(0,0,1) | ![]() ![]() | chair | chair | chair |
| 45, vec(0,0,1) | ![]() ![]() | chair | lamp | chair |
| 90, vec(0,0,1) | ![]() ![]() | chair | lamp | lamp |
617¶
| Rotation | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 0, vec(0,0,1) | ![]() ![]() | vase | vase | vase |
| 45, vec(0,0,1) | ![]() ![]() | vase | vase | lamp |
| 90, vec(0,0,1) | ![]() ![]() | vase | vase | chair |
618¶
| Rotation | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 0, vec(0,0,1) | ![]() ![]() | vase | vase | vase |
| 45, vec(0,0,1) | ![]() ![]() | vase | vase | lamp |
| 90, vec(0,0,1) | ![]() ![]() | vase | chair | chair |
719¶
| Rotation | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 0, vec(0,0,1) | ![]() ![]() | lamp | lamp | lamp |
| 45, vec(0,0,1) | ![]() ![]() | lamp | lamp | lamp |
| 90, vec(0,0,1) | ![]() ![]() | lamp | vase | chair |
720¶
| Rotation | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 0, vec(0,0,1) | ![]() ![]() | lamp | lamp | lamp |
| 45, vec(0,0,1) | ![]() ![]() | lamp | lamp | lamp |
| 90, vec(0,0,1) | ![]() ![]() | lamp | lamp | lamp |
Number of points¶
Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q1 reference.
Interpretation: Because of its global max pooling, PointNet stays fairly robust as long as there are around 100 points, which is enough to define the rough structure of the object so that its class can still be determined. PointTransformer fails catastrophically. The fixed $k=16$ neighborhood probably taught the network to treat that region as a tiny surface patch; under severe downsampling, the same $k=16$ neighbors span most or all of the object. Processing global geometry as if it were local yields garbage features.
| num of points | overall acc - PointNet | overall acc - PointTransformer |
|---|---|---|
| 10 | 0.654 | 0.362 |
| 100 | 0.942 | 0.169 |
| 1000 | 0.969 | 0.751 |
| 10000 | 0.974 | 0.965 |
0¶
| Num of points | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 10 | ![]() ![]() | chair | chair | chair |
| 100 | ![]() ![]() | chair | chair | vase |
| 1000 | ![]() ![]() | chair | chair | chair |
| 10000 | ![]() ![]() | chair | chair | chair |
1¶
| Num of points | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 10 | ![]() ![]() | chair | lamp | vase |
| 100 | ![]() ![]() | chair | chair | vase |
| 1000 | ![]() ![]() | chair | chair | chair |
| 10000 | ![]() ![]() | chair | chair | chair |
617¶
| Num of points | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 10 | ![]() ![]() | vase | lamp | vase |
| 100 | ![]() ![]() | vase | vase | vase |
| 1000 | ![]() ![]() | vase | vase | chair |
| 10000 | ![]() ![]() | vase | vase | vase |
618¶
| Num of points | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 10 | ![]() ![]() | vase | lamp | vase |
| 100 | ![]() ![]() | vase | lamp | vase |
| 1000 | ![]() ![]() | vase | vase | chair |
| 10000 | ![]() ![]() | vase | vase | vase |
719¶
| Num of points | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 10 | ![]() ![]() | lamp | lamp | vase |
| 100 | ![]() ![]() | lamp | lamp | vase |
| 1000 | ![]() ![]() | lamp | lamp | chair |
| 10000 | ![]() ![]() | lamp | lamp | lamp |
720¶
| Num of points | Pointcloud | GT label | Predicted label - PointNet | Predicted label - PointTransformer |
|---|---|---|---|---|
| 10 | ![]() ![]() | lamp | lamp | vase |
| 100 | ![]() ![]() | lamp | lamp | lamp |
| 1000 | ![]() ![]() | lamp | lamp | chair |
| 10000 | ![]() ![]() | lamp | lamp | lamp |
Segmentation¶
Rotation¶
Procedure: Rotate each test point cloud by 0/45/90 degrees around vec(0,0,1); 0 degrees is the Q2 reference.
Interpretation: Similar to classification. PointTransformer performs better at 45 degrees (0.788 vs. 0.598), probably due to its relative position encoding. Both fail at 90 degrees because neither is truly rotation invariant.
| rotation | overall acc - PointNet | overall acc - PointTransformer |
|---|---|---|
| 0, vec(0,0,1) | 0.902 | 0.927 |
| 45, vec(0,0,1) | 0.598 | 0.788 |
| 90, vec(0,0,1) | 0.462 | 0.456 |
0¶
| rotation | gt | pred - PointNet | acc - PointNet | pred - PointTransformer | acc - PointTransformer |
|---|---|---|---|---|---|
| 0,vec(0,0,1) | ![]() | ![]() | 0.954 | ![]() | 0.966 |
| 45,vec(0,0,1) | ![]() | ![]() | 0.766 | ![]() | 0.808 |
| 90,vec(0,0,1) | ![]() | ![]() | 0.523 | ![]() | 0.439 |
1¶
| rotation | gt | pred - PointNet | acc - PointNet | pred - PointTransformer | acc - PointTransformer |
|---|---|---|---|---|---|
| 0,vec(0,0,1) | ![]() | ![]() | 0.988 | ![]() | 0.994 |
| 45,vec(0,0,1) | ![]() | ![]() | 0.602 | ![]() | 0.701 |
| 90,vec(0,0,1) | ![]() | ![]() | 0.523 | ![]() | 0.256 |
2¶
| rotation | gt | pred - PointNet | acc - PointNet | pred - PointTransformer | acc - PointTransformer |
|---|---|---|---|---|---|
| 0,vec(0,0,1) | ![]() | ![]() | 0.876 | ![]() | 0.876 |
| 45,vec(0,0,1) | ![]() | ![]() | 0.551 | ![]() | 0.636 |
| 90,vec(0,0,1) | ![]() | ![]() | 0.389 | ![]() | 0.312 |
Number of points¶
Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q2 reference.
Interpretation: As in classification, PointNet stays fairly robust (>0.8 accuracy) as long as there are around 100 points to recover the rough skeleton of the object, thanks to its global max pooling. PointTransformer collapses at low density, probably because the aggressive internal downsampling in the encoder stages leaves the deeper layers with too few points, turning their features into noise.
| num of points | overall acc - PointNet | overall acc - PointTransformer |
|---|---|---|
| 10 | 0.565 | 0.047 |
| 100 | 0.819 | 0.194 |
| 1000 | 0.896 | 0.664 |
| 10000 | 0.902 | 0.926 |
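The "running out of points" argument can be made concrete by counting how many points survive each encoder stage. The sketch below assumes four downsampling stages that each keep one point in four, and $k=16$ neighbors; these numbers are assumptions for illustration, since the report does not state the actual encoder configuration, and `points_per_stage` is a hypothetical helper.

```python
def points_per_stage(num_points, stride=4, num_stages=4):
    """Point count at the input and after each downsampling stage (assumed stride)."""
    counts = [num_points]
    for _ in range(num_stages):
        # Keep one point in `stride`, but never drop below a single point.
        counts.append(max(1, counts[-1] // stride))
    return counts

K = 16  # assumed neighborhood size
for n in (10, 100, 1000, 10000):
    counts = points_per_stage(n)
    starved = [c for c in counts if c <= K]
    print(n, counts, "stages with <= k points:", len(starved))
```

Under these assumptions, a 10000-point input still has 39 points at the deepest stage, while a 100-point input is down to a single point after three stages, so most neighborhoods there contain the same few points.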
0¶
| num of points | gt | pred - PointNet | acc - PointNet | pred - PointTransformer | acc - PointTransformer |
|---|---|---|---|---|---|
| 10 | ![]() | ![]() | 0.700 | ![]() | 0.000 |
| 100 | ![]() | ![]() | 0.920 | ![]() | 0.510 |
| 1000 | ![]() | ![]() | 0.960 | ![]() | 0.724 |
| 10000 | ![]() | ![]() | 0.954 | ![]() | 0.963 |
1¶
| num of points | gt | pred - PointNet | acc - PointNet | pred - PointTransformer | acc - PointTransformer |
|---|---|---|---|---|---|
| 10 | ![]() | ![]() | 0.400 | ![]() | 0.100 |
| 100 | ![]() | ![]() | 0.970 | ![]() | 0.150 |
| 1000 | ![]() | ![]() | 0.988 | ![]() | 0.196 |
| 10000 | ![]() | ![]() | 0.988 | ![]() | 0.994 |
2¶
| num of points | gt | pred - PointNet | acc - PointNet | pred - PointTransformer | acc - PointTransformer |
|---|---|---|---|---|---|
| 10 | ![]() | ![]() | 0.700 | ![]() | 0.000 |
| 100 | ![]() | ![]() | 0.920 | ![]() | 0.140 |
| 1000 | ![]() | ![]() | 0.869 | ![]() | 0.548 |
| 10000 | ![]() | ![]() | 0.876 | ![]() | 0.880 |