HW 5¶

Q1. Classification Model (40 points)¶

Correct¶

chair · vase · lamp (point cloud renderings omitted)

Incorrect¶

gt: chair, pred: lamp · gt: vase, pred: lamp · gt: lamp, pred: vase (point cloud renderings omitted)

The misclassified chair was folded. The misclassified vase was likely taken for a lamp because its flower resembles the arm of a desk lamp. The misclassified lamp had an ambiguous shape.

Q2. Segmentation Model (40 points)¶

Good¶

Per-example accuracy (GT and predicted segmentation renderings omitted): 0.954, 0.988, 0.876

Bad¶

Per-example accuracy (GT and predicted segmentation renderings omitted): 0.544, 0.522, 0.595, 0.548

Q3. Robustness Analysis (20 points)¶

Classification¶

Rotations¶

Procedure: Rotate the input by 0/45/90 degrees around the axis (0, 0, 1), with 0 degrees serving as the Q1 reference.
Interpretation: PointNet operates on absolute global coordinates, so its accuracy degrades quickly under rotation.
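
A minimal sketch of the rotation applied here, assuming each point cloud is an (N, 3) torch tensor (the function name and evaluation loop are illustrative, not the actual assignment code):

```python
import math
import torch

def rotate_z(points: torch.Tensor, degrees: float) -> torch.Tensor:
    """Rotate an (N, 3) point cloud by `degrees` about the z-axis (0, 0, 1)."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]], dtype=points.dtype)
    return points @ rot.T

# hypothetical evaluation loop over the three settings:
# for deg in (0, 45, 90):
#     acc = evaluate(model, rotate_z(test_points, deg), test_labels)
```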

rotation overall acc
0, vec(0,0,1) 0.974
45, vec(0,0,1) 0.512
90, vec(0,0,1) 0.205
0¶
Rotation GT label Predicted label
0, vec(0,0,1) chair chair
45, vec(0,0,1) chair chair
90, vec(0,0,1) chair lamp
1¶
Rotation GT label Predicted label
0, vec(0,0,1) chair chair
45, vec(0,0,1) chair lamp
90, vec(0,0,1) chair lamp
617¶
Rotation GT label Predicted label
0, vec(0,0,1) vase vase
45, vec(0,0,1) vase vase
90, vec(0,0,1) vase vase
618¶
Rotation GT label Predicted label
0, vec(0,0,1) vase vase
45, vec(0,0,1) vase vase
90, vec(0,0,1) vase chair
719¶
Rotation GT label Predicted label
0, vec(0,0,1) lamp lamp
45, vec(0,0,1) lamp lamp
90, vec(0,0,1) lamp vase
720¶
Rotation GT label Predicted label
0, vec(0,0,1) lamp lamp
45, vec(0,0,1) lamp lamp
90, vec(0,0,1) lamp lamp

Number of points¶

Procedure: Subsample each point cloud to 10/100/1000/10000 points, with 10000 points serving as the Q1 reference.
Interpretation: Thanks to its global max pooling, PointNet stays fairly robust as long as roughly 100 points remain, which is enough to define the rough structure of the object and hence its class.
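
A short sketch of the subsampling step, together with the global max pooling the interpretation relies on; tensor shapes and names are illustrative assumptions, not the assignment code:

```python
import torch

def subsample(points: torch.Tensor, n: int) -> torch.Tensor:
    """Randomly keep n of the points in an (N, 3) cloud, without replacement."""
    idx = torch.randperm(points.shape[0])[:n]
    return points[idx]

# PointNet's global feature is a max over per-point features, (B, N, C) -> (B, C):
# global_feat = per_point_features.max(dim=1).values
# Removing points only removes candidates from this max, so the pooled feature
# stays similar as long as the surviving points still outline the object.
```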

num of points overall acc
10 0.654
100 0.942
1000 0.969
10000 0.974
0¶
Num of points GT label Predicted label
10 chair chair
100 chair chair
1000 chair chair
10000 chair chair
1¶
Num of points GT label Predicted label
10 chair lamp
100 chair chair
1000 chair chair
10000 chair chair
617¶
Num of points GT label Predicted label
10 vase lamp
100 vase vase
1000 vase vase
10000 vase vase
618¶
Num of points GT label Predicted label
10 vase lamp
100 vase lamp
1000 vase vase
10000 vase vase
719¶
Num of points GT label Predicted label
10 lamp lamp
100 lamp lamp
1000 lamp lamp
10000 lamp lamp
720¶
Num of points GT label Predicted label
10 lamp lamp
100 lamp lamp
1000 lamp lamp
10000 lamp lamp

Segmentation¶

Rotation¶

Procedure: Rotate the input by 0/45/90 degrees around the axis (0, 0, 1), with 0 degrees serving as the Q2 reference.
Interpretation: As in classification, PointNet is not rotationally invariant, and performance drops as test examples deviate further from the training orientations.
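
Assuming the per-example numbers below are plain per-point label accuracy, a minimal sketch of that metric (tensor names are assumptions):

```python
import torch

def seg_accuracy(pred_logits: torch.Tensor, gt_labels: torch.Tensor) -> float:
    """pred_logits: (N, num_seg_classes) per-point scores; gt_labels: (N,) integer labels."""
    pred = pred_logits.argmax(dim=-1)
    return (pred == gt_labels).float().mean().item()
```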

rotation overall acc
0, vec(0,0,1) 0.902
45, vec(0,0,1) 0.598
90, vec(0,0,1) 0.462
0¶
rotation acc
0,vec(0,0,1) 0.954
45,vec(0,0,1) 0.766
90,vec(0,0,1) 0.523
1¶
rotation acc
0,vec(0,0,1) 0.988
45,vec(0,0,1) 0.602
90,vec(0,0,1) 0.523
2¶
rotation acc
0,vec(0,0,1) 0.876
45,vec(0,0,1) 0.551
90,vec(0,0,1) 0.389

Number of points¶

Procedure: Subsample each point cloud to 10/100/1000/10000 points, with 10000 points serving as the Q2 reference.
Interpretation: As in classification, PointNet stays fairly robust (>0.8 accuracy) as long as roughly 100 points remain.

num of points overall acc
10 0.565
100 0.819
1000 0.896
10000 0.902
0¶
num of points acc
10 0.700
100 0.920
1000 0.960
10000 0.954
1¶
num of points acc
10 0.400
100 0.970
1000 0.988
10000 0.988
2¶
num of points acc
10 0.700
100 0.920
1000 0.869
10000 0.876

Q4. Bonus Question - Locality (20 points)¶

Classification¶

Rotations¶

Procedure: Rotate the input by 0/45/90 degrees around the axis (0, 0, 1), with 0 degrees serving as the Q1 reference.
Interpretation: PointNet operates on absolute global coordinates, so its accuracy degrades quickly under rotation. PointTransformer holds up under moderate rotation because it encodes relative positions between neighbors, but it still degrades at 90 degrees, where vertical relationships turn into horizontal ones that the network treats as semantically different.
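
The relative-position argument can be made concrete with a schematic Point Transformer-style vector attention layer; the layer sizes, names, and exact formulation below are assumptions for illustration, not the implementation used in this assignment. The key point is that attention depends on p_i - p_j, so global translations cancel, but a rotation still changes those relative offsets:

```python
import torch
import torch.nn as nn

class VectorAttention(nn.Module):
    """Schematic Point Transformer-style layer: attention over each point's k
    nearest neighbors, with a positional encoding of the offset p_i - p_j."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.pos_enc = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feats, pos, knn_idx):
        # feats: (N, C) point features, pos: (N, 3) coordinates,
        # knn_idx: (N, k) indices of each point's k nearest neighbors
        q = self.to_q(feats)                      # (N, C)
        k = self.to_k(feats)[knn_idx]             # (N, k, C)
        v = self.to_v(feats)[knn_idx]             # (N, k, C)
        rel = pos.unsqueeze(1) - pos[knn_idx]     # (N, k, 3): translation-invariant offsets
        delta = self.pos_enc(rel)                 # (N, k, C)
        attn = torch.softmax(self.attn_mlp(q.unsqueeze(1) - k + delta), dim=1)
        return (attn * (v + delta)).sum(dim=1)    # (N, C)
```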

rotation overall acc - PointNet overall acc - PointTransformer
0, vec(0,0,1) 0.974 0.963
45, vec(0,0,1) 0.512 0.855
90, vec(0,0,1) 0.205 0.643
0¶
Rotation GT label Predicted label - PointNet Predicted label - PointTransformer
0, vec(0,0,1) chair chair chair
45, vec(0,0,1) chair chair chair
90, vec(0,0,1) chair lamp lamp
1¶
Rotation GT label Predicted label - PointNet Predicted label - PointTransformer
0, vec(0,0,1) chair chair chair
45, vec(0,0,1) chair lamp chair
90, vec(0,0,1) chair lamp lamp
617¶
Rotation GT label Predicted label - PointNet Predicted label - PointTransformer
0, vec(0,0,1) vase vase vase
45, vec(0,0,1) vase vase lamp
90, vec(0,0,1) vase vase chair
618¶
Rotation GT label Predicted label - PointNet Predicted label - PointTransformer
0, vec(0,0,1) vase vase vase
45, vec(0,0,1) vase vase lamp
90, vec(0,0,1) vase chair chair
719¶
Rotation GT label Predicted label - PointNet Predicted label - PointTransformer
0, vec(0,0,1) lamp lamp lamp
45, vec(0,0,1) lamp lamp lamp
90, vec(0,0,1) lamp vase chair
720¶
Rotation GT label Predicted label - PointNet Predicted label - PointTransformer
0, vec(0,0,1) lamp lamp lamp
45, vec(0,0,1) lamp lamp lamp
90, vec(0,0,1) lamp lamp lamp

Number of points¶

Procedure: Subsample each point cloud to 10/100/1000/10000 points, with 10000 points serving as the Q1 reference.
Interpretation: Thanks to its global max pooling, PointNet stays fairly robust as long as roughly 100 points remain, enough to define the rough structure of the object and hence its class. PointTransformer fails catastrophically: with a fixed $k=16$ neighborhood, the network learns to treat each neighborhood as a tiny surface patch, but under severe downsampling those 16 neighbors span most of the object. Processing global geometry as if it were local produces garbage features.
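
A rough way to see the neighborhood-scale effect (the helper below is an illustration, not a measurement from the trained model): with a fixed k = 16, the average distance to a point's neighbors grows sharply as the cloud is subsampled, so a "local" patch at 10 points effectively spans the whole object.

```python
import torch

def mean_knn_distance(points: torch.Tensor, k: int = 16) -> float:
    """Average distance from each point to its k nearest neighbors in an (N, 3) cloud."""
    dists = torch.cdist(points, points)              # (N, N) pairwise distances
    k = min(k + 1, points.shape[0])                  # +1: each point is its own 0-distance neighbor
    knn, _ = dists.topk(k, dim=1, largest=False)
    return knn[:, 1:].mean().item()

# hypothetical comparison across the subsampling levels used above:
# for n in (10000, 1000, 100, 10):
#     print(n, mean_knn_distance(subsample(cloud, n)))
```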

num of points overall acc - PointNet overall acc - PointTransformer
10 0.654 0.362
100 0.942 0.169
1000 0.969 0.751
10000 0.974 0.965
0¶
Num of points GT label Predicted label - PointNet Predicted label - PointTransformer
10 chair chair chair
100 chair chair vase
1000 chair chair chair
10000 chair chair chair
1¶
Num of points GT label Predicted label - PointNet Predicted label - PointTransformer
10 chair lamp vase
100 chair chair vase
1000 chair chair chair
10000 chair chair chair
617¶
Num of points GT label Predicted label - PointNet Predicted label - PointTransformer
10 vase lamp vase
100 vase vase vase
1000 vase vase chair
10000 vase vase vase
618¶
Num of points GT label Predicted label - PointNet Predicted label - PointTransformer
10 vase lamp vase
100 vase lamp vase
1000 vase vase chair
10000 vase vase vase
719¶
Num of points GT label Predicted label - PointNet Predicted label - PointTransformer
10 lamp lamp vase
100 lamp lamp vase
1000 lamp lamp chair
10000 lamp lamp lamp
720¶
Num of points GT label Predicted label - PointNet Predicted label - PointTransformer
10 lamp lamp vase
100 lamp lamp lamp
1000 lamp lamp chair
10000 lamp lamp lamp

Segmentation¶

Rotation¶

Procedure: Rotate the input by 0/45/90 degrees around the axis (0, 0, 1), with 0 degrees serving as the Q2 reference.
Interpretation: Similar to classification. PointTransformer does slightly better at 45 degrees, likely thanks to its relative position encoding, but both models fail at 90 degrees because neither is truly rotationally invariant.

rotation overall acc - PointNet overall acc - PointTransformer
0, vec(0,0,1) 0.902 0.927
45, vec(0,0,1) 0.598 0.788
90, vec(0,0,1) 0.462 0.456
0¶
rotation acc - PointNet acc - PointTransformer
0,vec(0,0,1) 0.954 0.966
45,vec(0,0,1) 0.766 0.808
90,vec(0,0,1) 0.523 0.439
1¶
rotation acc - PointNet acc - PointTransformer
0,vec(0,0,1) 0.988 0.994
45,vec(0,0,1) 0.602 0.701
90,vec(0,0,1) 0.523 0.256
2¶
rotation acc - PointNet acc - PointTransformer
0,vec(0,0,1) 0.876 0.876
45,vec(0,0,1) 0.551 0.636
90,vec(0,0,1) 0.389 0.312

Number of points¶

Procedure: Subsample each point cloud to 10/100/1000/10000 points, with 10000 points serving as the Q2 reference.
Interpretation: As in classification, PointNet stays fairly robust (>0.8 accuracy) as long as roughly 100 points remain to capture the rough skeleton of the object, thanks to its global max pooling. PointTransformer collapses completely at low density: the aggressive downsampling in its encoder stages leaves the deeper layers with too few points, turning their features into noise.
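
The point-budget argument is just arithmetic; the stride-4, four-stage schedule below is an assumed encoder configuration, not a verified detail of this implementation:

```python
# Hypothetical encoder that keeps ~1/4 of its points at each of 4 stages.
for n_in in (10000, 1000, 100, 10):
    sizes = [n_in]
    for _ in range(4):
        sizes.append(max(1, sizes[-1] // 4))
    print(n_in, "->", sizes[1:])

# 10000 -> [2500, 625, 156, 39]
# 10    -> [2, 1, 1, 1]   # the deeper stages effectively see a single point
```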

num of points overall acc - PointNet overall acc - PointTransformer
10 0.565 0.047
100 0.819 0.194
1000 0.896 0.664
10000 0.902 0.926
0¶
num of points acc - PointNet acc - PointTransformer
10 0.700 0.000
100 0.920 0.510
1000 0.960 0.724
10000 0.954 0.963
1¶
num of points acc - PointNet acc - PointTransformer
10 0.400 0.100
100 0.970 0.150
1000 0.988 0.196
10000 0.988 0.994
2¶
num of points acc - PointNet acc - PointTransformer
10 0.700 0.000
100 0.920 0.140
1000 0.869 0.548
10000 0.876 0.880