HW 5

Q1. Classification Model (40 points)

Correct

chair	vase	lamp

Incorrect

gt:chair, pred:lamp	gt:vase, pred:lamp	gt:lamp, pred:vase

The misclassified chair was folded. The misclassified vase was probably taken for a lamp because its flower resembles the arm of a desk lamp. The misclassified lamp had an ambiguous shape.

Q2. Segmentation Model (40 points)

Good

GT	Predicted	Accuracy
		0.954
		0.988
		0.876

Bad

GT	Predicted	Accuracy
		0.544
		0.522
		0.595
		0.548
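
The accuracy column is assumed here to be per-point accuracy, i.e. the fraction of points whose predicted part label matches the ground truth. The report does not state the metric, although the 10-point results later in the report (values such as 0.700 and 0.400) are consistent with it. A minimal sketch under that assumption, with `point_accuracy` as a hypothetical helper name:

```python
def point_accuracy(gt_labels, pred_labels):
    """Fraction of points whose predicted part label equals the ground truth."""
    if len(gt_labels) != len(pred_labels):
        raise ValueError("gt and pred must contain one label per point")
    correct = sum(g == p for g, p in zip(gt_labels, pred_labels))
    return correct / len(gt_labels)


# Toy example with 8 points and part labels 0..3 (made-up values).
gt = [0, 0, 1, 1, 2, 2, 2, 3]
pred = [0, 0, 1, 2, 2, 2, 3, 3]
print(point_accuracy(gt, pred))  # 6 of 8 labels match -> 0.75
```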

Q3. Robustness Analysis (20 points)

Classification

Rotations

Procedure: Rotate each test point cloud by 0/45/90 degrees around the axis vec(0,0,1); 0 degrees is the Q1 reference.
Interpretation: PointNet learns from absolute global coordinates, so its accuracy degrades quickly under rotation.

rotation	overall acc
0, vec(0,0,1)	0.974
45, vec(0,0,1)	0.512
90, vec(0,0,1)	0.205
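
The rotation used in the procedure above can be sketched as follows. This is an illustration in plain Python, not the code used for the experiments; `rotate_z` is a hypothetical helper, and the counter-clockwise sign convention is an assumption. In practice the same matrix would be applied to a NumPy array or PyTorch tensor.

```python
import math


def rotate_z(points, degrees):
    """Rotate (x, y, z) points counter-clockwise about the z-axis, vec(0,0,1)."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    # The z coordinate is untouched; x and y mix through the 2D rotation matrix.
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


# Two made-up points standing in for a test point cloud.
cloud = [(1.0, 0.0, 0.5), (0.0, 2.0, -1.0)]
for angle in (0, 45, 90):
    rotated = rotate_z(cloud, angle)
    print(angle, [tuple(round(v, 3) for v in p) for p in rotated])
```

The network is then evaluated on the rotated clouds without retraining, so any drop in accuracy reflects sensitivity to orientation rather than a change in the model.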

0

Rotation	GT label	Predicted label
0, vec(0,0,1)	chair	chair
45, vec(0,0,1)	chair	chair
90, vec(0,0,1)	chair	lamp

1

Rotation	GT label	Predicted label
0, vec(0,0,1)	chair	chair
45, vec(0,0,1)	chair	lamp
90, vec(0,0,1)	chair	lamp

617

Rotation	GT label	Predicted label
0, vec(0,0,1)	vase	vase
45, vec(0,0,1)	vase	vase
90, vec(0,0,1)	vase	vase

618

Rotation	GT label	Predicted label
0, vec(0,0,1)	vase	vase
45, vec(0,0,1)	vase	vase
90, vec(0,0,1)	vase	chair

719

Rotation	GT label	Predicted label
0, vec(0,0,1)	lamp	lamp
45, vec(0,0,1)	lamp	lamp
90, vec(0,0,1)	lamp	vase

720

Rotation	GT label	Predicted label
0, vec(0,0,1)	lamp	lamp
45, vec(0,0,1)	lamp	lamp
90, vec(0,0,1)	lamp	lamp

Number of points

Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q1 reference.
Interpretation: Because it aggregates features with global max pooling, PointNet stays fairly robust as long as roughly 100 points remain, which is enough to define the rough structure of the object so that its class can still be determined.

num of points	overall acc
10	0.654
100	0.942
1000	0.969
10000	0.974
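
A rough illustration of why global max pooling tolerates subsampling: the pooled vector only depends on the points that attain each channel's maximum, so a subset yields a vector that is never larger and tends to approach the full one as more points are kept. The per-point features below are random stand-ins (an assumption for illustration; a real PointNet computes them with shared MLPs), and `global_max_pool` is a hypothetical helper.

```python
import random


def global_max_pool(features):
    """Channel-wise maximum over all points (one feature row per point)."""
    return [max(channel) for channel in zip(*features)]


random.seed(0)
num_points, num_channels = 10000, 8
# Stand-in per-point features, NOT real PointNet activations.
features = [[random.random() for _ in range(num_channels)]
            for _ in range(num_points)]

full = global_max_pool(features)
for n in (10, 100, 1000, 10000):
    subset = random.sample(features, n)  # random subsampling without replacement
    pooled = global_max_pool(subset)
    # Largest per-channel shortfall of the subsampled pooled vector.
    gap = max(f - p for f, p in zip(full, pooled))
    print(n, round(gap, 4))
```

With these toy features the shortfall is zero once all 10000 points are kept. How quickly it shrinks for a trained network depends on the learned features, so the sketch only illustrates the mechanism, not the accuracies in the table.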

0

Num of points	GT label	Predicted label
10	chair	chair
100	chair	chair
1000	chair	chair
10000	chair	chair

1

Num of points	GT label	Predicted label
10	chair	lamp
100	chair	chair
1000	chair	chair
10000	chair	chair

617

Num of points	GT label	Predicted label
10	vase	lamp
100	vase	vase
1000	vase	vase
10000	vase	vase

618

Num of points	GT label	Predicted label
10	vase	lamp
100	vase	lamp
1000	vase	vase
10000	vase	vase

719

Num of points	GT label	Predicted label
10	lamp	lamp
100	lamp	lamp
1000	lamp	lamp
10000	lamp	lamp

720

Num of points	GT label	Predicted label
10	lamp	lamp
100	lamp	lamp
1000	lamp	lamp
10000	lamp	lamp

Segmentation

Rotation

Procedure: Rotate each test point cloud by 0/45/90 degrees around the axis vec(0,0,1); 0 degrees is the Q2 reference.
Interpretation: As with classification, PointNet is not rotation invariant, and performance drops as test examples deviate further from the training orientations.

rotation	overall acc
0, vec(0,0,1)	0.902
45, vec(0,0,1)	0.598
90, vec(0,0,1)	0.462

0

rotation	gt	pred	acc
0,vec(0,0,1)			0.954
45,vec(0,0,1)			0.766
90,vec(0,0,1)			0.523

1

rotation	gt	pred	acc
0,vec(0,0,1)			0.988
45,vec(0,0,1)			0.602
90,vec(0,0,1)			0.523

2

rotation	gt	pred	acc
0,vec(0,0,1)			0.876
45,vec(0,0,1)			0.551
90,vec(0,0,1)			0.389

Number of points

Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q2 reference.
Interpretation: As with classification, PointNet stays fairly robust (>0.8 accuracy) as long as roughly 100 points remain.

num of points	overall acc
10	0.565
100	0.819
1000	0.896
10000	0.902
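
For segmentation, the points and their per-point labels have to be subsampled with the same indices, otherwise the accuracy is computed against mismatched labels. A minimal sketch, assuming uniform random subsampling without replacement (the report does not say which sampling scheme was used); `subsample` and its `seed` parameter are hypothetical:

```python
import random


def subsample(points, labels, n, seed=0):
    """Pick n random points and keep each point's label aligned with it."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(points)), n)  # shared indices for both lists
    return [points[i] for i in idx], [labels[i] for i in idx]


# Toy cloud where the label equals the z value, so alignment is easy to check.
points = [(float(i), 0.0, float(i % 3)) for i in range(1000)]
labels = [i % 3 for i in range(1000)]

sub_pts, sub_lbl = subsample(points, labels, 100)
aligned = all(int(p[2]) == l for p, l in zip(sub_pts, sub_lbl))
print(len(sub_pts), aligned)
```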

0

num of points	gt	pred	acc
10			0.700
100			0.920
1000			0.960
10000			0.954

1

num of points	gt	pred	acc
10			0.400
100			0.970
1000			0.988
10000			0.988

2

num of points	gt	pred	acc
10			0.700
100			0.920
1000			0.869
10000			0.876

Q4. Bonus Question - Locality (20 points)

Classification

Rotations

Procedure: Rotate each test point cloud by 0/45/90 degrees around the axis vec(0,0,1); 0 degrees is the Q1 reference.
Interpretation: PointNet learns from absolute global coordinates, so its accuracy degrades quickly under rotation. PointTransformer handles moderate rotation better because it uses relative positions between neighbors, but it still largely fails at 90 degrees, probably because vertical relationships are turned into horizontal ones, which are semantically different to the network.

rotation	overall acc - PointNet	overall acc - PointTransformer
0, vec(0,0,1)	0.974	0.963
45, vec(0,0,1)	0.512	0.855
90, vec(0,0,1)	0.205	0.643
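
The difference between absolute and relative coordinates can be made concrete with a small sketch. The relative position p_j - p_i of two neighbors is unchanged by translation, but it rotates together with the object, which is consistent with relative encodings helping without giving rotation invariance. Only the distance between the two points survives the rotation. The helper names and the example points are made up for illustration.

```python
import math


def rotate_z(p, degrees):
    """Rotate a single (x, y, z) point about the z-axis, vec(0,0,1)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    x, y, z = p
    return (c * x - s * y, s * x + c * y, z)


def offset(p_i, p_j):
    """Relative position p_j - p_i, the kind of input a local layer sees."""
    return tuple(b - a for a, b in zip(p_i, p_j))


p_i, p_j = (1.0, 0.0, 0.0), (1.0, 0.5, 0.0)
shift = (3.0, -2.0, 1.0)
moved = [tuple(a + b for a, b in zip(p, shift)) for p in (p_i, p_j)]
turned = [rotate_z(p, 90) for p in (p_i, p_j)]

print(offset(p_i, p_j))   # offset between the two original points
print(offset(*moved))     # identical after translating both points
print(offset(*turned))    # points along a different axis after rotation
print(math.dist(p_i, p_j), math.dist(*turned))  # distance is preserved
```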

0

Rotation	GT label	Predicted label - PointNet	Predicted label - PointTransformer
0, vec(0,0,1)	chair	chair	chair
45, vec(0,0,1)	chair	chair	chair
90, vec(0,0,1)	chair	lamp	lamp

1

Rotation	GT label	Predicted label - PointNet	Predicted label - PointTransformer
0, vec(0,0,1)	chair	chair	chair
45, vec(0,0,1)	chair	lamp	chair
90, vec(0,0,1)	chair	lamp	lamp

617

Rotation	GT label	Predicted label - PointNet	Predicted label - PointTransformer
0, vec(0,0,1)	vase	vase	vase
45, vec(0,0,1)	vase	vase	lamp
90, vec(0,0,1)	vase	vase	chair

618

Rotation	GT label	Predicted label - PointNet	Predicted label - PointTransformer
0, vec(0,0,1)	vase	vase	vase
45, vec(0,0,1)	vase	vase	lamp
90, vec(0,0,1)	vase	chair	chair

719

Rotation	GT label	Predicted label - PointNet	Predicted label - PointTransformer
0, vec(0,0,1)	lamp	lamp	lamp
45, vec(0,0,1)	lamp	lamp	lamp
90, vec(0,0,1)	lamp	vase	chair

720

Rotation	GT label	Predicted label - PointNet	Predicted label - PointTransformer
0, vec(0,0,1)	lamp	lamp	lamp
45, vec(0,0,1)	lamp	lamp	lamp
90, vec(0,0,1)	lamp	lamp	lamp

Number of points

Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q1 reference.
Interpretation: Because it aggregates features with global max pooling, PointNet stays fairly robust as long as roughly 100 points remain, which is enough to define the rough structure of the object so that its class can still be determined. PointTransformer fails catastrophically. With a fixed $k=16$ neighbors, the network probably learned to treat each neighborhood as a tiny surface patch; under severe downsampling the same $k=16$ neighbors span the entire object. Processing global geometry as if it were local produces garbage features.

num of points	overall acc - PointNet	overall acc - PointTransformer
10	0.654	0.362
100	0.942	0.169
1000	0.969	0.751
10000	0.974	0.965
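
The growth of a fixed-k neighborhood under subsampling can be illustrated on a toy shape. The sketch below samples points on a unit sphere (a stand-in for a real object, chosen only for illustration) and measures the mean distance to the 16th nearest neighbor. All names are hypothetical, and k is capped at N - 1 when the cloud has fewer than 17 points, which is an assumption about how a kNN query would behave in that case.

```python
import math
import random


def sample_sphere(n, rng):
    """n points uniformly on the unit sphere (normalised Gaussian samples)."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0, 1) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        pts.append(tuple(c / norm for c in v))
    return pts


def mean_knn_radius(points, k):
    """Mean distance from each point to its k-th nearest neighbour."""
    k = min(k, len(points) - 1)  # cannot have more neighbours than other points
    total = 0.0
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        total += dists[k]  # dists[0] is the distance of p to itself (0.0)
    return total / len(points)


rng = random.Random(0)
radii = {}
# 10000 points is skipped only to keep this brute-force loop fast.
for n in (10, 100, 1000):
    radii[n] = mean_knn_radius(sample_sphere(n, rng), k=16)
    print(n, round(radii[n], 3))
```

The sphere has diameter 2, so a neighborhood radius approaching that value means the "local" patch covers most of the object.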

0

Num of points	GT label	Predicted label - PointNet	Predicted label - PointTransformer
10	chair	chair	chair
100	chair	chair	vase
1000	chair	chair	chair
10000	chair	chair	chair

1

Num of points	GT label	Predicted label - PointNet	Predicted label - PointTransformer
10	chair	lamp	vase
100	chair	chair	vase
1000	chair	chair	chair
10000	chair	chair	chair

617

Num of points	GT label	Predicted label - PointNet	Predicted label - PointTransformer
10	vase	lamp	vase
100	vase	vase	vase
1000	vase	vase	chair
10000	vase	vase	vase

618

Num of points	GT label	Predicted label - PointNet	Predicted label - PointTransformer
10	vase	lamp	vase
100	vase	lamp	vase
1000	vase	vase	chair
10000	vase	vase	vase

719

Num of points	GT label	Predicted label - PointNet	Predicted label - PointTransformer
10	lamp	lamp	vase
100	lamp	lamp	vase
1000	lamp	lamp	chair
10000	lamp	lamp	lamp

720

Num of points	GT label	Predicted label - PointNet	Predicted label - PointTransformer
10	lamp	lamp	vase
100	lamp	lamp	lamp
1000	lamp	lamp	chair
10000	lamp	lamp	lamp

Segmentation

Rotation

Procedure: Rotate each test point cloud by 0/45/90 degrees around the axis vec(0,0,1); 0 degrees is the Q2 reference.
Interpretation: Similar to classification. PointTransformer performs better at 45 degrees (0.788 vs. 0.598 overall), probably thanks to its relative position encoding. Both fail at 90 degrees because neither is truly rotation invariant.

rotation	overall acc - PointNet	overall acc - PointTransformer
0, vec(0,0,1)	0.902	0.927
45, vec(0,0,1)	0.598	0.788
90, vec(0,0,1)	0.462	0.456

0

rotation	acc - PointNet	acc - PointTransformer
0,vec(0,0,1)	0.954	0.966
45,vec(0,0,1)	0.766	0.808
90,vec(0,0,1)	0.523	0.439

1

rotation	acc - PointNet	acc - PointTransformer
0,vec(0,0,1)	0.988	0.994
45,vec(0,0,1)	0.602	0.701
90,vec(0,0,1)	0.523	0.256

2

rotation	acc - PointNet	acc - PointTransformer
0,vec(0,0,1)	0.876	0.876
45,vec(0,0,1)	0.551	0.636
90,vec(0,0,1)	0.389	0.312

Number of points

Procedure: Subsample each point cloud to 10/100/1000/10000 points; 10000 points is the Q2 reference.
Interpretation: As with classification, PointNet stays fairly robust (>0.8 accuracy) as long as roughly 100 points remain to capture the rough skeleton of the object, thanks to its global max pooling. PointTransformer collapses completely at low density, probably because the aggressive internal downsampling in the encoder stages leaves the deeper layers with too few points, turning the features into noise.

num of points	overall acc - PointNet	overall acc - PointTransformer
10	0.565	0.047
100	0.819	0.194
1000	0.896	0.664
10000	0.902	0.926
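
The "running out of points" argument can be checked with simple arithmetic. The sketch below assumes four downsampling stages that each keep a quarter of the points, which matches the configuration commonly described for Point Transformer but is not confirmed for the model used in this homework. `points_per_stage` and its parameters are hypothetical.

```python
def points_per_stage(n_input, stride=4, num_downsamples=4):
    """Number of points left after each encoder stage (assumed stride of 4)."""
    counts = [n_input]
    for _ in range(num_downsamples):
        # At least one point is kept so later stages still receive an input.
        counts.append(max(1, counts[-1] // stride))
    return counts


K = 16  # neighbourhood size used by the attention layers
for n in (10, 100, 1000, 10000):
    counts = points_per_stage(n)
    starved = sum(c <= K for c in counts)  # stages that cannot fill k neighbours
    print(n, counts, "stages with <= k points:", starved)
```

Under these assumptions, a 10000-point input still has 39 points at the deepest stage, while a 100-point input drops to 6 points after two downsamplings, fewer than the $k=16$ neighbors each layer expects.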

0

num of points	acc - PointNet	acc - PointTransformer
10	0.700	0.000
100	0.920	0.510
1000	0.960	0.724
10000	0.954	0.963

1

num of points	acc - PointNet	acc - PointTransformer
10	0.400	0.100
100	0.970	0.150
1000	0.988	0.196
10000	0.988	0.994

2

num of points	acc - PointNet	acc - PointTransformer
10	0.700	0.000
100	0.920	0.140
1000	0.869	0.548
10000	0.876	0.880