Classification test accuracy: 0.9790
| Image | Predicted class | Ground truth class |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Chair |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
Failure cases: the last three rows show misclassified examples.
Segmentation test accuracy: 0.9044
| ID | GT | Pred | Accuracy |
|---|---|---|---|
| 0 | ![]() | ![]() | 0.9601 |
| 1 | ![]() | ![]() | 0.9876 |
| 2 | ![]() | ![]() | 0.9102 |
| 20 | ![]() | ![]() | 0.9816 |
| 200 | ![]() | ![]() | 0.7496 |
| 300 | ![]() | ![]() | 0.9582 |
| 500 | ![]() | ![]() | 0.7669 |
Bad cases: objects 200 and 500 have noticeably lower segmentation accuracy (0.7496 and 0.7669) than the others.
Procedure: the test point clouds are rotated by the same angle around all three axes, with the angle increasing from 0.0 to 0.6 rad.
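A minimal sketch of how such a rotation could be applied (assuming (N, 3) NumPy arrays; the helper names are mine, not taken from the training code):

```python
import numpy as np

def rotation_matrix(angle: float) -> np.ndarray:
    """Rotation by the same angle around the x, y, and z axes, composed as Rz @ Ry @ Rx."""
    c, s = np.cos(angle), np.sin(angle)
    rx = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    ry = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    rz = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    return rz @ ry @ rx

def rotate_point_cloud(points: np.ndarray, angle: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud by `angle` radians around all three axes."""
    return points @ rotation_matrix(angle).T
```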
| Angle (rad) | Classification accuracy | Segmentation accuracy |
|---|---|---|
| 0.0 | 0.9790 | 0.9044 |
| 0.1 | 0.9654 | 0.8722 |
| 0.3 | 0.8342 | 0.7320 |
| 0.6 | 0.2455 | 0.4922 |
Classification visualization at 0.6 rad:
| Image | Predicted class | Ground truth class |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Chair |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
Segmentation visualization at 0.6 rad:
| ID | GT (unrotated) | Pred (unrotated) | Pred (rotated) | Accuracy (unrotated) | Accuracy (rotated) |
|---|---|---|---|---|---|
| 0 | ![]() | ![]() | ![]() | 0.9601 | 0.5481 |
| 1 | ![]() | ![]() | ![]() | 0.9876 | 0.6355 |
| 2 | ![]() | ![]() | ![]() | 0.9102 | 0.3823 |
| 20 | ![]() | ![]() | ![]() | 0.9816 | 0.5734 |
| 200 | ![]() | ![]() | ![]() | 0.7496 | 0.4502 |
| 300 | ![]() | ![]() | ![]() | 0.9582 | 0.5124 |
| 500 | ![]() | ![]() | ![]() | 0.7669 | 0.6411 |
As expected, accuracy drops for both classification and segmentation as the rotation grows larger. This is because the model has no built-in rotation invariance (the T-Net was removed) while the training data contain only upright poses.
The segmentation visualizations reveal something interesting under rotation: the predicted segmentation looks as if the object had not been rotated at all. This suggests that the model learned a direct mapping from upright-pose coordinates to part labels rather than the true semantic parts of the objects.
Procedure: I evaluated five different numbers of points per cloud, from 10,000 (the original sampling) down to 100.
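A minimal sketch of how each evaluation cloud could be reduced to a fixed number of points (assuming uniform random sampling without replacement; the function name is hypothetical):

```python
import numpy as np
from typing import Optional

def subsample_point_cloud(points: np.ndarray, num_points: int,
                          labels: Optional[np.ndarray] = None):
    """Keep a random subset of `num_points` points from an (N, 3) cloud,
    selecting the matching per-point labels when they are provided."""
    idx = np.random.choice(points.shape[0], num_points, replace=False)
    if labels is None:
        return points[idx]
    return points[idx], labels[idx]
```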
| Number of points | Classification accuracy | Segmentation accuracy |
|---|---|---|
| 10000 | 0.9790 | 0.9044 |
| 5000 | 0.9769 | 0.9042 |
| 1000 | 0.9748 | 0.8943 |
| 500 | 0.9664 | 0.8731 |
| 100 | 0.8909 | 0.8145 |
Classification visualization at 100 points:
| Image | Predicted class | Ground truth class |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Chair |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
Segmentation visualization at 100 points:
| ID | GT (10k) | Pred (10k) | Pred (100) | Accuracy (10k) | Accuracy (100) |
|---|---|---|---|---|---|
| 0 | ![]() | ![]() | ![]() | 0.9601 | 0.9 |
| 1 | ![]() | ![]() | ![]() | 0.9876 | 0.99 |
| 2 | ![]() | ![]() | ![]() | 0.9102 | 0.89 |
| 20 | ![]() | ![]() | ![]() | 0.9816 | 0.96 |
| 200 | ![]() | ![]() | ![]() | 0.7496 | 0.64 |
| 300 | ![]() | ![]() | ![]() | 0.9582 | 0.96 |
| 500 | ![]() | ![]() | ![]() | 0.7669 | 0.8 |
The results show that the model is fairly robust to the number of points for both tasks, even though accuracy generally drops as the point count decreases. The degradation comes from important details of the shape disappearing when fewer points are sampled; as long as enough detail remains, the model largely holds its quality. This robustness likely comes from the pooling operation, which makes the model focus on the global shape rather than on individual points.
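To illustrate, here is a minimal PyTorch-style sketch of a shared per-point MLP followed by a max pool (an assumed architecture for illustration, not the exact model evaluated here); the pooled global feature has the same size whether 10,000 or 100 points are fed in:

```python
import torch
import torch.nn as nn

class PointFeatureEncoder(nn.Module):
    """Shared per-point MLP followed by a max pool over the point dimension."""

    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        per_point = self.mlp(points)          # (B, N, 3) -> (B, N, feat_dim)
        return per_point.max(dim=1).values    # pool over the N points -> (B, feat_dim)

enc = PointFeatureEncoder()
print(enc(torch.randn(2, 10000, 3)).shape)  # torch.Size([2, 1024])
print(enc(torch.randn(2, 100, 3)).shape)    # torch.Size([2, 1024])
```

As long as the surviving points still cover the distinctive parts of the shape, the pooled feature changes little, which matches the gradual accuracy drop in the table above.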