Q1

The test accuracy was high at 0.9811.

Correct predictions:

Chair:

Vase:

Lamp:

Incorrect predictions:

Ground truth: Chair, predicted: Lamp

Ground truth: Vase, predicted: Lamp

Ground truth: Lamp, predicted: Vase

Interpretations:

Overall, given the accuracy score, the network distinguishes the majority of chair, vase, and lamp objects very well. The incorrect classifications are likely due to ambiguous object designs that, without further context, are hard to decisively assign to one of the three classes. For example, the chair failure case could pass for a square-ish designer lamp; the vase might look like a lamp on a square wooden stand; and the lamp instance is abstract, lacks distinguishing features, and could be a large vase with a wide body.

Q2

The test accuracy is 0.8028.

Correct predictions (predicted point cloud | ground truth point cloud):

Accuracy for object: 0.9711

Accuracy for object: 0.9638

Accuracy for object: 0.9416

Incorrect predictions:

Accuracy for object: 0.3804

Accuracy for object: 0.2490

Interpretations:

While it scores lower accuracy than the classification network, the segmentation network performs well overall on more 'standard' chairs that are upright, four-legged, and have an obvious seatback rising straight up from the seat. However, the failure cases show that it performs poorly on rarer instances, such as a reclined chair or a blocky chair with different proportions from most of the other chairs in the training set.
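The per-object accuracies reported above can be computed as the fraction of points whose predicted part label matches the ground truth. A minimal sketch (the function name and the toy labels are illustrative, not from the assignment code):

```python
import numpy as np

def per_object_accuracy(pred_labels: np.ndarray, gt_labels: np.ndarray) -> float:
    """Fraction of points whose predicted part label matches the ground truth."""
    return float((pred_labels == gt_labels).mean())

# Toy example: a 10-point "object" with two mislabeled points.
gt = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
pred = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 0])
print(per_object_accuracy(pred, gt))  # 0.8
```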

Q3

Experiment 1: Rotating point clouds

The input point clouds are all placed at identity rotation poses. Since transformation nets (T-nets) were not implemented for this homework, any significant rotation of the input point clouds is expected to cause a large drop in accuracy.

For this experiment, the rotate_degs parameter in the evaluation scripts will be changed so that the original point clouds are all rotated about the origin by this amount. Then, the accuracy will be evaluated and compared to the results from the above questions.
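The rotation applied by rotate_degs can be sketched as a standard rotation matrix about the X axis applied to every point. A minimal illustration (the function name is hypothetical; the evaluation script's implementation may differ):

```python
import numpy as np

def rotate_x(points: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud about the X axis through the origin."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0,   c,  -s],
                  [0.0,   s,   c]])
    # Points are rows, so multiply by the transpose of the rotation matrix.
    return points @ R.T

# Rotating the unit Y vector by 90 degrees about X maps it onto the Z axis.
pts = np.array([[0.0, 1.0, 0.0]])
rotated = rotate_x(pts, 90.0)
```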

Results

Here is the accuracy obtained when rotating the object about the X axis by the following amounts (in degrees):

| Rotation (deg) | Accuracy (Classification, overall) | Accuracy (Segmentation, single object) |
| --- | --- | --- |
| 0 | 0.9811 | 0.9000 |
| 10 | 0.9654 | 0.8995 |
| 20 | 0.9066 | 0.7200 |
| 30 | 0.7629 | 0.5600 |
| 45 | 0.4124* | 0.4600 |
| 90 | 0.2424* | 0.2600 |

*: After rotating by more than 30 deg, the shown chair was predicted as a vase.

Neither network copes well with rotated inputs. Classification accuracy suffers significantly once the objects are rotated by 10-20 degrees, and falls below random chance (1/3 for three classes) near 90 degrees. This means the learned features depend heavily on the chair/vase/lamp being upright in a canonical pose. Similarly, the segmentation of the chair parts degrades and starts to bleed into other parts of the chair as it leans further and further forward. This suggests that the current networks are not rotationally invariant at all.

Experiment 2: Reducing input number of sample points per object

Since the networks are trained with 10k points per input, we can measure how much accuracy falls as we provide fewer points in each input point cloud and observe how the resulting predictions degrade.

For this experiment, the --num_points parameter will be changed when running the evaluation script, and the resulting accuracies compared to previous results from Q1 and Q2.
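Reducing --num_points amounts to randomly subsampling each 10k-point cloud without replacement. A minimal sketch (the function name and the fixed seed are illustrative; the evaluation script may sample differently):

```python
import numpy as np

def subsample(points: np.ndarray, num_points: int, seed: int = 0) -> np.ndarray:
    """Randomly sample num_points rows (without replacement) from an (N, 3) cloud."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]

cloud = np.random.default_rng(1).random((10000, 3))
small = subsample(cloud, 500)  # shape (500, 3)
```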

| Number of Points | Accuracy (Classification, overall) | Accuracy (Segmentation, single object) |
| --- | --- | --- |
| 10000 | 0.9811 | 0.9000 |
| 5000 | 0.9801 | 0.9000 |
| 1000 | 0.9738 | 0.8800 |
| 500 | 0.9685 | 0.9000 |
| 100 | 0.9381 | 0.8800 |
| 50 | 0.7827 | 0.9200 |

The networks perform very well when the number of input points is reduced by random sampling. In the classification case, the chair is still predicted as a chair even when given only 50 points in the point cloud, and the overall classification accuracy barely falls until the input drops below 100 points. In the segmentation case, the accuracy for this object remains similar regardless of the number of input points, suggesting that only 50 points are needed to segment this chair roughly correctly. We can conclude that the trained models are robust to the number of input points over a significantly varying range.

Q4 (Bonus)

Not Attempted.