Q1

test accuracy: 0.9716684155299056

Correctly Classified

| Class | Viz |
| --- | --- |
| Chair | *(image)* *(image)* |
| Vase | *(image)* *(image)* |
| Lamp | *(image)* *(image)* |

Failure Modes

| Class | Viz |
| --- | --- |
| Chair | *(image)* |
| Vase | *(image)* *(image)* |
| Lamp | *(image)* *(image)* |

I believe the main failure cases are due to shape ambiguity in point clouds and the lack of color and texture. The chair class generally performs well, with few false positives: the model captures structures that have a distinct seat, backrest, legs, and an open side to sit from. In the failure case, however, the seat is folded, which can confuse the model if most of the chairs it has seen were open; the classifier likely interpreted the curved top plus thin legs as a lamp shade on a stand. Moreover, classes like lamps and vases have high variation in shape and design, so it can be difficult for the model to discriminate between their subtle differences. For vases, for example, the model is generally good at capturing open structures, but when a vase is filled with flowers, the thin stems and the flowers can be read as the stand and shade of a lamp. The geometric lamp was predicted as a vase because it is hollow and has a symmetric outer shape that resembles a vase more than a lamp, and the tripod lamp was also classified as a vase, which suggests the model focuses on the overall vertical shape rather than the tripod structure. In short, the model fails when the point cloud contains finer details and it has not seen enough training data to learn these subtle differences.

Q2

test accuracy: 0.901028525121556

Correct Samples

| GT | Pred | Accuracy |
| --- | --- | --- |
| *(image)* | *(image)* | 0.938 |
| *(image)* | *(image)* | 0.985 |
| *(image)* | *(image)* | 0.968 |

Incorrect Samples

| GT | Pred | Accuracy |
| --- | --- | --- |
| *(image)* | *(image)* | 0.448 |
| *(image)* | *(image)* | 0.491 |

As seen in the correct samples, the model segments the different regions of longer chairs quite well. Much of the error comes from shorter legs: compared to the ground truth, the model is slightly biased toward taller legs, and the blue region extends beyond the actual leg. This might be due to not seeing enough samples with shorter legs. The model also struggles with asymmetric structures. In the first failure case, for example, the couch has no armrest on one side, and many of the inaccuracies come from labeling other regions as this nonexistent armrest. It also completely misses the headrest, which again could come down to not seeing enough such data.

Q3

Procedure

I trained the segmentation model and the classification model on 10,000 points. For evaluation, I varied two factors: the rotation applied to the input cloud and the number of input points.

For rotations, I rotate the point cloud about the Y and Z axes by the same angle.
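A minimal sketch of this evaluation-time rotation, assuming an (N, 3) NumPy cloud, angles in degrees, and a row-vector convention; the composition order (Y first, then Z) is an assumption:

```python
import numpy as np

def rotate_y_z(points, angle_deg):
    """Rotate an (N, 3) point cloud about the Y axis, then the Z axis, by the same angle."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    rot_y = np.array([[ c, 0, s],
                      [ 0, 1, 0],
                      [-s, 0, c]])
    rot_z = np.array([[c, -s, 0],
                      [s,  c, 0],
                      [0,  0, 1]])
    # row-vector convention: p' = p @ R.T
    return points @ (rot_z @ rot_y).T
```

Evaluating at 0, 15, 45, and 90 degrees then just means re-running inference on `rotate_y_z(points, angle)`.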

Segmentation

*(plot)*

| Rotation | Test accuracy | GT | Pred |
| --- | --- | --- | --- |
| 0 (default) | 0.901029 | *(image)* | *(image)* |
| 15 | 0.762586 | *(image)* | *(image)* |
| 45 | 0.526841 | *(image)* | *(image)* |
| 90 | 0.298219 | *(image)* | *(image)* |

*(plot)*

| Number of points | Test accuracy | GT | Pred |
| --- | --- | --- | --- |
| 10000 (default) | 0.901029 | *(image)* | *(image)* |
| 5000 | 0.901374 | *(image)* | *(image)* |
| 2500 | 0.901314 | *(image)* | *(image)* |
| 1000 | 0.898642 | *(image)* | *(image)* |
| 500 | 0.88919 | *(image)* | *(image)* |
| 100 | 0.834133 | *(image)* | *(image)* |

With varying degrees of rotation, model performance drops sharply, from its maximum (≈90%) at 0 degrees to its minimum (≈29%) at 90 degrees. The model appears biased toward a horizontal seat segmentation: even when we rotate the chairs, the predicted seat region stays horizontal. Similarly, it expects the backrest to be an upright structure at the top of the chair, so after a 90-degree rotation it struggles to find the true backrest. Adding rotation augmentation might help here. Conversely, we do not see large drops in performance when varying the number of points. Only at 100 points does performance fall significantly, likely because the missing geometric structure makes it hard for the model to decide which region belongs to which part of the chair.
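The rotation augmentation suggested above could be sketched roughly as follows (a hypothetical train-time transform, not part of the trained models; rotating about the up axis only, with the Y-up convention and angle range as assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_rotation(points, max_deg=90.0):
    """Hypothetical train-time augmentation: rotate an (N, 3) cloud about the
    up (Y) axis by a random angle so the network sees many orientations."""
    t = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(t), np.sin(t)
    rot_y = np.array([[ c, 0, s],
                      [ 0, 1, 0],
                      [-s, 0, c]])
    return points @ rot_y.T
```

Applying this per sample in the training loader would expose the model to the orientations it currently fails on.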

Classification

*(plot)*

| Rotation | Test accuracy | Viz |
| --- | --- | --- |
| 0 (default) | 0.971668 | *(image)* *(image)* |
| 15 | 0.927597 | *(image)* *(image)* |
| 45 | 0.411333 | *(image)* *(image)* |
| 90 | 0.233998 | *(image)* *(image)* |

*(plot)*

| Number of points | Test accuracy | Viz |
| --- | --- | --- |
| 10000 (default) | 0.971668 | *(image)* |
| 5000 | 0.970619 | *(image)* |
| 2500 | 0.970619 | *(image)* |
| 1000 | 0.971668 | *(image)* |

In the classification case, the model is clearly not robust to rotation: accuracy drops as the rotation increases. Slight rotations are still fine, as in the 15-degree case. At 45 degrees, the chair starts to resemble a structure one could place things on, so the model begins predicting vase. Similarly, at 90 degrees the lamp head starts resembling a chair's backrest. This is again likely because the model never sees these rotations at training time; adding data augmentation could help.

Changing the number of points did not significantly affect overall accuracy, which suggests the model is fairly robust to downsampling. Most objects were predicted the same across 10,000, 5,000, and 1,000 points. However, the vase with many plants was only classified correctly at 1,000 points, suggesting that downsampling can remove clutter and noisy detail, and that the cleaner geometry made the underlying vase shape easier for the model to recognize.
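The downsampling used in this sweep can be sketched as uniform random subsampling (the exact sampling scheme is an assumption; the fixed seed is only for reproducibility of the sketch):

```python
import numpy as np

def downsample(points, n, seed=0):
    """Uniform random downsampling of an (N, 3) cloud to n points, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]
```

Running the classifier on `downsample(points, 1000)` versus the full 10,000-point cloud reproduces the comparison above.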

Q4

I implemented a model similar to DGCNN. Instead of processing each point individually, we build features that capture locality: we find each point's neighbors with kNN to define a neighborhood, then compute edge features, where the feature for each point pairs the vector from the point to its neighborhood center with the center itself. I then stack multiple edge-convolution layers and concatenate their outputs to form hierarchical features. As in Q2, for segmentation I also add a global context vector. Finally, since kNN can be memory-heavy, I downsampled to 1,000 points.
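The kNN edge-feature step can be sketched as below. This uses the common DGCNN form concat(x_i, x_j − x_i) per neighbor, which is an assumption; the center-based variant described above would swap each neighbor for the neighborhood mean. The dense N×N distance matrix is also why kNN gets memory-heavy at high point counts:

```python
import numpy as np

def knn_edge_features(points, k=8):
    """Sketch of DGCNN-style edge features over a kNN graph, in the standard
    concat(x_i, x_j - x_i) form (the exact feature used here is an assumption).
    points: (N, 3) array -> (N, k, 6) edge features."""
    diff = points[:, None, :] - points[None, :, :]      # (N, N, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                # (N, N) pairwise distances
    idx = np.argsort(dist, axis=-1)[:, :k]              # k nearest per point (self at column 0)
    neighbors = points[idx]                             # (N, k, 3)
    center = np.broadcast_to(points[:, None, :], neighbors.shape)
    return np.concatenate([center, neighbors - center], axis=-1)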

| Task | With Locality Accuracy | Without Locality Accuracy |
| --- | --- | --- |
| classification | 0.981 | 0.9716 |
| segmentation | 0.899 | 0.898 |

| Task | With Locality Viz | Without Locality Viz |
| --- | --- | --- |
| classification | *(image)* *(image)* | *(image)* *(image)* |
| segmentation | GT *(image)* *(image)*, Pred *(image)* *(image)* | GT *(image)* *(image)*, Pred *(image)* *(image)* |

With the locality model, accuracy increases slightly despite using fewer points. This is especially true for the vase ground-truth class, where local structure helps the model understand the shape difference between lamps and vases. For segmentation, the improvement is minimal; this might be because this dataset's structures are already captured well enough by the global context. Adding more edge-convolution layers or using more neighbors might achieve better accuracy. We do, however, see a small qualitative improvement from the locality model on the base of the couch, where it captures a much more complete base rather than just top fragments of it.