Q1

test accuracy: 0.9716684155299056

Correctly Classified

| Class | Viz |
| --- | --- |
| Chair | *(image)* *(image)* |
| Vase | *(image)* *(image)* |
| Lamp | *(image)* *(image)* |

Failure Modes

| Class | Viz |
| --- | --- |
| Chair | *(image)* |
| Vase | *(image)* *(image)* |
| Lamp | *(image)* *(image)* |

I believe the main failure cases are due to shape ambiguity in point clouds and the lack of color and texture. The chair class generally performs well, with few false positives: the model captures structures that have a distinct seat, backrest, legs, and an open side to sit from. In the failure case, however, the seat is folded, which can confuse the model if most of the chairs it has seen were open; the classifier likely interpreted the curved top plus thin legs as a lamp shade on a stand. Moreover, classes like lamps and vases have high variation in shape and design, so it can be difficult for the model to discriminate between their subtle differences. For vases, for example, the model is generally good at capturing open structures, but when a vase is filled with flowers, the thin stems and the flowers can be read as the stand and shade of a lamp. The geometric lamp was predicted as a vase because it is hollow and has a symmetric outer shape that resembles a vase more than a lamp, and the tripod lamp was also classified as a vase, which suggests the model focuses on the overall vertical shape rather than the tripod structure. In short, the model fails when the point cloud contains finer details and it has not seen enough training data to learn these subtle differences.

Q2

test accuracy: 0.901028525121556

Correct Samples

| GT | Pred | Accuracy |
| --- | --- | --- |
| *(image)* | *(image)* | 0.938 |
| *(image)* | *(image)* | 0.985 |
| *(image)* | *(image)* | 0.968 |

Incorrect Samples

| GT | Pred | Accuracy |
| --- | --- | --- |
| *(image)* | *(image)* | 0.448 |
| *(image)* | *(image)* | 0.491 |

As seen in the correct samples, the model segments the different regions of longer chairs quite well. Much of the error comes from shorter legs: compared to the ground truth, the model is slightly biased toward taller legs, and the blue region extends beyond the actual leg. This might be due to not seeing enough samples with shorter legs. The model also struggles with asymmetric structures. In the first failure case, for example, the couch has no armrest on one side, and many of the inaccuracies come from labeling other regions as this nonexistent armrest. It also completely misses the headrest, which again could come down to not seeing enough such data.

Q3

Procedure

I trained the segmentation model and the classification model on 10,000 points. For evaluation, I varied two factors: the rotation applied to the input cloud and the number of input points.

For rotations, I rotate the point cloud about the Y and Z axes by the same angle.
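A minimal sketch of this evaluation-time rotation, assuming an (N, 3) NumPy cloud, angles in degrees, and a row-vector convention; the composition order (Y first, then Z) is an assumption:

```python
import numpy as np

def rotate_y_z(points, angle_deg):
    """Rotate an (N, 3) point cloud about the Y axis, then the Z axis, by the same angle."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    rot_y = np.array([[ c, 0, s],
                      [ 0, 1, 0],
                      [-s, 0, c]])
    rot_z = np.array([[c, -s, 0],
                      [s,  c, 0],
                      [0,  0, 1]])
    # row-vector convention: p' = p @ R.T
    return points @ (rot_z @ rot_y).T
```

Evaluating at 0, 15, 45, and 90 degrees then just means re-running inference on `rotate_y_z(points, angle)`.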

Segmentation

*(plot)*

| Rotation | Test accuracy | GT | Pred |
| --- | --- | --- | --- |
| 0 (default) | 0.901029 | *(image)* | *(image)* |
| 15 | 0.762586 | *(image)* | *(image)* |
| 45 | 0.526841 | *(image)* | *(image)* |
| 90 | 0.298219 | *(image)* | *(image)* |

*(plot)*

| Number of points | Test accuracy | GT | Pred |
| --- | --- | --- | --- |
| 10000 (default) | 0.901029 | *(image)* | *(image)* |
| 5000 | 0.901374 | *(image)* | *(image)* |
| 2500 | 0.901314 | *(image)* | *(image)* |
| 1000 | 0.898642 | *(image)* | *(image)* |
| 500 | 0.88919 | *(image)* | *(image)* |
| 100 | 0.834133 | *(image)* | *(image)* |

With varying degrees of rotation, model performance drops sharply, from its maximum (≈90%) at 0 degrees to its minimum (≈29%) at 90 degrees. The model appears biased toward a horizontal seat segmentation: even when we rotate the chairs, the predicted seat region stays horizontal. Similarly, it expects the backrest to be an upright structure at the top of the chair, so after a 90-degree rotation it struggles to find the true backrest. Adding rotation augmentation might help here. Conversely, we do not see large drops in performance when varying the number of points. Only at 100 points does performance fall significantly, likely because the missing geometric structure makes it hard for the model to decide which region belongs to which part of the chair.
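The rotation augmentation suggested above could be sketched roughly as follows (a hypothetical train-time transform, not part of the trained models; rotating about the up axis only, with the Y-up convention and angle range as assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_rotation(points, max_deg=90.0):
    """Hypothetical train-time augmentation: rotate an (N, 3) cloud about the
    up (Y) axis by a random angle so the network sees many orientations."""
    t = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(t), np.sin(t)
    rot_y = np.array([[ c, 0, s],
                      [ 0, 1, 0],
                      [-s, 0, c]])
    return points @ rot_y.T
```

Applying this per sample in the training loader would expose the model to the orientations it currently fails on.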

Classification

*(plot)*

| Rotation | Test accuracy | Viz |
| --- | --- | --- |
| 0 (default) | 0.971668 | *(image)* *(image)* |
| 15 | 0.927597 | *(image)* *(image)* |
| 45 | 0.411333 | *(image)* *(image)* |
| 90 | 0.233998 | *(image)* *(image)* |

*(plot)*

| Number of points | Test accuracy | Viz |
| --- | --- | --- |
| 10000 (default) | 0.971668 | *(image)* |
| 5000 | 0.970619 | *(image)* |
| 2500 | 0.970619 | *(image)* |
| 1000 | 0.971668 | *(image)* |

In the classification case, the model is clearly not robust to rotation: accuracy drops as the rotation increases. Slight rotations are still fine, as in the 15-degree case. At 45 degrees, the chair starts to resemble a structure one could place things on, so the model begins predicting vase. Similarly, at 90 degrees the lamp head starts resembling a chair's backrest. This is again likely because the model never sees these rotations at training time; adding data augmentation could help.

Changing the number of points did not significantly affect overall accuracy, which suggests the model is fairly robust to downsampling. Most objects were predicted the same across 10,000, 5,000, and 1,000 points. However, the vase with many plants was only classified correctly at 1,000 points, suggesting that downsampling can remove clutter and noisy detail, and that the cleaner geometry made the underlying vase shape easier for the model to recognize.
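The downsampling used in this sweep can be sketched as uniform random subsampling (the exact sampling scheme is an assumption; the fixed seed is only for reproducibility of the sketch):

```python
import numpy as np

def downsample(points, n, seed=0):
    """Uniform random downsampling of an (N, 3) cloud to n points, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]
```

Running the classifier on `downsample(points, 1000)` versus the full 10,000-point cloud reproduces the comparison above.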

Q4

I implemented a model similar to DGCNN. Instead of processing each point individually, we build features that capture locality: we find each point's neighbors with kNN to define a neighborhood, then compute edge features, where the feature for each point pairs the vector from the point to its neighborhood center with the center itself. I then stack multiple edge-convolution layers and concatenate their outputs to form hierarchical features. As in Q2, for segmentation I also add a global context vector. Finally, since kNN can be memory-heavy, I downsampled to 1,000 points.
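The kNN edge-feature step can be sketched as below. This uses the common DGCNN form concat(x_i, x_j − x_i) per neighbor, which is an assumption; the center-based variant described above would swap each neighbor for the neighborhood mean. The dense N×N distance matrix is also why kNN gets memory-heavy at high point counts:

```python
import numpy as np

def knn_edge_features(points, k=8):
    """Sketch of DGCNN-style edge features over a kNN graph, in the standard
    concat(x_i, x_j - x_i) form (the exact feature used here is an assumption).
    points: (N, 3) array -> (N, k, 6) edge features."""
    diff = points[:, None, :] - points[None, :, :]      # (N, N, 3) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                # (N, N) pairwise distances
    idx = np.argsort(dist, axis=-1)[:, :k]              # k nearest per point (self at column 0)
    neighbors = points[idx]                             # (N, k, 3)
    center = np.broadcast_to(points[:, None, :], neighbors.shape)
    return np.concatenate([center, neighbors - center], axis=-1)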

| Task | With Locality Accuracy | Without Locality Accuracy |
| --- | --- | --- |
| classification | 0.981 | 0.9716 |
| segmentation | 0.899 | 0.898 |

| Task | With Locality Viz | Without Locality Viz |
| --- | --- | --- |
| classification | *(image)* *(image)* | *(image)* *(image)* |
| segmentation | GT *(image)* *(image)*, Pred *(image)* *(image)* | GT *(image)* *(image)*, Pred *(image)* *(image)* |

With the locality model, accuracy increases slightly despite using fewer points. This is especially true for the vase ground-truth class, where local structure helps the model understand the shape difference between lamps and vases. For segmentation, the improvement is minimal; this might be because this dataset's structures are already captured well enough by the global context. Adding more edge-convolution layers or using more neighbors might achieve better accuracy. We do, however, see a small qualitative improvement from the locality model on the base of the couch, where it captures a much more complete base rather than just top fragments of it.