Homework 5: Ananya Bal (abal@andrew.cmu.edu)

Collaborated with FNU Abhimanyu

Q1. Classification Model (40 points)

The test accuracy from the best classification model is 97.9%. Some correctly classified examples are shown below:

Vase

Chair

Lamp

Some incorrectly classified examples are:

True label: Chair

Predicted Label: Lamp

True label: Lamp

Predicted Label: Vase

True label: Vase

Predicted Label: Lamp

Interpretation:

The model performs classification very well, with a final test accuracy of 97.9%. Most samples in the test set are correctly classified. As seen above, the three misclassified samples are rather hard cases. The chair is observed at an angle that makes it difficult to distinguish from a lamp. Similarly, the vase is a hard example because it doesn't resemble a typical vase, and the network ends up classifying it as a lamp. Although the second example, the lamp, is somewhat simpler and looks easier to classify, the network falters here, probably because the lamp's base somewhat resembles a vase.

Q2. Segmentation Model

The test accuracy from the best model is 89.72%. The ground truth and outputs from 5 test samples are shown below:

Sample 1 - Good segmentation

Segmentation accuracy: 93.5%

Sample 2 - Good segmentation

Segmentation accuracy: 98.5%

Sample 3 - Bad segmentation

Segmentation accuracy: 72.7%

Sample 4 - Good segmentation

Segmentation accuracy: 82.3%

Sample 5 - Bad segmentation

Segmentation accuracy: 74.2%

Interpretation:

The segmentation model performs decently well on the full test set, achieving an accuracy of 89.7%. However, it struggles to segment parts of certain chairs in the test set. In the examples above, samples 1, 2, and 4 were well segmented by the network; they do indeed look like simpler chairs whose parts are easier for the network to identify. Their individual segmentation accuracies (82-98%) also indicate that the network has done a good job of predicting the per-point classes. However, samples 3 and 5 were poorly segmented, which is evident both visually and from their individual segmentation accuracies, both in the 70s. In sample 3, the network is unable to properly distinguish between the backrest and the base (the cyan and red points overlap). In sample 5, the network is unable to properly distinguish between the base and the legs, with the red region being much larger in the predicted cloud than in the ground truth cloud.
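For reference, the per-object segmentation accuracy reported above is simply the fraction of points whose predicted part label matches the ground truth. A minimal sketch (the assignment's exact metric code is assumed, not shown here):

```python
import torch

def segmentation_accuracy(pred_labels: torch.Tensor, gt_labels: torch.Tensor) -> float:
    """Fraction of points whose predicted part label matches the ground truth.

    pred_labels, gt_labels: (num_points,) integer tensors for one point cloud.
    """
    return (pred_labels == gt_labels).float().mean().item()

# Example: a prediction that gets 3 of 4 points right -> 0.75
pred = torch.tensor([0, 1, 1, 2])
gt = torch.tensor([0, 1, 2, 2])
print(segmentation_accuracy(pred, gt))  # 0.75
```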

Q3. Robustness Analysis

Experiment 1:

I tested the classification and segmentation models with different values of num_points. The results are summarized below:

| num_points | Classification accuracy (%) | Segmentation accuracy (%) |
|-----------:|----------------------------:|--------------------------:|
| 100        | 92.7                        | 83.66                     |
| 500        | 97.58                       | 89.01                     |
| 1000       | 97.37                       | 89.70                     |
| 1500       | 97.69                       | 89.88                     |

As the table shows, increasing the number of points per object improves both classification and segmentation accuracy. The largest gain comes from going from 100 to 500 points; beyond that, both accuracies largely plateau.
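A minimal sketch of how this experiment can be set up, assuming each test cloud is randomly subsampled to num_points before being fed to the trained models (the starter code's sampler may differ):

```python
import torch

def subsample_cloud(points: torch.Tensor, num_points: int) -> torch.Tensor:
    """Randomly keep num_points points from an (N, 3) point cloud."""
    idx = torch.randperm(points.shape[0])[:num_points]
    return points[idx]

# Evaluate the trained models on clouds reduced to each tested size.
cloud = torch.rand(10000, 3)  # placeholder cloud
for n in [100, 500, 1000, 1500]:
    reduced = subsample_cloud(cloud, n)
    # feed `reduced` through the classification / segmentation model here
```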

Experiment 2:

I tested the classification and segmentation models on rotated point clouds. The point clouds were rotated by 90 degrees about the z-axis.
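A minimal sketch of the rotation applied to the test clouds, using a standard z-axis rotation matrix (the actual transform code is assumed):

```python
import math
import torch

def rotate_z(points: torch.Tensor, degrees: float = 90.0) -> torch.Tensor:
    """Rotate an (N, 3) point cloud about the z-axis by the given angle."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]], dtype=points.dtype)
    return points @ R.T
```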

Classification accuracy: 27.3%

Segmentation accuracy: 34.16%

Both accuracies drop sharply, which is expected: the models were trained only on clouds in their canonical orientation and have no built-in rotation invariance, so they do not generalize to rotated inputs.

Outputs from classification:

Ground truth: Chair

Predicted: Vase

Ground truth: Chair

Predicted: Vase

Outputs from segmentation:

Ground truth

Segmentation prediction

Ground truth

Segmentation prediction

Q4. Bonus Question - Locality

I have implemented the PointNet++ architecture for classification.

Procedure: In addition to linear layers, the PointNet++ architecture makes use of set abstraction (SA) levels, where each level abstracts its input into a new, smaller point set with richer features. For each set abstraction level, I sample points from the point cloud to serve as centroids; for each centroid, I gather 50 neighbouring points using PyTorch3D's ball_query function. Each local group of points is then passed through the level's shared linear layers.
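A minimal sketch of this grouping step, using PyTorch3D's sample_farthest_points and ball_query. The batch shape is illustrative, the 512 centroids and 0.2 radius follow the first SA level listed below, and farthest point sampling as the centroid sampler is an assumption:

```python
import torch
from pytorch3d.ops import sample_farthest_points, ball_query

# One set-abstraction grouping step for a batch of (B, N, 3) clouds.
points = torch.rand(2, 10000, 3)

# Pick 512 centroids (first SA level); farthest point sampling is an
# assumption -- any subsampling of the cloud would fit the description above.
centroids, _ = sample_farthest_points(points, K=512)

# Gather up to 50 neighbours of each centroid within the 0.2 query radius.
# Entries are padded where fewer than 50 points fall inside the ball.
_, _, grouped = ball_query(centroids, points, K=50, radius=0.2, return_nn=True)

# grouped: (B, 512, 50, 3). Normalise each neighbourhood relative to its
# centroid before feeding it through the level's shared linear layers.
local = grouped - centroids.unsqueeze(2)
```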

This is the network structure:

SA(512, 0.2, [64, 64, 128]) → SA(128, 0.4, [128, 128, 256]) → SA([256, 512, 1024]) → FC(512, 0.5) → FC(256, 0.5) → FC(3)
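For concreteness, a hypothetical expansion of the FC head in this notation, assuming the last SA level outputs a 1024-dimensional global feature (from its [256, 512, 1024] widths) and each FC(n, p) is a linear layer of width n followed by dropout p; the batch norm and ReLU placement is an assumption:

```python
import torch.nn as nn

# Sketch of FC(512, 0.5) -> FC(256, 0.5) -> FC(3): each FC(n, p) is
# Linear -> BatchNorm -> ReLU -> Dropout(p), ending in a 3-way classifier.
head = nn.Sequential(
    nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 3),
)
```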

The classification accuracy of PointNet++ was 95.47%.

Although this network is expected to perform better than the vanilla PointNet, I obtained a lower accuracy. I believe this is due to the lower num_points value (1500) I had to use because of GPU memory constraints while testing the network.
