Assignment 5 L3D: PointNet-based architecture¶

Nanaki Singh

Question 1: Classification Model¶

Model architecture: the PointNet paper was used for inspiration. The model is split into three components: an n-dimensional transformation block, a conv-net backbone, and a classification head. The segmentation model uses a similar modular network but works directly with the transformation module's output, without the conv-net backbone.
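A minimal sketch of this modular layout is shown below, assuming PyTorch; the layer widths and the `TNet` input transform here are illustrative rather than the exact configuration used.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Illustrative n-dim input transform: predicts a k x k matrix applied to the points."""
    def __init__(self, k=3):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv1d(k, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, k * k),
        )

    def forward(self, x):                      # x: (B, k, N)
        feat = self.mlp(x).max(dim=2).values   # global max pool -> (B, 1024)
        mat = self.fc(feat).view(-1, self.k, self.k)
        eye = torch.eye(self.k, device=x.device).unsqueeze(0)
        return mat + eye                       # bias toward the identity transform

class PointNetCls(nn.Module):
    """Transformation block -> pointwise conv backbone -> classification head."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.tnet = TNet(k=3)
        self.backbone = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, pts):                                  # pts: (B, N, 3)
        x = torch.bmm(pts, self.tnet(pts.transpose(1, 2)))   # align the input points
        x = self.backbone(x.transpose(1, 2))                 # (B, 1024, N)
        x = x.max(dim=2).values                              # global feature
        return self.head(x)                                  # class logits
```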

Model Training Results: the architecture was trained for 10 epochs.¶

Train loss: 24.9477

Test accuracy: 0.9517

Model evaluation: test statistics with visualizations.
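The test statistics and the correct/incorrect cases below were gathered with an evaluation pass of roughly this form (a sketch; `model`, `test_points`, and `test_labels` are assumed to be the trained classifier and the loaded test set):

```python
import torch

@torch.no_grad()
def evaluate(model, test_points, test_labels):
    """Return overall accuracy plus the indices of correct / incorrect predictions."""
    model.eval()
    preds = model(test_points).argmax(dim=1)    # predicted class per sample
    hits = preds == test_labels
    accuracy = hits.float().mean().item()
    correct_idx = torch.nonzero(hits, as_tuple=False).flatten().tolist()
    wrong_idx = torch.nonzero(~hits, as_tuple=False).flatten().tolist()
    return accuracy, correct_idx, wrong_idx, preds
```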

Cases of Correct Model Performance:

image

Sample 946: Ground Truth = lamp, Pred = lamp, Correct = Yes

image

Sample 320: Ground Truth = chair, Pred = chair, Correct = Yes

image

Sample 647: Ground Truth = vase, Pred = vase, Correct = Yes

image

Sample 27: Ground Truth = chair, Pred = chair, Correct = Yes

image

Sample 546: Ground Truth = chair, Pred = chair, Correct = Yes

Cases of Incorrect Model Performance:

image

Sample 734: Ground Truth = lamp, Pred = vase, Correct = No

image

Sample 806: Ground Truth = lamp, Pred = chair, Correct = No

Interpretation:

With the given data, cases of poor performance were rare, especially within the chair class, because the chair structure is very distinct. The misclassifications can be attributed to point cloud scenes with a much boxier structure. In the cases above, the lamp sits atop a box-like structure, so the model's focus falls on the primary center of the point cloud (the cube/cuboid-shaped body), with the flower-petal shade and light sitting on top of that square structure.

Question 2: Segmentation Model¶

Note that with the given model architecture and relatively simple data, we did not see bad predictions. Any errors in per-point class prediction were very minor and occurred near the border between different classes. Examples of model performance are shown below (ground truth on the left, prediction on the right), ordered by decreasing per-sample accuracy.
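The per-sample ordering was computed along these lines (a sketch, assuming the segmentation model returns per-point logits of shape `(B, N, C)` and `part_labels` holds one part id per point):

```python
import torch

@torch.no_grad()
def per_sample_seg_accuracy(model, points, part_labels):
    """Point-wise accuracy for each sample, used to order the visualizations."""
    model.eval()
    preds = model(points).argmax(dim=-1)               # (B, N) predicted part id per point
    acc = (preds == part_labels).float().mean(dim=1)   # (B,) accuracy per sample
    order = torch.argsort(acc, descending=True)        # decreasing accuracy
    return acc, order
```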

Model Training Results: the architecture was trained for 10 epochs.¶

Train loss: 0.0184658

Test loss: 0.058479

Sample accuracy: 0.91093

image image

Sample accuracy: 0.89872

image image

Sample accuracy: 0.89314

image image

Sample accuracy: 0.88308

image image

Sample accuracy: 0.87872

image image

Sample accuracy: 0.86754

image image

Results Interpretation:

We can see that the ground truth segmentations very closely match the predicted segmentations. Any errors seemed to occur at the boundaries between adjacent part classes, especially when the border was not a distinct, clear geometric edge; errors were worse when the border occurred on a curved surface.

Question 3: Experimentation¶

Experiment 1: Rotate the input point clouds by certain degrees and report how much the accuracy falls¶

A single point cloud scene is defined as a 1 x N x 3 array. A combined rotation matrix can be built as the matrix product of individual rotation matrices about the x, y, and z axes. Multiplying the original point cloud by this rotation matrix yields a rotated point cloud scene, which was then fed into the model.
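A sketch of this rotation pipeline is below, assuming PyTorch tensors and angles given in degrees (the helper names are illustrative):

```python
import math
import torch

def rotation_matrix(deg_x=0.0, deg_y=0.0, deg_z=0.0):
    """Compose a 3x3 rotation matrix R = Rz @ Ry @ Rx from per-axis angles in degrees."""
    ax, ay, az = (math.radians(d) for d in (deg_x, deg_y, deg_z))
    rx = torch.tensor([[1.0, 0.0, 0.0],
                       [0.0, math.cos(ax), -math.sin(ax)],
                       [0.0, math.sin(ax),  math.cos(ax)]])
    ry = torch.tensor([[ math.cos(ay), 0.0, math.sin(ay)],
                       [0.0, 1.0, 0.0],
                       [-math.sin(ay), 0.0, math.cos(ay)]])
    rz = torch.tensor([[math.cos(az), -math.sin(az), 0.0],
                       [math.sin(az),  math.cos(az), 0.0],
                       [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def rotate_cloud(points, deg_x=0.0, deg_y=0.0, deg_z=0.0):
    """Rotate a (1, N, 3) point cloud; the rotated cloud is then fed to the classifier."""
    R = rotation_matrix(deg_x, deg_y, deg_z).to(points.dtype)
    return points @ R.T          # row-vector convention: (1, N, 3) @ (3, 3) -> (1, N, 3)
```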

A few combinations of angle values along each axis were evaluated, and the results can be seen below.

Case 1: 40-degree rotation about the y axis only

Overall test accuracy: 1.0000 (5/5)

image image image image image

Case 2: 40-degree rotation about the z axis only

Overall test accuracy: 0.8000 (4/5)

image image image image image

Case 3: 20-degree rotation about the z and x axes

Overall test accuracy: 0.8000 (4/5)

image image image image image

Case 4: 40-degree rotation about the z and x axes

Overall test accuracy: 0.4000 (2/5)

image image image image image

Case 5: 80-degree rotation about the z and x axes

Overall test accuracy: 0.2000 (1/5)

image image image image image

Interpretation: clearly, any drastic rotation (> 30 degrees about any axis) led to substantially poorer performance. The model had particular difficulty with chairs rotated about the x and z axes: since the chairs now lay at a diagonal or perpendicular to the ground, the model very often classified them as lamps.

Experiment 2: Input a different number of points per object¶

Five test samples were evaluated using the best trained classification model, with four different variations in the size of the point cloud scene.
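The clouds were downsampled roughly as follows (a sketch; uniform random sampling without replacement is assumed, and `model` / `test_points` are the trained classifier and test clouds from above):

```python
import torch

def subsample_cloud(points, num_points):
    """Randomly keep `num_points` points from a (B, N, 3) point cloud."""
    n = points.shape[1]
    idx = torch.randperm(n)[:num_points]     # sample indices without replacement
    return points[:, idx, :]

# e.g. evaluate the same samples at several cloud sizes
# for k in (1000, 500, 100, 20):
#     preds = model(subsample_cloud(test_points, k)).argmax(dim=1)
```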

Below are the point cloud scenes used for each number of points.

Points = 1000

Overall test accuracy: 1.0000 (5/5)

image image image image image

Points = 500

Overall test accuracy: 1.0000 (5/5)

image image image image image

Points = 100

Overall test accuracy: 1.0000 (5/5)

image image image image image

Points = 20

Overall test accuracy: 0.6000 (3/5)

image image image image image

Interpretation: As can be seen, the model performs reasonably well and is, for the most part, able to distinguish objects from each class even when they are composed of fewer points. This can be attributed to the distribution of points in the scene: chairs are much boxier, with many more points concentrated along the arms and backrest. In comparison, lamps are much slimmer and leaner; points spread only along the vertical axis typically corresponded to a lamp. Vases sometimes took on a more spherical, elongated structure. Thus, even with fewer points (down to 20 in some cases), the model could distinguish between the classes.