Assignment 5 L3D: PointNet-based architecture¶
Nanaki Singh
Question 1: Classification Model¶
Model architecture: the design takes inspiration from the PointNet paper. The model is split into three components: an n-dim transformation block, a conv-net backbone, and a classification head. The segmentation model uses a similar modular network but works directly on the output of the transformation module, without the conv-net backbone.
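A minimal PyTorch sketch of this modular structure is shown below. The layer widths, module names, and the identity-biased T-Net output are illustrative assumptions rather than the exact configuration used.

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Illustrative n-dim transformation block: predicts an n x n transform
    applied to the input points (in the spirit of PointNet's T-Net)."""
    def __init__(self, n=3):
        super().__init__()
        self.n = n
        self.mlp = nn.Sequential(
            nn.Conv1d(n, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, n * n),
        )

    def forward(self, x):                                 # x: (B, N, n)
        feat = self.mlp(x.transpose(1, 2))                # per-point features (B, 1024, N)
        feat = feat.max(dim=2).values                     # global max pool -> (B, 1024)
        transform = self.fc(feat).view(-1, self.n, self.n)
        transform = transform + torch.eye(self.n, device=x.device)  # bias toward identity
        return torch.bmm(x, transform)                    # transformed points (B, N, n)

class ClsModel(nn.Module):
    """Transformation block -> conv-net backbone -> classification head."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.tnet = TNet(n=3)
        self.backbone = nn.Sequential(                    # shared per-point conv backbone
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(                        # classification head on global feature
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):                            # points: (B, N, 3)
        x = self.tnet(points)                             # align input with predicted transform
        x = self.backbone(x.transpose(1, 2))              # (B, 1024, N)
        x = x.max(dim=2).values                           # permutation-invariant global feature
        return self.head(x)                               # class logits (B, num_classes)
```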
Model Training Results: the model was trained for 10 epochs.¶
Train loss: 24.9477
Test accuracy: 0.9517
Model evaluation: test statistics with visualizations.
Cases of Correct Model Performance:

Sample 946: Ground Truth = lamp, Pred = lamp, Correct = Yes

Sample 320: Ground Truth = chair, Pred = chair, Correct = Yes

Sample 647: Ground Truth = vase, Pred = vase, Correct = Yes

Sample 27: Ground Truth = chair, Pred = chair, Correct = Yes

Sample 546: Ground Truth = chair, Pred = chair, Correct = Yes
Cases of Incorrect Model Performance:

Sample 734: Ground Truth = lamp, Pred = vase, Correct = No

Sample 806: Ground Truth = lamp, Pred = chair, Correct = No
Interpretation:
With the given data, I rarely saw cases of poor performance, especially within the chair class, since the chair structure is very distinct. The misclassifications above can be attributed to point cloud scenes with a much boxier overall structure: in both cases the lamp sits atop a box-like base, so the model appears to focus on the dominant cube/cuboid shape at the center of the point cloud rather than the flower-petal shade and light sitting on top of it.
Question 2: Segmentation Model¶
Note that with the given model architecture and relatively simple data, we did not see bad predictions. Any errors in per-point class prediction were very minor and occurred near the border between different classes. Examples of model performance are shown below - ground truth on the left, predicted on the right - ordered by decreasing model accuracy.
Model Training Results: the model was trained for 10 epochs.¶
Train loss: 0.0184658
Test loss: 0.058479
Per-example test accuracies for the visualized examples, in decreasing order:

Example 1: test accuracy = 0.91093
Example 2: test accuracy = 0.89872
Example 3: test accuracy = 0.89314
Example 4: test accuracy = 0.88308
Example 5: test accuracy = 0.87872
Example 6: test accuracy = 0.86754
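These per-example values are per-point accuracies: the fraction of points whose predicted segmentation label matches the ground truth. A minimal sketch of that computation (tensor names are assumptions):

```python
import torch

def per_point_accuracy(pred_logits, gt_labels):
    """pred_logits: (N, num_parts) per-point class scores for one example;
    gt_labels: (N,) ground-truth part label per point."""
    pred_labels = pred_logits.argmax(dim=-1)              # (N,) predicted part per point
    return (pred_labels == gt_labels).float().mean().item()
```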

Results Interpretation:
We can see that the ground truth segmentations very closely match the predicted segmentations. Errors tended to occur at the boundaries between segmentation classes, especially when the border was not a distinct, clear geometric edge; performance was worse when the boundary fell on a curved surface.
Question 3: Experimentation¶
Experiment 1: Rotate the input point clouds by certain degrees and report how much the accuracy falls¶
A single point cloud scene is defined as a 1 x N x 3 array. A combined rotation matrix can be constructed as the matrix product of individual rotation matrices about the x, y, and z axes. Multiplying the original point cloud by this rotation matrix yields a rotated point cloud scene, which was then fed into the model.
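A sketch of this procedure is shown below; the helper name, angle values, and placeholder point cloud are assumptions for illustration.

```python
import numpy as np

def rotation_matrix(theta_x, theta_y, theta_z):
    """Compose rotations about the x, y, and z axes (angles in radians)."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx                                   # combined 3 x 3 rotation

# points: (1, N, 3) point cloud scene; e.g. rotate 40 degrees about the y axis
points = np.random.rand(1, 1024, 3)                       # placeholder point cloud
R = rotation_matrix(0.0, np.deg2rad(40), 0.0)
rotated = points @ R.T                                    # (1, N, 3) rotated scene fed to the model
```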
A few variations of angle values along each axis were defined, and the results can be seen below.
Case 1: 40 degree rotation along y axis only
Overall test accuracy: 1.0000 (5/5)

Case 2: 40 degree rotation along z axis only
Overall test accuracy: 0.8000 (4/5)

Case 3: 20 degree rotation along z and x axis
Overall test accuracy: 0.8000 (4/5)

Case 4: 40 degree rotation along z and x axis
Overall test accuracy: 0.4000 (2/5)

Case 5: 80 degree rotation along z and x axis
Overall test accuracy: 0.2000 (1/5)

Interpretation: clearly, any drastic rotation (> 30 degrees about any axis) led to substantially poorer performance. The model specifically had difficulty with chairs rotated about their x and z axes - since the chairs now lay at a diagonal or perpendicular to the ground, the model very often classified them as lamps.
Experiment 2: Input a different number of points per object¶
Five samples were evaluated using the best trained classification model, with four different variations in the number of points per point cloud scene.
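A sketch of how the point count might be varied is shown below; random index subsampling is an assumption about the sampling scheme, and the model call is illustrative.

```python
import torch

def subsample(points, num_points):
    """points: (B, N, 3) full point clouds; returns (B, num_points, 3)."""
    idx = torch.randperm(points.shape[1])[:num_points]    # random point indices
    return points[:, idx, :]

# e.g. evaluate the same scenes at 1000, 500, 100, and 20 points
# for n in (1000, 500, 100, 20):
#     preds = model(subsample(full_points, n)).argmax(dim=-1)
```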
Below are the point cloud scenes used for the different numbers of points.
Points = 1000
Overall test accuracy: 1.0000 (5/5)

Points = 500
Overall test accuracy: 1.0000 (5/5)

Points = 100
Overall test accuracy: 1.0000 (5/5)

Points = 20
Overall test accuracy: 0.6000 (3/5)

Interpretation: As can be seen, the model performs reasonably well and is, for the most part, able to distinguish objects from each class even when they are composed of fewer points. This is likely due to the distribution of points in the scene - chairs are much boxier and have many more points centered along the arms and backrest. In comparison, lamps are much slimmer and leaner - points spread only along the vertical axis typically corresponded to a lamp. Vases sometimes took on a more spherical and elongated structure. Thus, even with fewer points (down to 20 points in some cases), the model could distinguish between the classes.