16-825 Assignment 5

Andrew ID: nleone

Note: I only had an 8 GB GPU (RTX 3070) for this assignment.

Collaborators/References (for Q4 only): ChatGPT + https://github.com/yanx27/Pointnet_Pointnet2_pytorch/tree/master

Q1

Test Accuracy: 0.9790136411332634

Correct Predictions

Predictions from right to left (chair, vase, lamp)


Incorrect Predictions

Predictions from right to left (lamp, chair, chair)

Ground Truth from right to left (chair, vase, lamp)


The chair was most likely predicted as a lamp because it is folded up. The vase was most likely predicted as a chair because it has four legs and looks like a bench. The lamp was predicted as a chair because its top surface looks like a seat and its wide base looks like the base of a circular chair.

Q2

Test Accuracy: 0.8398361426256078
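For context, this is per-point accuracy averaged over the test set. Below is a minimal sketch of that metric as I understand it, assuming a model that returns per-point logits of shape (B, N, C); the loader and names are illustrative, not my exact evaluation code:

```python
import torch

@torch.no_grad()
def per_point_accuracy(model, loader, device="cuda"):
    # Fraction of points whose predicted part label matches the ground
    # truth, accumulated over the whole test set.
    model.eval()
    correct, total = 0, 0
    for points, seg_labels in loader:      # points: (B, N, 3), seg_labels: (B, N)
        logits = model(points.to(device))  # (B, N, C) per-point class scores
        preds = logits.argmax(dim=-1).cpu()
        correct += (preds == seg_labels).sum().item()
        total += seg_labels.numel()
    return correct / total
```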

GIFs on the left are predictions; GIFs on the right are ground truth.

Prediction 1: 0.9342


In the prediction, the red segment (the cushion) also appears on the "handles"/blue section of the chair. Besides that, the prediction is close to the ground truth.

Prediction 2: 0.5753


In the prediction, the red segment (the cushion) protrudes beyond the sides of the chair, interfering with the arms' segmentation colors. Also, at the back of the chair, parts of the arms are segmented with different colors than the rest of the arms, which is not the case in the ground truth.

Prediction 3: 0.96


Similar to prediction 1, the red segment/"cushion" bleeds into the legs of the chair. Other than that, the segmentation is close to the ground truth.

Prediction 4: 0.9536


At the intersections between different segments, the labels bleed into each other. Other than that, the segmentation is very close to the ground truth.

Prediction 5: 0.5054


In the prediction, most of the seat and legs of the chair are labeled as part of the back/head of the chair. The ground truth properly segments the seat, legs, and head of the chair.

Q3

Experiment 1: Number of points

Classifier Model

I ran the model with the num_points parameter set to 5000, 1000, and 100 points.
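Concretely, each evaluation cloud is subsampled to num_points points. A minimal sketch of random subsampling without replacement, assuming (N, 3) tensors; subsample_cloud is an illustrative name:

```python
import torch

def subsample_cloud(points, num_points):
    # points: (N, 3) tensor; keep a random subset of num_points points.
    idx = torch.randperm(points.shape[0])[:num_points]
    return points[idx]

# Evaluate the same cloud at each budget:
# for n in (5000, 1000, 100):
#     sparse = subsample_cloud(cloud, n)
```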

Test Accuracies:

(5000 points, 0.9779643231899265), (1000 points, 0.9790136411332634), (100 points, 0.9045120671563484)

The model's accuracy wasn't impacted by the number of points until it was run with only 100 points. Based on the samples below, I believe the classifier model is robust to a limited number of points.

From left to right, the GIFs below have 5000, 1000, and 100 points.

Visualization 1:

Predictions match the ground truth (chair) for all three GIFs.


Visualization 2:

The model predicted correctly with 5000 and 1000 points. At 100 points, it predicted the vase as a lamp.


Segmentation Model

I ran the model with the num_points parameter set to 5000, 1000, and 100 points.

Test Accuracies: (5000 points, 0.8379724473257698), (1000 points, 0.8302431118314425), (100 points, 0.6797244732576986)

Similar to the classifier model, performance didn't dip until the run with 100 points. The samples below show the same behavior.

From left to right, the GIFs below have 5000, 1000, and 100 points.

Visualization 1:

Accuracies from left to right: 0.9354, 0.938, 0.82


Visualization 2:

Accuracies from left to right: 0.9638, 0.962, 0.44


Experiment 2: Rotating clouds

Procedure: rotate each cloud about the z-axis by 10, 20, and 30 degrees.
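A minimal sketch of this rotation using the standard z-axis rotation matrix (rotate_z is an illustrative name, not my exact code):

```python
import math
import torch

def rotate_z(points, degrees):
    # points: (N, 3) tensor; rotate about the z-axis by `degrees`.
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]], dtype=points.dtype)
    return points @ R.T

# for deg in (10, 20, 30):
#     rotated = rotate_z(cloud, deg)
```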

Classifier Model

Test Accuracies: (10 degrees, 0.9653725078698846), (20 degrees, 0.9265477439664218), (30 degrees, 0.757607555089192)

The model's performance quickly deteriorates after 10 degrees; the model is not very robust to rotation. The samples below further demonstrate this behavior.

From left to right, the GIFs display the point clouds rotated by 10, 20, and 30 degrees.

Visualization 1:

The model correctly predicted the 10 degree and 20 degree rotated point clouds as a chair, but predicted the 30 degree point cloud as a vase.


Visualization 2:

With no rotation, the model correctly predicted this point cloud as a lamp. All three rotated point clouds were predicted as a vase instead of a lamp.


Segmentation Model

Test Accuracies: (10 degrees, 0.8126659643435981), (20 degrees, 0.759834035656402), (30 degrees, 0.6917768233387358)

Similar to the classifier model, performance quickly deteriorates after 10 degrees; the model is not very robust to rotation. The samples below further demonstrate this behavior, since the model segments each point based on its individual position rather than on its local neighborhood (locality).

From left to right, the GIFs display the point clouds rotated by 10, 20, and 30 degrees.

Visualization 1:

Accuracies from left to right: 0.9087, 0.8971, 0.8318. In the 30 degree point cloud, one of the top posts of the chair is segmented as yellow/arm of a chair.


Visualization 2:

Accuracies from left to right: 0.9323, 0.8264, 0.7552. In the 20 and 30 degree point clouds, one of the corners of the seat is labeled as yellow/arm of a chair.


Q4

Q4.1

I implemented MSG PointNet++, with some modifications to the layers due to memory constraints. In particular, the MLPs in the MSG layers have only two layers each, and in the classifier model there are only two scales in the first and second MSG layers. A sketch of this kind of layer is shown below.
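The sketch below shows a two-scale MSG set-abstraction layer of the kind described above, loosely following the grouping convention of the yanx27 reference repo. Random sampling stands in for farthest-point sampling to keep it short, and all names, radii, and channel widths are illustrative rather than my exact configuration:

```python
import torch
import torch.nn as nn

def ball_group(xyz, centroids, radius, nsample):
    # For each centroid, gather up to `nsample` neighbor indices within
    # `radius`; out-of-ball slots are padded with the first in-ball index.
    B, N, _ = xyz.shape
    S = centroids.shape[1]
    d2 = torch.cdist(centroids, xyz) ** 2                  # (B, S, N)
    idx = torch.arange(N, device=xyz.device).expand(B, S, N).clone()
    idx[d2 > radius ** 2] = N                              # mark out-of-ball
    idx = idx.sort(dim=-1).values[:, :, :nsample]          # in-ball first
    first = idx[:, :, :1].expand(-1, -1, nsample)
    return torch.where(idx == N, first, idx)               # (B, S, nsample)

class MSGSetAbstraction(nn.Module):
    # Multi-scale grouping: around each sampled centroid, group neighbors
    # at several radii, run a shared two-layer MLP per scale, max-pool over
    # each group, and concatenate the per-scale features.
    def __init__(self, radii=(0.1, 0.2), nsamples=(16, 32),
                 mlps=((32, 64), (32, 64))):  # two scales, two layers each
        super().__init__()
        self.radii, self.nsamples = radii, nsamples
        self.branches = nn.ModuleList()
        for mlp in mlps:
            layers, c = [], 3                # input: centroid-relative xyz
            for out_c in mlp:
                layers += [nn.Conv2d(c, out_c, 1),
                           nn.BatchNorm2d(out_c), nn.ReLU()]
                c = out_c
            self.branches.append(nn.Sequential(*layers))

    def forward(self, xyz, num_centroids):
        # xyz: (B, N, 3). Random sampling stands in for FPS here.
        B, N, _ = xyz.shape
        perm = torch.randperm(N, device=xyz.device)[:num_centroids]
        centroids = xyz[:, perm]                            # (B, S, 3)
        feats = []
        for r, k, branch in zip(self.radii, self.nsamples, self.branches):
            idx = ball_group(xyz, centroids, r, k)          # (B, S, k)
            grouped = torch.gather(
                xyz.unsqueeze(1).expand(B, centroids.shape[1], N, 3), 2,
                idx.unsqueeze(-1).expand(-1, -1, -1, 3))    # (B, S, k, 3)
            grouped = grouped - centroids.unsqueeze(2)      # relative coords
            f = branch(grouped.permute(0, 3, 1, 2))         # (B, C, S, k)
            feats.append(f.max(dim=-1).values)              # pool over group
        return centroids, torch.cat(feats, dim=1)           # (B, sumC, S)
```

Dropping the third scale and the third MLP layer is what keeps this within an 8 GB memory budget, at the cost of some multi-scale context.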

Compared to the base PointNet model, PointNet++ performs slightly worse. The examples from Q1 are shown again below; the predictions mostly match those of the base model.

Classifier Accuracy: 0.9695697796432319

Correct Predictions

Predictions from right to left (chair, vase, lamp, lamp). The last GIF was predicted incorrectly by the base model.


Incorrect Predictions

Predictions from right to left (lamp, chair, chair, lamp)

Ground Truth from right to left (chair, vase, lamp, chair)


The last GIF was predicted correctly by the base model but incorrectly by PointNet++. I don't know why or how the model confuses the obvious lawn chair with a lamp. Only God knows.

Q4.2

Segmentation Accuracy: 0.8964745542949757

The MSG PointNet++ performs about 0.06 (6%) better than the base model. For all predictions below, the PointNet++ model is at least 5% more accurate; on Prediction 2, the improved model is roughly 30% more accurate than the base model.

GIFs on the left are predictions; GIFs on the right are ground truth.

Prediction 1: 0.9508


In the prediction, the red segment (the cushion) also appears on the "handles"/blue section of the chair, similar to the base model's prediction.

Prediction 2: 0.8462


The prediction correctly segments the arms/sides of the chair, and less of the red segment protrudes into other regions of the chair.

Prediction 3: 0.9777


Similar to prediction 1, the red segment/"cushion" bleeds into the legs of the chair, much like the base model's prediction.

Prediction 4: 0.9731


At the intersections between different segments, the labels bleed into each other, similar to the base model's prediction.

Prediction 5: 0.5807


In the prediction, most of the seat and legs of the chair are labeled as the back/head of the chair, while the ground truth properly segments the seat, legs, and head. Both models struggle on this folded chair.

Q4.3.1

Experiment 1: Number of points

Classifier Model

As in Q3, I ran the model with the num_points parameter set to 5000, 1000, and 100 points.

Test Accuracies:

(5000 points, 0.9632738719832109), (1000 points, 0.472193074501574), (100 points, 0.2633788037775446)

Unlike the base model, PointNet++ was impacted by the number of points from 1000 points onwards. Removing the extra scales and layers heavily hurt the model's robustness to a reduced number of points.

From left to right, the GIFs below have 5000, 1000, and 100 points.

Visualization 1:

The model correctly predicts the first point cloud (5000 points), but predicts the next two (1000 and 100 points) as lamps.


Visualization 2:

The model correctly predicts all three point clouds; the base model fails on the 100 point version.


Segmentation Model

I again set the num_points parameter to 5000, 1000, and 100 points.

Test Accuracies: (5000 points, 0.8900262560777958), (1000 points, 0.8045299837925446), (100 points, 0.5560777957860615)

Similar to the base segmentation model, performance didn't dip significantly until the run with 100 points, though the drop from 5000 to 1000 points is larger than the corresponding drop in the base model. The samples below show the same behavior.

From left to right, the GIFs below have 5000, 1000, and 100 points.

Visualization 1:

Accuracies from left to right: 0.953, 0.891, 0.75

The PointNet++ model performs better than the base model with 5000 points but worse at 1000 and 100 points.


Visualization 2:

Accuracies from left to right: 0.9772, 0.971, 0.56

The PointNet++ model performs better than the base model on all three point clouds.


Q4.3.2

Experiment 2: Rotating clouds

Procedure: as in Q3, rotate each cloud about the z-axis by 10, 20, and 30 degrees.

Classifier Model

Test Accuracies: (10 degrees, 0.9716684155299056), (20 degrees, 0.9370409233997902), (30 degrees, 0.789087093389297)

The model's performance deteriorates after 10 degrees, but it beats the base model in all three settings by 1-3%. The model is still not very robust to rotation. The samples below further demonstrate this behavior.

From left to right, the GIFs display the point clouds rotated by 10, 20, and 30 degrees.

Visualization 1:

The model correctly predicted the 10 degree and 20 degree rotated point clouds as a chair, but predicted the 30 degree point cloud as a lamp, where the base model predicted it as a vase.


Visualization 2:

The PointNet++ model correctly predicts all three point clouds, which the base model fails to do.


Segmentation Model

Test Accuracies: (10 degrees, 0.8892776337115073), (20 degrees, 0.86167471636953), (30 degrees, 0.799734035656402)

Similar to the classifier model, performance deteriorates after 10 degrees, but it is significantly better than the base model's; the model is more robust to rotated point clouds. The accuracy drop from 10 to 20 degrees is around 3%, and from 20 to 30 degrees around 6%, versus roughly 5% and 7% for the base model.

In the two examples below, we see that thanks to locality, the segmentations match the parts of the chair better. For example, the cushion segments stay properly aligned with the rotated chairs.

From left to right, the GIFs display the point clouds rotated by 10, 20, and 30 degrees.

Visualization 1:

Accuracies from left to right: 0.9476, 0.8928, 0.8633.

In the 20 and 30 degree point clouds, one of the top posts of the chair is segmented as yellow/arm of a chair, and parts of the legs are segmented as the seat of the chair. The PointNet++ model also performs better on all three point clouds than the base model, on average by 5%.


Visualization 2:

Accuracies from left to right: 0.9718, 0.9486, 0.8997.

In the 30 degree point cloud, one of the corners of the seat is labeled as yellow/arm of a chair. PointNet++ performs significantly better than the base model, by about 10% on average.

