Visualization of random test point clouds with predicted classes:
| Point Cloud 1 | Point Cloud 2 | Point Cloud 3 |
|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp |
Visualization of failure predictions for each class with interpretation:
| Failure Case |
|---|
| Predicted: Lamp |

| Failure Case |
|---|
| Predicted: Vase |
Interpretation: The model sometimes confuses lamps and vases because the two classes share rounded, hollow shapes. In this case, the lamp's rounded base closely resembles a typical vase silhouette.
Visualization of segmentation results for at least 5 objects (including 2 bad predictions) with corresponding ground truth:
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 95.6%
Interpretation: The model performs very well on this chair, correctly identifying and separating parts such as the legs, seat, and back.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 52.61%
Interpretation: The model has trouble telling where the base ends and the main body begins on this round object. The boundary between these parts is blurry, which leads to mistakes.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 79.39%
Interpretation: The model performs well overall but makes mistakes where the seat and backrest meet; points in that transition region look alike, so the model confuses the two parts.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 42.46%
Interpretation: This is a failure case. The model can't tell the difference between the seat and the legs, so it incorrectly labels many leg points as part of the seat.
| Ground Truth | Prediction |
|---|---|
| ![]() | ![]() |
Prediction Accuracy: 98.03%
Interpretation: The overall accuracy is very high, but if you look closely, there are small mistakes where the lamp shade connects to the base. The model has trouble with thin connecting parts.
Procedure: We rotated the input point clouds by 15 degrees and evaluated the model's performance on both classification and segmentation tasks to test robustness to geometric transformations.
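A minimal NumPy sketch of the perturbation used in this test, assuming the rotation is applied about the z-axis (the axis choice and helper name here are illustrative, not taken from the actual evaluation script):

```python
import numpy as np

def rotate_z(points: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud about the z-axis by `degrees`."""
    theta = np.radians(degrees)
    c, s = np.cos(theta), np.sin(theta)
    # Standard 3x3 rotation matrix about z, applied to row-vector points.
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

# Rotate a cloud by 15 degrees before feeding it to the model.
cloud = np.random.default_rng(0).random((10000, 3))
rotated = rotate_z(cloud, 15.0)
```

Because rotation is rigid, point norms (and all pairwise distances) are preserved; only the orientation the model sees changes.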
| Sample 1 | Sample 2 | Sample 3 |
|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp |
| Object 66 | Object 92 | Object 351 |
|---|---|---|
| Ground Truth<br>Prediction | Ground Truth<br>Prediction | Ground Truth<br>Prediction |
Interpretation: The model stays accurate even when objects are rotated by 15 degrees. It still correctly identifies and segments objects, showing that it learned features that don't change much with small rotations.
Procedure: We evaluated the model's performance with a different number of points per object (5000 points instead of the default 10000) to test robustness to point cloud density variations.
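A quick sketch of the density reduction used here, randomly keeping 5000 of the 10000 points without replacement (the helper name is illustrative):

```python
import numpy as np

def subsample(points: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Randomly keep n of the original points, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]

cloud = np.random.default_rng(0).random((10000, 3))
sparse = subsample(cloud, 5000)  # 5000 points instead of the default 10000
```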
| Sample 1 | Sample 2 | Sample 3 |
|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp |
| Object 66 | Object 92 | Object 351 |
|---|---|---|
| Ground Truth<br>Prediction | Ground Truth<br>Prediction | Ground Truth<br>Prediction |
Interpretation: The model performs essentially as well with 5000 points as with the default 10000, still classifying and segmenting objects correctly, because PointNet summarizes the cloud through its most salient features regardless of how many points are sampled. The primary reason for this sampling robustness is the combination of global max pooling and shared per-point weights: the global descriptor depends only on the strongest per-point activations, so removing points rarely changes it. The partial rotation robustness, in turn, comes from the T-Net, which predicts an alignment transform that normalizes the input's orientation before feature extraction.
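As a toy illustration of why a shared per-point transform followed by a symmetric max pooling yields an order-independent (and largely sampling-robust) global feature, here is a minimal NumPy sketch; the weights and dimensions below are made up for illustration and are not the actual network's:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Shared weights": one weight matrix applied identically to every point.
W = rng.standard_normal((3, 64))

def global_feature(points: np.ndarray) -> np.ndarray:
    """Per-point features (shared weights + ReLU), then symmetric max pooling."""
    per_point = np.maximum(points @ W, 0.0)
    return per_point.max(axis=0)  # max over points: order-independent

cloud = rng.standard_normal((1024, 3))
shuffled = cloud[rng.permutation(1024)]
```

Shuffling the points permutes the rows of the per-point feature matrix, but the column-wise max is unchanged, so the global feature is identical.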
Implemented Model: PointNet++
Architecture Details: PointNet++ works by dividing the point cloud into smaller groups, then applying PointNet to each group to understand local patterns. This process is repeated at different scales to capture both local details and overall structure.
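A rough NumPy sketch of the grouping step described above: farthest point sampling picks well-spread centroids, and a ball query collects each centroid's local neighbourhood, to which a small PointNet is then applied. The radius, counts, and function names are illustrative, not the values used in this implementation:

```python
import numpy as np

def farthest_point_sample(points: np.ndarray, k: int) -> np.ndarray:
    """Pick k well-spread centroid indices (the sampling step of a
    PointNet++ set-abstraction layer). Starts from point 0."""
    n = points.shape[0]
    chosen = np.zeros(k, dtype=int)
    dist = np.full(n, np.inf)
    for i in range(1, k):
        # Distance of every point to the nearest already-chosen centroid.
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i - 1]], axis=1))
        chosen[i] = int(dist.argmax())  # farthest point joins the set
    return chosen

def ball_group(points: np.ndarray, centroids: np.ndarray, radius: float):
    """Ball query: indices of all points within `radius` of each centroid."""
    d = np.linalg.norm(points[None, :, :] - points[centroids][:, None, :], axis=2)
    return [np.nonzero(row <= radius)[0] for row in d]

cloud = np.random.default_rng(1).random((2048, 3))
centers = farthest_point_sample(cloud, 128)
groups = ball_group(cloud, centers, radius=0.2)
```

Repeating this sample-group-PointNet pattern with progressively fewer centroids and larger radii is what lets PointNet++ capture local detail at the fine levels and overall structure at the coarse ones.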
| Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
|---|---|---|---|---|
| Predicted: Chair | Predicted: Lamp | Predicted: Lamp | Predicted: Lamp | Predicted: Lamp |
Comparison: PointNet++ is better at understanding local details, which helps it distinguish between different object parts. However, it still has trouble with objects that look very similar (like lamps and vases), just like the baseline model. Overall, it's more confident when objects have clearly different shapes.
Comparison of PointNet++ (with locality) vs PointNet (baseline) for all available objects:
| Object | Ground Truth | Prediction (PointNet++) | Baseline Prediction (PointNet) |
|---|---|---|---|
| Object 92 | ![]() | ![]() | ![]() |
| Object 351 | ![]() | ![]() | ![]() |
| Object 402 | ![]() | ![]() | ![]() |
| Object 426 | ![]() | ![]() | ![]() |
| Object 512 | ![]() | ![]() | ![]() |
Comparison: PointNet++ creates much cleaner boundaries between different parts compared to the baseline PointNet (see Object 426 for example). Because it learns from local patterns, it's better at figuring out where one part ends and another begins, even in tricky areas.