16-825 Assignment 5: PointNet Classification and Segmentation

Q1. Classification Model (40 points)

Test Accuracy

Test Accuracy: 97.3%

Random Test Point Clouds

Visualization of random test point clouds with predicted classes:

Point Cloud 1
Random test point cloud 1
Predicted: Chair
Ground Truth: Chair

Point Cloud 2
Random test point cloud 2
Predicted: Lamp
Ground Truth: Lamp

Point Cloud 3
Random test point cloud 3
Predicted: Lamp
Ground Truth: Lamp

Failure Cases

Visualization of failure predictions for each class with interpretation:

Vase Failures

Failure Case
Vase failure case

Predicted: Lamp
Ground Truth: Vase

Lamp Failures

Failure Case
Lamp failure case

Predicted: Vase
Ground Truth: Lamp

Interpretation: The model sometimes confuses lamps and vases because they share similar geometry: both are rounded and hollow. In this case, the lamp's rounded base closely resembles a typical vase shape.

Q2. Segmentation Model (40 points)

Test Accuracy

Test Accuracy: 90.88%

Segmentation Results

Visualization of segmentation results for at least 5 objects (including 2 bad predictions) with corresponding ground truth:

Object 92

Ground Truth: Object 92 ground truth
Prediction: Object 92 prediction

Prediction Accuracy: 95.6%
Interpretation: The model works very well on this chair. It correctly identifies and separates the different parts like the legs, seat, and back.

Object 351

Ground Truth: Object 351 ground truth
Prediction: Object 351 prediction

Prediction Accuracy: 52.61%
Interpretation: The model has trouble telling where the base ends and the main body begins on this round object. The boundary between these parts is blurry, which leads to mistakes.

Object 402

Ground Truth: Object 402 ground truth
Prediction: Object 402 prediction

Prediction Accuracy: 79.39%
Interpretation: The model does pretty well overall, but it makes some mistakes where the seat and backrest meet. These areas look similar, so the model gets confused about which part is which.

Object 426

Ground Truth: Object 426 ground truth
Prediction: Object 426 prediction

Prediction Accuracy: 42.46%
Interpretation: This is a failure case. The model can't tell the difference between the seat and the legs, so it incorrectly labels many leg points as part of the seat.

Object 512

Ground Truth: Object 512 ground truth
Prediction: Object 512 prediction

Prediction Accuracy: 98.03%
Interpretation: The overall accuracy is very high, but if you look closely, there are small mistakes where the lamp shade connects to the base. The model has trouble with thin connecting parts.

Q3. Robustness Analysis (20 points)

Experiment 1: Rotation 15 Degrees (10 points)

Procedure: We rotated the input point clouds by 15 degrees and evaluated the model's performance on both classification and segmentation tasks to test robustness to geometric transformations.
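
The rotation itself is just a fixed rotation matrix applied to every point. A minimal sketch is shown below, assuming rotation about the z-axis (the report only specifies the 15-degree angle; the axis and the helper name `rotate_point_cloud` are assumptions):

```python
import numpy as np

def rotate_point_cloud(points, angle_deg=15.0):
    """Rotate an (N, 3) point cloud by angle_deg about the z-axis (axis is an assumption)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c,  -s,  0.0],
                      [s,   c,  0.0],
                      [0.0, 0.0, 1.0]])
    return points @ rot_z.T

# Example: rotate each test cloud before passing it to the trained model.
# rotated = rotate_point_cloud(test_points, angle_deg=15.0)
```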

Classification Task

Test Accuracy: 91.81%
Baseline Accuracy (from Q1): 97.3%

Sample 1
Rotation classification sample 1
Predicted: Chair
Ground Truth: Chair

Sample 2
Rotation classification sample 2
Predicted: Lamp
Ground Truth: Lamp

Sample 3
Rotation classification sample 3
Predicted: Lamp
Ground Truth: Lamp

Segmentation Task

Test Accuracy: 83.12%
Baseline Accuracy (from Q2): 90.88%

Object 66
Ground Truth: Rotation segmentation object 66 ground truth
Prediction: Rotation segmentation object 66 prediction

Object 92
Ground Truth: Rotation segmentation object 92 ground truth
Prediction: Rotation segmentation object 92 prediction

Object 351
Ground Truth: Rotation segmentation object 351 ground truth
Prediction: Rotation segmentation object 351 prediction

Interpretation: The model remains fairly accurate when objects are rotated by 15 degrees, though accuracy drops noticeably (97.3% to 91.81% for classification, 90.88% to 83.12% for segmentation). It still identifies and segments most objects correctly, suggesting the learned features are only moderately sensitive to small rotations.

Experiment 2: Number of Points 5000 (10 points)

Procedure: We evaluated the model's performance with a different number of points per object (5000 points instead of the default 10000) to test robustness to point cloud density variations.
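
A minimal sketch of the subsampling, assuming uniform random sampling without replacement (the helper name and sampling scheme are assumptions; the report only fixes the point count):

```python
import numpy as np

def sample_points(points, num_points=5000, seed=0):
    """Randomly keep num_points of an (N, 3) point cloud, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]

# Example: evaluate the trained model on 5000-point versions of the test clouds.
# sparse_cloud = sample_points(test_points, num_points=5000)
```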

Classification Task

Test Accuracy: 97.3%
Baseline Accuracy (from Q1): 97.3%

Sample 1
Num points classification sample 1
Predicted: Chair
Ground Truth: Chair

Sample 2
Num points classification sample 2
Predicted: Lamp
Ground Truth: Lamp

Sample 3
Num points classification sample 3
Predicted: Lamp
Ground Truth: Lamp

Segmentation Task

Test Accuracy: 90.89%
Baseline Accuracy (from Q2): 90.88%

Object 66
Ground Truth: Num points segmentation object 66 ground truth
Prediction: Num points segmentation object 66 prediction

Object 92
Ground Truth: Num points segmentation object 92 ground truth
Prediction: Num points segmentation object 92 prediction

Object 351
Ground Truth: Num points segmentation object 351 ground truth
Prediction: Num points segmentation object 351 prediction

Interpretation: The model performs essentially identically with 5000 points as with the full 10000: it still correctly identifies and segments objects because PointNet's global feature captures the most salient per-point features regardless of how many points are sampled. The main reason for this sampling invariance is the symmetric global pooling over per-point features combined with the shared per-point MLP weights. The partial robustness to rotation seen in Experiment 1 comes primarily from the T-Net, which predicts an input transformation that helps align the point cloud and makes the learned representation less sensitive to rotations.
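
A toy sketch of why the pooled global feature barely changes when half the points are dropped (a hypothetical module, not the trained model; the original PointNet pools with a max, and the same argument applies to average pooling):

```python
import torch
import torch.nn as nn

class TinyPointNetEncoder(nn.Module):
    """Toy PointNet-style encoder: shared per-point MLP + symmetric global max pool.
    Illustrative only; the real model also has T-Nets and more layers."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, feat_dim))

    def forward(self, pts):                      # pts: (B, N, 3)
        per_point = self.mlp(pts)                # same weights applied to every point
        return per_point.max(dim=1).values       # pooled feature is order- and count-agnostic

torch.manual_seed(0)
enc = TinyPointNetEncoder()
cloud = torch.randn(1, 10000, 3)
subset = cloud[:, torch.randperm(10000)[:5000]]  # keep a random half of the points
print(torch.norm(enc(cloud) - enc(subset)))      # typically small: the pooled feature is nearly unchanged
```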

Q4. Bonus Question - Locality (20 points)

Model Specification

Implemented Model: PointNet++

Architecture Details: PointNet++ divides the point cloud into local neighborhoods by sampling centroid points and grouping the points around each one, then applies a shared PointNet to every neighborhood to extract local features. Repeating this set-abstraction step at progressively coarser scales lets the network capture both fine local detail and the overall structure of the object.
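
A simplified sketch of one set-abstraction layer, assuming farthest point sampling for centroids, radius-limited k-nearest-neighbor grouping, and a small shared MLP (helper names, layer sizes, and grouping details are assumptions; the actual implementation, including the feature-propagation layers needed for segmentation, may differ):

```python
import torch
import torch.nn as nn

def farthest_point_sample(xyz, n_centroids):
    """Greedy farthest point sampling over an (N, 3) cloud; returns centroid indices."""
    n = xyz.shape[0]
    idx = torch.zeros(n_centroids, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    idx[0] = torch.randint(n, (1,)).item()
    for i in range(1, n_centroids):
        dist = torch.minimum(dist, ((xyz - xyz[idx[i - 1]]) ** 2).sum(-1))
        idx[i] = dist.argmax()
    return idx

class SetAbstraction(nn.Module):
    """One PointNet++-style set abstraction layer: sample centroids, group nearby
    points around each, run a shared MLP per neighbor, then max-pool per group."""
    def __init__(self, out_dim=64, radius=0.2, n_centroids=512, k=32):
        super().__init__()
        self.radius, self.n_centroids, self.k = radius, n_centroids, k
        self.mlp = nn.Sequential(nn.Linear(3, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

    def forward(self, xyz):                                          # xyz: (N, 3), one cloud for clarity
        centroids = xyz[farthest_point_sample(xyz, self.n_centroids)]  # (C, 3)
        d = torch.cdist(centroids, xyz)                                # (C, N) pairwise distances
        knn_d, knn_i = d.topk(self.k, largest=False)                   # k nearest points per centroid
        neighbors = xyz[knn_i] - centroids[:, None, :]                 # centroid-relative coordinates
        neighbors[knn_d > self.radius] = 0.0                           # ignore points outside the ball
        local_feats = self.mlp(neighbors)                              # shared "mini PointNet" per group
        return centroids, local_feats.max(dim=1).values                # (C, 3), (C, out_dim)

# Example: one abstraction level over a 10000-point cloud.
# centers, feats = SetAbstraction()(torch.randn(10000, 3))
```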

Classification Task Results

Test Accuracy (with Locality): 97.8%
Baseline Accuracy (from Q1): 97.3%

Random Classification Samples

Sample 1
Locality classification sample 1
Predicted: Chair
Ground Truth: Chair

Sample 2
Locality classification sample 2
Predicted: Lamp
Ground Truth: Lamp

Sample 3
Locality classification sample 3
Predicted: Lamp
Ground Truth: Lamp

Sample 4
Locality classification sample 4
Predicted: Lamp
Ground Truth: Lamp

Sample 5
Locality classification sample 5
Predicted: Lamp
Ground Truth: Vase

Comparison: PointNet++ captures local detail better, which helps it distinguish between different object parts. It still struggles with categories that look very similar (lamps and vases, as in Sample 5), just like the baseline model, but its predictions are more reliable when objects have clearly distinct shapes.

Segmentation Task Results

Test Accuracy (with Locality): 92.83% (trained for double the epochs; with the same 50 epochs used for the other models, accuracy was below the baseline)
Baseline Accuracy (from Q2): 90.88%

Segmentation Comparisons - All Objects

Comparison of PointNet++ (with locality) vs PointNet (baseline) for all available objects:

Object Ground Truth Prediction (PointNet++) Baseline Prediction (PointNet)
Object 92 Object 92 ground truth Object 92 PointNet++ prediction Object 92 PointNet prediction
Object 351 Object 351 ground truth Object 351 PointNet++ prediction Object 351 PointNet prediction
Object 402 Object 402 ground truth Object 402 PointNet++ prediction Object 402 PointNet prediction
Object 426 Object 426 ground truth Object 426 PointNet++ prediction Object 426 PointNet prediction
Object 512 Object 512 ground truth Object 512 PointNet++ prediction Object 512 PointNet prediction

Comparison: PointNet++ creates much cleaner boundaries between different parts compared to the baseline PointNet (see Object 426 for example). Because it learns from local patterns, it's better at figuring out where one part ends and another begins, even in tricky areas.