Assignment 5

PointNet for Point Cloud Classification and Segmentation

Q1: Classification Model (40 points)

Test Accuracy: 97.90%

Random Test Point Clouds with Predictions

Below are visualizations of random test point clouds with their predicted classes.
Classification Sample 1

Sample 1 - Predicted: Chair, Actual: Chair ✓

Classification Sample 2

Sample 2 - Predicted: Vase, Actual: Vase ✓

Classification Sample 3

Sample 3 - Predicted: Lamp, Actual: Lamp ✓

Classification Sample 4

Sample 4 - Predicted: Vase, Actual: Vase ✓

Overall Performance: The model achieves 97.90% test accuracy, demonstrating strong classification capability across all three object categories (chairs, vases, and lamps). The successful predictions shown above represent typical cases where the model correctly identifies the distinctive features of each class.

Failure Cases Analysis

Chair Failure (Predicted: Vase)

Chair Failure

Predicted: Vase ✗, Actual: Chair

Interpretation: The model misclassified this chair as a vase, likely due to the chair having a rounded or cylindrical backrest that resembles the profile of a vase. The chair may have an unconventional design with smooth curved surfaces and minimal angular features typical of chairs. This suggests the model may be relying heavily on overall silhouette and curvature patterns rather than discriminative structural details like distinct legs, armrests, or a flat seat surface. The failure highlights that when geometric ambiguity exists, PointNet's global feature aggregation may not capture sufficient local structural context to distinguish between similar overall shapes.

Vase Failure (Predicted: Lamp)

Vase Failure

Predicted: Lamp ✗, Actual: Vase

Interpretation: This vase was incorrectly classified as a lamp. The elongated vertical shape with a wider base may have resembled a lamp stand, particularly if the vase has decorative elements at the top that could be confused with a lampshade or has a tall narrow neck similar to a lamp post. Both vases and lamps can share similar vertical, axially-symmetric structures, making them challenging to distinguish based solely on global geometric features. This error indicates that the model may need more discriminative features to differentiate between objects with similar aspect ratios and vertical alignment but different functional semantics.

Lamp Failure (Predicted: Vase)

Lamp Failure

Predicted: Vase ✗, Actual: Lamp

Interpretation: The model confused this lamp with a vase, possibly because the lamp has a simple columnar design without a prominent lampshade, or the lampshade's shape closely resembles a vase's opening or body. Minimalist lamp designs with smooth surfaces and symmetric profiles can exhibit geometric properties nearly identical to decorative vases. This misclassification reveals the model's limitation in handling objects with ambiguous geometric features, where functional context cannot be inferred from shape alone. The error suggests that pure geometry-based classification may be insufficient for objects with overlapping form factors.
Summary of Failure Patterns: The failure cases reveal a consistent pattern: the model struggles most with objects that share geometric similarity across categories, particularly smooth, curved, or axially-symmetric shapes. The confusion between vases and lamps, and between chairs and vases, indicates that PointNet's global max-pooling captures high-level shape descriptors but misses the fine-grained structural details that would provide class-specific discriminative power. These errors account for the ~2% of test samples where the model fails, suggesting that data augmentation with rotation and scaling, or architectural improvements incorporating local context, could improve generalization to geometrically ambiguous cases.
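The max-pooling bottleneck behind these failures can be sketched in a few lines. This is a purely illustrative toy (random weights, not the trained assignment model): every point passes through the same feature map, then a max over the point dimension collapses the cloud into one global descriptor, which is exactly where fine-grained local structure can be lost.

```python
import numpy as np

rng = np.random.default_rng(0)

def point_features(points, W, b):
    """Shared per-point map (here a single linear layer + ReLU): (N, 3) -> (N, D)."""
    return np.maximum(points @ W + b, 0.0)

def global_feature(points, W, b):
    """Symmetric max-pool over the point dimension -> one global shape descriptor."""
    return point_features(points, W, b).max(axis=0)

# Illustrative random weights and a random cloud (not the trained model).
W = rng.standard_normal((3, 64))
b = rng.standard_normal(64)
cloud = rng.standard_normal((1000, 3))

g = global_feature(cloud, W, b)
g_shuffled = global_feature(cloud[rng.permutation(len(cloud))], W, b)
assert np.allclose(g, g_shuffled)  # max-pooling ignores point order entirely
```

The same order-invariance that makes PointNet robust to point permutations also means only one "winning" point per feature channel influences the descriptor, so local structural detail around the other points never reaches the classifier.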

Q2: Segmentation Model (40 points)

Test Accuracy: 90.40%

Segmentation Results

Below are segmentation results for 7 objects, including 2 bad predictions. Each visualization shows the ground truth segmentation (left) compared with the model's prediction (right); different colors represent different chair parts.

Object 0 - Good Segmentation

Object 0 Ground Truth

Ground Truth

Object 0 Prediction

Prediction

Analysis: Highly accurate segmentation with clear part boundaries. The model successfully identifies all major chair components including the seat, backrest, and legs. The part boundaries are well-defined, and there is minimal confusion between adjacent parts. This represents an ideal case where the chair geometry is canonical and part transitions are geometrically distinct.

Object 1 - Good Segmentation

Object 1 Ground Truth

Ground Truth

Object 1 Prediction

Prediction

Analysis: Accurate part segmentation with minor boundary errors. The model correctly segments the major structural components, though there may be slight misclassifications at part junctions where geometric features blend together. These boundary effects are common in point cloud segmentation due to the discrete nature of point sampling and local geometric ambiguity.

Object 2 - Good Segmentation

Object 2 Ground Truth

Ground Truth

Object 2 Prediction

Prediction

Analysis: Well-segmented chair parts with clear distinction between seat, back, and legs. The model demonstrates its ability to leverage both local geometric features (from the shared point feature extraction) and global context to produce the correct part labels. The consistency across symmetric parts (e.g., left and right legs) indicates the model has learned generalizable part features.

Object 3 - Good Segmentation

Object 3 Ground Truth

Ground Truth

Object 3 Prediction

Prediction

Analysis: Consistent segmentation with good part recognition across all chair components. The model maintains coherent labeling even for thin structures like chair legs, which can be challenging due to sparse point sampling. This demonstrates the effectiveness of the point-wise feature learning approach combined with global shape context.

Object 4 - Bad Prediction

Object 4 Ground Truth

Ground Truth

Object 4 Prediction

Prediction

Analysis: Poor segmentation with significant errors. The model struggles to identify part boundaries correctly, particularly in regions where parts connect or transition (e.g., the seat-to-leg junction or the seat-to-back connection). This likely stems from ambiguous geometry, where different parts share similar local normal directions, or from insufficient local context in the point neighborhood to disambiguate them. It points to PointNet's limitation in capturing fine-grained local geometric relationships without explicit hierarchical feature aggregation.

Object 5 - Moderate Quality

Object 5 Ground Truth

Ground Truth

Object 5 Prediction

Prediction

Analysis: Reasonable segmentation quality with some minor classification errors at part boundaries. While major parts are correctly identified, there are scattered misclassifications that likely occur in transition regions or areas with geometric complexity. These errors are tolerable and represent the typical performance level of the model on moderately challenging examples.

Object 9 - Bad Prediction

Object 9 Ground Truth

Ground Truth

Object 9 Prediction

Prediction

Analysis: Significant segmentation errors with multiple misclassified parts. The model fails to capture the correct part structure, with entire structural components potentially mislabeled (e.g., backrest points classified as seat, or leg points confused with armrests). This failure could be caused by: (1) unusual or complex chair geometry that is underrepresented in the training data, (2) occlusion-like effects in the point cloud sampling where certain parts have very sparse point coverage, or (3) geometric ambiguity where parts merge smoothly without clear boundaries. The widespread nature of the errors (not just boundary confusion) suggests the global context feature may be misleading for this shape, causing the model to apply an incorrect part labeling scheme. This highlights the fundamental challenge in part segmentation: balancing local geometric detail with global structural understanding, which the basic PointNet architecture struggles with for atypical or geometrically complex shapes.

Q3: Robustness Analysis (20 points)

Experiment 1: Rotation Robustness

Procedure: I tested the model's robustness to rotation by rotating input point clouds around the X-axis at various angles (0°, 15°, 30°, 45°, 60°, 75°, 90°). Each test point cloud was transformed using a rotation matrix before being fed to the model. This tests whether the model has learned rotation-invariant features or if it relies on canonical orientations seen during training.
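The transform described above can be sketched as follows (the helper name and the random test cloud are my own, shown only to make the rotation explicit): each cloud is multiplied by a standard X-axis rotation matrix before inference.

```python
import numpy as np

def rotate_x(points, angle_deg):
    """Rotate an (N, 3) point cloud about the X-axis by angle_deg degrees."""
    t = np.deg2rad(angle_deg)
    R = np.array([[1.0, 0.0,        0.0],
                  [0.0, np.cos(t), -np.sin(t)],
                  [0.0, np.sin(t),  np.cos(t)]])
    return points @ R.T  # points are row vectors, so multiply by R transpose

# Sweep the same angles used in the experiment.
angles = [0, 15, 30, 45, 60, 75, 90]
cloud = np.random.default_rng(1).standard_normal((1000, 3))
rotated_clouds = {a: rotate_x(cloud, a) for a in angles}
```

Since rotation is rigid, it preserves all pairwise distances and point norms; any accuracy drop therefore reflects the model's dependence on orientation, not any loss of geometric information.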

Classification Task Results

Baseline Test Accuracy (0°): 97.90%
Test Accuracy at 90°: 27.70%
Comparison with Q1: 70.20% accuracy drop at 90° rotation
Detailed Results:
Rotation   Accuracy   Change
0°         97.90%     +0.00%
15°        94.54%     -3.36%
30°        82.79%     -15.11%
45°        54.88%     -43.02%
60°        33.26%     -64.64%
75°        28.12%     -69.78%
90°        27.70%     -70.20%

Classification Visualizations

Comparison of classification performance at 0° (baseline) and 90° rotation:
Q1 Original 0°

Q1 Original - 0° (97.90%)

90° Rotated

Rotated 90° (27.70%)

Segmentation Task Results

Baseline Test Accuracy (0°): 90.40%
Test Accuracy at 90°: 24.13%
Comparison with Q2: 66.27% accuracy drop at 90° rotation
Detailed Results:
Rotation   Accuracy   Change
0°         90.40%     +0.00%
15°        83.00%     -7.40%
30°        72.62%     -17.78%
45°        63.99%     -26.41%
60°        45.00%     -45.40%
75°        30.84%     -59.56%
90°        24.13%     -66.27%

Segmentation Visualizations - Baseline (0°)

Q2 Original GT

Q2 Original - Ground Truth

Q2 Original Prediction

Q2 Original - Prediction (90.40%)

Segmentation Visualizations - 0° Rotation

0° GT

0° - Ground Truth

0° Prediction

0° - Prediction (90.40%)

Segmentation Visualizations - 45° Rotation

45° GT

45° - Ground Truth

45° Prediction

45° - Prediction (63.99%)

Segmentation Visualizations - 90° Rotation

90° GT

90° - Ground Truth

90° Prediction

90° - Prediction (24.13%)

Interpretation: Both models show HIGH sensitivity to rotation, with severe accuracy degradation at larger rotation angles. Classification accuracy drops from 97.90% to 27.70% (70.20% drop), while segmentation drops from 90.40% to 24.13% (66.27% drop). Both tasks exhibit similar vulnerability to rotation, indicating that neither the classification nor segmentation network has learned rotation-invariant features. Even small rotations cause noticeable degradation (15°: -3.36% for classification, -7.40% for segmentation), with performance collapsing beyond 45°. This reveals that the models have memorized canonical orientations from training rather than learning geometric properties invariant to rotation. The similar degradation patterns across both tasks suggest this is a fundamental limitation of the basic PointNet architecture without T-Net or data augmentation. The visualizations clearly show how the segmentation quality deteriorates as rotation increases, with part boundaries becoming increasingly confused at 45° and nearly random at 90°.

Experiment 2: Point Density Robustness

Procedure: I tested the model's robustness to varying point density by randomly sampling different numbers of points from each object (100, 500, 1000, 2500, 5000, 7500, 10000 points). This evaluates whether the model can maintain performance with sparse point clouds and how much geometric information is truly needed for accurate classification.
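The sampling step described above amounts to uniform random subsampling without replacement (the helper name and the random placeholder cloud are my own):

```python
import numpy as np

def subsample(points, n, rng):
    """Draw n points uniformly without replacement from an (N, 3) cloud."""
    idx = rng.choice(points.shape[0], size=n, replace=False)
    return points[idx]

rng = np.random.default_rng(0)
cloud = rng.standard_normal((10000, 3))  # placeholder for a test object

# Sweep the same densities used in the experiment.
densities = [100, 500, 1000, 2500, 5000, 7500, 10000]
sparse_clouds = {n: subsample(cloud, n, rng) for n in densities}
```

Sampling without replacement guarantees each sparse cloud is a true subset of the original, so the experiment isolates the effect of density alone.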

Classification Task Results

Baseline Test Accuracy (10000 points): 97.90%
Test Accuracy at 100 points: 92.76%
Comparison with Q1: only a 5.14% accuracy drop with 100x fewer points
Detailed Results:
# Points   Accuracy   Change
100        92.76%     -5.14%
500        96.85%     -1.05%
1000       97.80%     -0.10%
2500       98.01%     +0.11%
5000       98.01%     +0.11%
7500       97.80%     -0.10%
10000      97.90%     +0.00%

Classification Visualizations

Q1 Original

Q1 Original (10000 points - 97.90%)

100 Points

100 Points (92.76%)

Segmentation Task Results

Baseline Test Accuracy (10000 points): 90.40%
Test Accuracy at 100 points: 83.92%
Comparison with Q2: only a 6.48% accuracy drop with 100x fewer points
Detailed Results:
# Points   Accuracy   Change
100        83.92%     -6.48%
500        88.81%     -1.59%
1000       89.83%     -0.57%
2500       90.31%     -0.09%
5000       90.37%     -0.03%
7500       90.43%     +0.03%
10000      90.40%     +0.00%

Segmentation Visualizations - 10000 Points (Baseline)

10000 Points GT

10000 Points - Ground Truth

10000 Points Prediction

10000 Points - Prediction (90.40%)

Segmentation Visualizations - 5000 Points

5000 Points GT

5000 Points - Ground Truth

5000 Points Prediction

5000 Points - Prediction (90.37%)

Segmentation Visualizations - 1000 Points

1000 Points GT

1000 Points - Ground Truth

1000 Points Prediction

1000 Points - Prediction (89.83%)

Segmentation Visualizations - 100 Points

100 Points GT

100 Points - Ground Truth

100 Points Prediction

100 Points - Prediction (83.92%)

Interpretation: Both models demonstrate excellent robustness to reduced point density, in stark contrast to their rotation sensitivity. Classification maintains 92.76% accuracy with just 100 points (5.14% drop), while segmentation achieves 83.92% (6.48% drop). The slightly larger degradation in segmentation is expected since it requires more local geometric detail for per-point labeling. Performance plateaus around 500 points for classification (96.85%) and 1000 points for segmentation (89.83%), showing that PointNet efficiently extracts salient geometric features without requiring dense sampling. This robustness stems from the symmetric max-pooling aggregation function, which is invariant to point set size and focuses on the most discriminative features. The visualizations demonstrate that even with dramatically reduced point density (100 points vs 10000), the model maintains coherent segmentation with only minor quality degradation at part boundaries. This confirms that PointNet successfully learns compact shape representations that generalize across different sampling densities, making it practical for real-world applications with varying sensor quality and point cloud resolution.
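The max-pooling argument above can be made concrete with a toy check (random features, not the trained model): the global descriptor of a subsampled cloud is bounded elementwise by the full cloud's descriptor, and a coordinate only changes when the "critical point" that achieved that channel's max happens to be dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 64))

# Per-point features for a random 10000-point cloud (illustrative weights).
feats = np.maximum(rng.standard_normal((10000, 3)) @ W, 0.0)

g_full = feats.max(axis=0)                        # global feature, all points
idx = rng.choice(10000, size=100, replace=False)  # 100x fewer points
g_sparse = feats[idx].max(axis=0)

# A subset's max can never exceed the full set's max in any channel; most
# channels stay close because max-pooling depends only on a few critical points.
assert np.all(g_sparse <= g_full)
```

This is why classification degrades so gracefully with density: the global feature is determined by a small set of critical points, and even a 100-point sample is likely to land near most of them.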