Assignment 5

Name: Ishita Gupta

Andrew ID: ishitag

Q1. Classification Model (40 points)
Q2. Segmentation Model (40 points)
Q3. Robustness Analysis (20 points)
Q4. Bonus Question - Locality (20 points)

Q1. Classification Model

Test accuracy: 97.38%

Random Correct Predictions

Sample 1 - Chair (Correct)

Sample 2 - Chair (Correct)

Sample 3 - Vase (Correct)

Sample 4 - Vase (Correct)

Sample 5 - Lamp (Correct)

Sample 6 - Lamp (Correct)

Failure Cases

Failure 1 - Chair misclassified as Lamp

Failure 2 - Vase misclassified as Lamp

Failure 3 - Lamp misclassified as Vase

Per-Class Performance

Chair: 616/617 correct (99.84%) - Only 1 failure
Vase: 94/102 correct (92.16%) - 8 failures
Lamp: 218/234 correct (93.16%) - 16 failures
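The overall Q1 accuracy follows from these counts: 616 + 94 + 218 = 928 correct out of 617 + 102 + 234 = 953 test objects, or 97.38%. Below is a minimal sketch of that bookkeeping. It assumes predictions and labels are integer class indices, and the chair/vase/lamp order is a hypothetical choice. The synthetic predictions reproduce only the reported counts. The wrong class assigned to each failure is arbitrary and does not affect accuracy.

```python
from collections import Counter

CLASS_NAMES = ["chair", "vase", "lamp"]  # hypothetical label order

def per_class_accuracy(preds, labels, num_classes=3):
    """Return (overall_accuracy, {class_idx: (correct, total)}) for integer labels."""
    totals = Counter(labels)
    correct = Counter(l for p, l in zip(preds, labels) if p == l)
    per_class = {c: (correct[c], totals[c]) for c in range(num_classes)}
    overall = sum(correct.values()) / len(labels)
    return overall, per_class

# Synthetic predictions matching the reported counts (wrong classes chosen arbitrarily).
labels = [0] * 617 + [1] * 102 + [2] * 234
preds = ([0] * 616 + [2] * 1      # 616/617 chairs correct
         + [1] * 94 + [2] * 8     # 94/102 vases correct
         + [2] * 218 + [1] * 16)  # 218/234 lamps correct
overall, per_class = per_class_accuracy(preds, labels)
print(f"overall: {overall:.2%}")
for c, (ok, n) in per_class.items():
    print(f"{CLASS_NAMES[c]}: {ok}/{n} ({ok / n:.2%})")
```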

The model achieves excellent performance on chair classification (99.84%), possibly because chairs have distinctive features such as legs and backrests. Vases and lamps are more challenging because they often share similar cylindrical geometries. The failure cases show confusion primarily between vases and lamps, which can have similar elongated vertical structures. The chair misclassified as a lamp appears to have an unusual (collapsed) design, with less prominent legs than typical chairs. Random point sampling may also miss distinctive features, especially for objects with complex or sparse geometries.

Q2. Segmentation Model

Test accuracy: 90.05%
Visualize segmentation results of at least 5 objects (including 2 bad predictions) with corresponding ground truth, report the prediction accuracy for each object, and provide interpretation in a few sentences.

Segmentation Results

Object	Ground Truth	Prediction	Accuracy
Object 0 (Good)			94.25%
Object 4 (Good)			73.80%
Object 57 (Good)			99.07%
Object 616 (Good)			99.35%
Object 351 (Bad)			51.66%
Object 40 (Bad)			53.37%
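Per-object accuracy here means the fraction of an object's points whose predicted part label matches the ground truth. The sketch below is minimal, and the helper names are hypothetical. When every object has the same number of points, the pooled point-level accuracy equals the mean of the per-object accuracies.

```python
def object_accuracy(pred_labels, gt_labels):
    """Fraction of points in one object whose predicted part label matches ground truth."""
    if len(pred_labels) != len(gt_labels):
        raise ValueError("prediction and ground truth must have the same number of points")
    matches = sum(p == g for p, g in zip(pred_labels, gt_labels))
    return matches / len(gt_labels)

def dataset_accuracy(all_preds, all_gts):
    """Point-level accuracy pooled over every object in the test set."""
    matches = sum(sum(p == g for p, g in zip(ps, gs)) for ps, gs in zip(all_preds, all_gts))
    total = sum(len(gs) for gs in all_gts)
    return matches / total

# Toy example: two objects with 4 points each and part labels 0-2.
gt = [[0, 0, 1, 2], [1, 1, 2, 2]]
pred = [[0, 0, 1, 1], [1, 1, 2, 2]]
print([object_accuracy(p, g) for p, g in zip(pred, gt)])  # [0.75, 1.0]
print(dataset_accuracy(pred, gt))  # 0.875
```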

Analysis

Q3. Robustness Analysis

Experiment 1: Rotation Robustness

Evaluated robustness to rotation by rotating the test point clouds about the z-axis by 15, 30, 45, 90, and 180 degrees. Rotations of 45 degrees about the x- and y-axes were also tested to measure axis-specific sensitivity (the 180-degree and x/y-axis results are reported for classification only). Because this PointNet implementation omits the T-Net transformation blocks and processes raw point coordinates directly, it is expected to be sensitive to rotation.

Procedure:
- Load the trained classification and segmentation models
- Apply rotation transformations to test data using rotation matrices
- Evaluate accuracy on rotated point clouds
- Generate visualizations for segmentation results at different rotation angles
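The rotation step in the procedure above can be sketched as follows. This is a minimal illustration, not the code used for the experiments. The helper names `rotation_matrix` and `rotate_points` are hypothetical. The sketch assumes standard right-handed rotation matrices, applied to each point as p' = R p.

```python
import math

def rotation_matrix(axis, degrees):
    """3x3 right-handed (counterclockwise) rotation matrix about the x, y, or z axis."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    if axis == "x":
        return [[1, 0, 0], [0, c, -s], [0, s, c]]
    if axis == "y":
        return [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    if axis == "z":
        return [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    raise ValueError(f"unknown axis: {axis!r}")

def rotate_points(points, axis, degrees):
    """Rotate a list of (x, y, z) points: p' = R p for every point."""
    R = rotation_matrix(axis, degrees)
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3)) for p in points]

# A 90-degree rotation about z maps (1, 0, 0) to (0, 1, 0).
rotated = rotate_points([(1.0, 0.0, 0.0)], "z", 90)
print([round(v, 6) for v in rotated[0]])  # [0.0, 1.0, 0.0]
```

For batched tensors of shape (B, N, 3) that store points as rows, the same transform is usually written as `points @ R.T`.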

Classification Results (Baseline: 97.38%)

Rotation Angle	Axis	Test Accuracy	Change vs. Baseline
0deg (baseline)	z	97.38%	-
15deg	z	91.92%	-5.46%
30deg	z	56.24%	-41.14%
45deg	z	24.87%	-72.51%
90deg	z	24.24%	-73.14%
180deg	z	53.31%	-44.07%
45deg	x	49.84%	-47.54%
45deg	y	63.06%	-34.32%

Segmentation Results (Baseline: 90.05%)

Rotation Angle	Axis	Test Accuracy	Change vs. Baseline
0deg (baseline)	z	90.05%	-
15deg	z	83.11%	-6.94%
30deg	z	70.31%	-19.74%
45deg	z	59.36%	-30.69%
90deg	z	43.02%	-47.03%

Visualization: segmentation ground truth (GT) and prediction (Pred) at rotation angles of 0, 45, and 90 degrees.

Interpretation: The model is highly sensitive to rotation, and accuracy drops sharply even at moderate angles (30-45deg). For classification, accuracy falls from 97.38% to 24.87% at 45deg about the z-axis. This indicates that the model relies heavily on absolute coordinate positions rather than rotation-invariant geometric features. Segmentation degrades less, from 90.05% to 59.36% at 45deg, likely because per-point predictions can still leverage local geometric relationships. Rotation about the z-axis causes the most severe degradation. Rotation about the y-axis is the most tolerable (63.06% at 45deg), which suggests that the model has learned some symmetry about that axis (likely the objects' vertical axis). This confirms that without T-Net transformation blocks, PointNet is not rotation-invariant. It would likely benefit from rotation augmentation during training or from a properly implemented transformation network.
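A toy version of PointNet's core operation shows the lack of rotation invariance: a shared per-point layer followed by channel-wise max pooling. The sketch below uses small hypothetical weights, not the trained model's. Shuffling the points leaves the global feature unchanged. Rotating them about the z-axis changes it, because no T-Net-style alignment undoes the rotation.

```python
import math
import random

# Toy "shared MLP": one linear layer + ReLU with hypothetical weights (2 channels).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]
BIAS = [0.0, 0.1]

def point_feature(p):
    """Apply the same per-point layer to one point, as PointNet's shared MLP does."""
    return [max(0.0, sum(w * x for w, x in zip(row, p)) + b) for row, b in zip(W, BIAS)]

def global_feature(points):
    """Channel-wise max pooling over all per-point features."""
    feats = [point_feature(p) for p in points]
    return [max(f[c] for f in feats) for c in range(len(W))]

def rotate_z(points, degrees):
    """Rotate (x, y, z) points about the z-axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

cloud = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.2), (-0.3, -0.2, 1.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)

print(global_feature(cloud))
print(global_feature(shuffled))             # same values: max pooling ignores point order
print(global_feature(rotate_z(cloud, 45)))  # different values: nothing undoes the rotation
```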

Experiment 2: Number of Points

Description: Evaluated robustness to point cloud density by randomly subsampling 50, 100, 500, 1000, 2500, 5000, and 10000 points from the original 10,000 points per object. This tests whether the model maintains performance on sparser point clouds. Such clouds are common in real-world scenarios because of varying sensor resolution or occlusion.

Approach:

- Randomly sample different numbers of points from the full 10,000-point test data
- Evaluate both the classification and segmentation models
- Compare accuracy against the baseline (10,000 points)
- Generate visualizations for segmentation at different point densities
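The subsampling step can be sketched as below. The sketch assumes sampling without replacement. For segmentation, it reuses the same indices for the per-point labels, so each kept point retains its ground-truth part label. The function name `subsample` is hypothetical.

```python
import random

def subsample(points, num_points, labels=None, seed=None):
    """Randomly pick num_points points without replacement.

    If per-point labels are given (segmentation), the same indices are applied to them
    so that every kept point keeps its ground-truth part label.
    """
    if num_points > len(points):
        raise ValueError("cannot sample more points than the cloud contains")
    idx = random.Random(seed).sample(range(len(points)), num_points)
    sub_points = [points[i] for i in idx]
    if labels is None:
        return sub_points
    return sub_points, [labels[i] for i in idx]

# Toy cloud of 10,000 points with dummy part labels.
cloud = [(float(i), 0.0, 0.0) for i in range(10_000)]
part_labels = [i % 4 for i in range(10_000)]
pts, lbls = subsample(cloud, 500, part_labels, seed=0)
print(len(pts), len(lbls))  # 500 500
```

With PyTorch tensors, the indices are commonly drawn with `torch.randperm(N)[:k]` and applied to both points and labels. Drawing all 10,000 of 10,000 points only reorders them. Because of max pooling, that should leave a PointNet prediction unchanged.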

Classification Results (Baseline from Q1: 97.38% with 10,000 points)

Number of Points	Test Accuracy	vs. Baseline (97.38%)
50	65.37%	-32.01%
100	89.72%	-7.66%
500	97.69%	+0.31%
1000	97.80%	+0.42%
2500	98.11%	+0.73%
5000	98.11%	+0.73%
10000 (baseline)	97.38%	0%

Segmentation Results (Baseline from Q2: 90.05% with 10,000 points)

Number of Points	Test Accuracy	vs. Baseline (90.05%)
50	79.28%	-10.77%
100	83.23%	-6.82%
500	89.28%	-0.77%
1000	90.26%	+0.21%
2500	90.48%	+0.43%
5000	90.46%	+0.41%
10000 (baseline)	90.05%	0%

Visualization: segmentation ground truth (GT) and prediction (Pred) with 50, 100, 500, and 1000 points.

Interpretation: The model is robust to moderate reductions in point density. For classification, accuracy remains high (97.69%) with just 500 points and drops only to 89.72% with 100 points, before falling sharply to 65.37% with 50 points. This indicates that the max pooling operation in PointNet captures the global shape from a subset of points, so the model does not need all 10,000 points to make accurate predictions. The small fluctuations across point counts, including values slightly above the Q1 baseline of 97.38%, are likely due to the randomness of point sampling. For segmentation, the model maintains 89.28% accuracy with 500 points, close to the Q2 baseline of 90.05%. It performs comparably with 1000 or more points, showing that the geometric features needed for part labeling can still be captured from sparser point clouds. This robustness to point density is a key strength of PointNet's architecture. It makes the model suitable for real-world applications with varying sensor resolutions.
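One way to see why max pooling tolerates subsampling: the global feature is a channel-wise maximum. It is therefore fully determined by the few points that attain each channel's maximum (the "critical points" described in the PointNet paper). Without a T-Net, each point's feature is computed independently of the others. So any subset that keeps those points yields exactly the same global feature, and dropping one of them only replaces that channel's value with the next-largest one. The toy sketch below uses random per-point features as a hypothetical stand-in for the network's learned features.

```python
import random

def global_feature(features):
    """Channel-wise max over per-point feature vectors (PointNet-style max pooling)."""
    return [max(f[c] for f in features) for c in range(len(features[0]))]

def critical_indices(features):
    """Indices of the points that attain the maximum in at least one channel."""
    num_channels = len(features[0])
    return {max(range(len(features)), key=lambda i: features[i][c]) for c in range(num_channels)}

rng = random.Random(0)
# Hypothetical per-point features: 10,000 points x 8 channels.
features = [[rng.random() for _ in range(8)] for _ in range(10_000)]
crit = critical_indices(features)

# Keep the critical points plus a random 500-point subset of the rest.
others = [i for i in range(len(features)) if i not in crit]
keep = sorted(crit) + rng.sample(others, 500)
subset = [features[i] for i in keep]

print(len(crit) <= 8)                                      # True: at most one point per channel
print(global_feature(subset) == global_feature(features))  # True: same global feature
```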

Q4. Bonus Question - Locality (20 points)

Model Implemented

DGCNN (Dynamic Graph CNN) - A locality-aware architecture that builds dynamic k-NN graphs and applies edge convolutions to capture local geometric features.

Key differences from PointNet:

- Uses k-NN graphs (k=20) to connect each point to its nearest neighbors
- EdgeConv layers aggregate features from local neighborhoods by computing edge features as [center, neighbor - center]
- Captures local geometric patterns better than point-wise MLPs through dynamic graph construction
- Each EdgeConv layer applies a 2D convolution on the edge features, followed by max pooling over the neighbors
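The k-NN graph and EdgeConv steps listed above can be sketched in plain Python for a single point cloud. This is a simplified illustration under stated assumptions:

- The weights are random placeholders.
- Batch normalization is omitted and a plain ReLU is used.
- The point itself is excluded from its neighbor list.

In tensor code, the shared per-edge linear map is commonly implemented as a 2D convolution with a 1×1 kernel over the (points × neighbors) grid. That computes the same thing as the inner loop here.

```python
import random

def knn(points, k):
    """For each point, indices of its k nearest neighbors (Euclidean), excluding itself."""
    neighbors = []
    for i, p in enumerate(points):
        dists = [(sum((a - b) ** 2 for a, b in zip(p, q)), j)
                 for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def edge_conv(features, neighbors, weights):
    """One EdgeConv layer: max over neighbors j of ReLU(W @ [x_i, x_j - x_i])."""
    out = []
    for i, x_i in enumerate(features):
        edge_outputs = []
        for j in neighbors[i]:
            # Edge feature [center, neighbor - center].
            edge = list(x_i) + [b - a for a, b in zip(x_i, features[j])]
            edge_outputs.append([max(0.0, sum(w * e for w, e in zip(row, edge))) for row in weights])
        out.append([max(col) for col in zip(*edge_outputs)])  # max-pool over the k neighbors
    return out

rng = random.Random(0)
points = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(64)]
K, C_OUT = 20, 16  # k = 20 as in the report; output width is arbitrary
W = [[rng.gauss(0, 0.5) for _ in range(6)] for _ in range(C_OUT)]  # hypothetical weights

nbrs = knn(points, K)
feats = edge_conv(points, nbrs, W)
print(len(feats), len(feats[0]))  # 64 16
# "Dynamic" graph: the next layer recomputes neighbors in the learned feature space.
nbrs2 = knn(feats, K)
```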

Classification Results

Model	Test Accuracy	Improvement
PointNet (Q1)	97.38%	Baseline
DGCNN (Q4)	97.69%	+0.31%

Per-Class Performance Comparison

Class	PointNet (Q1)	DGCNN (Q4)	Change
Chair	616/617 (99.84%)	616/617 (99.84%)	0%
Vase	94/102 (92.16%)	90/102 (88.24%)	-3.92%
Lamp	218/234 (93.16%)	225/234 (96.15%)	+2.99%

Classification Visualizations

Correct Predictions

Sample 1 - Chair (Correct)

Sample 2 - Chair (Correct)

Sample 3 - Vase (Correct)

Sample 4 - Vase (Correct)

Sample 5 - Lamp (Correct)

Sample 6 - Lamp (Correct)

Failure Cases

Failure 1 - Chair misclassified as Lamp

Failure 2 - Vase misclassified as Lamp

Failure 3 - Lamp misclassified as Vase

Segmentation model training for DGCNN was not completed due to computational constraints. The DGCNN architecture requires significantly more memory due to k-NN graph computation, making it challenging to train the segmentation model with the available resources.
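A back-of-the-envelope estimate helps explain the memory pressure. It assumes a dense float32 N × N distance matrix for the k-NN search and N × k × 2C edge features per EdgeConv layer, with C = 64 as an assumed channel width. Under those assumptions, the distance matrix alone grows from 4 MB per cloud at 1,000 points to 400 MB at 10,000 points. That is before accounting for batch size, multiple EdgeConv layers, and the activations stored for backpropagation. The helper name is hypothetical.

```python
def dgcnn_memory_mb(num_points, k, channels, bytes_per_value=4):
    """Rough per-cloud memory (MB) of one EdgeConv layer's two largest float32 tensors."""
    pairwise = num_points * num_points * bytes_per_value     # dense N x N distance matrix
    edges = num_points * k * 2 * channels * bytes_per_value  # N x k x 2C edge features
    return pairwise / 1e6, edges / 1e6

for n in (1_000, 10_000):
    dist_mb, edge_mb = dgcnn_memory_mb(n, k=20, channels=64)
    print(f"N={n}: distance matrix {dist_mb:.0f} MB, edge features {edge_mb:.0f} MB")
```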

DGCNN shows a modest improvement in overall classification accuracy over PointNet (+0.31%, i.e., 931 vs. 928 of 953 test objects classified correctly). The most notable gain is in lamp classification, where DGCNN reaches 96.15% accuracy compared to PointNet's 93.16% (+2.99%, or 225 vs. 218 of 234 lamps). This suggests that the local neighborhood features captured by EdgeConv layers help distinguish lamp structures, which often have complex local geometric patterns.

However, DGCNN is slightly worse on vases (88.24%, -3.92%, or 4 fewer of 102 vases classified correctly). This may indicate that for simpler shapes like vases, the additional complexity of k-NN graph construction provides no significant benefit and may even introduce noise.
