Assignment 5 – PointNet on Point Clouds

Q1. Classification Model

1.1 Model Architecture Summary

import torch
import torch.nn as nn

class cls_model(nn.Module):
    def __init__(self, num_classes=3):
        super(cls_model, self).__init__()
        self.point_features = nn.Sequential(
            nn.Conv1d(3, 64, 1),
            nn.ReLU(),
            nn.Conv1d(64, 128, 1),
            nn.ReLU(),
            nn.Conv1d(128, 1024, 1),
        )
        self.classifier = nn.Sequential(
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):
        '''
        points: tensor of size (B, N, 3), where B is the batch size
                and N is the number of points per object (N=10000 by default)
        output: tensor of size (B, num_classes)
        '''
        # transpose points to (B, 3, N)
        points_transposed = points.transpose(1, 2)

        # per-point features: (B, 3, N) -> (B, 1024, N)
        x = self.point_features(points_transposed)

        # max pooling to get global features
        x_global = torch.max(x, dim=2)[0] # (B, 1024)

        # classify
        output = self.classifier(x_global)
        return output
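
As a quick sanity check, the model maps a (B, N, 3) cloud to (B, num_classes) logits. A minimal sketch (the batch size here is arbitrary):

import torch

model = cls_model(num_classes=3)
clouds = torch.rand(4, 10000, 3)   # batch of 4 clouds, 10000 points each
logits = model(clouds)             # (4, 3)
preds = logits.argmax(dim=1)       # (4,) predicted class index per cloud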

1.2 Training Setup
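
The recorded hyperparameters are omitted here; the sketch below shows the assumed setup with an Adam optimizer and cross-entropy loss (the learning rate, epoch count, and train_loader are illustrative placeholders, not the values actually used):

import torch
import torch.nn as nn

model = cls_model(num_classes=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer/lr
criterion = nn.CrossEntropyLoss()

for epoch in range(10):                  # epoch count is an assumption
    for points, labels in train_loader:  # hypothetical DataLoader of ((B, N, 3), (B,)) pairs
        optimizer.zero_grad()
        logits = model(points)           # (B, num_classes)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()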

1.3 Test Accuracy

Final Test Accuracy: 0.9717

1.4 Visualizations of Predictions

Correct Predictions

Failure Cases

1.5 Interpretation

The failure cases are likely caused by the global max pooling, which keeps only the most prominent features of each category. The misclassified shapes have features that are ambiguous between two categories: long supporting legs appear on both lamps and chairs, and a container-like body appears on both lamps and vases. With only a globally pooled feature and no local awareness, the model has a hard time distinguishing such borderline shapes.


Q2. Segmentation Model

2.1 Model Architecture Summary

import torch
import torch.nn as nn

class seg_model(nn.Module):
    def __init__(self, num_seg_classes=6):
        super(seg_model, self).__init__()
        
        # 1. Encoder (Same as Classification)
        self.point_features = nn.Sequential(
            nn.Conv1d(3, 64, 1),
            nn.ReLU(),
            nn.Conv1d(64, 128, 1),
            nn.ReLU(),
            nn.Conv1d(128, 1024, 1)
        )
        
        # 2. Decoder (the "concatenation trick")
        # Input channels: 1024 (per-point, local) + 1024 (global) = 2048
        self.decoder = nn.Sequential(
            nn.Conv1d(2048, 512, 1),
            nn.ReLU(),
            nn.Conv1d(512, 256, 1),
            nn.ReLU(),
            nn.Conv1d(256, 128, 1),
            nn.ReLU(),
            nn.Conv1d(128, num_seg_classes, 1)
        )

    def forward(self, points):
        '''
        points: tensor of size (B, N, 3)
        output: tensor of size (B, N, num_seg_classes)
        '''
        num_points = points.size(1)
        
        # Transpose: (B, N, 3) -> (B, 3, N)
        points_transposed = points.transpose(1, 2)
        
        # Encoder: (B, 3, N) -> (B, 1024, N)
        local_features = self.point_features(points_transposed)
        
        # Max Pooling (Global Feature): (B, 1024, N) -> (B, 1024, 1)
        global_feature = torch.max(local_features, dim=2, keepdim=True)[0]
        
        # Expansion: (B, 1024, 1) -> (B, 1024, N)
        global_feature_repeated = global_feature.repeat(1, 1, num_points)
        
        # Concatenation: (B, 1024, N) + (B, 1024, N) -> (B, 2048, N)
        combined_features = torch.cat([local_features, global_feature_repeated], dim=1)
        
        # Decoder: (B, 2048, N) -> (B, num_seg_classes, N)
        logits = self.decoder(combined_features)
        
        # Transpose back: (B, num_seg_classes, N) -> (B, N, num_seg_classes)
        return logits.transpose(1, 2)
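
The same shape check applies here, now with per-point logits (a minimal sketch; sizes are arbitrary):

import torch

model = seg_model(num_seg_classes=6)
clouds = torch.rand(2, 10000, 3)   # 2 clouds, 10000 points each
logits = model(clouds)             # (2, 10000, 6): per-point class logits
preds = logits.argmax(dim=2)       # (2, 10000): predicted part label per point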

2.2 Training Setup
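
As in Q1, the exact configuration is omitted; the sketch below shows the assumed per-point cross-entropy setup. nn.CrossEntropyLoss expects class channels in dimension 1, so the (B, N, C) logits are flattened to (B*N, C) (optimizer, learning rate, and train_loader are again illustrative placeholders):

import torch
import torch.nn as nn

model = seg_model(num_seg_classes=6)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer/lr
criterion = nn.CrossEntropyLoss()

for points, labels in train_loader:  # hypothetical DataLoader of ((B, N, 3), (B, N)) pairs
    optimizer.zero_grad()
    logits = model(points)           # (B, N, 6)
    # flatten so every point contributes one term to the loss
    loss = criterion(logits.reshape(-1, 6), labels.reshape(-1))
    loss.backward()
    optimizer.step()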

2.3 Test Accuracy

Final Test Accuracy: 0.8995

2.4 Visualizations

For each object: show predicted segmentation + ground truth.

Object 1

Object 2

Object 3

Object 4 — Bad Prediction

Object 5 — Bad Prediction

2.5 Interpretation

The model achieves near-perfect accuracy on standard chair topologies where functional parts—such as legs, seats, and backrests—are geometrically distinct and spatially separated. However, performance degrades significantly on complex shapes like armchairs or continuous curved designs (Object 4 and 5), where the boundaries between semantic parts are ambiguous or merged. In these failure cases, the model struggles to classify adjacent points that share similar local geometry, causing labels to incorrectly "bleed" across transition zones where parts are not clearly separated.


Q3. Robustness Analysis

Experiment 1

Procedure

Down-sampled each point cloud to 10k, 1k, 100, and 50 points, then reran the pretrained classification and segmentation models to quantify sensitivity to sparsity.
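
The down-sampling step can be sketched as follows (assuming uniform random sampling without replacement; points is a hypothetical (B, N, 3) tensor, and the actual sampling scheme is not specified here):

import torch

def downsample(points, num_samples):
    """Randomly keep num_samples points from a (B, N, 3) cloud."""
    idx = torch.randperm(points.size(1))[:num_samples]
    return points[:, idx, :]  # (B, num_samples, 3)

for n in [10000, 1000, 100, 50]:
    sparse = downsample(points, n)
    # rerun the pretrained models on `sparse`
    # (for segmentation, subsample the per-point labels with the same idx)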

Accuracy Comparison

Visualizations

Interpretation

Point density has little impact on classification accuracy: even with as few as 50 points, the overall structure of each object is still visible, so the global feature remains informative. Segmentation is affected more as the clouds become sparser, because individual parts eventually become illegible and specific points grow ambiguous, even to the naked eye, as to which part they belong.


Experiment 2

Procedure

Applied rigid rotations of 0°, 30°, 90°, and 180° around a principal axis before running inference, isolating orientation robustness without retraining.
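
The rotation can be sketched as a rigid transform (assuming rotation about the z-axis; the report does not state which principal axis was used, and points is again a hypothetical (B, N, 3) tensor):

import math
import torch

def rotate_z(points, degrees):
    """Rotate a (B, N, 3) cloud by `degrees` about the z-axis."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    return points @ R.T  # apply the same rotation to every point

for angle in [0, 30, 90, 180]:
    rotated = rotate_z(points, angle)
    # run inference on `rotated` without retraining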

Accuracy Comparison

Visualizations

Interpretation

Both tasks suffer large accuracy losses once the input orientation differs from the training orientation, with 90° the worst case, likely because a 90° rotation deviates farthest from the canonical pose (a 180° rotation can partially re-align symmetric structures). This indicates that the models rely on canonical alignment and would need rotation augmentation or an equivariant architecture to generalize across orientations.