Assignment 2

Name: Sagar Chandrashekhar Bellad

Andrew ID: sbellad

Disclaimer: I referred to online documentation, Stack Overflow, and GPT, not to copy code blindly/directly but to understand a concept or piece of code that was new to me or that I did not fully understand.

Section 1: Fitting Data (Before vs After Training)

1.1. Fitting a Voxel Grid

Source (Before)

Source (After)

Target
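The fitting above can be illustrated with a minimal sketch: a trainable grid of occupancy logits is optimized against a target occupancy grid with binary cross-entropy. The grid size, learning rate, and iteration count here are assumptions for illustration, not the assignment's actual settings.

```python
import torch

# Minimal sketch (assumed shapes/hyperparameters): fit a source voxel grid
# to a target by minimizing BCE on occupancy logits.
target = (torch.rand(1, 32, 32, 32) > 0.5).float()       # target occupancy
logits = torch.zeros(1, 32, 32, 32, requires_grad=True)  # source, before training

optimizer = torch.optim.Adam([logits], lr=1e-1)
loss_fn = torch.nn.BCEWithLogitsLoss()

initial_loss = loss_fn(logits, target).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(logits, target)
    loss.backward()
    optimizer.step()
final_loss = loss.item()
```

After training, thresholding `logits.sigmoid()` at 0.5 recovers an occupancy grid close to the target, which is what the before/after renders above show.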

1.2. Fitting a Point Cloud

Source (Before)

Source (After)

Target
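The point-cloud fitting can be sketched the same way: a trainable set of source points is optimized to minimize a symmetric Chamfer distance to the target. This uses a simple brute-force Chamfer (via `torch.cdist`), not pytorch3d's implementation; point counts and learning rate are assumptions.

```python
import torch

# Minimal sketch: fit a source point cloud to a target with a brute-force
# symmetric Chamfer distance (assumed sizes/hyperparameters).
def chamfer(a, b):
    # a: (N, 3), b: (M, 3); pairwise squared distances
    d = torch.cdist(a, b) ** 2
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

target = torch.rand(500, 3)
source = torch.rand(500, 3, requires_grad=True)

optimizer = torch.optim.Adam([source], lr=1e-2)
initial = chamfer(source, target).item()
for _ in range(300):
    optimizer.zero_grad()
    loss = chamfer(source, target)
    loss.backward()
    optimizer.step()
final = loss.item()
```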

1.3. Fitting a Mesh

Source (Before)

Source (After)

Target
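Mesh fitting can be sketched by optimizing per-vertex offsets of a fixed source mesh so its deformed vertices match points sampled from the target surface. The vertex/point counts here are placeholders, and the assignment presumably samples points from both surfaces rather than using raw vertices.

```python
import torch

# Minimal sketch: fit a mesh by optimizing per-vertex offsets against
# target surface samples via Chamfer distance (assumed sizes).
def chamfer(a, b):
    d = torch.cdist(a, b) ** 2
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

src_verts = torch.rand(200, 3)           # e.g. an ico-sphere's vertices
offsets = torch.zeros_like(src_verts, requires_grad=True)
tgt_points = torch.rand(500, 3)          # points sampled from the target mesh

optimizer = torch.optim.Adam([offsets], lr=1e-2)
initial = chamfer(src_verts + offsets, tgt_points).item()
for _ in range(300):
    optimizer.zero_grad()
    loss = chamfer(src_verts + offsets, tgt_points)
    loss.backward()
    optimizer.step()
final = loss.item()
```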

Section 2.1: Image → Vox

Image | Ground Truth | Prediction

Section 2.2: Image → Point Cloud

I used --load_feat and the CPU for this section, as I have not received AWS credits and deadlines for this and other assignments are closing in.

Image | Ground Truth | Prediction

Section 2.3: Image → Mesh

Image | Ground Truth | Prediction

Section 2.4: Quantitative Comparisons

Voxel

Point Cloud

Mesh

Section 2.5: Analyzing the Effects of Hyperparameter Variations

Parameter 1: Varying Number of Points for Point Cloud

Number of points trained and evaluated upon: 100

python train_model.py --type 'point' --device 'cpu' --load_feat --max_iter 200 --save_freq 188 --n_points 100

2D Input Image

Ground Truth Point Cloud

Predicted Point Cloud

Number of points trained and evaluated upon: 500

python train_model.py --type 'point' --device 'cpu' --load_feat --max_iter 200 --save_freq 188 --n_points 500

2D Input Image

Ground Truth Point Cloud

Predicted Point Cloud

Number of points trained and evaluated upon: 1000

python train_model.py --type 'point' --device 'cpu' --load_feat --max_iter 200 --save_freq 188 --n_points 1000

2D Input Image

Ground Truth Point Cloud

Predicted Point Cloud
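Varying --n_points mainly changes the size of the decoder's final layer, since the network has to emit `n_points * 3` coordinates. This is a hypothetical sketch of that relationship; the decoder widths and feature dimension are assumptions, not the assignment's actual model.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: the point-cloud decoder's last layer scales with
# n_points, and its output is reshaped to (batch, n_points, 3).
def make_point_decoder(n_points, feat_dim=512):
    return nn.Sequential(
        nn.Linear(feat_dim, 1024),
        nn.ReLU(inplace=True),
        nn.Linear(1024, n_points * 3),
    )

feat = torch.rand(4, 512)  # a batch of image features
for n in (100, 500, 1000):
    pts = make_point_decoder(n)(feat).reshape(4, n, 3)
```

This is why larger --n_points values cost more parameters and memory in the last layer while leaving the rest of the network unchanged.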

Parameter 2: Varying w_smooth, batch_size, and lr

Trial 1: w_smooth = 0.1, batch_size = 32, lr = 4e-4

Comment: The predicted mesh shows extreme sharpness toward the edges.

2D Input Image

Ground Truth Mesh

Predicted Mesh

Trial 2: w_smooth = 0.4, batch_size = 64, lr = 6e-4

Comment: The smoothing loss was not weighted enough against the Chamfer loss, so I increased it. I also increased the batch size and learning rate, which gave better results.

2D Input Image

Ground Truth Mesh

Predicted Mesh

Parameter 3: Varying Decoder Architecture

Architecture 1:

self.decoder = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, self.V * 3)
)
Parameters: w_smooth = 0.1, batch_size = 32, lr = 4e-4

2D Input Image

Ground Truth Mesh

Predicted Mesh

Architecture 2:

self.decoder = nn.Sequential(
    nn.Linear(512, 1024),
    nn.BatchNorm1d(1024),
    nn.LeakyReLU(0.2, inplace=True),

    nn.Linear(1024, 2048),
    nn.BatchNorm1d(2048),
    nn.LeakyReLU(0.2, inplace=True),

    nn.Dropout(0.3),

    nn.Linear(2048, 1024),
    nn.BatchNorm1d(1024),
    nn.LeakyReLU(0.2, inplace=True),

    nn.Linear(1024, self.V * 3)
)
Parameters: w_smooth = 0.1, batch_size = 32, lr = 4e-4

2D Input Image

Ground Truth Mesh

Predicted Mesh

Architecture 1 (Different Hyperparameters):

self.decoder = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, self.V * 3)
)
Parameters: w_smooth = 0.4, batch_size = 64, lr = 6e-4

Learning Outcomes:

  • Keep the decoder architecture simple — larger and more complex versions are harder to train effectively.
  • The loss function combines Chamfer distance and Laplacian smoothing. Chamfer focuses on geometric accuracy (point-to-point distances), while smoothing ensures surface continuity. To get clean meshes, smoothing should be given equal or slightly higher weight than the Chamfer loss.

2D Input Image

Ground Truth Mesh

Predicted Mesh

Section 2.6: Interpret your model

Below are numerical comparisons of how the Chamfer loss and the Laplacian smoothing loss vary during training. The Chamfer loss decreases steadily, showing that predicted vertex positions are getting closer to the ground truth. The Laplacian smoothing loss also drops, indicating that the model enforces mesh regularization and avoids noisy or jagged predictions. For the loss weights, I gave 0.4 to the Laplacian smoothing loss and 1.0 to the Chamfer loss.

Chamfer Loss vs iteration

Laplacian Smoothing Loss vs iteration

To better understand how the voxel decoder refines its outputs, I visualized intermediate activations of my decoder layers.

In the first activation, the values cluster toward the extremes (either near 0.1 or 1.0), suggesting the network is laying down coarse voxel occupancy.

The middle layers (this one and the subsequent ones) begin to form structured spatial patterns (Activation 2), hinting at coarse object shapes.

By the later layers (Activation 3+), the activations exhibit clearer global structure.
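Intermediate activations like those above can be captured with PyTorch forward hooks. The decoder below is a stand-in with assumed layer sizes, not the assignment's actual model; the hook mechanism itself is standard PyTorch.

```python
import torch
import torch.nn as nn

# Sketch: capture each decoder layer's output with forward hooks so it can
# be reshaped and visualized later (decoder architecture is a placeholder).
decoder = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 32 * 32 * 32),
)

activations = {}

def save_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for i, layer in enumerate(decoder):
    layer.register_forward_hook(save_hook(f"layer_{i}"))

_ = decoder(torch.rand(1, 512))
# activations["layer_0"] .. activations["layer_4"] now hold each layer's
# output; the final one can be reshaped to (32, 32, 32) and plotted as slices.
```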

Section 3: Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)

3.1 I wrote code for the implicit decoder but had issues generating output and did not have time to resolve them.

3.2 I wrote code for the parametric decoder but had issues generating output and did not have time to resolve them.

3.3 Extended Dataset for Training

With Just 1 Class | With 3 Classes

Example Output (3-Class Training)

Input Image

Ground Truth Point Cloud

Predicted Point Cloud

Conclusion / Observation: Training with more classes and data points results in relatively higher F1 scores.
1-class training → ~71% F1 score
3-class training → ~82% F1 score
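The F1 scores above are presumably computed as the harmonic mean of precision and recall at a fixed distance threshold between predicted and ground-truth points. This is a hedged sketch of that metric; the threshold value is an assumption, not the assignment's actual setting.

```python
import torch

# Sketch of point-cloud F1 at a distance threshold (threshold is assumed):
# precision = fraction of predicted points near some GT point,
# recall    = fraction of GT points near some predicted point.
def f1_score(pred, gt, threshold=0.05):
    d = torch.cdist(pred, gt)
    precision = (d.min(dim=1).values < threshold).float().mean()
    recall = (d.min(dim=0).values < threshold).float().mean()
    return (2 * precision * recall / (precision + recall + 1e-8)).item()

pts = torch.rand(1000, 3)
perfect = f1_score(pts, pts)  # identical clouds give F1 close to 1.0
```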