Section 1: Fitting Data (Before vs After Training)
1.1. Fitting a Voxel Grid
Source (Before)
Source (After)
Target
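The voxel-fitting objective can be sketched as binary cross-entropy between the source grid (treated as occupancy logits) and the binary target grid; this is a minimal sketch, and the grid shapes and the assumption that the source holds logits are mine:

```python
import torch
import torch.nn.functional as F

def voxel_loss(voxel_src: torch.Tensor, voxel_tgt: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy between predicted occupancy logits and a
    binary target grid, both of shape (B, D, H, W)."""
    return F.binary_cross_entropy_with_logits(voxel_src, voxel_tgt)

src = torch.zeros(1, 32, 32, 32, requires_grad=True)   # source logits to optimize
tgt = (torch.rand(1, 32, 32, 32) > 0.5).float()        # binary target occupancy
loss = voxel_loss(src, tgt)
loss.backward()  # gradients flow into the source grid during fitting
```

Repeatedly stepping an optimizer on `src` against this loss drives the "Source (Before)" grid toward the target.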
1.2. Fitting a Point Cloud
Source (Before)
Source (After)
Target
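Point-cloud fitting minimizes the symmetric Chamfer distance between source and target clouds. A minimal sketch using brute-force nearest neighbours (the real implementation may use an accelerated KNN; the squared-distance convention is an assumption):

```python
import torch

def chamfer_loss(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point clouds of shape
    (B, N, 3) and (B, M, 3), using squared nearest-neighbour distances."""
    d = torch.cdist(p1, p2) ** 2  # (B, N, M) pairwise squared distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

src = torch.randn(1, 500, 3, requires_grad=True)  # cloud being fit
tgt = torch.randn(1, 500, 3)                      # fixed target cloud
loss = chamfer_loss(src, tgt)
loss.backward()
```

Because both directions are included, the loss penalizes source points far from the target and uncovered regions of the target alike.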
1.3. Fitting a Mesh
Source (Before)
Source (After)
Target
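Mesh fitting combines Chamfer distance on points sampled from the surface with a smoothing regularizer on the vertices. A minimal uniform-Laplacian sketch (the explicit edge-list representation is my assumption; in practice a library routine such as pytorch3d's mesh Laplacian plays this role):

```python
import torch

def laplacian_smoothing(verts: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    """Uniform Laplacian smoothing: penalize each vertex's offset from the
    mean of its neighbours. verts: (V, 3), edges: (E, 2) vertex-index pairs."""
    V = verts.shape[0]
    nb_sum = torch.zeros_like(verts)  # sum of neighbour positions per vertex
    deg = torch.zeros(V, 1)           # neighbour count per vertex
    for a, b in ((0, 1), (1, 0)):     # accumulate both edge directions
        nb_sum.index_add_(0, edges[:, a], verts[edges[:, b]])
        deg.index_add_(0, edges[:, a], torch.ones(edges.shape[0], 1))
    lap = nb_sum / deg.clamp(min=1) - verts
    return lap.norm(dim=1).mean()

# a single triangle: every vertex neighbours the other two
verts = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
edges = torch.tensor([[0, 1], [1, 2], [2, 0]])
loss = laplacian_smoothing(verts, edges)
```

The total mesh-fitting loss is then the Chamfer term plus this smoothing term scaled by a weight.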
Section 2.1: Image → Voxel Grid
| Image | Ground Truth | Prediction |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Section 2.2: Image → Point Cloud
I used `--load_feat` and trained on CPU for this section, as I had not yet received AWS credits and deadlines in this and other assignments were closing in.
| Image | Ground Truth | Prediction |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Section 2.3: Image → Mesh
| Image | Ground Truth | Prediction |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Section 2.4: Quantitative Comparisons
Voxel
Point Cloud
Mesh
Section 2.5: Analysing the effects of hyperparameter variations
Parameter 1: Varying Number of Points for Point Cloud
Number of points trained and evaluated upon: 100
```shell
python train_model.py --type 'point' --device 'cpu' --load_feat --max_iter 200 --save_freq 188 --n_points 100
```
2D Input Image
Ground Truth Point Cloud
Predicted Point Cloud
Number of points trained and evaluated upon: 500
```shell
python train_model.py --type 'point' --device 'cpu' --load_feat --max_iter 200 --save_freq 188 --n_points 500
```
2D Input Image
Ground Truth Point Cloud
Predicted Point Cloud
Number of points trained and evaluated upon: 1000
```shell
python train_model.py --type 'point' --device 'cpu' --load_feat --max_iter 200 --save_freq 188 --n_points 1000
```
2D Input Image
Ground Truth Point Cloud
Predicted Point Cloud
Parameter 2: Varying w_smooth, batch_size, and lr
Trial 1: w_smooth = 0.1, batch_size = 32, lr = 4e-4
Comment: Extreme sharpness towards the edges of the predicted mesh.
2D Input Image
Ground Truth Mesh
Predicted Mesh
Trial 2: w_smooth = 0.4, batch_size = 64, lr = 6e-4
Comment: The smoothing loss was not weighted heavily enough against the Chamfer loss, so I increased it. I also increased the batch size and learning rate, which gave better results.
2D Input Image
Ground Truth Mesh
Predicted Mesh
Parameter 3: Varying Decoder Architecture
Architecture 1:
```python
self.decoder = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, self.V * 3)
)
```
Parameters: w_smooth = 0.1, batch_size = 32, lr = 4e-4
2D Input Image
Ground Truth Mesh
Predicted Mesh
Architecture 2:
```python
self.decoder = nn.Sequential(
    nn.Linear(512, 1024),
    nn.BatchNorm1d(1024),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Linear(1024, 2048),
    nn.BatchNorm1d(2048),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Dropout(0.3),
    nn.Linear(2048, 1024),
    nn.BatchNorm1d(1024),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Linear(1024, self.V * 3)
)
```
Parameters: w_smooth = 0.1, batch_size = 32, lr = 4e-4
2D Input Image
Ground Truth Mesh
Predicted Mesh
Architecture 1 (Different Hyperparameters):
```python
self.decoder = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, 1024),
    nn.ReLU(inplace=True),
    nn.Linear(1024, self.V * 3)
)
```
Parameters: w_smooth = 0.4, batch_size = 64, lr = 6e-4
Learning Outcomes:
- Keep the decoder architecture simple: larger and more complex versions are harder to train effectively.
- The loss function combines Chamfer distance and Laplacian smoothing. Chamfer distance targets geometric accuracy (point-to-point distances), while the smoothing term enforces surface continuity. To get clean meshes, smoothing should be weighted equal to or slightly higher than the Chamfer loss.
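The weighting described above can be sketched as a simple weighted sum; `w_smooth = 0.4` matches Trial 2, and the placeholder loss values are illustrative only:

```python
import torch

def total_loss(chamfer: torch.Tensor, smooth: torch.Tensor,
               w_chamfer: float = 1.0, w_smooth: float = 0.4) -> torch.Tensor:
    """Weighted sum used for mesh training: geometric accuracy (Chamfer)
    plus surface regularity (Laplacian smoothing)."""
    return w_chamfer * chamfer + w_smooth * smooth

# illustrative loss values, not measured ones
loss = total_loss(torch.tensor(0.05), torch.tensor(0.02))
```

Raising `w_smooth` trades geometric fidelity for smoother surfaces, which is exactly the effect observed between Trial 1 and Trial 2.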
2D Input Image
Ground Truth Mesh
Predicted Mesh
Section 2.6: Interpret your model
Numerical comparison of how the Chamfer loss and the Laplacian smoothing loss vary during training:
Chamfer loss decreases steadily, showing that predicted positions are getting closer to ground truth.
The Laplacian smoothing loss also drops, indicating that the model enforces mesh regularization and avoids noisy or jagged predictions.
In my training configuration I set the weight of the Laplacian smoothing loss to 0.4 and the weight of the Chamfer loss to 1.0.
Chamfer Loss vs iteration
Laplacian Smoothing Loss vs iteration
To better understand how the voxel decoder refines its outputs, I visualized intermediate activations of the decoder layers.
In the first layer, values cluster towards the extremes (either near 0.1 or 1.0), suggesting the network is laying down coarse voxel occupancy.
Middle layers (this one and the subsequent ones) begin to form structured spatial patterns (Activation 2), hinting at coarse object shapes.
By the later layers (Activation 3+), activations exhibit clearer global structure.
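Intermediate activations like these can be captured with forward hooks; below is a minimal sketch on a stand-in decoder (the real model's architecture differs):

```python
import torch
import torch.nn as nn

# stand-in decoder; the actual decoder layers differ
decoder = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(),
                        nn.Linear(1024, 1024), nn.ReLU(),
                        nn.Linear(1024, 32 ** 3))

activations = {}

def save_activation(name):
    """Return a hook that stores a layer's output under `name`."""
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# register a hook on every layer so intermediate outputs are recorded
for i, layer in enumerate(decoder):
    layer.register_forward_hook(save_activation(f"layer_{i}"))

_ = decoder(torch.randn(1, 512))
# activations["layer_0"], activations["layer_1"], ... can now be reshaped and plotted
```

Reshaping the final activation to `(32, 32, 32)` lets you render slices of the predicted occupancy at each stage.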
Section 3: Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)
3.1 I wrote code for the implicit decoder but ran into issues generating output and did not have time to resolve them.
3.2 I wrote code for the parametric decoder but ran into issues generating output and did not have time to resolve them.
3.3 Extended Dataset for Training
| With Just 1 Class | With 3 Classes |
|---|---|
| ![]() | ![]() |
Example Output (3-Class Training)
Input Image
Ground Truth Point Cloud
Predicted Point Cloud
Conclusion / Observation: Training with more classes and data points yields noticeably higher F1 scores.
- 1-class training → ~71% F1 score
- 3-class training → ~82% F1 score
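The F1 comparison above can be sketched as precision/recall at a nearest-neighbour distance threshold; the 0.05 threshold here is my assumption, and the exact evaluation protocol may differ:

```python
import torch

def f1_score(pred: torch.Tensor, gt: torch.Tensor, thresh: float = 0.05) -> float:
    """F1 between point clouds (N, 3) and (M, 3): a point counts as a hit
    if its nearest neighbour in the other cloud lies within `thresh`."""
    d = torch.cdist(pred, gt)  # (N, M) pairwise distances
    precision = (d.min(dim=1).values < thresh).float().mean()
    recall = (d.min(dim=0).values < thresh).float().mean()
    if precision + recall == 0:
        return 0.0
    return (2 * precision * recall / (precision + recall)).item()

pts = torch.randn(100, 3)
perfect = f1_score(pts, pts)  # identical clouds score 1.0
```

Precision penalizes spurious predicted points and recall penalizes missed ground-truth regions, so F1 rewards predictions that cover the target without scattering points elsewhere.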