Goals: In this assignment, you will explore the types of loss and decoder functions for regressing to voxel, point cloud, and mesh representations from single-view RGB input.






Final F1@0.05 score: 73.88


[Figures: two examples, each showing the input RGB image and the corresponding prediction.]


For better coverage, I chose n_points = 2048.
Final F1@0.05 score: 76.81


[Figures: two examples, each showing the input RGB image and the corresponding prediction.]


Final F1@0.05 score: 72.34


[Figures: two examples, each showing the input RGB image and the corresponding prediction.]


At low thresholds, even small deviations are penalized harshly, lowering both recall and precision.
Given similar result quality, the point cloud model should achieve a slightly higher F1 score, since point clouds are trained to fit the ground-truth points directly.
Voxels and meshes require an extra sample_points_from_meshes step during F1 computation, and this random sampling adds noise that can slightly lower their F1 scores.
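
For reference, here is a minimal sketch of how F1@0.05 can be computed from two point sets (the helper name `f1_score` and the sampling count are illustrative; the course's evaluation code may differ in details such as normalization):

```python
import torch
from pytorch3d.ops import knn_points

def f1_score(pred_points, gt_points, threshold=0.05):
    # pred_points: (1, N, 3), gt_points: (1, M, 3).
    # Distance from each predicted point to its nearest GT point, and vice versa.
    d_pred_to_gt = knn_points(pred_points, gt_points, K=1).dists[..., 0].sqrt()
    d_gt_to_pred = knn_points(gt_points, pred_points, K=1).dists[..., 0].sqrt()

    precision = (d_pred_to_gt < threshold).float().mean()  # predicted points near the GT surface
    recall = (d_gt_to_pred < threshold).float().mean()     # GT points covered by the prediction
    return 100.0 * 2 * precision * recall / (precision + recall + 1e-8)

# For meshes (and voxel grids converted to meshes), points are first sampled from the
# surface, e.g. with pytorch3d.ops.sample_points_from_meshes(pred_mesh, num_samples=5000);
# this is the random sampling step mentioned above.
```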



Analyse the results by varying a hyperparameter of your choice.
For example n_points or vox_size or w_chamfer or initial mesh (ico_sphere) etc.
Try to be unique and conclusive in your analysis.
During point cloud training, I explored the effect of varying the relative weight between the two terms of the Chamfer loss, which correspond to precision and recall. The modified Chamfer loss is defined as:
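
$$
\mathcal{L}_{\text{chamfer}}(P, G) \;=\; \alpha \sum_{p \in P} \min_{g \in G} \lVert p - g \rVert_2^2 \;+\; \sum_{g \in G} \min_{p \in P} \lVert g - p \rVert_2^2,
$$

where $P$ is the predicted point cloud, $G$ is the ground-truth point cloud, and $\alpha$ scales the prediction-to-ground-truth (precision) term relative to the ground-truth-to-prediction (recall) term.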
When $\alpha < 1$, recall is prioritized, which ensures all ground truth points have nearby predictions. The resulting point clouds tend to loosely cover the entire shape but often lack fine structural details.
When $\alpha > 1$, precision is prioritized, which encourages predicted points to be closer to actual surface regions. However, as $\alpha$ increases, points begin to cluster densely around high-confidence areas, causing sparse coverage elsewhere and potentially missing thin or distant structures (e.g., chair legs).
To mitigate over-clustering, I added an additional repulsion loss that pushes apart predicted points lying too close to one another.
In practice, I find that the repulsion loss combined with the Chamfer loss at \(\alpha = 1.0\) produces the best qualitative results: structural details are covered evenly, although the overall shape is somewhat less sharply defined.
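
As a rough illustration, the sketch below shows one way the two losses could be combined in PyTorch3D. The \(\alpha\)-weighted Chamfer term follows the definition above; the repulsion term is a generic k-NN formulation and only an assumption, since the exact definition used in the experiments is not reproduced here (`k`, `h`, and `w_rep` are illustrative values).

```python
import torch
from pytorch3d.ops import knn_points

def weighted_chamfer_loss(pred, gt, alpha=1.0):
    # pred: (B, N, 3) predicted points, gt: (B, M, 3) ground-truth points.
    d_pred_to_gt = knn_points(pred, gt, K=1).dists[..., 0]  # precision term (squared distances)
    d_gt_to_pred = knn_points(gt, pred, K=1).dists[..., 0]  # recall term
    return alpha * d_pred_to_gt.mean() + d_gt_to_pred.mean()

def repulsion_loss(pred, k=4, h=0.03):
    # Generic kNN repulsion (an assumed form): penalize predicted points that sit
    # closer than a radius h to any of their k nearest neighbors.
    dists = knn_points(pred, pred, K=k + 1).dists[..., 1:]  # drop the self-match at index 0
    return torch.clamp(h - dists.sqrt(), min=0.0).mean()

# loss = weighted_chamfer_loss(pred, gt, alpha=1.0) + w_rep * repulsion_loss(pred)
```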

Trained for 150k iterations with \(\alpha = 1.0\), followed by 10k iterations with \(\alpha = 1.5\) (all with the repulsion loss).

We can gain insight into the model’s behavior by visualizing the per-point contribution to the Chamfer distance. Each ground-truth point is colored according to its distance to the nearest predicted point: green indicates accurate reconstruction, while red indicates larger deviations.
This error heatmap reveals the model’s typical failure modes. Most of the sitting surfaces of chairs are reconstructed precisely, showing dense green regions. In contrast, backrests, thin legs, edges, and other fine structures often appear red, suggesting higher geometric uncertainty or incomplete recovery of fine detail.
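
Concretely, the coloring can be produced along these lines (a small sketch; `max_err`, the distance at which the color saturates to red, is an assumed knob rather than a value tuned for this report):

```python
import torch
from pytorch3d.ops import knn_points

def chamfer_error_colors(gt_points, pred_points, max_err=0.05):
    # gt_points: (1, M, 3), pred_points: (1, N, 3).
    # Distance from each GT point to its nearest predicted point (the recall term).
    d = knn_points(gt_points, pred_points, K=1).dists[..., 0].sqrt()  # (1, M)
    t = (d / max_err).clamp(0.0, 1.0)                                 # 0 = perfect, 1 = large error
    # Linear green-to-red ramp: green for well-reconstructed points, red for large deviations.
    return torch.stack([t, 1.0 - t, torch.zeros_like(t)], dim=-1)     # (1, M, 3) RGB in [0, 1]
```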
Here are some representative examples.

Final F1@0.05 score: 41.32


[Figures: two examples, each showing the input RGB image and the corresponding prediction.]

