Homework 2
By Zhewen Zheng (zhewenz)
1. Exploring loss functions
1.1 Fitting a voxel grid
| src | target |
|---|---|
| ![]() | ![]() |
1.2 Fitting a point cloud
| src | target |
|---|---|
| ![]() | ![]() |
1.3 Fitting a mesh
| src | target |
|---|---|
| ![]() | ![]() |
2. Reconstructing 3D from single view
2.1 Image to voxel grid
| img | prediction | gt |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
2.2 Image to point cloud
| img | prediction | gt |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
2.3 Image to mesh
| img | prediction | gt |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
2.4 Quantitative comparisons
| Voxel Avg F1@0.05: 74.809 | Point cloud Avg F1@0.05: 79.738 | Mesh Avg F1@0.05: 75.308 |
|---|---|---|
| ![]() | ![]() | ![]() |
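
The F1@0.05 scores reported above can be read as follows: precision is the fraction of predicted points lying within 0.05 of some ground-truth point, recall is the same measure in the other direction, and F1 is their harmonic mean, scaled to a percentage. The sketch below is a minimal NumPy version under that standard definition; the function name `f1_at_threshold` and the toy point sets are illustrative assumptions, and the evaluation script actually used for this homework may sample points or compute distances differently.

```python
import numpy as np


def f1_at_threshold(pred, gt, threshold=0.05):
    """F1 score (in percent) between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: share of predicted points with a ground-truth point nearby.
    precision = np.mean(dists.min(axis=1) < threshold)
    # Recall: share of ground-truth points with a predicted point nearby.
    recall = np.mean(dists.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    return float(100.0 * 2 * precision * recall / (precision + recall))


# Toy example (made-up points): three of four predictions match exactly,
# one is displaced far from every ground-truth point.
gt = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
pred = gt.copy()
pred[3] = [0.0, 0.0, 2.0]
print(f1_at_threshold(pred, gt))  # 75.0
```

In the toy example both precision and recall are 0.75, so the score is 75. The dense distance matrix is fine for a few thousand points; larger sets would usually call for a nearest-neighbour structure instead.
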
Despite achieving decent F1 scores, voxel reconstructions show noticeable discontinuities around thin structures. This is likely a consequence of limited grid resolution and of weakly learned occupancies that fall below the marching cubes isovalue threshold and are therefore dropped from the extracted surface.
The point cloud representation performs best both quantitatively (Avg F1@0.05 of 79.738) and visually. Its lack of explicit connectivity gives it greater flexibility, allowing the model to capture fine geometric variations and align closely with the ground-truth shapes.
The mesh representation achieves a slightly higher F1 score than voxels (75.308 vs. 74.809) and produces more continuous surfaces. However, its fixed initial topology (e.g., an icosphere) constrains deformation, making it difficult to accurately model complex or topologically distinct shapes, such as those containing holes or thin appendages.
In summary, each representation presents a trade-off between geometric fidelity and structural constraints: voxels offer regularity but suffer from resolution limits, meshes provide surface continuity but are topologically rigid, while point clouds balance simplicity and adaptability, yielding the most faithful reconstructions overall.
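
The isovalue effect suggested for the voxel results above can be illustrated with a toy example. The sketch below is a simplification under stated assumptions: it thresholds a made-up 1D slice of occupancy probabilities instead of running marching cubes on a 3D grid, and the helper names `binarize_occupancy` and `count_segments` are hypothetical. It only shows how a single weakly predicted cell can split a thin structure in two.

```python
import numpy as np


def binarize_occupancy(probs, isovalue=0.5):
    """Mark cells whose predicted occupancy exceeds the isovalue."""
    return probs > isovalue


def count_segments(occupied):
    """Count contiguous occupied runs along a 1D slice."""
    padded = np.concatenate(([False], occupied))
    # A run starts wherever an occupied cell follows an empty one.
    return int(np.sum(padded[1:] & ~padded[:-1]))


# Made-up probabilities along a thin chair leg; one cell is under-confident.
leg = np.array([0.9, 0.8, 0.45, 0.7, 0.85])

print(count_segments(binarize_occupancy(leg, isovalue=0.5)))  # 2
print(count_segments(binarize_occupancy(leg, isovalue=0.4)))  # 1
```

At an isovalue of 0.5 the leg breaks into two pieces, while a lower isovalue keeps it connected. Lowering the threshold globally is not free, since it can also thicken or add spurious geometry elsewhere.
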
2.5 Analyse effects of hyperparameter variations
| img | n_points = 1000, Avg F1@0.05: 79.738 | n_points = 2000, Avg F1@0.05: 84.403 | n_points = 8000, Avg F1@0.05: 88.409 | gt |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() | ![]() |
Models trained with larger n_points consistently achieved higher F1 scores (79.738, 84.403, and 88.409 for 1000, 2000, and 8000 points, respectively). In the third row, the model with n_points = 1000 predicts a similar but incorrect chair shape, with armrests that should not exist; the shape is reconstructed correctly as n_points increases. This improvement comes with a trade-off: higher point densities also introduce clutter, as seen in the second-row examples, where points begin to appear in regions that should remain empty.
2.6 Interpret your model
To better understand which visual cues the model relies on, I visualized the saliency map for some entries. The saliency map highlights the regions of the input that most strongly influence the model's prediction, effectively showing where the network looks when reconstructing the 3D shape.
The highlighted regions tend to align with object edges and boundaries, suggesting that the model focuses on high-frequency features such as silhouettes and sharp transitions, which are typical cues for shape understanding. However, I also observe some attention in empty or background regions, likely due to the large receptive field and global averaging behavior of the ResNet backbone, which aggregates spatial context beyond local object boundaries.
Overall, the visualization provides evidence that the model captures boundary information but also exhibits diffuse attention, hinting at opportunities for better spatial localization in future designs.
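
The report does not say which saliency method was used; gradient-based saliency is a common choice with a ResNet backbone. As a framework-free illustration of the same idea (measuring how much each input region influences the output), the sketch below uses occlusion sensitivity instead, which is a different technique. The names `occlusion_saliency`, `model_fn`, and `toy_model` are hypothetical, and `model_fn` is assumed to map an image to a single scalar, for example a reconstruction loss or a summary of the predicted shape.

```python
import numpy as np


def occlusion_saliency(model_fn, image, patch=4, fill=0.0):
    """Map of how much the scalar output changes when each patch is occluded."""
    h, w = image.shape[:2]
    baseline = model_fn(image)
    saliency = np.zeros((h, w))
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            # Blank out one patch and record the absolute change in output.
            occluded[top:top + patch, left:left + patch] = fill
            change = abs(model_fn(occluded) - baseline)
            saliency[top:top + patch, left:left + patch] = change
    return saliency


def toy_model(img):
    """Stand-in for a network: responds only to the central 8x8 region."""
    return float(img[8:16, 8:16].sum())


image = np.ones((24, 24))
saliency = occlusion_saliency(toy_model, image)
print(saliency[10, 10], saliency[0, 0])  # 16.0 0.0
```

With the toy model, only patches inside the central region produce a non-zero response, which is the kind of localized map one would hope for. Diffuse responses in background patches would correspond to the background attention described above.
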
3. Exploring other architectures / datasets
3.1 Implicit network
| img | prediction | gt |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
3.2 Parametric network
| img | prediction | gt |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |