16-825 Assignment 2: Single View to 3D
Rodrigo Lopes Catto | rlopesca
1. Exploring loss functions
1.1. Fitting a voxel grid
1.2. Fitting a point cloud
1.3. Fitting a mesh
2. Reconstructing 3D from single view
2.1. Image to voxel grid
2.2. Image to point cloud
2.3. Image to mesh
2.4. Quantitative comparisons
The quantitative comparison shows that the point cloud representation achieves the highest F1-score of approximately 77%,
followed closely by meshes at around 75%, while voxels perform the worst at roughly 74% at the 0.05 threshold.
This difference reflects how each representation encodes 3D geometry. Voxel grids discretize the space into fixed cubes,
which limits spatial resolution and causes a loss of fine geometric details, resulting in lower reconstruction accuracy.
Point clouds, in contrast, directly represent the object’s surface as a collection of points, allowing for higher precision
and flexibility in capturing complex shapes without the limitations of discretization. Meshes offer a continuous surface
representation through connected triangles, improving realism over voxels but sometimes losing accuracy due to errors in
predicting vertex connectivity or surface topology. Overall, point clouds strike the best balance between geometric fidelity
and representational simplicity, which explains their slightly superior quantitative performance in 3D reconstruction tasks.
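For reference, the F1-score at a distance threshold can be computed from nearest-neighbor distances between points sampled from the predicted and ground-truth surfaces. Below is a minimal sketch, assuming both inputs are (N, 3) / (M, 3) point tensors; the function name and the small epsilon are illustrative, not the grader's exact implementation.

```python
import torch

def f1_score(pred_points, gt_points, threshold=0.05):
    """F1 between two point sets at a distance threshold.

    pred_points: (N, 3) points sampled from the prediction.
    gt_points:   (M, 3) points sampled from the ground truth.
    Returns a value in [0, 1]; the percentages reported above are this x 100.
    """
    # Pairwise Euclidean distances between every predicted and GT point.
    dists = torch.cdist(pred_points, gt_points)  # (N, M)

    # Precision: fraction of predicted points within threshold of some GT point.
    precision = (dists.min(dim=1).values < threshold).float().mean()
    # Recall: fraction of GT points within threshold of some predicted point.
    recall = (dists.min(dim=0).values < threshold).float().mean()

    return 2 * precision * recall / (precision + recall + 1e-8)
```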
2.5. Analyse effects of hyperparameter variations
| Input | Smooth 0.01 | Smooth 0.1 (Baseline) | Smooth 0.5 | Target |
By varying the smoothness weight w_smooth across 0.01, 0.1, and 0.5, we observed a clear trade-off
between geometric detail and surface regularity in the reconstructed meshes. With w_smooth = 0.01,
the model produced sharper and more detailed shapes but introduced noticeable surface noise and irregularities.
Increasing the weight to the baseline value of 0.1 resulted in the best balance, yielding smooth yet detailed meshes
with consistent reconstruction quality. However, when w_smooth was raised to 0.5, the reconstructions became overly
smooth and lost fine geometric features, leading to a drop in accuracy. Overall, a moderate smoothness weight around
0.1 proved most effective, maintaining both structural fidelity and visually coherent surfaces.
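Concretely, w_smooth weights a Laplacian smoothness regularizer against the chamfer term in the mesh fitting objective. Here is a sketch of that combined loss, assuming pytorch3d and a batched Meshes prediction; the function name and sample count are illustrative:

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_loss(pred_mesh, gt_points, w_smooth=0.1, n_samples=5000):
    """Chamfer + Laplacian smoothness loss for a predicted mesh.

    pred_mesh: pytorch3d Meshes object (batched).
    gt_points: (B, M, 3) points sampled from the ground-truth surface.
    """
    # Sample the predicted surface so the chamfer term is differentiable
    # with respect to the vertex positions.
    pred_points = sample_points_from_meshes(pred_mesh, n_samples)  # (B, n_samples, 3)

    loss_chamfer, _ = chamfer_distance(pred_points, gt_points)
    # Laplacian term penalizes vertices that deviate from the mean of their
    # neighbors, pulling the surface toward smoothness.
    loss_smooth = mesh_laplacian_smoothing(pred_mesh, method="uniform")

    return loss_chamfer + w_smooth * loss_smooth
```

With w_smooth = 0.01 the Laplacian term barely constrains the vertices (noisy but sharp surfaces), while 0.5 lets it dominate the chamfer term (over-smoothed geometry), matching the qualitative behavior described above.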
2.6. Interpret your model
Instead of just showing the final reconstruction and a single score, I make a simple error heatmap on the predicted point
cloud. For each predicted point, I find the closest point on the ground truth mesh and use that distance as the
error. I normalize the errors, color the points with a colormap, and render a short rotating GIF. I also save a
separate colorbar with real units so the colors are easy to read. This makes it obvious where the model struggles:
thin legs and occluded parts usually light up, while big flat areas stay cool. It helps me see if the model is guessing
hidden geometry, averaging across symmetry, or making parts too thick. It is not perfect since nearest neighbor ignores
topology, but it is fast, clear, and more useful than a single number.
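A minimal sketch of the coloring step, assuming pytorch3d. It approximates the point-to-mesh distance with a nearest-neighbor query against a dense sample of the ground-truth surface (exact point-to-triangle distance would differ slightly); the function name, sample count, and colormap choice are illustrative.

```python
import torch
import matplotlib.pyplot as plt
from pytorch3d.ops import knn_points, sample_points_from_meshes

def error_colors(pred_points, gt_mesh, n_gt_samples=100_000, cmap_name="jet"):
    """Per-point error colors for a predicted point cloud.

    pred_points: (N, 3) predicted points.
    gt_mesh: ground-truth pytorch3d Meshes object (single mesh).
    Returns (N, 3) RGB colors in [0, 1] and the raw per-point errors.
    """
    # Approximate the GT surface with a dense point sample.
    gt_points = sample_points_from_meshes(gt_mesh, n_gt_samples)  # (1, S, 3)

    # Nearest-neighbor distance from each predicted point to the GT sample.
    # knn_points returns squared distances, so take the square root.
    knn = knn_points(pred_points[None], gt_points, K=1)
    errors = knn.dists[0, :, 0].sqrt()  # (N,)

    # Normalize to [0, 1] for the colormap; keep raw errors for the colorbar.
    normed = (errors - errors.min()) / (errors.max() - errors.min() + 1e-8)
    colors = plt.get_cmap(cmap_name)(normed.cpu().numpy())[:, :3]  # drop alpha
    return torch.from_numpy(colors).float(), errors
```

The returned colors can be attached to the point cloud and rendered from rotating viewpoints to produce the GIF; the raw errors give the real-unit range for the separate colorbar.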
3. Exploring other architectures / datasets
Quantitative comparison:
Full Dataset on the left, Chairs Dataset on the right. Looking at the differences in performance, the full model looks better overall, but this impression changes when we look at the qualitative comparison below.
Qualitative comparison:
Comparing the two models, the one trained on the chairs dataset is more consistent in reconstructing the shapes and details of chairs, while the full model is noisier and less consistent. This is probably because the full model has to learn to reconstruct a much wider variety of shapes, while the chair model can focus on the specific features of chairs.
| Input | Source (Chair Model) | Source (Full Model) | Target |
Even on the additional classes, such as airplanes and cars, the full model's reconstructions do not capture as much detail as the chair model captures on chairs, as seen below.
| Input | Source (Full Model) | Target |