16-825 Assignment 2: Single View to 3D

Rodrigo Lopes Catto | rlopesca


1. Exploring loss functions

1.1. Fitting a voxel grid

Placeholder Image Placeholder Image
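Since this part is about the loss itself, here is a minimal sketch of the voxel-fitting objective, assuming the usual formulation as binary cross-entropy over per-cell occupancies (the names and the random stand-in target below are illustrative, not the actual starter code):

```python
import torch
import torch.nn.functional as F

def voxel_loss(pred_logits, gt_voxels):
    # pred_logits: (B, D, H, W) raw scores; gt_voxels: (B, D, H, W) in {0, 1}.
    # Each cell is treated as an independent binary occupancy classifier.
    return F.binary_cross_entropy_with_logits(pred_logits, gt_voxels.float())

# Fitting loop: optimize a free-parameter grid toward a target occupancy.
voxels_tgt = (torch.rand(1, 32, 32, 32) > 0.5).float()  # stand-in target
voxels_src = torch.randn(1, 32, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([voxels_src], lr=1e-2)
for _ in range(1000):
    optimizer.zero_grad()
    voxel_loss(voxels_src, voxels_tgt).backward()
    optimizer.step()
```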

1.2. Fitting a point cloud

Placeholder Image Placeholder Image
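Analogously, a sketch of the symmetric chamfer objective used for point cloud fitting, written with plain torch.cdist for clarity (PyTorch3D's knn_points would be faster on large clouds; the names are mine):

```python
import torch

def chamfer_loss(src, tgt):
    # src: (B, N, 3) predicted points; tgt: (B, M, 3) target points.
    # Symmetric chamfer: match each point to its nearest neighbor in the
    # other cloud and average the squared distances in both directions.
    d = torch.cdist(src, tgt)  # (B, N, M) pairwise L2 distances
    return (d.min(dim=2).values ** 2).mean() + (d.min(dim=1).values ** 2).mean()
```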

1.3. Fitting a mesh

Placeholder Image Placeholder Image
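For the mesh, the surface is fit through sampled points plus a smoothness regularizer; a sketch assuming PyTorch3D's loss utilities (the function name and defaults are mine, and this is the same objective revisited in section 2.5):

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fit_loss(src_mesh, tgt_mesh, w_smooth=0.1, n_points=5000):
    # Compare surfaces via points sampled from both meshes (chamfer term);
    # the Laplacian term penalizes jagged, irregular vertex neighborhoods.
    src_pts = sample_points_from_meshes(src_mesh, n_points)
    tgt_pts = sample_points_from_meshes(tgt_mesh, n_points)
    loss_chamfer, _ = chamfer_distance(src_pts, tgt_pts)
    loss_smooth = mesh_laplacian_smoothing(src_mesh, method="uniform")
    return loss_chamfer + w_smooth * loss_smooth
```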

2. Reconstructing 3D from single view

2.1. Image to voxel grid

Input Source Target
Input Image Source Image Target Image
Input Image Source Image Target Image
Input Image Source Image Target Image

2.2. Image to point cloud

Input Source Target
Input Image Source Image Target Image
Input Image Source Image Target Image
Input Image Source Image Target Image

2.3. Image to mesh

Input Source Target
Input Image Source Image Target Image
Input Image Source Image Target Image
Input Image Source Image Target Image

2.4. Quantitative comparisons

Placeholder Image Placeholder Image Placeholder Image

The quantitative comparison shows that, at the 0.05 threshold, the point cloud representation achieves the highest F1-score of approximately 77%, followed closely by meshes at around 75%, while voxels perform worst at roughly 74%. This difference reflects how each representation encodes 3D geometry. Voxel grids discretize space into fixed cubes, which limits spatial resolution and loses fine geometric detail, resulting in lower reconstruction accuracy. Point clouds, in contrast, represent the object’s surface directly as a collection of points, allowing higher precision and more flexibility in capturing complex shapes without the limitations of discretization. Meshes offer a continuous surface representation through connected triangles, improving realism over voxels but sometimes losing accuracy due to errors in predicted vertex connectivity or surface topology. Overall, point clouds strike the best balance between geometric fidelity and representational simplicity, which explains their slightly superior quantitative performance in 3D reconstruction tasks.
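For reference, the F1-score here is computed between points sampled from the prediction and from the ground truth; a minimal re-implementation sketch (the grading script's exact sampling and averaging may differ):

```python
import torch

def f1_at_threshold(pred_pts, gt_pts, threshold=0.05):
    # pred_pts: (N, 3), gt_pts: (M, 3). Precision: fraction of predicted
    # points within `threshold` of some GT point; recall is symmetric.
    d = torch.cdist(pred_pts, gt_pts)  # (N, M) pairwise distances
    precision = (d.min(dim=1).values < threshold).float().mean()
    recall = (d.min(dim=0).values < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```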

2.5. Analyse effects of hyperparameter variations

Input

Placeholder Image

Smooth 0.01

Placeholder Image

Smooth 0.1 (Baseline)

Placeholder Image

Smooth 0.5

Placeholder Image

Target

Placeholder Image

By varying the smoothness weight w_smooth across 0.01, 0.1, and 0.5, we observed a clear trade-off between geometric detail and surface regularity in the reconstructed meshes. With w_smooth = 0.01, the model produced sharper and more detailed shapes but introduced noticeable surface noise and irregularities. Increasing the weight to the baseline value of 0.1 resulted in the best balance, yielding smooth yet detailed meshes with consistent reconstruction quality. However, when w_smooth was raised to 0.5, the reconstructions became overly smooth and lost fine geometric features, leading to a drop in accuracy. Overall, a moderate smoothness weight around 0.1 proved most effective, maintaining both structural fidelity and visually coherent surfaces.
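Concretely, the three runs differ only in the regularizer weight of the total objective (the same chamfer-plus-Laplacian formulation sketched in section 1.3):

L_total = L_chamfer + w_smooth * L_laplacian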

2.6. Interpret your model

Input Source (error heatmap) Colorbar Target
Input Image Source Image Colorbar PNG Target Image
Input Image Source Image Colorbar PNG Target Image

Instead of just showing the final recon and one score, I make a simple error heatmap on the predicted point cloud. For each predicted point, I find the closest point on the ground truth mesh and use that distance as the error. I normalize the errors, color the points with a colormap, and render a short rotating GIF. I also save a separate colorbar with real units so the colors are easy to read. This makes it obvious where the model struggles: thin legs and occluded parts usually light up, while big flat areas stay cool. It helps me see if the model is guessing hidden geometry, averaging across symmetry, or making parts too thick. It is not perfect since nearest neighbor ignores topology, but it is fast, clear, and more useful than a single number.
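A sketch of how the per-point error colors are computed, assuming PyTorch3D's sample_points_from_meshes and knn_points and approximating point-to-mesh distance by dense surface sampling (the function name and parameters are mine):

```python
import torch
import matplotlib.pyplot as plt
from pytorch3d.ops import knn_points, sample_points_from_meshes

def error_point_colors(pred_pts, gt_mesh, n_gt=50000, cmap="jet"):
    # pred_pts: (N, 3) predicted points; gt_mesh: a PyTorch3D Meshes object.
    # Nearest-neighbor distance to a dense sample of the GT surface stands
    # in for true point-to-mesh distance (hence "ignores topology").
    gt_pts = sample_points_from_meshes(gt_mesh, n_gt)  # (1, n_gt, 3)
    knn = knn_points(pred_pts.unsqueeze(0), gt_pts, K=1)
    err = knn.dists[0, :, 0].sqrt()  # (N,) distances
    # Min-max normalize for the colormap; return the raw range so the
    # separately saved colorbar can be labeled in real units.
    err_n = (err - err.min()) / (err.max() - err.min() + 1e-8)
    rgb = plt.get_cmap(cmap)(err_n.detach().cpu().numpy())[:, :3]  # drop alpha
    return torch.from_numpy(rgb).float(), (err.min().item(), err.max().item())
```

The returned colors can be passed straight to a point renderer for each frame of the rotating GIF, and the (min, max) range labels the colorbar.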


3. Exploring other architectures / datasets

Quantitative comparison:

Full Dataset on the left, Chairs Dataset on the right. Looking at the differences in performance, the full model scores better overall, but the qualitative comparison below tells a different story.

Full Dataset Evaluation Chairs Dataset Evaluation

Qualitative comparison:

Comparing the two models, the one trained only on chairs is more consistent in reconstructing chair shapes and details, while the full model is noisier and less consistent. This is likely because the full model must learn to reconstruct a much wider variety of shapes, whereas the chair model can focus on chair-specific features.

Input Source (Chair Model) Source (Full Model) Target
Input Image Source Image Source Image Target Image
Input Image Source Image Source Image Target Image

Even on the new classes, such as airplanes and cars, the full model's reconstructions do not capture as much detail as the chair model's reconstructions of chairs, as seen below.

Input Source (Full Model) Target
Input Image Source Image Target Image
Input Image Source Image Target Image
Input Image Source Image Target Image