Learning for 3D Vision: Assignment 2


Single View to 3D

CMU 16825 Learning for 3D Vision

Tushar Nayak [tusharn]

Q1. Exploring Loss Functions

1.1 Fitting a voxel grid

Ground Truth

target_vox

Optimized Voxel Grid

source_vox

1.2 Fitting a point cloud

Ground Truth

target_cloud

Optimized Point Cloud

source_cloud

1.3 Fitting a mesh

Ground Truth

target_mesh

Optimized Mesh

source_mesh

Reconstructing 3D from Single View

Image to Voxel Grid

Chair

gt_image_1

Ground Truth: Mesh

model_mesh_t0

Ground Truth: Voxel

model_vox_t0

Predicted Voxel Grid

model_vox0

Sofa

gt_image_2

Ground Truth: Mesh

model_mesh_t1

Ground Truth: Voxel

model_vox_t1

Predicted Voxel Grid

model_vox1

Chair with armrest

gt_image_5

Ground Truth: Mesh

model_mesh_t2

Ground Truth: Voxel

model_vox_t2

Predicted Voxel Grid

model_vox2

Image to Point Cloud

Chair

gt_image_1

Ground Truth: Mesh

model_mesh_t0

Ground Truth: Point Cloud

model_cloud_t0

Predicted Point Cloud

model_cloud0

Sofa

gt_image_2

Ground Truth: Mesh

model_mesh_t1

Ground Truth: Point Cloud

model_cloud_t1

Predicted Point Cloud

model_cloud1

Chair with armrest

gt_image_5

Ground Truth: Mesh

model_mesh_t2

Ground Truth: Point Cloud

model_cloud_t2

Predicted Point Cloud

model_cloud2

Image to Mesh

Chair

gt_image_1

Ground Truth: Mesh

model_mesh_t0

Predicted Mesh

model_mesh0

Sofa

gt_image_2

Ground Truth: Mesh

model_mesh_t1

Predicted Mesh

model_mesh1

Chair with armrest

gt_image_5

Ground Truth: Mesh

model_mesh_t2

Predicted Mesh

model_mesh2

Quantitative Comparison

The point cloud model achieved the strongest performance. Point clouds are easier to optimize because their points are independent and not bound by surface connectivity. This means that misplacing a single point has limited impact on the rest of the shape, enabling faster convergence and a simpler prediction process.
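To illustrate why a single misplaced point has limited impact, here is a minimal sketch of a symmetric Chamfer-style loss in plain Python. It assumes the point cloud model was trained with a loss of this kind; the report does not state the exact formulation, and the function name and toy points are hypothetical.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer-style distance between two lists of 3D points.

    Assumption: mean of squared nearest-neighbour distances in both
    directions. The loss actually used in the assignment may differ.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Each source point contributes only its own nearest-neighbour term,
        # so one bad point changes one term of the sum.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Toy data (hypothetical): the third predicted point is off by 0.5 along z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]

loss_one_bad_point = chamfer_distance(pred, target)
loss_perfect = chamfer_distance(target, target)
```

Only the terms that involve the displaced point are non-zero here, which is the independence property described above.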

The mesh model scored higher than the voxel model on the numerical metrics but was visually the least convincing. Vertex placement is generally reasonable, for much the same reasons as with point clouds, but the model struggles to predict face orientations accurately. One possible cause is that the number of points sampled for the loss is smaller than the number of faces in both the ground-truth and predicted meshes, which makes the loss less informative. In addition, the fixed spherical topology prevents accurate reconstruction of shapes with more complex surface structure, such as chairs with loops or holes.

The voxel model scored lowest of the three. Although its predictions look smoother and more complete than the meshes, its lower F1 score and lack of fine detail are likely due to the coarse 32x32x32 resolution, which limits how much detail the grid can represent.
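As a rough illustration of the resolution limit, the sketch below snaps a point to the centre of its voxel and computes the worst-case snapping error for a 32x32x32 grid. The grid bounds of [-0.5, 0.5] per axis and the helper name are assumptions made for this example; the report does not state the grid extent.

```python
import math


def voxel_center(p, resolution=32, lo=-0.5, hi=0.5):
    """Snap an in-bounds 3D point to the centre of the voxel containing it.

    Assumption: the grid spans [lo, hi] along every axis.
    """
    size = (hi - lo) / resolution
    center = []
    for c in p:
        # Clamp so that a coordinate exactly at `hi` falls in the last voxel.
        idx = min(int((c - lo) / size), resolution - 1)
        center.append(lo + (idx + 0.5) * size)
    return tuple(center)


voxel_size = 1.0 / 32
# Worst case: the point sits at a voxel corner, half a cell diagonal away.
max_error = math.sqrt(3) * voxel_size / 2

point = (0.1, -0.2, 0.3)
snap_error = math.dist(point, voxel_center(point))
```

Under these assumptions the worst-case error is about 0.027 in grid units, so surface detail finer than that cannot be represented by occupancy alone.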

F1-Score/Threshold and reported F1 Scores

for Voxel Grid (Score: 90.83)

eval_vox

for Point Cloud (Score: 95.98)

eval_point

for Mesh (Score: 93.14)

eval_mesh
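For reference, an F1 score at a distance threshold can be sketched as below. This assumes the common definition (harmonic mean of precision and recall over nearest-neighbour distances, reported as a percentage); the assignment's evaluation script may sample points or set the threshold differently, and the toy points and threshold here are hypothetical.

```python
def f1_at_threshold(pred, gt, threshold):
    """F1 score (0 to 100) between two point lists at a distance threshold.

    Assumption: precision is the fraction of predicted points within
    `threshold` of some ground-truth point; recall is the reverse.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def fraction_within(src, dst):
        limit = threshold ** 2
        hits = sum(1 for p in src if min(sq_dist(p, q) for q in dst) <= limit)
        return hits / len(src)

    precision = fraction_within(pred, gt)
    recall = fraction_within(gt, pred)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one predicted point is close, one is 0.2 away.
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_points = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.2)]

score_tight = f1_at_threshold(pred_points, gt_points, threshold=0.05)
score_loose = f1_at_threshold(pred_points, gt_points, threshold=0.25)
```

Raising the threshold can only keep or increase precision and recall, which is why F1-versus-threshold curves rise toward 100.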

Effects of hyperparameter variation

To reduce the disjointedness of the mesh surfaces, three values of w_smooth were tried: 1, 3 and 5.
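The role of w_smooth can be sketched as a weight on a smoothness regularizer added to the data term. The example below uses a uniform Laplacian penalty as a stand-in; the report does not say which smoothness loss was used, and the names `laplacian_smoothness`, `total_loss` and `w_data` are hypothetical.

```python
import math


def laplacian_smoothness(verts, faces):
    """Mean distance from each vertex to the centroid of its neighbours.

    Assumption: uniform (unweighted) Laplacian; the assignment's
    smoothness term may use a different weighting.
    """
    neighbors = [set() for _ in verts]
    for i, j, k in faces:
        neighbors[i].update((j, k))
        neighbors[j].update((i, k))
        neighbors[k].update((i, j))

    total = 0.0
    for v, nbrs in zip(verts, neighbors):
        if not nbrs:
            continue  # isolated vertex: no neighbours to compare against
        centroid = [sum(verts[n][d] for n in nbrs) / len(nbrs) for d in range(3)]
        total += math.dist(v, centroid)
    return total / len(verts)


def total_loss(data_term, smooth_term, w_smooth, w_data=1.0):
    # Larger w_smooth trades data fidelity for a smoother surface.
    return w_data * data_term + w_smooth * smooth_term


# Toy mesh (hypothetical): a flat square made of two triangles,
# and the same square with one corner lifted out of the plane.
faces = [(0, 1, 2), (0, 2, 3)]
flat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
bumpy = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

smooth_flat = laplacian_smoothness(flat, faces)
smooth_bumpy = laplacian_smoothness(bumpy, faces)
```

The lifted corner increases the penalty, so a larger w_smooth pushes the optimizer toward the flatter configuration at some cost to the data term.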

w_smooth = 1

eval_vox

eval_point

eval_mesh

Score: 91.255

w_smooth = 3

eval_vox

eval_point

eval_mesh

Score: 92.996

w_smooth = 5

eval_vox

eval_point

eval_mesh

Score: 92.996

Interpret your model

eval_mesh

The visualization starts at maximum loss with points scattered randomly, then progressively tightens as optimization minimizes a shape/registration objective, revealing the structure of the target point cloud. Early frames show high dispersion and unstable correspondences. Midway, clusters stabilize as gradients guide points toward consistent nearest-neighbor or transport matches. At convergence, motion is small and the cloud aligns with the intended geometry, reflecting the reduced symmetric error typical of point-cloud evaluations.

Implicit Network

eval_vox

eval_point

eval_mesh