Learning for 3D Vision: Assignment 2
Single View to 3D
CMU 16825 Learning for 3D Vision
Tushar Nayak [tusharn]
Q1. Exploring Loss Functions
1.1 Fitting a voxel grid
Ground Truth
Optimized Voxel Grid
1.2 Fitting a point cloud
Ground Truth
Optimized Point Cloud
1.3 Fitting a mesh
Ground Truth
Optimized Mesh
Reconstructing 3D from Single View
Image to Voxel Grid
Chair
Ground Truth: Mesh
Ground Truth: Voxel
Predicted Voxel Grid
Sofa
Ground Truth: Mesh
Ground Truth: Voxel
Predicted Voxel Grid
Chair with armrest
Ground Truth: Mesh
Ground Truth: Voxel
Predicted Voxel Grid
Image to Point Cloud
Chair
Ground Truth: Mesh
Ground Truth: Point Cloud
Predicted Point Cloud
Sofa
Ground Truth: Mesh
Ground Truth: Point Cloud
Predicted Point Cloud
Chair with armrest
Ground Truth: Mesh
Ground Truth: Point Cloud
Predicted Point Cloud
Image to Mesh
Chair
Ground Truth: Mesh
Predicted Mesh
Sofa
Ground Truth: Mesh
Predicted Mesh
Chair with armrest
Ground Truth: Mesh
Predicted Mesh
Quantitative Comparison
The point cloud model achieved the strongest performance. Point clouds are easier to optimize because their points are independent and not bound by surface connectivity. This means that misplacing a single point has limited impact on the rest of the shape, enabling faster convergence and a simpler prediction process.
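The independence argument above can be illustrated with a small sketch. It assumes the point cloud loss is a symmetric chamfer distance, which this report does not state explicitly; the function name and the toy data are hypothetical. Moving a single point changes only the nearest-neighbour terms that involve that point, so the total loss moves by a bounded, small amount.

```python
import numpy as np

def chamfer_distance(src, tgt):
    """Symmetric chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    diff = src[:, None, :] - tgt[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Every point contributes only its own nearest-neighbour term.
    src_to_tgt = d2.min(axis=1)
    tgt_to_src = d2.min(axis=0)
    return src_to_tgt.mean() + tgt_to_src.mean()

rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(200, 3))  # toy "ground truth" cloud

pred = target.copy()
base = chamfer_distance(pred, target)  # identical clouds

# Misplace one predicted point by 0.5 along x (hypothetical perturbation).
pred[0] += np.array([0.5, 0.0, 0.0])
perturbed = chamfer_distance(pred, target)

print(f"loss before: {base:.6f}, after moving one point: {perturbed:.6f}")
```

With 200 points per cloud, the displaced point can add at most 0.5² / 200 to each of the two directional terms, which is why a single bad point does not dominate the objective.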
The mesh model outperformed the voxel model on numerical metrics but was visually the least convincing. Vertex placement is generally reasonable, for much the same reasons as with point clouds, but the model struggles to predict face orientations accurately. One possible cause is that the number of points sampled for the loss is smaller than the number of faces in both the ground-truth and predicted meshes, which makes the loss less informative. Additionally, the fixed spherical topology prevents accurate reconstruction of shapes with more complex surface structure, such as chairs with loops or holes.
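The sampling hypothesis can be made concrete with a rough sketch. It assumes the mesh loss compares points sampled from the surface with area-weighted face selection, which is a common setup but is not confirmed by this report; the grid mesh, the helper names, and the sample count of 200 are all hypothetical. When fewer points are drawn than there are faces, most faces receive no sample at all and therefore no direct supervision in that iteration.

```python
import numpy as np

def grid_mesh(n):
    """Flat n x n grid in the z = 0 plane, split into 2 * n * n triangles."""
    xs = np.linspace(0.0, 1.0, n + 1)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    verts = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)
    idx = np.arange((n + 1) ** 2).reshape(n + 1, n + 1)
    a, b = idx[:-1, :-1].ravel(), idx[1:, :-1].ravel()
    c, d = idx[1:, 1:].ravel(), idx[:-1, 1:].ravel()
    faces = np.concatenate([np.stack([a, b, c], axis=1),
                            np.stack([a, c, d], axis=1)])
    return verts, faces

def sample_points_from_mesh(verts, faces, num_samples, rng):
    """Area-weighted surface sampling; returns points and the face each came from."""
    tri = verts[faces]  # (F, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    face_idx = rng.choice(len(faces), size=num_samples, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    su = np.sqrt(rng.random(num_samples))
    v = rng.random(num_samples)
    w0, w1, w2 = 1.0 - su, su * (1.0 - v), su * v
    t = tri[face_idx]
    pts = w0[:, None] * t[:, 0] + w1[:, None] * t[:, 1] + w2[:, None] * t[:, 2]
    return pts, face_idx

rng = np.random.default_rng(0)
verts, faces = grid_mesh(20)  # 800 faces
points, face_idx = sample_points_from_mesh(verts, faces, 200, rng)

# Fraction of faces that received at least one sample.
coverage = len(np.unique(face_idx)) / len(faces)
print(f"{len(faces)} faces, {len(points)} samples, coverage {coverage:.1%}")
```

Under these assumptions at most a quarter of the faces can be touched per draw, so raising the sample count would be one way to test the explanation above.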
The voxel model scored below both the point cloud and mesh models. Although its outputs look smoother and more complete than the meshes, its lower F1 score and lack of fine detail are likely due to the coarse 32x32x32 resolution, which limits how much detail the grid can represent.
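To give a feel for the resolution limit, the sketch below snaps random points to a 32x32x32 grid and measures how far each point lands from the centre of its voxel. The grid bounds of [-0.5, 0.5] on each axis are an assumption made for illustration (the report does not give the actual bounds), and the helper name is hypothetical.

```python
import numpy as np

def voxelize(points, resolution=32, lo=-0.5, hi=0.5):
    """Mark the voxels that contain at least one point."""
    size = (hi - lo) / resolution  # edge length of one voxel
    idx = np.floor((points - lo) / size).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid, idx, size

rng = np.random.default_rng(0)
points = rng.uniform(-0.5, 0.5, size=(5000, 3))
grid, idx, size = voxelize(points)

# Position recovered from the grid: the centre of each occupied voxel.
centers = -0.5 + (idx + 0.5) * size
max_error = np.linalg.norm(points - centers, axis=1).max()

# Worst case is half the voxel diagonal.
bound = np.sqrt(3) / 2 * size
print(f"voxel size {size:.5f}, max error {max_error:.5f}, bound {bound:.5f}")
```

Any structure thinner than one voxel edge collapses into a single cell, which is consistent with the loss of fine detail described above.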
F1-Score/Threshold and reported F1 Scores
for Voxel Grid (Score: 90.83)
for Point Cloud (Score: 95.98)
for Mesh (Score: 93.14)
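For reference, an F1 score at a distance threshold can be computed along the following lines. This is a minimal sketch, assuming both shapes are compared as point sets and that the score is reported on a 0 to 100 scale as in the numbers above; the exact evaluation code, the threshold of 0.05, and the toy points are assumptions, not taken from this report.

```python
import numpy as np

def f1_at_threshold(pred, gt, threshold):
    """F1 score (in percent) between two point sets at a distance threshold."""
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: share of predicted points with a ground-truth point nearby.
    precision = 100.0 * np.mean(dists.min(axis=1) < threshold)
    # Recall: share of ground-truth points with a predicted point nearby.
    recall = 100.0 * np.mean(dists.min(axis=0) < threshold)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

gt = np.array([[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])

f1_perfect = f1_at_threshold(gt.copy(), gt, threshold=0.05)

# Move one predicted point far away: precision and recall both drop to 3/4.
pred = gt.copy()
pred[3] = [5.0, 5.0, 5.0]
f1_partial = f1_at_threshold(pred, gt, threshold=0.05)

print(f1_perfect, f1_partial)
```

Sweeping `threshold` over a range of values and plotting the result gives the kind of F1-versus-threshold curve referred to above.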
Effects of hyperparameter variation
To reduce the disjointedness of the mesh surfaces, three values of w_smooth were tried: 1, 3 and 5.
w_smooth = 1
Score: 91.255
w_smooth = 3
Score: 92.996
w_smooth = 5
Score: 92.996
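The role of w_smooth can be sketched as a weight on a smoothness penalty added to the data term. The report does not specify the penalty, so the example assumes a uniform Laplacian term (each vertex is compared with the mean of its neighbours); the function names, the hexagonal test mesh, and the data-term value of 0.01 are hypothetical.

```python
import numpy as np

def uniform_laplacian_loss(verts, faces):
    """Mean distance between each vertex and the average of its neighbours."""
    num_verts = len(verts)
    adj = np.zeros((num_verts, num_verts))
    for a, b in ((0, 1), (1, 2), (2, 0)):
        adj[faces[:, a], faces[:, b]] = 1.0
        adj[faces[:, b], faces[:, a]] = 1.0
    degree = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    neighbour_mean = adj @ verts / degree
    return np.linalg.norm(neighbour_mean - verts, axis=1).mean()

def total_loss(data_term, smooth_term, w_data=1.0, w_smooth=1.0):
    """Weighted sum of a data term (e.g. chamfer) and the smoothness term."""
    return w_data * data_term + w_smooth * smooth_term

# Flat hexagonal fan: one centre vertex surrounded by six ring vertices.
angles = np.arange(6) * np.pi / 3
ring = np.stack([np.cos(angles), np.sin(angles), np.zeros(6)], axis=1)
flat = np.vstack([[0.0, 0.0, 0.0], ring])
faces = np.array([[0, 1 + k, 1 + (k + 1) % 6] for k in range(6)])

# Same mesh with the centre vertex pulled out of the plane (a "spike").
spiky = flat.copy()
spiky[0, 2] = 0.5

flat_loss = uniform_laplacian_loss(flat, faces)
spiky_loss = uniform_laplacian_loss(spiky, faces)

# Effect of the three weights on the total, for a fixed hypothetical data term.
totals = [total_loss(0.01, spiky_loss, w_smooth=w) for w in (1, 3, 5)]
print(flat_loss, spiky_loss, totals)
```

A larger w_smooth makes spikes like this more expensive relative to the data term, which is the trade-off the three runs above explore.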
Interpret your model
The point cloud starts at maximum loss, with points scattered randomly, and progressively tightens as optimization minimizes a shape/registration objective, revealing the structure of the target point cloud. Early frames show high dispersion and unstable correspondences. Midway through, clusters stabilize as gradients guide points toward consistent nearest-neighbor or transport matches. At convergence, motion is small and the cloud aligns with the intended geometry, reflecting the reduced symmetric error typical of point-cloud evaluations.
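The progression described above can be reproduced in miniature. The sketch below is a toy stand-in, not the model from this report: it optimizes point coordinates directly with plain gradient descent instead of training a network, assumes a chamfer-style nearest-neighbour objective, and uses a hypothetical unit-sphere target, step size, and iteration count.

```python
import numpy as np

def chamfer_and_grad(src, tgt):
    """Symmetric chamfer loss and its gradient with respect to src."""
    d2 = np.sum((src[:, None, :] - tgt[None, :, :]) ** 2, axis=-1)
    n, m = d2.shape
    nn_src = d2.argmin(axis=1)  # nearest target for every source point
    nn_tgt = d2.argmin(axis=0)  # nearest source for every target point
    loss = d2[np.arange(n), nn_src].mean() + d2[nn_tgt, np.arange(m)].mean()
    # Source-to-target term: each point is pulled toward its nearest target.
    grad = 2.0 * (src - tgt[nn_src]) / n
    # Target-to-source term: accumulate pulls on sources chosen by targets.
    np.add.at(grad, nn_tgt, 2.0 * (src[nn_tgt] - tgt) / m)
    return loss, grad

rng = np.random.default_rng(0)

# Hypothetical target: points on the unit sphere.
target = rng.normal(size=(256, 3))
target /= np.linalg.norm(target, axis=1, keepdims=True)

# Start from randomly scattered points, as in the early frames.
points = rng.uniform(-1.0, 1.0, size=(256, 3))

step = 0.02 * len(points)  # small step, scaled because the loss is a mean
history = []
for _ in range(300):
    loss, grad = chamfer_and_grad(points, target)
    history.append(loss)
    points -= step * grad

print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Plotting `points` at a few iterations shows the same pattern: a dispersed cloud at first, correspondences that settle as the nearest-neighbour assignments stop changing, and only small motion near the end.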
Implicit Network