Assignment 2 Report


1.1 Voxel Representation

Definition:
The object is represented as a voxel grid inside a fixed-size cube. Each voxel encodes occupancy (0/1 or probability).

Loss Function:
We use Binary Cross Entropy (BCE) between the predicted occupancy probabilities and the ground-truth voxel occupancies.
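A minimal sketch of this loss in PyTorch, assuming the decoder outputs raw logits over a (B, D, H, W) grid (names are illustrative):

```python
import torch
import torch.nn.functional as F

def voxel_loss(pred_logits: torch.Tensor, gt_voxels: torch.Tensor) -> torch.Tensor:
    # pred_logits: (B, D, H, W) raw network outputs (unbounded)
    # gt_voxels:   (B, D, H, W) binary occupancy in {0, 1}
    # binary_cross_entropy_with_logits folds the sigmoid into the loss
    # for numerical stability.
    return F.binary_cross_entropy_with_logits(pred_logits, gt_voxels.float())
```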

Figure: 3D voxel comparison (Pred vs. GT).


1.2 Point Cloud Representation

Definition:
The object is represented as a set of 3D points capturing its geometry.

Loss Function:
We use Chamfer Distance, computed as the average nearest-neighbor distance between predicted and ground truth points.
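A minimal brute-force sketch in PyTorch (using squared distances, as in pytorch3d's chamfer_distance; a KNN-based implementation is preferable for large clouds):

```python
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    # pred: (B, N, 3) predicted points; gt: (B, M, 3) ground-truth points.
    # Pairwise squared distances between every predicted and GT point.
    d = torch.cdist(pred, gt) ** 2            # (B, N, M)
    # Symmetric sum of average nearest-neighbor distances in both directions.
    loss = d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)
    return loss.mean()
```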

Evaluation:
We visualize the predicted point cloud alongside the ground truth.

Figure: Predicted vs. ground-truth point clouds.


1.3 Mesh Representation

Definition:
The object is represented by vertices and faces, forming a mesh surface.

Loss Function:
We use Laplacian Smoothness Loss to penalize large differences between neighboring vertices, encouraging smooth surfaces.
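A minimal sketch using pytorch3d's built-in Laplacian smoothing (the uniform weighting is an assumption):

```python
import torch
from pytorch3d.loss import mesh_laplacian_smoothing
from pytorch3d.structures import Meshes

def smoothness_loss(pred_meshes: Meshes) -> torch.Tensor:
    # Uniform-weighted Laplacian: penalizes each vertex's offset from
    # the centroid of its one-ring neighbors.
    return mesh_laplacian_smoothing(pred_meshes, method="uniform")
```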

Evaluation:
We render the predicted and ground-truth meshes side by side.

Figure: Predicted vs. ground-truth meshes.

2.1 Voxel Representation

Visualization Example

Input RGB → Predicted voxel isosurface → Ground-truth voxel.

Figures: three voxel evaluation examples (Input RGB → Pred → GT).
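The predicted isosurface shown above can be extracted from the occupancy grid with marching cubes; a minimal sketch assuming the PyMCubes package and a 0.5 occupancy threshold:

```python
import mcubes  # PyMCubes (assumed dependency, used only for visualization)

# pred_voxels: (D, H, W) occupancy probabilities from the network.
# Extract the triangle mesh at the 0.5 level for rendering.
vertices, triangles = mcubes.marching_cubes(pred_voxels.cpu().numpy(), 0.5)
```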

2.2 Point Cloud Representation

Visualization Example

Input RGB → Predicted point cloud → Ground-truth point cloud.

Figures: three point cloud evaluation examples (Input RGB → Pred → GT).

2.3 Mesh Representation

Visualization Example

Input RGB → Predicted mesh → Ground-truth mesh.

Figures: three mesh evaluation examples (Input RGB → Pred → GT).

2.4 Quantitative Comparisons

We quantitatively compare the F1 score of 3D reconstruction for meshes, point clouds, and voxel grids.
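For reference, a minimal sketch of the F1 metric at a given distance threshold, computed on points sampled from each representation (names are illustrative):

```python
import torch

def f1_score(pred: torch.Tensor, gt: torch.Tensor, threshold: float) -> torch.Tensor:
    # pred: (N, 3), gt: (M, 3) points sampled from the two shapes.
    d = torch.cdist(pred, gt)                              # (N, M) pairwise distances
    # Precision: fraction of predicted points near some GT point.
    precision = (d.min(dim=1).values < threshold).float().mean()
    # Recall: fraction of GT points near some predicted point.
    recall = (d.min(dim=0).values < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```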

Intuitive Explanation

From the F1–threshold curves, we see that all three methods improve as the threshold increases; however, the representations differ in their relative accuracy.

Conclusion: Mesh-based reconstruction yields the most accurate geometry, point clouds offer a good trade-off between coverage and detail, and voxel grids are limited by resolution quantization.

2.5 Effects of Hyperparameter Variations

vox_size 32 → 48

We study the effect of voxel resolution (vox_size) on 3D reconstruction quality by increasing it from 32 to 48.

Figures: three voxel evaluation examples at vox_size = 48.

Figure: F1 evaluation plot (vox_size = 48).

F1-scores do not increase significantly when moving from 32 → 48.

However, higher vox_size improves visual detail capture: thin structures (e.g., chair legs, armrests) and sharper edges are reconstructed more faithfully.

The improvement is mostly qualitative: objects look closer to the ground truth, though the overall overlap metric (F1) does not reflect large changes.

Increasing vox_size comes with higher memory and compute costs.

n_points 1000 → 2000

We compare the baseline 1,000-point sampling (n_points=1000) with an increased resolution of 2,000 points (n_points=2000).

Figures: three point cloud evaluation examples at n_points = 2000.

Figure: F1 evaluation plot (n_points = 2000).

While np=2000 provides more realistic and detailed reconstructions, the quantitative metric (F1-score) does not show a large improvement. This suggests that the metric may not fully reflect perceptual quality in point cloud prediction.

w_chamfer 0.01 → 0.05

We compare training with w_chamfer=0.01 against an increased weight of w_chamfer=0.05.
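For context, w_chamfer weights the Chamfer term in the combined mesh training loss; a hedged sketch reusing the chamfer_distance and smoothness_loss functions above (the w_smooth term and its value are assumptions):

```python
from pytorch3d.ops import sample_points_from_meshes

def mesh_loss(pred_meshes, gt_points, w_chamfer=0.05, w_smooth=0.1):
    # Sample points on the predicted mesh so the Chamfer term can be
    # computed against the GT point cloud (w_smooth = 0.1 is an assumption).
    pred_points = sample_points_from_meshes(pred_meshes, num_samples=gt_points.shape[1])
    return (w_chamfer * chamfer_distance(pred_points, gt_points)
            + w_smooth * smoothness_loss(pred_meshes))
```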


Figures: three mesh evaluation examples at w_chamfer = 0.05.

Figure: F1 evaluation plot (w_chamfer = 0.05).

2.6 Interpret Your Model

Voxels: 32 vs. 48 (error heatmap GIFs)

Below we compare vox32 (left) and vox48 (right) using rotating error visualizations.
The top block colors the GT surface by its distance to the prediction (error_heat_*.gif); the bottom block colors the predicted points by their distance to the GT surface (error_heat_pred_*.gif).
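A minimal sketch of how these per-point error colors can be computed (colormap choice and normalization are assumptions):

```python
import torch
import matplotlib.pyplot as plt

def error_colors(points: torch.Tensor, reference: torch.Tensor):
    # points:    (N, 3) points to color (GT surface or predicted points).
    # reference: (M, 3) points from the other shape.
    # Distance from each point to its nearest neighbor on the reference.
    dist = torch.cdist(points, reference).min(dim=1).values   # (N,)
    norm = (dist / (dist.max() + 1e-8)).cpu().numpy()         # scale to [0, 1]
    return plt.get_cmap("inferno")(norm)[:, :3]               # (N, 3) RGB colors
```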

Predicted points colored by error

Table: examples 00–05, each pairing the vox32 and vox48 error-heatmap GIFs.

Takeaways (qualitative):

3.2 Parametric Network

Model Overview

We implement a parametric decoder (Parametric2Dto3D) that maps sampled 2D points (UV coordinates in [-1, 1]^2) together with a global image feature vector to 3D coordinates.
At test time, we sample N UV points on a canonical 2D domain (e.g., stratified or random) and predict their 3D positions to form a point cloud.
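A minimal sketch of such a decoder; layer widths and depth are assumptions, not the exact architecture used:

```python
import torch
import torch.nn as nn

class Parametric2Dto3D(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # one 3D coordinate per UV sample
        )

    def forward(self, uv: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # uv:   (B, N, 2) samples in [-1, 1]^2
        # feat: (B, feat_dim) global image feature, shared across points
        feat = feat.unsqueeze(1).expand(-1, uv.shape[1], -1)
        return self.mlp(torch.cat([uv, feat], dim=-1))  # (B, N, 3)
```

For the random variant, UV samples can be drawn as uv = torch.rand(B, N, 2) * 2 - 1.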

Evaluation Examples

Triptychs (RGB · Pred · GT)
Figures: example_00, example_01, example_02.

Training Curve

Figure: average F1 across thresholds (eval_param).

Per-point Error Visualization (Pred → GT)

Table: examples 0–5, each shown as a static PNG and a rotating GIF of per-point error.

Result Analysis

The parametric model recovers the global chair geometry and major surfaces, with concentrated low errors over large regions. Higher errors cluster around thin parts, sharp edges, and concavities, reflecting the difficulty of capturing high-frequency details from global features and a pointwise MLP. Increasing point count, adding positional encodings on UV, or incorporating local image features can reduce these localized errors and improve fine structures.