To reproduce the results, check the Makefile. Example commands: make help, make 2.1, etc.

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

[Figures: ground-truth and predicted voxel grids]
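For reference, fitting a predicted voxel grid to a target occupancy grid is commonly done with a binary cross-entropy loss over per-voxel occupancies. A minimal sketch, assuming the prediction is given as raw logits:

```python
import torch
import torch.nn.functional as F

def voxel_loss(pred_logits, gt_occupancy):
    # pred_logits: (B, D, H, W) raw scores, gt_occupancy: (B, D, H, W) binary targets
    return F.binary_cross_entropy_with_logits(pred_logits, gt_occupancy.float())
```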

1.2. Fitting a point cloud (5 points)

[Figures: ground-truth and predicted point clouds]
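Fitting the point cloud is guided by the chamfer loss (also referenced in Section 2.4 below). A minimal dense sketch of the symmetric chamfer distance; pytorch3d.loss.chamfer_distance provides an equivalent, more efficient implementation:

```python
import torch

def chamfer_loss(pred, gt):
    # pred: (B, N, 3) predicted points, gt: (B, M, 3) ground-truth points
    dists = torch.cdist(pred, gt)                       # (B, N, M) pairwise distances
    pred_to_gt = dists.min(dim=2).values.pow(2).mean()  # each predicted point to its nearest GT point
    gt_to_pred = dists.min(dim=1).values.pow(2).mean()  # each GT point to its nearest predicted point
    return pred_to_gt + gt_to_pred
```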

1.3. Fitting a mesh (5 points)

[Figures: ground-truth and predicted meshes]
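Mesh fitting is typically driven by a chamfer loss between points sampled from the predicted and target surfaces, plus a smoothness regularizer. A minimal PyTorch3D sketch; the weight w_smooth is an illustrative value, not necessarily the one used here:

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fitting_loss(pred_mesh, gt_mesh, n_samples=5000, w_smooth=0.1):
    # compare point samples drawn from both surfaces, plus a Laplacian smoothness term
    pred_pts = sample_points_from_meshes(pred_mesh, n_samples)
    gt_pts = sample_points_from_meshes(gt_mesh, n_samples)
    loss_chamfer, _ = chamfer_distance(pred_pts, gt_pts)
    loss_smooth = mesh_laplacian_smoothing(pred_mesh, method="uniform")
    return loss_chamfer + w_smooth * loss_smooth
```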

2. Reconstructing 3D from single view

2.1. Image to voxel grid (20 points)

[Figures: input RGB image, ground-truth voxel grid, and predicted voxel grid for three samples]

2.2. Image to point cloud (20 points)

[Figures: input RGB image, ground-truth point cloud, and predicted point cloud for three samples]

2.3. Image to mesh (20 points)

[Figures: input RGB image, ground-truth mesh, and predicted mesh for three samples]

2.4. Quantitative comparisons (10 points)

Representation Avg F1@0.05
Voxel 58.532
Point Cloud 85.484
Mesh 69.394

In terms of Avg F1@0.05: Point Cloud > Mesh > Voxel

  • Since the voxel grid is quantized at 32x32x32, it has very low resolution compared to point clouds and meshes.
  • Meshes carry fixed connectivity, which makes it harder to model smooth regions and holes unless the number of vertices is increased.
  • Point clouds, trained with the chamfer loss, reach a low loss within a few epochs: the points take on the coarse structure quickly and are refined further in subsequent epochs. The chamfer loss also tends to produce clustered regions of points.
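All scores above are Avg F1@0.05, i.e. the F1 score at a point-to-point distance threshold of 0.05, averaged over the test set. A minimal sketch of this metric for a single sample (the evaluation code in the starter kit may differ in its details):

```python
import torch

def f1_at_threshold(pred, gt, threshold=0.05):
    # pred: (N, 3) predicted points, gt: (M, 3) ground-truth points
    dists = torch.cdist(pred, gt)                                           # (N, M) pairwise distances
    precision = (dists.min(dim=1).values < threshold).float().mean() * 100  # % of pred points near some GT point
    recall = (dists.min(dim=0).values < threshold).float().mean() * 100     # % of GT points near some pred point
    return 2 * precision * recall / (precision + recall + 1e-8)
```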

2.5. Analyse effects of hyperparameter variations (10 points)

For Point Clouds:

hyperparam Avg F1@0.05 Visualization
n_points = 1000 76.554 [image]
n_points = 5000 85.484 [image]

Increasing the number of points lets the prediction model the shape more intricately, which is clearly reflected in the Avg F1@0.05 scores as well. I also experimented with a repulsion loss to reduce clustering in the output point cloud (a sketch of such a term is given below); it helped to a certain extent.
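A minimal sketch of the kind of repulsion term I mean: it penalizes points that crowd their nearest neighbours within a small radius. The k and h values here are illustrative, not necessarily the ones used in the experiment:

```python
import torch

def repulsion_loss(points, k=4, h=0.03):
    # points: (B, N, 3) predicted point cloud
    dists = torch.cdist(points, points)               # (B, N, N) pairwise distances
    knn_dists, _ = dists.topk(k + 1, largest=False)   # k nearest neighbours (plus the self match at 0)
    knn_dists = knn_dists[..., 1:]                     # drop the self match
    # push apart neighbours closer than h; no penalty beyond h
    return torch.clamp(h - knn_dists, min=0.0).mean()
```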

2.6. Interpret your model (15 points)

For the single-image-to-point-cloud model, I visualized the best, average, and worst predictions based on F1@0.05. The ground-truth and predicted point clouds are rendered in the same space to better understand the fit of the model: GT is in red and Pred is in green (a sketch of one way to produce such an overlay follows the table below).

sample Avg F1@0.05 Visualization
Best 99.216 [image]
Average 85.484 [image]
Worst 6.087 [image]

This gives good insight into how well the model performs and, when it fails, what type of failure it is. The worst prediction is offset from the ground truth by a translation and also misses finer details.
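For reference, one way to render such a red/green overlay (a matplotlib sketch assuming the two point clouds are available as (N, 3) numpy arrays; my actual renders may have used a different renderer):

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection)

def overlay_point_clouds(gt_points, pred_points, out_path="overlay.png"):
    # gt_points, pred_points: (N, 3) arrays; GT in red, prediction in green
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(gt_points[:, 0], gt_points[:, 1], gt_points[:, 2], c="red", s=1, label="GT")
    ax.scatter(pred_points[:, 0], pred_points[:, 1], pred_points[:, 2], c="green", s=1, label="Pred")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```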

3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)

3.3 Extended dataset for training (10 points)

I trained the single-view-to-point-cloud model with n_points=5000 on the full dataset with 3 classes for 16k iterations.

Qualitative comparison

training type Avg F1@0.05 Visualization
1 class, n_points = 5000 94.077 [image]
3 class, n_points = 5000 71.716 [image]
3 class, n_points = 5000 (other samples) [image]
3 class, n_points = 5000 (other samples) [image]
3 class, n_points = 5000 (other samples) [image]

From the above samples, it is evident that the model has learnt to capture the diversity of the extended dataset.

Quantitative comparison on the common 1-class test set

training type Avg F1@0.05
1 class, n_points = 5000 85.484
3 class, n_points = 5000 85.623

We can see that, quantitatively, the scores are almost identical, since I trained for a similar (slightly higher) number of iterations.

3.1 Implicit network (10 points)

I created a simple implicit network that takes the image latent and concatenates it with 3D point coordinates (a sketch is given below). I ran 2 experiments with 20k iterations each, randomly sampling (1) 5k and (2) 32^3 points within a [-1, 1] voxel grid. The results weren't good, suggesting the need for a better architecture.
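A minimal sketch of the implicit decoder described above, assuming it predicts a per-point occupancy from the concatenated image latent and query coordinate (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Concatenate an image latent with 3D query coordinates and predict per-point occupancy."""
    def __init__(self, latent_dim=512, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, latent, points):
        # latent: (B, latent_dim) image feature; points: (B, N, 3) queries in [-1, 1]
        latent = latent.unsqueeze(1).expand(-1, points.shape[1], -1)   # (B, N, latent_dim)
        x = torch.cat([latent, points], dim=-1)                        # (B, N, latent_dim + 3)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)                  # (B, N) occupancies
```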