To reproduce the results, check the Makefile. Example commands: make help, make 2.1, etc.

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

[Figures: ground-truth and predicted voxel grids]
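For reference, fitting a predicted voxel grid to a target occupancy grid is commonly done with a binary cross-entropy loss over per-voxel occupancies. A minimal sketch, assuming the prediction is given as raw logits:

```python
import torch
import torch.nn.functional as F

def voxel_loss(pred_logits, gt_occupancy):
    # pred_logits: (B, D, H, W) raw scores, gt_occupancy: (B, D, H, W) binary targets
    return F.binary_cross_entropy_with_logits(pred_logits, gt_occupancy.float())
```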

1.2. Fitting a point cloud (5 points)

[Figures: ground-truth and predicted point clouds]
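Fitting the point cloud is guided by the chamfer loss (also referenced in Section 2.4 below). A minimal dense sketch of the symmetric chamfer distance; pytorch3d.loss.chamfer_distance provides an equivalent, more efficient implementation:

```python
import torch

def chamfer_loss(pred, gt):
    # pred: (B, N, 3) predicted points, gt: (B, M, 3) ground-truth points
    dists = torch.cdist(pred, gt)                       # (B, N, M) pairwise distances
    pred_to_gt = dists.min(dim=2).values.pow(2).mean()  # each predicted point to its nearest GT point
    gt_to_pred = dists.min(dim=1).values.pow(2).mean()  # each GT point to its nearest predicted point
    return pred_to_gt + gt_to_pred
```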

1.3. Fitting a mesh (5 points)

[Figures: ground-truth and predicted meshes]
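Mesh fitting is typically driven by a chamfer loss between points sampled from the predicted and target surfaces, plus a smoothness regularizer. A minimal PyTorch3D sketch; the weight w_smooth is an illustrative value, not necessarily the one used here:

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fitting_loss(pred_mesh, gt_mesh, n_samples=5000, w_smooth=0.1):
    # compare point samples drawn from both surfaces, plus a Laplacian smoothness term
    pred_pts = sample_points_from_meshes(pred_mesh, n_samples)
    gt_pts = sample_points_from_meshes(gt_mesh, n_samples)
    loss_chamfer, _ = chamfer_distance(pred_pts, gt_pts)
    loss_smooth = mesh_laplacian_smoothing(pred_mesh, method="uniform")
    return loss_chamfer + w_smooth * loss_smooth
```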

2. Reconstructing 3D from single view

2.1. Image to voxel grid (20 points)

[Figures: input RGB image, ground-truth voxel grid, and predicted voxel grid for three samples]

2.2. Image to point cloud (20 points)

[Figures: input RGB image, ground-truth point cloud, and predicted point cloud for three samples]

2.3. Image to mesh (20 points)

[Figures: input RGB image, ground-truth mesh, and predicted mesh for three samples]

2.4. Quantitative comparisons (10 points)

Representation Avg F1@0.05
Voxel 58.532
Point Cloud 85.484
Mesh 69.394

In terms of Avg F1@0.05: Point Cloud > Mesh > Voxel

  • Since the voxel grid is quantized at 32x32x32, it has very low resolution compared to point clouds and meshes.
  • Meshes carry fixed connectivity, which makes it harder to model smooth regions and holes unless the number of vertices is increased.
  • Point clouds, trained with the chamfer loss, reach a low loss within a few epochs: the points take on the coarse structure quickly and are refined further in subsequent epochs. The chamfer loss also tends to produce clustered regions of points.
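All scores above are Avg F1@0.05, i.e. the F1 score at a point-to-point distance threshold of 0.05, averaged over the test set. A minimal sketch of this metric for a single sample (the evaluation code in the starter kit may differ in its details):

```python
import torch

def f1_at_threshold(pred, gt, threshold=0.05):
    # pred: (N, 3) predicted points, gt: (M, 3) ground-truth points
    dists = torch.cdist(pred, gt)                                           # (N, M) pairwise distances
    precision = (dists.min(dim=1).values < threshold).float().mean() * 100  # % of pred points near some GT point
    recall = (dists.min(dim=0).values < threshold).float().mean() * 100     # % of GT points near some pred point
    return 2 * precision * recall / (precision + recall + 1e-8)
```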

2.5. Analyse effects of hyperparameter variations (10 points)

For Point Clouds:

hyperparam Avg F1@0.05 Visualization
n_points = 1000 76.554 [image]
n_points = 5000 85.484 [image]

Increasing the number of points lets the prediction model the shape more intricately, which is clearly reflected in the Avg F1@0.05 scores as well. I also experimented with a repulsion loss to reduce clustering in the output point cloud (a sketch of such a term is given below); it helped to a certain extent.
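A minimal sketch of the kind of repulsion term I mean: it penalizes points that crowd their nearest neighbours within a small radius. The k and h values here are illustrative, not necessarily the ones used in the experiment:

```python
import torch

def repulsion_loss(points, k=4, h=0.03):
    # points: (B, N, 3) predicted point cloud
    dists = torch.cdist(points, points)               # (B, N, N) pairwise distances
    knn_dists, _ = dists.topk(k + 1, largest=False)   # k nearest neighbours (plus the self match at 0)
    knn_dists = knn_dists[..., 1:]                     # drop the self match
    # push apart neighbours closer than h; no penalty beyond h
    return torch.clamp(h - knn_dists, min=0.0).mean()
```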

2.6. Interpret your model (15 points)

For the single-image-to-point-cloud model, I visualized the best, average, and worst predictions based on F1@0.05. The ground-truth and predicted point clouds are rendered in the same space to better understand the fit of the model: GT is in red and Pred is in green (a sketch of one way to produce such an overlay follows the table below).

sample Avg F1@0.05 Visualization
Best 99.216 [image]
Average 85.484 [image]
Worst 6.087 [image]

This gives good insight into how well the model performs and, when it fails, what type of failure it is. The worst prediction is offset from the ground truth by a translation and also misses finer details.
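For reference, one way to render such a red/green overlay (a matplotlib sketch assuming the two point clouds are available as (N, 3) numpy arrays; my actual renders may have used a different renderer):

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection)

def overlay_point_clouds(gt_points, pred_points, out_path="overlay.png"):
    # gt_points, pred_points: (N, 3) arrays; GT in red, prediction in green
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(gt_points[:, 0], gt_points[:, 1], gt_points[:, 2], c="red", s=1, label="GT")
    ax.scatter(pred_points[:, 0], pred_points[:, 1], pred_points[:, 2], c="green", s=1, label="Pred")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```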

3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)

3.3 Extended dataset for training (10 points)

I trained the single-view-to-point-cloud model with n_points=5000 on the full dataset with 3 classes for 16k iterations.

Qualitative comparison

training type Avg F1@0.05 Visualization
1 class, n_points = 5000 94.077 [image]
3 class, n_points = 5000 71.716 [image]
3 class, n_points = 5000 (other samples) [image]
3 class, n_points = 5000 (other samples) [image]
3 class, n_points = 5000 (other samples) [image]

From the above samples, it is evident that the model has learnt to capture the diversity of the extended dataset.

Quantitative comparison on the common 1-class test set

training type Avg F1@0.05
1 class, n_points = 5000 85.484
3 class, n_points = 5000 85.623

We can see that, quantitatively, the scores are almost identical, since I trained for a similar (slightly higher) number of iterations.

3.1 Implicit network (10 points)

I created a simple implicit network that takes the image latent and concatenates it with 3D point coordinates (a sketch is given below). I ran 2 experiments with 20k iterations each, randomly sampling (1) 5k and (2) 32^3 points within a [-1, 1] voxel grid. The results weren't good, suggesting the need for a better architecture.
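A minimal sketch of the implicit decoder described above, assuming it predicts a per-point occupancy from the concatenated image latent and query coordinate (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Concatenate an image latent with 3D query coordinates and predict per-point occupancy."""
    def __init__(self, latent_dim=512, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, latent, points):
        # latent: (B, latent_dim) image feature; points: (B, N, 3) queries in [-1, 1]
        latent = latent.unsqueeze(1).expand(-1, points.shape[1], -1)   # (B, N, latent_dim)
        x = torch.cat([latent, points], dim=-1)                        # (B, N, latent_dim + 3)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)                  # (B, N) occupancies
```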