To reproduce the results, check the Makefile. Example run commands: `make help`, `make 2.1`, etc.
1. Exploring loss functions
1.1. Fitting a voxel grid (5 points)
GT:
Pred: 
1.2. Fitting a point cloud (5 points)
GT:
Pred: 
1.3. Fitting a mesh (5 points)
GT:
Pred: 
2. Reconstructing 3D from single view
2.1. Image to voxel grid (20 points)
RGB:
GT:
Pred: 
RGB:
GT:
Pred: 
RGB:
GT:
Pred: 
2.2. Image to point cloud (20 points)
RGB:
GT:
Pred: 
RGB:
GT:
Pred: 
RGB:
GT:
Pred: 
2.3. Image to mesh (20 points)
RGB:
GT:
Pred: 
RGB:
GT:
Pred: 
RGB:
GT:
Pred: 
2.4. Quantitative comparisons (10 points)
| Representation | Avg F1@0.05 |
|---|---|
| Voxel | 58.532 |
| Point Cloud | 85.484 |
| Mesh | 69.394 |
Ranking by Avg F1@0.05: Point Cloud > Mesh > Voxel
- Since the voxel grid is quantized at 32×32×32, it has a very low resolution compared to point clouds and meshes.
- Meshes have fixed connectivity, so it is harder to model smooth regions and holes unless we increase the number of vertices.
- Point clouds, guided by the Chamfer loss, reach a very low loss within a few epochs: the points take on the overall structure quickly and are refined further in subsequent epochs. The same loss also produces clustered regions, since nothing penalizes predicted points for bunching up (see the sketch below).
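To make the last point concrete, here is a minimal sketch of a symmetric Chamfer loss of this kind, written with PyTorch3D's `knn_points` (the exact reduction and weighting in my implementation may differ). Each predicted point is only pulled towards its nearest GT point and vice versa, so nothing discourages predicted points from piling up near each other.

```python
import torch
from pytorch3d.ops import knn_points

def chamfer_loss(pred_points: torch.Tensor, gt_points: torch.Tensor) -> torch.Tensor:
    # pred_points: (B, N, 3), gt_points: (B, M, 3)
    # Squared distance from each predicted point to its nearest GT point ...
    dists_p2g = knn_points(pred_points, gt_points, K=1).dists[..., 0]  # (B, N)
    # ... and from each GT point to its nearest predicted point.
    dists_g2p = knn_points(gt_points, pred_points, K=1).dists[..., 0]  # (B, M)
    return dists_p2g.mean() + dists_g2p.mean()
```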
2.5. Analyse effects of hyperparameter variations (10 points)
For Point Clouds:
| hyperparam | Avg F1@0.05 | Visualization |
|---|---|---|
| n_points = 1000 | 76.554 | ![]() |
| n_points = 5000 | 85.484 | ![]() |
Increasing the number of points lets the model capture the shape in finer detail, which is clearly reflected in the Avg F1@0.05 scores. I also experimented with a repulsion loss to reduce clustering in the output point cloud; it helped to a certain extent.
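The repulsion term was roughly of the flavour sketched below (loosely following PU-Net); the neighbourhood size `k`, radius `h`, and weighting here are illustrative rather than the exact values used. It would typically be added to the Chamfer loss with a small weight, so points still cover the surface but are discouraged from collapsing into clusters.

```python
import torch
from pytorch3d.ops import knn_points

def repulsion_loss(pred_points: torch.Tensor, k: int = 4, h: float = 0.03) -> torch.Tensor:
    # pred_points: (B, N, 3). Penalize predicted points that sit too close
    # to their k nearest neighbours within the same cloud.
    knn = knn_points(pred_points, pred_points, K=k + 1)
    sq_dists = knn.dists[..., 1:]                  # drop self-match at distance 0 -> (B, N, k)
    dists = torch.sqrt(sq_dists.clamp(min=1e-12))  # clamp to avoid NaN gradients at 0
    # Only neighbours closer than the radius h contribute to the penalty.
    return torch.clamp(h - dists, min=0.0).mean()
```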
2.6. Interpret your model (15 points)
For the single-image-to-point-cloud model, I visualized the best, average, and worst predictions according to F1@0.05.
I visualized the ground-truth and predicted point clouds in a single space to better understand the fit of the model: GT is in red and Pred is in green.
| sample | Avg F1@0.05 | Visualization |
|---|---|---|
| Best | 99.216 | ![]() |
| Average | 85.484 | ![]() |
| Worst | 6.087 | ![]() |
This provides good insight into how well the model performs and, when it fails, what kind of failure it is. The worst example is off from the ground truth by a translation as well as in finer details.
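For reference, an overlay like the ones above can be produced with something as simple as the sketch below (the actual figures may have been rendered differently; `gt_points` and `pred_points` are assumed to be NumPy arrays of shape (N, 3) in the same coordinate frame).

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection on older matplotlib)

def overlay_pointclouds(gt_points, pred_points, path="overlay.png"):
    # Scatter GT (red) and Pred (green) in one 3D plot to compare the fit.
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(*gt_points.T, c="red", s=1, label="GT")
    ax.scatter(*pred_points.T, c="green", s=1, label="Pred")
    ax.legend()
    plt.savefig(path, dpi=200)
    plt.close(fig)
```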
3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)
3.3 Extended dataset for training (10 points)
I trained the single-view-to-point-cloud model with n_points = 5000 on the full dataset with 3 classes for 16k iterations.
Qualitative comparison
| training type | Avg F1@0.05 | Visualization |
|---|---|---|
| 1 class, n_points = 5000 | 94.077 | ![]() |
| 3 class, n_points = 5000 | 71.716 | ![]() |
| 3 class, n_points = 5000 | other samples | ![]() |
| 3 class, n_points = 5000 | other samples | ![]() |
| 3 class, n_points = 5000 | other samples | ![]() |
From the above samples, it is evident that the model has learnt to handle the diversity of the 3-class dataset.
Quantitative comparison on the common 1-class test set
| training type | Avg F1@0.05 |
|---|---|
| 1 class, n_points = 5000 | 85.484 |
| 3 class, n_points = 5000 | 85.623 |
Quantitatively, the scores are almost identical, since I trained the 3-class model for a similar (slightly higher) number of iterations.
3.1 Implicit network (10 points)
I created a simple implicit network that takes the image latent and concatenates it with 3D point coordinates. I ran 2 experiments with 20k iterations each, randomly sampling (1) 5k points and (2) 32^3 points in a [-1, 1] voxel grid. The results weren't good, suggesting the need for a better architecture.
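The decoder is roughly the sketch below, assuming the encoder produces a single global latent per image (the latent size and hidden widths are illustrative). Occupancy at the sampled points would be supervised with `BCEWithLogitsLoss` against the ground-truth voxel grid.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Predicts occupancy logits for query points, conditioned on an image latent."""

    def __init__(self, latent_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, latent: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # latent: (B, latent_dim) image features; points: (B, P, 3) queries in [-1, 1]^3.
        latent = latent.unsqueeze(1).expand(-1, points.shape[1], -1)  # (B, P, latent_dim)
        x = torch.cat([latent, points], dim=-1)                       # (B, P, latent_dim + 3)
        return self.mlp(x).squeeze(-1)                                # (B, P) occupancy logits
```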