16-825 Assignment 2: Single View to 3D
Andrew ID: rajathc
1. Exploring loss functions
1.1. Fitting a voxel grid (5 points)
| Ground Truth | Fit Result |
| --- | --- |
| *(image)* | *(image)* |
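For reference, the voxel fit minimizes a binary cross-entropy between predicted occupancy probabilities and the binary ground-truth grid. Below is a minimal pure-Python sketch of that loss; it is illustrative only, as my actual implementation operates on PyTorch tensors.

```python
import math

def voxel_bce_loss(pred_probs, target_occ, eps=1e-7):
    """Binary cross-entropy between predicted occupancy probabilities
    (values in (0, 1)) and binary ground-truth occupancies, averaged
    over all voxels. Both inputs are flat lists of equal length."""
    total = 0.0
    for p, t in zip(pred_probs, target_occ):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred_probs)
```

In practice the decoder emits raw logits, so the tensor version applies a sigmoid first (or uses `torch.nn.BCEWithLogitsLoss` directly for better numerical stability).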
1.2. Fitting a point cloud (5 points)
| Ground Truth | Fit Result |
| --- | --- |
| *(image)* | *(image)* |
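The point cloud was fit by minimizing the Chamfer distance between the predicted and ground-truth point sets. A brute-force pure-Python sketch of the loss (the real implementation uses batched nearest-neighbour queries, e.g. `pytorch3d.ops.knn_points`, for efficiency):

```python
def chamfer_distance(pts_a, pts_b):
    """Symmetric Chamfer distance between two point sets, each a list of
    (x, y, z) tuples: mean squared distance from every point in A to its
    nearest neighbour in B, plus the same in the other direction.
    O(|A|*|B|) brute force, for illustration only."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(pts_a, pts_b) + one_sided(pts_b, pts_a)
```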
1.3. Fitting a mesh (5 points)
| Ground Truth | Fit Result |
| --- | --- |
| *(image)* | *(image)* |
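The mesh was fit by deforming the vertices of a template mesh under a Chamfer loss on sampled surface points, plus a smoothness regularizer so the surface stays well-behaved. A pure-Python sketch of a uniform Laplacian smoothing term (the `neighbors` adjacency structure here is an assumption for illustration; PyTorch3D provides this as `pytorch3d.loss.mesh_laplacian_smoothing`):

```python
def laplacian_smoothing_loss(verts, neighbors):
    """Uniform Laplacian smoothing: for each vertex, the distance between
    it and the centroid of its 1-ring neighbours, averaged over vertices.
    `verts` is a list of (x, y, z) tuples; `neighbors[i]` lists the
    vertex indices adjacent to vertex i."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        centroid = [sum(verts[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        total += sum((verts[i][k] - centroid[k]) ** 2 for k in range(3)) ** 0.5
    return total / len(verts)
```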
2. Reconstructing 3D from single view
2.1. Image to voxel grid
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
2.2. Image to point cloud (20 points)
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
2.3. Image to mesh (20 points)
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
2.4. Quantitative comparisons (10 points)
| Representation | Avg F1@0.05 |
| --- | --- |
| Voxel | 73.681 |
| Point Cloud | 78.512 |
| Mesh | 73.060 |

*(F1-score vs. distance-threshold curves for the three representations)*
The F1-score curves show that the point cloud model outperforms both the voxel and mesh models across all thresholds, achieving the highest average F1@0.05 of 78.512. This suggests that point-cloud reconstructions capture fine detail more easily, while the voxel and mesh methods lose accuracy to grid discretization and surface approximation, respectively.
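The F1@0.05 numbers above come from comparing sampled predicted and ground-truth points at a distance threshold. A brute-force pure-Python sketch of the metric (the course's evaluation code is the authoritative version; this just makes the definition concrete):

```python
def f1_at_threshold(pred_pts, gt_pts, thresh=0.05):
    """F1 score between predicted and ground-truth point sets, in percent.
    Precision: fraction of predicted points within `thresh` of some GT
    point; recall: fraction of GT points within `thresh` of some
    prediction. O(n*m) brute force over (x, y, z) tuples."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def fraction_covered(src, dst):
        hits = sum(1 for p in src if min(dist(p, q) for q in dst) <= thresh)
        return hits / len(src)

    precision = fraction_covered(pred_pts, gt_pts)
    recall = fraction_covered(gt_pts, pred_pts)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```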
2.5. Analyse effects of hyperparams variations (10 points)
Hyperparameter tuned: `n_points`, evaluated at 300, 1000, and 3000 points.
Quantitative Results
| n_points | F1 Score @ 0.05 |
| --- | --- |
| 300 | 65.363 |
| 1000 | 78.542 |
| 3000 | 85.655 |
Qualitative Results
| Input Image | Ground Truth (3000 points) | Prediction (300 points) | Prediction (1000 points) | Prediction (3000 points) |
| --- | --- | --- | --- | --- |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
Increasing `n_points` significantly improves performance. As the number of points increases from 300 to 3000, the F1 Score @ 0.05 rises from 65.36 to 85.65, indicating better precision and recall in the predictions. Qualitatively, predictions with higher `n_points` match the ground truth more closely, showing finer boundaries and fewer artifacts.
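One reason `n_points` is a cheap knob to turn: with an MLP-style decoder (assumed here for illustration, matching my implementation), it only changes the size of the final fully-connected layer. A quick sketch of how the output-layer parameter count scales (the `hidden_dim=512` in the test is a hypothetical value):

```python
def decoder_output_params(hidden_dim, n_points):
    """Parameter count of a final fully-connected layer mapping a
    hidden feature vector to an (n_points x 3) point cloud:
    hidden_dim * out_dim weights plus out_dim biases. Shows the
    output layer growing linearly in n_points."""
    out_dim = n_points * 3
    return hidden_dim * out_dim + out_dim
```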
2.6. Interpret your model (15 points)
To test the model's robustness to input corruption, I added Gaussian noise of varying standard deviation to the input images and measured how it affects the results.
Quantitative Results
| Noise Standard Deviation | F1 Score @ 0.05 |
| --- | --- |
| No noise | 78.542 |
| 0.05 | 71.396 |
| 0.1 | 67.886 |
| 0.2 | 64.101 |
Qualitative Results
| 0.05 noise_std Image | 0.05 noise_std Prediction | 0.1 noise_std Image | 0.1 noise_std Prediction | 0.2 noise_std Image | 0.2 noise_std Prediction | GT Point Cloud |
| --- | --- | --- | --- | --- | --- | --- |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
The model’s performance degrades as Gaussian noise increases, with the F1 score dropping from 78.542 (no noise) to 64.101 at 0.2 noise standard deviation, indicating reduced robustness to noise. Qualitatively, the object boundaries—particularly visible in features like the legs of the chair—become noticeably blurrier and less distinct as noise increases.
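The perturbation itself is simple. A pure-Python sketch of the noise applied to the input images (my actual code adds the noise to image tensors; the function name and flat-list representation here are illustrative):

```python
import random

def add_gaussian_noise(image, noise_std, seed=None):
    """Add zero-mean Gaussian noise with the given standard deviation to
    an image represented as a flat list of floats in [0, 1], clamping
    the result back to [0, 1]."""
    rng = random.Random(seed)
    return [min(max(px + rng.gauss(0.0, noise_std), 0.0), 1.0) for px in image]
```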
3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)
3.3 Extended dataset for training (10 points)
Quantitative Results
| Training Setup | Avg F1@0.05 |
| --- | --- |
| 3 classes (Airplane, Car, Chair) | 87.304 |
| Single class (Chair) | 78.512 |

*(F1-score curves for the two training setups)*
Qualitative Results
Qualitative results of the Image to point cloud model trained on multiple classes:
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
Training on multiple classes improves performance, as shown by the higher average F1 score (87.30 vs. 78.51). The three-class model also produces structurally consistent and diverse 3D outputs across categories, suggesting better generalization and shape understanding. We can conclude that the model has sufficient capacity to learn from multiple object classes at the same time.