16-825 Assignment 2: Single View to 3D

Andrew ID: rajathc

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

Ground Truth | Fit Result

1.2. Fitting a point cloud (5 points)

Ground Truth | Fit Result

1.3. Fitting a mesh (5 points)

Ground Truth | Fit Result

2. Reconstructing 3D from single view

2.1. Image to voxel grid

Input Image | Ground Truth | Prediction

2.2. Image to point cloud (20 points)

Input Image | Ground Truth | Prediction

2.3. Image to mesh (20 points)

Input Image | Ground Truth | Prediction

2.4. Quantitative comparisons (10 points)

Voxel - Avg F1@0.05 = 73.681
Point Cloud - Avg F1@0.05 = 78.512
Mesh - Avg F1@0.05 = 73.060

The F1-score curves show that the point cloud model outperforms both the voxel and mesh models across all thresholds, with the highest average F1@0.05 of 78.512 (versus 73.681 for voxels and 73.060 for meshes). This suggests that the point cloud representation captures fine details better, while the voxel and mesh models likely lose accuracy to grid discretization and surface approximation, respectively.
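For reference, the F1 score at a distance threshold can be sketched as below. This is a minimal NumPy illustration and not necessarily the evaluation code used for the numbers above. It assumes both shapes are represented as sampled point sets, that precision and recall are the fractions of points with a nearest neighbor closer than the threshold, and that the score is reported in percent. The function name `f1_at_threshold` is hypothetical.

```python
import numpy as np

def f1_at_threshold(pred, gt, threshold=0.05):
    """F1 score (in percent) between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise Euclidean distances between every predicted and GT point: (N, M).
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: fraction of predicted points that lie near some GT point.
    precision = np.mean(dists.min(axis=1) < threshold)
    # Recall: fraction of GT points that lie near some predicted point.
    recall = np.mean(dists.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall, scaled to percent.
    return 100.0 * 2 * precision * recall / (precision + recall)

# Usage: one of two predicted points is within 0.05 of the single GT point.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.01]])
print(f1_at_threshold(pred, gt))
```

Because the score is a harmonic mean, a prediction needs both few stray points (precision) and good coverage of the ground truth (recall) to score well.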

2.5. Analyse effects of hyperparameter variations (10 points)

Hyperparameter tuned: n_points, with values of 300, 1000, and 3000 points.

Quantitative Results

n_points | F1 Score @ 0.05
300 | 65.363
1000 | 78.542
3000 | 85.655

Qualitative Results

Input Image | Ground Truth - 3000 Points | Prediction - 300 Points | Prediction - 1000 Points | Prediction - 3000 Points

Increasing the n_points hyperparameter significantly improves performance. As the number of points increases from 300 to 3000, the F1 Score @ 0.05 rises from 65.363 to 85.655, indicating better precision and recall in the predictions. Qualitatively, predictions with higher n_points match the ground truth more closely, with finer boundaries and fewer artifacts.

2.6. Interpret your model (15 points)

I wanted to see how robust my model is to noise, so I added different amounts of Gaussian noise to the input images to see how it affects the results.
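The perturbation described above can be sketched as follows. This is an illustrative NumPy version, not the exact code used in the experiment. It assumes that images are float arrays scaled to [0, 1], that the noise is zero-mean with standard deviation noise_std, and that the result is clipped back to the valid range. The function name `add_gaussian_noise` is hypothetical.

```python
import numpy as np

def add_gaussian_noise(image, noise_std, rng=None):
    """Return a noisy copy of a float image whose values lie in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    # Zero-mean Gaussian noise, drawn independently for every pixel and channel.
    noise = rng.normal(loc=0.0, scale=noise_std, size=image.shape)
    # Clipping to [0, 1] is an assumption; it keeps pixel values valid.
    return np.clip(image + noise, 0.0, 1.0)

# Usage: perturb a dummy gray image at the noise levels tested in this section.
image = np.full((137, 137, 3), 0.5)  # placeholder image, size chosen arbitrarily
for std in (0.05, 0.1, 0.2):
    noisy = add_gaussian_noise(image, std, rng=np.random.default_rng(0))
    print(std, noisy.shape)
```

Passing a seeded generator keeps the noise pattern reproducible across evaluation runs, which makes the F1 scores at different noise levels easier to compare.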

Quantitative Results

Noise Standard Deviation | F1 Score @ 0.05
No noise | 78.542
0.05 | 71.396
0.1 | 67.886
0.2 | 64.101

Qualitative Results

0.05 noise_std Image | 0.05 noise_std Prediction | 0.1 noise_std Image | 0.1 noise_std Prediction | 0.2 noise_std Image | 0.2 noise_std Prediction | GT Point Cloud

The model's performance degrades as Gaussian noise increases: the F1 score drops from 78.542 (no noise) to 64.101 at a noise standard deviation of 0.2, showing that the model is sensitive to input noise. Qualitatively, object boundaries (particularly visible in features like the legs of the chair) become noticeably blurrier and less distinct as noise increases.

3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)

3.3 Extended dataset for training (10 points)

Quantitative Results

3 Classes (Airplane, Car, Chair) - Avg F1@0.05 = 87.304
Single Class (Chair) - Avg F1@0.05 = 78.512

Qualitative Results

Qualitative results of the image-to-point-cloud model trained on multiple classes:

Input Image | Ground Truth | Prediction

Training on multiple classes improves performance, as shown by the higher average F1 score (87.304 vs. 78.512). Additionally, the model trained on three classes produces more structurally consistent and diverse 3D outputs, suggesting better generalization and shape understanding. This indicates that the model has sufficient capacity to learn from multiple classes at the same time.