16-825 Assignment 2: Single View to 3D
Andrew ID: rajathc
1. Exploring loss functions
1.1. Fitting a voxel grid (5 points)
| Ground Truth | Fit Result |
| --- | --- |
| *(image)* | *(image)* |
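For reference, the voxel fit minimizes a binary cross-entropy between predicted occupancy probabilities and the binary ground-truth grid. Below is a minimal pure-Python sketch of that loss; it is illustrative only, as my actual implementation operates on PyTorch tensors.

```python
import math

def voxel_bce_loss(pred_probs, target_occ, eps=1e-7):
    """Binary cross-entropy between predicted occupancy probabilities
    (values in (0, 1)) and binary ground-truth occupancies, averaged
    over all voxels. Both inputs are flat lists of equal length."""
    total = 0.0
    for p, t in zip(pred_probs, target_occ):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred_probs)
```

In practice the decoder emits raw logits, so the tensor version applies a sigmoid first (or uses `torch.nn.BCEWithLogitsLoss` directly for better numerical stability).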
1.2. Fitting a point cloud (5 points)
| Ground Truth | Fit Result |
| --- | --- |
| *(image)* | *(image)* |
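The point cloud was fit by minimizing the Chamfer distance between the predicted and ground-truth point sets. A brute-force pure-Python sketch of the loss (the real implementation uses batched nearest-neighbour queries, e.g. `pytorch3d.ops.knn_points`, for efficiency):

```python
def chamfer_distance(pts_a, pts_b):
    """Symmetric Chamfer distance between two point sets, each a list of
    (x, y, z) tuples: mean squared distance from every point in A to its
    nearest neighbour in B, plus the same in the other direction.
    O(|A|*|B|) brute force, for illustration only."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_sided(src, dst):
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(pts_a, pts_b) + one_sided(pts_b, pts_a)
```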
1.3. Fitting a mesh (5 points)
| Ground Truth | Fit Result |
| --- | --- |
| *(image)* | *(image)* |
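The mesh was fit by deforming the vertices of a template mesh under a Chamfer loss on sampled surface points, plus a smoothness regularizer so the surface stays well-behaved. A pure-Python sketch of a uniform Laplacian smoothing term (the `neighbors` adjacency structure here is an assumption for illustration; PyTorch3D provides this as `pytorch3d.loss.mesh_laplacian_smoothing`):

```python
def laplacian_smoothing_loss(verts, neighbors):
    """Uniform Laplacian smoothing: for each vertex, the distance between
    it and the centroid of its 1-ring neighbours, averaged over vertices.
    `verts` is a list of (x, y, z) tuples; `neighbors[i]` lists the
    vertex indices adjacent to vertex i."""
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        centroid = [sum(verts[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        total += sum((verts[i][k] - centroid[k]) ** 2 for k in range(3)) ** 0.5
    return total / len(verts)
```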
2. Reconstructing 3D from single view
2.1. Image to voxel grid
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
2.2. Image to point cloud (20 points)
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
2.3. Image to mesh (20 points)
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
2.4. Quantitative comparisons (10 points)
| Representation | Avg F1@0.05 |
| --- | --- |
| Voxel | 73.681 |
| Point Cloud | 78.512 |
| Mesh | 73.060 |

*(F1-score vs. distance-threshold curves for the three representations)*
The F1-score curves show that the point cloud model outperforms both the voxel and mesh models across all thresholds, achieving the highest average F1@0.05 of 78.512. This suggests that point-cloud reconstructions capture fine detail more easily, while the voxel and mesh methods lose accuracy to grid discretization and surface approximation, respectively.
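The F1@0.05 numbers above come from comparing sampled predicted and ground-truth points at a distance threshold. A brute-force pure-Python sketch of the metric (the course's evaluation code is the authoritative version; this just makes the definition concrete):

```python
def f1_at_threshold(pred_pts, gt_pts, thresh=0.05):
    """F1 score between predicted and ground-truth point sets, in percent.
    Precision: fraction of predicted points within `thresh` of some GT
    point; recall: fraction of GT points within `thresh` of some
    prediction. O(n*m) brute force over (x, y, z) tuples."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def fraction_covered(src, dst):
        hits = sum(1 for p in src if min(dist(p, q) for q in dst) <= thresh)
        return hits / len(src)

    precision = fraction_covered(pred_pts, gt_pts)
    recall = fraction_covered(gt_pts, pred_pts)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```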
2.5. Analyse effects of hyperparams variations (10 points)
Hyperparameter tuned: `n_points`, evaluated at 300, 1000, and 3000 points.
Quantitative Results
| n_points | F1 Score @ 0.05 |
| --- | --- |
| 300 | 65.363 |
| 1000 | 78.542 |
| 3000 | 85.655 |
Qualitative Results
| Input Image | Ground Truth (3000 points) | Prediction (300 points) | Prediction (1000 points) | Prediction (3000 points) |
| --- | --- | --- | --- | --- |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
Increasing `n_points` significantly improves performance. As the number of points increases from 300 to 3000, the F1 Score @ 0.05 rises from 65.36 to 85.65, indicating better precision and recall in the predictions. Qualitatively, predictions with higher `n_points` match the ground truth more closely, showing finer boundaries and fewer artifacts.
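One reason `n_points` is a cheap knob to turn: with an MLP-style decoder (assumed here for illustration, matching my implementation), it only changes the size of the final fully-connected layer. A quick sketch of how the output-layer parameter count scales (the `hidden_dim=512` in the test is a hypothetical value):

```python
def decoder_output_params(hidden_dim, n_points):
    """Parameter count of a final fully-connected layer mapping a
    hidden feature vector to an (n_points x 3) point cloud:
    hidden_dim * out_dim weights plus out_dim biases. Shows the
    output layer growing linearly in n_points."""
    out_dim = n_points * 3
    return hidden_dim * out_dim + out_dim
```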
2.6. Interpret your model (15 points)
To test the model's robustness to input corruption, I added Gaussian noise of varying standard deviation to the input images and measured how it affects the results.
Quantitative Results
| Noise Standard Deviation | F1 Score @ 0.05 |
| --- | --- |
| No noise | 78.542 |
| 0.05 | 71.396 |
| 0.1 | 67.886 |
| 0.2 | 64.101 |
Qualitative Results
| 0.05 noise_std Image | 0.05 noise_std Prediction | 0.1 noise_std Image | 0.1 noise_std Prediction | 0.2 noise_std Image | 0.2 noise_std Prediction | GT Point Cloud |
| --- | --- | --- | --- | --- | --- | --- |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* | *(image)* |
The model’s performance degrades as Gaussian noise increases, with the F1 score dropping from 78.542 (no noise) to 64.101 at 0.2 noise standard deviation, indicating reduced robustness to noise. Qualitatively, the object boundaries—particularly visible in features like the legs of the chair—become noticeably blurrier and less distinct as noise increases.
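The perturbation itself is simple. A pure-Python sketch of the noise applied to the input images (my actual code adds the noise to image tensors; the function name and flat-list representation here are illustrative):

```python
import random

def add_gaussian_noise(image, noise_std, seed=None):
    """Add zero-mean Gaussian noise with the given standard deviation to
    an image represented as a flat list of floats in [0, 1], clamping
    the result back to [0, 1]."""
    rng = random.Random(seed)
    return [min(max(px + rng.gauss(0.0, noise_std), 0.0), 1.0) for px in image]
```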
3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)
3.3 Extended dataset for training (10 points)
Quantitative Results
| Training Setup | Avg F1@0.05 |
| --- | --- |
| 3 classes (Airplane, Car, Chair) | 87.304 |
| Single class (Chair) | 78.512 |

*(F1-score curves for the two training setups)*
Qualitative Results
Qualitative results of the Image to point cloud model trained on multiple classes:
| Input Image | Ground Truth | Prediction |
| --- | --- | --- |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
| *(image)* | *(image)* | *(image)* |
Training on multiple classes improves performance, as shown by the higher average F1 score (87.30 vs. 78.51). The three-class model also produces structurally consistent and diverse 3D outputs across categories, suggesting better generalization and shape understanding. We can conclude that the model has sufficient capacity to learn from multiple object classes at the same time.