16-825 Assignment 2: Single View to 3D

Andrew ID: kpullala

1. Exploring Loss Functions

1.1 Fitting a Voxel Grid (5 points)

Initial voxel grid
Optimized voxel grid
Ground-truth (target) voxel grid
Loss curve
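
For reference, a minimal sketch of the fitting setup, assuming the voxel grid is optimized as raw logits under a binary cross-entropy objective (names are illustrative, not the exact assignment scaffold):

```python
import torch
import torch.nn.functional as F

def fit_voxel_grid(target_voxels, n_iters=1000, lr=1e-2):
    # target_voxels: (D, H, W) float tensor of {0, 1} occupancies.
    # Optimize raw logits so that sigmoid(logits) matches the target.
    logits = torch.randn_like(target_voxels, requires_grad=True)
    optimizer = torch.optim.Adam([logits], lr=lr)
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss = F.binary_cross_entropy_with_logits(logits, target_voxels)
        loss.backward()
        optimizer.step()
    return torch.sigmoid(logits)  # predicted occupancy probabilities
```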

1.2 Fitting a Point Cloud (5 points)

Source (initial) point cloud
Optimized point cloud
Ground-truth (target) point cloud
Loss curve
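
The point-cloud fit minimizes the symmetric Chamfer distance between source and target points; a self-contained sketch (my actual code follows the assignment's knn-based implementation, but the objective is equivalent):

```python
import torch

def chamfer_loss(src, tgt):
    # src: (N, 3), tgt: (M, 3) point clouds.
    d2 = torch.cdist(src, tgt) ** 2  # (N, M) squared pairwise distances
    # Nearest-neighbor distances in both directions, averaged.
    return d2.min(dim=1).values.mean() + d2.min(dim=0).values.mean()

def fit_point_cloud(target, n_points=1000, n_iters=1000, lr=1e-2):
    src = torch.randn(n_points, 3, requires_grad=True)  # random initial cloud
    optimizer = torch.optim.Adam([src], lr=lr)
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss = chamfer_loss(src, target)
        loss.backward()
        optimizer.step()
    return src.detach()
```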

1.3 Fitting a Mesh (5 points)

Source (initial) mesh
Optimized mesh
Ground-truth (target) mesh
Loss curve
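
The mesh fit deforms the vertices of a source mesh under a Chamfer term plus a smoothness regularizer; a sketch using PyTorch3D, assuming an ico-sphere source and Laplacian smoothing (the exact source mesh and weights in my run may differ):

```python
import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing

def fit_mesh(target_mesh, n_iters=1000, lr=1e-2, w_smooth=0.1):
    src_mesh = ico_sphere(level=4)
    # Optimize a per-vertex offset rather than the vertices themselves.
    deform = torch.zeros(src_mesh.verts_packed().shape, requires_grad=True)
    optimizer = torch.optim.Adam([deform], lr=lr)
    for _ in range(n_iters):
        optimizer.zero_grad()
        new_mesh = src_mesh.offset_verts(deform)
        src_pts = sample_points_from_meshes(new_mesh, 5000)
        tgt_pts = sample_points_from_meshes(target_mesh, 5000)
        loss_cd, _ = chamfer_distance(src_pts, tgt_pts)
        loss = loss_cd + w_smooth * mesh_laplacian_smoothing(new_mesh)
        loss.backward()
        optimizer.step()
    return src_mesh.offset_verts(deform.detach())
```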

2. Reconstructing 3D from Single View

2.1 Image to Voxel Grid (20 points)

Example 1: input RGB image, predicted voxel grid, ground-truth mesh
Example 2: input RGB image, predicted voxel grid, ground-truth mesh
Example 3: input RGB image, predicted voxel grid, ground-truth mesh

2.2 Image to Point Cloud (20 points)

Example 1: input RGB image, predicted point cloud, ground-truth mesh
Example 2: input RGB image, predicted point cloud, ground-truth mesh
Example 3: input RGB image, predicted point cloud, ground-truth mesh

2.3 Image to Mesh (20 points)

Example 1: input RGB image, predicted mesh, ground-truth mesh
Example 2: input RGB image, predicted mesh, ground-truth mesh
Example 3: input RGB image, predicted mesh, ground-truth mesh

2.4 Quantitative Comparisons (10 points)

F1 score comparison across different 3D representations.

F1-score curve: voxel grid
F1-score curve: point cloud
F1-score curve: mesh

Analysis and Intuition

The F1-score comparison shows that the model achieved nearly identical performance (approaching an F1 score of 80) regardless of whether the output was represented as a voxel grid, point cloud, or mesh. In all three plots the F1 score climbs smoothly toward a similar plateau as the distance threshold is relaxed, suggesting the model has learned the overall geometry well and that the shape complexity does not critically stress the expressive power of any one format.
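
For context, the F1 score at a threshold t counts a predicted point as correct if it lies within t of some ground-truth point (precision), and vice versa for recall. A minimal sketch of the metric as commonly defined (my numbers come from the course-provided evaluation, reported on a 0-100 scale):

```python
import torch

def f1_score(pred_pts, gt_pts, threshold=0.05):
    # pred_pts: (N, 3), gt_pts: (M, 3) sampled surface points.
    d = torch.cdist(pred_pts, gt_pts)  # (N, M) pairwise distances
    precision = (d.min(dim=1).values < threshold).float().mean()
    recall = (d.min(dim=0).values < threshold).float().mean()
    return 100 * 2 * precision * recall / (precision + recall + 1e-8)
```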

Efficiency and Scalability
While the final metric is similar, the point cloud representation offers distinct practical advantages. It is the most computationally efficient and scalable: its memory and time costs grow linearly with the number of points, O(N). This contrasts sharply with the voxel representation, whose dense grid grows cubically with resolution, O(N³), making high-resolution reconstructions infeasible due to extreme memory demands.
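
A quick back-of-the-envelope comparison (assuming float32 storage) illustrates the cubic blow-up:

```python
# Memory of a dense float32 occupancy grid at increasing resolutions.
for res in (32, 64, 128, 256):
    print(f"{res}^3 voxels: {res ** 3 * 4 / 2 ** 20:6.1f} MB")
# -> 0.1 MB, 1.0 MB, 8.0 MB, 64.0 MB: 8x memory per doubling of resolution.
# By contrast, a 10,000-point cloud needs 10_000 * 3 * 4 bytes, about 0.11 MB.
```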

Point clouds also excel at capturing detail due to their adaptive, non-grid-bound nature, and they train effectively with differentiable losses such as the Chamfer distance. The mesh representation, while offering explicit topology, requires a more complex training pipeline and additional loss terms.
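
For reference, the symmetric Chamfer distance between a predicted point set \( \hat{S} \) and the ground truth \( S \) is

$$
d_{\mathrm{CD}}(\hat{S}, S) = \frac{1}{|\hat{S}|} \sum_{x \in \hat{S}} \min_{y \in S} \lVert x - y \rVert_2^2 + \frac{1}{|S|} \sum_{y \in S} \min_{x \in \hat{S}} \lVert x - y \rVert_2^2
$$

Each point is matched to its nearest neighbor in the other set, so the loss is differentiable with respect to the point positions almost everywhere.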

2.5 Hyperparameter Analysis (10 points)

Effects of varying hyperparameters on model performance.

Hyperparameter Studied: Pointcloud Density

Methodology: I varied the point-cloud density by adjusting the number of points predicted by the model, starting at 1,000 points and later increasing to 2,000.

Input RGB image (pointcloud density: 1000)
Input RGB image (pointcloud density: 2000)
F1-score comparison chart (density: 1000)
F1-score comparison chart (density: 2000)

Conclusions

With a density of 2,000 points, the F1 score increased by 1.5% over the 1,000-point configuration. My analysis suggests that higher point-cloud density allows better representation of complex geometries, leading to improved model performance. However, I could not train the denser configuration for the same number of steps due to the increased computational requirements.

Hyperparameter Studied: Pointcloud Density & Chamfer Distance weight

Methodology: I varied the point-cloud density by adjusting the number of points as above, and also experimented with the loss weights for the Chamfer distance term: I set the Chamfer weight to 0.8 and increased the smoothness factor to 0.4.
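
Concretely, the combined objective is a weighted sum of the two terms; a sketch with the weights tested here (variable names are illustrative, not my exact code):

```python
# Weighted objective: Chamfer fit term plus smoothness regularizer.
w_chamfer, w_smooth = 0.8, 0.4

def total_loss(loss_chamfer, loss_smooth):
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth
```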

F1-score comparison chart (density: 2000)
F1-score comparison chart (density: 2000, Chamfer weight 0.8, smoothness weight 0.4)

Conclusions

As seen previously, increasing the number of points slightly improved the F1 score, indicating better model performance with higher point-cloud density. However, changing the Chamfer weight and smoothness factor produced little improvement, which suggests that small changes to the Chamfer weight do not significantly impact overall performance. I plan to investigate further by varying these parameters more drastically.

2.6 Model Interpretation (15 points)

Voxel model analysis

Monte Carlo Dropout visualization at 1,000 steps
Monte Carlo Dropout visualization at 60,000 steps

Summary of Monte Carlo Dropout Visualization

This visualization uses Monte Carlo Dropout to assess prediction uncertainty in a single-view 3D reconstruction task by comparing the model's state at 1,000 steps (under-trained) and 60,000 steps (well-trained).
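
A minimal sketch of how the mean/variance maps can be produced, assuming the voxel decoder contains dropout layers (function and variable names are illustrative):

```python
import torch
import torch.nn as nn

def enable_dropout(model):
    # Keep only the dropout layers stochastic at inference time;
    # batch norm and everything else stays in eval mode.
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.train()

def mc_dropout_voxels(model, image, n_samples=20):
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        samples = torch.stack(
            [torch.sigmoid(model(image)) for _ in range(n_samples)]
        )
    # Mean occupancy is the prediction; variance is per-voxel uncertainty.
    return samples.mean(dim=0), samples.var(dim=0)
```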


Key Observations by Training Stage:

1,000 Steps (Under-Trained, Left):

  • Mean Prediction: Sharp but unreliable structure with high contrast.
  • Uncertainty (Variance): Extremely high and uniform across all planes (bright yellow/orange everywhere). The model is essentially guessing randomly as it hasn't learned meaningful 3D patterns yet.

60,000 Steps (Well-Trained, Right):

  • Mean Prediction: Clear, well-defined L-shape structure with proper voxel occupancy.
  • Uncertainty (Variance): Low overall with strategic localization. High confidence in object interior (dark regions), with uncertainty concentrated only at boundaries and ambiguous regions (bright spots).

Primary Insight: From Uniform to Calibrated Uncertainty

The evolution from 1,000 to 60,000 steps demonstrates proper model calibration:

  • Early training: Uniform high uncertainty everywhere indicates the model hasn't learned what features matter for 3D reconstruction.
  • After training: Uncertainty becomes structured and localized, concentrated at:
    • Object boundaries where voxel occupancy transitions occur
    • Depth-ambiguous regions (particularly visible in XZ plane)

3. Exploring Other Architectures / Datasets

3.1 Implicit Network (10 points)

Decoder that predicts occupancy values from 3D coordinates and image features.

Implementation Details
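
A sketch of one common design for such a decoder, assuming the image encoding is concatenated with each query point's coordinates and passed through an MLP with a sigmoid occupancy head (layer sizes are illustrative, not my exact configuration):

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    # Maps (image feature, 3D query point) -> occupancy probability in [0, 1].
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats, points):
        # feats: (B, feat_dim) image encoding; points: (B, P, 3) query coords.
        feats = feats.unsqueeze(1).expand(-1, points.shape[1], -1)
        x = torch.cat([feats, points], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # (B, P) occupancies
```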

Input RGB image
Predicted occupancy
Ground truth

3.2 Parametric Network (10 points)

Implementation Details

Architecture:
1. Multi-patch MLP system with multiple independent decoders (e.g., 10 patches)
   • Each patch: 2D latent (u, v) → 4-layer MLP (512 units) → 3D coordinates (x, y, z)
   • Batch normalization and ReLU activations throughout

2. Trained on an airplane mesh from the ShapeNet dataset

Sampling Strategy:
  • Random (u, v) coordinates sampled from the uniform distribution on [0, 1]
  • Points divided equally across all patch decoders
  • Training: Chamfer loss against target point cloud samples
  • Inference: generate points by passing random 2D coordinates through the trained decoders
  • Preprocessing: point clouds centered and normalized to the unit sphere

This approach allows multiple surface patches to collectively represent complex 3D shapes like aircraft with wings and a fuselage; see the sketch below.
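
A condensed sketch consistent with the description above (the full training code is in `bonus_3_2.py`):

```python
import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    # One surface patch: 2D latent (u, v) -> 3D point via a 4-layer MLP.
    def __init__(self, hidden=512):
        super().__init__()
        layers, in_dim = [], 2
        for _ in range(3):
            layers += [nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(hidden, 3))
        self.net = nn.Sequential(*layers)

    def forward(self, uv):   # uv: (P, 2) sampled uniformly from [0, 1]^2
        return self.net(uv)  # (P, 3) points on this patch

class MultiPatchNet(nn.Module):
    def __init__(self, n_patches=10):
        super().__init__()
        self.patches = nn.ModuleList(PatchDecoder() for _ in range(n_patches))

    def forward(self, n_points):
        # Split the point budget equally across the patch decoders,
        # then train the concatenated output with a Chamfer loss.
        per_patch = n_points // len(self.patches)
        uv = torch.rand(len(self.patches), per_patch, 2)
        return torch.cat([p(uv[i]) for i, p in enumerate(self.patches)], dim=0)
```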

Code can be found in `bonus_3_2.py`.

Ground-truth object
Reconstructed 3D points

3.3 Extended Dataset Training (10 points)

Representation Chosen: Voxel

Training Setup          F1 Score (Chair)
Single Class (Chair)    76.2%
Three Classes           63.0%
Single-class results (trained on one class)
Multi-class results (trained on three classes)
Loss comparison curves
Example 1: input image, predicted voxel grid, ground-truth voxel grid
Example 2: input image, predicted voxel grid, ground-truth voxel grid

Analysis: Single Class vs Multi-Class Training

Quantitative Observations: The F1 score decreased for chairs but was very high for airplanes (about 86.0%). My understanding is that when more classes are added while training for a similar number of steps, the model struggles to fit the data due to the increased complexity. Since chairs are harder to learn, the model clearly struggles to learn their representation in the same number of steps.

Loss Curve Observations: The loss curve for the multi-class model shows a more erratic pattern than that of the single-class model, indicating that the multi-class model has difficulty converging. The single-class loss also converges much more quickly.

Qualitative Observations: The drop in F1 score is also reflected in the visualizations. The single-class model produces a more focused and accurate representation of chairs, while the multi-class model struggles with ambiguity and misclassification.

Conclusions: Training on multiple, diverse object classes requires more training data, and potentially higher-capacity models, to achieve performance comparable to a model trained on a single class.