16-825 Assignment 2: Single View to 3D

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

In [ ]:
%run fit_data.py --type 'vox'
[Rendering: Source Voxel]
[Rendering: Target Voxel]
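
Under the hood, fitting a voxel grid amounts to treating every cell as an independent occupied/empty classification and minimizing a binary cross-entropy between the source and target grids. A minimal sketch of such a loss (assuming the source grid holds raw logits):

In [ ]:
import torch

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src: (B, D, H, W) raw occupancy logits for the source grid
    # voxel_tgt: (B, D, H, W) binary occupancies of the target grid
    return torch.nn.functional.binary_cross_entropy_with_logits(
        voxel_src, voxel_tgt
    )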

1.2. Fitting a point cloud (5 points)

In [ ]:
%run fit_data.py --type 'point'
[Rendering: Source Cloud]
[Rendering: Target Cloud]
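
Point cloud fitting is driven by a symmetric chamfer distance: each point is matched to its nearest neighbor in the other cloud and the squared distances are accumulated in both directions. A minimal sketch, assuming batched (B, N, 3) clouds and PyTorch3D's knn_points:

In [ ]:
import torch
from pytorch3d.ops import knn_points

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # Squared distance from each source point to its nearest target point,
    # and vice versa; .dists has shape (B, N, 1) for K=1.
    dist_src = knn_points(point_cloud_src, point_cloud_tgt, K=1).dists
    dist_tgt = knn_points(point_cloud_tgt, point_cloud_src, K=1).dists
    return dist_src.mean() + dist_tgt.mean()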

1.3. Fitting a mesh (5 points)

In [ ]:
%run fit_data.py --type 'mesh'
[Rendering: Source Mesh]
[Rendering: Target Mesh]
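
Mesh fitting combines the same chamfer distance, computed on points sampled from both surfaces, with a Laplacian smoothness term that penalizes jagged geometry. A minimal sketch reusing chamfer_loss from above (the weight w_smooth is an assumption):

In [ ]:
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import mesh_laplacian_smoothing

def mesh_fit_loss(mesh_src, mesh_tgt, n_samples=5000, w_smooth=0.1):
    # Compare sampled point clouds, and regularize the source mesh.
    pts_src = sample_points_from_meshes(mesh_src, n_samples)
    pts_tgt = sample_points_from_meshes(mesh_tgt, n_samples)
    return chamfer_loss(pts_src, pts_tgt) + w_smooth * mesh_laplacian_smoothing(mesh_src)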

2. Reconstructing 3D from single view

2.1. Image to voxel grid (20 points)

In [ ]:
%run eval_model.py --type 'vox' --load_checkpoint
Input Image | Ground Truth | Model Prediction
[Renderings: three example reconstructions]
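
The notebook does not show the architecture, so the sketch below is only one plausible decoder (all layer sizes are assumptions): a pooled 2D image feature, e.g. from a ResNet-18 encoder, is reshaped into a small 3D feature volume and upsampled to a 32³ grid of occupancy logits with transposed 3D convolutions.

In [ ]:
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 128 * 4 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 4^3 -> 8^3
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 8^3 -> 16^3
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),               # 16^3 -> 32^3 logits
        )

    def forward(self, feat):
        x = self.fc(feat).view(-1, 128, 4, 4, 4)
        return self.deconv(x)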

2.2. Image to point cloud (20 points)

In [ ]:
%run eval_model.py --type 'point' --load_checkpoint
Input Image | Ground Truth | Model Prediction
[Renderings: three example reconstructions]
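
Again, the architecture is not shown here; one plausible head (sizes are assumptions) simply regresses n_points 3D coordinates from the image feature with an MLP:

In [ ]:
import torch.nn as nn

class PointDecoder(nn.Module):
    def __init__(self, feat_dim=512, n_points=1000):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, n_points * 3), nn.Tanh(),  # coordinates bounded in [-1, 1]
        )

    def forward(self, feat):
        return self.mlp(feat).view(-1, self.n_points, 3)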

2.3. Image to mesh (20 points)

In [ ]:
%run eval_model.py --type 'mesh' --load_checkpoint
Input Image | Ground Truth | Model Prediction
[Renderings: three example reconstructions]
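
For meshes, a common design (sketched below with assumed sizes, and batch handling simplified to a single example) is to keep a fixed ico-sphere topology and predict per-vertex offsets that deform it toward the target shape:

In [ ]:
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    def __init__(self, feat_dim=512, level=4):
        super().__init__()
        self.src_mesh = ico_sphere(level)
        n_verts = self.src_mesh.verts_packed().shape[0]
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, n_verts * 3), nn.Tanh(),
        )

    def forward(self, feat):
        # Assumes feat has batch size 1; offsets are packed as (n_verts, 3).
        offsets = self.mlp(feat).view(-1, 3)
        return self.src_mesh.offset_verts(offsets)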

2.4. Quantitative comparisons (10 points)

Voxels

The image-to-voxel model achieves an average F1 score of 80 after 30k iterations. As seen in the output renderings, it captures global 3D structure well but struggles to capture fine detail, since the memory cost of a dense voxel grid limits the usable resolution.


Point Clouds

The image-to-point-cloud model achieves an average F1 score of 83 after only 5k iterations. It models the overall shape of the object efficiently, but a set of discrete points cannot represent a continuous surface.


Meshes

The image-to-mesh model achieves an average F1 score of 78 after 35k iterations. It models the overall geometry well but tends to introduce artifacts that do not belong to the ground-truth shape.

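All three numbers above are the same metric: a predicted point counts as a true positive if it lies within a distance threshold of some ground-truth point, and symmetrically for recall. A minimal sketch of the computation (the 0.05 threshold is an assumption):

In [ ]:
import torch
from pytorch3d.ops import knn_points

def f1_score(pts_pred, pts_gt, threshold=0.05):
    # knn_points returns squared distances, hence the sqrt.
    d_pred = knn_points(pts_pred, pts_gt, K=1).dists.sqrt()  # (B, N, 1)
    d_gt = knn_points(pts_gt, pts_pred, K=1).dists.sqrt()    # (B, M, 1)
    precision = 100.0 * (d_pred < threshold).float().mean()
    recall = 100.0 * (d_gt < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)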

2.5. Analyse effects of hyperparameter variations (10 points)

In [ ]:
%run eval_model.py --type 'point' --hyperparams_variation

Analysis of Sampling Density (n_points)

Increasing the number of sampled points in the point cloud model from 1k to 2.5k yielded a significant F1 improvement, rising from an average of 83 to 90. The gain is primarily attributable to better surface coverage and more stable nearest-neighbor matches, which reduce the number of false negatives.

Specifically, denser sampling lets the reconstructed point cloud fill gaps that sparse sampling misses, directly improving recall. A denser prediction also makes the nearest-neighbor match for each Ground Truth (GT) point more reliable, since every GT point is more likely to lie close to some predicted point. This result indicates that, for point cloud representations, sampling density is a primary driver of geometric accuracy and of the overall F1 metric.
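
The recall argument can be made concrete with a toy experiment (purely illustrative, not part of eval_model.py): sample a unit circle at the two densities and measure the worst-case distance from a dense "GT" sampling to the nearest predicted point. Denser predictions shrink this gap, which is exactly what converts false negatives into true positives at a fixed threshold.

In [ ]:
import torch

def max_gap_to_prediction(n_pred, n_gt=10000):
    # Random samplings of the unit circle as stand-ins for a surface.
    t_pred = torch.rand(n_pred) * 2 * torch.pi
    t_gt = torch.rand(n_gt) * 2 * torch.pi
    pred = torch.stack([t_pred.cos(), t_pred.sin()], dim=1)
    gt = torch.stack([t_gt.cos(), t_gt.sin()], dim=1)
    # Worst-case distance from a GT point to its nearest predicted point.
    return torch.cdist(gt, pred).min(dim=1).values.max().item()

print(max_gap_to_prediction(1000))  # sparser prediction: larger gap
print(max_gap_to_prediction(2500))  # denser prediction: smaller gap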

Input Image | Ground Truth | Model Prediction (n_pts = 2500) | Model Prediction (n_pts = 1000)
[Renderings: three example reconstructions at both sampling densities]

2.6. Interpret your model (15 points)

In [ ]:
%run eval_model.py --type 'vox' --load_checkpoint --my_viz
%run eval_model.py --type 'point' --load_checkpoint --my_viz
%run eval_model.py --type 'mesh' --load_checkpoint --my_viz

I have compiled a few GIFs to explain how the three models (Voxels, Point Clouds, and Meshes) represent 3D structure differently. Each GIF starts the camera at a point in space, moves it towards the reconstructed object, and then passes through it. This visualization effectively demonstrates how each representation handles geometry, particularly in the presence of holes and gaps within the Ground Truth (GT) 3D object.
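
A fly-through of this kind can be generated by sliding the camera along a line through the object; a hypothetical sketch (make_flythrough_gif and render_fn are made-up names, and the renderer is whichever one fits the representation):

In [ ]:
import imageio
import numpy as np
import torch
from pytorch3d.renderer import FoVPerspectiveCameras

def make_flythrough_gif(render_fn, path, n_frames=60, device="cpu"):
    frames = []
    R = torch.eye(3).unsqueeze(0)  # fixed orientation, looking down +z
    for d in np.linspace(3.0, -3.0, n_frames):
        # In PyTorch3D's convention, T = [0, 0, d] puts the camera at
        # world z = -d, so sweeping d moves it through the origin.
        T = torch.tensor([[0.0, 0.0, float(d)]])
        cameras = FoVPerspectiveCameras(R=R, T=T, device=device)
        image = render_fn(cameras)  # assumed (H, W, 3) float array in [0, 1]
        frames.append((image * 255).astype(np.uint8))
    imageio.mimsave(path, frames, fps=15)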


Voxel Model Analysis

The Voxel model, while effective for capturing overall volumetric occupancy, struggles significantly with representing fine details, especially gaps and holes. As demonstrated in the examples below, the reconstructed voxel grids often fill in or blur these negative spaces, failing to precisely capture the object's geometry. This limitation stems from its fundamental nature as a discretized grid: voxels represent the presence or absence of material within fixed-size cubic cells. Modeling intricate holes would require an impractically high resolution (small voxel size), leading to excessive memory usage and computational cost. Thus, at typical resolutions, the voxel model prioritizes solid volume over detailed negative space.

Input Image | Ground Truth (Mesh) | Model Prediction (Mesh)
[GIFs: two fly-through examples]

Point Cloud Model Analysis

The Point Cloud model demonstrates a superior ability to represent gaps and holes, as evidenced by the examples. For instance, in the chair's arms, the model accurately predicts and preserves the gap, unlike the voxel and mesh representations. Even where the overall reconstruction might not be flawless, point clouds provide a much closer approximation to the GT geometry and significantly outperform voxels in capturing complex shapes. This strength lies in the discrete, unordered nature of point clouds; they represent surfaces as a collection of individual points, naturally allowing for empty space between them without requiring a high-resolution grid. This makes them inherently more flexible for modeling non-manifold or disconnected surfaces and intricate negative spaces.

Input Image | Ground Truth (Mesh) | Model Prediction (Mesh)
[GIFs: two fly-through examples]

Mesh Model Analysis

The Mesh model generally predicts the overall geometry of shapes well, offering a smooth and continuous surface reconstruction that often surpasses voxels. However, it too typically struggles with accurately representing gaps and holes. While capturing the broad structure of an object, meshes, being composed of connected vertices and faces, tend to either "seal off" or create erroneous connections across internal voids. This limitation arises because traditional mesh generation often prioritizes a manifold, closed surface. Accurately modeling holes requires careful topological manipulation and a sufficiently dense vertex distribution around the void, which can be challenging for a neural network to infer perfectly from a 2D image without introducing artifacts or closing off intended gaps.

Input Image | Ground Truth (Mesh) | Model Prediction (Mesh)
[GIFs: two fly-through examples]

3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)

3.2 Parametric network (10 points)

I modeled an MLP that takes sampled 2D points (x, y) as input, learns the parametric function z = sin(πx) · cos(πy), and outputs the corresponding point (x, y, z) in 3D space.

Training for 10,000 epochs on 2,000 sampled points ensured full convergence, yielding a very low Mean Squared Error (MSE).
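
As a rough reconstruction of what Parametric_Network.py does (layer sizes, learning rate, and the make_data helper are assumptions; the data count and print format match the log below):

In [ ]:
import torch
import torch.nn as nn

def make_data(n=2000):
    xy = torch.rand(n, 2) * 2 - 1  # (x, y) sampled uniformly in [-1, 1]^2
    z = torch.sin(torch.pi * xy[:, :1]) * torch.cos(torch.pi * xy[:, 1:])
    return xy, torch.cat([xy, z], dim=1)  # targets are the 3D points (x, y, z)

net = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 3),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
xy, target = make_data()
for epoch in range(10000):
    loss = nn.functional.mse_loss(net(xy), target)
    opt.zero_grad(); loss.backward(); opt.step()
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.6f}")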

In [1]:
%run Parametric_Network.py
Input data shape: torch.Size([2000, 2])
Target data shape: torch.Size([2000, 3])
Training network...
Epoch 0, Loss: 0.308620
Epoch 100, Loss: 0.006926
Epoch 200, Loss: 0.001367
Epoch 300, Loss: 0.000403
Epoch 400, Loss: 0.000197
Epoch 500, Loss: 0.000115
Epoch 600, Loss: 0.000082
Epoch 700, Loss: 0.000140
Epoch 800, Loss: 0.000059
Epoch 900, Loss: 0.000049
Epoch 1000, Loss: 0.000046
Epoch 1100, Loss: 0.000153
Epoch 1200, Loss: 0.000036
Epoch 1300, Loss: 0.000034
Epoch 1400, Loss: 0.000033
Epoch 1500, Loss: 0.000352
Epoch 1600, Loss: 0.000028
Epoch 1700, Loss: 0.000028
Epoch 1800, Loss: 0.000040
Epoch 1900, Loss: 0.000024
Epoch 2000, Loss: 0.000031
Epoch 2100, Loss: 0.000022
Epoch 2200, Loss: 0.000021
Epoch 2300, Loss: 0.000220
Epoch 2400, Loss: 0.000019
Epoch 2500, Loss: 0.000042
Epoch 2600, Loss: 0.000018
Epoch 2700, Loss: 0.000017
Epoch 2800, Loss: 0.000026
Epoch 2900, Loss: 0.000020
Epoch 3000, Loss: 0.000016
Epoch 3100, Loss: 0.000342
Epoch 3200, Loss: 0.000014
Epoch 3300, Loss: 0.000028
Epoch 3400, Loss: 0.000013
Epoch 3500, Loss: 0.000017
Epoch 3600, Loss: 0.000012
Epoch 3700, Loss: 0.000013
Epoch 3800, Loss: 0.000022
Epoch 3900, Loss: 0.000011
Epoch 4000, Loss: 0.000011
Epoch 4100, Loss: 0.000011
Epoch 4200, Loss: 0.000068
Epoch 4300, Loss: 0.000016
Epoch 4400, Loss: 0.000011
Epoch 4500, Loss: 0.000010
Epoch 4600, Loss: 0.000012
Epoch 4700, Loss: 0.000010
Epoch 4800, Loss: 0.000009
Epoch 4900, Loss: 0.000015
Epoch 5000, Loss: 0.000009
Epoch 5100, Loss: 0.000022
Epoch 5200, Loss: 0.000008
Epoch 5300, Loss: 0.000009
Epoch 5400, Loss: 0.000190
Epoch 5500, Loss: 0.000008
Epoch 5600, Loss: 0.000009
Epoch 5700, Loss: 0.000007
Epoch 5800, Loss: 0.000008
Epoch 5900, Loss: 0.000019
Epoch 6000, Loss: 0.000007
Epoch 6100, Loss: 0.000115
Epoch 6200, Loss: 0.000007
Epoch 6300, Loss: 0.000563
Epoch 6400, Loss: 0.000007
Epoch 6500, Loss: 0.000007
Epoch 6600, Loss: 0.000007
Epoch 6700, Loss: 0.000312
Epoch 6800, Loss: 0.000007
Epoch 6900, Loss: 0.000044
Epoch 7000, Loss: 0.000006
Epoch 7100, Loss: 0.000006
Epoch 7200, Loss: 0.000007
Epoch 7300, Loss: 0.000006
Epoch 7400, Loss: 0.000009
Epoch 7500, Loss: 0.000024
Epoch 7600, Loss: 0.000006
Epoch 7700, Loss: 0.000008
Epoch 7800, Loss: 0.000013
Epoch 7900, Loss: 0.000129
Epoch 8000, Loss: 0.000006
Epoch 8100, Loss: 0.000012
Epoch 8200, Loss: 0.000006
Epoch 8300, Loss: 0.000006
Epoch 8400, Loss: 0.000006
Epoch 8500, Loss: 0.000007
Epoch 8600, Loss: 0.000006
Epoch 8700, Loss: 0.000006
Epoch 8800, Loss: 0.000013
Epoch 8900, Loss: 0.000009
Epoch 9000, Loss: 0.000008
Epoch 9100, Loss: 0.000005
Epoch 9200, Loss: 0.000052
Epoch 9300, Loss: 0.000005
Epoch 9400, Loss: 0.000162
Epoch 9500, Loss: 0.000005
Epoch 9600, Loss: 0.000095
Epoch 9700, Loss: 0.000006
Epoch 9800, Loss: 0.000130
Epoch 9900, Loss: 0.000005

Training complete.

Test Loss (MSE) on sample points: 0.000003

Sample predictions vs. Ground Truth:
----------------------------------------
2D Input:      [0.1 0.2]
3D Ground Truth: [0.1  0.2  0.25]
3D Predicted:  [0.10195343 0.2014614  0.25354284]
MSE Loss:      0.000006
----------------------------------------
2D Input:      [-0.3  0.4]
3D Ground Truth: [-0.3   0.4  -0.25]
3D Predicted:  [-0.29882124  0.40162012 -0.2510722 ]
MSE Loss:      0.000002
----------------------------------------
2D Input:      [ 0.5 -0.6]
3D Ground Truth: [ 0.5        -0.6        -0.30901715]
3D Predicted:  [ 0.49976665 -0.60047054 -0.30707666]
MSE Loss:      0.000001