1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

Source first, then target. [Images: source voxel grid, target voxel grid]
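The standard choice for fitting one occupancy grid to another is a per-voxel binary cross-entropy; a minimal sketch, assuming the source grid holds raw logits and the target holds binary occupancies (the function name is my own):

```python
import torch
import torch.nn.functional as F

def voxel_loss(voxel_src: torch.Tensor, voxel_tgt: torch.Tensor) -> torch.Tensor:
    # voxel_src: (B, D, H, W) raw logits for the source grid.
    # voxel_tgt: (B, D, H, W) binary ground-truth occupancies in {0, 1}, as floats.
    # BCE treats each voxel as an independent occupied/empty classification.
    return F.binary_cross_entropy_with_logits(voxel_src, voxel_tgt)
```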

1.2. Fitting a point cloud (5 points)

Source first, then target. [Images: source point cloud, target point cloud]
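Point cloud fitting typically minimizes a symmetric Chamfer distance; a minimal sketch using torch.cdist, assuming both clouds are batched (B, N, 3) tensors (again, names are my own):

```python
import torch

def chamfer_loss(points_src: torch.Tensor, points_tgt: torch.Tensor) -> torch.Tensor:
    # points_src: (B, N, 3); points_tgt: (B, M, 3).
    dists = torch.cdist(points_src, points_tgt)       # (B, N, M) pairwise distances
    # Squared distance from each point to its nearest neighbor in the other cloud.
    src_to_tgt = dists.min(dim=2).values.pow(2)       # (B, N)
    tgt_to_src = dists.min(dim=1).values.pow(2)       # (B, M)
    # Symmetric Chamfer distance: average both directions.
    return src_to_tgt.mean() + tgt_to_src.mean()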

1.3. Fitting a mesh (5 points)

Source first, then target. [Images: source mesh, target mesh]
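For the mesh, the usual combination is a Chamfer term on points sampled from both surfaces plus a Laplacian smoothness regularizer on the deforming mesh; a sketch assuming PyTorch3D Meshes inputs (the hand-rolled chamfer_loss above could be substituted for the library call):

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fit_loss(mesh_src, mesh_tgt, n_samples: int = 5000, w_smooth: float = 0.1):
    # mesh_src, mesh_tgt: pytorch3d.structures.Meshes.
    # Sample surface points so the Chamfer term compares geometry densely.
    pts_src = sample_points_from_meshes(mesh_src, n_samples)
    pts_tgt = sample_points_from_meshes(mesh_tgt, n_samples)
    loss_chamfer, _ = chamfer_distance(pts_src, pts_tgt)
    # Laplacian regularizer penalizes vertices far from their neighbors' mean,
    # keeping the deforming surface smooth.
    loss_smooth = mesh_laplacian_smoothing(mesh_src, method="uniform")
    return loss_chamfer + w_smooth * loss_smooth
```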

2. Reconstructing 3D from single view

2.1. Image to voxel grid (20 points)

All models in this assignment were trained and evaluated with the load_feat argument. The images below are ordered: input image, ground truth, prediction.
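The decoder architecture itself isn't reproduced here; purely as an illustration, a minimal head mapping a precomputed image feature vector to a 32³ grid of occupancy logits might look like the following (all layer sizes are hypothetical, not the trained model's):

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    # Hypothetical sketch: maps a (B, 512) image feature to (B, 32, 32, 32) logits.
    def __init__(self, feat_dim: int = 512, grid: int = 32):
        super().__init__()
        self.grid = grid
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 2048),
            nn.ReLU(),
            nn.Linear(2048, grid ** 3),  # one logit per voxel
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        logits = self.net(feat)
        return logits.view(-1, self.grid, self.grid, self.grid)
```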

Example 1

[Images: input image 13, ground-truth mesh, predicted voxel grid]

Example 2

[Images: input image 16, ground-truth mesh, predicted voxel grid]

Example 3

[Images: input image 19, ground-truth mesh, predicted voxel grid]

2.2. Image to point cloud (20 points)

The images below are ordered: input image, ground truth, prediction.
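As with the voxel branch, the exact decoder is omitted; a minimal sketch of a point cloud head that regresses n_points 3D coordinates from the image feature (sizes and the Tanh range assumption are hypothetical):

```python
import torch
import torch.nn as nn

class PointCloudDecoder(nn.Module):
    # Hypothetical sketch: maps a (B, 512) feature to (B, n_points, 3) coordinates.
    def __init__(self, feat_dim: int = 512, n_points: int = 1000):
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_points * 3),
            nn.Tanh(),  # assumes shapes live in a normalized [-1, 1] cube
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat).view(-1, self.n_points, 3)
```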

Example 1

[Images: input image 0, ground-truth mesh, predicted point cloud]

Example 2

[Images: input image 1, ground-truth mesh, predicted point cloud]

Example 3

[Images: input image 2, ground-truth mesh, predicted point cloud]

2.3. Image to mesh (20 points)

The images below are ordered: input image, ground truth, prediction.
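For the mesh branch, a common design (sketched here with hypothetical sizes, not necessarily the trained model) is to predict per-vertex offsets that deform a fixed-topology template such as a PyTorch3D ico-sphere:

```python
import torch
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    # Hypothetical sketch: predicts per-vertex offsets that deform an ico-sphere.
    def __init__(self, feat_dim: int = 512, level: int = 4):
        super().__init__()
        self.template = ico_sphere(level)           # fixed-topology template mesh
        n_verts = self.template.verts_packed().shape[0]
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_verts * 3),           # (dx, dy, dz) per vertex
        )

    def forward(self, feat: torch.Tensor):
        offsets = self.net(feat).view(-1, 3)        # packed (B * n_verts, 3) offsets
        meshes = self.template.extend(feat.shape[0]).to(feat.device)
        return meshes.offset_verts(offsets)
```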

Example 1

[Images: input image 11, ground-truth mesh, predicted mesh]

Example 2

[Images: input image 12, ground-truth mesh, predicted mesh]

Example 3

[Images: input image 3, ground-truth mesh, predicted mesh]

2.4. Quantitative comparisons (10 points)

From the graphs below, we can see that all three models follow a similar pattern in F1-Score as the threshold varies: each starts with a low F1-Score at a threshold of 0.01 and steadily increases, reaching its highest F1-Score at a threshold of 0.05.
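For context, the F1-Score at a threshold t counts a predicted point as correct if it lies within t of some ground-truth point (precision), and a ground-truth point as recovered if a prediction lies within t of it (recall); a sketch over sampled point sets (scaled by 100 to match the percentages quoted below):

```python
import torch

def f1_score(pred_pts: torch.Tensor, gt_pts: torch.Tensor, threshold: float) -> float:
    # pred_pts: (N, 3) points sampled from the prediction; gt_pts: (M, 3) from the GT.
    dists = torch.cdist(pred_pts, gt_pts)            # (N, M) pairwise distances
    precision = (dists.min(dim=1).values < threshold).float().mean()
    recall = (dists.min(dim=0).values < threshold).float().mean()
    return (100 * 2 * precision * recall / (precision + recall + 1e-8)).item()
```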

The best-performing model is the point cloud predictor, with an F1-Score of 77. The next best is the voxel predictor, with an F1-Score of 74, followed by the mesh predictor with an F1-Score of 70.

This ordering reflects each representation's strengths and limitations: the point cloud predictor places points freely in space with no connectivity constraints, the voxel predictor is limited by the fixed resolution of its grid, and the mesh predictor is constrained by the topology of its initial template, which makes thin structures and holes hard to capture.

[Figure: Voxel F1-Score curve]
[Figure: Point cloud F1-Score curve]
[Figure: Mesh F1-Score curve]

2.5. Analyse effects of hyperparameter variations (10 points)

I varied the number of points predicted by the point cloud model. Below are the F1-Score curves and qualitative results.

Analysis: Using fewer points makes the model more time- and memory-efficient (the Chamfer loss materializes an N × M pairwise distance matrix, so its cost grows with the number of predicted points), but the predicted shapes tend to be coarse and miss fine details such as thin structures or edges, leading to higher reconstruction error. Conversely, predicting more points provides denser coverage of the object surface, allowing the model to capture small geometric details and achieve lower error, at the cost of higher computation and memory. This trade-off shows up in both the quantitative and qualitative results below.

[Figures: F1-Score curves for 500, 1000, 3000, and 5000 predicted points]

[Images: ground truth and prediction at 500, 1000, 3000, and 5000 predicted points]

2.6. Interpret your model (15 points)

I converted the ground-truth meshes into voxel grids and compared predicted and ground-truth voxels by computing the distance from each predicted voxel to its closest ground-truth voxel, coloring each predicted voxel by that distance to highlight where predictions diverge most from the ground-truth surface. From the qualitative results, we can see that the model accurately predicts the details of the top of the chair but loses geometric precision toward the bottom. This may be because many of the input images were taken at high elevation angles: 3D reconstruction performs well in regions that are well covered by the training views and poorly in regions that are not.
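A minimal sketch of this error-map computation, assuming both grids are binary (D, H, W) occupancy tensors on the same lattice (the helper name is mine; the normalized errors can then be fed to any colormap, e.g. matplotlib's):

```python
import torch

def voxel_error_map(pred_vox: torch.Tensor, gt_vox: torch.Tensor):
    # pred_vox, gt_vox: (D, H, W) binary occupancy grids on the same lattice.
    pred_idx = pred_vox.nonzero().float()   # (P, 3) occupied predicted voxel coords
    gt_idx = gt_vox.nonzero().float()       # (G, 3) occupied ground-truth voxel coords
    # Distance from every predicted voxel to its nearest ground-truth voxel;
    # this is the per-voxel error used to color the visualization.
    err = torch.cdist(pred_idx, gt_idx).min(dim=1).values  # (P,)
    # Normalize to [0, 1] so the values map cleanly onto a colormap.
    return pred_idx, err / (err.max() + 1e-8)
```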

The qualitative results are shown below, in the order: GT mesh, predicted mesh, error map.

Example 1

[Images: GT mesh 0, predicted mesh, error map]

Example 2

[Images: GT mesh 2, predicted mesh, error map]

Example 3

[Images: GT mesh 12, predicted mesh, error map]

3. Exploring other architectures / datasets. (Choose at least one! More than one is extra credit)

3.3 Extended dataset for training (10 points)

I ran an experiment training my voxel model on the chair, car, and plane classes. It used the same number of iterations, learning rate, model architecture, and all other hyperparameters as the Q2.1 training scheme; the only difference is that it was trained on the full three-class dataset and tested on the mini, chair-only dataset.

The quantitative and qualitative results on the test set show that the Q2.1 model performs better. This is likely because the Q2.1 model's training distribution matched its test distribution, whereas the Q3.3 model was also trained on other object types like cars and planes, so its training and test distributions differed substantially. Increasing the model's capacity for the Q3.3 experiment, including more chair data in the full dataset, or tuning other hyperparameters could reduce this performance gap.

The quantitative results show that the Q3.3 model had a much lower F1-Score than the Q2.1 model at all thresholds. The qualitative results likewise show that Q3.3 produces lower-quality reconstructions than Q2.1.

[Figures: Q2.1 F1-Score curve; Q3.3 F1-Score curve]

Qualitative results, in order: input image --> GT mesh --> Q2.1 result --> Q3.3 result.

[Images: input image 215, GT mesh, Q2.1 predicted voxel grid, Q3.3 predicted voxel grid]

[Images: input image 24, GT mesh, Q2.1 predicted voxel grid, Q3.3 predicted voxel grid]

[Images: input image 114, GT mesh, Q2.1 predicted voxel grid, Q3.3 predicted voxel grid]