# Homework 2

## 1.2. Fitting a Point Cloud
| Ground Truth | My Prediction |
|---|---|
| ![]() | ![]() |
## 1.3. Fitting a Mesh
| Ground Truth | My Prediction |
|---|---|
| ![]() | ![]() |
### Analysis
The results show that 3D reconstruction with point clouds performed best, followed by meshes and then voxels.
All three F1-score curves show that as the threshold increases from 0.01 to 0.05, the F1-score for each task rises almost linearly.
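For reference, here is a minimal NumPy sketch of how an F1 score at a distance threshold is typically computed for a pair of point clouds (the function name and the brute-force pairwise-distance computation are my own; the assignment's implementation may differ):

```python
import numpy as np

def point_cloud_f1(pred, gt, threshold):
    """F1 score between two point clouds at a given distance threshold.

    pred: (N, 3) predicted points, gt: (M, 3) ground-truth points.
    """
    # Pairwise Euclidean distances between every predicted and GT point.
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: fraction of predicted points within `threshold` of some GT point.
    precision = (dists.min(axis=1) < threshold).mean()
    # Recall: fraction of GT points within `threshold` of some predicted point.
    recall = (dists.min(axis=0) < threshold).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A perfect reconstruction scores 1.0 at any threshold; raising the threshold makes both precision and recall more forgiving, which is why all the curves rise with it.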
The point cloud prediction performs best among the three tasks. Since my architecture is an MLP, this is likely because the model only has to predict the location of each 3D point in space, effectively learning where points lie rather than deforming a surface or making occupancy predictions. As we can see above, we are not learning any connectivity between points, just a set of points that lie on or near the object's surface.
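A minimal sketch of such an MLP point-cloud decoder (the class name, latent dimension, and layer widths are assumptions for illustration, not my exact architecture):

```python
import torch
import torch.nn as nn

class PointCloudDecoder(nn.Module):
    """MLP that maps an image latent directly to N independent 3D points."""

    def __init__(self, latent_dim=512, num_points=1000):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, num_points * 3),
        )

    def forward(self, z):
        # Each output triple is a free point in space; no connectivity is modeled.
        return self.mlp(z).view(-1, self.num_points, 3)
```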
The mesh prediction performs slightly worse than the point cloud prediction. This could be because the model must deform a template shape (in our case, an ico-sphere) into a chair, which is a much harder task than predicting free 3D points. The model has to move the ico-sphere's vertices toward the target object (the chair) while preserving a connected surface, since the mesh's connectivity is fixed by the template. This can introduce distortions, as we can see above: our predicted meshes have noisier outputs and sharper edges.
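A sketch of this deformation setup (names and sizes are illustrative): the faces of the ico-sphere are never changed, and the network only predicts per-vertex offsets, which is what forces the output to remain a connected surface.

```python
import torch
import torch.nn as nn

class MeshDeformer(nn.Module):
    """Predicts per-vertex offsets that deform a fixed template mesh."""

    def __init__(self, latent_dim, template_verts):
        super().__init__()
        # template_verts: (V, 3) ico-sphere vertices. The face list is untouched,
        # so the output mesh inherits the sphere's connectivity.
        self.register_buffer("template", template_verts)
        num_verts = template_verts.shape[0]
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_verts * 3),
        )

    def forward(self, z):
        offsets = self.mlp(z).view(-1, *self.template.shape)
        return self.template + offsets  # deformed vertices, same topology
```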
Finally, the voxel grid performs the worst. Voxel reconstruction is an occupancy-prediction problem over a discretized 3D space, so as a consequence, finer-grained details and thin parts of the object may be lost entirely.
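The discretization effect can be seen in a minimal sketch (the grid resolution is an assumed value, not necessarily the one used in the assignment): any two surface points closer together than one voxel collapse into the same cell.

```python
import numpy as np

GRID = 32  # assumed resolution; one voxel spans 2 / GRID units of the [-1, 1]^3 cube

def voxelize(points):
    """Mark a voxel occupied if any surface point falls inside it."""
    idx = ((points + 1) / 2 * GRID).astype(int).clip(0, GRID - 1)
    grid = np.zeros((GRID, GRID, GRID), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

At this resolution, any feature thinner than 2/32 ≈ 0.06 units can disappear into a single cell, which matches the missing fine details in the voxel outputs.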
## 2.5 Hyperparameter Variations
I varied the number of points for the point cloud reconstruction to 10,000.
Here are the F1-score graphs showcasing the difference in F1-score between num_points = 1000 and num_points = 10000:
| num_points = 1000 | num_points = 10000 |
|---|---|
| ![]() | ![]() |
Here's an example showcasing the difference between renderings with num_points = 1000 and num_points = 10000:
| num_points = 1000 | num_points = 10000 |
|---|---|
| ![]() | ![]() |
### Analysis
Here, we can see that by sampling more points for the point cloud, we get a much higher F1-score at the 0.05 threshold. In addition, the example above comparing the renderings at 1,000 and 10,000 points shows a more clearly defined shape when more points are sampled. This indicates that sampling more points captures finer details of the object and lets us reconstruct it more precisely.
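Varying num_points just changes how many surface samples are drawn from each shape. A standard area-weighted surface sampler might look like the following (the helper name and seed are my own; libraries such as PyTorch3D provide an equivalent):

```python
import numpy as np

def sample_points(verts, faces, num_points, seed=0):
    """Sample points uniformly over a mesh surface (area-weighted per face)."""
    rng = np.random.default_rng(seed)
    tri = verts[faces]  # (F, 3, 3): the three corners of each triangle
    # Triangle areas via the cross product; larger faces receive more samples.
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    face_idx = rng.choice(len(faces), size=num_points, p=areas / areas.sum())
    # Uniform barycentric coordinates inside each chosen triangle.
    u, v = rng.random((2, num_points))
    flip = u + v > 1
    u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
    t = tri[face_idx]
    return t[:, 0] + u[:, None] * (t[:, 1] - t[:, 0]) + v[:, None] * (t[:, 2] - t[:, 0])
```

With 10× the samples, thin structures such as chair legs receive enough points to read as solid shapes, which matches the visual difference above.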
## 2.6 Model Interpretation
| RGB | Ground Truth | My Prediction | Diff |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
### Analysis
For this question, I made a view (see "Diff" in the table above) that applies a color gradient to the predicted point cloud based on each point's distance to the ground truth: blue for points close to the ground truth, red for points far from it.
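A sketch of that coloring (the normalization constant `max_dist` is an assumption; distances are mapped onto a linear blue-to-red ramp):

```python
import numpy as np

def color_by_distance(pred, gt, max_dist=0.05):
    """RGB color per predicted point: blue when near the GT cloud, red when far."""
    # Nearest-neighbor distance from each predicted point to the GT cloud.
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1).min(axis=1)
    t = np.clip(dists / max_dist, 0.0, 1.0)[:, None]  # 0 = close, 1 = far
    blue, red = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
    return (1.0 - t) * blue + t * red  # (N, 3) colors
```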
This visualization highlights the spatial patterns in the predicted point cloud's reconstruction. The points in the interior of the chair are all blue, indicating that each is close to some point in the ground truth rendering. However, points along the outline of the chair shift toward red, indicating that the model struggles to capture the edges and fine-grained details of the object. In other words, the predicted point cloud matches the ground truth well in the object's interior, but the outline and edges of the object become noisier and less refined.
### Analysis
The F1 score for the full dataset is lower than for the partial dataset. One likely reason is that a model trained on all 3 classes has to generalize across a wide range of shapes and therefore cannot capture the fine-grained details that a model trained only on the partial (chair) dataset can. This is evident in the visualizations above as well: the model trained on just the chair data more closely matches the ground truth point cloud rendering, whereas the model trained on the full dataset captures the general features (backrest, legs, etc.) but fewer of the fine-grained details (the exact shape of the chair, etc.).