16-825 Assignment 2¶

Author: Yu Jin Goh

Q1.1 Fitting a voxel grid¶

vox

Q1.2¶

point

Q1.3¶

mesh

Q2.1 Image to Voxel Grid¶

Left: Input Image
Middle: Predicted Voxel
Right: Groundtruth Voxel

vox_model1
vox_model2
vox_model3

Q2.2 Image to Point Cloud¶

Left: Input Image
Middle: Predicted Point Cloud
Right: Groundtruth Point Cloud

point_model1
point_model2
point_model3

Q2.3 Image to Mesh¶

Left: Input Image
Middle: Predicted Mesh
Right: Groundtruth Mesh

point_model1
point_model2
point_model3

Q2.4 Quantitative Comparison¶

Voxel F1 Score: 64.537¶

vox_f1

Point cloud F1 Score: 75.617¶

point_f1

Mesh F1 Score: 74.821¶

mesh_f1

Q2.5 Hyperparameter Tuning¶

In this question I vary n_points, experimenting with 250, 1000, and 4000 points.

The table below shows how the quantitative performance changes as the number of sampled points increases.

n_points	F1 Score @ 0.05
250	64.202
1000	75.617
4000	82.471

As the table shows, the F1 Score @ 0.05 of the point predictor increases with the number of points sampled. A few sample outputs of the model are plotted below:

Left: input image
Middle: predicted point cloud
Right: ground truth point cloud

n_points	Sample Output 1
250
1000
4000

We can see that the overall structure of the predicted shape stays consistent, while the density of the points increases.
Thus, we conclude that the higher F1 score comes from the increased density of the point cloud: with more predicted points, more of them lie near a ground-truth point, even though the structure remains similar.
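
To make the density argument concrete, here is a minimal sketch of an F1 score at a distance threshold between two point clouds. The evaluation code used for this assignment is not shown in this report, so the function name `f1_at_threshold`, the helper `sample_sphere`, and the precision/recall definition below are assumptions for illustration. The toy data are random points on a unit sphere, an idealized stand-in for a predictor that already has the right structure.

```python
import numpy as np


def f1_at_threshold(pred, gt, threshold=0.05):
    """F1 score (in percent) between point clouds of shape (N, 3) and (M, 3).

    Assumed definition: precision is the share of predicted points with a
    ground-truth point closer than `threshold`, recall is the reverse.
    """
    # Squared pairwise distances, shape (N, M), without building an (N, M, 3) array.
    d2 = (
        (pred ** 2).sum(axis=1)[:, None]
        + (gt ** 2).sum(axis=1)[None, :]
        - 2.0 * pred @ gt.T
    )
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    precision = 100.0 * np.mean(d2.min(axis=1) < threshold ** 2)
    recall = 100.0 * np.mean(d2.min(axis=0) < threshold ** 2)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def sample_sphere(n, rng):
    """Hypothetical toy data: n points uniformly distributed on a unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


rng = np.random.default_rng(0)
gt = sample_sphere(1000, rng)
for n_points in (250, 1000, 4000):
    pred = sample_sphere(n_points, rng)
    print(n_points, round(f1_at_threshold(pred, gt), 3))
```

In this toy setup the shape is identical for every `n_points`, so any change in the score comes only from how densely the predicted points cover the surface, which mirrors the trend in the table above. The actual numbers depend on the real predictor and dataset and will not match the toy values.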

Q2.6 Interpret your model¶

We visualize how the predicted voxels change when a patch of the input image is masked out. The green sliding window marks the patch that is set to 0 (black). This experiment gives us an idea of how sensitive the model is to occlusions for different types of chairs, and of which parts of a chair most affect the prediction.
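
The sliding-window masking can be sketched roughly as follows. This is an illustrative sketch, not the code used to produce the visualizations: `sliding_occlusions`, `occlusion_sensitivity`, the patch size, the stride, and the `predict` callable (assumed to map an image array to an array of voxel occupancies) are all hypothetical names and values. The heatmap is an extra summary added here for illustration; the report itself inspects the predicted voxels directly.

```python
import numpy as np


def sliding_occlusions(image, patch=32, stride=32):
    """Yield (top, left, occluded_image), zeroing one patch at a time.

    Assumes `image` is an (H, W) or (H, W, C) array at least `patch` pixels
    wide and tall.
    """
    height, width = image.shape[:2]
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = 0  # black patch
            yield top, left, occluded


def occlusion_sensitivity(image, predict, patch=32, stride=32):
    """Mean absolute change in the prediction for each masked patch position."""
    baseline = predict(image)
    rows = (image.shape[0] - patch) // stride + 1
    cols = (image.shape[1] - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for top, left, occluded in sliding_occlusions(image, patch, stride):
        change = np.abs(predict(occluded) - baseline).mean()
        heatmap[top // stride, left // stride] = change
    return heatmap


# Stand-in "model" that only looks at the top-left quadrant of the image.
def toy_predict(img):
    return np.array([img[:32, :32].mean()])


toy_image = np.ones((64, 64, 3))
print(occlusion_sensitivity(toy_image, toy_predict))
```

With a real model, `predict` would wrap the trained voxel predictor, and large heatmap values would mark image regions whose occlusion changes the predicted voxels the most.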

Below are some of the observed findings from our visualizations of the model:

1) The model defaults to 4 legs if the legs are not visible
occlusion

2) The model is sensitive to partial occlusion of the armrest

3) The model is not sensitive to occlusion of the legs in 4 legged chairs

Q3.1 Implicit network¶

Q3.3 Extended dataset for training¶

Quantitative Results¶

Each model was evaluated on the full ShapeNet dataset of 3 classes (airplane, car, and chair) and on the 1-class subset (chair). The results below show that training on the larger dataset does not hurt performance on the 1-class case: the score stays roughly the same and in fact improves slightly, from 75.6 to 76.3. At the same time, the model trained on the larger, more varied dataset generalizes better across classes: on the 3-class evaluation it scores 85.9, compared with 69.2 for the model trained only on chairs.

Evaluated on	Model trained on	F1 Score @ 0.05
1 Class	1 Class	75.617
1 Class	3 Class	76.300
3 Class	1 Class	69.168
3 Class	3 Class	85.918

Qualitative Results¶

The plots below visualize the results across the 3 classes. Consistent with the quantitative results, the model trained only on the chair class predicts chair shapes even for airplanes and cars. In comparison, the model trained on the full dataset performs well on the original chair class as well as on the new car and airplane classes.