Assignment 2

Question 1.1: Fitting a voxel grid

Ground Truth (Source)
Ground truth voxel grid
Optimized (Target)
Optimized voxel grid

Question 1.2: Point cloud

Ground Truth (Source)
Ground truth point cloud
Optimized (Target)
Optimized point cloud

Question 1.3: Mesh

Ground Truth (Source)
Ground truth mesh
Optimized (Target)
Optimized mesh

Question 2.1: Image to voxel grid

Input RGB
Example 0 input RGB
Predicted Voxel Grid
Example 0 predicted voxel grid
Ground Truth Mesh
Example 0 ground truth mesh
Input RGB
Example 100 input RGB
Predicted Voxel Grid
Example 100 predicted voxel grid
Ground Truth Mesh
Example 100 ground truth mesh
Input RGB
Example 350 input RGB
Predicted Voxel Grid
Example 350 predicted voxel grid
Ground Truth Mesh
Example 350 ground truth mesh

Question 2.2: Image to point cloud

Input RGB
Example 0 input RGB
Predicted Point Cloud
Example 0 predicted point cloud
Ground Truth Mesh
Example 0 ground truth mesh
Input RGB
Example 100 input RGB
Predicted Point Cloud
Example 100 predicted point cloud
Ground Truth Mesh
Example 100 ground truth mesh
Input RGB
Example 350 input RGB
Predicted Point Cloud
Example 350 predicted point cloud
Ground Truth Mesh
Example 350 ground truth mesh

Question 2.3: Image to mesh

Input RGB
Example 0 input RGB
Predicted Mesh
Example 0 predicted mesh
Ground Truth Mesh
Example 0 ground truth mesh
Input RGB
Example 100 input RGB
Predicted Mesh
Example 100 predicted mesh
Ground Truth Mesh
Example 100 ground truth mesh
Input RGB
Example 350 input RGB
Predicted Mesh
Example 350 predicted mesh
Ground Truth Mesh
Example 350 ground truth mesh

Question 2.4: Quantitative comparisons

Voxel Grid F1 Score
Voxel grid F1 score curve
Point Cloud F1 Score
Point cloud F1 score curve
Mesh F1 Score
Mesh F1 score curve
The point network clearly achieves the highest F1 score. Intuitively, this is because the F1 score is computed directly on the set of points predicted by the point network, whereas for the voxel and mesh networks, points must first be sampled from the predicted representation before the F1 score can be computed. Since there is no intermediate representation between the prediction and the F1 computation for the point network, we would expect it to obtain the highest F1 score. The mesh network must also satisfy a smoothness constraint, which may act as a useful regularizer; this could explain why it achieves a marginally higher F1 score than the voxel network.
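For reference, the F1 score at a single distance threshold can be sketched as below. This assumes the common definition (precision is the fraction of predicted points within the threshold of some ground-truth point, recall is the fraction of ground-truth points within the threshold of some predicted point, and F1 is their harmonic mean) and uses brute-force nearest neighbours in NumPy. The function name `f1_at_threshold` is hypothetical, and this is not necessarily the exact evaluation script behind the curves above. For the voxel and mesh networks, `pred` would be the points sampled from the predicted representation; for the point network, it is the network output itself.

```python
import numpy as np


def f1_at_threshold(pred, gt, threshold):
    """F1 score between two point sets (N, 3) and (M, 3) at a distance threshold.

    Hypothetical helper assuming the standard precision/recall definition;
    brute force, so memory grows as N * M.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: fraction of predicted points close to some ground-truth point.
    precision = np.mean(d.min(axis=1) < threshold)
    # Recall: fraction of ground-truth points close to some predicted point.
    recall = np.mean(d.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

Sweeping the threshold, e.g. `[f1_at_threshold(pred, gt, t) for t in np.linspace(0.01, 0.05, 5)]`, gives one F1-vs-threshold curve. For large point sets, a KD-tree such as `scipy.spatial.cKDTree` would avoid the quadratic distance matrix.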

Question 2.5: Analyze effects of hyperparameter variations

Ground Truth
Example 50 ground truth mesh
Predicted with w_smooth=0.1
Example 50 predicted mesh w_smooth=0.1
Predicted with w_smooth=0.2
Example 50 predicted mesh w_smooth=0.2
Ground Truth
Example 150 ground truth mesh
Predicted with w_smooth=0.1
Example 150 predicted mesh w_smooth=0.1
Predicted with w_smooth=0.2
Example 150 predicted mesh w_smooth=0.2
Evaluation w_smooth=0.1
Mesh evaluation w_smooth=0.1
Evaluation w_smooth=0.2
Mesh evaluation w_smooth=0.2
The mesh reconstructions with w_smooth = 0.1 are clearly better qualitatively than those with w_smooth = 0.2. The meshes produced in the latter case are blobbier, which is particularly noticeable in the second example (Example 150), where the backrest predicted with w_smooth = 0.2 is appreciably thicker. The F1 scores also reinforce the point that over-smoothing the mesh predictions generally leads to poorer reconstructions.
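To make the role of w_smooth concrete, here is a minimal sketch assuming the smoothness term is a uniform Laplacian penalty (the mean distance from each vertex to the centroid of its neighbours) that is added to a Chamfer term. The helper names `uniform_laplacian_loss` and `total_mesh_loss` and the `w_chamfer` weight are hypothetical; PyTorch3D's `mesh_laplacian_smoothing(..., method="uniform")` computes a comparable quantity on batched meshes. A larger w_smooth pulls every vertex harder towards its neighbours' centroid, which is consistent with the blobbier, thicker shapes observed at w_smooth = 0.2.

```python
import numpy as np


def uniform_laplacian_loss(verts, faces):
    """Mean norm of the uniform Laplacian of a triangle mesh.

    verts: (V, 3) float array; faces: (F, 3) integer array of vertex indices.
    Assumed form of the smoothness term (hypothetical helper).
    """
    n = verts.shape[0]
    # Build vertex adjacency from the three edges of every triangle.
    neighbors = [set() for _ in range(n)]
    for a, b, c in np.asarray(faces, dtype=int).tolist():
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    offsets = []
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue  # skip isolated vertices
        # Distance from the vertex to the centroid of its neighbours.
        centroid = verts[sorted(nbrs)].mean(axis=0)
        offsets.append(np.linalg.norm(centroid - verts[i]))
    return float(np.mean(offsets))


def total_mesh_loss(chamfer, smooth, w_chamfer=1.0, w_smooth=0.1):
    # Assumed weighted sum; w_chamfer is a hypothetical name.
    return w_chamfer * chamfer + w_smooth * smooth
```

As a quick check, for a flat square fan (one centre vertex and four ring vertices) the loss is 0.8, and lifting the centre vertex out of the plane increases it, since the spike moves vertices away from their neighbours' centroids.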

Question 2.6: Interpret your model

Ground Truth
Example 0 ground truth mesh
Predicted Mesh
Example 0 predicted mesh
Ordered color
Example 0 predicted mesh ordered color
Ground Truth
Example 100 ground truth mesh
Predicted Mesh
Example 100 predicted mesh
Ordered color
Example 100 predicted mesh ordered color
Ground Truth
Example 350 ground truth mesh
Predicted Mesh
Example 350 predicted mesh
Ordered color
Example 350 predicted mesh ordered color
I chose to visualize the order of the points predicted by the final layer of the mesh network. Specifically, for every predicted mesh, I assigned red to the first (index 0) point and blue to the last point, following the output ordering of the last layer of the mesh network. The color varies linearly from red to blue as the output index goes from first to last.
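The coloring scheme described above can be sketched as follows, assuming the predicted vertices are stored in an array whose row order matches the output order of the last layer. `index_ordered_colors` is a hypothetical helper name.

```python
import numpy as np


def index_ordered_colors(num_verts):
    """Per-vertex RGB colors fading linearly from red (index 0) to blue (last index)."""
    red = np.array([1.0, 0.0, 0.0])
    blue = np.array([0.0, 0.0, 1.0])
    # t runs from 0 at the first output index to 1 at the last one.
    t = np.linspace(0.0, 1.0, num_verts)[:, None]
    # Linear interpolation between the two endpoint colors, shape (num_verts, 3).
    return (1.0 - t) * red + t * blue
```

Assuming a PyTorch3D rendering pipeline, one option is to wrap the result as a `(1, V, 3)` float tensor and pass it as `verts_features` to `pytorch3d.renderer.TexturesVertex`; any renderer that accepts per-vertex colors would work just as well.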

Naively, I would have expected the indexing of the output layer to be reflected in the geometry of the chair, for example, with the colored mesh varying from red to blue from top to bottom, or something similar.

Instead, as the visualizations show, the initial indices of the output layer form something like a spider web (or a skeleton) around the overall geometry of the chair (in red), and the later indices fill it in (in blue).

Question 3.3: Extended dataset for training

Ground Truth Point Cloud
Example 50 ground truth point cloud
Predicted Point Cloud
Example 50 predicted point cloud
Predicted Point Cloud (Full Dataset)
Example 50 predicted point cloud full dataset
Ground Truth Point Cloud
Example 450 ground truth point cloud
Predicted Point Cloud
Example 450 predicted point cloud
Predicted Point Cloud (Full Dataset)
Example 450 predicted point cloud full dataset
Evaluation (Limited Dataset)
Point cloud evaluation limited dataset
Evaluation (Full Dataset)
Point cloud evaluation full dataset
Based on the F1 score curves, the network trained on the full dataset marginally outperforms the network trained only on the chair class (the one class common to both training sets).

The quantitative comparison fails to capture differences in performance on particularly hard examples within the class. The visualizations shown were handpicked from among the harder examples in the "chairs" class. On these examples, the network trained only on the chair class seems to predict better point clouds than the network trained on the full dataset, even though the average F1 score suggests otherwise.