Assignment 2
Question 1.1: Fitting a voxel grid
Question 1.2: Point cloud
Question 2.1: Image to voxel grid
Question 2.2: Image to point cloud
Question 2.3: Image to mesh
Question 2.4: Quantitative comparisons
The point network achieves the highest F1 score. Intuitively, this is because its F1 score is computed directly on the points it predicts, whereas for the voxel and mesh networks, points must first be sampled from the predicted representation before the F1 score can be computed. With no intermediate representation between the prediction and the F1 computation, we would expect the point network to score highest. The mesh network also has to satisfy a smoothness constraint, which may act as a regularizer and could explain why it achieves a marginally higher F1 score than the voxel network.
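To make the comparison concrete, here is a minimal sketch of how an F1 score between two point sets can be computed. It is not the assignment's evaluation code: the function name `f1_score`, the default `threshold`, and the use of plain NumPy are assumptions made for illustration. Precision is the fraction of predicted points that lie within the threshold of some ground-truth point, and recall is the same measure in the other direction.

```python
import numpy as np

def f1_score(pred_pts, gt_pts, threshold=0.05):
    """F1 between two point sets of shape (N, 3) and (M, 3).

    Hypothetical helper: the threshold value is an assumed default.
    """
    pred_pts = np.asarray(pred_pts, dtype=float)
    gt_pts = np.asarray(gt_pts, dtype=float)
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # Precision: predicted points that are close to some ground-truth point.
    precision = np.mean(dists.min(axis=1) < threshold)
    # Recall: ground-truth points that are close to some predicted point.
    recall = np.mean(dists.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

For the voxel and mesh networks, `pred_pts` would have to come from a sampling step applied to the predicted representation, which is the extra stage discussed above.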
Question 2.5: Analyze effects of hyperparameter variations
The mesh reconstructions with w_smooth = 0.1 are qualitatively better than those with w_smooth = 0.2. The meshes produced in the latter case are blobbier, which is particularly noticeable in the second example, where the backrest with w_smooth = 0.2 is appreciably thicker. The F1 scores reinforce the point that over-smoothing the mesh predictions generally leads to poorer reconstructions.
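The role of w_smooth can be sketched as a weight on a smoothness term added to the reconstruction loss. The code below is an illustration under stated assumptions, not the training code used here: it uses a simple uniform Laplacian penalty (distance of each vertex from the mean of its neighbors), and the names `uniform_laplacian_loss`, `total_loss`, and `w_chamfer` are hypothetical.

```python
import numpy as np

def uniform_laplacian_loss(verts, edges):
    """Mean distance between each vertex and the centroid of its neighbors.

    verts: (V, 3) array of vertex positions.
    edges: iterable of (i, j) index pairs, each undirected edge listed once.
    """
    verts = np.asarray(verts, dtype=float)
    neighbor_sum = np.zeros_like(verts)
    degree = np.zeros(verts.shape[0])
    for i, j in edges:
        neighbor_sum[i] += verts[j]
        neighbor_sum[j] += verts[i]
        degree[i] += 1
        degree[j] += 1
    # Vertices without neighbors contribute zero to the penalty.
    centroid = verts.copy()
    mask = degree > 0
    centroid[mask] = neighbor_sum[mask] / degree[mask, None]
    return float(np.mean(np.linalg.norm(centroid - verts, axis=1)))

def total_loss(chamfer, smooth, w_chamfer=1.0, w_smooth=0.1):
    """Weighted sum of a reconstruction term and the smoothness term."""
    return w_chamfer * chamfer + w_smooth * smooth
```

Raising `w_smooth` makes the optimizer trade reconstruction accuracy for smoother surfaces, which is consistent with the blobbier meshes observed at 0.2.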
Question 2.6: Interpret your model
I have chosen to visualize the order of the points predicted by the final layer of the mesh network. Specifically, for every predicted mesh, I assigned red to the first point (index 0) and blue to the last point, where the indices are those of the outputs of the last layer of the mesh network. The color varies linearly from red to blue from the first index of the output layer to the last.
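The coloring scheme described above can be sketched as a linear interpolation between two RGB colors, indexed by output position. The function name `index_color_ramp` is hypothetical, and the RGB values are assumed to lie in [0, 1].

```python
import numpy as np

def index_color_ramp(num_points):
    """Return (num_points, 3) RGB colors fading from red to blue by index."""
    # t runs from 0 at the first index to 1 at the last index.
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    red = np.array([1.0, 0.0, 0.0])
    blue = np.array([0.0, 0.0, 1.0])
    return (1.0 - t) * red + t * blue
```

The resulting array can be attached to the predicted points as per-vertex colors before rendering, so that any structure in the output ordering shows up as a color pattern on the mesh.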
I naively expected the indexing of the output layer to be reflected in the geometry of the chair, for example with the color of the predicted mesh varying from red to blue from top to bottom, or something similar.
As is seen in the visualizations, the initial indices of the output layer almost form a spider web (or a skeleton) around the overall geometry of the chair (as seen in red) and the latter indices fill it up (as seen in blue).
Question 3.3: Extended dataset for training
Based on the F1 score curves, the network trained on the full dataset marginally outperforms the one trained on only the single class that both models were trained on.
The quantitative comparison fails to highlight the difference in performance on particularly hard examples within the class. The visualizations shown were handpicked and are among the harder examples within the "chairs" class. On these examples, the network trained on just the chair class seems to predict better point clouds than the network trained on the full dataset, despite the average F1 score telling a different story.