Learning for 3D Vision HW2

1.1. Fitting a voxel grid

aaa

1.2. Fitting a point cloud

bbb

1.3. Fitting a mesh

2.1. Image to voxel grid

The following images show the ground-truth image, the prediction, and the ground-truth mesh.

2.2. Image to point cloud

2.3. Image to mesh

2.4. Quantitative comparisons

The point-based model performs much better than the voxel and mesh models. This may be because the F1 score is computed only on points, so methods that learn surfaces, such as the voxel and mesh models, score worse on it.
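To make the point-only nature of the metric concrete, here is a minimal sketch of an F1 score between two point sets. It assumes the common definition (precision and recall of nearest-neighbour distances under a threshold); the function name `f1_score` and the brute-force search are illustrative and are not taken from the assignment's evaluation code.

```python
import math


def f1_score(pred_points, gt_points, threshold=0.05):
    """Return the F1 score (in percent) between two point sets."""

    def nearest_dist(p, others):
        # Brute-force nearest-neighbour distance; fine for small point sets.
        return min(math.dist(p, q) for q in others)

    # Precision: fraction of predicted points within `threshold` of the ground truth.
    precision = sum(
        nearest_dist(p, gt_points) < threshold for p in pred_points
    ) / len(pred_points)
    # Recall: fraction of ground-truth points within `threshold` of the prediction.
    recall = sum(
        nearest_dist(q, pred_points) < threshold for q in gt_points
    ) / len(gt_points)

    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.5)]
print(f1_score(pred, gt, threshold=0.05))
```

Under this definition, a voxel or mesh prediction has to be sampled into points before it can be scored, so any error introduced by the surface representation or the sampling shows up directly in the metric.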

2.5. Analyse effects of hyperparameter variations

Voxel size: Decreasing the voxel size from 32 to 16 degrades performance: F1@0.05 drops from 72 to 65.5. Fine details are lost, e.g., the thin legs of the chair disappear.
Number of points: Increasing the number of points makes training more robust; F1@0.05 improves from 78.989 to 82.740.
w_smooth: Increasing w_smooth from 0.1 to 1 causes over-smoothing: the mesh faces stick together and F1@0.05 degrades from 75.9 to 70.2.
Initial mesh: ico_sphere(3) achieves a higher F1 score (75.9) than ico_sphere(4) (72.5), which may be because ico_sphere(4) has more vertices and is thus harder to optimize.
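To quantify "more vertices" in the initial-mesh comparison, the sketch below computes the size of an icosphere at each subdivision level. It assumes that level 0 is a regular icosahedron (12 vertices, 20 faces) and that every subdivision splits each triangle into four; the helper name `ico_sphere_size` is hypothetical.

```python
def ico_sphere_size(level):
    """Return (vertices, faces) of an icosphere after `level` subdivisions."""
    faces = 20 * 4 ** level         # each subdivision splits every triangle into 4
    vertices = 10 * 4 ** level + 2  # follows from Euler's formula V - E + F = 2
    return vertices, faces


for level in (3, 4):
    vertices, faces = ico_sphere_size(level)
    print(f"ico_sphere({level}): {vertices} vertices, {faces} faces")
```

Under these assumptions, going from level 3 to level 4 roughly quadruples the number of vertex offsets the network has to predict (642 versus 2562).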

2.6. Interpret your model

By visualizing the intermediate values after the activation functions, we can observe the spatial contour gradually forming.

3.1 Implicit network

3.2 Parametric network

The image-to-point-cloud model in 2.2 uses an AtlasNet-like model that takes the global latent vector and a 2D point as input. The point coordinate is mapped from dim=2 to dim=512 by a linear function and then added to the global latent vector.
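The fusion step described above can be sketched in plain Python as follows. This is a dependency-free illustration, not the trained model: the function name `lift_and_add`, the random `weight` and `latent` values, and the number of sampled points are all assumptions, and a real implementation would use a learned linear layer. Whatever follows the addition (for example, a network that maps each fused feature to a 3D point) is not described in the text, so it is left out.

```python
import random


def lift_and_add(latent, points_2d, weight, bias):
    """Lift each 2D point to the latent dimension, then add the global latent.

    latent:    list of D floats (global image feature)
    points_2d: list of (u, v) samples
    weight:    D rows of 2 floats; bias: D floats (hypothetical parameters)
    """
    features = []
    for u, v in points_2d:
        # Linear map from dim=2 to dim=D: one output per (row, bias) pair.
        lifted = [w[0] * u + w[1] * v + b for w, b in zip(weight, bias)]
        # Element-wise addition with the global latent vector.
        features.append([x + z for x, z in zip(lifted, latent)])
    return features


random.seed(0)
D = 512
latent = [random.gauss(0, 1) for _ in range(D)]
weight = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(D)]
bias = [0.0] * D
points_2d = [(random.random(), random.random()) for _ in range(4)]

features = lift_and_add(latent, points_2d, weight, bias)
print(len(features), len(features[0]))
```

Because the same latent vector is added to every lifted point, the 2D coordinate is the only thing that distinguishes one output feature from another.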

3.3 Extended dataset for training

Judging by F1 score, training on 3 objects performs slightly worse, but it does better when the threshold is set to 0.05.

From the qualitative results, the quality when training on 1 object is better than when training on 3 objects, e.g., the chair has straighter legs.

One obj.

Three obj.