16-825 Assignment 2

Hyojae Park

1.1

Optimized:

Ground truth:

1.2

Optimized:

Ground truth:

1.3

Optimized:

Ground truth:

2.1

Input RGB 1:

Predicted grid 1:

Ground truth grid 1:

Ground truth mesh 1:

Input RGB 2:

Predicted grid 2:

Ground truth grid 2:

Ground truth mesh 2:

Input RGB 3:

Predicted grid 3:

Ground truth grid 3:

Ground truth mesh 3:

2.2

Input RGB 1:

Predicted point cloud 1:

Ground truth point cloud 1:

Ground truth mesh 1:

Input RGB 2:

Predicted point cloud 2:

Ground truth point cloud 2:

Ground truth mesh 2:

Input RGB 3:

Predicted point cloud 3:

Ground truth point cloud 3:

Ground truth mesh 3:

2.3

Input RGB 1:

Predicted mesh 1:

Ground truth mesh 1:

Input RGB 2:

Predicted mesh 2:

Ground truth mesh 2:

Input RGB 3:

Predicted mesh 3:

Ground truth mesh 3:

2.4

F1-score curves: While all three F1-scores increase as the threshold increases, the point cloud's F1-score rises to roughly double that of the voxel grid, and the mesh's F1-score rises to just under the point cloud's. This shows that point clouds outperform voxel grids: they model the surface directly and have flexible resolution, while voxel grids suffer from discretization. Similarly, the mesh outperforms voxels because it models the surface explicitly, which provides surface continuity and connectivity. However, the point cloud outperforms the mesh because of its simplicity (free points versus connected geometry), which makes the objective easier for the model to learn and improves performance.

Voxel grid:

Point cloud:

Mesh:
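The comparison above depends on how the F1-score is computed at each threshold. A minimal sketch of threshold-based F1 between a predicted and a ground-truth point set (function and variable names are my own, not the course starter code; I assume nearest-neighbor distances via a KD-tree):

```python
import numpy as np
from scipy.spatial import cKDTree

def f1_score(pred, gt, t):
    """F1 between (N,3) predicted and (M,3) ground-truth points at threshold t."""
    d_pred_to_gt, _ = cKDTree(gt).query(pred)  # nearest-GT distance per predicted point
    d_gt_to_pred, _ = cKDTree(pred).query(gt)  # nearest-prediction distance per GT point
    precision = (d_pred_to_gt < t).mean()      # fraction of predictions near the surface
    recall = (d_gt_to_pred < t).mean()         # fraction of the surface that is covered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Sweeping `t` over a range of thresholds and plotting `f1_score` produces curves like the ones above.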

2.5

I varied n_points from 1000 to 5000. The hope was that more points would force the model to learn finer details of the mesh, such as its topology, when predicting point clouds. However, as the two photos show (left = 5000, right = 1000), the extra points did not capture more detail. Instead, the larger number of points produced more occurrences of incorrectly placed points, making the output look "bloated" and less detailed. The output was also simply more densely sampled around the regions the model predicts most easily, specifically the corners of the chair, and the hole in the original chair mesh does not become any clearer after increasing this parameter. This shows that while increasing n_points increases computation and training time, it does not necessarily yield greater detail or an understanding of information like topology, revealing a fundamental limitation of the trained model.
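Since n_points only controls how many points are drawn from the target mesh, the sampling itself can be sketched as standard area-weighted triangle sampling (a generic sketch, not the assignment's actual sampler; names are illustrative):

```python
import numpy as np

def sample_points(verts, faces, n_points):
    """Area-weighted uniform sampling of n_points from a triangle mesh."""
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    # pick faces proportionally to their area, then a uniform point per face
    idx = np.random.choice(len(faces), size=n_points, p=areas / areas.sum())
    u, v = np.random.rand(n_points, 1), np.random.rand(n_points, 1)
    mask = (u + v) > 1            # fold samples back into the triangle
    u[mask], v[mask] = 1 - u[mask], 1 - v[mask]
    return v0[idx] + u * (v1[idx] - v0[idx]) + v * (v2[idx] - v0[idx])
```

Because sampling is area-weighted, raising n_points densifies every region proportionally; it adds no new structural information, which is consistent with the observation above.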

2.6

A visualization I created colors each sampled point, when calculating the F1-score, by its distance to the closest point on the ground-truth mesh. This metric helps visually highlight where the model's predictions are accurate and which components of the object it struggles to model correctly.
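A minimal sketch of how such a pointwise-error coloring can be computed (illustrative names; I assume a nearest-neighbor query against the ground-truth samples and a simple blue-to-red colormap, where brighter red means greater error):

```python
import numpy as np
from scipy.spatial import cKDTree

def pointwise_error_colors(pred_pts, gt_pts):
    """Per-point distance to the nearest GT point, normalized for colormapping."""
    dists, _ = cKDTree(gt_pts).query(pred_pts)
    errs = dists / (dists.max() + 1e-8)        # 0 = accurate, 1 = largest error
    # blue -> red ramp: red channel grows with error, blue channel shrinks
    colors = np.stack([errs, np.zeros_like(errs), 1 - errs], axis=1)
    return dists, colors
```

The returned colors can then be passed directly to the point cloud renderer.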

For instance:

These two figures show the pointwise error from the voxel grid model, where a brighter color means greater error. While the general shape of the couch is visible in both images, the edges of the couch contain the most error in the first image, while the bottom surface contains the most error in the second. These two images demonstrate that the model struggles to place points close to the surface, especially around sharp edges.

A similar trend also shows for the point cloud model:

As you can see, the model tends to struggle on flat surfaces (the backrest of the chair) as well as thin, sharp edges (the legs of the chair). These plots therefore help highlight the difficult components of the object that all models struggle with.

3.2

I implemented a parametric network. The image below shows points randomly sampled in [0,1]x[0,1] and mapped to points in 3D.

Unfortunately, the model did not properly learn to parameterize the surface. I tried multiple mapping methods (spherical coordinates, SVD, projection onto a plane), but they all resulted in the model collapsing the 2D points toward a fixed point, usually the 3D origin. I also experimented with different loss functions, from the Chamfer distance to MSELoss. These results indicate that the model was stuck in a local minimum of predicting a single point every time. A possible remedy is to significantly increase the model's capacity and to use a dataset that provides a UV mapping, so that the 2D-to-3D correspondence is well defined.
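For reference, a sketch of the kind of parametric network described above, paired with a symmetric Chamfer loss (layer sizes and names are illustrative, not my exact architecture):

```python
import torch
import torch.nn as nn

class ParametricNet(nn.Module):
    """MLP mapping 2D samples in [0,1]^2 to points on a 3D surface."""
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, uv):
        return self.mlp(uv)

def chamfer(a, b):
    """Symmetric Chamfer loss between two point sets (N,3) and (M,3)."""
    d = torch.cdist(a, b)  # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```

The collapse described above corresponds to this network outputting a nearly constant point regardless of the input uv, a degenerate but locally stable state under the Chamfer objective.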