
My grid/ray visualization outputs:

My visualization of the point samples from the first camera:

My results and visualization of the depth:

This part has no visualization, so the implementation code is shown below:
# Random subsampling of pixels from an image
def get_random_pixels_from_image(n_pixels, image_size, camera):
    xy_grid = get_pixels_from_image(image_size, camera)

    # (Q2.1): Randomly subsample n_pixels pixel coordinates without replacement
    N = xy_grid.shape[0]
    indices = torch.randperm(N, device=xy_grid.device)[:n_pixels]
    xy_grid_sub = xy_grid[indices]

    return xy_grid_sub
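A note on the design choice above: torch.randperm draws pixel indices without replacement, so no pixel is selected twice within a batch; this assumes n_pixels does not exceed the number of pixels in the flattened grid returned by get_pixels_from_image.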
Box center: (0.25, 0.25, -0.00) Box side lengths: (2.01, 1.50, 1.50)
Before training:

After training:

My result:

My result:

My result for the lego scene:

My result for the materials scene:

The trade-offs between increased view dependence and generalization quality:
My result:

My implementation: The goal is to find the intersection between camera rays and an implicit surface defined by a signed distance function (SDF).
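As a minimal sketch of the sphere-tracing idea (the implicit_fn callable returning an (N, 1) signed distance, and the max_iters/eps/far values, are illustrative assumptions rather than the exact assignment interface): each point is repeatedly advanced along its ray by the local SDF value, which is a safe step size because the surface is at least that far away.

import torch

def sphere_trace(implicit_fn, origins, directions, max_iters=64, eps=1e-5, far=10.0):
    # origins, directions: (N, 3) ray origins and unit ray directions
    t = torch.zeros(origins.shape[0], 1, device=origins.device)  # distance marched along each ray
    points = origins.clone()
    for _ in range(max_iters):
        sdf = implicit_fn(points)          # (N, 1) signed distance at the current points
        t = t + sdf                        # the SDF value is a safe step: the surface is at least this far
        points = origins + t * directions
    # A ray is considered a hit if it converged to (near-)zero distance within the far bound
    mask = (implicit_fn(points).abs() < eps) & (t < far)
    return points, mask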
The visualization of the input point cloud and my prediction:

My Implementation:
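Below is a hedged sketch of the SDF-to-density conversion used by VolSDF (the function name sdf_to_density is my own; the formula follows the VolSDF paper): the density is alpha times the CDF of a zero-mean Laplace distribution with scale beta, evaluated at the negated signed distance, so the density saturates at alpha inside the surface and decays toward zero outside.

import torch

def sdf_to_density(signed_distance, alpha, beta):
    # VolSDF density: sigma(d) = alpha * Psi_beta(-d), where Psi_beta is the CDF of a
    # zero-mean Laplace distribution with scale beta, and d is the signed distance
    # (positive outside the surface, negative inside).
    s = -signed_distance
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return alpha * psi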
Intuitively, alpha scales the overall magnitude of the density: a higher alpha leads to a more solid, opaque surface, while a lower alpha corresponds to a more transparent surface. Beta controls how sharply the density transitions across the surface: a higher beta leads to a thick, fuzzy surface, while a lower beta corresponds to a thin, sharp surface.
With a high beta, the volume rendering loss provides gradients over a wide region around the surface. The network can achieve a low loss even if the SDF zero-level set is slightly off, so the SDF is allowed to be somewhat imprecise, resulting in a smoother but slightly biased surface. Conversely, with a low beta, only points very close to the true surface produce non-negligible density, so the gradients are concentrated near the zero-level set and the network must predict distances accurately at the surface to reduce the loss.
An SDF is usually easier to train with a higher beta. A low beta makes the density very sharp, so only points extremely close to the zero-level set contribute to the gradient, which easily leads to near-zero gradients early in training. With a high beta, the surface may be fuzzier, but the wider gradient support makes the SDF easier to optimize.
It is more likely to learn an accurate surface with a lower beta. A low beta creates a steep density transition, so the volume rendering loss can only be driven low if the zero-level set of the SDF is precisely aligned with the true surface.
My choice of hyper-parameters is alpha = 15 and beta = 0.05. Here is the best result:

Using 20 training views, both the VolSDF (left) and NeRF (right) solutions are able to reconstruct the scene.

Using only 10 training views, only the VolSDF solution (left) successfully reconstructs the scene.
