Assignment 3: Neural Volume Rendering and Surface Rendering¶

Name: Xinyu Liu¶

A. Neural Volume Rendering¶

0. Transmittance Calculation¶

image

1. Differentiable Volume Rendering¶

1.3. Ray sampling¶

My outputs of grid/ray visualization:

image image

1.4. Point sampling¶

My visualization of the point samples from the first camera:

image

1.5. Volume rendering¶

My results and visualization of the depth:

GIF image

2. Optimizing a basic implicit volume¶

2.1. Random ray sampling¶

No visualization is required for this part, so my implementation is shown below:

# Random subsampling of pixels from an image
def get_random_pixels_from_image(n_pixels, image_size, camera):
    xy_grid = get_pixels_from_image(image_size, camera)

    # (Q2.1): Randomly subsample n_pixels pixel coordinates without replacement
    N = xy_grid.shape[0]
    indices = torch.randperm(N, device=xy_grid.device)[:n_pixels]
    xy_grid_sub = xy_grid[indices]

    return xy_grid_sub
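
A note on the design choice: torch.randperm samples pixel indices without replacement, so no ray is duplicated within a training batch; this assumes n_pixels does not exceed the total number of pixels in xy_grid.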

2.2. Loss and training¶

Box center: (0.25, 0.25, -0.00); box side lengths: (2.01, 1.50, 1.50)

Before training:

image image image image

After training:

image image image image

2.3. Visualization¶

My result:

GIF

3. Optimizing a Neural Radiance Field (NeRF)¶

My result:

GIF

4. NeRF Extras¶

4.1 View Dependence¶

My result for the lego scene:

GIF

My result for the materials scene:

GIF

The trade-offs between increased view dependence and generalization quality:

  1. Introducing view dependence enables the NeRF model to capture view-dependent appearance effects such as specular highlights, resulting in more realistic reconstructions.
  2. However, it also raises the risk of overfitting to the training viewpoints, especially when the dataset has limited angular coverage, thereby reducing the model’s ability to generalize to unseen views.
  3. In conclusion, increased view dependence improves visual realism for observed views but weakens generalization to novel ones. A sketch of how the view direction enters the color branch follows this list.
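
Below is a minimal sketch of how view dependence typically enters the color branch; the layer sizes and the 27-dimensional view embedding are illustrative assumptions, not my exact architecture:

import torch
import torch.nn as nn

class ViewDependentColorHead(nn.Module):
    """Illustrative color head: density depends only on position,
    while color also conditions on the (embedded) view direction."""

    def __init__(self, feat_dim=256, view_embed_dim=27, hidden_dim=128):
        super().__init__()
        self.color_mlp = nn.Sequential(
            nn.Linear(feat_dim + view_embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),
            nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, point_features, view_embedding):
        # point_features: (N, feat_dim) features from the position-only trunk
        # view_embedding: (N, view_embed_dim) harmonic-embedded view directions
        return self.color_mlp(torch.cat([point_features, view_embedding], dim=-1))

The key design point is that only the color branch sees the view direction, so the geometry (density) stays view-consistent while the appearance can vary with viewpoint.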

B. Neural Surface Rendering¶

5. Sphere Tracing¶

My result:

GIF

My implementation: the goal is to find the intersection between each camera ray and an implicit surface defined by a signed distance function (SDF). A condensed sketch of the loop follows the list.

  1. For each ray, initialized with its origin (origins) and direction (directions), a running distance (dist) is maintained to track how far the ray has traveled.
  2. At each iteration, the algorithm computes the current 3D points along the rays and queries the implicit function (implicit_fn) to obtain their SDF values. These values indicate the minimum safe step size the ray can advance without overshooting the surface. If the absolute SDF value at a point is smaller than a small threshold (eps), the ray is considered to have hit the surface and marked in the hit mask (hit). Rays that have not yet hit continue marching forward until they either converge or exceed the maximum tracing distance (self.far).
  3. Finally, the intersection points (points) and the boolean mask (final_mask) are returned, indicating which rays successfully intersected the surface.
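
A condensed sketch of that loop; the iteration cap, eps, and far values here are illustrative defaults, and implicit_fn is assumed to return an (N, 1) tensor of signed distances:

import torch

def sphere_trace(implicit_fn, origins, directions, far=10.0, eps=1e-5, max_iters=64):
    # origins, directions: (N, 3); directions are assumed to be normalized
    dist = torch.zeros(origins.shape[0], 1, device=origins.device)
    hit = torch.zeros(origins.shape[0], dtype=torch.bool, device=origins.device)

    for _ in range(max_iters):
        points = origins + dist * directions           # current positions along each ray
        sdf = implicit_fn(points)                      # (N, 1) signed distances
        hit = hit | (sdf.abs().squeeze(-1) < eps)      # mark rays that reached the surface
        active = (~hit) & (dist.squeeze(-1) < far)     # rays still marching
        dist = dist + sdf * active.unsqueeze(-1)       # step by the SDF value (safe step)

    points = origins + dist * directions
    return points, hit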

6. Optimizing a Neural SDF¶

The visualization of the input point cloud and my prediction: GIF GIF

My Implementation:

  1. MLP: The mlp_dist MLP predicts the signed distance (SDF) for each 3D point. Each point is first encoded with a harmonic embedding to capture high-frequency spatial variations. The MLP consists of multiple fully connected layers (num_layers_dist) with hidden_dim_dist neurons each, and includes a skip connection after layer 3, which concatenates the original embedding to the hidden representation. The final output is a scalar SDF per point, representing its distance to the implicit surface, with negative values inside the surface and positive values outside.
  2. Eikonal loss: The eikonal_loss regularizes the SDF network by enforcing the Eikonal constraint, which requires that the gradient of the SDF with respect to spatial coordinates has unit norm (|∇SDF| = 1) almost everywhere. Given a batch of SDF gradients (gradients), the function computes their L2 norm per point, measures the squared deviation from 1, and averages over all points to obtain a scalar loss. A minimal code sketch of this penalty is given after the list.
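
A minimal sketch of that penalty, assuming gradients is an (N, 3) tensor of SDF gradients at the sampled points:

import torch

def eikonal_loss(gradients):
    # gradients: (N, 3) gradients of the SDF w.r.t. the input points
    grad_norm = gradients.norm(dim=-1)           # per-point L2 norm of the gradient
    return ((grad_norm - 1.0) ** 2).mean()       # squared deviation from unit norm, averaged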

7. VolSDF¶

  1. Intuitively, alpha scales the overall strength of the density: a higher alpha leads to a more solid surface, while a lower alpha corresponds to a more transparent surface. Beta controls the sharpness of the surface: a higher beta leads to a thick, fuzzy surface, while a lower beta corresponds to a thin, sharp surface.

  2. With a high beta, the volume rendering loss provides gradients for a wide region around the surface. The network can achieve a low loss even if the SDF zero-level set is slightly off, so the SDF is allowed to be somewhat imprecise, resulting in a smoother, slightly biased surface. Conversely, with a low beta, only points very close to the true surface produce nonzero density, so gradients are concentrated near the zero-level set and the network must predict distances accurately at the surface to reduce the loss.

  3. An SDF is usually easier to train with a higher beta. A low beta makes the density transition very sharp, so only points extremely close to the zero-level set contribute to the loss, and early in training most samples therefore receive near-zero gradients. With a high beta, the density varies smoothly over a wide band around the surface; although the reconstructed surface may be fuzzier, many more samples receive informative gradients, which makes the SDF easier to train.

  4. It is more likely to learn an accurate surface with a lower beta. A low beta creates a steep density transition, so the volume rendering loss is only minimized when the zero-level set of the SDF is precisely aligned with the true surface (see the density-conversion sketch below).
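
For reference, a sketch of the SDF-to-density conversion these points refer to, assuming the Laplace-CDF form from the VolSDF paper; the exact implementation in the assignment code may differ:

import torch

def sdf_to_density(signed_distance, alpha, beta):
    # VolSDF-style conversion: sigma(x) = alpha * Psi_beta(-SDF(x)),
    # where Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta.
    s = -signed_distance
    half_exp = 0.5 * torch.exp(-s.abs() / beta)           # 0.5 * exp(-|s| / beta), overflow-safe
    psi = torch.where(s <= 0, half_exp, 1.0 - half_exp)   # Laplace CDF evaluated at s
    return alpha * psi

With this form, alpha sets the density deep inside the surface (where psi approaches 1), while beta sets how wide the density transition band around the zero-level set is, which matches the intuition in points 1-4 above.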

My choice of hyper-parameters is alpha=15, beta=0.05. Here is the best result:

GIF GIF

8. Neural Surface Extras¶

8.2 Fewer Training Views¶

Using 20 training views, both the VolSDF (left) and NeRF (right) solutions are able to reconstruct the scene.

GIF GIF

Using only 10 training views, only VolSDF (left) successfully reconstructs the scene.

GIF GIF