Assignment 3: Neural Volume & Surface Rendering

Student Name: [Minghao Xu]

Student ID: [mxu3]


A. Neural Volume Rendering (80 points)

0. Transmittance Calculation (10 points)

The solution integrates the density along the ray's path between the two points to obtain the optical depth; the transmittance is then the exponential of the negative optical depth. My handwritten solution is shown below.

Transmittance Calculation Solution
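As a numerical sanity check on the handwritten result, here is a minimal sketch of the computation, assuming piecewise-constant density along the ray (the segment densities and lengths below are placeholders, not the values from the problem):

```python
import torch

# Transmittance through piecewise-constant density: T = exp(-sum_i sigma_i * delta_i).
sigmas = torch.tensor([1.0, 0.0, 2.0])   # placeholder density in each segment
deltas = torch.tensor([2.0, 1.0, 3.0])   # placeholder length of each segment

optical_depth = (sigmas * deltas).sum()
transmittance = torch.exp(-optical_depth)
print(f"T = exp(-{optical_depth.item():.1f}) = {transmittance.item():.4f}")
```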

1. Differentiable Volume Rendering

1.3. Ray Sampling (5 points)

I implemented get_pixels_from_image and get_rays_from_pixels in ray_utils.py: the former generates a grid of normalized device coordinates (NDC) over the image, and the latter unprojects these 2D coordinates into world-space ray origins and directions. The visualizations below show the resulting pixel grid and the corresponding camera rays.

Grid Visualization

Rays Visualization
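For reference, a minimal sketch of the unprojection step, assuming a simple pinhole camera given by a focal length in pixels and a 4x4 camera-to-world matrix (the actual ray_utils.py code works through PyTorch3D cameras, so the interface differs):

```python
import torch

def get_rays(H, W, focal, c2w):
    """World-space ray origins/directions for every pixel of an H x W image.

    Assumes a pinhole camera with focal length `focal` (in pixels) and a 4x4
    camera-to-world matrix `c2w`, with the camera looking down its -z axis.
    """
    i, j = torch.meshgrid(
        torch.arange(W, dtype=torch.float32),
        torch.arange(H, dtype=torch.float32),
        indexing="xy",
    )
    # Pixel -> camera-plane coordinates (x right, y up, z into the camera).
    dirs = torch.stack(
        [(i - 0.5 * W) / focal, -(j - 0.5 * H) / focal, -torch.ones_like(i)], dim=-1
    )
    rays_d = dirs @ c2w[:3, :3].T                      # rotate directions into world space
    rays_d = rays_d / rays_d.norm(dim=-1, keepdim=True)
    rays_o = c2w[:3, 3].expand(rays_d.shape)           # all rays share the camera center
    return rays_o, rays_d
```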

1.4. Point Sampling (5 points)

I implemented the StratifiedRaysampler forward pass in sampler.py. This method generates evenly spaced depth values between the near and far planes and computes the 3D coordinates of each sample point along the rays. The resulting point cloud forms a view frustum, as visualized below.

Point Cloud Visualization
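A minimal sketch of the sampler's forward pass, assuming ray origins/directions and scalar near/far planes are given (function and argument names are illustrative):

```python
import torch

def sample_points_along_rays(rays_o, rays_d, near, far, n_pts, stratified=False):
    """Lift evenly spaced depths in [near, far] to 3D points along each ray.

    rays_o, rays_d: (N, 3). The basic version uses the evenly spaced depths
    directly; stratified=True additionally jitters each depth within its bin.
    """
    N = rays_o.shape[0]
    z_vals = torch.linspace(near, far, n_pts, device=rays_o.device).expand(N, n_pts)
    if stratified:
        bin_width = (far - near) / (n_pts - 1)
        z_vals = z_vals + (torch.rand_like(z_vals) - 0.5) * bin_width
    # point = origin + depth * direction, broadcast over the samples of each ray
    points = rays_o[:, None, :] + z_vals[..., None] * rays_d[:, None, :]  # (N, n_pts, 3)
    return points, z_vals
```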

1.5. Volume Rendering (20 points)

I implemented the core volume rendering equation in renderer.py. The _compute_weights function calculates per-sample alpha, transmittance, and final rendering weights. The forward pass uses these weights to aggregate color and depth via the _aggregate function. The resulting render of a box SDF and its normalized depth map are shown below.

Rendered Box GIF

Rendered Box

Depth Map
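A minimal sketch of the weight computation and aggregation, assuming per-sample densities, colors, and segment lengths are already available (names are illustrative, not the exact ones in renderer.py):

```python
import torch

def volume_render(sigma, rgb, deltas):
    """Discrete volume rendering along each ray.

    sigma: (N, S, 1) densities, rgb: (N, S, 3) colors, deltas: (N, S, 1) segment lengths.
    alpha_i = 1 - exp(-sigma_i * delta_i), T_i = prod_{j<i}(1 - alpha_j),
    weight_i = T_i * alpha_i, color = sum_i weight_i * rgb_i.
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)                               # (N, S, 1)
    # Exclusive cumulative product of (1 - alpha): prepend ones, drop the last entry.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1
    )[:, :-1]
    weights = trans * alpha                                                # (N, S, 1)
    color = (weights * rgb).sum(dim=1)                                     # (N, 3)
    # Depth can be aggregated the same way: (weights * z_vals).sum(dim=1).
    return color, weights
```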

2. Optimizing a Basic Implicit Volume (10 points)

For this task, I implemented get_random_pixels_from_image in ray_utils.py to enable efficient stochastic ray sampling during training. I also implemented a Mean Squared Error (MSE) loss in the train method of volume_rendering_main.py, which measures the squared difference between the rendered pixel colors and the corresponding ground-truth colors from the training images.
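A minimal sketch of the pixel sampling and loss, with illustrative names; the real train method still has to build rays for the sampled pixels and render them before the loss is applied:

```python
import torch

def sample_random_pixels(n_pixels, H, W):
    """Pick n_pixels random pixel coordinates from an H x W image."""
    idx = torch.randperm(H * W)[:n_pixels]
    return idx // W, idx % W          # (row, column) indices

def mse_loss(pred_rgb, gt_rgb):
    """Image reconstruction loss between rendered and ground-truth colors."""
    return torch.mean((pred_rgb - gt_rgb) ** 2)
```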

Running the training script with the train_box configuration, I used the differentiable rendering pipeline to optimize the center and side lengths of an implicit box SDF. The model learned the correct parameters by minimizing the image reconstruction loss.

After training, the optimized parameters converged to:

Below is a side-by-side comparison of my rendered animation of the final optimized box and the reference GIF provided in the GitHub repo. The result shows that my model accurately recovered the geometry of the target object.

My Result

My Optimized Box GIF

Sample Result

Sample Expected Box GIF

3. Optimizing a Neural Radiance Field (NeRF) (20 points)

I implemented the NeuralRadianceField class in implicit.py, creating a Multi-Layer Perceptron (MLP) that maps a positionally encoded 3D coordinate to a volume density and an RGB color. After training this model on the Lego bulldozer dataset, it produced the following rendering.

NeRF Lego Result
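For reference, a minimal sketch of this kind of MLP, with illustrative layer widths and a plain sin/cos positional encoding standing in for the HarmonicEmbedding class:

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs):
    """sin/cos encoding of each coordinate at geometrically spaced frequencies."""
    freqs = (2.0 ** torch.arange(n_freqs, dtype=torch.float32, device=x.device)) * torch.pi
    enc = x[..., None] * freqs                                    # (..., 3, n_freqs)
    return torch.cat([enc.sin(), enc.cos()], dim=-1).flatten(-2)  # (..., 6 * n_freqs)

class TinyNeRF(nn.Module):
    """Maps an encoded 3D point to (density, rgb); widths are illustrative."""
    def __init__(self, n_freqs=6, hidden=128):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = 6 * n_freqs
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())

    def forward(self, points):
        feat = self.trunk(positional_encoding(points, self.n_freqs))
        sigma = torch.relu(self.density_head(feat))    # non-negative volume density
        rgb = self.color_head(feat)                    # RGB constrained to [0, 1]
        return sigma, rgb
```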

4. NeRF Extras (Extra Credit)

4.1. View Dependence (10 points)

I extended the NeRF MLP to incorporate view-dependence. The network was restructured so that the color output is a function of both the 3D position and the 2D viewing direction. This allows the model to learn view-dependent effects like specular highlights. The model was trained on the 'materials' dataset.
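A minimal sketch of the view-dependent color branch, assuming the trunk already produces a per-point feature and the density, and that the viewing direction has been positionally encoded (names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class ViewDependentColorHead(nn.Module):
    """Predicts RGB from a per-point feature plus an encoded viewing direction.

    Density remains a function of position only; only the color branch sees the
    direction, which is what limits overfitting to the training viewpoints.
    """
    def __init__(self, feat_dim=128, dir_enc_dim=24, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + dir_enc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, point_feat, dir_enc):
        return self.net(torch.cat([point_feat, dir_enc], dim=-1))
```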

Training Progression

The following GIFs show the rendered output at different epochs during training. A clear progression can be seen as the model learns the scene's geometry and material properties. The initial renderings are noisy and blurry, but the model gradually refines the shape and appearance, eventually learning the distinct specular reflections on the surfaces.

Epoch 10

Epoch 50

Epoch 100

Epoch 150

Epoch 200

Epoch 240

Final Result

The final rendered animation after 240 epochs is shown below. The view-dependent effects, such as the highlights on the spheres, are clearly visible and change realistically as the camera moves.

View-Dependent NeRF on Materials - Final Result

Discussion: View Dependence vs. Generalization Quality

Advantage: Adding view dependence is key for realism, as it allows the model to render view-dependent effects like specular highlights on shiny surfaces. Without this, all objects would look matte.

Trade-off: This comes at the cost of a higher risk of overfitting. The model might memorize the appearance from the training viewpoints and fail to generalize to new ones, causing artifacts like flickering highlights. The NeRF architecture reduces this risk by injecting the viewing direction only into the final layers of the network, after the density and an intermediate feature have been predicted from position alone. This forces the model to first learn the basic shape and diffuse color of the object and then add view-dependent effects on top, which generalizes better to unseen viewpoints.

4.2. Coarse/Fine Sampling (10 points)

I implemented NeRF's two-stage sampling strategy. A "coarse" network first renders the scene along uniformly sampled points to estimate where geometry is present. Based on these results, a second, more informed set of points is sampled, concentrating them in important regions. A "fine" network then uses these new points to produce the final, higher-quality rendering. The comparison below shows the significant improvement over the standard NeRF from Part 3.

Standard NeRF (Part 3)

Standard NeRF

NeRF with Coarse/Fine Sampling

Coarse/Fine NeRF
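A minimal sketch of the importance-sampling step that turns the coarse weights into fine sample depths; this simplified version snaps samples to the coarse bin depths rather than interpolating within bins, and the names are illustrative:

```python
import torch

def sample_fine_depths(z_coarse, weights, n_fine):
    """Draw n_fine extra depths per ray from the distribution given by coarse weights.

    z_coarse: (N, S) coarse sample depths, weights: (N, S) coarse rendering weights.
    Returns the sorted union of coarse and fine depths for the fine network.
    """
    pdf = weights + 1e-5                                # avoid zero-probability bins
    pdf = pdf / pdf.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)                     # (N, S), increasing in [0, 1]
    u = torch.rand(weights.shape[0], n_fine, device=weights.device)
    idx = torch.searchsorted(cdf, u).clamp(max=z_coarse.shape[-1] - 1)
    z_fine = torch.gather(z_coarse, dim=-1, index=idx)  # snap to the chosen coarse bin
    return torch.sort(torch.cat([z_coarse, z_fine], dim=-1), dim=-1).values
```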

Discussion: Speed vs. Quality Trade-off

Quality: The coarse/fine sampling strategy greatly improves rendering quality. By focusing sampling on areas with complex geometry instead of empty space, the model can produce much sharper images with fewer artifacts like "floaters".

Speed: The improvement in quality comes at the cost of speed. Since the model has to do two rendering passes (coarse and fine) for each ray, training and rendering take significantly longer and use more memory. However, this trade-off is generally worth it for the much better results.


B. Neural Surface Rendering (50 points)

5. Sphere Tracing (10 points)

I implemented the sphere_tracing algorithm in renderer.py. The function iteratively marches along each ray, taking "safe" steps equal to the SDF distance until the surface (distance < epsilon) is found or the ray goes beyond the far plane. The result of rendering a simple torus SDF is shown below.

Sphere Traced Torus GIF
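A minimal sketch of the marching loop, assuming an `sdf` callable that maps (N, 3) points to (N, 1) signed distances (the iteration count and epsilon below are illustrative):

```python
import torch

def sphere_trace(sdf, origins, directions, near, far, max_iters=64, eps=1e-4):
    """March each ray forward by the local SDF value until it converges or exits.

    sdf: callable mapping (N, 3) points to (N, 1) signed distances.
    Returns the final points and a boolean mask of rays that hit the surface.
    """
    t = torch.full_like(origins[:, :1], near)          # (N, 1) current depth per ray
    for _ in range(max_iters):
        points = origins + t * directions
        t = (t + sdf(points)).clamp(max=far)           # "safe" step; never march past far
    points = origins + t * directions
    hit = (sdf(points).abs() < eps) & (t < far)        # converged and still in range
    return points, hit
```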

6. Optimizing a Neural SDF (15 points)

For this task, I implemented a neural network to represent a Signed Distance Function (SDF) and trained it to reconstruct a shape from a point cloud. This involved two main components: implementing the MLP architecture in the NeuralSurface class and defining the Eikonal loss in losses.py.

MLP Implementation

I populated the NeuralSurface class in implicit.py to predict a distance for any given input point. The architecture is a Multi-Layer Perceptron (MLP) that is similar in structure to the NeRF model. It first takes a 3D coordinate and enhances it using a HarmonicEmbedding layer for positional encoding. This high-dimensional vector is then passed through a series of linear layers with ReLU activations, using a skip connection to improve the learning of high-frequency details. The final layer of the network outputs a single scalar value, which represents the signed distance. Unlike the NeRF's color output, this distance output does not use a sigmoid activation, as it needs to represent both positive and negative values.
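A minimal sketch of the distance branch, with illustrative widths, embedding size, and skip-connection placement (the encoded input is re-injected partway through the MLP):

```python
import torch
import torch.nn as nn

class TinySDFNet(nn.Module):
    """Encoded 3D point -> scalar signed distance; no sigmoid on the output."""
    def __init__(self, enc_dim=39, hidden=128):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(enc_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, hidden), nn.ReLU())
        # Skip connection: concatenate the encoded input back in halfway through.
        self.block2 = nn.Sequential(nn.Linear(hidden + enc_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, encoded_points):
        feat = self.block1(encoded_points)
        return self.block2(torch.cat([feat, encoded_points], dim=-1))
```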

Eikonal Loss Implementation

To ensure the network learns a valid SDF, I implemented the Eikonal loss in losses.py. This loss acts as a regularization term. For any given point in space, the gradient of a true SDF with respect to the point's coordinates should have a vector norm (magnitude) of 1. The eikonal_loss function calculates the norm of the network's output gradients and computes the mean squared error between these norms and 1. By minimizing this loss, we force the network to learn a function that behaves like a true distance field across the entire space, not just on the surface.
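A minimal sketch of this regularizer, assuming `model` maps 3D points to signed distances:

```python
import torch

def eikonal_loss(model, points):
    """Penalize deviation of the SDF gradient norm from 1 at the given points."""
    points = points.clone().requires_grad_(True)
    dist = model(points)                               # (N, 1) predicted signed distance
    (grad,) = torch.autograd.grad(
        outputs=dist,
        inputs=points,
        grad_outputs=torch.ones_like(dist),
        create_graph=True,                             # keep the graph so the loss is trainable
    )
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```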

The model was then trained on the bunny point cloud. The total loss was a combination of an SDF loss (forcing the predicted distance on the surface points to be zero) and the Eikonal loss. The input point cloud and the final rendered surface after training are shown below.

Input Point Cloud

Input Bunny Point Cloud

Reconstructed Surface

Reconstructed Bunny Surface

7. VolSDF (15 points)

I extended the NeuralSurface model to predict per-point color and implemented the VolSDF paper's formula to convert the learned SDF into a volume density distribution. This allows for high-quality volume rendering of the neural surface. The model was trained on the Lego dataset, and the results for the extracted geometry and the final color render are shown below.

Geometry (Learned SDF)

VolSDF Geometry

Color Render (VolSDF)

VolSDF Color Render

Explanation of `alpha` and `beta`

In VolSDF, the density is obtained from the signed distance d(x) as sigma(x) = alpha * Psi_beta(-d(x)), where Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta. `alpha` sets the overall density scale: the density deep inside the object approaches alpha. `beta` controls how sharply the density falls off across the surface: a small beta gives a near-step transition and a hard, opaque surface, while a large beta spreads density over a wider band around the surface, blurring the geometry but giving smoother gradients that are easier to optimize early in training.
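A minimal sketch of this conversion (parameter names follow the paper; the exact interface in implicit.py may differ):

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    """VolSDF conversion: sigma = alpha * Psi_beta(-d), with Psi_beta the CDF
    of a zero-mean Laplace distribution of scale beta."""
    s = -signed_distance
    laplace_cdf = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return alpha * laplace_cdf
```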

Discussion Questions


8. Neural Surface Extras (Extra Credit)

8.1. Render a Large Scene with Sphere Tracing (10 points)

I created a new ComplexSceneSDF and a more advanced InigoSceneSDF class in `implicit.py`. These classes combine over 20 primitive SDFs by taking the minimum of their individual distance functions at query time. The sphere tracing algorithm renders this composed scene without any changes, since the renderer only ever needs the scene's signed distance at a query point.

Complex Scene Rendering
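A minimal sketch of the composition, assuming each primitive is an nn.Module whose forward returns its signed distance (class and variable names are illustrative, not the exact ones in implicit.py):

```python
import torch
import torch.nn as nn

class SceneUnionSDF(nn.Module):
    """Union of primitive SDFs: the scene's distance is the min over primitives."""
    def __init__(self, primitives):
        super().__init__()
        self.primitives = nn.ModuleList(primitives)

    def forward(self, points):
        dists = torch.stack([p(points) for p in self.primitives], dim=0)
        return dists.min(dim=0).values    # distance to the nearest primitive
```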

8.2. Fewer Training Views (10 points)

I conducted an experiment to compare the performance of VolSDF and standard NeRF when trained on a sparse set of only 20 views of the Lego dataset. The results clearly demonstrate the advantage of surface-based representations in low-data regimes.

Standard NeRF (20 Views)

NeRF with 20 Views

The standard NeRF fails completely, producing a mostly black, empty scene.

VolSDF (20 Views)

VolSDF with 20 Views

In contrast, VolSDF successfully reconstructs a coherent, solid geometric shape. The underlying SDF acts as a powerful geometric prior, forcing the network to learn a smooth, continuous surface even with limited data. The color is less detailed than with full data, but the geometry is stable and recognizable.

8.3. Alternate SDF to Density Conversions (10 points)

I implemented the "naive" SDF-to-density conversion from the NeuS paper and compared its results to the VolSDF method on the Lego dataset.

VolSDF Method (Task 7)

VolSDF Result

The VolSDF formula creates a symmetric density distribution around the SDF=0 surface. This results in a slightly softer, more "volumetric" appearance.

NeuS Method

NeuS Naive Result

The NeuS formula models a high density *inside* the surface and zero density outside. This can lead to a harder, more "solid" appearance, but can also be more sensitive to errors in the learned SDF, potentially creating small holes or noisy surfaces.