HW3 Neural Volume Rendering and Surface Rendering
Part A: Neural Volume Rendering
0. Transmittance Calculation
Here is a screenshot of the completed PDF with the transmittance calculations.
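For reference, transmittance through a piecewise-constant medium multiplies across segments: T = exp(-sum_i sigma_i * delta_i). A minimal worked sketch of that arithmetic (the densities and segment lengths below are made-up illustration values, not the ones from the PDF):

```python
import math

# Hypothetical example: three homogeneous segments with densities sigma_i and lengths delta_i.
sigmas = [0.5, 1.0, 2.0]
deltas = [1.0, 0.5, 0.25]

# Transmittance through each segment: T_i = exp(-sigma_i * delta_i)
per_segment = [math.exp(-s * d) for s, d in zip(sigmas, deltas)]

# Total transmittance is the product of the per-segment values,
# i.e. T = exp(-sum(sigma_i * delta_i)).
total = math.prod(per_segment)
print(per_segment, total)
```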

1. Differentiable Volume Rendering
1.3 Ray Sampling Visualization
Here are the visualizations for the generated pixel grid (xy_grid) and the initial camera rays.
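A minimal sketch of how the pixel grid can be built, assuming normalized device coordinates in [-1, 1] (the actual unprojection of these pixels into world-space rays goes through the camera objects from the starter code and is omitted here):

```python
import torch

def make_xy_grid(image_size):
    # image_size = (W, H); returns an (H*W, 2) grid of NDC coordinates in [-1, 1].
    W, H = image_size
    xs = torch.linspace(-1.0, 1.0, W)
    ys = torch.linspace(-1.0, 1.0, H)
    y, x = torch.meshgrid(ys, xs, indexing="ij")   # each (H, W)
    return torch.stack([x, y], dim=-1).reshape(-1, 2)
```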
1.4 Point Sampling Visualization
This image shows the stratified point samples generated along the rays from the first camera view.
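A sketch of the stratified sampler, assuming depths are drawn uniformly within evenly spaced bins between the near and far planes (names are illustrative, not the starter code's):

```python
import torch

def stratified_depths(n_rays, n_pts, near, far):
    # Evenly spaced bin edges along each ray, then one uniform jitter per bin.
    edges = torch.linspace(near, far, n_pts + 1)      # (n_pts + 1,)
    lower, upper = edges[:-1], edges[1:]
    u = torch.rand(n_rays, n_pts)                     # stratified jitter in [0, 1)
    return lower + (upper - lower) * u                # (n_rays, n_pts) depths

# Sample points are then origins[:, None] + depths[..., None] * directions[:, None].
```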
1.5 Volume Rendering Visualization
Below are the final rendered color image and the corresponding depth map for the box scene. The depth map is normalized for visualization.
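For context, the renderer composites with the standard discrete quadrature: per-sample opacity a_i = 1 - exp(-sigma_i * delta_i), transmittance T_i as the running product of (1 - a_j) for j < i, and weights w_i = T_i * a_i. A minimal sketch of that compositing (not the exact starter-code interface):

```python
import torch

def composite(sigmas, colors, deltas):
    # sigmas: (n_rays, n_pts, 1), colors: (n_rays, n_pts, 3), deltas: (n_rays, n_pts, 1)
    alphas = 1.0 - torch.exp(-sigmas * deltas)                # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=1)        # transmittance after each sample
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)  # shift so T_1 = 1
    weights = trans * alphas                                  # (n_rays, n_pts, 1)
    rgb = (weights * colors).sum(dim=1)                       # composited color per ray
    return rgb, weights
```

Depth is composited from the same weights; the snippet below is how I save and visualize it.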
```python
# TODO (Q1.5): Visualize depth
if cam_idx == 2 and file_prefix == '':
    # 1. Get the depth tensor from the renderer's output.
    depth_tensor = out['depth']
    # 2. Reshape the depth tensor to the image dimensions (H, W).
    depth_map = depth_tensor.view(image_size[1], image_size[0])
    # 3. Move the tensor to the CPU and convert to a NumPy array for saving.
    depth_map_np = depth_map.detach().cpu().numpy()
    # 4. Normalize the depth map to the [0, 1] range.
    depth_map_normalized = (depth_map_np - depth_map_np.min()) / (depth_map_np.max() - depth_map_np.min())
    # 5. Save the normalized depth map as an image using matplotlib.
    plt.imsave("images/depth_visualization.png", depth_map_normalized, cmap='viridis')
```
2. Optimizing a Basic Implicit Volume
2.2 Loss and Training Results
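A minimal sketch of the kind of photometric loss used for this fitting, assuming a mean-squared error between rendered and ground-truth pixel colors (the exact loss in the starter code may differ):

```python
import torch

def photometric_loss(rendered_rgb, gt_rgb):
    # rendered_rgb, gt_rgb: (n_rays, 3) colors for the sampled pixels
    return torch.mean((rendered_rgb - gt_rgb) ** 2)
```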
After training, the optimized parameters for the box are:
Box center: (0.2500002682209015, 0.2504751682281494, -0.000625148881226778)
Box side lengths: (2.0037100315093994, 1.5010567903518677, 1.5037394762039185)
2.3 Visualization
This GIF shows a spiral rendering of the optimized box volume after training. It successfully learned the correct position and dimensions from the input images.

3. Optimizing a Neural Radiance Field (NeRF)
This GIF shows the rendered output from a spiral camera path after training the NeRF model on the lego bulldozer dataset.

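The MLP consumes positionally encoded coordinates; embedding_dim_xyz below is the size of that encoding. A sketch of such a harmonic embedding, assuming sin/cos features at octave-spaced frequencies (the starter code provides its own embedding, so this is only illustrative):

```python
import torch

def harmonic_embedding(x, n_harmonics=6):
    # x: (..., 3) coordinates; returns (..., 3 * 2 * n_harmonics) sin/cos features.
    freqs = 2.0 ** torch.arange(n_harmonics, dtype=x.dtype, device=x.device)  # 1, 2, 4, ...
    angles = x[..., None] * freqs                      # (..., 3, n_harmonics)
    embedding = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return embedding.reshape(*x.shape[:-1], -1)
```

The layer construction from my NeRF MLP follows; density and color are read out by two separate linear heads on the final hidden features.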
```python
# --- MLP Core Layers ---
hidden_dim = cfg.n_hidden_neurons_xyz
n_layers = cfg.n_layers_xyz
self.mlp_layers = torch.nn.ModuleList()
# First layer maps the positional embedding to the hidden width.
self.mlp_layers.append(torch.nn.Linear(embedding_dim_xyz, hidden_dim))
self.mlp_layers.append(torch.nn.ReLU())
for _ in range(n_layers - 1):
    self.mlp_layers.append(torch.nn.Linear(hidden_dim, hidden_dim))
    self.mlp_layers.append(torch.nn.ReLU())
# Output heads
self.density_output = torch.nn.Linear(hidden_dim, 1)
self.color_output = torch.nn.Linear(hidden_dim, 3)
```
4. NeRF Extras (4.2 Coarse/Fine Sampling)
- Quality: A preliminary "coarse" pass identifies the important regions along each ray (those likely to contain surfaces), so the "fine" pass can concentrate its samples there. This importance sampling (a sketch of the resampling step follows this list) yields much sharper detail and better reconstruction of complex geometry than uniformly sampling the same total number of points, because it is more efficient at capturing high-frequency content.
- Speed:
- Per Iteration: Each training step is slightly slower because it involves two network evaluations (coarse and fine) and the overhead of calculating the sampling distribution from the coarse weights.
- Overall Convergence: However, it often reaches a higher quality level faster (in fewer total epochs) than a basic NeRF that would need many more uniform samples (and thus be much slower per iteration) to achieve similar sharpness.
- Training Stability: The two-network system can sometimes be trickier to tune. The fine network's performance depends on the coarse network providing useful weight distributions, so poor early performance from the coarse network can slow the fine network's learning. It may also require more careful hyperparameter tuning, such as a smaller learning rate, to remain stable, which matches what I observed.
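Below is a sketch of the resampling step referenced in the Quality bullet, assuming the standard inverse-CDF (importance) sampling over the coarse pass's per-bin weights (function and variable names are mine, not the starter code's):

```python
import torch

def sample_pdf(bins, weights, n_samples, eps=1e-5):
    # bins: (n_rays, n_bins + 1) depth bin edges; weights: (n_rays, n_bins) coarse weights.
    weights = weights + eps                                   # avoid division by zero
    pdf = weights / weights.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)  # (n_rays, n_bins + 1)

    # Draw uniform samples and invert the CDF.
    u = torch.rand(list(cdf.shape[:-1]) + [n_samples], device=cdf.device)
    idx = torch.searchsorted(cdf, u, right=True)
    below = torch.clamp(idx - 1, min=0)
    above = torch.clamp(idx, max=cdf.shape[-1] - 1)

    cdf_below = torch.gather(cdf, -1, below)
    cdf_above = torch.gather(cdf, -1, above)
    bin_below = torch.gather(bins, -1, below)
    bin_above = torch.gather(bins, -1, above)

    denom = torch.where(cdf_above - cdf_below < eps,
                        torch.ones_like(cdf_above), cdf_above - cdf_below)
    t = (u - cdf_below) / denom
    return bin_below + t * (bin_above - bin_below)            # (n_rays, n_samples) fine depths
```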
Part B: Neural Surface Rendering
5. Sphere Tracing
This GIF shows a simple torus rendered using my implementation of the sphere tracing algorithm.

My implementation of sphere tracing finds the intersection point between viewing rays and the surface defined by a Signed Distance Function (SDF).
The core logic works iteratively:
- It starts each ray at its origin.
- In a loop, it queries the SDF at the ray's current position to get the distance d to the nearest surface.
- It then safely advances the ray's position forward along its direction by this distance d.
- The loop continues until either the distance d becomes very close to zero (indicating a surface hit), or a maximum number of iterations is reached / the ray travels beyond the far plane (indicating a miss).
The function returns the final 3D points reached by each ray and a boolean mask indicating which rays successfully intersected the surface within the allowed steps and distance.
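A condensed sketch of this loop, assuming the SDF is a callable over (N, 3) points; names, thresholds, and iteration counts are illustrative:

```python
import torch

def sphere_tracing(sdf, origins, directions, near=0.0, far=10.0, max_iters=64, tol=1e-5):
    # origins, directions: (N, 3); sdf: callable mapping (N, 3) -> (N, 1) signed distances.
    t = torch.full_like(origins[..., :1], near)       # distance marched along each ray
    points = origins + t * directions
    mask = torch.zeros_like(t, dtype=torch.bool)      # rays that have hit the surface
    for _ in range(max_iters):
        d = sdf(points)                               # distance to the nearest surface
        mask = mask | (d.abs() < tol)                 # mark converged rays
        step = torch.where(mask, torch.zeros_like(d), d)
        t = t + step                                  # safe step: cannot overshoot the surface
        points = origins + t * directions
        if bool(mask.all()):
            break
    mask = mask & (t < far)                           # rays past the far plane count as misses
    return points, mask
```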
6. Optimizing a Neural SDF
The input point cloud is shown on the left, and the surface rendered from the trained Neural SDF is on the right.
n_layers_distance: 6
n_hidden_neurons_distance: 128
n_epoch: 1000
```python
# MLP layers
hidden_dim = cfg.n_hidden_neurons_distance
n_layers = cfg.n_layers_distance
self.mlp_layers = torch.nn.ModuleList()
self.mlp_layers.append(torch.nn.Linear(embedding_dim_xyz, hidden_dim))
self.mlp_layers.append(torch.nn.ReLU())
for _ in range(n_layers - 1):
    self.mlp_layers.append(torch.nn.Linear(hidden_dim, hidden_dim))
    self.mlp_layers.append(torch.nn.ReLU())
self.distance_output = torch.nn.Linear(hidden_dim, 1)
```
A brief write-up on the MLP and Eikonal loss:
- MLP Architecture: The `NeuralSurface` model uses a 6-layer MLP with 128 hidden neurons and ReLU activations to predict the signed distance from positionally encoded 3D coordinates.
- Eikonal Loss: This loss regularizes the MLP by penalizing deviations of the predicted distance field's gradient norm from 1, ensuring it learns a valid SDF.
Three losses used:
- On-Surface Loss: `torch.square(distances).mean()`. The primary objective forces the MLP's predicted distance to be zero for points sampled directly from the input point cloud. This anchors the SDF's zero-level set to the object's surface.
- Eikonal Loss: `eikonal_loss(eikonal_gradients)`. This regularizer ensures the learned function behaves like a true distance field: it penalizes the MLP whenever the gradient norm (the "steepness") of the predicted distance deviates from one. A sketch of this term follows the list.
- Off-Surface Loss: `torch.exp(-100 * torch.abs(eikonal_distances)).mean()`. This regularizer discourages the MLP from predicting near-zero distances for points sampled randomly away from the surface, promoting a clean separation between the surface and empty space.
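A sketch of the Eikonal term referenced above. My `eikonal_loss` receives precomputed gradients; for self-containedness this sketch also computes them with autograd, which is an assumption about the surrounding code:

```python
import torch

def eikonal_loss_from_points(model, points):
    # points: (N, 3) samples in the bounding volume; model: predicts (N, 1) signed distances.
    points = points.clone().requires_grad_(True)
    distances = model(points)
    gradients, = torch.autograd.grad(
        outputs=distances,
        inputs=points,
        grad_outputs=torch.ones_like(distances),
        create_graph=True,                   # keep the graph so the loss stays differentiable
    )
    return ((gradients.norm(dim=-1) - 1.0) ** 2).mean()   # penalize ||grad|| != 1
```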
7. VolSDF
Below is the rendered geometry (using sphere tracing on the learned SDF) and the final color render (using volume rendering) after training the VolSDF model on the lego dataset.
Epoch: 0080, Loss: 0.008514, alpha: 10.0, beta: 0.05
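For context on the alpha and beta values above: VolSDF converts the signed distance to density through the Laplace CDF, sigma(x) = alpha * Psi_beta(-sdf(x)). A minimal sketch of that conversion (argument names are mine):

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    # VolSDF density: sigma(x) = alpha * Psi_beta(-sdf(x)), with Psi_beta the Laplace CDF.
    s = -signed_distance
    half_exp = 0.5 * torch.exp(-s.abs() / beta)        # numerically safe: exponent is always <= 0
    psi = torch.where(s <= 0, half_exp, 1.0 - half_exp)
    return alpha * psi
```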
- A high `beta` creates a softer density field with a thicker transition around the surface, while a low `beta` biases toward a sharper, thinner transition, concentrating density very close to the SDF's zero-level set and mimicking a more distinct surface.
- Based on my observations, training with volume rendering is generally easier with a high `beta`. The resulting thicker density shell provides useful gradients to more rays (even those passing near the surface), making the optimization more stable, especially in the early stages. A low `beta` provides sparser gradients initially, potentially making it harder for the model to start learning.
- A low `beta` is more likely to yield an accurate final surface, though it may take longer to converge. By forcing the density to concentrate near the zero-crossing, it pushes the optimizer to learn a precise SDF, leading to a sharper and more geometrically accurate surface. A high `beta` allows more "fuzziness," which can result in less precise geometry.
8. Neural Surface Extras (8.2 Fewer Training Views)
- Trained VolSDF (left) and NeRF (right), both for 200 epochs with similar hyperparameters and 20 views instead of the full 100. VolSDF clearly yields a crisper, better result.









