HW3 Neural Volume Rendering and Surface Rendering
Part A: Neural Volume Rendering
0. Transmittance Calculation
Here is a screenshot of the completed PDF with the transmittance calculations.
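For reference, transmittance through a piecewise-constant medium multiplies across segments: T = exp(-sum_i sigma_i * delta_i). A minimal worked sketch of that arithmetic (the densities and segment lengths below are made-up illustration values, not the ones from the PDF):

```python
import math

# Hypothetical example: three homogeneous segments with densities sigma_i and lengths delta_i.
sigmas = [0.5, 1.0, 2.0]
deltas = [1.0, 0.5, 0.25]

# Transmittance through each segment: T_i = exp(-sigma_i * delta_i)
per_segment = [math.exp(-s * d) for s, d in zip(sigmas, deltas)]

# Total transmittance is the product of the per-segment values,
# i.e. T = exp(-sum(sigma_i * delta_i)).
total = math.prod(per_segment)
print(per_segment, total)
```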

1. Differentiable Volume Rendering
1.3 Ray Sampling Visualization
Here are the visualizations for the generated pixel grid (xy_grid) and the initial camera rays.
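A minimal sketch of how the pixel grid can be built, assuming normalized device coordinates in [-1, 1] (the actual unprojection of these pixels into world-space rays goes through the camera objects from the starter code and is omitted here):

```python
import torch

def make_xy_grid(image_size):
    # image_size = (W, H); returns an (H*W, 2) grid of NDC coordinates in [-1, 1].
    W, H = image_size
    xs = torch.linspace(-1.0, 1.0, W)
    ys = torch.linspace(-1.0, 1.0, H)
    y, x = torch.meshgrid(ys, xs, indexing="ij")   # each (H, W)
    return torch.stack([x, y], dim=-1).reshape(-1, 2)
```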
1.4 Point Sampling Visualization
This image shows the stratified point samples generated along the rays from the first camera view.
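A sketch of the stratified sampler, assuming depths are drawn uniformly within evenly spaced bins between the near and far planes (names are illustrative, not the starter code's):

```python
import torch

def stratified_depths(n_rays, n_pts, near, far):
    # Evenly spaced bin edges along each ray, then one uniform jitter per bin.
    edges = torch.linspace(near, far, n_pts + 1)      # (n_pts + 1,)
    lower, upper = edges[:-1], edges[1:]
    u = torch.rand(n_rays, n_pts)                     # stratified jitter in [0, 1)
    return lower + (upper - lower) * u                # (n_rays, n_pts) depths

# Sample points are then origins[:, None] + depths[..., None] * directions[:, None].
```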
1.5 Volume Rendering Visualization
Below are the final rendered color image and the corresponding depth map for the box scene. The depth map is normalized for visualization.
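For context, the renderer composites with the standard discrete quadrature: per-sample opacity a_i = 1 - exp(-sigma_i * delta_i), transmittance T_i as the running product of (1 - a_j) for j < i, and weights w_i = T_i * a_i. A minimal sketch of that compositing (not the exact starter-code interface):

```python
import torch

def composite(sigmas, colors, deltas):
    # sigmas: (n_rays, n_pts, 1), colors: (n_rays, n_pts, 3), deltas: (n_rays, n_pts, 1)
    alphas = 1.0 - torch.exp(-sigmas * deltas)                # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=1)        # transmittance after each sample
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)  # shift so T_1 = 1
    weights = trans * alphas                                  # (n_rays, n_pts, 1)
    rgb = (weights * colors).sum(dim=1)                       # composited color per ray
    return rgb, weights
```

Depth is composited from the same weights; the snippet below is how I save and visualize it.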
```python
# TODO (Q1.5): Visualize depth
if cam_idx == 2 and file_prefix == '':
    # 1. Get the depth tensor from the renderer's output.
    depth_tensor = out['depth']
    # 2. Reshape the depth tensor to the image dimensions (H, W).
    depth_map = depth_tensor.view(image_size[1], image_size[0])
    # 3. Move the tensor to the CPU and convert to a NumPy array for saving.
    depth_map_np = depth_map.detach().cpu().numpy()
    # 4. Normalize the depth map to the [0, 1] range.
    depth_map_normalized = (depth_map_np - depth_map_np.min()) / (depth_map_np.max() - depth_map_np.min())
    # 5. Save the normalized depth map as an image using matplotlib.
    plt.imsave("images/depth_visualization.png", depth_map_normalized, cmap='viridis')
```
2. Optimizing a Basic Implicit Volume
2.2 Loss and Training Results
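A minimal sketch of the kind of photometric loss used for this fitting, assuming a mean-squared error between rendered and ground-truth pixel colors (the exact loss in the starter code may differ):

```python
import torch

def photometric_loss(rendered_rgb, gt_rgb):
    # rendered_rgb, gt_rgb: (n_rays, 3) colors for the sampled pixels
    return torch.mean((rendered_rgb - gt_rgb) ** 2)
```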
After training, the optimized parameters for the box are:
Box center: (0.2500002682209015, 0.2504751682281494, -0.000625148881226778)
Box side lengths: (2.0037100315093994, 1.5010567903518677, 1.5037394762039185)
2.3 Visualization
This GIF shows a spiral rendering of the optimized box volume after training. It successfully learned the correct position and dimensions from the input images.

3. Optimizing a Neural Radiance Field (NeRF)
This GIF shows the rendered output from a spiral camera path after training the NeRF model on the lego bulldozer dataset.

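The MLP consumes positionally encoded coordinates; embedding_dim_xyz below is the size of that encoding. A sketch of such a harmonic embedding, assuming sin/cos features at octave-spaced frequencies (the starter code provides its own embedding, so this is only illustrative):

```python
import torch

def harmonic_embedding(x, n_harmonics=6):
    # x: (..., 3) coordinates; returns (..., 3 * 2 * n_harmonics) sin/cos features.
    freqs = 2.0 ** torch.arange(n_harmonics, dtype=x.dtype, device=x.device)  # 1, 2, 4, ...
    angles = x[..., None] * freqs                      # (..., 3, n_harmonics)
    embedding = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return embedding.reshape(*x.shape[:-1], -1)
```

The layer construction from my NeRF MLP follows; density and color are read out by two separate linear heads on the final hidden features.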
```python
# --- MLP Core Layers ---
hidden_dim = cfg.n_hidden_neurons_xyz
n_layers = cfg.n_layers_xyz
self.mlp_layers = torch.nn.ModuleList()
# First layer maps the positional embedding to the hidden width.
self.mlp_layers.append(torch.nn.Linear(embedding_dim_xyz, hidden_dim))
self.mlp_layers.append(torch.nn.ReLU())
for _ in range(n_layers - 1):
    self.mlp_layers.append(torch.nn.Linear(hidden_dim, hidden_dim))
    self.mlp_layers.append(torch.nn.ReLU())
# Output heads
self.density_output = torch.nn.Linear(hidden_dim, 1)
self.color_output = torch.nn.Linear(hidden_dim, 3)
```
4. NeRF Extras (4.2 Coarse/Fine Sampling)
- Quality: A preliminary "coarse" pass identifies the important regions along each ray (those likely to contain surfaces), so the "fine" pass can concentrate its samples there. This importance sampling (a sketch of the resampling step follows this list) yields much sharper detail and better reconstruction of complex geometry than uniformly sampling the same total number of points, because it is more efficient at capturing high-frequency content.
- Speed:
- Per Iteration: Each training step is slightly slower because it involves two network evaluations (coarse and fine) and the overhead of calculating the sampling distribution from the coarse weights.
- Overall Convergence: However, it often reaches a higher quality level faster (in fewer total epochs) than a basic NeRF that would need many more uniform samples (and thus be much slower per iteration) to achieve similar sharpness.
- Training Stability: The two-network system can sometimes be trickier to tune. The fine network's performance depends on the coarse network providing useful weight distributions, so poor early performance from the coarse network can slow the fine network's learning. It may also require more careful hyperparameter tuning, such as a smaller learning rate, to remain stable, which matches what I observed.
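Below is a sketch of the resampling step referenced in the Quality bullet, assuming the standard inverse-CDF (importance) sampling over the coarse pass's per-bin weights (function and variable names are mine, not the starter code's):

```python
import torch

def sample_pdf(bins, weights, n_samples, eps=1e-5):
    # bins: (n_rays, n_bins + 1) depth bin edges; weights: (n_rays, n_bins) coarse weights.
    weights = weights + eps                                   # avoid division by zero
    pdf = weights / weights.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)  # (n_rays, n_bins + 1)

    # Draw uniform samples and invert the CDF.
    u = torch.rand(list(cdf.shape[:-1]) + [n_samples], device=cdf.device)
    idx = torch.searchsorted(cdf, u, right=True)
    below = torch.clamp(idx - 1, min=0)
    above = torch.clamp(idx, max=cdf.shape[-1] - 1)

    cdf_below = torch.gather(cdf, -1, below)
    cdf_above = torch.gather(cdf, -1, above)
    bin_below = torch.gather(bins, -1, below)
    bin_above = torch.gather(bins, -1, above)

    denom = torch.where(cdf_above - cdf_below < eps,
                        torch.ones_like(cdf_above), cdf_above - cdf_below)
    t = (u - cdf_below) / denom
    return bin_below + t * (bin_above - bin_below)            # (n_rays, n_samples) fine depths
```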
Part B: Neural Surface Rendering
5. Sphere Tracing
This GIF shows a simple torus rendered using my implementation of the sphere tracing algorithm.

My implementation of sphere tracing finds the intersection point between viewing rays and the surface defined by a Signed Distance Function (SDF).
The core logic works iteratively:
- It starts each ray at its origin.
- In a loop, it queries the SDF at the ray's current position to get the distance d to the nearest surface.
- It then safely advances the ray's position forward along its direction by this distance d.
- The loop continues until either the distance d becomes very close to zero (indicating a surface hit), or a maximum number of iterations is reached / the ray travels beyond the far plane (indicating a miss).
The function returns the final 3D points reached by each ray and a boolean mask indicating which rays successfully intersected the surface within the allowed steps and distance.
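A condensed sketch of this loop, assuming the SDF is a callable over (N, 3) points; names, thresholds, and iteration counts are illustrative:

```python
import torch

def sphere_tracing(sdf, origins, directions, near=0.0, far=10.0, max_iters=64, tol=1e-5):
    # origins, directions: (N, 3); sdf: callable mapping (N, 3) -> (N, 1) signed distances.
    t = torch.full_like(origins[..., :1], near)       # distance marched along each ray
    points = origins + t * directions
    mask = torch.zeros_like(t, dtype=torch.bool)      # rays that have hit the surface
    for _ in range(max_iters):
        d = sdf(points)                               # distance to the nearest surface
        mask = mask | (d.abs() < tol)                 # mark converged rays
        step = torch.where(mask, torch.zeros_like(d), d)
        t = t + step                                  # safe step: cannot overshoot the surface
        points = origins + t * directions
        if bool(mask.all()):
            break
    mask = mask & (t < far)                           # rays past the far plane count as misses
    return points, mask
```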
6. Optimizing a Neural SDF
The input point cloud is shown on the left, and the surface rendered from the trained Neural SDF is on the right.
n_layers_distance: 6
n_hidden_neurons_distance: 128
n_epoch: 1000
```python
# MLP layers
hidden_dim = cfg.n_hidden_neurons_distance
n_layers = cfg.n_layers_distance
self.mlp_layers = torch.nn.ModuleList()
self.mlp_layers.append(torch.nn.Linear(embedding_dim_xyz, hidden_dim))
self.mlp_layers.append(torch.nn.ReLU())
for _ in range(n_layers - 1):
    self.mlp_layers.append(torch.nn.Linear(hidden_dim, hidden_dim))
    self.mlp_layers.append(torch.nn.ReLU())
self.distance_output = torch.nn.Linear(hidden_dim, 1)
```
A brief write-up on the MLP and Eikonal loss:
- MLP Architecture: The `NeuralSurface` model uses a 6-layer MLP with 128 hidden neurons and ReLU activations to predict the signed distance from positionally encoded 3D coordinates.
- Eikonal Loss: This loss regularizes the MLP by penalizing deviations of the predicted distance field's gradient norm from 1, ensuring it learns a valid SDF.
Three losses used:
- On-Surface Loss: `torch.square(distances).mean()`. The primary objective forces the MLP's predicted distance to be zero for points sampled directly from the input point cloud. This anchors the SDF's zero-level set to the object's surface.
- Eikonal Loss: `eikonal_loss(eikonal_gradients)`. This regularizer ensures the learned function behaves like a true distance field: it penalizes the MLP whenever the gradient norm (the "steepness") of the predicted distance deviates from one. A sketch of this term follows the list.
- Off-Surface Loss: `torch.exp(-100 * torch.abs(eikonal_distances)).mean()`. This regularizer discourages the MLP from predicting near-zero distances for points sampled randomly away from the surface, promoting a clean separation between the surface and empty space.
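A sketch of the Eikonal term referenced above. My `eikonal_loss` receives precomputed gradients; for self-containedness this sketch also computes them with autograd, which is an assumption about the surrounding code:

```python
import torch

def eikonal_loss_from_points(model, points):
    # points: (N, 3) samples in the bounding volume; model: predicts (N, 1) signed distances.
    points = points.clone().requires_grad_(True)
    distances = model(points)
    gradients, = torch.autograd.grad(
        outputs=distances,
        inputs=points,
        grad_outputs=torch.ones_like(distances),
        create_graph=True,                   # keep the graph so the loss stays differentiable
    )
    return ((gradients.norm(dim=-1) - 1.0) ** 2).mean()   # penalize ||grad|| != 1
```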
7. VolSDF
Below is the rendered geometry (using sphere tracing on the learned SDF) and the final color render (using volume rendering) after training the VolSDF model on the lego dataset.
Epoch: 0080, Loss: 0.008514, alpha: 10.0, beta: 0.05
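For context on the alpha and beta values above: VolSDF converts the signed distance to density through the Laplace CDF, sigma(x) = alpha * Psi_beta(-sdf(x)). A minimal sketch of that conversion (argument names are mine):

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    # VolSDF density: sigma(x) = alpha * Psi_beta(-sdf(x)), with Psi_beta the Laplace CDF.
    s = -signed_distance
    half_exp = 0.5 * torch.exp(-s.abs() / beta)        # numerically safe: exponent is always <= 0
    psi = torch.where(s <= 0, half_exp, 1.0 - half_exp)
    return alpha * psi
```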
- A high `beta` creates a softer density field with a thicker transition around the surface, while a low `beta` biases toward a sharper, thinner transition, concentrating density very close to the SDF's zero-level set and mimicking a more distinct surface.
- Based on my observations, training with volume rendering is generally easier with a high `beta`. The resulting thicker density shell provides useful gradients to more rays (even those passing near the surface), making the optimization more stable, especially in the early stages. A low `beta` provides sparser gradients initially, potentially making it harder for the model to start learning.
- A low `beta` is more likely to yield an accurate final surface, though it may take longer to converge. By forcing the density to concentrate near the zero-crossing, it pushes the optimizer to learn a precise SDF, leading to a sharper and more geometrically accurate surface. A high `beta` allows more "fuzziness," which can result in less precise geometry.
8. Neural Surface Extras (8.2 Fewer Training Views)
- Trained VolSDF (left) and NeRF (right), both for 200 epochs with similar hyperparameters and 20 views instead of the full 100. VolSDF clearly yields a crisper, better result.









