Assignment 3: Neural Volume Rendering and Surface Rendering

Name: Ishita Gupta
Andrew ID: ishitag


Part A: Neural Volume Rendering

Q0: Transmittance Calculation

Solution:

Transmittance Solution


Q1: Differentiable Volume Rendering

Q1.3: Ray Sampling (5 points)

Implemented get_pixels_from_image and get_rays_from_pixels in ray_utils.py.

| Grid Visualization | Rays Visualization |
| --- | --- |
| Grid | Rays |

I first generate per-pixel NDC coordinates in [-1, 1]^2 using torch.meshgrid(..., indexing='ij'), stack them as (x, y), and reshape to (H*W, 2). I then form image-plane points (x, y, 1) in NDC, unproject them with the camera to world space, set every ray origin to the camera center, and define each direction by normalizing (world_point - origin).
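
A minimal sketch of this unprojection step, assuming a PyTorch3D-style camera that exposes unproject_points and get_camera_center; names and tensor shapes are illustrative rather than the exact implementation:

```
import torch

def get_rays_from_pixels_sketch(xy_grid, camera):
    # xy_grid: (H*W, 2) NDC pixel coordinates in [-1, 1]^2.
    num_rays = xy_grid.shape[0]

    # Lift NDC pixels to points on the z = 1 image plane.
    xy_depth = torch.cat([xy_grid, torch.ones_like(xy_grid[:, :1])], dim=-1)

    # Unproject to world space, then build rays from the camera center.
    world_points = camera.unproject_points(xy_depth, world_coordinates=True)
    origins = camera.get_camera_center().expand(num_rays, -1)
    directions = torch.nn.functional.normalize(world_points - origins, dim=-1)
    return origins, directions
```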


Q1.4: Point Sampling

Implemented StratifiedSampler.forward() in sampler.py.

Visualization:

Sample Points

I implement stratified sampling by partitioning the depth range [min_depth, max_depth] into n_pts_per_ray equal bins, computing midpoints, and adding uniform random offsets within each bin. The 3D sample points are then computed as sample_points = ray_origins + z_vals * ray_directions, where z_vals are the stratified depth values, producing a structured point cloud along all camera rays.
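
A minimal sketch of this sampler, assuming a ray bundle with origins and directions attributes; jittering uniformly within each bin is equivalent to offsetting the bin midpoints as described above:

```
import torch

def stratified_sampling_sketch(ray_bundle, min_depth, max_depth, n_pts_per_ray):
    n_rays = ray_bundle.origins.shape[0]

    # Partition [min_depth, max_depth] into equal bins and jitter within each bin.
    bin_edges = torch.linspace(min_depth, max_depth, n_pts_per_ray + 1)
    lower, upper = bin_edges[:-1], bin_edges[1:]
    z_vals = lower + (upper - lower) * torch.rand(n_rays, n_pts_per_ray)

    # Points along each ray: o + z * d, with shape (n_rays, n_pts_per_ray, 3).
    sample_points = (
        ray_bundle.origins[:, None, :]
        + z_vals[..., None] * ray_bundle.directions[:, None, :]
    )
    return sample_points, z_vals
```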


Q1.5: Volume Rendering

Implemented:

  1. VolumeRenderer._compute_weights
  2. VolumeRenderer._aggregate
  3. Modified VolumeRenderer.forward to render depth maps

| Spiral Rendering | Depth Map |
| --- | --- |
| Part 1 GIF | Depth |
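
For reference, a minimal sketch of the emission-absorption weighting behind items 1 and 2 above; variable names and shapes are illustrative, not the exact code:

```
import torch

def compute_weights_sketch(deltas, densities):
    # deltas: (n_rays, n_pts, 1) spacing between samples; densities: predicted sigma.
    alphas = 1.0 - torch.exp(-densities * deltas)

    # Transmittance T_i = prod_{j<i} (1 - alpha_j), via a shifted cumulative product.
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)
    return trans * alphas  # per-sample weights

def aggregate_sketch(weights, features):
    # Weighted sum over samples; features can be RGB colors or the depths z_vals.
    return torch.sum(weights * features, dim=1)
```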

Q2: Optimizing a Basic Implicit Volume

Q2.1: Random Ray Sampling (5 points)

Implemented get_random_pixels_from_image in ray_utils.py.

```
H, W = image_size
# Draw pixel indices uniformly at random, then map them to NDC coordinates in [-1, 1]^2.
rand_y = torch.randint(0, H, (num_pixels,))
rand_x = torch.randint(0, W, (num_pixels,))
xy_grid = convert_to_ndc(rand_x, rand_y, H, W)
```

Q2.2: Loss and Training

Used Mean Squared Error (MSE) loss between predicted and ground truth RGB values.

After training:

| Before Training | After Training |
| --- | --- |
| Before 0 | After 0 |
| Before 1 | After 1 |

Part 2 GIF

I trained the box SDF model by randomly sampling rays from the ground truth images and minimizing the MSE loss between predicted and ground truth RGB values. The model optimizes both the box center and the side lengths through gradient descent. Starting from an initial guess of a cube centered at the origin with side lengths [1.5, 1.5, 1.5], the optimization discovered through differentiable volume rendering that the actual box is offset to (0.25, 0.25, 0) and is a rectangular prism elongated along the X-axis.
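
A sketch of the per-iteration loss used here; the surrounding calls in the usage comment are illustrative of the training loop rather than its exact signatures:

```
import torch

def photometric_loss_sketch(pred_rgb: torch.Tensor, gt_rgb: torch.Tensor) -> torch.Tensor:
    # MSE between rendered and ground-truth colors at the sampled pixels.
    return torch.mean((pred_rgb - gt_rgb) ** 2)

# Illustrative usage inside one training iteration (helper signatures are assumptions):
# xy_grid  = get_random_pixels_from_image(n_rays, image_size, camera)
# rays     = get_rays_from_pixels(xy_grid, image_size, camera)
# pred_rgb = renderer(camera, implicit_volume, rays)["feature"]
# loss     = photometric_loss_sketch(pred_rgb, gt_rgb_at_pixels)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```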


Q3: Optimizing a Neural Radiance Field (NeRF)

Implemented NeuralRadianceField class in implicit.py:

Training Configuration:

Results:

NeRF Lego

NeRF Architecture Design: The implementation follows the original NeRF paper architecture with several key components:

  1. Positional Encoding: Raw 3D coordinates are transformed using sinusoidal functions at multiple frequencies (2^0, 2^1, ..., 2^5). This encoding allows the MLP to represent high-frequency details that would be difficult to learn with raw coordinates alone (a sketch follows this list).

  2. Deep MLP with Skip Connections: The 6-layer MLP with 128 hidden units provides sufficient capacity to represent complex 3D scenes. The skip connection at layer 3 concatenates the original 3D coordinates, providing a direct path for gradients and helping preserve high-frequency information.

  3. Output Processing: The network outputs raw values that are processed with ReLU (density) and Sigmoid (RGB) to ensure physical constraints are met.
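
A minimal sketch of the harmonic encoding referenced in item 1 above; the actual code may use a provided embedding class, so the function below is only illustrative:

```
import torch

def positional_encoding_sketch(x, n_freqs=6):
    # Embed coordinates at frequencies 2^0 ... 2^(n_freqs - 1), i.e. 2^0 ... 2^5 here.
    freqs = 2.0 ** torch.arange(n_freqs, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs                       # (..., 3, n_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.reshape(*x.shape[:-1], -1)               # (..., 3 * 2 * n_freqs)
```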

Results Quality: The trained NeRF successfully learns the 3D geometry and appearance of the Lego bulldozer scene, as the spiral rendering above shows.


Q4: NeRF Extras (10 points + Extra Credit)

Q4.1: View Dependence

I added view dependence by implementing a two-head architecture: a view-independent density head that processes only 3D position features, and a view-dependent RGB head that concatenates position features with direction embeddings. The direction embeddings use harmonic encoding of normalized ray directions, which are expanded per sample point and fed into the RGB head alongside the position features to enable material appearance to vary with viewing angle.
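
A minimal sketch of the two-head split described above; the layer widths and the direction-embedding size are assumptions, not the exact configuration:

```
import torch
import torch.nn as nn

class ViewDependentHeadsSketch(nn.Module):
    # Density depends only on position features; RGB additionally sees the
    # harmonic embedding of the (normalized) viewing direction.
    def __init__(self, feat_dim=128, dir_embed_dim=24):
        super().__init__()
        self.density_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.ReLU())
        self.rgb_head = nn.Sequential(
            nn.Linear(feat_dim + dir_embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, pos_features, dir_embedding):
        density = self.density_head(pos_features)
        rgb = self.rgb_head(torch.cat([pos_features, dir_embedding], dim=-1))
        return density, rgb
```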

Results:

| View Dependence Lego | Materials Scene | Materials Scene High Res |
| --- | --- | --- |
| lego | materials | highres materials |


Q4.2: Coarse/Fine Sampling

I implemented hierarchical (coarse-to-fine) sampling as described in the original NeRF paper. This approach uses two networks: a smaller coarse network that samples uniformly along rays to estimate density, and a fine network that performs importance sampling based on the coarse predictions. By concentrating samples near surfaces, this method aims to improve rendering quality while maintaining computational efficiency. The implementation produces functional results, though with some training instability.

The approach works as follows:

  1. Coarse Network: First pass samples points uniformly along each ray and evaluates a smaller "coarse" NeRF network to get initial density estimates.

  2. Importance Sampling: Use the coarse network's density predictions to compute a probability distribution along each ray, identifying regions likely to contain surfaces (see the sketch after this list).

  3. Fine Network: Sample additional points based on this importance distribution (denser sampling near surfaces) and evaluate the full "fine" network at both coarse and fine sample points.
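
A minimal sketch of the inverse-CDF sampling behind step 2, assuming per-ray bin depths and coarse weights; this follows the standard NeRF-style sample_pdf logic rather than the exact code:

```
import torch

def sample_pdf_sketch(z_vals_mid, weights, n_fine, eps=1e-5):
    # z_vals_mid: (n_rays, n_bins) coarse bin depths; weights: (n_rays, n_bins).
    pdf = (weights + eps) / torch.sum(weights + eps, dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)

    # Draw uniform samples and invert the CDF to concentrate depths near surfaces.
    u = torch.rand(*weights.shape[:-1], n_fine, device=weights.device)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, z_vals_mid.shape[-1] - 1)

    # Linearly interpolate a new depth inside the selected bin.
    cdf_lo, cdf_hi = torch.gather(cdf, -1, idx - 1), torch.gather(cdf, -1, idx)
    z_lo, z_hi = torch.gather(z_vals_mid, -1, idx - 1), torch.gather(z_vals_mid, -1, idx)
    t = (u - cdf_lo) / (cdf_hi - cdf_lo + eps)
    return z_lo + t * (z_hi - z_lo)
```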

I created a CoarseNeRF class with a smaller architecture (half the hidden units and layers) and modified the training loop to run the coarse pass, perform importance sampling, and evaluate the fine network on the combined sample points.

Results:

The hierarchical sampling implementation produced partial results. While the training was somewhat unstable initially, it did generate renderings:

Hierarchical NeRF

The results show that the hierarchical sampler is functional but not fully tuned. The rendering quality is acceptable, though not on par with the earlier results, indicating that the coarse-to-fine sampling strategy works to some degree.

Speed/Quality Trade-offs:

Challenges encountered:


Part B: Neural Surface Rendering

Q5: Sphere Tracing

Implemented the sphere_tracing function in renderer.py. I march along each ray in steps equal to the SDF value at the current point: starting at the near plane, I normalize the directions, iteratively update the points, and stop when |SDF| < epsilon (a hit) or the accumulated distance exceeds the far plane (a miss). The function returns the final points and a boolean mask indicating which rays intersected the torus surface.
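
A minimal sketch of this marching loop; sdf_fn stands in for the implicit function, and the epsilon and iteration-count values are assumptions:

```
import torch

def sphere_tracing_sketch(sdf_fn, origins, directions, near, far, max_iters=64, eps=1e-5):
    directions = torch.nn.functional.normalize(directions, dim=-1)
    t = torch.full_like(origins[..., :1], near)        # per-ray marched distance
    points = origins + t * directions

    for _ in range(max_iters):
        dist = sdf_fn(points)                           # (n_rays, 1) signed distances
        active = (dist.abs() >= eps) & (t < far)        # rays that are still marching
        t = t + dist * active                           # step by the SDF value
        points = origins + t * directions

    mask = (sdf_fn(points).abs() < eps) & (t <= far)    # hits within the far plane
    return points, mask
```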

Results:

Torus


Q6: Optimizing a Neural SDF

I implemented a dual-branch architecture with positional encoding: a 6-layer distance MLP (128 hidden units) with ReLU activations and optional skip connections for SDF prediction, and a separate 2-layer color MLP (128 hidden units) for RGB output. The distance head outputs raw signed distances (no activation), while the color head uses a sigmoid to keep RGB in [0, 1]. Both branches use harmonic positional encoding (4 frequencies) of the 3D coordinates to improve representation quality.

Eikonal Loss: Implemented in losses.py

```
# gradients: per-point spatial gradients of the predicted SDF (from autograd).
grad_norm = torch.norm(gradients, dim=-1)
eikonal_loss = torch.mean((grad_norm - 1.0) ** 2)  # penalize ||grad|| deviating from 1
```

Results:

| Input Point Cloud | Reconstructed Surface |
| --- | --- |
| Input | Surface |

Q7: VolSDF

Extended Neural SDF from Q6 with:

  1. Color Network: 2-layer MLP (128 hidden units) with positional encoding, sigmoid activation for RGB output [0,1].

  2. SDF to Density: VolSDF Laplace CDF conversion - density = alpha * Psi_beta(-sdf), where Psi_beta is the Laplace CDF: the density saturates to alpha inside the surface, equals alpha/2 at the surface (sdf ≈ 0), and decays exponentially away from it (sketched after this list).

  3. Networks: 6-layer distance MLP (128 units), 2-layer color MLP (128 units), 6 harmonic frequencies

  4. Hyperparameters: alpha = 10.0, beta = 0.05, LR = 0.0005 (decay gamma = 0.8 every 50 epochs), eikonal weight = 0.02, interior weight = 0.1
  5. Initialization: 1000 pretraining iterations on a sphere SDF, with scene bounds [-4, 4]^3
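
A minimal sketch of the conversion in item 2, using the alpha and beta values listed above; this is illustrative rather than the exact renderer code:

```
import torch

def sdf_to_density_volsdf_sketch(sdf, alpha=10.0, beta=0.05):
    # density = alpha * Psi_beta(-sdf), with Psi_beta the zero-mean Laplace CDF of scale beta.
    s = -sdf
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return alpha * psi
```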

| Geometry | Rendered Color |
| --- | --- |
| Geometry | Color |
  1. Alpha and Beta intuition:

  2. How does high beta bias your learned SDF? What about low beta?:

  3. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?:

  4. Would you be more likely to learn an accurate surface with high beta or low beta? Why?:


Q8: Neural Surface Extras

Q8.1: Render a Large Scene with Sphere Tracing

I created a complex scene with 36 primitives arranged in an inverted cone structure (like a Christmas tree) with toruses on top. The scene uses SDF union operations (taking the minimum of multiple SDFs):
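
A minimal sketch of the union operation; the primitive SDFs shown and referenced are illustrative:

```
import torch

def sphere_sdf(points, center, radius):
    # Signed distance to a sphere, one example primitive.
    return torch.norm(points - center, dim=-1, keepdim=True) - radius

def sdf_union(*sdf_values):
    # Union of several primitives: take the pointwise minimum of their SDFs.
    return torch.stack(sdf_values, dim=0).min(dim=0).values

# e.g. scene_sdf = sdf_union(sphere_sdf(p, c0, r0), sphere_sdf(p, c1, r1), ...)
```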

Command:

```
python -m surface_rendering_main --config-name=complex_scene
```

Results:

Complex Scene

"Come one, cheer up, it's nearly Christmas."

— Hagrid


Q8.2: Fewer Training Views

| NeRF | VolSDF | VolSDF Geometry |
| --- | --- | --- |
| nerf | volsdf | volsdf geometry |

Trained both NeRF and VolSDF on only 20 views (vs. the standard 100). VolSDF uses stronger regularization (5x the eikonal weight, i.e. 0.1, and 2x the interior weight, i.e. 0.2) and longer pretraining (2000 iterations) to compensate for the sparse data. The SDF-based representation, with its geometric prior that the SDF gradient has unit norm (the eikonal constraint), produces more consistent geometry in unobserved regions than NeRF, which tends to overfit or produce artifacts with limited views.


Q8.3: Alternate SDF to Density Conversions

Implemented three SDF-to-density conversion methods in renderer.py (all three are sketched after the list below):

  1. VolSDF (Laplace CDF) - the original method, using the Laplace cumulative distribution.

  2. NeuS (Logistic Density) - uses the derivative of the sigmoid function.

  3. Naive (Simple Exponential) - basic exponential decay with distance from the surface.
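
A minimal sketch of the three conversions; the parameter values and the exact form of the naive variant are assumptions rather than the exact renderer.py code:

```
import torch

def volsdf_density(sdf, alpha=10.0, beta=0.05):
    # Laplace-CDF conversion: density = alpha * Psi_beta(-sdf).
    s = -sdf
    return alpha * torch.where(s <= 0, 0.5 * torch.exp(s / beta),
                               1.0 - 0.5 * torch.exp(-s / beta))

def neus_density(sdf, s=64.0):
    # Logistic density: the derivative of the sigmoid, s * sig(-s*x) * (1 - sig(-s*x)).
    sig = torch.sigmoid(-s * sdf)
    return s * sig * (1.0 - sig)

def naive_density(sdf, beta=0.05):
    # Simple exponential decay with absolute distance from the surface.
    return torch.exp(-sdf.abs() / beta)
```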

| VolSDF | NeuS | Naive |
| --- | --- | --- |
| VolSDF | NeuS | Naive |