16-825 Assignment 3: Neural Volume Rendering and Surface Rendering

A. Neural Volume Rendering (80 points)

A.0. Transmittance Calculation (10 points)

Transmittance computation for the non-homogeneous medium shown in the assignment (steps Q1–Q4 below).

[Figure: transmittance calculation, steps Q1–Q4]
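For reference, the transmittance along a ray is T = exp(−∫ σ(t) dt); for a piecewise-constant medium this reduces to T = exp(−Σᵢ σᵢ Δxᵢ). A minimal sketch with made-up segment values, purely for illustration:

```python
import torch

def transmittance(sigmas, lengths):
    """T = exp(-sum_i sigma_i * delta_i) for a piecewise-constant medium."""
    sigmas = torch.as_tensor(sigmas, dtype=torch.float32)
    lengths = torch.as_tensor(lengths, dtype=torch.float32)
    return torch.exp(-(sigmas * lengths).sum())

# Hypothetical example: two segments with sigma = 0.5 and 2.0, each of length 1
print(transmittance([0.5, 2.0], [1.0, 1.0]))   # exp(-2.5) ≈ 0.082
```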

1. Differentiable Volume Rendering (30 points)

1.1 Ray Sampling (5 points)

Implementation of ray generation from camera pixels and visualization of the grid and rays.

[Figure: grid visualization | ray visualization]
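A minimal sketch of the pixel-to-ray conversion, assuming a PyTorch3D-style camera; the function name, NDC convention, and tensor shapes here are illustrative rather than the starter code's exact interface:

```python
import torch

def get_rays_from_pixels(xy_grid, camera):
    """Sketch: (N, 2) pixel locations in NDC -> world-space ray origins and directions."""
    # Unproject pixels at unit depth to get points on the image plane in world space
    xy_depth = torch.cat([xy_grid, torch.ones_like(xy_grid[..., :1])], dim=-1)
    world_points = camera.unproject_points(xy_depth, world_coordinates=True)

    # Rays start at the camera center and point toward the unprojected pixels
    origins = camera.get_camera_center().expand(world_points.shape)
    directions = torch.nn.functional.normalize(world_points - origins, dim=-1)
    return origins, directions
```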

1.2 Point Sampling (5 points)

Stratified sampling of points along rays.

[Figure: sample points visualization]
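A minimal sketch of the stratified sampler, assuming rays are given as origin/direction tensors; the near/far bounds and sample count are illustrative defaults:

```python
import torch

def stratified_sample(origins, directions, near=0.1, far=5.0, n_pts=64):
    """Stratified sampling sketch: one jittered sample per depth bin."""
    n_rays = origins.shape[0]
    bins = torch.linspace(near, far, n_pts + 1, device=origins.device)
    lower, upper = bins[:-1], bins[1:]
    # Jitter one sample uniformly inside each bin
    t = lower + (upper - lower) * torch.rand(n_rays, n_pts, device=origins.device)
    points = origins[:, None, :] + t[..., None] * directions[:, None, :]
    return points, t    # (n_rays, n_pts, 3), (n_rays, n_pts)
```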

1.3 Volume Rendering (20 points)

Implementation of differentiable volume rendering.

[Figure: rendered depth map | spiral animation]
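The renderer follows the standard quadrature: per-sample opacity αᵢ = 1 − exp(−σᵢ Δtᵢ), transmittance Tᵢ = Πⱼ<ᵢ (1 − αⱼ), weights wᵢ = Tᵢ αᵢ, and color/depth as weighted sums. A minimal sketch; the tensor shapes in the docstring are assumptions:

```python
import torch

def composite(sigmas, rgbs, deltas, ts):
    """Volume rendering quadrature sketch.

    sigmas: (n_rays, n_pts, 1)  densities
    rgbs:   (n_rays, n_pts, 3)  per-sample colors
    deltas: (n_rays, n_pts, 1)  spacing between consecutive samples
    ts:     (n_rays, n_pts, 1)  sample depths
    """
    alphas = 1.0 - torch.exp(-sigmas * deltas)            # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:, :1]), 1.0 - alphas + 1e-10], dim=1), dim=1
    )[:, :-1]
    weights = trans * alphas

    color = (weights * rgbs).sum(dim=1)                   # (n_rays, 3)
    depth = (weights * ts).sum(dim=1)                     # (n_rays, 1)
    return color, depth
```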

2. Optimizing a Basic Implicit Volume (10 points)

2.1 Random Ray Sampling (5 points)

Implementation of random ray sampling for training.

[Figure: spiral rendering (part 2 animation)]
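A minimal sketch of drawing a random pixel subset per training iteration; the normalized-coordinate convention is an assumption:

```python
import torch

def sample_random_pixels(n_rays, image_size):
    """Sketch: draw a random subset of pixels for one training batch."""
    H, W = image_size
    idx = torch.randint(0, H * W, (n_rays,))
    # Convert flat indices to normalized (x, y) coordinates in [-1, 1]
    xs = (idx % W).float() / (W - 1) * 2 - 1
    ys = torch.div(idx, W, rounding_mode="floor").float() / (H - 1) * 2 - 1
    return torch.stack([xs, ys], dim=-1)
```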

2.2 Loss and Training (5 points)

Box Center: [0.25 0.25 0.00]
Side Lengths: [2.00 1.50 1.50]

3. Optimizing a Neural Radiance Field (NeRF) (20 points)

Implementation of a Neural Radiance Field using MLP.

[Figure: final NeRF rendering (part 3 animation)]

Discussion: The NeuralRadianceField architecture consists of the following (a simplified sketch follows this list):
- Positional Encoding: HarmonicEmbedding transforms 3D coordinates (3 → embedding_dim_xyz) and view directions (3 → embedding_dim_dir) to higher-dimensional frequency embeddings
- Geometry Branch (MLP_xyz): Processes positionally encoded coordinates through MLPWithInputSkips with n_layers layers of hidden_dim units each and input_skips at the specified layers. This branch learns the 3D geometric structure
- Density Prediction: Single linear layer maps hidden_dim → 1 to output volume density σ. Density is activated with ReLU to ensure non-negativity
- View-Dependent Branch (optional): When view=True, processes direction embeddings through separate MLP_dir. Skip connections at specified layers concatenate direction embeddings to maintain view-dependent information flow
- Color Prediction: Final layer maps concatenated features [MLP_xyz_output, MLP_dir_output] → 3 for RGB, or just MLP_xyz_output → 3 when view-independent. Output is passed through Sigmoid to constrain RGB ∈ [0,1]
- Input Skips: Skip connections concatenate positional encodings at specified layers, facilitating learning of high-frequency geometric and view-dependent effects
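A simplified sketch of this architecture; input skips are omitted for brevity, and the layer counts and widths are illustrative rather than the exact configuration used:

```python
import torch
import torch.nn as nn

class HarmonicEmbedding(nn.Module):
    """Sin/cos positional encoding (minimal sketch)."""
    def __init__(self, n_freqs=6):
        super().__init__()
        freqs = torch.pi * 2.0 ** torch.arange(n_freqs, dtype=torch.float32)
        self.register_buffer("freqs", freqs)

    def forward(self, x):
        xf = x[..., None] * self.freqs                      # (..., 3, n_freqs)
        return torch.cat([xf.sin(), xf.cos()], dim=-1).flatten(-2)

class TinyNeRF(nn.Module):
    """Simplified view-dependent NeRF MLP."""
    def __init__(self, hidden_dim=128, n_freqs_xyz=6, n_freqs_dir=2):
        super().__init__()
        self.embed_xyz = HarmonicEmbedding(n_freqs_xyz)
        self.embed_dir = HarmonicEmbedding(n_freqs_dir)
        d_xyz, d_dir = 3 * 2 * n_freqs_xyz, 3 * 2 * n_freqs_dir
        self.mlp_xyz = nn.Sequential(
            nn.Linear(d_xyz, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden_dim, 1)
        self.color_head = nn.Sequential(
            nn.Linear(hidden_dim + d_dir, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 3), nn.Sigmoid(),    # RGB constrained to [0, 1]
        )

    def forward(self, points, directions):
        feat = self.mlp_xyz(self.embed_xyz(points))
        sigma = torch.relu(self.density_head(feat))         # ReLU keeps density non-negative
        rgb = self.color_head(torch.cat([feat, self.embed_dir(directions)], dim=-1))
        return sigma, rgb
```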

4. NeRF Extras (Choose One - 10 points)

4.1 View Dependence (10 points) - EXTRA: +10

Implementation of view-dependent rendering with the materials scene.

[Figure: without view dependence | with view dependence]

Discussion: Using view dependence improves rendering quality, but it also makes the model more prone to overfitting: in the worst case, the predicted color can collapse to a near-delta function of the view direction, so generalization to novel views is degraded.

4.2 Coarse/Fine Sampling (10 points) - EXTRA: +10

Implementation of two-stage coarse/fine sampling strategy.

[Figure: coarse network only | fine network]

Discussion: Importance sampling guided by the coarse network significantly improves rendering accuracy by concentrating fine samples in high-density regions near surfaces rather than in empty space. However, it increases computational cost, slowing both training and inference due to the additional forward passes through the coarse and fine networks.
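A minimal sketch of the inverse-CDF importance sampling that places fine samples according to the coarse weights; this simplified version resamples coarse bin centers rather than interpolating within bins:

```python
import torch

def sample_pdf(bin_centers, weights, n_fine):
    """Importance sampling sketch.

    bin_centers: (n_rays, n_coarse) depths of the coarse samples
    weights:     (n_rays, n_coarse) compositing weights from the coarse pass
    """
    pdf = weights + 1e-5
    pdf = pdf / pdf.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)

    # Uniform samples mapped through the inverse CDF land in high-weight bins
    u = torch.rand(weights.shape[0], n_fine, device=weights.device)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, weights.shape[-1]) - 1
    return torch.gather(bin_centers, -1, idx)
```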

B. Neural Surface Rendering (50 points)

5. Sphere Tracing (10 points)

Implementation of sphere tracing for rendering an SDF.

[Figure: torus SDF rendered with sphere tracing]

Implementation Details: Pseudocode for the sphere tracer:

Algorithm:
1. Initialize distances = 0, mask = False for all rays
2. For each iteration (max_iters):
    a. Compute points = origins + directions × distances
    b. Evaluate SDF(points) = implicit_fn(points)
    c. Check convergence: |SDF(points)| < ε
    d. Update mask with converged rays
    e. Early termination if all rays converged
    f. Update distances = distances + SDF(points)
3. Return final points and convergence mask

Key Insight: The SDF value at a point gives the distance to the nearest surface. By marching along the ray direction and stepping by the SDF value, we stay within the safe "sphere" and converge to the surface when |SDF| < ε.
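A minimal runnable version of the pseudocode above; the tensor shapes and the implicit_fn interface are assumptions:

```python
import torch

def sphere_trace(implicit_fn, origins, directions, max_iters=64, eps=1e-5):
    """Sphere tracing sketch; implicit_fn(points) -> (N, 1) signed distances."""
    distances = torch.zeros(origins.shape[0], 1, device=origins.device)
    mask = torch.zeros(origins.shape[0], 1, dtype=torch.bool, device=origins.device)

    for _ in range(max_iters):
        points = origins + distances * directions
        sdf = implicit_fn(points)                  # distance to the nearest surface
        mask = mask | (sdf.abs() < eps)            # mark rays that reached the surface
        if mask.all():                             # stop once every ray has converged
            break
        distances = distances + sdf * (~mask)      # march unconverged rays by the SDF value
    return origins + distances * directions, mask
```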

6. Optimizing a Neural SDF (15 points)

Implementation of MLP for neural SDF with eikonal regularization.

[Figure: input point cloud | optimized SDF]

MLP Architecture: The NeuralSurface architecture consists of the following (a simplified sketch follows this list):
- Positional Encoding: HarmonicEmbedding transforms 3D coordinates (3 → embedding_dim_xyz) to higher-dimensional frequency embeddings for capturing fine geometric detail
- Distance Branch (MLP_xyz): Processes positionally encoded coordinates through MLPWithInputSkips with n_layers_distance layers, n_hidden_neurons_distance hidden units, and input_skips at layers specified in append_distance. Skip connections concatenate positional encodings to maintain high-frequency geometric information flow
- SDF Prediction: Single linear layer maps n_hidden_neurons_distance → 1 to output signed distance values. No activation is applied, allowing the network to output both positive and negative distances
- Color Branch (MLP_color): Separate branch with n_layers_color layers processes positionally encoded coordinates for appearance modeling, with input_skips specified in append_color to facilitate learning of material and appearance properties
- RGB Prediction: Linear layer maps n_hidden_neurons_color → 3 to output RGB color values. Output is passed through Sigmoid to constrain RGB ∈ [0,1]
- Design: Two-branch architecture separates geometry (distance) and appearance (color) learning, allowing independent optimization while sharing positional encodings
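A simplified sketch of this two-branch design; the embedding dimension and layer widths are illustrative, and input skips are omitted:

```python
import torch.nn as nn

class TinyNeuralSurface(nn.Module):
    """Simplified two-branch SDF + color network."""
    def __init__(self, d_embed=39, hidden=128):   # 39 = 3 + 3*2*6 harmonic embedding (assumed)
        super().__init__()
        # Distance branch: embedded positions -> features -> raw signed distance
        self.distance_branch = nn.Sequential(
            nn.Linear(d_embed, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.distance_head = nn.Linear(hidden, 1)  # no activation: SDF can be + or -
        # Color branch: separate MLP, Sigmoid keeps RGB in [0, 1]
        self.color_branch = nn.Sequential(
            nn.Linear(d_embed, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, embedded_points):
        feat = self.distance_branch(embedded_points)
        return self.distance_head(feat), self.color_branch(embedded_points)
```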

Eikonal Loss: The eikonal equation enforces that the gradient of the SDF has unit magnitude everywhere, ensuring the learned function is a valid signed distance field.

Implementation Pseudocode:
1. Compute gradients ∇SDF of the predicted distances with respect to input positions
2. Compute norm: ||∇SDF|| = √(∇SDF_x² + ∇SDF_y² + ∇SDF_z²)
3. Compute penalty: (||∇SDF|| - 1)²
4. Return mean loss: E[(||∇SDF|| - 1)²]

Mathematical Form: L_eikonal = E_p[(||∇SDF(p)|| - 1)²], where ||∇SDF|| is the Euclidean norm of the SDF gradient, ensuring unit gradients at all sampled points p.

Rationale: This constraint regularizes the neural SDF to have locally planar level sets, preventing the network from learning arbitrary functions and encouraging geometric coherence consistent with true signed distance fields.
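A minimal sketch of the eikonal term, assuming model(points) returns per-point signed distances:

```python
import torch

def eikonal_loss(model, points):
    """Eikonal penalty sketch: E[(||grad SDF|| - 1)^2] over sampled points."""
    points = points.clone().requires_grad_(True)
    sdf = model(points)
    # Gradient of the predicted SDF with respect to the input positions
    grad = torch.autograd.grad(
        outputs=sdf, inputs=points,
        grad_outputs=torch.ones_like(sdf), create_graph=True,
    )[0]
    # Penalize deviation of the gradient norm from 1
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```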

7. VolSDF (15 points)

Implementation of VolSDF with SDF to density conversion and color prediction.

[Figure: VolSDF geometry | final rendering]

SDF to Density Conversion: VolSDF maps the signed distance to a volume density through a scaled Laplace CDF controlled by α and β; a sketch of this conversion follows, and the questions below give the intuition for the two parameters.
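A minimal sketch of the conversion σ(s) = α·Ψ_β(−s), where Ψ_β is the CDF of a zero-mean Laplace distribution with scale β and s is the signed distance (positive outside the surface):

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    """Laplace-CDF conversion: density = alpha * Psi_beta(-s)."""
    s = signed_distance
    return alpha * torch.where(
        s >= 0,
        0.5 * torch.exp(-s / beta),        # outside the surface: density decays with distance
        1.0 - 0.5 * torch.exp(s / beta),   # inside the surface: density approaches alpha
    )
```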

Questions:

1. What is the intuitive role of the alpha and beta parameters?
   - β (beta): Controls the thickness of the surface boundary; it determines how far from the zero level set points still receive appreciable density (i.e., how "soft" or "sharp" the surface boundary is).
   - α (alpha): Controls the overall density scale; it sets the magnitude of density throughout the volume, and reducing α lowers the density of every point in space.

2. How does high beta bias your learned SDF? What about low beta?
   - High β: Widens the band around the zero level set that receives density, so more points are treated as surface points and the learned SDF is biased toward smooth, blurred geometry.
   - Low β: Narrows that band, so only points very close to the zero level set are treated as surface points, biasing the SDF toward sharper geometry.

3. Would an SDF be easier to train with volume rendering and low beta or high beta? Why?
   - High β is easier to train: more points near the surface receive density and contribute gradients, which reduces the chance of model collapse and improves gradient flow during optimization.

4. Would you be more likely to learn an accurate surface with high beta or low beta? Why?
   - Low β is more likely to give an accurate surface: it yields sharper surfaces with finer detail, though it is harder to train and more prone to model collapse.
   - High β produces smoother, averaged surfaces with less detail and worse surface quality.

Best Hyperparameters:

I chose β = 0.05 based on hyperparameter tuning. I experimented with β = 0.01, 0.05, and 0.1, and found that β = 0.05 provides the best balance between surface quality and training stability, capturing fine geometric details while avoiding model collapse.

[Figure grid: geometry and final renderings for β = 0.01, 0.05, and 0.1]

8. Neural Surface Extras (Choose One - 10 points)

8.2 Fewer Training Views (10 points) - EXTRA: +10

Training VolSDF with fewer views and comparison with NeRF.

[Figure: VolSDF with 20 views | VolSDF geometry with 20 views | NeRF with 20 views]

Discussion: VolSDF enforces that the underlying field follows SDF properties. This strong inductive bias constrains the solution space, producing more consistent geometry with better regularization. Hence, the geometry quality in VolSDF is superior to NeRF. However, in terms of rendering quality, NeRF outperforms VolSDF due to its richer volume representation.

8.3 Alternate SDF to Density Conversions (10 points) - EXTRA: +10

Comparison of different SDF to density conversion methods.

[Figure: geometry and rendering, VolSDF method vs. alternative NeuS-style method]

Discussion: The NeuS-style conversion concentrates density around the zero level set of the SDF, which makes it conceptually simpler, but in practice it is harder to optimize and to get good results from. VolSDF instead models a continuous volume density throughout space, which produced significantly more accurate surface reconstruction than the zero-level-set approach in my experiments.
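For reference, my reading of the "naive" NeuS-style alternative is to set the density directly to the logistic density of the SDF, which peaks at the zero level set. A sketch under that assumption; the sharpness s is a hypothetical hyperparameter:

```python
import torch

def naive_neus_density(signed_distance, s=50.0):
    """Sketch of a naive NeuS-style conversion (an assumed formulation, not the paper's
    final method): density = s * exp(-s*f) / (1 + exp(-s*f))^2, peaking at f = 0."""
    f = signed_distance
    return s * torch.exp(-s * f) / (1.0 + torch.exp(-s * f)) ** 2
```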