Part A: Neural Volume Rendering
Part 0: Transmittance Calculation (10 points)
Solution:
Below is my calculation of the transmittance for a ray traveling through a non-homogeneous medium:
Transmittance Calculation
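For reference, under the standard absorption-only model the transmittance between the ray origin and depth s is T(s) = exp(−∫₀ˢ σ(t) dt); for a piecewise-constant medium this factors into a product of per-segment terms, T = ∏ᵢ exp(−σᵢ·Δsᵢ), where σᵢ and Δsᵢ are the density and length of segment i.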
1.3 Ray Sampling (5 points)
XY Grid
Ray Origins and Directions
1.4 Point Sampling (5 points)
Sampled Points Along Rays
1.5 Volume Rendering (20 points)
Rendered Volume
Rendered Depth Map of Box Volume
Part 2: Optimizing a Basic Implicit Volume (10 points)
Training Results:
Box Center: (0.25, 0.25, -0.0005)
Side Lengths: (2.00, 1.50, 1.50)
Before Training:
After Training:
Spiral Rendering:
Optimized Box Volume - Spiral Rendering
Part 3: Optimizing a Neural Radiance Field (NeRF) (20 points)
NeRF Rendering of Lego Bulldozer
Part 4: NeRF Extras (10 + 10 Extra Credit)
4.1 View Dependence
Implementation Notes:
Extended the NeRF model to handle view-dependent effects by incorporating ray direction into the network. The architecture processes 3D positions through 8 fully-connected layers with a skip connection at layer 4 to predict view-independent density. The resulting feature vector is then concatenated with harmonically-embedded ray directions (tested with n=2,4,6 frequency bands) and passed through additional layers to predict view-dependent RGB colors. This design ensures geometric consistency while enabling the model to capture specular reflections and glossy surface appearance.
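A minimal sketch of this design is below (layer widths, the names ViewDependentHead and harmonic_embedding, and the helper's exact interface are illustrative assumptions, not the submission code):

```python
import torch
import torch.nn as nn

def harmonic_embedding(d, n_freqs):
    # [sin(2^k d), cos(2^k d)] for k = 0..n_freqs-1, applied per coordinate
    freqs = 2.0 ** torch.arange(n_freqs, device=d.device)
    angles = d[..., None, :] * freqs[:, None]                  # (..., n_freqs, 3)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class ViewDependentHead(nn.Module):
    """Density from position features only; RGB from features + embedded view direction."""
    def __init__(self, feat_dim=256, n_dir_freqs=4):
        super().__init__()
        self.n_dir_freqs = n_dir_freqs
        self.density = nn.Linear(feat_dim, 1)                  # view-independent density
        self.rgb = nn.Sequential(
            nn.Linear(feat_dim + 2 * 3 * n_dir_freqs, 128),
            nn.ReLU(),
            nn.Linear(128, 3),
            nn.Sigmoid(),                                      # RGB in [0, 1]
        )

    def forward(self, feat, directions):
        sigma = torch.relu(self.density(feat))                 # non-negative density
        dir_embed = harmonic_embedding(directions, self.n_dir_freqs)
        rgb = self.rgb(torch.cat([feat, dir_embed], dim=-1))
        return sigma, rgb
```

Because density depends only on position features, geometry stays consistent across viewpoints while color is free to vary with the viewing direction.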
Trade-offs:
Higher frequency direction embeddings (n=6) enable sharper specular highlights and more accurate view-dependent effects but increase the risk of overfitting to training views and produce less smooth interpolation for novel viewpoints. Lower frequencies (n=2) generalize better to unseen views with smoother transitions but cannot capture fine-grained specular details. The model also introduces geometry-appearance ambiguity where lighting effects may be incorrectly encoded as either view-dependent appearance or geometric features, particularly when training views are sparse.
Results - Materials Scene:
Without View Dependence
With View Dependence (n=2 direction frequencies)
With View Dependence (n=4 direction frequencies)
With View Dependence (n=6 direction frequencies)
Analysis:
Experiments with n=2, 4, and 6 directional harmonic functions reveal a clear trade-off between specular quality and generalization. With n=2, the model produces smooth, diffuse-like renderings that generalize well to novel views but fail to capture sharp specular highlights and mirror-like reflections. At n=6, specular highlights become crisp and reflections are accurately rendered on glossy surfaces, but the model tends to overfit training views, producing artifacts and color inconsistencies in novel viewpoints.
Part B: Neural Surface Rendering
5. Sphere Tracing (10 points)
Implementation Notes:
Implemented the sphere_tracing function in renderer.py; a minimal sketch follows the list. The algorithm:
- Ray Marching: March along each ray starting from the near plane, querying the SDF at each point. Step forward by the SDF distance value, which represents the largest safe step that guarantees no surface intersection is skipped (sphere tracing property).
- Intersection Detection: A ray-surface intersection is registered when the SDF value falls below the convergence threshold, indicating the ray is sufficiently close to the surface. Rays that don't converge are masked out.
- Convergence Criteria: The algorithm terminates when either: (1) the SDF value is below the threshold (successful hit), or (2) the maximum iteration count (100) is reached (missed surface). A boolean mask tracks which rays successfully intersected, with non-converged rays flagged for appropriate handling in downstream rendering.
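A vectorized sketch of this procedure (the signature, threshold value, and masking details are assumptions; the actual renderer.py implementation may differ):

```python
import torch

def sphere_tracing(sdf, origins, directions, near=0.0, far=10.0,
                   max_iters=100, eps=1e-5):
    """Minimal sphere tracer: march each ray by the SDF value until convergence.

    sdf: callable mapping (N, 3) points to (N,) signed distances.
    origins, directions: (N, 3) tensors; directions assumed unit length.
    Returns intersection points (N, 3) and a boolean hit mask (N,).
    """
    t = torch.full(origins.shape[:1], near, device=origins.device)
    points = origins + t[:, None] * directions
    mask = torch.zeros_like(t, dtype=torch.bool)
    for _ in range(max_iters):
        dist = sdf(points)
        mask = mask | (dist < eps)              # register hits below the threshold
        step = torch.where(mask, torch.zeros_like(dist), dist)
        t = t + step                            # SDF value is the largest safe step
        points = origins + t[:, None] * directions
    mask = mask & (t < far)                     # rays that marched past `far` are misses
    return points, mask
```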
Results:
Torus Rendered with Sphere Tracing
6. Optimizing a Neural SDF (15 points)
Implementation Notes:
MLP Architecture:
The NeuralSurface model consists of two separate MLPs: one for predicting signed distance and another for predicting surface color. The distance MLP takes harmonically-embedded 3D positions as input and processes them through multiple fully-connected layers with ReLU activations and a skip connection at layer 4 (concatenating the original positional embedding to combat vanishing gradients). The final layer outputs a single unbounded distance value without activation, allowing the network to represent both positive (outside) and negative (inside) distances. The color MLP follows a similar architecture but outputs 3 values passed through a sigmoid activation to constrain RGB colors to [0,1]. Both networks use Xavier uniform initialization for stable training.
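A condensed sketch of the distance branch (layer widths are representative assumptions; embed_dim = 24 corresponds to 4 harmonic frequencies × sin/cos × 3 coordinates, matching the hyperparameters below):

```python
import torch
import torch.nn as nn

class DistanceMLP(nn.Module):
    """Distance branch of NeuralSurface; the color branch mirrors this,
    ending in nn.Linear(hidden, 3) + nn.Sigmoid() instead."""
    def __init__(self, embed_dim=24, hidden=256, n_layers=8, skip=4):
        super().__init__()
        self.skip = skip
        layers = []
        for i in range(n_layers):
            in_dim = embed_dim if i == 0 else hidden
            if i == skip:
                in_dim += embed_dim              # re-inject the positional embedding
            layers.append(nn.Linear(in_dim, hidden))
            nn.init.xavier_uniform_(layers[-1].weight)
        self.layers = nn.ModuleList(layers)
        self.out = nn.Linear(hidden, 1)          # unbounded signed distance, no activation
        nn.init.xavier_uniform_(self.out.weight)

    def forward(self, x_embed):
        h = x_embed
        for i, layer in enumerate(self.layers):
            if i == self.skip:
                h = torch.cat([h, x_embed], dim=-1)
            h = torch.relu(layer(h))
        return self.out(h)
```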
Eikonal Loss:
Implemented the eikonal constraint to enforce that the gradient norm of the SDF should be close to 1 everywhere: ||∇d(x)|| ≈ 1. This regularization is crucial because any function with zero level set at the target surface could fit the point cloud data, but only a true signed distance function has unit gradient magnitude. The eikonal loss penalizes deviations from this property using L2 loss: L_eikonal = (||∇d(x)|| - 1)². This encourages the network to learn geometrically meaningful distance values rather than an arbitrary implicit function, improving surface reconstruction quality and generalization.
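A sketch of how this term can be computed with autograd (assuming the distance network maps (N, 3) points to (N, 1) distances):

```python
import torch

def eikonal_loss(model, points):
    """Penalize deviation of the SDF gradient norm from 1 at sampled points."""
    points = points.detach().requires_grad_(True)
    dist = model(points)                         # (N, 1) signed distances
    (grad,) = torch.autograd.grad(
        outputs=dist, inputs=points,
        grad_outputs=torch.ones_like(dist),
        create_graph=True,                       # keep graph so the loss is differentiable
    )
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```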
Hyperparameters:
- Harmonic functions (xyz): 4
- Training epochs: 7000
- Eikonal weight: 0.02
Results:
Input Point Cloud
Reconstructed Neural SDF Surface
7. VolSDF (15 points)
Intuitive Explanation of Parameters:
- Alpha (α): Scales the overall density magnitude; well inside the surface the density approaches α, so it effectively controls surface opacity in volume rendering.
- Beta (β): Controls transition sharpness in the SDF-to-density conversion σ(x) = α·Ψ_β(−d(x)), where Ψ_β is the CDF of a zero-mean Laplace distribution with scale β. High beta spreads density over a wide band around the surface (gradual transition); low beta concentrates it near the zero level set (step-like transition).
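A sketch of this conversion (a direct transcription of the VolSDF formula; the tensor interface is an assumption):

```python
import torch

def sdf_to_density(sdf, alpha, beta):
    """VolSDF conversion: sigma(x) = alpha * Psi_beta(-d(x)),
    where Psi_beta is the CDF of a zero-mean Laplace with scale beta."""
    s = -sdf
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),               # outside: density decays to 0
        1.0 - 0.5 * torch.exp(-s / beta),        # inside: density saturates toward alpha
    )
    return alpha * psi
```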
Analysis Questions:
Q: How does high beta bias your learned SDF? What about low beta?
High beta creates gradual density transitions over larger spatial regions, biasing toward smoother surfaces. Low beta acts like a step function, concentrating density near the surface and learning more precise boundaries closer to the true zero level set.
Q: Would an SDF be easier to train with volume rendering and low beta or high beta? Why?
High beta is easier: density varies smoothly, so gradients flow to more sample points, creating stable optimization. Low beta is harder since only points very close to the surface receive gradient signal, requiring the network to already be near the correct solution to learn effectively.
Q: Would you be more likely to learn an accurate surface with high beta or low beta? Why?
Low beta produces more accurate surfaces with finer details closer to the true level set. High beta yields smoother surfaces that may miss fine geometry and drift from the zero level set.
Results:
beta = 0.05
beta = 0.05
beta = 0.1
beta = 0.1
Hyperparameter Settings:
I varied beta between 0.05 and 0.1. With the lower beta (0.05), the reconstruction captures finer details than with 0.1: a lower beta sharpens the SDF-to-density transition, concentrating density near the zero level set and yielding a more precise surface representation.
8. Neural Surface Extras (10 + 20 Extra Credit)
8.1 Render a Large Scene with Sphere Tracing (10 points)
Scene Composition:
The scene consists of 21 primitives rendered using sphere tracing with composed SDFs: 12 spheres (radius 0.3) arranged in a circular formation at radius 3.0 around the origin; 6 tori (major radius R = 0.4-0.5, minor radius r = 0.15), four at the corners (±1.5, ±1.5) and two along the z-axis at ±2.0; and 3 boxes of varying sizes (0.3-0.4 units) placed at the origin and at ±2.0 along the x-axis. The SDFs are combined with minimum operations (see the sketch below), allowing efficient ray marching through the complex geometry.
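A sketch of the min-based union (the primitive formulas are the standard analytic SDFs; the `primitives` interface and example placements are illustrative):

```python
import torch

def sphere_sdf(p, center, radius):
    return (p - center).norm(dim=-1) - radius

def box_sdf(p, center, half_sides):
    q = (p - center).abs() - half_sides
    return q.clamp(min=0.0).norm(dim=-1) + q.max(dim=-1).values.clamp(max=0.0)

def torus_sdf(p, center, R, r):
    q = p - center
    ring = q[..., :2].norm(dim=-1) - R           # distance from the torus ring in-plane
    return torch.stack([ring, q[..., 2]], dim=-1).norm(dim=-1) - r

def scene_sdf(p, primitives):
    # Union of primitives: the pointwise min over all SDFs, safe for sphere tracing
    return torch.stack([f(p) for f in primitives], dim=-1).min(dim=-1).values

# Example: union of one sphere and one box at the origin
# primitives = [lambda p: sphere_sdf(p, torch.zeros(3), 0.3),
#               lambda p: box_sdf(p, torch.zeros(3), torch.full((3,), 0.35))]
```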
Results:
Complex Scene with 21 Primitives Rendered with Sphere Tracing
8.2 Fewer Training Views (10 points)
VolSDF - Geometry (20 views)
VolSDF - Color (20 views)
NeRF - Color (20 views)
Observable Differences: With only 20 training views, NeRF misses finer details such as the red dot pattern on the LEGO model. VolSDF's surface-based approach produces cleaner reconstructions with better-defined boundaries and more reliable geometry interpolation between training views.
8.3 VolSDF vs. NeuS (10 points)
VolSDF - Geometry
VolSDF - Color
NeuS - Geometry
NeuS - Color
Observable Differences: VolSDF clearly outperforms the sigmoid-based NeuS-style conversion in capturing fine geometric details and complete surface coverage. Most notably, VolSDF accurately reconstructs the teeth on the excavator bucket, while the sigmoid method loses these small features entirely. The base platform also shows significant differences: VolSDF produces a flat, fully covered surface, whereas the sigmoid approach leaves holes or incomplete coverage in the base. These issues highlight the sigmoid method's tendency to miss thin structures and struggle with surface completeness. VolSDF's sharper density transitions and better zero-level-set alignment result in more faithful geometry reconstruction, especially for fine details and flat surfaces that require precise surface localization.