Assignment 3: Neural Volume Rendering and Surface Rendering¶
Name: Simson D'Souza, Andrew ID: sjdsouza, Email: sjdsouza@andrew.cmu.edu¶
A. Neural Volume Rendering (80 points)¶
0. Transmittance Calculation (10 points)¶
1. Differentiable Volume Rendering¶
1.3 Ray sampling (5 points)¶
Figure 1: Grid Visualization
Figure 2: Rays Visualization
1.4 Point sampling (5 points)¶
1.5 Volume rendering (20 points)¶
Figure 4: Volume Render (Color) Visualization
Figure 5: Depth Visualization
2. Optimizing a basic implicit volume¶
2.1 Random ray sampling (5 points)¶
2.2 & 2.3 Loss and training, and Visualization (5 points)¶
Figure 8: Before Training - View 0
Figure 9: After Training - View 0
Figure 10: Before Training - View 1
Figure 11: After Training - View 1
Figure 12: Before Training - View 2
Figure 13: After Training - View 2
Figure 14: Before Training - View 3
Figure 15: After Training - View 3
3. Optimizing a Neural Radiance Field (NeRF) (20 points)¶
NeRF Reference Paper: NeRF: Representing Scenes as Neural Radiance Fields (arXiv:2003.08934)
The NeRF rendering visualizations below are without view dependence.
The high-resolution render appears sharper and less blurry than the low-resolution render.
4. NeRF Extras¶
4.1. View Dependence (10 points)¶
Trade-Off: View Dependence vs. Generalization Quality:
- Adding the viewing direction to the network's input enables it to learn complex specular highlights and reflections, which is necessary for materials like metal or glossy plastic to appear realistic.
- The network's ability to model the object's geometry (density) is preserved. Since density remains dependent only on position, the fundamental shape of the object stays consistent and stable regardless of the viewing angle (see the sketch after this list).
- View dependence introduces a high risk of overfitting to the training data's specific lighting conditions. The model may mistake a temporary reflection for the object's intrinsic color.
- When overfitting occurs, the model performs poorly on unseen views. If the network relies too heavily on the viewing direction for the base color, novel views may show unstable, blurry, or incorrect colors because the network has never observed the true base color from that angle.
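To make the architectural point concrete, here is a minimal, hypothetical sketch of a view-dependent head: density is predicted from positional features alone, and only the color branch sees the encoded viewing direction. All names, layer sizes, and embedding dimensions here are illustrative assumptions, not the exact network used in this assignment.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Illustrative view-dependent NeRF head (names/sizes are assumptions)."""

    def __init__(self, feat_dim=256, dir_embed_dim=27, hidden_dim=128):
        super().__init__()
        # Density depends only on positional features, keeping geometry view-consistent.
        self.density_head = nn.Linear(feat_dim, 1)
        # The encoded viewing direction enters only the color branch.
        self.color_branch = nn.Sequential(
            nn.Linear(feat_dim + dir_embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),
            nn.Sigmoid(),
        )

    def forward(self, position_features, dir_embed):
        sigma = torch.relu(self.density_head(position_features))
        rgb = self.color_branch(torch.cat([position_features, dir_embed], dim=-1))
        return sigma, rgb
```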
The following visualizations for the lego and materials scenes use view dependence.
Figure 19: NeRF Lego Rendering (Low Resolution)
Figure 20: NeRF Materials Rendering (Low Resolution)
Figure 21: NeRF Materials Rendering (High Resolution)
B. Neural Surface Rendering (50 points)¶
5. Sphere Tracing (10 points)¶
Core Principle of Sphere Tracing: Unlike volumetric rendering, sphere tracing is an iterative algorithm that uses the SDF value directly for efficiency. At every step, the SDF returns the distance to the nearest surface, so advancing the ray by exactly that amount is the largest possible "safe step" that cannot overshoot the surface. This makes the render time nearly independent of scene complexity and significantly faster than fixed-step ray marching.
Implementation (a minimal sketch follows this list):
- Initialization: I initialized the ray's depth array (depth) for the entire batch to the "self.near" plane. I also set up two boolean masks to manage state: active_mask (tracking rays still marching) and mask_hit (tracking rays that have successfully hit the surface).
- Safe Step Advancement: In each iteration, I query the SDF at the current 3D point (current_points). The returned SDF distance is the largest safe step the ray can take without penetrating the surface. This distance is added to the ray's accumulated depth.
- Termination: I stop the iterative process for a ray if it satisfies one of two conditions:
- Successful Hit: The SDF distance falls below a small tolerance (eps).
- Far Plane Limit: The ray's accumulated depth exceeds the "self.far" bound.
- Vectorized Updates: I used boolean indexing and non-aliasing operations to update the depth, mask_hit, and active_mask across the entire tensor batch simultaneously, maximizing GPU efficiency.
- By finding the final intersection depth for each ray, the function returns the resulting 3D intersection points and a boolean mask indicating which points are valid surface hits.
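For reference, below is a minimal sketch of the sphere-tracing loop described above, assuming `implicit_fn` maps (N, 3) points to (N, 1) signed distances and that ray origins and directions are (N, 3) tensors. The function name and the `near`/`far`/`max_iters`/`eps` defaults are illustrative, not the exact assignment code.

```python
import torch

def sphere_trace(implicit_fn, origins, directions, near=0.0, far=5.0,
                 max_iters=64, eps=1e-5):
    N = origins.shape[0]
    # Per-ray depth starts at the near plane.
    depth = torch.full((N, 1), near, device=origins.device)
    # mask_hit: rays that hit the surface; active_mask: rays still marching.
    mask_hit = torch.zeros(N, dtype=torch.bool, device=origins.device)
    active_mask = torch.ones(N, dtype=torch.bool, device=origins.device)

    for _ in range(max_iters):
        if not active_mask.any():
            break
        current_points = origins + depth * directions
        dist = implicit_fn(current_points)  # (N, 1) largest safe step per ray

        # Advance only the rays that are still marching.
        depth = torch.where(active_mask.unsqueeze(-1), depth + dist, depth)

        # A ray terminates as a hit when the SDF drops below eps,
        # or terminates without a hit when it passes the far plane.
        hit_now = active_mask & (dist.squeeze(-1).abs() < eps)
        mask_hit = mask_hit | hit_now
        active_mask = active_mask & ~hit_now & (depth.squeeze(-1) < far)

    points = origins + depth * directions
    return points, mask_hit
```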
6. Optimizing a Neural SDF (15 points)¶
MLP Implementation to predict distance (a minimal sketch follows this list):
- Designed to map a 3D coordinate to a single predicted distance scalar.
- Input Pipeline: The network receives the input coordinate ($\mathbf{x}$) already transformed by Positional Encoding to enhance the learning of fine surface detail.
- Architecture: The core network (self.mlp_core_dist) is built with sequential hidden layers and periodic skip connections. These connections are vital, as they prevent the loss of low-frequency positional information deep within the network.
- Output: The output is produced by the final linear layer (self.distance_head), yielding the scalar Signed Distance.
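The sketch below illustrates this structure. Only the names `self.mlp_core_dist` and `self.distance_head` come from the description above; the layer count, hidden width, skip position, and the encoded input size (`embed_dim`) are illustrative assumptions, and positional encoding is assumed to happen before the forward pass.

```python
import torch
import torch.nn as nn

class DistanceMLP(nn.Module):
    """Illustrative distance-prediction MLP with a skip connection."""

    def __init__(self, embed_dim, hidden_dim=256, n_layers=6, skip_at=3):
        super().__init__()
        layers = []
        in_dim = embed_dim
        for i in range(n_layers):
            # Widen the layer that receives the skip connection.
            if i == skip_at:
                in_dim = hidden_dim + embed_dim
            layers.append(nn.Linear(in_dim, hidden_dim))
            in_dim = hidden_dim
        self.mlp_core_dist = nn.ModuleList(layers)
        self.distance_head = nn.Linear(hidden_dim, 1)  # scalar signed distance
        self.skip_at = skip_at
        self.act = nn.ReLU()

    def forward(self, x_embed):
        h = x_embed
        for i, layer in enumerate(self.mlp_core_dist):
            # Re-inject the encoded input so low-frequency positional
            # information is not lost deep in the network.
            if i == self.skip_at:
                h = torch.cat([h, x_embed], dim=-1)
            h = self.act(layer(h))
        return self.distance_head(h)
```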
Eikonal Loss Implementation: To ensure the function learned by the MLP is a true distance function rather than just an arbitrary zero level set, an Eikonal loss is applied (a minimal sketch follows this list).
- This constraint is applied to randomly sampled points throughout the volume.
- It enforces the mathematical property that the gradient magnitude of a valid SDF must be 1.0 everywhere: $\|\nabla f(\mathbf{x})\| = 1$. The loss is the MSE between the gradient magnitude and 1.0, encouraging a smooth, stable, and accurate geometric representation.
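A minimal sketch of the Eikonal term, assuming `model` maps (N, 3) sampled points to (N, 1) predicted distances (any positional encoding is assumed to happen inside the model):

```python
import torch

def eikonal_loss(model, points):
    # Enable gradients with respect to the sampled points themselves.
    points = points.clone().requires_grad_(True)
    dist = model(points)
    # Gradient of the predicted distance with respect to the input points.
    grad = torch.autograd.grad(
        outputs=dist, inputs=points,
        grad_outputs=torch.ones_like(dist),
        create_graph=True,
    )[0]
    grad_norm = grad.norm(dim=-1)
    # MSE between the gradient magnitude and the SDF target of 1.
    return ((grad_norm - 1.0) ** 2).mean()
```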
I got good results with the default hyperparameter values.
Figure 23: Input Point Cloud
Figure 24: Predicted SDF Surface
7. VolSDF (15 points)¶
Experiments:
| $\alpha$ | $\beta$ | Geometry | Color |
|---|---|---|---|
| 10.0 (default) | 0.05 (default) | (geometry render) | (color render) |
| 1.0 | 0.05 | (geometry render) | (color render) |
| 50.0 | 0.05 | (geometry render) | (color render) |
| 10.0 | 0.5 | (geometry render) | (color render) |
What do the parameters $\alpha$ and $\beta$ do?
- $\alpha$ (Opacity Scale): This parameter acts as a scaling factor for the overall density. A high $\alpha$ makes the entire volume more opaque or "denser," ensuring the object appears solid even if the underlying SDF is complex.
- $\beta$ (Sharpness/Smoothness): This parameter controls the steepness of the density transition near the object's surface ($d(\mathbf{x})=0$): a low $\beta$ gives a crisp surface, while a high $\beta$ gives a soft, cloud-like falloff (see the sketch after this list).
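One common way to realize this, following the VolSDF formulation, is to pass the negated signed distance through the CDF of a zero-mean Laplace distribution with scale $\beta$ and scale the result by $\alpha$: $\sigma(\mathbf{x}) = \alpha\,\Psi_\beta(-d(\mathbf{x}))$. The sketch below is illustrative; the function name is hypothetical.

```python
import torch

def sdf_to_density(signed_distance, alpha=10.0, beta=0.05):
    s = -signed_distance
    # Psi_beta(s): CDF of a zero-mean Laplace distribution with scale beta,
    # written in a numerically stable form.
    psi = 0.5 + 0.5 * torch.sign(s) * (1.0 - torch.exp(-s.abs() / beta))
    # alpha scales the overall opacity; beta sets how sharply density
    # falls off around the d(x) = 0 surface.
    return alpha * psi
```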
Based on experiments with various $\alpha$ and $\beta$ values:
- How does high $\beta$ bias the learned SDF? What about low $\beta$?
- High $\beta$ biases the geometry toward a smooth, soft boundary, meaning density fades slowly away from the surface. The surface is not localized precisely, and smaller structures fail to render.
- Low $\beta$ forces the density to be concentrated sharply at the geometric surface, so the network learns surface locations more precisely.
- Would an SDF be easier to train with volume rendering and low $\beta$ or high $\beta$? Why?
- It is easier to train with high $\beta$. This parameter creates a smoother density field, resulting in less drastic and more stable gradients for the optimizer.
- Low $\beta$ creates extremely sharp density transitions. These steep transitions result in unstable, high-magnitude gradients in the volume rendering loss function, which can cause the neural network training process to become unstable and fail.
- Would you be more likely to learn an accurate surface with high $\beta$ or low $\beta$? Why?
- Low $\beta$ enforces that the volumetric contributions are tightly linked to the surface, resulting in a cleaner, less blurry geometric reconstruction. It severely penalizes density that exists even slightly away from $d(\mathbf{x}) = 0$, which forces the network to learn a mathematically precise surface definition.
- However, $\beta$ should not be too low, otherwise training becomes unstable and the surface is not learned precisely.
After multiple experiments, I got the best results with the default values.
8. Neural Surface Extras¶
8.1 Render a Large Scene with Sphere Tracing (10 points)¶
The complex scene is composed of 25 individual geometric primitives, grouped into five distinct sets, all placed at separate locations in 3D space.
- Spheres: Linearly arranged along the X-axis (on the ground plane).
- Boxes: Linearly arranged along the Y-axis.
- Tori: Scattered horizontally at a high elevation (Z = 1.0).
- Capsules: Arranged in a horizontal line (parallel to sphere group), aligned along the X-axis.
- Octahedrons: Arranged in a line parallel to the Capsule group, but offset in Y and Z.
The entire scene is defined by the "ComplexSceneSDF" class, which acts as the master geometry function.
- The class applies the Union operation ($\min$ function) to all 25 primitives sequentially in its forward method. This operation mathematically merges all shapes into one single, continuous geometry.
- Any point queried during rendering returns the shortest distance to the combined boundary of all 25 shapes. This allows the Sphere Tracer to render the large, complex scene just as efficiently as it would a single primitive. A minimal sketch of this union is shown below.
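The sketch assumes the individual primitive SDF modules (each mapping (N, 3) points to (N, 1) distances) are constructed and positioned elsewhere; names are illustrative except `ComplexSceneSDF`, which is the class described above.

```python
import torch
import torch.nn as nn

class ComplexSceneSDF(nn.Module):
    """Union of primitive SDFs: the scene distance is the per-point
    minimum over all primitives."""

    def __init__(self, primitives):
        super().__init__()
        self.primitives = nn.ModuleList(primitives)

    def forward(self, points):
        # Stack per-primitive distances: (num_primitives, N, 1).
        dists = torch.stack([p(points) for p in self.primitives], dim=0)
        # The union SDF is the minimum distance over all primitives.
        return dists.min(dim=0).values
```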