Assignment 3: Neural Volume Rendering and Surface Rendering¶

Name: Simson D'Souza, Andrew ID: sjdsouza, Email: sjdsouza@andrew.cmu.edu¶


A. Neural Volume Rendering (80 points)¶

0. Transmittance Calculation (10 points)¶

Transmittance Calculation (worked solution figures)
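
For reference (a standard definition, not a reproduction of the submitted figures), the quantity computed above is the transmittance along a ray; assuming piecewise-constant density $\sigma_i$ over segments of length $\Delta t_i$:

$$
T(t) = \exp\!\left(-\int_{t_0}^{t} \sigma(s)\,ds\right) \;\approx\; \prod_{i} \exp\!\left(-\sigma_i\,\Delta t_i\right)
$$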

1. Differentiable Volume Rendering¶

1.3 Ray sampling (5 points)¶

Figure 1: Grid Visualization
Figure 2: Rays Visualization

1.4 Point sampling (5 points)¶

Figure 3: Point Samples from First Camera

1.5 Volume rendering (20 points)¶

Figure 4: Volume Render (Color) Visualization
Figure 5: Depth Visualization

2. Optimizing a basic implicit volume¶

2.1 Random ray sampling (5 points)¶

Figure 6: Random Ray Sampling Code

2.2 & 2.3 Loss, Training, and Visualization (5 points)¶

Figure 7: Mean Squared Error Loss
Figure 8: Before Training - View 0
Figure 9: After Training - View 0
Figure 10: Before Training - View 1
Figure 11: After Training - View 1
Figure 12: Before Training - View 2
Figure 13: After Training - View 2
Figure 14: Before Training - View 3
Figure 15: After Training - View 3
Figure 16: Spiral Rendering After Training

3. Optimizing a Neural Radiance Field (NeRF) (20 points)¶

NeRF Reference Paper: NeRF: Representing Scenes as Neural Radiance Fields (arXiv:2003.08934)

The NeRF rendering visualizations below are without view dependence.

Figure 17: NeRF Lego Rendering

The high-resolution render appears less blurry and noticeably sharper.

Figure 18: NeRF Lego Rendering (High Resolution)

4. NeRF Extras¶

4.1. View Dependence (10 points)¶

Trade-Off: View Dependence vs. Generalization Quality:

  1. Adding the viewing direction to the network's input enables it to learn complex specular highlights and reflections, which is necessary for materials like metal or glossy plastic to appear realistic and dynamic.
  2. The network's ability to model the object's geometry (density) is preserved: since density depends only on position, the fundamental shape of the object stays consistent and stable regardless of the viewing angle (see the sketch after this list).
  3. View dependence introduces a high risk of overfitting to the training data's specific lighting conditions; the model may mistake a temporary reflection for the object's intrinsic color.
  4. When overfitting occurs, the model performs poorly on unseen views. If the network relies too heavily on the viewing direction for the base color, novel views may show unstable, blurry, or incorrect colors because the network lacks true base-color supervision for those angles.
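
A minimal sketch of this split, assuming a PyTorch module with hypothetical layer names and sizes (feat_dim for positional features, dir_dim for the encoded view direction), not the exact architecture used in my implementation: density is predicted from position features alone, while color additionally receives the view direction.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Sketch: density from position features only; color from features + view direction."""
    def __init__(self, feat_dim=256, dir_dim=27):  # dir_dim assumes a 4-frequency direction encoding
        super().__init__()
        # Density depends only on the positional features, so geometry stays view-consistent.
        self.density_head = nn.Linear(feat_dim, 1)
        # Color additionally receives the encoded view direction to capture specular effects.
        self.color_mlp = nn.Sequential(
            nn.Linear(feat_dim + dir_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, pos_features, dir_encoding):
        sigma = torch.relu(self.density_head(pos_features))                    # (N, 1) non-negative density
        rgb = self.color_mlp(torch.cat([pos_features, dir_encoding], dim=-1))  # (N, 3) view-dependent color
        return sigma, rgb
```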

The following visualizations for the lego and materials scenes are rendered with view dependence.

Figure 19: NeRF Lego Rendering (Low Resolution)
Figure 20: NeRF Materials Rendering (Low Resolution)
Figure 21: NeRF Materials Rendering (High Resolution)

B. Neural Surface Rendering (50 points)¶

5. Sphere Tracing (10 points)¶

Core Principle of Sphere Tracing: Unlike volumetric rendering, sphere tracing is an iterative algorithm that exploits the SDF's distance value for efficiency. At every step, the SDF gives the distance to the nearest surface, so advancing the ray by exactly that amount is the largest possible "safe step" that cannot overshoot the surface. This makes the render time nearly independent of scene complexity and significantly faster than fixed-step ray marching.

Implementation:

  1. Initialization: I initialize the ray depth array (depth) for the entire batch to the "self.near" plane, and set up two boolean masks to manage state: active_mask (rays still marching) and mask_hit (rays that have already hit the surface).
  2. Safe Step Advancement: In each iteration, I query the SDF at the current 3D points (current_points). The returned SDF distance is the largest safe step a ray can take without penetrating the surface, and it is added to the ray's accumulated depth.
  3. Termination: A ray stops marching once it satisfies one of two conditions:
  • Successful Hit: The SDF distance falls below a small tolerance (eps).
  • Far Plane Limit: The ray's accumulated depth exceeds the "self.far" bound.
  4. Vectorized Updates: I use boolean indexing and non-aliasing operations to update depth, mask_hit, and active_mask across the entire tensor batch simultaneously, maximizing GPU efficiency.
  5. After finding the final intersection depth for each ray, the function returns the resulting 3D intersection points and a boolean mask indicating which points are valid surface hits (a minimal sketch of this loop follows).
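
A minimal, self-contained sketch of such a loop, assuming an sdf callable that maps (N, 3) points to (N, 1) distances; the variable names (near, far, eps, max_iters) are illustrative rather than the exact ones from my implementation.

```python
import torch

def sphere_trace(origins, directions, sdf, near=0.0, far=10.0, eps=1e-5, max_iters=128):
    """Sketch of batched sphere tracing. origins, directions: (N, 3) tensors."""
    depth = torch.full((origins.shape[0], 1), near, device=origins.device)
    mask_hit = torch.zeros_like(depth, dtype=torch.bool)
    active = torch.ones_like(mask_hit)

    for _ in range(max_iters):
        points = origins + depth * directions           # current sample along each ray
        dist = sdf(points)                              # safe step: distance to nearest surface
        depth = torch.where(active, depth + dist, depth)
        mask_hit = mask_hit | (active & (dist < eps))   # converged rays count as hits
        active = active & (dist >= eps) & (depth < far) # stop on hit or past the far plane
        if not active.any():
            break

    return origins + depth * directions, mask_hit
```
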
Figure 22: Torus Rendering with Sphere Tracing

6. Optimizing a Neural SDF (15 points)¶

MLP Implementation to predict distance:

  1. The network is designed to map a 3D coordinate to a single predicted distance scalar.
  2. Input Pipeline: The network receives the input coordinate ($\mathbf{x}$) already transformed by positional encoding to enhance the learning of fine surface detail.
  3. Architecture: The core network (self.mlp_core_dist) is built with sequential hidden layers and periodic skip connections. These connections are vital, as they prevent the loss of low-frequency positional information deep within the network (a minimal sketch follows this list).
  4. Output: The final linear layer (self.distance_head) yields the scalar signed distance.
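
A minimal sketch of such an MLP, assuming six hidden layers of width 128, a skip connection at layer 3, and a 63-dimensional positionally encoded input; the attribute names mirror those mentioned above, but the sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    """Sketch: positionally encoded 3D point -> scalar signed distance."""
    def __init__(self, in_dim=63, hidden=128, n_layers=6, skip_at=3):
        super().__init__()
        self.skip_at = skip_at
        layers = []
        for i in range(n_layers):
            d_in = in_dim if i == 0 else hidden
            if i == skip_at:
                d_in += in_dim                      # skip connection re-injects the encoded input
            layers.append(nn.Linear(d_in, hidden))
        self.mlp_core_dist = nn.ModuleList(layers)
        self.distance_head = nn.Linear(hidden, 1)   # final scalar signed distance

    def forward(self, x_encoded):
        h = x_encoded
        for i, layer in enumerate(self.mlp_core_dist):
            if i == self.skip_at:
                h = torch.cat([h, x_encoded], dim=-1)
            h = torch.relu(layer(h))
        return self.distance_head(h)
```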

Eikonal Loss Implementation: To ensure the function learned by the MLP is a true distance function and not just an arbitrary zero level set, the Eikonal loss is applied to randomly sampled points in the space around the surface.

  1. This constraint is applied to points sampled randomly throughout the volume.
  2. It enforces the mathematical property that the gradient magnitude of a valid SDF must be 1 everywhere, $\|\nabla f(\mathbf{x})\| = 1$, by computing the MSE between the gradient magnitude and 1.0. This effectively encourages a smooth, stable, and accurate geometric representation (a minimal sketch is shown below).
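
A minimal sketch of this loss, using torch.autograd.grad to obtain $\nabla f(\mathbf{x})$ at the sampled points; the function name eikonal_loss is illustrative.

```python
import torch

def eikonal_loss(model, points):
    """Sketch: penalize deviation of the SDF gradient norm from 1 at sampled points."""
    points = points.clone().requires_grad_(True)
    dist = model(points)                                   # (N, 1) predicted signed distances
    (grad,) = torch.autograd.grad(
        outputs=dist, inputs=points,
        grad_outputs=torch.ones_like(dist), create_graph=True,
    )
    grad_norm = grad.norm(dim=-1)                          # (N,) gradient magnitudes
    return ((grad_norm - 1.0) ** 2).mean()                 # MSE against the ideal value of 1
```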

I obtained good results with the default hyperparameter values.

Figure 23: Input Point Cloud
Figure 24: Predicted SDF Surface

7. VolSDF (15 points)¶

Experiments:

| $\alpha$ | $\beta$ | Geometry | Color |
|----------|---------|----------|-------|
| 10.0 (Default) | 0.05 (Default) | [geometry render] | [color render] |
| 1.0 | 0.05 | [geometry render] | [color render] |
| 50.0 | 0.05 | [geometry render] | [color render] |
| 10.0 | 0.5 | [geometry render] | [color render] |

What do the parameters $\alpha$ and $\beta$ do?

  • $\alpha$ (Opacity Scale): This parameter acts as a scaling factor for the overall density. A high $\alpha$ makes the entire volume more opaque ("denser"), ensuring the object appears solid even if the underlying SDF is complex.
  • $\beta$ (Sharpness/Smoothness): This parameter controls the steepness of the density transition near the object's surface ($d(\mathbf{x})=0$): a low $\beta$ yields a crisp surface, while a high $\beta$ yields a soft, cloud-like falloff (a minimal sketch of this mapping follows).
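
A minimal sketch of the SDF-to-density mapping these parameters control, following the Laplace-CDF formulation from the VolSDF paper, $\sigma(\mathbf{x}) = \alpha\,\Psi_\beta(-d(\mathbf{x}))$; the function name sdf_to_density is illustrative.

```python
import torch

def sdf_to_density(signed_distance, alpha=10.0, beta=0.05):
    """Sketch: VolSDF-style density. alpha scales opacity; beta sets transition sharpness."""
    # Psi_beta is the CDF of a zero-mean Laplace distribution with scale beta,
    # evaluated at -d(x): near 0 far outside the surface, near 1 deep inside it.
    psi = torch.where(
        signed_distance >= 0,
        0.5 * torch.exp(-signed_distance / beta),       # outside: decays with distance
        1.0 - 0.5 * torch.exp(signed_distance / beta),  # inside: saturates toward 1
    )
    return alpha * psi
```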

Based on experiments with various $\alpha$ and $\beta$ values:

  1. How does high $\beta$ bias the learned SDF? What about low $\beta$?
  • High $\beta$ biases the geometry toward a smooth, soft boundary, meaning density fades slowly away from the surface. The surface is not localized precisely, and smaller structures fail to render.
  • Low $\beta$ forces the density to concentrate sharply at the geometric surface, so the network learns surface locations more accurately.
  2. Would an SDF be easier to train with volume rendering and low $\beta$ or high $\beta$? Why?
  • It is easier to train with high $\beta$. A larger $\beta$ creates a smoother density field, resulting in less drastic and more stable gradients for the optimizer.
  • Low $\beta$ creates extremely sharp density transitions. These steep transitions produce unstable, high-magnitude gradients in the volume rendering loss, which can cause training to become unstable and fail.
  3. Would you be more likely to learn an accurate surface with high $\beta$ or low $\beta$? Why?
  • Low $\beta$ ties the volumetric contributions tightly to the surface, resulting in a cleaner, less blurry geometric reconstruction. It severely penalizes density even slightly away from $d(\mathbf{x}) = 0$, which forces the network to learn a mathematically precise surface definition.
  • However, $\beta$ should not be made too small, otherwise training becomes unstable and the surface is not recovered precisely.

After multiple experiments, I obtained the best results with the default values.


8. Neural Surface Extras¶

8.1 Render a Large Scene with Sphere Tracing (10 points)¶

The complex scene is composed of 25 individual geometric primitives, grouped into five distinct sets, all placed at separate locations in 3D space.

  • Spheres: Linearly arranged along the X-axis (on the ground plane).
  • Boxes: Linearly arranged along the Y-axis.
  • Tori: Scattered horizontally at a high elevation (Z = 1.0).
  • Capsules: Arranged in a horizontal line (parallel to sphere group), aligned along the X-axis.
  • Octahedrons: Arranged in a line parallel to the Capsule group, but offset in Y and Z.

The entire scene is defined by the "ComplexSceneSDF" class, which acts as the master geometry function.

  • The class applies the Union operation ($\min$ function) to all 25 primitives sequentially in its forward method. This operation mathematically merges all shapes into one single, continuous geometry (a minimal sketch follows this list).
  • Any point queried during rendering returns the shortest distance to the combined boundary of all 25 shapes. This allows the Sphere Tracer to render the large, complex scene just as efficiently as it would a single primitive.
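
A minimal sketch of this union, assuming each primitive is a module whose forward call maps (N, 3) points to per-point signed distances; the class name UnionSDF is illustrative and stands in for the ComplexSceneSDF described above.

```python
import torch
import torch.nn as nn

class UnionSDF(nn.Module):
    """Sketch: merge many primitive SDFs into one scene via a pointwise min (union)."""
    def __init__(self, primitives):
        super().__init__()
        self.primitives = nn.ModuleList(primitives)   # e.g. 25 sphere/box/torus/... SDF modules

    def forward(self, points):
        # The union of shapes is the minimum of their signed distances at every query point.
        distances = torch.stack([p(points) for p in self.primitives], dim=0)
        return distances.min(dim=0).values
```
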
Figure 25: Complex Scene with 25 primitives