Assignment 3 - Neural Volume Rendering and Surface Rendering

Author: Kailash Jagadeesh

Course: 16-825 Learning for 3D Vision — Carnegie Mellon University

0. Transmittance Calculation

The base case for transmittance at $y_1$ is $T(x, y_1) = 1$.

The inductive case is $T(x, x_{t_i}) = T(x, x_{t_{i-1}}) \, e^{-\sigma_{t_{i-1}} \cdot \Delta t}$.

For the given figure, the per-segment transmittances are:

$T(y_1, y_2) = e^{-1 \cdot 2} = e^{-2}$

$T(y_2, y_3) = e^{-0.5 \cdot 1} = e^{-0.5}$

$T(y_3, y_4) = e^{-10 \cdot 3} = e^{-30}$
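
As a sanity check, here is a minimal PyTorch sketch of the recurrence above, evaluated on the three segments (variable names are illustrative):

```python
import torch

def segment_transmittance(sigmas: torch.Tensor, deltas: torch.Tensor):
    # Per-segment transmittance e^{-sigma_i * delta_i}, and the accumulated
    # transmittance T(x, y_{i+1}) from the inductive case above.
    seg_T = torch.exp(-sigmas * deltas)
    cum_T = torch.cumprod(seg_T, dim=-1)
    return seg_T, cum_T

sigmas = torch.tensor([1.0, 0.5, 10.0])   # densities on (y1,y2), (y2,y3), (y3,y4)
deltas = torch.tensor([2.0, 1.0, 3.0])    # segment lengths
seg_T, cum_T = segment_transmittance(sigmas, deltas)
print(seg_T)  # [e^-2, e^-0.5, e^-30], matching the hand computation above
```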

1. Differentiable Volume Rendering

1.3 Ray Sampling

Here are the resulting XY Grid and Rays outputs:
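
For reference, a minimal sketch of pinhole-camera ray generation of the kind used in this part; the intrinsics arguments and helper name are assumptions, not the assignment's exact API:

```python
import torch

# Generate one ray per pixel; fx, fy, cx, cy are intrinsics and c2w is the
# camera-to-world transform (assumed conventions: +z forward in camera space).
def get_rays(H, W, fx, fy, cx, cy, c2w):
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Per-pixel directions in camera space, then rotated into world space.
    dirs = torch.stack([(xs - cx) / fx, (ys - cy) / fy, torch.ones_like(xs)], dim=-1)
    rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(dim=-1)   # R @ dir per pixel
    rays_o = c2w[:3, 3].expand(rays_d.shape)                  # shared camera origin
    return rays_o, rays_d
```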

1.4 Point Sampling

Here is the render points output that shows the point samples:
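
A minimal sketch of the uniform point sampler assumed for this part, placing evenly spaced samples along each ray between the near and far planes (names illustrative):

```python
import torch

# rays_o, rays_d are (..., 3) tensors as produced by a ray generator like the one above.
def sample_points_along_rays(rays_o, rays_d, near, far, n_pts):
    t = torch.linspace(near, far, n_pts)                              # (n_pts,)
    points = rays_o[..., None, :] + t[:, None] * rays_d[..., None, :]
    return points, t                                                  # (..., n_pts, 3), (n_pts,)
```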

1.5 Volume Rendering

Here is the trained spinning box:

Here is the visualized depth of the box from one angle:
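
For context, a minimal sketch of the discrete compositing that produces both renders, giving the expected color and expected depth along each ray (shapes are illustrative):

```python
import torch

# sigmas/deltas/t_vals are (R, N) and rgbs is (R, N, 3) for R rays, N samples each.
def composite(sigmas, rgbs, deltas, t_vals):
    alphas = 1.0 - torch.exp(-sigmas * deltas)                  # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)         # transmittance past each sample
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = trans * alphas
    color = (weights[..., None] * rgbs).sum(dim=-2)             # expected color, (R, 3)
    depth = (weights * t_vals).sum(dim=-1)                      # expected depth, (R,)
    return color, depth
```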

2. Optimizing a basic implicit volume

2.2 Loss and training

The optimal box center is at $(0.25, 0.25, 0.00)$ and the side lengths are $(2.00, 1.50, 1.50)$.
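
The training objective for this part is presumably a plain photometric loss; a minimal sketch, assuming rendered and ground-truth color tensors of matching shape:

```python
import torch.nn.functional as F

# MSE between rendered ray colors and the corresponding ground-truth pixels.
def photometric_loss(rendered_colors, gt_colors):
    return F.mse_loss(rendered_colors, gt_colors)
```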

2.3 Visualization

3. Optimizing a Neural Radiance Field

4. NeRF extras

4.1 View Dependence

Incorporating view dependence enables more accurate modeling of reflective surfaces—such as polished metal—by capturing how appearance changes with viewing angle. This leads to a higher-fidelity representation of materials compared to models that rely solely on XYZ point coordinates. However, strong view dependence can reduce generalization performance. As demonstrated in the VDN-NeRF paper, if not properly regularized, neural networks may lose fine geometric details in scenes with high view dependence.
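
A minimal sketch of how view dependence is typically injected (layer widths and encoding sizes are assumptions, not the exact architecture used here): density depends only on position features, while color additionally conditions on the encoded view direction.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    def __init__(self, feat_dim=256, dir_dim=27):
        super().__init__()
        self.sigma = nn.Linear(feat_dim, 1)
        self.color = nn.Sequential(
            nn.Linear(feat_dim + dir_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),
        )

    def forward(self, feat, dir_enc):
        sigma = torch.relu(self.sigma(feat))                  # nonnegative density
        rgb = self.color(torch.cat([feat, dir_enc], dim=-1))  # view-dependent color
        return sigma, rgb
```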

Here is the same Lego scene rendered with view dependence:

Here are the Materials and High-res materials scenes:

5. Sphere Tracing

Here is the rendered torus:

Sphere tracing iteratively marches rays through a scene to find surface intersections. In each step, the algorithm evaluates the implicit function at the current point to estimate its signed distance from the surface. If this value drops below a small threshold (epsilon), the point is considered to have reached the surface, and it is marked as a hit. Otherwise, the ray advances along its direction by the distance returned from the implicit function. This process repeats for a fixed number of iterations. After all iterations, points marked as hits indicate successful surface intersections, while the remaining points correspond to rays that did not encounter any surface within the iteration limit.
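
A minimal sketch of this loop, assuming an `sdf` callable that maps points to signed distances (the iteration count, epsilon, and far clamp are illustrative):

```python
import torch

def sphere_trace(sdf, origins, directions, n_iters=64, eps=1e-5, far=10.0):
    t = torch.zeros(origins.shape[:-1], device=origins.device)
    hit = torch.zeros_like(t, dtype=torch.bool)
    for _ in range(n_iters):
        points = origins + t[..., None] * directions
        dist = sdf(points).squeeze(-1)          # signed distance at current points
        hit = hit | (dist.abs() < eps)          # rays that reached the surface
        t = torch.where(hit, t, t + dist)       # advance only the non-hit rays
        t = t.clamp(max=far)                    # stop rays that left the scene bounds
    return origins + t[..., None] * directions, hit
```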

6. Optimizing a Neural SDF

Here is the predicted NeuralSDF for the bunny, given the input point cloud:

The MLP follows a structure similar to the NeRF network: a single network with a skip connection applied at the fourth layer. A key distinction from the NeRF setup is that the distance prediction head does not include any nonlinear activation; because distance values are continuous and unbounded, the final layer is a simple linear output.
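
A minimal sketch of such an architecture; widths, depth, and the input encoding size are assumptions, and the skip index mirrors the "fourth layer" description above:

```python
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    def __init__(self, in_dim=39, hidden=256, n_layers=6, skip=4):
        super().__init__()
        self.skip = skip
        self.layers = nn.ModuleList([
            nn.Linear(in_dim if i == 0 else hidden + (in_dim if i == skip else 0), hidden)
            for i in range(n_layers)
        ])
        self.dist = nn.Linear(hidden, 1)   # no activation: distances are unbounded

    def forward(self, x):
        h = x
        for i, layer in enumerate(self.layers):
            if i == self.skip:
                h = torch.cat([h, x], dim=-1)   # re-inject the input (skip connection)
            h = torch.relu(layer(h))
        return self.dist(h)
```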

The eikonal loss constrains the gradient of the predicted SDF to have unit norm, the defining property of a signed distance function. It is implemented as the mean squared error between the gradient norm and the target value of 1, which drives the gradients toward unit magnitude as training progresses.
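
A minimal sketch of that regularizer, computed with autograd at a batch of sample points (`sdf_model` is any differentiable points-to-distance network; the name is illustrative):

```python
import torch

def eikonal_loss(sdf_model, points):
    points = points.requires_grad_(True)
    dist = sdf_model(points)
    (grad,) = torch.autograd.grad(
        dist, points, grad_outputs=torch.ones_like(dist), create_graph=True
    )
    # Penalize squared deviation of the gradient norm from 1.
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```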

7. VolSDF

In this model, α (alpha) and β (beta) control how the signed distance function (SDF) is turned into surface density.
Alpha sets the maximum density inside the object — higher alpha values make the surface look more opaque, while lower values make it more transparent.

Beta determines how sharp or smooth the surface boundary appears. A low beta makes the transition at the object’s edge very sharp, creating crisp but sometimes unstable or jagged boundaries. A high beta, on the other hand, smooths out the transition, producing softer, blurrier edges that are easier for the network to learn.
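
Concretely, VolSDF converts signed distance to density as $\sigma(x) = \alpha \, \Psi_\beta(-d(x))$, where $\Psi_\beta$ is the CDF of a zero-mean Laplace distribution with scale $\beta$. A minimal sketch of that conversion:

```python
import torch

# Written with |s| so the exponential never overflows for far-away points.
def sdf_to_density(signed_distance, alpha, beta):
    s = -signed_distance
    psi = 0.5 * torch.exp(-s.abs() / beta)      # Laplace CDF, s <= 0 branch
    psi = torch.where(s <= 0, psi, 1.0 - psi)   # mirror for the s > 0 branch
    return alpha * psi
```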

Although training is usually more stable with higher beta values, accurately capturing fine surface details requires lowering beta. For instance, using a very low beta (e.g., 0.025) produced sharper surfaces but also introduced false surface detections near the base platform.

8. Neural Surface Extras

8.2 Fewer Training Views

The VolSDF model was trained using only 20 input views, as shown below:

Under the same conditions, NeRF failed to converge with its default configuration. After experimenting with different parameters, increasing the number of points per ray allowed NeRF to successfully reconstruct the following scene:

However, the NeRF output appears noticeably blurred around the edges, indicating that 20 views were insufficient for the network to fully capture scene details.

By increasing the number of input views to 80, NeRF was able to converge reliably. This setup produced the following reconstruction:

For comparison, the corresponding VolSDF reconstruction trained on the same 80 views is shown below: