Andrew ID: abhinavm
I used AWS and spun up an EC2 g4dn.xlarge instance with the Deep Learning AMI GPU PyTorch 2.1.0 (Ubuntu 20.04) 20231103 image. A few setup tips if you're using something similar: since the CUDA version of this image is 12.1, there will be some CUDA/torch/PyTorch3D compatibility issues, so install torch 2.5.1 and the matching torchvision (compatible with CUDA 12.1), and, very importantly, use the following command to install PyTorch3D:
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder pytorch3d==0.7.8+pt2.5.1cu121
I’ve attached a screenshot of the rendered edited TeX file:

Command:
python volume_rendering_main.py --config-name=box
The outputs of grid/ray visualization:

Visualization of the point samples from the first camera:
Part 1 visualization and depth visualization:

Command:
python volume_rendering_main.py --config-name=train_box
| Provided GIF | My Output |
|---|---|
| ![]() | ![]() |
Command:
python volume_rendering_main.py --config-name=nerf_lego
Visualization of the lego object with my NeRF (without view dependence)
| Provided GIF | My Output |
|---|---|
| ![]() | ![]() |
Multiple configs are available: nerf_materials_highres.yaml and
nerf_lego_view_dependence.yaml (I added a config variable that controls
whether view dependence is used). Command:
python volume_rendering_main.py --config-name=nerf_materials_highres
Adding view dependence lets the model capture specular highlights and reflections, which makes materials like metal or glass look far more realistic. There might be overfitting, though, as the network can just memorize the training views instead of learning the actual 3D structure. The standard fix is to only use the view direction after predicting density (sketched below), so the geometry stays consistent across views.
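As a rough illustration of that design, here is a minimal PyTorch sketch (layer sizes, module names, and embedding dimensions are my own placeholders, not the assignment code): the density head only sees positional features, while the color head is additionally conditioned on an embedded view direction.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Density is predicted from positional features alone; the view-direction
    embedding is only injected into the color branch."""

    def __init__(self, feat_dim=256, dir_embed_dim=24):
        super().__init__()
        self.density_head = nn.Linear(feat_dim, 1)       # geometry: position features only
        self.color_head = nn.Sequential(                 # appearance: view-conditioned
            nn.Linear(feat_dim + dir_embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 3),
            nn.Sigmoid(),
        )

    def forward(self, point_features, dir_embedding):
        sigma = torch.relu(self.density_head(point_features))
        rgb = self.color_head(torch.cat([point_features, dir_embedding], dim=-1))
        return sigma, rgb
```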
View-dependent models need more training data to work well since they’re learning a higher-dimensional function. With sparse viewpoints, the model struggles to interpolate between what it’s seen, and you end up with artifacts or blurriness in novel views. This is especially noticeable on shiny surfaces where the appearance changes rapidly with viewing angle.
The tradeoff lies in the fact that view dependence preserves fine details and makes surfaces look more physically accurate, but it can also introduce temporal inconsistencies. When rendering a smooth camera path, you might see flickering on reflective surfaces in regions with limited training coverage. So you get better peak quality on well-covered views but potentially worse stability overall.
On the lego example, view dependence didn't seem to have much of an effect, but it is more visible in the rendering of the high-res materials.
| View Dependent NeRF | View Independent NeRF |
|---|---|
| ![]() | ![]() |
Command:
python -m surface_rendering_main --config-name=torus_surface
The ray marching algorithm (implemented in renderer.py) marches along each ray from the near plane by repeatedly querying the SDF and stepping forward by the returned distance value (this sphere-tracing property guarantees we won’t skip over any surfaces). An intersection is detected when the SDF drops below the convergence threshold, meaning we’re close enough to the surface. The stopping criterion is either hitting the surface or reaching 100 iterations on a miss, and we track successful intersections with a boolean mask so that non-converged rays are handled appropriately in the rendering pipeline.
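A minimal sketch of this sphere-tracing loop, assuming an `sdf` callable that maps points to signed distances (the function name, defaults, and tensor shapes are illustrative, not the exact renderer.py signature):

```python
import torch

def sphere_trace(origins, directions, sdf, near=0.0, max_iters=100, eps=1e-5):
    # origins, directions: (N, 3) tensors; sdf: callable mapping (N, 3) -> (N, 1)
    t = torch.full_like(origins[..., :1], near)          # distance travelled along each ray
    converged = torch.zeros_like(t, dtype=torch.bool)    # rays that have hit a surface
    for _ in range(max_iters):
        points = origins + t * directions                # current sample on each ray
        dist = sdf(points)                               # SDF value = safe step size
        converged = converged | (dist < eps)             # within threshold -> intersection
        t = torch.where(converged, t, t + dist)          # advance only unconverged rays
    return origins + t * directions, converged
```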

Command:
python -m surface_rendering_main --config-name=points_surface
The neural SDF MLP takes in a 3D point and outputs a scalar value representing its predicted signed distance to the surface (0 for points on the surface). We optimize it by training the network to output zero for the observed point cloud points.
The implemented MLP is quite similar to the NeRF one, with a few key differences: the input points are positionally encoded with harmonic embeddings (4 harmonic functions), followed by 6 Linear+ReLU layers (hidden dimension 128) for distance prediction with no skip connections, and a final linear layer that maps the feature vector to a single scalar distance prediction. For this question we only predict distance, not color.
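For concreteness, a self-contained sketch of that architecture (the harmonic embedding here is hand-rolled and the class and variable names are placeholders, not the starter code's):

```python
import torch
import torch.nn as nn

def harmonic_embedding(x, n_harmonics=4):
    # sin/cos embedding at octave frequencies; 3 dims -> 3 * 2 * n_harmonics dims
    freqs = 2.0 ** torch.arange(n_harmonics, device=x.device, dtype=x.dtype)
    angles = x[..., None] * freqs                        # (..., 3, n_harmonics)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(-2)

class NeuralSDF(nn.Module):
    """Embedded point -> 6 x (Linear + ReLU, width 128) -> scalar signed distance."""

    def __init__(self, n_harmonics=4, hidden=128, n_layers=6):
        super().__init__()
        in_dim = 3 * 2 * n_harmonics
        layers = []
        for i in range(n_layers):
            layers += [nn.Linear(in_dim if i == 0 else hidden, hidden), nn.ReLU()]
        self.mlp = nn.Sequential(*layers)
        self.distance = nn.Linear(hidden, 1)             # single scalar distance output

    def forward(self, points):
        return self.distance(self.mlp(harmonic_embedding(points)))
```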
Eikonal Loss: The objective is to minimize the mean squared distance of the observed points (which should be 0, since they lie on the surface) together with the Eikonal loss. To encourage the MLP to actually learn a proper SDF, the norm of its gradient is pushed toward 1. This is a property of a true SDF: its gradient points outward from the surface of the object and its magnitude (i.e., norm) indicates the rate of change of distance, which equals 1 for an actual distance field. We enforce it with:
\[\mathcal{L}_{\text{Eikonal}} = \mathbb{E}_{\mathbf{x}}\left[\left(\|\nabla_{\mathbf{x}} f(\mathbf{x})\|_2 - 1\right)^2\right]\]
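A sketch of how the combined objective might be computed, assuming surface points from the observed point cloud and randomly sampled space points for the Eikonal term (the function name and sampling strategy are illustrative; the 0.02 weight matches the value used in training below):

```python
import torch

def neural_sdf_loss(model, surface_points, space_points, eikonal_weight=0.02):
    # Observed point-cloud points should lie on the zero level set.
    point_loss = model(surface_points).pow(2).mean()

    # Eikonal term: SDF gradients at sampled space points should have unit norm.
    space_points = space_points.clone().requires_grad_(True)
    distances = model(space_points)
    grad = torch.autograd.grad(distances.sum(), space_points, create_graph=True)[0]
    eikonal_loss = (grad.norm(dim=-1) - 1.0).pow(2).mean()

    return point_loss + eikonal_weight * eikonal_loss
```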
Training: Putting all this together, we can optimize a Neural SDF for a given input point cloud, as seen below (left: the input points, right: the optimized rendering). I trained for 5000 epochs with a learning rate of 0.0001, a batch size of 4096, and an eikonal weight of 0.02. The default hyperparameters worked well without any tuning.
| Input Point Cloud | Rendering |
|---|---|
| ![]() | ![]() |
Commands:
python -m surface_rendering_main --config-name=volsdf_surface
python -m surface_rendering_main --config-name=volsdf_surface_2
python -m surface_rendering_main --config-name=volsdf_surface_3
python -m surface_rendering_main --config-name=volsdf_surface_4
In this part, I extended the Neural SDF to predict color and implemented the VolSDF conversion from signed distance to volume density for rendering.
Color Prediction MLP: Building on the distance network from Part 6, I added a color prediction branch that takes the intermediate features from the distance network and predicts RGB values. It consists of a feature transformation layer that processes the distance network’s output, concatenation with the original positional embeddings to preserve spatial information, 2 Linear+ReLU layers (hidden dimension 128) for color prediction, and a final linear layer with a sigmoid activation to output RGB values in [0, 1].
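A rough sketch of that color branch, assuming a 128-dimensional feature vector from the distance network and a 24-dimensional harmonic embedding of the input point (names and dimensions are placeholders, not the assignment's exact code):

```python
import torch
import torch.nn as nn

class ColorBranch(nn.Module):
    """Transform distance-network features, concatenate the positional embedding,
    then 2 x (Linear + ReLU) and a sigmoid-activated RGB output."""

    def __init__(self, feat_dim=128, embed_dim=24, hidden=128):
        super().__init__()
        self.feature_transform = nn.Linear(feat_dim, feat_dim)   # process SDF features
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),                  # RGB in [0, 1]
        )

    def forward(self, features, point_embedding):
        feats = self.feature_transform(features)
        return self.mlp(torch.cat([feats, point_embedding], dim=-1))
```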
SDF to Density Conversion:
Following the VolSDF paper, the conversion from SDF to density uses the formula:
\[\sigma(\mathbf{x}) = \alpha \Psi_\beta(-d_\Omega(\mathbf{x}))\]
where \(\Psi_\beta\) is the Cumulative Distribution Function of a Laplace distribution:
\[\Psi_\beta(s) = \begin{cases} \frac{1}{2}\exp\left(\frac{s}{\beta}\right), & \text{if } s \leq 0 \\ 1 - \frac{1}{2}\exp\left(-\frac{s}{\beta}\right), & \text{if } s > 0 \end{cases}\]
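A direct translation of this conversion into code might look like the following (the function name is my own; the defaults mirror the \(\alpha\) and \(\beta\) values used in the experiments below):

```python
import torch

def volsdf_density(signed_distance, alpha=10.0, beta=0.05):
    s = -signed_distance                                  # VolSDF applies Psi_beta to -d
    psi = torch.where(
        s <= 0,                                           # outside the surface (d > 0)
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),                 # inside the surface (d < 0)
    )
    return alpha * psi

# e.g. densities well outside, on, and inside the surface:
# volsdf_density(torch.tensor([0.5, 0.0, -0.5]))  ->  roughly [0.0, 5.0, 10.0]
```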
The parameters control how the SDF maps to density:
\(\alpha\): Scales the overall density magnitude. Higher \(\alpha\) increases density values, making the surface contribute more strongly to the rendered image.
\(\beta\): Controls the “thickness” of the density distribution around the surface. It determines how quickly density falls off as you move away from the zero-level set.
High vs Low Beta Bias: High \(\beta\) creates a wider, more diffuse density distribution around the surface, allowing points farther from the true surface to contribute to rendering. Low \(\beta\) creates a tight, concentrated density peak, biasing the network to learn a sharper surface boundary.
Training Ease: An SDF would be easier to train with high \(\beta\). The wider density distribution provides meaningful gradients even when the SDF is inaccurate early in training. With low \(\beta\), only points very close to the surface get gradient signals, making initial learning difficult.
Surface Accuracy: You’d learn a more accurate surface with low \(\beta\) once training has converged. The tight density distribution enforces that the volume rendering behavior closely matches the true surface location, leading to more precise geometry.
Hyperparameter Experiments:
I kept \(\alpha\) = 10 (default) and experimented with different \(\beta\) values:
- \(\beta\) = 0.5: Failed to render properly, too diffuse
- \(\beta\) = 0.1: Worked, but less sharp surface details
- \(\beta\) = 0.05 (default): Worked well, produced clean geometry with sharper details
- \(\beta\) = 0.01: Failed to render properly, density too concentrated
The default \(\beta\) = 0.05 provided a good balance—small enough for sharp surface precision but large enough to maintain stable gradients during training. \(\beta\) = 0.1 worked but produced slightly fuzzier geometry due to the wider density distribution around the surface.
| Geometry (\(\beta\) = 0.05) | Rendered Result (\(\beta\) = 0.05) |
|---|---|
| ![]() | ![]() |
| Geometry (\(\beta\) = 0.1) | Rendered Result (\(\beta\) = 0.1) |
|---|---|
| ![]() | ![]() |
Command:
python -m surface_rendering_main --config-name=complex_surface

Commands:
python -m surface_rendering_main --config-name=volsdf_surface_8_2
python -m surface_rendering_main --config-name=nerf_lego_view_dependence_8_2
(Config has a parameter that controls number of views to train on)
In theory, VolSDF performs better with fewer views because it enforces stronger geometric priors through its surface-based representation. The SDF constraint and Eikonal loss ensure the network learns a proper distance field everywhere, meaning even unobserved regions must follow physically meaningful geometric rules. This reduces overfitting compared to NeRF, which can learn arbitrary density patterns that work for the training views but don’t generalize well.
That being said, the difference was hard to see even when the number of views was reduced to 20 or 10; noticeable differences only started appearing around 8 views.
| Num Views | VolSDF | NeRF |
|---|---|---|
| 100 | ![]() | ![]() |
| 20 | ![]() | ![]() |
| 10 | ![]() | ![]() |
| 8 | ![]() | ![]() |
| 5 | ![]() | ![]() |
Command:
python -m surface_rendering_main --config-name=volsdf_surface_8_3
NeuS Naive Conversion: The naive approach from the NeuS paper uses a simpler formula:
\[\sigma = \alpha \cdot \text{sigmoid}\left(-\frac{d}{\beta}\right)\]
This directly maps signed distance to density using a sigmoid function. Compared to VolSDF’s Laplace CDF, this creates a smoother, more symmetric transition around the surface. The sigmoid naturally outputs values in (0,1) which are then scaled by \(\alpha\), making it more straightforward but potentially less flexible in controlling the density distribution shape on either side of the surface.
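For comparison with the VolSDF conversion above, the naive mapping is essentially a one-liner (again, the function name and defaults are illustrative):

```python
import torch

def neus_naive_density(signed_distance, alpha=10.0, beta=0.05):
    # sigma = alpha * sigmoid(-d / beta): exactly 0.5 * alpha at the surface,
    # with a symmetric falloff on either side of it.
    return alpha * torch.sigmoid(-signed_distance / beta)
```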
| Method | Geometry | Rendered Result |
|---|---|---|
| VolSDF (Laplace CDF) | ![]() | ![]() |
| NeuS Naive (Sigmoid) | ![]() | ![]() |
The Laplace CDF is better because it is asymmetric around the surface: the density behaves differently inside versus outside the object, which matches physical reality. The sigmoid is symmetric, treating both sides equally, which can cause ambiguity about surface orientation and lead to less well-defined boundaries. The asymmetry in VolSDF provides stronger training signals for learning properly oriented surfaces.