CMU 16-825 • Learning for 3D Vision

HW 3 — Volume Rendering

Vaishnavi Khindkar (vkhindka@andrew.cmu.edu)

Overview. This page documents my solutions and results for HW3. I start with ray generation and stratified sampling, then implement NeRF-style volume rendering. Next, I move to implicit surfaces: sphere tracing, a learned Neural SDF with the eikonal prior, and VolSDF for color + geometry. The extras explore view dependence, coarse–fine sampling, large SDF scenes, learning from fewer views, and an alternate SDF→density mapping (NeuS). Each section includes settings, gifs, and takeaways.

GPU: T4 (CUDA 12.1) • PyTorch 2.4 • PyTorch3D OK

A0 — Transmittance (Derivation)

A0 PDF screenshot
Screenshot of A0 PDF (answers inside).
T(t) = exp(-∫₀ᵗ σ(s) ds)
C = ∫ T(t) σ(t) c(t) dt
≈ Σ Tᵢ (1 - e^{-σᵢ Δᵢ}) cᵢ,  Tᵢ = Πⱼ<i (1 - αⱼ)
          

See the PDF for a careful discrete→continuous derivation and the piecewise solution for the figure with segments (y₁y₂), (y₂y₃), (y₃y₄).
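As a quick numerical sanity check of the discrete form, here is a minimal sketch (with hypothetical σᵢ and Δᵢ values) verifying that Πⱼ (1 − αⱼ) matches exp(−Σ σⱼ Δⱼ) when σ is piecewise constant:

    import torch

    # Hypothetical piecewise-constant densities and segment lengths
    sigma = torch.tensor([0.5, 2.0, 0.1])   # σ_i
    delta = torch.tensor([1.0, 0.5, 2.0])   # Δ_i

    # Continuous form: T = exp(-∫ σ ds) = exp(-Σ σ_i Δ_i) for piecewise-constant σ
    T_integral = torch.exp(-(sigma * delta).sum())

    # Discrete form: T = Π_j (1 - α_j) with α_j = 1 - exp(-σ_j Δ_j)
    alpha = 1.0 - torch.exp(-sigma * delta)
    T_product = torch.prod(1.0 - alpha)

    print(T_integral.item(), T_product.item())  # the two agree exactly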

A1 — Differentiable Volume Rendering (Box Scene)

A1 render gif
Final render (spiral camera).
A1 depth map
Expected depth (transmittance-weighted).
A1 sample points
Stratified samples along rays (colored by t).
Equation → code mapping
αᵢ = 1 - exp(-σᵢ Δᵢ)
Tᵢ = Πⱼ<i (1 - αⱼ)
wᵢ = Tᵢ αᵢ
RGB  = Σ wᵢ cᵢ
Depth= Σ wᵢ tᵢ
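A minimal PyTorch sketch of this mapping (the function name composite and the tensor layout are illustrative, not the exact starter-code API):

    import torch

    def composite(sigma, colors, deltas, depths):
        # sigma: (N, S, 1), colors: (N, S, 3), deltas/depths: (N, S, 1)
        alpha = 1.0 - torch.exp(-sigma * deltas)                    # α_i = 1 - exp(-σ_i Δ_i)
        # T_i = Π_{j<i} (1 - α_j): prepend ones so T_0 = 1, then cumulative product
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha], dim=1), dim=1
        )[:, :-1]
        weights = trans * alpha                                      # w_i = T_i α_i
        rgb = (weights * colors).sum(dim=1)                          # Σ w_i c_i
        depth = (weights * depths).sum(dim=1)                        # Σ w_i t_i
        return rgb, depth, weights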
        

A2 — Optimizing a Basic Implicit Volume (Box)

We randomly sample rays from each training image, render through the analytic box SDF with our differentiable volume renderer, and minimize MSE to the ground-truth colors. This inverts the renderer to recover the box parameters (center & side lengths) from images + known camera poses.

Learned Parameters

After 1,000 iterations:
Center (x, y, z): (+0.25, +0.25, +0.00)
Side lengths (x, y, z): (2.01, 1.50, 1.50)
Config: T4 • CUDA 12.1 • Torch 2.4 • PyTorch3D OK • batch=cfg.training.batch_size
A2 spiral render after training
Spiral rendering after optimization.

Before training

before 0
view 0
before 1
view 1
before 2
view 2
before 3
view 3

After training

after 0
view 0
after 1
view 1
after 2
view 2
after 3
view 3
Training details (concise)
    # rays: random subset of pixels per iter (saves memory)
    # forward: sample points → densities/colors → transmittance weights → RGB
    # loss: MSE(pred_rgb, rgb_gt), backprop into box center & side lengths
    # outputs: part_2_before_training_*.png, part_2_after_training_*.png, images/part_2.gif
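A minimal sketch of one such optimization step (render_fn stands in for the differentiable volume renderer applied to the box volume; all names here are placeholders, not the starter-code API):

    import torch
    import torch.nn.functional as F

    def train_step(optimizer, render_fn, rays_o, rays_d, rgb_gt, n_rays=1024):
        # Random subset of pixels/rays to keep memory bounded
        idx = torch.randperm(rays_o.shape[0])[:n_rays]
        pred_rgb = render_fn(rays_o[idx], rays_d[idx])      # (n_rays, 3)
        loss = F.mse_loss(pred_rgb, rgb_gt[idx])            # MSE(pred_rgb, rgb_gt)
        optimizer.zero_grad()
        loss.backward()                                     # grads reach box center & side lengths
        optimizer.step()
        return loss.item()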
        

A3 — Neural Radiance Field (Lego)

View-independent NeRF trained on 128×128 Lego views. The MLP maps 3D position → density (ReLU) and color (Sigmoid), with positional encoding (L = 6). We render by integrating along rays using transmittance weights.
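A minimal sketch of the positional encoding assumed here (L frequency bands of sin/cos; the starter code's harmonic embedding may differ in frequency scaling):

    import torch

    def positional_encoding(x, num_freqs=6):
        # x: (..., 3) positions -> (..., 3 * 2 * num_freqs) features
        freqs = 2.0 ** torch.arange(num_freqs, device=x.device)        # 1, 2, 4, ..., 2^(L-1)
        angles = x[..., None] * freqs                                   # (..., 3, L)
        enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return enc.flatten(start_dim=-2)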

Ablation Analysis

Takeaway: Frequency encoding (L) drives spatial fidelity; sample count mainly affects compute for this dataset.

A3 spiral render of Lego NeRF
Spiral render of the trained NeRF.

Training summary

250 epochs • batch 1024 • 128 samples/ray
  • Positional Encoding: L=6 (xyz), view-independent colors
  • Renderer: alpha = 1−exp(−σΔ), transmittance cumprod, weighted color sum
  • Optimizer: Adam, lr 5e-4, step LR scheduler (γ=0.8 / 50 epochs)
PSNR (L=4): 33 dB
PSNR (samples/ray=64): 120 dB
Checkpoint: ./checkpoints
Reproduce
    > conda activate learning3d

    # Main:
    > python volume_rendering_main.py --config-name=nerf_lego

    # Ablation 1 (L=4):
    > python volume_rendering_main.py --config-name=nerf_lego implicit_function.n_harmonic_functions_xyz=4 training.resume=False training.checkpoint_path=checkpoints_ablations/nerf_L4.pt

    # Ablation 2 (SPR=64)
    > python volume_rendering_main.py --config-name=nerf_lego sampler.n_pts_per_ray=64 training.resume=False training.checkpoint_path=checkpoints_ablations2/nerf_64spp.pt
            

Ablations (optional)

Compare PE level and samples/ray:

L=4
PE L=4
64 spr
64 samples/ray

Part 4 — NeRF Extras

4.1 View-Dependent NeRF (Materials)

We extend NeRF to model view-dependent emission by conditioning the color head on the viewing direction (harmonic encoding with Ldir=2). Density depends only on position. This captures specular highlights and glossy BRDF effects.
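A minimal sketch of this conditioning (layer sizes and names are illustrative; the key point is that σ sees only position features while the color head also receives the encoded direction):

    import torch
    import torch.nn as nn

    class ViewDependentHead(nn.Module):
        def __init__(self, feat_dim=128, dir_enc_dim=12):  # Ldir=2 -> 3 dims x 2 freqs x (sin, cos)
            super().__init__()
            self.density = nn.Sequential(nn.Linear(feat_dim, 1), nn.ReLU())
            self.color = nn.Sequential(
                nn.Linear(feat_dim + dir_enc_dim, 128), nn.ReLU(),
                nn.Linear(128, 3), nn.Sigmoid(),
            )

        def forward(self, feat, dir_enc):
            sigma = self.density(feat)                               # position features only
            rgb = self.color(torch.cat([feat, dir_enc], dim=-1))     # conditioned on view direction
            return sigma, rgb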

View-Dependent NeRF (128x128)
128×128 (quick) — shifting highlights indicate view dependence.
View-Dependent NeRF (400x400)
400×400 (high-res) — sharper specularities & reflections.
  • Config: n_harmonic_functions_dir = 2, xyz = 6; batch 1024; 128 samples/ray
  • Trade-offs: +Realism for glossy materials; −Slightly more compute; risk of baked-in highlights if overfit

Part 4.2 — Coarse / Fine Sampling

We implemented hierarchical sampling: a coarse pass renders with uniform depths to estimate geometry, then a fine pass importance-samples additional depths from the coarse weights (PDF) and re-renders.
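A minimal sketch of the fine-pass sampler (inverse-CDF sampling from the coarse weights; bin handling is simplified relative to the original NeRF sample_pdf):

    import torch

    def sample_pdf(bins, weights, n_fine, eps=1e-5):
        # bins: (N, C + 1) depth bin edges; weights: (N, C) coarse transmittance weights
        pdf = (weights + eps) / (weights + eps).sum(dim=-1, keepdim=True)
        cdf = torch.cumsum(pdf, dim=-1)
        cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)        # (N, C + 1)

        u = torch.rand(weights.shape[0], n_fine, device=weights.device)        # uniform samples
        idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

        # Linear interpolation of depth within the selected bin
        cdf_lo, cdf_hi = torch.gather(cdf, -1, idx - 1), torch.gather(cdf, -1, idx)
        bin_lo, bin_hi = torch.gather(bins, -1, idx - 1), torch.gather(bins, -1, idx)
        t = (u - cdf_lo) / (cdf_hi - cdf_lo + eps)
        return bin_lo + t * (bin_hi - bin_lo)                                   # (N, n_fine) fine depths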

Single-pass NeRF
Single pass (uniform sampling)
Coarse/Fine NeRF
Coarse→Fine (importance sampling)

Observation

In our setup the hierarchical version appears slightly blurrier than the uniform baseline. This can happen when the coarse network’s density weights are not yet accurate, so the importance sampler spreads fine samples into suboptimal regions. At 128×128 and limited training, uniform sampling remained sharper and more stable.

Why this can happen

  • Diffuse coarse weights: Early in training the PDF is broad → fine samples don’t focus on the surface.
  • Low resolution / short training: Coarse geometry stabilizes late; supervising fine too early adds noise.
  • Budget split: With the same total samples, moving some from uniform → PDF can reduce coverage.

Settings we used

Coarse: 64
Fine: +64
Resolution: 128×128

Tips that usually improve coarse/fine

  • Warm-up with uniform only for a few epochs, then enable fine sampling.
  • Use more total samples (e.g., 64 coarse + 128 fine) if memory allows.
  • Build the PDF from midpoint bins and clip weights with a small ε to avoid spiky CDFs.
  • Slightly reduce LR or add density noise early to stabilize the coarse field.

Part 5 — Sphere Tracing an SDF (Torus)

We render a signed distance field by sphere tracing. From each ray origin we step forward by the current SDF value until the absolute distance falls below a small threshold (hit) or the ray exceeds far (miss). The renderer outputs the intersection point and a hit mask, then shades the hit location.

Sphere tracing a torus SDF
Torus rendered with sphere tracing. Convergence is controlled by ε and max_iters.
Algorithm: iterate p = o + t·d, step by SDF(p), stop when |SDF(p)| < ε.
Pros: fast, no wasted samples, crisp surfaces.
Cons: requires a well-behaved SDF; thin features can be missed if ε is large.
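A minimal sketch of this loop (ε, max_iters, and the sdf callable are parameters; per-ray masking of already-converged points is kept simple):

    import torch

    def sphere_trace(sdf, origins, directions, near=0.0, far=6.0, eps=1e-5, max_iters=64):
        # origins, directions: (N, 3); sdf maps (N, 3) -> (N, 1) signed distances
        t = torch.full_like(origins[..., :1], near)
        for _ in range(max_iters):
            points = origins + t * directions
            t = t + sdf(points)                       # march forward by the current SDF value
        points = origins + t * directions
        mask = (sdf(points).abs() < eps) & (t < far)  # hit if converged before reaching far
        return points, mask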

Part 6 — Optimizing a Neural SDF (with Eikonal)

We train an MLP to approximate a signed distance field (SDF) from a sparse point cloud. The network predicts distance at arbitrary 3D locations and is regularized by the eikonal loss, which encourages a unit-norm gradient (‖∇f(x)‖≈1) so the learned function behaves like a true distance field.

Input point cloud
Training input: sparse bunny point cloud.
Learned neural SDF
Learned SDF rendered as an iso-surface (marching cubes).

Model & Loss

  • MLP: ReLU blocks (hidden 256×6), optional harmonic encoding of 3D position; linear head outputs signed distance (no activation).
  • Data term: enforce f(p)=0 on observed points; two-sided near-surface supervision with offsets ±ε to stabilize the zero-level set.
  • Eikonal: minimize (‖∇f(x)‖ − 1)² on random samples in the bounding box to promote SDF-like behavior (sketched just after this list).
  • Extraction: iso-surface at a tight threshold (e.g., 0.0005) with marching cubes.
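A minimal sketch of the eikonal term, using autograd to get ∇f(x) at random points in the bounding box (names are placeholders):

    import torch

    def eikonal_loss(sdf_model, bbox_min, bbox_max, n_points=4096):
        # Sample random points in the bounding box and require ||∇f(x)|| ≈ 1
        x = bbox_min + (bbox_max - bbox_min) * torch.rand(n_points, 3, device=bbox_min.device)
        x.requires_grad_(True)
        d = sdf_model(x)
        grad = torch.autograd.grad(d, x, grad_outputs=torch.ones_like(d), create_graph=True)[0]
        return ((grad.norm(dim=-1) - 1.0) ** 2).mean()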
Key settings: PE L=6, hidden=256, layers=6, skip at 3
Regularization: Eikonal weight ≈ 0.02, interior prior 0.1
Training: batch 4096, ~1000 epochs, pretrain 250 iters

Part 7 — VolSDF: SDF → Density + Color

VolSDF color rendering of LEGO
Volumetric render (learned color)
Trained with volume rendering and our SDF→density mapping.
Extracted surface / geometry render
Extracted geometry
Marching cubes over the learned SDF (visualized via our geometry renderer).
What did we implement?
  • NeuralSurface MLP that predicts a signed distance value d(x) and a per-point color c(x) (sigmoid).
  • SDF → density mapping (VolSDF §3.1) used during volume rendering. We convert the occupancy-like value φ(d) into a per-step density via σ = -log(1 − φ(d))/Δ (sketched just after this list).
  • Eikonal regularization on ∥∇d(x)∥ to bias the field towards an actual SDF.
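A minimal sketch of that mapping, assuming φ(d) = sigmoid(−d/β) as the occupancy-like value and an α scale on the resulting density (the exact φ and the placement of α in our code may differ):

    import torch

    def sdf_to_density(signed_distance, deltas, alpha=10.0, beta=0.05):
        # Occupancy-like value φ(d) in (0, 1); clamped to avoid log(0)
        phi = torch.sigmoid(-signed_distance / beta).clamp(1e-6, 1.0 - 1e-6)
        # σ = -log(1 - φ(d)) / Δ, so that 1 - exp(-σ Δ) recovers φ per step
        sigma = -torch.log(1.0 - phi) / deltas
        return alpha * sigma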

Key settings

  • Dataset: lego (128×128)
  • Network (dist): 6×128 ReLU (+PE L=6)
  • Network (color): 2×128 → Sigmoid
  • Samples / ray: 128 (stratified)
  • Epochs: 250
  • LR: 5e-4 (decay ×0.8 / 50 epochs)
  • α, β: 10.0, 0.05
  • Eikonal λ: 0.02
  • Interior λ: 0.1

Config: configs/volsdf_surface.yaml.

Intuition for α and β

  • α (alpha) scales opacity near the surface. Larger α → stronger accumulation per step.
  • β (beta) controls the “softness” of the transition around the zero-level set.
    Small β = sharp, thin shell; large β = broad, fuzzy shell.

Short Q&A

  1. High β bias? A thicker, blurrier band of density around the surface (softer geometry).
  2. Low vs. high β for training stability? Higher β is usually easier (smoother gradients); very low β can make optimization brittle.
  3. Which β learns a crisper surface? Lower β, because the density concentrates tightly near d(x)=0 → better surface localization.
Notes & tips
  • If geometry “vanishes,” clamp the occupancy φ(d) to [1e-6, 1-1e-6], add small noise to σ during training, and monitor gradient norms.
  • Too fuzzy? Decrease β (e.g., 0.03) or raise eikonal weight a bit.
  • Noisy mesh? Increase marching-cubes resolution or tighten the iso-threshold slightly (≈0).

Part 8 — Neural Surface Extras

This section explores three optional extensions built on top of our Neural Surface / VolSDF pipeline: (8.1) large scene rendering via sphere tracing, (8.2) reconstruction from fewer training views, and (8.3) alternate SDF→density mappings (NeuS vs VolSDF).

8.1 — Sphere Tracing a Complex Scene

We created a complex scene with over 20 SDF primitives (spheres, boxes, and a torus), each positioned along a ring to demonstrate how efficiently sphere tracing handles large implicit scenes.

Complex SDF scene rendered with sphere tracing
Complex Scene (ours)
Union of 24 primitives using smooth-min. 64 sphere-tracing iterations per ray.
Reference simple torus from Part 5
Reference (Part 5)
Single torus using the same renderer and camera path.
Implementation Highlights
  • Added ComplexSceneSDF class in implicit.py composing multiple analytic SDFs via smooth-min (sketched below).
  • Renderer: unchanged (SphereTracingRenderer).
  • Config: implicit_function.type: sdf_surface and sdf.type: complex_scene.
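A minimal sketch of the smooth-min composition (the primitive SDFs and the ring layout here are simplified stand-ins for the actual ComplexSceneSDF internals):

    import math
    import torch

    def smooth_min(d1, d2, k=0.1):
        # Polynomial smooth minimum: blends two SDFs into one continuous surface
        h = torch.clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
        return d2 + h * (d1 - d2) - k * h * (1.0 - h)

    def sphere_sdf(points, center, radius):
        return (points - center).norm(dim=-1, keepdim=True) - radius

    def ring_scene_sdf(points, n_prims=24, ring_radius=1.5):
        # Union of primitives placed along a ring (spheres here; boxes and the torus compose the same way)
        d = None
        for i in range(n_prims):
            theta = 2.0 * math.pi * i / n_prims
            center = torch.tensor([ring_radius * math.cos(theta),
                                   ring_radius * math.sin(theta), 0.0],
                                  device=points.device)
            d_i = sphere_sdf(points, center, radius=0.2)
            d = d_i if d is None else smooth_min(d, d_i)
        return d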

Observations

  • Efficiency: Runtime scales with steps, not polygon count.
  • Quality: Produces smooth silhouettes and continuous unions.
  • Limitations: Extremely thin surfaces may be skipped due to large step sizes.

8.2 — Fewer Training Views (VolSDF vs NeRF)

To evaluate surface regularization benefits, we trained both VolSDF and a standard NeRF using only 20 views of the LEGO scene (instead of 100). Both models used identical camera subsets (limit_train=20, view_seed=42).

VolSDF 20 views rendering
VolSDF — Rendering
Learned from 20 sparse views. Appearance remains fairly consistent.
VolSDF 20 views geometry
VolSDF — Geometry
SDF-based zero level-set retains shape integrity and coherent topology.
NeRF 20 views rendering
NeRF — Rendering 20 views
Struggles with view sparsity — textures blur and fine details vanish.
NeRF front view
NeRF — Full View
Implementation Details
  • Subset Selection:
    # in surface_rendering_main.py
    limit = int(getattr(cfg.data, "limit_train", 0) or 0)
    seed  = int(getattr(cfg.data, "view_seed", 0))
    if limit > 0 and limit < len(train_dataset):
        import numpy as np
        rng = np.random.default_rng(seed)
        idx = np.sort(rng.choice(len(train_dataset), size=limit, replace=False))
        train_dataset = ListDataset([train_dataset[i] for i in idx])
        print(f"[Few Views] Using {len(train_dataset)} train views (seed={seed}).")
  • VolSDF Run:
    python -m surface_rendering_main --config-name=volsdf_surface \
      data.limit_train=20 data.view_seed=42
  • NeRF Run:
    python volume_rendering_main.py --config-name=nerf_lego \
      data.limit_train=20 data.view_seed=42

Observations

  • VolSDF maintains structure even with sparse views, leveraging geometric priors from its SDF formulation.
  • NeRF collapses to diffuse density, showing weaker 3D consistency without surface constraints.
  • With fewer training images, VolSDF converges faster and produces sharper silhouettes.
  • This demonstrates the key advantage of surface-based volumetric representations in low-data regimes.

8.3 — Alternate SDF → Density Mappings (VolSDF vs NeuS)

We compared the VolSDF exponential mapping with the NeuS sigmoid-based mapping. While NeuS theoretically provides smoother gradients near the zero level-set, in our implementation VolSDF produced cleaner and more stable reconstructions.

VolSDF baseline render
VolSDF Baseline
Exponential mapping → smooth and stable density transitions, minimal artifacts.
VolSDF geometry baseline
VolSDF Geometry
Clear structural details and compact surfaces.
NeuS render
NeuS Mapping (ours)
Sigmoid-based occupancy; slightly noisier reconstruction and weaker global consistency.
NeuS geometry result
NeuS Geometry
More floaters and fragmented topology despite sharper local gradients.
Implementation Details
  • Added alternate mapping in renderer.py:
    def sdf_to_density_neus(d, alpha, beta):
        # Bell-shaped density peaked at the zero level set:
        # phi * (1 - phi) is proportional to a logistic PDF in d with scale beta.
        phi = torch.sigmoid(-d / beta)
        return alpha * phi * (1 - phi)
  • Enabled via config flag:
    renderer:
      type: volume_sdf
      alpha: 10.0
      beta: 0.05
      use_neus: True

Comparison & Insights

  • Empirical results: VolSDF yielded sharper, more stable geometry than NeuS on LEGO.
  • NeuS limitations: The fixed β=0.05 caused unstable gradients and noisy surfaces.
  • VolSDF advantage: Exponential falloff implicitly regularizes density, producing smoother and cleaner reconstructions.
  • Interpretation: NeuS can outperform VolSDF with careful β annealing and stronger Eikonal weighting, but not by default.
  • Next step: Experiment with adaptive β scheduling or hybrid mappings combining NeuS and VolSDF behaviors.