CMU 16-825 • Learning for 3D Vision

HW 3 — Volume Rendering

Vaishnavi Khindkar (vkhindka@andrew.cmu.edu)

Overview. This page documents my solutions and results for HW3. I start with ray generation and stratified sampling, then implement NeRF-style volume rendering. Next, I move to implicit surfaces: sphere tracing, a learned Neural SDF with the eikonal prior, and VolSDF for color + geometry. The extras explore view dependence, coarse–fine sampling, large SDF scenes, learning from fewer views, and an alternate SDF→density mapping (NeuS). Each section includes settings, gifs, and takeaways.

GPU: T4 (CUDA 12.1) • PyTorch 2.4 • PyTorch3D OK

A0 — Transmittance (Derivation)

A0 PDF screenshot
Screenshot of A0 PDF (answers inside).
T(t) = exp(-∫₀ᵗ σ(s) ds)
C = ∫ T(t) σ(t) c(t) dt
≈ Σ Tᵢ (1 - e^{-σᵢ Δᵢ}) cᵢ,  Tᵢ = Πⱼ<i (1 - αⱼ)
          

See the PDF for a careful discrete→continuous derivation and the piecewise solution for the figure with segments (y₁y₂), (y₂y₃), (y₃y₄).
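As a quick numerical sanity check of the discrete form, here is a minimal sketch (with hypothetical σᵢ and Δᵢ values) verifying that Πⱼ (1 − αⱼ) matches exp(−Σ σⱼ Δⱼ) when σ is piecewise constant:

    import torch

    # Hypothetical piecewise-constant densities and segment lengths
    sigma = torch.tensor([0.5, 2.0, 0.1])   # σ_i
    delta = torch.tensor([1.0, 0.5, 2.0])   # Δ_i

    # Continuous form: T = exp(-∫ σ ds) = exp(-Σ σ_i Δ_i) for piecewise-constant σ
    T_integral = torch.exp(-(sigma * delta).sum())

    # Discrete form: T = Π_j (1 - α_j) with α_j = 1 - exp(-σ_j Δ_j)
    alpha = 1.0 - torch.exp(-sigma * delta)
    T_product = torch.prod(1.0 - alpha)

    print(T_integral.item(), T_product.item())  # the two agree exactly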

A1 — Differentiable Volume Rendering (Box Scene)

A1 render gif
Final render (spiral camera).
A1 depth map
Expected depth (transmittance-weighted).
A1 sample points
Stratified samples along rays (colored by t).
Equation → code mapping
αᵢ = 1 - exp(-σᵢ Δᵢ)
Tᵢ = Πⱼ<i (1 - αⱼ)
wᵢ = Tᵢ αᵢ
RGB  = Σ wᵢ cᵢ
Depth= Σ wᵢ tᵢ
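A minimal PyTorch sketch of this mapping (the function name composite and the tensor layout are illustrative, not the exact starter-code API):

    import torch

    def composite(sigma, colors, deltas, depths):
        # sigma: (N, S, 1), colors: (N, S, 3), deltas/depths: (N, S, 1)
        alpha = 1.0 - torch.exp(-sigma * deltas)                    # α_i = 1 - exp(-σ_i Δ_i)
        # T_i = Π_{j<i} (1 - α_j): prepend ones so T_0 = 1, then cumulative product
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha], dim=1), dim=1
        )[:, :-1]
        weights = trans * alpha                                      # w_i = T_i α_i
        rgb = (weights * colors).sum(dim=1)                          # Σ w_i c_i
        depth = (weights * depths).sum(dim=1)                        # Σ w_i t_i
        return rgb, depth, weights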
        

A2 — Optimizing a Basic Implicit Volume (Box)

We randomly sample rays from each training image, render through the analytic box SDF with our differentiable volume renderer, and minimize MSE to the ground-truth colors. This inverts the renderer to recover the box parameters (center & side lengths) from images + known camera poses.

Learned Parameters

After 1,000 iterations:
Center (x, y, z): (+0.25, +0.25, +0.00)
Side lengths (x, y, z): (2.01, 1.50, 1.50)
Config: T4 • CUDA 12.1 • Torch 2.4 • PyTorch3D OK • batch=cfg.training.batch_size
A2 spiral render after training
Spiral rendering after optimization.

Before training

before 0
view 0
before 1
view 1
before 2
view 2
before 3
view 3

After training

after 0
view 0
after 1
view 1
after 2
view 2
after 3
view 3
Training details (concise)
    # rays: random subset of pixels per iter (saves memory)
    # forward: sample points → densities/colors → transmittance weights → RGB
    # loss: MSE(pred_rgb, rgb_gt), backprop into box center & side lengths
    # outputs: part_2_before_training_*.png, part_2_after_training_*.png, images/part_2.gif
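A minimal sketch of one such optimization step (render_fn stands in for the differentiable volume renderer applied to the box volume; all names here are placeholders, not the starter-code API):

    import torch
    import torch.nn.functional as F

    def train_step(optimizer, render_fn, rays_o, rays_d, rgb_gt, n_rays=1024):
        # Random subset of pixels/rays to keep memory bounded
        idx = torch.randperm(rays_o.shape[0])[:n_rays]
        pred_rgb = render_fn(rays_o[idx], rays_d[idx])      # (n_rays, 3)
        loss = F.mse_loss(pred_rgb, rgb_gt[idx])            # MSE(pred_rgb, rgb_gt)
        optimizer.zero_grad()
        loss.backward()                                     # grads reach box center & side lengths
        optimizer.step()
        return loss.item()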
        

A3 — Neural Radiance Field (Lego)

View-independent NeRF trained on 128×128 Lego views. The MLP maps 3D position → density (ReLU) and color (Sigmoid), with positional encoding (L = 6). We render by integrating along rays using transmittance weights.
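A minimal sketch of the positional encoding assumed here (L frequency bands of sin/cos; the starter code's harmonic embedding may differ in frequency scaling):

    import torch

    def positional_encoding(x, num_freqs=6):
        # x: (..., 3) positions -> (..., 3 * 2 * num_freqs) features
        freqs = 2.0 ** torch.arange(num_freqs, device=x.device)        # 1, 2, 4, ..., 2^(L-1)
        angles = x[..., None] * freqs                                   # (..., 3, L)
        enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return enc.flatten(start_dim=-2)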

Ablation Analysis

Takeaway: Frequency encoding (L) drives spatial fidelity; sample count mainly affects compute for this dataset.

A3 spiral render of Lego NeRF
Spiral render of the trained NeRF.

Training summary

250 epochs • batch 1024 • 128 samples/ray
  • Positional Encoding: L=6 (xyz), view-independent colors
  • Renderer: alpha = 1−exp(−σΔ), transmittance cumprod, weighted color sum
  • Optimizer: Adam, lr 5e-4, step LR scheduler (γ=0.8 / 50 epochs)
PSNR (L=4): 33 dB
PSNR (samples/ray=64): 120 dB
Checkpoint: ./checkpoints
Reproduce
    > conda activate learning3d

    # Main:
    > python volume_rendering_main.py --config-name=nerf_lego

    # Ablation 1 (L=4):
    > python volume_rendering_main.py --config-name=nerf_lego implicit_function.n_harmonic_functions_xyz=4 training.resume=False training.checkpoint_path=checkpoints_ablations/nerf_L4.pt

    # Ablation 2 (SPR=64)
    > python volume_rendering_main.py --config-name=nerf_lego sampler.n_pts_per_ray=64 training.resume=False training.checkpoint_path=checkpoints_ablations2/nerf_64spp.pt
            

Ablations (optional)

Compare PE level and samples/ray:

L=4
PE L=4
64 spr
64 samples/ray

Part 4 — NeRF Extras

4.1 View-Dependent NeRF (Materials)

We extend NeRF to model view-dependent emission by conditioning the color head on the viewing direction (harmonic encoding with Ldir=2). Density depends only on position. This captures specular highlights and glossy BRDF effects.
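A minimal sketch of this conditioning (layer sizes and names are illustrative; the key point is that σ sees only position features while the color head also receives the encoded direction):

    import torch
    import torch.nn as nn

    class ViewDependentHead(nn.Module):
        def __init__(self, feat_dim=128, dir_enc_dim=12):  # Ldir=2 -> 3 dims x 2 freqs x (sin, cos)
            super().__init__()
            self.density = nn.Sequential(nn.Linear(feat_dim, 1), nn.ReLU())
            self.color = nn.Sequential(
                nn.Linear(feat_dim + dir_enc_dim, 128), nn.ReLU(),
                nn.Linear(128, 3), nn.Sigmoid(),
            )

        def forward(self, feat, dir_enc):
            sigma = self.density(feat)                               # position features only
            rgb = self.color(torch.cat([feat, dir_enc], dim=-1))     # conditioned on view direction
            return sigma, rgb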

View-Dependent NeRF (128x128)
128×128 (quick) — shifting highlights indicate view dependence.
View-Dependent NeRF (400x400)
400×400 (high-res) — sharper specularities & reflections.
  • Config: n_harmonic_functions_dir = 2, xyz = 6; batch 1024; 128 samples/ray
  • Trade-offs: +Realism for glossy materials; −Slightly more compute; risk of baked-in highlights if overfit

Part 4.2 — Coarse / Fine Sampling

We implemented hierarchical sampling: a coarse pass renders with uniform depths to estimate geometry, then a fine pass importance-samples additional depths from the coarse weights (PDF) and re-renders.
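A minimal sketch of the fine-pass sampler (inverse-CDF sampling from the coarse weights; bin handling is simplified relative to the original NeRF sample_pdf):

    import torch

    def sample_pdf(bins, weights, n_fine, eps=1e-5):
        # bins: (N, C + 1) depth bin edges; weights: (N, C) coarse transmittance weights
        pdf = (weights + eps) / (weights + eps).sum(dim=-1, keepdim=True)
        cdf = torch.cumsum(pdf, dim=-1)
        cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)        # (N, C + 1)

        u = torch.rand(weights.shape[0], n_fine, device=weights.device)        # uniform samples
        idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

        # Linear interpolation of depth within the selected bin
        cdf_lo, cdf_hi = torch.gather(cdf, -1, idx - 1), torch.gather(cdf, -1, idx)
        bin_lo, bin_hi = torch.gather(bins, -1, idx - 1), torch.gather(bins, -1, idx)
        t = (u - cdf_lo) / (cdf_hi - cdf_lo + eps)
        return bin_lo + t * (bin_hi - bin_lo)                                   # (N, n_fine) fine depths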

Single-pass NeRF
Single pass (uniform sampling)
Coarse/Fine NeRF
Coarse→Fine (importance sampling)

Observation

In our setup the hierarchical version appears slightly blurrier than the uniform baseline. This can happen when the coarse network’s density weights are not yet accurate, so the importance sampler spreads fine samples into suboptimal regions. At 128×128 and limited training, uniform sampling remained sharper and more stable.

Why this can happen

  • Diffuse coarse weights: Early in training the PDF is broad → fine samples don’t focus on the surface.
  • Low resolution / short training: Coarse geometry stabilizes late; supervising fine too early adds noise.
  • Budget split: With the same total samples, moving some from uniform → PDF can reduce coverage.

Settings we used

Coarse: 64
Fine: +64
Resolution: 128×128

Tips that usually improve coarse/fine

  • Warm-up with uniform only for a few epochs, then enable fine sampling.
  • Use more total samples (e.g., 64 coarse + 128 fine) if memory allows.
  • Build the PDF from midpoint bins and clip weights with a small ε to avoid spiky CDFs.
  • Slightly reduce LR or add density noise early to stabilize the coarse field.

Part 5 — Sphere Tracing an SDF (Torus)

We render a signed distance field by sphere tracing. From each ray origin we step forward by the current SDF value until the absolute distance falls below a small threshold (hit) or the ray exceeds far (miss). The renderer outputs the intersection point and a hit mask, then shades the hit location.

Sphere tracing a torus SDF
Torus rendered with sphere tracing. Convergence is controlled by ε and max_iters.
Algorithm: iterate p = o + t·d, step by SDF(p), stop when |SDF(p)| < ε.
Pros: fast, no wasted samples, crisp surfaces.
Cons: requires a well-behaved SDF; thin features can be missed if ε is large.
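A minimal sketch of this loop (ε, max_iters, and the sdf callable are parameters; per-ray masking of already-converged points is kept simple):

    import torch

    def sphere_trace(sdf, origins, directions, near=0.0, far=6.0, eps=1e-5, max_iters=64):
        # origins, directions: (N, 3); sdf maps (N, 3) -> (N, 1) signed distances
        t = torch.full_like(origins[..., :1], near)
        for _ in range(max_iters):
            points = origins + t * directions
            t = t + sdf(points)                       # march forward by the current SDF value
        points = origins + t * directions
        mask = (sdf(points).abs() < eps) & (t < far)  # hit if converged before reaching far
        return points, mask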

Part 6 — Optimizing a Neural SDF (with Eikonal)

We train an MLP to approximate a signed distance field (SDF) from a sparse point cloud. The network predicts distance at arbitrary 3D locations and is regularized by the eikonal loss, which encourages a unit-norm gradient (‖∇f(x)‖≈1) so the learned function behaves like a true distance field.

Input point cloud
Training input: sparse bunny point cloud.
Learned neural SDF
Learned SDF rendered as an iso-surface (marching cubes).

Model & Loss

  • MLP: ReLU blocks (hidden 256×6), optional harmonic encoding of 3D position; linear head outputs signed distance (no activation).
  • Data term: enforce f(p)=0 on observed points; two-sided near-surface supervision with offsets ±ε to stabilize the zero-level set.
  • Eikonal: minimize (‖∇f(x)‖ − 1)² on random samples in the bounding box to promote SDF-like behavior (sketched just after this list).
  • Extraction: iso-surface at a tight threshold (e.g., 0.0005) with marching cubes.
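A minimal sketch of the eikonal term, using autograd to get ∇f(x) at random points in the bounding box (names are placeholders):

    import torch

    def eikonal_loss(sdf_model, bbox_min, bbox_max, n_points=4096):
        # Sample random points in the bounding box and require ||∇f(x)|| ≈ 1
        x = bbox_min + (bbox_max - bbox_min) * torch.rand(n_points, 3, device=bbox_min.device)
        x.requires_grad_(True)
        d = sdf_model(x)
        grad = torch.autograd.grad(d, x, grad_outputs=torch.ones_like(d), create_graph=True)[0]
        return ((grad.norm(dim=-1) - 1.0) ** 2).mean()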
Key settings: PE L=6, hidden=256, layers=6, skip at 3
Regularization: Eikonal weight ≈ 0.02, interior prior 0.1
Training: batch 4096, ~1000 epochs, pretrain 250 iters

Part 7 — VolSDF: SDF → Density + Color

VolSDF color rendering of LEGO
Volumetric render (learned color)
Trained with volume rendering and our SDF→density mapping.
Extracted surface / geometry render
Extracted geometry
Marching cubes over the learned SDF (visualized via our geometry renderer).
What did we implement?
  • NeuralSurface MLP that predicts a signed distance value d(x) and a per-point color c(x) (sigmoid).
  • SDF → density mapping (VolSDF §3.1) used during volume rendering. We convert the occupancy-like value φ(d) into a per-step density via σ = -log(1 − φ(d))/Δ (sketched just after this list).
  • Eikonal regularization on ∥∇d(x)∥ to bias the field towards an actual SDF.
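A minimal sketch of that mapping, assuming φ(d) = sigmoid(−d/β) as the occupancy-like value and an α scale on the resulting density (the exact φ and the placement of α in our code may differ):

    import torch

    def sdf_to_density(signed_distance, deltas, alpha=10.0, beta=0.05):
        # Occupancy-like value φ(d) in (0, 1); clamped to avoid log(0)
        phi = torch.sigmoid(-signed_distance / beta).clamp(1e-6, 1.0 - 1e-6)
        # σ = -log(1 - φ(d)) / Δ, so that 1 - exp(-σ Δ) recovers φ per step
        sigma = -torch.log(1.0 - phi) / deltas
        return alpha * sigma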

Key settings

  • Dataset: lego (128×128)
  • Network (dist): 6×128 ReLU (+PE L=6)
  • Network (color): 2×128 → Sigmoid
  • Samples / ray: 128 (stratified)
  • Epochs: 250
  • LR: 5e-4 (decay ×0.8 / 50 epochs)
  • α, β: 10.0, 0.05
  • Eikonal λ: 0.02
  • Interior λ: 0.1

Config: configs/volsdf_surface.yaml.

Intuition for α and β

  • α (alpha) scales opacity near the surface. Larger α → stronger accumulation per step.
  • β (beta) controls the “softness” of the transition around the zero-level set.
    Small β = sharp, thin shell; large β = broad, fuzzy shell.

Short Q&A

  1. High β bias? A thicker, blurrier band of density around the surface (softer geometry).
  2. Low vs. high β for training stability? Higher β is usually easier (smoother gradients); very low β can make optimization brittle.
  3. Which β learns a crisper surface? Lower β, because the density concentrates tightly near d(x)=0 → better surface localization.
Notes & tips
  • If geometry “vanishes,” clamp the occupancy φ(d) to [1e-6, 1-1e-6], add small noise to σ during training, and monitor gradient norms.
  • Too fuzzy? Decrease β (e.g., 0.03) or raise eikonal weight a bit.
  • Noisy mesh? Increase marching-cubes resolution or tighten the iso-threshold slightly (≈0).

Part 8 — Neural Surface Extras

This section explores three optional extensions built on top of our Neural Surface / VolSDF pipeline: (8.1) large scene rendering via sphere tracing, (8.2) reconstruction from fewer training views, and (8.3) alternate SDF→density mappings (NeuS vs VolSDF).

8.1 — Sphere Tracing a Complex Scene

We created a complex scene with over 20 SDF primitives (spheres, boxes, and a torus), each positioned along a ring to demonstrate how efficiently sphere tracing handles large implicit scenes.

Complex SDF scene rendered with sphere tracing
Complex Scene (ours)
Union of 24 primitives using smooth-min. 64 sphere-tracing iterations per ray.
Reference simple torus from Part 5
Reference (Part 5)
Single torus using the same renderer and camera path.
Implementation Highlights
  • Added ComplexSceneSDF class in implicit.py composing multiple analytic SDFs via smooth-min (sketched below).
  • Renderer: unchanged (SphereTracingRenderer).
  • Config: implicit_function.type: sdf_surface and sdf.type: complex_scene.
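A minimal sketch of the smooth-min composition (the primitive SDFs and the ring layout here are simplified stand-ins for the actual ComplexSceneSDF internals):

    import math
    import torch

    def smooth_min(d1, d2, k=0.1):
        # Polynomial smooth minimum: blends two SDFs into one continuous surface
        h = torch.clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
        return d2 + h * (d1 - d2) - k * h * (1.0 - h)

    def sphere_sdf(points, center, radius):
        return (points - center).norm(dim=-1, keepdim=True) - radius

    def ring_scene_sdf(points, n_prims=24, ring_radius=1.5):
        # Union of primitives placed along a ring (spheres here; boxes and the torus compose the same way)
        d = None
        for i in range(n_prims):
            theta = 2.0 * math.pi * i / n_prims
            center = torch.tensor([ring_radius * math.cos(theta),
                                   ring_radius * math.sin(theta), 0.0],
                                  device=points.device)
            d_i = sphere_sdf(points, center, radius=0.2)
            d = d_i if d is None else smooth_min(d, d_i)
        return d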

Observations

  • Efficiency: Runtime scales with steps, not polygon count.
  • Quality: Produces smooth silhouettes and continuous unions.
  • Limitations: Extremely thin surfaces may be skipped due to large step sizes.

8.2 — Fewer Training Views (VolSDF vs NeRF)

To evaluate surface regularization benefits, we trained both VolSDF and a standard NeRF using only 20 views of the LEGO scene (instead of 100). Both models used identical camera subsets (limit_train=20, view_seed=42).

VolSDF 20 views rendering
VolSDF — Rendering
Learned from 20 sparse views. Appearance remains fairly consistent.
VolSDF 20 views geometry
VolSDF — Geometry
SDF-based zero level-set retains shape integrity and coherent topology.
NeRF 20 views rendering
NeRF — Rendering 20 views
Struggles with view sparsity — textures blur and fine details vanish.
NeRF front view
NeRF — Full View
Implementation Details
  • Subset Selection:
    # in surface_rendering_main.py
    limit = int(getattr(cfg.data, "limit_train", 0) or 0)
    seed  = int(getattr(cfg.data, "view_seed", 0))
    if limit > 0 and limit < len(train_dataset):
        import numpy as np
        rng = np.random.default_rng(seed)
        idx = np.sort(rng.choice(len(train_dataset), size=limit, replace=False))
        train_dataset = ListDataset([train_dataset[i] for i in idx])
        print(f"[Few Views] Using {len(train_dataset)} train views (seed={seed}).")
  • VolSDF Run:
    python -m surface_rendering_main --config-name=volsdf_surface \
      data.limit_train=20 data.view_seed=42
  • NeRF Run:
    python volume_rendering_main.py --config-name=nerf_lego \
      data.limit_train=20 data.view_seed=42

Observations

  • VolSDF maintains structure even with sparse views, leveraging geometric priors from its SDF formulation.
  • NeRF collapses to diffuse density, showing weaker 3D consistency without surface constraints.
  • With fewer training images, VolSDF converges faster and produces sharper silhouettes.
  • This demonstrates the key advantage of surface-based volumetric representations in low-data regimes.

8.3 — Alternate SDF → Density Mappings (VolSDF vs NeuS)

We compared the VolSDF exponential mapping with the NeuS sigmoid-based mapping. While NeuS theoretically provides smoother gradients near the zero level-set, in our implementation VolSDF produced cleaner and more stable reconstructions.

VolSDF baseline render
VolSDF Baseline
Exponential mapping → smooth and stable density transitions, minimal artifacts.
VolSDF geometry baseline
VolSDF Geometry
Clear structural details and compact surfaces.
NeuS render
NeuS Mapping (ours)
Sigmoid-based occupancy; slightly noisier reconstruction and weaker global consistency.
NeuS geometry result
NeuS Geometry
More floaters and fragmented topology despite sharper local gradients.
Implementation Details
  • Added alternate mapping in renderer.py:
    def sdf_to_density_neus(d, alpha, beta):
        # Bell-shaped density peaked at the zero level set:
        # phi * (1 - phi) is proportional to a logistic PDF in d with scale beta.
        phi = torch.sigmoid(-d / beta)
        return alpha * phi * (1 - phi)
  • Enabled via config flag:
    renderer:
      type: volume_sdf
      alpha: 10.0
      beta: 0.05
      use_neus: True

Comparison & Insights

  • Empirical results: VolSDF yielded sharper, more stable geometry than NeuS on LEGO.
  • NeuS limitations: The fixed β=0.05 caused unstable gradients and noisy surfaces.
  • VolSDF advantage: Exponential falloff implicitly regularizes density, producing smoother and cleaner reconstructions.
  • Interpretation: NeuS can outperform VolSDF with careful β annealing and stronger Eikonal weighting, but not by default.
  • Next step: Experiment with adaptive β scheduling or hybrid mappings combining NeuS and VolSDF behaviors.