Overview. This page documents my solutions and results for HW3. I start with ray generation and stratified sampling, then implement NeRF-style volume rendering. Next, I move to implicit surfaces: sphere tracing, a learned Neural SDF with the eikonal prior, and VolSDF for color + geometry. The extras explore view dependence, coarse–fine sampling, large SDF scenes, learning from fewer views, and an alternate SDF→density mapping (NeuS). Each section includes settings, gifs, and takeaways.
T(t) = exp(-∫₀ᵗ σ(s) ds)
C = ∫ T(t) σ(t) c(t) dt
≈ Σ Tᵢ (1 - e^{-σᵢ Δᵢ}) cᵢ, Tᵢ = Πⱼ<i (1 - αⱼ)
See the PDF for a careful discrete→continuous derivation and the piecewise solution for the figure with segments (y₁y₂), (y₂y₃), (y₃y₄).
αᵢ = 1 - exp(-σᵢ Δᵢ)
Tᵢ = Πⱼ<i (1 - αⱼ)
wᵢ = Tᵢ αᵢ
RGB = Σ wᵢ cᵢ
Depth = Σ wᵢ tᵢ
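For concreteness, here is a minimal sketch of this discrete compositing in PyTorch (tensor names like densities, colors, deltas, t_vals are placeholders, not the exact variables in our renderer):

import torch

def composite_rays(densities, colors, deltas, t_vals):
    # densities: (R, N), colors: (R, N, 3), deltas/t_vals: (R, N) sample spacings and depths
    alphas = 1.0 - torch.exp(-densities * deltas)              # alpha_i = 1 - exp(-sigma_i * delta_i)
    # T_i = prod_{j<i} (1 - alpha_j): exclusive cumulative product along each ray
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = trans * alphas                                   # w_i = T_i * alpha_i
    rgb = (weights.unsqueeze(-1) * colors).sum(dim=-2)         # RGB   = sum_i w_i c_i
    depth = (weights * t_vals).sum(dim=-1)                     # Depth = sum_i w_i t_i
    return rgb, depth, weights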
We randomly sample rays from each training image, render through the analytic box SDF with our differentiable volume renderer, and minimize MSE to the ground-truth colors. This inverts the renderer to recover the box parameters (center & side lengths) from images + known camera poses.

# rays: random subset of pixels per iter (saves memory)
# forward: sample points → densities/colors → transmittance weights → RGB
# loss: MSE(pred_rgb, rgb_gt), backprop into box center & side lengths
# outputs: part_2_before_training_*.png, part_2_after_training_*.png, images/part_2.gif
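Sketched as a training loop (num_iters, sample_rays, and render_rays are hypothetical stand-ins for the actual ray sampler and renderer, not the assignment's exact interfaces):

import torch

center = torch.zeros(3, requires_grad=True)   # box center to recover
sides = torch.ones(3, requires_grad=True)     # box side lengths to recover
optim = torch.optim.Adam([center, sides], lr=1e-2)

for it in range(num_iters):
    # rays: random subset of pixels per iteration (saves memory)
    rays_o, rays_d, rgb_gt = sample_rays(images, cameras, n_rays=1024)   # hypothetical helper
    # forward: sample points -> box SDF densities/colors -> transmittance weights -> RGB
    pred_rgb = render_rays(rays_o, rays_d, center, sides)                # hypothetical helper
    loss = torch.nn.functional.mse_loss(pred_rgb, rgb_gt)
    optim.zero_grad()
    loss.backward()
    optim.step()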
View-independent NeRF trained on 128×128 Lego views. The MLP maps 3D position → density (ReLU) and color (Sigmoid), with positional encoding (L = 6). We render by integrating along rays using transmittance weights.
Takeaway: Frequency encoding (L) drives spatial fidelity; sample count mainly affects compute for this dataset.
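For reference, the positional encoding is the standard NeRF harmonic embedding, sketched below (illustrative, not the starter code's exact class):

import torch

def harmonic_embedding(x, n_harmonics=6):
    # map x (..., 3) to [sin(2^k x), cos(2^k x)] for k = 0 .. n_harmonics - 1
    freqs = 2.0 ** torch.arange(n_harmonics, dtype=x.dtype, device=x.device)
    angles = x[..., None] * freqs                 # (..., 3, L)
    emb = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return emb.flatten(start_dim=-2)              # (..., 3 * 2L)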
Checkpoints are saved under ./checkpoints.
> conda activate learning3d
# Main:
> python volume_rendering_main.py --config-name=nerf_lego
# Ablation 1 (L=4):
> python volume_rendering_main.py --config-name=nerf_lego implicit_function.n_harmonic_functions_xyz=4 training.resume=False training.checkpoint_path=checkpoints_ablations/nerf_L4.pt
# Ablation 2 (samples/ray = 64):
> python volume_rendering_main.py --config-name=nerf_lego sampler.n_pts_per_ray=64 training.resume=False training.checkpoint_path=checkpoints_ablations2/nerf_64spp.pt
Compare PE level and samples/ray:


We extend NeRF to model view-dependent emission by conditioning the color head on the viewing direction (harmonic encoding with Ldir=2). Density depends only on position. This captures specular highlights and glossy BRDF effects.
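A sketch of the view-conditioned color head (layer sizes and names here are illustrative):

import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    def __init__(self, feat_dim=256, dir_dim=2 * 2 * 3):   # Ldir = 2 harmonics on a 3-vector
        super().__init__()
        self.color = nn.Sequential(
            nn.Linear(feat_dim + dir_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),                # RGB in [0, 1]
        )

    def forward(self, features, dir_embedding):
        # density is predicted from position features alone; only color sees the view direction
        return self.color(torch.cat([features, dir_embedding], dim=-1))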
We implemented hierarchical sampling: a coarse pass renders with uniform depths to estimate geometry, then a fine pass importance-samples additional depths from the coarse weights (PDF) and re-renders.
In our setup the hierarchical version appears slightly blurrier than the uniform baseline. This can happen when the coarse network’s density weights are not yet accurate, so the importance sampler spreads fine samples into suboptimal regions. At 128×128 and limited training, uniform sampling remained sharper and more stable.
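For completeness, the fine-pass depths come from inverting the CDF of the coarse weights; a simplified sketch (bins are the coarse depth bin edges, one more entry per ray than weights):

import torch

def sample_pdf(bins, weights, n_fine, eps=1e-5):
    # bins: (R, N+1) coarse depth bin edges, weights: (R, N) coarse compositing weights
    pdf = (weights + eps) / (weights + eps).sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], dim=-1)        # (R, N+1)
    u = torch.rand(*cdf.shape[:-1], n_fine, device=cdf.device)            # uniform samples
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)
    cdf_lo, cdf_hi = torch.gather(cdf, -1, idx - 1), torch.gather(cdf, -1, idx)
    bin_lo, bin_hi = torch.gather(bins, -1, idx - 1), torch.gather(bins, -1, idx)
    t = (u - cdf_lo) / (cdf_hi - cdf_lo + eps)
    return bin_lo + t * (bin_hi - bin_lo)                                 # (R, n_fine) fine depths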
We render a signed-distance field by sphere tracing. From each ray origin we step forward by the current SDF value until the absolute distance falls below a small threshold (hit), the ray exceeds far, or the step count reaches max_iters (miss). The renderer outputs the intersection point and a hit mask, then shades the hit location.
p = o + t·d, step by SDF(p), stop when |SDF(p)| < ε.
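A minimal sketch of the marching loop (implicit_fn is a placeholder for any SDF callable):

import torch

def sphere_trace(implicit_fn, origins, directions, near, far, max_iters=64, eps=1e-5):
    # origins/directions: (R, 3); returns hit points (R, 3) and a boolean hit mask (R,)
    t = torch.full(origins.shape[:1], near, device=origins.device)
    for _ in range(max_iters):
        points = origins + t[:, None] * directions
        dist = implicit_fn(points).reshape(-1)      # signed distance at the current points
        t = t + dist                                # march forward by the SDF value
    points = origins + t[:, None] * directions
    mask = (implicit_fn(points).reshape(-1).abs() < eps) & (t < far)
    return points, mask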
We train an MLP to approximate a signed distance field (SDF) from a sparse point cloud. The network predicts distance at arbitrary 3D locations and is regularized by the eikonal loss, which encourages a unit-norm gradient (‖∇f(x)‖≈1) so the learned function behaves like a true distance field.
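The eikonal regularizer can be computed with autograd; a sketch (the point-sampling strategy is omitted):

import torch

def eikonal_loss(sdf_model, points):
    # points: (N, 3) sampled in the volume (e.g., near the point cloud plus uniform samples)
    points = points.detach().clone().requires_grad_(True)
    dists = sdf_model(points)
    grads, = torch.autograd.grad(dists.sum(), points, create_graph=True)
    return ((grads.norm(dim=-1) - 1.0) ** 2).mean()     # encourage ||grad f(x)|| ≈ 1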
σ = -log(1-φ(d))/Δ.
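In code this mapping is a one-liner plus the clamping noted below (phi here is a placeholder for whatever monotone squashing of the SDF is used):

import torch

def sdf_to_density(signed_distance, delta, phi):
    # phi(d) in (0, 1) plays the role of per-sample opacity; the clamp keeps the log finite
    occ = phi(signed_distance).clamp(1e-6, 1.0 - 1e-6)
    return -torch.log(1.0 - occ) / delta          # sigma = -log(1 - phi(d)) / delta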
Config: configs/volsdf_surface.yaml.
For stability we clamp φ(d) to [1e-6, 1-1e-6], add small noise to σ during training, and monitor gradient norms.

This section explores three optional extensions built on top of our Neural Surface / VolSDF pipeline: (8.1) large-scene rendering via sphere tracing, (8.2) reconstruction from fewer training views, and (8.3) alternate SDF→density mappings (NeuS vs. VolSDF).
We created a complex scene with over 20 SDF primitives (spheres, boxes, and a torus), each positioned along a ring to demonstrate how efficiently sphere tracing handles large implicit scenes.
Implementation: a ComplexSceneSDF class in implicit.py composes multiple analytic SDFs via smooth-min and is rendered with the SphereTracingRenderer. Config: implicit_function.type: sdf_surface and sdf.type: complex_scene.
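The blending uses the standard polynomial smooth-min; a sketch of how primitives can be composed (primitive SDFs are passed in as callables, names illustrative):

import torch

def smooth_min(d1, d2, k=0.1):
    # polynomial smooth minimum of two distance fields; k controls the blend radius
    h = torch.clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)

def scene_sdf(points, primitives, k=0.1):
    # primitives: list of callables, each mapping (N, 3) points to (N,) signed distances
    d = primitives[0](points)
    for prim in primitives[1:]:
        d = smooth_min(d, prim(points), k)
    return d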
To evaluate surface regularization benefits, we trained both VolSDF and a standard NeRF using only 20 views of the LEGO scene (instead of 100). Both models used identical camera subsets (limit_train=20, view_seed=42).
# in surface_rendering_main.py
import numpy as np

limit = int(getattr(cfg.data, "limit_train", 0) or 0)
seed = int(getattr(cfg.data, "view_seed", 0))
if limit > 0 and limit < len(train_dataset):
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(len(train_dataset), size=limit, replace=False))
    train_dataset = ListDataset([train_dataset[i] for i in idx])
    print(f"[Few Views] Using {len(train_dataset)} train views (seed={seed}).")
python -m surface_rendering_main --config-name=volsdf_surface \
data.limit_train=20 data.view_seed=42
python volume_rendering_main.py --config-name=nerf_lego \
data.limit_train=20 data.view_seed=42
We compared the VolSDF exponential mapping with the NeuS sigmoid-based mapping. While NeuS theoretically provides smoother gradients near the zero level-set, in our implementation VolSDF produced cleaner and more stable reconstructions.
renderer.py:
def sdf_to_density_neus(d, alpha, beta):
    # bell-shaped density peaking at the zero level-set (d = 0);
    # alpha scales its magnitude, beta controls the width of the band around the surface
    phi = torch.sigmoid(-d / beta)
    return alpha * phi * (1 - phi)
renderer:
  type: volume_sdf
  alpha: 10.0
  beta: 0.05
  use_neus: True
β = 0.05 caused unstable gradients and noisy surfaces. This can be mitigated with β annealing and stronger eikonal weighting, but we do not enable either by default.
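The β annealing mentioned above can be as simple as a linear schedule of the kind sketched below (start/end values are illustrative):

def annealed_beta(step, total_steps, beta_start=0.1, beta_end=0.05):
    # decay beta over training: a larger beta early on gives a wider, smoother density falloff
    t = min(step / max(total_steps, 1), 1.0)
    return beta_start + t * (beta_end - beta_start)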