16-825 Learning for 3D Vision • Fall 2025
Name: Haejoon Lee (andrewid: haejoonl)

Assignment 3: Neural Volume Rendering and Surface Rendering


A. Neural Volume Rendering (80 points)

0. Transmittance Calculation (10 points)

Computed the transmittance of a ray going through a non-homogeneous medium as required for volume rendering.

Transmittance calculation proof

Transmittance Calculation Solution

Transmittance Formula: The transmittance T(t) represents the fraction of light that survives from the ray origin to distance t along the ray, calculated as T(t) = exp(-∫₀ᵗ σ(s) ds) where σ(s) is the density at distance s.
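As a quick sanity check on this formula, a small sketch with hypothetical densities and segment lengths (not the values from the actual problem) showing that transmittance accumulates multiplicatively across piecewise-homogeneous segments:

# Hypothetical piecewise-constant medium: (density sigma, segment length) pairs.
import math

segments = [(0.5, 1.0), (1.0, 2.0), (0.2, 1.5)]

# T = exp(-sum_i sigma_i * length_i) ...
optical_depth = sum(sigma * length for sigma, length in segments)
T_total = math.exp(-optical_depth)

# ... which equals the product of the per-segment transmittances.
T_product = 1.0
for sigma, length in segments:
    T_product *= math.exp(-sigma * length)

print(T_total, T_product)  # both are exp(-2.8) ≈ 0.0608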

1. Differentiable Volume Rendering (30 points)

In this section, we implemented the core components of a differentiable volume rendering pipeline: ray generation, point sampling, and volume rendering with transmittance calculation.

1.3 Ray Sampling (5 points)

Implemented ray generation from camera parameters, converting from pixel coordinates to world-space rays through NDC space transformation.

# Ray Generation Implementation
def get_pixels_from_image(image_size, camera):
    W, H = image_size[0], image_size[1]
    device = camera.device
    x = torch.linspace(0, W - 1, W, device=device)
    y = torch.linspace(0, H - 1, H, device=device)
    x = 2 * x / (W - 1) - 1  # Convert to [-1, 1]
    y = 2 * y / (H - 1) - 1  # Convert to [-1, 1]
    xy_grid = torch.stack(
        tuple(reversed(torch.meshgrid(y, x))), dim=-1
    ).view(W * H, 2)
    return -xy_grid

def get_rays_from_pixels(xy_grid, image_size, camera):
    W, H = image_size[0], image_size[1]
    device = camera.device
    ndc_points = xy_grid.to(device)
    ndc_points = torch.cat(
        [ndc_points, torch.ones_like(ndc_points[..., -1:])], dim=-1
    )
    world_pts = camera.unproject_points(ndc_points, world_coordinates=True, from_ndc=True)
    camera_center = camera.get_camera_center()
    rays_o = camera_center.expand(world_pts.shape[0], -1)
    rays_d = F.normalize(world_pts - rays_o)
    return RayBundle(
        rays_o,
        rays_d,
        torch.zeros_like(rays_o).unsqueeze(1),
        torch.zeros_like(rays_o).unsqueeze(1),
    )
Pixel grid visualization

Pixel Grid in NDC Space

Ray visualization

Generated Rays from Camera

1.4 Point Sampling (5 points)

Implemented stratified sampling along rays to generate 3D sample points for volume evaluation.

# Stratified Sampling Implementation
class StratifiedRaysampler(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.n_pts_per_ray = cfg.n_pts_per_ray
        self.min_depth = cfg.min_depth
        self.max_depth = cfg.max_depth

    def forward(self, ray_bundle):
        device = ray_bundle.origins.device
        z_vals = torch.linspace(self.min_depth, self.max_depth, self.n_pts_per_ray, device=device)
        origins = ray_bundle.origins
        directions = ray_bundle.directions
        z_vals = z_vals.unsqueeze(0).expand(origins.shape[0], -1)
        origins_expanded = origins.unsqueeze(1)
        directions_expanded = directions.unsqueeze(1)
        z_vals_expanded = z_vals.unsqueeze(-1)
        sample_points = origins_expanded + z_vals_expanded * directions_expanded
        return ray_bundle._replace(
            sample_points=sample_points,
            sample_lengths=z_vals_expanded,
        )
Sample points visualization

Stratified Sample Points Along Rays

Sampling Strategy: Uniform sampling between near and far planes provides a good balance between coverage and computational efficiency for volume rendering.
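As a possible extension (not what was used for the renders above), the uniform sampler can be made truly stratified by jittering each sample within its depth bin. A minimal sketch, meant to slot into forward() right after z_vals is expanded per ray:

# Hedged sketch: per-ray jitter within each depth bin (one simple stratification variant).
bin_size = (self.max_depth - self.min_depth) / (self.n_pts_per_ray - 1)
jitter = (torch.rand_like(z_vals) - 0.5) * bin_size  # uniform noise, at most half a bin each way
z_vals = (z_vals + jitter).clamp(self.min_depth, self.max_depth)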

1.5 Volume Rendering (20 points)

Implemented the core volume rendering equation with transmittance calculation and depth rendering.

# Volume Rendering Implementation
def _compute_weights(self, deltas, rays_density: torch.Tensor, eps: float = 1e-10):
    cumulative_density = torch.cumsum(rays_density * deltas, dim=-2)
    cumulative_density = torch.cat(
        [torch.zeros_like(cumulative_density[..., :1, :]), cumulative_density[..., :-1, :]],
        dim=-2,
    )
    transmittance = torch.exp(-cumulative_density)
    alpha = 1 - torch.exp(-rays_density * deltas)
    weights = transmittance * alpha
    return weights

def _aggregate(self, weights: torch.Tensor, rays_feature: torch.Tensor):
    feature = torch.sum(weights * rays_feature, dim=-2)
    return feature
Volume Rendering Equation:
C(r) = ∫ T(t) σ(r(t)) c(r(t), d) dt
where T(t) = exp(-∫₀ᵗ σ(r(s)) ds)
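Discretized form (as implemented in _compute_weights above), with per-sample spacing δᵢ:
Tᵢ = exp(−Σⱼ₌₁ⁱ⁻¹ σⱼ δⱼ),  αᵢ = 1 − exp(−σᵢ δᵢ),  wᵢ = Tᵢ · αᵢ
C(r) ≈ Σᵢ wᵢ cᵢ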
Volume rendering result

Volume Rendering of Box SDF

Depth visualization

Depth Map (Normalized)

2. Optimizing a Basic Implicit Volume (10 points)

2.1 Random Ray Sampling (5 points)

Implemented efficient random ray sampling for training to reduce memory usage and enable batch processing.

# Random Ray Sampling Implementation
def get_random_pixels_from_image(n_pixels, image_size, camera):
    xy_grid = get_pixels_from_image(image_size, camera)
    total_pixels = xy_grid.shape[0]
    indices = torch.randperm(total_pixels)[:n_pixels]
    xy_grid_sub = xy_grid[indices]
    return xy_grid_sub
Memory Optimization: Random sampling of rays instead of full image rendering significantly reduces GPU memory usage during training while maintaining good gradient coverage.

2.2 Loss and Training (5 points)

Implemented MSE loss for optimizing implicit volume parameters from ground truth images.

# Training Loss Implementation
loss = torch.nn.functional.mse_loss(out['feature'], rgb_gt)
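For context, a minimal sketch of how one training iteration ties the pieces above together; the model(ray_bundle) interface, the optimizer setup, and the ground-truth lookup are assumptions and may differ from the starter code:

# Hedged sketch of a single optimization step (interfaces assumed).
xy_grid = get_random_pixels_from_image(n_pixels=1024, image_size=image_size, camera=camera)
ray_bundle = get_rays_from_pixels(xy_grid, image_size, camera)
rgb_gt = ...  # ground-truth colors at the same sampled pixel locations

out = model(ray_bundle)  # sampler + implicit volume + renderer; returns a dict with 'feature' (RGB)
loss = torch.nn.functional.mse_loss(out['feature'], rgb_gt)

optimizer.zero_grad()
loss.backward()
optimizer.step()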
Optimized box rendering

Optimized Box After Training

3. Optimizing a Neural Radiance Field (NeRF) (20 points)

Implemented a complete NeRF architecture with positional encoding, density and color prediction, and view dependence.

# NeRF MLP Architecture
class NeuralRadianceField(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.harmonic_embedding_xyz = HarmonicEmbedding(3, cfg.n_harmonic_functions_xyz)
        self.harmonic_embedding_dir = HarmonicEmbedding(3, cfg.n_harmonic_functions_dir)
        # embedding_dim_xyz, embedding_dim_dir, and hidden_dims are derived from cfg (omitted here)

        # XYZ network
        self.layers_xyz = torch.nn.ModuleList()
        for i in range(cfg.n_layers_xyz):
            if i == 0:
                self.layers_xyz.append(torch.nn.Linear(embedding_dim_xyz, hidden_dims[0]))
            elif i == 4:  # Skip connection
                self.layers_xyz.append(torch.nn.Linear(embedding_dim_xyz + hidden_dims[0], hidden_dims[0]))
            else:
                self.layers_xyz.append(torch.nn.Linear(hidden_dims[0], hidden_dims[0]))

        # Density and feature prediction
        self.layer_sigma = torch.nn.Sequential(torch.nn.Linear(hidden_dims[0], 1), torch.nn.ReLU())
        self.layer_feature = torch.nn.Sequential(torch.nn.Linear(hidden_dims[0], hidden_dims[0]), torch.nn.ReLU())

        # Direction network for view dependence
        self.layers_dir = torch.nn.Sequential(
            torch.nn.Linear(embedding_dim_dir + hidden_dims[0], hidden_dims[1]),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dims[1], 3),
            torch.nn.Sigmoid(),
        )
NeRF lego rendering

NeRF on Lego Dataset

NeRF Architecture Features (forward pass sketched below):
  • Positional Encoding: Harmonic embeddings for XYZ coordinates and viewing directions
  • Skip Connections: Residual connection at layer 4 for better gradient flow
  • Density Prediction: ReLU activation ensures non-negative density values
  • Color Prediction: Sigmoid activation constrains colors to [0,1] range
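To make the data flow concrete, a hedged sketch of the forward pass implied by the architecture above; the ray-bundle layout, the derived dimensions, and the output dictionary keys are assumptions taken from the surrounding code and may differ slightly from the actual implementation:

# Hedged forward-pass sketch (interfaces and output keys assumed).
def forward(self, ray_bundle):
    points = ray_bundle.sample_points.view(-1, 3)  # (n_rays * n_pts, 3)
    n_pts = ray_bundle.sample_points.shape[1]

    # XYZ trunk with the skip connection at layer 4.
    xyz_embed = self.harmonic_embedding_xyz(points)
    x = xyz_embed
    for i, layer in enumerate(self.layers_xyz):
        if i == 4:
            x = torch.cat([x, xyz_embed], dim=-1)  # re-inject the positional embedding
        x = torch.relu(layer(x))

    density = self.layer_sigma(x)  # ReLU inside keeps density non-negative

    # View-dependent color head: trunk features + embedded directions.
    features = self.layer_feature(x)
    dir_embed = self.harmonic_embedding_dir(ray_bundle.directions)  # (n_rays, D_dir)
    dir_embed = dir_embed.unsqueeze(1).expand(-1, n_pts, -1)
    dir_embed = dir_embed.reshape(-1, dir_embed.shape[-1])
    color = self.layers_dir(torch.cat([features, dir_embed], dim=-1))  # Sigmoid keeps RGB in [0, 1]

    return {'density': density, 'feature': color}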

4. NeRF Extras (10 points)

4.1 View Dependence (10 points)

Extended NeRF with view dependence for realistic material rendering on the materials dataset.

NeRF materials rendering

View-Dependent NeRF on Materials Dataset

View Dependence Implementation:
  • Direction Encoding: Harmonic embedding of viewing directions
  • Feature Concatenation: Combined XYZ features with direction features for color prediction
  • Material Effects: Successfully captures specular reflections and material properties

B. Neural Surface Rendering (50 points)

5. Sphere Tracing (10 points)

Implemented the sphere tracing algorithm for efficient SDF-based surface rendering.

# Sphere Tracing Implementation
def sphere_tracing(self, implicit_fn, origins, directions):
    device = origins.device
    n_rays = origins.shape[0]
    current_points = origins.clone()
    directions = F.normalize(directions, dim=-1)
    mask = torch.ones(n_rays, 1, dtype=torch.bool, device=device)
    for iteration in range(self.max_iters):
        signed_distances = implicit_fn(current_points)
        converged = torch.abs(signed_distances) < 1e-6
        mask = mask & ~converged
        if not mask.any():
            break
        current_points = current_points + directions * signed_distances
        distances_from_origin = torch.norm(current_points - origins, dim=-1, keepdim=True)
        mask = mask & (distances_from_origin < self.far)
        if not mask.any():
            break
    final_distances = implicit_fn(current_points)
    final_mask = torch.abs(final_distances) < 1e-4
    return current_points, final_mask
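For reference, a sketch of how the traced points and hit mask might then be shaded inside the renderer; the get_color method on the implicit function is an assumption here and may not match the actual interface:

# Hedged usage sketch (implicit-function color interface assumed).
points, hit_mask = self.sphere_tracing(implicit_fn, ray_bundle.origins, ray_bundle.directions)
hits = hit_mask.squeeze(-1)  # boolean mask of rays that converged on the surface

colors = torch.zeros_like(points)  # background stays black
if hits.any():
    colors[hits] = implicit_fn.get_color(points[hits])  # shade only the hit points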
Sphere tracing torus

Sphere Tracing of Torus SDF

Sphere Tracing Algorithm:
  • Ray Marching: Step along rays by SDF distance at each point
  • Convergence Detection: Stop when SDF value is below threshold

6. Optimizing a Neural SDF (15 points)

Implemented neural SDF training with eikonal regularization for point cloud reconstruction.

# Neural SDF Implementation
class NeuralSurface(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.harmonic_embedding_xyz = HarmonicEmbedding(3, cfg.n_harmonic_functions_xyz)
        # self.embedding_dim_xyz, self.n_layers_distance, self.skip_ind, and hidden_dims come from cfg (omitted here)

        # Distance network with skip connection
        self.layers_distance = torch.nn.ModuleList()
        for layeri in range(self.n_layers_distance):
            if layeri == 0:
                self.layers_distance.append(torch.nn.Linear(self.embedding_dim_xyz, hidden_dims[0]))
            elif layeri == self.skip_ind:  # Skip connection
                self.layers_distance.append(torch.nn.Linear(self.embedding_dim_xyz + hidden_dims[0], hidden_dims[0]))
            else:
                self.layers_distance.append(torch.nn.Linear(hidden_dims[0], hidden_dims[0]))
        self.layer_sigma = torch.nn.Linear(hidden_dims[0], 1)

# Eikonal Loss Implementation
def eikonal_loss(gradients):
    gradient_norms = torch.norm(gradients, dim=-1)
    eikonal_constraint = torch.square(gradient_norms - 1.0)
    return eikonal_constraint.mean()
Input point cloud

Input Point Cloud (Bunny)

Neural SDF result

Neural SDF Reconstruction

Neural SDF Training (loss computation sketched below):
  • Point Cloud Loss: MSE between predicted distances and zero (surface constraint)
  • Eikonal Regularization: Enforces unit gradient magnitude for valid SDF properties
  • Skip Connections: Residual connections improve gradient flow and learning
  • Surface Quality: Produces smooth, watertight surfaces from sparse point clouds
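A minimal sketch of how the two losses could be combined in a training step; the model.get_distance interface, the random-point sampling range, and the lambda_eikonal weight are assumptions, not necessarily the exact setup used here:

# Hedged sketch of the combined SDF training loss (interfaces and weights assumed).
# surface_points: (N, 3) points sampled from the input point cloud.
pred_surface = model.get_distance(surface_points)
point_cloud_loss = torch.mean(torch.square(pred_surface))  # surface points should have SDF = 0

# Eikonal term on random points: the SDF gradient should have unit norm everywhere.
random_points = torch.rand(4096, 3, device=surface_points.device) * 2.0 - 1.0  # sample in [-1, 1]^3
random_points.requires_grad_(True)
pred_random = model.get_distance(random_points)
gradients = torch.autograd.grad(outputs=pred_random.sum(), inputs=random_points, create_graph=True)[0]

loss = point_cloud_loss + lambda_eikonal * eikonal_loss(gradients)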

7. VolSDF (15 points)

Implemented VolSDF, which combines SDFs with volume rendering by using a Laplace CDF to convert signed distance to density.

# SDF to Density Conversion (VolSDF)
def sdf_to_density(signed_distance, alpha, beta):
    lap_dist = torch.distributions.laplace.Laplace(0, beta)
    return alpha * lap_dist.cdf(-signed_distance)

# Neural Surface with Color Prediction
def get_distance_color(self, points):
    points = points.view(-1, 3)
    xyz = points
    h = self.harmonic_embedding_xyz(points)
    x = h
    for i, layer in enumerate(self.layers_distance):
        if i == 0:
            x = h
        elif i == self.skip_ind:
            x = torch.cat((x, h), dim=-1)
        x = layer(x)
        x = self.relu(x)
    distance = self.layer_sigma(x)
    x = torch.cat((x, xyz), dim=-1)  # Skip connection for color
    for layer in self.layers_color:
        x = layer(x)
    color = x
    return distance, color
VolSDF SDF-to-Density Conversion:
σ(x) = α · Φβ(-f(x))
where Φβ is the Laplace CDF with parameter β
VolSDF color rendering

VolSDF Color Rendering

VolSDF geometry

VolSDF Geometry (SDF-based)

VolSDF Parameter Analysis:

Alpha (α) Parameter:

  • High α: Higher density values, more opaque surfaces, better geometry definition
  • Low α: Lower density values, more transparent surfaces, softer geometry

Beta (β) Parameter Analysis:

  • High β: Wider density falloff, smoother surfaces, easier training but less sharp geometry
  • Low β: Sharper density falloff, more precise surfaces, harder training but better geometry quality

VolSDF Beta Parameter Questions

Question 1: How does high β bias your learned SDF? What about low β?

High β (e.g., β = 0.1):

  • Wider Density Falloff: The Laplace CDF creates a gradual transition from 0 to 1 over a larger distance range
  • Smooth Surface Bias: The network learns to create smooth, gradual density changes rather than sharp surface boundaries
  • Blurred Geometry: Surfaces appear "fuzzy" with soft edges, making it harder to define precise surface locations
  • Gradient Smoothing: Gradients are smoother and more stable during training, reducing noise

Low β (e.g., β = 0.01):

  • Sharp Density Falloff: The Laplace CDF creates a very steep transition near the zero level set
  • Sharp Surface Bias: The network learns to create sharp, well-defined surface boundaries
  • Precise Geometry: Surfaces have crisp edges with clear inside/outside distinctions
  • Gradient Instability: Gradients can be very large near surfaces, potentially causing training instability

Question 2: Would an SDF be easier to train with volume rendering and low β or high β? Why?

High β is easier to train with volume rendering because:

  • Smoother Gradients: The gradual density falloff provides smoother, more stable gradients during backpropagation
  • Better Convergence: The network doesn't need to learn extremely sharp transitions, making optimization more stable
  • Reduced Noise: Smoother density functions reduce the impact of sampling noise during volume rendering
  • Broader Learning Signal: The wider falloff provides learning signals over a larger spatial region around the surface
  • Numerical Stability: Less prone to numerical issues that can arise from very sharp density transitions

Low β is harder to train because:

  • Sharp Gradients: Very steep density transitions create large, potentially unstable gradients
  • Sampling Sensitivity: Small errors in sample point locations can lead to large changes in density values
  • Convergence Issues: The network must learn very precise surface locations, making optimization challenging

Question 3: Would you be more likely to learn an accurate surface with high β or low β? Why?

Low β is more likely to produce accurate surfaces because:

  • Sharp Surface Definition: The steep density falloff forces the network to learn precise zero-level sets
  • Clear Inside/Outside Distinction: Sharp transitions create well-defined surface boundaries
  • Geometric Accuracy: The SDF can represent fine details and sharp features that would be blurred with high β
  • True SDF Properties: Low β better approximates the ideal SDF behavior with sharp zero crossings

High β limitations for surface accuracy:

  • Surface Blurring: Gradual density falloff creates "fuzzy" surfaces that don't correspond to sharp geometric boundaries
  • Detail Loss: Fine geometric details get smoothed out due to the wide density transition
  • Imprecise Level Sets: The zero level set becomes less well-defined, reducing surface accuracy

Trade-off Summary: There's a fundamental trade-off between training stability (high β) and surface accuracy (low β). In practice, you often start with high β for stable training, then gradually reduce it or use a curriculum learning approach to achieve both stability and accuracy.
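To make the trade-off concrete, a small sketch that evaluates the sdf_to_density conversion from Section 7 at a few signed distances for a low and a high β; the specific α and β values here are illustrative, not the ones used in training:

import torch

def sdf_to_density(signed_distance, alpha, beta):
    # Same Laplace-CDF conversion as in Section 7.
    lap_dist = torch.distributions.laplace.Laplace(0, beta)
    return alpha * lap_dist.cdf(-signed_distance)

sdf_values = torch.tensor([-0.10, -0.02, 0.0, 0.02, 0.10])  # negative = inside the surface
for beta in (0.01, 0.1):  # illustrative low / high beta
    density = sdf_to_density(sdf_values, alpha=10.0, beta=beta)
    print(f"beta={beta}: {density.tolist()}")
# Low beta: density jumps sharply across the zero level set; high beta: a gradual falloff.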

8. Neural Surface Extras (30 points)

8.1 Complex Scene with Sphere Tracing (10 points)

Created a complex scene with 16+ twisted primitives using SDF composition and the twist operation.

# Twist Operation Implementation
def op_twist(primitive_sdf, points, k=10.0):
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    c = torch.cos(k * y)
    s = torch.sin(k * y)
    x_twisted = x * c - z * s
    z_twisted = x * s + z * c
    twisted_points = torch.stack([x_twisted, y, z_twisted], dim=-1)
    return primitive_sdf(twisted_points)

# Complex Scene with Multiple Twisted Primitives
class ComplexSceneSDF(torch.nn.Module):
    def get_distance(self, points):
        sdf = torch.full((points.shape[0], 1), 1000.0, device=points.device)

        # 1 Twisted Torus (center)
        twisted_torus = op_twist(lambda p: self.torus_sdf(p, 0.4, 0.15), torus_points, k=8.0)
        sdf = torch.minimum(sdf, twisted_torus)

        # 1 Twisted Box
        twisted_box = op_twist(lambda p: self.box_sdf(p, 0.3), box_points, k=6.0)
        sdf = torch.minimum(sdf, twisted_box)

        # 1 Twisted Sphere
        twisted_sphere = op_twist(lambda p: self.sphere_sdf(p, 0.25), sphere_points, k=12.0)
        sdf = torch.minimum(sdf, twisted_sphere)

        # 5 Twisted Spheres (pentagon pattern)
        for i in range(5):
            angle = i * 2 * np.pi / 5
            x_offset = 2.5 * np.cos(angle)
            z_offset = 2.5 * np.sin(angle)
            center = torch.tensor([x_offset, 0.0, z_offset], device=points.device)
            local_points = points - center
            twisted_local = op_twist(lambda p: self.sphere_sdf(p, 0.2), local_points, k=15.0 + i * 2)
            sdf = torch.minimum(sdf, twisted_local)

        # 8 Twisted Tori (outer ring)
        for i in range(8):
            angle = i * 2 * np.pi / 8
            x_offset = 3.5 * np.cos(angle)
            z_offset = 3.5 * np.sin(angle)
            center = torch.tensor([x_offset, 0.0, z_offset], device=points.device)
            local_points = points - center
            twisted_local = op_twist(lambda p: self.torus_sdf(p, 0.2, 0.08), local_points, k=10.0 + i)
            sdf = torch.minimum(sdf, twisted_local)

        return sdf
Complex twisted scene

Complex Scene with 16+ Twisted Primitives

Scene Composition:
  • 16+ Primitives: 1 center torus, 1 box, 1 sphere, 5 pentagon spheres, 8 outer tori
  • Twist Operations: Different twist strengths (k = 6 to k = 23) for visual variety
  • SDF Union: Using torch.minimum for proper SDF composition
  • Efficient Rendering: Sphere tracing handles complex geometry efficiently

8.2 Fewer Training Views (10 points)

Experimented with training VolSDF and NeRF using only 20 views instead of 100, comparing reconstruction quality.
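A minimal sketch of how the reduced-view split could be constructed; the train_dataset variable and its structure (a list of per-view samples) are assumptions:

# Hedged sketch: keep 20 roughly evenly spaced views out of 100 (dataset structure assumed).
import numpy as np

n_total, n_keep = 100, 20
keep_indices = np.linspace(0, n_total - 1, n_keep).round().astype(int)
train_dataset_20 = [train_dataset[i] for i in keep_indices]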

VolSDF 100 views

VolSDF: 100 Views

VolSDF 20 views

VolSDF: 20 Views

NeRF 100 views

NeRF: 100 Views

NeRF 20 views

NeRF: 20 Views

Few-View Training Analysis:

VolSDF Performance:

  • Quality Degradation: Renderings become much blurrier and less detailed with only 20 views

NeRF Performance:

  • Better than Expected: NeRF performs surprisingly well with only 20 views
  • View Dependence: View-dependent effects are still captured reasonably well
  • Generalization: NeRF's implicit regularization helps with few-view scenarios

Key Insights:

  • NeRF Resilience: NeRF's view-dependent modeling and positional encoding provide good few-view performance
  • Quality Trade-off: Both methods show some quality degradation but remain usable with 20 views

8.3 Alternate SDF to Density Conversions (10 points)

Implemented and compared the naive NeuS solution for SDF-to-density conversion as an alternative to VolSDF.

# NeuS Naive SDF-to-Density Conversion
def sdf_to_density_naive_neus(signed_distance, alpha, beta):
    s = 1.0 / beta
    x = -signed_distance
    exp_term = torch.exp(-s * x)
    logistic_density = s * exp_term / torch.square(1 + exp_term)
    return alpha * logistic_density
NeuS Naive Formula:
σ(x) = α · s · e^(s·f(x)) / (1 + e^(s·f(x)))²
where s = 1/β
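To make the difference concrete with illustrative values (α = 10, β = 0.05, so s = 20): the naive conversion gives α·s/4 = 50 exactly at the surface (f = 0) but only about 3.5 at f = −0.2 inside the object, whereas the VolSDF conversion saturates toward α = 10 everywhere inside. The naive density is therefore concentrated in a thin shell around the zero level set rather than filling the interior.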
NeuS naive rendering

NeuS Naive SDF-to-Density Conversion

SDF-to-Density Method Comparison:

Method     | Formula                                 | Advantages                                            | Disadvantages
VolSDF     | α · Φβ(-f(x))                           | Well-theorized, smooth gradients, proven performance  | Requires careful parameter tuning
NeuS Naive | α · s · e^(s·f(x)) / (1 + e^(s·f(x)))²  | Simple implementation, logistic distribution          | May have different convergence properties

Observations:

  • NeuS Naive Results: Produces a blurrier but still reasonable color rendering, while the extracted geometry comes out empty, likely because the naive logistic density peaks in a thin shell around the surface and falls back toward zero inside the object, so the interior never becomes opaque.