16-825 Learning for 3D Vision • Fall 2025
Name: Haejoon Lee (andrewid: haejoonl)

Assignment 3: Neural Volume Rendering and Surface Rendering


A. Neural Volume Rendering (80 points)

0. Transmittance Calculation (10 points)

Computed the transmittance of a ray going through a non-homogeneous medium as required for volume rendering.

Transmittance calculation proof

Transmittance Calculation Solution

Transmittance Formula: The transmittance T(t) represents the fraction of light that survives from the ray origin to distance t along the ray, calculated as T(t) = exp(-∫₀ᵗ σ(s) ds) where σ(s) is the density at distance s.
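As a quick sanity check on this formula, a small sketch with hypothetical densities and segment lengths (not the values from the actual problem) showing that transmittance accumulates multiplicatively across piecewise-homogeneous segments:

# Hypothetical piecewise-constant medium: (density sigma, segment length) pairs.
import math

segments = [(0.5, 1.0), (1.0, 2.0), (0.2, 1.5)]

# T = exp(-sum_i sigma_i * length_i) ...
optical_depth = sum(sigma * length for sigma, length in segments)
T_total = math.exp(-optical_depth)

# ... which equals the product of the per-segment transmittances.
T_product = 1.0
for sigma, length in segments:
    T_product *= math.exp(-sigma * length)

print(T_total, T_product)  # both are exp(-2.8) ≈ 0.0608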

1. Differentiable Volume Rendering (30 points)

In this section, we implemented the core components of a differentiable volume rendering pipeline: ray generation, point sampling, and volume rendering with transmittance calculation.

1.3 Ray Sampling (5 points)

Implemented ray generation from camera parameters, converting from pixel coordinates to world-space rays through NDC space transformation.

# Ray Generation Implementation
def get_pixels_from_image(image_size, camera):
    W, H = image_size[0], image_size[1]
    device = camera.device
    x = torch.linspace(0, W - 1, W, device=device)
    y = torch.linspace(0, H - 1, H, device=device)
    x = 2 * x / (W - 1) - 1  # Convert to [-1, 1]
    y = 2 * y / (H - 1) - 1  # Convert to [-1, 1]
    xy_grid = torch.stack(
        tuple(reversed(torch.meshgrid(y, x))), dim=-1
    ).view(W * H, 2)
    return -xy_grid

def get_rays_from_pixels(xy_grid, image_size, camera):
    W, H = image_size[0], image_size[1]
    device = camera.device
    ndc_points = xy_grid.to(device)
    ndc_points = torch.cat(
        [ndc_points, torch.ones_like(ndc_points[..., -1:])], dim=-1
    )
    world_pts = camera.unproject_points(ndc_points, world_coordinates=True, from_ndc=True)
    camera_center = camera.get_camera_center()
    rays_o = camera_center.expand(world_pts.shape[0], -1)
    rays_d = F.normalize(world_pts - rays_o)
    return RayBundle(
        rays_o,
        rays_d,
        torch.zeros_like(rays_o).unsqueeze(1),
        torch.zeros_like(rays_o).unsqueeze(1),
    )
Pixel grid visualization

Pixel Grid in NDC Space

Ray visualization

Generated Rays from Camera

1.4 Point Sampling (5 points)

Implemented stratified sampling along rays to generate 3D sample points for volume evaluation.

# Stratified Sampling Implementation
class StratifiedRaysampler(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.n_pts_per_ray = cfg.n_pts_per_ray
        self.min_depth = cfg.min_depth
        self.max_depth = cfg.max_depth

    def forward(self, ray_bundle):
        device = ray_bundle.origins.device
        z_vals = torch.linspace(self.min_depth, self.max_depth, self.n_pts_per_ray, device=device)
        origins = ray_bundle.origins
        directions = ray_bundle.directions
        z_vals = z_vals.unsqueeze(0).expand(origins.shape[0], -1)
        origins_expanded = origins.unsqueeze(1)
        directions_expanded = directions.unsqueeze(1)
        z_vals_expanded = z_vals.unsqueeze(-1)
        sample_points = origins_expanded + z_vals_expanded * directions_expanded
        return ray_bundle._replace(
            sample_points=sample_points,
            sample_lengths=z_vals_expanded,
        )
Sample points visualization

Stratified Sample Points Along Rays

Sampling Strategy: Uniform sampling between near and far planes provides a good balance between coverage and computational efficiency for volume rendering.
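As a possible extension (not what was used for the renders above), the uniform sampler can be made truly stratified by jittering each sample within its depth bin. A minimal sketch, meant to slot into forward() right after z_vals is expanded per ray:

# Hedged sketch: per-ray jitter within each depth bin (one simple stratification variant).
bin_size = (self.max_depth - self.min_depth) / (self.n_pts_per_ray - 1)
jitter = (torch.rand_like(z_vals) - 0.5) * bin_size  # uniform noise, at most half a bin each way
z_vals = (z_vals + jitter).clamp(self.min_depth, self.max_depth)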

1.5 Volume Rendering (20 points)

Implemented the core volume rendering equation with transmittance calculation and depth rendering.

# Volume Rendering Implementation
def _compute_weights(self, deltas, rays_density: torch.Tensor, eps: float = 1e-10):
    cumulative_density = torch.cumsum(rays_density * deltas, dim=-2)
    cumulative_density = torch.cat(
        [torch.zeros_like(cumulative_density[..., :1, :]), cumulative_density[..., :-1, :]],
        dim=-2,
    )
    transmittance = torch.exp(-cumulative_density)
    alpha = 1 - torch.exp(-rays_density * deltas)
    weights = transmittance * alpha
    return weights

def _aggregate(self, weights: torch.Tensor, rays_feature: torch.Tensor):
    feature = torch.sum(weights * rays_feature, dim=-2)
    return feature
Volume Rendering Equation:
C(r) = ∫ T(t) σ(r(t)) c(r(t), d) dt
where T(t) = exp(-∫₀ᵗ σ(r(s)) ds)
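Discretized form (as implemented in _compute_weights above), with per-sample spacing δᵢ:
Tᵢ = exp(−Σⱼ₌₁ⁱ⁻¹ σⱼ δⱼ),  αᵢ = 1 − exp(−σᵢ δᵢ),  wᵢ = Tᵢ · αᵢ
C(r) ≈ Σᵢ wᵢ cᵢ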
Volume rendering result

Volume Rendering of Box SDF

Depth visualization

Depth Map (Normalized)

2. Optimizing a Basic Implicit Volume (10 points)

2.1 Random Ray Sampling (5 points)

Implemented efficient random ray sampling for training to reduce memory usage and enable batch processing.

# Random Ray Sampling Implementation
def get_random_pixels_from_image(n_pixels, image_size, camera):
    xy_grid = get_pixels_from_image(image_size, camera)
    total_pixels = xy_grid.shape[0]
    indices = torch.randperm(total_pixels)[:n_pixels]
    xy_grid_sub = xy_grid[indices]
    return xy_grid_sub
Memory Optimization: Random sampling of rays instead of full image rendering significantly reduces GPU memory usage during training while maintaining good gradient coverage.

2.2 Loss and Training (5 points)

Implemented MSE loss for optimizing implicit volume parameters from ground truth images.

# Training Loss Implementation
loss = torch.nn.functional.mse_loss(out['feature'], rgb_gt)
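For context, a minimal sketch of how one training iteration ties the pieces above together; the model(ray_bundle) interface, the optimizer setup, and the ground-truth lookup are assumptions and may differ from the starter code:

# Hedged sketch of a single optimization step (interfaces assumed).
xy_grid = get_random_pixels_from_image(n_pixels=1024, image_size=image_size, camera=camera)
ray_bundle = get_rays_from_pixels(xy_grid, image_size, camera)
rgb_gt = ...  # ground-truth colors at the same sampled pixel locations

out = model(ray_bundle)  # sampler + implicit volume + renderer; returns a dict with 'feature' (RGB)
loss = torch.nn.functional.mse_loss(out['feature'], rgb_gt)

optimizer.zero_grad()
loss.backward()
optimizer.step()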
Optimized box rendering

Optimized Box After Training

3. Optimizing a Neural Radiance Field (NeRF) (20 points)

Implemented a complete NeRF architecture with positional encoding, density and color prediction, and view dependence.

# NeRF MLP Architecture
class NeuralRadianceField(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.harmonic_embedding_xyz = HarmonicEmbedding(3, cfg.n_harmonic_functions_xyz)
        self.harmonic_embedding_dir = HarmonicEmbedding(3, cfg.n_harmonic_functions_dir)
        # embedding_dim_xyz, embedding_dim_dir, and hidden_dims are derived from cfg (omitted here)

        # XYZ network
        self.layers_xyz = torch.nn.ModuleList()
        for i in range(cfg.n_layers_xyz):
            if i == 0:
                self.layers_xyz.append(torch.nn.Linear(embedding_dim_xyz, hidden_dims[0]))
            elif i == 4:  # Skip connection
                self.layers_xyz.append(torch.nn.Linear(embedding_dim_xyz + hidden_dims[0], hidden_dims[0]))
            else:
                self.layers_xyz.append(torch.nn.Linear(hidden_dims[0], hidden_dims[0]))

        # Density and feature prediction
        self.layer_sigma = torch.nn.Sequential(torch.nn.Linear(hidden_dims[0], 1), torch.nn.ReLU())
        self.layer_feature = torch.nn.Sequential(torch.nn.Linear(hidden_dims[0], hidden_dims[0]), torch.nn.ReLU())

        # Direction network for view dependence
        self.layers_dir = torch.nn.Sequential(
            torch.nn.Linear(embedding_dim_dir + hidden_dims[0], hidden_dims[1]),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dims[1], 3),
            torch.nn.Sigmoid(),
        )
NeRF lego rendering

NeRF on Lego Dataset

NeRF Architecture Features (forward pass sketched below):
  • Positional Encoding: Harmonic embeddings for XYZ coordinates and viewing directions
  • Skip Connections: Residual connection at layer 4 for better gradient flow
  • Density Prediction: ReLU activation ensures non-negative density values
  • Color Prediction: Sigmoid activation constrains colors to [0,1] range
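To make the data flow concrete, a hedged sketch of the forward pass implied by the architecture above; the ray-bundle layout, the derived dimensions, and the output dictionary keys are assumptions taken from the surrounding code and may differ slightly from the actual implementation:

# Hedged forward-pass sketch (interfaces and output keys assumed).
def forward(self, ray_bundle):
    points = ray_bundle.sample_points.view(-1, 3)  # (n_rays * n_pts, 3)
    n_pts = ray_bundle.sample_points.shape[1]

    # XYZ trunk with the skip connection at layer 4.
    xyz_embed = self.harmonic_embedding_xyz(points)
    x = xyz_embed
    for i, layer in enumerate(self.layers_xyz):
        if i == 4:
            x = torch.cat([x, xyz_embed], dim=-1)  # re-inject the positional embedding
        x = torch.relu(layer(x))

    density = self.layer_sigma(x)  # ReLU inside keeps density non-negative

    # View-dependent color head: trunk features + embedded directions.
    features = self.layer_feature(x)
    dir_embed = self.harmonic_embedding_dir(ray_bundle.directions)  # (n_rays, D_dir)
    dir_embed = dir_embed.unsqueeze(1).expand(-1, n_pts, -1)
    dir_embed = dir_embed.reshape(-1, dir_embed.shape[-1])
    color = self.layers_dir(torch.cat([features, dir_embed], dim=-1))  # Sigmoid keeps RGB in [0, 1]

    return {'density': density, 'feature': color}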

4. NeRF Extras (10 points)

4.1 View Dependence (10 points)

Extended NeRF with view dependence for realistic material rendering on the materials dataset.

NeRF materials rendering

View-Dependent NeRF on Materials Dataset

View Dependence Implementation:
  • Direction Encoding: Harmonic embedding of viewing directions
  • Feature Concatenation: Combined XYZ features with direction features for color prediction
  • Material Effects: Successfully captures specular reflections and material properties

B. Neural Surface Rendering (50 points)

5. Sphere Tracing (10 points)

Implemented the sphere tracing algorithm for efficient SDF-based surface rendering.

# Sphere Tracing Implementation
def sphere_tracing(self, implicit_fn, origins, directions):
    device = origins.device
    n_rays = origins.shape[0]
    current_points = origins.clone()
    directions = F.normalize(directions, dim=-1)
    mask = torch.ones(n_rays, 1, dtype=torch.bool, device=device)
    for iteration in range(self.max_iters):
        signed_distances = implicit_fn(current_points)
        converged = torch.abs(signed_distances) < 1e-6
        mask = mask & ~converged
        if not mask.any():
            break
        current_points = current_points + directions * signed_distances
        distances_from_origin = torch.norm(current_points - origins, dim=-1, keepdim=True)
        mask = mask & (distances_from_origin < self.far)
        if not mask.any():
            break
    final_distances = implicit_fn(current_points)
    final_mask = torch.abs(final_distances) < 1e-4
    return current_points, final_mask
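For reference, a sketch of how the traced points and hit mask might then be shaded inside the renderer; the get_color method on the implicit function is an assumption here and may not match the actual interface:

# Hedged usage sketch (implicit-function color interface assumed).
points, hit_mask = self.sphere_tracing(implicit_fn, ray_bundle.origins, ray_bundle.directions)
hits = hit_mask.squeeze(-1)  # boolean mask of rays that converged on the surface

colors = torch.zeros_like(points)  # background stays black
if hits.any():
    colors[hits] = implicit_fn.get_color(points[hits])  # shade only the hit points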
Sphere tracing torus

Sphere Tracing of Torus SDF

Sphere Tracing Algorithm:
  • Ray Marching: Step along rays by SDF distance at each point
  • Convergence Detection: Stop when SDF value is below threshold

6. Optimizing a Neural SDF (15 points)

Implemented neural SDF training with eikonal regularization for point cloud reconstruction.

# Neural SDF Implementation
class NeuralSurface(torch.nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.harmonic_embedding_xyz = HarmonicEmbedding(3, cfg.n_harmonic_functions_xyz)
        # self.embedding_dim_xyz, self.n_layers_distance, self.skip_ind, and hidden_dims come from cfg (omitted here)

        # Distance network with skip connection
        self.layers_distance = torch.nn.ModuleList()
        for layeri in range(self.n_layers_distance):
            if layeri == 0:
                self.layers_distance.append(torch.nn.Linear(self.embedding_dim_xyz, hidden_dims[0]))
            elif layeri == self.skip_ind:  # Skip connection
                self.layers_distance.append(torch.nn.Linear(self.embedding_dim_xyz + hidden_dims[0], hidden_dims[0]))
            else:
                self.layers_distance.append(torch.nn.Linear(hidden_dims[0], hidden_dims[0]))
        self.layer_sigma = torch.nn.Linear(hidden_dims[0], 1)

# Eikonal Loss Implementation
def eikonal_loss(gradients):
    gradient_norms = torch.norm(gradients, dim=-1)
    eikonal_constraint = torch.square(gradient_norms - 1.0)
    return eikonal_constraint.mean()
Input point cloud

Input Point Cloud (Bunny)

Neural SDF result

Neural SDF Reconstruction

Neural SDF Training (loss computation sketched below):
  • Point Cloud Loss: MSE between predicted distances and zero (surface constraint)
  • Eikonal Regularization: Enforces unit gradient magnitude for valid SDF properties
  • Skip Connections: Residual connections improve gradient flow and learning
  • Surface Quality: Produces smooth, watertight surfaces from sparse point clouds
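A minimal sketch of how the two losses could be combined in a training step; the model.get_distance interface, the random-point sampling range, and the lambda_eikonal weight are assumptions, not necessarily the exact setup used here:

# Hedged sketch of the combined SDF training loss (interfaces and weights assumed).
# surface_points: (N, 3) points sampled from the input point cloud.
pred_surface = model.get_distance(surface_points)
point_cloud_loss = torch.mean(torch.square(pred_surface))  # surface points should have SDF = 0

# Eikonal term on random points: the SDF gradient should have unit norm everywhere.
random_points = torch.rand(4096, 3, device=surface_points.device) * 2.0 - 1.0  # sample in [-1, 1]^3
random_points.requires_grad_(True)
pred_random = model.get_distance(random_points)
gradients = torch.autograd.grad(outputs=pred_random.sum(), inputs=random_points, create_graph=True)[0]

loss = point_cloud_loss + lambda_eikonal * eikonal_loss(gradients)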

7. VolSDF (15 points)

Implemented VolSDF, which combines SDFs with volume rendering by using a Laplace CDF to convert signed distance to density.

# SDF to Density Conversion (VolSDF)
def sdf_to_density(signed_distance, alpha, beta):
    lap_dist = torch.distributions.laplace.Laplace(0, beta)
    return alpha * lap_dist.cdf(-signed_distance)

# Neural Surface with Color Prediction
def get_distance_color(self, points):
    points = points.view(-1, 3)
    xyz = points
    h = self.harmonic_embedding_xyz(points)
    x = h
    for i, layer in enumerate(self.layers_distance):
        if i == 0:
            x = h
        elif i == self.skip_ind:
            x = torch.cat((x, h), dim=-1)
        x = layer(x)
        x = self.relu(x)
    distance = self.layer_sigma(x)
    x = torch.cat((x, xyz), dim=-1)  # Skip connection for color
    for layer in self.layers_color:
        x = layer(x)
    color = x
    return distance, color
VolSDF SDF-to-Density Conversion:
σ(x) = α · Φβ(-f(x))
where Φβ is the Laplace CDF with parameter β
VolSDF color rendering

VolSDF Color Rendering

VolSDF geometry

VolSDF Geometry (SDF-based)

VolSDF Parameter Analysis:

Alpha (α) Parameter:

  • High α: Higher density values, more opaque surfaces, better geometry definition
  • Low α: Lower density values, more transparent surfaces, softer geometry

Beta (β) Parameter Analysis:

  • High β: Wider density falloff, smoother surfaces, easier training but less sharp geometry
  • Low β: Sharper density falloff, more precise surfaces, harder training but better geometry quality

VolSDF Beta Parameter Questions

Question 1: How does high β bias your learned SDF? What about low β?

High β (e.g., β = 0.1):

  • Wider Density Falloff: The Laplace CDF creates a gradual transition from 0 to 1 over a larger distance range
  • Smooth Surface Bias: The network learns to create smooth, gradual density changes rather than sharp surface boundaries
  • Blurred Geometry: Surfaces appear "fuzzy" with soft edges, making it harder to define precise surface locations
  • Gradient Smoothing: Gradients are smoother and more stable during training, reducing noise

Low β (e.g., β = 0.01):

  • Sharp Density Falloff: The Laplace CDF creates a very steep transition near the zero level set
  • Sharp Surface Bias: The network learns to create sharp, well-defined surface boundaries
  • Precise Geometry: Surfaces have crisp edges with clear inside/outside distinctions
  • Gradient Instability: Gradients can be very large near surfaces, potentially causing training instability

Question 2: Would an SDF be easier to train with volume rendering and low β or high β? Why?

High β is easier to train with volume rendering because:

  • Smoother Gradients: The gradual density falloff provides smoother, more stable gradients during backpropagation
  • Better Convergence: The network doesn't need to learn extremely sharp transitions, making optimization more stable
  • Reduced Noise: Smoother density functions reduce the impact of sampling noise during volume rendering
  • Broader Learning Signal: The wider falloff provides learning signals over a larger spatial region around the surface
  • Numerical Stability: Less prone to numerical issues that can arise from very sharp density transitions

Low β is harder to train because:

  • Sharp Gradients: Very steep density transitions create large, potentially unstable gradients
  • Sampling Sensitivity: Small errors in sample point locations can lead to large changes in density values
  • Convergence Issues: The network must learn very precise surface locations, making optimization challenging

Question 3: Would you be more likely to learn an accurate surface with high β or low β? Why?

Low β is more likely to produce accurate surfaces because:

  • Sharp Surface Definition: The steep density falloff forces the network to learn precise zero-level sets
  • Clear Inside/Outside Distinction: Sharp transitions create well-defined surface boundaries
  • Geometric Accuracy: The SDF can represent fine details and sharp features that would be blurred with high β
  • True SDF Properties: Low β better approximates the ideal SDF behavior with sharp zero crossings

High β limitations for surface accuracy:

  • Surface Blurring: Gradual density falloff creates "fuzzy" surfaces that don't correspond to sharp geometric boundaries
  • Detail Loss: Fine geometric details get smoothed out due to the wide density transition
  • Imprecise Level Sets: The zero level set becomes less well-defined, reducing surface accuracy

Trade-off Summary: There's a fundamental trade-off between training stability (high β) and surface accuracy (low β). In practice, you often start with high β for stable training, then gradually reduce it or use a curriculum learning approach to achieve both stability and accuracy.
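To make the trade-off concrete, a small sketch that evaluates the sdf_to_density conversion from Section 7 at a few signed distances for a low and a high β; the specific α and β values here are illustrative, not the ones used in training:

import torch

def sdf_to_density(signed_distance, alpha, beta):
    # Same Laplace-CDF conversion as in Section 7.
    lap_dist = torch.distributions.laplace.Laplace(0, beta)
    return alpha * lap_dist.cdf(-signed_distance)

sdf_values = torch.tensor([-0.10, -0.02, 0.0, 0.02, 0.10])  # negative = inside the surface
for beta in (0.01, 0.1):  # illustrative low / high beta
    density = sdf_to_density(sdf_values, alpha=10.0, beta=beta)
    print(f"beta={beta}: {density.tolist()}")
# Low beta: density jumps sharply across the zero level set; high beta: a gradual falloff.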

8. Neural Surface Extras (30 points)

8.1 Complex Scene with Sphere Tracing (10 points)

Created a complex scene with 16+ twisted primitives using SDF composition and the twist operation.

# Twist Operation Implementation
def op_twist(primitive_sdf, points, k=10.0):
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    c = torch.cos(k * y)
    s = torch.sin(k * y)
    x_twisted = x * c - z * s
    z_twisted = x * s + z * c
    twisted_points = torch.stack([x_twisted, y, z_twisted], dim=-1)
    return primitive_sdf(twisted_points)

# Complex Scene with Multiple Twisted Primitives
class ComplexSceneSDF(torch.nn.Module):
    def get_distance(self, points):
        sdf = torch.full((points.shape[0], 1), 1000.0, device=points.device)

        # 1 Twisted Torus (center)
        twisted_torus = op_twist(lambda p: self.torus_sdf(p, 0.4, 0.15), torus_points, k=8.0)
        sdf = torch.minimum(sdf, twisted_torus)

        # 1 Twisted Box
        twisted_box = op_twist(lambda p: self.box_sdf(p, 0.3), box_points, k=6.0)
        sdf = torch.minimum(sdf, twisted_box)

        # 1 Twisted Sphere
        twisted_sphere = op_twist(lambda p: self.sphere_sdf(p, 0.25), sphere_points, k=12.0)
        sdf = torch.minimum(sdf, twisted_sphere)

        # 5 Twisted Spheres (pentagon pattern)
        for i in range(5):
            angle = i * 2 * np.pi / 5
            x_offset = 2.5 * np.cos(angle)
            z_offset = 2.5 * np.sin(angle)
            center = torch.tensor([x_offset, 0.0, z_offset], device=points.device)
            local_points = points - center
            twisted_local = op_twist(lambda p: self.sphere_sdf(p, 0.2), local_points, k=15.0 + i * 2)
            sdf = torch.minimum(sdf, twisted_local)

        # 8 Twisted Tori (outer ring)
        for i in range(8):
            angle = i * 2 * np.pi / 8
            x_offset = 3.5 * np.cos(angle)
            z_offset = 3.5 * np.sin(angle)
            center = torch.tensor([x_offset, 0.0, z_offset], device=points.device)
            local_points = points - center
            twisted_local = op_twist(lambda p: self.torus_sdf(p, 0.2, 0.08), local_points, k=10.0 + i)
            sdf = torch.minimum(sdf, twisted_local)

        return sdf
Complex twisted scene

Complex Scene with 16+ Twisted Primitives

Scene Composition:
  • 16+ Primitives: 1 center torus, 1 box, 1 sphere, 5 pentagon spheres, 8 outer tori
  • Twist Operations: Different twist strengths (k = 6 to k = 23) for visual variety
  • SDF Union: Using torch.minimum for proper SDF composition
  • Efficient Rendering: Sphere tracing handles complex geometry efficiently

8.2 Fewer Training Views (10 points)

Experimented with training VolSDF and NeRF using only 20 views instead of 100, comparing reconstruction quality.
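A minimal sketch of how the reduced-view split could be constructed; the train_dataset variable and its structure (a list of per-view samples) are assumptions:

# Hedged sketch: keep 20 roughly evenly spaced views out of 100 (dataset structure assumed).
import numpy as np

n_total, n_keep = 100, 20
keep_indices = np.linspace(0, n_total - 1, n_keep).round().astype(int)
train_dataset_20 = [train_dataset[i] for i in keep_indices]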

VolSDF 100 views

VolSDF: 100 Views

VolSDF 20 views

VolSDF: 20 Views

NeRF 100 views

NeRF: 100 Views

NeRF 20 views

NeRF: 20 Views

Few-View Training Analysis:

VolSDF Performance:

  • Quality Degradation: Renderings become much blurrier and less detailed with only 20 views

NeRF Performance:

  • Better than Expected: NeRF performs surprisingly well with only 20 views
  • View Dependence: View-dependent effects are still captured reasonably well
  • Generalization: NeRF's implicit regularization helps with few-view scenarios

Key Insights:

  • NeRF Resilience: NeRF's view-dependent modeling and positional encoding provide good few-view performance
  • Quality Trade-off: Both methods show some quality degradation but remain usable with 20 views

8.3 Alternate SDF to Density Conversions (10 points)

Implemented and compared the naive NeuS solution for SDF-to-density conversion as an alternative to VolSDF.

# NeuS Naive SDF-to-Density Conversion
def sdf_to_density_naive_neus(signed_distance, alpha, beta):
    s = 1.0 / beta
    x = -signed_distance
    exp_term = torch.exp(-s * x)
    logistic_density = s * exp_term / torch.square(1 + exp_term)
    return alpha * logistic_density
NeuS Naive Formula:
σ(x) = α · s · e^(s·f(x)) / (1 + e^(s·f(x)))²
where s = 1/β
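To make the difference concrete with illustrative values (α = 10, β = 0.05, so s = 20): the naive conversion gives α·s/4 = 50 exactly at the surface (f = 0) but only about 3.5 at f = −0.2 inside the object, whereas the VolSDF conversion saturates toward α = 10 everywhere inside. The naive density is therefore concentrated in a thin shell around the zero level set rather than filling the interior.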
NeuS naive rendering

NeuS Naive SDF-to-Density Conversion

SDF-to-Density Method Comparison:

Method     | Formula                                 | Advantages                                            | Disadvantages
VolSDF     | α · Φβ(-f(x))                           | Well-theorized, smooth gradients, proven performance  | Requires careful parameter tuning
NeuS Naive | α · s · e^(s·f(x)) / (1 + e^(s·f(x)))²  | Simple implementation, logistic distribution          | May have different convergence properties

Observations:

  • NeuS Naive Results: Produces a blurrier but still reasonable color rendering, while the extracted geometry comes out empty, likely because the naive logistic density peaks in a thin shell around the surface and falls back toward zero inside the object, so the interior never becomes opaque.