Learning for 3D Vision: Assignment 3

A. Neural Volume Rendering

0. Transmittance Calculation

1. Differentiable Volume Rendering

1.3. Ray Sampling

Visualization
XY Grid | Rays

1.4. Point Sampling

Visualization

1.5. Volume Rendering

Visualization
Box | Depth Image

2. Optimizing a Basic Implicit Volume

2.2. Loss and Training

2.3. Visualization

Visualization
Before Training | After Training

3. Optimizing a Neural Radiance Field (NeRF)

Visualization
Epoch: 10 | 20 | 40 | 90 | 190 | 240

NeRF Extras

4.1. View Dependence

Visualization
Lego | Materials
Epoch: 10 | 60 | 240

Tradeoff between View Dependence and Generalization Quality

Adding view dependence to the NeRF model makes it more prone to overfitting: since the model now receives the viewing direction as input, it can assign a different color to the same point for each training view rather than learning a single color that holds across all viewing directions, which reduces generalization quality compared to the vanilla NeRF. However, keeping the network after the direction input shallow (i.e., limiting the capacity of the view-dependent branch) reduces this risk of overfitting.
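As a rough sketch of this design choice (layer sizes and names here are illustrative, not the exact architecture used in the assignment), the density branch can be kept independent of the view direction while only a small head consumes the direction encoding:

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Hypothetical NeRF head: density from position features only,
    color from a small MLP that also sees the encoded view direction."""

    def __init__(self, feat_dim=256, dir_dim=27, hidden=128):
        super().__init__()
        self.density = nn.Linear(feat_dim, 1)   # sigma depends on position only
        self.feature = nn.Linear(feat_dim, feat_dim)
        # Keeping this branch shallow limits how much the model can overfit to views.
        self.color = nn.Sequential(
            nn.Linear(feat_dim + dir_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),
            nn.Sigmoid(),
        )

    def forward(self, pos_features, dir_encoding):
        sigma = torch.relu(self.density(pos_features))
        rgb = self.color(torch.cat([self.feature(pos_features), dir_encoding], dim=-1))
        return sigma, rgb
```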

4.2. Coarse/Fine Sampling

Visualization
Epoch: 10 | 30 | 70 | 150

Tradeoff between Speed and Quality

This method involves training two models simultaneously: using the outputs of the coarse model, the fine model can probabilistically sample more relevant points along each ray, leading to more efficient learning. This reduces the number of epochs needed for convergence and improves the quality of the generated output. However, since two models are now being trained, each training step becomes significantly slower. This drawback can be managed to an extent by reducing the complexity of the coarse model, or by reducing the number of coarse samples per ray, without significantly reducing the quality of the fine model.
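A minimal sketch of how the fine samples can be drawn from the coarse weights via inverse-transform sampling (function and variable names are illustrative; for simplicity it draws bin midpoints rather than interpolating within bins as the full NeRF implementation does):

```python
import torch

def sample_fine_points(bin_midpoints, coarse_weights, n_fine):
    """Draw extra depths where the coarse model placed high weight.
    bin_midpoints, coarse_weights: (n_rays, n_bins)."""
    pdf = coarse_weights + 1e-5                      # avoid division by zero
    pdf = pdf / pdf.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)

    u = torch.rand(pdf.shape[0], n_fine, device=pdf.device)
    # Index of the first bin whose CDF exceeds each uniform sample.
    idx = torch.searchsorted(cdf, u, right=True).clamp(max=cdf.shape[-1] - 1)
    fine_depths = torch.gather(bin_midpoints, -1, idx)
    return fine_depths
```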

B. Neural Surface Rendering

5. Sphere Tracing

Visualization

Implementation Details

Sphere tracing involves moving along the ray in increments of the signed distance function (SDF) evaluated at the current point. This works because we never step farther than the distance to the nearest surface in any direction, so the iteration is guaranteed to converge onto the surface. The implementation heuristically sets a maximum number of steps and an epsilon threshold below which the algorithm terminates. For a vectorized implementation, a mask indicates whether each ray has reached a surface; looping up to the maximum number of steps and advancing every unconverged ray by the SDF value at its current point completes the sphere tracing.
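A minimal vectorized sketch of this loop (function and argument names are assumptions, not the assignment's exact interface):

```python
import torch

def sphere_trace(origins, directions, sdf_fn, max_steps=64, eps=1e-5):
    """Vectorized sphere tracing sketch.
    origins, directions: (N, 3); sdf_fn maps (N, 3) -> (N, 1)."""
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    hit = torch.zeros(origins.shape[0], dtype=torch.bool, device=origins.device)

    for _ in range(max_steps):
        points = origins + t * directions
        dist = sdf_fn(points)
        hit = hit | (dist.squeeze(-1).abs() < eps)   # rays that reached a surface
        # Only advance rays that have not converged yet.
        t = torch.where(hit.unsqueeze(-1), t, t + dist)

    return origins + t * directions, hit
```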

6. Optimizing a Neural SDF

Visualization
Input | Output

MLP and Eikonal Loss

The MLP design is as follows: The Eikonal loss is used to enforce the defining property of an SDF, namely that it gives the distance from the input point to the closest point on the surface. This is encouraged by constraining the norm of the gradient of the learned function to be 1 at the sampled points.
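A short sketch of how this penalty can be computed with autograd (the model and argument names are illustrative):

```python
import torch

def eikonal_loss(sdf_model, points):
    """Penalize deviation of the SDF gradient norm from 1 at the given points."""
    points = points.clone().requires_grad_(True)
    sdf = sdf_model(points)
    grad = torch.autograd.grad(
        outputs=sdf, inputs=points,
        grad_outputs=torch.ones_like(sdf),
        create_graph=True,   # keep the graph so the loss itself can be backpropagated
    )[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```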

7. VolSDF

Visualization
Geometry | Rendering

Hyperparameters Used

After experimenting with a few different hyperparameters, the best results were obtained by slightly increasing the Eikonal weight to 0.05 and using a weight of 1.5 on the distance loss. I also tried changing the MLP architecture, but increasing the number of layers resulted in overfitting, so this wasn't used. Increasing the beta value caused a blurrier output, but also resulted in faster convergence.

SDF to Density

  1. A high beta makes the Laplace distribution smoother, i.e., it drops to zero slowly, whereas a low beta makes the distribution more "pointed" at 0, so the density around the surface falls off much more quickly (see the sketch after this list). For the purposes of our model, we want the learned beta to be as low as possible (but positive), so that we can be confident the model has learned the surface boundary well.
  2. An SDF is easier to train with a high beta, as it gives more non-zero density points for the model to learn from. It does blur the region around the object surface though, so over time the value of beta should be reduced to obtain a sharper result.
  3. An accurate surface is learned with a low beta value, as it ensures that the density drops to zero quickly away from the surface.
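The conversion itself follows the Laplace-CDF form used by VolSDF; a sketch with placeholder values for alpha and beta:

```python
import torch

def sdf_to_density(sdf, alpha=10.0, beta=0.05):
    """VolSDF-style conversion: density = alpha * CDF of a zero-mean
    Laplace(scale=beta) evaluated at -sdf. alpha/beta are placeholders."""
    s = -sdf
    return alpha * torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),          # outside the surface (sdf >= 0)
        1.0 - 0.5 * torch.exp(-s / beta),   # inside the surface (sdf < 0)
    )
```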

8. Neural Surface Extras

8.1. Render a Large Scene with Sphere Tracing

Visualization
25 evenly spaced tori, which look like bean bags around a disco ball, rendered quickly using sphere tracing (~30 s to 1 min for the entire GIF).
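One way to build such a scene (a rough sketch with illustrative names, not necessarily the exact setup used here) is to define a torus SDF and take the pointwise minimum over all primitives, i.e., their union:

```python
import torch

def torus_sdf(points, center, major_r, minor_r):
    """SDF of a torus centered at `center` with its axis along z."""
    p = points - center
    q = torch.stack([p[..., :2].norm(dim=-1) - major_r, p[..., 2]], dim=-1)
    return q.norm(dim=-1) - minor_r

def scene_sdf(points, centers, major_r=0.3, minor_r=0.1):
    """Union of several tori: the scene SDF is the pointwise minimum over primitives."""
    dists = torch.stack([torus_sdf(points, c, major_r, minor_r) for c in centers], dim=-1)
    return dists.min(dim=-1).values
```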

8.2. Fewer Training Views

Visualization
Model | Rendering | Geometry
VolSDF
NeRF

Comparison between VolSDF and NeRF

The NeRF model requires many more training epochs than the VolSDF model to converge, and its early renderings are much hazier than VolSDF's. Careful tuning of the hyperparameters was required to get the NeRF model to converge and avoid overfitting.

8.3. Alternate SDF to Density Conversions

Using the 'naive' solution from the NeuS paper

Visualization
Lego | Materials
Epoch: 10 | 20 | 80 | 120 | 240

The naive solution works by taking the derivative of the sigmoid function, which yields a logistic density that peaks at a signed distance of zero and decays gradually on either side. However, the initialization of the sharpness parameter s (whose inverse 1/s acts as the standard deviation of the logistic density) is extremely important: a low value of s spreads the density out, making the renders extremely hazy and causing the model to underfit.
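A sketch of this conversion (the value of s is a placeholder; the scale actually used in training may differ):

```python
import torch

def naive_neus_density(sdf, s=50.0):
    """'Naive' NeuS-style density: the logistic density (derivative of the
    sigmoid with sharpness s) evaluated at the SDF value."""
    sig = torch.sigmoid(s * sdf)
    return s * sig * (1.0 - sig)   # d/dx sigmoid(s*x) = s * sig * (1 - sig)
```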