Learning for 3D Vision: Assignment 3

A. Neural Volume Rendering

0. Transmittance Calculation

1. Differentiable Volume Rendering

1.3. Ray Sampling

Visualization
XY Grid | Rays

1.4. Point Sampling

Visualization

1.5. Volume Rendering

Visualization
Box | Depth Image

2. Optimizing a Basic Implicit Volume

2.2. Loss and Training

2.3. Visualization

Visualization
Before Training | After Training

3. Optimizing a Neural Radiance Field (NeRF)

Visualization
Epoch: 10 | 20 | 40 | 90 | 190 | 240

NeRF Extras

4.1. View Dependence

Visualization
Lego | Materials
Epoch: 10 | 60 | 240

Tradeoff between View Dependence and Generalization Quality

Adding view dependence to the NeRF model makes it more prone to overfitting: since the model now receives the viewing direction as input, it can assign a different color to the same point for each training view rather than learning a single color that holds across all viewing directions, which reduces generalization quality compared to the vanilla NeRF. However, keeping the network after the direction input shallow (i.e., limiting the capacity of the view-dependent branch) reduces this risk of overfitting.
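As a rough sketch of this design choice (layer sizes and names here are illustrative, not the exact architecture used in the assignment), the density branch can be kept independent of the view direction while only a small head consumes the direction encoding:

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Hypothetical NeRF head: density from position features only,
    color from a small MLP that also sees the encoded view direction."""

    def __init__(self, feat_dim=256, dir_dim=27, hidden=128):
        super().__init__()
        self.density = nn.Linear(feat_dim, 1)   # sigma depends on position only
        self.feature = nn.Linear(feat_dim, feat_dim)
        # Keeping this branch shallow limits how much the model can overfit to views.
        self.color = nn.Sequential(
            nn.Linear(feat_dim + dir_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 3),
            nn.Sigmoid(),
        )

    def forward(self, pos_features, dir_encoding):
        sigma = torch.relu(self.density(pos_features))
        rgb = self.color(torch.cat([self.feature(pos_features), dir_encoding], dim=-1))
        return sigma, rgb
```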

4.2. Coarse/Fine Sampling

Visualization
Epoch: 10 | 30 | 70 | 150

Tradeoff between Speed and Quality

This method involves training two models simultaneously: using the outputs of the coarse model, the fine model can probabilistically sample more relevant points along each ray, leading to more efficient learning. This reduces the number of epochs needed for convergence and improves the quality of the generated output. However, since two models are now being trained, each training step becomes significantly slower. This drawback can be managed to an extent by reducing the complexity of the coarse model, or by reducing the number of coarse samples per ray, without significantly reducing the quality of the fine model.
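A minimal sketch of how the fine samples can be drawn from the coarse weights via inverse-transform sampling (function and variable names are illustrative; for simplicity it draws bin midpoints rather than interpolating within bins as the full NeRF implementation does):

```python
import torch

def sample_fine_points(bin_midpoints, coarse_weights, n_fine):
    """Draw extra depths where the coarse model placed high weight.
    bin_midpoints, coarse_weights: (n_rays, n_bins)."""
    pdf = coarse_weights + 1e-5                      # avoid division by zero
    pdf = pdf / pdf.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)

    u = torch.rand(pdf.shape[0], n_fine, device=pdf.device)
    # Index of the first bin whose CDF exceeds each uniform sample.
    idx = torch.searchsorted(cdf, u, right=True).clamp(max=cdf.shape[-1] - 1)
    fine_depths = torch.gather(bin_midpoints, -1, idx)
    return fine_depths
```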

B. Neural Surface Rendering

5. Sphere Tracing

Visualization

Implementation Details

Sphere tracing involves moving along the ray in increments of the signed distance function (SDF) evaluated at the current point. This works because we never step farther than the distance to the nearest surface in any direction, so the iteration is guaranteed to converge onto the surface. The implementation heuristically sets a maximum number of steps and an epsilon threshold below which the algorithm terminates. For a vectorized implementation, a mask indicates whether each ray has reached a surface; looping up to the maximum number of steps and advancing every unconverged ray by the SDF value at its current point completes the sphere tracing.
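A minimal vectorized sketch of this loop (function and argument names are assumptions, not the assignment's exact interface):

```python
import torch

def sphere_trace(origins, directions, sdf_fn, max_steps=64, eps=1e-5):
    """Vectorized sphere tracing sketch.
    origins, directions: (N, 3); sdf_fn maps (N, 3) -> (N, 1)."""
    t = torch.zeros(origins.shape[0], 1, device=origins.device)
    hit = torch.zeros(origins.shape[0], dtype=torch.bool, device=origins.device)

    for _ in range(max_steps):
        points = origins + t * directions
        dist = sdf_fn(points)
        hit = hit | (dist.squeeze(-1).abs() < eps)   # rays that reached a surface
        # Only advance rays that have not converged yet.
        t = torch.where(hit.unsqueeze(-1), t, t + dist)

    return origins + t * directions, hit
```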

6. Optimizing a Neural SDF

Visualization
Input | Output

MLP and Eikonal Loss

The MLP design is as follows: The Eikonal loss is used to enforce the defining property of an SDF, namely that it gives the distance from the input point to the closest point on the surface. This is encouraged by constraining the norm of the gradient of the learned function to be 1 at the sampled points.
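A short sketch of how this penalty can be computed with autograd (the model and argument names are illustrative):

```python
import torch

def eikonal_loss(sdf_model, points):
    """Penalize deviation of the SDF gradient norm from 1 at the given points."""
    points = points.clone().requires_grad_(True)
    sdf = sdf_model(points)
    grad = torch.autograd.grad(
        outputs=sdf, inputs=points,
        grad_outputs=torch.ones_like(sdf),
        create_graph=True,   # keep the graph so the loss itself can be backpropagated
    )[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```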

7. VolSDF

Visualization
Geometry | Rendering

Hyperparameters Used

After experimenting with a few different hyperparameters, the best results were obtained by slightly increasing the Eikonal weight to 0.05 and using a weight of 1.5 on the distance loss. I also tried changing the MLP architecture, but increasing the number of layers resulted in overfitting, so this wasn't used. Increasing the beta value caused a blurrier output, but also resulted in faster convergence.

SDF to Density

  1. A high beta makes the Laplace distribution smoother, i.e., it drops to zero slowly, whereas a low beta makes the distribution more "pointed" at 0, so the density around the surface falls off much more quickly (see the sketch after this list). For the purposes of our model, we want the learned beta to be as low as possible (but positive), so that we can be confident the model has learned the surface boundary well.
  2. An SDF is easier to train with a high beta, as it gives more non-zero density points for the model to learn from. It does blur the region around the object surface though, so over time the value of beta should be reduced to obtain a sharper result.
  3. An accurate surface is learned with a low beta value, as it ensures that the density drops to zero quickly away from the surface.
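The conversion itself follows the Laplace-CDF form used by VolSDF; a sketch with placeholder values for alpha and beta:

```python
import torch

def sdf_to_density(sdf, alpha=10.0, beta=0.05):
    """VolSDF-style conversion: density = alpha * CDF of a zero-mean
    Laplace(scale=beta) evaluated at -sdf. alpha/beta are placeholders."""
    s = -sdf
    return alpha * torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),          # outside the surface (sdf >= 0)
        1.0 - 0.5 * torch.exp(-s / beta),   # inside the surface (sdf < 0)
    )
```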

8. Neural Surface Extras

8.1. Render a Large Scene with Sphere Tracing

Visualization
25 evenly spaced tori, which look like bean bags around a disco ball, rendered quickly using sphere tracing (~30 s to 1 min for the entire GIF).
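One way to build such a scene (a rough sketch with illustrative names, not necessarily the exact setup used here) is to define a torus SDF and take the pointwise minimum over all primitives, i.e., their union:

```python
import torch

def torus_sdf(points, center, major_r, minor_r):
    """SDF of a torus centered at `center` with its axis along z."""
    p = points - center
    q = torch.stack([p[..., :2].norm(dim=-1) - major_r, p[..., 2]], dim=-1)
    return q.norm(dim=-1) - minor_r

def scene_sdf(points, centers, major_r=0.3, minor_r=0.1):
    """Union of several tori: the scene SDF is the pointwise minimum over primitives."""
    dists = torch.stack([torus_sdf(points, c, major_r, minor_r) for c in centers], dim=-1)
    return dists.min(dim=-1).values
```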

8.2. Fewer Training Views

Visualization
Model | Rendering | Geometry
VolSDF
NeRF

Comparison between VolSDF and NeRF

The NeRF model requires many more training epochs than the VolSDF model to converge, and its early renderings are much hazier than VolSDF's. Careful tuning of the hyperparameters was required to get the NeRF model to converge and avoid overfitting.

8.3. Alternate SDF to Density Conversions

Using the 'naive' solution from the NeuS paper

Visualization
Lego | Materials
Epoch: 10 | 20 | 80 | 120 | 240

The naive solution works by taking the derivative of the sigmoid function, which yields a logistic density that peaks at a signed distance of zero and decays gradually on either side. However, the initialization of the sharpness parameter s (whose inverse 1/s acts as the standard deviation of the logistic density) is extremely important: a low value of s spreads the density out, making the renders extremely hazy and causing the model to underfit.
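A sketch of this conversion (the value of s is a placeholder; the scale actually used in training may differ):

```python
import torch

def naive_neus_density(sdf, s=50.0):
    """'Naive' NeuS-style density: the logistic density (derivative of the
    sigmoid with sharpness s) evaluated at the SDF value."""
    sig = torch.sigmoid(s * sdf)
    return s * sig * (1.0 - sig)   # d/dx sigmoid(s*x) = s * sig * (1 - sig)
```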