Learning for 3D Vision HW3

0. Transmittance Calculation

Transmittance computation

$$ \begin{align*} T(y_1, y_2) &= e^{-(1 * 2)} = e^{-2} \approx 0.135\\ T(y_2, y_4) &= e^{-(0.5*1 + 10 * 3)} = e^{-30.5} \approx 5.68 \times 10^{-14} \\ T(x, y_4) &= e^{-(1 * 2 + 0.5*1 + 10 * 3)} = e^{-32.5} \approx 7.68 \times 10^{-15} \\ T(x, y_3) &= e^{-(1 * 2 + 0.5*1)} = e^{-2.5} \approx 8.21 \times 10^{-2} \end{align*} $$
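The values above can be checked numerically. A minimal sketch, where each segment is a (density, length) pair read off the problem setup:

```python
import math

# (sigma, delta) pairs for each interval along the ray, matching the
# derivation above: sigma=1 over length 2, sigma=0.5 over length 1,
# sigma=10 over length 3.
segments = {
    "T(y1,y2)": [(1.0, 2.0)],
    "T(y2,y4)": [(0.5, 1.0), (10.0, 3.0)],
    "T(x,y4)":  [(1.0, 2.0), (0.5, 1.0), (10.0, 3.0)],
    "T(x,y3)":  [(1.0, 2.0), (0.5, 1.0)],
}

def transmittance(segs):
    # T = exp(-sum_i sigma_i * delta_i)
    return math.exp(-sum(sigma * delta for sigma, delta in segs))

for name, segs in segments.items():
    print(f"{name} = {transmittance(segs):.3e}")
```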

1. Differentiable Volume Rendering

1.3. Ray sampling

grid ray

1.4. Point sampling

sample

1.5. Volume rendering

part1 depth

2. Optimizing a basic implicit volume

2.2. Loss and training

Box center: (0.25, 0.25, 0.00); Box side lengths: (2.01, 1.50, 1.50)

2.3. Visualization

part2

3. Optimizing a Neural Radiance Field (NeRF)

part3

4. NeRF Extras

4.1 View Dependence

Trade-offs between increased view dependence and generalization quality: When view dependence is added, the model tends to memorize what the output should be for each viewing direction, which hurts generalization quality. Following the NeRF paper, the view-direction embedding is injected only after the density prediction, and a single linear layer then outputs the RGB value.
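This split of the head can be sketched as follows. All dimensions and weights here are hypothetical stand-ins for the trained network; the point is that density depends only on position features, while RGB also sees the view direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 256-d position features, 3-d view direction.
FEAT, VIEW, HIDDEN = 256, 3, 128

# Random weights stand in for trained parameters.
W_sigma = rng.normal(size=(FEAT, 1))
W_rgb1 = rng.normal(size=(FEAT + VIEW, HIDDEN))
W_rgb2 = rng.normal(size=(HIDDEN, 3))

def head(features, view_dir):
    """Density from position features only; RGB from features + view dir."""
    sigma = np.maximum(features @ W_sigma, 0.0)  # view-independent density
    h = np.maximum(np.concatenate([features, view_dir], axis=-1) @ W_rgb1, 0.0)
    rgb = 1.0 / (1.0 + np.exp(-(h @ W_rgb2)))    # sigmoid -> [0, 1]
    return sigma, rgb

feats = rng.normal(size=(4, FEAT))
dirs = rng.normal(size=(4, VIEW))
sigma, rgb = head(feats, dirs)
```

Because the view direction enters after the density branch, geometry cannot be "explained away" by view-dependent effects, which limits the memorization problem described above.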

highmaterial

4.2 Coarse/Fine Sampling

fine

The coarse–fine sampling strategy helps us sample more meaningful points along each ray, improving the representation’s accuracy. However, it may increase computational cost by about 1.5×, since it uses 64 points per ray in the coarse network and 128 points per ray in the fine network.
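The fine samples are drawn from the coarse weights by inverse-transform sampling along each ray. A minimal sketch (bin edges and weights below are illustrative, not from the actual run):

```python
import numpy as np

def sample_fine(bins, weights, n_fine, rng):
    """Inverse-transform sampling: place fine samples where coarse weights are large.

    bins: (N+1,) bin edges along the ray; weights: (N,) coarse weights.
    """
    pdf = weights / weights.sum()
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = rng.uniform(size=n_fine)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(weights) - 1)
    # Linear interpolation within the selected bin.
    span = cdf[idx + 1] - cdf[idx]
    t = (u - cdf[idx]) / np.where(span > 0, span, 1.0)
    return bins[idx] + t * (bins[idx + 1] - bins[idx])

rng = np.random.default_rng(0)
bins = np.linspace(0.0, 1.0, 65)              # 64 coarse bins, as in our coarse pass
weights = np.zeros(64)
weights[30:34] = 1.0                          # coarse network "found" the surface here
fine = sample_fine(bins, weights, 128, rng)   # 128 fine samples, as in our fine pass
```

All 128 fine samples land inside the high-weight region, which is why the fine network can afford to spend its budget near the surface.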

5. Sphere Tracing

part5

In every iteration, we query the SDF to get the distance from the current position to the surface, then move the point along the ray direction by that amount. If the SDF output falls below a threshold, e.g. 1e-3, we classify the point as intersecting the surface.
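The loop above can be sketched as follows; the unit-sphere SDF and the iteration/threshold constants are illustrative assumptions:

```python
import numpy as np

def sphere_trace(sdf, origin, direction, max_iters=64, eps=1e-3, far=10.0):
    """March a point along the ray by the SDF value until |sdf| < eps."""
    t = 0.0
    for _ in range(max_iters):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:
            return p, True            # close enough: intersection with surface
        t += d                        # safe step: cannot overshoot the surface
        if t > far:
            break
    return origin + t * direction, False  # ray missed the scene

# Example scene: a unit sphere at the origin (hypothetical).
unit_sphere = lambda p: np.linalg.norm(p) - 1.0
origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])   # must be unit-length
hit_point, hit = sphere_trace(unit_sphere, origin, direction)
```

Stepping by the SDF value is safe because, by definition, no surface lies closer than that distance, so the march converges to the first intersection without overshooting.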

6. Optimizing a Neural SDF

part6

The network architecture is similar to NeRF: an 8-layer MLP with hidden dimension 256 outputs the distance, followed by a 4-layer MLP with hidden dimension 128 that outputs the RGB value. The eikonal loss enforces that the norm of the SDF gradient equals 1, implemented as an L2 penalty.
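The eikonal penalty itself is a one-liner once the spatial gradients are available (in practice they come from autograd; here they are passed in directly):

```python
import numpy as np

def eikonal_loss(grads):
    """L2 penalty pushing the SDF gradient norm toward 1 at sampled points.

    grads: (N, 3) spatial gradients of the predicted SDF w.r.t. input points.
    """
    norms = np.linalg.norm(grads, axis=-1)
    return np.mean((norms - 1.0) ** 2)

# A true SDF has unit-norm gradients everywhere, so the loss is zero there:
unit_grads = np.array([[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]])
loss_at_true_sdf = eikonal_loss(unit_grads)
```

This regularizer is what keeps the learned function a valid distance field rather than an arbitrary implicit function with the right zero level set.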

7. VolSDF

  1. With a high beta, the model learns a smoother representation, and with a low beta, it learns a sharper one.
  2. The SDF can be learned more easily with a high beta, since the surface is smoother.
  3. A low beta is better for producing a more accurate and sharper surface.
Geometry Result
part7geo part7

A beta value of 0.05 provides a sharp enough boundary, and an alpha value of 10 provides stable values for empty space and the density inside the object.
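The role of alpha and beta can be seen directly in the VolSDF conversion, which maps SDF values to density through the CDF of a zero-mean Laplace distribution with scale beta. A minimal sketch with the values above:

```python
import numpy as np

def volsdf_density(sdf, alpha=10.0, beta=0.05):
    """VolSDF conversion: sigma(x) = alpha * Psi_beta(-sdf(x)).

    Psi_beta is the Laplace CDF; alpha sets the density inside the object,
    beta controls how sharply density falls off around the surface.
    """
    s = -np.asarray(sdf, dtype=float)
    cdf = np.where(s <= 0, 0.5 * np.exp(s / beta), 1.0 - 0.5 * np.exp(-s / beta))
    return alpha * cdf

d_inside = volsdf_density(-1.0)   # deep inside: saturates near alpha
d_surface = volsdf_density(0.0)   # on the surface: exactly alpha / 2
d_outside = volsdf_density(1.0)   # far outside: near zero (empty space)
```

With beta = 0.05 the transition from ~0 to ~alpha happens within a few centimeters of the surface, which is the "sharp enough boundary" described above; a larger beta would stretch this transition out and blur the surface.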

8. Neural Surface Extras

8.1 Render a Large Scene with Sphere Tracing

part8multiobj

8.2 Fewer Training Views

NeRF VolSDF
part8nerf part8volsdf

When the number of training images is decreased to 20, NeRF fails to learn the representation and outputs fully black images, while VolSDF retains reasonable quality.

8.3 Alternate SDF to Density Conversions

We use the function from NeuS to convert SDF to density, which is the derivative of the sigmoid function with a scale factor s; here we select s = 100 for the experiments.
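This logistic density can be sketched as follows; it peaks exactly at the zero level set and decays on both sides, with s controlling the sharpness:

```python
import numpy as np

def neus_density(sdf, s=100.0):
    """NeuS-style logistic density: derivative of sigmoid(s * x) scaled by s.

    phi_s(x) = s * sigmoid(s*x) * (1 - sigmoid(s*x)), maximal at sdf = 0.
    """
    sig = 1.0 / (1.0 + np.exp(-s * np.asarray(sdf, dtype=float)))
    return s * sig * (1.0 - sig)

peak = neus_density(0.0)    # maximum at the surface: s / 4
away = neus_density(0.5)    # decays to ~0 away from the surface
```

Unlike the one-sided Laplace-CDF conversion in VolSDF, this density is symmetric around the surface and vanishes deep inside the object, which is one plausible reason the renders and geometry behave differently after the swap.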

VolSDF NeuS
part8ori part8neus
part8ori part8neus

The left image shows the original function, and the right one shows the NeuS function. The visual quality remains nearly the same after the change, though the NeuS images appear brighter. The geometry visualization appears broken, but this may simply be because we didn't sample with sufficient precision, so the tracer skips over the surface.