Homework 3: Neural Volume Rendering and Surface Rendering¶

Question 0¶

Question 0

Note: for $T(x, y_4)$, I forgot to put parentheses around the exponent: $-2 + 30.5$ should be $-(2 + 30.5)$.

Question 1¶

Question 1.3¶

XY Grid XY Rays

Question 1.4¶

1.4

Question 1.5¶

Gif Depth

Question 2¶

Question 2.1¶

Implemented in get_random_pixels_from_image() in ray_utils.py
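A minimal sketch of what such a sampler can look like, assuming PyTorch and an `(H, W, 3)` image tensor; the actual signature in `ray_utils.py` may differ:

```python
import torch

def get_random_pixels_from_image(n_pixels, image_size, image):
    """Sample n_pixels random (x, y) pixel coordinates and their colors.

    Illustrative sketch: image_size is (H, W), image is an (H, W, 3) tensor.
    """
    H, W = image_size
    # Draw flat pixel indices uniformly without replacement, then unravel.
    idx = torch.randperm(H * W)[:n_pixels]
    ys, xs = idx // W, idx % W
    xy = torch.stack([xs, ys], dim=-1)   # (n_pixels, 2) pixel coordinates
    colors = image[ys, xs]               # (n_pixels, 3) ground-truth colors
    return xy, colors
```

These sampled pixels are then turned into rays and used as the per-iteration training batch.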

Question 2.2¶

Box center: (0.25, 0.25, 0.00)
Box side lengths: (2.01, 1.50, 1.50)

Question 2.3¶

Gif

Question 3¶

Gif

Question 4¶

4.1 View Dependence¶

| Regular | High Res     |
|---------|--------------|
| Reg Gif | High Res Gif |

Discussion¶

Adding view dependence to NeRF makes the emitted color a function of the viewing direction. Instead of assuming that a point emits the same color in every direction, the color branch takes both the 3D position and the view direction as input. This improves photorealism, since it captures how the color emitted from a surface changes with viewpoint (e.g., specular highlights). However, the network may overfit: it can memorize view-direction patterns in the training data instead of learning the underlying geometry, which hurts generalization to unseen views.
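One common way to condition color on the view direction is to concatenate an embedding of the direction with an intermediate point feature before the color head. A sketch under that assumption (class and dimension names here are illustrative, not the actual implementation):

```python
import torch
import torch.nn as nn

class ViewDependentColorHead(nn.Module):
    """Sketch of a view-dependent color branch.

    feat_dim: size of the per-point feature from the density trunk.
    dir_embed_dim: size of the harmonic embedding of the view direction.
    """
    def __init__(self, feat_dim=128, dir_embed_dim=24):
        super().__init__()
        self.color_mlp = nn.Sequential(
            nn.Linear(feat_dim + dir_embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 3),
            nn.Sigmoid(),  # keep predicted colors in [0, 1]
        )

    def forward(self, point_features, dir_embedding):
        # Color now depends on both the point feature and the view direction.
        return self.color_mlp(torch.cat([point_features, dir_embedding], dim=-1))
```

Density, by contrast, is typically predicted from position alone, so geometry stays view-independent.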

Question 5¶

Gif

Implementation Specs¶

For this code, I utilized both an active mask and a surface mask: these track which points are still being updated and which points have hit the surface, respectively. I then iterate up to the maximum number of iterations, marching each point along its ray direction and tracking the cumulative distance traveled along the ray so far. In this loop, I update the surface mask and shrink the active mask to contain only points that have neither traveled the maximum distance nor dropped below a small SDF threshold (I set it to $10^{-4}$). If a point's SDF value falls below $10^{-4}$, I mark it as a hit on the surface and update the surface mask accordingly.
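The masked sphere-tracing loop described above can be sketched as follows (a minimal PyTorch version with assumed default values; the real code's names and signatures may differ):

```python
import torch

def sphere_trace(sdf, origins, dirs, max_iters=64, max_dist=10.0, eps=1e-4):
    """Sphere tracing with an active mask and a surface (hit) mask.

    sdf: callable mapping (N, 3) points to (N,) signed distances.
    origins, dirs: (N, 3) ray origins and unit directions.
    """
    n = origins.shape[0]
    t = torch.zeros(n)                           # cumulative distance per ray
    active = torch.ones(n, dtype=torch.bool)     # still being updated
    hit = torch.zeros(n, dtype=torch.bool)       # reached the surface
    for _ in range(max_iters):
        if not active.any():
            break
        points = origins + t[:, None] * dirs
        d = torch.full((n,), float("inf"))
        d[active] = sdf(points[active])          # only evaluate active points
        # Points within eps of the surface count as hits.
        hit |= active & (d < eps)
        # Step the remaining active points forward by their SDF value.
        t = torch.where(active & ~hit, t + d, t)
        # Stay active only if not hit and not past the maximum distance.
        active = active & ~hit & (t < max_dist)
    return origins + t[:, None] * dirs, hit
```

The SDF value itself is a safe step size: by definition no surface lies closer than $|d(x)|$, so each active point can advance by $d(x)$ without overshooting.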

Question 6¶

Input Gif Predicted Gif

Implementation Specs¶

I implemented an MLP with 8 hidden layers and 128 neurons per layer. The input points are encoded with harmonic embeddings, and skip connections (re-concatenating the embedding) are used on all hidden layers except the input and last hidden layer. The output layer returns a single signed distance per point.
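A sketch of that architecture, assuming a 39-dimensional harmonic embedding (3D point with 6 frequencies: $3 + 3 \cdot 2 \cdot 6 = 39$) and treating the embedding re-concatenation as the skip connection; the class and parameter names are illustrative:

```python
import torch
import torch.nn as nn

class SDFNetwork(nn.Module):
    """Sketch: 8 hidden layers x 128 units, skip connections on all but
    the input and last hidden layer, single distance output."""
    def __init__(self, embed_dim=39, hidden=128, n_layers=8):
        super().__init__()
        self.first = nn.Linear(embed_dim, hidden)
        # Middle layers re-concatenate the harmonic embedding (the skips).
        self.skips = nn.ModuleList(
            [nn.Linear(hidden + embed_dim, hidden) for _ in range(n_layers - 2)]
        )
        self.last_hidden = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)  # one signed distance per point

    def forward(self, x_embed):
        h = torch.relu(self.first(x_embed))
        for layer in self.skips:
            h = torch.relu(layer(torch.cat([h, x_embed], dim=-1)))
        h = torch.relu(self.last_hidden(h))
        return self.out(h)
```

Re-injecting the embedding at intermediate layers helps the deeper layers retain high-frequency positional information.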

For the eikonal loss, I take the mean of the squared difference between the L2 norm of the SDF gradient and 1. This encourages the gradient norm to stay close to 1 everywhere, as required of a valid signed distance function.
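In code, that loss is a one-liner (sketch, assuming the per-point spatial gradients are already available as an `(N, 3)` tensor):

```python
import torch

def eikonal_loss(gradients):
    """Mean squared deviation of the SDF gradient norm from 1.

    gradients: (N, 3) spatial gradients of the SDF at N sample points.
    """
    return ((gradients.norm(dim=-1) - 1.0) ** 2).mean()
```

The loss is zero exactly when every gradient has unit norm, the defining property of a signed distance field.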

Question 7¶

  1. A high $\beta$ value biases the SDF by making the surface look blurrier (the boundary is not as sharp), so the surface is less precise geometrically. In contrast, a low $\beta$ value biases the SDF by making the surface look sharper and more geometrically precise, but it also makes it hard to optimize early on due to vanishing gradients.
  2. An SDF would be easier to train with volume rendering and a high $\beta$: with a low $\beta$, we may encounter small or vanishing gradients away from the surface, which can lead to much slower convergence. A high $\beta$ spreads the density (and thus the gradient signal) over a larger region around the surface, making optimization more stable.
  3. You would learn a more accurate surface with a low $\beta$ because the model will localize near the actual surface, and therefore the geometry would be more precise.
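The role of $\beta$ is easiest to see in the VolSDF SDF-to-density conversion, $\sigma(x) = \alpha \, \Psi_\beta(-d(x))$, where $\Psi_\beta$ is the CDF of a zero-mean Laplace distribution with scale $\beta$. A sketch with the default hyperparameters from the discussion below:

```python
import torch

def sdf_to_density(sdf_vals, alpha=0.10, beta=0.05):
    """VolSDF density: sigma = alpha * LaplaceCDF(-sdf; scale=beta).

    Small beta -> sharp density step at the surface; large beta -> a
    smoother, more diffuse transition (the blurrier surface above).
    """
    s = -sdf_vals
    return alpha * torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),          # outside the surface (sdf > 0)
        1.0 - 0.5 * torch.exp(-s / beta),   # inside the surface (sdf < 0)
    )
```

Density saturates at $\alpha$ deep inside the object, is $\alpha/2$ exactly at the surface, and decays toward 0 outside, with $\beta$ controlling how quickly.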

Discussion¶

For this, I experimented with a few hyperparameter settings: 1) the default of $\alpha = 0.10$, $\beta = 0.05$; 2) increasing $\alpha$ to 0.15 (keeping $\beta$ the same); and 3) decreasing $\beta$ to 0.01. The best setting was (2), increasing $\alpha$ to 0.15, since it yielded sharper, less blurry images. I ruled out option (3) because it produced a black box instead of something close to the original gif generated for NeRF. Increasing $\alpha$ contributed to sharper images because a higher $\alpha$ raises the density inside the object, which strengthens the color intensity at any given pixel in the gif.

Gif Geometry

Question 8¶

8.2 Fewer Training Views¶

For NeRF:

| Base     | 20 views |
|----------|----------|
| base gif | 20 views |

For VolSDF:

| Num Views | Color    | Geometry |
|-----------|----------|----------|
| Base      | base gif | geometry |
| 20 views  | base gif | geometry |

Discussion¶

Here, I used 20 randomly sampled training views instead of the default setting. For both NeRF and VolSDF, the 20-view output is much blurrier than the original, with fewer observable details.

However, training and inference were much faster with 20 views for both NeRF and VolSDF. In addition, the 20-view VolSDF result was quite close to the geometry of the default version, missing only parts of the base; the base was also not as smooth as in the default setting.
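Randomly subsampling the training views can be done with a small helper like the following (an illustrative sketch, not the actual code; the function name and seed are assumptions):

```python
import numpy as np

def subsample_views(n_total, n_keep=20, seed=0):
    """Pick n_keep distinct training-view indices uniformly at random."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_total, size=n_keep, replace=False))
```

Fixing the seed keeps the NeRF and VolSDF runs comparable, since both train on the same 20-view subset.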