Question 0: Transmittance Calculation¶
Question 1: Differentiable Volume Rendering¶
1.3: Ray Sampling¶
Ray Visualization (left), Grid Visualization (right)
1.4: Point Sampling¶
1.5: Volume Rendering¶
Question 2: Optimizing Implicit Volume¶
2.2: Loss and Training¶
Box center: (0.25, 0.25, 0.00)
Box side lengths: (2.01, 1.50, 1.50)
2.3: Visualization¶
Question 3: Optimizing Neural Radiance Field (NeRF)¶
Question 4: NeRF Extras (View Dependence)¶
Increasing view dependence inherently decreases generalization quality because the view-dependent component is closely coupled to the ray directions we obtain from the training images. Thus, when generalizing to views that differ from the training images, the viewing directions are quite different and the network has a harder time producing accurate renders for these views.
However, with view dependence, we now get more accurate results for the training views and for novel views that are very close to them (i.e., the new view is close to a training view in world space).
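For reference, here is a minimal sketch of how a view-dependent color head can be structured; the class name, dimensions, and the assumption that a harmonic embedding of the ray direction is computed upstream are illustrative, not the exact code used:

```python
import torch
import torch.nn as nn

class ViewDependentColorHead(nn.Module):
    """Illustrative color head: per-point features from the density branch are
    combined with an embedding of the viewing direction before predicting RGB."""
    def __init__(self, feature_dim, dir_embed_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + dir_embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),
            nn.Sigmoid(),  # colors in [0, 1]
        )

    def forward(self, features, dir_embedding):
        # features: (N, feature_dim) features from the positional branch
        # dir_embedding: (N, dir_embed_dim) harmonic embedding of ray directions
        return self.net(torch.cat([features, dir_embedding], dim=-1))
```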
Question 5: Sphere Tracing¶
To implement this, I did the following (a sketch of the loop is given after this list):
- Set points to be the origins
- Set accumulated distances for all the points to 0
- While max iters hasn't elapsed
- I calculate the distances of each point to the closest implicit surface
- I set my mask to true if the absolute value of these distances is less than epsilon (1e-3)
- This means the ray has intersected the surface
- I add distances to accumulated distances
- I set points to origins + accumulated distances * directions
- I check whether each point's z value is less than near or greater than far
- If all of the points fall outside the [near, far] range, I break out of the loop
- Finally I return both points and the mask
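A minimal PyTorch sketch of this loop, assuming `implicit_fn` maps (N, 3) points to (N, 1) signed distances (the function name and tensor shapes are illustrative):

```python
import torch

def sphere_trace(implicit_fn, origins, directions, near, far,
                 max_iters=64, eps=1e-3):
    """Sphere tracing sketch: march each ray by the distance to the
    nearest surface until it converges or leaves the [near, far] range."""
    points = origins.clone()
    acc_dist = torch.zeros_like(origins[..., :1])        # accumulated distance per ray
    mask = torch.zeros_like(acc_dist, dtype=torch.bool)  # which rays hit the surface

    for _ in range(max_iters):
        dist = implicit_fn(points)        # distance to the closest implicit surface
        mask = dist.abs() < eps           # ray has intersected the surface
        acc_dist = acc_dist + dist        # march forward by the safe step
        points = origins + acc_dist * directions

        # stop early if every point has left the [near, far] range in z
        z = points[..., 2:3]
        if torch.all((z < near) | (z > far)):
            break

    return points, mask
```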
Question 6: Optimizing Neural SDF¶
My MLP network was structured as follows (a sketch appears after the list):
The hidden dimension was 128
- The input is first preprocessed into a harmonic embedding
- The embedding is then passed through a linear layer (harmonic --> hidden) + ReLU
- The result is then passed through 5 linear layers (harmonic + hidden --> hidden) + ReLU, where each layer takes the harmonic embedding concatenated with the previous hidden features
- The result is then passed through a linear layer (hidden --> hidden) + ReLU
- Finally, it is passed through an SDF head, which is a linear layer (hidden --> 1)
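A rough sketch of this architecture; the class and argument names are illustrative, and the harmonic embedding is assumed to be computed beforehand:

```python
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    """Sketch of the described MLP. `harmonic_dim` is the size of the
    harmonic embedding of the input point."""
    def __init__(self, harmonic_dim, hidden_dim=128, n_skip_layers=5):
        super().__init__()
        self.input_layer = nn.Linear(harmonic_dim, hidden_dim)
        # each of these layers also sees the harmonic embedding again
        self.skip_layers = nn.ModuleList([
            nn.Linear(harmonic_dim + hidden_dim, hidden_dim)
            for _ in range(n_skip_layers)
        ])
        self.final_layer = nn.Linear(hidden_dim, hidden_dim)
        self.sdf_head = nn.Linear(hidden_dim, 1)
        self.relu = nn.ReLU()

    def forward(self, embedding):
        x = self.relu(self.input_layer(embedding))
        for layer in self.skip_layers:
            x = self.relu(layer(torch.cat([embedding, x], dim=-1)))
        x = self.relu(self.final_layer(x))
        return self.sdf_head(x)
```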
Eikonal Loss:
Since we want our SDF to be valid, the norm of the gradients must be 1.
Thus the eikonal loss is calculated as follows (see the sketch after this list):
- Take the norm of each gradient
- Subtract 1 from each norm
- Square each result
- Take the average over all points
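A minimal sketch of this loss, assuming `gradients` holds the per-point spatial gradients of the SDF:

```python
import torch

def eikonal_loss(gradients):
    """gradients: (N, 3) spatial gradients of the SDF at sampled points.
    Penalizes deviation of the gradient norm from 1."""
    grad_norm = gradients.norm(dim=-1)       # ||grad f(x)|| per point
    return ((grad_norm - 1.0) ** 2).mean()   # mean squared deviation from 1
```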
Question 7: VolSDF¶
What are alpha and beta:
Alpha is the constant density inside the object, while beta controls how much the cumulative density function is smoothed near the surface of the object.
Effect of Beta:
A high beta increases the smoothing that occurs at the implicit surface; as beta approaches infinity, the density becomes uniform throughout the volume. A low beta decreases the smoothing at the implicit surface; as beta approaches zero, we get back a step function where the density jumps immediately to alpha at the implicit surface.
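Concretely, the density this discussion refers to can be written as alpha times the Laplace CDF (with scale beta) of the negated signed distance; a minimal sketch, with an illustrative function name:

```python
import torch

def sdf_to_density(signed_distance, alpha, beta):
    """VolSDF-style density: sigma = alpha * Psi_beta(-sdf), where Psi_beta is
    the CDF of a zero-mean Laplace distribution with scale beta."""
    s = -signed_distance
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),         # outside the surface: density decays
        1.0 - 0.5 * torch.exp(-s / beta),  # inside: density approaches alpha
    )
    return alpha * psi
```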
The SDF would be easier to train with volume rendering using a high beta. With a high beta we have more smoothing, so the SDF doesn't have to localize the zero level set exactly: the density varies gradually around the surface, which keeps training stable.
We'd be more likely to learn an accurate surface with a low beta because the density is concentrated right at the actual implicit surface, leading to a more accurate surface representation.
I used the parameters alpha = 10 and beta = 0.03.
I started off with the defaults of alpha = 10 and beta = 0.05. Given the discussion above, I first tried decreasing beta to 0.01. At that point I started getting NaNs during training (not enough smoothing of the density), so I chose a midpoint of beta = 0.03 and got good results.
I then tried varying alpha in both directions, using alpha = 5 and alpha = 15. With alpha = 5, the interior details of the shape became blurrier. With alpha = 15, there was no noticeable difference.
Question 8: Neural Surface Extras: Fewer Training Views¶
I used a total of 20 views.
NeRF:
100 views
20 views
VolSDF: 100 views (L), 20 views (R)
For VolSDF, reducing the number of views significantly reduced the fidelity of both the implicit surface and the radiance field. It is a little hard to see since the images are small, but if you look at the bulldozer's blade, the 100-view version shows clear separation between the spokes of the blade, while the 20-view version blurs this separation. The back wheels of the bulldozer are also completely blurred in the 20-view version.
We see a similar difference in the NeRF outputs. With 20 views, the entire bulldozer appears more smoothed, and the back of the bulldozer is blurry, whereas the back of the 100-view bulldozer is much crisper.
These findings don't present strong evidence that including geometry helps the 3D representation. However, I only tried 20 views; increasing the number of views to 30 or 40 might allow the geometry to better guide the final rendering. I also used randomly chosen training images; sampling the views evenly around the bulldozer (e.g., a linspace over camera positions) might give better results, since we would then cover all viewing directions uniformly.