Learning for 3D Vision: Assignment 3

Neural Volume Rendering

CMU 16825 Learning for 3D Vision

Tushar Nayak [tusharn]

Q0. Transmittance Calculation

Q1. Differentiable Volume Rendering: Ray Sampling

Q1. Differentiable Volume Rendering: Point Sampling

Q1. Differentiable Volume Rendering: Volume Sampling

Q2. Implicit Volume Optimization: Ray Sampling

Box center: (0.25, 0.25, -0.0005)

Box side lengths: (2.00, 1.50, 1.50)

Q3. Neural Radiance Field optimization

The output above comes from a NeRF that does not yet model view-dependent effects. A ReLU converts the first component of the network's output into density, and the remaining components pass through a sigmoid to produce color values. HarmonicEmbedding is also applied to improve rendering quality.
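The two pieces described above can be sketched in plain Python. This is a minimal illustration, not the assignment's implementation: the function names, the number of harmonics, and the power-of-two frequency schedule are assumptions chosen for the example.

```python
import math

def harmonic_embedding(x, n_harmonics=4):
    """Map each coordinate c to sin(2^k c) and cos(2^k c) for k = 0..n_harmonics-1.

    The power-of-two frequencies are an assumption made for this sketch.
    """
    freqs = [2.0 ** k for k in range(n_harmonics)]
    sines = [math.sin(f * c) for c in x for f in freqs]
    cosines = [math.cos(f * c) for c in x for f in freqs]
    return sines + cosines

def split_outputs(raw):
    """Turn a raw 4-vector into (density, rgb).

    ReLU keeps the density non-negative; the sigmoid maps color into (0, 1).
    """
    density = max(0.0, raw[0])
    rgb = [1.0 / (1.0 + math.exp(-v)) for v in raw[1:]]
    return density, rgb

# A 3D point becomes a 3 * 2 * n_harmonics = 24 dimensional feature vector.
emb = harmonic_embedding([0.1, -0.2, 0.3])
density, rgb = split_outputs([-0.5, 0.0, 2.0, -2.0])
```

A negative raw density is clamped to zero, so the network cannot produce negative opacity, while the sigmoid guarantees valid color values without clipping.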

Q4. Neural Radiance Field extras

Using view dependence leads to more realistic outcomes since color is predicted based on both position and ray direction. Incorporating ray direction embedding for color prediction, rather than relying solely on position, can significantly improve rendering quality, especially for objects whose surface colors change noticeably from different angles. As shown below, this approach produces realistic light reflections on the metallic-like surface. However, relying on view dependence carries the risk of overfitting to the training images, particularly if the dataset is small, lacks sufficient viewpoint diversity, or the model depends too heavily on view direction. Thus, it is essential to have a large, diverse dataset and a model that generalizes well.

Q5. Sphere Tracing

The points are iteratively updated, starting from the origins, by adding the distance to the nearest surface multiplied by the ray direction. A mask with the same batch dimension as the ray origins is maintained to indicate which rays have distances below the 1e-6 threshold. The loop is terminated when all rays have distances under this threshold or the maximum number of iterations has been reached.
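The loop described above can be sketched as follows. This is a simplified, unbatched version written in plain Python for readability; the unit-sphere SDF, the function names, and the iteration cap of 64 are assumptions for illustration, and only the 1e-6 threshold comes from the description.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere (hypothetical test scene)."""
    return math.dist(p, center) - radius

def sphere_trace(origins, directions, sdf, max_iters=64, eps=1e-6):
    """March each ray forward by the SDF value until it is within eps of a surface."""
    points = [list(o) for o in origins]
    hit = [False] * len(points)  # per-ray mask: True once the ray has converged
    for _ in range(max_iters):
        for i, (p, d) in enumerate(zip(points, directions)):
            if hit[i]:
                continue
            dist = sdf(p)
            if dist < eps:
                hit[i] = True
            else:
                # Stepping by the distance to the nearest surface cannot overshoot it.
                points[i] = [pc + dist * dc for pc, dc in zip(p, d)]
        if all(hit):
            break
    return points, hit

origins = [(0.0, 0.0, -3.0), (0.0, 3.0, -3.0)]
directions = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
points, hit = sphere_trace(origins, directions, sphere_sdf)
```

Here the first ray reaches the sphere at z = -1 and is marked as a hit, while the second ray passes above the sphere and exhausts the iteration budget with its mask entry still False.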

Q6. Optimizing Neural SDF

HarmonicEmbedding is used at the input so that the MLP can capture high-frequency variations more effectively, which is often critical when representing intricate signals or fields. The MLP consists of 7 fully connected layers, each followed by a ReLU activation except the final layer, which increases the network's capacity and non-linearity. The 4th layer includes a skip connection, which lets earlier features bypass the intermediate layers and be concatenated or added at the 4th layer. This design helps alleviate vanishing gradients and improves convergence.
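A forward pass through such an architecture might look like the sketch below. It assumes the NeRF-style variant in which the embedded input is concatenated at the skip layer; the hidden width of 16, the random initialization, and all function names are hypothetical choices made for the example.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one dot product per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def make_layer(rng, n_in, n_out):
    """Randomly initialized weights and zero bias (initialization is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_mlp(rng, in_dim, hidden=16, n_layers=7, skip_at=4, out_dim=1):
    layers = []
    for idx in range(1, n_layers + 1):
        n_in = in_dim if idx == 1 else hidden
        if idx == skip_at:
            n_in += in_dim  # the skip layer also sees the embedded input
        n_out = out_dim if idx == n_layers else hidden
        layers.append(make_layer(rng, n_in, n_out))
    return layers

def forward(layers, embedded, skip_at=4):
    h = embedded
    for idx, (weights, bias) in enumerate(layers, start=1):
        if idx == skip_at:
            h = h + embedded  # list concatenation, not element-wise addition
        h = linear(h, weights, bias)
        if idx < len(layers):
            h = [max(0.0, v) for v in h]  # ReLU on every layer except the last
    return h

rng = random.Random(0)
layers = build_mlp(rng, in_dim=24)
out = forward(layers, [0.1] * 24)
```

With a 24-dimensional embedding, the 4th layer takes 16 + 24 = 40 inputs, which is how the concatenation shows up in the weight shapes.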

Q7. VolSDF

How does high beta bias your learned SDF? What about low beta?

A high beta causes the density to taper off more gradually near the surface, leading to smoother and more diffused renderings. A low beta, on the other hand, makes the density drop sharply, producing crisper and more distinct boundaries. While a high beta can help simulate more realistic lighting effects, it also tends to blur the surface details.
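The effect of beta can be checked numerically with the SDF-to-density mapping from VolSDF, where density is alpha times the CDF of a zero-mean Laplace distribution with scale beta, evaluated at the negated signed distance. The sketch below assumes that formulation; the function name and the particular alpha and beta values are illustrative.

```python
import math

def volsdf_density(sdf, alpha=10.0, beta=0.05):
    """Density = alpha * LaplaceCDF(-sdf; scale=beta)."""
    s = -sdf
    if s <= 0:
        cdf = 0.5 * math.exp(s / beta)          # outside the surface
    else:
        cdf = 1.0 - 0.5 * math.exp(-s / beta)   # inside the surface
    return alpha * cdf

# Compare a sharp (low beta) and a smooth (high beta) profile around the surface.
for d in (0.1, 0.0, -0.1):
    low = volsdf_density(d, beta=0.05)
    high = volsdf_density(d, beta=0.5)
    print(f"sdf={d:+.1f}  low-beta density={low:.3f}  high-beta density={high:.3f}")
```

Both settings give the same density exactly at the surface (alpha / 2). With low beta the density falls off quickly outside and saturates quickly inside, whereas high beta spreads the transition over a wider band around the zero level set.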

Would an SDF be easier to train with volume rendering and low beta or high beta? Why?

An SDF is easier to train with a higher beta since a smoother gradient profile simplifies the optimization process and leads to more stable gradient-based learning.

Would you be more likely to learn an accurate surface with high beta or low beta? Why?

A lower beta typically leads to more precise surface reconstruction because it preserves sharper geometric features and captures finer details. In contrast, a high beta results in smoother, less distinct surfaces. Although high beta improves the realism of lighting and shading, it compromises the geometric accuracy, so choosing the right beta involves balancing detail vs. smoothness.

Hyperparameter Tuning