## Part 1

### Part 1.3

### Part 1.4

### Part 1.5

### Part 2.3
| TA | Mine |
|---|---|
| ![]() | ![]() |
## Part 3

## Part 4

### Part 4.1 View Dependence
| View Independent | View Dependent |
|---|---|
| ![]() | ![]() |
The view-independent model has simple diffuse shading, while view-dependent emission lets the model express effects such as specular highlights more accurately. However, this also makes the model more complex and potentially harder to generalize: it must learn not only the underlying geometry and diffuse appearance, but also how appearance changes with viewing direction. As a result, view-dependent emission can capture realistic lighting effects like reflections and gloss, while a purely diffuse (view-independent) model tends to produce more matte, consistent surfaces that may miss such details.
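For reference, here is a minimal sketch of how view-dependent emission is commonly wired into a NeRF-style MLP (layer names and dimensions are illustrative assumptions, not the assignment's exact architecture): density is predicted from position features alone, and only the color branch sees the positionally encoded view direction.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Illustrative head: density depends only on position features,
    while RGB additionally conditions on the encoded view direction."""
    def __init__(self, feat_dim=256, dir_enc_dim=27):
        super().__init__()
        self.density = nn.Linear(feat_dim, 1)        # sigma from position only
        self.bottleneck = nn.Linear(feat_dim, feat_dim)
        self.rgb = nn.Sequential(                    # color branch sees the view dir
            nn.Linear(feat_dim + dir_enc_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 3),
            nn.Sigmoid(),
        )

    def forward(self, feat, dir_enc):
        sigma = torch.relu(self.density(feat))       # density stays view-independent
        rgb = self.rgb(torch.cat([self.bottleneck(feat), dir_enc], dim=-1))
        return sigma, rgb
```

Keeping density view-independent is what preserves consistent geometry while letting appearance vary with the camera.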
### Part 4.2 Coarse/Fine Network
| Single | Coarse/Fine |
|---|---|
| ![]() | ![]() |
At the cost of extra compute (it halves training speed on my machine), coarse/fine sampling issues extra queries to the network to get a better estimate of where the density is concentrated, ideally leading to faster convergence than plain stratified sampling; a sketch of the resampling step follows.
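This is a minimal sketch of the fine-sampling step, assuming the standard inverse-CDF resampling from the NeRF paper (the function name and shapes are my own, not necessarily this codebase's API):

```python
import torch

def sample_pdf(bins, weights, n_fine, eps=1e-5):
    """Draw extra depths along each ray in proportion to the coarse
    network's rendering weights (inverse-CDF sampling).
    bins: (R, S+1) depth bin edges; weights: (R, S) coarse weights."""
    pdf = (weights + eps) / (weights + eps).sum(-1, keepdim=True)
    cdf = torch.cumsum(pdf, -1)
    cdf = torch.cat([torch.zeros_like(cdf[..., :1]), cdf], -1)   # (R, S+1)

    u = torch.rand(*cdf.shape[:-1], n_fine, device=cdf.device)   # uniform draws
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, cdf.shape[-1] - 1)

    cdf_lo = torch.gather(cdf, -1, idx - 1)
    cdf_hi = torch.gather(cdf, -1, idx)
    bin_lo = torch.gather(bins, -1, idx - 1)
    bin_hi = torch.gather(bins, -1, idx)

    frac = (u - cdf_lo) / (cdf_hi - cdf_lo + eps)   # position within the bin
    return bin_lo + frac * (bin_hi - bin_lo)        # (R, n_fine) fine depths
```

The fine network is then queried at the union of coarse and fine depths, which is where the extra compute cost comes from.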
## Part 5
| TA | Mine |
|---|---|
| ![]() | ![]() |
I initialize each ray's point at its origin with travel distance t = 0, then keep a boolean active set and a hit mask. For up to max_iters iterations I process only the active rays: gather their current points, query the SDF, and mark rays whose signed distance is <= 1e-6 as converged (hits), deactivating them. For the rest, I sphere-trace one step of length |sdf| along the ray (i.e., p += d * |sdf|), updating the points and accumulating t. I also apply an optional bailout that deactivates rays whose t exceeds 1e3, treating them as misses. Finally, I return the last points and the hit mask of shape (N, 1). A sketch of this loop is below.
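A PyTorch sketch of that loop, assuming an sdf_fn that maps (M, 3) points to (M, 1) signed distances (names and defaults are mine):

```python
import torch

def sphere_trace(origins, dirs, sdf_fn, max_iters=64, hit_eps=1e-6, t_max=1e3):
    """Sphere tracing as described above: march each active ray by |sdf|
    until it converges to the surface or exceeds the travel bailout."""
    points = origins.clone()                    # current point on each ray
    t = torch.zeros(origins.shape[0], 1)        # accumulated travel distance
    active = torch.ones(origins.shape[0], dtype=torch.bool)
    hits = torch.zeros(origins.shape[0], 1, dtype=torch.bool)

    for _ in range(max_iters):
        if not active.any():
            break
        idx = active.nonzero(as_tuple=True)[0]  # indices of active rays
        d = sdf_fn(points[idx])                 # query SDF on active rays only

        hit = (d <= hit_eps).squeeze(-1)        # converged: record hit, deactivate
        hits[idx[hit]] = True
        active[idx[hit]] = False

        march = idx[~hit]                       # remaining rays step by |sdf|
        step = d[~hit].abs()
        points[march] = points[march] + dirs[march] * step
        t[march] = t[march] + step

        miss = march[(t[march] > t_max).squeeze(-1)]
        active[miss] = False                    # optional bailout: treat as a miss

    return points, hits                         # hit mask has shape (N, 1)
```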
## Part 6
| Input | Output |
|---|---|
| ![]() | ![]() |
## Part 7
| Appearance | Geometry |
|---|---|
| ![]() | ![]() |
### Hyperparameter Tuning
| $\alpha$ (rows) / $\beta$ (cols) | 0.05 | 0.1 | 0.5 |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 10 | ![]() | ![]() | ![]() |
| 20 | ![]() | ![]() | ![]() |
A high $\beta$ biases the learned SDF toward smoother, more diffuse transitions around the surface, effectively "blurring" the boundary and producing a thicker, softer density region. Conversely, a low $\beta$ enforces a sharper transition, making the SDF behave more like a true signed distance field with a thin, well-defined surface.
An SDF is generally easier to train with a higher $\beta$, since the smoother transition provides more non-zero gradients during volume rendering, making optimization more stable. However, I think the model is more likely to learn an accurate and crisp surface with a lower $\beta$, as it encourages a tight boundary close to the true geometry.
The default VolSDF parameters already strike a good balance: they yield solid, complete geometry with visually consistent appearance. Other hyperparameter choices tend to push the model toward extremes: too large a $\beta$ causes an overly soft or blurry mesh, while too small a $\beta$ or $\alpha$ can result in holes or even an empty mesh. A smaller $\beta$ coupled with a small $\alpha$ made the appearance look gaseous.
Here, $\alpha$ sets the overall density magnitude (higher $\alpha$ means higher density), while $\beta$ controls the spread of the CDF around the zero level set (larger $\beta$ means a blurrier boundary, smaller $\beta$ a thinner surface).
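For concreteness, the conversion described here is, as I understand it, the VolSDF mapping $\sigma(x) = \alpha\,\Psi_\beta(-d(x))$, where $\Psi_\beta$ is the CDF of a zero-mean Laplace distribution with scale $\beta$. A sketch (not necessarily the exact starter-code implementation):

```python
import torch

def sdf_to_density(sdf, alpha, beta):
    """VolSDF-style conversion: sigma = alpha * Psi_beta(-sdf).
    Inside the surface (sdf < 0) density approaches alpha;
    outside (sdf > 0) it decays to zero over a width set by beta."""
    s = -sdf
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),          # outside: small density tail
        1.0 - 0.5 * torch.exp(-s / beta),   # inside: saturates toward 1
    )
    return alpha * psi
```

This makes the roles above explicit: $\alpha$ scales the whole density, and $\beta$ sets how quickly $\Psi_\beta$ transitions across the zero level set.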
### Part 8.2
| Type | 5 views | 20 views | 50 views |
|---|---|---|---|
| VolSDF | ![]() | ![]() | ![]() |
| NeRF | ![]() | ![]() | ![]() |
With only 5 input views, VolSDF struggles to reconstruct meaningful structure: the geometry collapses into a roughly spherical blob, and the rendered appearance becomes a blurry mess with little spatial consistency. NeRF, on the other hand, captures a faint but more globally consistent shape: although the rendering appears ghostly and semi-transparent, the overall appearance aligns better with the true scene.
At 20 views, both methods improve significantly. VolSDF begins to form correct geometry, showing well-defined surfaces, while NeRF’s appearance becomes much sharper and more realistic. However, NeRF still holds an advantage in appearance fidelity, producing clearer textures and fewer artifacts.
At 50 views, both reconstructions improve further. VolSDF gains smoother and more stable geometry, and NeRF achieves even crisper appearance and lighting consistency. While both models benefit from the increased supervision, NeRF continues to outperform VolSDF in visual realism, though the gap narrows slightly in terms of geometry.
### Part 8.3
Naive NeuS SDF-to-density conversion:
| Type | Appearance | Geometry |
|---|---|---|
| Naive NeuS | ![]() | ![]() |
| VolSDF | ![]() | ![]() |
Comparing VolSDF and the naive NeuS implementation reveals little visible difference in the final reconstructions: both produce similar geometry and appearance under comparable training settings. The key distinction lies in how they handle the SDF-to-density conversion. VolSDF relies on fixed hyperparameters that must be manually tuned to balance surface sharpness and opacity, while naive NeuS introduces a learnable parameter that adapts the mapping automatically during training. I believe this makes naive NeuS more flexible across different scenes, reducing the need for manual tuning and often leading to more stable convergence, even if the visual results appear similar. A sketch of the conversion follows.
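A sketch of that learnable conversion, following the "naive solution" from the NeuS paper as I understand it (variable names are mine): the logistic density of the SDF, with a trainable sharpness $s$, is used directly as the volume density.

```python
import torch
import torch.nn as nn

class NaiveNeuSDensity(nn.Module):
    """Naive NeuS-style conversion: volume density is the logistic
    density of the SDF with a learnable sharpness s."""
    def __init__(self, init_s=10.0):
        super().__init__()
        # learn log(s) so the sharpness stays positive during training
        self.log_s = nn.Parameter(torch.log(torch.tensor(init_s)))

    def forward(self, sdf):
        s = self.log_s.exp()
        # phi_s(x) = s * e^{-s x} / (1 + e^{-s x})^2
        #          = s * sigmoid(s x) * sigmoid(-s x),
        # peaked at the zero level set and sharpening as s grows
        return s * torch.sigmoid(s * sdf) * torch.sigmoid(-s * sdf)
```

As $s$ grows during training, the density concentrates into a thinner shell around the surface, playing the role that a shrinking $\beta$ plays in VolSDF.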