ylchen write-up with visuals and GIFs.
Depth is normalized by the maximum value for display. Also included are the grid image and the spiral render GIF from part 1.
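For clarity, the display normalization amounts to a single per-image rescale; a minimal sketch (variable and function names are illustrative, not the ones used in my scripts):

```python
import numpy as np

def normalize_depth_for_display(depth: np.ndarray) -> np.ndarray:
    """Scale a depth map to [0, 1] by its maximum value for visualization."""
    return depth / (depth.max() + 1e-8)  # epsilon guards against an all-zero map
```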
Training converges with very low loss; the learned box center and side lengths are reported by the script.
Box center: (0.25022637844085693, 0.2505774199962616, -0.0004850137047469616)
Box side lengths: (2.0051112174987793, 1.503594994544983, 1.5033595561981201)
Spiral render of the optimized volume plus snapshots before/after training.
Qualitative object appearance across view sweeps.
Examples above (materials/lego) illustrate view-dependent effects in the learned radiance field.
Left: input point cloud used for training. Right: predicted SDF rendering.
My neural SDF uses an MLP that maps a 3D point to a scalar signed distance, where negative values indicate the inside of the surface. The final layer is linear so the network can output both positive and negative values. The Eikonal loss enforces that the gradient norm of the SDF stays close to 1:

Lₑₖ = Eₓ[(‖∇ₓ f(x)‖₂ − 1)²]

This encourages f(x) to behave like a valid signed distance field, resulting in smoother and more accurate surfaces.
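For concreteness, here is a minimal PyTorch sketch of this setup; the network width/depth and the `eikonal_loss` helper are illustrative assumptions rather than the exact code used here.

```python
import torch
import torch.nn as nn

class NeuralSDF(nn.Module):
    """MLP mapping a 3D point to a scalar signed distance (negative inside)."""
    def __init__(self, hidden_dim: int = 128, num_layers: int = 4):
        super().__init__()
        layers, in_dim = [], 3
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        # Final layer is linear so the output can be both positive and negative.
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)

def eikonal_loss(model: nn.Module, points: torch.Tensor) -> torch.Tensor:
    """L_ek = E_x[(||grad_x f(x)||_2 - 1)^2], estimated on sampled points."""
    points = points.detach().requires_grad_(True)
    sdf = model(points)
    grad = torch.autograd.grad(
        sdf, points, grad_outputs=torch.ones_like(sdf), create_graph=True
    )[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```

In training, this term would typically be added to the reconstruction loss with a small weight, e.g. `loss = data_loss + lambda_eik * eikonal_loss(model, sample_points)`, where `lambda_eik` is a tuning choice.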
Alpha controls how strongly density contributes to volume accumulation during rendering, while Beta controls how rapidly density increases as points approach the surface. TL;DR: Beta affects how crisp or blurry the surfaces are, while Alpha affects how visible the surface is (i.e., whether it shows gaps).
High Beta gives a very sharp transition, concentrating density near the surface. Low Beta gives a smoother transition, with more diffuse density around the surface.
Low Beta is easier to train with, since gradients are smoother and more stable. High Beta creates a steep, step-like density function that can make learning unstable early in training.
High Beta yields sharper, more accurate surfaces once the network has learned roughly correct geometry, since the density concentrates tightly around the zero level set.
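For reference, below is a minimal sketch of one standard SDF-to-density conversion, the Laplace-CDF form from the VolSDF paper, showing where Alpha and Beta enter; the parameterization in my implementation may differ.

```python
import torch

def sdf_to_density(sdf: torch.Tensor, alpha: float, beta: float) -> torch.Tensor:
    """Laplace-CDF conversion from the VolSDF paper:
    sigma(x) = alpha * Psi_beta(-sdf(x)), where Psi_beta is the CDF of a
    zero-mean Laplace distribution with scale beta.
    alpha scales the overall density magnitude; beta sets the width of the
    transition band around the zero level set."""
    s = -sdf  # positive inside the surface (SDF is negative inside)
    psi = torch.where(
        s <= 0,
        0.5 * torch.exp(s / beta),
        1.0 - 0.5 * torch.exp(-s / beta),
    )
    return alpha * psi
```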
VolSDF performs noticeably worse than NeRF under sparse training views, likely because surface-based representations require more stable geometry early in training, while NeRF’s volumetric model is more robust to incomplete view coverage.