Learning for 3D Vision: Assignment 4


1. 3D Gaussian Splatting

1.1. Fitting a Voxel Grid

1.1.2 Fitting a Voxel Grid

All Unit tests pass: 4/4.

1.1.5. Perform Splatting

Splatting Output
[Figure panels: Still, Output]
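For context, the splatting step amounts to blending depth-sorted contributions at each pixel. The following is a minimal sketch of front-to-back alpha compositing for a single pixel, not the assignment's implementation. The function name `composite_front_to_back` and the sample values are hypothetical, and it assumes the per-Gaussian colours and alphas at the pixel have already been evaluated and sorted nearest first.

```python
def composite_front_to_back(colours, alphas):
    """Blend depth-sorted samples for one pixel (nearest sample first).

    colours: list of (r, g, b) tuples, one per sample.
    alphas:  list of opacities in [0, 1], one per sample.
    Returns the blended colour and the remaining transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer samples
    for colour, alpha in zip(colours, alphas):
        weight = alpha * transmittance
        pixel = [p + weight * c for p, c in zip(pixel, colour)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Hypothetical example: a half-transparent red sample in front of a blue one.
colours = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
alphas = [0.5, 0.5]
pixel, remaining = composite_front_to_back(colours, alphas)
```

The nearer sample contributes with weight 0.5 and the farther one with weight 0.5 × 0.5 = 0.25, so nearer Gaussians dominate the final colour.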

1.2. Training 3D Gaussian Representations

1.2.2. Perform Forward Pass and Compute Loss

Final Render and Training Progress
[Figure panels: Final Render, Training Progress]

Learning Rates Tried

Opacities   Scales   Colours   Means      Comments
0.00005     0.0001   0.00001   0.000001   Model Underfitted
0.001       0.03     0.05      0.0005     Model Overfitted
0.0001      0.001    0.001     0.00002    Good representation
[Figure columns: Render, Progress]

Number of Iterations

The best performing render was trained over 2000 iterations.

PSNR and SSIM

PSNR: 23.445
SSIM: 0.860
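
As a reference for how the PSNR figure can be read, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr` and the sample pixel values are hypothetical, and the sketch assumes pixel intensities in [0, 1]; the report does not state which implementation or value range was used.

```python
import math


def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences."""
    if len(rendered) != len(target) or not rendered:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical four-pixel example with a mean squared error of 0.005.
rendered = [0.5, 0.5, 0.5, 0.5]
target = [0.4, 0.6, 0.5, 0.5]
score = psnr(rendered, target)

# Inverting the formula: the MSE implied by a given PSNR, assuming a [0, 1] range.
implied_mse = 10.0 ** (-23.445 / 10.0)
```

Under these assumptions, a PSNR of 23.445 would correspond to a mean squared error of roughly 0.0045 per pixel value.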

1.3. Extensions

1.3.1. Rendering Using Spherical Harmonics

[Figure panels: Final Render (with View Dependence), Final Render (without View Dependence)]

Comparison of frames 3, 13, and 31
[Figure columns: View Dependent Splat, View Independent Splat]

Difference explanation (the same for frames 3, 13, and 31): Across the views, the lights appear to be placed to the top left and bottom right, facing the chair. In the view-dependent rendering, the shadows appear accordingly.
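
To illustrate why view dependence changes the shading, here is a minimal sketch of evaluating a colour from spherical harmonics up to degree 1. It follows the sign convention commonly used in 3D Gaussian Splatting code, which is an assumption about this assignment's renderer. The function name `sh_to_colour` and the coefficient values are hypothetical.

```python
SH_C0 = 0.28209479177387814  # degree-0 basis constant, 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 basis constant, sqrt(3 / (4 * pi))


def sh_to_colour(coeffs, view_dir, use_view_dependence=True):
    """Evaluate an RGB colour from spherical harmonic coefficients.

    coeffs:   four (r, g, b) tuples: one degree-0 term, then three degree-1 terms.
    view_dir: unit vector (x, y, z) along the viewing direction.
    """
    x, y, z = view_dir
    # The degree-0 term is constant: it gives the same colour from every view.
    colour = [SH_C0 * c for c in coeffs[0]]
    if use_view_dependence:
        # The degree-1 terms vary linearly with the viewing direction.
        for ch in range(3):
            colour[ch] += SH_C1 * (
                -y * coeffs[1][ch] + z * coeffs[2][ch] - x * coeffs[3][ch]
            )
    # Shift by 0.5 and clamp negative values to zero.
    return [max(0.0, c + 0.5) for c in colour]


# Hypothetical coefficients: a base colour plus a component that varies along z.
coeffs = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]
seen_from_plus_z = sh_to_colour(coeffs, (0.0, 0.0, 1.0))
seen_from_minus_z = sh_to_colour(coeffs, (0.0, 0.0, -1.0))
flat_plus_z = sh_to_colour(coeffs, (0.0, 0.0, 1.0), use_view_dependence=False)
flat_minus_z = sh_to_colour(coeffs, (0.0, 0.0, -1.0), use_view_dependence=False)
```

With the degree-1 terms enabled, the same Gaussian is brighter from one side than the other; with only the degree-0 term, its colour is identical from every direction, which is why lighting effects such as shadows can appear only in the view-dependent render.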

2. Diffusion-guided Optimization

2.1. SDS Loss + Image Optimization

Prompt                 Iterations
A Hamburger            1600
A Standing Corgi Dog   1900
A Mansion              1600
A DSLR                 1700
[Figure columns: Guidance = 0, Guidance = 1]
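
For readers unfamiliar with the terms, the sketch below shows one common way guidance enters an SDS-style update: the conditional and unconditional noise predictions are mixed, and the gradient with respect to the latent is a weighted difference between the predicted and the injected noise. This is a simplified sketch on plain lists, not the assignment's code. The function names, the weight, and the sample values are hypothetical, and the report does not define what the "Guidance = 0" and "Guidance = 1" labels mean, so any mapping onto `guidance_scale` below is an assumption.

```python
def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Mix unconditional and conditional noise predictions.

    One common convention: scale 0 returns the unconditional prediction,
    scale 1 returns the conditional one, larger values extrapolate.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


def sds_gradient(noise_pred, noise_added, weight):
    """SDS-style gradient with respect to the latent: w * (predicted - added noise)."""
    return [weight * (p - n) for p, n in zip(noise_pred, noise_added)]


# Hypothetical two-element "latents" to keep the example readable.
noise_uncond = [0.0, 0.2]
noise_cond = [1.0, 0.4]
noise_added = [0.5, 0.4]

pred_scale_0 = guided_noise(noise_uncond, noise_cond, 0.0)
pred_scale_1 = guided_noise(noise_uncond, noise_cond, 1.0)
grad = sds_gradient(pred_scale_1, noise_added, weight=0.5)
```

In a real pipeline the predictions would come from a diffusion model's denoiser and the gradient would be backpropagated into the image or scene parameters being optimized.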

2.2. Texture Map Optimization for Mesh

Final Renders: Textured Cow Mesh

2.3. NeRF Optimization

Prompts:
A Standing Corgi Dog
A Squirrel in a Cello
A Baby Lion
[Figure columns: RGB Render (Video), Depth Map (Video)]

2.4. Extensions

2.4.1. Extensions (View Dependent Conditioning)

Prompts:
A Standing Corgi Dog
A Squirrel in a Cello
A Cute Panda
[Figure columns: RGB (View-Dependent), Depth (View-Dependent), Original RGB (2.3)]

As the comparisons above show, view-dependent conditioning produces output that is consistent across views, avoiding the problem where multiple front-facing views appear in the model. It does, however, take longer to train, and so in 100 iterations it cannot produce a detailed rendering of the prompts. Even so, the renderings above show that adding view-dependent embeddings helps ensure consistency.
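
One way such conditioning is often set up is to append a direction phrase to the text prompt based on the sampled camera pose, so that the text embedding differs between front, side, and back views. The sketch below illustrates that idea only. The function name `view_dependent_prompt`, the angle thresholds, and the phrase wording are hypothetical and are not taken from this assignment's code.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append a view phrase to the prompt based on the camera angles.

    azimuth_deg:   0 is assumed to face the front of the object.
    elevation_deg: angle above the horizontal plane.
    """
    if elevation_deg > 60.0:  # hypothetical threshold for a top-down view
        return f"{prompt}, overhead view"
    azimuth = azimuth_deg % 360.0  # wrap negative or large angles into [0, 360)
    if azimuth < 45.0 or azimuth >= 315.0:
        view = "front view"
    elif 135.0 <= azimuth < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


# Hypothetical usage with one of the prompts from the table above.
front = view_dependent_prompt("a standing corgi dog", 0.0, 10.0)
side = view_dependent_prompt("a standing corgi dog", 90.0, 10.0)
back = view_dependent_prompt("a standing corgi dog", 180.0, 10.0)
top = view_dependent_prompt("a standing corgi dog", 30.0, 75.0)
```

Because the diffusion model is told which side it is looking at, it is less likely to paint a face on every side of the object, which matches the consistency improvement described above.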