Learning for 3D Vision: Assignment 4

Name: Ahish Deshpande
Andrew ID: ahishd

1. 3D Gaussian Splatting

1.1. Fitting a Voxel Grid

1.1.2. Fitting a Voxel Grid

All unit tests pass (4/4).

1.1.5. Perform Splatting

Splatting Output
Still	Output

1.2. Training 3D Gaussian Representations

1.2.2. Perform Forward Pass and Compute Loss

Final Render and Training Progress
Final Render	Training Progress

Learning Rates Tried

Opacities	Scales	Colours	Means	Comments
0.00005	0.0001	0.00001	0.000001	Model Underfitted
0.001	0.03	0.05	0.0005	Model Overfitted
0.0001	0.001	0.001	0.00002	Good representation
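
The per-attribute learning rates in the table can be expressed as optimizer parameter groups. The following is a minimal sketch, not the code used for this report: `make_param_groups` and `GOOD_LEARNING_RATES` are hypothetical names, and the values are copied from the "Good representation" row. The list-of-dicts layout is the form that PyTorch optimizers such as `torch.optim.Adam` accept for per-parameter options.

```python
# Learning rates from the "Good representation" row of the table above.
GOOD_LEARNING_RATES = {
    "opacities": 0.0001,
    "scales": 0.001,
    "colours": 0.001,
    "means": 0.00002,
}


def make_param_groups(params, learning_rates):
    """Build one parameter group per Gaussian attribute.

    params: dict mapping attribute name -> trainable tensor (hypothetical layout).
    learning_rates: dict mapping attribute name -> learning rate.
    Returns a list of {"params": [...], "lr": ..., "name": ...} dicts.
    """
    mismatched = set(params) ^ set(learning_rates)
    if mismatched:
        raise KeyError(f"attributes without a matching entry: {sorted(mismatched)}")
    return [
        {"params": [params[name]], "lr": lr, "name": name}
        for name, lr in learning_rates.items()
    ]


# Assumed usage, with `gaussian_tensors` being a dict of trainable tensors:
#   optimizer = torch.optim.Adam(make_param_groups(gaussian_tensors, GOOD_LEARNING_RATES))
```

Keeping the means on the smallest learning rate matches the table: the positions are updated far more cautiously than colours or scales.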

Number of Iterations

The best-performing render was trained for 2000 iterations.

PSNR and SSIM

PSNR: 23.445
SSIM: 0.860
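
For reference, PSNR is defined from the mean squared error as 10 * log10(MAX^2 / MSE). The sketch below is a plain-Python illustration of that definition, assuming pixel values in [0, 1] stored as flat lists; it is not the evaluation code that produced the numbers above, and `psnr` is a hypothetical helper name.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better: halving the MSE raises the PSNR by about 3 dB.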

1.3. Extensions

1.3.1. Rendering Using Spherical Harmonics

Final Render (with View Dependence)	Final Render (without View Dependence)

Frame No.	View Dependent Splat	View Independent Splat	Difference Explanation
3			Across the views, the lights appear to be placed to the top left and bottom right, facing the chair. The view-dependent rendering shows shadows consistent with this lighting.
13			Across the views, the lights appear to be placed to the top left and bottom right, facing the chair. The view-dependent rendering shows shadows consistent with this lighting.
31			Across the views, the lights appear to be placed to the top left and bottom right, facing the chair. The view-dependent rendering shows shadows consistent with this lighting.
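
To illustrate where the view dependence comes from, the sketch below evaluates a colour from spherical harmonic coefficients up to degree 1. It is a simplified stand-in, not the implementation used for these renders: `sh_to_colour` is a hypothetical helper, the coefficient layout (one DC term followed by three degree-1 terms, each an RGB triple) is an assumption, and the sign convention, the +0.5 offset, and the clamping follow the reference 3D Gaussian Splatting code.

```python
import math

SH_C0 = 0.28209479177387814  # constant of the degree-0 basis function
SH_C1 = 0.4886025119029199   # magnitude of the degree-1 basis constants


def sh_to_colour(sh_coeffs, view_dir):
    """Evaluate an RGB colour from degree-0 and degree-1 SH coefficients.

    sh_coeffs: four (r, g, b) triples: the DC term, then three degree-1 terms.
    view_dir: (x, y, z) direction from the camera to the Gaussian (any length).
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    x, y, z = (c / norm for c in view_dir)
    # Basis values for this direction; only the first one ignores the view.
    basis = (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)
    colour = []
    for channel in range(3):
        value = sum(b * coeff[channel] for b, coeff in zip(basis, sh_coeffs))
        colour.append(min(max(value + 0.5, 0.0), 1.0))  # offset, then clamp to [0, 1]
    return colour
```

With only the DC term set, the colour is the same from every direction, which corresponds to the view-independent render; non-zero degree-1 terms make the colour change with the viewing direction.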

2. Diffusion-guided Optimization

2.1. SDS Loss + Image Optimization

Prompt	Iterations	Guidance = 0	Guidance = 1
A Hamburger	1600
A Standing Corgi Dog	1900
A Mansion	1600
A DSLR	1700

2.2. Texture Map Optimization for Mesh

Final Renders: Textured Cow Mesh

2.3. NeRF Optimization

Prompt	RGB Render (Video)	Depth Map (Video)
A Standing Corgi Dog
A Squirrel in a Cello
A Baby Lion

2.4. Extensions

2.4.1. Extensions (View Dependent Conditioning)

Prompt	RGB (View-Dependent)	Depth (View-Dependent)	Original RGB (2.3)
A Standing Corgi Dog
A Squirrel in a Cello
A Cute Panda

As the comparisons above show, view-dependent conditioning keeps the output consistent across views and avoids the problem where multiple front-facing views appear in the model. It does, however, take longer to train, so within 100 iterations it does not produce a detailed rendering of the prompts. Even so, the renderings show that adding view-dependent embeddings helps ensure consistency.
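
One common way to obtain view-dependent text embeddings is to append a view description to the prompt before encoding it, choosing the description from the sampled camera pose. The sketch below shows that idea only; it is not the code used for the results above, and `view_dependent_prompt`, the 60 degree overhead threshold, and the 45 degree front/back half-width are hypothetical choices.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg,
                          overhead_threshold_deg=60.0, half_width_deg=45.0):
    """Append a view description to the prompt based on the camera pose.

    azimuth_deg: camera azimuth in degrees, with 0 assumed to face the front.
    elevation_deg: camera elevation in degrees above the horizontal plane.
    """
    if elevation_deg > overhead_threshold_deg:
        view = "overhead view"
    else:
        azimuth = azimuth_deg % 360.0  # wrap into [0, 360)
        if azimuth <= half_width_deg or azimuth >= 360.0 - half_width_deg:
            view = "front view"
        elif abs(azimuth - 180.0) <= half_width_deg:
            view = "back view"
        else:
            view = "side view"
    return f"{prompt}, {view}"
```

Each resulting string would then be passed through the text encoder, so that the guidance for a back-facing camera is not conditioned on a front-facing description.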