Learning for 3D Vision: Assignment IV


Gaussian Splatting And Diffusion

CMU 16825 Learning for 3D Vision

Tushar Nayak [tusharn]

1.1.5 Gaussian Splatting

Render of Gaussian Splatting

1.2.2 Forward Pass and Loss Computation

Learning rates: Opacities 0.01, Scales 0.005, Colors 0.05, Means 0.001; trained for 1000 iterations to minimize the L1 loss between the ground-truth images and the renders.

PSNR: 29.462, SSIM: 0.913
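The per-parameter learning rates above can be wired up as Adam parameter groups. A minimal sketch in PyTorch, with a toy stand-in renderer (the real assignment uses a differentiable splatting rasterizer; the shapes and `render` function here are illustrative assumptions):

```python
import torch

torch.manual_seed(0)

# Toy stand-ins for the per-Gaussian parameters being optimized.
means     = torch.randn(100, 3, requires_grad=True)
scales    = torch.randn(100, 3, requires_grad=True)
colors    = torch.rand(100, 3).requires_grad_(True)
opacities = torch.rand(100, 1).requires_grad_(True)

# One Adam optimizer with a separate learning rate per parameter group,
# matching the rates listed above.
optimizer = torch.optim.Adam([
    {"params": [opacities], "lr": 0.01},
    {"params": [scales],    "lr": 0.005},
    {"params": [colors],    "lr": 0.05},
    {"params": [means],     "lr": 0.001},
])

def render(*params):
    # Stand-in for the differentiable splatting renderer: any function
    # producing an image while keeping gradients flowing to the params.
    total = sum(p.sum() for p in params)
    return total * torch.ones(8, 8, 3)

target = torch.zeros(8, 8, 3)  # toy ground-truth image
losses = []
for step in range(1000):
    optimizer.zero_grad()
    pred = render(means, scales, colors, opacities)
    loss = torch.nn.functional.l1_loss(pred, target)  # L1 vs. ground truth
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Grouping parameters this way lets the sensitive geometry parameters (means, scales) move slowly while appearance parameters (colors) update faster.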

Final Resultant Render

Visualization of Training

Training Progress Visualization

1.3.2 Training On A Harder Scene

Initial learning rates: Opacities 0.01, Colors 0.001, Scales 0.001, Means 0.001, Quats 0.003

Iterations: 1000, PSNR: 21.589, SSIM: 0.727

Initial Training Progress
Initial Final Render

Optimized learning rates: Opacities 0.01, Colors 0.001, Scales 0.001, Means 0.001, Quats 0.003

Iterations: 2000, PSNR: 25.195, SSIM: 0.734

Changes over the initial run: `isotropic = false`, and the total iterations were doubled.

Optimized Training Progress
Optimized Final Render

While shape edges are not perfectly reconstructed, the results are sharper than the initial run, and the reconstructed shapes render noticeably better.
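The improvement from `isotropic = false` comes from giving each Gaussian three independent scale axes: its covariance is built as Σ = R S Sᵀ Rᵀ, where R comes from the learned quaternion and S = diag(scales). A small NumPy sketch of this construction (assuming the common (w, x, y, z) quaternion convention; the specific scale and quaternion values are illustrative):

```python
import numpy as np

def quat_to_rot(q):
    # Rotation matrix from a unit quaternion (w, x, y, z).
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(scales, quat):
    # 3D Gaussian covariance: Sigma = R S S^T R^T, with S = diag(scales).
    R = quat_to_rot(np.asarray(quat, dtype=float))
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

# Isotropic: one tied scale -> covariance is a scaled identity,
# so the Gaussian is a sphere regardless of its rotation.
iso = covariance(np.array([0.1, 0.1, 0.1]), [1.0, 0.0, 0.0, 0.0])

# Anisotropic: independent scales -> an oriented ellipsoid that can
# stretch along edges and thin structures in the scene.
aniso = covariance(np.array([0.3, 0.05, 0.05]), [0.9239, 0.0, 0.0, 0.3827])
```

Since an isotropic covariance reduces to a scaled identity, rotation has no effect on it; only the anisotropic parameterization can align elongated Gaussians with sharp scene geometry, which matches the sharper edges seen in the optimized run.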

2. Diffusion Guided Optimization

2.1 Visualizations After 2000 Iterations

Prompt: a hamburger

Hamburger, Without Guidance
Hamburger, With Guidance

Prompt: a standing corgi dog

Corgi, Without Guidance
Corgi, With Guidance

Prompt: doge dog in foreground with tesla in background

Doge!, Without Guidance
Doge!, With Guidance

Prompt: front view of a scottish terrier with bone in mouth and red scarf in neck

Scotty!, Without Guidance
Scotty!, With Guidance

2.2 Texture Map Optimization

Initial Mesh
Prompt: red and black striped cow
Prompt: white and red cow

2.3 NeRF Optimization

Prompt: a standing corgi dog

RGB Video

Depth Video

___

Prompt: a hamburger

RGB Video

Depth Video

___

Prompt: a rose flower

RGB Video

Depth Video

2.4 View-Dependent Text Embedding

Prompt: a standing corgi dog

RGB Video

Depth Video

Prompt: a standing corgi dog

RGB Video

Depth Video

View-dependent text embedding augments the diffusion guidance by injecting the camera viewpoint into the text conditioning. This improves multi-view synthesis: generated outputs stay coherent, consistent, and realistic across different perspectives, better accounting for changes in viewpoint, lighting, and appearance.
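In the simplest form of this idea, the camera azimuth selects a view suffix that is appended to the prompt before encoding. A minimal sketch; the bucket boundaries below are illustrative assumptions, and implementations differ in exact thresholds:

```python
def view_dependent_prompt(base_prompt, azimuth_deg):
    # Append a view suffix based on camera azimuth (degrees, 0 = front).
    a = azimuth_deg % 360
    if a < 45 or a >= 315:
        suffix = "front view"
    elif 135 <= a < 225:
        suffix = "back view"
    else:
        suffix = "side view"
    return f"{base_prompt}, {suffix}"
```

Each augmented string is then run through the text encoder, so the guidance gradient for a given camera pose comes from an embedding that already "knows" which side of the object it is looking at.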