Learning for 3D Vision: Assignment IV
Gaussian Splatting And Diffusion
CMU 16825 Learning for 3D Vision
Tushar Nayak [tusharn]
1.1.5 Gaussian Splatting
1.2.2 Forward Pass and Loss Computation
Opacities: 0.01, Scales: 0.005, Colors: 0.05, Means: 0.001; Iterations: 1000, minimizing the L1 loss between the ground truth and the rendered images
PSNR: 29.462, SSIM: 0.913
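Read as per-parameter learning rates (an assumption, but the usual setup for this part), the configuration maps onto Adam parameter groups. Below is a minimal PyTorch sketch of the forward pass and L1 loss loop; render(), the tensor shapes, and the ground-truth image are placeholders, not the assignment's actual rasterizer:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins; the real assignment supplies its own Gaussian
# container and differentiable rasterizer.
N = 10_000
means     = torch.randn(N, 3, requires_grad=True)         # 3D centers
scales    = torch.full((N, 1), -4.0, requires_grad=True)  # log-scales (isotropic)
colors    = torch.rand(N, 3, requires_grad=True)          # RGB, pre-sigmoid
opacities = torch.zeros(N, 1, requires_grad=True)         # pre-sigmoid

# One parameter group per attribute, with the learning rates listed above.
optimizer = torch.optim.Adam([
    {"params": [opacities], "lr": 0.01},
    {"params": [scales],    "lr": 0.005},
    {"params": [colors],    "lr": 0.05},
    {"params": [means],     "lr": 0.001},
])

def render(means, scales, colors, opacities):
    # Placeholder so the loop runs end to end; substitute the actual
    # Gaussian-splatting forward pass here.
    w = torch.sigmoid(opacities)                          # (N, 1)
    rgb = (w * torch.sigmoid(colors)).sum(0) / w.sum()    # (3,)
    return rgb.expand(64, 64, 3)

gt_image = torch.rand(64, 64, 3)  # stand-in for the ground-truth render

for step in range(1000):
    optimizer.zero_grad()
    pred = render(means, scales, colors, opacities)
    loss = F.l1_loss(pred, gt_image)  # L1 between render and ground truth
    loss.backward()
    optimizer.step()
```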
Visualization of Training
1.3.2 Training On A Harder Scene
Initial: Opacities: 0.01, Colors: 0.001, Scales: 0.001, Means: 0.001, Quats: 0.003
Iterations: 1000, PSNR: 21.589, SSIM: 0.727
Optimized: Opacities: 0.01, Colors: 0.001, Scales: 0.001, Means: 0.001, Quats: 0.003
Iterations: 2000, PSNR: 25.195, SSIM: 0.734
Changes over the initial run: isotropic = false, total iterations doubled
While shape edges aren't fully reconstructed, the results are sharper than the initial run and the reconstructed shapes are rendered more faithfully.
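The isotropic = false switch gives each Gaussian a per-axis scale vector plus a quaternion rotation instead of a single radius, so its covariance Σ = R S Sᵀ Rᵀ can stretch along edges and thin structures. A minimal sketch of both parameterizations; the (w, x, y, z) quaternion order is an assumption:

```python
import torch

def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    """(N, 4) quaternions, (w, x, y, z) order assumed, to (N, 3, 3) rotations."""
    q = q / q.norm(dim=-1, keepdim=True)
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(-1, 3, 3)

def covariance(scales: torch.Tensor, quats: torch.Tensor, isotropic: bool) -> torch.Tensor:
    """Sigma = R S S^T R^T. Isotropic: scales is (N, 1); anisotropic: (N, 3)."""
    if isotropic:
        s = scales.expand(-1, 3)          # same scale on every axis
        return torch.diag_embed(s * s)    # rotation is irrelevant for a sphere
    R = quat_to_rotmat(quats)             # (N, 3, 3)
    S = torch.diag_embed(scales)          # (N, 3, 3) per-axis scales
    RS = R @ S
    return RS @ RS.transpose(1, 2)
```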
2. Diffusion Guided Optimization
2.1 Visualizations After 2000 Iterations
Prompt: a hamburger
Prompt: a standing corgi dog
Prompt: doge dog in foreground with tesla in background
Prompt: front view of a scottish terrier with bone in mouth and red scarf in neck
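Results like these are typically driven by a Score Distillation Sampling (SDS) style loss, which pushes the rendered image toward the text-conditioned distribution of a pretrained diffusion model. A minimal sketch, assuming a diffusers-style Stable Diffusion UNet and scheduler; unet, scheduler, and text_emb are assumed names, not the assignment's API:

```python
import torch

def sds_loss(latents, text_emb, unet, scheduler, guidance_scale=100.0):
    """One SDS step on image latents. `text_emb` stacks the unconditional
    and conditional embeddings, shape (2, seq_len, dim), diffusers-style."""
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)

    with torch.no_grad():
        # Classifier-free guidance: one UNet pass covers both branches.
        eps = unet(noisy.repeat(2, 1, 1, 1), t,
                   encoder_hidden_states=text_emb).sample
        eps_uncond, eps_text = eps.chunk(2)
        eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

    w = 1.0 - scheduler.alphas_cumprod.to(latents.device)[t]  # noise-level weight
    grad = w * (eps - noise)
    # Surrogate loss: its gradient w.r.t. `latents` is exactly `grad`.
    return (grad.detach() * latents).sum()
```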
2.2 Texture Map Optimization
2.3 NeRF Optimization
2.4 View-Dependent Text Embedding
View-dependent text embedding injects camera viewpoint information into the diffusion model's text conditioning, for example by appending a view word such as "front view", "side view", or "back view" to the prompt based on the camera's azimuth. Keeping the guidance consistent with the rendered viewpoint makes the generated outputs more coherent and realistic across views, better accounting for changes in pose, lighting, and appearance.
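A minimal sketch of the prompt-augmentation half of this idea; the 90-degree azimuth buckets are an illustrative assumption, and the augmented prompt would then be encoded by the text encoder as usual:

```python
def view_dependent_prompt(base_prompt: str, azimuth_deg: float) -> str:
    """Append a view word chosen from the camera azimuth (DreamFusion-style)."""
    a = azimuth_deg % 360.0
    if a < 45.0 or a >= 315.0:
        view = "front view"
    elif 135.0 <= a < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{base_prompt}, {view}"

# Example: the text conditioning changes with the rendering camera.
# view_dependent_prompt("a standing corgi dog", 180.0)
# -> 'a standing corgi dog, back view'
```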