Learning for 3D Vision: Assignment IV
Gaussian Splatting And Diffusion
CMU 16825 Learning for 3D Vision
Tushar Nayak [tusharn]
1.1.5 Gaussian Splatting
1.2.2 Forward Pass and Loss Computation
Opacities: 0.01, Scales: 0.005, Colors: 0.05, Means: 0.001; Iterations: 1000, minimizing the L1 loss between the ground truth and the rendered images
PSNR: 29.462, SSIM: 0.913
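Read as per-parameter learning rates (an assumption, but the usual setup for this part), the configuration maps onto Adam parameter groups. Below is a minimal PyTorch sketch of the forward pass and L1 loss loop; render(), the tensor shapes, and the ground-truth image are placeholders, not the assignment's actual rasterizer:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-ins; the real assignment supplies its own Gaussian
# container and differentiable rasterizer.
N = 10_000
means     = torch.randn(N, 3, requires_grad=True)         # 3D centers
scales    = torch.full((N, 1), -4.0, requires_grad=True)  # log-scales (isotropic)
colors    = torch.rand(N, 3, requires_grad=True)          # RGB, pre-sigmoid
opacities = torch.zeros(N, 1, requires_grad=True)         # pre-sigmoid

# One parameter group per attribute, with the learning rates listed above.
optimizer = torch.optim.Adam([
    {"params": [opacities], "lr": 0.01},
    {"params": [scales],    "lr": 0.005},
    {"params": [colors],    "lr": 0.05},
    {"params": [means],     "lr": 0.001},
])

def render(means, scales, colors, opacities):
    # Placeholder so the loop runs end to end; substitute the actual
    # Gaussian-splatting forward pass here.
    w = torch.sigmoid(opacities)                          # (N, 1)
    rgb = (w * torch.sigmoid(colors)).sum(0) / w.sum()    # (3,)
    return rgb.expand(64, 64, 3)

gt_image = torch.rand(64, 64, 3)  # stand-in for the ground-truth render

for step in range(1000):
    optimizer.zero_grad()
    pred = render(means, scales, colors, opacities)
    loss = F.l1_loss(pred, gt_image)  # L1 between render and ground truth
    loss.backward()
    optimizer.step()
```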
Visualization of Training
1.3.2 Training On A Harder Scene
Initial: Opacities: 0.01, Colors: 0.001, Scales: 0.001, Means: 0.001, Quats: 0.003
Iterations: 1000, PSNR: 21.589, SSIM: 0.727
Optimized: Opacities: 0.01, Colors: 0.001, Scales: 0.001, Means: 0.001, Quats: 0.003
Iterations: 2000, PSNR: 25.195, SSIM: 0.734
Changes over the initial run: isotropic = false, total iterations doubled
While shape edges aren't fully reconstructed, the results are sharper than the initial run and the reconstructed shapes are rendered more faithfully.
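The isotropic = false switch gives each Gaussian a per-axis scale vector plus a quaternion rotation instead of a single radius, so its covariance Σ = R S Sᵀ Rᵀ can stretch along edges and thin structures. A minimal sketch of both parameterizations; the (w, x, y, z) quaternion order is an assumption:

```python
import torch

def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    """(N, 4) quaternions, (w, x, y, z) order assumed, to (N, 3, 3) rotations."""
    q = q / q.norm(dim=-1, keepdim=True)
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(-1, 3, 3)

def covariance(scales: torch.Tensor, quats: torch.Tensor, isotropic: bool) -> torch.Tensor:
    """Sigma = R S S^T R^T. Isotropic: scales is (N, 1); anisotropic: (N, 3)."""
    if isotropic:
        s = scales.expand(-1, 3)          # same scale on every axis
        return torch.diag_embed(s * s)    # rotation is irrelevant for a sphere
    R = quat_to_rotmat(quats)             # (N, 3, 3)
    S = torch.diag_embed(scales)          # (N, 3, 3) per-axis scales
    RS = R @ S
    return RS @ RS.transpose(1, 2)
```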
2. Diffusion Guided Optimization
2.1 Visualizations After 2000 Iterations
Prompt: a hamburger
Prompt: a standing corgi dog
Prompt: doge dog in foreground with tesla in background
Prompt: front view of a scottish terrier with bone in mouth and red scarf in neck
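Results like these are typically driven by a Score Distillation Sampling (SDS) style loss, which pushes the rendered image toward the text-conditioned distribution of a pretrained diffusion model. A minimal sketch, assuming a diffusers-style Stable Diffusion UNet and scheduler; unet, scheduler, and text_emb are assumed names, not the assignment's API:

```python
import torch

def sds_loss(latents, text_emb, unet, scheduler, guidance_scale=100.0):
    """One SDS step on image latents. `text_emb` stacks the unconditional
    and conditional embeddings, shape (2, seq_len, dim), diffusers-style."""
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)

    with torch.no_grad():
        # Classifier-free guidance: one UNet pass covers both branches.
        eps = unet(noisy.repeat(2, 1, 1, 1), t,
                   encoder_hidden_states=text_emb).sample
        eps_uncond, eps_text = eps.chunk(2)
        eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

    w = 1.0 - scheduler.alphas_cumprod.to(latents.device)[t]  # noise-level weight
    grad = w * (eps - noise)
    # Surrogate loss: its gradient w.r.t. `latents` is exactly `grad`.
    return (grad.detach() * latents).sum()
```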
2.2 Texture Map Optimization
2.3 NeRF Optimization
2.4 View-Dependent Text Embedding
View-dependent text embedding injects camera viewpoint information into the diffusion model's text conditioning, for example by appending a view word such as "front view", "side view", or "back view" to the prompt based on the camera's azimuth. Keeping the guidance consistent with the rendered viewpoint makes the generated outputs more coherent and realistic across views, better accounting for changes in pose, lighting, and appearance.
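A minimal sketch of the prompt-augmentation half of this idea; the 90-degree azimuth buckets are an illustrative assumption, and the augmented prompt would then be encoded by the text encoder as usual:

```python
def view_dependent_prompt(base_prompt: str, azimuth_deg: float) -> str:
    """Append a view word chosen from the camera azimuth (DreamFusion-style)."""
    a = azimuth_deg % 360.0
    if a < 45.0 or a >= 315.0:
        view = "front view"
    elif 135.0 <= a < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{base_prompt}, {view}"

# Example: the text conditioning changes with the rendering camera.
# view_dependent_prompt("a standing corgi dog", 180.0)
# -> 'a standing corgi dog, back view'
```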