Assignment 4
3D Gaussian Splatting and Diffusion-guided Optimization
Part 1: 3D Gaussian Splatting
1.1.5 Perform Splatting
render.py
Rendering of Pre-trained 3D Gaussians
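Per pixel, the splatting step in render.py reduces to front-to-back alpha compositing over depth-sorted Gaussians. A minimal single-channel sketch (the actual renderer composites RGB using per-pixel alphas from each projected 2D Gaussian):

```python
def composite(colors, alphas):
    """Front-to-back alpha compositing over depth-sorted splats.

    colors/alphas are per-splat scalar values, nearest first; the real
    renderer does this per pixel per channel, but the recurrence is the same.
    """
    out, transmittance = 0.0, 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c   # accumulate this splat's contribution
        transmittance *= (1.0 - a)     # light remaining for splats behind it
    return out
```

A fully opaque front splat (`alpha = 1`) hides everything behind it, since the transmittance drops to zero after it is composited.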
1.2 Training 3D Gaussian Representations
• Opacities learning rate: 0.01
• Scales learning rate: 0.005
• Colors learning rate: 0.0025
• Means learning rate: 0.00016
• Number of iterations: 1000
• Loss function: L1 Loss
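The configuration above can be sketched as a per-attribute update rule; the plain gradient step and function names here are illustrative (the actual training uses a standard PyTorch optimizer over the Gaussian attributes), but the learning rates and the L1 objective match the settings listed:

```python
# Per-attribute learning rates from the training setup above.
LEARNING_RATES = {
    "means": 0.00016,
    "opacities": 0.01,
    "scales": 0.005,
    "colors": 0.0025,
}

def sgd_step(params, grads, lrs=LEARNING_RATES):
    """One gradient step with a separate learning rate per attribute."""
    return {name: value - lrs[name] * grads[name] for name, value in params.items()}

def l1_loss(pred, target):
    """L1 photometric loss between rendered and ground-truth pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```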
Results:
• Mean PSNR: 28.52 dB
• Mean SSIM: 0.921
Training Progress (Top: Predicted, Bottom: GT)
Final Trained Renderings
1.3.1 Rendering Using Spherical Harmonics
WITH Spherical Harmonics (Q1.3.1)
WITHOUT Spherical Harmonics (Q1.1.5 - DC only)
Frame 13 - With SH
Frame 13 - Without SH
Frame 16 - With SH
Frame 16 - Without SH
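The visual difference between the two settings comes from evaluating view-dependent color: DC-only uses just the degree-0 coefficient, while SH rendering also evaluates the degree-1 terms along the view direction. A sketch of degree-1 evaluation for one channel (the sign/ordering convention follows the common 3DGS layout and is an assumption here):

```python
C0 = 0.28209479177387814  # Y_0^0 spherical-harmonic constant
C1 = 0.4886025119029199   # degree-1 prefactor sqrt(3 / (4*pi))

def sh_to_color(dc, sh1, direction):
    """Evaluate degree-1 SH for one color channel.

    dc: DC coefficient; sh1: the three degree-1 coefficients;
    direction: unit view vector (x, y, z). With sh1 = (0, 0, 0)
    this reduces to the DC-only color used in Q1.1.5.
    """
    x, y, z = direction
    return C0 * dc - C1 * y * sh1[0] + C1 * z * sh1[1] - C1 * x * sh1[2]
```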
1.3.2 Training On a Harder Scene
Baseline Results
• Isotropic Gaussians
• Random initialization
• Learning rates: position=0.00016, opacity=0.05, scaling=0.005, rotation=0.001
• Mean PSNR: 18.456 dB
• Mean SSIM: 0.385
Improved Results
• Learning rates: position=0.015, opacity=0.01, scaling=0.005, colors=0.02
• Number of iterations: 10,000
• Mean PSNR: 20.123 dB
• Mean SSIM: 0.430
Training Progress on Materials Dataset
Final Renderings on Materials Dataset
1. Increased Learning Rates - Position LR increased 94x (0.00016→0.015), colors LR increased 8x (0.0025→0.02)
2. Extended Training - 10,000 iterations instead of default 1,000
3. Loss Function Change - MSE loss instead of L1 for better high-frequency detail capture
4. Dataset Adaptation - NDC to screen camera conversion for NeRF-style datasets
Explanation: The materials dataset presents complex reflective surfaces and intricate textures that require more aggressive optimization. Higher learning rates enabled faster adaptation to complex material properties. MSE loss provided stronger gradients for capturing high-frequency details in reflective surfaces. Extended training allowed sufficient convergence time for the more challenging scene. Camera coordinate conversion ensured proper alignment with the NeRF-style dataset format. These changes collectively improved PSNR by 1.667 dB and SSIM by 0.045.
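The claim about MSE providing stronger gradients can be made concrete by comparing the derivatives of the two losses with respect to a residual e: L1 has constant-magnitude gradient sign(e), while MSE has gradient 2e, which grows with the error and therefore pushes harder on large residuals such as bright specular highlights:

```python
def l1_grad(residual):
    """d|e|/de = sign(e): constant magnitude regardless of error size."""
    return (residual > 0) - (residual < 0)  # bool arithmetic yields -1, 0, or 1

def mse_grad(residual):
    """d(e^2)/de = 2e: gradient magnitude grows with the error."""
    return 2.0 * residual
```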
Part 2: Diffusion-guided Optimization
2.1 SDS Loss + Image Optimization
Prompt 1: "a hamburger"
Without Guidance (700 iterations)
With Guidance (1100 iterations)
Prompt 2: "a standing corgi dog"
Without Guidance (600 iterations)
With Guidance (400 iterations)
Prompt 3: "A stranger things poster"
Without Guidance (1400 iterations)
With Guidance (800 iterations)
Prompt 4: "A view of Marina, Lagos"
Without Guidance (700 iterations)
With Guidance (160 iterations)
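The core of SDS is that the gradient passed back to the image skips differentiating through the diffusion model: noise the image, ask the denoiser for its noise estimate, and use the estimation error as the gradient. A single-pixel sketch under simplified assumptions (timestep sampling and the alpha/sigma noise schedule are omitted; `denoiser` is a hypothetical stand-in for the pretrained noise predictor):

```python
import random

def sds_pixel_grad(pixel, denoiser, weight=1.0):
    """Single-pixel SDS sketch: weight * (eps_hat - eps).

    eps is the injected Gaussian noise; eps_hat is the denoiser's
    estimate of it. Their difference is used directly as the gradient
    on the pixel, without backpropagating through the denoiser.
    """
    eps = random.gauss(0.0, 1.0)
    noisy_pixel = pixel + eps
    eps_hat = denoiser(noisy_pixel)
    return weight * (eps_hat - eps)
```

For the "With Guidance" runs, eps_hat would come from classifier-free guidance, blending unconditional and text-conditional predictions before taking this difference.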
2.2 Texture Map Optimization for Mesh
Prompt: "A deep forest green cow"
Prompt: "A zebra-striped cow"
2.3 NeRF Optimization
• lambda_entropy: [YOUR VALUE]
• lambda_orient: [YOUR VALUE]
• latent_iter_ratio: [YOUR VALUE]
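The lambda_entropy term regularizes per-ray accumulated opacity toward being either fully transparent or fully opaque. A sketch of a binary-entropy form of this regularizer (the exact expression and weighting used by the training code are assumptions here):

```python
import math

def opacity_entropy(w, eps=1e-6):
    """Binary entropy of accumulated ray opacity w in (0, 1).

    High near w = 0.5 (ambiguous, half-transparent geometry), low near
    0 or 1; minimizing it pushes rays to commit to empty or solid space.
    """
    w = min(max(w, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(w * math.log(w) + (1.0 - w) * math.log(1.0 - w))
```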
Prompt 1: "a standing corgi dog"
RGB Rendering
Depth Map
Prompt 2: "A slice of watermelon"
RGB Rendering
Depth Map
Prompt 3: "A country-styled house"
RGB Rendering
Depth Map
2.4.1 View-dependent Text Embedding
Prompt 1: "a standing corgi dog"
RGB Rendering
Depth Map
Prompt 2: "A country-styled house"
RGB Rendering
Depth Map
The comparison with Q2.3 reveals significant improvements from view-dependent text conditioning. Without view dependence, the results appear blurry and lack coherent 3D structure, whereas view-dependent conditioning produces markedly sharper and more geometrically consistent outputs. For the house prompt, the door becomes clearly defined and properly oriented across views; for the corgi dog, the tail gains a stronger three-dimensional presence and remains consistently visible from the appropriate angles. This shows that view-dependent text conditioning guides the optimization toward 3D view consistency, producing more plausible, well-structured geometry than the ambiguous forms obtained without it.
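Mechanically, view-dependent conditioning amounts to appending a view phrase to the prompt based on the sampled camera pose, so the diffusion guidance for a back-facing camera actually asks for a back view. A sketch (the azimuth thresholds are illustrative; DreamFusion-style schemes typically use front/side/back and overhead suffixes):

```python
def view_dependent_prompt(base_prompt, azimuth_deg):
    """Append a view phrase chosen from the camera azimuth.

    0 degrees is taken to face the front of the object; the 45/135-degree
    bin edges here are illustrative choices.
    """
    a = azimuth_deg % 360.0
    if a < 45.0 or a >= 315.0:
        view = "front view"
    elif 135.0 <= a < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{base_prompt}, {view}"
```

For example, a camera behind the corgi would be conditioned on "a standing corgi dog, back view" rather than the bare prompt, which is what resolves the tail ambiguity noted above.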