Assignment 4

3D Gaussian Splatting and Diffusion-guided Optimization

Part 1: 3D Gaussian Splatting

1.1.5 Perform Splatting

Submission: Attach the GIF obtained by running render.py
Pre-trained Gaussian Rendering

Rendering of Pre-trained 3D Gaussians
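For reference, a minimal sketch of the compositing behind this render is shown below. It assumes the 3D Gaussians have already been projected to 2D and depth-sorted front to back, and every tensor name (means_2d, inv_cov_2d, and so on) is illustrative rather than the starter code's actual API; a real implementation also tiles the image instead of forming the dense pixels-by-Gaussians products used here.

```python
import torch

def splat(means_2d, inv_cov_2d, opacities, colors, H, W):
    """Alpha-composite N depth-sorted 2D Gaussians onto an H x W RGB image."""
    device = means_2d.device
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys], dim=-1).reshape(-1, 2)           # (HW, 2) pixel centers

    d = pix[:, None, :] - means_2d[None, :, :]                   # (HW, N, 2) offsets
    # Squared Mahalanobis distance of every pixel under every 2D Gaussian.
    maha = torch.einsum("pni,nij,pnj->pn", d, inv_cov_2d, d)
    alpha = (opacities[None, :] * torch.exp(-0.5 * maha)).clamp(max=0.99)

    # Front-to-back compositing: transmittance T_i = prod_{j<i} (1 - alpha_j).
    T = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1]], dim=1),
        dim=1,
    )
    weights = alpha * T                                          # (HW, N)
    img = (weights[..., None] * colors[None, :, :]).sum(dim=1)   # (HW, 3)
    return img.reshape(H, W, 3)
```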

1.2 Training 3D Gaussian Representations

Submission: Learning rates, number of iterations, PSNR/SSIM, and both training GIFs
Training Configuration:
• Opacities learning rate: 0.01
• Scales learning rate: 0.005
• Colors learning rate: 0.0025
• Means learning rate: 0.00016
• Number of iterations: 1000
• Loss function: L1 Loss
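
As a sketch of how these rates are wired up, assuming the Gaussian parameters are exposed as torch Parameters and a differentiable render() from Part 1.1 (all names illustrative):

```python
import torch
import torch.nn.functional as F

# Per-parameter-group Adam matching the learning rates listed above.
optimizer = torch.optim.Adam([
    {"params": [opacities], "lr": 0.01},
    {"params": [scales],    "lr": 0.005},
    {"params": [colors],    "lr": 0.0025},
    {"params": [means],     "lr": 0.00016},
])

for it in range(1000):
    camera, gt_image = next(train_loader)                     # assumed data iterator
    pred = render(means, scales, colors, opacities, camera)   # Part 1.1 renderer
    loss = F.l1_loss(pred, gt_image)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```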

Results:
• Mean PSNR: 28.52 dB
• Mean SSIM: 0.921
Training Progress

Training Progress (Top: Predicted, Bottom: GT)

Final Renderings

Final Trained Renderings

1.3.1 Rendering Using Spherical Harmonics

Submission: GIFs from 1.3.1 and 1.1.5, plus 2-3 side-by-side comparisons with explanations
With Spherical Harmonics

WITH Spherical Harmonics (Q1.3.1)

Without Spherical Harmonics

WITHOUT Spherical Harmonics (Q1.1.5 - DC only)

Frame 13 With SH

Frame 13 - With SH

Frame 13 Without SH

Frame 13 - Without SH

Comparison 1 Analysis: In Frame 13, the version without spherical harmonics renders the chair's seat with a uniform color and no shadow variation; the top-right region appears brighter where it faces the light, but the transition is flat. With spherical harmonics, a clear shadow falls on the seat where the chair's own structure blocks the light source, giving the surface natural depth and a convincing lighting interaction.
Frame 16 With SH

Frame 16 - With SH

Frame 16 Without SH

Frame 16 - Without SH

Comparison 2 Analysis: In Frame 16, the rendering without spherical harmonics again shows uniform coloring on the seat: the top is brighter, but the shadow is not integrated realistically. With spherical harmonics enabled, the seat clearly exhibits self-shadowing where the chair occludes the light source, producing coherent brightness and shadow variation across the surface.
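
The difference comes down to how each Gaussian's color is evaluated. Below is a minimal sketch of degree-1 spherical-harmonics evaluation (the DC-only path of Q1.1.5 keeps just the first term); the constants are the standard real SH basis coefficients, but the function signature is illustrative rather than the starter code's API.

```python
import torch

# Real spherical-harmonics basis constants for bands l=0 and l=1.
C0 = 0.28209479177387814
C1 = 0.4886025119029199

def sh_to_rgb(sh, dirs):
    """sh: (N, 4, 3) SH coefficients per Gaussian; dirs: (N, 3) unit view directions."""
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    rgb = (C0 * sh[:, 0]          # DC term: the only one used in Q1.1.5
           - C1 * y * sh[:, 1]    # degree-1 terms add view dependence,
           + C1 * z * sh[:, 2]    # which produces the shadow-like shading
           - C1 * x * sh[:, 3])   # differences analyzed above
    return (rgb + 0.5).clamp(0.0, 1.0)  # offset/clamp as in standard 3DGS
```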

1.3.2 Training On a Harder Scene

Submission: Follow Q1.2 format + baseline comparison + modification explanations

Baseline Results

Baseline Configuration (same as Q1.2):
• Isotropic Gaussians
• Random initialization
• Learning rates: position=0.00016, opacity=0.05, scaling=0.005, rotation=0.001
• Mean PSNR: 18.456 dB
• Mean SSIM: 0.385

Improved Results

Improved Configuration:
• Learning rates: position=0.015, opacity=0.01, scaling=0.005, colors=0.02
• Number of iterations: 10,000
• Mean PSNR: 20.123 dB
• Mean SSIM: 0.430
Harder Scene Training

Training Progress on Materials Dataset

Harder Scene Final

Final Renderings on Materials Dataset

Modifications Made to Improve Performance:
1. Increased Learning Rates - Position LR increased 94x (0.00016→0.015), colors LR increased 8x (0.0025→0.02)
2. Extended Training - 10,000 iterations instead of default 1,000
3. Loss Function Change - MSE loss instead of L1 for better high-frequency detail capture
4. Dataset Adaptation - NDC to screen camera conversion for NeRF-style datasets

Explanation: The materials dataset presents complex reflective surfaces and intricate textures that require more aggressive optimization. Higher learning rates enabled faster adaptation to complex material properties. MSE loss provided stronger gradients for capturing high-frequency details in reflective surfaces. Extended training allowed sufficient convergence time for the more challenging scene. Camera coordinate conversion ensured proper alignment with the NeRF-style dataset format. These changes collectively improved PSNR by 1.667 dB and SSIM by 0.045.
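
As a sketch, the first three modifications amount to the following changes to the Q1.2 training loop (reusing the illustrative names from the Q1.2 sketch above):

```python
# Raised learning rates, longer schedule, and MSE in place of L1.
optimizer = torch.optim.Adam([
    {"params": [means],     "lr": 0.015},    # was 0.00016 (~94x higher)
    {"params": [opacities], "lr": 0.01},
    {"params": [scales],    "lr": 0.005},
    {"params": [colors],    "lr": 0.02},     # was 0.0025 (8x higher)
])

for it in range(10_000):                     # was 1,000 iterations
    camera, gt_image = next(train_loader)
    pred = render(means, scales, colors, opacities, camera)
    loss = F.mse_loss(pred, gt_image)        # was F.l1_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```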

Part 2: Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization

Submission: Show image output for 4 prompts (2 provided + 2 your choice), with and without guidance, indicating iterations
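
For context before the results, here is a hedged sketch of a single SDS update on image latents with classifier-free guidance, assuming a diffusers-style UNet and scheduler (unet, scheduler, text_emb, and uncond_emb are placeholder names, not the handout's exact API). Setting guidance_scale to 1 recovers the unguided runs shown below.

```python
import torch

def sds_step(latents, unet, scheduler, text_emb, uncond_emb, guidance_scale=100.0):
    # Sample a random diffusion timestep and noise the current latents.
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)

    with torch.no_grad():
        # One UNet pass each for the conditional and unconditional branches.
        eps_cond = unet(noisy, t, encoder_hidden_states=text_emb).sample
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
    # Classifier-free guidance: push the prediction toward the text condition.
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # w(t) weighting from DreamFusion; reshape to broadcast over the latents.
    w = (1.0 - scheduler.alphas_cumprod.to(latents.device)[t]).view(-1, 1, 1, 1)
    grad = w * (eps - noise)
    # Surrogate loss whose gradient w.r.t. the latents equals `grad`.
    return (grad.detach() * latents).sum()
```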

Prompt 1: "a hamburger"

Hamburger No Guidance

Without Guidance (700 iterations)

Hamburger With Guidance

With Guidance (1100 iterations)

Prompt 2: "a standing corgi dog"

Corgi No Guidance

Without Guidance (600 iterations)

Corgi With Guidance

With Guidance (400 iterations)

Prompt 3: "A stranger things poster"

Custom 1 No Guidance

Without Guidance (1400 iterations)

Custom 1 With Guidance

With Guidance (800 iterations)

Prompt 4: "A view of Marina, Lagos"

Custom 2 No Guidance

Without Guidance (700 iterations)

Custom 2 With Guidance

With Guidance (160 iterations)

2.2 Texture Map Optimization for Mesh

Submission: Show GIF of final textured mesh for 2 different text prompts
Mesh Texture 1

Prompt: "A deep forest green cow"

Mesh Texture 2

Prompt: "A zebra-striped cow"

2.3 NeRF Optimization

Submission: Show video of RGB and depth for 3 prompts (1 provided + 2 your choice)
Hyperparameters Used:
• lambda_entropy: [YOUR VALUE]
• lambda_orient: [YOUR VALUE]
• latent_iter_ratio: [YOUR VALUE]
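
For orientation, here is a sketch of where the first two weights typically enter a stable-dreamfusion-style objective (the formulas and names are assumptions modeled on that codebase, not the exact handout code); latent_iter_ratio simply sets the fraction of early iterations supervised directly in latent space before decoding to RGB.

```python
import torch

def total_loss(sds, weights_sum, normals, view_dirs, lambda_entropy, lambda_orient):
    # Opacity entropy: pushes per-ray accumulated opacity toward 0 or 1,
    # discouraging semi-transparent "fog" and encouraging clean silhouettes.
    a = weights_sum.clamp(1e-5, 1 - 1e-5)
    loss_entropy = -(a * torch.log(a) + (1 - a) * torch.log(1 - a)).mean()

    # Orientation loss: penalizes normals that face away from the camera on
    # visible surfaces (the Ref-NeRF-style regularizer used by DreamFusion).
    loss_orient = (torch.relu((normals * view_dirs).sum(-1)) ** 2).mean()

    return sds + lambda_entropy * loss_entropy + lambda_orient * loss_orient
```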

Prompt 1: "a standing corgi dog"

RGB Rendering

Depth Map

Prompt 2: "A slice of watermelon"

RGB Rendering

Depth Map

Prompt 3: "A country-styled house"

RGB Rendering

Depth Map

2.4.1 View-dependent Text Embedding

Submission: Show video of RGB and depth for 2 prompts, compare with Q2.3, analyze effects

Prompt 1: "a standing corgi dog"

RGB Rendering

Depth Map

Prompt 2: "A country-styled house"

RGB Rendering

Depth Map

Comparison with Q2.3:
The comparison with Q2.3 reveals significant improvements from view-dependent text conditioning. Without view dependence, the results appear blurry and lack coherent 3D structure, whereas view-dependent conditioning produces markedly sharper and more geometrically consistent outputs. For the house prompt, the door becomes clearly defined and properly oriented across views, while for the corgi dog, the tail exhibits stronger three-dimensional presence and maintains consistent visibility from appropriate angles. This enhancement demonstrates that view-dependent text conditioning effectively guides the optimization to respect 3D view consistency, resulting in more plausible and well-structured geometry compared to the ambiguous forms generated without this conditioning.
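
Mechanically, the conditioning is simple. Below is a minimal sketch in the DreamFusion style, where a view word chosen from the camera pose is appended to the prompt before text encoding; the angle thresholds are assumptions, as the handout may bin views differently.

```python
def view_dependent_prompt(base_prompt, azimuth_deg, elevation_deg):
    """Append a view phrase based on camera azimuth/elevation (thresholds assumed)."""
    if elevation_deg > 60:
        view = "overhead view"
    elif abs(azimuth_deg) < 45:
        view = "front view"
    elif abs(azimuth_deg) > 135:
        view = "back view"
    else:
        view = "side view"
    return f"{base_prompt}, {view}"

# e.g. view_dependent_prompt("a standing corgi dog", azimuth_deg=170, elevation_deg=10)
# -> "a standing corgi dog, back view"
```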