Assignment 4 – 3D Gaussian Splatting & Diffusion-Guided Optimization

Q1. 3D Gaussian Splatting

1.1.1 - 1.1.5 Rendering Pre-trained Gaussians (all unit test cases passed)

Q1.1.5 Rendered Scene (View-Independent)

1.2 Training 3D Gaussian Representations (15 points)

Learning Rate Configuration

After experimenting with multiple configurations, the following learning rates gave the best performance:

Parameter	Learning Rate	Description
`pre_act_opacities`	0.0007	Controls transparency; smaller value stabilizes alpha updates.
`pre_act_scales`	0.010	Determines Gaussian size; moderate learning rate for smooth shape growth.
`colours`	0.020	Controls RGB appearance; slightly higher rate accelerates color convergence.
`means`	0.0003	Updates 3D positions; small rate prevents instability in geometry.
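The per-parameter rates in the table can be expressed as one optimizer group per tensor. The sketch below is a minimal illustration, not the assignment's actual training code: the helper name `make_param_groups` and the `params` dictionary are hypothetical, and only the four parameter names and rates come from the table. It uses plain Python so it runs without PyTorch; the returned list has the list-of-dicts shape that `torch.optim.Adam` accepts as its first argument.

```python
# Learning rates from the table above, keyed by parameter name.
LEARNING_RATES = {
    "pre_act_opacities": 0.0007,
    "pre_act_scales": 0.010,
    "colours": 0.020,
    "means": 0.0003,
}


def make_param_groups(params, learning_rates=LEARNING_RATES):
    """Build one optimizer parameter group per named tensor.

    `params` (hypothetical) maps a parameter name to its tensor.
    Each group carries its own learning rate, so Adam can update
    opacities, scales, colours and means at different speeds.
    """
    missing = set(learning_rates) - set(params)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    return [
        {"params": [params[name]], "lr": lr, "name": name}
        for name, lr in learning_rates.items()
    ]
```

With PyTorch installed, the result could be passed as `torch.optim.Adam(make_param_groups(params))`, where each value in `params` is a tensor with `requires_grad=True`.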

Training Details

Number of Iterations: 1000
Optimizer: Adam
Best Performance:

Mean PSNR: 28.748 dB
Mean SSIM: 0.929
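For reference, PSNR in dB is 10 · log10(MAX² / MSE). The sketch below shows one way a mean PSNR could be computed. It assumes pixel values in [0, 1] (so MAX = 1) and assumes the mean is taken over per-image PSNR values; the evaluation script used for the numbers above may differ on both points. Images are flattened to plain lists of floats to keep the example dependency-free.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(pairs, max_value=1.0):
    """Average the per-image PSNR over an iterable of (pred, target) pairs."""
    scores = [psnr(p, t, max_value) for p, t in pairs]
    return sum(scores) / len(scores)
```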

Training Outputs

Training Progress and Final Renderings for 3D Gaussian Representation

Among the tested configurations, this learning rate set gave the most stable convergence and the most visually accurate reconstruction. Lower learning rates for opacities and means prevented flickering and geometric instability, while higher rates for colors and scales sped up appearance fitting.

1.3.1 Rendering with Spherical Harmonics

Q1.1.5 Without Spherical Harmonics (View-Independent)

Q1.3.1 With Spherical Harmonics (View-Dependent)

Observations and Differences:

Without Spherical Harmonics (Q1.1.5): The chair's colors and brightness do not change with the viewing angle, so the surface looks flat and dull. The shape is correct, but the material does not read as real.
With Spherical Harmonics (Q1.3.1): Color and brightness shift slightly as the view moves, especially around the armrests and top edges. The chair looks glossier and more realistic, with smoother lighting and a better sense of depth.

Explanation: Spherical harmonics make each Gaussian's color a function of the viewing direction. Without them, the color stays fixed and the surface looks flat. With them, view-dependent lighting effects such as highlights and shading are captured, making the object look more natural and detailed.
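To make the view dependence concrete, here is a minimal sketch that evaluates one color channel from degree-0 and degree-1 spherical harmonic coefficients. The sign pattern, the +0.5 offset, and the clamp follow a convention commonly seen in Gaussian splatting code and are assumptions here; the assignment's renderer may use more SH degrees or a different convention. The function name `sh_colour` is hypothetical.

```python
import math

SH_C0 = 0.28209479177387814  # degree-0 real SH constant, 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 real SH constant, sqrt(3 / (4 * pi))


def sh_colour(coeffs, view_dir):
    """Evaluate one color channel from 4 SH coefficients (degrees 0 and 1).

    coeffs: [c0, c1, c2, c3] for a single channel.
    view_dir: (x, y, z) viewing direction; it is normalised here.
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Degree 0: constant term, identical for every viewing direction.
    value = SH_C0 * coeffs[0]
    # Degree 1: linear in the direction, which is what makes color view-dependent.
    value += -SH_C1 * y * coeffs[1] + SH_C1 * z * coeffs[2] - SH_C1 * x * coeffs[3]
    # Assumed convention: shift by 0.5 and clamp to the displayable range.
    return min(max(value + 0.5, 0.0), 1.0)
```

With only the degree-0 coefficient set, the result is the same for every direction, which corresponds to the view-independent render in Q1.1.5. Any non-zero degree-1 coefficient makes the value vary as the camera moves.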

1.3.2 Harder Scene (Materials Dataset)

Disclaimer: All experiments follow the baseline setup from Question 1.2.2, with isotropic Gaussians and identical training parameters unless stated otherwise.

Training Progress (Baseline)

Final Render (Baseline)

Training Details

Baseline: Isotropic Gaussians (init_type = "random")

Learning Rates:

opacities: 0.0018
scales: 0.0015
colours: 0.002
means: 0.001

Performance: PSNR = 16.949 dB, SSIM = 0.639

Improved Approach: Anisotropic Gaussians

Learning Rates:

opacities: 0.00085
scales: 0.01
colours: 0.02
means: 0.00015

Performance: PSNR = 28.586 dB, SSIM = 0.934

Comparison & Analysis

Setup	Gaussian Type	PSNR (dB)	SSIM
Baseline	Isotropic	16.949	0.639
Improved	Anisotropic	28.586	0.934

Explanation of Improvements

The improved setup switches from isotropic to anisotropic Gaussians, so each Gaussian can stretch and orient itself along a surface instead of staying spherical, which improves surface fidelity and the reconstruction of material detail. The learning rates were also retuned to balance colour and scale updates, preventing over-smoothing in early iterations. Together, these changes raised PSNR by about 11.6 dB (16.949 to 28.586) and SSIM by about 0.30 (0.639 to 0.934).
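The difference between the two Gaussian types can be seen in how the 3D covariance is built. A common parameterisation, assumed here and not confirmed by the assignment code, is Σ = R S Sᵀ Rᵀ with a rotation R from a quaternion and a diagonal scale matrix S. The helper names below are hypothetical, and the sketch uses nested lists so it runs without NumPy or PyTorch.

```python
def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (row lists)."""
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(scales, quat=(1.0, 0.0, 0.0, 0.0)):
    """Return Sigma = R diag(scales)^2 R^T as a 3x3 nested list."""
    R = quat_to_rotmat(quat)
    var = [s * s for s in scales]
    # Sigma[i][j] = sum_k R[i][k] * s_k^2 * R[j][k]
    return [
        [sum(R[i][k] * var[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

With three equal scales the result is s² times the identity for any rotation, so an isotropic Gaussian is always a sphere. With unequal scales the rotation matters, which is what lets an anisotropic Gaussian flatten against a surface or stretch along an edge.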

Harder Scene – Training and Final Renders

Q2. Diffusion-Guided Optimization

2.1 Image Optimization (SDS Loss)

Prompt: "a hamburger"

Prompt: "a house"

Prompt: "a horse"

Prompt: "a gun"

2.2 Texture Map Optimization for Mesh

Mesh Texture Optimization – “Cow covered in iridescent rainbow” and “Golden metallic cow statue with reflective surface”

2.3 NeRF Optimization

Prompt: "a standing corgi dog"

Prompt: "a potted plant"

Prompt: "a potted cactus"

2.4.1 View-Dependent Text Embeddings

Prompt: "a standing corgi dog" (view-dependent)

Prompt: "a potted plant" (view-dependent)

Additional Observation: With view-dependent text embeddings, the results look brighter and more consistent across different views. For example, in the potted plant scene, the leaves appear slightly disconnected from the pot without view-dependence, but with it, the geometry and colors stay aligned and look much more natural.
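As an illustration of the idea, view-dependent text conditioning can be sketched as appending a coarse view label to the prompt before it is embedded. The function name, the angle thresholds, and the label wording below are all assumptions made for this example; the assignment's implementation may bin the camera angles differently.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg=0.0):
    """Append a coarse view label to the prompt based on camera angles.

    The thresholds (60 degrees elevation, 90-degree azimuth bins) are
    illustrative assumptions, not values taken from the assignment.
    """
    if elevation_deg > 60.0:
        return f"{prompt}, overhead view"
    azimuth = azimuth_deg % 360.0  # wrap negative or large angles into [0, 360)
    if azimuth < 45.0 or azimuth >= 315.0:
        label = "front view"
    elif 135.0 <= azimuth < 225.0:
        label = "back view"
    else:
        label = "side view"
    return f"{prompt}, {label}"
```

For example, `view_dependent_prompt("a potted plant", 180.0)` gives `"a potted plant, back view"`. Embedding a different prompt for each camera pose gives the diffusion model a hint about which side of the object it is scoring, which is consistent with the more coherent geometry noted above.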