16-825 Assignment 4: 3D Gaussian Splatting & Diffusion-Guided Optimization

Q1. 3D Gaussian Splatting (50 pts)

1.1 3D Gaussian Rasterization (35 pts)

Baseline rasterization output (GIF)
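For reference, below is a minimal sketch of the per-pixel alpha-compositing step at the core of the rasterizer, assuming the Gaussians have already been projected to 2D means and covariances and sorted front-to-back by depth (all names are illustrative, not the starter code's API):

```python
import torch

def composite_gaussians(means2d, inv_cov2d, opacities, colors, pixels):
    """Alpha-composite depth-sorted 2D Gaussians at the given pixel centers.

    means2d:   (N, 2)    projected Gaussian centers, sorted front-to-back
    inv_cov2d: (N, 2, 2) inverses of the projected 2D covariances
    opacities: (N, 1)    per-Gaussian opacity in [0, 1]
    colors:    (N, 3)    per-Gaussian RGB
    pixels:    (P, 2)    pixel-center coordinates
    """
    # Evaluate each (unnormalized) Gaussian at every pixel.
    d = pixels[None, :, :] - means2d[:, None, :]              # (N, P, 2)
    maha = torch.einsum('npi,nij,npj->np', d, inv_cov2d, d)   # squared Mahalanobis distance
    alpha = opacities * torch.exp(-0.5 * maha)                # (N, P) per-Gaussian alpha

    # Front-to-back compositing: T_i = prod_{j < i} (1 - alpha_j).
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha[:-1]], dim=0), dim=0)
    weights = alpha * transmittance                           # (N, P)
    return torch.einsum('np,nc->pc', weights, colors)         # (P, 3) composited RGB
```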

1.2 Training Custom Gaussians

Training progress (GIF)
Final renders (GIF)

Training Parameters & Results

| Learning Rate (Means) | Learning Rate (Covariances) | Learning Rate (Opacity) | Learning Rate (Color) | Iterations | PSNR | SSIM |
| --- | --- | --- | --- | --- | --- | --- |
| 0.01 | 0.01 | 0.01 | 0.01 | 100 | 28.470 | 0.927 |
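The per-parameter learning rates above map naturally onto PyTorch optimizer parameter groups; a minimal sketch, with hypothetical attribute names standing in for the learnable tensors of the Gaussians class:

```python
import torch

# Hypothetical attribute names; the actual names depend on the Gaussians class in the starter code.
optimizer = torch.optim.Adam([
    {'params': [gaussians.means],            'lr': 0.01},  # learning rate (means)
    {'params': [gaussians.pre_act_scales,
                gaussians.pre_act_quats],    'lr': 0.01},  # learning rate (covariances)
    {'params': [gaussians.pre_act_opacities], 'lr': 0.01}, # learning rate (opacity)
    {'params': [gaussians.colours],          'lr': 0.01},  # learning rate (color)
])
```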

1.3 Extensions

1.3.1 Rendering Using Spherical Harmonics

Qualitative comparison: renders without SH (left) vs. with SH of degree 3 (right).

Spherical harmonics help model view-dependent color effects in the 3D Gaussian splatting representation.
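For context, a minimal sketch of how view-dependent colour can be evaluated from per-Gaussian SH coefficients (degree 1 shown for brevity; the constants follow the standard real SH convention, and the function name is illustrative):

```python
import torch

# Real SH basis constants for degrees 0 and 1 (higher degrees omitted for brevity).
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def sh_to_rgb(sh_coeffs, view_dirs):
    """Evaluate view-dependent colour from SH coefficients (degree <= 1 shown).

    sh_coeffs: (N, 4, 3) per-Gaussian SH coefficients (DC term + 3 linear terms, RGB)
    view_dirs: (N, 3)    unit vectors from the camera to each Gaussian
    """
    x, y, z = view_dirs.unbind(-1)
    rgb = SH_C0 * sh_coeffs[:, 0]
    rgb = rgb - SH_C1 * y[:, None] * sh_coeffs[:, 1] \
              + SH_C1 * z[:, None] * sh_coeffs[:, 2] \
              - SH_C1 * x[:, None] * sh_coeffs[:, 3]
    # Shift so a zero-coefficient Gaussian maps to mid-grey, then clamp to valid RGB.
    return torch.clamp(rgb + 0.5, 0.0, 1.0)
```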

1.3.2 Training On a Harder Scene

Default training (1000 iterations)
Longer training (10000 iterations)
With additional SSIM loss (1000 iterations)
| Configuration | Mean PSNR | Mean SSIM | Iterations | Gaussians per splat |
| --- | --- | --- | --- | --- |
| Default | 17.202 | 0.662 | 1000 | -1 |
| With SSIM loss | 18.929 | 0.809 | 1000 | -1 |
| Longer training | 19.801 | 0.738 | 10000 | -1 |

Observation: a combination of the SSIM loss and longer training can yield better results.
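As a sketch (not the exact code used here), the SSIM term can be combined with an L1 reconstruction loss as in the original 3DGS paper; the pytorch_msssim dependency and the 0.2 weight below are illustrative:

```python
import torch
from pytorch_msssim import ssim  # any differentiable SSIM implementation works

def gaussian_loss(pred, target, lambda_ssim=0.2):
    """L1 reconstruction loss plus a (1 - SSIM) term.

    pred, target: (B, 3, H, W) images in [0, 1]
    """
    l1 = torch.abs(pred - target).mean()
    ssim_term = 1.0 - ssim(pred, target, data_range=1.0)
    return (1.0 - lambda_ssim) * l1 + lambda_ssim * ssim_term
```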

Q2. Diffusion-Guided Optimization

2.1 SDS Loss + Image Optimization

Comparison of SDS results with and without classifier-free guidance
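For reference, below is a minimal sketch of the SDS update with classifier-free guidance (CFG), assuming a Stable Diffusion UNet in the diffusers style and precomputed conditional/unconditional text embeddings; the function name, the guidance scale of 100, and the timestep range are illustrative rather than the exact settings used for the results that follow.

```python
import torch

def sds_loss(latents, unet, alphas_cumprod, text_emb, uncond_emb,
             guidance_scale=100.0, t_range=(20, 980)):
    """Score Distillation Sampling on the latents of the rendered image.

    latents: (B, 4, 64, 64) VAE latents of the current render, requires_grad=True
    """
    B = latents.shape[0]
    t = torch.randint(t_range[0], t_range[1], (B,), device=latents.device)
    noise = torch.randn_like(latents)
    alpha_t = alphas_cumprod[t].view(B, 1, 1, 1)
    noisy = alpha_t.sqrt() * latents + (1 - alpha_t).sqrt() * noise

    with torch.no_grad():
        # One UNet pass each for the conditional and unconditional branches.
        eps_cond = unet(noisy, t, encoder_hidden_states=text_emb).sample
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        # Classifier-free guidance: push the prediction away from the unconditional score.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1.0 - alpha_t                         # DreamFusion weighting w(t)
    grad = w * (eps - noise)
    # Surrogate loss whose gradient w.r.t. `latents` equals `grad` (target is detached).
    target = (latents - grad).detach()
    return 0.5 * torch.nn.functional.mse_loss(latents, target, reduction='sum') / B
```

Because the target is detached, backpropagation passes only w(t)·(ε̂ − ε) to the latents, which is exactly the SDS gradient; setting guidance_scale to 0 (or 1, depending on convention) recovers the "no guidance" behaviour compared below.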

No guidance
With guidance

Prompt: "a standing corgi dog"

No guidance
With guidance

Prompt: "a Formula 1 car"

No guidance
With guidance

Prompt: "a hamburger"

No guidance
With guidance

Prompt: "a lego toy of batman"

2.2 Texture Map Optimization for Mesh

Prompt: "a cow made of cotton candy"
Prompt: "a cow made of grass"
Prompt: "a cow made of titanium"
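A minimal sketch of the texture optimization loop, where only the texture image is learnable and the geometry stays fixed; `render_textured_mesh`, `sample_random_camera`, and `encode_images` are hypothetical helpers standing in for the differentiable mesh renderer (e.g. PyTorch3D), the camera sampler, and the VAE encoder, and `sds_loss` refers to the sketch in 2.1:

```python
import torch

# Learnable texture image; vertices, faces, and UVs of `mesh` stay fixed.
texture = torch.nn.Parameter(torch.rand(1, 512, 512, 3, device='cuda'))
optimizer = torch.optim.Adam([texture], lr=1e-2)

for step in range(2000):
    camera = sample_random_camera()                      # hypothetical pose sampler
    image = render_textured_mesh(mesh, texture, camera)  # hypothetical differentiable render, (1, 3, H, W)
    latents = encode_images(image)                       # VAE encoding to latent space
    loss = sds_loss(latents, unet, alphas_cumprod, text_emb, uncond_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```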

2.3 NeRF Optimization

Note: Used default parameters.

Rendered depth and RGB results for different prompts (depth map on left, RGB on right):

Prompt: "a standing corgi dog"
Prompt: "a purple cat"
Prompt: "a cactus plant"
Prompt: "a pumpkin"
Prompt: "a wooden apple"
Prompt: "a panda smiling"

2.4 Extensions

2.4.1 View-dependent Text Embedding

Rendered depth and RGB results using view-dependent text embeddings (depth map on left, RGB on right):

Note: Used default parameters.

Incorporating view-dependent text embeddings leads to noticeably more stable and coherent 3D reconstructions. Objects exhibit fewer inconsistent faces and smoother silhouettes, as the model learns to associate each viewing direction (front, side, back) with more appropriate visual cues.
Prompt: "a cactus plant"
Prompt: "a wooden apple"

2.4.3 Variant Implementation of the SDS Loss

I also implemented the SDS loss directly in pixel space, combining an L2 loss with an LPIPS perceptual loss for sharper results. This variant trains slightly longer but yields slightly improved image quality.
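A minimal sketch of one way to realize this pixel-space variant (not the exact code used here): the noised latent's one-step denoised estimate is decoded back to an image and used as a detached target for the L2 + LPIPS losses against the render; the lpips package, the guidance scale, and the 0.1 LPIPS weight are assumptions.

```python
import torch
import lpips

lpips_fn = lpips.LPIPS(net='vgg').cuda()  # perceptual distance in pixel space

def pixel_space_sds(image, vae, unet, alphas_cumprod, text_emb, uncond_emb,
                    guidance_scale=100.0, lambda_lpips=0.1):
    """image: (B, 3, 512, 512) rendered image in [0, 1]; gradients flow through `image` only."""
    with torch.no_grad():
        latents = vae.encode(image * 2 - 1).latent_dist.sample() * 0.18215
        B = latents.shape[0]
        t = torch.randint(20, 980, (B,), device=latents.device)
        noise = torch.randn_like(latents)
        alpha_t = alphas_cumprod[t].view(B, 1, 1, 1)
        noisy = alpha_t.sqrt() * latents + (1 - alpha_t).sqrt() * noise

        # Classifier-free guidance on the noise prediction.
        eps_c = unet(noisy, t, encoder_hidden_states=text_emb).sample
        eps_u = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        eps = eps_u + guidance_scale * (eps_c - eps_u)

        # One-step estimate of the clean latent, decoded to a pixel-space target.
        latents_0 = (noisy - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
        target = (vae.decode(latents_0 / 0.18215).sample / 2 + 0.5).clamp(0, 1)

    l2 = torch.nn.functional.mse_loss(image, target)
    perceptual = lpips_fn(image * 2 - 1, target * 2 - 1).mean()  # LPIPS expects inputs in [-1, 1]
    return l2 + lambda_lpips * perceptual
```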

Prompt: "a wooden apple"

Latent-space SDS
Pixel-space SDS (L2 + LPIPS)