16-825 Assignment 4: 3D Gaussian Splatting & Diffusion-Guided Optimization

Q1. 3D Gaussian Splatting (50 pts)

1.1 3D Gaussian Rasterization (35 pts)

Baseline rasterization output (GIF)
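For reference, below is a minimal sketch of the per-pixel alpha-compositing step at the core of the rasterizer, assuming the Gaussians have already been projected to 2D means and covariances and sorted front-to-back by depth (all names are illustrative, not the starter code's API):

```python
import torch

def composite_gaussians(means2d, inv_cov2d, opacities, colors, pixels):
    """Alpha-composite depth-sorted 2D Gaussians at the given pixel centers.

    means2d:   (N, 2)    projected Gaussian centers, sorted front-to-back
    inv_cov2d: (N, 2, 2) inverses of the projected 2D covariances
    opacities: (N, 1)    per-Gaussian opacity in [0, 1]
    colors:    (N, 3)    per-Gaussian RGB
    pixels:    (P, 2)    pixel-center coordinates
    """
    # Evaluate each (unnormalized) Gaussian at every pixel.
    d = pixels[None, :, :] - means2d[:, None, :]              # (N, P, 2)
    maha = torch.einsum('npi,nij,npj->np', d, inv_cov2d, d)   # squared Mahalanobis distance
    alpha = opacities * torch.exp(-0.5 * maha)                # (N, P) per-Gaussian alpha

    # Front-to-back compositing: T_i = prod_{j < i} (1 - alpha_j).
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha[:-1]], dim=0), dim=0)
    weights = alpha * transmittance                           # (N, P)
    return torch.einsum('np,nc->pc', weights, colors)         # (P, 3) composited RGB
```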

1.2 Training Custom Gaussians

Training progress (GIF)
Final renders (GIF)

Training Parameters & Results

| Learning Rate (Means) | Learning Rate (Covariances) | Learning Rate (Opacity) | Learning Rate (Color) | Iterations | PSNR | SSIM |
| --- | --- | --- | --- | --- | --- | --- |
| 0.01 | 0.01 | 0.01 | 0.01 | 100 | 28.470 | 0.927 |
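The per-parameter learning rates above map naturally onto PyTorch optimizer parameter groups; a minimal sketch, with hypothetical attribute names standing in for the learnable tensors of the Gaussians class:

```python
import torch

# Hypothetical attribute names; the actual names depend on the Gaussians class in the starter code.
optimizer = torch.optim.Adam([
    {'params': [gaussians.means],            'lr': 0.01},  # learning rate (means)
    {'params': [gaussians.pre_act_scales,
                gaussians.pre_act_quats],    'lr': 0.01},  # learning rate (covariances)
    {'params': [gaussians.pre_act_opacities], 'lr': 0.01}, # learning rate (opacity)
    {'params': [gaussians.colours],          'lr': 0.01},  # learning rate (color)
])
```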

1.3 Extensions

1.3.1 Rendering Using Spherical Harmonics

Qualitative comparison: renders without SH (left) vs. with SH of degree 3 (right).

Spherical harmonics help model view-dependent color effects in the 3D Gaussian splatting representation.
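For context, a minimal sketch of how view-dependent colour can be evaluated from per-Gaussian SH coefficients (degree 1 shown for brevity; the constants follow the standard real SH convention, and the function name is illustrative):

```python
import torch

# Real SH basis constants for degrees 0 and 1 (higher degrees omitted for brevity).
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def sh_to_rgb(sh_coeffs, view_dirs):
    """Evaluate view-dependent colour from SH coefficients (degree <= 1 shown).

    sh_coeffs: (N, 4, 3) per-Gaussian SH coefficients (DC term + 3 linear terms, RGB)
    view_dirs: (N, 3)    unit vectors from the camera to each Gaussian
    """
    x, y, z = view_dirs.unbind(-1)
    rgb = SH_C0 * sh_coeffs[:, 0]
    rgb = rgb - SH_C1 * y[:, None] * sh_coeffs[:, 1] \
              + SH_C1 * z[:, None] * sh_coeffs[:, 2] \
              - SH_C1 * x[:, None] * sh_coeffs[:, 3]
    # Shift so a zero-coefficient Gaussian maps to mid-grey, then clamp to valid RGB.
    return torch.clamp(rgb + 0.5, 0.0, 1.0)
```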

1.3.2 Training On a Harder Scene

Default training (1000 iterations)
Longer training (10000 iterations)
With additional SSIM loss (1000 iterations)
| Configuration | Mean PSNR | Mean SSIM | Iterations | Gaussians per splat |
| --- | --- | --- | --- | --- |
| Default | 17.202 | 0.662 | 1000 | -1 |
| With SSIM loss | 18.929 | 0.809 | 1000 | -1 |
| Longer training | 19.801 | 0.738 | 10000 | -1 |

Observation: a combination of the SSIM loss and longer training can yield better results.
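As a sketch (not the exact code used here), the SSIM term can be combined with an L1 reconstruction loss as in the original 3DGS paper; the pytorch_msssim dependency and the 0.2 weight below are illustrative:

```python
import torch
from pytorch_msssim import ssim  # any differentiable SSIM implementation works

def gaussian_loss(pred, target, lambda_ssim=0.2):
    """L1 reconstruction loss plus a (1 - SSIM) term.

    pred, target: (B, 3, H, W) images in [0, 1]
    """
    l1 = torch.abs(pred - target).mean()
    ssim_term = 1.0 - ssim(pred, target, data_range=1.0)
    return (1.0 - lambda_ssim) * l1 + lambda_ssim * ssim_term
```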

Q2. Diffusion-Guided Optimization

2.1 SDS Loss + Image Optimization

Comparison of SDS results with and without classifier-free guidance
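For reference, below is a minimal sketch of the SDS update with classifier-free guidance (CFG), assuming a Stable Diffusion UNet in the diffusers style and precomputed conditional/unconditional text embeddings; the function name, the guidance scale of 100, and the timestep range are illustrative rather than the exact settings used for the results that follow.

```python
import torch

def sds_loss(latents, unet, alphas_cumprod, text_emb, uncond_emb,
             guidance_scale=100.0, t_range=(20, 980)):
    """Score Distillation Sampling on the latents of the rendered image.

    latents: (B, 4, 64, 64) VAE latents of the current render, requires_grad=True
    """
    B = latents.shape[0]
    t = torch.randint(t_range[0], t_range[1], (B,), device=latents.device)
    noise = torch.randn_like(latents)
    alpha_t = alphas_cumprod[t].view(B, 1, 1, 1)
    noisy = alpha_t.sqrt() * latents + (1 - alpha_t).sqrt() * noise

    with torch.no_grad():
        # One UNet pass each for the conditional and unconditional branches.
        eps_cond = unet(noisy, t, encoder_hidden_states=text_emb).sample
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        # Classifier-free guidance: push the prediction away from the unconditional score.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1.0 - alpha_t                         # DreamFusion weighting w(t)
    grad = w * (eps - noise)
    # Surrogate loss whose gradient w.r.t. `latents` equals `grad` (target is detached).
    target = (latents - grad).detach()
    return 0.5 * torch.nn.functional.mse_loss(latents, target, reduction='sum') / B
```

Because the target is detached, backpropagation passes only w(t)·(ε̂ − ε) to the latents, which is exactly the SDS gradient; setting guidance_scale to 0 (or 1, depending on convention) recovers the "no guidance" behaviour compared below.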

No guidance
With guidance

Prompt: "a standing corgi dog"

No guidance
With guidance

Prompt: "a Formula 1 car"

No guidance
With guidance

Prompt: "a hamburger"

No guidance
With guidance

Prompt: "a lego toy of batman"

2.2 Texture Map Optimization for Mesh

Prompt: "a cow made of cotton candy"
Prompt: "a cow made of grass"
Prompt: "a cow made of titanium"
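A minimal sketch of the texture optimization loop, where only the texture image is learnable and the geometry stays fixed; `render_textured_mesh`, `sample_random_camera`, and `encode_images` are hypothetical helpers standing in for the differentiable mesh renderer (e.g. PyTorch3D), the camera sampler, and the VAE encoder, and `sds_loss` refers to the sketch in 2.1:

```python
import torch

# Learnable texture image; vertices, faces, and UVs of `mesh` stay fixed.
texture = torch.nn.Parameter(torch.rand(1, 512, 512, 3, device='cuda'))
optimizer = torch.optim.Adam([texture], lr=1e-2)

for step in range(2000):
    camera = sample_random_camera()                      # hypothetical pose sampler
    image = render_textured_mesh(mesh, texture, camera)  # hypothetical differentiable render, (1, 3, H, W)
    latents = encode_images(image)                       # VAE encoding to latent space
    loss = sds_loss(latents, unet, alphas_cumprod, text_emb, uncond_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```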

2.3 NeRF Optimization

Note: Used default parameters.

Rendered depth and RGB results for different prompts (depth map on left, RGB on right):

Prompt: "a standing corgi dog"
Prompt: "a purple cat"
Prompt: "a cactus plant"
Prompt: "a pumpkin"
Prompt: "a wooden apple"
Prompt: "a panda smiling"

2.4 Extensions

2.4.1 View-dependent Text Embedding

Rendered depth and RGB results using view-dependent text embeddings (depth map on left, RGB on right):

Note: Used default parameters.

Incorporating view-dependent text embeddings leads to noticeably more stable and coherent 3D reconstructions. Objects exhibit fewer inconsistent faces and smoother silhouettes, as the model learns to associate each viewing direction (front, side, back) with more appropriate visual cues.
Prompt: "a cactus plant"
Prompt: "a wooden apple"

2.4.3 Variant Implementation of the SDS Loss

I also implemented the SDS loss directly in pixel space, combining an L2 loss with an LPIPS perceptual loss for sharper results. This variant trains slightly longer but yields slightly improved image quality.
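A minimal sketch of one way to realize this pixel-space variant (not the exact code used here): the noised latent's one-step denoised estimate is decoded back to an image and used as a detached target for the L2 + LPIPS losses against the render; the lpips package, the guidance scale, and the 0.1 LPIPS weight are assumptions.

```python
import torch
import lpips

lpips_fn = lpips.LPIPS(net='vgg').cuda()  # perceptual distance in pixel space

def pixel_space_sds(image, vae, unet, alphas_cumprod, text_emb, uncond_emb,
                    guidance_scale=100.0, lambda_lpips=0.1):
    """image: (B, 3, 512, 512) rendered image in [0, 1]; gradients flow through `image` only."""
    with torch.no_grad():
        latents = vae.encode(image * 2 - 1).latent_dist.sample() * 0.18215
        B = latents.shape[0]
        t = torch.randint(20, 980, (B,), device=latents.device)
        noise = torch.randn_like(latents)
        alpha_t = alphas_cumprod[t].view(B, 1, 1, 1)
        noisy = alpha_t.sqrt() * latents + (1 - alpha_t).sqrt() * noise

        # Classifier-free guidance on the noise prediction.
        eps_c = unet(noisy, t, encoder_hidden_states=text_emb).sample
        eps_u = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        eps = eps_u + guidance_scale * (eps_c - eps_u)

        # One-step estimate of the clean latent, decoded to a pixel-space target.
        latents_0 = (noisy - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
        target = (vae.decode(latents_0 / 0.18215).sample / 2 + 0.5).clamp(0, 1)

    l2 = torch.nn.functional.mse_loss(image, target)
    perceptual = lpips_fn(image * 2 - 1, target * 2 - 1).mean()  # LPIPS expects inputs in [-1, 1]
    return l2 + lambda_lpips * perceptual
```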

Prompt: "a wooden apple"

Latent-space SDS
Pixel-space SDS (L2 + LPIPS)