16-825 Assignment 4

by ylchen

1 — 3D Gaussian Splatting

1.1 Rasterization & Rendering

The following GIF shows the rendered views of the pre-trained 3D Gaussians (output of render.py).

Q1 render output
q1_render.gif

1.2 Training 3D Gaussian Representations

Training progress and final render GIFs produced by train.py are below.

training progress
q1_training_progress.gif
final renders
q1_training_final_renders.gif

1.3 Extensions — Spherical Harmonics & Harder Scenes

1.3.1 Rendering with Spherical Harmonics

Comparison between view-independent (DC only) and full spherical harmonics renderings.

Without Spherical Harmonics
With Spherical Harmonics

Explanation of differences: with spherical harmonics, each Gaussian's color depends on the viewing direction, so the cushion shows additional shading and shadows that indicate its depth and shape. The render without SH looks flatter because each Gaussian keeps a single color regardless of viewing direction, so there is no shading variation across views.
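
To make the difference concrete, here is a minimal sketch of how a single color channel could be evaluated from SH coefficients up to degree 1. It assumes the sign convention and the 0.5 offset used by common 3D Gaussian Splatting implementations; the function name `sh_to_color`, the `use_sh` flag, and the clamp to [0, 1] are hypothetical and not taken from render.py.

```python
import math

# Real spherical harmonics basis constants for degree 0 and degree 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_to_color(coeffs, view_dir, use_sh=True):
    """Evaluate one color channel from SH coefficients (hypothetical helper).

    coeffs:   [dc, c1, c2, c3], the DC term followed by three degree-1 terms.
    view_dir: (x, y, z) viewing direction; it is normalized inside.
    use_sh:   if False, only the view-independent DC term is used.
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm

    # The DC term is the same for every viewing direction.
    value = SH_C0 * coeffs[0]
    if use_sh:
        # Degree-1 terms change the color as the viewing direction changes.
        value += -SH_C1 * y * coeffs[1] + SH_C1 * z * coeffs[2] - SH_C1 * x * coeffs[3]

    # Assumed convention: shift by 0.5, then clamp to a valid color range.
    return min(max(value + 0.5, 0.0), 1.0)


coeffs = [1.0, 0.2, -0.3, 0.5]
front = sh_to_color(coeffs, (0.0, 0.0, 1.0))
side = sh_to_color(coeffs, (1.0, 0.0, 0.0))
flat_front = sh_to_color(coeffs, (0.0, 0.0, 1.0), use_sh=False)
flat_side = sh_to_color(coeffs, (1.0, 0.0, 0.0), use_sh=False)
```

With `use_sh=False` the two viewing directions give the same value, which matches the flat look of the DC-only render; with the degree-1 terms enabled the values differ per view.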

2 — Diffusion-Guided Optimization (SDS Loss)

2.1 Image Optimization with SDS Loss

Each prompt is optimized once without guidance and once with guidance. Below are the prompt-image pairs for four prompts (including two given prompts).

“a 24k labubu” - no guidance
“a 24k labubu” - with guidance
“mountains” - no guidance
“mountains” - with guidance
“a hamburger” - no guidance
“a hamburger” - with guidance
“a standing corgi dog” - no guidance
“a standing corgi dog” - with guidance
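
The with/without guidance comparison can be sketched as follows, assuming that "guidance" here means classifier-free guidance applied to the predicted noise before forming the SDS gradient. The function names, the plain-list representation of noise, and the weighting argument are hypothetical; this is not the assignment's actual implementation.

```python
def guided_noise(eps_uncond, eps_text, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional noise and toward the text-conditioned noise.
    A scale of 1.0 reduces to the text-conditioned prediction."""
    return [u + guidance_scale * (t - u) for u, t in zip(eps_uncond, eps_text)]


def sds_gradient(eps_pred, eps_true, weight):
    """SDS gradient with respect to the noised image or latent:
    weight * (predicted noise - injected noise), element-wise."""
    return [weight * (p - e) for p, e in zip(eps_pred, eps_true)]


# Toy values standing in for per-element noise predictions.
eps_uncond = [0.0, 0.2]
eps_text = [1.0, 0.4]
eps_true = [0.25, 0.1]

eps_guided = guided_noise(eps_uncond, eps_text, guidance_scale=7.5)
grad = sds_gradient(eps_guided, eps_true, weight=1.0)
```

A larger guidance scale increases the gap between the guided prediction and the injected noise along the text direction, which is one plausible reason the guided results follow the prompt more closely.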

2.2 Texture Map Optimization for Mesh

Final textured meshes for two different text prompts using SDS loss on a fixed cow mesh.

"a dotted black and white cow" GIF
"a pink strawberry pig" GIF

2.3 NeRF Optimization

Rendered RGB and depth videos for three prompts (“a standing corgi dog” and two custom ones). The hyperparameters were tuned to λ_entropy=3e-3, λ_orient=1e-2, and latent_iter_ratio=0.25.

"a 3d render of Frankenstein monster head" RGB
"a 3d render of Frankenstein monster head" depth
"a standing corgi dog" RGB
"a standing corgi dog" depth
"monet style water lily" RGB
"monet style water lily" depth
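
As a rough illustration of how the two tuned weights might enter the objective, the sketch below assumes that λ_entropy scales a binary-entropy penalty on per-ray accumulated opacity and that λ_orient scales an orientation penalty, as in common DreamFusion-style implementations. The source does not confirm this; the function names are hypothetical, and latent_iter_ratio is not modeled here.

```python
import math


def binary_entropy(alpha, eps=1e-6):
    """Binary entropy (in bits) of an opacity value in [0, 1].
    It is lowest near 0 or 1, so penalizing it discourages semi-transparent rays."""
    a = min(max(alpha, eps), 1.0 - eps)  # avoid log(0)
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)


def total_loss(sds_loss, ray_opacities, orient_loss,
               lambda_entropy=3e-3, lambda_orient=1e-2):
    """Combine the SDS term with the two weighted regularizers (assumed form)."""
    entropy = sum(binary_entropy(a) for a in ray_opacities) / len(ray_opacities)
    return sds_loss + lambda_entropy * entropy + lambda_orient * orient_loss


# Toy values: two half-transparent rays give the maximum entropy of 1 bit.
loss = total_loss(sds_loss=1.0, ray_opacities=[0.5, 0.5], orient_loss=2.0)
```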

2.4 Extensions

2.4.1 View-Dependent Text Embedding

Comparison between standard and view-dependent conditioning for two prompts. View-dependent conditioning tells the diffusion model which side of the object is being rendered, which improves 3D consistency.

"monet style water lily" RGB not view-dependent
"monet style water lily" depth not view-dependent
"monet style water lily" RGB view-dependent
"monet style water lily" depth view-dependent
"a standing corgi dog" RGB not view-dependent
"a standing corgi dog" depth not view-dependent
"a standing corgi dog" RGB view-dependent
"a standing corgi dog" depth view-dependent
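
One simple way to build view-dependent conditioning is to append a view description to the prompt based on the sampled camera angles before encoding the text. The sketch below assumes that approach; the function name, the angle thresholds, and the exact suffix wording are hypothetical and may differ from what was used for the results above.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg,
                          overhead_threshold=60.0, half_width=45.0):
    """Append a coarse view description chosen from the camera angles.

    azimuth_deg:   0 is assumed to face the front of the object.
    elevation_deg: angles above overhead_threshold count as looking from above.
    """
    if elevation_deg > overhead_threshold:
        return f"{prompt}, overhead view"

    az = azimuth_deg % 360.0  # wrap into [0, 360)
    if az <= half_width or az >= 360.0 - half_width:
        view = "front view"
    elif abs(az - 180.0) <= half_width:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


front = view_dependent_prompt("a standing corgi dog", azimuth_deg=0.0, elevation_deg=10.0)
back = view_dependent_prompt("a standing corgi dog", azimuth_deg=180.0, elevation_deg=10.0)
```

The resulting string would then be encoded by the text encoder and used as the conditioning for that camera pose, in place of the single view-agnostic prompt.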

An additional view-dependent example:

"a 3D render of a scottish terrier" RGB view-dependent
"a 3D render of a scottish terrier" depth view-dependent