Assignment 4 — 3D Gaussian Splatting

CMU 16‑825 • Vaishnavi Khindkar (vkhindka@andrew.cmu.edu)

1. 3D Gaussian Splatting

1.1 Rendering Pre‑trained Gaussians

We render the pre-trained chair.ply (original 3DGS export). The panel shows RGB, colour-mapped depth (jet), and the silhouette. Cameras are placed on a circle around the object (32 views) with a white background; a short camera-pose sketch is given after the notes below.

Q1.1: RGB / Depth / Mask GIF (DC only)
GIF (DC only, view-independent colours). Placeholder for the generated q1_render.gif.

Command

python render.py --data_path data/chair.ply --out_path output --device cuda --gaussians_per_splat 2048 --img_dim 256

Notes

Views: 32 (azimuth & elevation sweep)
Per-splat: 2048
Background: white for visualization; depth normalized to [5, 7] m for the colormap
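
A minimal sketch of the circular camera path and depth normalization above, assuming PyTorch3D cameras. The 6 m distance and fixed 30° elevation are illustrative values (the actual render also sweeps elevation), and colorize_depth is a hypothetical helper, not part of render.py.

import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

num_views = 32
azim = torch.linspace(0.0, 360.0, num_views + 1)[:-1]   # evenly spaced azimuths
elev = torch.full_like(azim, 30.0)                       # fixed elevation (illustrative)
R, T = look_at_view_transform(dist=6.0, elev=elev, azim=azim)
cameras = FoVPerspectiveCameras(R=R, T=T, device="cuda")

def colorize_depth(depth, d_min=5.0, d_max=7.0):
    # Normalize depth to [0, 1] over the [5, 7] m range before applying the jet colormap.
    return ((depth - d_min) / (d_max - d_min)).clamp(0.0, 1.0)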

1.2 Training 3D Gaussian Representations (Truck)

We train isotropic Gaussians, initialized from an input point cloud, on the toy-truck multi-view dataset; a sketch of the optimization loop follows the hyper-parameter list below.

Training progress — predictions vs GT
Training progress (top: predictions, bottom: ground‑truth). Frames sampled every viz_freq iters.
Final renders on train views
Final renders on training views after optimization.

Setup

Optimizer & Hyper‑parameters

Gaussians: isotropic (fixed quaternions)
LR (means): 1e-3
LR (scales): 5e-3
LR (colours): 1e-2
LR (opacities): 5e-2
Iterations: 2000
Per-splat: 2048
Loss: L1 (masked if alpha provided)
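
A minimal sketch of the optimization loop with the per-parameter learning rates above. The attribute names (means, log_scales, colours, pre_act_opacities), the render(camera) call, and sample_training_view are illustrative assumptions, not the actual Q1/train.py API.

import torch

param_groups = [
    {"params": [gaussians.means],             "lr": 1e-3},
    {"params": [gaussians.log_scales],        "lr": 5e-3},
    {"params": [gaussians.colours],           "lr": 1e-2},
    {"params": [gaussians.pre_act_opacities], "lr": 5e-2},
]
optimizer = torch.optim.Adam(param_groups)

for itr in range(2000):
    camera, gt_rgb, gt_mask = sample_training_view()   # assumed data helper
    pred_rgb = gaussians.render(camera)                # (H, W, 3), composited over 2048-splat chunks
    loss = torch.abs(pred_rgb - gt_rgb)                # L1
    if gt_mask is not None:
        loss = loss * gt_mask                          # masked L1 when alpha is provided
    loss = loss.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()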

Results (held‑out)

PSNR: 31.455
SSIM: 0.955
Command: python Q1/train.py --device cuda --num_itrs 2000 --gaussians_per_splat 2048 --viz_freq 50
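
For reference, PSNR is computed from the mean squared error as below; SSIM comes from any standard implementation (torchmetrics shown as one assumed option).

import torch

def psnr(pred, gt, max_val=1.0):
    # PSNR in dB for images in [0, 1]: 10 * log10(max_val^2 / MSE).
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# SSIM via an off-the-shelf implementation, e.g. (assumed dependency):
# from torchmetrics.functional import structural_similarity_index_measure as ssim
# score = ssim(pred.permute(2, 0, 1)[None], gt.permute(2, 0, 1)[None])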

1.3.1 View‑dependent Colours with Spherical Harmonics

We enable SH-based view-dependent appearance (degree 3) by loading the SH coefficients from chair.ply and evaluating colours per view, and compare against the DC-only (0th-order) rendering.

With SH
With SH (view‑dependent). File path placeholder: output1.3/q1_render_with_sh.gif.
  • SH captures subtle view‑dependent highlights and fabric sheen; DC looks flatter.
  • Edges appear crisper with SH due to better colour fitting under changing viewpoints.
  • Small speculars on metallic trim are better reproduced with SH.
Without SH (DC only)
Without SH (DC only). Same camera path for a fair comparison.
Run: python render.py --data_path data/chair.ply --out_path output1.3 --device cuda --gaussians_per_splat 2048 --img_dim 256
Ensure: the SH blocks in model.py and data_utils.py are enabled
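
A minimal sketch of the per-view SH colour evaluation, shown up to degree 1 for brevity (degree 3 follows the same real-SH pattern). The (N, 16, 3) coefficient layout and the helper name are assumptions; the constants and the +0.5 offset follow the standard 3DGS convention.

import torch

C0, C1 = 0.28209479177387814, 0.4886025119029199   # real-SH constants (3DGS convention)

def eval_sh_colour(sh, dirs):
    # sh: (N, 16, 3) coefficients loaded from chair.ply; dirs: (N, 3) unit
    # viewing directions from the camera centre to each Gaussian.
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    rgb = C0 * sh[:, 0]                                                      # DC (view-independent) term
    rgb = rgb - C1 * y * sh[:, 1] + C1 * z * sh[:, 2] - C1 * x * sh[:, 3]    # degree-1 band
    # ... degree-2 and degree-3 bands follow the same pattern ...
    return (rgb + 0.5).clamp(0.0, 1.0)                                       # 3DGS stores colours offset by -0.5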

1.3.2 Training on a Harder Scene (Baseline)

Baseline: isotropic Gaussians with random initialization on NeRF‑Synthetic materials (128×128). No SH training; same renderer and loss as 1.2.

Harder scene training progress
Training progress GIF (baseline).
Harder scene final renders
Final renders on train views (baseline).

Settings

Init: random points inside the unit sphere, N = 10k
Isotropy: true
Iterations: 3000 (example run)
Per-splat: -1 (one-shot) or 2048 for lower memory
Command: python train_harder_scene.py --data_path ./data/materials --out_path ./output_harder --device cuda --gaussians_per_splat -1 --num_itrs 3000 --viz_freq 50
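
A minimal sketch of the random initialization described above; the initial scale and opacity values are assumptions, not the exact train_harder_scene.py defaults.

import torch

N = 10_000
pts = torch.randn(N, 3)
pts = pts / pts.norm(dim=-1, keepdim=True)      # random directions on the unit sphere
pts = pts * torch.rand(N, 1).pow(1.0 / 3.0)     # radii chosen so points are uniform inside the ball
colours = torch.full((N, 3), 0.5)               # start grey
log_scales = torch.full((N, 1), -3.0)           # isotropic: one (log-)scale per Gaussian
pre_act_opacities = torch.zeros(N)              # sigmoid(0) = 0.5 initial opacity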

Metrics (validation, baseline)

PSNR: 18.412
SSIM: 0.708

2. Diffusion-guided Optimization

We optimize 3D content from text via Score Distillation Sampling (SDS): first optimizing an image directly, then a texture on a fixed mesh, then a NeRF, and finally exploring extensions (view-dependent text embeddings, other 3D representations, pixel-space variants). Guidance comes from Stable Diffusion 2.1, and we apply simple regularizers (alpha entropy, orientation).

SD version: 2.1
Train res schedule: 256 → 384 → 512
AMP: enabled
Regularizers: entropy, orientation

2.1 — SDS Loss + Image Optimization

We implemented SDS in SDS.py, with and without classifier-free guidance; a sketch of the update is given below. Results follow for two prompts ("a hamburger", "a standing corgi dog") under both settings.
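
A minimal sketch of one SDS step, with and without classifier-free guidance. Function and argument names are illustrative rather than the exact SDS.py signatures, and the timestep range and weighting are common choices, not necessarily the ones used here.

import torch
import torch.nn.functional as F

def sds_step(unet, scheduler, latents, text_emb, uncond_emb, guidance_scale=100.0, use_cfg=True):
    # Sample a timestep and noise, and form the noisy latents.
    t = torch.randint(50, 950, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)

    # Predict the noise with the frozen UNet (no gradients through the diffusion model).
    with torch.no_grad():
        if use_cfg:
            noise_pred = unet(torch.cat([noisy] * 2), torch.cat([t] * 2),
                              encoder_hidden_states=torch.cat([uncond_emb, text_emb])).sample
            eps_uncond, eps_cond = noise_pred.chunk(2)
            eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        else:
            eps = unet(noisy, t, encoder_hidden_states=text_emb).sample

    # SDS gradient w(t) * (eps_pred - eps), applied through an equivalent MSE surrogate loss.
    w = (1.0 - scheduler.alphas_cumprod.to(latents.device)[t]).view(-1, 1, 1, 1)
    grad = w * (eps - noise)
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")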

Prompt: “a hamburger”

hamburger without guidance
Without guidance
hamburger with guidance
With guidance

Prompt: “a standing corgi dog”

corgi without guidance
Without guidance
corgi with guidance
With guidance
Notes.
  • Guidance mixes conditional & unconditional noise predictions → higher fidelity, fewer artifacts.
  • We snapshot every 100 iters.

2.2 — Texture Map Optimization for Mesh

We fix the geometry and optimize a per-vertex colour field (rendered with PyTorch3D) using SDS with randomly sampled cameras and lights. Two example prompts are shown below (final turntables).

“black textured mesh”

Final textured mesh (black)
360° turntable — final textured result.

“orange textured mesh”

Final textured mesh (orange)
360° turntable — final textured result.
Implementation sketch (see the code sketch after this list).
  • ColorField: (1, Nv, 3) RGB from vertex xyz; textures via TexturesVertex.
  • Sampling: random azimuth/elevation, optional random lights; render → SDS on the rendered RGB.
  • Runtime: ~13 min / prompt (≈13 GB VRAM)
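
A minimal sketch of the colour field described above, assuming a small MLP from vertex xyz to RGB; the architecture details and the `mesh` object are illustrative, and only the (1, Nv, 3) output and the TexturesVertex usage follow the notes above.

import torch.nn as nn
from pytorch3d.renderer import TexturesVertex

class ColorField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),   # RGB in [0, 1]
        )

    def forward(self, verts):                     # verts: (Nv, 3) vertex positions
        return self.mlp(verts)[None]              # (1, Nv, 3) for TexturesVertex

# Per iteration: predict colours, attach them to the fixed mesh, render from a
# random camera/light, and backpropagate the SDS loss through the rendered RGB.
color_field = ColorField()
colours = color_field(mesh.verts_packed())        # `mesh` is the fixed input Meshes object (assumed loaded)
mesh.textures = TexturesVertex(verts_features=colours)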

2.3 — Text-to-3D with SDS (NeRF)

We optimize a NeRF from text using SDS (SD-2.1). Below we show a 360° RGB render and the corresponding depth video for three prompts.
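
A minimal per-iteration sketch; sample_random_camera, nerf.render, and the VAE encoding path are assumed names following the setup above (sds_step is the sketch from 2.1), not the exact repo API.

def nerf_sds_iteration(nerf, vae, unet, scheduler, text_emb, uncond_emb, optimizer):
    camera = sample_random_camera()              # assumed helper: random azimuth / elevation / radius
    rgb, depth, alpha = nerf.render(camera)      # assumed differentiable renderer; rgb (1, H, W, 3) in [0, 1]
    # Encode the render into SD latents (diffusers VAE convention, inputs in [-1, 1]).
    latents = vae.encode(rgb.permute(0, 3, 1, 2) * 2 - 1).latent_dist.sample() * 0.18215
    loss = sds_step(unet, scheduler, latents, text_emb, uncond_emb)   # sketch from Section 2.1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()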

Prompt: “a standing corgi dog”

RGB
Depth

Prompt: “a hamburger”

RGB
Depth

Prompt: “a yellow rubber duck”

RGB
Depth

2.4 — Extensions (Choose at least one! More than one is extra credit)

2.4.1 View-dependent text embedding

Prompt: “a standing corgi dog”

RGB
Depth

Prompt: “a hamburger”

RGB
Depth

Prompt: “a yellow rubber duck”

RGB
Depth
  • Fewer inconsistent faces, cleaner silhouettes; slightly slower convergence. Compared to the non-view-dependent runs above, the front/back ambiguity is reduced and the object appearance is more consistent across azimuths.
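
A minimal sketch of the view-dependent prompt selection used here: a direction word is appended to the prompt based on the sampled camera, and the matching text embedding is used in the SDS step. The angle thresholds are assumptions in the usual DreamFusion style.

def view_suffix(azim_deg, elev_deg):
    # Map the sampled camera to a direction word (thresholds are assumptions).
    if elev_deg > 60.0:
        return "overhead view"
    a = azim_deg % 360.0
    if a < 45.0 or a >= 315.0:
        return "front view"
    if 135.0 <= a < 225.0:
        return "back view"
    return "side view"

prompt = "a standing corgi dog"
prompt_views = {s: f"{prompt}, {s}" for s in ("front view", "side view", "back view", "overhead view")}
# Each variant is embedded once with the text encoder; at every iteration the
# embedding matching view_suffix(azim, elev) of the sampled camera is passed to the SDS step.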

Training Settings (concise)

Optimizer: Adan (lr 5e-3 × base, wd 2e-5, max_grad_norm 5.0), AMP on
SDS: guidance scale 50–100 (decayed late in training), SD-2.1
Regularizers: entropy λ = 1e-2, orientation λ = 1e-3 (when available)
Rendering: res 256 → 384 → 512; random background / shading after latent warm-up
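
A minimal sketch of the two regularizers listed above, using the weights from the table; the exact formulation in the codebase may differ, and the inputs (alpha, normals, view_dirs, weights) are assumed renderer outputs.

import torch

def alpha_entropy(alpha, eps=1e-5):
    # Binary entropy of the rendered opacity; pushes alpha toward 0 or 1.
    a = alpha.clamp(eps, 1.0 - eps)
    return (-(a * a.log() + (1 - a) * (1 - a).log())).mean()

def orientation_loss(normals, view_dirs, weights):
    # Penalize normals facing away from the camera on visible samples
    # (view_dirs point from the camera toward each sample).
    n_dot_v = (normals * view_dirs).sum(dim=-1)
    return (weights * torch.clamp(n_dot_v, min=0.0) ** 2).mean()

# reg = 1e-2 * alpha_entropy(alpha) + 1e-3 * orientation_loss(normals, view_dirs, weights)
# with alpha, normals, view_dirs, and weights taken from the volume renderer outputs.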