Assignment 4 — 3D Gaussian Splatting

CMU 16‑825 • Vaishnavi Khindkar (vkhindka@andrew.cmu.edu)

1. 3D Gaussian Splatting

1.1 Rendering Pre‑trained Gaussians

We render the pre-trained chair.ply (original 3DGS export). The panel shows RGB, colour-mapped depth (jet), and the silhouette. Cameras are placed on a circle around the object (32 views) with a white background; a short camera-pose sketch is given after the notes below.

Q1.1: RGB / Depth / Mask GIF (DC only)
GIF (DC only, view-independent colours). Placeholder for the generated q1_render.gif.

Command

python render.py --data_path data/chair.ply --out_path output --device cuda --gaussians_per_splat 2048 --img_dim 256

Notes

Views: 32 (azimuth & elevation sweep)
Per-splat: 2048
Background: white for visualization; depth normalized to [5, 7] m for the colormap
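
A minimal sketch of the circular camera path and depth normalization above, assuming PyTorch3D cameras. The 6 m distance and fixed 30° elevation are illustrative values (the actual render also sweeps elevation), and colorize_depth is a hypothetical helper, not part of render.py.

import torch
from pytorch3d.renderer import FoVPerspectiveCameras, look_at_view_transform

num_views = 32
azim = torch.linspace(0.0, 360.0, num_views + 1)[:-1]   # evenly spaced azimuths
elev = torch.full_like(azim, 30.0)                       # fixed elevation (illustrative)
R, T = look_at_view_transform(dist=6.0, elev=elev, azim=azim)
cameras = FoVPerspectiveCameras(R=R, T=T, device="cuda")

def colorize_depth(depth, d_min=5.0, d_max=7.0):
    # Normalize depth to [0, 1] over the [5, 7] m range before applying the jet colormap.
    return ((depth - d_min) / (d_max - d_min)).clamp(0.0, 1.0)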

1.2 Training 3D Gaussian Representations (Truck)

We train isotropic Gaussians, initialized from an input point cloud, on the toy-truck multi-view dataset; a sketch of the optimization loop follows the hyper-parameter list below.

Training progress — predictions vs GT
Training progress (top: predictions, bottom: ground‑truth). Frames sampled every viz_freq iters.
Final renders on train views
Final renders on training views after optimization.

Setup

Optimizer & Hyper‑parameters

Gaussians: isotropic (fixed quaternions)
LR (means): 1e-3
LR (scales): 5e-3
LR (colours): 1e-2
LR (opacities): 5e-2
Iterations: 2000
Per-splat: 2048
Loss: L1 (masked if alpha provided)
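
A minimal sketch of the optimization loop with the per-parameter learning rates above. The attribute names (means, log_scales, colours, pre_act_opacities), the render(camera) call, and sample_training_view are illustrative assumptions, not the actual Q1/train.py API.

import torch

param_groups = [
    {"params": [gaussians.means],             "lr": 1e-3},
    {"params": [gaussians.log_scales],        "lr": 5e-3},
    {"params": [gaussians.colours],           "lr": 1e-2},
    {"params": [gaussians.pre_act_opacities], "lr": 5e-2},
]
optimizer = torch.optim.Adam(param_groups)

for itr in range(2000):
    camera, gt_rgb, gt_mask = sample_training_view()   # assumed data helper
    pred_rgb = gaussians.render(camera)                # (H, W, 3), composited over 2048-splat chunks
    loss = torch.abs(pred_rgb - gt_rgb)                # L1
    if gt_mask is not None:
        loss = loss * gt_mask                          # masked L1 when alpha is provided
    loss = loss.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()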

Results (held‑out)

PSNR: 31.455
SSIM: 0.955
Command: python Q1/train.py --device cuda --num_itrs 2000 --gaussians_per_splat 2048 --viz_freq 50
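
For reference, PSNR is computed from the mean squared error as below; SSIM comes from any standard implementation (torchmetrics shown as one assumed option).

import torch

def psnr(pred, gt, max_val=1.0):
    # PSNR in dB for images in [0, 1]: 10 * log10(max_val^2 / MSE).
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# SSIM via an off-the-shelf implementation, e.g. (assumed dependency):
# from torchmetrics.functional import structural_similarity_index_measure as ssim
# score = ssim(pred.permute(2, 0, 1)[None], gt.permute(2, 0, 1)[None])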

1.3.1 View‑dependent Colours with Spherical Harmonics

We enable SH-based view-dependent appearance (degree 3) by loading the SH coefficients from chair.ply and evaluating colours per view, and compare against the DC-only (0th-order) rendering.

With SH
With SH (view‑dependent). File path placeholder: output1.3/q1_render_with_sh.gif.
  • SH captures subtle view‑dependent highlights and fabric sheen; DC looks flatter.
  • Edges appear crisper with SH due to better colour fitting under changing viewpoints.
  • Small speculars on metallic trim are better reproduced with SH.
Without SH (DC only)
Without SH (DC only). Same camera path for a fair comparison.
Run: python render.py --data_path data/chair.ply --out_path output1.3 --device cuda --gaussians_per_splat 2048 --img_dim 256
Ensure: the SH blocks in model.py and data_utils.py are enabled
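
A minimal sketch of the per-view SH colour evaluation, shown up to degree 1 for brevity (degree 3 follows the same real-SH pattern). The (N, 16, 3) coefficient layout and the helper name are assumptions; the constants and the +0.5 offset follow the standard 3DGS convention.

import torch

C0, C1 = 0.28209479177387814, 0.4886025119029199   # real-SH constants (3DGS convention)

def eval_sh_colour(sh, dirs):
    # sh: (N, 16, 3) coefficients loaded from chair.ply; dirs: (N, 3) unit
    # viewing directions from the camera centre to each Gaussian.
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    rgb = C0 * sh[:, 0]                                                      # DC (view-independent) term
    rgb = rgb - C1 * y * sh[:, 1] + C1 * z * sh[:, 2] - C1 * x * sh[:, 3]    # degree-1 band
    # ... degree-2 and degree-3 bands follow the same pattern ...
    return (rgb + 0.5).clamp(0.0, 1.0)                                       # 3DGS stores colours offset by -0.5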

1.3.2 Training on a Harder Scene (Baseline)

Baseline: isotropic Gaussians with random initialization on NeRF‑Synthetic materials (128×128). No SH training; same renderer and loss as 1.2.

Harder scene training progress
Training progress GIF (baseline).
Harder scene final renders
Final renders on train views (baseline).

Settings

Init: random points inside the unit sphere, N = 10k
Isotropy: true
Iterations: 3000 (example run)
Per-splat: -1 (one-shot) or 2048 for lower memory
Command: python train_harder_scene.py --data_path ./data/materials --out_path ./output_harder --device cuda --gaussians_per_splat -1 --num_itrs 3000 --viz_freq 50
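
A minimal sketch of the random initialization described above; the initial scale and opacity values are assumptions, not the exact train_harder_scene.py defaults.

import torch

N = 10_000
pts = torch.randn(N, 3)
pts = pts / pts.norm(dim=-1, keepdim=True)      # random directions on the unit sphere
pts = pts * torch.rand(N, 1).pow(1.0 / 3.0)     # radii chosen so points are uniform inside the ball
colours = torch.full((N, 3), 0.5)               # start grey
log_scales = torch.full((N, 1), -3.0)           # isotropic: one (log-)scale per Gaussian
pre_act_opacities = torch.zeros(N)              # sigmoid(0) = 0.5 initial opacity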

Metrics (validation, baseline)

PSNR: 18.412
SSIM: 0.708

2. Diffusion-guided Optimization

We optimize 3D content from text via Score Distillation Sampling (SDS): first optimizing an image directly, then a texture on a fixed mesh, then a NeRF, and finally exploring extensions (view-dependent text embeddings, other 3D representations, pixel-space variants). Guidance comes from Stable Diffusion 2.1, and we apply simple regularizers (alpha entropy, orientation).

SD version: 2.1
Train res schedule: 256 → 384 → 512
AMP: enabled
Regularizers: entropy, orientation

2.1 — SDS Loss + Image Optimization

We implemented SDS in SDS.py, with and without classifier-free guidance; a sketch of the update is given below. Results follow for two prompts ("a hamburger", "a standing corgi dog") under both settings.
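
A minimal sketch of one SDS step, with and without classifier-free guidance. Function and argument names are illustrative rather than the exact SDS.py signatures, and the timestep range and weighting are common choices, not necessarily the ones used here.

import torch
import torch.nn.functional as F

def sds_step(unet, scheduler, latents, text_emb, uncond_emb, guidance_scale=100.0, use_cfg=True):
    # Sample a timestep and noise, and form the noisy latents.
    t = torch.randint(50, 950, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)

    # Predict the noise with the frozen UNet (no gradients through the diffusion model).
    with torch.no_grad():
        if use_cfg:
            noise_pred = unet(torch.cat([noisy] * 2), torch.cat([t] * 2),
                              encoder_hidden_states=torch.cat([uncond_emb, text_emb])).sample
            eps_uncond, eps_cond = noise_pred.chunk(2)
            eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        else:
            eps = unet(noisy, t, encoder_hidden_states=text_emb).sample

    # SDS gradient w(t) * (eps_pred - eps), applied through an equivalent MSE surrogate loss.
    w = (1.0 - scheduler.alphas_cumprod.to(latents.device)[t]).view(-1, 1, 1, 1)
    grad = w * (eps - noise)
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")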

Prompt: “a hamburger”

hamburger without guidance
Without guidance
hamburger with guidance
With guidance

Prompt: “a standing corgi dog”

corgi without guidance
Without guidance
corgi with guidance
With guidance
Notes.
  • Guidance mixes conditional & unconditional noise predictions → higher fidelity, fewer artifacts.
  • We snapshot every 100 iters.

2.2 — Texture Map Optimization for Mesh

We fix the geometry and optimize a per-vertex colour field (rendered with PyTorch3D) using SDS with randomly sampled cameras and lights. Two example prompts are shown below (final turntables).

“black textured mesh”

Final textured mesh (black)
360° turntable — final textured result.

“orange textured mesh”

Final textured mesh (orange)
360° turntable — final textured result.
Implementation sketch (see the code sketch after this list).
  • ColorField: (1, Nv, 3) RGB from vertex xyz; textures via TexturesVertex.
  • Sampling: random azimuth/elevation, optional random lights; render → SDS on the rendered RGB.
  • Runtime: ~13 min / prompt (≈13 GB VRAM)
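
A minimal sketch of the colour field described above, assuming a small MLP from vertex xyz to RGB; the architecture details and the `mesh` object are illustrative, and only the (1, Nv, 3) output and the TexturesVertex usage follow the notes above.

import torch.nn as nn
from pytorch3d.renderer import TexturesVertex

class ColorField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),   # RGB in [0, 1]
        )

    def forward(self, verts):                     # verts: (Nv, 3) vertex positions
        return self.mlp(verts)[None]              # (1, Nv, 3) for TexturesVertex

# Per iteration: predict colours, attach them to the fixed mesh, render from a
# random camera/light, and backpropagate the SDS loss through the rendered RGB.
color_field = ColorField()
colours = color_field(mesh.verts_packed())        # `mesh` is the fixed input Meshes object (assumed loaded)
mesh.textures = TexturesVertex(verts_features=colours)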

2.3 — Text-to-3D with SDS (NeRF)

We optimize a NeRF from text using SDS (SD-2.1). Below we show a 360° RGB render and the corresponding depth video for three prompts.
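
A minimal per-iteration sketch; sample_random_camera, nerf.render, and the VAE encoding path are assumed names following the setup above (sds_step is the sketch from 2.1), not the exact repo API.

def nerf_sds_iteration(nerf, vae, unet, scheduler, text_emb, uncond_emb, optimizer):
    camera = sample_random_camera()              # assumed helper: random azimuth / elevation / radius
    rgb, depth, alpha = nerf.render(camera)      # assumed differentiable renderer; rgb (1, H, W, 3) in [0, 1]
    # Encode the render into SD latents (diffusers VAE convention, inputs in [-1, 1]).
    latents = vae.encode(rgb.permute(0, 3, 1, 2) * 2 - 1).latent_dist.sample() * 0.18215
    loss = sds_step(unet, scheduler, latents, text_emb, uncond_emb)   # sketch from Section 2.1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()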

Prompt: “a standing corgi dog”

RGB
Depth

Prompt: “a hamburger”

RGB
Depth

Prompt: “a yellow rubber duck”

RGB
Depth

2.4 — Extensions (Choose at least one! More than one is extra credit)

2.4.1 View-dependent text embedding

Prompt: “a standing corgi dog”

RGB
Depth

Prompt: “a hamburger”

RGB
Depth

Prompt: “a yellow rubber duck”

RGB
Depth
  • Fewer inconsistent faces, cleaner silhouettes; slightly slower convergence. Compared to the non-view-dependent runs above, the front/back ambiguity is reduced and the object appearance is more consistent across azimuths.
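
A minimal sketch of the view-dependent prompt selection used here: a direction word is appended to the prompt based on the sampled camera, and the matching text embedding is used in the SDS step. The angle thresholds are assumptions in the usual DreamFusion style.

def view_suffix(azim_deg, elev_deg):
    # Map the sampled camera to a direction word (thresholds are assumptions).
    if elev_deg > 60.0:
        return "overhead view"
    a = azim_deg % 360.0
    if a < 45.0 or a >= 315.0:
        return "front view"
    if 135.0 <= a < 225.0:
        return "back view"
    return "side view"

prompt = "a standing corgi dog"
prompt_views = {s: f"{prompt}, {s}" for s in ("front view", "side view", "back view", "overhead view")}
# Each variant is embedded once with the text encoder; at every iteration the
# embedding matching view_suffix(azim, elev) of the sampled camera is passed to the SDS step.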

Training Settings (concise)

Optimizer: Adan (lr 5e-3 × base, wd 2e-5, max_grad_norm 5.0), AMP on
SDS: guidance scale 50–100 (decayed late in training), SD-2.1
Regularizers: entropy λ = 1e-2, orientation λ = 1e-3 (when available)
Rendering: res 256 → 384 → 512; random background / shading after latent warm-up
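
A minimal sketch of the two regularizers listed above, using the weights from the table; the exact formulation in the codebase may differ, and the inputs (alpha, normals, view_dirs, weights) are assumed renderer outputs.

import torch

def alpha_entropy(alpha, eps=1e-5):
    # Binary entropy of the rendered opacity; pushes alpha toward 0 or 1.
    a = alpha.clamp(eps, 1.0 - eps)
    return (-(a * a.log() + (1 - a) * (1 - a).log())).mean()

def orientation_loss(normals, view_dirs, weights):
    # Penalize normals facing away from the camera on visible samples
    # (view_dirs point from the camera toward each sample).
    n_dot_v = (normals * view_dirs).sum(dim=-1)
    return (weights * torch.clamp(n_dot_v, min=0.0) ** 2).mean()

# reg = 1e-2 * alpha_entropy(alpha) + 1e-3 * orientation_loss(normals, view_dirs, weights)
# with alpha, normals, view_dirs, and weights taken from the volume renderer outputs.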