16-825 Assignment 4

3D Gaussian Splatting & Diffusion-Guided Optimization

By Lamia AlSalloom (lalsallo)

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 pts)

1.1.2 Evaluate 2D Gaussians

Running the unit test (notebook shell command): `!python unit_test_gaussians.py`

Unit test output: [4/4] Tests Passed
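For reference, evaluating a 2D Gaussian at a pixel reduces to a Mahalanobis distance against the projected 2D covariance. The sketch below is a minimal NumPy version that assumes the unnormalised form used in 3D Gaussian Splatting, where opacity takes over the role of the normalising constant. The function name `evaluate_gaussian_2d` is hypothetical and is not the assignment's actual API.

```python
import numpy as np

def evaluate_gaussian_2d(points, mean, cov):
    """Unnormalised 2D Gaussian exp(-0.5 * d^T Sigma^{-1} d) at each point.

    points: (P, 2) pixel coordinates
    mean:   (2,) projected Gaussian centre
    cov:    (2, 2) projected 2D covariance
    """
    d = points - mean                                  # (P, 2) offsets from the centre
    cov_inv = np.linalg.inv(cov)                       # (2, 2) inverse covariance
    maha = np.einsum("pi,ij,pj->p", d, cov_inv, d)     # squared Mahalanobis distance per point
    return np.exp(-0.5 * maha)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
vals = evaluate_gaussian_2d(pts, np.zeros(2), np.diag([1.0, 4.0]))
```

The centre evaluates to 1, and the points one standard deviation away along each axis, (1, 0) and (0, 2), both evaluate to exp(-0.5).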

1.1.5 Splatting
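Splatting composites the depth-sorted Gaussians front to back. Each pixel's colour is the sum of c_i · α_i · T_i, where T_i = ∏_{j<i}(1 − α_j) is the transmittance, and α_i is a Gaussian's opacity multiplied by its 2D Gaussian value at that pixel. The following is a minimal NumPy sketch of that compositing rule, assuming the Gaussians are already sorted near to far. The function name is hypothetical and does not reflect the assignment's implementation.

```python
import numpy as np

def composite_front_to_back(alphas, colours):
    """alphas:  (N, P) per-Gaussian alpha at each pixel, sorted near-to-far.
    colours: (N, 3) per-Gaussian RGB.
    Returns the (P, 3) composited image and the (P,) accumulated opacity."""
    # Transmittance reaching Gaussian i: product of (1 - alpha_j) over all j < i
    trans = np.cumprod(1.0 - alphas, axis=0)
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)
    weights = alphas * trans          # (N, P) contribution of each Gaussian to each pixel
    image = weights.T @ colours       # (P, 3) weighted sum of colours
    return image, weights.sum(axis=0)

img, acc = composite_front_to_back(np.array([[0.5], [0.5]]),
                                   np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

With two half-transparent Gaussians, red in front and green behind, the pixel receives 0.5 red and 0.25 green, for an accumulated opacity of 0.75.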

1.2 Training 3D Gaussian Representations (15 pts)

Hyperparameters & Metrics

Parameter	Value
Learning rates	means = 1e-4, opacity = 1e-3, colors = 1e-3, cov = 1e-3 (isotropic scale)
Iterations	1000
PSNR / SSIM	27.115 / 0.910
Optimizer	Adam (param groups as above, eps = 1e-15)
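The PSNR figure in the table follows the standard definition 10 · log10(MAX² / MSE) for images in [0, 1]. The sketch below implements that definition and assumes nothing else about the assignment's evaluation script. SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

score = psnr(np.full((4, 4, 3), 0.1), np.zeros((4, 4, 3)))
```

A uniform error of 0.1 gives an MSE of 0.01, which corresponds to 20 dB.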

Training Progress

Final Renders

1.3 Extensions

1.3.1 Spherical Harmonics (10 pts)

Enabled spherical harmonics (SH) in `model.py` and `data_utils.py` (the sections marked with `[Q 1.3.1]` tags) and implemented `colours_from_spherical_harmonics`.
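SH colour evaluation is a dot product between per-Gaussian coefficients and the real SH basis, evaluated at the normalised viewing direction. The sketch below shows degrees 0 and 1 only. It assumes the sign convention and the +0.5 offset used in the original 3D Gaussian Splatting code, with view directions taken from the camera centre to each Gaussian. The function name `colours_from_sh_deg1` is hypothetical and is not the author's `colours_from_spherical_harmonics`. Higher degrees follow the same pattern with additional basis constants.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # degree-0 real SH constant, 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 real SH constant, sqrt(3 / (4 * pi))

def colours_from_sh_deg1(sh, dirs):
    """sh:   (N, 4, 3) coefficients (DC term + three degree-1 terms) per RGB channel.
    dirs: (N, 3) viewing directions (need not be normalised).
    Returns (N, 3) view-dependent colours clamped to [0, 1]."""
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]  # each (N, 1), broadcasts over RGB
    colour = SH_C0 * sh[:, 0]
    colour = colour - SH_C1 * y * sh[:, 1] + SH_C1 * z * sh[:, 2] - SH_C1 * x * sh[:, 3]
    # Assumed 3DGS convention: coefficients are stored relative to a 0.5 grey offset
    return np.clip(colour + 0.5, 0.0, 1.0)
```

With only the DC term set, every direction returns the same colour, which matches the "Without SH" baseline below. A nonzero degree-1 coefficient makes the colour change between opposite viewing directions.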

With SH — GIF

Without SH — GIF

Side by Side Comparisons

Top Row: Without SH (Blue Border) | Bottom Row: With SH (Purple Border)

View 004: In the without-SH version, lighting appears uniform and slightly washed out, producing a flat, matte look. The with-SH render shows subtle shading variations across the seat and handles, giving a better sense of curvature and depth. Highlights shift slightly with view direction, making the metallic edges look more realistic.

View 013: The DC-only image shows uniform brightness across all surfaces, while the SH-based version produces stronger contrast between lit and shadowed regions. The inner seat area darkens naturally and the armrest picks up a faint reflective tint, a sign of view-dependent appearance. Minor shimmering is visible at the edges, a common artifact of the higher-order SH terms.

View 031: In the without-SH render, the shadow on the seat looks overly dark and static, as if baked into the texture rather than changing with the viewing direction. The sharp, fixed shadow edge gives the surface a flat, painted look. The with-SH version distributes illumination more naturally: the top surface catches light while the lower regions remain softly shaded, creating a smoother gradient and a more physically plausible light falloff.

1.3.2 Harder Scene (10 pts)

Trained on the NeRF-Synthetic materials scene with random initialization (init_type="random"), isotropic Gaussians, an L1 image loss, and Adam with per-parameter learning rates.
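Random initialization replaces the dataset point cloud with points sampled inside a sphere around the scene. The sketch below samples uniformly within a ball. It assumes the radius, point count, and colour range shown, none of which come from the assignment, and `random_sphere_init` is a hypothetical name. The starter code's `init_type="random"` may instead sample on the sphere's surface or use different bounds.

```python
import numpy as np

def random_sphere_init(num_points, radius=1.0, seed=0):
    """Sample Gaussian centres uniformly inside a ball, with random initial colours."""
    rng = np.random.default_rng(seed)
    # Uniform directions: normalise isotropic normal samples
    dirs = rng.normal(size=(num_points, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Cube-root of a uniform variable gives uniform density over the ball's volume
    r = radius * rng.random(num_points) ** (1.0 / 3.0)
    means = dirs * r[:, None]
    colours = rng.random((num_points, 3))
    return means, colours

means, colours = random_sphere_init(1000, radius=2.0)
```

Because these centres ignore the true geometry, the early blur and floaters reported for the baseline are expected.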

Training Configuration

Parameter	Value
Learning rates	`opacity lr=2e-3`, `scales lr=2e-3`, `colours lr=2e-3`, `means lr=1e-3`
Optimizer	Adam (`eps=1e-15`, no LR schedule)
gaussians_per_splat	`-1` (auto)
Loss	L1

Baseline — 2000 iters

Gaussians: Isotropic
Init: Random sphere
Iters: 2000
PSNR / SSIM: 17.323 / 0.592
Notes: Blur and floaters from the random initialization remain; the structure starts to form but textures stay soft.

Improved — 4000 iters

Gaussians: Isotropic
Init: Random sphere
Iters: 4000
PSNR / SSIM: 18.311 / 0.661
Changes vs. baseline: Longer training (2k → 4k iters).
Notes: Sharper object boundaries and fewer artifacts; materials begin to separate across the scene.

We also trained with anisotropic Gaussians (isotropic=False) so that each splat can learn a full 3D covariance (a rotation plus per-axis scales). Rotations are parameterized as quaternions and optimized alongside the other parameters.

Setting	Value
Geometry	Anisotropic Gaussians (`isotropic=False`)
Loss / Optimizer	L1 image loss; Adam (`eps=1e-15`)
Learning rates	`opacity lr=2e-3`, `scales lr=2e-3`, `colours lr=2e-3`, `means lr=1e-3`
gaussians_per_splat	`-1` (auto)
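In the anisotropic setting, each covariance is built as Σ = R S Sᵀ Rᵀ, where R comes from the quaternion and S is the diagonal scale matrix. This construction keeps Σ positive semi-definite throughout optimization. The sketch below assumes (w, x, y, z) quaternion order and log-parameterized scales. Both are common choices but may differ from the assignment's code, and the function names are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion in (w, x, y, z) order; q is normalised first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance_3d(quat, log_scales):
    """Sigma = R S S^T R^T with S = diag(exp(log_scales))."""
    M = quat_to_rotmat(quat) @ np.diag(np.exp(log_scales))
    return M @ M.T

h = np.sqrt(0.5)
# 90 degree rotation about z moves the stretched x-axis onto the y-axis
cov_rot = covariance_3d(np.array([h, 0.0, 0.0, h]), np.log(np.array([2.0, 1.0, 1.0])))
```

The isotropic model is the special case with an identity rotation and three equal scales, so each splat is a sphere.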

Anisotropic — 2000 iters

Parameter	Value / Notes
Iters	2000
PSNR / SSIM	`17.612 / 0.660`
Notes	Training converges noticeably faster than the isotropic baseline: at the same 2000 iterations, PSNR / SSIM rise from 17.323 / 0.592 to 17.612 / 0.660, and the SSIM nearly matches the 4000-iteration isotropic run (0.661). Shapes emerge earlier in the optimization, with less blur and clearer geometric boundaries. Some edges remain imperfect, but the overall structure forms more quickly and material details are sharper.

2. Diffusion-Guided Optimization

2.1 SDS Loss + Image Optimization (20 pts)

Each prompt is shown as a pair: the first image is optimized without guidance and the second with guidance.
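SDS noises the current latents to a random timestep, asks the diffusion model to predict that noise, and uses the weighted residual w(t)(ε̂ − ε) as the gradient, without backpropagating through the denoiser. The sketch below assumes the guided prediction is classifier-free guidance, ε̂ = ε_uncond + s(ε_cond − ε_uncond), with the DreamFusion weighting w(t) = 1 − ᾱ_t. It also assumes that "No Guidance" roughly corresponds to using the conditional prediction alone (s = 1). `predict_noise` is a hypothetical stand-in for the UNet, and the surrounding code is illustrative rather than the assignment's implementation.

```python
import numpy as np

def sds_grad(latents, t, alphas_cumprod, predict_noise, guidance_scale, rng):
    """Score Distillation Sampling gradient with respect to `latents`.

    predict_noise(x_t, t, conditional) is a hypothetical stand-in for the diffusion UNet."""
    a_bar = alphas_cumprod[t]
    eps = rng.normal(size=latents.shape)                        # injected noise
    x_t = np.sqrt(a_bar) * latents + np.sqrt(1.0 - a_bar) * eps  # forward diffusion to step t
    eps_uncond = predict_noise(x_t, t, conditional=False)
    eps_cond = predict_noise(x_t, t, conditional=True)
    eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)  # classifier-free guidance
    w = 1.0 - a_bar                                             # timestep weighting w(t)
    return w * (eps_hat - eps)
```

In an autograd framework, a common pattern is to form the surrogate loss `(grad.detach() * latents).sum()`. Its gradient with respect to the latents is exactly `grad`.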

"a hamburger" — No Guidance (2000 iters)

"a hamburger" — With Guidance (2000 iters)

"a standing corgi dog" — No Guidance (2000 iters)

"a standing corgi dog" — With Guidance (2000 iters)

"a red sports car" — No Guidance (2000 iters)

"a red sports car" — With Guidance (2000 iters)

"a cozy mountain cabin" — No Guidance (2000 iters)

a cat wearing sunglasses — With Guidance

"a cozy mountain cabin" — With Guidance (2000 iters)

2.2 Texture Map Optimization for Mesh (15 pts)

We optimized a ColorField over the cow mesh with the SDS loss, sampling a new random camera at each iteration.
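Randomizing the camera each iteration means sampling a viewpoint on a sphere around the mesh and building a look-at rotation toward the origin. The sketch below assumes a y-up world, a fixed radius of 3, and an elevation range of −10° to 45°. All three values are illustrative, and the function name is hypothetical. If the starter code renders with PyTorch3D, `pytorch3d.renderer.look_at_view_transform(dist, elev, azim)` plays the same role, although its matrix conventions differ from this sketch.

```python
import numpy as np

def sample_camera(rng, radius=3.0, elev_range_deg=(-10.0, 45.0)):
    """Random camera on a sphere, looking at the origin.

    Returns (R, eye): R is a camera-to-world rotation whose columns are the camera's
    right, up and backward axes (the camera looks along -z); eye is the camera centre."""
    az = np.deg2rad(rng.uniform(0.0, 360.0))
    el = np.deg2rad(rng.uniform(*elev_range_deg))
    eye = radius * np.array([np.cos(el) * np.sin(az), np.sin(el), np.cos(el) * np.cos(az)])
    forward = -eye / np.linalg.norm(eye)                 # unit vector from camera to origin
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)                        # unit length: right is orthogonal to forward
    R = np.stack([right, up, -forward], axis=1)
    return R, eye
```

Restricting the elevation range keeps the camera away from the poles, where the cross product with the world up vector degenerates.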

"a dotted black and white cow"

"an orange golden bull"

"a fall tree"

2.3 NeRF Optimization (15 pts)

Tuned lambda_entropy, lambda_orient, and latent_iter_ratio for better geometry. Shading used normal mode as a warmup and was then randomized across shading modes.

"a hamburger" — RGB (iters 10000)

"a hamburger" — Depth (iters 10000)

"a rose" — RGB (iters 3000)

"a rose" — Depth (iters 3000)

"a rose" — RGB (iters 10000)

"a rose" — Depth (iters 10000)

"a standing corgi dog" — RGB (iters 10000)

"a standing corgi dog" — Depth (iters 10000)

Best hyperparameters:
λ_entropy = 1×10⁻⁴, λ_orient = 1×10⁻², latent_iter_ratio = 0.2
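The two weights scale regularizers that are added to the SDS loss. The entropy term pushes opacities toward 0 or 1. The orientation term, as in DreamFusion, penalizes visible normals that face away from the camera. The sketch below assumes entropy is computed on per-ray accumulated opacity and orientation on per-sample weights, normals, and camera-to-scene ray directions. The assignment's definitions may differ, and the function names are hypothetical. The total would then be `loss = sds + λ_entropy * entropy_loss(...) + λ_orient * orientation_loss(...)`.

```python
import numpy as np

def entropy_loss(alpha, eps=1e-5):
    """Mean binary entropy of opacities; near zero when each value is close to 0 or 1."""
    a = np.clip(alpha, eps, 1.0 - eps)
    return float(np.mean(-a * np.log(a) - (1.0 - a) * np.log(1.0 - a)))

def orientation_loss(weights, normals, ray_dirs):
    """DreamFusion-style orientation penalty.

    weights:  (R, S) rendering weights per ray sample
    normals:  (R, S, 3) unit normals at the samples
    ray_dirs: (R, 3) unit ray directions pointing from the camera into the scene"""
    cos = np.einsum("rsk,rk->rs", normals, ray_dirs)  # > 0 means the normal faces away
    return float(np.mean(np.sum(weights * np.maximum(cos, 0.0) ** 2, axis=1)))
```

Entropy is largest at α = 0.5 (ln 2), which is why a larger λ_entropy discourages semi-transparent haze.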

Prompts and training details:

Prompt	Iterations	λ_entropy	λ_orient	latent_iter_ratio
"a standing corgi dog"	10,000	1×10⁻⁴	1×10⁻²	0.2
"a rose"	3,000	1×10⁻³	1×10⁻²	0.2
"a hamburger"	10,000	1×10⁻⁴	1×10⁻²	default (0)

Notes:
The corgi run used latent_iter_ratio = 0.2 to warm up with normal shading before alternating between Lambertian and textureless shading, which yielded cleaner geometry and recognizable body contours. The rose, trained for only 3,000 iterations, needed a 10× higher λ_entropy (1×10⁻³). The entropy term pushes opacities toward fully empty or fully occupied, which helped prevent oversaturation. The hamburger, trained with the default shading schedule, produced the correct global shape but less fine geometric detail.
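The shading schedule described in these notes can be expressed as a small function of the training step. The sketch below uses normal shading for the first `latent_iter_ratio` fraction of training and then picks Lambertian or textureless shading at random. The mode names follow the notes, and `choose_shading` is a hypothetical helper rather than the code's actual API, whose real schedule may include other modes.

```python
import random

def choose_shading(step, max_steps, latent_iter_ratio, rng):
    """Normal-shading warmup, then a random choice between the two lit shading modes."""
    if step < latent_iter_ratio * max_steps:
        return "normal"
    return rng.choice(["lambertian", "textureless"])

rng = random.Random(0)
```

With latent_iter_ratio = 0.2 and 10,000 iterations, the first 2,000 steps use normal shading. With a ratio of 0, which corresponds to the hamburger run, the warmup is skipped entirely.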

2.4 Extensions

2.4.1 View-Dependent Text Embedding (10 pts)

Enabled view-dependent conditioning (--view_dep_text 1) to improve 3D consistency and lighting coherence across rendered views. We reused the best hyperparameters from Section 2.3, changing only the view-dependent flag.
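View-dependent conditioning, in the style of DreamFusion, appends a view descriptor to the prompt based on the sampled camera angle. This gives the diffusion model a cue about which side of the object it is seeing. The sketch below assumes that azimuth 0° is the front, an overhead threshold of 60°, and 90°-wide front and back sectors. These thresholds are illustrative, `view_dependent_prompt` is a hypothetical name, and the actual `--view_dep_text` implementation may split angles differently.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg, overhead_thresh=60.0):
    """Append a DreamFusion-style view descriptor to a text prompt."""
    az = azimuth_deg % 360.0
    if elevation_deg > overhead_thresh:
        suffix = "overhead view"
    elif az < 45.0 or az >= 315.0:
        suffix = "front view"
    elif 135.0 <= az < 225.0:
        suffix = "back view"
    else:
        suffix = "side view"
    return f"{prompt}, {suffix}"
```

Without these suffixes, every view is pushed toward the canonical front view. This is one explanation for duplicated features such as the two mouths seen in Section 2.3.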

Common hyperparameters:
λ_entropy = 1×10⁻⁴, λ_orient = 1×10⁻², latent_iter_ratio = 0.2, view_dep_text = 1

Prompts and iterations:

Prompt	Iterations	Postfix
"a standing corgi dog"	10,000	`_corgi_vd`
"a hamburger"	10,000	`_hamburger_vd`

“a standing corgi dog” — RGB

“a standing corgi dog” — Depth

“a hamburger” — RGB

“a hamburger” — Depth

Compared with the view-independent runs in Section 2.3, the view-dependent results show stronger 3D consistency and more stable color across camera rotations. The corgi's facial features are beginning to take shape and stay more coherent across viewing angles. In the previous section the dog has two mouths, while here they have merged into one; with more training, the face would probably fill in its volume. The hamburger no longer shows hollowness between its layers or textures duplicated across its sides. Lighting also follows the viewpoint more naturally, giving a subtle sense of depth and material realism.