Assignment 4
3D Gaussian Splatting & Diffusion-Guided Optimization
By Lamia AlSalloom (lalsallo)
1. 3D Gaussian Splatting
1.1 3D Gaussian Rasterization (35 pts)
1.1.2 Evaluate 2D Gaussians
Running the Unit Test: !python unit_test_gaussians.py
Unit Test output: [4/4] Tests Passed
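For reference, a minimal sketch of what evaluating a 2D Gaussian at a pixel involves, assuming a 2×2 positive-definite covariance. The function names are illustrative and are not taken from the assignment code, which operates on batched tensors.

```python
import math

def gaussian_2d_power(point, mean, cov):
    """Return -0.5 * d^T cov^{-1} d, where d = point - mean (hypothetical helper)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    quad = (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return -0.5 * quad

def gaussian_2d_weight(point, mean, cov):
    """Unnormalized Gaussian falloff in (0, 1], equal to 1 at the mean."""
    return math.exp(gaussian_2d_power(point, mean, cov))
```

The weight is typically multiplied by the splat's opacity to obtain a per-pixel alpha.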
1.1.5 Splatting
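As an illustration of the splatting step, the sketch below composites depth-sorted splats at a single pixel using standard front-to-back alpha blending. It assumes the per-splat alphas have already been computed; the function name is hypothetical, and the actual implementation works on whole image tensors.

```python
def composite_front_to_back(colors, alphas):
    """Blend RGB colors sorted near-to-far; returns (pixel, remaining transmittance)."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

The remaining transmittance can be used to blend in a background color.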
1.2 Training 3D Gaussian Representations (15 pts)
Hyperparameters & Metrics
| Parameter | Value |
|---|---|
| Learning rates | means = 1e-4, opacity = 1e-3, colors = 1e-3, cov = 1e-3 (isotropic scale) |
| Iterations | 1000 |
| PSNR / SSIM | 27.115 / 0.910 |
| Optimizer | Adam (param groups as above, eps = 1e-15) |
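For context on the PSNR figure, the metric is derived from the mean squared error between the rendered and ground-truth images. The sketch below assumes pixel values in [0, 1] stored as flat lists; the function name is illustrative and the evaluation script may compute it differently.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```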
Training Progress
Final Renders
1.3 Extensions
1.3.1 Spherical Harmonics (10 pts)
Enabled SH in model.py / data_utils.py ([Q 1.3.1] tags). Implemented colours_from_spherical_harmonics.
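To illustrate what view-dependent colour from spherical harmonics means, here is a minimal sketch restricted to degrees 0 and 1. It is not the `colours_from_spherical_harmonics` implementation: the function name `sh_to_rgb`, the sign convention of the degree-1 basis, the `+0.5` offset, and the clamp to [0, 1] are assumptions based on a common 3D Gaussian Splatting convention.

```python
import math

# Real spherical harmonics constants for degrees 0 and 1.
SH_C0 = 0.5 * math.sqrt(1.0 / math.pi)      # ~0.2821
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))    # ~0.4886

def sh_to_rgb(sh_coeffs, view_dir):
    """Evaluate an RGB colour from 4 SH coefficient triples (DC + 3 degree-1 terms).

    view_dir is the camera-to-Gaussian direction; it is normalized here.
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Assumed basis ordering and signs: (1, -y, z, -x).
    basis = (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)
    rgb = []
    for ch in range(3):
        value = sum(b * coeff[ch] for b, coeff in zip(basis, sh_coeffs))
        rgb.append(min(max(value + 0.5, 0.0), 1.0))  # assumed offset and clamp
    return rgb
```

With only the DC term non-zero the colour is the same from every direction, which corresponds to the "without SH" setting compared below.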
With SH — GIF
Without SH — GIF
Side by Side Comparisons
Top row: without SH (blue border). Bottom row: with SH (purple border). Each row shows views 004, 013, and 031.
View 004: Without SH, the lighting appears uniform and slightly washed out, producing a flat, matte look. With SH, the render shows subtle shading variations across the seat and handles, giving a better sense of curvature and depth. Highlights shift slightly with view direction, making the metallic edges appear more realistic.
View 013: The DC-only (without-SH) image shows consistent brightness across all surfaces, while the SH-based version produces stronger contrast between lit and shadowed regions. The inner seat area darkens naturally and the armrest picks up a faint reflective tint, showing view-dependent lighting. Minor shimmering is visible at the edges, a common SH artifact from higher-order terms.
View 031: Without SH, the shadowing on the seat appears overly dark and static. It looks baked into the texture rather than responding to light direction, with a sharp, fixed shadow edge that gives the surface a flat, painted look. The with-SH version distributes illumination more naturally: the top surface catches light while the lower regions remain softly shaded, creating a smoother gradient and a more physically plausible light falloff.
1.3.2 Harder Scene (10 pts)
Trained on the NeRF-Synthetic materials dataset with random initialization (init_type="random"), using isotropic Gaussians, an L1 image loss, and the Adam optimizer with per-parameter learning rates.
Training Configuration
| Parameter | Value |
|---|---|
| Learning rates | opacity lr=2e-3, scales lr=2e-3, colours lr=2e-3, means lr=1e-3 |
| Optimizer | Adam (eps=1e-15, no LR schedule) |
| gaussians_per_splat | -1 (auto) |
| Loss | L1 |
Baseline — 2000 iters
- Gaussians: Isotropic
- Init: Random sphere
- Iters: 2000
- PSNR / SSIM: 17.323 / 0.592
- Notes: Early blur and floaters due to random init; structure starts forming but textures remain soft.
Improved — 4000 iters
- Gaussians: Isotropic
- Init: Random sphere
- Iters: 4000
- PSNR / SSIM: 18.311 / 0.661
- Changes vs. baseline: Longer training (2k → 4k iters).
- Notes: Sharper object boundaries and fewer artifacts; materials begin to separate across the scene.
We also trained with anisotropic Gaussians (isotropic=False) so that each splat can learn a full 3D covariance (an orientation plus per-axis scales).
Rotations were parameterized as quaternions and optimized alongside the other parameters.
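As a sketch of how a quaternion and per-axis scales define an anisotropic covariance, the code below builds Σ = R S Sᵀ Rᵀ for a single Gaussian. The function names and the (w, x, y, z) quaternion ordering are assumptions for illustration; the training code operates on batched tensors.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix; normalizes q first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance_3d(q, scales):
    """Sigma = R S S^T R^T with S = diag(scales); symmetric positive semi-definite."""
    R = quat_to_rotmat(q)
    return [
        [sum(R[i][k] * scales[k] ** 2 * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

In the isotropic case all three scales are equal, so the rotation has no effect and the covariance reduces to a scaled identity.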
| Setting | Value |
|---|---|
| Geometry | Anisotropic Gaussians (isotropic=False) |
| Loss / Optimizer | L1 image loss; Adam (eps=1e-15) |
| Learning rates | opacity lr=2e-3, scales lr=2e-3, colours lr=2e-3, means lr=1e-3 |
| gaussians_per_splat | -1 (auto) |
Anisotropic — 2000 iters
| Parameter | Value / Notes |
|---|---|
| Iters | 2000 |
| PSNR / SSIM | 17.612 / 0.660 |
| Notes | Training converges noticeably faster than in the isotropic baseline. The shapes emerge earlier in the optimization, with reduced blur and clearer geometric boundaries. Although some edges remain imperfect, the overall structure forms more quickly and material details become sharper. |
2. Diffusion-Guided Optimization
2.1 SDS Loss + Image Optimization (20 pts)
Each row shows the same text prompt — left: No Guidance, right: With Guidance.
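Assuming "guidance" here refers to classifier-free guidance, the sketch below shows how the guided noise prediction and the resulting SDS gradient with respect to the latent are typically formed. The function names, the flat-list representation, and the scalar `weight` are illustrative assumptions, not the assignment's implementation.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Push the unconditional noise prediction toward the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def sds_gradient(eps_pred, eps_true, weight=1.0):
    """SDS gradient w.r.t. the latent: weight * (predicted noise - injected noise)."""
    return [weight * (p - e) for p, e in zip(eps_pred, eps_true)]
```

With a guidance scale of 1 the guided prediction equals the conditional one; larger scales amplify the difference between the conditional and unconditional predictions.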
"a hamburger" — No Guidance (2000 iters)
"a hamburger" — With Guidance (2000 iters)
"a standing corgi dog" — No Guidance (2000 iters)
"a standing corgi dog" — With Guidance (2000 iters)
"a red sports car" — No Guidance (2000 iters)
"a red sports car" — With Guidance (2000 iters)
"a cozy mountain cabin" — No Guidance (2000 iters)
"a cozy mountain cabin" — With Guidance (2000 iters)
2.2 Texture Map Optimization for Mesh (15 pts)
We optimized a ColorField over the cow mesh using SDS, with cameras randomized at every iteration.
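A minimal sketch of per-iteration camera randomization, assuming cameras are sampled on a sphere around the mesh with a y-up convention. The function name, the distance, and the elevation range are hypothetical values chosen for illustration.

```python
import math
import random

def sample_camera_position(rng, distance=3.0, elev_range=(-10.0, 40.0)):
    """Sample a camera position on a sphere; returns (position, azimuth, elevation)."""
    azim = rng.uniform(0.0, 360.0)
    elev = rng.uniform(*elev_range)
    a, e = math.radians(azim), math.radians(elev)
    position = (
        distance * math.cos(e) * math.sin(a),
        distance * math.sin(e),  # y is up (assumption)
        distance * math.cos(e) * math.cos(a),
    )
    return position, azim, elev
```

Calling this once per optimization step exposes the texture to many viewpoints, so the SDS signal is not fitted to a single view.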
"a dotted black and white cow"
"an orange golden bull"
"a fall tree"
2.3 NeRF Optimization (15 pts)
Tuned lambda_entropy, lambda_orient, and latent_iter_ratio for better geometry; shading starts with a normal-shading warmup and is then randomized.
"a hamburger" — RGB (iters 10000)
"a hamburger" — Depth (iters 10000)
"a rose" — RGB (iters 3000)
"a rose" — Depth (iters 3000)
"a rose" — RGB (iters 10000)
"a rose" — Depth (iters 10000)
"a standing corgi dog" — RGB (iters 10000)
"a standing corgi dog" — Depth (iters 10000)
- Best hyperparameters: λentropy = 1e-4, λorient = 1e-2, latent_iter_ratio = 0.2
- Prompts and training details:

| Prompt | Iterations | λentropy | λorient | latent_iter_ratio |
|---|---|---|---|---|
| "a standing corgi dog" | 10,000 | 1e-4 | 1e-2 | 0.2 |
| "a rose" | 3,000 | 1e-3 | 1e-2 | 0.2 |
| "a hamburger" | 10,000 | 1e-4 | 1e-2 | default (0) |

- Notes: The corgi run used latent_iter_ratio = 0.2 to warm up with normal shading before alternating between lambertian and textureless shading, yielding cleaner geometry and recognizable body contours. The rose required a slightly higher λentropy for color regularization to prevent over-saturation, while the hamburger, trained with default shading, produced the correct global shape but exhibited less fine geometric detail.
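The warmup described in the notes can be sketched as a simple schedule: normal shading for the first `latent_iter_ratio` fraction of iterations, then a random choice between lambertian and textureless shading. The function name and the uniform random choice are assumptions; the actual training loop may weight the modes differently.

```python
import random

def choose_shading(iteration, total_iters, latent_iter_ratio, rng):
    """Pick a shading mode for this iteration (hypothetical helper)."""
    if iteration < latent_iter_ratio * total_iters:
        return "normal"  # warmup phase
    return rng.choice(["lambertian", "textureless"])
```

With latent_iter_ratio = 0.2 and 10,000 iterations, the first 2,000 iterations use normal shading; with the default of 0 there is no warmup.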
2.4 Extensions
2.4.1 View-Dependent Text Embedding (10 pts)
Enabled view-dependent conditioning (--view_dep_text 1) to improve 3D consistency
and lighting coherence across rendered views. We reused the best hyperparameters from
Section 2.3, changing only the view-dependent flag.
- Common hyperparameters: λentropy = 1e-4, λorient = 1e-2, latent_iter_ratio = 0.2, view_dep_text = 1
- Prompts and iterations:

| Prompt | Iterations | Postfix |
|---|---|---|
| "a standing corgi dog" | 10,000 | _corgi_vd |
| "a hamburger" | 10,000 | _hamburger_vd |
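View-dependent conditioning is commonly implemented by appending a view description to the prompt based on the sampled camera angles. The sketch below illustrates the idea; the function name, the suffix wording, and the angle thresholds are assumptions and may differ from what `--view_dep_text 1` does internally.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg, overhead_threshold=60.0):
    """Append a coarse view description chosen from the camera angles (in degrees)."""
    if elevation_deg > overhead_threshold:
        suffix = "overhead view"
    else:
        a = azimuth_deg % 360.0
        if a < 45.0 or a >= 315.0:
            suffix = "front view"
        elif 135.0 <= a < 225.0:
            suffix = "back view"
        else:
            suffix = "side view"
    return f"{prompt}, {suffix}"
```

Telling the diffusion model which side it is looking at discourages it from placing front-facing features, such as a face, on every side of the object.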
“a standing corgi dog” — RGB
“a standing corgi dog” — Depth
“a hamburger” — RGB
“a hamburger” — Depth
Compared with the view-independent runs in Section 2.3, the view-dependent results show stronger 3D consistency and more stable color across camera rotations. The corgi's facial features are becoming more coherent across viewing angles: in the previous section the dog had two mouths, while here they have merged into one, and with more training its volume would probably fill in. The hamburger no longer exhibits hollowness between its layers or duplicated textures across its sides. Lighting also aligns more naturally with viewpoint, giving a subtle sense of depth and material realism.
