Assignment 4
3D Gaussian Splatting & Diffusion-Guided Optimization

By Lamia AlSalloom (lalsallo)

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 pts)

1.1.2 Evaluate 2D Gaussians

Running the Unit Test: !python unit_test_gaussians.py

Unit Test output: [4/4] Tests Passed
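For reference, the quantity evaluated in this step can be sketched in a few lines of NumPy. This is a minimal illustration, not the assignment's implementation: the function name `evaluate_gaussian_2d` and the unnormalized form (no 1/(2π√|Σ|) factor) are assumptions.

```python
import numpy as np

def evaluate_gaussian_2d(points, mean, cov):
    """Return exp(-0.5 * d^T cov^{-1} d) for each 2D point, with d = point - mean."""
    points = np.asarray(points, dtype=np.float64)   # (N, 2) pixel coordinates
    mean = np.asarray(mean, dtype=np.float64)       # (2,) projected Gaussian centre
    cov = np.asarray(cov, dtype=np.float64)         # (2, 2) projected covariance
    diff = points - mean
    cov_inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every point to the mean
    maha = np.einsum("ni,ij,nj->n", diff, cov_inv, diff)
    return np.exp(-0.5 * maha)

pixels = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
weights = evaluate_gaussian_2d(pixels, mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 4.0]])
```

The second and third pixels sit exactly one standard deviation from the mean along their respective axes, so both receive the same weight.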

1.1.5 Splatting

Q1 Render GIF
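The splatting step blends depth-sorted Gaussians per pixel. The sketch below shows standard front-to-back alpha compositing for a single pixel; the function name and array layout are hypothetical, and the assignment code may organize the computation differently (for example, batched over all pixels).

```python
import numpy as np

def composite_front_to_back(colours, alphas):
    """Blend N splats covering one pixel, sorted from nearest to farthest."""
    colours = np.asarray(colours, dtype=np.float64)  # (N, 3) RGB per splat
    alphas = np.asarray(alphas, dtype=np.float64)    # (N,) opacity * 2D Gaussian weight
    # Transmittance in front of splat i: product of (1 - alpha) over all nearer splats
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * transmittance
    pixel = (weights[:, None] * colours).sum(axis=0)
    return pixel, weights

# A half-opaque red splat in front of a half-opaque green splat
pixel, weights = composite_front_to_back(
    colours=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    alphas=[0.5, 0.5],
)
```

The nearer splat contributes half of the pixel colour, and the farther one only a quarter because half of the light is already blocked.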

1.2 Training 3D Gaussian Representations (15 pts)

Hyperparameters & Metrics

Parameter        Value
Learning rates   means = 1e-4, opacity = 1e-3, colors = 1e-3, cov = 1e-3 (isotropic scale)
Iterations       1000
PSNR / SSIM      27.115 / 0.910
Optimizer        Adam (param groups as above, eps = 1e-15)
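To make the PSNR figure easier to interpret, here is a minimal sketch of the usual definition, assuming images scaled to [0, 1]. The helper name `psnr` is an assumption, and the evaluation script used for the numbers above may differ in details such as averaging across views.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 per pixel gives MSE = 0.01
example = psnr(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.1))
```

Under this definition, a PSNR of about 27 dB corresponds to a mean squared error of roughly 1.9e-3 on [0, 1] images.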



Training Progress (GIF)

Final Renders (GIF)

1.3 Extensions

1.3.1 Spherical Harmonics (10 pts)

Enabled SH in model.py / data_utils.py ([Q 1.3.1] tags). Implemented colours_from_spherical_harmonics.
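As an illustration of what an SH colour evaluation involves, the sketch below evaluates only the DC and degree-1 terms. It follows the sign convention and the +0.5 offset used in the reference 3D Gaussian Splatting code, which is assumed (not confirmed) to match this assignment; the name `colours_from_sh_deg1` and the coefficient layout are hypothetical.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # degree-0 (DC) basis constant
SH_C1 = 0.4886025119029199   # degree-1 basis constant

def colours_from_sh_deg1(sh, view_dirs):
    """sh: (N, 4, 3) coefficients per Gaussian; view_dirs: (N, 3) camera-to-Gaussian."""
    sh = np.asarray(sh, dtype=np.float64)
    d = np.asarray(view_dirs, dtype=np.float64)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    x, y, z = d[:, 0:1], d[:, 1:2], d[:, 2:3]
    colour = SH_C0 * sh[:, 0]  # view-independent base colour
    # Degree-1 terms add a linear dependence on the viewing direction
    colour = colour - SH_C1 * y * sh[:, 1] + SH_C1 * z * sh[:, 2] - SH_C1 * x * sh[:, 3]
    return np.maximum(colour + 0.5, 0.0)

sh = np.zeros((2, 4, 3))
sh[:, 0] = 1.0   # same DC colour for both Gaussians
sh[:, 2] = 0.5   # non-zero z-dependent coefficient
dirs = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
view_colours = colours_from_sh_deg1(sh, dirs)
```

With all degree-1 coefficients set to zero, the output no longer depends on the direction, which corresponds to the "without SH" (DC-only) renders.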

With SH (GIF)

Without SH (GIF)


Side-by-Side Comparisons

Top row: without SH (blue border)    |    Bottom row: with SH (purple border)

Without SH: View 004, View 013, View 031

With SH: View 004, View 013, View 031

View 004: Without SH, the lighting appears uniform and slightly washed out, producing a flat, matte look. The with-SH render shows subtle shading variations across the seat and handles, giving a better sense of curvature and depth. Highlights shift slightly with view direction, making the metallic edges appear more realistic.

View 013: The DC-only (without-SH) image shows consistent brightness across all surfaces, while the SH-based version produces stronger contrast between lit and shadowed regions. The inner seat area darkens naturally and the armrest picks up a faint reflective tint, both signs of view-dependent lighting. Minor shimmering can be observed at the edges, likely an artifact of the higher-order SH terms.

View 031: In the without-SH render, the shadowing on the seat appears overly dark and static. It looks baked into the texture rather than responding to light direction, which gives the surface a flat, painted look; note the sharp, fixed shadow edge. The with-SH version distributes illumination more naturally: the top surface catches light while the lower regions remain softly shaded, creating a smoother gradient and a more physically plausible light falloff.



1.3.2 Harder Scene (10 pts)

Training on NeRF-Synthetic materials dataset with random init (init_type="random"). Isotropic Gaussians; L1 image loss; Adam optimizer with per-parameter learning rates.

Training Configuration

Parameter             Value
Learning rates        opacity lr=2e-3, scales lr=2e-3, colours lr=2e-3, means lr=1e-3
Optimizer             Adam (eps=1e-15, no LR schedule)
gaussians_per_splat   -1 (auto)
Loss                  L1
2k Progress GIF
2k Final Renders GIF

Baseline — 2000 iters

  • Gaussians: Isotropic
  • Init: Random sphere
  • Iters: 2000
  • PSNR / SSIM: 17.323 / 0.592
  • Notes: Early blur and floaters due to random init; structure starts forming but textures remain soft.
4k Progress GIF
4k Final Renders GIF

Improved — 4000 iters

  • Gaussians: Isotropic
  • Init: Random sphere
  • Iters: 4000
  • PSNR / SSIM: 18.311 / 0.661
  • Changes vs. baseline: Longer training (2k → 4k iters).
  • Notes: Sharper object boundaries and fewer artifacts; materials begin to separate across the scene.

We also trained with anisotropic Gaussians (isotropic=False) so that each splat can learn a full 3D covariance (a rotation plus per-axis scales). Rotations were parameterized as quaternions and optimized alongside the other parameters.
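The relationship between a quaternion, per-axis scales, and the resulting covariance can be sketched as Σ = R S Sᵀ Rᵀ. The helper names below and the (w, x, y, z) quaternion ordering are assumptions; the assignment code may use a different ordering or store log-scales.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=np.float64) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance_3d(quat, scales):
    """Build Sigma = R S S^T R^T from a rotation and per-axis standard deviations."""
    R = quat_to_rotmat(quat)
    S = np.diag(np.asarray(scales, dtype=np.float64))
    M = R @ S
    return M @ M.T

# Rotating by 90 degrees about z swaps the x and y extents
half = np.sqrt(0.5)
cov_rotated = covariance_3d([half, 0.0, 0.0, half], [1.0, 2.0, 3.0])
# With equal scales the rotation has no effect (isotropic case)
cov_iso = covariance_3d([half, 0.0, 0.0, half], [2.0, 2.0, 2.0])
```

This factorization keeps the covariance symmetric and positive semi-definite by construction, which is why rotation and scale are optimized instead of the raw matrix entries.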

Setting               Value
Geometry              Anisotropic Gaussians (isotropic=False)
Loss / Optimizer      L1 image loss; Adam (eps=1e-15)
Learning rates        opacity lr=2e-3, scales lr=2e-3, colours lr=2e-3, means lr=1e-3
gaussians_per_splat   -1 (auto)
Anisotropic Progress GIF
Anisotropic Final Renders GIF

Anisotropic — 2000 iters

  • Iters: 2000
  • PSNR / SSIM: 17.612 / 0.660
  • Notes: Training converges noticeably faster than in the isotropic baseline: at the same 2000 iterations, PSNR rises from 17.323 to 17.612 and SSIM from 0.592 to 0.660. Shapes emerge earlier in the optimization, with reduced blur and clearer geometric boundaries. Although some edges remain imperfect, the overall structure forms more quickly and material details become sharper.

2. Diffusion-Guided Optimization

2.1 SDS Loss + Image Optimization (20 pts)

Each row shows the same text prompt — left: No Guidance, right: With Guidance.
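The difference between the two columns can be sketched with random arrays standing in for the diffusion model's noise predictions. This is an illustration only: it assumes "No Guidance" means using the text-conditioned prediction alone (guidance scale 1), the function names are hypothetical, and the guidance scale of 100 and step size of 0.1 are placeholder values rather than the settings used here.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def sds_gradient(eps_pred, eps_true, w_t=1.0):
    """SDS gradient w.r.t. the latents: weighted residual between predicted and added noise."""
    return w_t * (eps_pred - eps_true)

rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 8, 8))
eps_true = rng.normal(size=latents.shape)    # noise added to the latents
eps_uncond = rng.normal(size=latents.shape)  # stand-in for the unconditional U-Net output
eps_cond = rng.normal(size=latents.shape)    # stand-in for the text-conditioned output

eps_pred = guided_noise(eps_uncond, eps_cond, guidance_scale=100.0)
grad = sds_gradient(eps_pred, eps_true)
latents_next = latents - 0.1 * grad          # one plain gradient step
```

With a guidance scale of 1 the guided prediction reduces to the conditional one, so the update direction is much less strongly pushed toward the prompt.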

"a hamburger" — No Guidance (2000 iters)

"a hamburger" — With Guidance (2000 iters)

"a standing corgi dog" — No Guidance (2000 iters)

"a standing corgi dog" — With Guidance (2000 iters)

"a red sports car" — No Guidance (2000 iters)

"a red sports car" — With Guidance (2000 iters)

a cat wearing sunglasses — No Guidance

"a cozy mountain cabin" — No Guidance (2000 iters)

a cat wearing sunglasses — With Guidance

"a cozy mountain cabin" — With Guidance (2000 iters)

2.2 Texture Map Optimization for Mesh (15 pts)

We optimized a ColorField over the cow mesh using SDS, sampling a random camera at each iteration.
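Per-iteration camera randomization can be sketched as sampling a viewpoint on a sphere around the mesh and building a look-at frame. The radius, elevation range, and axis convention below are hypothetical choices for illustration, not the values used in this run.

```python
import numpy as np

def sample_camera(rng, radius=3.0, elev_range_deg=(-10.0, 45.0)):
    """Sample a camera position on a sphere and a frame that looks at the origin."""
    azim = rng.uniform(0.0, 2.0 * np.pi)
    elev = np.deg2rad(rng.uniform(*elev_range_deg))
    eye = radius * np.array([
        np.cos(elev) * np.sin(azim),
        np.sin(elev),
        np.cos(elev) * np.cos(azim),
    ])
    forward = -eye / np.linalg.norm(eye)                  # viewing direction
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))  # world up is +y
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    return eye, np.stack([right, up, forward])            # rows: right, up, forward

rng = np.random.default_rng(0)
eye, axes = sample_camera(rng)
```

Keeping the elevation away from ±90° avoids a degenerate cross product with the world up vector.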

"a dotted black and white cow"

"an orange golden bull"

"a fall tree"

2.3 NeRF Optimization (15 pts)

Tuned lambda_entropy, lambda_orient, and latent_iter_ratio for better geometry. Shading uses a fixed mode during warmup and is randomized afterwards.
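As a rough guide to what lambda_entropy weights, the sketch below shows a binary-entropy penalty on per-ray accumulated opacity, a common regularizer in DreamFusion-style code. Whether the assignment code uses exactly this form is an assumption, and `opacity_entropy` is a hypothetical name.

```python
import numpy as np

def opacity_entropy(alphas, eps=1e-5):
    """Mean binary entropy (in bits) of accumulated opacities in [0, 1]."""
    a = np.clip(np.asarray(alphas, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-a * np.log2(a) - (1.0 - a) * np.log2(1.0 - a)))

lambda_entropy = 1e-4
foggy = opacity_entropy([0.5, 0.5, 0.5])   # semi-transparent rays: high penalty
crisp = opacity_entropy([0.0, 1.0, 1.0])   # near-binary rays: almost no penalty
penalty = lambda_entropy * foggy           # term added to the total loss
```

The penalty is largest for half-transparent rays, so a larger weight pushes the density toward either empty space or solid surface.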

"a hamburger" — RGB (iters 10000)

"a hamburger" — Depth (iters 10000)

"a rose" — RGB (iters 3000)

"a rose" — Depth (iters 3000)

"a rose" — RGB (iters 10000)

"a rose" — Depth (iters 10000)

"a standing corgi dog" — RGB (iters 10000)

"a standing corgi dog" — Depth (iters 10000)

  • Best hyperparameters:
    λ_entropy = 1e-4,  λ_orient = 1e-2,  latent_iter_ratio = 0.2

  • Prompts and training details:
    Prompt                   Iterations   λ_entropy   λ_orient   latent_iter_ratio
    "a standing corgi dog"   10,000       1e-4        1e-2       0.2
    "a rose"                 3,000        1e-3        1e-2       0.2
    "a hamburger"            10,000       1e-4        1e-2       default (0)

  • Notes:
    The corgi run used latent_iter_ratio = 0.2 to warm up with normal shading before alternating between lambertian and textureless shading, yielding cleaner geometry and recognizable body contours. The rose required a higher λ_entropy (1e-3 rather than 1e-4) to prevent oversaturation, while the hamburger, trained with default shading, produced the correct global shape but exhibited less fine geometric detail.
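The warmup behaviour described in the notes can be sketched as a small schedule function. The name `pick_shading` and the exact switching rule are assumptions based on the description above, not a copy of the training code.

```python
import random

def pick_shading(step, total_steps, latent_iter_ratio, rng=random):
    """Use normal shading during warmup, then pick a mode at random each step."""
    if step < latent_iter_ratio * total_steps:
        return "normal"
    return rng.choice(["lambertian", "textureless"])

rng = random.Random(0)
warmup_mode = pick_shading(0, 10_000, 0.2, rng)      # inside the first 20% of training
later_mode = pick_shading(5_000, 10_000, 0.2, rng)   # after warmup
no_warmup = pick_shading(0, 10_000, 0.0, rng)        # ratio 0 skips the warmup
```

With latent_iter_ratio = 0.2 and 10,000 iterations, the first 2,000 steps use normal shading under this rule.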

2.4 Extensions

2.4.1 View-Dependent Text Embedding (10 pts)

Enabled view-dependent conditioning (--view_dep_text 1) to improve 3D consistency and lighting coherence across rendered views. We reused the best hyperparameters from Section 2.3, changing only the view-dependent flag.
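View-dependent conditioning typically appends a view description to the prompt based on the sampled camera angles. The sketch below illustrates the idea; the angle thresholds, suffix wording, and function name are hypothetical and may not match what --view_dep_text does internally.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg,
                          overhead_deg=60.0, front_deg=45.0):
    """Append a coarse view description chosen from the camera angles."""
    if elevation_deg > overhead_deg:
        return f"{prompt}, overhead view"
    azim = azimuth_deg % 360.0
    if azim <= front_deg or azim >= 360.0 - front_deg:
        return f"{prompt}, front view"
    if 180.0 - front_deg <= azim <= 180.0 + front_deg:
        return f"{prompt}, back view"
    return f"{prompt}, side view"

front = view_dependent_prompt("a standing corgi dog", azimuth_deg=10.0, elevation_deg=15.0)
back = view_dependent_prompt("a standing corgi dog", azimuth_deg=180.0, elevation_deg=15.0)
```

Because the back view is no longer conditioned on a prompt that favours a face, this kind of conditioning is a common remedy for duplicated front-facing features.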

  • Common hyperparameters:
    λ_entropy = 1e-4,  λ_orient = 1e-2,  latent_iter_ratio = 0.2,  view_dep_text = 1

  • Prompts and iterations:
    Prompt                   Iterations   Postfix
    "a standing corgi dog"   10,000       _corgi_vd
    "a hamburger"            10,000       _hamburger_vd

“a standing corgi dog” — RGB

“a standing corgi dog” — Depth

“a hamburger” — RGB

“a hamburger” — Depth

Compared with the view-independent runs in Section 2.3, the view-dependent results show stronger 3D consistency and more stable color across camera rotations. The corgi’s facial features are more coherent from different angles: in Section 2.3 the dog has two mouths, while here they have merged into one, and more training would probably fill in its volume. The hamburger no longer exhibits hollowness between its layers or duplicated textures across its sides. Lighting also aligns more naturally with viewpoint, giving a subtle sense of depth and material realism.