HW4¶

Q1 3D Gaussian Splatting¶

1.1 3D Gaussian Rasterization (35 points)¶

1.1.1 - 1.1.4¶

See code.

1.1.5 Perform Splatting¶

1.2 Training 3D Gaussian Representations (15 points)¶

1.2.2 Perform Forward Pass and Compute Loss¶

Fig 1. Training Progress¶

Fig 2. Final Rendering¶

Fig 3. Eval (Iterations, PSNR, SSIM)¶

Learning Rates¶
| Parameter | Learning rate |
|-----------|---------------|
| opacities | 0.025 |
| scales | 0.005 |
| colours | 0.0025 |
| means | 0.00016 |
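These rates are applied as separate optimizer parameter groups, one per Gaussian attribute. A minimal sketch of how such a setup might look (the `gaussians` object and its attribute names are illustrative, not the starter code):

```python
import torch

# Hypothetical container of learnable Gaussian attributes; each tensor has
# requires_grad=True. One Adam parameter group per attribute lets each
# attribute use the learning rate from the table above.
optimizer = torch.optim.Adam([
    {"params": [gaussians.opacities], "lr": 0.025},
    {"params": [gaussians.scales],    "lr": 0.005},
    {"params": [gaussians.colours],   "lr": 0.0025},
    {"params": [gaussians.means],     "lr": 0.00016},
])
```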

1.3 Extensions¶

1.3.1 Rendering Using Spherical Harmonics (10 points)¶

| Frame # | With SH | Without SH | Observations |
|---------|---------|------------|--------------|
| Frame 0 | | | Notice that the cushion looks a lot brighter without spherical harmonics than with SH. The cushion has a view-dependent reflectance effect that comes from the microstructure of the material (see the SH evaluation sketch below the table). |
| Frame 12 | | | Similar effects to those in frame 0 can be observed here. |
| Frame 24 | | | The back of the seat is mostly diffuse and is unaffected by the presence of spherical harmonics. |
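The view-dependent colour noted above comes from evaluating each Gaussian's SH coefficients along the viewing direction of the current frame. A minimal sketch of a degree-0/1 SH evaluation (function name and tensor shapes are my own illustration, not the assignment starter code; the constants are the standard real SH basis factors used in common 3DGS implementations):

```python
import torch

SH_C0 = 0.28209479177387814  # degree-0 real SH constant
SH_C1 = 0.4886025119029199   # degree-1 real SH constant

def sh_to_rgb(sh_coeffs: torch.Tensor, view_dirs: torch.Tensor) -> torch.Tensor:
    """Evaluate degree-0/1 spherical harmonics into per-Gaussian RGB.

    sh_coeffs: (N, 4, 3) - DC term plus three degree-1 coefficients per channel.
    view_dirs: (N, 3)    - unit vectors from the camera centre to each Gaussian.
    """
    x, y, z = view_dirs[:, 0:1], view_dirs[:, 1:2], view_dirs[:, 2:3]
    rgb = SH_C0 * sh_coeffs[:, 0]            # view-independent (diffuse) part
    rgb = rgb - SH_C1 * y * sh_coeffs[:, 1]  # degree-1 terms add the
    rgb = rgb + SH_C1 * z * sh_coeffs[:, 2]  # view-dependent variation seen
    rgb = rgb - SH_C1 * x * sh_coeffs[:, 3]  # on the cushion above
    return torch.clamp(rgb + 0.5, 0.0, 1.0)  # offset and clamp, as in common 3DGS code
```

Without SH (degree 0 only), the colour is constant across views, which is why the mostly diffuse seat back looks the same in both renderings.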

1.3.2 Training On a Harder Scene (10 points)¶

I initially experimented with MoGe to incorporate monocular depth estimation for better Gaussian initialization, as I was confident that improved initialization would yield superior results compared to random initialization. However, I found it difficult to achieve consistent depth alignment with the given camera extrinsics. To address this, I switched to VGGT, using a small subset of the training images to estimate camera extrinsics and generate consistent point clouds. I then applied Umeyama alignment to compute a Sim(3) transformation that aligned the predicted extrinsics with the provided ones, thereby aligning the reconstructed points as well. However, I noticed that the generated points were not entirely reliable [see image below], so I needed to devise a method to filter out the noisy or incorrect ones.
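The Sim(3) step is a standard Umeyama fit between corresponding camera centres (VGGT's predicted centres versus those from the provided extrinsics), with the resulting similarity applied to the VGGT point cloud. A minimal NumPy sketch of that alignment, not the exact code I used:

```python
import numpy as np

def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Fit s, R, t such that dst ≈ s * (R @ src) + t (Umeyama, 1991).

    src, dst: (N, 3) corresponding points, e.g. predicted vs. provided camera centres.
    """
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    x, y = src - mu_src, dst - mu_dst
    cov = y.T @ x / len(src)                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (x ** 2).sum() / len(src)           # mean squared deviation of src
    s = (D * np.diag(S)).sum() / var_src          # isotropic scale
    t = mu_dst - s * R @ mu_src
    return s, R, t

# The same similarity is then applied to the VGGT points:
# aligned_points = (s * (R @ points.T)).T + t
```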

Notice how the better VGGT initialization converges to a good result faster, both qualitatively and quantitatively.

Isotropic Baseline¶

python train_harder_scene.py --out_path ./output/q1.3.2/baseline \
    --gaussians_per_splat -1 \
    --isotropic_gaussians \
    --lr_opacity 0.02 \
    --lr_scale 0.02 \
    --lr_colour 0.02 \
    --lr_mean 0.02 \
    --init_type random
Anisotropic Extension with VGGT init¶

python train_harder_scene.py --out_path ./output/q1.3.2/extension \
    --init_type vggt \
    --num_itrs  500 \
    --gaussians_per_splat -1 \
    --lr_opacity 0.1 \
    --lr_scale 0.1 \
    --lr_colour 0.001 \
    --lr_mean 0.0001 \
    --lr_quat 0.00005

Q2. Diffusion-guided Optimization¶

2.1 SDS Loss + Image Optimization (20 points)¶

| Prompt | Iteration | With Guidance | Without Guidance |
|--------|-----------|---------------|------------------|
| a hamburger | 2000 | | |
| a standing corgi dog | 2000 | | |
| a F-16_fighter_jet | 2000 | | |
| a chimpanzee holding a banana | 2000 | | |

2.2 Texture Map Optimization for Mesh (15 points)¶

| Prompt | Iteration | Initial Mesh | Final Mesh |
|--------|-----------|--------------|------------|
| a black and white cow | 2000 | | |
| a blue cow with red patches | 2000 | | |
| a brown cow with orange patches | 2000 | | |

2.3 NeRF Optimization (15 points)¶

| Prompt | Iteration | Depth | RGB |
|--------|-----------|-------|-----|
| a hamburger | 10000 | | |
| a standing corgi dog | 10000 | | |
| a green frog on top of a horizontal flat rock | 10000 | | |

2.4 Extensions¶

2.4.1 View-dependent text embedding (10 points)¶

| Prompt | Iteration | Depth | RGB |
|--------|-----------|-------|-----|
| a red sports car | 10000 | | |
| a standing corgi dog | 10000 | | |
| a green frog on top of a horizontal flat rock | 10000 | | |

2.4.2¶

2.4.3 Variation of implementation of SDS loss (10 points)¶

I differentiably decode both the current and target latents with the VAE and map the results to either [0, 1] for L2/L1/Huber or [-1, 1] for LPIPS, optionally subsampling via average pooling before the loss to cut compute, and then compute the selected loss between the decoded current and target pixels. The final version uses the Huber loss with pixel_downsample == 4, since computing the loss at full resolution was far too slow.

Gradient-wise, the update is still driven by $\nabla = w(\hat\epsilon - \epsilon)$; the difference is that both the current and target latents are decoded through the VAE decoder into RGB images, and the pixel-level loss (L2, L1, Huber, or LPIPS) is computed between those decoded images rather than between the latents.
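A minimal sketch of this pixel-space variant (function and variable names are my own; I assume a diffusers-style Stable Diffusion VAE, where `vae.decode(latents / scaling_factor).sample` returns images in [-1, 1] and 0.18215 is the usual scaling factor):

```python
import torch
import torch.nn.functional as F

def pixel_space_sds_loss(latents, noise_pred, noise, w, vae, downsample=4):
    """Pixel-space SDS variant: compare VAE decodings of current vs. target latents.

    latents:    current latents being optimised (requires_grad=True)
    noise_pred: UNet prediction eps_hat at the sampled timestep
    noise:      the noise eps actually added to the latents
    w:          per-timestep weighting w(t)
    """
    # Standard SDS direction; the target is detached so gradients only flow
    # through the decoding of the *current* latents.
    grad = w * (noise_pred - noise)
    target_latents = (latents - grad).detach()

    # Differentiably decode both latents to RGB in [-1, 1].
    cur_img = vae.decode(latents / 0.18215).sample
    tgt_img = vae.decode(target_latents / 0.18215).sample

    # Map to [0, 1] for L1/L2/Huber (keep [-1, 1] if using LPIPS instead).
    cur_img = (cur_img + 1) / 2
    tgt_img = (tgt_img + 1) / 2

    # Optional average-pool subsampling to cut the cost of the pixel loss.
    if downsample > 1:
        cur_img = F.avg_pool2d(cur_img, downsample)
        tgt_img = F.avg_pool2d(tgt_img, downsample)

    return F.huber_loss(cur_img, tgt_img)
```

With an L2 loss applied directly to the latents instead of the decoded images, this reduces (up to a constant scale) to the usual latent SDS update; routing the comparison through the decoder is what changes the gradient that reaches the scene representation.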

| Prompt | Iteration | Depth | RGB | Observation |
|--------|-----------|-------|-----|-------------|
| a green frog on top of a horizontal flat rock | 8000 | | | The latent-space version of the SDS loss produces much cleaner depth renderings, yielding more solid geometry and finer detail in the RGB outputs. |
| a green frog on top of a horizontal flat rock | 8000 | | | The pixel-space version of the SDS loss is significantly slower due to its higher dimensionality, and even more so when the LPIPS loss is used, since that requires an additional VGG network. To keep the runtime manageable, I only used a subsampled set of pixels for the loss computation, which leads to noisier depth maps and RGB renderings with more floating artifacts and a less solid appearance. |