HW4 Gaussian Splatting
Name: David Chen
Andrew ID: davidch2
Q1: 3D Gaussian Splatting
Q1.1.5: 3D Gaussian Rasterization
Here is the GIF rendered from the pre-trained chair.ply model.

Q1.2: Training 3D Gaussian Representations
Training Parameters
- Learning Rates (see the optimizer sketch below):
  - opacities: 0.05
  - scales: 0.005
  - colours: 0.0025
  - means: 0.005
- Number of Iterations: 200
- Final Metrics:
  - PSNR: 28.357
  - SSIM: 0.923
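For reference, here is a minimal sketch of how these per-parameter learning rates could be wired up with Adam parameter groups. The `Gaussians` container below is hypothetical; the handout's actual class layout may differ.

```python
import torch
import torch.nn as nn

class Gaussians(nn.Module):
    # Hypothetical container; the assignment's class layout may differ.
    def __init__(self, n: int):
        super().__init__()
        self.means = nn.Parameter(torch.zeros(n, 3))
        self.scales = nn.Parameter(torch.zeros(n, 3))
        self.colours = nn.Parameter(torch.rand(n, 3))
        self.opacities = nn.Parameter(torch.zeros(n, 1))

gaussians = Gaussians(10_000)

# One Adam optimizer, one learning rate per parameter group,
# matching the values listed above.
optimizer = torch.optim.Adam([
    {"params": [gaussians.opacities], "lr": 0.05},
    {"params": [gaussians.scales], "lr": 0.005},
    {"params": [gaussians.colours], "lr": 0.0025},
    {"params": [gaussians.means], "lr": 0.005},
])
```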
Training Progress GIF
This GIF shows the rendering at different stages of the training process.

Final Render GIF
This GIF shows the novel views rendered from the final, trained model.

Q1.3.1: Rendering Using Spherical Harmonics (SH)
Render GIF (Q1.1.5 - View-Independent DC only)

Render GIF (Q1.3.1 - Full Spherical Harmonics)

Side-by-Side Comparisons
Analysis:
The differences are clear. The Q1.1.5 render, which uses only the base color (the DC component of the SH expansion), makes the chair look flat and matte: each Gaussian's color is constant and does not change with the viewing angle.
In contrast, the render using full Spherical Harmonics (Q1.3.1) is much more realistic. The chair now appears shiny and glossy: the highlights and reflections on its surface are view-dependent, so the bright spots move and change as the camera rotates. This correctly approximates how a real material reflects light, making the render far more convincing.
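To make the view dependence concrete, here is a hedged sketch of degree-1 real-SH color evaluation. The `(N, 4, 3)` coefficient layout and the +0.5 shift follow the common 3DGS convention and may differ from the starter code.

```python
import torch

C0 = 0.28209479177387814  # degree-0 real SH constant
C1 = 0.4886025119029199   # degree-1 real SH constant

def sh_to_rgb(sh: torch.Tensor, dirs: torch.Tensor) -> torch.Tensor:
    """sh: (N, 4, 3) coefficients (DC plus three degree-1 terms).
    dirs: (N, 3) unit view directions from camera to each Gaussian."""
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    rgb = C0 * sh[:, 0]            # DC only: the flat Q1.1.5 look
    rgb = rgb - C1 * y * sh[:, 1]  # the degree-1 terms add the
    rgb = rgb + C1 * z * sh[:, 2]  # view-dependent highlights seen
    rgb = rgb - C1 * x * sh[:, 3]  # in the Q1.3.1 render
    return (rgb + 0.5).clamp(0.0, 1.0)
```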
Q2: Diffusion-guided Optimization
Q2.1: SDS Loss + Image Optimization
All images were trained for 400 epochs; for each prompt, the left image is without guidance and the right is with guidance. A hedged SDS sketch follows the prompt list.
Prompt 1: "a hamburger"
Prompt 2: "a standing corgi dog"
Prompt 3: "a cute cat"
Prompt 4: "a futuristic shoe"
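As a rough sketch of the SDS update (not the assignment's exact code): noise the current image latents, query the frozen diffusion model with classifier-free guidance, and backpropagate the weighted noise residual through a detached-target MSE. The `unet`/`scheduler` call signatures below follow the Hugging Face diffusers convention and are assumptions.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, unet, scheduler, cond_emb, uncond_emb, guidance_scale=100.0):
    # Sample a random timestep and noise the latents (forward process).
    t = torch.randint(20, 980, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    alpha_bar = scheduler.alphas_cumprod.to(latents.device)[t].view(-1, 1, 1, 1)
    noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise

    with torch.no_grad():
        # Classifier-free guidance: conditional and unconditional passes.
        eps_cond = unet(noisy, t, encoder_hidden_states=cond_emb).sample
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient w * (eps_hat - eps), applied via a detached-target MSE
    # so that d(loss)/d(latents) equals that gradient exactly.
    w = 1.0 - alpha_bar
    grad = w * (eps - noise)
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```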
Q2.2: Texture Map Optimization for Mesh
Prompt 1: "a cow made of zebra stripes"

Prompt 2: "a cow with red dots"

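In Q2.2 the optimized quantity is the texture rather than the geometry. Here is a minimal sketch, assuming a differentiable rasterizer supplies per-pixel UV coordinates; the 512x512 resolution and names are illustrative, not the starter code's.

```python
import torch
import torch.nn.functional as F

# Learnable RGB texture map; only this tensor receives SDS gradients,
# while the cow mesh itself stays fixed.
texture = torch.nn.Parameter(torch.rand(1, 3, 512, 512))

def sample_texture(uvs: torch.Tensor) -> torch.Tensor:
    """uvs: (H, W, 2) per-pixel UV coordinates in [0, 1] from the rasterizer."""
    grid = uvs * 2.0 - 1.0  # grid_sample expects coordinates in [-1, 1]
    return F.grid_sample(texture, grid.unsqueeze(0), align_corners=True)
```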
Q2.3: NeRF Optimization
Hyperparameters Tuned (sketches of the entropy and orientation regularizers follow the prompts below):
--lambda_entropy 1e-3
--lambda_orient 1e-2
--latent_iter_ratio 0.3
--iters 10000
--h 128
--w 128
--sds_resolution 512
--sds_every_n_steps 1
--num_steps 64
--upsample_steps 32

Prompt 1: "a standing corgi dog"
Prompt 2: "a swimming shark"
Prompt 3: "a running horse"
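For intuition, here are hedged sketches of what the two regularizer weights above control. These are common DreamFusion-style formulations and may not match the starter code exactly.

```python
import torch

def entropy_loss(alphas: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Weighted by lambda_entropy: binary entropy on rendered opacity,
    # pushing alpha toward 0 or 1 so space is either empty or solid.
    a = alphas.clamp(eps, 1.0 - eps)
    return (-a * a.log() - (1 - a) * (1 - a).log()).mean()

def orient_loss(normals: torch.Tensor, view_dirs: torch.Tensor,
                weights: torch.Tensor) -> torch.Tensor:
    # Weighted by lambda_orient: penalize visible surface normals that
    # face away from the camera (normals and view_dirs are unit vectors).
    facing_away = (normals * view_dirs).sum(dim=-1).clamp(min=0.0)
    return (weights * facing_away ** 2).mean()
```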
Q2.4.1: View-dependent text embedding
```
!python Q23_nerf_optimization_view.py \
  --prompt "a standing corgi dog / a swimming shark" \
  --lambda_entropy 1e-3 \
  --lambda_orient 1e-2 \
  --latent_iter_ratio 0.5 \
  --iters 10000 \
  --h 128 \
  --w 128 \
  --sds_resolution 512 \
  --sds_every_n_steps 1 \
  --num_steps 64 \
  --upsample_steps 32
```

Prompt 1: "a standing corgi dog"
Prompt 2: "a swimming shark"
Analysis:
This extension provided a significant improvement in 3D consistency over the baseline Q2.3 model. The "Janus problem" (multi-face effect) was noticeably reduced, and the resulting NeRF now generates a single, coherent 3D object with distinct front, side, and back views.
However, this added complexity made training harder. The model now has to satisfy six different text prompts (front, back, left, right, top, bottom) instead of just one, which makes the loss landscape much harder to navigate: the model can get "confused" trying to learn both complex colors and complex view-dependent geometry at the same time.
To address this, I increased latent_iter_ratio from 0.3 (the value used in Q2.3) to 0.5, forcing the model to spend the first half of training in the shape-only phase (shading = "normal"). This extended warm-up was crucial: it allowed the NeRF to establish a solid, 3D-consistent geometric foundation before it started learning the more complex, view-dependent color shading. This two-stage approach led to much more stable convergence and a better final 3D shape.
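A minimal sketch of the two ingredients discussed above: an azimuth/elevation-based prompt suffix and the extended geometry-only warm-up. The angle bins and shading-mode names are illustrative; the handout's values may differ.

```python
def direction_suffix(azimuth_deg: float, elevation_deg: float) -> str:
    # Map camera angles to a view-dependent prompt suffix.
    if elevation_deg > 60:
        return "overhead view"
    a = azimuth_deg % 360
    if a < 45 or a >= 315:
        return "front view"
    if 45 <= a < 135 or 225 <= a < 315:
        return "side view"
    return "back view"

def pick_prompt(base: str, azimuth_deg: float, elevation_deg: float,
                step: int, total_iters: int, latent_iter_ratio: float = 0.5):
    prompt = f"{base}, {direction_suffix(azimuth_deg, elevation_deg)}"
    # Geometry-only warm-up for the first latent_iter_ratio of training,
    # then switch to textured shading (the two-stage schedule above).
    shading = "normal" if step < latent_iter_ratio * total_iters else "lambertian"
    return prompt, shading

print(pick_prompt("a standing corgi dog", 170.0, 10.0, step=2000, total_iters=10000))
```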