HW4 Gaussian Splatting
Name: David Chen
Andrew ID: davidch2
Q1: 3D Gaussian Splatting
Q1.1.5: 3D Gaussian Rasterization
Here is the GIF rendered from the pre-trained chair.ply model.

Q1.2: Training 3D Gaussian Representations
Training Parameters
- Learning Rates (see the optimizer sketch below):
  - opacities: 0.05
  - scales: 0.005
  - colours: 0.0025
  - means: 0.005
- Number of Iterations: 200
- Final Metrics:
  - PSNR: 28.357
  - SSIM: 0.923
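For reference, here is a minimal sketch of how these per-parameter learning rates could be wired up with Adam parameter groups. The `Gaussians` container below is hypothetical; the handout's actual class layout may differ.

```python
import torch
import torch.nn as nn

class Gaussians(nn.Module):
    # Hypothetical container; the assignment's class layout may differ.
    def __init__(self, n: int):
        super().__init__()
        self.means = nn.Parameter(torch.zeros(n, 3))
        self.scales = nn.Parameter(torch.zeros(n, 3))
        self.colours = nn.Parameter(torch.rand(n, 3))
        self.opacities = nn.Parameter(torch.zeros(n, 1))

gaussians = Gaussians(10_000)

# One Adam optimizer, one learning rate per parameter group,
# matching the values listed above.
optimizer = torch.optim.Adam([
    {"params": [gaussians.opacities], "lr": 0.05},
    {"params": [gaussians.scales], "lr": 0.005},
    {"params": [gaussians.colours], "lr": 0.0025},
    {"params": [gaussians.means], "lr": 0.005},
])
```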
Training Progress GIF
This GIF shows the rendering at different stages of the training process.

Final Render GIF
This GIF shows the novel views rendered from the final, trained model.

Q1.3.1: Rendering Using Spherical Harmonics (SH)
Render GIF (Q1.1.5 - View-Independent DC only)

Render GIF (Q1.3.1 - Full Spherical Harmonics)

Side-by-Side Comparisons
Analysis:
The differences are clear. The Q1.1.5 render, which uses only the base color (the DC component of the SH expansion), makes the chair look flat and matte: each Gaussian's color is constant and does not change with the viewing angle.
In contrast, the render using full Spherical Harmonics (Q1.3.1) is much more realistic. The chair now appears shiny and glossy: the highlights and reflections on its surface are view-dependent, so the bright spots move and change as the camera rotates. This correctly approximates how a real material reflects light, making the render far more convincing.
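To make the view dependence concrete, here is a hedged sketch of degree-1 real-SH color evaluation. The `(N, 4, 3)` coefficient layout and the +0.5 shift follow the common 3DGS convention and may differ from the starter code.

```python
import torch

C0 = 0.28209479177387814  # degree-0 real SH constant
C1 = 0.4886025119029199   # degree-1 real SH constant

def sh_to_rgb(sh: torch.Tensor, dirs: torch.Tensor) -> torch.Tensor:
    """sh: (N, 4, 3) coefficients (DC plus three degree-1 terms).
    dirs: (N, 3) unit view directions from camera to each Gaussian."""
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]
    rgb = C0 * sh[:, 0]            # DC only: the flat Q1.1.5 look
    rgb = rgb - C1 * y * sh[:, 1]  # the degree-1 terms add the
    rgb = rgb + C1 * z * sh[:, 2]  # view-dependent highlights seen
    rgb = rgb - C1 * x * sh[:, 3]  # in the Q1.3.1 render
    return (rgb + 0.5).clamp(0.0, 1.0)
```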
Q2: Diffusion-guided Optimization
Q2.1: SDS Loss + Image Optimization
All images were trained for 400 epochs; for each prompt, the left image is without guidance and the right is with guidance. A hedged SDS sketch follows the prompt list.
Prompt 1: "a hamburger"
Prompt 2: "a standing corgi dog"
Prompt 3: "a cute cat"
Prompt 4: "a futuristic shoe"
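As a rough sketch of the SDS update (not the assignment's exact code): noise the current image latents, query the frozen diffusion model with classifier-free guidance, and backpropagate the weighted noise residual through a detached-target MSE. The `unet`/`scheduler` call signatures below follow the Hugging Face diffusers convention and are assumptions.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, unet, scheduler, cond_emb, uncond_emb, guidance_scale=100.0):
    # Sample a random timestep and noise the latents (forward process).
    t = torch.randint(20, 980, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    alpha_bar = scheduler.alphas_cumprod.to(latents.device)[t].view(-1, 1, 1, 1)
    noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise

    with torch.no_grad():
        # Classifier-free guidance: conditional and unconditional passes.
        eps_cond = unet(noisy, t, encoder_hidden_states=cond_emb).sample
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # SDS gradient w * (eps_hat - eps), applied via a detached-target MSE
    # so that d(loss)/d(latents) equals that gradient exactly.
    w = 1.0 - alpha_bar
    grad = w * (eps - noise)
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```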
Q2.2: Texture Map Optimization for Mesh
Prompt 1: "a cow made of zebra stripes"

Prompt 2: "a cow with red dots"

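In Q2.2 the optimized quantity is the texture rather than the geometry. Here is a minimal sketch, assuming a differentiable rasterizer supplies per-pixel UV coordinates; the 512x512 resolution and names are illustrative, not the starter code's.

```python
import torch
import torch.nn.functional as F

# Learnable RGB texture map; only this tensor receives SDS gradients,
# while the cow mesh itself stays fixed.
texture = torch.nn.Parameter(torch.rand(1, 3, 512, 512))

def sample_texture(uvs: torch.Tensor) -> torch.Tensor:
    """uvs: (H, W, 2) per-pixel UV coordinates in [0, 1] from the rasterizer."""
    grid = uvs * 2.0 - 1.0  # grid_sample expects coordinates in [-1, 1]
    return F.grid_sample(texture, grid.unsqueeze(0), align_corners=True)
```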
Q2.3: NeRF Optimization
Hyperparameters Tuned (sketches of the entropy and orientation regularizers follow the prompts below):
--lambda_entropy 1e-3
--lambda_orient 1e-2
--latent_iter_ratio 0.3
--iters 10000
--h 128
--w 128
--sds_resolution 512
--sds_every_n_steps 1
--num_steps 64
--upsample_steps 32

Prompt 1: "a standing corgi dog"
Prompt 2: "a swimming shark"
Prompt 3: "a running horse"
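For intuition, here are hedged sketches of what the two regularizer weights above control. These are common DreamFusion-style formulations and may not match the starter code exactly.

```python
import torch

def entropy_loss(alphas: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Weighted by lambda_entropy: binary entropy on rendered opacity,
    # pushing alpha toward 0 or 1 so space is either empty or solid.
    a = alphas.clamp(eps, 1.0 - eps)
    return (-a * a.log() - (1 - a) * (1 - a).log()).mean()

def orient_loss(normals: torch.Tensor, view_dirs: torch.Tensor,
                weights: torch.Tensor) -> torch.Tensor:
    # Weighted by lambda_orient: penalize visible surface normals that
    # face away from the camera (normals and view_dirs are unit vectors).
    facing_away = (normals * view_dirs).sum(dim=-1).clamp(min=0.0)
    return (weights * facing_away ** 2).mean()
```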
Q2.4.1: View-dependent text embedding
```
!python Q23_nerf_optimization_view.py \
  --prompt "a standing corgi dog / a swimming shark" \
  --lambda_entropy 1e-3 \
  --lambda_orient 1e-2 \
  --latent_iter_ratio 0.5 \
  --iters 10000 \
  --h 128 \
  --w 128 \
  --sds_resolution 512 \
  --sds_every_n_steps 1 \
  --num_steps 64 \
  --upsample_steps 32
```

Prompt 1: "a standing corgi dog"
Prompt 2: "a swimming shark"
Analysis:
This extension provided a significant improvement in 3D consistency over the baseline Q2.3 model. The "Janus problem" (multi-face effect) was noticeably reduced, and the resulting NeRF now generates a single, coherent 3D object with distinct front, side, and back views.
However, this added complexity made training harder. The model now has to satisfy six different text prompts (front, back, left, right, top, bottom) instead of just one, which makes the loss landscape much harder to navigate: the model can get "confused" trying to learn both complex colors and complex view-dependent geometry at the same time.
To address this, I increased latent_iter_ratio from 0.3 (the value used in Q2.3) to 0.5, forcing the model to spend the first half of training in the shape-only phase (shading = "normal"). This extended warm-up was crucial: it allowed the NeRF to establish a solid, 3D-consistent geometric foundation before it started learning the more complex, view-dependent color shading. This two-stage approach led to much more stable convergence and a better final 3D shape.
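A minimal sketch of the two ingredients discussed above: an azimuth/elevation-based prompt suffix and the extended geometry-only warm-up. The angle bins and shading-mode names are illustrative; the handout's values may differ.

```python
def direction_suffix(azimuth_deg: float, elevation_deg: float) -> str:
    # Map camera angles to a view-dependent prompt suffix.
    if elevation_deg > 60:
        return "overhead view"
    a = azimuth_deg % 360
    if a < 45 or a >= 315:
        return "front view"
    if 45 <= a < 135 or 225 <= a < 315:
        return "side view"
    return "back view"

def pick_prompt(base: str, azimuth_deg: float, elevation_deg: float,
                step: int, total_iters: int, latent_iter_ratio: float = 0.5):
    prompt = f"{base}, {direction_suffix(azimuth_deg, elevation_deg)}"
    # Geometry-only warm-up for the first latent_iter_ratio of training,
    # then switch to textured shading (the two-stage schedule above).
    shading = "normal" if step < latent_iter_ratio * total_iters else "lambertian"
    return prompt, shading

print(pick_prompt("a standing corgi dog", 170.0, 10.0, step=2000, total_iters=10000))
```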