Assignment 4: 3D Gaussian Splatting and Diffusion Guided Optimization

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

1.2 Training 3D Gaussian Representations (15 points)

Hyperparameter Settings

opacities: 0.05
scales: 0.05
colours: 0.005
means: 0.0025
num_itrs: 1500

Evaluation

Mean PSNR: 30.791
Mean SSIM: 0.949
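
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean PSNR figure like the one above can be computed. It is not the evaluation code used for this assignment; the function names `psnr` and `mean_psnr` are hypothetical, and images are assumed to be flat sequences of pixel values in the range [0, 1].

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized flat pixel sequences.

    Assumption: pixel values lie in [0, max_val]; names are illustrative only.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mean_psnr(pairs, max_val=1.0):
    """Average PSNR over an iterable of (rendered, ground_truth) pairs."""
    values = [psnr(p, t, max_val) for p, t in pairs]
    return sum(values) / len(values)
```

Higher values mean the rendered views are closer to the ground-truth images. SSIM is a separate, structure-aware metric and is not covered by this sketch.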

1.3.1 Rendering Using Spherical Harmonics (10 points)

GIF Comparison

View Independent Version (1.1.5)

View Dependent Version (1.3.1)

Side by Side RGB Image Comparisons

1. Frame 0

View Independent Version (1.1.5)

View Dependent Version (1.3.1)

Observations: For Frame 0, the view-dependent rendering shows more natural lighting on the chair’s seat: the colors blend smoothly, and the shadows look softer and more realistic. The view-independent rendering instead shows a sharp boundary between the shadowed and bright areas, which looks unrealistic.

2. Frame 17

View Independent Version (1.1.5)

View Dependent Version (1.3.1)

Observations: For Frame 17, the view-dependent rendering shows deeper shading and subtle reflections along the chair’s edges, which makes the scene look more natural. The view-independent rendering, in contrast, shows less variation in brightness.
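
To illustrate why the two versions differ, here is a minimal sketch of evaluating a Gaussian's color from spherical harmonics coefficients up to degree 1. It is not the assignment's renderer: the function name `sh_color` and the coefficient layout (a list of per-band RGB triples) are assumptions, and the sign convention and the +0.5 offset follow the commonly used 3D Gaussian Splatting reference convention. With `degree=0` only the constant band is used, so the color is the same from every direction; with `degree=1` the color varies with the viewing direction.

```python
import math

# Real spherical harmonics constants for bands 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_color(coeffs, direction, degree=1):
    """Evaluate an RGB color from SH coefficients for one viewing direction.

    coeffs: list of [r, g, b] triples; coeffs[0] is the constant (DC) band,
            coeffs[1..3] are the degree-1 band (assumed layout).
    direction: (x, y, z) vector from the camera to the Gaussian, any length > 0.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    x, y, z = (d / norm for d in direction)
    color = []
    for ch in range(3):
        value = SH_C0 * coeffs[0][ch]  # view-independent term
        if degree >= 1:
            # View-dependent linear terms.
            value += SH_C1 * (-y * coeffs[1][ch] + z * coeffs[2][ch] - x * coeffs[3][ch])
        # Shift by 0.5 and clamp to the displayable range [0, 1].
        color.append(min(max(value + 0.5, 0.0), 1.0))
    return color


# The same Gaussian seen from two opposite directions.
coeffs = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
front = sh_color(coeffs, (0.0, 0.0, 1.0))
back = sh_color(coeffs, (0.0, 0.0, -1.0))
```

In this toy example the red channel is brighter from the front than from the back, which is the kind of direction-dependent shading change described in the observations above.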

2. Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

1. Prompt: "a hamburger"

Without Guidance (700 iterations)

With Guidance (2000 iterations)

2. Prompt: "a standing corgi dog"

Without Guidance (2000 iterations)

With Guidance (2000 iterations)

3. Prompt: "a chilling cat"

Without Guidance (2000 iterations)

With Guidance (2000 iterations)

4. Prompt: "a croissant"

Without Guidance (700 iterations)

With Guidance (2000 iterations)

2.2 Texture Map Optimization for Mesh (15 points)

1. Prompt: "Tiger"

2. Prompt: "Galaxy"

2.3 NeRF Optimization (15 points)

1. Prompt: "a standing corgi dog"

Video of rendered depth images

Video of rendered RGB images

2. Prompt: "a hamburger"

Video of rendered depth images

Video of rendered RGB images

3. Prompt: "a standing cat"

Video of rendered depth images

Video of rendered RGB images

2.4.1 View-dependent text embedding (10 points)

2.4.2 Other 3D representation (10 points)

2.4.3 Variation of implementation of SDS loss (10 points)