Assignment 4: 3D Gaussian Splatting and Diffusion Guided Optimization

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

1.2 Training 3D Gaussian Representations (15 points)

Hyperparameter Settings

Evaluation

1.3.1 Rendering Using Spherical Harmonics (10 points)

GIF Comparison

View Independent Version (1.1.5)
View Dependent Version (1.3.1)

Side by Side RGB Image Comparisons

1. Frame 0
View Independent Version (1.1.5)
View Dependent Version (1.3.1)

Observations: In Frame 0, the view-dependent rendering shows more natural lighting on the chair’s seat: the colors blend smoothly, and the shadows look softer. The view-independent rendering instead shows a sharp boundary between the shadowed and bright areas, which looks less realistic.

2. Frame 17
View Independent Version (1.1.5)
View Dependent Version (1.3.1)

Observations: In Frame 17, the view-dependent rendering shows deeper shading and subtle reflections along the chair’s edges, which makes the scene look more natural. The view-independent rendering shows less variation in brightness.
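
The difference observed in both frames can be illustrated with a minimal sketch of how spherical harmonic (SH) coefficients turn into a color. This is not the assignment's actual code: the function name `sh_to_rgb`, the `use_view_dependence` flag, and the restriction to degree 1 are assumptions made for illustration. The sign convention and the +0.5 offset follow the one commonly used in 3D Gaussian Splatting implementations, and the final clip to [0, 1] is a simplification.

```python
import numpy as np

C0 = 0.28209479177387814  # real SH basis constant, degree 0
C1 = 0.4886025119029199   # real SH basis constant, degree 1


def sh_to_rgb(sh, view_dir, use_view_dependence=True):
    """Evaluate SH color up to degree 1 (hypothetical helper).

    sh:       (4, 3) array, one RGB coefficient triple per basis function.
    view_dir: (3,) direction from the camera to the Gaussian.
    """
    sh = np.asarray(sh, dtype=np.float64)
    d = np.asarray(view_dir, dtype=np.float64)
    x, y, z = d / np.linalg.norm(d)

    # Degree 0 (DC term): the same color from every viewpoint.
    rgb = C0 * sh[0]

    if use_view_dependence:
        # Degree 1: the color now varies linearly with the view direction.
        rgb = rgb - C1 * y * sh[1] + C1 * z * sh[2] - C1 * x * sh[3]

    # Shift back to the displayable range and clip (assumed convention).
    return np.clip(rgb + 0.5, 0.0, 1.0)


# Made-up coefficients for a single Gaussian, viewed from two directions.
sh = np.array([[0.5, 0.2, 0.1],
               [0.1, 0.0, 0.0],
               [0.3, 0.3, 0.3],
               [0.0, 0.1, 0.0]])
front, side = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])

print(sh_to_rgb(sh, front, use_view_dependence=False))
print(sh_to_rgb(sh, side, use_view_dependence=False))
print(sh_to_rgb(sh, front))
print(sh_to_rgb(sh, side))
```

With only the DC term, both directions yield the same color, which is consistent with the flatter brightness seen in the view-independent renders. With the degree-1 terms enabled, the color changes with the viewing direction.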

2. Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

1. Prompt: "a hamburger"

Without Guidance (700 iterations)
With Guidance (2000 iterations)

2. Prompt: "a standing corgi dog"

Without Guidance (2000 iterations)
With Guidance (2000 iterations)

3. Prompt: "a chilling cat"

Without Guidance (2000 iterations)
With Guidance (2000 iterations)

4. Prompt: "a croissant"

Without Guidance (700 iterations)
With Guidance (2000 iterations)

2.2 Texture Map Optimization for Mesh (15 points)

1. Prompt: "Tiger"

2. Prompt: "Galaxy"

2.3 NeRF Optimization (15 points)

1. Prompt: "a standing corgi dog"

Video of rendered depth images
Video of rendered RGB images

2. Prompt: "a hamburger"

Video of rendered depth images
Video of rendered RGB images

3. Prompt: "a standing cat"

Video of rendered depth images
Video of rendered RGB images

2.4.1 View-dependent text embedding (10 points)

2.4.2 Other 3D representation (10 points)

2.4.3 Variation of implementation of SDS loss (10 points)