16-825 Assignment 4: 3D Gaussian Splatting and Diffusion Guided Optimization

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization

Here is my rendered GIF:

gaussian splat

1.2 Training 3D Gaussian Representations

The learning rates I used:

  • pre_act_opacities: 1e-2
  • pre_act_scales: 1e-2
  • colours: 1e-2
  • means: 1e-3
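Learning rates like these are usually passed to the optimizer as separate parameter groups. In PyTorch that would look like `torch.optim.Adam([{"params": [means], "lr": 1e-3}, ...])`. The report does not name the optimizer, so the dependency-free sketch below assumes Adam with default betas and uses toy one-element lists in place of the real Gaussian tensors. It shows how a per-group learning rate sets the step size for each attribute.

```python
import math

# Per-group learning rates taken from the list above.
LEARNING_RATES = {
    "pre_act_opacities": 1e-2,
    "pre_act_scales": 1e-2,
    "colours": 1e-2,
    "means": 1e-3,
}


class AdamGroup:
    """Adam state for one parameter group, stored as a flat list of floats."""

    def __init__(self, params, lr, betas=(0.9, 0.999), eps=1e-8):
        self.params = list(params)
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.m = [0.0] * len(self.params)  # first-moment estimates
        self.v = [0.0] * len(self.params)  # second-moment estimates
        self.t = 0  # number of steps taken

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias-corrected moment estimates
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# One group per attribute; each toy "tensor" is a single float starting at 0.
groups = {name: AdamGroup([0.0], lr) for name, lr in LEARNING_RATES.items()}
for group in groups.values():
    group.step([1.0])  # the same unit gradient for every group
```

After one step with a unit gradient, Adam moves each parameter by roughly its learning rate. The means therefore move ten times less than the opacities, scales, and colours, which keeps the Gaussians' positions from drifting too quickly early in training.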

I trained the model for 1000 iterations. The final PSNR was 28.238 and the final mean SSIM was 0.936.
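The sketch below shows the two reported metrics, assuming pixel values in [0, 1] and flattened images. PSNR is 10·log10(MAX² / MSE). The SSIM function is a simplified single-window (global) version. Standard SSIM, as used by common libraries, averages the same statistic over local Gaussian windows, so its numbers will not match the 0.936 reported above exactly.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def global_ssim(x, y, max_val=1.0):
    """SSIM computed over the whole image as a single window (simplified)."""
    n = len(x)
    c1 = (0.01 * max_val) ** 2  # stabilising constants from the SSIM definition
    c2 = (0.03 * max_val) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    var_x = sum((a - mx) * (a - mx) for a in x) / n
    var_y = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (var_x + var_y + c2)
    return num / den
```

For example, two 4-pixel images differing by 0.1 in one pixel have MSE 0.0025, giving a PSNR of about 26.02 dB.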

training final renders

training progress

1.3 Extensions: Rendering Using Spherical Harmonics

Here is the rendered GIF from question 1.3.1:

gaussian splat

Below is the rendered GIF I obtained using spherical harmonics:

rendering with spherical harmonics
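The view dependence comes from evaluating each Gaussian's spherical-harmonic (SH) coefficients along the viewing direction. Below is a minimal sketch for degrees 0 and 1. It follows the constants, sign pattern, and +0.5 offset of the original 3D Gaussian Splatting reference code; the assignment's starter code may use a different convention or higher degrees. With only the degree-0 coefficient, the colour is the same from every direction. The degree-1 terms make it change with the view.

```python
import math

# Real SH basis constants for degree 0 and degree 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_to_rgb(sh, view_dir):
    """sh: list of 1 (degree 0) or 4 (degree 1) RGB coefficient triples.
    view_dir: (x, y, z) vector from the camera centre to the Gaussian."""
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm  # normalise the direction
    rgb = []
    for c in range(3):
        val = SH_C0 * sh[0][c]  # view-independent term
        if len(sh) >= 4:
            # Degree-1 terms, linear in the viewing direction
            val += -SH_C1 * y * sh[1][c] + SH_C1 * z * sh[2][c] - SH_C1 * x * sh[3][c]
        val += 0.5  # offset so that all-zero coefficients give mid-grey
        rgb.append(max(val, 0.0))  # clamp negatives to 0, as the reference code does
    return rgb
```

For example, with a nonzero coefficient on the x-dependent degree-1 basis function, the same Gaussian is darker when seen along +x than along -x. This is the kind of view-dependent shading the comparison below shows.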

Comparisons (top image is without SH, bottom is with SH):

VIEW 0:

From this view, the two renderings differ noticeably. With spherical harmonics, the shadows on the seat and back cushion of the chair look more realistic, and the metal decorations on the arms and back of the chair show more realistic shadows and reflections.

VIEW 2:

From this view, we again get more realistic shading with spherical harmonics than without. The shading in the rendering with SH changes relative to View 0 because the viewing angle changed, whereas the rendering without SH looks the same as it did in View 0.

2. Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization

Prompt: "a hamburger"

Without guidance (400 iterations):

With guidance (700 iterations):

Prompt: "a standing corgi dog"

Without guidance (1500 iterations):

With guidance (700 iterations):

Prompt: "a koala in a tree"

Without guidance (1200 iterations):

With guidance (1300 iterations):

Prompt: "a penguin in a top hat"

Without guidance (1500 iterations):

With guidance (1300 iterations):
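The comparisons above presumably refer to classifier-free guidance (CFG). That is an assumption, since the report does not define "guidance". Under that reading, "without guidance" would use the text-conditioned noise prediction alone, which is CFG with a scale of 1. Here is a minimal sketch of how CFG combines the two predictions and how the SDS gradient is formed from the result. The weighting w(t) = 1 - alpha_bar_t is one common choice and is also an assumption. Noise predictions are shown as flat lists rather than real U-Net outputs.

```python
def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: move the unconditional noise prediction
    toward the text-conditioned one, scaled by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sds_grad(eps_pred, eps_true, alpha_bar_t):
    """Score distillation gradient w.r.t. the latents:
    w(t) * (eps_pred - eps_true), assuming w(t) = 1 - alpha_bar_t."""
    w = 1.0 - alpha_bar_t
    return [w * (p - e) for p, e in zip(eps_pred, eps_true)]


# guidance_scale = 1 reduces to the conditional prediction alone.
no_guidance = cfg_noise([1.0], [3.0], guidance_scale=1.0)
# A large scale (e.g. 100) amplifies the text-conditioned direction.
guided = cfg_noise([1.0], [3.0], guidance_scale=100.0)
grad = sds_grad(guided, [1.0], alpha_bar_t=0.75)
```

In a training loop, this gradient is commonly applied through a surrogate loss such as 0.5 * ||z - stopgrad(z - g)||², whose gradient with respect to the latents z is exactly g. This avoids backpropagating through the diffusion model.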

2.2 Texture Map Optimization for Mesh

Prompt: "a zebra-striped cow"

Prompt: "a tie-dyed cow"

2.3 NeRF Optimization

Prompt: "a standing corgi dog"

Prompt: "a pumpkin"

Prompt: "an orca whale"

2.4 Extensions: View-dependent text embedding

Prompt: "a standing corgi dog"

The view-dependent text conditioning removed the corgi's third ear, giving a more realistic result than the one in 2.3.
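View-dependent conditioning is typically done by appending a view phrase to the prompt based on the sampled camera pose, in the style of DreamFusion. Here is a minimal sketch. The angle thresholds, the 60° overhead cutoff, and the convention that azimuth 0 faces the object's front are all assumptions and may differ from the assignment code. Each resulting prompt would then be encoded separately to obtain the per-view text embedding.

```python
def view_direction(azimuth_deg, elevation_deg, overhead_threshold=60.0):
    """Map a camera pose to a coarse view label (thresholds are assumptions)."""
    if elevation_deg > overhead_threshold:
        return "overhead"
    az = azimuth_deg % 360.0  # wrap azimuth into [0, 360)
    if az < 45.0 or az >= 315.0:
        return "front"
    if 135.0 <= az < 225.0:
        return "back"
    return "side"


def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append the view label to the base prompt, e.g. '..., back view'."""
    return f"{prompt}, {view_direction(azimuth_deg, elevation_deg)} view"
```

For example, a camera behind the corgi (azimuth 180°, low elevation) would be conditioned on "a standing corgi dog, back view". This discourages the model from painting a front-facing head, and extra ears, on every side of the object.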

Prompt: "a pumpkin"

I had hoped to get a jack-o'-lantern with only one face (unlike the one from 2.3), but this time I got a plain pumpkin instead.