Assignment 4: 3D Gaussian Splatting and Diffusion Guided Optimization

Question 1

Question 1.1

Q1.1

Question 1.2

Learning Rates:

pre_act_opacities	pre_act_scales	colors	means
0.01	0.005	0.01	0.01
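
As a minimal sketch of how the per-parameter learning rates above might be wired up: the snippet pairs each named parameter with its own rate. The helper `build_param_groups` and the placeholder parameter objects are hypothetical and not taken from the assignment code. A list of dictionaries with this shape is what PyTorch optimizers such as `torch.optim.Adam` accept as parameter groups, but the optimizer actually used is not stated here, so that choice is an assumption.

```python
# Learning rates from the table above, keyed by parameter name.
LEARNING_RATES = {
    "pre_act_opacities": 0.01,
    "pre_act_scales": 0.005,
    "colors": 0.01,
    "means": 0.01,
}


def build_param_groups(params, learning_rates):
    """Pair each named parameter with its learning rate (hypothetical helper)."""
    missing = set(learning_rates) - set(params)
    if missing:
        raise KeyError(f"no parameter found for: {sorted(missing)}")
    return [
        {"name": name, "params": [params[name]], "lr": lr}
        for name, lr in learning_rates.items()
    ]


# Placeholder objects stand in for the real learnable tensors.
params = {name: object() for name in LEARNING_RATES}
groups = build_param_groups(params, LEARNING_RATES)
for group in groups:
    print(group["name"], group["lr"])
```

Keeping the rates in one dictionary makes it easy to change a single rate (for example, the smaller one for `pre_act_scales`) without touching the optimizer setup.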

PSNR	SSIM
29.043	0.932
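
For reference, PSNR is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between the rendered and ground-truth images and MAX is the largest possible pixel value. The sketch below computes it on flat lists of pixel values. It assumes values in [0, 1] (so MAX = 1), and the function name `psnr` and the toy data are illustrative only, not the evaluation code that produced the numbers above.

```python
import math


def psnr(rendered, target, max_val=1.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    if len(rendered) != len(target):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Toy 4-pixel "images" with values in [0, 1] (made-up data).
target = [0.0, 0.5, 1.0, 0.25]
rendered = [0.1, 0.5, 0.9, 0.25]
print(f"{psnr(rendered, target):.2f} dB")  # prints 23.01 dB
```

Under the same [0, 1] assumption, a PSNR of 29.043 dB would correspond to a mean squared error of roughly 1.25 × 10⁻³.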

The model was trained for 1000 iterations, producing the following output:

Training Progress

Final Render

Question 1.3

Question 1.3.1

Final rendered gif (Q1.1.5): Q1.1

Final rendered gif (Q1.3): Q1.3

View 0, Without Spherical Harmonics: Q1.1_0

View 0, With Spherical Harmonics: Q1.3_0

Here, the difference between the renders without and with spherical harmonics is stark. Without spherical harmonics, the cushion shows a small, constant shadow, because the color of each Gaussian is view-independent. The shadow looks as if it were drawn on rather than caused by the lighting.

With spherical harmonics, the shadow on the cushion looks more natural and appears to come from a light source. It has a softer boundary and resembles a shadow that could fall on the same chair in real life, with the entire cushion appearing darker.

View 11, Without Spherical Harmonics: Q1.1_11

View 11, With Spherical Harmonics: Q1.3_11

Without spherical harmonics, the shadow has the same shape and character as in view 0 above: it looks drawn on, rigid, and constant, and is not realistic. With spherical harmonics, by contrast, the shadow is lighter and differs noticeably from the one in view 0. The shadow changes as the viewpoint around the chair changes.
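
The view dependence described above can be illustrated with a small sketch that evaluates a Gaussian's color from spherical-harmonic coefficients up to degree 1. The constants, the sign pattern of the degree-1 terms, and the +0.5 offset follow the convention of the reference 3D Gaussian Splatting implementation. Whether the assignment code uses exactly this convention is an assumption, and the function name `sh_to_rgb` and the coefficient values are made up for illustration.

```python
import math

C0 = 0.28209479177387814  # degree-0 (constant) basis value
C1 = 0.4886025119029199   # degree-1 basis scale


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def sh_to_rgb(sh_coeffs, view_dir):
    """Evaluate RGB from a list of per-basis (r, g, b) coefficients.

    One coefficient triple gives a view-independent color; four triples
    add the degree-1 terms, which depend on the viewing direction.
    """
    x, y, z = normalize(view_dir)
    basis = [C0]
    if len(sh_coeffs) >= 4:
        basis += [-C1 * y, C1 * z, -C1 * x]
    rgb = []
    for ch in range(3):
        val = sum(b * coeff[ch] for b, coeff in zip(basis, sh_coeffs))
        rgb.append(max(0.0, val + 0.5))  # offset, then clamp below at 0
    return tuple(rgb)


# Made-up coefficients for a single Gaussian.
dc_only = [(1.0, 0.5, 0.2)]
with_deg1 = dc_only + [(0.3, 0.3, 0.3), (0.0, 0.0, 0.0), (-0.4, -0.4, -0.4)]

front, side = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
print(sh_to_rgb(dc_only, front), sh_to_rgb(dc_only, side))      # identical colors
print(sh_to_rgb(with_deg1, front), sh_to_rgb(with_deg1, side))  # colors differ
```

With only the degree-0 coefficient, the color is the same from every direction, which matches the constant, drawn-on shadow. Adding higher-degree coefficients lets the color vary with the viewing direction.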

Question 2

Question 2.1
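
The results in this question compare optimization without and with guidance. Assuming "guidance" refers to classifier-free guidance (the text does not say so explicitly), the guided noise prediction combines the unconditional and text-conditioned predictions as eps = eps_uncond + s · (eps_cond − eps_uncond), where s is the guidance scale. The sketch below applies this to plain lists. The function name `apply_guidance` and the toy values are hypothetical, and a real implementation would operate on tensors.

```python
def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance on two equal-length noise predictions."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (made-up values).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 1.0]

print(apply_guidance(eps_uncond, eps_cond, 1.0))    # equals eps_cond
print(apply_guidance(eps_uncond, eps_cond, 100.0))  # pushed far along (cond - uncond)
```

A scale of 1 reproduces the conditional prediction, and larger scales amplify the direction in which the prompt changes the prediction.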

Prompt	Without Guidance	With Guidance	# Training Iters
"a hamburger"			2000
"a standing corgi dog"			2000
"a prancing hyena"			2000
"a spiraling floating wizard"			2000

Question 2.2

Prompt	Final Mesh
"an albino cow"
"a plump, medium rare cow"

Question 2.3

Prompt	Depth	View
"a standing corgi dog"
"a monkey climbing a tree"
"a sleeping bear"

Question 2.4

Question 2.4.1

Prompt	Depth	View
"a standing corgi dog"
"a bird"

Compared with the results without view dependence (Q2.3), the results with view dependence are more photorealistic and more consistent across views. In Q2.3, the outline of the corgi's head and body is visible, but it is difficult to tell where the face is, and the body does not look like that of a real corgi. Here, in contrast, the corgi's face is clearly visible, the back of its head is distinct, and the geometry of the body is more realistic.
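
One common way to add view dependence to text-guided optimization is to append a view description to the prompt based on the sampled camera pose, so that the diffusion model is asked for a front, side, or back view as appropriate. The sketch below illustrates that idea only. The function name `view_dependent_prompt`, the angle thresholds, and the exact suffix wording are assumptions, not the implementation used for the results above.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg=0.0,
                          half_width=45.0, overhead_elev=60.0):
    """Append a view description chosen from the camera angles (in degrees)."""
    az = azimuth_deg % 360.0  # wrap into [0, 360)
    if elevation_deg > overhead_elev:
        view = "overhead view"
    elif az <= half_width or az >= 360.0 - half_width:
        view = "front view"
    elif abs(az - 180.0) <= half_width:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


for az in (0, 90, 180, 270):
    print(az, "->", view_dependent_prompt("a standing corgi dog", az))
```

Conditioning each rendered view on a matching description discourages the optimization from placing a face on every side of the object, which is consistent with the clearer face and back of the head observed here.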