Assignment 4

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

[Figure: q1.1]

1.2 Training 3D Gaussian Representations (15 points)

[Figure: q1.2]

Learning rates:

Opacities: 0.0008
Scales: 0.001
Colors: 0.005
Means: 0.0002

Number of Iterations: 2000
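The per-group learning rates above can be illustrated with a minimal NumPy Adam that keeps one learning rate per parameter group. This writeup does not name the optimizer, so Adam is an assumption (it is what the original 3DGS training uses), and `GroupedAdam` and the group keys are hypothetical names chosen for illustration.

```python
import numpy as np

# Rates from the run above; the optimizer choice (Adam) is an assumption.
LEARNING_RATES = {"opacities": 0.0008, "scales": 0.001, "colors": 0.005, "means": 0.0002}


class GroupedAdam:
    """Minimal Adam with a separate learning rate for each named parameter group."""

    def __init__(self, params, lrs, betas=(0.9, 0.999), eps=1e-8):
        self.params, self.lrs = params, lrs
        self.b1, self.b2, self.eps = betas[0], betas[1], eps
        self.m = {k: np.zeros_like(v) for k, v in params.items()}  # first moments
        self.v = {k: np.zeros_like(v) for k, v in params.items()}  # second moments
        self.t = 0

    def step(self, grads):
        self.t += 1
        for name, g in grads.items():
            self.m[name] = self.b1 * self.m[name] + (1 - self.b1) * g
            self.v[name] = self.b2 * self.v[name] + (1 - self.b2) * g * g
            # Bias-corrected moment estimates.
            m_hat = self.m[name] / (1 - self.b1 ** self.t)
            v_hat = self.v[name] / (1 - self.b2 ** self.t)
            # In-place update so the caller's arrays see the new values.
            self.params[name] -= self.lrs[name] * m_hat / (np.sqrt(v_hat) + self.eps)
```

On the first step Adam moves each parameter by roughly its learning rate regardless of the gradient's magnitude, so these rates directly bound how far the means, scales, opacities, and colors can move per iteration.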

Mean PSNR: 29.186 dB

Mean SSIM: 0.937
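For reference, PSNR is 10·log10(MAX² / MSE) in decibels. The sketch below assumes images scaled to [0, 1] and that "mean PSNR" averages the per-view PSNR over the evaluation views (some code averages the MSE first instead); `psnr` and `mean_psnr` are illustrative helpers, not the assignment's evaluation code.

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


def mean_psnr(preds, gts, max_val=1.0):
    """Average of per-view PSNR over a set of held-out views."""
    return float(np.mean([psnr(p, g, max_val) for p, g in zip(preds, gts)]))
```

A uniform error of 0.1 per pixel gives an MSE of 0.01 and therefore 20 dB, so a mean near 29 dB corresponds to a much smaller typical per-pixel error.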

1.3 Extensions (Choose at least one! More than one is extra credit)

1.3.1 Rendering Using Spherical Harmonics (10 points)

[Figure: q1.2]

With all spherical harmonics	With only DC spherical harmonics
[image]	[image]

As the images show, the difference between rendering with all spherical harmonics and rendering with only the DC spherical harmonic is slight, but the full set gives better shading and more accurate lighting. This is most evident on the seat of the chair, which is slightly rounded. With all spherical harmonics, the shading gives the seat a more realistic "bulge", and the shadow cast by the backrest onto the seat is more gradual and realistic. With the DC term only, the shadow has a harsher boundary that does not really change with camera angle, since the DC term is the same from every viewing direction.
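To make the view-dependence argument concrete, here is a minimal NumPy sketch of evaluating one Gaussian's color from spherical-harmonic coefficients, truncated at degree 1 for brevity (the original 3DGS goes up to degree 3). The constants and sign convention are assumed to follow the reference 3DGS code, and `sh_to_rgb` is a hypothetical helper, not the function used in this assignment.

```python
import numpy as np

# Real spherical-harmonic normalization constants for degrees 0 and 1,
# with the sign convention assumed from the reference 3DGS implementation.
C0 = 0.28209479177387814
C1 = 0.4886025119029199


def sh_to_rgb(sh, view_dir, use_dc_only=False):
    """Evaluate one Gaussian's RGB color from SH coefficients up to degree 1.

    sh: (4, 3) array, rows = [DC, Y_1^-1, Y_1^0, Y_1^1], columns = R, G, B.
    view_dir: (3,) vector from the camera center to the Gaussian.
    """
    d = np.asarray(view_dir, dtype=float)
    x, y, z = d / np.linalg.norm(d)
    rgb = C0 * sh[0]  # view-independent DC term
    if not use_dc_only:
        # Degree-1 terms vary linearly with the viewing direction.
        rgb = rgb - C1 * y * sh[1] + C1 * z * sh[2] - C1 * x * sh[3]
    # Offset so all-zero coefficients give mid-gray, then clamp negatives.
    return np.maximum(rgb + 0.5, 0.0)


# Toy coefficients, identical across R, G, B.
sh = np.zeros((4, 3))
sh[0] = 0.2  # DC
sh[2] = 0.3  # couples to the z component of the view direction
sh[3] = 0.1  # couples to the x component of the view direction
front, side = [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]

print(sh_to_rgb(sh, front, use_dc_only=True), sh_to_rgb(sh, side, use_dc_only=True))
print(sh_to_rgb(sh, front), sh_to_rgb(sh, side))
```

With `use_dc_only=True` both views print the same color, while the full evaluation gives different colors for the front and side views, which is the behavior seen on the chair seat.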

2. Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

Prompt	Guidance	Output
"a hamburger"	Without guidance	[image]
"a hamburger"	With guidance	[image]
"a standing corgi dog"	Without guidance	[image]
"a standing corgi dog"	With guidance	[image]
"a helicopter"	Without guidance	[image]
"a helicopter"	With guidance	[image]
"a castle"	Without guidance	[image]
"a castle"	With guidance	[image]
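The SDS update behind the guidance comparison can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the code that produced these images: the noise predictions come from a stand-in rather than the diffusion U-Net, the weighting w(t) = 1 - ᾱ_t is one common choice (as in DreamFusion), "without guidance" is assumed to mean using the text-conditioned prediction alone, and `add_noise` and `sds_grad` are hypothetical names.

```python
import numpy as np


def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def sds_grad(eps_cond, eps_uncond, eps, w_t, guidance_scale=None):
    """SDS gradient w.r.t. the image/latent, skipping the U-Net Jacobian.

    eps_cond / eps_uncond: noise predicted with the text prompt / an empty prompt.
    guidance_scale=None uses the conditional prediction alone ("no guidance");
    a float applies classifier-free guidance.
    """
    if guidance_scale is None:
        eps_hat = eps_cond
    else:
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    return w_t * (eps_hat - eps)


rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 8, 8))  # toy latent being optimized
alpha_bar_t = 0.5                     # noise level for a sampled timestep t
eps = rng.normal(size=latent.shape)
x_t = add_noise(latent, eps, alpha_bar_t)
# Stand-ins for the U-Net's predictions on x_t (hypothetical, for illustration only).
eps_cond = eps + 0.1 * rng.normal(size=latent.shape)
eps_uncond = eps + 0.3 * rng.normal(size=latent.shape)

grad = sds_grad(eps_cond, eps_uncond, eps, w_t=1.0 - alpha_bar_t, guidance_scale=100.0)
latent = latent - 0.01 * grad  # one gradient-descent step on the latent
```

A guidance scale of 1 reduces classifier-free guidance to the purely conditional prediction; large scales such as the 100 used in DreamFusion push the image much harder toward the prompt.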

2.2 Texture Map Optimization for Mesh (15 points)

Prompt	Output
"a hamburger"	[image]
"a cow"	[image]

2.3 NeRF Optimization (15 points)

Prompt	Output
"a hamburger"	[image]
"a standing corgi dog"	[image]
"a castle"	[image]

2.4 Extensions (Choose at least one! More than one is extra credit)

2.4.1 View-dependent text embedding (10 points)

Prompt	Output
"a helicopter"	[image]
"a standing corgi dog"	[image]

Comparison: The major difference is in how the NeRF looks from the sides and back. Looking specifically at the corgi, both the earlier version and the view-dependent version have three ear-like shapes. However, with view-dependent text embeddings the corgi actually has a face, whereas the earlier one does not. There is a clear front, back, and side, which is especially noticeable from where the face is generated. So the 3D shape seen from different sides is much better with view-dependent text embeddings than without, although none of the results are perfect.
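A view-dependent prompt picks a direction suffix from the camera pose before the text is encoded. The sketch below assumes DreamFusion-style suffixes ("front view", "side view", "back view", "overhead view") and hypothetical angle thresholds (front within 45° of azimuth 0, overhead above 60° elevation); the ranges actually used for these results are not stated here, and `view_suffix` and `view_dependent_prompt` are illustrative names.

```python
def view_suffix(azimuth_deg, elevation_deg, front_halfwidth=45.0, overhead_elev=60.0):
    """Map a camera pose to a direction phrase (thresholds are assumptions)."""
    if elevation_deg >= overhead_elev:
        return "overhead view"
    # Wrap azimuth into [-180, 180) so 0 degrees faces the object's front.
    a = (azimuth_deg + 180.0) % 360.0 - 180.0
    if abs(a) <= front_halfwidth:
        return "front view"
    if abs(a) >= 180.0 - front_halfwidth:
        return "back view"
    return "side view"


def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append the direction phrase to the base prompt."""
    return f"{prompt}, {view_suffix(azimuth_deg, elevation_deg)}"


print(view_dependent_prompt("a standing corgi dog", 0.0, 0.0))
print(view_dependent_prompt("a standing corgi dog", 180.0, 0.0))
```

Since there are only a few distinct suffixes, the text embedding for each can be computed once and looked up per rendered view instead of rerunning the text encoder every iteration.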