16-825 Assignment 4

Duc Doan

Q1: 3D Gaussian Splatting

Q1.1: 3D Gaussian Rasterization

q1.1
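
For reference, below is a minimal sketch of the per-pixel alpha-compositing step at the core of the rasterizer. It assumes the Gaussians have already been projected to 2D and sorted front to back; the tensor names and the helper itself are illustrative, not the assignment's actual API.

```python
import torch

def composite_pixel(means2d, covs2d_inv, opacities, colors, pixel_xy):
    """Alpha-composite depth-sorted 2D Gaussians at one pixel.

    means2d:    (N, 2) projected Gaussian centers, sorted front to back
    covs2d_inv: (N, 2, 2) inverses of the projected 2D covariances
    opacities:  (N,) per-Gaussian opacity in [0, 1]
    colors:     (N, 3) per-Gaussian RGB
    pixel_xy:   (2,) pixel coordinate
    """
    d = pixel_xy[None, :] - means2d                              # (N, 2)
    # Gaussian falloff exp(-0.5 * d^T Sigma^{-1} d)
    power = -0.5 * torch.einsum("ni,nij,nj->n", d, covs2d_inv, d)
    alpha = opacities * torch.exp(power)                         # (N,)
    # transmittance T_i = prod_{j<i} (1 - alpha_j)
    T = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = alpha * T                                          # (N,)
    return (weights[:, None] * colors).sum(dim=0)                # (3,)
```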

Q1.2: Training 3D Gaussian Representations

Learning rates:

Number of iterations: 1000

PSNR: 29.191

SSIM: 0.939
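
For reference, the reported PSNR is derived from the mean squared error between the rendered and ground-truth images. A minimal sketch, assuming `rendered` and `gt` hold values in [0, 1]:

```python
import torch

def psnr(rendered: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """PSNR in dB for images with values in [0, 1]."""
    mse = torch.mean((rendered - gt) ** 2)
    return -10.0 * torch.log10(mse)
```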

Final renders:

q1.2 final

Training progress renders:

q1.2 progress

Q1.3: Extensions

Q1.3.1: Rendering using spherical harmonics

| 0th-order only | Full |
| --- | --- |
| f3 | f3_sh |
| f14 | f14_sh |

First frame comparison: the texture details are much more defined in the SH version.

Second frame comparison: in the 0th-order-only version the shading is unchanged from the previous frame, whereas the SH version now looks darker. The SH version captures the lighting correctly: the previous frame shows the light coming from the right, so when the chair turns to face left, its visible side receives less direct light and appears darker.
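
A minimal sketch of how the view-dependent color is evaluated from SH coefficients, following the common 3D Gaussian Splatting convention and truncated to degree 1 for brevity; the coefficient layout and the 0.5 offset are assumptions, not necessarily the exact code used here:

```python
import torch

SH_C0 = 0.28209479177387814   # degree-0 (constant) basis coefficient
SH_C1 = 0.4886025119029199    # degree-1 basis scale

def sh_to_rgb(sh_coeffs: torch.Tensor, view_dirs: torch.Tensor) -> torch.Tensor:
    """Evaluate view-dependent color from degree-1 SH coefficients.

    sh_coeffs: (N, 4, 3) coefficients per Gaussian (DC term + 3 linear terms, RGB)
    view_dirs: (N, 3) unit vectors from the camera toward each Gaussian
    """
    x, y, z = view_dirs[:, 0:1], view_dirs[:, 1:2], view_dirs[:, 2:3]
    rgb = SH_C0 * sh_coeffs[:, 0]
    rgb = rgb - SH_C1 * y * sh_coeffs[:, 1] \
              + SH_C1 * z * sh_coeffs[:, 2] \
              - SH_C1 * x * sh_coeffs[:, 3]
    # the DC term is conventionally offset so that zero coefficients give mid-gray
    return torch.clamp(rgb + 0.5, min=0.0)
```

With 0th-order only, the last three terms vanish, so the color is constant across viewing directions; that is why the chair's shading does not change between frames in the left column.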

Q2: Diffusion-guided optimization

Q2.1: SDS Loss + Image Optimization

All images are optimized for 2000 iterations.
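
As a reminder of what the optimization does, below is a minimal sketch of one SDS step with classifier-free guidance (the "with guidance" column). The `unet` signature, the timestep range, and the guidance scale of 100 are illustrative assumptions in the spirit of DreamFusion, not the assignment's exact implementation.

```python
import torch

def sds_step(x, unet, text_emb, uncond_emb, alphas_cumprod, guidance_scale=100.0):
    """One SDS update direction for an optimizable image/latent x.

    unet(z, t, emb) is assumed to predict the noise added at timestep t.
    alphas_cumprod: (T,) cumulative alpha-bar schedule of the diffusion model.
    Returns a surrogate loss whose gradient w.r.t. x equals the SDS gradient.
    """
    T = alphas_cumprod.shape[0]
    t = torch.randint(50, int(0.98 * T), (1,), device=x.device)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)

    noise = torch.randn_like(x)
    x_noisy = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise

    with torch.no_grad():
        eps_cond = unet(x_noisy, t, text_emb)
        eps_uncond = unet(x_noisy, t, uncond_emb)
        # classifier-free guidance: push the prediction toward the text condition
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1.0 - a_bar                              # a common weighting choice
    grad = w * (eps_hat - noise)
    # surrogate loss: backprop reaches x directly, skipping the diffusion model
    return (grad.detach() * x).sum()
```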

| Prompt | Without guidance | With guidance |
| --- | --- | --- |
| "a hamburger" | | |
| "a standing corgi dog" | | |
| "a spherical rubik's cube" | | |
| "a pikachu holding a gun" | | |

Q2.2: Texture map optimization for mesh

Prompt: "a cow with orange skin and blue dots"

q2.2 1

Prompt: "a dotted black and white cow"

q2.2 2

Q2.3: NeRF optimization

| Prompt | RGB | Depth |
| --- | --- | --- |
| "a standing corgi dog" | | |
| "a pikachu holding a sword" | | |
| "an f1 racing car" | | |

Q2.4: Extensions

Q2.4.1: View-dependent text embedding

| Prompt | RGB, without VD | RGB, with VD | Depth, without VD | Depth, with VD |
| --- | --- | --- | --- | --- |
| "a standing corgi dog" | | | | |
| "a pikachu holding a sword" | | | | |
| "an f1 racing car" | | | | |

Visual results comparison:

In summary, view-dependent text conditioning helps improve the correctness of the generated 3D objects. Without it, the model is biased toward generating front-facing features from every viewpoint, because front-facing views dominate the diffusion model's training data.
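
A minimal sketch of the view-dependent conditioning: a view suffix chosen from the camera azimuth is appended to the prompt before encoding it. The angle thresholds below are illustrative assumptions, not necessarily the ones used in the assignment.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float) -> str:
    """Append a view suffix to the prompt based on camera azimuth.

    Assumes azimuth is measured in degrees with 0 = front of the object.
    """
    a = (azimuth_deg + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    if abs(a) < 45.0:
        view = "front view"
    elif abs(a) > 135.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```

The resulting per-view prompts can be encoded once up front, and the matching text embedding is then selected for each sampled camera during training.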