16-825 Assignment 4

Duc Doan

Q1: 3D Gaussian Splatting

Q1.1: 3D Gaussian Rasterization

q1.1
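
For reference, below is a minimal sketch of the per-pixel alpha-compositing step at the core of the rasterizer. It assumes the Gaussians have already been projected to 2D and sorted front to back; the tensor names and the helper itself are illustrative, not the assignment's actual API.

```python
import torch

def composite_pixel(means2d, covs2d_inv, opacities, colors, pixel_xy):
    """Alpha-composite depth-sorted 2D Gaussians at one pixel.

    means2d:    (N, 2) projected Gaussian centers, sorted front to back
    covs2d_inv: (N, 2, 2) inverses of the projected 2D covariances
    opacities:  (N,) per-Gaussian opacity in [0, 1]
    colors:     (N, 3) per-Gaussian RGB
    pixel_xy:   (2,) pixel coordinate
    """
    d = pixel_xy[None, :] - means2d                              # (N, 2)
    # Gaussian falloff exp(-0.5 * d^T Sigma^{-1} d)
    power = -0.5 * torch.einsum("ni,nij,nj->n", d, covs2d_inv, d)
    alpha = opacities * torch.exp(power)                         # (N,)
    # transmittance T_i = prod_{j<i} (1 - alpha_j)
    T = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha[:-1]]), dim=0)
    weights = alpha * T                                          # (N,)
    return (weights[:, None] * colors).sum(dim=0)                # (3,)
```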

Q1.2: Training 3D Gaussian Representations

Learning rates:

Number of iterations: 1000

PSNR: 29.191

SSIM: 0.939
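
For reference, the reported PSNR is derived from the mean squared error between the rendered and ground-truth images. A minimal sketch, assuming `rendered` and `gt` hold values in [0, 1]:

```python
import torch

def psnr(rendered: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """PSNR in dB for images with values in [0, 1]."""
    mse = torch.mean((rendered - gt) ** 2)
    return -10.0 * torch.log10(mse)
```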

Final renders:

q1.2 final

Training progress renders:

q1.2 progress

Q1.3: Extensions

Q1.3.1: Rendering using spherical harmonics

| 0th-order only | Full |
| --- | --- |
| f3 | f3_sh |
| f14 | f14_sh |

First frame comparison: the texture details are much more defined in the SH version.

Second frame comparison: in the 0th-order-only version the shading is unchanged from the previous frame, whereas the SH version now looks darker. The SH version captures the lighting correctly: the previous frame shows the light coming from the right, so when the chair turns to face left, its visible side receives less direct light and appears darker.
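
A minimal sketch of how the view-dependent color is evaluated from SH coefficients, following the common 3D Gaussian Splatting convention and truncated to degree 1 for brevity; the coefficient layout and the 0.5 offset are assumptions, not necessarily the exact code used here:

```python
import torch

SH_C0 = 0.28209479177387814   # degree-0 (constant) basis coefficient
SH_C1 = 0.4886025119029199    # degree-1 basis scale

def sh_to_rgb(sh_coeffs: torch.Tensor, view_dirs: torch.Tensor) -> torch.Tensor:
    """Evaluate view-dependent color from degree-1 SH coefficients.

    sh_coeffs: (N, 4, 3) coefficients per Gaussian (DC term + 3 linear terms, RGB)
    view_dirs: (N, 3) unit vectors from the camera toward each Gaussian
    """
    x, y, z = view_dirs[:, 0:1], view_dirs[:, 1:2], view_dirs[:, 2:3]
    rgb = SH_C0 * sh_coeffs[:, 0]
    rgb = rgb - SH_C1 * y * sh_coeffs[:, 1] \
              + SH_C1 * z * sh_coeffs[:, 2] \
              - SH_C1 * x * sh_coeffs[:, 3]
    # the DC term is conventionally offset so that zero coefficients give mid-gray
    return torch.clamp(rgb + 0.5, min=0.0)
```

With 0th-order only, the last three terms vanish, so the color is constant across viewing directions; that is why the chair's shading does not change between frames in the left column.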

Q2: Diffusion-guided optimization

Q2.1: SDS Loss + Image Optimization

All images are optimized for 2000 iterations.
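
As a reminder of what the optimization does, below is a minimal sketch of one SDS step with classifier-free guidance (the "with guidance" column). The `unet` signature, the timestep range, and the guidance scale of 100 are illustrative assumptions in the spirit of DreamFusion, not the assignment's exact implementation.

```python
import torch

def sds_step(x, unet, text_emb, uncond_emb, alphas_cumprod, guidance_scale=100.0):
    """One SDS update direction for an optimizable image/latent x.

    unet(z, t, emb) is assumed to predict the noise added at timestep t.
    alphas_cumprod: (T,) cumulative alpha-bar schedule of the diffusion model.
    Returns a surrogate loss whose gradient w.r.t. x equals the SDS gradient.
    """
    T = alphas_cumprod.shape[0]
    t = torch.randint(50, int(0.98 * T), (1,), device=x.device)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)

    noise = torch.randn_like(x)
    x_noisy = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise

    with torch.no_grad():
        eps_cond = unet(x_noisy, t, text_emb)
        eps_uncond = unet(x_noisy, t, uncond_emb)
        # classifier-free guidance: push the prediction toward the text condition
        eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1.0 - a_bar                              # a common weighting choice
    grad = w * (eps_hat - noise)
    # surrogate loss: backprop reaches x directly, skipping the diffusion model
    return (grad.detach() * x).sum()
```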

| Prompt | Without guidance | With guidance |
| --- | --- | --- |
| "a hamburger" | | |
| "a standing corgi dog" | | |
| "a spherical rubik's cube" | | |
| "a pikachu holding a gun" | | |

Q2.2: Texture map optimization for mesh

Prompt: "a cow with orange skin and blue dots"

q2.2 1

Prompt: "a dotted black and white cow"

q2.2 2

Q2.3: NeRF optimization

| Prompt | RGB | Depth |
| --- | --- | --- |
| "a standing corgi dog" | | |
| "a pikachu holding a sword" | | |
| "an f1 racing car" | | |

Q2.4: Extensions

Q2.4.1: View-dependent text embedding

| Prompt | RGB, without VD | RGB, with VD | Depth, without VD | Depth, with VD |
| --- | --- | --- | --- | --- |
| "a standing corgi dog" | | | | |
| "a pikachu holding a sword" | | | | |
| "an f1 racing car" | | | | |

Visual results comparison:

In summary, view-dependent text conditioning helps improve the correctness of the generated 3D objects. Without it, the model is biased toward generating front-facing features from every viewpoint, because front-facing views dominate the diffusion model's training data.
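
A minimal sketch of the view-dependent conditioning: a view suffix chosen from the camera azimuth is appended to the prompt before encoding it. The angle thresholds below are illustrative assumptions, not necessarily the ones used in the assignment.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float) -> str:
    """Append a view suffix to the prompt based on camera azimuth.

    Assumes azimuth is measured in degrees with 0 = front of the object.
    """
    a = (azimuth_deg + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    if abs(a) < 45.0:
        view = "front view"
    elif abs(a) > 135.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```

The resulting per-view prompts can be encoded once up front, and the matching text embedding is then selected for each sampled camera during training.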