Assignment 4

1.1 3D Gaussian Rasterization

q1.1

1.2 Training 3D Gaussian Representations

Learning rates used for each parameter:
(1) pre_act_opacities: 0.04
(2) pre_act_scales: 0.08
(3) colours: 0.001
(4) means: 0.003
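
One way to apply these per-parameter learning rates is through optimizer parameter groups. The sketch below assumes a PyTorch-style optimizer that accepts a list of `{"params": ..., "lr": ...}` dictionaries, as `torch.optim.Adam` does. The helper name `build_param_groups` and the `{name: tensor}` mapping are hypothetical, and the actual training script may set this up differently.

```python
# Learning rates reported above, keyed by parameter name.
LEARNING_RATES = {
    "pre_act_opacities": 0.04,
    "pre_act_scales": 0.08,
    "colours": 0.001,
    "means": 0.003,
}


def build_param_groups(params):
    """Build optimizer parameter groups from a {name: tensor} mapping.

    `params` is a hypothetical dict of trainable tensors; each group pairs
    one tensor with its own learning rate.
    """
    missing = set(LEARNING_RATES) - set(params)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    return [
        {"params": [params[name]], "lr": lr, "name": name}
        for name, lr in LEARNING_RATES.items()
    ]


# Usage (assumes PyTorch and a dict of real tensors named gaussian_params):
# optimizer = torch.optim.Adam(build_param_groups(gaussian_params))
```

Keeping the rates in a single dictionary makes it easy to report them alongside the results and to change one parameter's rate without touching the others.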

Number of iterations: 1000
PSNR: 29.345
SSIM: 0.943
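
For reference, PSNR is derived from the mean squared error between the rendered and ground-truth images. The following is a minimal sketch, assuming pixel values in [0, 1] and flattened pixel sequences; the function name `psnr` is illustrative and is not necessarily how the metric was computed for the numbers above.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher values indicate a closer match: halving the MSE raises the PSNR by about 3 dB.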

q1.2_1 q1.2_2

1.3 Rendering Using Spherical Harmonics

GIF:
q1.3_1 q1.3_2

VIEW 1:
q1.3_1a q1.3_2a

VIEW 2:
q1.3_1b q1.3_2b

DIFFERENCES:

  1. The most prominent difference is brightness: the rendering with full spherical harmonics is brighter, so surfaces such as the couch appear lighter and more reflective than the darker, flatter tones of the view-independent rendering.

  2. Fine details, such as subtle shadows and highlights on curved surfaces, are better captured in the SH rendering, giving the scene a more realistic, three-dimensional appearance.

  3. The color transitions across surfaces are smoother and more nuanced in the spherical harmonics rendering.
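
These differences come from how colour is computed per Gaussian. The sketch below illustrates the idea, truncated to degree 1 for brevity and following the sign convention of the reference 3D Gaussian splatting code; the function name, the coefficient layout, and the `use_view_dependence` flag are assumptions made for illustration, not the assignment's renderer.

```python
SH_C0 = 0.28209479177387814  # degree-0 real SH basis constant
SH_C1 = 0.4886025119029199   # degree-1 real SH basis constant


def sh_to_colour(sh_coeffs, view_dir, use_view_dependence=True):
    """Evaluate an RGB colour from SH coefficients up to degree 1.

    sh_coeffs: four RGB triples [dc, c1, c2, c3] (hypothetical layout).
    view_dir:  unit vector (x, y, z) pointing from the camera to the Gaussian.
    """
    x, y, z = view_dir
    colour = []
    for ch in range(3):
        # The DC term alone gives the view-independent colour.
        value = SH_C0 * sh_coeffs[0][ch]
        if use_view_dependence:
            # Degree-1 terms vary with the viewing direction.
            value += SH_C1 * (
                -y * sh_coeffs[1][ch] + z * sh_coeffs[2][ch] - x * sh_coeffs[3][ch]
            )
        # Shift by 0.5 and clamp to the displayable range.
        colour.append(min(max(value + 0.5, 0.0), 1.0))
    return colour


# The same Gaussian seen from two opposite directions:
coeffs = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
front = sh_to_colour(coeffs, (0.0, 0.0, 1.0))
back = sh_to_colour(coeffs, (0.0, 0.0, -1.0))
flat = sh_to_colour(coeffs, (0.0, 0.0, 1.0), use_view_dependence=False)
```

With the view-dependent terms enabled, `front` and `back` differ in the red channel, whereas `flat` is the same from every direction. This is the mechanism behind the brightness and highlight differences listed above.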

2.1 SDS Loss + Image Optimization

PROMPT 1 : "a hamburger"
q2.1_1

PROMPT 2 : "a standing corgi dog"
q2.1_2

PROMPT 3 : "a motor bike"
q2.1_3

PROMPT 4 : "an airplane"
q2.1_4

2.2 Texture Map Optimization for Mesh

PROMPT 1 : "a cow with geometric patterns"

Initial Mesh Final Mesh
q2.2_1a q2.2_1b

PROMPT 2 : "a cow painted in Van Gogh style"

Initial Mesh Final Mesh
q2.2_2a q2.2_2b

PROMPT 3 : "a cow made of glass"

Initial Mesh Final Mesh
q2.2_3a q2.2_3b

2.3 NeRF Optimization

PROMPT 1 : "a standing corgi dog"

RGB Video Depth Video

PROMPT 2 : "a standing corgi dog"

RGB Video Depth Video

PROMPT 3 : "a standing corgi dog"

RGB Video Depth Video

2.4 View-dependent text embedding

PROMPT 1 : "a standing corgi dog"

RGB Video Depth Video

PROMPT 2 : "a hamburger"

RGB Video Depth Video

COMPARISON:

  1. With view-dependent text embeddings, the optimized NeRF captures finer structural detail across views and shows fewer artifacts, such as duplicated front faces or inconsistent geometry, than the Q2.3 renderings.

  2. The view-dependent output exhibits more accurate shading and color distribution across camera angles, giving a more realistic impression of surface normals and lighting than the fixed-text-embedding version.

  3. With view-dependent embeddings, the model achieves recognizable and coherent 3D object appearance in fewer iterations, showing clearer object boundaries and spatial alignment across frames than the standard view-independent SDS optimization.
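
The idea behind view-dependent conditioning can be sketched as choosing a text prompt per camera pose, so that the diffusion guidance matches the side of the object being rendered. The example below is a minimal illustration: the function name, the angle thresholds, the label wording, and the convention that azimuth 0 faces the front of the object are all assumptions, not the assignment's exact implementation.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg=0.0):
    """Append a coarse view label to a text prompt based on the camera pose."""
    if elevation_deg > 60.0:  # assumed threshold for a top-down view
        return f"{prompt}, overhead view"
    az = azimuth_deg % 360.0  # wrap into [0, 360)
    if az < 45.0 or az >= 315.0:
        label = "front view"
    elif 135.0 <= az < 225.0:
        label = "back view"
    else:
        label = "side view"
    return f"{prompt}, {label}"


# One prompt per sampled camera around the object:
prompts = [view_dependent_prompt("a standing corgi dog", az) for az in (0, 90, 180, 270)]
```

In practice, the text embedding for each label would typically be computed once and looked up by view at each optimization step, rather than re-encoding the prompt every iteration.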