Assignment 4

q1.1

Learning rates used for each parameter:
(1) pre_act_opacities: 0.04
(2) pre_act_scales: 0.08
(3) colours: 0.001
(4) means: 0.003
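
The per-parameter learning rates above can be written as optimizer parameter groups. This is a minimal sketch, not the assignment's actual training code: `build_param_groups` and `params_by_name` are hypothetical names, and the report does not state which optimizer was used. The returned list of `{"params": ..., "lr": ...}` dictionaries follows the layout that PyTorch optimizers such as `torch.optim.Adam` accept.

```python
# Learning rates reported above, keyed by parameter name.
LEARNING_RATES = {
    "pre_act_opacities": 0.04,
    "pre_act_scales": 0.08,
    "colours": 0.001,
    "means": 0.003,
}


def build_param_groups(params_by_name, learning_rates=LEARNING_RATES):
    """Return one parameter group per named parameter.

    `params_by_name` is a hypothetical mapping from parameter name to the
    tensor being optimised; it is not defined in the report.
    """
    missing = set(learning_rates) - set(params_by_name)
    if missing:
        raise KeyError(f"no tensor supplied for: {sorted(missing)}")
    # Each group carries its own learning rate.
    return [
        {"params": [params_by_name[name]], "lr": lr}
        for name, lr in learning_rates.items()
    ]
```

Keeping the rates in a single dictionary makes it easy to see that the scales and opacities are updated far more aggressively than the colours and means.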

Number of iterations: 1000
PSNR: 29.345
SSIM: 0.943
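
For reference, PSNR is derived from the mean squared error between the rendered and ground-truth images. The sketch below assumes pixel values in [0, 1] (so `max_value=1.0`) and flattened pixel sequences; the evaluation code used for the numbers above may differ in both respects.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences.

    Assumes both sequences have the same length and share the value range
    [0, max_value]; both are assumptions, not details from the report.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under this definition, a PSNR near 29 dB corresponds to a root-mean-square pixel error of roughly 3 to 4 percent of the value range.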

q1.2_1 q1.2_2

GIF:
q1.3_1 q1.3_2

VIEW 1:
q1.3_1a q1.3_2a

VIEW 2:
q1.3_1b q1.3_2b

DIFFERENCES:

The most prominent difference is brightness: the rendering that uses full spherical harmonics is brighter, so surfaces such as the couch appear lighter and more reflective than the darker, flatter tones of the view-independent rendering.
Fine details, such as subtle shadows and highlights on curved surfaces, are more accurately represented in the SH-enhanced rendering, giving the scene a more realistic and three-dimensional appearance.
The color transitions across surfaces are smoother and more nuanced in the spherical harmonics rendering.
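
The differences above come from how colour is evaluated per Gaussian. The following is a minimal sketch of spherical-harmonics colour evaluation up to degree 1, following the sign convention commonly used in 3D Gaussian Splatting code; the assignment's renderer may use higher degrees or a different layout. `sh_to_colour` and its `use_view_dependence` flag are hypothetical names, and the clamp to [0, 1] is an assumption.

```python
import math

SH_C0 = 0.28209479177387814  # constant of the degree-0 basis function
SH_C1 = 0.4886025119029199   # magnitude of the degree-1 basis constants


def sh_to_colour(coeffs, direction, use_view_dependence=True):
    """Evaluate an RGB colour from SH coefficients for one view direction.

    coeffs: four RGB triples (the DC term, then three degree-1 terms).
    direction: (x, y, z) viewing direction, normalised inside the function.
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    x, y, z = x / norm, y / norm, z / norm

    colour = []
    for ch in range(3):
        # The DC term alone gives the same colour from every direction.
        value = SH_C0 * coeffs[0][ch]
        if use_view_dependence:
            # Degree-1 terms make the colour vary with the view direction.
            value += SH_C1 * (
                -y * coeffs[1][ch] + z * coeffs[2][ch] - x * coeffs[3][ch]
            )
        colour.append(min(1.0, max(0.0, value + 0.5)))
    return tuple(colour)
```

With `use_view_dependence=False` only the DC term contributes, which matches the flatter, view-independent rendering; enabling the higher-order terms lets highlights and shading shift as the camera moves.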

PROMPT 1 : "a hamburger"
q2.1_1

PROMPT 2 : "a standing corgi dog"
q2.1_2

PROMPT 3 : "a motor bike"
q2.1_3

PROMPT 4 : "an airplane"
q2.1_4

PROMPT 1 : "a cow with geometric patterns"

Initial Mesh	Final Mesh

PROMPT 2 : "a cow painted in Van Gogh style"

Initial Mesh	Final Mesh

PROMPT 3 : "a cow made of glass"

Initial Mesh	Final Mesh

PROMPT 1 : "a standing corgi dog"

RGB Video	Depth Video

PROMPT 2 : "a standing corgi dog"

RGB Video	Depth Video

PROMPT 3 : "a standing corgi dog"

RGB Video	Depth Video

PROMPT 1 : "a standing corgi dog"

RGB Video	Depth Video

PROMPT 2 : "a hamburger"

RGB Video	Depth Video

COMPARISON:

With view-dependent text embeddings, the optimized NeRF captures finer structural detail across views and shows fewer artifacts, such as duplicate front faces or inconsistent geometry, than the previous Q2.3 renderings.
Shading and color distribution are also more consistent across camera angles, so surface normals and lighting read more realistically than in the fixed text embedding version.
Finally, the model reaches a recognizable and coherent 3D object in fewer iterations, with clearer object boundaries and better spatial alignment across frames than the standard view-independent SDS optimization.
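
One common way to obtain view-dependent text conditioning is to append a view phrase to the prompt according to the sampled camera pose, then embed the resulting text. The sketch below illustrates that idea only: `view_dependent_prompt`, the angle thresholds, and the exact suffix wording are assumptions, not the assignment's implementation.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg=0.0,
                          overhead_threshold_deg=60.0):
    """Append a view phrase to `prompt` based on the camera pose.

    Azimuth 0 is assumed to face the front of the object; the thresholds
    are illustrative choices.
    """
    if elevation_deg > overhead_threshold_deg:
        suffix = "overhead view"
    else:
        az = azimuth_deg % 360.0  # wrap into [0, 360)
        if az < 45.0 or az >= 315.0:
            suffix = "front view"
        elif 135.0 <= az < 225.0:
            suffix = "back view"
        else:
            suffix = "side view"
    return f"{prompt}, {suffix}"
```

Because the back and side views receive prompts that no longer describe a front-facing object, the guidance is less likely to paint a face on every side, which is consistent with the reduction in duplicate front faces noted above.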