Assignment 4
1.1 3D Gaussian Rasterization

1.2 Training 3D Gaussian Representations
Learning rates used for each parameter:
(1) pre_act_opacities: 0.04
(2) pre_act_scales: 0.08
(3) colours: 0.001
(4) means: 0.003
Number of iterations: 1000
PSNR: 29.345
SSIM: 0.943
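
For reference, PSNR is conventionally defined as PSNR = 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below is a minimal pure-Python version. It assumes images are flattened into lists of values in [0, 1], so MAX = 1. The assignment's evaluation code most likely works on tensors and may differ in details. SSIM is not covered here.

```python
import math


def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    Assumes values lie in [0, max_val]; max_val=1.0 matches images in [0, 1].
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)


# Every value off by 0.01 gives MSE = 1e-4, i.e. PSNR = 40 dB.
print(round(psnr([0.5, 0.2, 0.8], [0.51, 0.21, 0.81]), 6))  # → 40.0
```

On this scale, the reported 29.345 dB corresponds to an MSE of roughly 1.2e-3 per value.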

1.3 Rendering Using Spherical Harmonics
GIF:

VIEW 1:

VIEW 2:

DIFFERENCES:
The most prominent difference is brightness. With full spherical harmonics, surfaces such as the couch appear lighter and more reflective, whereas the view-independent rendering shows darker, flatter tones.
Fine details, such as subtle shadows and highlights on curved surfaces, are represented more accurately in the SH rendering, which gives the scene a more realistic, three-dimensional appearance.
Color transitions across surfaces are smoother and more nuanced in the SH rendering, because each Gaussian's color can vary with viewing direction instead of being fixed.
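The difference between the two renderings comes down to how many SH bands are evaluated per Gaussian. The sketch below evaluates bands 0 and 1 for a single Gaussian. Degree 0 uses only the DC term and is view-independent; degree 1 adds terms that depend on the viewing direction. The constants, sign pattern, and the "+0.5 then clamp at 0" step follow the convention of the reference 3D Gaussian Splatting code. Treat it as an assumption that the assignment uses the same convention. The full-SH rendering most likely goes up to degree 3; this sketch stops at degree 1 for brevity.

```python
import math

# Real spherical-harmonic normalisation constants for bands 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_to_rgb(sh, view_dir, degree=1):
    """Evaluate an RGB colour from SH coefficients for one Gaussian.

    sh: list of 4 RGB triples [dc, c1, c2, c3] (band 0, then band 1).
    view_dir: (x, y, z) direction from the camera centre to the Gaussian.
    degree: 0 gives a view-independent colour; 1 adds the first band.
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    rgb = []
    for ch in range(3):
        value = SH_C0 * sh[0][ch]  # DC term: same for every view direction
        if degree >= 1:
            # Band-1 terms vary linearly with the normalised view direction.
            value += SH_C1 * (-y * sh[1][ch] + z * sh[2][ch] - x * sh[3][ch])
        # Offset by 0.5 and clamp at zero (3DGS convention, assumed here).
        rgb.append(max(value + 0.5, 0.0))
    return rgb
```

With a non-zero band-1 coefficient, the same Gaussian returns a brighter colour when viewed along +z than along -z. With `degree=0`, both views return the same colour, which mirrors the flatter look of the view-independent rendering.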
2.1 SDS Loss + Image Optimization
PROMPT 1 : "a hamburger"

PROMPT 2 : "a standing corgi dog"

PROMPT 3 : "a motor bike"

PROMPT 4 : "an airplane"
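
Score Distillation Sampling, in its standard DreamFusion form, noises the current image x to z_t = sqrt(ᾱ_t)·x + sqrt(1 − ᾱ_t)·ε. It then uses the diffusion model's noise prediction to form the gradient w(t)·(ε̂ − ε). Here ε̂ is the classifier-free-guided prediction ε_uncond + s·(ε_cond − ε_uncond). The sketch below shows only this arithmetic on flat lists. The noise predictions are passed in as plain numbers rather than produced by a UNet call. The weighting w(t) = 1 − ᾱ_t and the default guidance scale of 100 are assumptions, not necessarily what this assignment used.

```python
import math


def add_noise(x, eps, alpha_bar):
    """Forward diffusion: z_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * ei for xi, ei in zip(x, eps)]


def sds_grad(eps_cond, eps_uncond, eps, alpha_bar, guidance_scale=100.0):
    """SDS gradient with respect to the (latent) image for one timestep.

    eps_cond / eps_uncond: the model's noise predictions on z_t with and
    without the text prompt; eps: the noise actually added to x.
    """
    w = 1.0 - alpha_bar  # weighting w(t); one common choice, assumed here
    grad = []
    for ec, eu, e in zip(eps_cond, eps_uncond, eps):
        eps_hat = eu + guidance_scale * (ec - eu)  # classifier-free guidance
        grad.append(w * (eps_hat - e))
    return grad
```

In an autodiff framework, this gradient is typically injected through a surrogate loss 0.5·‖x − stopgrad(x − grad)‖², whose derivative with respect to x is exactly `grad`. This avoids backpropagating through the diffusion model itself.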

2.2 Texture Map Optimization for Mesh
PROMPT 1 : "a cow with geometric patterns"
| Initial Mesh | Final Mesh |
|---|---|
| ![]() | ![]() |
PROMPT 2 : "a cow painted in Van Gogh style"
| Initial Mesh | Final Mesh |
|---|---|
| ![]() | ![]() |
PROMPT 3 : "a cow made of glass"
| Initial Mesh | Final Mesh |
|---|---|
| ![]() | ![]() |
2.3 NeRF Optimization
PROMPT 1 : "a standing corgi dog"
| RGB Video | Depth Video |
|---|---|
PROMPT 2 : "a standing corgi dog"
| RGB Video | Depth Video |
|---|---|
PROMPT 3 : "a standing corgi dog"
| RGB Video | Depth Video |
|---|---|
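
Depth maps in NeRF-style renderers are usually computed from the same compositing weights as the RGB output. Each sample's weight is T_i·(1 − exp(−σ_i·δ_i)), where T_i is the transmittance accumulated so far. The depth is the weight-averaged sample distance. The sketch below composites a single ray in pure Python. It assumes the samples are already sorted by depth. Some implementations also normalise depth by the accumulated weight; this one does not.

```python
import math


def composite(sigmas, colors, ts):
    """Alpha-composite samples along one ray (NeRF quadrature).

    sigmas: densities at the sample points; colors: RGB triples;
    ts: sample distances along the ray, in increasing order.
    Returns (rgb, depth), where depth is the weighted sum of sample distances.
    """
    n = len(ts)
    # Spacing between consecutive samples; the last interval is treated as very large.
    deltas = [ts[i + 1] - ts[i] for i in range(n - 1)] + [1e10]
    transmittance = 1.0  # probability the ray reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    for sigma, color, t, delta in zip(sigmas, colors, ts, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * color[c]
        depth += weight * t
        transmittance *= 1.0 - alpha
    return rgb, depth
```

A ray that hits a single opaque sample at distance 2 returns that sample's colour and a depth of 2. A ray through empty space returns black and zero depth.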
2.4 View-dependent text embedding
PROMPT 1 : "a standing corgi dog"
| RGB Video | Depth Video |
|---|---|
PROMPT 2 : "a hamburger"
| RGB Video | Depth Video |
|---|---|
COMPARISON:
With view-dependent text embeddings, the optimized NeRF captures finer structural detail across views and shows fewer artifacts, such as duplicated front faces (the "Janus" problem) or inconsistent geometry, than the Q2.3 renderings.
The view-dependent output also shows more accurate shading and color distribution across camera angles, giving a more convincing impression of surface normals and lighting than the version with a fixed text embedding.
It also reaches a recognizable, coherent 3D appearance in fewer iterations, with clearer object boundaries and better spatial consistency across frames than standard view-independent SDS optimization.
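
View-dependent conditioning, as popularised by DreamFusion, appends a view descriptor ("front view", "side view", "back view", "overhead view") to the prompt based on the sampled camera pose. The text encoder then embeds the augmented prompt. The sketch below builds only the augmented prompt string. The thresholds (60° elevation for overhead, ±45° azimuth around the front and back) and the convention that azimuth 0° faces the object's front are hypothetical choices, not taken from this assignment.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg,
                          overhead_threshold=60.0, front_half_width=45.0):
    """Append a view descriptor to a text prompt based on camera pose.

    Assumes azimuth 0 means the camera faces the object's front; both
    thresholds are hypothetical defaults chosen for illustration.
    """
    if elevation_deg > overhead_threshold:
        return f"{prompt}, overhead view"
    # Wrap azimuth into [-180, 180) so that e.g. 350 degrees counts as -10.
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    if abs(az) <= front_half_width:
        return f"{prompt}, front view"
    if abs(az) >= 180.0 - front_half_width:
        return f"{prompt}, back view"
    return f"{prompt}, side view"


print(view_dependent_prompt("a standing corgi dog", 180.0, 10.0))
# → a standing corgi dog, back view
```

Because the back of the object is now described as a "back view", the diffusion model is less inclined to paint a second face there. This is the intuition behind the reduced duplicate-face artifacts described above.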





