# Assignment 4

author: Yu Jin Goh (yujing)

## Q1.1 3D Gaussian Rasterization

Rendered GIF:
chair

## Q1.2 Training 3D Gaussian Representations

Learning rate parameters used:

| Learning Rate Variable | Value |
| :--------------------- | :----: |
| pre_act_opacities | 0.0005 |
| pre_act_scales | 0.005 |
| colours | 0.005 |
| means | 0.0005 |

Number of training iterations: 1000
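For concreteness, here is a minimal sketch of how these per-parameter learning rates can be wired into a PyTorch optimizer. The `Gaussians` container and the placeholder loss are illustrative assumptions, not the actual assignment code:

```python
import torch
from torch import nn

# Hypothetical stand-in for the learnable Gaussian parameters;
# attribute names follow the learning-rate table above.
class Gaussians(nn.Module):
    def __init__(self, n: int = 1000):
        super().__init__()
        self.pre_act_opacities = nn.Parameter(torch.zeros(n))
        self.pre_act_scales = nn.Parameter(torch.zeros(n, 3))
        self.colours = nn.Parameter(torch.rand(n, 3))
        self.means = nn.Parameter(torch.randn(n, 3))

gaussians = Gaussians()

# One Adam parameter group per variable, matching the table above.
optimizer = torch.optim.Adam([
    {"params": [gaussians.pre_act_opacities], "lr": 0.0005},
    {"params": [gaussians.pre_act_scales], "lr": 0.005},
    {"params": [gaussians.colours], "lr": 0.005},
    {"params": [gaussians.means], "lr": 0.0005},
])

for it in range(1000):  # number of training iterations used above
    optimizer.zero_grad()
    # Placeholder: in the assignment this would be a photometric loss
    # between the rasterized image and the ground-truth view.
    loss = (gaussians.colours ** 2).mean()
    loss.backward()
    optimizer.step()
```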

| Metric | Value |
| :----- | :----: |
| PSNR | 29.446 |
| SSIM | 0.937 |

Training progress:
car_training

Final Render:
car_final

## Q1.3.1 Rendering Using Spherical Harmonics

Previous Render (Q1.1.5):
chair

Now with view-dependent effects (Q1.3.1):
chair

| No View Dependence | View Dependence | Explanation |
| :----------------: | :-------------: | :---------- |
| chair_000 | chair_000_view | Without view-dependent effects, the patterns on the chair appear flat yellow and diffuse; with them, the patterns take on a gold sheen, since different parts of the pattern emit different colours depending on the viewing angle. |
| chair_013 | chair_013_view | In this view, the decorations on the chair gain highlights once view-dependent effects are included, and parts of the cloth show shading that varies with the viewing angle. |
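For reference, a minimal sketch of degree-1 spherical-harmonic colour evaluation of the kind behind these view-dependent effects. The band-0/band-1 basis constants are the standard values; the `sh_coeffs` layout is an assumption:

```python
import torch

# Standard real spherical-harmonic basis constants for bands 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def sh_to_colour(sh_coeffs: torch.Tensor, view_dirs: torch.Tensor) -> torch.Tensor:
    """Evaluate per-Gaussian colour from degree-1 SH coefficients.

    sh_coeffs: (N, 4, 3) -- one DC plus three band-1 coefficients per
               colour channel (this layout is an assumption).
    view_dirs: (N, 3) -- unit vectors from the camera to each Gaussian.
    """
    x, y, z = view_dirs[:, 0:1], view_dirs[:, 1:2], view_dirs[:, 2:3]
    colour = SH_C0 * sh_coeffs[:, 0]               # view-independent (diffuse) term
    colour = colour - SH_C1 * y * sh_coeffs[:, 1]  # band-1 terms add the
    colour = colour + SH_C1 * z * sh_coeffs[:, 2]  # view-dependent variation
    colour = colour - SH_C1 * x * sh_coeffs[:, 3]
    return torch.clamp(colour + 0.5, 0.0, 1.0)     # shift DC-centred values into [0, 1]
```

With the band-1 coefficients set to zero this reduces to a constant colour per Gaussian, which corresponds to the "no view dependence" renders on the left.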

## Q2.1 SDS Loss + Image Optimization

| Prompt | No Guidance | With Guidance |
| :----- | :---------: | :-----------: |
| a hamburger | hamburger_u | hamburger |
| a standing corgi dog | corgi_u | corgi |
| a unicorn | unicorn_u | unicorn |
| a flying fire breathing dragon | dragon_u | dragon |
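A minimal sketch of a single SDS update of the kind used to produce these images, assuming a Stable-Diffusion-style UNet interface; `unet`, `alphas_cumprod`, the timestep range, and the weighting are assumptions:

```python
import torch

def sds_grad(latents, text_emb, uncond_emb, unet, alphas_cumprod,
             guidance_scale=100.0):
    """Compute the SDS gradient for one optimization step."""
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1 - a_t).sqrt() * noise

    with torch.no_grad():
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        eps_text = unet(noisy, t, encoder_hidden_states=text_emb).sample

    # Classifier-free guidance; guidance_scale = 1 disables it and
    # corresponds to the "No Guidance" column above.
    eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

    # SDS skips the UNet Jacobian: the (detached) residual is applied
    # directly as a gradient on the latents.
    w = 1 - a_t  # a common weighting choice
    return (w * (eps - noise)).detach()
```

In practice the gradient is injected through a surrogate loss, e.g. `loss = (sds_grad(...) * latents).sum()` followed by `loss.backward()`.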

## Q2.2 Texture Map Optimization for Mesh

| Prompt | Generated Texture |
| :----- | :---------------: |
| a white dairy cow with pink nose and black spots | cow_tex |
| a fearsome orange and black striped tiger | tiger_tex |

## Q2.3 NeRF Optimization

| Prompt | RGB | Depth |
| :----- | :-: | :---: |
| a standing corgi dog | corgi_rgb | corgi |
| a white unicorn | unicorn_rgb | unicorn |
| a tiger | tiger_rgb | rathalos |

## Q2.4.1 View-Dependent Text Embedding

| Prompt | RGB | Depth |
| :----- | :-: | :---: |
| a standing corgi dog | corgi_rgb | corgi |
| a white unicorn | unicorn_rgb | unicorn |

It's clear that view-dependent text conditioning helps improve the generated volume. As seen in the generated radiance fields, the corgi now has only two ears instead of three, and the unicorn has one horn instead of two. This is because each rendered image is now scored against a more likely observation of the prompt from the corresponding view.
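A minimal sketch of the view-dependent prompt conditioning described above (the azimuth thresholds are illustrative assumptions):

```python
def view_dependent_prompt(base_prompt: str, azimuth_deg: float) -> str:
    """Append a view suffix chosen from the camera azimuth.

    Thresholds are illustrative; azimuth 0 is taken to face the object.
    """
    azimuth = azimuth_deg % 360.0
    if azimuth < 45.0 or azimuth >= 315.0:
        suffix = "front view"
    elif 135.0 <= azimuth < 225.0:
        suffix = "back view"
    else:
        suffix = "side view"
    return f"{base_prompt}, {suffix}"

# e.g. view_dependent_prompt("a standing corgi dog", 180.0)
# -> "a standing corgi dog, back view"
```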
