16-825 Learning for 3D Vision
Assignment 4
Rodrigo Lopes Catto | rlopesca

Number of iterations: 250
Learning rate parameters:
- opacities: lr = 0.005
- scales: lr = 0.01
- colours: lr = 0.02
- means: lr = 0.001
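
For reference, below is a minimal sketch of how these per-parameter learning rates could be wired into a single Adam optimizer via PyTorch parameter groups. The tensor names and sizes are illustrative assumptions, not the actual assignment code.

```python
import torch

# Hypothetical Gaussian parameters, for illustration only.
N = 1000
means     = torch.randn(N, 3, requires_grad=True)  # Gaussian centres
scales    = torch.randn(N, 3, requires_grad=True)  # per-axis log-scales
colours   = torch.randn(N, 3, requires_grad=True)  # base RGB colours
opacities = torch.randn(N, 1, requires_grad=True)  # pre-sigmoid opacities

# One optimizer, one learning rate per parameter group (values from above).
optimizer = torch.optim.Adam([
    {"params": [means],     "lr": 0.001},
    {"params": [scales],    "lr": 0.01},
    {"params": [colours],   "lr": 0.02},
    {"params": [opacities], "lr": 0.005},
])
```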
The values for PSNR & SSIM are as follows:

- Mean PSNR: 28.516
- Mean SSIM: 0.924

Training progress

Final rendered GIF


| Frame | Without Spherical Harmonics | With Spherical Harmonics | Differences |
|---|---|---|---|
| Comparison 1 | ![]() | ![]() | Lighting appears more realistic with spherical harmonics, showing softer shadows and richer texture detail on the seat. |
| Comparison 2 | ![]() | ![]() | Spherical harmonics add shading variation and highlight depth, making materials look less flat and more natural. |
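
For context, here is a minimal sketch of how degree-1 spherical harmonics produce view-dependent colour, in the style of the original 3D Gaussian Splatting renderer. The coefficient layout, sign conventions, and the +0.5 offset vary between implementations, so treat this as an assumption-laden illustration rather than the code used here.

```python
import torch

C0 = 0.28209479177387814  # Y_0^0 constant
C1 = 0.4886025119029199   # |Y_1^m| constant, up to sign

def sh_to_rgb(sh_coeffs, view_dirs):
    """sh_coeffs: (N, 4, 3) degree-0 and degree-1 coefficients per Gaussian.
    view_dirs: (N, 3) unit vectors from the camera to each Gaussian."""
    x, y, z = view_dirs.unbind(-1)
    rgb = C0 * sh_coeffs[:, 0]                    # view-independent base colour
    rgb = rgb - C1 * y[:, None] * sh_coeffs[:, 1]  # degree-1 terms add the
    rgb = rgb + C1 * z[:, None] * sh_coeffs[:, 2]  # view-dependent variation
    rgb = rgb - C1 * x[:, None] * sh_coeffs[:, 3]
    return torch.clamp(rgb + 0.5, 0.0, 1.0)       # offset used in some splatting code
```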
All the models below were trained for 2000 iterations.
| Without Guidance | With Guidance |
|---|---|
| ![]() | ![]() |

| Without Guidance | With Guidance |
|---|---|
| ![]() | ![]() |

| Without Guidance | With Guidance |
|---|---|
| ![]() | ![]() |

| Without Guidance | With Guidance |
|---|---|
| ![]() | ![]() |
Note: the saved GIFs do not loop continuously; please refresh the webpage to restart them.
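
As background for the with/without-guidance comparisons above, here is a minimal sketch of one Score Distillation Sampling (SDS) step with classifier-free guidance. `unet`, `alphas_cumprod`, `text_emb`, and `uncond_emb` stand in for pieces of a pretrained diffusion model and are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, unet, alphas_cumprod, text_emb, uncond_emb,
             guidance_scale=100.0, t_range=(20, 980)):
    # Sample a noise level and form the noisy latents (DDPM forward process).
    t = torch.randint(*t_range, (latents.shape[0],), device=latents.device)
    noise = torch.randn_like(latents)
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise

    # Query the frozen diffusion model; no gradients flow through the UNet.
    with torch.no_grad():
        eps_text = unet(noisy, t, text_emb)
        eps_uncond = unet(noisy, t, uncond_emb)

    # Classifier-free guidance: push the score toward the text condition.
    eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

    w = 1 - alpha_bar               # a common SDS weighting choice
    grad = w * (eps - noise)

    # Equivalent to injecting `grad` directly as the gradient of `latents`.
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```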
Prompt: 'Cow with tiger skin'

Prompt: 'Black and white cow'

Parameters:
- lambda_entropy: 0.0001
- lambda_orient: 0.01
- latent_iter_ratio: 0.1
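
Below is a minimal sketch of how these weights might enter the total objective, loosely following stable-dreamfusion: `lambda_entropy` scales an opacity entropy regularizer, `lambda_orient` scales an orientation penalty on back-facing normals, and `latent_iter_ratio` (not shown) controls the fraction of early iterations trained in latent space. The exact regularizer forms here are assumptions.

```python
import torch

def total_loss(loss_sds, alphas, normals, view_dirs,
               lambda_entropy=0.0001, lambda_orient=0.01):
    # Opacity entropy: push per-ray accumulated alpha toward 0 or 1.
    a = alphas.clamp(1e-5, 1 - 1e-5)
    loss_entropy = -(a * a.log() + (1 - a) * (1 - a).log()).mean()

    # Orientation: penalise normals facing away from the camera, where
    # view_dirs point from the camera toward each sample.
    n_dot_v = (normals * view_dirs).sum(-1)
    loss_orient = (n_dot_v.clamp(min=0) ** 2).mean()

    return loss_sds + lambda_entropy * loss_entropy + lambda_orient * loss_orient
```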
| RGB | Depth |
|---|---|
| ![]() | ![]() |

| RGB | Depth |
|---|---|
| ![]() | ![]() |

| RGB | Depth |
|---|---|
| ![]() | ![]() |
Parameters:
- lambda_entropy: 0.0001
- lambda_orient: 0.01
- latent_iter_ratio: 0.1

| RGB | Depth |
|---|---|
| ![]() | ![]() |
Comparing this result with the corgi generated without view dependence, the main difference is the dog's pose: it is standing in the previous result and sitting here. The view-dependent model also achieves comparable or slightly better structural detail, with the nose now visible, while using only half as many iterations as the non-view-dependent model.
| RGB | Depth |
|---|---|
| ![]() | ![]() |
The dinosaur rendered with view dependence shows a more coherent structure and a better-defined silhouette, particularly around the head and tail. Despite being trained for only 2000 iterations, it already achieves a recognizable shape and some surface texture, whereas the non-view-dependent version still appears flatter and less geometrically consistent after 6000 iterations.
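
For context on the comparisons above, here is a minimal sketch of how view dependence is typically added to a NeRF-style network: density depends only on position features, while colour additionally consumes the encoded viewing direction. The class name, layer widths, and direction-encoding size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    def __init__(self, feat_dim=64, dir_dim=27):  # dir_dim: encoded direction size
        super().__init__()
        # Density is view-independent: geometry should not change with viewpoint.
        self.density = nn.Linear(feat_dim, 1)
        # Colour sees both the position features and the encoded view direction.
        self.colour = nn.Sequential(
            nn.Linear(feat_dim + dir_dim, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid(),
        )

    def forward(self, feat, dir_enc):
        sigma = torch.relu(self.density(feat))
        rgb = self.colour(torch.cat([feat, dir_enc], dim=-1))
        return sigma, rgb
```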