Here is my rendered GIF:

The learning rates I used:
I trained the model for 1000 iterations. The final PSNR was 28.238 and the final mean SSIM was 0.936.
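For reference, PSNR is derived from the mean squared error between the rendered and ground-truth images. The sketch below is a minimal pure-Python version, assuming pixel values are scaled to [0, 1]. The function name `psnr` and the flat-list inputs are illustrative choices, not the evaluation code that produced the numbers above.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length flat pixel lists.

    Assumes pixel values lie in [0, max_val] (an assumption for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Under that assumption, a PSNR of 28.238 dB corresponds to a mean squared error of roughly 0.0015.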


Here is the rendered GIF from question 1.3.1:

Below is the rendered GIF I obtained using spherical harmonics:

Side-by-side comparisons (top image is without SH, bottom is with SH):
VIEW 0:

From this view, the two renderings differ noticeably. With spherical harmonics, the shadows on the seat and back cushion of the chair look more realistic. The metal decorations on the arms and back of the chair also show more realistic shadows and reflections.
VIEW 2:

From this view, the shadows again look more realistic with spherical harmonics than without. The shadow on the chair in the top image differs from the one in View 0 because the viewing angle changed, whereas the shadow in the bottom image is the same as in View 0.
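The differences between the two renderings come from how color is evaluated: with spherical harmonics, each Gaussian's color depends on the viewing direction instead of being a single fixed value. The sketch below evaluates one color channel up to degree 1. The sign convention and the +0.5 offset are assumed here, following a common convention in Gaussian splatting code, and the function name `sh_to_color` is hypothetical; the renderer used for the results above may differ.

```python
import math

SH_C0 = 0.28209479177387814  # degree-0 real SH basis constant
SH_C1 = 0.4886025119029199   # degree-1 real SH basis constant


def sh_to_color(sh, view_dir):
    """Evaluate one color channel from 4 SH coefficients (degrees 0 and 1).

    sh: [dc, c1, c2, c3] coefficients for a single channel.
    view_dir: (x, y, z) direction from the camera to the Gaussian.
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # Degree-0 term is view-independent; degree-1 terms vary with direction.
    value = SH_C0 * sh[0] - SH_C1 * y * sh[1] + SH_C1 * z * sh[2] - SH_C1 * x * sh[3]
    # Shift (assumed convention) and clamp to a valid color range.
    return min(max(value + 0.5, 0.0), 1.0)
```

With only the degree-0 coefficient set, the color is the same from every direction, which matches the rendering without SH. Nonzero degree-1 coefficients make the color change as the camera moves.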
Prompt: "a hamburger"
Without guidance (400 iterations):

With guidance (700 iterations):

Prompt: "a standing corgi dog"
Without guidance (1500 iterations):

With guidance (700 iterations):

Prompt: "a koala in a tree"
Without guidance (1200 iterations):

With guidance (1300 iterations):

Prompt: "a penguin in a top hat"
Without guidance (1500 iterations):

With guidance (1300 iterations):

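Assuming "guidance" above refers to classifier-free guidance, the difference between the two settings can be sketched as follows: the text-conditioned noise prediction is pushed away from the unconditional one by a scale factor. The function name `apply_guidance` and the flat-list inputs are hypothetical, and "without guidance" is taken here to mean using the conditional prediction alone.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    guidance_scale = 1 returns the conditional prediction unchanged;
    larger values extrapolate further toward the text condition.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]
```

For example, `apply_guidance([0.0, 1.0], [1.0, 3.0], 2.0)` extrapolates past the conditional prediction and returns `[2.0, 5.0]`.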
Prompt: "a standing corgi dog"
View-dependent text conditioning removed the corgi's third ear, giving a more realistic result than the one in 2.3.
Prompt: "a pumpkin"
I was hoping for a jack-o'-lantern with only one face (unlike the one from 2.3), but this time I got a plain pumpkin.
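View-dependent text conditioning, as used for the results above, typically works by appending a view description to the prompt based on the sampled camera pose, so the diffusion model is not asked for a front-facing object from every angle. The sketch below is one possible version; the function name `view_dependent_prompt`, the angle thresholds, and the suffix wording are all assumptions, not the values used in my runs.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append a view description to the prompt based on the camera pose.

    Thresholds are illustrative: azimuth 0 is assumed to face the front
    of the object, and elevation above 60 degrees counts as overhead.
    """
    if elevation_deg > 60.0:
        return f"{prompt}, overhead view"
    azimuth = azimuth_deg % 360.0  # wrap negative or large angles
    if azimuth < 45.0 or azimuth >= 315.0:
        return f"{prompt}, front view"
    if 135.0 <= azimuth < 225.0:
        return f"{prompt}, back view"
    return f"{prompt}, side view"
```

With a scheme like this, a camera behind the corgi would be conditioned on "a standing corgi dog, back view", which discourages extra faces or ears from appearing on the back of the object.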