Late days used: 1

vis_grid output for xy_grid:

vis_rays output for ray_bundle:

render_points output for point_samples:

Depth output:

Colour output:

Please see the code for the get_random_pixels_from_image method in ray_utils.py for this part (no output figure/visualization was asked for in this part).
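As a rough illustration, here is a minimal sketch of the sampling logic. The signature and tensor conventions below are assumptions for illustration, not the exact implementation in ray_utils.py:

import torch

def get_random_pixels_from_image(n_pixels, image_size, image):
    # Hypothetical sketch; shapes and conventions are assumptions,
    # not the exact implementation in ray_utils.py.
    H, W = image_size

    # Sample n_pixels flat pixel indices uniformly, without replacement.
    idx = torch.randperm(H * W)[:n_pixels]

    # Convert flat indices to (x, y) pixel coordinates.
    ys = idx // W
    xs = idx % W

    # Normalize to [-1, 1] coordinates (a common NDC-style convention).
    xy_grid = torch.stack(
        [xs / (W - 1) * 2 - 1, ys / (H - 1) * 2 - 1], dim=-1
    )

    # Gather the corresponding ground-truth colours for supervision.
    colors = image.reshape(H * W, -1)[idx]
    return xy_grid, colors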
Centre of box: (0.25, 0.25, -0.00)
Side lengths of box: (2.00, 1.50, 1.50)
(values rounded to two decimal places)
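A minimal sketch of how such values can be read out from the optimized box and rounded (the box object and its attribute names are assumptions, not the actual code):

# Hypothetical readout; attribute names are assumptions.
center = box.center.detach().cpu().numpy()
side_lengths = box.side_lengths.detach().cpu().numpy()
print("Centre of box:", tuple(center.round(2)))
print("Side lengths of box:", tuple(side_lengths.round(2)))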
Generated gif:

Original gif provided in the assignment:

Generated gif:

Original gif provided in the assignment:

For the above, I have used the same network architecture as in the NeRF paper, with one change: a 3-layer initial network instead of the 5-layer initial network used in part 3 of the question. Please refer to the code for full details of the architecture.

Using a 5-layer initial network, I obtain the following results:

There doesn't seem to be a significant difference between the view-dependent and view-independent predictions with my architecture. There are some subtle differences: the view-dependent predictions seem to capture specularity better, while the view-independent predictions appear slightly sharper. However, I wouldn't call these observations conclusive; a more detailed analysis with more sample points and other objects is needed before commenting definitively. (The authors of the paper compare the view-dependent and view-independent variants using quantitative metrics, which seems like a good direction.)
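For concreteness, here is a minimal sketch of the kind of MLP described above. It is a simplification, not my exact implementation (see the code for that): the layer widths and wiring are assumptions, and the inputs x_embed / d_embed are assumed to be positionally-encoded positions and view directions. The n_initial_layers argument controls the depth of the initial network (3 here vs. 5 in part 3), and use_view_dirs toggles view dependence of the colour head.

import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    # Simplified sketch of a NeRF-style MLP; not the exact architecture
    # used in my submission.
    def __init__(self, pos_dim, dir_dim, hidden=128,
                 n_initial_layers=3, use_view_dirs=True):
        super().__init__()
        layers, d = [], pos_dim
        for _ in range(n_initial_layers):  # the "initial network"
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        self.trunk = nn.Sequential(*layers)
        self.density = nn.Sequential(nn.Linear(hidden, 1), nn.ReLU())
        self.use_view_dirs = use_view_dirs
        color_in = hidden + (dir_dim if use_view_dirs else 0)
        self.color = nn.Sequential(
            nn.Linear(color_in, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x_embed, d_embed=None):
        feat = self.trunk(x_embed)
        sigma = self.density(feat)
        if self.use_view_dirs:
            # Condition colour (but not density) on the view direction,
            # as in the NeRF paper.
            feat = torch.cat([feat, d_embed], dim=-1)
        rgb = self.color(feat)
        return sigma, rgb

Conditioning only the colour head on the view direction keeps the predicted geometry view-consistent while still allowing specular effects, which is consistent with the subtle specularity difference noted above.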
Running the NeRF model with the default nerf_lego_highres.yaml configuration and the same architecture as in part 3 yields the following visualization:

For hyperparameter tuning, I varied the n_pts_per_ray config parameter from the default of 128 to 64, 256, and 512. With 512 points per ray, the model uses a lot of GPU memory and trains very slowly, so I could train it for only 25 epochs. It yields the following under-trained result:

For 64 n_pts_per_ray, I get the following result:

For 256 n_pts_per_ray, I get the following result:

Note that the visualization for 256 points per ray is considerably sharper than the one for 64 points per ray, and marginally sharper than the one for 128 points per ray. A sketch of how n_pts_per_ray enters the ray sampling is given below.
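The sampler places n_pts_per_ray depth values along each ray, so more points give a finer quadrature of the volume-rendering integral at the cost of memory and compute, which matches the sharpness/GPU-memory trade-off observed above. The stratified variant below is one common choice; the names and conventions are assumptions, not the assignment's exact sampler:

import torch

def sample_points_along_rays(origins, directions, near, far,
                             n_pts_per_ray, stratified=True):
    # origins, directions: (n_rays, 3). Returns points of shape
    # (n_rays, n_pts_per_ray, 3) and their depths t.
    # Hypothetical sketch; the assignment's sampler may differ.
    t = torch.linspace(near, far, n_pts_per_ray)      # (n_pts,)
    t = t.expand(origins.shape[0], n_pts_per_ray)     # (n_rays, n_pts)
    if stratified:
        # Jitter each sample within its bin to reduce banding artifacts.
        bin_width = (far - near) / (n_pts_per_ray - 1)
        t = t + (torch.rand_like(t) - 0.5) * bin_width
    points = origins[:, None, :] + t[..., None] * directions[:, None, :]
    return points, t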
I also tried changing the number of layers in the initial part of the network: specifically, from 5 to 3 and to 7 (a small usage sketch is given after these results).
For 3 layers, I obtain the following result:

For 7 layers, I obtain the following result:

Here too, the results are sharper for the 7-layer network than for the 3-layer network, which suggests that the 7-layer network has more expressive power.
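As a usage sketch, this experiment amounts to varying a depth argument, shown here with the hypothetical NeRFMLP sketched earlier (the embedding dimensions 63 and 27 correspond to the NeRF paper's positional encodings with L=10 for positions and L=4 for directions):

# Hypothetical usage of the NeRFMLP sketch from earlier.
shallow = NeRFMLP(pos_dim=63, dir_dim=27, n_initial_layers=3)
default = NeRFMLP(pos_dim=63, dir_dim=27, n_initial_layers=5)
deep    = NeRFMLP(pos_dim=63, dir_dim=27, n_initial_layers=7)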