Assignment 4

Question 1.1.5: Perform Splatting

Question 1.2.2: Perform Forward Pass and Compute Loss

    parameters = [
        {'params': [gaussians.pre_act_opacities], 'lr': 0.05, "name": "opacities"},
        {'params': [gaussians.pre_act_scales], 'lr': 0.002, "name": "scales"},
        {'params': [gaussians.colours], 'lr': 0.02, "name": "colours"},
        {'params': [gaussians.means], 'lr': 0.00005, "name": "means"},
        {'params': [gaussians.pre_act_quats], 'lr': 0.02, "name": "quats"},
    ]

The learning rates above gave the best results. The model was trained for 2000 iterations.

PSNR: 29.336
SSIM: 0.932
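For reference, PSNR can be computed from the mean squared error between a rendered image and its ground truth. The sketch below is a minimal pure-Python version, assuming pixel values lie in [0, 1] and that images are passed as flat sequences; the function name `psnr` and the `max_val` parameter are illustrative and not taken from the assignment code.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flat sequences of pixel values.

    Assumes both inputs hold values in [0, max_val] and have the same length.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_from_psnr(psnr_db, max_val=1.0):
    """Invert the PSNR formula to recover the mean squared error."""
    return max_val ** 2 / (10.0 ** (psnr_db / 10.0))
```

Under the [0, 1] assumption, `mse_from_psnr(29.336)` gives a mean squared error of about 1.2e-3, which is a per-pixel RMS error of roughly 0.034.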

Training Progress

Final Renders

Question 1.3.1: Rendering Using Spherical Harmonics

View dependent

No view dependence

Frame 1 - No view dependence

Frame 1 - View dependent

Frame 2 - No view dependence

Frame 2 - View dependent

The main differences are that, with spherical harmonics, the green cushion looks visibly shiny and its texture is easier to make out. The shininess is naturally explained by the view dependence that the spherical harmonics enable. The clearer texture is probably because texture is more easily discernible with albedo and specularity together than with albedo alone.
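To illustrate where the view dependence comes from, here is a minimal sketch that evaluates one colour channel from spherical harmonic coefficients of degree 0 and 1. The basis constants are the standard real spherical harmonic values; the sign convention and the 0.5 offset follow the commonly used 3D Gaussian splatting reference code, which is an assumption about this assignment's renderer. The function name and the `use_view_dependence` flag are hypothetical.

```python
import math

C0 = 0.28209479177387814  # degree-0 real SH basis constant, 1 / (2 * sqrt(pi))
C1 = 0.4886025119029199   # degree-1 real SH basis constant, sqrt(3 / (4 * pi))


def sh_to_colour(sh, view_dir, use_view_dependence=True):
    """Evaluate one colour channel from 4 SH coefficients (degrees 0 and 1).

    sh:       [dc, c1, c2, c3] coefficients for a single channel.
    view_dir: (x, y, z) direction from the camera to the Gaussian; normalised here.
    """
    norm = math.sqrt(sum(v * v for v in view_dir))
    x, y, z = (v / norm for v in view_dir)
    # Degree 0: a constant term, identical from every viewpoint.
    value = C0 * sh[0]
    if use_view_dependence:
        # Degree 1: terms linear in the view direction (assumed sign convention).
        value += -C1 * y * sh[1] + C1 * z * sh[2] - C1 * x * sh[3]
    # Shift by 0.5 and clamp to the valid colour range.
    return min(max(value + 0.5, 0.0), 1.0)


coeffs = [0.5, 0.0, 0.4, 0.0]
front = sh_to_colour(coeffs, (0.0, 0.0, 1.0))
back = sh_to_colour(coeffs, (0.0, 0.0, -1.0))
flat = sh_to_colour(coeffs, (0.0, 0.0, 1.0), use_view_dependence=False)
```

With these example coefficients, `front` is brighter than `flat`, and `back` is darker, while the view-independent value stays the same for every direction. A highlight that brightens and fades as the camera moves is what reads as shininess.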

Question 2.1: SDS Loss + Image Optimization (20 points)

a hamburger

With guidance

Without guidance

a standing corgi dog

With guidance

Without guidance

a sleepy kitten

With guidance

Without guidance

a standing penguin

With guidance

Without guidance

All of the above results were obtained by training for 2000 iterations.
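As a rough illustration of the two settings compared above, the sketch below assumes that "with guidance" means classifier-free guidance, where the unconditional and text-conditioned noise predictions are combined before forming the SDS gradient. The function names, the flat-list representation, and the example scale of 100 are assumptions for illustration, not the assignment's actual code.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    towards the text-conditioned one, scaled by guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sds_gradient(eps_pred, eps_true, weight=1.0):
    """SDS gradient with respect to the latents: w(t) * (eps_pred - eps_true).

    eps_true is the noise that was actually added to the latents.
    """
    return [weight * (p - n) for p, n in zip(eps_pred, eps_true)]


eps_uncond = [0.0, 1.0]   # toy unconditional noise prediction
eps_cond = [1.0, 1.0]     # toy text-conditioned noise prediction
eps_true = [0.5, 0.5]     # toy sampled noise

# Without guidance (assumed): use the conditional prediction directly.
grad_plain = sds_gradient(eps_cond, eps_true)

# With guidance (assumed): amplify the text-conditioned direction.
grad_guided = sds_gradient(guided_noise(eps_uncond, eps_cond, 100.0), eps_true)
```

A scale of 1 reproduces the conditional prediction and a scale of 0 reproduces the unconditional one. Larger scales amplify the component of the gradient that the text prompt is responsible for.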

Question 2.2: Texture Map Optimization for Mesh

a dotted black and white cow

an orange golden bull

Question 2.3: NeRF Optimization

a hamburger

RGB

Depth

a standing corgi dog

RGB

Depth

a sleepy orange cat

RGB

Depth

Question 2.4.1: View-dependent text embedding

a standing corgi dog

RGB

Depth

a sleepy orange cat

RGB

Depth

In the previous section, the same prompts without view-dependent conditioning produced a corgi and a cat with three ears each. This happens because, without view conditioning, the optimization implicitly tries to make every view look like a typical front-facing corgi or cat. View-dependent conditioning resolves this issue.
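One common way to implement view-dependent conditioning is to append a view phrase to the prompt based on the sampled camera pose, and then encode the resulting text. The sketch below shows only the prompt selection step. The angle thresholds, the phrase wording, and the function name are hypothetical and not taken from the assignment code.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg=0.0,
                          overhead_threshold=60.0, half_width=45.0):
    """Append a view phrase to the prompt based on the camera pose.

    azimuth_deg:   0 is assumed to face the front of the object.
    elevation_deg: angle above the horizontal plane.
    """
    if elevation_deg > overhead_threshold:
        return f"{prompt}, overhead view"
    az = azimuth_deg % 360.0  # wrap into [0, 360)
    if az <= half_width or az >= 360.0 - half_width:
        view = "front view"
    elif abs(az - 180.0) <= half_width:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"


prompts = [view_dependent_prompt("a standing corgi dog", az)
           for az in (0.0, 90.0, 180.0, 270.0)]
```

Because the back and side views are now conditioned on text that says so, the diffusion prior no longer pushes a face onto every side of the object.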
