16-825: Learning for 3D Vision - Assignment 4

Vaibhav Parekh | Fall 2025

1. 3D Gaussian Splatting

1.1.2 Evaluate 2D Gaussians

Fig. 1.1.2. Unit test results

1.1.5 Perform Splatting

Fig. 1.1.5. Render

1.2.2 Perform Forward Pass and Compute Loss

Learning rates:
Opacities: 0.005
Scales: 0.001
Colors: 0.002
Means: 0.001

No. of iterations: 1000

PSNR: 29.628
SSIM: 0.934
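
For reference, PSNR is conventionally derived from the mean squared error between the render and the ground-truth image. The snippet below is a minimal sketch of that definition, assuming pixel values in [0, 1] stored as flat sequences; the names `psnr` and `max_val` are illustrative and are not taken from the assignment code.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized pixel sequences.

    Assumption: pixels are floats in [0, max_val], flattened into 1D sequences.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Usage: a uniform per-pixel error of 0.1 on a [0, 1] image.
value_db = psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1])
```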

Fig. 1.2.2(a). Training progress
Fig. 1.2.2(b). Final render

1.3.2 Training On a Harder Scene

Baseline

Learning rates:
Opacities: 0.005
Scales: 0.001
Colors: 0.002
Means: 0.001
Quats: 0.002

No. of iterations: 1000
PSNR: 21.360
SSIM: 0.648

Fig. 1.3.2(1-a). Training progress - Baseline
Fig. 1.3.2(1-b). Final render - Baseline

Improved

Learning rates:
Opacities: 0.005
Scales: 0.001
Colors: 0.002
Means: 0.001
Quats: 0.002

No. of iterations: 1000
PSNR: 24.673
SSIM: 0.693

Fig. 1.3.2(2-a). Training progress - Improved
Fig. 1.3.2(2-b). Final render - Improved

Explanation of improvements: To improve on the baseline, I used anisotropic Gaussians and kept all hyperparameters unchanged. Training took slightly longer than the baseline, but the renders were cleaner: PSNR rose from 21.360 to 24.673 and SSIM from 0.648 to 0.693.
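
To illustrate what "anisotropic" means here: a common way to parameterize an anisotropic Gaussian is to build its 3D covariance from a rotation (a quaternion) and three per-axis scales, Sigma = R diag(s)^2 R^T. The sketch below shows that construction in plain Python. It assumes a real-first (w, x, y, z) quaternion convention, and the function names are hypothetical; it is not the assignment's implementation.

```python
def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    Assumption: real-first ordering; the quaternion is normalized here.
    """
    w, x, y, z = q
    n = (w * w + x * x + y * y + z * z) ** 0.5
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_3d(quat, scales):
    """Build Sigma = R diag(scales)^2 R^T for one Gaussian."""
    R = quat_to_rotmat(quat)
    var = [s * s for s in scales]  # squared per-axis scales
    return [
        [sum(R[i][k] * var[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


# Usage: an elongated Gaussian, stretched along its local z axis.
sigma = covariance_3d((1.0, 0.0, 0.0, 0.0), (0.1, 0.1, 0.5))
```

With three equal scales the rotation cancels and the covariance reduces to an isotropic one, which is why the quaternions only matter in the anisotropic case.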

2.1 SDS Loss + Image Optimization

"a hamburger"

Fig. 2.1(1-a). Without guidance (400 iterations)
Fig. 2.1(1-b). With guidance (1100 iterations)

"a standing corgi dog"

Fig. 2.1(2-a). Without guidance (1500 iterations)
Fig. 2.1(2-b). With guidance (700 iterations)

"a bear combing his hair"

Fig. 2.1(3-a). Without guidance (1500 iterations)
Fig. 2.1(3-b). With guidance (1600 iterations)

"a goat rowing a boat"

Fig. 2.1(4-a). Without guidance (1000 iterations)
Fig. 2.1(4-b). With guidance (3500 iterations)

Prompt inspiration :)
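
Assuming "guidance" in the captions above refers to classifier-free guidance, the guided noise estimate is usually formed by extrapolating from the unconditional prediction toward the text-conditioned one. The sketch below shows only that combination step on flat lists of floats; `classifier_free_guidance` and its argument names are illustrative, not the assignment's code.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions.

    guidance_scale = 1 returns the conditional prediction unchanged;
    larger values push the estimate further along (eps_cond - eps_uncond).
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Usage with toy two-element "noise predictions".
guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
```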

2.2 Texture Map Optimization for Mesh

"a camouflage cow"

Fig. 2.2(a). Final textured mesh (--prompt "a camouflage cow")

"a cow with stars all over the body"

Fig. 2.2(b). Final textured mesh (--prompt "a cow with stars all over the body")

2.3 NeRF Optimization

"a standing corgi dog"

Fig. 2.3(1-a). Rendered RGB
Fig. 2.3(1-b). Depth

"a racecar"

Fig. 2.3(2-a). Rendered RGB
Fig. 2.3(2-b). Depth

"castle on hill"

Fig. 2.3(3-a). Rendered RGB
Fig. 2.3(3-b). Depth

2.4.3 Variation of implementation of SDS loss

"a racecar"

Fig. 2.4(a). Rendered RGB
Fig. 2.4(b). Depth

Implementation:
In this implementation, I first encode the image into latents, compute the latent targets, and then decode those targets back into an image. The loss is the mean squared error between the input image and the decoded target image.
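
The data flow described above can be sketched without any framework. This is a rough illustration under stated assumptions, not the actual implementation: `encode`, `decode`, and `predict_noise` are hypothetical stand-ins for the VAE encoder, VAE decoder, and diffusion model, and the latent target is assumed to be `latents - w * (eps_hat - eps)`, the usual SDS reparameterization. In a real pipeline these would be tensors, with the target treated as a constant during backpropagation.

```python
def pixel_space_sds_loss(image, noise, encode, decode, predict_noise,
                         alpha_bar=0.5, guidance_weight=1.0):
    """Hypothetical sketch: MSE between an image and a decoded latent target."""
    # 1. Encode the input image into latents.
    latents = encode(image)
    # 2. Noise the latents: x_t = sqrt(alpha_bar) * z + sqrt(1 - alpha_bar) * eps.
    sqrt_a = alpha_bar ** 0.5
    sqrt_one_minus_a = (1.0 - alpha_bar) ** 0.5
    noisy = [sqrt_a * z + sqrt_one_minus_a * e for z, e in zip(latents, noise)]
    # 3. Ask the stand-in diffusion model for its noise estimate.
    eps_hat = predict_noise(noisy)
    # 4. Latent target (assumed form): step against the SDS direction.
    target_latents = [z - guidance_weight * (eh - e)
                      for z, eh, e in zip(latents, eps_hat, noise)]
    # 5. Decode the target back to image space.
    target_image = decode(target_latents)
    # 6. Mean squared error in image space.
    loss = sum((p - t) ** 2 for p, t in zip(image, target_image)) / len(image)
    return loss, target_image


# Usage with identity stand-ins and a "model" that predicts the true noise.
identity = lambda values: list(values)
true_noise = [0.1, -0.2]
loss, target = pixel_space_sds_loss(
    [0.5, 0.25], true_noise, identity, identity, lambda noisy: true_noise
)
```

Compared with a loss computed directly on the latents, this variant adds a decode pass per step, which is one plausible source of extra memory and time.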

Analysis:
Training is more expensive in both memory and time, and it yields only marginal improvements in the results.