16-825 Assignment 4: 3D Gaussian Splatting and Diffusion Guided Optimization

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

Rendered output from the 3D Gaussian rasterizer:

Final render from 3D Gaussian rasterization

1.2 Training 3D Gaussian Representations (15 points)

The learning rates used for training the 3D Gaussian representation were:

The model was trained for 200 iterations, achieving the following metrics:

Training final renders
Training progress

1.3.1 Rendering Using Spherical Harmonics (10 points)

Rendered output without Spherical Harmonics (same as Section 1.1):

Without Spherical Harmonics

Rendered output using Spherical Harmonics:

With Spherical Harmonics

Side-by-side Comparison:

2. Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

2.2 Texture Map Optimization for Mesh (15 points)

2.3 NeRF Optimization (15 points)

2.4.1 View-dependent Text Embedding (10 points)

View-dependent text embedding significantly improves the results. In the Corgi example, without it the same prompt guides every camera pose, so the dog appears to have three ears and its face remains visible from nearly every angle. With view-dependent embeddings, the geometry becomes more realistic and stays consistent across views.

A similar improvement is observed in the hotdog example. Without view dependence, the result lacks asymmetry, which is a natural characteristic of a hotdog. Adding view dependence introduces the expected asymmetry and gives a more realistic result.
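The idea behind view-dependent conditioning can be sketched as follows. This is a minimal illustration, not the code used for these results: the function names, the angle thresholds, and the example prompt are all hypothetical. It assumes azimuth 0 faces the front of the object and that a coarse view label is appended to the prompt before it is turned into a text embedding.

```python
def view_label(azimuth_deg: float, elevation_deg: float = 0.0,
               front_half_width: float = 45.0,
               overhead_threshold: float = 60.0) -> str:
    """Map a camera pose to a coarse view label (thresholds are assumptions)."""
    if elevation_deg > overhead_threshold:
        return "overhead"
    # Wrap azimuth into [-180, 180), where 0 faces the front of the object.
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    if abs(az) <= front_half_width:
        return "front"
    if abs(az) >= 180.0 - front_half_width:
        return "back"
    return "side"


def view_dependent_prompt(base_prompt: str, azimuth_deg: float,
                          elevation_deg: float = 0.0) -> str:
    """Append the view label so each camera pose gets its own text prompt."""
    return f"{base_prompt}, {view_label(azimuth_deg, elevation_deg)} view"


if __name__ == "__main__":
    # Hypothetical prompt, sampled at four azimuths around the object.
    for az in (0, 90, 180, 270):
        print(az, "->", view_dependent_prompt("a corgi", az))
```

With a single prompt, every rendered view is pushed toward the same appearance, most often the front. Switching the prompt per pose gives the back and side views their own guidance, which is consistent with the face no longer appearing from every angle.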

2.4.3 Variation of SDS Loss Implementation (10 points)

For this implementation, the code from Section 2.3 was extended with a new loss function named sds_loss_pixel. This function computes the loss directly between the predicted and target images rather than between latents, which makes it possible to add an LPIPS loss alongside the MSE loss. The intent is to penalize both pixel-wise error (MSE) and perceptual dissimilarity (LPIPS).
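One plausible reading of this pixel-space variant, written out as equations. The notation is an assumption and is not taken from the implementation: $z$ is the latent of the rendered view, $z_t$ its noised version at timestep $t$, $\hat{\epsilon}_\phi$ the diffusion model's noise prediction for text embedding $y$, $\bar{\alpha}_t$ the cumulative noise schedule, $\mathcal{D}$ the latent decoder, $\operatorname{sg}[\cdot]$ a stop-gradient, and $\lambda$ a hypothetical LPIPS weight. The predicted image is written here as a decoded latent, though it could equally be the rendered image itself.

```latex
% One-step estimate of the clean latent from the noise prediction
\hat{z}_0 = \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\,\hat{\epsilon}_\phi(z_t; y, t)}{\sqrt{\bar{\alpha}_t}}

% Decode to image space; the target is treated as a constant
x_{\text{pred}} = \mathcal{D}(z), \qquad
x_{\text{target}} = \operatorname{sg}\!\left[\mathcal{D}(\hat{z}_0)\right]

% Pixel-space loss: MSE plus a weighted LPIPS term
\mathcal{L}_{\text{pixel}} = \lVert x_{\text{pred}} - x_{\text{target}} \rVert_2^2
  + \lambda\,\mathrm{LPIPS}\!\left(x_{\text{pred}}, x_{\text{target}}\right)
```

Under this reading, the target image is produced by $\mathcal{D}$, and so is the prediction when it is a decoded latent, so any reconstruction error in the decoder enters the loss directly.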

The results are worse than those of Section 2.3, especially for depth estimation. This could be due to the decoder not properly reconstructing the image, leading to noisy gradients and suboptimal loss values.