Assignment 4: 3D Gaussian Splatting and Diffusion-guided Optimization

Part 1: 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

Rendering Results:

Rendered scene using pre-trained 3D Gaussians

1.2 Training 3D Gaussian Representations (15 points)

Training Configuration:

Learning Rates:

Means: 0.00022
Colors: 0.03
Opacities: 0.00065
Scales: 0.005

Number of Iterations: 1000
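
The per-parameter learning rates above map naturally onto optimizer parameter groups. The sketch below builds such groups in plain Python. The helper name `build_param_groups` and the dictionary keys are hypothetical, and the report does not state which optimizer was used for this part.

```python
# Learning rates and iteration count copied from the configuration above.
LEARNING_RATES = {
    "means": 0.00022,
    "colors": 0.03,
    "opacities": 0.00065,
    "scales": 0.005,
}
NUM_ITERATIONS = 1000


def build_param_groups(params, learning_rates=LEARNING_RATES):
    """Pair each named parameter with its own learning rate.

    `params` maps a name ("means", "colors", ...) to the tensor being
    optimized. The result uses the list-of-dicts layout that PyTorch
    optimizers accept as parameter groups.
    """
    missing = set(learning_rates) - set(params)
    if missing:
        raise KeyError(f"no parameter supplied for: {sorted(missing)}")
    return [
        {"params": [params[name]], "lr": lr}
        for name, lr in learning_rates.items()
    ]
```

With PyTorch installed, the returned list could be passed to an optimizer such as `torch.optim.Adam`, which would then step each group at its own rate.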

PSNR: 27.8

SSIM: 0.91
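
For reference, PSNR is derived from the mean squared error between the rendered and ground-truth images. The sketch below assumes images stored as floating-point arrays in [0, 1]; the function name `psnr` and the `max_value` parameter are illustrative and not taken from the assignment code.

```python
import numpy as np


def psnr(prediction, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    prediction = np.asarray(prediction, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((prediction - target) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Under this definition, a PSNR of 27.8 dB corresponds to a mean squared error of about 0.0017 on a [0, 1] intensity scale.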

Training Progress:

Training progress (top: predictions, bottom: ground truth)

Final Results:

Final rendering after training completion

1.3 Extensions

1.3.1 Rendering Using Spherical Harmonics (10 points)

Implementation Notes:

Extended the rasterizer to support spherical harmonics for view-dependent effects. Implemented vectorized computation of colors from spherical harmonic coefficients and view directions.
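
As an illustration of the vectorized evaluation described above, the sketch below computes RGB colors from spherical harmonic coefficients for degrees 0 and 1 only; higher degrees add further basis terms in the same pattern. The function name `sh_to_color`, the `(N, 4, 3)` coefficient layout, the sign convention, and the `+ 0.5` offset follow the convention commonly seen in 3D Gaussian Splatting code and are assumptions here, not necessarily the implementation used for the results below.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # degree-0 (DC) basis constant
SH_C1 = 0.4886025119029199   # degree-1 basis constant


def sh_to_color(sh_coeffs, view_dirs):
    """Evaluate degree-0/1 spherical harmonics for N Gaussians at once.

    sh_coeffs: (N, 4, 3) array, one RGB triple per basis function.
    view_dirs: (N, 3) array of camera-to-Gaussian directions.
    Returns an (N, 3) array of colors clipped to [0, 1].
    """
    sh_coeffs = np.asarray(sh_coeffs, dtype=np.float64)
    dirs = np.asarray(view_dirs, dtype=np.float64)
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]

    color = SH_C0 * sh_coeffs[:, 0]              # view-independent (DC) term
    color = color - SH_C1 * y * sh_coeffs[:, 1]  # view-dependent terms
    color = color + SH_C1 * z * sh_coeffs[:, 2]
    color = color - SH_C1 * x * sh_coeffs[:, 3]
    return np.clip(color + 0.5, 0.0, 1.0)
```

With only the DC coefficients set, the output is the same for every view direction, which corresponds to the "DC only" rendering.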

Without Spherical Harmonics (DC only)

With Spherical Harmonics (all degrees)

Comparison Analysis:

With spherical harmonics, the rendering shows view-dependent (specular) variation in shading and color that the DC-only version lacks, which improves the overall visual quality.

Part 2: Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

Results - "a hamburger":

Without guidance (1000 iterations)

With guidance (1000 iterations)

Results - "a standing corgi dog":

Without guidance (1000 iterations)

With guidance (1000 iterations)

Results - "a white aeroplane":

Without guidance (1000 iterations)

With guidance (1000 iterations)

Results - "a red sports bike":

Without guidance (1000 iterations)

With guidance (1000 iterations)
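
The "with guidance" and "without guidance" results are assumed here to differ in whether classifier-free guidance is applied to the noise prediction used by the SDS loss. As a rough sketch of that idea, the functions below blend the unconditional and text-conditioned noise predictions and form an SDS-style gradient. The names `guided_noise` and `sds_gradient` and the weighting `w` are hypothetical, and the diffusion model outputs are stood in for by plain arrays.

```python
import numpy as np


def guided_noise(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: push the prediction toward the prompt."""
    noise_uncond = np.asarray(noise_uncond, dtype=np.float64)
    noise_text = np.asarray(noise_text, dtype=np.float64)
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


def sds_gradient(noise_pred, noise, w=1.0):
    """SDS-style gradient on the latents: w * (predicted - added noise)."""
    noise_pred = np.asarray(noise_pred, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return w * (noise_pred - noise)
```

Setting `guidance_scale` to 1 recovers the purely text-conditioned prediction, while larger values weight the prompt more heavily.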

2.2 Mesh Texture Optimization (15 points)

Results - "a porcelain cow statue with delicate blue floral patterns":

Prompt: "a porcelain cow statue with delicate blue floral patterns"

Results - "a cow with starry night sky textures":

Prompt: "a cow with starry night sky textures"

Results - "Military camouflage helicopter with weathered paint":

Implementation Notes:

I downloaded a new helicopter mesh and optimized its texture with the above prompt. I also tried other prompts, but the more detailed ones produced unclear results.

2.3 NeRF Optimization (15 points)

Hyperparameter Settings:

lambda_entropy: 0.01
lambda_orient: 0.0001
latent_iter_ratio: 0.2
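
To show how such weights typically enter the objective, the sketch below adds an opacity-entropy term and an orientation term to an SDS loss value. The definitions of the regularizers are assumptions (a binary entropy over per-ray opacities and a precomputed orientation penalty), as are the function names. `latent_iter_ratio` is interpreted here as the fraction of initial iterations run in a separate warm-up rendering mode, which is likewise an assumption.

```python
import numpy as np

# Values copied from the hyperparameter settings above.
LAMBDA_ENTROPY = 0.01
LAMBDA_ORIENT = 0.0001
LATENT_ITER_RATIO = 0.2


def opacity_entropy(opacity, eps=1e-6):
    """Mean binary entropy of per-ray opacities; small when rays are near 0 or 1."""
    a = np.clip(np.asarray(opacity, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-a * np.log(a) - (1.0 - a) * np.log(1.0 - a)))


def total_loss(sds_loss, opacity, orient_penalty):
    """Weighted sum of the SDS loss and the two regularizers."""
    return (sds_loss
            + LAMBDA_ENTROPY * opacity_entropy(opacity)
            + LAMBDA_ORIENT * orient_penalty)


def in_latent_phase(step, total_iters, ratio=LATENT_ITER_RATIO):
    """True during the first `ratio` fraction of training iterations."""
    return step < ratio * total_iters
```

The entropy term favors opacities close to 0 or 1, which discourages semi-transparent, foggy geometry.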

Results - "a standing corgi dog":

RGB rendering

Depth rendering

Results - "a cupcake with candle":

RGB rendering

Depth rendering

Results - "a headphone with leaf shape speakers":

RGB rendering

Depth rendering

2.4 Extensions

2.4.1 View-dependent Text Embedding (10 points)

Results - "a standing corgi dog":

Without view-dependent conditioning

With view-dependent conditioning

Results - "a cupcake with candle":

Without view-dependent conditioning

With view-dependent conditioning

Analysis:

With view-dependent conditioning, the rendered views are smooth and consistent across viewpoints, with no artifacts where the same image content is replicated on multiple sides of the object. The colors are also sharper and more vibrant.
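
One common way to implement view-dependent conditioning is to append a view phrase to the prompt based on the sampled camera angles. The sketch below assumes this approach; the function name, angle thresholds, and phrases are illustrative choices rather than the values used for the results above.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg,
                          overhead_threshold_deg=60.0,
                          front_half_width_deg=45.0):
    """Append a coarse view description to a text prompt.

    azimuth_deg: camera azimuth in degrees, 0 meaning the object's front.
    elevation_deg: camera elevation in degrees above the horizontal plane.
    """
    # Wrap azimuth into [-180, 180) so that 350 and -10 are treated alike.
    azimuth = (azimuth_deg + 180.0) % 360.0 - 180.0

    if elevation_deg > overhead_threshold_deg:
        view = "overhead view"
    elif abs(azimuth) <= front_half_width_deg:
        view = "front view"
    elif abs(azimuth) >= 180.0 - front_half_width_deg:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```

Because only a handful of distinct prompts arise, their text embeddings can be computed once and looked up per sampled camera.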

2.4.2 Alternative 3D Representation: Gaussian Splatting (10 points)

Chosen Representation: 3D Gaussian Splatting

Implementation Details:

Initialization: 5,000 Gaussians with random positions sampled from a unit sphere
Rendering: Used the rasterization pipeline from Part 1 - projects 3D Gaussians to 2D, sorts by depth, and alpha-composites them
Camera Sampling: Random viewpoints with azimuth ∈ [-180°, 180°], elevation ∈ [-30°, 30°], distance ∈ [2.5, 3.5]
Loss Function: SDS loss without guidance
Regularizations:
- Opacity regularization (λ=0.01)
- Scale regularization (λ=0.001)
Training: 2,000 iterations with the Adam optimizer, using a separate learning rate for each parameter type
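
The initialization, rendering, and camera-sampling items above can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: points are sampled uniformly inside the unit ball (the list does not say whether "from a unit sphere" means the surface or the interior), the camera convention is y-up, and compositing is shown for a single pixel with precomputed per-Gaussian alphas. All function names are hypothetical.

```python
import numpy as np


def init_means(num_gaussians=5000, rng=None):
    """Sample Gaussian centers uniformly inside the unit ball (assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    directions = rng.normal(size=(num_gaussians, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Cube root of a uniform variable gives uniform density over the volume.
    radii = rng.uniform(0.0, 1.0, size=(num_gaussians, 1)) ** (1.0 / 3.0)
    return directions * radii


def sample_camera_position(rng=None):
    """Draw a random camera position using the ranges listed above."""
    rng = np.random.default_rng() if rng is None else rng
    azimuth = np.deg2rad(rng.uniform(-180.0, 180.0))
    elevation = np.deg2rad(rng.uniform(-30.0, 30.0))
    distance = rng.uniform(2.5, 3.5)
    return distance * np.array([
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
        np.cos(elevation) * np.cos(azimuth),
    ])


def composite_pixel(colors, alphas, depths):
    """Sort one pixel's Gaussians by depth and alpha-composite front to back."""
    order = np.argsort(depths)  # nearest Gaussian first
    colors = np.asarray(colors, dtype=np.float64)[order]
    alphas = np.asarray(alphas, dtype=np.float64)[order]
    # Transmittance reaching each Gaussian: product of (1 - alpha) in front of it.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return (weights[:, None] * colors).sum(axis=0)
```

In a full rasterizer, the per-Gaussian alpha at a pixel would come from the opacity multiplied by the projected 2D Gaussian's density at that pixel; here it is taken as given.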

Results - "a cupcake with candle":

NeRF 360 View

Gaussian 360 View

Analysis:

Speed: NeRF took about 2 hours to render, while Gaussian splatting took 25 minutes, a speedup of roughly 5x.
Render Quality: Gaussian splatting produced high-quality specular colors but captured fewer fine details than NeRF.