Assignment 4: 3D Gaussian Splatting and Diffusion-guided Optimization

AndrewID: kpullala

Course: 16-825 Learning for 3D Vision

Part 1: 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

Rendering Results:

3D Gaussian Rendering

Rendered scene using pre-trained 3D Gaussians

1.2 Training 3D Gaussian Representations (15 points)

Training Configuration:

Learning Rates:

  • Means: 0.00022
  • Colors: 0.03
  • Opacities: 0.00065
  • Scales: 0.005

Number of Iterations: 1000

PSNR: 27.8

SSIM: 0.91
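
For reference, the PSNR figure above can be reproduced from a rendered image and its ground truth with a few lines of code. The following is a minimal sketch, not the assignment's evaluation code: the function name `psnr`, the flat-list pixel layout, and the assumption that pixel values lie in [0, 1] are all illustrative.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    Assumes pixel values are in [0, max_val] (an assumption, not stated in the report).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Under the same [0, 1] assumption, a PSNR of 27.8 corresponds to a mean squared error of roughly 1.7e-3.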

Training Progress:

Training Progress

Training progress (top: predictions, bottom: ground truth)

Final Results:

Final Rendering

Final rendering after training completion

1.3 Extensions

1.3.1 Rendering Using Spherical Harmonics (10 points)

Implementation Notes:

Extended the rasterizer to support spherical harmonics for view-dependent effects. Implemented vectorized computation of colors from spherical harmonic coefficients and view directions.
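
The color computation described above can be illustrated for a single Gaussian. This is a hedged sketch rather than the submitted implementation: it is written per Gaussian in plain Python for readability (the vectorized version applies the same arithmetic to all Gaussians at once), it covers only degrees 0 and 1, and the function name `sh_to_rgb`, the coefficient layout, the direction convention, the +0.5 offset, and the clamping to [0, 1] are assumptions based on common 3D Gaussian splatting conventions.

```python
import math

SH_C0 = 0.28209479177387814  # constant of the degree-0 basis function
SH_C1 = 0.4886025119314512   # magnitude of the degree-1 basis constants


def sh_to_rgb(coeffs, view_dir):
    """Evaluate a degree-0/1 spherical harmonic color for one Gaussian.

    coeffs:   four RGB triples [dc, c1, c2, c3] (hypothetical layout).
    view_dir: (x, y, z) direction from the camera to the Gaussian (assumed
              convention); it is normalized inside this function.
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm

    rgb = []
    for ch in range(3):
        # Degree 0: view-independent base color (the "DC only" case).
        val = SH_C0 * coeffs[0][ch]
        # Degree 1: linear dependence on the view direction.
        val += (-SH_C1 * y * coeffs[1][ch]
                + SH_C1 * z * coeffs[2][ch]
                - SH_C1 * x * coeffs[3][ch])
        # Offset and clamp to a displayable range (assumed convention).
        val += 0.5
        rgb.append(min(1.0, max(0.0, val)))
    return rgb
```

With all degree-1 coefficients set to zero, the result is the same for every view direction, which matches the "DC only" renders.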

Without Spherical Harmonics

Without Spherical Harmonics (DC only)

With Spherical Harmonics

With Spherical Harmonics (all degrees)

Additional comparison pairs: without vs. with spherical harmonics

Comparison Analysis:

With spherical harmonics, the renders show view-dependent variation in shading and color, such as specular highlights, which improves the overall visual quality.

Part 2: Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

Results - "a hamburger":

Hamburger without guidance

Without guidance (1000 iterations)

Hamburger with guidance

With guidance (1000 iterations)

Results - "a standing corgi dog":

Corgi without guidance

Without guidance (1000 iterations)

Corgi with guidance

With guidance (1000 iterations)

Results - "a white aeroplane":

Custom 1 without guidance

Without guidance (1000 iterations)

Custom 1 with guidance

With guidance (1000 iterations)

Results - "a red sports bike":

Custom 2 without guidance

Without guidance (1000 iterations)

Custom 2 with guidance

With guidance (1000 iterations)
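
The "with guidance" and "without guidance" labels in Section 2.1 can be read against the following sketch. It assumes that "guidance" means classifier-free guidance and that the "without guidance" runs use only the text-conditional noise prediction (equivalent to a guidance scale of 1); neither is spelled out in this report. The function names and the flat-list representation of noise tensors are hypothetical.

```python
def guided_noise(eps_uncond, eps_text, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the text-conditional prediction."""
    return [u + guidance_scale * (t - u) for u, t in zip(eps_uncond, eps_text)]


def sds_grad(eps_pred, eps, weight=1.0):
    """SDS gradient with respect to the noised input:
    weight * (predicted noise - injected noise)."""
    return [weight * (p - e) for p, e in zip(eps_pred, eps)]
```

In this reading, a guidance scale of 1 returns the text-conditional prediction unchanged, while larger scales push the update more strongly toward the prompt.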

2.2 Mesh Texture Optimization (15 points)

Results - "a porcelain cow statue with delicate blue floral patterns":

Mesh texture result 1

Prompt: "a porcelain cow statue with delicate blue floral patterns"

Results - "a cow with starry night sky textures":

Mesh texture result 2

Prompt: "a cow with starry night sky textures"

Results - "Military camouflage helicopter with weathered paint":

Implementation Notes:

I downloaded a new helicopter mesh and optimized its texture with the prompt above. I also tried other prompts, but more detailed prompts produced less clear results.

Helicopter mesh texture results (four images)

2.3 NeRF Optimization (15 points)

Hyperparameter Settings:

  • lambda_entropy: 0.01
  • lambda_orient: 0.0001
  • latent_iter_ratio: 0.2
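
As a rough illustration of how the two lambda weights above might enter the objective, the sketch below adds weighted regularizers to the SDS loss. It assumes that `lambda_entropy` weights a binary-entropy penalty on per-ray opacities and that `lambda_orient` weights an orientation penalty that is computed elsewhere; the starter code may define these terms differently. `latent_iter_ratio` is not modeled here, and all function names are hypothetical.

```python
import math

LAMBDA_ENTROPY = 0.01   # value from the settings above
LAMBDA_ORIENT = 0.0001  # value from the settings above


def binary_entropy(alpha, eps=1e-5):
    """Entropy (in bits) of an opacity value; lowest when alpha is near 0 or 1."""
    a = min(1.0 - eps, max(eps, alpha))  # avoid log(0)
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)


def total_loss(sds_loss, ray_alphas, orient_loss):
    """Combine the SDS loss with the two weighted regularizers (assumed form)."""
    entropy = sum(binary_entropy(a) for a in ray_alphas) / len(ray_alphas)
    return sds_loss + LAMBDA_ENTROPY * entropy + LAMBDA_ORIENT * orient_loss
```

An entropy term of this kind encourages each ray to be either clearly empty or clearly opaque, which tends to reduce semi-transparent fog in the volume.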

Results - "a standing corgi dog":

RGB rendering

Depth rendering

Results - "a cupcake with candle":

RGB rendering

Depth rendering

Results - "a headphone with leaf shape speakers":

RGB rendering

Depth rendering

2.4 Extensions

2.4.1 View-dependent Text Embedding (10 points)

Results - "a standing corgi dog":

Without view-dependent conditioning

With view-dependent conditioning

Results - "a cupcake with candle":

Without view-dependent conditioning

With view-dependent conditioning

Analysis:

With view-dependent conditioning, the rendered views are consistent and smooth across viewpoints, and no image-replication artifacts appear. The colors are also sharper and more vibrant than without view-dependent conditioning.
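
One common way to implement view-dependent conditioning is to append a view description to the prompt based on the sampled camera angles before encoding the text. The sketch below illustrates that idea only; the function name `view_prompt`, the angle thresholds, and the suffix wording are assumptions, not values taken from this assignment's code.

```python
def view_prompt(prompt, azimuth_deg, elevation_deg,
                overhead_deg=60.0, front_deg=45.0):
    """Append a view description to the prompt (thresholds are hypothetical)."""
    # Wrap azimuth to [-180, 180) so that 0 is the front of the object.
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    if elevation_deg > overhead_deg:
        view = "overhead view"
    elif abs(az) <= front_deg:
        view = "front view"
    elif abs(az) >= 180.0 - front_deg:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```

Each augmented prompt would then be passed through the text encoder, so the diffusion model receives a different embedding depending on where the camera is.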

2.4.2 Alternative 3D Representation: Gaussian Splatting (10 points)

Chosen Representation: 3D Gaussian Splatting

Implementation Details:

  • Initialization: 5,000 Gaussians with random positions sampled from a unit sphere
  • Rendering: Used the rasterization pipeline from Part 1, which projects 3D Gaussians to 2D, sorts them by depth, and alpha-composites them
  • Camera Sampling: Random viewpoints with azimuth ∈ [-180°, 180°], elevation ∈ [-30°, 30°], distance ∈ [2.5, 3.5]
  • Loss Function: SDS loss without guidance
  • Regularizations:
    • Opacity regularization (λ=0.01)
    • Scale regularization (λ=0.001)
  • Training: 2,000 iterations with Adam optimizer, different learning rates per parameter type
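
The initialization and camera sampling steps listed above can be sketched as follows. This is an illustrative version under stated assumptions: points are sampled uniformly inside the unit sphere (the report does not say whether sampling is inside or on the surface), the camera uses a y-up convention, and the function names are hypothetical.

```python
import math
import random


def sample_in_unit_sphere(rng):
    """Rejection-sample a point uniformly inside the unit sphere (assumed)."""
    while True:
        p = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(c * c for c in p) <= 1.0:
            return p


def sample_camera_position(rng):
    """Sample a camera position using the ranges listed above."""
    az = math.radians(rng.uniform(-180.0, 180.0))  # azimuth
    el = math.radians(rng.uniform(-30.0, 30.0))    # elevation
    dist = rng.uniform(2.5, 3.5)                   # distance from the origin
    # Spherical to Cartesian with y pointing up (assumed convention).
    x = dist * math.cos(el) * math.sin(az)
    y = dist * math.sin(el)
    z = dist * math.cos(el) * math.cos(az)
    return [x, y, z]


rng = random.Random(0)
means = [sample_in_unit_sphere(rng) for _ in range(5000)]  # 5,000 Gaussians
camera = sample_camera_position(rng)
```

In an optimization loop, a new camera would be drawn at every iteration so that the SDS loss sees the object from many viewpoints.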

Results - "a cupcake with candle":

NeRF 360 View

Gaussian 360 View

Analysis:

  • Speed: NeRF took about 2 hours to render, while Gaussian splatting took 25 minutes, a speedup of roughly 5x.
  • Render Quality: Gaussian splatting produced high-quality specular colors but captured less fine detail than NeRF.