Part 1: 3D Gaussian Splatting
1.1 3D Gaussian Rasterization (35 points)
Rendering Results:
Rendered scene using pre-trained 3D Gaussians
1.2 Training 3D Gaussian Representations (15 points)
Training Configuration:
Learning Rates:
- Means: 0.00022
- Colors: 0.03
- Opacities: 0.00065
- Scales: 0.005
Number of Iterations: 1000
PSNR: 27.8
SSIM: 0.91
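For reference, the PSNR value above can be computed from a rendered image and its ground truth along the following lines. This is a minimal sketch, assuming both images are float arrays scaled to [0, 1]; the function name `psnr` and the `max_val` parameter are illustrative and not necessarily what the training script uses.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better; a uniform error of 0.1 on a [0, 1] image corresponds to 20 dB.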
Training Progress:
Training progress (top: predictions, bottom: ground truth)
Final Results:
Final rendering after training completion
1.3 Extensions
1.3.1 Rendering Using Spherical Harmonics (10 points)
Implementation Notes:
Extended the rasterizer to support spherical harmonics for view-dependent effects. Implemented vectorized computation of colors from spherical harmonic coefficients and view directions.
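The vectorized colour computation described above can be sketched as follows. This is a simplified illustration, not the submitted implementation: it covers only degrees 0 and 1 (higher degrees add more basis terms in the same pattern), it follows the sign convention and the 0.5 offset used in the reference 3D Gaussian Splatting code, and the function name `sh_to_rgb` and the array layout are assumptions.

```python
import numpy as np

# Real spherical harmonic basis constants for degrees 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def sh_to_rgb(sh_coeffs, view_dirs):
    """Evaluate degree-0 and degree-1 SH for N Gaussians at once.

    sh_coeffs: (N, 4, 3) array, one RGB coefficient per basis function.
    view_dirs: (N, 3) array of directions from the camera to each Gaussian.
    Returns an (N, 3) array of colours clipped to [0, 1].
    """
    sh = np.asarray(sh_coeffs, dtype=np.float64)
    dirs = np.asarray(view_dirs, dtype=np.float64)
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    x, y, z = dirs[:, 0:1], dirs[:, 1:2], dirs[:, 2:3]

    # Degree 0 (DC term): view-independent base colour.
    color = SH_C0 * sh[:, 0]
    # Degree 1: linear dependence on the view direction.
    color = color - SH_C1 * y * sh[:, 1] + SH_C1 * z * sh[:, 2] - SH_C1 * x * sh[:, 3]
    # Shift so that zero coefficients map to mid-grey, then clip.
    return np.clip(color + 0.5, 0.0, 1.0)
```

With only the DC coefficient set, the output is the same for every view direction, which corresponds to the "DC only" comparison.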
Without Spherical Harmonics (DC only)
With Spherical Harmonics (all degrees)
Comparison Analysis:
With spherical harmonics, the render shows view-dependent variation in shading and colour, such as specular highlights, that the DC-only render cannot represent. This improves the overall visual quality.
Part 2: Diffusion-guided Optimization
2.1 SDS Loss + Image Optimization (20 points)
Results - "a hamburger":
Without guidance (1000 iterations)
With guidance (1000 iterations)
Results - "a standing corgi dog":
Without guidance (1000 iterations)
With guidance (1000 iterations)
Results - "a white aeroplane":
Without guidance (1000 iterations)
With guidance (1000 iterations)
Results - "a red sports bike":
Without guidance (1000 iterations)
With guidance (1000 iterations)
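The "with guidance" and "without guidance" comparison above can be illustrated with a short sketch, assuming "guidance" refers to classifier-free guidance. The function names, the `guidance_scale` parameter, and the weighting `w_t` are illustrative assumptions; the actual runs use a diffusion model's noise predictions on latents rather than the toy arrays implied here.

```python
import numpy as np

def cfg_noise(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output and towards the text-conditioned one."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

def sds_grad(noise_pred, noise, w_t=1.0):
    """SDS gradient with respect to the noised input: the weighted
    difference between predicted noise and the noise that was added."""
    return w_t * (noise_pred - noise)
```

Under this reading, a scale of 1 reduces to the text-conditioned prediction alone, which is one plausible interpretation of "without guidance"; larger scales weight the text condition more strongly.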
2.2 Mesh Texture Optimization (15 points)
Results - "a porcelain cow statue with delicate blue floral patterns":
Prompt: "a porcelain cow statue with delicate blue floral patterns"
Results - "a cow with starry night sky textures":
Prompt: "a cow with starry night sky textures"
Results - "Military camouflage helicopter with weathered paint":
Implementation Notes:
I downloaded a new helicopter mesh and rendered it with the prompt above. I also tried other prompts, but the results became unclear as the prompts grew more detailed.
2.3 NeRF Optimization (15 points)
Hyperparameter Settings:
- lambda_entropy: 0.01
- lambda_orient: 0.0001
- latent_iter_ratio: 0.2
Results - "a standing corgi dog":
RGB rendering
Depth rendering
Results - "a cupcake with candle":
RGB rendering
Depth rendering
Results - "a headphone with leaf shape speakers":
RGB rendering
Depth rendering
2.4 Extensions
2.4.1 View-dependent Text Embedding (10 points)
Results - "a standing corgi dog":
Without view-dependent conditioning
Without view-dependent conditioning
With view-dependent conditioning
With view-dependent conditioning
Results - "a cupcake with candle":
Without view-dependent conditioning
Without view-dependent conditioning
With view-dependent conditioning
With view-dependent conditioning
Analysis:
With view-dependent conditioning, the rendered views are consistently smooth as the camera moves around the object. There are no artifacts from the same image content being replicated across different sides of the object. The colors are also sharper and more vibrant.
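A minimal sketch of how a view-dependent prompt might be chosen from the camera pose is shown below. The angle thresholds, the suffix wording, and the function name `view_dependent_prompt` are assumptions for illustration, not the exact values used in these runs.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append a coarse view description to the text prompt.

    Thresholds (60 degrees for overhead, 45/135 degrees for
    front/side/back) are illustrative choices.
    """
    if elevation_deg > 60:
        suffix = "overhead view"
    else:
        # Wrap azimuth into [-180, 180) so 0 is the front of the object.
        az = (azimuth_deg + 180.0) % 360.0 - 180.0
        if abs(az) <= 45:
            suffix = "front view"
        elif abs(az) >= 135:
            suffix = "back view"
        else:
            suffix = "side view"
    return f"{prompt}, {suffix}"
```

The text embedding is then computed from the returned string for each sampled camera, so the diffusion prior is asked for a back view when the camera is behind the object.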
2.4.2 Alternative 3D Representation: Gaussian Splatting (10 points)
Chosen Representation: 3D Gaussian Splatting
Implementation Details:
- Initialization: 5,000 Gaussians with random positions sampled from a unit sphere
- Rendering: Used the rasterization pipeline from Part 1 - projects 3D Gaussians to 2D, sorts by depth, and alpha-composites them
- Camera Sampling: Random viewpoints with azimuth ∈ [-180°, 180°], elevation ∈ [-30°, 30°], distance ∈ [2.5, 3.5]
- Loss Function: SDS loss without guidance
- Regularizations:
  - Opacity regularization (λ=0.01)
  - Scale regularization (λ=0.001)
- Training: 2,000 iterations with Adam optimizer, different learning rates per parameter type
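The camera sampling and alpha-compositing steps listed above can be sketched as follows. This is a simplified illustration rather than the actual pipeline: it composites a single pixel from already-projected Gaussians, assumes a y-up coordinate system with the object at the origin, and the function names are hypothetical.

```python
import numpy as np

def sample_camera_position(rng):
    """Sample a camera position on a spherical shell around the origin,
    using the azimuth, elevation, and distance ranges listed above."""
    azimuth = np.deg2rad(rng.uniform(-180.0, 180.0))
    elevation = np.deg2rad(rng.uniform(-30.0, 30.0))
    distance = rng.uniform(2.5, 3.5)
    # Spherical to Cartesian, assuming y is up.
    return distance * np.array([
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
        np.cos(elevation) * np.cos(azimuth),
    ])

def composite_pixel(depths, colors, alphas):
    """Front-to-back alpha compositing of the Gaussians covering one pixel.

    depths: (K,) camera-space depths; colors: (K, 3); alphas: (K,) per-pixel
    opacities after evaluating each projected 2D Gaussian.
    """
    order = np.argsort(depths)  # nearest Gaussian first
    colors = np.asarray(colors, dtype=np.float64)[order]
    alphas = np.asarray(alphas, dtype=np.float64)[order]
    # Transmittance before each Gaussian: product of (1 - alpha) of those in front.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return (weights[:, None] * colors).sum(axis=0)
```

For example, a half-transparent red Gaussian in front of a half-transparent blue one contributes weights of 0.5 and 0.25 respectively.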
Results - "a cupcake with candle":
NeRF 360 View
Gaussian 360 View
Analysis:
Analysis of Gaussian Splatting:
- Speed: The NeRF run took about 2 hours, while the Gaussian splatting run took about 25 minutes, roughly 5x faster.
- Render Quality: Gaussian splatting produced high-quality specular colours, but struggled with fine details compared to NeRF.