# Learning for 3D Vision HW4

## 1. 3D Gaussian Splatting

### 1.1 3D Gaussian Rasterization

### 1.2 Training 3D Gaussian Representations

The following results are obtained with the parameter groups below and 1000 training iterations:

- PSNR: 29.326
- Mean SSIM: 0.939

```python
parameters = [
    {'params': [gaussians.pre_act_opacities], 'lr': 0.05, "name": "opacities"},
    {'params': [gaussians.pre_act_scales], 'lr': 0.005, "name": "scales"},
    {'params': [gaussians.colours], 'lr': 0.0025, "name": "colours"},
    {'params': [gaussians.means], 'lr': 1.6e-4, "name": "means"},
]
```
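For context, a minimal sketch of how these parameter groups might drive the optimization is given below. `scene.sample_view` and `render` are hypothetical stand-ins for the starter code's data loading and differentiable rasterizer, and the plain L1 objective is an assumption.

```python
import torch

# Per-group learning rates defined above override the default lr given here.
optimizer = torch.optim.Adam(parameters, lr=0.0)

for itr in range(1000):
    camera, gt_image = scene.sample_view()    # hypothetical: one training view
    pred_image = render(gaussians, camera)    # hypothetical differentiable rasterizer

    loss = (pred_image - gt_image).abs().mean()  # plain L1 reconstruction term

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```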


#### 1.3.2 Training On a Harder Scene

The result is achieved using learning-rate decay on the Gaussian means, a weighted SSIM loss with weight 0.2, Gaussian means initialized from a uniform distribution with side length 1.8, an initial opacity of 0.1, anisotropic Gaussians, and 8000 training iterations.
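A rough sketch of the improved objective and the decay on the means' learning rate follows; the convex combination of L1 and (1 - SSIM), the decay endpoints, and the log-linear schedule are assumptions, and `ssim` stands for any differentiable SSIM implementation.

```python
lambda_ssim = 0.2
total_iters = 8000
means_lr_init, means_lr_final = 1.6e-4, 1.6e-6   # assumed decay endpoints

for itr in range(total_iters):
    camera, gt_image = scene.sample_view()
    pred_image = render(gaussians, camera)

    # 3DGS-style combination of L1 and the SSIM term with weight 0.2.
    l1 = (pred_image - gt_image).abs().mean()
    d_ssim = 1.0 - ssim(pred_image, gt_image)
    loss = (1.0 - lambda_ssim) * l1 + lambda_ssim * d_ssim

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Log-linear decay applied only to the means' parameter group.
    decay = (means_lr_final / means_lr_init) ** (itr / total_iters)
    for group in optimizer.param_groups:
        if group["name"] == "means":
            group["lr"] = means_lr_init * decay
```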
| Method | Progress GIF | PSNR | SSIM |
|---|---|---|---|
| Baseline | ![]() | 18.253 | 0.669 |
| Improved | ![]() | 22.479 | 0.693 |
## 2. Diffusion-guided Optimization

### 2.1 SDS Loss + Image Optimization
| | a hamburger (400, 1000 iters) | a standing dog (400, 1000 iters) | a fancy sports car (400, 1000 iters) | a cute cat (400, 1000 iters) |
|---|---|---|---|---|
| w. guidance | ![]() | ![]() | ![]() | ![]() |
| w.o. guidance | ![]() | ![]() | ![]() | ![]() |
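For reference, the two rows above differ only in whether classifier-free guidance is applied inside the SDS update. A sketch of that update is given below, assuming the diffusers-style Stable Diffusion UNet API; the weighting `w(t) = 1 - alpha_bar_t` and the timestep range are common choices rather than confirmed details of this implementation.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, unet, text_emb, uncond_emb, alphas_cumprod, guidance_scale=100.0):
    """One SDS step on SD latents of shape (B, 4, 64, 64); names are illustrative."""
    B = latents.shape[0]
    t = torch.randint(20, 980, (B,), device=latents.device)
    alpha_bar = alphas_cumprod[t].view(B, 1, 1, 1)

    # Forward-diffuse the current latents.
    noise = torch.randn_like(latents)
    noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise

    with torch.no_grad():
        # Classifier-free guidance: conditional and unconditional noise predictions.
        eps_cond = unet(noisy, t, encoder_hidden_states=text_emb).sample
        eps_uncond = unet(noisy, t, encoder_hidden_states=uncond_emb).sample
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    w = 1 - alpha_bar                       # common SDS weighting choice
    grad = w * (eps - noise)
    # Surrogate loss whose gradient w.r.t. the latents equals `grad`.
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```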
### 2.2 Texture Map Optimization for Mesh
| a dotted black and white cow | an orange golden bull | a cow painted in the style of storms and lightning | a cow painted in the style of Vangogh |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
### 2.3 NeRF Optimization
| a standing corgi dog | a fancy sports car | a cute cat |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
### 2.4 Extensions

#### 2.4.1 View-dependent text embedding
| a standing corgi dog | a fancy sports car | a cute cat |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
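One common way to implement this (following DreamFusion) is to append a direction suffix to the prompt according to the camera azimuth before encoding it; the thresholds below are illustrative, not the exact values used here.

```python
def view_dependent_prompt(prompt: str, azimuth_deg: float) -> str:
    """Pick a view suffix from the camera azimuth (thresholds are illustrative)."""
    a = azimuth_deg % 360.0
    if a < 45.0 or a >= 315.0:
        return f"{prompt}, front view"
    if 135.0 <= a < 225.0:
        return f"{prompt}, back view"
    return f"{prompt}, side view"
```

The suffixed prompt is then passed through the text encoder, so each sampled camera is supervised with an embedding that matches its viewpoint.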
#### 2.4.2 Other 3D representations

I also tried the SDS loss on 3D Gaussian Splatting. To make the rendered output smoother, a TV (total variation) loss is applied to the rendered images.

Compared to NeRF, which is a naturally continuous representation, 3DGS is relatively discrete, and the unstable SDS loss produces many noisy artifacts. However, with a better training recipe, I think 3DGS could achieve superior quality. One interesting observation is the Janus effect: the corgi has multiple heads, which helps it minimize the SDS loss from every viewpoint.
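A minimal sketch of the total-variation regularizer on a rendered image is shown below; the `(C, H, W)` layout and the weight used to combine it with the SDS loss are assumptions.

```python
import torch

def tv_loss(img: torch.Tensor) -> torch.Tensor:
    """Total variation of an image of shape (C, H, W): mean absolute
    difference between vertically and horizontally adjacent pixels."""
    dh = (img[:, 1:, :] - img[:, :-1, :]).abs().mean()
    dw = (img[:, :, 1:] - img[:, :, :-1]).abs().mean()
    return dh + dw

# Combined objective (tv_weight is a tuning knob, not a confirmed value):
# loss = sds + tv_weight * tv_loss(rendered_image)
```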
| a standing corgi dog |
|---|
| ![]() |
| ![]() |
#### 2.4.3 Variation of the SDS loss implementation

The pixel-space SDS is implemented by predicting the clean latent, decoding it back to image space, and using the decoded image as the target for L2 and LPIPS losses. The weights for different noise levels in each loss are determined by the alpha cumulative product.

The image quality is comparable but slightly worse, while training time increases from ~4800 s to ~18000 s, roughly 4x longer. The quality drops because the optimization is harder when the rendered image is asked to match the clean image directly.
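A rough sketch of this variant is shown below, assuming a diffusers-style VAE (`vae.decode(...).sample`, `vae.config.scaling_factor`) and an LPIPS callable; the concrete weighting `1 - alpha_bar_t` and all names are illustrative rather than the exact implementation.

```python
import torch

def pixel_space_sds(rendered, latents, eps_pred, noise, t, alphas_cumprod,
                    vae, lpips_fn, lpips_weight=1.0):
    """Pixel-space variant: recover the predicted clean latent, decode it,
    and regress the rendered image towards the decoded target."""
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise

    # Predicted clean latent x0 recovered from the epsilon prediction.
    pred_x0 = (noisy - (1 - alpha_bar).sqrt() * eps_pred) / alpha_bar.sqrt()

    with torch.no_grad():
        # Decode to image space; the decoded image is treated as a fixed target.
        target = vae.decode(pred_x0 / vae.config.scaling_factor).sample

    # Noise-level-dependent weight from the alpha cumulative product (assumed form).
    w = (1 - alphas_cumprod[t]).view(-1, 1, 1, 1)

    l2 = (w * (rendered - target) ** 2).mean()
    lpips = (w.view(-1) * lpips_fn(rendered, target).view(-1)).mean()
    return l2 + lpips_weight * lpips
```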
| a standing corgi dog |
|---|
