Learning for 3D Vision Hw4¶

1. 3D Gaussian Splatting¶

1.1 3D Gaussian Rasterization¶

123

1.2 Training 3D Gaussian Representations¶

The results

  • PSNR: 29.326
  • Mean SSIM: 0.939

are achieved with

parameters = [
    {'params': [gaussians.pre_act_opacities], 'lr': 0.05, "name": "opacities"},
    {'params': [gaussians.pre_act_scales], 'lr': 0.005, "name": "scales"},
    {'params': [gaussians.colours], 'lr': 0.0025, "name": "colours"},
    {'params': [gaussians.means], 'lr': 1.6e-4, "name": "means"},
]

and 1000 training iterations.

121

122
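As a reference for how a PSNR figure like the one reported above is typically computed, here is a minimal sketch in plain Python. It assumes pixel values in [0, 1] stored as flat lists; the function name `psnr` and the `max_val` parameter are illustrative and not taken from the assignment code.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized flat pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher is better; a per-pixel error of 0.1 on a [0, 1] scale corresponds to 20 dB.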

1.3 Extensions¶

1.3.1 Rendering Using Spherical Harmonics¶

SH 123
old 123

Comparison

SH   | 123 | 123 | 123
Old  | 123 | 123 | 123
Diff | The pad is darker with SH | The back is darker with SH | The pad is darker with SH

With SH, the color of a Gaussian depends on the viewing direction, so the SH rendering can show different colors when viewed from different camera angles.
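The view dependence can be illustrated with a small sketch that evaluates degree-0 and degree-1 spherical harmonics for one Gaussian. The basis constants and sign convention follow the commonly used real SH basis. The 0.5 offset and the clamp to [0, 1] are assumptions made here, and `sh_to_rgb` is a hypothetical helper, not the assignment's function.

```python
import math

SH_C0 = 0.28209479177387814  # constant of the degree-0 basis function
SH_C1 = 0.4886025119029199   # magnitude of the degree-1 basis constants


def sh_to_rgb(sh_coeffs, view_dir):
    """sh_coeffs: four RGB triples (one degree-0 and three degree-1 terms).
    view_dir: 3-vector pointing from the camera to the Gaussian."""
    norm = math.sqrt(sum(c * c for c in view_dir))
    if norm == 0:
        raise ValueError("view_dir must be non-zero")
    x, y, z = (c / norm for c in view_dir)
    # Basis values in the order (l=0), (l=1, m=-1), (l=1, m=0), (l=1, m=1).
    basis = (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)
    rgb = []
    for ch in range(3):
        val = sum(b * coeff[ch] for b, coeff in zip(basis, sh_coeffs))
        # Assumed convention: shift by 0.5 and clamp to the displayable range.
        rgb.append(min(max(val + 0.5, 0.0), 1.0))
    return rgb
```

With a non-zero degree-1 coefficient, the same Gaussian evaluates to a brighter color from one side and a darker color from the opposite side. A single fixed color per Gaussian cannot express this.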

1.3.2 Training On a Harder Scene¶

The improved result is achieved using learning rate decay on the means, a weighted SSIM loss with weight 0.2, Gaussian means initialized uniformly within a cube of side length 1.8, an initial opacity of 0.1, anisotropic Gaussians, and 8000 training iterations.

Method   | Progress GIF | PSNR   | SSIM
Baseline | 123          | 18.253 | 0.669
Improved | 123          | 22.479 | 0.693
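Two of the ingredients listed for the improved run, learning rate decay on the means and an SSIM-weighted loss, can be sketched as follows. The log-linear schedule, the final learning rate `1.6e-6`, and the exact form of the blended loss are assumptions borrowed from common 3DGS practice; only the SSIM weight 0.2 and the 8000 iterations come from this report, and the initial rate reuses the value from Section 1.2.

```python
import math


def means_lr(step, max_steps, lr_init=1.6e-4, lr_final=1.6e-6):
    """Log-linear (exponential) decay from lr_init to lr_final.
    lr_final is a hypothetical value chosen for illustration."""
    t = min(max(step / max_steps, 0.0), 1.0)
    return math.exp((1.0 - t) * math.log(lr_init) + t * math.log(lr_final))


def combined_loss(l1, ssim, ssim_weight=0.2):
    """Blend an L1 term with a D-SSIM term (1 - SSIM).
    Assumed form: (1 - w) * L1 + w * (1 - SSIM)."""
    return (1.0 - ssim_weight) * l1 + ssim_weight * (1.0 - ssim)
```

In a PyTorch training loop, `means_lr(step, 8000)` would be written into the `lr` field of the parameter group named "means" before each optimizer step.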

2. Diffusion-guided Optimization¶

2.1 SDS Loss + Image Optimization¶

                 | a hamburger (400, 1000 iters) | a standing dog (400, 1000 iters) | a fancy sports car (400, 1000 iters) | a cute cat (400, 1000 iters)
With guidance    | 123 | 123 | 123 | 123
Without guidance | 123 | 123 | 123 | 123
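Assuming the two rows differ in whether classifier-free guidance is used, the difference can be sketched as below. Both functions operate on flat lists standing in for noise tensors; the names `guided_noise` and `sds_grad` and the constant `weight` are illustrative, not the assignment's API.

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def sds_grad(eps_pred, eps, weight=1.0):
    """SDS gradient w.r.t. the noisy latent: w(t) * (eps_pred - eps).
    The U-Net Jacobian is skipped, as in the usual SDS formulation."""
    return [weight * (p - e) for p, e in zip(eps_pred, eps)]
```

With `guidance_scale=1.0` the result reduces to the plain conditional prediction, which is one way to read the row without guidance. Larger scales push the optimized image harder toward the prompt.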

2.2 Texture Map Optimization for Mesh¶

a dotted black and white cow | an orange golden bull | a cow painted in the style of storms and lightning | a cow painted in the style of Vangogh
123 | 123 | 123 | 123

2.3 NeRF Optimization¶

a standing corgi dog | a fancy sports car | a cute cat
123 | 123 | 123
123 | 123 | 123

2.4 Extensions¶

2.4.1 View-dependent text embedding¶

a standing corgi dog | a fancy sports car | a cute cat
123 | 123 | 123
123 | 123 | 123
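One common way to build view-dependent text embeddings is to append a view phrase to the prompt based on the sampled camera pose before encoding it. The sketch below shows only that prompt-selection step. The angle thresholds, the phrases, and the function name are assumptions in the style of DreamFusion, not necessarily what was used for the results above.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg=0.0,
                          overhead_threshold=60.0):
    """Append a view phrase chosen from the camera azimuth/elevation (degrees).
    Azimuth 0 is assumed to face the front of the object."""
    if elevation_deg > overhead_threshold:
        return f"{prompt}, overhead view"
    az = azimuth_deg % 360.0  # wrap into [0, 360)
    if az < 45.0 or az >= 315.0:
        view = "front view"
    elif 135.0 <= az < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```

Each distinct phrase would be encoded once and cached, so the per-iteration cost is only a lookup by view bucket.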

2.4.2 Other 3D representation¶

I tried applying the SDS loss to 3D Gaussian splatting. To make the rendered output smoother, a total variation (TV) loss is applied to the rendered images.
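For reference, a total variation loss penalizes differences between neighboring pixels. The following is a minimal single-channel sketch on nested lists. The L1 (absolute difference) form and the mean normalization are assumptions, since the exact variant used is not stated.

```python
def total_variation(image):
    """Mean absolute difference between vertically and horizontally
    adjacent pixels of a 2D image given as a list of rows."""
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:  # vertical neighbor
                total += abs(image[i + 1][j] - image[i][j])
                count += 1
            if j + 1 < w:  # horizontal neighbor
                total += abs(image[i][j + 1] - image[i][j])
                count += 1
    return total / count if count else 0.0
```

A constant image has zero TV, while sharp isolated speckles raise it, which is why the term suppresses noisy renders.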

Compared to NeRF, which is a naturally continuous representation, 3DGS is relatively discrete, and the unstable SDS loss produces many noisy artifacts. However, if the training recipe can be improved, I think 3DGS may achieve superior quality. One cool thing is the Janus effect: the corgi ends up with multiple heads, which helps it minimize the SDS loss.

a standing corgi dog
123
123

2.4.3 Variation of implementation of SDS loss¶

The pixel-space SDS is implemented by predicting the clean latent, decoding it back to the image space, and using it as the target to compute the L2 and LPIPS losses. The weights for different noise levels in each loss are determined by the alpha cumulative product.
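The "predict the clean latent" step follows from the standard forward-diffusion relation x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the alpha cumulative product. A minimal sketch on flat lists is shown below. The function names are illustrative, and the decoding and L2/LPIPS steps are omitted because they depend on the VAE and perceptual network used.

```python
import math


def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    sa, s1 = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [sa * x + s1 * e for x, e in zip(x0, eps)]


def predict_clean(x_t, eps_pred, alpha_bar_t):
    """Invert the forward relation using the predicted noise eps_pred."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    sa, s1 = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - s1 * e) / sa for x, e in zip(x_t, eps_pred)]
```

If the noise prediction is exact, `predict_clean(add_noise(x0, eps, a), eps, a)` recovers `x0`. At high noise levels (small ᾱ_t) the division by sqrt(ᾱ_t) amplifies prediction errors, which is one motivation for weighting the losses by the noise level.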

The image quality is comparable but slightly worse, while the training time increases from ~4800 s to ~18000 s, nearly 4 times as long. The quality is worse because the optimization is harder when the rendered image is directly pushed to match the predicted clean image.

a standing corgi dog
123
123