16-825 Assignment 4

1. 3D Gaussian Splatting

1.1 3D Gaussian Rasterization (35 points)

1.1.1 Project 3D Gaussians to Obtain 2D Gaussians
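As a rough illustration of this step, the NumPy sketch below builds a 3D covariance from (already activated) scales and a unit quaternion as Sigma = R S S^T R^T. It then projects the Gaussian with the standard local affine (EWA-style) approximation Sigma_2D = J W Sigma W^T J^T. It assumes a pinhole camera with x_cam = R_cam @ x_world + t and intrinsics fx, fy, cx, cy. All function names are hypothetical and are not the assignment's starter-code API.

```python
import numpy as np


def quaternion_to_rotation(q):
    """Rotation matrix from a quaternion q = (w, x, y, z); q is normalized first."""
    w, x, y, z = np.asarray(q, dtype=np.float64) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance_from_scale_rotation(scales, q):
    """3D covariance Sigma = R S S^T R^T with S = diag(scales)."""
    R = quaternion_to_rotation(q)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T


def project_gaussian(mean_world, cov_world, R_cam, t, fx, fy, cx, cy):
    """Project one 3D Gaussian to a 2D image-space Gaussian (hypothetical helper).

    Returns the 2D mean (pixels), the 2x2 covariance, and the camera-space depth.
    """
    x, y, z = R_cam @ mean_world + t                 # mean in camera coordinates
    mean_2d = np.array([fx * x / z + cx, fy * y / z + cy])
    # Jacobian of the perspective projection, evaluated at the camera-space mean
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov_cam = R_cam @ cov_world @ R_cam.T            # rotate covariance into the camera frame
    cov_2d = J @ cov_cam @ J.T                       # 2x2 image-space covariance
    return mean_2d, cov_2d, z
```

For example, an axis-aligned Gaussian at the origin, seen from 5 units away with fx = fy = 100, projects to the principal point with covariance diag(4, 16) when its scales are (0.1, 0.2, 0.3).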

1.1.2 Evaluate 2D Gaussians
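A minimal sketch of evaluating a projected Gaussian at every pixel centre follows. It computes exp(-0.5 d^T Sigma^-1 d) with d = p - mu. The normalization constant is dropped, as in Gaussian splatting, because the per-Gaussian opacity sets the peak value instead. The helper names and the +0.5 pixel-centre convention are assumptions for illustration.

```python
import numpy as np


def pixel_grid(H, W):
    """(H*W, 2) array of (x, y) pixel centres in row-major order (assumed +0.5 convention)."""
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64) + 0.5


def evaluate_gaussian_2d(points, mean_2d, cov_2d):
    """Unnormalized 2D Gaussian exp(-0.5 * d^T cov^-1 d) at each of P points."""
    d = points - mean_2d                              # (P, 2) offsets from the mean
    cov_inv = np.linalg.inv(cov_2d)
    maha = np.einsum("pi,ij,pj->p", d, cov_inv, d)    # squared Mahalanobis distance per point
    return np.exp(-0.5 * maha)
```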

1.1.3 Filter and Sort Gaussians
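The filtering and sorting step can be sketched as follows. Gaussians whose camera-space depth is not in front of a small near threshold are discarded, and the rest are ordered nearest first so they can be composited front to back. The `near` value is a hypothetical choice. A single global depth sort is a simplification; the original 3DGS CUDA rasterizer sorts per screen tile.

```python
import numpy as np


def filter_and_sort(depths, near=0.01):
    """Indices of Gaussians in front of the camera, ordered from nearest to farthest."""
    depths = np.asarray(depths, dtype=np.float64)
    keep = np.nonzero(depths > near)[0]               # drop Gaussians behind or too close to the camera
    order = np.argsort(depths[keep], kind="stable")   # ascending depth = front to back
    return keep[order]
```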

1.1.4 Compute Alphas and Transmittance
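For depth-sorted Gaussians, the per-pixel alpha is alpha_i = o_i * g_i(p), and the transmittance is T_i = prod_{j<i} (1 - alpha_j). This is an exclusive cumulative product, so the nearest Gaussian sees T_0 = 1. The sketch below assumes the opacities are already activated (for example, a sigmoid of the pre-activation values). Clamping alpha to 0.99 follows the reference 3DGS implementation and is an assumption about this assignment.

```python
import numpy as np


def alphas_and_transmittance(opacities, gauss_vals, max_alpha=0.99):
    """opacities: (N,), gauss_vals: (N, P) for N depth-sorted Gaussians and P pixels."""
    opacities = np.asarray(opacities, dtype=np.float64)
    alphas = np.minimum(opacities[:, None] * gauss_vals, max_alpha)    # (N, P)
    ones = np.ones((1, alphas.shape[1]))
    # Exclusive cumulative product: row i holds prod_{j<i} (1 - alpha_j)
    trans = np.cumprod(np.concatenate([ones, 1.0 - alphas], axis=0), axis=0)[:-1]
    return alphas, trans
```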

1.1.5 Perform Splatting
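Splatting then composites the sorted Gaussians front to back as C(p) = sum_i c_i * alpha_i(p) * T_i(p). The self-contained sketch below recomputes the transmittance from the alphas. It also returns the final transmittance, which can be used to blend a background or to form an alpha mask. The function name is hypothetical.

```python
import numpy as np


def splat(colours, alphas):
    """colours: (N, 3), alphas: (N, P), both ordered front to back. Returns a (P, 3) image."""
    ones = np.ones((1, alphas.shape[1]))
    trans = np.cumprod(np.concatenate([ones, 1.0 - alphas], axis=0), axis=0)   # (N + 1, P)
    weights = alphas * trans[:-1]            # contribution of each Gaussian to each pixel
    image = weights.T @ colours              # (P, 3) composited colour
    return image, trans[-1]                  # trans[-1]: light that passes all Gaussians
```

Swapping `colours` for a column of per-Gaussian depths composites a depth map with the same weights.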

1.2 Training 3D Gaussian Representations (15 points)

1.2.1 Setting Up Parameters and Optimizer

1.2.2 Perform Forward Pass and Compute Loss

Submission: In your webpage, include the following details:

Learning rates that you used for each parameter. If you had experimented with multiple sets of learning rates, just mention the set that obtains the best performance in the next question.

Number of iterations that you trained the model for.

The PSNR and SSIM.

Both the GIFs output by train.py.

Learning rates:
gaussians.pre_act_opacities: lr = 0.05
gaussians.pre_act_scales: lr = 0.05
gaussians.colours: lr = 0.05
gaussians.means: lr = 0.001
Number of iterations: 1000
Mean PSNR: 29.383
Mean SSIM: 0.937
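For reference, PSNR for images with values in [0, 1] is 10 * log10(1 / MSE). The sketch below averages it over views, which is one plausible reading of "Mean PSNR". The assignment's evaluation code may compute it differently (for example, over a different value range), so treat this as an assumption. Identical images give MSE = 0 and an infinite PSNR.

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, max_val]."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(gt, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)


def mean_psnr(preds, gts):
    """Average per-view PSNR over paired lists of rendered and ground-truth images."""
    return float(np.mean([psnr(p, g) for p, g in zip(preds, gts)]))
```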

1.3 Extensions (Choose at least one! More than one is extra credit)

1.3.1 Rendering Using Spherical Harmonics (10 Points)

Submission: In your webpage, include the following details:

Attach the GIF you obtained using render.py for questions 1.3.1 (this question) and 1.1.5 (older question).

Attach 2 or 3 side-by-side RGB image comparisons of the renderings obtained in both cases. The images being compared should correspond to the same view/frame.

For each of the side by side comparisons that are attached, provide some explanation of differences (if any) that you notice.

Comparisons

Without spherical harmonics | With spherical harmonics

Details and shading are slightly more refined with spherical harmonics.

Without spherical harmonics | With spherical harmonics

The render with spherical harmonics has more realistic shading and finer details, whereas the one without keeps the same shading pattern as in the previous comparison, even though the viewing angle is different.

Without spherical harmonics | With spherical harmonics

Both cases appear quite blurry and show little noticeable difference.
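The view-dependent shading in the comparisons above comes from evaluating each Gaussian's colour from spherical-harmonic coefficients and the viewing direction. Without spherical harmonics, the colour is a single constant per Gaussian. The sketch below evaluates degree 0 and degree 1 only. It follows the sign convention, the direction (Gaussian mean minus camera centre), and the +0.5 offset of the reference 3DGS implementation. Whether this assignment's code matches that convention is an assumption, and the function name is hypothetical.

```python
import numpy as np

SH_C0 = 0.28209479177387814   # 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199    # sqrt(3) / (2 * sqrt(pi))


def sh_to_rgb_deg1(sh, mean, cam_center):
    """RGB colour of one Gaussian from (4, 3) SH coefficients (DC term + 3 degree-1 terms)."""
    d = mean - cam_center
    x, y, z = d / np.linalg.norm(d)          # unit viewing direction
    rgb = (SH_C0 * sh[0]
           - SH_C1 * y * sh[1]
           + SH_C1 * z * sh[2]
           - SH_C1 * x * sh[3])
    return np.clip(rgb + 0.5, 0.0, None)     # shift and clamp negatives, as in 3DGS
```

With only the DC coefficient set, the colour is identical from every direction, which matches the unchanged shading pattern seen without spherical harmonics. A nonzero degree-1 coefficient makes the same Gaussian brighter from one side than from the other.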

2. Diffusion-guided Optimization

2.1 SDS Loss + Image Optimization (20 points)

"a hamburger"

Without guidance (400 iterations) | With guidance (700 iterations; guidance scale = 15)

"a standing corgi dog"

Without guidance (2000 iterations) | With guidance (1600 iterations; guidance scale = 10)

"a flying cucumber"

Without guidance (2000 iterations) | With guidance (1600 iterations; guidance scale = 10)

"a renaissance painting"

Without guidance (2000 iterations) | With guidance (2000 iterations; guidance scale = 100)
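The guidance scales above enter through classifier-free guidance: eps_hat = eps_uncond + s * (eps_cond - eps_uncond). The SDS gradient on the latents is then w(t) * (eps_hat - eps), with the UNet Jacobian skipped. The NumPy sketch below assumes that "without guidance" means using the conditional noise prediction alone. `predict_noise` is a hypothetical stand-in for the Stable Diffusion UNet, and w(t) = 1 - alpha_bar_t is one common weighting choice, not necessarily the one used here.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def sds_grad(latents, t, alphas_cumprod, predict_noise, guidance_scale=None, rng=None):
    """SDS gradient w.r.t. the latents; predict_noise(noisy, t, conditional) is hypothetical."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(latents.shape)
    a_bar = alphas_cumprod[t]
    noisy = np.sqrt(a_bar) * latents + np.sqrt(1.0 - a_bar) * eps   # forward diffusion to step t
    eps_cond = predict_noise(noisy, t, True)
    if guidance_scale is None:            # assumed meaning of "without guidance"
        eps_hat = eps_cond
    else:
        eps_uncond = predict_noise(noisy, t, False)
        eps_hat = cfg_noise(eps_uncond, eps_cond, guidance_scale)
    w = 1.0 - a_bar                       # one common choice of w(t)
    return w * (eps_hat - eps)
```

In an autograd framework, this gradient is often injected through a surrogate loss such as 0.5 * ||latents - stopgrad(latents - grad)||^2. The derivative of that loss with respect to the latents is exactly `grad`.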

2.2 Texture Map Optimization for Mesh (15 points)

"a black and white cow" (guidance scale = 15) | "a cow with zebra stripes" (guidance scale = 100)

2.3 NeRF Optimization (15 points)

"a standing corgi dog"

RGB | Depth

"a tv monitor"

RGB | Depth

"a table"

RGB | Depth

2.4 Extensions (Choose at least one! More than one is extra credit)

2.4.1 View-dependent text embedding (10 points)

"a standing corgi dog"

Before incorporating view dependence, the corgi had two faces. This multi-face (Janus) artifact is common with Stable Diffusion SDS guidance: the text prompt alone gives the diffusion model no information about which view is being rendered, so it cannot distinguish front views from side views. Introducing view dependence resolved this problem.
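One common way to add view dependence, used in DreamFusion, is to condition on a view descriptor chosen from the camera angles. The prompt becomes, for example, "a standing corgi dog, front view" or "..., back view", and its text embedding is used for that render. The sketch below only picks the descriptor string. The angle thresholds, the convention that azimuth 0 faces the front, and the function name are illustrative assumptions. This assignment's implementation may instead interpolate between precomputed front/side/back embeddings.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg, overhead_thresh=60.0):
    """Append a DreamFusion-style view descriptor based on the camera angles (assumed thresholds)."""
    if elevation_deg > overhead_thresh:
        return f"{prompt}, overhead view"
    az = azimuth_deg % 360.0                 # wrap into [0, 360)
    if az < 45.0 or az >= 315.0:
        view = "front view"
    elif 135.0 <= az < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"
```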

RGB

Without view-dependent embedding | With view-dependent embedding

Depth

Without view-dependent embedding | With view-dependent embedding

"a tv monitor"

In this case, incorporating view dependence didn’t alleviate the multi-face artifact. The monitor still resembles a lamp, because none of the viewpoints sufficiently capture its flat surface.

RGB

Without view-dependent embedding | With view-dependent embedding

Depth

Without view-dependent embedding | With view-dependent embedding