Sagar Chandrashekhar Bellad | Andrew ID: sbellad
I referred to online documentation and StackOverflow, and used GPT, not to copy code blindly or directly, but to understand concepts or code that were new to me or that I did not fully understand.
The following learning rates produced the best performance after experimenting with multiple configurations:
| Parameter | Learning Rate | Description |
|---|---|---|
| pre_act_opacities | 0.0007 | Controls transparency; smaller value stabilizes alpha updates. |
| pre_act_scales | 0.010 | Determines Gaussian size; moderate learning rate for smooth shape growth. |
| colours | 0.020 | Controls RGB appearance; slightly higher rate accelerates color convergence. |
| means | 0.0003 | Updates 3D positions; small rate prevents instability in geometry. |
Among the tested configurations, this learning rate set produced the most stable convergence and visually accurate reconstruction. Opacity and mean updates benefited from lower learning rates to prevent flickering or instability, while higher rates for colors and scales accelerated appearance fitting.
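As a rough illustration of how these per-parameter rates could be wired up, the sketch below builds one optimizer group per parameter. The helper `make_param_groups` and the `params` dictionary are hypothetical names, not taken from the assignment code; the only assumption about the training framework is that its optimizer accepts a list of dictionaries with `params` and `lr` keys, as `torch.optim.Adam` does.

```python
# Learning rates copied from the table above.
LEARNING_RATES = {
    "pre_act_opacities": 0.0007,
    "pre_act_scales": 0.010,
    "colours": 0.020,
    "means": 0.0003,
}


def make_param_groups(params, learning_rates=LEARNING_RATES):
    """Build one optimizer group per named parameter (hypothetical helper).

    `params` maps a parameter name to its tensor (or any placeholder object).
    The returned list has the shape expected by optimizers that support
    per-parameter options, e.g. torch.optim.Adam(groups).
    """
    missing = set(learning_rates) - set(params)
    if missing:
        raise KeyError(f"missing parameters: {sorted(missing)}")
    return [
        {"params": [params[name]], "lr": lr, "name": name}
        for name, lr in learning_rates.items()
    ]
```

Keeping the rates in one dictionary makes it easy to swap in another configuration when comparing runs, since only `LEARNING_RATES` changes between experiments.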
Observations and Differences:
Explanation: Spherical harmonics let the color change with the viewing direction. Without them, the color stays fixed and looks flat. With them, lighting effects like reflections and shading are captured, making the object look more natural and detailed.
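To make the view dependence concrete, here is a minimal sketch that evaluates a colour from spherical harmonic coefficients up to degree 1. The function name `sh_to_rgb`, the coefficient layout, the sign convention of the degree-1 terms, and the `+ 0.5` offset with clamping are assumptions modelled on common Gaussian splatting implementations, not necessarily what the assignment code does.

```python
import math

C0 = 0.28209479177387814  # degree-0 real SH constant, 1 / (2 * sqrt(pi))
C1 = 0.4886025119029199   # degree-1 real SH constant, sqrt(3 / (4 * pi))


def sh_to_rgb(sh_coeffs, view_dir):
    """Evaluate an RGB colour for a viewing direction (illustrative sketch).

    sh_coeffs: list of [r, g, b] coefficients, one entry per SH basis
               function (1 entry = degree 0 only, 4 entries = up to degree 1).
    view_dir:  3-vector pointing from the camera towards the Gaussian.
    """
    norm = math.sqrt(sum(c * c for c in view_dir))
    x, y, z = (c / norm for c in view_dir)

    # The degree-0 basis is constant, so it ignores the viewing direction.
    basis = [C0]
    if len(sh_coeffs) >= 4:
        # The degree-1 basis varies linearly with the viewing direction.
        basis += [-C1 * y, C1 * z, -C1 * x]

    rgb = [
        sum(b * coeff[ch] for b, coeff in zip(basis, sh_coeffs))
        for ch in range(3)
    ]
    # Assumed convention: shift by 0.5 and clamp to the valid colour range.
    return [min(1.0, max(0.0, v + 0.5)) for v in rgb]
```

With only the degree-0 coefficient, `sh_to_rgb` returns the same colour for every direction, which matches the flat look described above; adding degree-1 coefficients makes the result depend on `view_dir`.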
Disclaimer: All experiments follow the baseline setup from Question 1.2.2, with isotropic Gaussians and identical training parameters unless stated otherwise.
opacities: 0.0018
scales: 0.0015
colours: 0.002
means: 0.001
opacities: 0.00085
scales: 0.01
colours: 0.02
means: 0.00015
| Setup | Gaussian Type | PSNR | SSIM |
|---|---|---|---|
| Baseline | Isotropic | 16.949 | 0.639 |
| Improved | Anisotropic | 28.586 | 0.934 |
The improved setup switches from isotropic to anisotropic Gaussians, so each Gaussian can stretch along different directions in 3D space, which improves surface fidelity and material detail reconstruction. Additionally, learning rates were fine-tuned to balance colour and scale updates, preventing over-smoothing in early iterations. Together, these changes raised PSNR by about 11.6 dB (16.949 to 28.586) and SSIM by about 0.295 (0.639 to 0.934).
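The difference between the two Gaussian types can be sketched through the covariance matrix, which is commonly factored as Σ = R S Sᵀ Rᵀ with a rotation R and a diagonal scale matrix S. The code below is a plain-Python illustration under that assumption; `quat_to_rotmat` and `covariance` are hypothetical helper names and the example scale values are made up for demonstration.

```python
import math


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(scales, quat=(1.0, 0.0, 0.0, 0.0)):
    """Return Sigma = R S S^T R^T, where S = diag(scales)."""
    R = quat_to_rotmat(quat)
    # Entry (i, j) expands to sum_k R[i][k] * scales[k]^2 * R[j][k].
    return [
        [sum(R[i][k] * scales[k] ** 2 * R[j][k] for k in range(3))
         for j in range(3)]
        for i in range(3)
    ]


# Isotropic: one shared scale, so the covariance is a multiple of identity.
iso = covariance((0.1, 0.1, 0.1))

# Anisotropic: per-axis scales plus a 45-degree rotation about the z axis
# (illustrative values only).
half_angle = math.pi / 8
aniso = covariance((0.3, 0.1, 0.02),
                   quat=(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle)))
```

The isotropic covariance stays spherical no matter how it is rotated, while the anisotropic one gains off-diagonal terms, which is what lets a single Gaussian align with a thin or elongated surface.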
Additional Observation: With view-dependent text embeddings, the results look brighter and more consistent across different views. For example, in the potted plant scene, the leaves appear slightly disconnected from the pot without view-dependence, but with it, the geometry and colors stay aligned and look much more natural.