by ylchen
The following GIF shows the rendered views of the pre-trained 3D Gaussians (output of render.py).
q1_render.gif

Training progress and final render GIFs produced by train.py are shown below.
q1_training_progress.gif
q1_training_final_renders.gif

Comparison between view-independent (DC only) and full spherical harmonics renderings.
Explanation of differences: the full SH rendering adds view-dependent lighting, so the cushion shows additional shading that conveys its depth and shape. The DC-only render looks flatter because its color is independent of the viewing direction, with no shading variation.
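For reference, a minimal sketch of how the two renderings differ at the color-evaluation step. The function and the SH coefficient layout are assumptions for illustration (not the exact code in render.py), and only the degree-0 and degree-1 bands are shown.

```python
import torch

SH_C0 = 0.28209479177387814  # degree-0 (constant) SH basis coefficient
SH_C1 = 0.4886025119029199   # degree-1 SH basis coefficient

def eval_colour(sh_coeffs, view_dirs, use_view_dependence=True):
    """Evaluate per-Gaussian colour from SH coefficients.

    sh_coeffs: (N, K, 3) SH coefficients per Gaussian (K >= 4 assumed here).
    view_dirs: (N, 3) unit vectors from the camera centre to each Gaussian.
    With use_view_dependence=False only the DC term is used, which is what
    the view-independent rendering above corresponds to.
    """
    # DC term: a constant colour, independent of the viewing direction.
    colour = SH_C0 * sh_coeffs[:, 0]
    if use_view_dependence:
        # Degree-1 terms add a linear dependence on the view direction,
        # which produces the extra shading seen in the full-SH render.
        x, y, z = view_dirs[:, 0:1], view_dirs[:, 1:2], view_dirs[:, 2:3]
        colour = (colour
                  - SH_C1 * y * sh_coeffs[:, 1]
                  + SH_C1 * z * sh_coeffs[:, 2]
                  - SH_C1 * x * sh_coeffs[:, 3])
    return (colour + 0.5).clamp(0.0, 1.0)
```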
Each prompt is optimized with and without guidance. Below are the prompt-image pairs for four examples (including the two provided prompts).
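The optimization itself is an SDS update on image latents; a minimal sketch with and without classifier-free guidance follows. The unet call signature, timestep range, and guidance scale here are assumptions for illustration, not the exact assignment code.

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, text_emb, uncond_emb, unet, alphas_cumprod,
             guidance_scale=100.0, use_guidance=True):
    """One SDS step on image latents (a sketch; unet and alphas_cumprod stand
    in for the actual Stable Diffusion components used in the assignment)."""
    # Sample a diffusion timestep and noise the latents accordingly.
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    alpha_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = alpha_t.sqrt() * latents + (1 - alpha_t).sqrt() * noise

    with torch.no_grad():
        if use_guidance:
            # Classifier-free guidance: combine conditional and unconditional
            # noise predictions.
            noise_pred_uncond = unet(noisy, t, uncond_emb)
            noise_pred_text = unet(noisy, t, text_emb)
            noise_pred = noise_pred_uncond + guidance_scale * (
                noise_pred_text - noise_pred_uncond)
        else:
            noise_pred = unet(noisy, t, text_emb)

    # SDS gradient: w(t) * (eps_pred - eps), backpropagated to the latents
    # via the usual detach trick.
    w = 1 - alpha_t
    grad = w * (noise_pred - noise)
    target = (latents - grad).detach()
    return 0.5 * F.mse_loss(latents, target, reduction="sum")
```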
Final textured meshes for two different text prompts using SDS loss on a fixed cow mesh.
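A rough sketch of the texture-only loop this corresponds to: the cow mesh geometry stays fixed and only the texture tensor receives gradients. The render, encode, and loss callables are placeholders for the assignment's differentiable renderer and the SDS loss sketched above.

```python
import torch

def optimize_texture(render_fn, encode_fn, sds_loss_fn, texture,
                     n_iters=2000, lr=1e-2):
    """Optimize only the mesh texture with SDS. render_fn should render the
    fixed mesh with the given texture from a (random) viewpoint, and encode_fn
    should map the image to the diffusion model's latent space; both are
    placeholders, not the assignment's exact API."""
    texture = texture.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([texture], lr=lr)
    for _ in range(n_iters):
        image = render_fn(texture)      # differentiable render of the fixed cow mesh
        latents = encode_fn(image)      # e.g. Stable Diffusion VAE encoding
        loss = sds_loss_fn(latents)     # SDS loss w.r.t. the text prompt
        optimizer.zero_grad()
        loss.backward()                 # gradients flow only into the texture
        optimizer.step()
    return texture.detach()
```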
Rendered RGB and depth videos for three prompts (one “standing corgi dog” and two custom ones).
Tuned λ_entropy=3e-3, λ_orient=1e-2, and latent_iter_ratio=0.25.
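For context, a rough sketch of what the two loss weights control, following common DreamFusion-style regularizers; the exact forms in the assignment code may differ. latent_iter_ratio (assumed here to follow the usual convention) simply sets the fraction of early iterations optimized in latent space rather than in RGB, so it does not appear in the loss below.

```python
import torch

LAMBDA_ENTROPY = 3e-3     # weight on the opacity-entropy regularizer
LAMBDA_ORIENT = 1e-2      # weight on the normal-orientation regularizer
LATENT_ITER_RATIO = 0.25  # fraction of early iterations optimized in latent space

def regularizers(weights, normals, view_dirs):
    """weights: (R, S) per-sample rendering weights along each ray,
    normals: (R, S, 3) predicted normals, view_dirs: (R, 3) ray directions."""
    # Opacity entropy: pushes per-ray accumulated alpha towards 0 or 1,
    # discouraging semi-transparent "fog".
    alpha = weights.sum(dim=-1).clamp(1e-5, 1 - 1e-5)
    loss_entropy = -(alpha * alpha.log() + (1 - alpha) * (1 - alpha).log()).mean()

    # Orientation loss: penalizes normals that face away from the camera,
    # weighted by the (detached) rendering weights.
    n_dot_v = (normals * view_dirs[:, None, :]).sum(dim=-1)
    loss_orient = (weights.detach() * n_dot_v.clamp(min=0.0) ** 2).mean()

    return LAMBDA_ENTROPY * loss_entropy + LAMBDA_ORIENT * loss_orient
```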
Comparison between standard and view-dependent conditioning for two prompts. View-dependent embeddings improve 3D consistency.
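A rough sketch of how view-dependent conditioning can pick a prompt suffix from the camera pose; the angle buckets and suffix strings are assumptions for illustration, not the exact assignment code.

```python
def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
    """Append a direction hint to the text prompt based on the camera pose."""
    if elevation_deg > 60:
        suffix = "overhead view"
    elif abs(azimuth_deg) < 45:
        suffix = "front view"
    elif abs(azimuth_deg) > 135:
        suffix = "back view"
    else:
        suffix = "side view"
    return f"{prompt}, {suffix}"

# Example: the text embedding is then computed from the augmented prompt.
print(view_dependent_prompt("a standing corgi dog", azimuth_deg=170, elevation_deg=10))
# -> "a standing corgi dog, back view"
```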
Some individual view-dependent examples are shown below.