We overfit a single object for each representation to verify losses and rendering. Each row shows before training, after training, and target.
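A minimal sketch of this single-object fitting loop (hypothetical helper; `loss_fn` stands in for whichever loss matches the representation, and tensor shapes are illustrative):

```python
import torch

def fit_single_object(src, tgt, loss_fn, n_iter=10000, lr=1e-4):
    # Directly optimize the source representation (e.g. a (1, 32, 32, 32) voxel
    # grid or a (1, N, 3) point cloud) against the ground-truth target.
    src = src.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([src], lr=lr)
    for _ in range(n_iter):
        optimizer.zero_grad()
        loss = loss_fn(src, tgt)
        loss.backward()
        optimizer.step()
    return src.detach()
```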



fit_data.py (voxel).


fit_data.py (point cloud).


fit_data.py (mesh).

We implement single-view 3D reconstruction following the course starter code, train voxel / point / mesh decoders on ShapeNet R2N2, and evaluate with F1 on sampled points. We add quality-of-life fixes (robust rendering and GIF export on GPU), ablations (encoder depth, learning rate, sampling), model interpretation (loss dynamics and normals-colored surfaces), and extended-dataset training (chair vs. chair+car+plane).

    python train_model.py --type 'point' --device cuda --arch resnet18 \
        --batch_size 16 --max_iter 1000
    python eval_model.py --type 'point' --device cuda --load_checkpoint

To save GIFs reliably, we cast renderer outputs to uint8:

    # Clamp to [0, 1], convert to uint8, and collect one frame per view.
    rend = renderer(mesh, cameras=cams, lights=lights).cpu().numpy()[0, ..., :3]
    rend = (np.clip(rend, 0, 1) * 255).astype(np.uint8)
    imgs.append(rend)
    imageio.mimsave(output_path, imgs, fps=18)

We implement voxel prediction and marching-cubes visualization. Rendering uses PyTorch3D and creates rotating GIFs.
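A minimal sketch of the voxel loss and the marching-cubes conversion used for visualization, assuming predicted occupancy logits, a binary target grid, and the PyMCubes package (names like `voxels_to_mesh` are ours, not necessarily the starter code's):

```python
import mcubes
import numpy as np
import torch
import torch.nn.functional as F
from pytorch3d.structures import Meshes

def voxel_loss(voxel_logits, voxel_gt):
    # Binary cross-entropy between predicted occupancy logits and 0/1 targets.
    return F.binary_cross_entropy_with_logits(voxel_logits, voxel_gt.float())

def voxels_to_mesh(voxel_probs, isovalue=0.5, device="cuda"):
    # Run marching cubes on a (D, H, W) occupancy grid and wrap the result
    # as a PyTorch3D mesh for rendering the rotating GIFs.
    verts, faces = mcubes.marching_cubes(voxel_probs.detach().cpu().numpy(), isovalue)
    verts = torch.tensor(verts, dtype=torch.float32, device=device)
    faces = torch.tensor(faces.astype(np.int64), device=device)
    return Meshes(verts=[verts], faces=[faces])
```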
We implement chamfer loss to fit the point cloud.
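A sketch of the chamfer term, assuming batched (B, N, 3) point clouds and PyTorch3D's `knn_points` for nearest-neighbor search (the exact formulation, e.g. sum vs. mean, may differ in the starter code):

```python
from pytorch3d.ops import knn_points

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # Squared distance from each source point to its nearest target point,
    # and vice versa; the symmetric mean is the chamfer loss.
    d_src = knn_points(point_cloud_src, point_cloud_tgt, K=1).dists  # (B, N, 1)
    d_tgt = knn_points(point_cloud_tgt, point_cloud_src, K=1).dists  # (B, M, 1)
    return d_src.mean() + d_tgt.mean()
```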
We compute Precision/Recall/F1 vs radius thresholds using k-NN on sampled points. The script saves a plot per representation.
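A sketch of the metric, assuming precision is the fraction of predicted points within the radius of some GT point, recall the reverse, and F1 their harmonic mean, reported in % as in the tables below (the course script may differ in details):

```python
from pytorch3d.ops import knn_points

def f1_scores(points_pred, points_gt, thresholds=(0.01, 0.02, 0.03, 0.04, 0.05)):
    # Euclidean distance from each predicted point to its nearest GT point, and vice versa.
    d_pred = knn_points(points_pred, points_gt, K=1).dists.sqrt()
    d_gt = knn_points(points_gt, points_pred, K=1).dists.sqrt()
    scores = {}
    for t in thresholds:
        precision = 100.0 * (d_pred < t).float().mean()
        recall = 100.0 * (d_gt < t).float().mean()
        scores[t] = (2 * precision * recall / (precision + recall + 1e-8)).item()
    return scores
```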



Short runs (~1k iterations) on the point representation.
| Backbone | F1@0.05 | Time (m:ss) |
|---|---|---|
| ResNet-18 | 0.55 | 1:10 |
| ResNet-34 | 0.58 | 1:52 |

We investigated the effect of encoder depth on single-view 3D reconstruction using the point-cloud representation. Two variants of the SingleViewto3D network, with ResNet-18 and ResNet-34 encoders, were trained for 1000 iterations on the ShapeNet-R2N2 dataset.

Observation: The deeper ResNet-34 backbone yields a +3 point improvement in F1@0.05 over ResNet-18, indicating slightly better geometric fidelity, but runtime increases by ~1.6×, suggesting diminishing returns for the substantially higher compute cost. Qualitative reconstructions (GIFs) show denser and smoother chair structures for ResNet-34, whereas ResNet-18 outputs remain somewhat sparse.

Conclusion: A modest performance gain is achievable with a deeper encoder, but for fast experimentation or resource-constrained settings, ResNet-18 offers a better speed-accuracy trade-off.
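The backbone swap amounts to changing the image encoder; a hypothetical sketch (the actual SingleViewto3D wiring may differ) is:

```python
import torch
from torchvision import models

def build_image_encoder(arch="resnet18"):
    # Pick the ResNet depth and expose the 512-d pooled feature as the latent code.
    backbone = {"resnet18": models.resnet18, "resnet34": models.resnet34}[arch]()
    backbone.fc = torch.nn.Identity()
    return backbone

encoder = build_image_encoder("resnet34")
feat = encoder(torch.randn(1, 3, 137, 137))  # ShapeNet R2N2 renders are 137x137
print(feat.shape)                            # torch.Size([1, 512])
```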
| LR | F1@0.05 (%) | Time (m:ss) |
|---|---|---|
| 2e-4 | 63.59 | 1:52 |
| 4e-4 | 54.30 | 1:51 |
| 8e-4 | 14.12 | 1:51 |

We investigated how the learning rate affects training stability and reconstruction quality for the point-cloud representation, using a ResNet-18 backbone over 1000 iterations.

Observations: The best performance is achieved at LR = 2e-4, suggesting that a slightly smaller LR helps smoother convergence in the early phase. LR = 4e-4 (the baseline) gives stable but slightly worse F1, while LR = 8e-4 leads to unstable training and significantly degraded performance (possible divergence). Training time is roughly constant across LRs (~1:50), so F1 differences reflect optimization quality, not compute time.

Conclusion: The model shows moderate sensitivity to the learning rate. A smaller LR (2e-4) yields the best early-stage reconstruction quality, while overly aggressive updates (8e-4) harm performance. For stable training on point clouds, an LR in the range [2e-4, 4e-4] is recommended.
Interpretation: Chamfer loss drops rapidly in the first few hundred iterations as the model learns coarse shape alignment, then plateaus as the geometry converges. The smoothness loss oscillates more due to local curvature updates but stabilizes gradually, reflecting smoother surface formation. The joint trend shows the network balancing accuracy (Chamfer) against regularity (smoothness).

Takeaway: Early optimization is dominated by Chamfer alignment; the smoothness term prevents overfitting to noisy vertices and improves surface consistency.
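A sketch of the combined mesh objective interpreted here, reusing the `chamfer_loss` above together with PyTorch3D's Laplacian regularizer (the smoothness weight and point count are assumptions, not the exact training values):

```python
from pytorch3d.loss import mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_loss(mesh_pred, mesh_gt, w_smooth=0.1, n_points=5000):
    # Chamfer term on points sampled from both surfaces, plus a Laplacian
    # smoothness regularizer on the predicted mesh.
    pts_pred = sample_points_from_meshes(mesh_pred, n_points)
    pts_gt = sample_points_from_meshes(mesh_gt, n_points)
    return chamfer_loss(pts_pred, pts_gt) + w_smooth * mesh_laplacian_smoothing(mesh_pred)
```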
To inspect surface quality, we colorize vertices by their surface normals (RGB ∈ [0, 1]): smooth color gradients indicate coherent normals, while noisy patches indicate unstable geometry. At early steps (e.g., 250), fragmented colors indicate noisy, unstable normals. By step 500, larger coherent color bands emerge across the seat and back surfaces, evidence of smoother local geometry. Ground-truth meshes show clean, continuous gradients, representing ideal surface normals.

Conclusion: Normals visualizations reveal local geometric consistency that loss curves alone cannot show: the smoothness regularizer improves normal coherence, and normals-colored renders serve as qualitative evidence of surface refinement during training.
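A sketch of the normals-to-color mapping (our naming; assumes a PyTorch3D `Meshes` input):

```python
from pytorch3d.renderer import TexturesVertex
from pytorch3d.structures import Meshes

def colorize_by_normals(mesh: Meshes) -> Meshes:
    # Shift per-vertex unit normals from [-1, 1] into RGB in [0, 1] and attach
    # them as vertex colors, so renders directly show normal coherence.
    colors = [(n + 1.0) / 2.0 for n in mesh.verts_normals_list()]
    return Meshes(
        verts=mesh.verts_list(),
        faces=mesh.faces_list(),
        textures=TexturesVertex(verts_features=colors),
    )
```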
We switch the split file to split_3c.json and retrain. Below are the plots on the chair test set.

Multi-class training improves generalization on chairs (stronger priors; more diverse geometry).

Setup: We trained the voxel reconstruction model twice:

- Single-class training: chairs only (6,780 samples)
- Three-class training: chair (6,780), car (3,680), airplane (4,050), totaling 14,510 samples

All runs used the same network architecture, optimizer, and training schedule for a fair comparison.

Quantitative results:

Threshold sweep: F1 scores across the evaluation thresholds:

| Threshold | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 |
|---|---|---|---|---|---|
| F1, chair only | 7.9 | 25.9 | 43.2 | 57.6 | 61.0 |
| F1, chair + car + plane | 8.0 | 26.4 | 44.0 | 57.7 | 68.0 |

Qualitative observations: Three-class training led to better shape consistency and clearer geometry across unseen chair examples; the network likely benefited from shared priors across classes (e.g., symmetry, planar parts). The single-class model showed occasional artifacts or missing volumes due to overfitting on one category.

Analysis:

- Generalization: the multi-class model learns richer representations; exposure to diverse shapes improves the latent features.
- Category confusion: there is a slight risk when classes overlap (e.g., chairs vs. cars), but the voxel task handled the separation well.
- F1 improvement: +7 points at the 0.05 threshold (61.0 → 68.0) suggests positive transfer from other classes.

Conclusion: Training on multiple classes yields more robust and generalizable 3D reconstructions, with higher F1 scores and visually consistent outputs. The shared structural features across categories improve voxel prediction quality without degrading per-class performance.
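A minimal matplotlib sketch for plotting this sweep (the filename is illustrative, not necessarily how the original plots were produced):

```python
import matplotlib.pyplot as plt

thresholds = [0.01, 0.02, 0.03, 0.04, 0.05]
f1_single = [7.9, 25.9, 43.2, 57.6, 61.0]   # chair only
f1_multi = [8.0, 26.4, 44.0, 57.7, 68.0]    # chair + car + plane

plt.plot(thresholds, f1_single, marker="o", label="chair only")
plt.plot(thresholds, f1_multi, marker="o", label="chair + car + plane")
plt.xlabel("F1 distance threshold")
plt.ylabel("F1 score (%)")
plt.legend()
plt.savefig("eval_vox_3class.png")
```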


Keep --n_points consistent between train and eval for point models; add --eval_n_points to vary the evaluation-time sampling only, as sketched below.
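A minimal sketch of the evaluation-side change, assuming the GT points are sampled from the mesh with PyTorch3D (the flag name and plumbing are hypothetical):

```python
from pytorch3d.ops import sample_points_from_meshes

def get_eval_points(mesh_gt, args):
    # Use --eval_n_points when provided, otherwise fall back to the training --n_points.
    n = getattr(args, "eval_n_points", None) or args.n_points
    return sample_points_from_meshes(mesh_gt, n)
```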