1.1. Fitting a voxel grid
1.2. Fitting a point cloud
1.3. Fitting a mesh
2.1. Image to voxel grid
2.2. Image to point cloud
2.3. Image to mesh
2.4. Quantitative comparisons
Across all thresholds, the point cloud model achieves the highest F1, the mesh model is a close second, and the voxel grid lags behind (e.g., at τ=0.05: point ≈80, mesh ≈73, voxel ≈62; at the stricter τ=0.03: point ≈50, mesh ≈46, voxel ≈38). This ordering is intuitive: voxel grids quantize space (e.g., 32³), so surfaces become stair-stepped and sensitive to the chosen isovalue, which depresses precision and recall at tight thresholds and only improves when the tolerance is relaxed. Point clouds, by contrast, predict surface samples directly and are trained with pointwise distances, avoiding discretization and connectivity constraints; they capture fine geometry that scores best under pointwise F1. Meshes start from a coarse prior (e.g., an ico-sphere) and typically include smoothness regularization, yielding clean, watertight shapes but slightly washing out thin details and concavities; hence they trail point clouds at small thresholds while narrowing the gap as thresholds increase.
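The pointwise F1 behind these numbers can be sketched as follows. This is a minimal NumPy version under my own assumptions: the function name and the 0–100 scaling are illustrative choices made to match the scores quoted above, and the brute-force pairwise distance matrix is only practical for modestly sized clouds.

```python
import numpy as np

def f1_at_threshold(pred, gt, tau):
    """Pointwise F1 at tolerance tau, on a 0-100 scale.

    precision = fraction of predicted points within tau of some GT point,
    recall    = fraction of GT points within tau of some predicted point.
    """
    # Pairwise distances, shape (N_pred, N_gt); fine for small sample sets.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()
    recall = (d.min(axis=0) < tau).mean()
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Evaluating this at several τ values (0.01–0.05) on points sampled from the predicted and ground-truth surfaces reproduces the threshold sweep discussed above.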
2.5. Analyse effects of hyperparameter variations
I varied the mesh smoothness weight w_smooth (Laplacian regularization) while keeping everything else fixed, sweeping 0.0 → 0.05 → 0.10 → 0.30 → 0.50. The trend is clear: F1 at strict thresholds (τ=0.01–0.03) peaks around w_smooth ≈ 0.05–0.10 and degrades on either side, while looser thresholds (τ=0.04–0.05) are comparatively stable. Intuitively, with too little smoothing (0.0) the network overfits local noise and produces crinkled surfaces; this helps recall on tiny features but hurts precision, lowering F1 at small τ. With moderate smoothing (0.05–0.10) the surface regularizes just enough to remove chatter while preserving salient high-curvature detail, giving the best F1 across τ, especially at τ=0.03 where small geometric errors matter most. With heavy smoothing (0.30–0.50), fine structures and concavities wash out; the surface becomes overly low-frequency, improving watertightness and recall but penalizing precision at tight τ, so F1 drops. Qualitatively this is visible in vertex-displacement heatmaps: as w_smooth increases, error concentrates into fewer, larger regions and thin elements fade. Overall, w_smooth ≈ 0.1 is a robust default: it balances denoising with feature retention, maximizing F1 where evaluation is strictest.
2.6. Interpret your model (Mesh deformation heatmap)
The mesh deformation heatmap provides an interpretable, side-by-side assessment of reconstruction quality. By displaying the input image alongside synchronized turntables of the ground truth and the prediction, both colored with a shared red→blue error map, it localizes deviations on the surface rather than obscuring them in aggregate metrics. The consistent colormap and synchronized views make differences in thin structures, edges, and concavities immediately apparent across azimuths. Systematic failure modes (e.g., oversmoothing from regularization, missed parts due to weak image cues, or scale/alignment drift) therefore become straightforward to diagnose. The result is a practical tool for directing next steps (e.g., tuning smoothness, adjusting isosurface thresholds, increasing sampling density, or augmenting the dataset) while also strengthening the clarity and credibility of qualitative results in reports and reviews.
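The coloring step of such a heatmap can be sketched as follows: compute each vertex's nearest distance to the ground-truth surface samples, then interpolate linearly from blue (low error) to red (high error), clipping at a shared tolerance so the two turntables stay comparable. This is a minimal NumPy sketch; the function name, the linear two-color ramp, and the default τ are my own illustrative choices.

```python
import numpy as np

def error_colors(verts, gt_points, tau=0.05):
    """Per-vertex RGB colors: blue = on-surface, red = error >= tau."""
    # Nearest-neighbor distance from each vertex to the GT samples.
    d = np.linalg.norm(verts[:, None, :] - gt_points[None, :, :], axis=-1).min(axis=1)
    t = np.clip(d / tau, 0.0, 1.0)[:, None]      # normalized error in [0, 1]
    blue = np.array([0.0, 0.0, 1.0])
    red = np.array([1.0, 0.0, 0.0])
    return (1.0 - t) * blue + t * red            # (V, 3) RGB in [0, 1]
```

Passing the same τ when coloring both the ground-truth and predicted meshes is what makes the shared colormap meaningful.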
3.3. Extended dataset for training
On the chair test set, the model trained on three classes (chair+car+plane) consistently outperforms the chair-only baseline across all thresholds: roughly 10 vs 5 at τ=0.01, 36 vs 25 at τ=0.02, 58 vs 46 at τ=0.03, 73 vs 63 at τ=0.04, and 81 vs 73 at τ=0.05. This strict dominance suggests that multi-class training acts as an effective regularizer, yielding a broader, transferable shape prior (e.g., symmetry and part cues) that boosts reconstruction fidelity without sacrificing chair detail. Qualitatively, red→blue surface-error heatmaps show fewer residuals on thin legs, backrest edges, and concavities for the 3-class model, indicating better recall of fine structures and more stable predictions across viewpoints. Overall, exposing the network to diverse categories strengthens generalization and improves chair performance rather than diluting it.
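The multi-class training setup can be sketched as pooling the class-specific datasets into one shuffled stream so each batch mixes categories. This is a minimal pure-Python illustration under my own assumptions (the function name, dict-of-lists layout, and fixed seed are hypothetical); in practice a framework's dataset-concatenation utility and data loader would play this role.

```python
import random

def make_multiclass_batches(datasets, batch_size=32, seed=0):
    """Pool samples from several class-specific datasets (e.g. chair, car,
    plane) into one shuffled list of (class, sample) batches."""
    pooled = [(cls, sample)
              for cls, data in datasets.items()
              for sample in data]
    rng = random.Random(seed)        # fixed seed for reproducible shuffling
    rng.shuffle(pooled)
    return [pooled[i:i + batch_size] for i in range(0, len(pooled), batch_size)]
```

Because batches are drawn from the pooled stream, gradient updates see all three categories interleaved, which is the mechanism behind the regularization effect described above.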