Andrew ID: kpullala
F1 score comparison across different 3D representations.
The F1-score comparison demonstrates that the model achieved nearly identical high performance (approaching an F1-score of 80) regardless of whether the output was represented as a Voxel, Point Cloud, or Mesh. Across all plots, the F1-score rises consistently with the distance threshold, indicating robust model learning: the shapes' complexity does not critically stress the expressive power of any one format.
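For context, a minimal sketch of how a point-cloud F1 score at a distance threshold can be computed (assuming predictions and ground truth are compared as point sets; `tau` is an illustrative threshold, and multiplying by 100 gives the percentage scale reported above):

```python
import torch

def f1_at_threshold(pred: torch.Tensor, gt: torch.Tensor, tau: float = 0.05):
    """F1 between point clouds pred (N, 3) and gt (M, 3) at threshold tau."""
    d = torch.cdist(pred, gt)  # (N, M) pairwise Euclidean distances
    precision = (d.min(dim=1).values < tau).float().mean()  # pred points near gt
    recall = (d.min(dim=0).values < tau).float().mean()     # gt points near pred
    return 2 * precision * recall / (precision + recall + 1e-8)
```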
Efficiency and Scalability
While the final metric is similar, the Point Cloud representation offers distinct practical advantages. It is the most computationally efficient and scalable, as its memory and time complexity scale linearly with the number of points (O(N)). This contrasts sharply with the Voxel representation, which scales cubically with grid resolution (O(N³)), making high-resolution reconstructions infeasible due to extreme memory demands.
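To make the scaling gap concrete, a back-of-the-envelope memory comparison (illustrative sizes, not the assignment's actual configurations):

```python
# Illustrative memory footprints at float32 (4 bytes per value).
voxels_32 = 32 ** 3 * 4        # 32^3 occupancy grid  -> ~131 KB
voxels_256 = 256 ** 3 * 4      # 256^3 occupancy grid -> ~67 MB
points_2000 = 2000 * 3 * 4     # 2,000 xyz points     -> ~24 KB
print(voxels_32, voxels_256, points_2000)
```

Doubling a voxel grid's resolution multiplies its memory eightfold, while doubling a point cloud's density merely doubles it.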
Point Clouds also excel at detail capture thanks to their adaptive, non-grid-bound nature, and they support effective training via differentiable losses such as the Chamfer Distance. The Mesh representation, while offering explicit topology, requires more complex training pipelines and loss functions.
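For reference, a minimal sketch of a symmetric squared Chamfer Distance in PyTorch (a brute-force version for illustration, not the assignment's batched implementation):

```python
import torch

def chamfer_distance(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Symmetric squared Chamfer Distance between clouds pred (N, 3) and gt (M, 3)."""
    dists = torch.cdist(pred, gt) ** 2  # (N, M) squared pairwise distances
    # Average nearest-neighbor distance in both directions for symmetry.
    return dists.min(dim=1).values.mean() + dists.min(dim=0).values.mean()
```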
Effects of varying hyperparameters on model performance.
Methodology: I varied the point cloud density by adjusting the number of points predicted by the model, from 1,000 initially to 2,000.
At a density of 2,000 points, the F1 score increased by 1.5% over the 1,000-point configuration. My analysis suggests that higher point cloud density allows better representation of complex geometries, leading to improved model performance. However, in this case I could not train for the same number of steps due to the increased computational requirements.
Methodology: I varied the point cloud density by adjusting the number of points as above, and also experimented with the loss weights: I set the Chamfer Distance weight to 0.8 and increased the smoothness weight to 0.4.
As before, increasing the number of points slightly improved the F1 score, indicating better model performance with higher point cloud density. Changing the Chamfer Distance weight and smoothness weight, however, produced little improvement, suggesting that a small change in the Chamfer weight does not significantly impact overall performance. I plan to investigate further by varying these parameters more drastically.
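For reference, a minimal sketch of how such a weighted loss combination could be wired up (the variable names are illustrative, not the assignment code's actual identifiers; `chamfer_distance` is the sketch above, and `smoothness_term` stands in for whatever regularizer the weight applies to):

```python
chamfer_weight = 0.8  # Chamfer weight from the experiment above
smooth_weight = 0.4   # smoothness weight from the experiment above

def total_loss(pred_points, gt_points, smoothness_term):
    # Weighted sum of the Chamfer term and a smoothness regularizer.
    return (chamfer_weight * chamfer_distance(pred_points, gt_points)
            + smooth_weight * smoothness_term)
```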
Voxel model analysis
This visualization uses Monte Carlo Dropout to assess prediction uncertainty in a single-view 3D reconstruction task by comparing the model's state at 1,000 steps (under-trained) and 60,000 steps (well-trained).
Panels: 1,000 steps (under-trained, left) vs. 60,000 steps (well-trained, right).
The evolution from 1,000 to 60,000 steps demonstrates proper model calibration.
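For reference, a minimal sketch of the Monte Carlo Dropout procedure, assuming a voxel model with `nn.Dropout` layers (the model interface and sample count are assumptions for illustration):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, image: torch.Tensor, n_samples: int = 20):
    """Mean occupancy and per-voxel uncertainty via Monte Carlo Dropout."""
    model.eval()
    for m in model.modules():           # re-enable only the dropout layers,
        if isinstance(m, nn.Dropout):   # leaving batch norm in eval mode
            m.train()
    preds = torch.stack([torch.sigmoid(model(image)) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # prediction, uncertainty map
```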
Decoder that predicts occupancy values from 3D coordinates and image features.
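A minimal sketch of what such an occupancy decoder could look like (the layer sizes and feature-conditioning scheme are my assumptions, not the exact architecture used):

```python
import torch
import torch.nn as nn

class OccupancyDecoder(nn.Module):
    """Predicts occupancy in [0, 1] for query 3D points, given image features."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # xyz: (B, P, 3) query coordinates; feat: (B, feat_dim) image features.
        feat = feat.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return torch.sigmoid(self.mlp(torch.cat([xyz, feat], dim=-1)))
```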
Architecture:
1. Multi-patch MLP system with multiple independent decoders (e.g., 10 patches)
   - Each patch: 2D latent (u, v) → 4-layer MLP (512 units) → 3D coordinates (x, y, z)
   - Batch normalization and ReLU activations throughout
2. Trained on an airplane mesh from the ShapeNet dataset

Sampling Strategy:
- Random (u, v) coordinates sampled from a [0, 1] uniform distribution
- Points divided equally across all patch decoders

Training: Chamfer loss against target point cloud samples
Inference: Generate points by sampling random 2D coordinates through the trained decoders
Preprocessing: Point clouds centered and normalized to the unit sphere

This approach allows multiple surface patches to collectively represent complex 3D shapes such as an aircraft with wings and a fuselage.
Code can be found in `bonus_3_2.py`.
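For orientation, a condensed sketch of the multi-patch decoder described above (a simplified companion to `bonus_3_2.py`, not a copy of it; class names mirror the bullet points but are otherwise illustrative):

```python
import torch
import torch.nn as nn

class PatchDecoder(nn.Module):
    """One patch: maps a 2D latent (u, v) to 3D points via a 4-layer MLP."""
    def __init__(self, hidden: int = 512):
        super().__init__()
        layers, in_dim = [], 2
        for _ in range(3):  # three hidden layers + output layer = 4 linear layers
            layers += [nn.Linear(in_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(hidden, 3))  # final layer outputs (x, y, z)
        self.mlp = nn.Sequential(*layers)

    def forward(self, uv: torch.Tensor) -> torch.Tensor:
        return self.mlp(uv)

class MultiPatchDecoder(nn.Module):
    """Independent patch decoders that jointly cover the target surface."""
    def __init__(self, num_patches: int = 10):
        super().__init__()
        self.patches = nn.ModuleList(PatchDecoder() for _ in range(num_patches))

    def forward(self, num_points: int) -> torch.Tensor:
        # Split the point budget evenly; sample (u, v) ~ U[0, 1]^2 per patch.
        per_patch = num_points // len(self.patches)
        uv = torch.rand(len(self.patches), per_patch, 2)
        return torch.cat([p(uv_i) for p, uv_i in zip(self.patches, uv)], dim=0)
```

At training time the concatenated output would be compared against samples from the target point cloud with the Chamfer loss, as described above.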
| Training Setup | F1 Score (Chair) |
|---|---|
| Single Class (Chair) | 76.2% |
| Three Classes | 63.0% |
Quantitative Observations: The F1 score decreased for chairs but was very high for airplanes (about 86.0%). My understanding is that when we add more classes and train for a similar number of steps, the model may struggle to fit the data due to the increased complexity. Since chairs are harder to learn, the model clearly struggles to learn their representation in the same number of steps.
Loss Curve Observations: The loss curve for the multi-class model shows a more erratic pattern than the single-class model's, indicating that the multi-class model has difficulty converging. The single-class loss, by contrast, converges very quickly.
Qualitative Observations: The drop in F1 score is also reflected in the visualizations: the single-class model produces a more focused and accurate representation of chairs, while the multi-class model struggles with ambiguity and misclassification.
Conclusions: A diverse set of object classes requires more training data, and potentially a higher-capacity model, to achieve performance comparable to a model trained on a single class.