The HuBMAP Data Portal is the public face of the Human BioMolecular Atlas Program — the place where the data actually lands and where the broader research community can access it. This preprint describes the portal’s architecture, capabilities, and current scale.
As of October 2025, the portal holds 5,032 datasets spanning 22 data types across 27 organ classes from 310 donors. This isn't a static archive: it's a queryable, visualizable, analysis-ready resource.
What the Portal Does
The portal goes well beyond file storage. Key capabilities:
- Integrated Jupyter workspaces — run analysis directly against portal data without downloading it
- Interactive visualization for over 1,500 datasets
- Standardized processing pipelines — all data processed through consistent workflows so results are comparable across labs and technologies
- Metadata-driven search — find datasets by organ, donor demographics, assay type, or molecular target
- Community contributions — datasets from external labs can be deposited and made publicly available
- Bulk download for large-scale computational studies
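To make the metadata-driven search idea concrete, here is a minimal sketch of filtering dataset records by organ, assay type, and donor demographics. Everything in it is an assumption for illustration: the `DatasetRecord` fields, the `search` function, and the example records are hypothetical and do not reflect the portal's actual schema or API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    dataset_id: str
    organ: str
    assay_type: str
    donor_sex: str
    donor_age: int


def search(
    records: Iterable[DatasetRecord],
    organ: Optional[str] = None,
    assay_type: Optional[str] = None,
    donor_sex: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
) -> List[DatasetRecord]:
    """Return records matching every filter that is set (None means 'any')."""
    results = []
    for rec in records:
        # String filters compare case-insensitively.
        if organ is not None and rec.organ.lower() != organ.lower():
            continue
        if assay_type is not None and rec.assay_type.lower() != assay_type.lower():
            continue
        if donor_sex is not None and rec.donor_sex.lower() != donor_sex.lower():
            continue
        # Age bounds are inclusive.
        if min_age is not None and rec.donor_age < min_age:
            continue
        if max_age is not None and rec.donor_age > max_age:
            continue
        results.append(rec)
    return results


# Made-up example records, not real portal data.
CATALOG = [
    DatasetRecord("DS-0001", "Kidney", "bulk RNA-seq", "Female", 54),
    DatasetRecord("DS-0002", "Kidney", "multiplexed imaging", "Male", 37),
    DatasetRecord("DS-0003", "Spleen", "bulk RNA-seq", "Male", 61),
    DatasetRecord("DS-0004", "Kidney", "bulk RNA-seq", "Male", 29),
]

if __name__ == "__main__":
    hits = search(CATALOG, organ="kidney", assay_type="bulk RNA-seq")
    print([rec.dataset_id for rec in hits])  # ['DS-0001', 'DS-0004']
```

The point of the sketch is that filters combine with AND semantics and unset filters are ignored, which is roughly how faceted search over dataset metadata tends to behave. A real deployment would run such queries against a search index rather than an in-memory list.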
What Makes This Hard
Building a data portal for a program like HuBMAP is not a straightforward engineering problem. The data is heterogeneous, covering 22 distinct data types that range from bulk RNA-seq to multiplexed imaging to spatial transcriptomics, and the scale is substantial: more than 5,000 datasets from 310 donors. Those datasets are deposited by different labs using different instruments and protocols, so keeping them consistently processed and interoperable requires continuous pipeline development and infrastructure maintenance.
My work at the Pittsburgh Supercomputing Center contributes directly to that infrastructure. PSC is a core node of the HuBMAP consortium, and the computational resources and engineering that PSC provides underpin the portal’s ability to store, process, and serve data at this scale.