Publication
Authors
Morgan L. Turner, Thomas C. Smits, Tiffany S. Liaw, Brendan Honick, Bill Shirey, Ivan Cao-Berg, Alex Ropelewski, Joel Welling, Philip Blood, Jonathan Silverstein, Nils Gehlenborg, et al.
Journal
arXiv (2025)

The HuBMAP Data Portal is the public face of the Human BioMolecular Atlas Program — the place where the data actually lands and where the broader research community can access it. This preprint describes the portal’s architecture, capabilities, and current scale.

As of October 2025, the portal holds 5,032 datasets spanning 22 data types across 27 organ classes from 310 donors. It is not a static archive but a queryable, visualizable, analysis-ready resource.


What the Portal Does

The portal goes well beyond file storage. Key capabilities:

  • Integrated Jupyter workspaces — run analysis directly against portal data without downloading it
  • Interactive visualization for over 1,500 datasets
  • Standardized processing pipelines — all data processed through consistent workflows so results are comparable across labs and technologies
  • Metadata-driven search — find datasets by organ, donor demographics, assay type, or molecular target
  • Community contributions — datasets from external labs can be deposited and made publicly available
  • Bulk download for large-scale computational studies
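
To make the metadata-driven search idea concrete, here is a minimal sketch of filtering dataset records by organ, assay type, and donor demographics. It is an illustration only and does not use the portal's actual API or schema: the `DatasetRecord` fields, the `search` function, and the example records are all hypothetical names and values invented for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical, simplified metadata record (not the portal's real schema)."""
    dataset_id: str
    organ: str
    assay_type: str
    donor_age: int
    donor_sex: str


def search(
    records: Iterable[DatasetRecord],
    *,
    organ: Optional[str] = None,
    assay_type: Optional[str] = None,
    donor_sex: Optional[str] = None,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
) -> List[DatasetRecord]:
    """Return records matching every filter that was supplied.

    A filter left as None is ignored, so calling search(records) with no
    filters returns all records. String comparisons are case-insensitive.
    """
    def matches(r: DatasetRecord) -> bool:
        return (
            (organ is None or r.organ.lower() == organ.lower())
            and (assay_type is None or r.assay_type.lower() == assay_type.lower())
            and (donor_sex is None or r.donor_sex.lower() == donor_sex.lower())
            and (min_age is None or r.donor_age >= min_age)
            and (max_age is None or r.donor_age <= max_age)
        )

    return [r for r in records if matches(r)]


# Made-up example records for illustration; these are not real portal datasets.
EXAMPLE_RECORDS = [
    DatasetRecord("example-001", "Kidney", "bulk RNA-seq", 45, "Female"),
    DatasetRecord("example-002", "Kidney", "multiplexed imaging", 62, "Male"),
    DatasetRecord("example-003", "Lung", "spatial transcriptomics", 38, "Female"),
    DatasetRecord("example-004", "Kidney", "bulk RNA-seq", 71, "Male"),
]

if __name__ == "__main__":
    for record in search(EXAMPLE_RECORDS, organ="kidney", assay_type="bulk RNA-seq"):
        print(record.dataset_id)
```

In this sketch, filters combine with a logical AND, which matches the usual faceted-search behavior of narrowing results as more criteria are added. A real search service would run the query against an index rather than an in-memory list.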

What Makes This Hard

Building a data portal for a program like HuBMAP is not a straightforward engineering problem. The data is heterogeneous: 22 distinct data types, from bulk RNA-seq to multiplexed imaging to spatial transcriptomics, spread across more than 5,000 datasets. Ensuring that datasets deposited by different labs, using different instruments and protocols, are processed consistently and remain interoperable requires continuous pipeline development and infrastructure maintenance.

My work at the Pittsburgh Supercomputing Center contributes directly to that infrastructure. PSC is a core node of the HuBMAP consortium, and the computational resources and engineering that PSC provides underpin the portal’s ability to store, process, and serve data at this scale.