The NIH funds 18 Common Fund programs — each generating valuable datasets, each with its own data formats, metadata standards, and portals. Individually, they’re useful. Together, they could be transformative. The Common Fund Data Ecosystem (CFDE) was built to make that integration real.
This preprint describes the evolution, architecture, and practical outcomes of CFDE: a collaborative infrastructure that links Common Fund programs and makes their data findable, accessible, and reusable across program boundaries.
The Problem CFDE Solves
Common Fund programs span genomics, imaging, metabolomics, proteomics, and more. Each program was built independently, with its own conventions. Researchers wanting to ask questions that cross program boundaries — combining, say, HuBMAP spatial data with metabolomics from another program — face significant friction: different portals, different APIs, incompatible metadata.
CFDE addresses this by building shared infrastructure on top of the programs rather than replacing them.
What CFDE Provides
- Cross-program search and discovery — find datasets across all 18 Common Fund programs through a unified interface
- Integrated metadata — a common data model that links records across programs without forcing each program to abandon its own standards
- Integrative tools — cross-dataset queries and analysis workflows that span program boundaries
- Processed data access — not just raw files but standardized processed data for direct use
- Training programs — lowering the barrier to adoption for researchers unfamiliar with multi-program data
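The integrated-metadata idea can be pictured as a thin mapping layer. Each program keeps its native record, and a small adapter projects it onto a shared set of fields that cross-program search can query. The sketch below makes several assumptions. "Program A" and "program B", their record formats, the vocabulary tables, and the `CrosscutRecord`, `from_program_a`, `from_program_b`, and `search` names are all hypothetical. They are not CFDE's actual schema or API. The `id_namespace` / `local_id` pairing loosely echoes the identifier pattern in CFDE's Crosscut Metadata Model (C2M2), which is far richer than this.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class CrosscutRecord:
    # Hypothetical shared fields that every program's records are mapped onto.
    id_namespace: str   # which program issued the identifier
    local_id: str       # the program's own ID, kept unchanged
    data_type: str      # normalized data-type term
    anatomy: str        # normalized anatomy term
    native: dict[str, Any] = field(default_factory=dict)  # original record, untouched


# Each program keeps its own vocabulary; small lookup tables translate it.
# (Hypothetical tables for illustration only.)
PROGRAM_A_ASSAYS = {"CODEX": "imaging", "snRNA-seq": "transcriptomics"}
PROGRAM_B_PLATFORMS = {"LC-MS": "metabolomics", "GC-MS": "metabolomics"}


def from_program_a(rec: dict[str, Any]) -> CrosscutRecord:
    # Hypothetical program A format: {"uuid", "assay", "organ"}
    return CrosscutRecord(
        id_namespace="program-a",
        local_id=rec["uuid"],
        data_type=PROGRAM_A_ASSAYS.get(rec["assay"], "other"),
        anatomy=rec["organ"].strip().lower(),
        native=rec,
    )


def from_program_b(rec: dict[str, Any]) -> CrosscutRecord:
    # Hypothetical program B format: {"study_id", "sample_id", "platform", "tissue"}
    return CrosscutRecord(
        id_namespace="program-b",
        local_id=f'{rec["study_id"]}:{rec["sample_id"]}',
        data_type=PROGRAM_B_PLATFORMS.get(rec["platform"], "other"),
        anatomy=rec["tissue"].strip().lower(),
        native=rec,
    )


def search(
    records: list[CrosscutRecord],
    *,
    anatomy: Optional[str] = None,
    data_type: Optional[str] = None,
) -> list[CrosscutRecord]:
    # Filter only on the shared fields, so one query spans every program.
    return [
        r
        for r in records
        if (anatomy is None or r.anatomy == anatomy)
        and (data_type is None or r.data_type == data_type)
    ]


if __name__ == "__main__":
    index = [
        from_program_a({"uuid": "a-001", "assay": "CODEX", "organ": "Kidney"}),
        from_program_b({"study_id": "ST01", "sample_id": "S7",
                        "platform": "LC-MS", "tissue": "kidney "}),
        from_program_b({"study_id": "ST02", "sample_id": "S1",
                        "platform": "GC-MS", "tissue": "liver"}),
    ]
    # One anatomy query returns matching records from both programs.
    for r in search(index, anatomy="kidney"):
        print(r.id_namespace, r.local_id, r.data_type)
```

Running it prints the program A imaging record and the program B metabolomics record for kidney. The design choice mirrors the bullet above: the adapters add a shared view but keep `native` and the original IDs intact, so no program has to abandon its own standards to be searchable alongside the others.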
My Connection
PSC (the Pittsburgh Supercomputing Center) is a core contributor to HuBMAP, one of the 18 Common Fund programs integrated through CFDE. My work on the computational infrastructure and data engineering side of HuBMAP feeds directly into the broader CFDE ecosystem. Making data from HuBMAP, and from the other 17 programs, genuinely interoperable is the kind of infrastructure work that rarely gets much visibility but enables an enormous amount of downstream science.