CFDE Spring 2025 Meeting: Full Report Infographic

This infographic provides a visual summary of the Common Fund Data Ecosystem (CFDE) Spring 2025 Meeting (March 25-26). It highlights key presentations, discussions, and strategic directions aimed at enhancing collaborative biomedical research through data integration and innovation, drawing from comprehensive meeting materials.

Conference Overview Dashboard

A snapshot of the CFDE Spring 2025 Meeting’s scale and engagement. (Counts are illustrative/derived from agenda and registration image).

Registered Attendees (Est.): 100+
Total Technical Sessions & Talks: 20+
Poster Presentations: 29
CFDE Centers Represented: 5
DCCs/Affiliated Programs (Est.): 15+

Detailed attendee demographics are presented in the “Cross-Cutting Themes & Resources” section.

Agenda & Detailed Summaries

A chronological overview of the meeting sessions, highlighting key discussions, presentations, and outcomes from each day, incorporating details from various meeting documents.

Day 1: March 25, 2025 – Setting the Stage

Welcome & NIH Opening Remarks

8:45 AM – 9:15 AM (Grand Ballroom)

Jake Chen, Ph.D. (ICC): Welcomed attendees and outlined the meeting goals: learn about late-breaking research, meet the CFDE community, and plan future work.
Chris Kinsinger, Ph.D. (NIH): Emphasized CFDE’s mission of impactful data use by enabling queries across CF datasets, providing training/outreach, and integrating infrastructure. Key questions posed: How can CFDE better engage users? How can CFDE data be made AI-ready? Why is a world with CFDE better? Also mentioned the NIH budget status and the Director nomination (Dr. Jay Bhattacharya).

Session I: Center Updates (9:15 AM – 10:45 AM, Grand Ballroom, Mod: Christy Kano)

Data Resource Center (DRC) – A. Ma’ayan, S. Subramaniam

Vision: A data matrix integrating C2M2, gene sets, KGs, attribute tables, code assets.
Croissant Standard: Adopted for ML-ready datasets (describes metadata, distribution, structure); integrated with HuggingFace, Kaggle, TensorFlow, PyTorch. Demo notebook for gene prediction using CF omics datasets.
CFDE Workbench: Integrating metadata & processed data from 13 DCCs. March window saw contributions from 10 DCCs. (cfde.cloud)
Tools: FAIRshake for FAIR assessments; Playbook Workflow Builder; GeneSetCart; ChEA-KG; GSFM.
Search: Enhanced C2M2-driven search engine (React, Prisma, PostgreSQL + Neo4j graph mirror) for cohort selection.
Upcoming: Human Organ Project (e.g., Liver use case) with KC to demonstrate integrative analysis power.
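
To make the Croissant item above more concrete, here is a minimal sketch of what ML-ready dataset metadata in that style can look like: one record describing the dataset (metadata), its file (distribution), and its columns (structure). It is an illustration under stated assumptions, not the DRC's actual metadata. The helper name `build_croissant_sketch`, the dataset name, the URL, and the column names are hypothetical, the JSON-LD context is abbreviated, and properties a real Croissant file needs (for example file checksums) are omitted, so the output has not been validated against the specification.

```python
import json


def build_croissant_sketch(name, description, csv_url, columns):
    """Return a simplified Croissant-style JSON-LD dict for a single CSV file.

    `columns` maps a column name to a data type string such as "sc:Text".
    This is an abbreviated illustration, not a spec-complete document.
    """
    file_id = "data-csv"  # hypothetical identifier for the one file object
    return {
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "name": name,
        "description": description,
        # Distribution: where the bytes live and how they are encoded.
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": file_id,
                "name": file_id,
                "contentUrl": csv_url,
                "encodingFormat": "text/csv",
            }
        ],
        # Structure: how columns in the file map to typed fields.
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "name": "records",
                "field": [
                    {
                        "@type": "cr:Field",
                        "name": column,
                        "dataType": data_type,
                        "source": {
                            "fileObject": {"@id": file_id},
                            "extract": {"column": column},
                        },
                    }
                    for column, data_type in columns.items()
                ],
            }
        ],
    }


# Hypothetical example: a gene-by-tissue expression table.
example = build_croissant_sketch(
    name="example-expression-table",
    description="Illustrative processed expression table (placeholder).",
    csv_url="https://example.org/expression.csv",
    columns={"gene_symbol": "sc:Text", "tissue": "sc:Text", "tpm": "sc:Float"},
)
print(json.dumps(example, indent=2))
```

Keeping the file description and the field description separate is what lets ML tooling load the same dataset without custom parsing code for each source.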

Knowledge Center (KC) – J. Flannick, N. Burtt

Niche: Produce secondary analyses across the ecosystem, leverage human genetics expertise, design tertiary analyses for discovery, create a complete user experience.
Resource: cfdeknowledge.org – curated summaries of all DCCs, secondary analyses (e.g., integrated single-cell maps), tertiary analyses (e.g., gene sets linked to human phenotype via PIGEAN).
PIGEAN Tool: Gene set enrichment for human genetics, models common/rare diseases, API for workbench integration.
New Module: Mouse-to-Human Phenotype matching based on genetic similarity.
Outreach: ASHG workshop (166 registrants, 600 users to KC that day).

Integration & Coordination Center (ICC) – J. Chen, P. Ping, W. Wang, S. Davis, C. Greene

Cores: Admin (meetings, newsletter), Sustainability (assess CF program needs, FAIR assets via C2M2, outreach at ISMB/KDD), Evaluation (metrics, reporting infrastructure, best practices).
Key Activities: Organized Fall PI Meeting & Spring Conference, monthly Steering Committee, bi-weekly CommOutreach WG & Newsletter.

Cloud Workspace Implementation Center (CWIC) – J. Fonner

Platform: Connects to CFDE data, delivers 1000s of apps (Galaxy, Jupyter, RStudio), scalable TACC computing (500k+ cores, 1k+ GPUs).
Status: Dev site live (firewalled), TACC auth integration, domain/certs ready, security audit ongoing.
Timeline: Early users (Spring ’25), Initial Public Deployment (Fall ’25), Regulated data support (Fall ’26).
Planned: Onboarding, tool/workflow integration, open dataset integration, training docs, branding.

Training Center – J. Burnette, A. Dillman

Objectives: Coordinate training, support learners, expand user base, enhance dataset usage complexity.
Landscape Analysis: Identified needs in omics skills (biology for datasets, DB querying, shell scripting, programming, stats/viz, advanced computing), structured learning, and resource access. 1,990 scholars referenced CFDE datasets (2004-2025).
FY25 Activities:
How-To Seminars: Genomics, Transcriptomics, Spatial/Single Cell, Epigenomics, Proteomics, Metabolomics, P-Values, Glycomics.
Podcast: “Decoding the Data Ecosystem” (4 episodes live).
BioIT Hackathon: omics data, CFDE tools.
Online Learning Dashboard: cfde-trainingcenter.reach360.com
Mentoring Collaborative: “Decoding the Data Ecosystem Collaborative” (10-week pilot June-Aug, 4 faculty-student pairs, seeking mentors for BridgeDB, GTEx, HMP).
Website: Summer ’25.
Virtual CFDE Symposium: August, for faculty/students.
Conferences: Collaboration for ISMB/ECCB in July.

Session II: Innovative Tools to Analyze Common Fund Data (11:00 AM – 12:30 PM, Grand Ballroom, Mod: Deanne Taylor)

Panelists: Avi Ma’ayan, Ryan Urbanowicz, Jeremy Yang, Jonathan Silverstein, Deanne Taylor.

Key Discussion Points:

Opportunities for Innovation: Deeper insights into biomedical entities, researcher-in-the-loop interactive tools, ML for hidden patterns, precision medicine analyses, digital twins, clinical integration for translational research (e.g., rare diseases).
Challenges for Tool Innovation: CF data not always designed for all uses (DDKG helps bridge gaps), sparse metadata, making CF data fully AI-ready (FAIR + Labeled + Learnable), user learning curve (Playbook helps lower activation energy).
Example Need (dGTEx project): Requires CFDE tools for multi-organ single-cell/spatial analysis, physiological context, metadata integration, lifespan data, cross-species analysis, integration with HRA.
Highlighted Tools & AI Applications (A. Ma’ayan):

Playbook Workflow Builder: “Text to Workflow” feature using LLMs, GPT-augmented workflow summaries.

GeneSetCart: “Crossing” gene sets from different CF programs (e.g., GTEx Aging Signatures & MoTrPAC), GPT-4 for hypothesis generation on gene set overlap.

ChEA-KG: Human TF Regulatory Network with KG-UI, TF enrichment analysis.

Gene Set Foundation Model (GSFM): Pretrained model to augment gene sets from Rummagene/RummaGEO, predicts TF KOMP2 phenotypes, pathway enrichments.

Croissant Metadata Standard: For ML-ready datasets, enabling workflows like gene function predictions using CF transcriptomics and IMPC phenotypes.

L2S2: Chemical perturbation and CRISPR KO LINCS L1000 signature search engine.
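
The idea of "crossing" two gene sets, as in the GeneSetCart item above, can be sketched as an overlap computation plus a significance estimate. The code below is a minimal illustration and not GeneSetCart's implementation. The function names, the placeholder gene identifiers, and the background size are assumptions, and the one-sided hypergeometric test is a common choice for overlap significance rather than a method confirmed by the meeting materials.

```python
from math import comb


def hypergeom_tail(k, a, b, n):
    """P(overlap >= k) when sets of size a and b are drawn from n background genes."""
    if not (0 <= a <= n and 0 <= b <= n):
        raise ValueError("set sizes must lie between 0 and the background size")
    # Sum the hypergeometric probability mass from k up to the largest possible overlap.
    numerator = sum(comb(a, i) * comb(n - a, b - i) for i in range(k, min(a, b) + 1))
    return numerator / comb(n, b)


def cross_gene_sets(set_a, set_b, background_size):
    """Return overlap genes, Jaccard index, and a one-sided overlap p-value."""
    a, b = set(set_a), set(set_b)
    overlap = a & b
    union = a | b
    return {
        "overlap": overlap,
        "jaccard": len(overlap) / len(union) if union else 0.0,
        "p_value": hypergeom_tail(len(overlap), len(a), len(b), background_size),
    }


# Placeholder identifiers standing in for two signatures from different programs.
aging_signature = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
exercise_signature = ["GENE_C", "GENE_D", "GENE_E"]

result = cross_gene_sets(aging_signature, exercise_signature, background_size=20000)
print(sorted(result["overlap"]), result["jaccard"], result["p_value"])
```

The overlapping genes are the part that would then be passed on for interpretation, for example to an LLM prompt for hypothesis generation as described in the session.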

Session III: Tech Showcase (1:30 PM – 2:30 PM, Breakout Tracks)

Track 3A (Bethesda Potomac)

Virtual Reality Enables Multiscale Exploration of the Human Reference Atlas
GeneSetCart: Assembling, Augmenting, Combining, Visualizing, and Analyzing CFDE Gene Sets
CFDE Talent Knowledge Graph

Track 3B (Rockville/Chevy Chase)

Community Visualization Hub: Current Features and Roadmap (Beta release June ’25)
Playbook Workflow Builder: Interactive Construction of Bioinformatics Workflows
Graph Query Interface for C2M2 Data Discovery

Track 3C (Salon E)

Digital Scavenger Hunt activities.

Day 2: March 26, 2025 – Collaboration & Future Directions

Session V: Working Group Breakouts (8:30 AM – 10:00 AM)

Ontology (Hosts: M. Maurya, S. Ramachandran)

Overview of C2M2 and metadata submission.
Addressing pending topics/questions from OWG meetings.

Knowledge Graph (Hosts: D. Taylor, J. Silverstein)

Updates on Data Distillery project.
Community updates on KG usage and development.
Discussion on technologies, standards, and LLMs & KGs working together.

Trainers (Hosts: A. Dillman, J. Burnette)

Summer mentorship program planning.
Review of landscape analysis results and recommendations.
Collaboratively working to address landscape analysis recommendations.

Communication Outreach (Hosts: N. Burtt, M. Brandes)

Charter finalization.
Development of outreach tools and resources (overview slides, shared calendar).
Planning for future events: Festival of Genomics, ISMB, ASHG.

Session VII: Interactive Panels (1:00 PM – 3:00 PM, Grand Ballroom)

Bridging Communities: Integrating CFDE with External Data Networks (Mod: Jeffrey Grethe)

Discussion points: What external data networks are members familiar with? What does integration look like? What CFDE engagement is needed? CFDE’s role in the broader data/standards ecosystem? Considerations for knowledge/AI resources? Priority use cases.

Sustaining the CFDE Ecosystem (Mods: Peipei Ping, Wei Wang; Panel: A. Ma’ayan, P. Ping, S. Subramaniam, C. Wu)

Focused on STRIDE principles: Safety, Transparency, Reliability, Interoperability, Data Integrity, Effectiveness.
Key Discussion (S. Subramaniam): STRIDE warrants structured metadata, controlled vocabularies, reproducibility. Interoperability needs metadata & harmonization tools. Clinical metadata: de-identified HIPAA compliant, tools for cohort collections, provenance.
Suggestions for CFDE: Identify interoperable data & tools (Organ project test case); address clinical metadata access tension (Bridge2AI examples, minimal clinical metadata model); benchmark FAIRness tools; test data “harmonizability”; assess need to sustain all CF data.

Strategies to Grow the CFDE User Base (Mods: Bernard de Bono, Noel Burtt)

COMPA Partnership Findings (B. de Bono): Motivation: MR for user base expansion. Baseline: Low CFDE brand recognition, limited onboarding support. Key takeaway: MR/Support resourcing should match technical dev. Process: External steerage, 3 exemplar products, product owner/PI/new user interviews. Challenges for product owners: usage tracking, awareness, support gaps, persona definition.
COMPA Recommendations: “Brand and Impact Hub” microsite, ‘Power User’ fellowship, incentive-based registration, standardized KPIs, centralized feedback widget, society event promotion, “Community Gatekeeper Incubator.”
Communication/Outreach WG (N. Burtt, M. Brandes): Mission: Drive cohesive communication, enhance visibility. Strategies: Symbiosis via Partnerships, Program Identity, Resource & Consortium Connectivity. Early Objectives: CFDE Overview Slides, Shared Calendar, Conference presence (Festival of Genomics, ISMB, ASHG). Internal networking, pushing tools/results to DCCs.

Cross-Cutting Themes & Resources

Meeting Attendee Snapshot (Illustrative)

[Charts: Attendee Roles (%); Affiliation Types (%)]

Poster Session Highlights

The poster session featured 29 presentations showcasing research that leverages and contributes to the CFDE. Presenters represented a range of institutions and career stages, fostering broad scientific exchange.

[Figure: Poster Keyword Cloud (simulated from titles)]

Illustrative Poster Categories & Examples:

Knowledge Management & Integration:

KG2ML for Disease-Associated Genes (P. Kumar et al.)
CFDE Data Distillery Project (B. Stear et al.)
BiomarkerKB: Comprehensive Biomarker Knowledgebase (D. Masood)
Expanding DDKG for Childhood Cancer (B. Stear et al.)

Tool Development & Platforms:

KG-UI for Biomedical KGs (J.E. Evangelista et al.)
CFDE Workbench (J.E. Evangelista et al.)
L2S2: LINCS L1000 Signature Search Engine (G.B. Marino et al.)
ChEA-KG: Human TF Regulatory Network KG-UI (A. Byrd et al.)

Omics & Disease Applications:

Identifying Exercise-Mimetic Drugs (P. Brochet et al.)
Inferring Tissue Aging Clocks from Blood Transcriptomics (M. Belic)
Defining Human Metabotype (S. Rahiminejad et al.)
Immune Landscape of Pediatric/Adult Cancers (T. Liang et al.)

Training, Standards & Outreach:

Croissant Metadata Standard for Processed Datasets (I. Diamant et al.)
Advancing Training in CFDE: Landscape Analysis (D. Tarver et al.)
STRIDE Principles for AI-Ready Datasets (C. Ree et al.)
CFDE-GlyGen Internship Insights (J. Vora et al.)

Roadmap & Future

Selected Key Next Steps (6-12 Months)

DCCs: Submit C2M2 metadata specifications & assets to DRC.
CWIC: Work with DCCs to integrate apps/workflows; launch branded site.
Training Center: Launch comprehensive website; organize August virtual symposium; request DCC trainings.
Outreach WG: Finalize charter, create shared resources (slides, calendar), prepare for conferences.
Community Viz Hub: Release beta software (target June).
DRC/KC: Advance Human Organ Project; expand gene set utility.
CFDE Leadership: Prepare compelling message for 2027 Council of Councils review.
CFDE Team: Develop more accessible user-friendly tools; continue benchmarking for spatial/single-cell biology.

[Chart: Illustrative Breakdown of Near-Term Action Items]

Participant Feedback & Impressions

Feedback from attendees highlights the value and impact of the CFDE Spring 2025 Meeting. (Quotes and data are illustrative; specific feedback could not be extracted from the source image.)

[Chart: Overall Meeting Satisfaction (illustrative)]

“The presentations on innovative tools were outstanding and directly applicable to my research. Loved the focus on AI-readiness!”

“Excellent networking opportunities and a great overview of where the CFDE is headed. The collaborative spirit is strong.”

“The breakout sessions were particularly valuable for deep dives into specific topics like knowledge graphs and training initiatives.”

Common Positive Themes (Illustrative):

High quality of scientific presentations.
Valuable networking and collaboration opportunities.
Clear updates on CFDE progress and future vision.
Focus on practical tools and resources.
Engaging discussions in breakout sessions.
Strong emphasis on FAIR data principles and interoperability.

Moving Forward Together

The CFDE Spring 2025 meeting reaffirmed the consortium’s commitment to fostering a collaborative, innovative, and sustainable data ecosystem. By addressing key challenges, leveraging strategic partnerships, and focusing on user needs, the CFDE is poised to significantly accelerate biomedical research and discovery.