Federated Learning Across Biobanks
Overview
First-authored preprint from the CMU x NVIDIA Federated Learning Hackathon (January 2026), which brought together 100+ participants from 27 universities, five research centers, and companies across seven countries.
Built privacy-preserving federated learning pipelines using NVIDIA FLARE and AWS infrastructure, enabling biobanks to train models across distributed datasets without centralizing sensitive health data.
Our team developed working prototypes for several biomedical applications, including genome-wide association analyses, cancer subtyping, polygenic risk score aggregation, and multimodal clinical prediction. All ten project pipelines were open-sourced on GitHub.
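To illustrate how models can be trained across sites without centralizing data, here is a minimal sketch of federated averaging in plain Python. It is not the NVIDIA FLARE API and not the hackathon code. The linear model, the function names (local_update, federated_average, run_rounds), and the two toy "biobank" datasets are all assumptions made for illustration. Each site trains on its own records and shares only model weights and a sample count. The coordinator never sees raw data.

```python
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], float]  # (features, target); hypothetical record type


def local_update(weights: Sequence[float], data: Sequence[Record],
                 lr: float = 0.1, epochs: int = 5) -> Tuple[List[float], int]:
    """Run a few gradient-descent steps on one site's private data.

    Fits a linear model y ~ w . x with squared-error loss and returns
    the updated weights plus the local sample count.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w, len(data)


def federated_average(updates: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Combine site weights into a global model, weighting by sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def run_rounds(sites: Sequence[Sequence[Record]], dim: int, rounds: int = 10) -> List[float]:
    """Alternate local training and server-side averaging for several rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates)
    return global_w


# Toy data: two hypothetical sites whose records both follow y = 2x.
site_a = [([1.0], 2.0), ([2.0], 4.0)]
site_b = [([0.5], 1.0), ([1.5], 3.0), ([2.5], 5.0)]
global_weights = run_rounds([site_a, site_b], dim=1)
```

In this sketch the global weight approaches the shared slope even though neither site's records leave local_update. A production pipeline would typically add secure communication and, depending on the threat model, protections such as secure aggregation or differential privacy, none of which are shown here.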