A biomedical knowledge graph integrating 20 resources to describe 17,080 diseases with over 4 million relationships across ten biological scales.
PrimeKG (Precision Medicine Knowledge Graph) connects diseases with genes, drugs, phenotypes, and clinical information to enable data-driven research in precision medicine. It draws on 20 resources to describe 17,080 diseases through more than 4 million relationships spanning ten biological scales. The project provides ready-to-use datasets along with tools for building and updating the knowledge graph.
Bioinformaticians, medical researchers, and data scientists working on precision medicine, drug discovery, or disease modeling who need integrated biomedical data for analysis.
PrimeKG offers a uniquely comprehensive and clinically relevant knowledge graph that bridges multiple biological scales, with multimodal data and extensive coverage of diseases, including rare conditions. Its integration of diverse resources and ready-to-use format saves researchers time in data preparation.
Precision Medicine Knowledge Graph (PrimeKG)
Covers over 17,000 diseases, rare conditions among them, and is optimized for clinical relevance, as highlighted in the README's unique features section.
Augments disease and drug nodes with clinical descriptors from authorities like Mayo Clinic and Orphanet, enabling richer analyses for precision medicine.
Provides pre-processed CSV files for easy download from Harvard Dataverse, saving researchers significant time in data preparation.
Includes scripts to process 20 primary resources and build updated versions, supporting long-term usability as detailed in the July 2023 update.
Building an updated graph requires running multiple scripts and notebooks, some of which are noted as potentially out-of-date or needing fixes, as seen in the update details.
The knowledge graph is static between releases, which may not suit applications requiring the latest biomedical discoveries or real-time data integration.
Individual datasets have varying licenses, requiring users to navigate legal restrictions for commercial use, as mentioned in the license section.
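As a rough illustration of working with the pre-processed CSV files, the sketch below tallies edges per relation type using only the Python standard library. It assumes the edge list has `relation`, `x_type`, and `y_type` columns; those column names, the helper `count_relations`, and the sample rows are illustrative assumptions, so check them against the file you actually download.

```python
import csv
import io
from collections import Counter


def count_relations(handle, type_filter=None):
    """Count edges per relation in a PrimeKG-style edge list.

    `handle` is any open text file or file-like object containing CSV data.
    Column names (relation, x_type, y_type) are assumptions, not confirmed
    by the project description.
    """
    counts = Counter()
    for row in csv.DictReader(handle):
        # Optionally keep only edges that touch a given node type.
        if type_filter and type_filter not in (row["x_type"], row["y_type"]):
            continue
        counts[row["relation"]] += 1
    return counts


# Invented sample rows standing in for the real download.
SAMPLE = """relation,x_type,x_name,y_type,y_name
indication,drug,DrugA,disease,DiseaseX
contraindication,drug,DrugB,disease,DiseaseX
protein_protein,gene/protein,GeneA,gene/protein,GeneB
"""

sample_counts = count_relations(io.StringIO(SAMPLE), type_filter="disease")
print(sample_counts)
```

With a real download, the same helper could be called on an opened file, for example `with open(path, newline="") as f: count_relations(f, "disease")`, where `path` points at whichever CSV was fetched from Harvard Dataverse. Streaming rows through `csv.DictReader` avoids loading several million edges into memory at once.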