Open-Awesome

PrimeKG

MIT · Jupyter Notebook

A biomedical knowledge graph integrating 20 resources to describe 17,080 diseases with over 4 million relationships across ten biological scales.

What is PrimeKG?

PrimeKG (Precision Medicine Knowledge Graph) is a biomedical knowledge graph that integrates data from 20 resources to describe 17,080 diseases with over 4 million relationships across ten biological scales. It connects diseases with genes, drugs, phenotypes, and clinical information to enable data-driven research in precision medicine. The project provides ready-to-use datasets and tools for building and updating the knowledge graph.

Target Audience

Bioinformaticians, medical researchers, and data scientists working on precision medicine, drug discovery, or disease modeling who need integrated biomedical data for analysis.

Value Proposition

PrimeKG offers a uniquely comprehensive and clinically relevant knowledge graph that bridges multiple biological scales, with multimodal data and extensive coverage of diseases, including rare conditions. Its integration of diverse resources and ready-to-use format saves researchers time in data preparation.

Overview

Precision Medicine Knowledge Graph (PrimeKG)

Use Cases

Best For

  • Integrating heterogeneous biomedical data for precision medicine research
  • Building machine learning models for drug discovery or disease prediction
  • Analyzing relationships between diseases, genes, and drugs at scale
  • Studying rare diseases with comprehensive clinical and molecular data
  • Creating knowledge graph embeddings for biomedical entity relationships
  • Developing data pipelines for updated biomedical knowledge graphs
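
As a rough illustration of the relationship analysis mentioned in the list above, the sketch below indexes a handful of edges and looks for drugs that sit two hops from a disease through a shared gene. The triples, the relation labels, and the helper names `build_adjacency` and `two_hop_neighbours` are hypothetical placeholders, not PrimeKG records or PrimeKG tooling.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples; all names are placeholders.
EDGES = [
    ("disease X", "associated_with", "gene G1"),
    ("disease X", "associated_with", "gene G2"),
    ("drug A", "targets", "gene G1"),
    ("drug B", "targets", "gene G3"),
]

def build_adjacency(edges):
    """Index every node to its neighbours, treating edges as undirected."""
    adjacency = defaultdict(set)
    for source, _relation, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency

def two_hop_neighbours(adjacency, start, prefix):
    """Return nodes two hops from `start` whose name begins with `prefix`."""
    found = set()
    for middle in adjacency[start]:
        for node in adjacency[middle]:
            if node != start and node.startswith(prefix):
                found.add(node)
    return found

adjacency = build_adjacency(EDGES)
candidates = two_hop_neighbours(adjacency, "disease X", "drug")
print(sorted(candidates))
```

With these sample edges, only `drug A` shares a gene with `disease X`. A graph of several million edges would likely call for a dedicated graph library or database rather than plain dictionaries.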

Not Ideal For

  • Projects requiring real-time or frequently updated biomedical data for dynamic applications
  • Teams without dedicated bioinformatics or data engineering resources to manage complex script-based updates
  • Applications needing interactive, graph-based query interfaces out-of-the-box without additional development
  • Research focused narrowly on a single data source where simpler, specialized datasets would suffice

Pros & Cons

Pros

Comprehensive Disease Coverage

Includes over 17,000 diseases, including rare ones, optimized for clinical relevance, as highlighted in the README's unique features section.

Multimodal Clinical Integration

Augments disease and drug nodes with clinical descriptors from authorities like Mayo Clinic and Orphanet, enabling richer analyses for precision medicine.

Ready-to-Use Datasets

Provides pre-processed CSV files for easy download from Harvard Dataverse, saving researchers significant time in data preparation.
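
A minimal sketch of working with such a CSV edge list, using only the Python standard library: it reads a tiny in-memory sample and tallies edges per relation type. The column names (`relation`, `x_name`, `x_type`, `y_name`, `y_type`) and the sample rows are assumptions for illustration; check the header of the file you actually download.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded edge-list CSV.
# Column names and rows are illustrative assumptions, not real records.
SAMPLE_CSV = """relation,x_name,x_type,y_name,y_type
indication,drug A,drug,disease X,disease
indication,drug B,drug,disease X,disease
disease_protein,disease X,disease,gene G1,gene/protein
"""

def count_relations(handle):
    """Count rows per value of the `relation` column."""
    reader = csv.DictReader(handle)
    return Counter(row["relation"] for row in reader)

relation_counts = count_relations(io.StringIO(SAMPLE_CSV))
for relation, n in relation_counts.most_common():
    print(f"{relation}: {n}")
```

To run the same tally on a real file, pass an open file handle (opened with `newline=""`) to `count_relations` instead of the `io.StringIO` sample.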

Extensive Update Scripts

Includes scripts to process 20 primary resources and build updated versions, supporting long-term usability as detailed in the July 2023 update.

Cons

Complex Update Pipeline

Building an updated graph requires running multiple scripts and notebooks, some of which are noted as potentially out-of-date or needing fixes, as seen in the update details.

Static Data Limitations

The knowledge graph is static between releases, which may not suit applications requiring the latest biomedical discoveries or real-time data integration.

Licensing Fragmentation

Individual datasets have varying licenses, requiring users to navigate legal restrictions for commercial use, as mentioned in the license section.

Quick Stats

Stars: 786
Forks: 152
Contributors: 0
Open Issues: 6
Last commit: 1 day ago
Created: 2022

Tags

#biomedical-data #clinical-informatics #data-science #data-integration #knowledge-graph #graph-machine-learning #nlp-machine-learning #precision-medicine #medical-research #bioinformatics #graph-database #dataset

Built With

  • pip
  • Conda
  • Jupyter
  • pandas
  • Python

Included in

Computational Biology (122)

Related Projects

Hetionet

Hetionet: an integrative network of disease

Stars: 355
Forks: 77
Last commit: 3 years ago

Drug Mechanism Database (DrugMechDB)

A database of paths that represent the mechanism of action from a drug to a disease in an indication.

Stars: 75
Forks: 24
Last commit: 29 days ago