A tool-augmented LLM that uses NCBI Web APIs to answer biomedical questions with high accuracy and reduced hallucinations.
GeneGPT is a tool-augmented large language model specifically designed for biomedical information retrieval. It enhances LLMs' ability to answer specialized biomedical questions by teaching them to use NCBI Web APIs, significantly reducing hallucinations and improving accuracy compared to general-purpose models. The system achieves state-of-the-art performance on biomedical question-answering tasks through in-context learning and a novel API call execution algorithm.
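The description above does not spell out how the API call execution algorithm works, so the following is only a minimal sketch of one way such a loop might look. It assumes the model writes a URL in square brackets when it wants a call executed and that the result is appended after a `->` marker. The bracket and arrow convention, the `run_with_tools` name, and the `generate` and `fetch` callables are all hypothetical, and the URLs and responses are invented placeholders rather than real NCBI output.

```python
import re
from typing import Callable

# Assumed convention: the model requests a call by writing "[<url>]".
URL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]")


def run_with_tools(
    generate: Callable[[str], str],
    fetch: Callable[[str], str],
    prompt: str,
    max_calls: int = 5,
) -> str:
    """Alternate between model generation and API execution.

    `generate` returns the model's next chunk of text for a transcript and
    `fetch` returns the response body for a URL. Both are hypothetical
    stand-ins so that the loop can run offline.
    """
    transcript = prompt
    for _ in range(max_calls):
        chunk = generate(transcript)
        transcript += chunk
        match = URL_PATTERN.search(chunk)
        if match is None:
            # No API call was requested, so the chunk holds the final answer.
            return transcript
        # Execute the requested call and feed the result back to the model.
        transcript += "->[" + fetch(match.group(1)) + "]\n"
    return transcript


def make_scripted_model(chunks):
    """Return a fake model that replays a fixed list of chunks."""
    remaining = iter(chunks)
    return lambda transcript: next(remaining)


# Invented placeholder data; these are not real NCBI endpoints or responses.
fake_responses = {
    "https://example.org/search?term=GENE_A": "id=123",
    "https://example.org/summary?id=123": "chromosome=7",
}
model = make_scripted_model([
    "[https://example.org/search?term=GENE_A]",
    "[https://example.org/summary?id=123]",
    "Answer: chromosome 7",
])
result = run_with_tools(
    model, fake_responses.__getitem__, "Question: where is GENE_A?\n"
)
print(result)
```

Because the second call depends on the identifier returned by the first, the same loop covers both single calls and chains of calls. Passing `generate` and `fetch` in as arguments keeps the sketch independent of any particular LLM provider or HTTP client.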
Bioinformatics researchers, computational biologists, and developers working on biomedical AI applications who need accurate, API-backed answers to specialized biological and genetic questions.
By integrating directly with authoritative NCBI databases, GeneGPT is more accurate on biomedical tasks than general-purpose LLMs and specialized biomedical models (an average GeneTuring score of 0.83, versus 0.44 for New Bing and 0.04 for BioGPT), making it a reliable option for information retrieval in a domain where factual correctness is critical.
Code and data for GeneGPT.
Achieves an average score of 0.83 on GeneTuring tasks, vastly outperforming models like New Bing (0.44) and BioGPT (0.04), as shown in the evaluation results.
Minimizes incorrect information by executing API calls to NCBI databases, addressing LLM challenges in specialized knowledge areas, as emphasized in the introduction.
Handles complex queries that require chains of API calls, as demonstrated by its ability to generalize to longer chains of calls in multi-hop question answering.
Directly leverages NCBI Web APIs for accessing trusted biomedical databases, ensuring reliable and up-to-date information retrieval.
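To make the NCBI integration more concrete, here is a small sketch of how a request URL for E-utilities, one of NCBI's Web APIs, could be assembled. The `build_esearch_url` helper is hypothetical, the gene symbol is an arbitrary example, and whether GeneGPT uses exactly these parameters is an assumption. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an E-utilities esearch URL (hypothetical helper)."""
    query = urlencode(
        {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Search the "gene" database for an example gene symbol.
url = build_esearch_url("gene", "BRCA1")
print(url)
```

Using `urlencode` rather than string concatenation ensures that search terms containing spaces or special characters are escaped correctly.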
Requires an OpenAI API key to run with Codex, introducing ongoing costs and vendor lock-in, as specified in the setup instructions.
Only integrates with NCBI Web APIs, so it cannot handle queries requiring data from other biomedical databases or custom sources, restricting flexibility.
Evaluation results show wide variance in accuracy across tasks, from 0.44 for Human genome DNA alignment to perfect scores, indicating potential reliability issues in certain scenarios.