How do I run ClawBio skills with my own genome data?

Run 'python clawbio.py run [skill] --input [your_file]' from the command line, or call 'run_skill()' from the Python API. Both auto-detect input formats such as 23andMe exports and VCF files. Skills such as PharmGx Reporter can process consumer genetic data in seconds, as the demo examples show.
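
A minimal sketch of the CLI route, wrapped in Python. Only the command shape comes from the answer above; the skill identifier passed in, the file names, and the sniff_format helper are assumptions for illustration, and the helper is not ClawBio's own detection logic. It relies on two general facts: VCF files start with a '##fileformat=VCF' line, and 23andMe raw exports are tab-separated text with '#' comment lines followed by rsid, chromosome, position, and genotype columns.

```python
import subprocess
import sys
from pathlib import Path


def sniff_format(path: Path) -> str:
    """Guess the input format from the first lines of a file (illustrative only)."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##fileformat=VCF"):
                return "vcf"
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment/header lines
            fields = line.split("\t")
            # 23andMe raw data rows: rsid, chromosome, position, genotype
            if len(fields) == 4 and fields[2].isdigit():
                return "23andme"
            return "unknown"
    return "unknown"


def build_command(skill: str, input_file: Path) -> list[str]:
    """Build argv for: python clawbio.py run [skill] --input [your_file]."""
    return [sys.executable, "clawbio.py", "run", skill, "--input", str(input_file)]


def run_skill_cli(skill: str, input_file: Path) -> int:
    """Run the CLI from the cloned repository root and return its exit code."""
    return subprocess.run(build_command(skill, input_file), check=False).returncode
```

Calling run_skill_cli only makes sense from inside a cloned ClawBio checkout; build_command and sniff_format can be exercised on their own.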

ClawBio vs Galaxy: which is better for bioinformatics analysis?

ClawBio excels in local, reproducible analyses with AI agent integration and specification-driven correctness, while Galaxy is a web-based platform with a vast toolset but less focus on agent compatibility. ClawBio's Galaxy Bridge skill allows chaining both, enabling hybrid workflows.

Can I use ClawBio in a cloud environment or Docker?

Yes, but ClawBio is designed for local-first execution. Cloud deployment requires manual setup, and Docker is not natively supported: skills depend on local Conda environments and external tools, which may complicate containerization.

How do I add a custom skill to ClawBio?

Follow the CONTRIBUTING.md guide: copy SKILL-TEMPLATE.md, implement Python code with demo data and tests, then submit a PR. Community contributions like NutriGx Advisor show this process in action, with support via Telegram for contributors.

What are the system requirements for running ClawBio?

ClawBio requires Python 3.10+ and the core dependencies listed in requirements.txt. Some skills need additional tools, such as Conda for metagenomics. The demo uses the Corpas genome, but you can supply your own data in any compatible format. Storage needs vary by skill.

Is ClawBio suitable for clinical diagnostics or medical use?

No. ClawBio is for research and educational purposes only. The demo genome is CC0 licensed, but the skills are not validated for clinical settings and should not be used for medical decisions, as highlighted in the reference genome section.

ClawBio

MIT · Python · v0.5.2

A bioinformatics-native AI agent skill library for reproducible, local-first genomic analysis, built on OpenClaw.

What is ClawBio?

ClawBio is the first bioinformatics-native AI agent skill library. It provides a collection of executable, specification-constrained skills for genomic analysis—such as pharmacogenomic reporting, GWAS lookup, and polygenic risk scoring—that run locally and ensure reproducibility. The project solves the problem of irreproducible bioinformatics by encoding expert decisions into versioned contracts, so AI agents can orchestrate analyses correctly without improvising from training data.

Target Audience

Bioinformaticians, genomic researchers, and AI agent developers who need reproducible, local-first analysis pipelines that integrate seamlessly with AI coding assistants like Claude Code or Telegram bots.

Value Proposition

Developers choose ClawBio because it guarantees correctness and reproducibility through specification-first skills, keeps sensitive genomic data local, and works agent-agnostically across any AI platform—all while being open-source and community-driven.

Overview

🦖 ClawBio - The first bioinformatics-native AI agent skill library. Local-first. Reproducible. Open. Free.

Use Cases

Best For

Generating pharmacogenomic reports from consumer genetic data (e.g., 23andMe files)

Performing federated variant queries across multiple genomic databases simultaneously

Building reproducible analysis pipelines with built-in provenance tracking

Integrating bioinformatics tools into AI agent workflows (e.g., Telegram bots, Claude Code)

Running local, privacy-focused genomic analyses without cloud dependencies

Teaching or demonstrating reproducible research practices in bioinformatics

Not Ideal For

Projects requiring instant, cloud-based genomic analysis with no local setup, as ClawBio mandates local execution and lacks built-in cloud deployment
Teams needing graphical user interfaces or point-and-click tools, since interaction is limited to CLI, Python API, or AI agents like Telegram bots
Environments where rapid prototyping with minimal dependency management is critical, due to the need to clone the repository and manually install external tools for skills like metagenomics
Applications demanding real-time, collaborative editing of live analyses, as ClawBio focuses on reproducible, static output bundles rather than interactive workflows

Pros & Cons

Pros

Specification-First Correctness

Skills encode expert bioinformatics decisions in versioned SKILL.md files, preventing LLM hallucination—for example, the PharmGx skill accurately applies CPIC guidelines to avoid misclassifying alleles like CYP2D6*4.
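
To illustrate what a specification-constrained lookup might look like, here is a minimal sketch, not ClawBio's PharmGx implementation. The allele table is a small illustrative subset using activity values from the CPIC/DPWG CYP2D6 consensus as best understood here; treat the values and thresholds as assumptions to check against the current CPIC tables. The point is the design: phenotypes come from a fixed table, and an allele missing from the table raises an error rather than being guessed.

```python
# Illustrative subset of CYP2D6 allele activity values (assumption: verify
# against the current CPIC allele functionality table before relying on them).
ALLELE_ACTIVITY = {
    "*1": 1.0,    # normal function
    "*2": 1.0,    # normal function
    "*4": 0.0,    # no function
    "*5": 0.0,    # no function (gene deletion)
    "*10": 0.25,  # decreased function
    "*41": 0.5,   # decreased function
}


def activity_score(diplotype: str) -> float:
    """Sum the activity values of both alleles in a diplotype such as '*1/*4'."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {diplotype!r}")
    try:
        return sum(ALLELE_ACTIVITY[allele] for allele in alleles)
    except KeyError as exc:
        # Refuse to improvise: an allele outside the table is an error.
        raise ValueError(f"allele {exc.args[0]} is not in the table") from exc


def phenotype(score: float) -> str:
    """Map an activity score to a metabolizer phenotype (assumed thresholds)."""
    if score == 0:
        return "Poor Metabolizer"
    if score <= 1.0:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"
```

For example, phenotype(activity_score("*4/*4")) yields "Poor Metabolizer" under these assumed values, while an unlisted allele fails loudly.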

Local Data Privacy

Genomic data never leaves the user's machine, as emphasized in the philosophy, ensuring privacy by avoiding cloud uploads or data exfiltration in sensitive health analyses.

Comprehensive Reproducibility

Every analysis exports a reproducibility bundle with commands.sh, environment.yml, and SHA-256 checksums, enabling exact reproduction of results without relying on the original author, as detailed in the provenance section.
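
The checksum part of such a bundle can be sketched with the standard library. This is an illustration under assumptions, not ClawBio's code: the manifest file name checksums.sha256 and the helper names are hypothetical, and the line format simply mirrors the common "hash, two spaces, path" layout used by sha256sum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large genomic files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bundle_dir: Path, manifest_name: str = "checksums.sha256") -> Path:
    """Record a SHA-256 checksum for every file in the bundle directory."""
    manifest = bundle_dir / manifest_name
    lines = []
    for path in sorted(bundle_dir.rglob("*")):
        if path.is_file() and path != manifest:
            rel_path = path.relative_to(bundle_dir).as_posix()
            lines.append(f"{sha256_of(path)}  {rel_path}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bundle_dir: Path, manifest_name: str = "checksums.sha256") -> list[str]:
    """Return relative paths that are missing or whose hash no longer matches."""
    mismatches = []
    text = (bundle_dir / manifest_name).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the manifest
        expected, rel_path = line.split("  ", 1)
        target = bundle_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            mismatches.append(rel_path)
    return mismatches
```

An empty list from verify_manifest means every recorded file is present and unchanged, which is the property a reader needs before re-running commands.sh.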

Broad Skill Coverage

With 46+ skills spanning pharmacogenomics to scRNA-seq, plus integration with 8,000+ Galaxy tools via the Galaxy Bridge, it offers extensive bioinformatics functionality out of the box.

Validation Benchmarks

Systematic validation infrastructure includes ground truth benchmarks, mock APIs, and 74+ tests, such as the AD gene benchmark and swappable fine-mapping pipelines, ensuring skill reliability.

Cons

Manual Installation Overhead

There is no pip package yet; users must clone the git repository and install dependencies manually, which adds setup time compared to one-command installs, as noted in the quick start.

External Tool Dependencies

Skills like metagenomics require external bioinformatics tools (Kraken2, RGI) that need separate installation, complicating setup and potentially limiting portability.
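
A small preflight check can surface these gaps before a skill runs. The sketch below is an assumption-laden illustration, not part of ClawBio: it assumes the tools are invoked through executables named kraken2 and rgi on the PATH, and it reuses the Python 3.10+ requirement stated in the FAQ.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)                 # requirement stated in the FAQ
EXTERNAL_TOOLS = ("kraken2", "rgi")  # assumed executable names


def preflight(tools=EXTERNAL_TOOLS, min_python=MIN_PYTHON) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when no matching executable is on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```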

Early Development Volatility

As a 0.5.x release, the project is still in active development, with planned features and potential breaking changes that may affect stability for long-term production use.

Domain-Specific Focus

It is solely focused on bioinformatics, lacking general-purpose data analysis features, making it less suitable for interdisciplinary projects without genomic components.

Frequently Asked Questions

BioGPT

BioGPT is a generative pre-trained transformer model specifically designed for biomedical text generation and mining. It leverages large-scale biomedical literature to understand and generate domain-specific text, enabling advanced natural language processing applications in healthcare and life sciences.

## Key Features

- **Biomedical Pre-training**: Trained on PubMed abstracts and articles for domain-specific language understanding.
- **Text Generation**: Generates coherent biomedical text, such as research summaries or hypothesis descriptions.
- **Relation Extraction**: Identifies relationships between biomedical entities like drug-target interactions.
- **Question Answering**: Answers biomedical questions based on contextual knowledge from literature.
- **Document Classification**: Classifies biomedical documents into relevant categories.
- **Hugging Face Integration**: Available through the transformers library for easy deployment and experimentation.

## Philosophy

BioGPT focuses on bridging the gap between general-purpose language models and domain-specific needs by providing a model that understands the nuances and terminology of biomedical literature.

Stars: 4,489

Forks: 481

Last commit: 2 years ago

GeneGPT

Code and data for GeneGPT.

Stars: 428

Forks: 34

Last commit: 1 year ago

GenePT

GenePT is a foundation model for single-cell biology that leverages ChatGPT embeddings of NCBI gene descriptions to perform gene-level and cell-level tasks. It offers an efficient alternative to traditional models that require extensive data curation and resource-intensive training from gene expression profiles.

## Key Features

- **Gene Embeddings**: Uses GPT-3.5 embeddings of NCBI gene summary texts to represent genes.
- **Cell Embeddings**: Generates single-cell embeddings by averaging gene embeddings weighted by expression or creating sentence embeddings from ordered gene names.
- **Efficient Approach**: Eliminates the need for dataset curation and additional pre-training, making it user-friendly.
- **Competitive Performance**: Achieves comparable or superior performance to existing single-cell foundation models in tasks like gene property classification and cell type annotation.
- **Pre-computed Data**: Provides readily available datasets including extracted NCBI gene summaries and pre-computed OpenAI embeddings.

## Philosophy

GenePT demonstrates that using large language model embeddings of scientific literature is a straightforward and effective approach for developing biological foundation models, complementing traditional expression-based methods.

Stars: 321

Forks: 47

Last commit: 2 years ago

MolT5

Associated Repository for "Translation between Molecules and Natural Language"

Stars: 194

Forks: 20

Last commit: 2 years ago
