An open-source data catalog tool that integrates into CI systems to test downstream impacts of data changes, preventing pipeline and dashboard breaks.
Grai is an open-source data catalog and lineage tool that automatically maps how data flows across databases, warehouses, APIs, and BI dashboards. It integrates into CI/CD systems to run downstream impact tests on data changes, preventing modifications that could break data pipelines or business intelligence reports from reaching production.
Grai is aimed at data engineers, analytics engineers, and data platform teams who need to maintain data quality, understand dependencies, and automate testing of data changes across complex data stacks.
Developers choose Grai because it provides automated, column-level data lineage with pre-built connectors, integrates directly into existing CI/CD workflows for proactive testing, and is fully self-hostable, giving teams complete control over their metadata and deployment environment.
Grai is an open-source data lineage and catalog tool designed to help data teams understand and test how data flows across their entire stack. It automatically builds column-level lineage from databases, warehouses, BI tools, and transformation layers, then integrates this metadata into CI/CD workflows to validate changes before they reach production.
Grai believes data teams should have full control over their metadata and be able to test data changes with the same rigor as code changes, preventing broken pipelines and dashboards before they impact production.
Automatically syncs lineage from popular sources such as Snowflake, BigQuery, and dbt through connectors installed with pip, as listed in the project's connectors table, reducing integration effort.
Runs data validation tests in GitHub Actions workflows to prevent breaking changes from reaching production, aligning data changes with code deployment practices.
Provides detailed visibility into column dependencies across systems, not just tables, enabling precise impact analysis for data transformations.
Fully open-source and deployable on your own infrastructure using Docker, Kubernetes, or Helm, giving teams full sovereignty over metadata and deployment.
Some connectors, like Looker, are marked as alpha, indicating they may be unstable or feature-incomplete, which could hinder adoption for those tools.
Self-hosting means operating a Docker, Kubernetes, or Helm deployment yourself, which can be resource-intensive and challenging for teams without dedicated DevOps expertise.
Connectors are installed via pip and built in Python, which may not align with teams using other programming languages, adding a learning curve or integration overhead.
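The idea of column-level impact analysis feeding a CI check, described above, can be illustrated with a minimal Python sketch. This is not Grai's API: the `LINEAGE` mapping, the column names, and the functions `downstream_columns` and `removal_exit_code` are all hypothetical, and the check is deliberately simplified to "fail if removing a column would leave anything downstream depending on it".

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
# These names are illustrative only and do not come from Grai.
LINEAGE = {
    "warehouse.orders.amount": ["dbt.fct_revenue.total"],
    "warehouse.orders.id": ["dbt.fct_revenue.order_id"],
    "dbt.fct_revenue.total": ["dashboard.revenue.monthly_total"],
}


def downstream_columns(lineage, changed):
    """Return every column reachable from `changed`, in breadth-first order."""
    found = []
    visited = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


def removal_exit_code(lineage, removed):
    """Return 1 if removing `removed` would affect a downstream column, else 0."""
    return 1 if downstream_columns(lineage, removed) else 0


if __name__ == "__main__":
    column = "warehouse.orders.amount"
    for affected in downstream_columns(LINEAGE, column):
        print(f"{column} -> {affected}")
    # A CI job could use this value as its process exit status.
    print("exit code:", removal_exit_code(LINEAGE, column))
```

Because the traversal follows individual columns rather than whole tables, a change to `warehouse.orders.amount` in this sketch flags the revenue dashboard but not `dbt.fct_revenue.order_id`, which is the kind of precision the column-level lineage feature is meant to provide.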