A Python-powered SQL lineage analysis tool that extracts source and target tables from SQL commands without deep parser knowledge.
SQLLineage is a Python-based tool that analyzes SQL statements to extract data lineage automatically, specifically the source and target tables of each statement. It removes the need to trace data dependencies by hand: it parses SQL commands and visualizes the resulting lineage, without requiring deep knowledge of SQL parsers.
It is aimed at data engineers, analysts, and developers who need to track data flow across SQL queries for governance, debugging, or documentation purposes.
Developers choose SQLLineage because it simplifies lineage extraction with a user-friendly interface, supports multiple SQL dialects, and provides both table and column-level analysis without the complexity of raw parser manipulation.
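As a rough illustration of the table-level extraction described above, the sketch below wraps the package's `LineageRunner` class in a small helper. The helper name `extract_tables` is hypothetical, and the sketch assumes an installed release of `sqllineage` that accepts the `dialect` keyword; it is not the only way to call the library.

```python
def extract_tables(sql, dialect="ansi"):
    """Return (source_tables, target_tables) as sets of strings for a SQL script."""
    # Imported inside the function so this sketch can be loaded even where the
    # third-party `sqllineage` package (pip install sqllineage) is not installed.
    from sqllineage.runner import LineageRunner

    # Assumption: the installed release accepts the `dialect` keyword.
    runner = LineageRunner(sql, dialect=dialect)

    # source_tables / target_tables hold table objects; str() gives schema.table.
    sources = {str(table) for table in runner.source_tables}
    targets = {str(table) for table in runner.target_tables}
    return sources, targets
```

Calling `extract_tables("INSERT INTO db1.table1 SELECT * FROM db2.table2")` should report `db2.table2` as the source and `db1.table1` as the target.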
Supports multiple SQL dialects such as ANSI, Hive, and SparkSQL, enabling accurate parsing of non-standard syntax. The README's example shows the SparkSQL dialect handling 'INSERT OVERWRITE' correctly while other dialects fail.
Traces column dependencies across queries for granular data flow analysis; its column lineage output lists the path from each source column to its target column.
Enhances lineage accuracy by connecting to databases via SQLAlchemy, resolving ambiguities such as wildcards and unqualified columns. The README's SQLite example shows the improved results.
Generates interactive DAG visualizations in a web browser, making complex lineage graphs easy to interpret visually without manual diagramming.
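To make the lineage graph mentioned above more concrete, here is a minimal standard-library sketch of the kind of DAG such a tool would render. It does not use SQLLineage. The table names and the `upstream` helper are made up for illustration, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical table-level lineage: target table -> set of source tables.
# These names are illustrative only; they are not SQLLineage output.
lineage = {
    "staging.orders": {"raw.orders"},
    "staging.customers": {"raw.customers"},
    "mart.revenue": {"staging.orders", "staging.customers"},
}


def upstream(table, graph):
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    stack = [table]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# TopologicalSorter expects node -> predecessors, so sources are emitted
# before the tables derived from them.
build_order = list(TopologicalSorter(lineage).static_order())

print(sorted(upstream("mart.revenue", lineage)))
print(build_order)
```

Walking upstream from a target answers "where did this data come from", while the topological order gives one valid sequence in which the tables could be rebuilt.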
Column-level lineage is incomplete without database metadata, leading to ambiguous results for wildcards or unqualified columns, a limitation the README acknowledges upfront.
Using metadata integration requires configuring SQLAlchemy connections and managing schemas, which adds operational overhead compared to basic command-line usage.
Cannot analyze dynamic SQL or runtime-generated queries, restricting its use to pre-written SQL statements or files rather than live database environments.
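The wildcard ambiguity noted above comes down to this: `SELECT *` names no columns, so only the database catalog can say which columns actually flow. The sketch below shows the idea with the standard-library `sqlite3` module. It is not how SQLLineage resolves metadata (the tool goes through SQLAlchemy); the table names and helper functions are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical source table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, name TEXT, amount REAL)")


def table_columns(conn, table):
    """Read the column names of `table` from SQLite's catalog."""
    # pragma_table_info is SQLite's table-valued form of PRAGMA table_info;
    # it accepts a bound parameter, which avoids formatting the name into SQL.
    rows = conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
    return [row[0] for row in rows]


def expand_wildcard(conn, source, target):
    """Column pairs for `CREATE TABLE target AS SELECT * FROM source`.

    With CREATE TABLE ... AS, the target inherits the source column names,
    so each source column maps to a same-named target column.
    """
    return [
        (f"{source}.{col}", f"{target}.{col}")
        for col in table_columns(conn, source)
    ]


pairs = expand_wildcard(conn, "src", "dst")
for source_col, target_col in pairs:
    print(f"{target_col} <- {source_col}")
```

Without the catalog lookup, the most a static analyzer can say is that some unknown set of `src` columns feeds `dst`.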