A Python CLI tool for comparing data across heterogeneous databases and data warehouses to ensure migration accuracy.
The Data Validation Tool (DVT) is an open-source Python command-line tool that compares data between different database systems and data warehouses. It solves the problem of ensuring data accuracy and consistency during migration projects by automating validation checks between source and target environments.
DVT is intended for data engineers, database administrators, and DevOps professionals who validate data migration, replication, or ETL pipelines across heterogeneous data systems.
Developers choose DVT for its extensive connector support, ability to handle large-scale validations through partitioning, and its automation-friendly CLI and YAML configuration, which replaces manual SQL comparison scripts.
A utility for comparing data between homogeneous or heterogeneous environments to ensure that source and target tables match.
Supports 15+ data sources including BigQuery, PostgreSQL, Oracle, and Snowflake, enabling cross-platform validations without custom SQL for each pair.
Generates partitions for large datasets and supports distributed runs on Kubernetes or Cloud Run Jobs; the project's scaling documentation describes validating billions of rows this way.
Offers column, row, schema, and custom query validations with detailed options like group by and calculated fields, replacing manual comparison scripts.
Uses YAML/JSON config files for defining validations, making it easy to automate, version control, and repeat checks in CI/CD pipelines.
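To make the idea of config-driven column validation concrete, here is a minimal conceptual sketch in Python. It does not use DVT or its real configuration schema: the config keys (table, aggregates, name, sql), the orders table, and the helper functions make_db and run_validation are all hypothetical, and two in-memory SQLite databases stand in for the source and target systems.

```python
import json
import sqlite3

# Hypothetical config shape for illustration only; this is NOT DVT's schema.
CONFIG_JSON = """
{
  "table": "orders",
  "aggregates": [
    {"name": "row_count", "sql": "COUNT(*)"},
    {"name": "sum_amount", "sql": "SUM(amount)"}
  ]
}
"""


def make_db(rows):
    """Create an in-memory database standing in for a source or target system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def run_validation(source, target, config):
    """Run each configured aggregate on both sides and record whether they match."""
    results = []
    for agg in config["aggregates"]:
        query = f'SELECT {agg["sql"]} FROM {config["table"]}'
        src_val = source.execute(query).fetchone()[0]
        tgt_val = target.execute(query).fetchone()[0]
        results.append({
            "name": agg["name"],
            "source": src_val,
            "target": tgt_val,
            "status": "success" if src_val == tgt_val else "fail",
        })
    return results


source = make_db([(1, 100), (2, 250), (3, 75)])
target = make_db([(1, 100), (2, 250), (3, 70)])  # one amount differs
config = json.loads(CONFIG_JSON)
results = run_validation(source, target, config)
for result in results:
    print(result)
```

Because the validation is described as data rather than as hand-written SQL per table, the same file can be kept in version control and rerun from a CI/CD job, which is the workflow the listing attributes to DVT's YAML/JSON configs.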
Does not support nested or complex columns for column or row validations, which can be a blocker for modern data warehouses with JSON or array fields.
Optimized for GCP services like BigQuery and Secret Manager; on-prem setups require extra configuration for endpoints and lack seamless integration with non-GCP clouds.
Row-level comparisons can raise a MemoryError on large tables, so users must partition the data with the generate-table-partitions command, which adds operational complexity.
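The reason partitioning helps with memory can be sketched as follows. This is an illustration of the general technique, not DVT's implementation: the customers table, the key-range strategy, and the helpers key_ranges, row_hashes, and compare_in_partitions are assumptions made for the example. Only one partition's row hashes are held in memory at a time.

```python
import hashlib
import sqlite3


def make_db(rows):
    """Create an in-memory database standing in for a source or target system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def key_ranges(conn, partition_size):
    """Yield inclusive (low, high) primary-key ranges covering the table."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM customers").fetchone()
    if low is None:  # empty table: nothing to partition
        return
    start = low
    while start <= high:
        yield start, min(start + partition_size - 1, high)
        start += partition_size


def row_hashes(conn, low, high):
    """Return {id: sha256 of the row} for rows whose id falls in [low, high]."""
    query = "SELECT id, name FROM customers WHERE id BETWEEN ? AND ?"
    return {
        row_id: hashlib.sha256(repr((row_id, name)).encode()).hexdigest()
        for row_id, name in conn.execute(query, (low, high))
    }


def compare_in_partitions(source, target, partition_size):
    """Compare row hashes one key range at a time and collect mismatched ids.

    Ranges are derived from the source only, so target rows outside the
    source's key range would not be detected by this simplified sketch.
    """
    mismatched = []
    for low, high in key_ranges(source, partition_size):
        src = row_hashes(source, low, high)
        tgt = row_hashes(target, low, high)
        for key in sorted(src.keys() | tgt.keys()):
            if src.get(key) != tgt.get(key):
                mismatched.append(key)
    return mismatched


source = make_db([(i, f"name-{i}") for i in range(1, 11)])
# Target has one changed row (id 4) and one missing row (id 9).
target = make_db(
    [(i, "changed" if i == 4 else f"name-{i}") for i in range(1, 11) if i != 9]
)
mismatched = compare_in_partitions(source, target, partition_size=3)
print(mismatched)
```

Peak memory here scales with the partition size rather than the table size, at the cost of more queries and the extra step of choosing partition boundaries, which is the operational overhead the listing mentions.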