An orchestration platform for developing, deploying, and monitoring data pipelines and assets.
Generate comprehensive data quality profiles and exploratory data analysis reports for Pandas and Spark DataFrames with a single line of code.
Generate comprehensive data quality profiling and exploratory data analysis reports for Pandas and Spark DataFrames with a single line of code.
A tool that enables data analysts and engineers to transform data using software engineering best practices.
A transformation workflow that enables data teams to transform data in their warehouse using SQL and software engineering best practices.
A unified open-source metadata platform for data discovery, observability, and governance with column-level lineage and team collaboration.
An open-source data-centric AI library for automatically detecting and fixing data quality issues in machine learning datasets.
A Python library for data quality testing and validation using expressive, extensible Expectations.
An open-source Python framework to evaluate, test, and monitor ML and LLM systems with 100+ built-in metrics.
An open-source feature store for managing and serving machine learning features for training and online inference.
An open-source tool that transforms object storage into a Git-like repository for versioned, atomic, and repeatable data lake operations.
A Python library that uses machine learning for accurate and scalable fuzzy matching, record deduplication, and entity resolution on structured data.
A flexible and expressive API for performing statistical data validation on dataframe-like objects.
A Python library for visualizing missing data in pandas DataFrames using matrix, bar, heatmap, and dendrogram plots.
A library built on Apache Spark for defining unit tests to measure data quality in large datasets.
A fast tool for comparing datasets within or across SQL databases to identify differences.
A tool that automatically visualizes any dataset with a single line of code, including data quality assessment and fixes.
A Go library for email verification without sending emails, featuring syntax validation, SMTP checks, disposable email detection, and domain typo suggestions.
A Python library that automatically extracts schema, statistics, and sensitive entities (PII/NPI) from datasets.
A unified data pipeline tool for ingestion, transformation with SQL/Python/R, and data quality checks across major platforms.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.