A high-performance data profiler for discovering and validating complex patterns like functional dependencies, inclusion dependencies, and association rules.
Desbordante is a high-performance data profiler that discovers and validates complex patterns in datasets, such as functional dependencies, inclusion dependencies, and association rules. It helps users uncover hidden relationships, improve data quality, and prepare data for analysis by identifying errors, duplicates, and integrity constraints.
Data scientists, data engineers, and researchers who need to perform deep data profiling, ensure data quality, or explore datasets for scientific or business insights. It is also suitable for database administrators looking to recover schema constraints.
Developers choose Desbordante for its extensive support of over 20 pattern types, high-performance dynamic algorithms, and flexible interfaces (console, Python, web). Its ability to explain validation failures and support real-world data cleaning scenarios sets it apart from simpler profiling tools.
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.
Supports over 20 pattern types, including exact/approximate functional dependencies, inclusion dependencies, and association rules, with linked Colab notebooks for each, enabling deep data exploration.
Offers dynamic validation that incrementally updates results after data changes, which can be up to several orders of magnitude faster than static recomputation, as highlighted in the task definitions.
Provides console, Python bindings with pandas DataFrame integration, and a web application, allowing adaptation to various workflows, though the web app is limited in scope.
Validation tasks return not just true/false but also explanations like conflicting rows or values, aiding in debugging data quality issues, as emphasized in the pattern descriptions.
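To make the validation and dynamic-update ideas above concrete, here is a minimal sketch in plain Python. It does not use Desbordante's API: the `FDValidator` class and its methods are hypothetical names invented for illustration, and the sample rows are made up. It checks a functional dependency `zip -> city`, reports which rows conflict rather than only true/false, and processes a newly inserted row without rescanning the earlier ones.

```python
from collections import defaultdict


class FDValidator:
    """Toy validator for a functional dependency lhs -> rhs (illustrative only)."""

    def __init__(self, lhs, rhs):
        self.lhs = tuple(lhs)
        self.rhs = tuple(rhs)
        # Maps each LHS value combination to {RHS value combination: [row ids]}.
        self.groups = defaultdict(lambda: defaultdict(list))
        # LHS value combinations that currently map to more than one RHS value.
        self.conflicting = set()
        self.next_id = 0

    def add_row(self, row):
        """Index one row (a dict); earlier rows are not rescanned."""
        key = tuple(row[c] for c in self.lhs)
        val = tuple(row[c] for c in self.rhs)
        self.groups[key][val].append(self.next_id)
        if len(self.groups[key]) > 1:
            self.conflicting.add(key)
        self.next_id += 1

    def holds(self):
        """The dependency holds when no LHS group has conflicting RHS values."""
        return not self.conflicting

    def violations(self):
        """Return {lhs_values: {rhs_values: [row ids]}} for each conflicting group."""
        return {key: dict(self.groups[key]) for key in self.conflicting}


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "10001", "city": "New York"},
]

validator = FDValidator(lhs=["zip"], rhs=["city"])
for r in rows:
    validator.add_row(r)
holds_before = validator.holds()

# A later insert introduces a conflict; only the new row is processed.
validator.add_row({"zip": "94105", "city": "Oakland"})
holds_after = validator.holds()
conflicts = validator.violations()
```

In this sketch `conflicts` groups row ids by the differing `city` values for the offending `zip`, which is the kind of explanation a data-quality investigation needs. Real profilers use far more sophisticated data structures, so treat this only as an illustration of the concept.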
The web interface currently supports only a limited number of patterns and is described as more of an interactive demo, with time and memory limits enforced in the deployed version.
Pip installation may fail on unsupported systems, requiring manual building with specific compiler versions (e.g., GCC 10+) and Boost libraries, which can be cumbersome and error-prone.
Patterns are based on academic research, and the README admits a lack of comprehensive guides, directing users to research papers for understanding, which may deter non-expert users.
Financial data platform for analysts, quants and AI agents.
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.
Extremely fast query engine for DataFrames, written in Rust.