A high-performance data profiler for discovering and validating complex patterns in datasets, enabling data cleaning and quality analysis.
Desbordante is a high-performance, science-intensive data profiler. It automatically discovers and validates a wide variety of complex patterns and dependencies in tabular data, such as functional dependencies and inclusion dependencies. It helps users understand the structure of their data, ensure data quality, and uncover hidden relationships for tasks such as data cleaning, schema matching, and feature engineering.
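To make the two named dependency types concrete, here is a minimal pure-Python sketch of what it means for each to hold on a small table. It is an illustration of the definitions only. It is not Desbordante's API or algorithm, and the helper names `fd_holds` and `ind_holds` and the sample data are hypothetical.

```python
# Hypothetical helpers illustrating dependency definitions; NOT Desbordante's API.

def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds on rows.

    rows: list of dicts (column name -> value); lhs: tuple of column names;
    rhs: one column name. The FD holds when each combination of lhs values
    is paired with exactly one rhs value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs values, different rhs value: violation
        seen.setdefault(key, row[rhs])
    return True


def ind_holds(dependent, referenced):
    """Return True if every value in `dependent` also occurs in `referenced`
    (a unary inclusion dependency: dependent is a subset of referenced)."""
    return set(dependent) <= set(referenced)


# Hypothetical sample data.
rows = [
    {"zip": "190000", "city": "Saint Petersburg"},
    {"zip": "190000", "city": "Saint Petersburg"},
    {"zip": "101000", "city": "Moscow"},
]
print(fd_holds(rows, ("zip",), "city"))                    # True
print(ind_holds([1, 2, 2], [1, 2, 3]))                     # True: every order references a known customer id
```

A profiler like Desbordante goes further than this. It searches for *all* such dependencies over many column combinations rather than checking one candidate that a user supplies.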
Data scientists, data engineers, and researchers who need to perform deep data profiling, ensure data quality, or use advanced data dependency discovery for analysis, cleaning, or machine learning preparation.
Developers choose Desbordante for the wide range of data patterns it supports, its high-performance C++ core, and its practical multi-interface approach (CLI, Python, Web). Its main distinguishing feature is its implementation of dynamic algorithms and of complex, research-backed patterns not commonly found in other profiling tools, which makes it well suited to sophisticated data analysis scenarios.
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows users to run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.
Supports over 20 complex pattern types including functional dependencies, inclusion dependencies, and denial constraints, enabling deep data analysis beyond basic profiling.
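Denial constraints are the least familiar of the three pattern types listed. A denial constraint forbids any pair of rows for which a given set of predicates is true at the same time. The sketch below validates one such constraint by brute force over all ordered row pairs. The constraint "an employee with a higher salary must not pay less tax", the function name `dc_violations`, and the data are all hypothetical. The sketch is not how Desbordante discovers or validates denial constraints.

```python
from itertools import permutations

# Hypothetical brute-force validator; NOT Desbordante's implementation.

def dc_violations(rows, predicates):
    """Return all ordered row-index pairs (i, j) where every predicate holds.

    Each predicate is a function p(t, s) -> bool over two rows. Because a
    denial constraint forbids pairs satisfying all of its predicates at
    once, every returned pair is a violation.
    """
    return [
        (i, j)
        for (i, t), (j, s) in permutations(enumerate(rows), 2)
        if all(p(t, s) for p in predicates)
    ]


rows = [
    {"name": "A", "salary": 5000, "tax": 500},
    {"name": "B", "salary": 6000, "tax": 600},
    {"name": "C", "salary": 7000, "tax": 550},
]
# Forbidden: t earns more than s, yet t pays less tax than s.
salary_tax_dc = [
    lambda t, s: t["salary"] > s["salary"],
    lambda t, s: t["tax"] < s["tax"],
]
print(dc_violations(rows, salary_tax_dc))  # [(2, 1)]: C earns more than B but pays less tax
```

This pairwise check is quadratic in the number of rows. That cost is one reason dedicated algorithms are used for these patterns at scale.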
Offers dynamic algorithms that incrementally update results after the data changes, which can be orders of magnitude faster than recomputing them from scratch.
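The idea behind dynamic algorithms can be shown with a toy incremental validator for a single functional dependency. It keeps per-group state, so an inserted or deleted row only touches the group that row belongs to, and the rest of the table is never rescanned. This is a simplified sketch of the general technique under that assumption. It is not one of Desbordante's dynamic algorithms, which maintain discovery results for many dependencies. The class name `IncrementalFD` is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy class illustrating incremental maintenance; NOT Desbordante's code.

class IncrementalFD:
    """Incrementally track whether the FD lhs -> rhs holds.

    For each lhs key, keeps a Counter of its rhs values. The FD holds when
    no key maps to more than one distinct rhs value. Updates touch only the
    affected key, so their cost does not grow with the table size.
    """

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.groups = defaultdict(Counter)  # lhs key -> Counter of rhs values
        self.bad_keys = set()               # keys with more than one distinct rhs value

    def _key(self, row):
        return tuple(row[col] for col in self.lhs)

    def _refresh(self, key):
        group = self.groups[key]
        if len(group) > 1:
            self.bad_keys.add(key)
        else:
            self.bad_keys.discard(key)
            if not group:
                del self.groups[key]  # drop empty groups after deletions

    def insert(self, row):
        key = self._key(row)
        self.groups[key][row[self.rhs]] += 1
        self._refresh(key)

    def delete(self, row):
        key = self._key(row)
        group = self.groups[key]
        group[row[self.rhs]] -= 1
        if group[row[self.rhs]] <= 0:
            del group[row[self.rhs]]
        self._refresh(key)

    def holds(self):
        return not self.bad_keys


fd = IncrementalFD(("zip",), "city")
fd.insert({"zip": "101000", "city": "Moscow"})
fd.insert({"zip": "101000", "city": "Moskow"})
print(fd.holds())  # False: one zip now maps to two cities
fd.delete({"zip": "101000", "city": "Moskow"})
print(fd.holds())  # True again, without rescanning the table
```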
Provides a console CLI for basic tasks, Python bindings for integration into data pipelines, and a web app for interactive exploration, catering to diverse workflows.
Includes demo scenarios for typo detection, deduplication, and anomaly detection, showing how to build real-world cleaning pipelines using discovered patterns.
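A common idea behind typo detection with discovered patterns is to look at a functional dependency that *almost* holds. Rows that disagree with the overwhelming majority of their group are likely data-entry errors. The sketch below implements that simplified heuristic. It is not the code of Desbordante's typo-detection scenario, and the function name `suspected_typos`, the `min_share` threshold, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical heuristic sketch; NOT Desbordante's typo-detection scenario.

def suspected_typos(rows, lhs, rhs, min_share=0.75):
    """Flag rows that break an almost-holding FD lhs -> rhs.

    Rows are grouped by their lhs values. In any group where a single rhs
    value covers at least `min_share` of the rows (but not all of them), the
    other rows are reported as (row index, found value, likely correct value).
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[col] for col in lhs)].append(i)

    flagged = []
    for indices in groups.values():
        counts = Counter(rows[i][rhs] for i in indices)
        majority, freq = counts.most_common(1)[0]
        if freq < len(indices) and freq / len(indices) >= min_share:
            flagged.extend(
                (i, rows[i][rhs], majority)
                for i in indices
                if rows[i][rhs] != majority
            )
    return flagged


rows = [
    {"zip": "101000", "city": "Moscow"},
    {"zip": "101000", "city": "Moscow"},
    {"zip": "190000", "city": "Saint Petersburg"},
    {"zip": "101000", "city": "Moscow"},
    {"zip": "101000", "city": "Moskow"},
    {"zip": "190000", "city": "Leningrad"},
]
print(suspected_typos(rows, ("zip",), "city"))  # [(4, 'Moskow', 'Moscow')]
```

The 190000 group is not flagged because its two values are evenly split, so there is no clear majority to treat as correct. Choosing that threshold is the kind of judgment a real cleaning pipeline has to expose to the user.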
The web application only supports a subset of patterns and is described as more of an interactive demo, reducing its utility for comprehensive profiling tasks.
Building from source requires a C++ compiler and specific Boost versions, and pip install may fail on unsupported systems, as noted in the installation troubleshooting section.
Users must familiarize themselves with complex pattern definitions, often requiring reading research papers, which can be daunting for non-experts.