How do I install Desbordante on Windows?

The pip package primarily targets Linux and macOS. On Windows you may need to build from source under WSL or a similar environment, because the installation instructions list GCC or Clang as required compilers.

How does Desbordante compare with Great Expectations for data profiling?

Desbordante excels at discovering complex data dependencies, such as functional and inclusion dependencies, using high-performance algorithms. Great Expectations focuses on rule-based data validation and testing. Choose Desbordante for in-depth pattern mining and Great Expectations for declarative quality checks.

How do I use Desbordante for functional dependency discovery in Python?

Import the desbordante module, load your data into a pandas DataFrame, and run the discovery algorithm for the pattern you need. The Colab notebooks linked in the README demonstrate this for both exact and approximate functional dependencies.
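To make the difference between exact and approximate functional dependencies concrete, here is a minimal pure-Python sketch. It does not use Desbordante's API: the function `fd_error` and the sample rows are hypothetical, and the error measure (the fraction of rows that would have to be dropped for the dependency to hold) is one common choice that is assumed here for illustration only.

```python
from collections import Counter, defaultdict


def fd_error(rows, lhs, rhs):
    """Fraction of rows to drop so that lhs -> rhs holds (0.0 means exact)."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        groups[key][tuple(row[c] for c in rhs)] += 1
    # Within each LHS group keep the most common RHS value; the rest violate.
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return 1 - kept / len(rows) if rows else 0.0


# Hypothetical sample data, not taken from the Desbordante examples.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Cy"},
    {"zip": "94105", "city": "San Fransisco", "name": "Di"},  # typo
]

exact = fd_error(rows[:3], ["zip"], ["city"])  # zip -> city holds exactly
approx = fd_error(rows, ["zip"], ["city"])     # one row breaks it
print(exact, approx)
```

An exact dependency has error 0.0; an approximate one is accepted when its error stays under a chosen threshold. For real workloads, use the library calls shown in the Colab notebooks rather than this sketch.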

What are dynamic algorithms in Desbordante, and when should I use them?

Dynamic algorithms update discovery results incrementally after the data changes instead of recomputing them from scratch. They suit scenarios with frequent small updates, such as streaming data or iterative data cleaning.
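The idea of incremental updating can be sketched in a few lines of pure Python. This is not Desbordante's implementation or API: `IncrementalFDChecker` is a hypothetical class that tracks whether a single functional dependency still holds as rows are inserted and deleted, doing constant work per change instead of rescanning the table.

```python
from collections import Counter, defaultdict


class IncrementalFDChecker:
    """Tracks whether lhs -> rhs holds while rows are inserted and deleted."""

    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
        self.groups = defaultdict(Counter)  # LHS value -> counts of RHS values
        self.violating_groups = 0           # groups with 2+ distinct RHS values

    def _keys(self, row):
        return (tuple(row[c] for c in self.lhs),
                tuple(row[c] for c in self.rhs))

    def insert(self, row):
        left, right = self._keys(row)
        group = self.groups[left]
        before = len(group)
        group[right] += 1
        if before == 1 and len(group) == 2:
            self.violating_groups += 1  # group just became inconsistent

    def delete(self, row):
        left, right = self._keys(row)
        group = self.groups.get(left)
        if not group or group[right] == 0:
            raise KeyError("row was never inserted")
        before = len(group)
        group[right] -= 1
        if group[right] == 0:
            del group[right]
        if before == 2 and len(group) == 1:
            self.violating_groups -= 1  # group is consistent again
        if not group:
            del self.groups[left]

    def holds(self):
        return self.violating_groups == 0


checker = IncrementalFDChecker(["zip"], ["city"])
history = []
checker.insert({"zip": "10001", "city": "New York"})
checker.insert({"zip": "94105", "city": "San Francisco"})
history.append(checker.holds())
bad = {"zip": "94105", "city": "San Fransisco"}
checker.insert(bad)
history.append(checker.holds())
checker.delete(bad)
history.append(checker.holds())
print(history)
```

Each insert or delete touches only the group of rows sharing the same left-hand-side value, which is why small updates stay cheap however large the table grows.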

Can Desbordante handle large datasets efficiently?

Yes. Desbordante is built on a high-performance C++ core with optimized algorithms, though performance varies by pattern type. Dynamic algorithms can significantly speed up updates, and the web version may impose memory and time limits.

How do I validate denial constraints with Desbordante?

Use the Python bindings or the CLI to specify the denial constraint and the dataset. Desbordante returns a boolean result and, if the constraint fails, lists the conflicting rows, as shown in the denial constraints example notebook.
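As a rough illustration of what such a check computes, the sketch below validates a denial constraint by brute force in pure Python. It is not Desbordante's API or algorithm: `dc_violations`, the predicate format, and the sample rows are hypothetical. A denial constraint states that no pair of rows may satisfy all of its predicates at once.

```python
import operator
from itertools import permutations


def dc_violations(rows, predicates):
    """Return ordered index pairs (i, j) whose rows satisfy every predicate.

    Each predicate is (column_of_t, comparison, column_of_s), where t and s
    are the first and second row of the pair.
    """
    return [
        (i, j)
        for (i, t), (j, s) in permutations(enumerate(rows), 2)
        if all(op(t[a], s[b]) for a, op, b in predicates)
    ]


# Hypothetical rule: within one state, a lower salary must not be taxed
# at a higher rate, i.e. NOT (t.state == s.state AND t.salary < s.salary
# AND t.tax > s.tax).
constraint = [
    ("state", operator.eq, "state"),
    ("salary", operator.lt, "salary"),
    ("tax", operator.gt, "tax"),
]

rows = [
    {"state": "NY", "salary": 3000, "tax": 0.20},
    {"state": "NY", "salary": 4000, "tax": 0.25},
    {"state": "NY", "salary": 5000, "tax": 0.15},
]

violations = dc_violations(rows, constraint)
holds = not violations
print(holds, violations)
```

The pairwise scan is quadratic in the number of rows, so it is only suitable for small tables; it is meant to show the shape of the result (a boolean plus the conflicting row pairs), not how a production validator works.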

Desbordante — Data Pattern Discovery & Validation

What is Desbordante?

Desbordante is a high-performance, science-intensive data profiler. It automatically discovers and validates a wide variety of complex patterns and dependencies in tabular data, such as functional dependencies and inclusion dependencies. This helps users understand data structure, ensure data quality, and uncover hidden relationships for tasks like error cleaning, schema matching, and feature engineering.

Target Audience

Data scientists, data engineers, and researchers who need to perform deep data profiling, ensure data quality, or use advanced data dependency discovery for analysis, cleaning, or machine learning preparation.

Value Proposition

Developers choose Desbordante for the breadth of data patterns it supports, its high-performance C++ core, and its practical multi-interface approach (CLI, Python, Web). It stands out for implementing dynamic algorithms and complex, research-backed patterns that are not commonly found in other profiling tools, which makes it well suited to sophisticated data analysis scenarios.

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets you run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.

Use Cases

Best For

Discovering all functional dependencies in a database to infer primary keys and relationships
Validating specific business rules or constraints as denial constraints on a dataset
Building automated data cleaning pipelines for typo detection and deduplication
Performing exploratory data analysis to generate hypotheses from scientific or business data
Incrementally updating dependency results after small data changes using dynamic algorithms
Preparing training data for machine learning by identifying relevant features and constraints

Not Ideal For

Teams requiring a fully-featured, no-code web interface for all supported patterns
Projects needing immediate, simple data quality reports without understanding complex pattern types
Environments with restricted system dependencies where C++ compilation or specific Boost versions are not feasible
Users looking for a tool that provides pre-built, standard data profiling metrics without algorithm selection or parameter tuning

Pros & Cons

Pros

Extensive Pattern Library

Supports over 20 complex pattern types including functional dependencies, inclusion dependencies, and denial constraints, enabling deep data analysis beyond basic profiling.

High-Performance Dynamic Updates

Offers dynamic algorithms that incrementally update results after data changes, providing orders-of-magnitude speedups over static recomputation.

Flexible Multi-Interface Access

Provides a console CLI for basic tasks, Python bindings for integration into data pipelines, and a web app for interactive exploration, catering to diverse workflows.

Practical Data Cleaning Workflows

Includes demo scenarios for typo detection, deduplication, and anomaly detection, showing how to build real-world cleaning pipelines using discovered patterns.

Cons

Limited Web Interface

The web application only supports a subset of patterns and is described as more of an interactive demo, reducing its utility for comprehensive profiling tasks.

Complex Installation Process

Building from source requires C++ compilation and specific Boost versions, and pip install can fail on unsupported systems, as noted in the installation troubleshooting.

Steep Conceptual Learning Curve

Users must familiarize themselves with complex pattern definitions, often requiring reading research papers, which can be daunting for non-experts.

