Question 1

How does Desbordante compare to Great Expectations for data profiling?

Accepted Answer

Desbordante focuses on discovering complex patterns such as functional and inclusion dependencies, which suits deep data exploration. Great Expectations is geared toward defining and testing data quality expectations, and is the more user-friendly of the two. Desbordante emphasizes performance and includes dynamic algorithms, but its academic origins give it a steeper learning curve.
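To make the difference between discovering patterns and testing expectations concrete, here is a minimal plain-Python sketch. It uses neither the Desbordante nor the Great Expectations API; the helper names `fd_holds` and `discover_single_column_fds` and the sample rows are hypothetical, and the brute-force search only illustrates the idea, not how either tool works internally.

```python
from itertools import permutations


def fd_holds(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`."""
    seen = {}
    for row in rows:
        # Remember the first rhs value seen for each lhs value and
        # fail as soon as the same lhs value maps to a different rhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def discover_single_column_fds(rows, columns):
    """Discovery style: enumerate every single-column FD that holds."""
    return [(a, b) for a, b in permutations(columns, 2) if fd_holds(rows, a, b)]


# Hypothetical sample data.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "60601", "city": "Chicago", "name": "Ann"},
]
columns = ["zip", "city", "name"]

# Discovery: nothing is known up front; the data reveals the rules.
found = discover_single_column_fds(rows, columns)
print(found)

# Expectation style: a rule is written down first, then tested.
assert fd_holds(rows, "zip", "city")
```

In the discovery style the rules come out of the data. In the expectation style a person states the rule and the tool checks it.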

Question 2

How do I use Desbordante for data deduplication?

Accepted Answer

Desbordante provides a demo scenario for data deduplication using pattern discovery, such as approximate unique column combinations. You can build ad-hoc Python programs that incorporate Desbordante's algorithms, as shown in the linked Colab notebook and expert examples folder.
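The idea behind using an approximate unique column combination for deduplication can be sketched in plain Python. This is not Desbordante's API or its demo scenario. The helpers `uniqueness_error` and `duplicate_groups` are hypothetical, and the error measure (the fraction of rows to remove before the columns become unique) is one simple choice that may differ from the measure Desbordante uses.

```python
from collections import defaultdict


def uniqueness_error(rows, columns):
    """Fraction of rows that must be removed so `columns` becomes unique."""
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[c] for c in columns)] += 1
    # Every row beyond the first in a group is a surplus row.
    extra = sum(n - 1 for n in counts.values())
    return extra / len(rows) if rows else 0.0


def duplicate_groups(rows, columns):
    """Return lists of row indices that share the same values in `columns`."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[c] for c in columns)].append(i)
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical sample data: rows 0 and 1 look like the same person.
rows = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0199"},
    {"name": "Bob Roy", "email": "bob@example.com", "phone": "555-0101"},
    {"name": "Cy Diaz", "email": "cy@example.com", "phone": "555-0102"},
]
key = ["name", "email"]

error = uniqueness_error(rows, key)
groups = duplicate_groups(rows, key)
print(error, groups)
```

A column combination with a small but nonzero error is almost a key, so the groups that break its uniqueness are natural candidates for manual or automated merging.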

Question 3

What are approximate functional dependencies and how are they validated in Desbordante?

Accepted Answer

Approximate functional dependencies tolerate a bounded amount of violating data, with the error quantified by a metric such as g1 or μ+. Desbordante validates one by checking whether a given pattern instance holds. When it does not, Desbordante returns an explanation of the failure, such as the conflicting rows. The pattern list and notebooks provide detailed examples.
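As a worked illustration of the g1 metric, the sketch below computes it in plain Python and also collects the conflicting rows. It does not call Desbordante. The helpers `g1_error` and `conflicting_rows` are hypothetical, and dividing by n² follows the common Kivinen and Mannila definition, which is an assumption here since Desbordante's implementation may normalize differently.

```python
from collections import Counter, defaultdict


def g1_error(rows, lhs, rhs):
    """Ordered row pairs that agree on `lhs` but differ on `rhs`, divided by n^2."""
    n = len(rows)
    if n == 0:
        return 0.0
    by_lhs = defaultdict(Counter)
    for row in rows:
        by_lhs[tuple(row[c] for c in lhs)][tuple(row[c] for c in rhs)] += 1
    violating = 0
    for rhs_counts in by_lhs.values():
        size = sum(rhs_counts.values())
        # All ordered pairs in the group minus the pairs that agree on rhs.
        agreeing = sum(k * k for k in rhs_counts.values())
        violating += size * size - agreeing
    return violating / (n * n)


def conflicting_rows(rows, lhs, rhs):
    """Map each lhs value with more than one rhs value to its row indices."""
    indices = defaultdict(list)
    values = defaultdict(set)
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in lhs)
        indices[key].append(i)
        values[key].add(tuple(row[c] for c in rhs))
    return {k: indices[k] for k in indices if len(values[k]) > 1}


# Hypothetical sample data: row 2 contradicts rows 0 and 1.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Newark"},
    {"zip": "60601", "city": "Chicago"},
]

error = g1_error(rows, ["zip"], ["city"])
conflicts = conflicting_rows(rows, ["zip"], ["city"])
print(error, conflicts)
```

Here the group for zip 10001 contributes 3² - (2² + 1²) = 4 violating ordered pairs out of 4² = 16, so the dependency zip → city holds approximately for any error threshold of at least 0.25.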

Question 4

Is Desbordante suitable for real-time data processing?

Accepted Answer

While Desbordante offers dynamic algorithms for incremental updates, it's primarily designed for batch processing on static datasets. The web app has processing limits, so for high-throughput real-time streaming, other specialized tools might be more appropriate.

Question 5

How do I install Desbordante on Windows?

Accepted Answer

The README focuses on Ubuntu and macOS and does not provide native Windows instructions. On Windows, you may need to use Windows Subsystem for Linux (WSL) or build from source with a compatible toolchain, which can be complex and is not supported out of the box.

Question 6

Can Desbordante handle graph data for profiling?

Accepted Answer

Yes, Desbordante supports graph functional dependencies for discovering patterns in graph data, as listed in the pattern types with references to research papers. However, this requires understanding specialized definitions and may involve additional setup.
