Question 1

How to ingest metadata from Snowflake into DataHub?

Accepted Answer

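Install the Snowflake connector with 'pip install acryl-datahub[snowflake]', write a YAML recipe that specifies your Snowflake connection details and a sink, and run it with the DataHub CLI ('datahub ingest -c recipe.yml', where recipe.yml is your recipe file). The README provides a full example that extracts schemas, lineage, and usage stats; the same recipe can also be run programmatically through the Python SDK.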
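As a rough illustration, a recipe can be expressed as a Python dictionary and handed to the ingestion pipeline. This is a minimal sketch, not the README's example: the environment variable names, the placeholder values, the sink URL, and the helper names build_snowflake_recipe and run_recipe are all assumptions, and the Snowflake config keys should be checked against the connector documentation for your DataHub version.

```python
import os


def build_snowflake_recipe(gms_url: str = "http://localhost:8080") -> dict:
    """Build a recipe dict with the same shape as a YAML recipe file.

    All values are placeholders (assumptions); credentials are read from
    hypothetical environment variables rather than hard-coded.
    """
    return {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": os.environ.get("SNOWFLAKE_ACCOUNT_ID", "<account_id>"),
                "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE", "<warehouse>"),
                "username": os.environ.get("SNOWFLAKE_USER", "<user>"),
                "password": os.environ.get("SNOWFLAKE_PASSWORD", "<password>"),
                "role": os.environ.get("SNOWFLAKE_ROLE", "<role>"),
                # Request lineage and usage statistics in addition to schemas.
                "include_table_lineage": True,
                "include_usage_stats": True,
            },
        },
        # Send the extracted metadata to a DataHub instance over REST.
        "sink": {"type": "datahub-rest", "config": {"server": gms_url}},
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe; requires 'pip install acryl-datahub[snowflake]'."""
    # Imported lazily so the recipe can be built and inspected without the package.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # raise if the run reported failures
```

Calling run_recipe(build_snowflake_recipe()) would start the ingestion. Keeping the recipe as plain data makes it easy to review or dump to YAML for use with 'datahub ingest -c'.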

Question 2

DataHub vs OpenMetadata: which metadata catalog is better for my team?

Accepted Answer

DataHub excels at real-time, stream-based metadata updates and at AI agent integration via MCP, which makes it a good fit for large-scale, dynamic environments. OpenMetadata may be simpler for batch-based workflows and ships with built-in data quality features. The choice comes down to whether you need real-time updates more than ease of setup.

Question 3

What are the hardware requirements for running DataHub in production?

Accepted Answer

Self-hosted deployments require significant resources: at least 8GB RAM for the Docker quickstart, plus additional capacity for Kafka, Elasticsearch, and MySQL in Kubernetes setups. The README recommends Helm charts for production, which implies that you also take on the overhead of operating that supporting infrastructure.

Question 4

Is DataHub free to use in a commercial environment?

Accepted Answer

Yes. DataHub is released under the Apache 2.0 license, which permits commercial use, modification, and distribution at no cost. The managed SaaS offering (DataHub Cloud) is paid, however, and some advanced features may be available only in the commercial product.

Question 5

How does DataHub integrate with AI agents like Claude or Cursor?

Accepted Answer

DataHub integrates with AI agents through its Model Context Protocol (MCP) server, which you can set up with 'npx -y @acryldata/mcp-server-datahub init'. Once connected, an agent can answer natural-language queries against your metadata, as shown in the README's demo GIF and configuration examples.
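For clients that are configured by hand, MCP servers are usually registered in a JSON file under an "mcpServers" key. The sketch below generates such an entry. The launch command ('uvx mcp-server-datahub@latest'), the environment variable names DATAHUB_GMS_URL and DATAHUB_GMS_TOKEN, the server URL, and the helper name build_mcp_client_config are assumptions, not taken from the README text quoted here; verify them against the MCP server's own documentation before use.

```python
import json


def build_mcp_client_config(gms_url: str, token: str) -> dict:
    """Return an MCP client config entry for a DataHub server.

    The command, args, and env names are assumptions; adjust them to
    whatever the MCP server documentation specifies.
    """
    return {
        "mcpServers": {
            "datahub": {
                "command": "uvx",
                "args": ["mcp-server-datahub@latest"],
                "env": {
                    "DATAHUB_GMS_URL": gms_url,
                    "DATAHUB_GMS_TOKEN": token,
                },
            }
        }
    }


# Placeholder values; substitute your own DataHub URL and access token.
config = build_mcp_client_config("http://localhost:8080", "<personal-access-token>")
print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's MCP configuration file. Where that file lives depends on the client (Claude Desktop, Cursor, and so on).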

Question 6

Can DataHub be deployed on AWS ECS or only Kubernetes?

Accepted Answer

Kubernetes (via Helm) is the recommended production deployment, but the README also mentions Docker-based options. Running on AWS ECS is possible, but there is no out-of-the-box support comparable to the Helm charts, so you must orchestrate services such as Kafka and Elasticsearch yourself.
