Question 1

How to integrate Amundsen with Apache Airflow for metadata ingestion?

Accepted Answer

Use the amundsendatabuilder library to write Airflow DAGs that extract metadata from your sources and load it into Amundsen. The README points to example DAGs in the databuilder/example/dags directory. Running these DAGs on a schedule gives you automated, recurring metadata updates for Amundsen's search and metadata services.

Question 2

Amundsen vs DataHub: which is better for a data catalog?

Accepted Answer

Amundsen stands out for its usage-based search ranking and its proven scalability at companies like Lyft, while DataHub offers a more unified architecture and stronger real-time capabilities. Choose Amundsen if search relevance based on usage patterns is your priority, but evaluate both against your specific integration needs.

Question 3

Does Amundsen support real-time metadata updates?

Accepted Answer

No. Amundsen primarily relies on batch ingestion via databuilder scripts or Airflow, so metadata updates are not real-time. For near-real-time needs, you would have to build custom ingestion jobs that run frequently; this is not available out of the box.
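One common way to make a frequent batch job cheap enough to run every few minutes is to ingest only what changed since the last run. The sketch below illustrates that watermark pattern with plain Python. It is not Amundsen or databuilder code: `TableChange`, `ingest_changes`, and the in-memory `catalog` dict are hypothetical stand-ins for a source system and the metadata store.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableChange:
    """Hypothetical record of a metadata change in a source system."""
    table: str
    description: str
    updated_at: int  # epoch seconds


def ingest_changes(changes: List[TableChange],
                   catalog: Dict[str, str],
                   watermark: int) -> int:
    """Apply only changes newer than `watermark` and return the new watermark."""
    newest = watermark
    for change in changes:
        if change.updated_at > watermark:
            catalog[change.table] = change.description
            newest = max(newest, change.updated_at)
    return newest


# Simulate two scheduled runs against the same source.
catalog: Dict[str, str] = {}
source = [
    TableChange("rides", "Completed rides", 100),
    TableChange("drivers", "Driver profiles", 200),
]
watermark = ingest_changes(source, catalog, 0)

# A later change arrives; the second run picks up only that record.
source.append(TableChange("rides", "Completed and cancelled rides", 300))
watermark = ingest_changes(source, catalog, watermark)
```

In a real deployment the watermark would need to be persisted between runs, and the scheduling interval still bounds how fresh the catalog can be.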

Question 4

Can Amundsen connect to cloud data warehouses like Snowflake or Databricks?

Accepted Answer

Yes, Amundsen has built-in connectors for Snowflake, BigQuery, and Databricks SQL, as listed in the supported integrations. It uses dbapi or sql_alchemy interfaces, making it compatible with most major databases.
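To show what extracting metadata through a dbapi interface looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module, which implements DB-API 2.0. It is an illustration only, not one of Amundsen's extractors: the function name `extract_table_metadata` and the `rides` table are made up for the example, and the catalog queries are SQLite-specific (other databases expose table and column metadata differently).

```python
import sqlite3


def extract_table_metadata(conn):
    """Return {table_name: [(column_name, column_type), ...]} for a SQLite connection."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    tables = [row[0] for row in cur.fetchall()]

    metadata = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cur.execute(f'PRAGMA table_info("{table}")')
        metadata[table] = [(row[1], row[2]) for row in cur.fetchall()]
    return metadata


# Demo against an in-memory database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (ride_id INTEGER PRIMARY KEY, city TEXT, fare REAL)")
metadata = extract_table_metadata(conn)
print(metadata)
```

The same cursor-based pattern (connect, query the system catalog, collect table and column records) is what makes any DB-API-compatible database a candidate source.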

Question 5

What are the hardware requirements for running Amundsen in production?

Accepted Answer

Amundsen runs as separate services for the frontend, search (Elasticsearch), and metadata (Neo4j or Apache Atlas), so production requirements depend on your data volume. The README specifies Python >=3.8 and Node v12, but those are software prerequisites; for hardware, plan for significant memory and CPU as the catalog scales.

Question 6

How customizable is the search algorithm in Amundsen?

Accepted Answer

The search service uses Elasticsearch with a default page-rank style ranking. Customizing it requires modifying the search library code or the Elasticsearch configuration. There is no simple UI for tuning rankings, so expect some development effort.
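As a rough idea of what an Elasticsearch-level customization could look like, the sketch below builds a `function_score` query that adds a usage-based boost to the text relevance score. This is not Amundsen's actual query: the helper `build_usage_boosted_query` and the field names `name`, `description`, and `total_usage` are assumptions for illustration, and you would need to check them against your own index mapping.

```python
import json


def build_usage_boosted_query(term, usage_field="total_usage"):
    """Build an Elasticsearch query body that adds log-scaled usage to the text score."""
    return {
        "query": {
            "function_score": {
                # Text relevance: match the term, weighting the name field higher.
                "query": {
                    "multi_match": {
                        "query": term,
                        "fields": ["name^3", "description"],
                    }
                },
                # Usage boost: log10(1 + usage); documents without the field get 0.
                "field_value_factor": {
                    "field": usage_field,
                    "modifier": "log1p",
                    "missing": 0,
                },
                # Add the usage boost to the text score instead of multiplying.
                "boost_mode": "sum",
            }
        }
    }


query = build_usage_boosted_query("rides")
print(json.dumps(query, indent=2))
```

Using `sum` rather than `multiply` keeps tables with no recorded usage searchable, since a multiplied boost of zero would wipe out their text score.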
