A metadata-driven data discovery and catalog platform that helps data teams find, understand, and trust their data resources.
Amundsen is an open-source data discovery and metadata platform that helps organizations catalog and search their data assets. It indexes tables, dashboards, and ML features from various sources, then provides a search interface ranked by usage to help data teams find and understand data faster. Think of it as Google search for your company's data.
Data engineers, data analysts, and data scientists in organizations with large, distributed data ecosystems who need to discover, understand, and trust data resources.
Developers choose Amundsen because it's an open-source, extensible alternative to commercial data catalogs, with a proven track record at companies like Lyft and Square. Its usage-based ranking and broad connector ecosystem make it particularly effective for improving data discovery at scale.
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Supports connectors for over 20 databases (e.g., BigQuery, Snowflake), dashboards (e.g., Tableau, Superset), and tools like Airflow, as listed in the README, enabling comprehensive metadata aggregation.
Implements a PageRank-style search that surfaces frequently used tables and dashboards first, directly addressing the core pain point of data discovery: finding the right resource quickly.
Built as separate services (frontend, search, metadata) with pluggable backends like Neo4j and Elasticsearch, allowing teams to customize and scale components independently.
Provides detailed pages for tables, columns, and dashboards with statistics, lineage, and descriptions, as shown in the UI mockups, enhancing data understanding.
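To make the usage-based ranking mentioned above more concrete, here is a minimal, illustrative sketch. It is not Amundsen's actual ranking logic, which runs inside its search backend; the `TableDoc` class, the `total_usage` field, and the scoring formula are all assumptions chosen for illustration. The idea is simply that among resources matching a query, the ones read more often float to the top.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TableDoc:
    """Hypothetical search document; field names are illustrative only."""
    name: str
    description: str
    total_usage: int  # assumed: how many times the table was queried


def score(doc: TableDoc, query: str) -> float:
    """Count matching query terms, then boost by log-scaled usage."""
    text = f"{doc.name} {doc.description}".lower()
    matched = sum(1 for term in query.lower().split() if term in text)
    if matched == 0:
        return 0.0
    # log1p dampens the boost so very popular tables do not drown out relevance.
    return matched * (1.0 + math.log1p(doc.total_usage))


def search(docs: List[TableDoc], query: str) -> List[str]:
    """Return names of matching documents, highest score first."""
    hits = [(score(d, query), d) for d in docs]
    hits = [(s, d) for s, d in hits if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d.name for _, d in hits]


docs = [
    TableDoc("rides_raw", "raw ride events", 3),
    TableDoc("rides_daily", "daily ride aggregates", 900),
    TableDoc("users", "user accounts", 5000),
]
ranked = search(docs, "ride")
print(ranked)
```

In this toy data set both `rides_raw` and `rides_daily` match the query equally well, so the heavily used table ranks first, while `users` is excluded despite its high usage because it does not match at all.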
Requires managing multiple microservices, dependencies (e.g., Neo4j, Elasticsearch), and Python/Node environments, making initial setup and maintenance resource-intensive.
Metadata ingestion relies on scheduled scripts or Airflow DAGs via the databuilder library, which may not support real-time updates and requires separate orchestration.
The frontend is built with Flask and React, so significant UI changes or branding adjustments require development effort; no low-code configuration option is highlighted.
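The batch-ingestion limitation noted above can be sketched as follows. This is a generic extract, transform, and load loop written under stated assumptions; it does not use the databuilder API, and `TableMetadata`, `run_ingestion_job`, and the key format are hypothetical names invented for illustration. It shows why the catalog is only as fresh as the last scheduled run: each run loads a snapshot of the source, and nothing changes in between.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical metadata record; not a databuilder model."""
    database: str
    schema: str
    name: str
    description: str = ""

    @property
    def key(self) -> str:
        # Illustrative key format, assumed for this sketch.
        return f"{self.database}.{self.schema}.{self.name}"


def run_ingestion_job(
    extract: Callable[[], Iterable[TableMetadata]],
    transform: Callable[[TableMetadata], TableMetadata],
    store: Dict[str, TableMetadata],
) -> int:
    """Run one batch: pull a snapshot, clean each record, upsert into the store."""
    loaded = 0
    for record in extract():
        cleaned = transform(record)
        store[cleaned.key] = cleaned  # upsert: a later run overwrites an earlier one
        loaded += 1
    return loaded


def extract_snapshot() -> Iterable[TableMetadata]:
    # Stand-in for querying a warehouse's information schema.
    return [
        TableMetadata("warehouse", "core", "rides", "  One row per ride  "),
        TableMetadata("warehouse", "core", "users", "One row per user"),
    ]


def strip_description(record: TableMetadata) -> TableMetadata:
    return replace(record, description=record.description.strip())


catalog: Dict[str, TableMetadata] = {}
count = run_ingestion_job(extract_snapshot, strip_description, catalog)
print(count, sorted(catalog))
```

A scheduler (cron, or an Airflow DAG as the listing mentions) would call a job like this at fixed intervals; a table created just after one run stays invisible in the catalog until the next run.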