An open-source metadata service for collecting, aggregating, and visualizing data lineage and ecosystem metadata.
Marquez is an open-source metadata service that collects, aggregates, and visualizes metadata for data ecosystems. It solves the problem of tracking data lineage and understanding dependencies between datasets and jobs, providing visibility into data workflows and centralizing dataset lifecycle management.
Data engineers, data platform teams, and organizations managing complex data pipelines who need to track data provenance, monitor job performance, and ensure data governance.
Developers choose Marquez for its adherence to the OpenLineage standard, which makes it a vendor-neutral, interoperable option for metadata collection, and for its rich Web UI for lineage visualization and comprehensive APIs for integration.
Collect, aggregate, and visualize a data ecosystem's metadata
Adheres to the OpenLineage specification for vendor-neutral interoperability, as emphasized in the project philosophy, ensuring compatibility with a growing ecosystem of data tools.
Offers a Web UI with an interactive lineage graph for visually exploring dependencies between jobs and datasets, as shown in the quickstart demo GIF, which aids data discovery.
Provides an HTTP API, a GraphQL endpoint (beta), and administrative endpoints for flexible integration and metadata querying, as detailed in the documentation.
Marquez is a graduated project of the LF AI & Data Foundation, and its continuous integration badges and community support suggest active maintenance.
The GraphQL endpoint is marked as beta in the README, indicating it may lack stability or full feature parity, posing a risk for production use.
The HTTP API does not require authentication or authorization by default, as noted in the README, necessitating extra configuration for secure deployments.
Requires Java 17, PostgreSQL 14, and manual configuration of marquez.yml, as outlined in the building and configuration steps, which can be cumbersome compared to managed alternatives.
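To make the OpenLineage-based collection and the HTTP API mentioned above more concrete, here is a minimal sketch of building an OpenLineage run event and posting it to a Marquez instance. It assumes a local deployment with the API listening on http://localhost:5000 and accepting events at /api/v1/lineage; the helper names (build_run_event, send_event), the "example" namespace, the producer URL, and the schema version in schemaURL are illustrative assumptions, not values taken from the project. No credentials are attached, which matches the default of an unauthenticated API noted above.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

# Assumptions: a local Marquez instance with its HTTP API on port 5000
# and the OpenLineage collection endpoint at /api/v1/lineage.
MARQUEZ_URL = "http://localhost:5000"
LINEAGE_PATH = "/api/v1/lineage"


def build_run_event(event_type, job_name, inputs, outputs,
                    namespace="example", run_id=None):
    """Build a minimal OpenLineage run event as a plain dict.

    START and COMPLETE events for the same run should share one run_id.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        # Hypothetical producer URI identifying the emitting tool.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; adjust to the version you target.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json"
                     "#/definitions/RunEvent",
    }


def send_event(event, base_url=MARQUEZ_URL):
    """POST the event as JSON and return the HTTP status code."""
    req = request.Request(
        base_url + LINEAGE_PATH,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

With a Marquez instance running, calling send_event(build_run_event("START", "my-job", ["raw_orders"], ["clean_orders"], run_id=some_id)) followed by a matching "COMPLETE" event with the same run_id should make the job and its datasets appear in the lineage graph. In a deployment that has been secured, the request would also need whatever credentials that setup requires.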