A high-performance, geo-distributed, and federated open data catalog for unified metadata management across diverse data and AI assets.
Apache Gravitino is an open-source data catalog that acts as a federated metadata lake. It unifies metadata management across diverse data sources and AI assets, providing high-performance, geo-distributed access and governance. It solves the problem of fragmented metadata in complex, multi-cloud, and hybrid data architectures.
Gravitino is aimed at data engineers, platform architects, and organizations that manage large-scale, distributed data lakes and warehouses and need unified metadata governance and discovery. It also suits teams that integrate multiple query engines, such as Trino and Spark, and need consistent metadata access across them.
Developers choose Gravitino for its direct metadata integration, which reflects changes in real time; its ability to federate metadata across regions and clouds without engine modifications; and its governance features, including access control and auditing for both data and AI assets.
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Manages diverse metadata sources like Hive, MySQL, HDFS, and S3 through a single API, simplifying governance and discovery across disparate systems.
Changes in underlying systems are immediately reflected via connectors, ensuring metadata consistency without manual updates or delays.
Seamlessly integrates with query engines like Trino and Spark without modifying SQL dialects, enabling plug-and-play access to federated metadata.
Enables metadata sharing across regions and clouds, supporting global architectures and hybrid multi-cloud setups for distributed data environments.
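The "single API" and "immediately reflected" points above can be pictured with a small sketch. This is not Gravitino's API: `Connector`, `InMemorySource`, and `FederatedCatalog` are hypothetical names invented for illustration, and the sketch assumes that a connector reads the source system on every call rather than keeping a copied snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class Connector(Protocol):
    """Hypothetical interface: anything that can list its table names live."""

    def list_tables(self) -> List[str]: ...


@dataclass
class InMemorySource:
    """Stand-in for an external system such as Hive or MySQL (illustrative only)."""

    tables: List[str] = field(default_factory=list)

    def list_tables(self) -> List[str]:
        # Read the source's current state on every call; nothing is cached.
        return sorted(self.tables)


@dataclass
class FederatedCatalog:
    """Hypothetical single entry point over many registered sources."""

    connectors: Dict[str, Connector] = field(default_factory=dict)

    def register(self, name: str, connector: Connector) -> None:
        self.connectors[name] = connector

    def list_tables(self, catalog: str) -> List[str]:
        # Delegate to the connector instead of serving a stored copy.
        return [f"{catalog}.{t}" for t in self.connectors[catalog].list_tables()]


hive = InMemorySource(["orders"])
mysql = InMemorySource(["users"])

lake = FederatedCatalog()
lake.register("hive_prod", hive)
lake.register("mysql_crm", mysql)

before = lake.list_tables("hive_prod")
hive.tables.append("clicks")  # a change made directly in the source system
after = lake.list_tables("hive_prod")

print(before)  # ['hive_prod.orders']
print(after)   # ['hive_prod.clicks', 'hive_prod.orders']
```

Because the catalog delegates each lookup, the table added in the source shows up on the next call with no sync step. A real deployment would add caching, authentication, and error handling that this sketch leaves out.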
The README lists AI asset management as a work in progress (WIP), so capabilities such as AI model lineage and tracking may be incomplete.
Building from source is not supported on Windows, which can limit development and deployment options for teams working in Windows-based environments.
The recommended quick start involves Docker Compose and manual configuration, which can be daunting for new users and requires infrastructure expertise.