A graph database framework for storing and querying large-scale graphs with rich properties and in-database aggregation.
Gaffer is a graph database framework built for storing and querying very large graphs with rich properties on nodes and edges. It solves the challenge of handling massive, evolving graph datasets by offering high-performance ingest, automatic in-database aggregation of statistical properties, and flexible querying capabilities. The framework is designed to be scalable and extensible, supporting backends like Accumulo and integration with Apache Spark for advanced analysis.
Gaffer is aimed at data engineers and developers working with large-scale graph data who need to store, aggregate, and query complex entity-relationship datasets with rich properties. It is particularly suited to organizations that handle high-velocity data ingest and require built-in data governance features.
Developers choose Gaffer for its ability to automatically aggregate rich statistical properties directly in the database, its support for high-throughput data ingest, and its flexibility in storage backends and querying. Its extensible design and integration with tools like Apache Spark make it a powerful solution for prototyping and production-scale graph data applications.
A large-scale entity and relation database supporting aggregation of properties
Supports very large graphs with rich properties on nodes and edges using backends like Accumulo, enabling storage of complex datasets as highlighted in the README.
Allows continual high-rate ingest and batch processing via MapReduce or Spark, making it suitable for evolving data streams, per the feature list.
Provides user-configurable aggregation of statistical properties such as counts and histograms directly in the database, reducing post-processing overhead.
Includes fine-grained access controls, policy hooks for compliance, and automated data removal rules, which support security and data governance requirements.
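To make the idea of in-database aggregation concrete, here is a minimal conceptual sketch in Python. It is not Gaffer's API: the `EdgeSummary` class, the `ingest` function, and the in-memory `store` dictionary are hypothetical names chosen for illustration. It assumes that edges sharing the same source, destination, and group are merged at ingest time by summing counts and combining histograms, so a query reads one pre-aggregated summary instead of many raw records.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EdgeSummary:
    """Hypothetical aggregated properties kept for one edge key."""
    count: int = 0
    histogram: Counter = field(default_factory=Counter)

    def merge(self, other: "EdgeSummary") -> None:
        # Counts are summed; histogram buckets are added bucket by bucket.
        self.count += other.count
        self.histogram.update(other.histogram)


def ingest(store, source, dest, group, count, histogram):
    """Merge a new observation into the summary stored under its edge key."""
    key = (source, dest, group)
    incoming = EdgeSummary(count, Counter(histogram))
    store.setdefault(key, EdgeSummary()).merge(incoming)


# Illustrative in-memory stand-in for the database (an assumption, not a real backend).
store = {}
ingest(store, "A", "B", "calls", 1, {"09:00": 1})
ingest(store, "A", "B", "calls", 2, {"09:00": 1, "10:00": 1})
ingest(store, "A", "C", "calls", 1, {"09:00": 1})

for key, summary in store.items():
    print(key, summary.count, dict(summary.histogram))
```

In this sketch the three ingested observations collapse into two stored summaries, which is the kind of reduction in post-processing work that the aggregation feature is meant to provide.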
The project is no longer under active maintenance, as stated in the README, meaning no updates, bug fixes, or security patches, posing significant risks.
Requires setting up Java, Maven, and storage backends like Accumulo, with dependencies on the Hadoop ecosystem and limited Windows support, increasing setup complexity.
Defining schemas and aggregation rules takes significant effort, which can be a barrier to rapid prototyping, as implied by the documentation.
Focuses on storage and aggregation rather than providing built-in graph algorithms, often necessitating integration with Spark for advanced analytics.