A Clojure library for writing map-reduce queries that compile to Apache Pig or Cascading, enabling distributed data processing with Clojure syntax.
PigPen is a Clojure library for writing distributed map-reduce queries in Clojure syntax; the queries compile to Apache Pig or Cascading for execution on big data clusters. It bridges Clojure's expressiveness and scalable data processing frameworks, so functional data transformations can be written without switching languages.
It is aimed at Clojure developers working with large-scale data processing who want to use Apache Pig or Cascading without learning Pig Latin or Java APIs, and at data engineers building ETL pipelines in a functional style.
Developers choose PigPen because it lets them write distributed data processing logic in pure Clojure and test and debug it locally, while still benefiting from the scalability of established big data frameworks like Pig and Cascading.
Map-Reduce for Clojure
Lets developers express map-reduce jobs with Clojure's syntax and functions, through operators such as map and reduce in the API, instead of learning Pig Latin or Java APIs.
Includes a local mode for testing queries from Clojure's REPL against ordinary Clojure data structures, enabling faster iteration without a cluster setup, as highlighted in the tutorial.
Compiles queries to either Apache Pig or Cascading, providing choice in execution engines based on cluster infrastructure, with separate dependencies for each backend.
Supports loading and storing data in multiple formats including Parquet, Avro, JSON, CSV, and TSV, with dedicated loaders and storage functions for ease of use.
Certain features, such as the Parquet and Avro loaders, are supported only with the Pig backend, not Cascading, as noted in the README, which limits cross-backend compatibility.
Generated Pig or Cascading scripts are not intended for human consumption, making it difficult to debug or optimize at the script level, which can hinder low-level tuning.
Release history includes breaking changes, such as the API changes in version 0.3.0, which can disrupt existing codebases and require careful migration, as detailed in the release notes.
Requires familiarity with Clojure, adding a learning curve for teams not already using the language, as the README strongly recommends prior Clojure knowledge.
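As a rough illustration of the Clojure-style operators and local mode described above, the sketch below follows the general shape of the PigPen tutorial. It assumes the PigPen library is on the classpath; the function name small-sums and the sample rows are hypothetical, chosen only for this example.

```clojure
;; Assumes the PigPen dependency (pigpen.core) is on the classpath.
(require '[pigpen.core :as pig])

;; Hypothetical query: parse two numeric fields, add them, keep small sums.
(defn small-sums [data]
  (->> data
       (pig/map (fn [[a b c]]
                  {:sum  (+ (Integer/valueOf a) (Integer/valueOf b))
                   :name c}))
       (pig/filter (fn [{:keys [sum]}]
                     (< sum 5)))))

;; Local mode: pig/return wraps in-memory sample rows as a data source,
;; and pig/dump runs the query in-process, with no cluster needed.
(def result
  (pig/dump (small-sums (pig/return [["1" "2" "foo"]
                                     ["4" "5" "bar"]]))))
```

Only the first sample row has a sum below 5, so it is the only one that should survive the filter. For a cluster run, the same query function would typically be fed by a loader such as pig/load-tsv instead of pig/return.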