PigPen

Apache-2.0 · Clojure

A Clojure library for writing map-reduce queries that compile to Apache Pig or Cascading, enabling distributed data processing with Clojure syntax.

565 stars · 51 forks · 0 contributors

What is PigPen?

PigPen is a Clojure library for writing distributed map-reduce queries in Clojure syntax. Queries compile to Apache Pig or Cascading for execution on big data clusters, so functional data transformations can run on those frameworks without switching languages.

Target Audience

Clojure developers working with large-scale data processing who want to use Apache Pig or Cascading without learning Pig Latin or the Java APIs, and data engineers who build ETL pipelines in a functional style.

Value Proposition

PigPen lets developers write distributed data processing logic in Clojure and test and debug it locally, while execution still runs on established big data frameworks such as Pig and Cascading.

Overview

Map-Reduce for Clojure

Use Cases

Best For

  • Writing ETL pipelines in Clojure for Hadoop clusters
  • Processing large datasets with functional transformations
  • Migrating existing Pig or Cascading jobs to a Clojure codebase
  • Testing map-reduce logic locally before deploying to production
  • Integrating Parquet or Avro data formats into Clojure applications
  • Building data processing workflows with a REPL-driven development cycle

Not Ideal For

  • Teams requiring direct editing or optimization of generated Pig or Cascading scripts
  • Projects built around Apache Spark or other modern big data frameworks
  • Organizations without existing Clojure expertise
  • Applications needing real-time data processing

Pros & Cons

Pros

Clojure-Native Query Writing

Queries are written with Clojure syntax and functions, so there is no need to learn Pig Latin or the Java APIs. The API exposes functional operators such as map and reduce.
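As a rough illustration of this style, the sketch below keeps the per-record logic in ordinary Clojure functions and applies them with clojure.core's map and filter. The function names, the sample rows, and the threshold of 5 are hypothetical. The PigPen calls inside the comment form (pig/load-tsv, pig/map, pig/filter, pig/store-tsv) follow the names used in the PigPen README, but they are an assumption here and should be checked against the version in use.

```clojure
(ns example.pigpen-sketch)

;; Plain Clojure functions hold the per-record logic, so they can be
;; exercised at the REPL with no cluster and no PigPen dependency.
(defn parse-row
  "Turns a TSV row of strings [a b name] into a map with the numeric sum."
  [[a b name]]
  {:sum  (+ (Long/parseLong a) (Long/parseLong b))
   :name name})

(defn small-sum?
  "Keeps records whose :sum is below 5 (an arbitrary illustrative threshold)."
  [{:keys [sum]}]
  (< sum 5))

;; Local check with clojure.core's map and filter on in-memory data.
(def local-result
  (->> [["1" "2" "foo"] ["4" "5" "bar"]]
       (map parse-row)
       (filter small-sum?)
       vec))

(comment
  ;; Assumed PigPen equivalent; requires the PigPen dependency on the
  ;; classpath. File names are placeholders.
  (require '[pigpen.core :as pig])
  (->> (pig/load-tsv "input.tsv")
       (pig/map parse-row)
       (pig/filter small-sum?)
       (pig/store-tsv "output.tsv")))
```

The pipeline has the same shape in both forms; only the operators change from clojure.core to the PigPen namespace.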

Local Development and Testing

A local mode runs queries against ordinary Clojure data structures from the REPL, so queries can be tested and iterated on without a cluster. The tutorial covers this workflow.
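A minimal sketch of what such a local test might look like, using clojure.test. The word-lengths function and its data are hypothetical. The PigPen form in the comment assumes pig/return (wrap in-memory data as a relation) and pig/dump (run a query locally and return the results) as described in the PigPen tutorial; treat those names as assumptions to confirm for your version.

```clojure
(ns example.local-test-sketch
  (:require [clojure.test :refer [deftest is]]))

(defn word-length
  "Hypothetical per-record transformation: pairs a word with its length."
  [w]
  {:word w :length (count w)})

(defn word-lengths
  "Applies word-length to every word in an in-memory collection."
  [words]
  (mapv word-length words))

;; An ordinary unit test: no cluster and no external files are involved.
(deftest word-lengths-test
  (is (= [{:word "pig" :length 3} {:word "pen" :length 3}]
         (word-lengths ["pig" "pen"]))))

(comment
  ;; Assumed PigPen analogue; requires the PigPen dependency.
  (require '[pigpen.core :as pig])
  (pig/dump (pig/map word-length (pig/return ["pig" "pen"]))))
```

Keeping the transformation in a named function means the same code can be checked with in-memory data first and then handed to the distributed query.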

Backend Flexibility

Queries compile to either Apache Pig or Cascading, so the execution engine can match the cluster infrastructure. Each backend ships as a separate dependency.

Data Format Integration

Loads and stores data in Parquet, Avro, JSON, CSV, and TSV, with dedicated loader and storage functions.

Cons

Partial Backend Support

Some features, such as the Parquet and Avro loaders, work only with the Pig backend and not with Cascading, as the README notes. This limits portability between backends.

Opaque Generated Code

The generated Pig or Cascading scripts are not intended to be read by humans, which makes debugging or tuning at the script level difficult.

Breaking Changes

The release history includes breaking changes, such as the API changes in version 0.3.0 described in the release notes. Upgrades can break existing code and need careful migration.

Clojure Dependency

PigPen requires familiarity with Clojure, which adds a learning curve for teams not already using the language. The README strongly recommends prior Clojure knowledge.

Quick Stats

Stars: 565
Forks: 51
Contributors: 0
Open issues: 19
Last commit: 3 years ago
Created: 2013

Tags

#clojure #big-data #data-pipelines #data-processing #etl #distributed-computing #map-reduce

Built With

  • Parquet
  • Clojure
  • RxJava
  • Avro
