A fast, fully-featured, and developer-friendly Clojure API for Apache Spark.
Sparkling is a Clojure library that provides a native API for Apache Spark, allowing developers to write distributed data processing jobs using idiomatic Clojure code. It solves the problem of integrating Spark's powerful big data engine with Clojure's functional programming paradigm, offering performance optimizations and a developer-friendly interface.
Clojure developers and data engineers who need to perform large-scale data processing, ETL tasks, or analytics using Apache Spark without leaving the Clojure ecosystem.
Developers choose Sparkling for its Clojure-native design, which reduces boilerplate and improves performance over generic wrappers, while providing full access to Spark's features like Spark SQL, Avro support, and JDBC connectivity.
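A minimal sketch of what a Sparkling job can look like, assuming the `sparkling.core` and `sparkling.conf` namespaces as documented by the project; the local master setting and app name are illustrative placeholders:

```clojure
(ns example.core
  (:require [sparkling.conf :as conf]
            [sparkling.core :as spark]))

;; Build a local Spark configuration; "local[*]" and the app name
;; are placeholder values for this sketch.
(def c (-> (conf/spark-conf)
           (conf/master "local[*]")
           (conf/app-name "sparkling-example")))

(defn -main []
  (spark/with-context sc c
    ;; Plain Clojure functions and predicates drive the RDD pipeline.
    (->> (spark/parallelize sc (range 1000))
         (spark/filter even?)
         (spark/map inc)
         (spark/collect)
         (take 5)
         (println))))
```

Note that the pipeline reads like ordinary threaded Clojure; no Java interop or anonymous inner classes are involved.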
A Clojure library for Apache Spark: fast, fully-featured, and developer-friendly
Uses familiar Clojure functions and data structures for Spark operations, reducing boilerplate compared to Java interop, as shown in the sample code with pure Clojure predicates.
Eliminates reflection calls and preserves partitioner information, yielding faster execution and more efficient job plans; the release notes claim roughly a twofold speedup over earlier versions.
Reads from JDBC databases, Avro files, text files, and Clojure collections, offering flexibility in data ingestion without leaving the Clojure ecosystem.
Includes features like RDD autonaming from function metadata and deref support for broadcasts, enhancing debugging and unit testing, as noted in the 1.2.3 release notes.
Supports Spark SQL for structured data processing, expanding use cases to analytical queries, with added support in version 2.0.0 for Spark 2.0.
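The broadcast-deref behavior noted above can be sketched as follows; `spark/broadcast`, `spark/text-file`, and `spark/filter` are taken from the documented `sparkling.core` API, and the file path is a placeholder:

```clojure
(ns example.broadcast
  (:require [clojure.string :as str]
            [sparkling.conf :as conf]
            [sparkling.core :as spark]))

(def c (-> (conf/spark-conf)
           (conf/master "local[*]")
           (conf/app-name "broadcast-example")))

(defn count-allowed-lines []
  (spark/with-context sc c
    ;; Ship a lookup set to every executor once via a broadcast variable.
    (let [allowed (spark/broadcast sc #{"INFO" "WARN"})]
      (->> (spark/text-file sc "/path/to/logs.txt") ; placeholder path
           ;; Deref support: the broadcast value is read with @, so a
           ;; plain Clojure set-membership check serves as the predicate.
           (spark/filter (fn [line]
                           (contains? @allowed
                                      (first (str/split line #"\s+")))))
           (spark/count)))))
```

Because broadcasts support `deref`, the same predicate can also be exercised in a unit test against an ordinary atom or delay, without a Spark context.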
Requires ahead-of-time compilation for specific namespaces like sparkling.serialization, complicating build setup and potentially breaking REPL workflows, as admitted in the README.
Has introduced breaking changes in updates, such as in version 1.2.1 with Kryo registration overhaul, which can disrupt existing codebases and require migration efforts.
For deployment to clusters with pre-installed Spark, dependencies must be set to 'provided', adding an extra step and potential misconfiguration risks, as mentioned in the 1.1.1 notes.
As a Clojure-specific wrapper, it has a smaller community and fewer third-party resources compared to Scala or Python Spark APIs, which may slow down troubleshooting and adoption.
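The AOT and "provided" requirements above translate into build configuration. A hedged `project.clj` sketch, where the version numbers, artifact coordinates, and application namespace are illustrative:

```clojure
;; project.clj (illustrative versions and namespace names)
(defproject example "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [gorillalabs/sparkling "2.0.0"]]
  ;; Sparkling requires ahead-of-time compilation of its
  ;; serialization namespace (plus your entry point).
  :aot [sparkling.serialization
        example.core]
  ;; Mark Spark itself as :provided so it is not bundled into the
  ;; uberjar when the target cluster already ships Spark.
  :profiles {:provided
             {:dependencies
              [[org.apache.spark/spark-core_2.11 "2.1.0"]]}})
```

Building with `lein with-profile provided uberjar` then produces a jar suitable for `spark-submit` against a cluster with a pre-installed Spark distribution.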