A fast, fully-featured, and developer-friendly Clojure API for Apache Spark.
Sparkling is a Clojure library that provides a native API for Apache Spark, allowing developers to write distributed data processing jobs using idiomatic Clojure code. It solves the problem of integrating Spark's powerful big data engine with Clojure's functional programming paradigm, offering performance optimizations and a developer-friendly interface.
Clojure developers and data engineers who need to perform large-scale data processing, ETL tasks, or analytics using Apache Spark without leaving the Clojure ecosystem.
Developers choose Sparkling for its Clojure-native design, which reduces boilerplate and improves performance over generic wrappers, while providing full access to Spark's features like Spark SQL, Avro support, and JDBC connectivity.
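A minimal sketch of what a Sparkling job can look like, assuming the `sparkling.core` and `sparkling.conf` namespaces as documented by the project; the local master setting and app name are illustrative placeholders:

```clojure
(ns example.core
  (:require [sparkling.conf :as conf]
            [sparkling.core :as spark]))

;; Build a local Spark configuration; "local[*]" and the app name
;; are placeholder values for this sketch.
(def c (-> (conf/spark-conf)
           (conf/master "local[*]")
           (conf/app-name "sparkling-example")))

(defn -main []
  (spark/with-context sc c
    ;; Plain Clojure functions and predicates drive the RDD pipeline.
    (->> (spark/parallelize sc (range 1000))
         (spark/filter even?)
         (spark/map inc)
         (spark/collect)
         (take 5)
         (println))))
```

Note that the pipeline reads like ordinary threaded Clojure; no Java interop or anonymous inner classes are involved.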
A Clojure library for Apache Spark: fast, fully-featured, and developer-friendly
Uses familiar Clojure functions and data structures for Spark operations, reducing boilerplate compared to Java interop, as shown in the sample code with pure Clojure predicates.
Eliminates reflection calls and preserves partitioner information, yielding faster execution and more efficient job plans; the release notes claim roughly a twofold speedup over earlier versions.
Reads from JDBC databases, Avro files, text files, and Clojure collections, offering flexibility in data ingestion without leaving the Clojure ecosystem.
Includes features like RDD autonaming from function metadata and deref support for broadcasts, enhancing debugging and unit testing, as noted in the 1.2.3 release notes.
Supports Spark SQL for structured data processing, expanding use cases to analytical queries, with added support in version 2.0.0 for Spark 2.0.
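The broadcast-deref behavior noted above can be sketched as follows; `spark/broadcast`, `spark/text-file`, and `spark/filter` are taken from the documented `sparkling.core` API, and the file path is a placeholder:

```clojure
(ns example.broadcast
  (:require [clojure.string :as str]
            [sparkling.conf :as conf]
            [sparkling.core :as spark]))

(def c (-> (conf/spark-conf)
           (conf/master "local[*]")
           (conf/app-name "broadcast-example")))

(defn count-allowed-lines []
  (spark/with-context sc c
    ;; Ship a lookup set to every executor once via a broadcast variable.
    (let [allowed (spark/broadcast sc #{"INFO" "WARN"})]
      (->> (spark/text-file sc "/path/to/logs.txt") ; placeholder path
           ;; Deref support: the broadcast value is read with @, so a
           ;; plain Clojure set-membership check serves as the predicate.
           (spark/filter (fn [line]
                           (contains? @allowed
                                      (first (str/split line #"\s+")))))
           (spark/count)))))
```

Because broadcasts support `deref`, the same predicate can also be exercised in a unit test against an ordinary atom or delay, without a Spark context.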
Requires ahead-of-time compilation for specific namespaces like sparkling.serialization, complicating build setup and potentially breaking REPL workflows, as admitted in the README.
Has introduced breaking changes in updates, such as in version 1.2.1 with Kryo registration overhaul, which can disrupt existing codebases and require migration efforts.
For deployment to clusters with pre-installed Spark, dependencies must be set to 'provided', adding an extra step and potential misconfiguration risks, as mentioned in the 1.1.1 notes.
As a Clojure-specific wrapper, it has a smaller community and fewer third-party resources compared to Scala or Python Spark APIs, which may slow down troubleshooting and adoption.
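The AOT and "provided" requirements above translate into build configuration. A hedged `project.clj` sketch, where the version numbers, artifact coordinates, and application namespace are illustrative:

```clojure
;; project.clj (illustrative versions and namespace names)
(defproject example "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [gorillalabs/sparkling "2.0.0"]]
  ;; Sparkling requires ahead-of-time compilation of its
  ;; serialization namespace (plus your entry point).
  :aot [sparkling.serialization
        example.core]
  ;; Mark Spark itself as :provided so it is not bundled into the
  ;; uberjar when the target cluster already ships Spark.
  :profiles {:provided
             {:dependencies
              [[org.apache.spark/spark-core_2.11 "2.1.0"]]}})
```

Building with `lein with-profile provided uberjar` then produces a jar suitable for `spark-submit` against a cluster with a pre-installed Spark distribution.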