Question 1

How do I install Flambo with Leiningen for Spark 2.x?

Accepted Answer

Add [yieldbot/flambo "0.8.2"] to your project.clj dependencies and include Spark core in the :provided profile, as per the Installation section. Remember to AOT compile namespaces that require flambo.api for REPL use or deployment.
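As a rough illustration of the setup described above, a project.clj might look like the sketch below. The project name and the Clojure, Spark, and Scala versions are assumptions for illustration only; match them to your own cluster.

```clojure
;; project.clj
;; Hypothetical project name and version. The Clojure, Spark, and Scala
;; versions below are assumptions; adjust them to match your cluster.
(defproject my-flambo-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [yieldbot/flambo "0.8.2"]]
  :profiles {;; Spark is supplied by the cluster at runtime, so it stays
             ;; out of the uberjar.
             :provided {:dependencies [[org.apache.spark/spark-core_2.11 "2.0.0"]]}
             ;; AOT compile all namespaces when building the uberjar.
             :uberjar {:aot :all}})
```

For REPL work, a similar :aot entry under a development profile is one option, listing the namespaces that require flambo.api.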

Question 2

Flambo vs clj-spark: which is better for Clojure and Spark?

Accepted Answer

Flambo is a fork of clj-spark that has seen more development since and has better support for Spark 2.x, as noted in the acknowledgements. Choose Flambo for newer Spark versions, but check recent community activity first, since both are niche projects that receive few updates.

Question 3

Can I use Spark DataFrames with Flambo?

Accepted Answer

Not directly. Flambo primarily supports the RDD API, so DataFrames are not a first-class part of it. If you need DataFrames for optimized queries, you may have to call Spark's own APIs through interop or use another Clojure library.

Question 4

How to handle function serialization errors in Flambo?

Accepted Answer

Use flambo.api/fn or defsparkfn for functions to ensure proper serialization, and configure Kryo as the default serializer. For custom types, extend flambo.kryo.BaseFlamboRegistrator, as described in the Kryo section.
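A minimal sketch of the first suggestion, assuming the flambo.api namespace is aliased as f. The namespace name, the function names, and the sample data are hypothetical.

```clojure
(ns example.serialization
  (:require [flambo.api :as f]
            [flambo.conf :as conf]))

;; defsparkfn defines a named function that can be serialized and
;; shipped to the executors.
(f/defsparkfn square [x] (* x x))

(defn large-squares
  "Squares every element of rdd and keeps the results above 10."
  [rdd]
  (-> rdd
      (f/map square)                     ; named serializable function
      (f/filter (f/fn [x] (> x 10)))))   ; inline serializable function

(comment
  ;; Local smoke test, assuming Spark is on the classpath.
  (def sc (f/spark-context (-> (conf/spark-conf)
                               (conf/master "local")
                               (conf/app-name "serialization-example"))))
  (f/collect (large-squares (f/parallelize sc [1 2 3 4 5]))))
```

The point of the sketch is that every function handed to a transformation is created with f/fn or f/defsparkfn rather than plain fn or defn.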

Question 5

What's the performance overhead of Flambo compared to Scala Spark?

Accepted Answer

Flambo adds a thin layer over Spark's Java API, so performance for RDD operations is similar to Scala Spark, but it does not benefit from the optimizations Spark applies to DataFrames. The overhead is minimal in most cases. Serialization does add some cost, so benchmark performance-critical applications.

Question 6

How to run a Flambo application on a YARN cluster?

Accepted Answer

Set the master URL to 'yarn' in the SparkConf and use spark-submit with your uberjar, as referenced in the master URL table and Standalone Applications section. Consult Spark's YARN documentation for specific configuration details.
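The configuration step might look roughly like this sketch. The namespace, the application name, and the small job inside -main are hypothetical placeholders.

```clojure
(ns example.yarn
  (:require [flambo.api :as f]
            [flambo.conf :as conf])
  (:gen-class))

(defn -main [& _args]
  (let [c  (-> (conf/spark-conf)
               (conf/master "yarn")                  ; run on the YARN cluster
               (conf/app-name "flambo-yarn-example")) ; hypothetical app name
        sc (f/spark-context c)]
    (try
      ;; Placeholder job: count a small in-memory collection.
      (println (f/count (f/parallelize sc [1 2 3 4 5])))
      (finally
        ;; Stop the underlying JavaSparkContext when the job is done.
        (.stop sc)))))
```

With an AOT-compiled uberjar, an invocation along the lines of spark-submit --class example.yarn --master yarn path/to/your-uberjar.jar would launch it (the class name and jar path are placeholders). When the master is passed on the command line this way, the conf/master call can be dropped.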

Question 7

Best practices for caching RDDs in Flambo?

Accepted Answer

Use f/cache for default memory storage or f/persist with levels from flambo.api/STORAGE-LEVELS. Cache an RDD after the transformations that build it when it will be read more than once, for example in iterative algorithms, as shown in the RDD Persistence examples.
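As a small sketch of the reuse pattern, the function below caches an RDD of line lengths and then runs two actions against it. The namespace, function name, and input path are hypothetical, and only f/cache is shown.

```clojure
(ns example.caching
  (:require [flambo.api :as f]))

(defn line-length-stats
  "Returns the total and maximum line length of the text file at path.
  sc is an existing Spark context; path is a hypothetical input file."
  [sc path]
  (let [line-lengths (-> (f/text-file sc path)
                         (f/map (f/fn [s] (count s)))
                         f/cache)]   ; mark the RDD for in-memory reuse
    ;; Both actions read line-lengths; with the cache in place, the second
    ;; one can reuse the stored partitions instead of re-reading the file.
    {:total (f/reduce line-lengths (f/fn [x y] (+ x y)))
     :max   (f/reduce line-lengths (f/fn [x y] (max x y)))}))
```

Without the cache call, each action would recompute the whole lineage from the text file.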
