Question 1

How do I submit a Spark job from outside the docker-spark container?

Accepted Answer

Point the YARN_CONF_DIR environment variable at the yarn-remote-client directory provided in the repository, and export HADOOP_USER_NAME=root so the job can access HDFS inside the container. Then run spark-submit with a YARN master and the configuration described in the README.
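
As a rough illustration of those two steps, here is a minimal Python sketch that prepares the environment and the spark-submit arguments. The repository path, jar name, and class name are hypothetical placeholders, and the `--master yarn --deploy-mode client` form is an assumption on my part; check the README for the exact master URL it expects.

```python
import os
import subprocess
from pathlib import Path


def build_submit(repo_dir, app_jar, main_class, deploy_mode="client"):
    """Return (env, argv) for a spark-submit run against the container's YARN."""
    env = dict(os.environ)
    # Client-side YARN configuration shipped in the repository.
    env["YARN_CONF_DIR"] = str(Path(repo_dir) / "yarn-remote-client")
    # Submit as root so HDFS inside the container can be accessed.
    env["HADOOP_USER_NAME"] = "root"
    argv = [
        "spark-submit",
        "--master", "yarn",            # assumed form; the README may use another
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
    ]
    return env, argv


def submit(repo_dir, app_jar, main_class):
    """Run spark-submit; requires spark-submit on PATH and a running container."""
    env, argv = build_submit(repo_dir, app_jar, main_class)
    return subprocess.run(argv, env=env, check=True)
```

Calling `submit("/path/to/docker-spark", "my-app.jar", "com.example.Main")` (all three values hypothetical) would launch the job, provided the host can reach the container's YARN and HDFS ports.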

Question 2

Is sequenceiq docker-spark still actively maintained?

Accepted Answer

No, the project appears dormant, with no recent updates; it still ships Spark 1.6.0, which was released in 2016. For current versions, consider newer Docker images such as the official Apache Spark images or community-maintained alternatives.

Question 3

Can I use docker-spark on Kubernetes instead of YARN?

Accepted Answer

No, it's pre-configured exclusively for YARN and depends on Hadoop Docker images for YARN support. For Kubernetes, you'd need to modify the setup extensively or use official Spark images with native K8s integration.

Question 4

What are the memory requirements for running docker-spark?

Accepted Answer

The README recommends giving the VM more than 2 GB of memory if you use boot2docker. Actual requirements depend on your Spark applications: larger workloads need more resources, and Docker overhead can affect performance.
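
To get a feel for how application settings drive the total, the sketch below estimates the YARN container size requested per executor. It assumes the Spark-on-YARN default documented for the 1.6 line, where the off-heap overhead is the larger of 384 MB and 10% of the executor memory; a cluster can override this via spark.yarn.executor.memoryOverhead, and YARN may round the request up further, so treat the numbers as a lower bound rather than a measurement.

```python
MIN_OVERHEAD_MB = 384  # assumed default minimum overhead per executor


def executor_container_mb(executor_memory_mb):
    """Executor heap plus the assumed default YARN memory overhead, in MB."""
    overhead = max(MIN_OVERHEAD_MB, executor_memory_mb // 10)
    return executor_memory_mb + overhead


def executors_total_mb(executor_memory_mb, num_executors):
    """Memory requested for all executors (driver and YARN daemons excluded)."""
    return num_executors * executor_container_mb(executor_memory_mb)


# Two 1 GB executors already request more than 2 GB before Hadoop itself is counted.
print(executors_total_mb(1024, 2))
```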

Question 5

docker-spark vs official Apache Spark Docker images: which is better?

Accepted Answer

docker-spark simplifies YARN integration with Hadoop for quick setups, but the official images are more up to date and support several cluster modes. Choose based on whether you need out-of-the-box YARN compatibility or access to the latest Spark features.

Question 6

How do I update the Spark version in docker-spark?

Accepted Answer

You'd need to manually modify the Dockerfile to use a newer Spark version and rebuild the image, but this may break compatibility with Hadoop 2.6.0 and YARN, requiring additional adjustments and testing.
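
The manual edit could be scripted along these lines. This is only a sketch: the sample Dockerfile line and its URL are hypothetical stand-ins, not the repository's actual contents, and the real Dockerfile may reference the version in other places (paths, symlinks, environment variables) that need the same change.

```python
import re


def bump_spark_version(dockerfile_text, old, new):
    """Replace every 'spark-<old>' occurrence with 'spark-<new>'."""
    pattern = re.compile(r"spark-" + re.escape(old))
    return pattern.sub("spark-" + new, dockerfile_text)


# Hypothetical download line, for illustration only.
sample = (
    "RUN curl -s https://example.invalid/spark-1.6.0-bin-hadoop2.6.tgz "
    "| tar -xz -C /usr/local/\n"
)

print(bump_spark_version(sample, "1.6.0", "1.6.3"))
```

After rewriting the file you would rebuild the image with `docker build` and rerun a known job to confirm that the new Spark build still works with the bundled Hadoop and YARN.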
