This project provides a Dockerized version of Apache Spark, pre-configured to run on YARN with Hadoop 2.6.0. It simplifies setting up a Spark environment by packaging everything into a container, making it ideal for development, testing, and reproducible deployments.

## Key Features

- **Pre-configured Spark on YARN**: Includes Apache Spark 1.6.0 and Hadoop 2.6.0, ready to run in YARN-client or YARN-cluster modes.
- **Docker-based Deployment**: Offers Docker images for easy pulling, building, and running, reducing setup complexity.
- **CentOS Base**: Built on a stable CentOS foundation for compatibility and reliability.
- **Remote Submission Support**: Allows submitting Spark jobs from outside the container using environment variables like `YARN_CONF_DIR`.
- **Integrated Testing Examples**: Includes sample commands for testing Spark functionality, such as estimating Pi.

## Philosophy

The project focuses on providing a streamlined, containerized Spark environment that leverages Docker for consistency and ease of use, building on existing Hadoop Docker images to ensure compatibility.
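To make the Pi-estimation test and the `YARN_CONF_DIR`-based remote submission more concrete, here is a minimal Python sketch that assembles a `spark-submit` command line for the bundled SparkPi example in either YARN mode. It only builds the argument list and environment and does not launch anything. The install path `/usr/local/spark`, the examples jar location, and the helper names `build_spark_pi_command` and `build_remote_env` are assumptions made for illustration, not details confirmed by the project.

```python
import os
import posixpath
import shlex

# Assumption: location of the examples jar in a Spark 1.6.0 / Hadoop 2.6.0 install.
EXAMPLES_JAR = "lib/spark-examples-1.6.0-hadoop2.6.0.jar"


def build_spark_pi_command(spark_home, deploy_mode="client", partitions=10):
    """Return the argv list for submitting the SparkPi example to YARN.

    Hypothetical helper: it only assembles arguments, it does not run them.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        posixpath.join(spark_home, "bin", "spark-submit"),
        "--class", "org.apache.spark.examples.SparkPi",
        # Spark 1.x accepts the master names yarn-client and yarn-cluster.
        "--master", "yarn-" + deploy_mode,
        posixpath.join(spark_home, EXAMPLES_JAR),
        str(partitions),
    ]


def build_remote_env(yarn_conf_dir, base_env=None):
    """Return a copy of the environment with YARN_CONF_DIR set.

    For remote submission, YARN_CONF_DIR points at a directory holding the
    cluster's Hadoop/YARN client configuration files.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["YARN_CONF_DIR"] = yarn_conf_dir
    return env


if __name__ == "__main__":
    # Assumption: Spark is installed under /usr/local/spark.
    cmd = build_spark_pi_command("/usr/local/spark", deploy_mode="client")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list and environment can be handed to `subprocess.run(cmd, env=env)` on a machine that has a Spark client installed and network access to the YARN ResourceManager.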

Stars: 757

Forks: 277

Last commit: 5 years ago
