Apache Spark Official Docker images
Ready-to-run Docker images containing Jupyter applications
This project provides a Dockerized version of Apache Spark, pre-configured to run on YARN with Hadoop 2.6.0. It simplifies setting up a Spark environment by packaging everything into a container, making it ideal for development, testing, and reproducible deployments.

## Key Features

- **Pre-configured Spark on YARN**: Includes Apache Spark 1.6.0 and Hadoop 2.6.0, ready to run in YARN-client or YARN-cluster modes.
- **Docker-based Deployment**: Offers Docker images for easy pulling, building, and running, reducing setup complexity.
- **CentOS Base**: Built on a stable CentOS foundation for compatibility and reliability.
- **Remote Submission Support**: Allows submitting Spark jobs from outside the container using environment variables like `YARN_CONF_DIR`.
- **Integrated Testing Examples**: Includes sample commands for testing Spark functionality, such as estimating Pi.

## Philosophy

The project focuses on providing a streamlined, containerized Spark environment that leverages Docker for consistency and ease of use, building on existing Hadoop Docker images to ensure compatibility.
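As a rough illustration of the remote submission and Pi-estimation features listed under Key Features, the sketch below assembles a `spark-submit` invocation for Spark's bundled `SparkPi` example and an environment with `YARN_CONF_DIR` set. It is a minimal sketch and not the project's own tooling: the helper names, the jar path, and the configuration directory are hypothetical placeholders, and the `yarn-client` / `yarn-cluster` master values assume the Spark 1.6 command-line conventions.

```python
import os


def build_spark_pi_command(master="yarn-client",
                           examples_jar="/path/to/spark-examples.jar",
                           slices=10):
    """Build the argv list for submitting the SparkPi example.

    `examples_jar` is a placeholder path (an assumption); point it at the
    Spark examples jar shipped with your installation.
    """
    if master not in ("yarn-client", "yarn-cluster"):
        raise ValueError("master must be 'yarn-client' or 'yarn-cluster'")
    return [
        "spark-submit",
        "--class", "org.apache.spark.examples.SparkPi",
        "--master", master,
        examples_jar,
        str(slices),  # SparkPi accepts an optional number of slices
    ]


def build_submit_env(yarn_conf_dir, base_env=None):
    """Return a copy of the environment with YARN_CONF_DIR set.

    `yarn_conf_dir` should contain the Hadoop/YARN client configuration
    for the cluster running inside the container (hypothetical location).
    """
    env = dict(os.environ if base_env is None else base_env)
    env["YARN_CONF_DIR"] = yarn_conf_dir
    return env


if __name__ == "__main__":
    argv = build_spark_pi_command(master="yarn-cluster")
    env = build_submit_env("/path/to/yarn-remote-client")
    # Only print the command here; to actually submit, something like
    # subprocess.run(argv, env=env, check=True) could be used.
    print("YARN_CONF_DIR=" + env["YARN_CONF_DIR"], " ".join(argv))
```

Keeping command construction separate from execution makes it easy to inspect the arguments before submitting anything to a real cluster.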
An easy-to-set-up Docker image for Apache Spark from Data Mechanics