An easy-to-set-up Docker image for Apache Spark from Data Mechanics
Ready-to-run Docker images containing Jupyter applications
This project provides a Dockerized version of Apache Spark, pre-configured to run on YARN with Hadoop 2.6.0. It simplifies setting up a Spark environment by packaging everything into a container, making it ideal for development, testing, and reproducible deployments.

## Key Features

- **Pre-configured Spark on YARN**: Includes Apache Spark 1.6.0 and Hadoop 2.6.0, ready to run in YARN-client or YARN-cluster modes.
- **Docker-based Deployment**: Offers Docker images for easy pulling, building, and running, reducing setup complexity.
- **CentOS Base**: Built on a stable CentOS foundation for compatibility and reliability.
- **Remote Submission Support**: Allows submitting Spark jobs from outside the container using environment variables like `YARN_CONF_DIR`.
- **Integrated Testing Examples**: Includes sample commands for testing Spark functionality, such as estimating Pi.

## Philosophy

The project focuses on providing a streamlined, containerized Spark environment that leverages Docker for consistency and ease of use, building on existing Hadoop Docker images to ensure compatibility.
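
The Pi-estimation test mentioned among the features relies on a simple Monte Carlo idea: sample random points in a square and count how many fall inside the inscribed circle. The sketch below illustrates that idea in plain Python, without Spark or YARN, so it is not the project's own test command. The function name `estimate_pi` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate Pi by Monte Carlo sampling (illustrative sketch, no Spark involved)."""
    rng = random.Random(seed)  # seeded generator so repeated runs give the same estimate
    inside = 0
    for _ in range(num_samples):
        # Draw a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Count the point if it lies inside the unit circle.
        if x * x + y * y <= 1.0:
            inside += 1
    # Circle area / square area = Pi / 4, so scale the hit ratio by 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000))
```

In a Spark job the sampling loop would typically be split across executors and the counts summed at the end; the single-process version here only shows the arithmetic being checked.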
Apache Spark Official Docker images