A RESTful job server for Apache Spark that provides a service interface for submitting and managing Spark jobs, jars, and contexts.
Spark Job Server provides a RESTful HTTP interface for submitting and managing Apache Spark jobs, jars, and contexts. It lets developers drive Spark clusters through API calls, simplifying job deployment and management while supporting features such as persistent contexts and named object sharing.
Data engineers and developers working with Apache Spark who need a scalable, service-oriented way to submit and manage Spark jobs across clusters, especially in environments requiring REST APIs or multi-tenant job management.
Developers choose Spark Job Server because it offers a production-ready, extensible REST API for Spark, reducing the overhead of job submission and context management while providing enterprise features like authentication, high availability, and support for multiple Spark deployment modes.
REST job server for Apache Spark
Provides a comprehensive HTTP API for submitting, monitoring, and managing Spark jobs; abstracts away cluster-manager details; and supports both synchronous and asynchronous job execution for flexibility.
Compatible with Scala, Java, and Python jobs, allowing diverse teams to use their preferred language without modifying the core infrastructure.
Enables creation of long-running Spark contexts for low-latency job execution and resource sharing, as shown in the WordCount example with reusable contexts.
Facilitates caching and retrieval of RDDs or DataFrames by name, improving data reuse across jobs and reducing computation overhead.
Integrates with LDAP/Shiro and Keycloak for authentication and authorization, making it suitable for secure, multi-tenant environments.
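The REST workflow behind these features (uploading a jar, running a job synchronously or asynchronously, and creating a long-running context for reuse) can be sketched as a few URL builders. This is a minimal sketch: the endpoint paths, query parameters, and default port 8090 follow the project's documented API, while the helper names and the example app and class names are illustrative.

```python
import urllib.parse

# Spark Job Server's default port; adjust for your deployment.
BASE_URL = "http://localhost:8090"

def upload_jar_url(app_name):
    """POST a job jar (as the request body) to /jars/<appName>."""
    return f"{BASE_URL}/jars/{app_name}"

def run_job_url(app_name, class_path, context=None, sync=False):
    """POST to /jobs to start a job.

    sync=True blocks until the job finishes and returns its result;
    otherwise the server replies immediately with a job ID to poll.
    Passing a context name routes the job to an existing long-running
    SparkContext, avoiding per-job startup cost.
    """
    params = {"appName": app_name, "classPath": class_path}
    if context:
        params["context"] = context
    if sync:
        params["sync"] = "true"
    return f"{BASE_URL}/jobs?{urllib.parse.urlencode(params)}"

def create_context_url(name, settings=None):
    """POST to /contexts/<name> to create a persistent context.

    Settings such as num-cpu-cores or memory-per-node pass through
    as query parameters and size the shared SparkContext.
    """
    query = urllib.parse.urlencode(settings or {})
    return f"{BASE_URL}/contexts/{name}" + (f"?{query}" if query else "")

# Illustrative curl equivalents:
#   curl --data-binary @wordcount.jar localhost:8090/jars/wc
#   curl -d "" "localhost:8090/contexts/test-context?num-cpu-cores=4&memory-per-node=512m"
#   curl -d "input.string = a b c" \
#     "localhost:8090/jobs?appName=wc&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true"
```

A low-latency synchronous run against a named context, for example, would use `run_job_url("wc", "spark.jobserver.WordCountExample", context="test-context", sync=True)`.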
HA deployment is labeled as beta in the README, indicating it may lack stability or full production readiness for critical fault-tolerant systems.
Requires manual configuration, multiple scripts (e.g., server_deploy.sh), and careful setup across different cluster managers, which can be error-prone and time-consuming.
In context-per-JVM mode, context processes do not shut down automatically and require manual cleanup, adding operational burden, as noted in the README's known issues.
Documentation is scattered across multiple markdown files and includes outdated links (e.g., Bintray migration), making it harder for new users to get started efficiently.