A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.
Awesome Spark is a curated directory of packages, libraries, and resources for the Apache Spark ecosystem. It helps data engineers and scientists discover tools that extend Spark's capabilities for distributed data processing, machine learning, and analytics. The list covers language bindings, notebooks, storage solutions, and specialized libraries across various domains.
Data engineers, data scientists, and developers working with Apache Spark who need to find extensions, utilities, or learning materials to enhance their Spark-based projects.
It saves time by providing a vetted, organized collection of Spark-related tools, eliminating the need to search scattered sources. As a community-maintained resource, it reflects practical, real-world usage and keeps up with ecosystem developments.
Organizes over 50 packages across 20+ categories, such as language bindings, machine learning, and GIS, each with its own README section, so most Spark extensions can be found in one place.
Includes GitHub last-commit badges for each entry, showing at a glance how recently a project was updated, such as recent updates for Kotlin bindings and Delta Lake.
Curates learning materials like books, papers, and MOOCs, including specific recommendations like 'Learning Spark, 2nd Edition' and edX courses, aiding skill development.
Lists specialized libraries for niches like bioinformatics (ADAM, Hail) and natural language processing (spark-nlp), extending Spark beyond core data processing.
Merely lists tools without ratings, reviews, or guidance on suitability, forcing users to independently evaluate each option for their use case.
Presented as a GitHub README, it lacks search, filtering, or sorting features, making navigation tedious for large lists like the 20+ machine learning extensions.
Relies on community contributions, so it may be slow to add newly released projects or to flag deprecated ones, despite the last-commit badges; for example, some entries, such as Mahout Spark Bindings, have an unknown status.