A Spark library for reading and writing data between Spark SQL and MongoDB collections.
Spark-MongoDB is a library that provides a connector between Apache Spark and MongoDB, enabling bidirectional data flow between Spark SQL and MongoDB collections. It solves the problem of integrating MongoDB's document database with Spark's distributed processing framework, allowing users to query MongoDB data using Spark SQL and write processed results back to MongoDB.
It is aimed at data engineers and data scientists who work with both Spark and MongoDB and need to run ETL, analytics, or machine learning on data stored in MongoDB collections.
Developers choose Spark-MongoDB because it provides a native-feeling integration with Spark's DataFrame API, supports multiple programming languages, and eliminates the need for custom connectors when working with MongoDB data in Spark workflows.
Provides dedicated examples for Scala, Python, Java, and R in the documentation, enabling teams to use their preferred programming language without switching tools.
Reads MongoDB collections into Spark DataFrames and writes DataFrames back to MongoDB, so the data can be queried with familiar Spark SQL syntax.
Includes a detailed table matching library versions with Spark and MongoDB releases, reducing setup errors and ensuring proper environment configuration.
Offers multiple parameters to customize connections and data handling, as outlined in the configuration section, allowing fine-tuned control over integration behavior.
Supports only Spark up to 2.0.0 and MongoDB 3.0.x, so it is incompatible with modern releases, which raises concerns about long-term maintenance and security.
The README primarily links to external documents like 'First Steps.rst', with no comprehensive API reference or troubleshooting guide within the main repository.
Requires specific versions of Casbah 2.8.X and Scala 2.10/2.11, which can lead to conflicts in projects using newer libraries or different Scala versions.