Base classes for writing Apache Spark tests in Scala and Python, simplifying test setup and teardown.
spark-testing-base is a library that provides base classes and utilities for writing tests for Apache Spark applications. It simplifies the process of setting up and tearing down local Spark sessions, reducing boilerplate code and allowing developers to write focused, efficient tests. The library supports both Scala and Python, integrating with common build tools and testing frameworks.
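To make the setup/teardown claim concrete, here is a minimal sketch of a ScalaTest suite built on spark-testing-base. It assumes the `DataFrameSuiteBase` trait from `com.holdenkarau.spark.testing`, which (per the project's documentation) supplies a managed `spark` session and an `assertDataFrameEquals` helper; the test name and logic are illustrative, and exact class names should be checked against the README for your Spark version.

```scala
import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.apache.spark.sql.functions.upper
import org.scalatest.funsuite.AnyFunSuite

// No manual SparkSession creation or stop(): DataFrameSuiteBase
// owns the session lifecycle for the whole suite.
class UppercaseTest extends AnyFunSuite with DataFrameSuiteBase {
  test("uppercasing a column") {
    import spark.implicits._ // `spark` is provided by the base trait

    val input    = Seq("spark", "testing").toDF("word")
    val expected = Seq("SPARK", "TESTING").toDF("word")

    val result = input.select(upper($"word").as("word"))

    // Comparison helper provided by spark-testing-base
    assertDataFrameEquals(expected, result)
  }
}
```

Without the base trait, each suite would need its own `beforeAll`/`afterAll` code to build and stop a local session, which is exactly the boilerplate the library removes.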
Data engineers and developers who write and maintain Apache Spark applications in Scala or Python and need to create reliable unit and integration tests.
Developers choose spark-testing-base because it eliminates repetitive Spark session management code, provides a consistent testing foundation inspired by Apache Spark's internal test utilities, and supports both Scala and Python ecosystems with easy dependency management.
Base classes to use when writing tests with Spark
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.

Base classes manage the Spark session lifecycle automatically, eliminating the repetitive setup and teardown code called out in the 'Why?' section.
Available for both Scala and Python via Maven/SBT and PyPI/Conda, making it versatile for Spark projects in different ecosystems, as noted in the installation instructions.
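For the Scala side, adding the library is a single test-scoped dependency in sbt; the version string below (Spark version, underscore, library version) is illustrative only, so check the README for the artifact matching your Spark release. On the Python side, the package installs from PyPI (commonly via `pip install spark-testing-base`).

```scala
// build.sbt — spark-testing-base as a test-only dependency.
// "3.5.1_1.5.3" is a placeholder version; pick the one matching your Spark.
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "3.5.1_1.5.3" % Test
```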
Designed to work with standard Scala and Python testing tools, with configurations for memory management and parallel execution provided in the README.
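The build-tool configuration the README describes boils down to a few sbt settings; the exact values here are illustrative, but the pattern (forked JVM, larger heap, serial test execution) reflects the kind of tuning the project recommends.

```scala
// build.sbt — sketch of the test-JVM settings spark-testing-base suggests.
Test / fork := true                                   // run tests in a separate JVM
Test / javaOptions ++= Seq("-Xms512M", "-Xmx2048M")   // give Spark enough heap
Test / parallelExecution := false                     // Spark contexts don't coexist well in one JVM
```

Forking matters because the `javaOptions` only apply to a forked test JVM, and serial execution avoids multiple Spark contexts competing inside one process.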
Supports setting the SPARK_TESTING=true environment variable to exercise Spark SQL code generation paths, enabling more thorough SQL testing as described under 'Special considerations'.
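In practice this is just an environment variable exported before the test run; the `sbt test` invocation below assumes an sbt-based Scala project and is illustrative.

```shell
# Enable Spark's internal testing code paths (including SQL codegen checks)
# before running the suite. Adjust the test command for your build tool.
export SPARK_TESTING=true
sbt test
```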
Requires high JVM memory settings (e.g., 8G) to run tests, which can be prohibitive in resource-constrained environments, as detailed in the 'Minimum Memory Requirements' section.
Requires manual build-tool adjustments (SBT/Maven), such as disabling parallel test execution and tuning JVM memory, which adds setup overhead and room for error.
Primarily focused on local Spark testing, so it may not cover distributed cluster behaviors or performance, limiting its use for integration testing on actual clusters.