A fast Apache Spark testing helper library with beautifully formatted error messages for Scala applications.
Spark Fast Tests is a Scala library that provides fast, specialized assertion helpers for testing Apache Spark applications. It solves the problem of slow and verbose Spark tests by offering optimized methods for comparing DataFrames, Datasets, columns, and schemas with clear error messages. The library is dependency-free and integrates with popular Scala testing frameworks.
Spark Fast Tests is aimed at Scala developers and data engineers who write and maintain test suites for Apache Spark applications, particularly those working on data pipelines, ETL processes, or analytics codebases.
Developers choose Spark Fast Tests for its speed advantages over generic testing approaches, its beautifully formatted error messages that simplify debugging, and its flexibility to work with multiple testing frameworks without adding unnecessary dependencies.
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Offers specialized methods such as assertSmallDataFrameEquality and assertColumnEquality that run up to 70% faster than alternatives; the README's benchmark table lists assertColumnEquality at 108 milliseconds.
Provides color-coded, readable diff outputs for data and schema mismatches, making failures easy to diagnose visually, as shown in the example images for assertSmallDataFrameEquality.
Works seamlessly with Scalatest, uTest, and MUnit, allowing integration into existing test suites without framework lock-in, as stated in the README's compatibility notes.
Avoids unnecessary dependencies like Hive, keeping the library lean and focused on core testing utilities, which simplifies dependency management and reduces bloat.
Users must set up and manage their own SparkSession for tests, adding boilerplate and potential for misconfiguration, as the library does not provide a built-in SparkSession.
Lacks advanced features such as streaming support, which are available in alternatives like spark-testing-base, as noted in the README's alternatives section.
Future versions will drop support for Spark 2.x, forcing teams to upgrade their Spark version to maintain compatibility, which can be a barrier for legacy projects.
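To make the points above concrete, here is a minimal sketch of a test suite that supplies its own SparkSession and calls the two assertion helpers named earlier. It assumes ScalaTest 3.1 or later (for AnyFunSpec) with Spark and spark-fast-tests on the test classpath. The trait name SparkSessionTestWrapper, the class name NameSpec, and the sample data are illustrative choices, not part of the library.

```scala
import com.github.mrpowers.spark.fast.tests.{ColumnComparer, DataFrameComparer}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, upper}
import org.scalatest.funspec.AnyFunSpec

// Illustrative helper trait (name is an assumption): the library does not
// provide a SparkSession, so the test suite builds its own local one.
trait SparkSessionTestWrapper {
  lazy val spark: SparkSession =
    SparkSession
      .builder()
      .master("local")
      .appName("spark-fast-tests example")
      // A single shuffle partition avoids needless overhead on tiny test data.
      .config("spark.sql.shuffle.partitions", "1")
      .getOrCreate()
}

// Illustrative spec (class name and data are assumptions).
class NameSpec
    extends AnyFunSpec
    with SparkSessionTestWrapper
    with DataFrameComparer
    with ColumnComparer {

  import spark.implicits._

  it("renames a column") {
    val actualDF = Seq("jose", "li").toDF("name").select(col("name").alias("student"))
    val expectedDF = Seq("jose", "li").toDF("student")
    // Compares both schema and rows of two small DataFrames.
    assertSmallDataFrameEquality(actualDF, expectedDF)
  }

  it("upper-cases a column") {
    val df = Seq(("jose", "JOSE"), ("li", "LI"))
      .toDF("name", "expected")
      .withColumn("actual", upper(col("name")))
    // Compares two columns of the same DataFrame row by row.
    assertColumnEquality(df, "actual", "expected")
  }
}
```

Keeping the session in a shared trait means every spec reuses one lazily created SparkSession, which limits the boilerplate described above to a single place.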