A PySpark testing library providing fast helper methods with descriptive, color-coded error messages for DataFrame and column comparisons.
Chispa is a PySpark testing library that provides helper methods for asserting column and DataFrame equality in unit tests. It solves the problem of opaque test failures in PySpark by outputting descriptive, color-coded error messages that clearly highlight mismatches, making debugging faster and more intuitive.
Data engineers and PySpark developers who write unit tests for data transformations, ETL pipelines, and DataFrame operations and need clear feedback when tests fail.
Developers choose Chispa for its beautifully formatted error messages that visually distinguish mismatched data, its flexible comparison options (like ignoring row/column order), and its focus on improving the PySpark testing experience without sacrificing performance.
PySpark test helper methods with beautiful error messages
Provides color-coded, visually formatted error output that highlights mismatched rows and cells, making debugging test failures intuitive and fast, as shown in the README images.
Supports ignoring row order, column order, specific columns, nullability, and metadata in DataFrame comparisons, offering versatility for real-world test scenarios.
Includes methods for asserting approximate equality with tolerance for floating-point numbers, essential for numerical data testing without strict precision requirements.
Compares schemas before comparing row contents and reports schema differences with a clear error message, so schema mismatches fail fast without a full content comparison.
Allows configuration of colors and styles for error messages via FormattingConfig, enabling integration with custom test reporting or preferences.
Ignoring row order requires sorting DataFrames, which can slow down tests for large datasets, as noted in the README's ignore_row_order section.
Focused solely on DataFrame and column assertions, lacking features for mocking, integration testing, or broader test framework capabilities, so it must be paired with other tools.
Requires Python 3.10+ and specific PySpark versions (tested up to 4.1.x), which may not align with older projects or environments with strict dependency management.
Custom formatting requires additional configuration, such as setting up pytest fixtures in conftest.py, adding overhead for simple use cases.