A high-performance SIMD CSV parser library and extensible CLI utility for tabular data processing.
zsv+lib combines a CSV parser library with an extensible command-line utility for processing tabular data. The parser uses SIMD operations for speed while remaining compatible with real-world CSV formats, including edge cases such as non-standard quoting and multi-row headers. The CLI provides commands for querying, converting, and visualizing CSV data.
zsv is aimed at developers and data engineers who need fast, reliable CSV parsing for large datasets, especially in environments where performance and memory efficiency are critical. It also suits building custom data processing pipelines or embedding CSV parsing in other applications.
zsv pairs SIMD-optimized parsing speed with support for real-world CSV quirks that other fast parsers may not handle. Its extensible CLI and library design allow customization and integration, so it works for both one-off data tasks and embedded use cases.
zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser
Uses branchless prefix-XOR carry propagation to track quote state, which the project's performance section reports makes it the fastest CSV parser in its benchmarks on compliant data.
Parses non-standard CSV edge cases like unquoted quotes and mixed newlines with Excel-like compatibility, ensuring robust performance on messy data, as shown in the non-RFC 4180 examples.
Includes a modular plugin system for custom extensions and built-in commands like sql, 2json, and an interactive sheet viewer, allowing versatile data processing without external tools.
Designed for efficient memory usage regardless of input size, with a small binary footprint, enabling processing of large datasets without excessive resource consumption.
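The prefix-XOR quote tracking mentioned above can be illustrated with a minimal scalar sketch. This is not zsv's actual code: the function names (`prefix_xor`, `match_mask`, `unquoted_delims`) are hypothetical, and the byte-by-byte mask builder stands in for the SIMD comparison a real parser would use. The idea is that each block of up to 64 bytes becomes a bitmask of quote positions, a prefix XOR over that mask marks every byte that lies inside a quoted field, and an all-zeros or all-ones carry word passes the open-quote state to the next block without a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is the XOR of bits 0..i of x (inclusive prefix XOR). */
static uint64_t prefix_xor(uint64_t x) {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/* Scalar stand-in for a SIMD compare: bit i is set when block[i] == c.
   len must be at most 64. */
static uint64_t match_mask(const char *block, size_t len, char c) {
    uint64_t m = 0;
    for (size_t i = 0; i < len; i++)
        m |= (uint64_t)(block[i] == c) << i;
    return m;
}

/* Returns a mask of commas that lie outside quoted fields.
   *carry is 0 when the block starts outside quotes and all ones when it
   starts inside; it is updated for the next block with no branching. */
static uint64_t unquoted_delims(const char *block, size_t len, uint64_t *carry) {
    uint64_t quotes = match_mask(block, len, '"');
    uint64_t commas = match_mask(block, len, ',');
    /* XOR with the carry flips every bit when the previous block ended
       inside a quoted field. */
    uint64_t inside = prefix_xor(quotes) ^ *carry;
    /* Bit 63 holds the state after the last quote; widen it to 0 or ~0. */
    *carry = (uint64_t)0 - (inside >> 63);
    return commas & ~inside;
}
```

Calling `unquoted_delims` on successive blocks with the same `carry` variable, initialised to 0, yields the field boundaries for a whole buffer. An escaped quote (`""`) toggles the state twice, so it leaves the inside/outside state unchanged for delimiter detection. SIMD implementations commonly compute the same prefix XOR with a carry-less multiply instruction instead of the shift cascade shown here.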
The SIMD-accelerated parser works only with RFC 4180-compliant data; non-standard CSV requires the slower compat parser, which creates a performance trade-off for mixed datasets.
Creating custom extensions requires compiling C code into shared libraries, which adds overhead compared to scripting in interpreted languages, and the project acknowledges that the documentation for this needs improvement.
Only a Ruby binding is currently available, limiting integration options for projects in popular languages like Python or Go, despite calls for contributions in the README.