A blazing-fast command-line toolkit for querying, slicing, analyzing, transforming, and validating tabular data (CSV, Excel, JSONL, etc.).
qsv is a command-line data-wrangling toolkit for efficiently processing tabular data such as CSV, Excel, and JSONL files. It replaces slow, cumbersome data manipulation with dozens of optimized commands for filtering, transforming, validating, and analyzing datasets, even at scales of tens of millions of rows, combining the speed of Rust with intelligent features like automatic indexing, multithreading, and external sorting.
qsv is aimed at data engineers, analysts, and scientists who work with large tabular datasets and need fast, scriptable command-line tools for data preparation, cleaning, and exploration. It also suits developers building data pipelines or integrating data validation into their workflows.
Developers choose qsv for its exceptional performance, comprehensive feature set, and ease of use. Unlike generic CSV parsers, it offers specialized commands for geocoding, schema inference, AI-assisted description, and SQL queries, all while maintaining a simple, composable interface. Its ability to handle files larger than memory and its extensive format support make it a versatile alternative to slower scripting solutions.
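A minimal sketch of that composability: the subcommands below (search, select, stats) are real qsv commands, but the sample data and exact flags are illustrative and should be checked against `qsv <command> --help` for your installed version.

```shell
# Illustrative sample data.
printf 'city,state,population\nBoston,MA,675647\nAustin,TX,961855\nSalem,MA,44480\n' > sample.csv

# Run only if qsv is installed.
if command -v qsv >/dev/null 2>&1; then
  # Keep Massachusetts rows, project two columns, then summarize,
  # composing commands with ordinary shell pipes.
  qsv search -s state '^MA$' sample.csv \
    | qsv select city,population \
    | qsv stats
fi
```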
Uses multithreading, memory-mapped indexes, and external sorting to handle massive datasets efficiently, processing a 28-million-row NYC CSV in seconds according to its published benchmarks.
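The indexing step can be sketched as follows; `qsv index` and `qsv slice` are real subcommands, while the tiny sample file and flag spellings here are assumptions to verify against your version's help output.

```shell
# Illustrative sample data.
printf 'id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n' > big.csv

if command -v qsv >/dev/null 2>&1; then
  # Build a memory-mapped index alongside the file (big.csv.idx);
  # index-aware commands can then do random access without a full scan.
  qsv index big.csv
  # Slice a row range; with an index present this is near-instant
  # even on very large files.
  qsv slice --start 2 --len 2 big.csv
fi
```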
Offers over 60 specialized commands for tasks from deduplication and geocoding to AI-assisted description and SQL queries, providing comprehensive data-wrangling capabilities.
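As one example of pairing those commands, the sketch below deduplicates a file and then queries it with SQL; `dedup` and `sqlp` are real subcommands, but the `_t_1` table alias, `-o` flag, and sample data are assumptions to confirm against the qsv documentation.

```shell
# Illustrative sample data with a duplicate row.
printf 'city,pop\nBoston,675647\nAustin,961855\nBoston,675647\n' > cities.csv

if command -v qsv >/dev/null 2>&1; then
  # Remove exact duplicate rows, writing the result to a new file.
  qsv dedup cities.csv -o cities_deduped.csv
  # Query the result with SQL; sqlp exposes its first input file
  # under a generated table name (assumed here to be _t_1).
  qsv sqlp cities_deduped.csv 'SELECT city, pop FROM _t_1 ORDER BY pop DESC'
fi
```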
Natively reads and writes CSV, TSV, Excel, Parquet, JSON, JSONL, Arrow, and Avro formats, with automatic Snappy compression and decompression for efficient storage.
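For the JSONL side of that format support, a minimal sketch (the `jsonl` and `count` subcommands are real; the data is illustrative):

```shell
# Illustrative JSON Lines input.
printf '{"name":"Ada","year":1815}\n{"name":"Grace","year":1906}\n' > people.jsonl

if command -v qsv >/dev/null 2>&1; then
  # Convert JSON Lines to CSV, then count the resulting rows.
  qsv jsonl people.jsonl > people.csv
  qsv count people.csv
fi
```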
Includes schema inference, JSON validation with custom keywords, and anomaly detection, ensuring data integrity with detailed error reports and validation options.
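The inference-then-validate workflow can be sketched like this; `schema` and `validate` are real subcommands, while the generated schema filename and sample data are assumptions to check against `qsv schema --help`.

```shell
# Illustrative sample data.
printf 'id,age\n1,34\n2,29\n3,51\n' > users.csv

if command -v qsv >/dev/null 2>&1; then
  # Infer a JSON Schema from the data (assumed to write
  # users.csv.schema.json next to the input).
  qsv schema users.csv
  # Validate the file against the inferred schema; rows that fail
  # validation are reported.
  qsv validate users.csv users.csv.schema.json
fi
```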
Managing feature flags and compiling from source can be daunting; prebuilt binaries come in variants with different sets of enabled features, adding setup complexity and potential confusion.
Commands marked with 🤯 in the README load entire CSVs into memory, which can be prohibitive for extremely large datasets despite some streaming modes, limiting scalability in memory-constrained environments.
Certain features, like the luau interpreter, are not available on musl-based Linux distributions without manual compilation, and some commands depend on external runtimes such as Python 3.10+.