A blazing-fast command-line toolkit for querying, slicing, analyzing, transforming, and validating tabular data (CSV, Excel, JSONL, etc.).
qsv is a command-line data-wrangling toolkit for efficiently processing tabular data such as CSV, Excel, and JSONL files. It replaces slow, cumbersome data manipulation with dozens of optimized commands for filtering, transforming, validating, and analyzing datasets, even at scales of tens of millions of rows, combining the speed of Rust with intelligent features like automatic indexing, multithreading, and external sorting.
qsv is aimed at data engineers, analysts, and scientists who work with large tabular datasets and need fast, scriptable command-line tools for data preparation, cleaning, and exploration. It also suits developers building data pipelines or integrating data validation into their workflows.
Developers choose qsv for its exceptional performance, comprehensive feature set, and ease of use. Unlike generic CSV parsers, it offers specialized commands for geocoding, schema inference, AI-assisted description, and SQL queries, all while maintaining a simple, composable interface. Its ability to handle files larger than memory and its extensive format support make it a versatile alternative to slower scripting solutions.
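A minimal sketch of that composability: the subcommands below (search, select, stats) are real qsv commands, but the sample data and exact flags are illustrative and should be checked against `qsv <command> --help` for your installed version.

```shell
# Illustrative sample data.
printf 'city,state,population\nBoston,MA,675647\nAustin,TX,961855\nSalem,MA,44480\n' > sample.csv

# Run only if qsv is installed.
if command -v qsv >/dev/null 2>&1; then
  # Keep Massachusetts rows, project two columns, then summarize,
  # composing commands with ordinary shell pipes.
  qsv search -s state '^MA$' sample.csv \
    | qsv select city,population \
    | qsv stats
fi
```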
Uses multithreading, memory-mapped indexes, and external sorting to handle massive datasets efficiently, processing a 28-million-row NYC CSV in seconds according to its published benchmarks.
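The indexing step can be sketched as follows; `qsv index` and `qsv slice` are real subcommands, while the tiny sample file and flag spellings here are assumptions to verify against your version's help output.

```shell
# Illustrative sample data.
printf 'id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n' > big.csv

if command -v qsv >/dev/null 2>&1; then
  # Build a memory-mapped index alongside the file (big.csv.idx);
  # index-aware commands can then do random access without a full scan.
  qsv index big.csv
  # Slice a row range; with an index present this is near-instant
  # even on very large files.
  qsv slice --start 2 --len 2 big.csv
fi
```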
Offers over 60 specialized commands for tasks from deduplication and geocoding to AI-assisted description and SQL queries, providing comprehensive data-wrangling capabilities.
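As one example of pairing those commands, the sketch below deduplicates a file and then queries it with SQL; `dedup` and `sqlp` are real subcommands, but the `_t_1` table alias, `-o` flag, and sample data are assumptions to confirm against the qsv documentation.

```shell
# Illustrative sample data with a duplicate row.
printf 'city,pop\nBoston,675647\nAustin,961855\nBoston,675647\n' > cities.csv

if command -v qsv >/dev/null 2>&1; then
  # Remove exact duplicate rows, writing the result to a new file.
  qsv dedup cities.csv -o cities_deduped.csv
  # Query the result with SQL; sqlp exposes its first input file
  # under a generated table name (assumed here to be _t_1).
  qsv sqlp cities_deduped.csv 'SELECT city, pop FROM _t_1 ORDER BY pop DESC'
fi
```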
Natively reads and writes CSV, TSV, Excel, Parquet, JSON, JSONL, Arrow, and Avro formats, with automatic Snappy compression and decompression for efficient storage.
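For the JSONL side of that format support, a minimal sketch (the `jsonl` and `count` subcommands are real; the data is illustrative):

```shell
# Illustrative JSON Lines input.
printf '{"name":"Ada","year":1815}\n{"name":"Grace","year":1906}\n' > people.jsonl

if command -v qsv >/dev/null 2>&1; then
  # Convert JSON Lines to CSV, then count the resulting rows.
  qsv jsonl people.jsonl > people.csv
  qsv count people.csv
fi
```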
Includes schema inference, JSON validation with custom keywords, and anomaly detection, ensuring data integrity with detailed error reports and validation options.
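The inference-then-validate workflow can be sketched like this; `schema` and `validate` are real subcommands, while the generated schema filename and sample data are assumptions to check against `qsv schema --help`.

```shell
# Illustrative sample data.
printf 'id,age\n1,34\n2,29\n3,51\n' > users.csv

if command -v qsv >/dev/null 2>&1; then
  # Infer a JSON Schema from the data (assumed to write
  # users.csv.schema.json next to the input).
  qsv schema users.csv
  # Validate the file against the inferred schema; rows that fail
  # validation are reported.
  qsv validate users.csv users.csv.schema.json
fi
```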
Managing feature flags and compiling from source can be daunting; prebuilt binaries come in variants with different sets of enabled features, adding setup complexity and potential confusion.
Commands marked with 🤯 in the README load entire CSVs into memory, which can be prohibitive for extremely large datasets despite some streaming modes, limiting scalability in memory-constrained environments.
Certain features, like the luau interpreter, are not available on musl-based Linux distributions without manual compilation, and some commands depend on external runtimes such as Python 3.10+.