The fastest delimited file reader for R, using lazy loading and multi-threading to achieve speeds over 1 GB/sec.
vroom is an R package for reading delimited files (such as CSV or TSV) at very high speeds, often exceeding 1 GB per second. It solves the problem of slow data import for large datasets by using lazy loading and multi-threading, allowing users to work with big data efficiently in R.
vroom is aimed at R users, data scientists, and analysts who need to import and process large delimited files quickly, especially those working within the tidyverse ecosystem.
Developers choose vroom for its unmatched speed in reading delimited files, its seamless integration with existing R code, and its advanced features like lazy loading and multi-threading that optimize both performance and memory usage.
Fast reading of delimited files
Benchmarks show vroom reads at 1.23 GB/sec, up to 53x faster than base R, by indexing files and using multi-threading for parsing non-character columns.
Uses R's ALTREP framework to parse values only when they are accessed, minimizing memory usage for large datasets without code changes.
Adds capabilities over readr like column selection (similar to dplyr::select()), big integer support, and native reading from multiple files or connections.
Leverages multiple threads for indexing, writing, and parsing, with environment variables to control thread count for optimization.
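As a minimal sketch of the column selection and multi-file reading described above, the following R snippet writes two small CSV files to a temporary directory and reads them back in a single call. The file names and data are invented for illustration; `vroom()`, `vroom_write()`, and the `col_select` and `col_types` arguments come from the vroom package.

```r
library(vroom)

# Illustrative data only: two small CSV files in a temporary directory.
demo_dir <- tempfile("vroom-demo-")
dir.create(demo_dir)
files <- file.path(demo_dir, c("part1.csv", "part2.csv"))

vroom_write(
  data.frame(id = 1:2, name = c("a", "b"), score = c(1.5, 2.5)),
  files[1],
  delim = ","
)
vroom_write(
  data.frame(id = 3:4, name = c("c", "d"), score = c(3.5, 4.5)),
  files[2],
  delim = ","
)

# Passing a vector of paths reads all files into one data frame.
# col_types covers every column in the files (integer, character, double);
# col_select then keeps only the columns named, dplyr::select() style.
combined <- vroom(
  files,
  delim = ",",
  col_types = "icd",
  col_select = c(id, score)
)

print(combined)
```

Declaring `col_types` up front is optional; it is used here so the call does not rely on type guessing.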
Handling embedded newlines in headers or fields requires setting num_threads = 1, which disables multi-threading and reduces performance benefits.
Lazy loading via ALTREP defers parsing until values are first accessed, which can add latency at that point, and it may not be fully compatible with all R operations, potentially causing unexpected behavior in some workflows.
Requires management of numerous environment variables (e.g., VROOM_THREADS, VROOM_USE_ALTREP_*) for fine-tuning, adding setup complexity for advanced use.
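The limitations above can be worked around per call or per session. The sketch below is a hedged example rather than a recommended configuration: it sets `VROOM_THREADS` for the session, then reads a file containing a quoted field with an embedded newline using `num_threads = 1`, and disables lazy loading for that read with `altrep = FALSE`. The file content and the thread count of 2 are arbitrary choices made for the example.

```r
library(vroom)

# Session-wide default thread count, read by vroom when num_threads is not
# given explicitly. The value 2 is an arbitrary illustrative choice.
Sys.setenv(VROOM_THREADS = "2")

# Illustrative file: the first record's "note" field spans two lines.
path <- tempfile(fileext = ".csv")
writeLines(
  c("id,note",
    "1,\"line one",
    "line two\"",
    "2,plain"),
  path
)

# num_threads = 1 overrides the environment variable for this call, as
# required for embedded newlines. altrep = FALSE parses every column
# eagerly instead of lazily, trading memory for predictable access.
notes <- vroom(
  path,
  delim = ",",
  col_types = "ic",
  num_threads = 1,
  altrep = FALSE
)

print(notes)
```

Setting the arguments on the call keeps the trade-off local to the files that need it, while other reads in the same session still use the multi-threaded, lazy defaults.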