RFC 4180 compliant, composable CSV parsing and encoding library for Elixir.
CSV is an Elixir library for decoding and encoding CSV files, designed to handle the format's notorious inconsistencies gracefully. It focuses on extracting all correctly formatted rows from data pipelines while providing clear error diagnostics for problematic lines. The library is RFC 4180 compliant and follows a composable, stream-based design.
Elixir developers building data processing pipelines that need to parse or generate CSV files, especially when dealing with imperfect or inconsistently formatted data sources.
Developers choose this library for its robust error handling in normal mode, which returns all valid rows with descriptive errors, and its seamless integration with Elixir streams for composable data processing. Its performance-oriented design, using fast binary matching, ensures it rarely becomes a bottleneck.
CSV Decoding and Encoding for Elixir
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
In normal mode, decode returns a stream of ok/error tuples, extracting all valid rows even with malformed lines, as emphasized in the README for handling fickle CSV formats.
Uses fast binary matching to parse ~500k rows per second on modest hardware, ensuring it rarely bottlenecks data pipelines, per performance benchmarks.
Designed to work with Elixir streams and enumerables, fitting composably into data processing pipelines without extra overhead, following UNIX philosophy.
Supports custom separators, escape characters, field transformations, and formula unescaping, allowing adaptation to various CSV dialects as shown in decode options.
Allows custom data type encoding via the CSV.Encode protocol, enabling polymorphic output for complex structures, detailed in the encoding section.
Version 3.x eliminated parallelism features (num_workers, worker_work_ratio), which could reduce throughput on multi-core systems for large files compared to earlier versions.
Optimal byte streaming requires fine-tuning chunk sizes (e.g., default 1000 bytes), adding complexity for developers to achieve best performance, as noted in the performance tips.
For efficient encoding, protocols must be consolidated (consolidate_protocols: true), an extra setup step that can be overlooked and lead to performance degradation.