Showing 36 of 258 projects
A high-level Node.js module for creating readable streams with proper backpressure handling and a familiar API.
A Common Lisp library for reading and writing CSV files with extensive customization and error handling.
A Go library for reading and writing CSV files using struct tags for mapping fields.
A Rust-based JsonPath engine with WebAssembly and JavaScript bindings for querying and manipulating JSON data.
A high-performance PHP import framework for distributed data processing with optimized memory consumption.
A Go library that generates type-safe Parquet readers and writers from Go structs or existing Parquet files.
A reliable, flexible, and fast Rust framework for web crawling and request-response services.
An open-source framework for receiving, processing, and redistributing abuse feeds and threat intelligence.
A collection of libraries for large-scale data processing in Hadoop ecosystems, including Spark, Pig, and incremental MapReduce.
An open-source data pipeline that aggregates and standardizes heterogeneous public COVID-19 data from multiple global sources.
A Java implementation of composable algorithmic transformations called transducers, independent of input/output sources.
An experimental Rust client for Apache Spark Connect, providing a DataFrame API to interact with Spark clusters.
A flexible C library for JSON manipulation and schema validation, enabling JavaScript-like ease with C performance.
A parallelized stream implementation for Elixir that maintains order while processing with a worker pool.
An R package for accessing, formatting, and analyzing Global Surface Summary of the Day (GSOD) weather data from NOAA.
A fast, lightweight, single-header C++17 CSV parser library that parses rows and cells lazily on demand.
A Go library providing Java 8 Stream-like functional programming operations for collections and data processing.
An all-in-one MATLAB software suite for state-of-the-art processing and quantitative analysis of in-vivo magnetic resonance spectroscopy (MRS) data.
A library providing first-class, ergonomic match specifications for the Elixir language.
A fast and minimal JSON parser and transformer for Go that works on unstructured JSON without full unmarshalling.
A collection of interactive Jupyter notebooks for learning Hadoop, Spark, and MapReduce with hands-on tutorials and demos.
A Swift library for fast reading and writing of CSV files with JSON conversion support.
A JSON-LD 1.1 implementation for Elixir with RDF.ex integration for semantic web data processing.
A Java library for loading, saving, and validating large GTFS feeds using disk-backed storage.
A Java library implementing rolling hash functions like Randomized Karp-Rabin and Cyclic Polynomial hashing for efficient n-gram hashing.
A Python module for dividing large lists into pages with customizable HTML pagination and framework-agnostic design.
An efficient, easy-to-use .NET library for parsing and writing CSV files, compliant with RFC 4180.
A Go tool for generating Graphviz visualizations from JSONL-formatted graph data, designed to work seamlessly with jq.
A Go library for normalizing email addresses to a canonical form to prevent duplicate signups.
A Clojure library for data processing, cleanup, and interactive visualization using D3.
A collection of useful Groovy language extensions for common tasks like clamping, sorting, file operations, and data conversions.
A .NET library for parsing, reading, and writing General Transit Feed Specification (GTFS) data.
A distributed batch data processing framework that handles scalability and intermediate storage, letting users focus on transforms and quality control.
A runtime supervisor for deploying and running data processing programs called Sequences on Linux servers, Docker, and Kubernetes clusters.
A complete, high-performance JSONPath implementation for Swift, enabling efficient querying and modification of JSON data.
A fluent builder for lazy streams and generators in Groovy, enabling functional-style data processing.