A tiny wrapper around Node.js stream.Transform to simplify stream creation without subclassing.
A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.
A high-performance Python package for fast, multi-threaded manipulation of large tabular datasets, inspired by R's data.table.
A Ruby framework for writing reliable, concise, and maintainable ETL (Extract-Transform-Load) data processing jobs.
A C++ library for reading, writing, creating, and modifying Microsoft Excel .xlsx files.
A fast, lightweight JSON Query Language CLI tool built with Rust for querying and transforming JSON data.
A distributed map-reduce framework for parallel computations over large datasets on unreliable computer clusters.
A computational parallel flow library for Elixir built on top of GenStage for parallel processing of collections.
A collection of small, chainable command-line utilities for advanced password cracking operations.
A Python library for agile data preparation workflows that works with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.
A Python library providing clean, chainable functions for data cleaning and manipulation with pandas DataFrames.
A high-performance streaming CSV parser for Node.js that converts CSV to JSON at up to 90,000 rows per second.
A comprehensive benchmark suite for evaluating speed, throughput, and resource utilization of big data frameworks like Hadoop, Spark, and streaming engines.
A fast, resilient distributed stream processing framework that simplifies building real-time data applications and scales easily.
A suite of high-performance command line tools for filtering, summarizing, joining, and manipulating large tabular data files.
A modern, minimal, and high-performance .NET library for reading and writing CSV/TSV files with zero allocations and SIMD-accelerated parsing.
A high-performance Rust JSON parser porting simdjson's SIMD techniques, with Serde compatibility.
A Python framework for processing spatio-temporal satellite imagery and extracting features for machine learning applications.
A Ruby library for reading, writing, and modifying Microsoft Excel-compatible spreadsheet documents (XLS format).
A pure Python library for reading and writing ESRI Shapefiles, the popular GIS vector data format.
A lean and fast C++ library for 3D point cloud data processing with efficient implementations of common operations.
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources.
A Rust library providing streaming compression and decompression for DEFLATE, zlib, and gzip formats with multiple backend options.
A high-performance, fully-featured CSV parser and serializer for modern C++ with streaming, random access, and robust format handling.
A Ruby gem for normalizing, formatting, and splitting E.164 international phone numbers.
A collection of extra nodes for Node-RED, extending its capabilities with hardware I/O, social APIs, data parsing, and utility functions.
A collection of extra nodes for Node-RED, extending its capabilities with hardware, I/O, social, storage, and utility functions.
Run lambda functions over S3 objects with concurrency control for data pipelining and analytics.
A header-only C++11 CSV parser library with an easy-to-use API for reading and writing CSV files.
A C++ library for parallel text file reading with CSV support and Python bindings.
A fast, idiomatic, and dependency-free Go library for mapping between CSV and Go values.
A fast and friendly R package for reading rectangular data from delimited files like CSV and TSV.
A high-performance C++ JSON serialization and deserialization library accelerated by SIMD instructions.
A REST interface for interacting with Apache Spark from anywhere, enabling remote code execution and job submissions.
A high-performance JSON parser and toolkit for Go, optimized for large and variable datasets.