A Python utility for converting PDFs, Office documents, images, audio, and more into structured Markdown for LLM consumption.
A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
A modern, cross-platform shell that treats data as structured tables instead of plain text.
An extremely fast query engine for DataFrames, written in Rust, with multi-language frontends.
A JavaScript library for reading, writing, and processing spreadsheet data across Excel, CSV, and other formats.
A lightweight, zero-dependency command-line JSON processor for slicing, filtering, and transforming JSON data.
A high-performance C++ JSON parser that uses SIMD instructions to parse gigabytes of JSON per second.
A pure Go library for reading and writing Microsoft Excel™ spreadsheets (XLAM/XLSM/XLSX/XLTM/XLTX).
A terminal-based JSON viewer and processor with interactive exploration and JavaScript processing capabilities.
A plugin-driven agent for collecting, processing, aggregating, and writing metrics, logs, and arbitrary data.
A curated list of awesome big data frameworks, resources, and tools across various categories.
A fast, reliable CSV parser for JavaScript with support for streaming, worker threads, and malformed-input handling.
An extremely fast non-cryptographic hash algorithm that processes data at RAM speed limits.
A fast command-line toolkit for indexing, slicing, analyzing, splitting, and joining CSV files, written in Rust.
A modern I/O library for Android, Java, and Kotlin Multiplatform that complements java.io and java.nio.
An extensible SQL query engine written in Rust, using Apache Arrow as its in-memory format for building fast database and analytic systems.
A curated list of command-line tools for manipulating structured text data like CSV, JSON, XML, YAML, and more.
A compression/decompression library that favors speed over maximum compression ratio.
A high-performance neural network training interface for TensorFlow focused on speed, flexibility, and reproducible research.
A high-speed zlib port to JavaScript for compression and decompression, working in both browsers and Node.js.
A functional JavaScript utility library with lazy evaluation for optimal performance and memory efficiency.
An open-source translator library for raster and vector geospatial data formats.
A standard library for JavaScript and TypeScript with an emphasis on numerical and scientific computation.
A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
An end-to-end framework for building custom AI applications and agents directly integrated with databases.
An embedded Java database engine providing concurrent collections backed by disk storage or off-heap memory.
A PHP library for parsing, formatting, validating, and geocoding international phone numbers, based on Google's libphonenumber.
A parser generator for JavaScript that creates fast parsers with excellent error reporting.
A streaming JSON parser for JavaScript that delivers parsed objects before the HTTP response completes.
A Python library for handling tabular datasets across multiple formats like XLS, CSV, JSON, and YAML.
A composable and fully extensible C++ execution engine library for building high-performance data management systems.
A curated list of awesome open-source bioinformatics software, libraries, and resources, primarily for command-line analysis.
A modern spreadsheet engine written in Rust for programmatic spreadsheet manipulation across diverse environments.
A command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and other structured data files.