A library that extends LINQ to Objects with over 100 additional methods for advanced sequence manipulation.
A Java dataframe and visualization library for data loading, cleaning, transformation, and analysis.
A blazing-fast command-line toolkit for querying, slicing, analyzing, transforming, and validating tabular data (CSV, Excel, JSONL, etc.).
A jq clone written in Rust focused on correctness, speed, and simplicity, with support for YAML, CBOR, TOML, and XML.
A high-performance distributed map/reduce system with DAG execution, written in Go, supporting standalone or distributed modes.
A curated list of awesome ETL frameworks, libraries, and software for data integration and pipeline development.
A Scala API for Cascading that simplifies writing Hadoop MapReduce jobs with Scala integration.
A high-level streams library for Node.js and the browser that manages synchronous and asynchronous code seamlessly.
A deprecated tool for collecting, processing, and delivering data from multiple sources with Go and Lua plugin support.
Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
A Go library providing DataFrames, Series, and data wrangling operations for structured data manipulation.
A Go library providing DataFrames, Series, and data wrangling operations for tabular data manipulation.
A distributed computation system written in Go for parallel and cluster processing, similar to Hadoop MapReduce and Spark.
A pure JavaScript library for validating, parsing, and building XML without C/C++ dependencies or callbacks.
A RESTful job server for Apache Spark that provides a service interface for submitting and managing Spark jobs, jars, and contexts.
A high-performance Go library for setting values in JSON documents using dot-notation paths.
Build concurrent, multi-stage data ingestion and processing pipelines with Elixir, supporting back-pressure, batching, and fault tolerance.
A Scala API for Apache Beam and Google Cloud Dataflow, enabling unified batch and streaming data processing.
A lightweight query and transformation language for JSON data, inspired by XPath and SQL.
A general-purpose GPU compute framework built on Vulkan for cross-vendor graphics cards, enabling high-performance data processing and machine learning.
A Kubernetes-native, serverless platform for running massively parallel data and streaming jobs with exactly-once semantics.
A fast, header-only C++11 library for reading CSV files with automatic column rearrangement, threading for I/O overlap, and configurable parsing features.
A cluster computing framework for processing large-scale geospatial data within Apache Spark, Flink, and other big data systems.
A Scala library providing abstract algebra types and structures for building aggregation systems and analytics.
A Go package providing an ODM-like API to query and aggregate JSON, YAML, XML, and CSV data.
A CLI tool that executes SQL queries on CSV, LTSV, JSON, YAML, and TBLN files, with output to various formats.
.NET for Apache Spark provides high-performance .NET APIs for Apache Spark, enabling C# and F# developers to work with structured and streaming data.
A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
A high-performance C++ library for parsing floating-point and integer numbers from strings, 4x to 10x faster than strtod.
A Java library and command-line tool for extracting tables from PDF files.
A distributed query execution engine that extends Apache DataFusion to run SQL queries in parallel across multiple nodes.
A Python framework and Rust-based distributed processing engine for stateful event and stream processing.
A connector that enables Apache Spark to read from and write to Apache Cassandra databases for distributed data processing.
A fast and flexible CSV reader and writer for Rust with Serde support for easy data serialization.
A tiny wrapper around Node.js stream.Transform to simplify stream creation without subclassing.