Data Processing

361 projects

MoreLINQC#

A library that extends LINQ to Objects with over 100 additional methods for advanced sequence manipulation.

#functional-programming#csharp#nuget-package

Stars3.8k

Forks418

Last commit7 months ago

TablesawJava

A Java dataframe and visualization library for data loading, cleaning, transformation, and analysis.

#statistical-analysis#chart#data-science

Stars3.8k

Forks650

Last commit15 days ago

QSVRust

A blazing-fast command-line toolkit for querying, slicing, analyzing, transforming, and validating tabular data (CSV, Excel, JSONL, etc.).

A jq clone written in Rust focused on correctness, speed, and simplicity, with support for YAML, CBOR, TOML, and XML.

#jq-clone#yaml#command-line-tool

Stars3.7k

Forks115

Last commit9 days ago

awesome-etl list

A curated list of awesome ETL frameworks, libraries, and software for data integration and pipeline development.

#open-source#workflow-orchestration#data-integration

Stars3.6k

Forks373

Last commit2 months ago

gleamGo

A high-performance distributed map/reduce system with DAG execution, written in Go, supporting standalone or distributed modes.

#stream-processing#cluster-computing#distributed-systems

Stars3.6k

Forks292

Last commit5 days ago

ScaldingScala

A Scala API for Cascading that simplifies writing Hadoop MapReduce jobs with Scala integration.

#cascading#mapreduce#functional-programming

Stars3.5k

Forks698

Last commit3 years ago

Highland.jsJavaScript

A high-level streams library for Node.js and the browser that manages synchronous and asynchronous code seamlessly.

#asynchronous-programming#functional-programming#transducers

Stars3.4k

Forks146

Last commit6 years ago

HekaGo

A deprecated tool for collecting, processing, and delivering data from multiple sources with Go and Lua plugin support.

#pipeline-tool#lua-plugins#real-time-processing

Stars3.4k

Forks517

Last commit2 years ago

KoalasPython

Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.

#apache-spark#spark#mlflow

Stars3.4k

Forks369

Last commit2 years ago

gotaGo

A Go library providing DataFrames, Series, and data wrangling operations for tabular data manipulation.

#dataframe#data-wrangling#series

Stars3.3k

Forks290

Last commit2 years ago

glowGo

A distributed computation system written in Go for parallel and cluster processing, similar to Hadoop MapReduce and Spark.

#mapreduce#cluster-computing#go-library

Stars3.2k

Forks249

Last commit7 years ago

fast-xml-parserJavaScript

A pure JavaScript library for validating, parsing, and building XML without C/C++ dependencies or callbacks.

#xml2json#js#fast

Stars3.1k

Forks384

Last commit4 days ago

spark-jobserverScala

A RESTful job server for Apache Spark that provides a service interface for submitting and managing Spark jobs, jars, and contexts.

#apache-spark#spark#rest-api

Stars2.8k

Forks971

Last commit4 months ago

NumaflowRust

A Kubernetes-native, serverless platform for running massively parallel data and streaming jobs with exactly-once semantics.

#stream-processing#hacktoberfest#event-driven-architecture

Stars2.8k

Forks162

Last commit2 days ago

SJSONGo

A high-performance Go library for setting values in JSON documents using dot-notation paths.

#json-editing#high-performance#go-package

Stars2.7k

Forks183

Last commit1 month ago

broadwayElixir

Build concurrent, multi-stage data ingestion and processing pipelines with Elixir, supporting back-pressure, batching, and fault tolerance.

#event-driven#back-pressure#elixir

Stars2.7k

Forks178

Last commit15 days ago

JSONataJavaScript

A lightweight query and transformation language for JSON data, inspired by XPath and SQL.

#functional-programming#json-query#query-language

Stars2.7k

Forks277

Last commit14 days ago

ScioScala

A Scala API for Apache Beam and Google Cloud Dataflow, enabling unified batch and streaming data processing.

#stream-processing#batch-processing#batch

A general-purpose GPU compute framework built on Vulkan for cross-vendor graphics cards, enabling high-performance data processing and machine learning.

#vulkan#parallel-computing#gpu-compute

Stars2.5k

Forks195

Last commit1 month ago

Fast C++ CSV ParserC++

A fast, header-only C++11 library for reading CSV files with automatic column rearrangement, threading for I/O overlap, and configurable parsing features.

#threading#high-performance#c

Stars2.4k

Forks440

Last commit1 year ago