Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Data Processing

258 projects

Showing 36 of 258 projects

from2 (JavaScript)

A high-level Node.js module for creating readable streams with proper backpressure handling and a familiar API.

#stream-processing #io-streams #backpressure
Stars: 132
Forks: 20
Last commit: 7 years ago
cl-csv (Common Lisp)

A Common Lisp library for reading and writing CSV files with extensive customization and error handling.

#open-source #library #bsd-license
Stars: 131
Forks: 23
Last commit: 2 months ago
go-csv-tag (Go)

A Go library for reading and writing CSV files using struct tags for mapping fields.

#hacktoberfest #data-serialization #go-library
Stars: 131
Forks: 32
Last commit: 11 months ago
jsonpath (Rust)

A Rust-based JsonPath engine with WebAssembly and JavaScript bindings for querying and manipulating JSON data.

#parsing #webassembly #serde
Stars: 131
Forks: 42
Last commit: 1 year ago
Magento 2 Import Framework (PHP)

A high-performance PHP import framework for distributed data processing with optimized memory consumption.

#products #magento #distributed-processing
Stars: 128
Forks: 22
Last commit: 23 days ago
parquet (Go)

A Go library that generates type-safe Parquet readers and writers from Go structs or existing Parquet files.

#parquet #data-serialization #dremel
Stars: 127
Forks: 13
Last commit: 1 year ago
dyer (Rust)

A reliable, flexible, and fast Rust framework for web crawling and request-response services.

#event-driven #web-crawling #spider
Stars: 127
Forks: 7
Last commit: 10 months ago
AbuseHelper (Python)

An open-source framework for receiving, processing, and redistributing abuse feeds and threat intelligence.

#feed-distribution #open-source-framework #abuse-feeds
Stars: 125
Forks: 19
Last commit: 6 years ago
Apache DataFu (Java)

A collection of libraries for large-scale data processing in Hadoop ecosystems, including Spark, Pig, and incremental MapReduce.

#apache-spark #mapreduce #user-defined-functions
Stars: 124
Forks: 65
Last commit: 21 days ago
Covid-19 Google (Python)

An open-source data pipeline that aggregates and standardizes heterogeneous public COVID-19 data from multiple global sources.

#data-standardization #python #covid-19
Stars: 119
Forks: 68
Last commit: 4 years ago
transducers-java (Java)

A Java implementation of composable algorithmic transformations called transducers, independent from input/output sources.

#stream-processing #functional-programming #transducers
Stars: 119
Forks: 13
Last commit: 3 years ago
spark-connect-rs (Rust)

An experimental Rust client for Apache Spark Connect, providing a DataFrame API to interact with Spark clusters.

#spark-connect #apache-spark #spark
Stars: 116
Forks: 24
Last commit: 1 year ago
WJElement (C)

A flexible C library for JSON manipulation and schema validation, enabling JavaScript-like ease with C performance.

#parsing #c-library #embedded-json
Stars: 110
Forks: 55
Last commit: 2 months ago
parallel_stream (Elixir)

A parallelized stream implementation for Elixir that maintains order while processing with a worker pool.

#stream-processing #functional-programming #elixir
Stars: 103
Forks: 19
Last commit: 3 years ago
GSODR (R)

An R package for accessing, formatting, and analyzing Global Surface Summary of the Day (GSOD) weather data from NOAA.

#meteorology #r-package #weather-information
Stars: 99
Forks: 18
Last commit: 23 days ago
lazycsv (C++)

A fast, lightweight, single-header C++17 CSV parser library that parses rows and cells lazily on demand.

#csv-reader #cpp-csv-reader #parsing-csv-files
Stars: 94
Forks: 12
Last commit: 3 months ago
stream (Go)

A Go library providing Java 8 Stream-like functional programming operations for collections and data processing.

#functional-programming #filter #java-stream
Stars: 93
Forks: 11
Last commit: 2 years ago
Osprey (MATLAB)

An all-in-one MATLAB software suite for state-of-the-art processing and quantitative analysis of in-vivo magnetic resonance spectroscopy (MRS) data.

#spectroscopy-analysis #neuroscience #quantitative-analysis
Stars: 92
Forks: 47
Last commit: 1 month ago
matcha (Elixir)

A library providing first-class, ergonomic match specifications for the Elixir language.

#tracing #hex #functional-programming
Stars: 92
Forks: 6
Last commit: 1 year ago
ujson (Go)

A fast and minimal JSON parser and transformer for Go that works on unstructured JSON without full unmarshalling.

#unstructured-json #json-transformer #streaming-json
Stars: 85
Forks: 9
Last commit: 1 year ago
Docker for beginners (Jupyter Notebook)

A collection of interactive Jupyter notebooks for learning Hadoop, Spark, and MapReduce with hands-on tutorials and demos.

#google-colab #mapreduce-bash #apache-spark
Stars: 84
Forks: 27
Last commit: 1 month ago
CSVParser (Swift)

A Swift library for fast reading and writing of CSV files with JSON conversion support.

#file-io #json-conversion #cross-platform
Stars: 83
Forks: 12
Last commit: 7 years ago
JSON-LD.ex (Elixir)

A JSON-LD 1.1 implementation for Elixir with RDF.ex integration for semantic web data processing.

#linked-data #elixir #json-ld
Stars: 82
Forks: 14
Last commit: 2 months ago
Conveyal's gtfs-lib (Java)

A Java library for loading, saving, and validating large GTFS feeds using disk-backed storage.

#java-library #disk-backed-storage #transit-data
Stars: 80
Forks: 43
Last commit: 2 years ago
rollinghashjava (Java)

A Java library implementing rolling hash functions like Randomized Karp-Rabin and Cyclic Polynomial hashing for efficient n-gram hashing.

#algorithm #rolling-hash-functions #java-library
Stars: 80
Forks: 13
Last commit: 10 years ago
paginate (Python)

A Python module for dividing large lists into pages with customizable HTML pagination and framework-agnostic design.

#python #pagination #sqlalchemy
Stars: 79
Forks: 17
Last commit: 1 year ago
C#: kbCSV (C#)

An efficient, easy-to-use .NET library for parsing and writing CSV files, compliant with RFC 4180.

#library #async #portable
Stars: 78
Forks: 26
Last commit: 5 years ago
jsonl-graph (Go)

A Go tool for generating Graphviz visualizations from JSONL-formatted graph data, designed to work seamlessly with jq.

#graph #command-line-tools #dot-language
Stars: 78
Forks: 5
Last commit: 4 months ago
go-email-normalizer (Go)

A Go library for normalizing email addresses to a canonical form to prevent duplicate signups.

#go-library #canonicalization #normalization
Stars: 78
Forks: 9
Last commit: 1 year ago
Envision (Clojure)

A Clojure library for data processing, cleanup, and interactive visualization using D3.

#statistics #data-visualization #clojure
Stars: 77
Forks: 3
Last commit: 8 years ago
groovy-common-extensions (Groovy)

A collection of useful Groovy language extensions for common tasks like clamping, sorting, file operations, and data conversions.

#language-extensions #statistics #code-productivity
Stars: 72
Forks: 14
Last commit
GTFS Feed Parser (C#)

A .NET library for parsing, reading, and writing General Transit Feed Specification (GTFS) data.

#csharp #dotnet #transit-data
Stars: 72
Forks: 49
Last commit: 4 years ago
capillaries (Go)

A distributed batch data processing framework that handles scalability and intermediate storage, letting users focus on transforms and quality control.

#batch-processing #workflow-engine #message-queue
Stars: 72
Forks: 5
Last commit: 5 days ago
Scramjet Cloud Platform (TypeScript)

A runtime supervisor for deploying and running data processing programs called Sequences on Linux servers, Docker, and Kubernetes clusters.

#stream-processing #runtime-supervisor #serverless
Stars: 69
Forks: 7
Last commit
Sextant (Swift)

A complete, high-performance JSONPath implementation for Swift, enabling efficient querying and modification of JSON data.

#json-query #jsonpath #json-manipulation
Stars: 67
Forks: 9
Last commit: 2 months ago
Groovy-stream (Java)

A fluent builder for lazy streams and generators in Groovy, enabling functional-style data processing.

#stream-processing #functional-programming #java-library
Stars: 67
Forks: 10
Last commit: 7 years ago
Page 7 of 8

Related Tags

#Big Data (58)
#Python (37)
#Apache Spark (30)
#Csv Parser (28)
#Csv (28)
#Json (28)
#Distributed Computing (27)
#Functional Programming (26)
#Performance (25)
#High Performance (24)
#Machine Learning (23)
#Streaming (22)