Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Data Pipeline

47 projects

Showing 11 of 47 projects

rust-rdkafka (Rust)

A fully asynchronous, futures-based Apache Kafka client library for Rust built on librdkafka.

#stream-processing #futures #message-queue
Stars: 2.0k
Forks: 348
Last commit: 10 days ago
Secor (Java)

A fault-tolerant service that persists Kafka log data to cloud storage like S3, GCS, Azure Blob Storage, and OpenStack Swift.

#distributed-systems #data-archiving #hadoop-ecosystem
Stars: 1.9k
Forks: 532
Last commit: 1 month ago
Embulk (Java)

A parallel bulk data loader that transfers data between storage systems, databases, NoSQL stores, and cloud services via plugins.

#gradle #jruby #data-transfer
Stars: 1.8k
Forks: 203
Last commit: 14 days ago
Kiba (Ruby)

A Ruby framework for writing reliable, concise, and maintainable ETL (Extract-Transform-Load) data processing jobs.

#rubydatascience #etl-ruby #ruby-gem
Stars: 1.8k
Forks: 90
Last commit: 3 months ago
Genie (Java)

A federated Big Data orchestration service that simplifies job execution across distributed clusters by abstracting infrastructure complexity.

#data-orchestration #spark #netflixoss
Stars: 1.8k
Forks: 373
Last commit: 2 months ago
underscore-cli (JavaScript)

A command-line utility for processing JSON and JavaScript data, inspired by Perl and Unix tools like sed and awk.

#cli-tool #nodejs #json-processing
Stars: 1.7k
Forks: 80
Last commit: 5 years ago
Jolt (Java)

A Java library for declarative JSON-to-JSON transformations using JSON-based specifications.

#java-library #declarative-spec #data-restructuring
Stars: 1.7k
Forks: 351
Last commit: 19 days ago
Multiwoven (Ruby)

An open-source Reverse ETL platform for syncing data from warehouses to business tools like Salesforce, HubSpot, and Slack.

#open-source #reverse-etl #data-integration
Stars: 1.6k
Forks: 86
Last commit: 2 days ago
mongo-hadoop (Java)

A library that lets MongoDB serve as an input source or output destination for Hadoop MapReduce tasks and ecosystem tools.

#mapreduce #bson #spark
Stars: 1.6k
Forks: 591
Last commit: 4 years ago
Bruin (Go)

A unified data pipeline tool for ingestion, transformation with SQL, Python, or R, and data quality checks across major platforms.

#data-modeling #data-quality #python
Stars: 1.5k
Forks: 72
Last commit: 2 days ago
smarter_csv (Ruby)

A high-performance CSV ingestion and generation library for Ruby with C acceleration, designed for real-world data with intelligent defaults.

#sidekiq #csv-processing #batch-processing
Stars: 1.5k
Forks: 192
Last commit: 2 days ago
Page 2 of 2

Related Tags

#ETL (15)
#Data Integration (14)
#Stream Processing (12)
#Distributed Systems (10)
#Message Queue (9)
#Kafka (7)
#Observability (7)
#Real Time Analytics (6)
#Big Data (6)
#Docker (6)
#Golang (6)
#Data Engineering (5)