A C++20 library for fast serialization, deserialization, and validation using reflection, supporting JSON, Avro, CSV, Parquet, and more.
A parallel bulk data loader that uses plugins to transfer data between storage systems, databases, NoSQL stores, and cloud services.
A Ruby framework for writing reliable, concise, and maintainable ETL (Extract-Transform-Load) data processing jobs.
An open-source Reverse ETL platform for syncing data from warehouses to business tools like Salesforce, HubSpot, and Slack.
A command-line tool that provides an SQL-like query language for reading, updating, and deleting CSV records.
A Python library for agile data preparation workflows that works with pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark.
A one-stop data integration framework for large-scale data that covers ingestion, synchronization, and subscription across many scenarios.
A standalone, database-driven job scheduler for PostgreSQL with advanced features like task chains, YAML configuration, and built-in operations.
A visual, low-code data preparation tool that generates Python code for ETL, reporting, and AI-assisted workflows.
A command-line tool to import CSV and JSON files into PostgreSQL with automatic table generation.
A Java/Groovy/JavaFX data visualization tool for ETL, machine learning, and publishing web visualizations.
A collection of pre-built Google Cloud Dataflow templates for common data import/export, backup, and bulk API operations.
A logical replication extension for PostgreSQL that enables high-performance, cross-version data replication and upgrades.
A native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, and custom formats.
A tool for running lambda functions over S3 objects with concurrency control, for data pipelining and analytics.
An R interface for Apache Spark that enables distributed data processing, machine learning, and SQL queries using familiar R syntax.
LinkedIn's previous-generation Kafka-to-HDFS pipeline for batch data ingestion.
A simple, fast, and flexible ETL framework for .NET with built-in readers and writers for CSV, JSON, XML, Parquet, and more.
A cross-platform workflow automation engine for developers and sysadmins to automate file operations, system tasks, and scheduled jobs.
The official connector for integrating Apache Spark with MongoDB, enabling distributed processing of MongoDB data.
A JavaScript toolkit for translating, querying, and integrating geospatial data from any API into multiple formats.
A simple, lightweight batch processing framework for Java designed for ETL jobs.
An AWS Lambda function that automatically loads files from S3 into Amazon Redshift clusters with zero server administration.
A .NET Core code generation and ETL tool that builds projects from data sources using configurable templates and tasks.
A Rust-based data transfer suite for ultra-fast replication between MySQL, PostgreSQL, Redis, MongoDB, Kafka, and ClickHouse.
A Clojure library for writing map-reduce queries that compile to Apache Pig or Cascading, enabling distributed data processing with Clojure syntax.
A curated list of awesome system integration software, patterns, and resources.
A library for parsing and querying XML data with Apache Spark SQL and DataFrames.
Kotlin bindings and extensions for Apache Spark, enabling idiomatic Kotlin development with data classes, lambdas, and null safety.
A fast, fully-featured, and developer-friendly Clojure API for Apache Spark.
An open-source Python and CLI tool for reading OpenStreetMap PBF files using DuckDB and exporting to GeoParquet.
A unified platform for big data stream and batch processing on Hadoop YARN with enterprise-grade operability.
A PostgreSQL foreign data wrapper that enables querying and manipulating MongoDB data directly from PostgreSQL.
A Java library for building data pipelines that connect Amazon Kinesis streams to AWS and non-AWS services like DynamoDB, Redshift, S3, and Elasticsearch.
A bi-directional connector enabling Apache Spark to read from and write to Neo4j graph databases using Spark DataSource APIs.
A Spark library for reading and writing data between Spark SQL and MongoDB collections.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.