A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
A free and open-source desktop application that upscales and enhances low-resolution images using AI models like Real-ESRGAN.
A platform to programmatically author, schedule, and monitor workflows as code.
An open-source, container-native workflow engine for orchestrating parallel jobs on Kubernetes.
A GUI application for Mac that losslessly optimizes images using multiple compression tools.
A curated list of data engineering tools, frameworks, databases, and resources for software developers.
A comprehensive PDF processing library and CLI written in Go, supporting encryption, validation, and batch operations.
A polyglot document intelligence framework with a Rust core for extracting text, metadata, and structured data from 91+ file formats.
A platform for deploying, managing, and scaling machine learning models in production on AWS infrastructure.
An open-source feature store for managing and serving machine learning features for training and online inference.
A Java library for generating high-quality thumbnails with a simple fluent interface and no external dependencies.
Azkaban is a batch workflow job scheduler created at LinkedIn to manage Hadoop jobs.
A desktop app for compressing PNG and JPEG images with a modern GUI, supporting batch optimization and format conversion.
A native macOS client for compressing images using the TinyPNG API without a browser.
A command-line tool that automates ImageOptim, ImageAlpha, and JPEGmini for Mac to batch-optimize images in build processes.
A Scala API for Apache Beam and Google Cloud Dataflow, enabling unified batch and streaming data processing.
A cross-platform desktop GUI application for cleaning metadata from images, videos, PDFs, and other files.
A fast, comprehensive, and dependency-free image processing library for Node.js with native bindings.
A distributed data integration framework for big data ecosystems, handling ingestion, replication, organization, and lifecycle management for both streaming and batch data.
A library for writing MapReduce programs that execute on distributed platforms like Storm and Scalding using Scala/Java collection-like syntax.
A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
A Java library and command-line tool for extracting tables from PDF files.
Elyra is a set of AI-centric extensions for JupyterLab that adds visual pipeline editing, batch job execution, and AI-assisted coding.
A high-performance CSV ingestion and generation library for Ruby with C acceleration, designed for real-world data with intelligent defaults.
A one-stop, full-scenario integration framework for massive data, supporting data ingestion, synchronization, and subscription.
A Go package for downloading files with progress monitoring, auto-resume, checksum validation, and concurrent batch downloads.
A PHP library for geocoding, coordinate conversion, distance calculation, and other geographic operations.
A collection of pre-built Google Cloud Dataflow templates for common data import/export, backup, and bulk API operations.
A lightweight Java mailing library with a simple API for sending complex emails, built on Jakarta Mail.
Official Java client library for InfluxDB 1.x, enabling Java applications to write and query time series data.
An open-source, in-memory, distributed batch and stream processing engine for Java applications.
Run lambda functions over S3 objects with concurrency control for data pipelining and analytics.
Award-winning, efficient C++ tools for processing LiDAR data in LAS/LAZ formats with multi-core batch processing.
Hardware-accelerated, batchable, and differentiable optimization algorithms implemented in JAX for machine learning research.