Showing 26 of 26 projects
A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
A platform to programmatically author, schedule, and monitor workflows as code.
A platform to programmatically author, schedule, and monitor workflows as code.
A free and open-source desktop application that upscales and enhances low-resolution images using AI models like Real-ESRGAN.
An open-source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
A GUI application for Mac that losslessly optimizes images using multiple compression tools.
A comprehensive PDF processing library and CLI written in Go, supporting encryption, validation, and batch operations.
A curated list of data engineering tools, frameworks, databases, and resources for software developers.
A platform for deploying, managing, and scaling machine learning models in production on AWS infrastructure.
A polyglot document intelligence framework with a Rust core for extracting text, metadata, and structured data from 91+ file formats.
An open-source feature store for managing and serving machine learning features for training and online inference.
A Java library for generating high-quality thumbnails with a simple fluent interface and no external dependencies.
Azkaban is a batch workflow job scheduler created at LinkedIn to manage Hadoop jobs.
A desktop app for compressing PNG and JPEG images with a modern GUI, supporting batch optimization and format conversion.
A native macOS client for compressing images using the TinyPNG API without a browser.
A command-line tool that automates ImageOptim, ImageAlpha, and JPEGmini for Mac to batch-optimize images in build processes.
A Scala API for Apache Beam and Google Cloud Dataflow, enabling unified batch and streaming data processing.
A cross-platform desktop GUI application for cleaning metadata from images, videos, PDFs, and other files.
A fast, comprehensive, and dependency-free image processing library for Node.js with native bindings.
A distributed data integration framework for big data ecosystems, handling ingestion, replication, organization, and lifecycle management for both streaming and batch data.
A distributed data integration framework for big data ecosystems, handling ingestion, replication, organization, and lifecycle management for both streaming and batch data.
A library for writing MapReduce programs in a Scala/Java collection-like syntax that execute on distributed platforms like Storm and Scalding.
A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
A Java library and command-line tool for extracting tables from PDF files.
Elyra is a set of AI-centric extensions for JupyterLab that adds visual pipeline editing, batch job execution, and AI-assisted coding.
A high-performance CSV ingestion and generation library for Ruby with C acceleration, designed for real-world data with intelligent defaults.