A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
Pathway is a Python ETL and stream processing framework designed for building real-time data pipelines, analytics, and live AI applications like LLM and RAG workflows. It solves the challenge of unifying batch and streaming computation with a simple Python API, while delivering high performance through a scalable Rust engine that handles incremental processing and distributed workloads.
Pathway is aimed at data engineers and ML engineers who build real-time ETL pipelines, streaming analytics, or live AI/LLM applications and want Python's simplicity with production-scale performance.
Developers choose Pathway because it pairs a Python-friendly API with a high-performance Rust backend, so the same code runs from local development to distributed cloud deployments without sacrificing speed or scalability.
The same Python code works for both batch jobs and live streams, simplifying development and deployment, as highlighted in the key features section.
Builds on Differential Dataflow to run multithreaded, multiprocess, and distributed computations, overcoming Python's performance limits, as described in the README.
Includes built-in connectors for Kafka, PostgreSQL, and more, plus Airbyte integration for over 300 sources, and support for custom Python connectors.
Offers dedicated LLM tooling with wrappers, embedders, and an in-memory Vector Index, making it easy to build live RAG and AI pipelines, as shown in the use-cases.
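The idea behind running the same logic over batch and streaming inputs can be illustrated with a small incremental aggregation. The following is a minimal pure-Python sketch and does not use Pathway's API or reflect its internals. The `IncrementalSum` class and the `(key, value, diff)` update format are hypothetical names chosen for illustration, loosely modeled on the insert/retract updates used in differential-style engines.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical update record: (key, value, diff).
# diff=+1 inserts a row, diff=-1 retracts a previously inserted row.
Update = tuple[str, int, int]


class IncrementalSum:
    """Keeps per-key sums current by applying only the changes it receives."""

    def __init__(self) -> None:
        self.totals: dict[str, int] = defaultdict(int)
        self.counts: dict[str, int] = defaultdict(int)

    def apply(self, updates: Iterable[Update]) -> dict[str, int]:
        for key, value, diff in updates:
            self.totals[key] += value * diff
            self.counts[key] += diff
            # Drop keys whose rows have all been retracted.
            if self.counts[key] == 0:
                del self.totals[key]
                del self.counts[key]
        return dict(self.totals)


rows = [("a", 1, +1), ("b", 5, +1), ("a", 2, +1), ("b", 5, -1)]

# Batch: the whole input is applied in one call.
batch = IncrementalSum()
batch_result = batch.apply(rows)

# Streaming: the same logic is applied one update at a time.
stream = IncrementalSum()
stream_result = {}
for update in rows:
    stream_result = stream.apply([update])

print(batch_result, stream_result)
```

Both runs end in the same state because the aggregation depends only on the accumulated updates, not on how they were grouped on arrival. That property is what lets one definition serve a batch job and a live stream.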
Officially supports only macOS and Linux; Windows users must run it in a virtual machine, which adds setup complexity and overhead.
Exactly-once processing guarantees are reserved for the paid enterprise version, while the free version offers only at-least-once consistency, limiting reliability for some use cases.
The Rust engine introduces additional installation steps and potential compatibility issues compared to pure Python frameworks, which can complicate deployment.
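To make the at-least-once limitation concrete: under at-least-once delivery a message may arrive more than once, so a downstream consumer that simply adds up what it receives can double-count. One general mitigation is an idempotent sink that remembers which messages it has already applied. The sketch below is pure Python and is not Pathway's mechanism. `IdempotentSink` and the `(message_id, amount)` format are hypothetical, and the approach assumes every message carries a stable unique ID.

```python
class IdempotentSink:
    """Applies each message at most once, even if the source redelivers it."""

    def __init__(self) -> None:
        self.seen_ids: set[int] = set()
        self.total = 0

    def write(self, message_id: int, amount: int) -> bool:
        # Skip messages that were already applied.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.total += amount
        return True


# Message 2 is delivered twice, simulating a retry after a failure.
deliveries = [(1, 10), (2, 20), (2, 20), (3, 5)]

# A naive consumer counts the duplicate.
naive_total = sum(amount for _, amount in deliveries)

# The idempotent sink ignores the redelivered message.
sink = IdempotentSink()
for message_id, amount in deliveries:
    sink.write(message_id, amount)

print(naive_total, sink.total)
```

In a real deployment the set of seen IDs would have to be stored durably and bounded in size. This sketch keeps it in memory only to show the principle.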
Pathway is an open-source alternative to the following products:
Apache Spark is an open-source unified analytics engine for large-scale data processing, providing high-level APIs in Java, Scala, Python, and R.
Kafka Streaming refers to the stream processing capabilities of Apache Kafka (the Kafka Streams library), which allow real-time processing of data streams with support for exactly-once semantics.