A platform to programmatically author, schedule, and monitor workflows as code.
Apache Airflow is an open-source workflow orchestration platform that allows users to programmatically author, schedule, and monitor complex data pipelines and computational workflows. It solves the problem of managing dependencies, scheduling, and execution of tasks in a scalable and maintainable way by defining workflows as code.
Data engineers, DevOps professionals, and developers who need to build, schedule, and monitor batch-oriented data pipelines, ETL processes, and other automated workflows.
Developers choose Airflow for its code-based workflow definition, rich extensibility, and powerful UI for monitoring, which together provide greater control, maintainability, and collaboration compared to static configuration tools.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Workflows are defined as Python code, enabling dynamic generation and parameterization, which makes them maintainable, versionable, and collaborative, as highlighted in the README.
Includes a wide range of built-in operators and supports custom plugins, allowing users to adapt Airflow to specific needs, evidenced by the extensibility principle in the README.
Provides a comprehensive web UI with Grid, Graph, and Code views for visualizing pipelines and monitoring progress, as shown in the README's UI screenshots.
Encourages idempotent tasks and uses XCom for lightweight metadata passing, promoting reliable workflow execution without data duplication, as stated in the Project Focus section.
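The code-defined workflow and XCom points above can be sketched as a minimal DAG file. This assumes apache-airflow 2.x is installed; the DAG id, schedule, and task names are illustrative, not taken from the catalog entry.

```python
# Minimal sketch of a code-defined DAG with XCom metadata passing.
# Assumes apache-airflow 2.x; dag_id, schedule, and task names are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Push a small piece of metadata for downstream tasks via XCom
    # (XCom is for lightweight values, not bulk data).
    context["ti"].xcom_push(key="row_count", value=42)


def report(**context):
    rows = context["ti"].xcom_pull(task_ids="extract", key="row_count")
    print(f"extracted {rows} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    report_task = PythonOperator(task_id="report", python_callable=report)
    # Dependencies are declared in code; the scheduler runs extract before report.
    extract_task >> report_task
```

Because the DAG is ordinary Python, it can be generated in loops, parameterized, code-reviewed, and version-controlled like any other module.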
Installing Airflow from PyPI requires using constraint files hosted on version-specific GitHub branches, making setup non-trivial and error-prone, as noted in the 'Installing from PyPI' section.
Airflow is designed for batch processing and not real-time streaming, limiting its use in continuous data flow scenarios, per the Project Focus note that it 'is not a streaming solution'.
Linux-based distributions are recommended for production; Windows is supported only via WSL2 or containers and is not a high priority, as noted in the Requirements section.
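The constraint-file installation noted above follows a documented pattern; a sketch is below. The version numbers are illustrative, so substitute the Airflow release and Python version you actually use.

```shell
# Install a pinned Airflow release with its matching constraints file.
# 2.9.2 and 3.11 below are illustrative; pick your own versions.
AIRFLOW_VERSION=2.9.2
PYTHON_VERSION=3.11
pip install "apache-airflow==${AIRFLOW_VERSION}" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
```

The constraints file pins transitive dependencies that Airflow was tested against; installing without it is a common source of broken environments.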
Airflow is an open-source alternative to the following products:
Luigi is a Python module for building complex pipelines of batch jobs, handling dependency resolution, workflow management, and failure recovery.
Azkaban is a batch workflow job scheduler created at LinkedIn to manage Hadoop jobs, providing features for scheduling, dependency management, and monitoring of workflows.
Apache Oozie is a workflow scheduler system for Apache Hadoop that manages and coordinates complex data processing jobs.