A lightweight Python library for creating portable, expressive, and testable data transformation DAGs with built-in lineage and metadata.
Apache Hamilton is a Python library that helps data scientists and engineers define, manage, and execute data transformation workflows as directed acyclic graphs (DAGs). It structures code into modular functions that automatically build a DAG, providing built-in lineage tracking, metadata capture, and data validation to improve collaboration and maintainability.
Data scientists, data engineers, and ML engineers building ETL pipelines, ML workflows, LLM applications, or any data transformation system in Python who need structure, testability, and portability.
Developers choose Apache Hamilton for its unique combination of portability (runs anywhere Python does), expressiveness through function modifiers, and built-in software engineering practices like modularity and validation, which reduce technical debt and ease the transition from development to production.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage, tracing, and metadata. It runs and scales everywhere Python does.
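The idea that plain functions can form a DAG on their own can be sketched with the standard library. This is not Hamilton's implementation or API. It is a minimal illustration that assumes each function name is a node and each parameter name refers to an upstream node. The helpers `build_dag` and `execute` and the example functions are hypothetical.

```python
import inspect


# Each function is a node; its parameter names are its upstream dependencies.
def raw_prices() -> list:
    return [10.0, 12.0, 9.0]


def average_price(raw_prices: list) -> float:
    return sum(raw_prices) / len(raw_prices)


def price_spread(raw_prices: list, average_price: float) -> float:
    return max(raw_prices) - average_price


def build_dag(functions):
    """Map each node name to the names of the nodes it depends on (its lineage)."""
    return {f.__name__: list(inspect.signature(f).parameters) for f in functions}


def execute(functions, target, cache=None):
    """Compute `target` by recursively computing its dependencies once each."""
    cache = {} if cache is None else cache
    by_name = {f.__name__: f for f in functions}
    if target not in cache:
        fn = by_name[target]
        kwargs = {
            dep: execute(functions, dep, cache)
            for dep in inspect.signature(fn).parameters
        }
        cache[target] = fn(**kwargs)
    return cache[target]


FUNCTIONS = [raw_prices, average_price, price_spread]
DAG = build_dag(FUNCTIONS)
SPREAD = execute(FUNCTIONS, "price_spread")
```

Because every node is an ordinary function, each one can be unit tested in isolation, and the dependency map doubles as a simple lineage record.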
Hamilton DAGs run anywhere Python is supported, from local scripts to production systems like Airflow or FastAPI, enabling seamless code reuse across environments as highlighted in the README.
Decorators such as @config.when() allow environment-specific DAG modifications without code duplication, keeping logic DRY and reducing maintenance complexity, a key feature emphasized in the documentation.
The @check_output decorator and SchemaValidator() enable automatic validation of data properties and schemas for dataframe-like objects, catching errors early in pipelines as described in the features section.
With the optional UI, Hamilton automatically captures lineage and metadata, providing a data catalog and execution observability to enhance collaboration and debugging, as shown in the UI examples.
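The configuration-driven and validation features above can be illustrated with a small standard-library sketch. It does not use Hamilton's decorators. `when`, `check_range`, `resolve`, and the `sample_rate` node are hypothetical stand-ins, assuming that a variant is selected by matching a config dictionary and that validation wraps a function's return value.

```python
import functools

# Hypothetical registry: node name -> list of (condition, function) variants.
VARIANTS = {}


def check_range(low, high):
    """Hypothetical validator: raise if the wrapped function's output is out of range."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if not low <= result <= high:
                raise ValueError(
                    f"{fn.__name__} returned {result}, expected [{low}, {high}]"
                )
            return result
        return wrapper
    return decorator


def when(node, **condition):
    """Hypothetical selector: register a variant of `node` for a given config."""
    def decorator(fn):
        VARIANTS.setdefault(node, []).append((condition, fn))
        return fn
    return decorator


@when("sample_rate", env="dev")
@check_range(0.0, 1.0)
def sample_rate_dev() -> float:
    return 0.1


@when("sample_rate", env="prod")
@check_range(0.0, 1.0)
def sample_rate_prod() -> float:
    return 1.0


def resolve(node, config):
    """Return the first registered variant whose condition matches `config`."""
    for condition, fn in VARIANTS[node]:
        if all(config.get(key) == value for key, value in condition.items()):
            return fn
    raise KeyError(f"no variant of {node!r} matches {config!r}")


DEV_RATE = resolve("sample_rate", {"env": "dev"})()
PROD_RATE = resolve("sample_rate", {"env": "prod"})()
```

The same node name resolves to different logic per environment, so downstream functions that depend on it do not need to be duplicated.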
As an Apache incubating project, Hamilton may undergo significant changes and lacks full ASF endorsement, posing risks for production stability and long-term support, as noted in the disclaimer.
Visualizations require a separate Graphviz installation, and the UI needs optional extras such as 'ui' and 'sdk', which makes initial setup more involved than with plug-and-play tools.
Hamilton is optimized for DAGs and doesn't directly support loops or complex conditionals. Such scenarios often require workarounds or a switch to its sister library Burr, as the README acknowledges.
Hamilton is an open-source alternative to the following products: