A curated list of data engineering tools, frameworks, databases, and resources for software developers.
Awesome Data Engineering is a curated GitHub repository listing tools, frameworks, databases, and resources for data engineering. It helps software developers and data practitioners discover and evaluate technologies for building data pipelines, processing systems, and infrastructure. The list is organized by category, covering everything from data ingestion and storage to workflow orchestration and monitoring.
It is aimed at data engineers, data architects, software developers building data-intensive applications, and anyone who designs or maintains data infrastructure and needs a reference for available tools and best practices.
It saves research time by gathering the data engineering ecosystem into a single, community-vetted directory. Unlike commercial directories, it is open source, frequently updated, and lists both established and emerging technologies without vendor bias.
Tools are organized into clear sections like Databases, Stream Processing, and Workflow, making it easy to browse by use case without sifting through irrelevant options.
Covers both open-source stalwarts like Apache projects and commercial services from AWS and Google, providing a balanced view of the data engineering landscape.
Includes links to podcasts, conferences, and books, such as the Data Engineering Podcast and curated book lists, aiding continuous professional development beyond tool discovery.
References real-time and static datasets like GitHub Archive and Common Crawl, which are valuable for testing and developing data pipelines without proprietary data.
Entries often consist of just names and GitHub links, lacking comparative insights, performance benchmarks, or guidance on tool suitability for specific scenarios.
The README doesn't specify selection criteria, so deprecated tools (e.g., HyperDex) can appear without clear warnings or context for beginners.
While it lists individual components, it lacks end-to-end workflow examples, leaving users to work out on their own how to combine tools such as Airflow and Spark.