The "Awesome Data Engineering" project is a curated collection of resources aimed at supporting professionals in the field of data engineering, which involves the design and construction of systems for collecting, storing, and analyzing data. This list encompasses a variety of categories, including data pipelines, ETL tools, data warehousing solutions, frameworks, and best practices, as well as tutorials and community resources. Whether you are a beginner looking to understand the fundamentals or an experienced engineer seeking advanced techniques, this list offers valuable insights and tools to enhance your data engineering projects. Dive into this collection to discover the tools and methodologies that can streamline your data workflows and improve your data management capabilities.
The "Awesome Public Datasets" project is a curated collection of publicly available datasets across various domains, including government, healthcare, finance, and social sciences. This list features datasets in multiple formats, along with links to tools and platforms that facilitate data analysis and visualization. It is an invaluable resource for researchers, data scientists, and students looking to access high-quality data for their projects or studies. By providing a wide array of datasets, this collection empowers users to explore, analyze, and derive insights from real-world data. Dive in to discover the wealth of information available for your next data-driven endeavor!
The "Awesome Big Data" project is a curated collection of resources focused on big data technologies and practices that enable the processing and analysis of vast amounts of data. This list encompasses a variety of categories, including frameworks, tools, libraries, databases, and tutorials that cater to both beginners and experienced data professionals. Users can explore resources related to data storage, processing, analytics, and visualization, making it an invaluable asset for data scientists, engineers, and researchers. Whether you're looking to enhance your big data skills or find the right tools for your projects, this collection provides a comprehensive guide to navigating the big data landscape.
The "Awesome Network Analysis" project is a curated collection of resources focused on the study and analysis of networks, which are structures made up of interconnected elements. This list encompasses a variety of tools, libraries, datasets, and tutorials that facilitate the exploration of network theory, graph analysis, and visualization techniques. It serves as a valuable resource for researchers, data scientists, and enthusiasts interested in understanding complex systems, social networks, and data relationships. Whether you are a beginner looking to grasp the basics or an experienced analyst seeking advanced methodologies, this collection provides essential tools and insights to enhance your network analysis projects.
The "Awesome Streaming" project is a curated collection of resources focused on streaming technologies, which enable the real-time processing and distribution of data. This list encompasses a variety of categories, including frameworks, libraries, tools, tutorials, and community resources that cater to different streaming protocols and architectures. It is beneficial for developers, data engineers, and researchers who are looking to implement or enhance streaming solutions in their applications. With a wealth of information and tools at their disposal, users can explore innovative ways to manage and analyze streaming data effectively.
A lightweight, fault-tolerant distributed relational database built on SQLite, designed for high availability with minimal operational effort.
An open-source, cloud-native, distributed SQL database offering MySQL compatibility, horizontal scalability, and HTAP capabilities.
A high-performance NoSQL database compatible with Apache Cassandra and Amazon DynamoDB, built on a shared-nothing architecture.
A graph database framework for storing and querying large-scale graphs with rich properties and in-database aggregation.
A scalable time series database optimized for real-time metrics, events, and analytics with fast query response.
A fast, distributed, scalable time series database built on top of Cassandra.
A high-performance real-time analytics database designed for fast queries and ingest to reduce time to insight.
An in-memory computing platform combining a high-performance database and Lua application server for scalable web components.
An open-source graph database for linked data, inspired by Google's Knowledge Graph.
A CLI tool to copy data between any databases and platforms with a single command, no code required.
A lightweight, non-JVM command-line tool for producing, consuming, and inspecting Apache Kafka messages.
A Docker image and configuration for running Apache Kafka in containerized environments.
A web-based tool for managing Apache Kafka clusters, enabling cluster inspection, topic management, and partition operations.
A Node.js client for Apache Kafka 0.9 and later, providing producers, consumers, and administrative APIs.
A fault-tolerant service that persists Kafka log data to cloud storage such as S3, GCS, Azure Blob Storage, and OpenStack Swift.
A deprecated tool for collecting, processing, and delivering data from multiple sources with Go and Lua plugin support.
A distributed data integration framework for big data ecosystems, handling ingestion, replication, organization, and lifecycle management for both streaming and batch data.
A Python library that simplifies data integration between pandas and AWS services like Athena, S3, Redshift, and more.
A polyglot document intelligence framework with a Rust core for extracting text, metadata, and structured data from 91+ file formats.
A fast compression/decompression library optimized for speed over maximum compression.
A language-neutral, platform-neutral, extensible mechanism, developed by Google, for serializing structured data.
A fast and efficient binary object graph serialization and cloning framework for Java.