A curated collection of resources and guides for understanding, selecting, and using NoSQL databases effectively.
An idiomatic Clojure dataframe library that runs on Apache Spark, providing a seamless interface for data processing and machine learning.
A Python SQL parser that converts SQL queries into JSON-serializable parse trees for translation to non-SQL datastores.
An open-source unit test framework for Hive SQL queries that supports TDD with JUnit 4 and 5 and requires no installed dependencies.
A Go library and CLI tool for validating CSV files against RFC 4180.
A DataOps-friendly data quality monitoring platform with customizable checks, dashboards, and incident management for multiple data sources.
A curated list of awesome HBase projects, clients, frameworks, tools, and resources.
A visual development platform for building, deploying, and managing streaming analytics applications with multiple engine bindings.
A Spark library for reading from and writing to Google BigQuery using DataFrames and SQL.
A Rust DataFrame and data engineering library with PySpark/SQL-like syntax, built for business data pipelines with Microsoft stack integration.
An experimental Rust client for Apache Spark Connect, providing a DataFrame API to interact with Spark clusters.
A manifesto advocating for treating database interactions, queries, and lifecycle management as plain code with SQL as the primary language.
A Python framework for building and deploying serverless data and ML pipelines on AWS using AWS CDK.
A PHP client extension for the TDengine big data engine, with Swoole coroutine support.
A simple utility for testing Apache Hive scripts locally without requiring Java development skills.
An easy-to-use Python feature store for machine learning, optimized for time-series data and built on Dask.