Data Warehouse

27 projects

An all-in-one open-source platform for product analytics, feature flags, session replay, experiments, and more to help build successful products.

#experimentation #developer-tools #open-source

Open-source data integration platform for building ELT pipelines from APIs, databases, and files to data warehouses, lakes, and lakehouses.

#open-source #pipeline #data-integration

A curated list of awesome big data frameworks, resources, and tools across various categories.

#database #data-storage #open-source

Stars: 14.5k · Forks: 2.6k · Last commit: 2 months ago

awesome-bigdata

A curated list of awesome big data frameworks, resources, and tools across various categories.

#database #data-science #distributed-systems

Stars: 14.5k · Forks: 2.6k · Last commit: 2 months ago

Druid (Java)

A high-performance real-time analytics database designed for fast queries and ingest to reduce time to insight.

#apache #high-performance #real-time-analytics

Stars: 14.0k · Forks: 3.8k · Last commit: 2 days ago

dbt-core (Python)

A transformation tool that enables data analysts and engineers to transform data using software engineering best practices.

#version-control #pypa #business-intelligence

A transformation workflow that enables data teams to transform data in their warehouse using SQL and software engineering best practices.

#data-documentation #pypa #business-intelligence

An open-source enterprise data warehouse built in Rust for AI agents, analytics, vector search, and full-text search.

#ai #database #serverless

Stars: 9.4k · Forks: 894 · Last commit: 1 day ago

awesome-data-engineering

A curated list of data engineering tools, frameworks, databases, and resources for software developers.

#stream-processing #workflow-orchestration #awesome-list

Stars: 8.9k · Forks: 1.6k · Last commit: 3 days ago

RudderStack (Go)

An open-source, privacy-focused customer data platform (CDP) that collects, processes, and routes event data to warehouses and tools.

#event-collection #segment-alternative #warehouse-management

Stars: 4.5k · Forks: 57 · Last commit: 1 day ago

amazon-redshift-utils (Python)

A collection of utilities, scripts, and views for managing, optimizing, and automating Amazon Redshift data warehouse operations.

#sql-scripts #performance-tuning #data-migration

A distributed, multi-tenant gateway providing serverless SQL on data warehouses and lakehouses.

#hiveserver2-alternative #hacktoberfest #spark

Stars: 2.4k · Forks: 1.0k · Last commit: 4 days ago

Multiwoven (Ruby)

An open-source Reverse ETL platform for syncing data from warehouses to business tools like Salesforce, HubSpot, and Slack.

#open-source #reverse-etl #data-integration

Stars: 1.7k · Forks: 92 · Last commit: 3 days ago

BigQuery Utils (Jupyter Notebook)

A collection of utilities, scripts, UDFs, and dashboards for BigQuery migration, optimization, and data warehouse operations.

#google-cloud-platform #performance-optimization #data-migration

An advanced open-source MPP database for data warehousing, large-scale analytics, and AI/ML workloads.

#ai #greenplum #database

A simple, fast, and flexible ETL framework for .NET with built-in readers and writers for CSV, JSON, XML, Parquet, and more.

#parquet #cinchoo-etl #flat

Stars: 859 · Forks: 141 · Last commit: 1 month ago

VulcanSQL (TypeScript)

A data API framework that turns SQL into secure RESTful APIs for AI agents and data applications.

#database #reporting #api-framework

Stars: 793 · Forks: 42 · Last commit: 2 years ago

BlinkDB (Scala)

A large-scale data warehouse system that provides approximate query answers with error bounds on massive datasets up to 300x faster than Hive.

#spark #sampling #performance-optimization

Stars: 660 · Forks: 121 · Last commit: 12 years ago

aws-lambda-redshift-loader (JavaScript)

An AWS Lambda function that automatically loads files from S3 into Amazon Redshift clusters with zero server administration.

#aws-cloudformation #batch-processing #serverless

A Python CLI tool for comparing data across heterogeneous databases and data warehouses to ensure migration accuracy.

#data-migration #ibis-framework #cli-tool

Stars: 515 · Forks: 166 · Last commit: 3 days ago

PuppetDB (Clojure)

A fast, scalable data warehouse that caches Puppet infrastructure data and provides advanced querying over it.

#reporting #devops #api

Stars: 305 · Forks: 225 · Last commit: 1 year ago

shib (JavaScript)

A web client for SQL-like query engines including Hive, Presto, and BigQuery, written in Node.js.

#query-engine #hive #presto

Stars: 199 · Forks: 56 · Last commit: 9 years ago

everythingMe/redshift_console (JavaScript)

A web-based tool for monitoring and managing Amazon Redshift clusters, providing insights into queries, WLM queues, tables, and load errors.

#database-monitoring #python #amazon-redshift

Stars: 92 · Forks: 22

terraform-aws-redshift (HCL)

Terraform module for provisioning and managing AWS Redshift clusters and related resources.

#cloud-infrastructure #devops #redshift

Stars: 88 · Forks: 156 · Last commit: 6 months ago

Google Sheets ETL (PHP)

A PHP library for importing live Google Sheets data into data warehouses using periodic delta loads.

#data-integration #php-library #data-sync

Stars: 22 · Forks: 1 · Last commit: 4 months ago

d8a.tech (Go)

Warehouse-native analytics compatible with the Google Analytics and Matomo tracking protocols. Ingests data into ClickHouse, BigQuery, or CSV/Parquet files on S3, GCS, or a local filesystem while keeping you in full control of your data.

#ga4 #matomo #tracker

Stars: 15 · Forks: 1 · Last commit: 8 days ago

db2lake (TypeScript)

A lightweight Node.js ETL framework for extracting data from databases and loading it into data lakes and warehouses.

#database-migration #data-lake #nodejs

Stars: 2 · Forks: 0 · Last commit: 10 months ago

Community-curated · Updated weekly · 100% open source

Found a gem we're missing?

Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.

Submit a project Star on GitHub