Showing 33 of 33 projects
A modern, enterprise-ready business intelligence web application for data visualization and exploration.
A powerful, interactive JavaScript charting and data visualization library for the browser.
A Python ETL framework for stream processing, real-time analytics, and building live LLM/RAG pipelines, powered by a scalable Rust engine.
An open-source business intelligence and embedded analytics platform that enables everyone to explore and visualize data.
A curated list of awesome big data frameworks, resources, and tools across various categories.
A curated list of awesome big data frameworks, resources, and tools across various categories.
A transformation workflow that enables data teams to transform data in their warehouse using SQL and software engineering best practices.
A transformation tool that enables data analysts and engineers to transform data using software engineering best practices.
A fast distributed SQL query engine for big data analytics, enabling interactive queries across diverse data sources.
A distributed, fast open-source graph database for large-scale data with horizontal scalability and high availability.
A reusable JavaScript chart library with an easy-to-use interface, based on D3.js.
A one-stop data visualization platform that can be used as a cloud service or integrated into third-party systems as a plugin.
A metadata-driven data discovery and catalog platform that helps data teams find, understand, and trust their data resources.
A PostgreSQL extension that adds graph database capabilities, enabling hybrid relational and graph querying with openCypher.
Official Python client for Elasticsearch, providing idiomatic access to search and analytics engines.
A language and runtime that optimizes the performance of data-intensive applications by lazily building and optimizing computations across libraries.
A connector that enables Apache Spark to read from and write to Apache Cassandra databases for distributed data processing.
A lightweight IoT data analytics and stream processing engine for resource-constrained edge devices.
A lightweight IoT data analytics and stream processing engine designed for resource-constrained edge devices.
A comprehensive benchmark suite for evaluating speed, throughput, and resource utilization of big data frameworks like Hadoop, Spark, and streaming engines.
An ultra high-performance graph database supporting Blueprints and RDF/SPARQL APIs, scaling to 50 billion edges on a single machine.
Code samples and examples from AWS Big Data Blog posts for implementing data analytics solutions on AWS.
An open-source ML-powered analytics engine for automated outlier detection and root cause analysis on high-dimensional metrics.
Official connector for integrating Apache Spark with MongoDB, enabling distributed data processing on MongoDB data.
A Kibana port for Apache Solr that provides rich dashboard and visualization capabilities for time-series and non-time-series data.
A high-performance C++/DPC++ library for accelerated machine learning on CPUs, GPUs, and distributed systems.
A high-performance data profiler for discovering and validating complex patterns like functional dependencies, inclusion dependencies, and association rules.
A high-performance data profiler for discovering and validating complex patterns in datasets, enabling data cleaning and quality analysis.
A high-performance, type-safe DataFrame library for the JVM enabling large-scale data analysis with parallel processing capabilities.
An open source programming model and runtime for analyzing data and events on edge devices, reducing data transmission and storage costs.
A practical guide to exploratory data analytics using Hadoop with Pig and Ruby for terabyte-scale data processing.
A curated collection of awesome apps, visualizations, and resources for the Splunk data platform.
A probabilistic data structure service and storage for efficient frequency, cardinality, and membership queries on large datasets.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.