A Ruby SDK for integrating applications with Apache PredictionIO's Event Server and Engine APIs.
A curated list of awesome HBase projects, clients, frameworks, tools, and resources.
A JavaScript library for creating interactive tree diagrams with dynamic data updates and customizable visualizations.
A practical guide to exploratory data analytics using Hadoop with Pig and Ruby for terabyte-scale data processing.
A distributed framework extending Apache Spark with unified SQL access to multiple datastores, optimized connectors, and streaming support.
A visual development platform for building, deploying, and managing streaming analytics applications with multiple engine bindings.
A scalable high-performance platform for R that enables large-scale machine learning, statistical analysis, and graph processing across clusters.
A linearly scalable multi-row, multi-table transaction library for HBase with serializable isolation.
An open-source toolkit for analyzing web archives at scale using Apache Spark.
A Java framework for creating real-time time series aggregations from Amazon Kinesis streams.
A framework for creating interactive, details-on-demand data visualizations that scale to millions of records with a declarative API.
A multi-platform data-mining and visualization library for RAD Studio, supporting in-memory databases, pivot tables, and big data.
A scalable malware processing and analytics platform built on Hadoop Pig for binary data extraction and analysis.
A Java library for creating in-memory circular buffers using direct ByteBuffers to minimize garbage collection overhead.
Course materials for UCLA's STATS 418 (Tools in Data Science), covering R packages, machine learning libraries, databases, and reproducibility tools.
An operator and codec library for building real-time streaming applications on Apache Apex.
A scalable full history and state API solution for Antelope (formerly EOSIO) blockchain networks.
An R package for creating, storing, and manipulating massive matrices using shared memory and memory-mapped files.
A UI application for viewing and manipulating data stored in Apache HBase distributed databases.
A MapReduce-style framework for processing fast/streaming data, implementing the MapUpdate model.
A Go library that generates type-safe Parquet readers and writers from Go structs or existing Parquet files.
A Go database/sql driver for Apache Avatica server, enabling Go applications to connect to Phoenix and other Avatica-backed databases.
A collection of libraries for large-scale data processing in Hadoop ecosystems, including Spark, Pig, and incremental MapReduce.
An R extension for distributed computing using Apache Hive, enabling HQL queries in R and R functions in Hive.
A unified R API for writing parallel and distributed applications across different backends like parallel, HP Distributed R, and SparkR.
An experimental Rust client for Apache Spark Connect, providing a DataFrame API to interact with Spark clusters.
A tool for running MPI programs on Hadoop YARN clusters, using MPICH-3.1.2 and SSH for distributed computing.
An open-source framework for developing large-scale anomaly detection models using Apache Spark.
A Scalding library for machine learning and statistical analysis, featuring Mahout vector integration, K-Means clustering, and Naive Bayes classifiers.
A Julia package for efficient large-scale Gaussian Mixture Models with support for diagonal/full covariance, parallel training, and variational Bayes.
A cross-platform desktop GUI for managing and querying TDengine databases.
A collection of interactive Jupyter notebooks for learning Hadoop, Spark, and MapReduce with hands-on tutorials and demos.
A PHP client extension for the TDengine big data engine, with Swoole coroutine support.
Mozilla's utility library for Hadoop, HBase, Pig, and related big data technologies.
A simple utility for testing Apache Hive scripts locally without requiring Java development skills.
A Go implementation of the Count-Min-Log sketch for improved approximate counting of low-frequency events.