A secure time series database backed by Apache Accumulo with Grafana integration for data visualization.
A framework enabling spatial data analysis within Hadoop ecosystems using Hive and SparkSQL.
A Python interface to the Amazon Kinesis Client Library for building distributed applications that process streaming data reliably at scale.
A Python library that provides a Pandas-like API on top of Apache Spark DataFrames for distributed data analysis.
A framework for building scalable machine learning models in Hadoop using the Scalding DSL.
An open-source platform for network security analytics that uses flow and packet analysis to detect unknown threats at cloud scale.
A unified platform for big data stream and batch processing on Hadoop YARN with enterprise-grade operability.
A free, open-source alternative to Spark UI and Spark History Server with enhanced CPU and memory metrics visualizations.
A serverless proxy for Spark clusters that provides a functional programming framework and deployment model for Spark applications.
A bi-directional connector enabling Apache Spark to read from and write to Neo4j graph databases using Spark DataSource APIs.
A streaming JsonPath processor for Java that extracts JSON data without loading entire documents into memory.
A scalable machine learning library that runs on Apache Hive, Spark, and Pig for distributed ML directly in SQL.
A Spark library for reading and writing data between Spark SQL and MongoDB collections.
An open-source big data security analytics tool that analyzes network packet capture (pcap) files using Apache Pig.
A lightweight real-time big data streaming engine built on Akka for high-throughput, low-latency data processing.
An idiomatic Clojure dataframe library that runs on Apache Spark, providing a seamless interface for data processing and machine learning.
A vendor-neutral, language-independent specification for building interoperable messaging and streaming applications across heterogeneous systems.
An open-source real-time stream processing framework combining high-throughput event processing with low-latency SQL-like streaming queries.
A Java library for sorting very large files using external-memory algorithms and multiple cores.
A distributed Spark/Scala implementation of Isolation Forest and Extended Isolation Forest algorithms for scalable unsupervised outlier detection.
A tool for defining, running, and deploying big data applications on AWS, OpenStack, and local machines using Docker.
An experimental Go client for Apache Spark Connect, enabling Go applications to interact with Spark clusters via gRPC.
A distributed streaming machine learning framework for mining big data streams with abstraction over processing engines.
An open-source research framework for distributed temporal graph analytics built on Apache Flink.
A Python toolkit for developing, testing, and managing Apache Storm streaming data processing topologies.
A high-performance, type-safe DataFrame library for the JVM enabling large-scale data analysis with parallel processing capabilities.
A collection of connectors enabling Apache HBase integration with Kafka, Spark, and other data processing systems.
A high-performance Presto connector for querying HBase, reported to be 10-100x faster than other open-source alternatives.
An interactive visualization tool for monitoring Hadoop HDFS cluster usage and file storage efficiency.
A Ruby wrapper for Apache Spark, enabling large-scale data processing with Ruby's expressive syntax.
A Python wrapper for Cascading that enables building and controlling Hadoop data processing workflows entirely in Python.
A cross-platform desktop GUI for managing and querying TDengine time-series databases.
A Hadoop library for reading and processing packet capture (PCAP) files in MapReduce jobs and Hive queries.
A Go-based toolkit for fast ETL and feature extraction on Hadoop, optimized for rapid development and execution.
A lightweight tool for searching Hadoop jobs, visualizing performance, and viewing cluster utilization.
A thin integration layer connecting Apache Spark with various NoSQL datastores and JDBC databases.