A Go library for master-less peer-to-peer autodiscovery and RPC between HTTP services on the same network.
A Python interface to the Amazon Kinesis Client Library for building distributed applications that process streaming data reliably at scale.
A Python library that provides a Pandas-like API on top of Apache Spark DataFrames for distributed data analysis.
A multi-platform distributed brute-force password cracking system for parallelizing dictionary and word generator attacks.
A web interface for Hashcat that enables distributed password cracking sessions across multiple servers with real-time results.
A scalable machine learning library that runs on Apache Hive, Spark, and Pig for distributed ML directly in SQL.
A Julia interface for XGBoost, providing efficient distributed gradient boosting for regression, classification, and ranking.
An idiomatic Clojure dataframe library that runs on Apache Spark, providing a seamless interface for data processing and machine learning.
A decentralized hyperparameter optimization framework for Go, inspired by Optuna, supporting Bayesian optimization and evolution strategies.
A distributed web interface for collaborative memory forensics analysis using Volatility 3.
A distributed Spark/Scala implementation of Isolation Forest and Extended Isolation Forest algorithms for scalable unsupervised outlier detection.
An experimental Go client for Apache Spark Connect, enabling Go applications to interact with Spark clusters via gRPC.
A distributed streaming machine learning framework for mining big data streams with abstraction over processing engines.
A joblib backend that enables Python parallel computing tasks to run on Apache Spark clusters.
A Ruby wrapper for Apache Spark, enabling large-scale data processing with Ruby's expressive syntax.
A Python wrapper for Cascading that enables building and controlling Hadoop data processing workflows entirely in Python.
A web-based system performance monitoring and task management tool for servers and headless Raspberry Pi setups.
A BOINC-based distributed password cracking system powered by hashcat, enabling recovery of passwords from encrypted media and hashes across GPU-equipped nodes.
A distributed video processing platform built on Apache Storm with OpenCV integration for large-scale computer vision pipelines.
A distributed framework extending Apache Spark with unified SQL access to multiple datastores, optimized connectors, and streaming support.
A scalable high-performance platform for R that enables large-scale machine learning, statistical analysis, and graph processing across clusters.
An Apache Spark framework for efficient data processing, extraction, and derivation from web archives and archival collections.
An open-source toolkit for analyzing web archives at scale using Apache Spark.
An R extension for distributed computing using Apache Hive, enabling HQL queries in R and R functions in Hive.
A unified R API for writing parallel and distributed applications across different backends like parallel, HP Distributed R, and SparkR.
A distributed and concurrent command-line job server and client for parallel command execution across multiple systems.
A tool for running MPI programs on Hadoop YARN clusters for distributed computing, using MPICH-3.1.2 and SSH.
An experimental Rust client for Apache Spark Connect, providing a DataFrame API to interact with Spark clusters.
A Common Lisp library for distributing computational tasks across multiple machines using the lparallel API.
A serverless machine learning framework that scales algorithms across cloud lambda functions.
A fast and durable Pub/Sub channel over WebSockets for real-time data propagation in FastAPI applications.
A Clojure wrapper for Deeplearning4j, providing idiomatic access to neural networks, data import, and distributed training.
A collection of interactive Jupyter notebooks for learning Hadoop, Spark, and MapReduce with hands-on tutorials and demos.