A unified deep learning system for efficient large-scale model training and inference with advanced parallelism strategies.
A scalable, portable, and distributed gradient boosting library for efficient machine learning across multiple languages and platforms.
A high-performance serving framework for large language models and multimodal models, delivering low-latency and high-throughput inference.
A Java client and real-time data platform for Valkey and Redis, providing distributed Java objects, collections, and services.
A flexible and efficient deep learning framework that mixes symbolic and imperative programming for heterogeneous distributed systems.
A fast, distributed gradient boosting framework based on decision tree algorithms for ranking, classification, and other machine learning tasks.
A fast, distributed gradient boosting framework based on decision tree algorithms for ranking, classification, and other ML tasks.
A hyperparameter optimization framework for machine learning with a define-by-run API for dynamic search spaces.
A drop-in replacement for pandas that scales data analysis workflows to use all CPU cores and handle out-of-memory datasets.
A Python library for distributed asynchronous hyperparameter optimization over complex search spaces.
An open-source, in-memory platform for distributed and scalable machine learning with support for a wide range of algorithms and big data technologies.
An open-source Go engine that replicates AlphaGo Zero's architecture, learning solely through self-play without human knowledge.
An ultra-fast distributed actor framework for Go, C#, and Java/Kotlin, enabling cross-platform concurrency and messaging.
An open-source library for building massively scalable machine learning pipelines on Apache Spark.
A state-of-the-art Natural Language Processing library built on Apache Spark, offering 100,000+ pretrained models and pipelines in 200+ languages.
A high-performance distributed map/reduce system with DAG execution, written in Go, supporting standalone or distributed modes.
A PyTorch framework for deep learning research and development, focusing on reproducibility and rapid experimentation.
Koalas provides the pandas DataFrame API on Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
A local-first, single-binary workflow orchestration engine that runs declarative DAGs from laptop to distributed cluster.
A distributed computation system written in Go for parallel and cluster processing, similar to Hadoop MapReduce and Spark.
An open-source implementation of the Message Passing Interface (MPI) specification for high-performance computing.
An open-source framework for machine learning and other computations on decentralized data.
A cluster computing framework for processing large-scale geospatial data within Apache Spark, Flink, and other big data systems.
A library for writing MapReduce programs that execute on distributed platforms like Storm and Scalding using Scala/Java collection-like syntax.
A masterless, cloud-scale, fault-tolerant distributed computation system for batch and stream processing written in Clojure.
An open-source, petabyte-scale, fault-tolerant distributed file system with POSIX compliance and easy scalability.
A connector that enables Apache Spark to read from and write to Apache Cassandra databases for distributed data processing.
Ultra-fast, distributed, cross-platform actor framework for Go, C#, and Java/Kotlin.
A curated list of awesome Apache Spark packages, libraries, and resources for data engineers and scientists.
A federated Big Data orchestration service that simplifies job execution across distributed clusters by abstracting infrastructure complexity.
A multi-platform client-server tool for distributing Hashcat password cracking tasks across multiple computers.
A language for distributed deep learning that simplifies model parallelism by specifying tensor computations across hardware meshes.
Elephas is a Keras extension for distributed deep learning on Apache Spark, enabling data-parallel training at scale.