An open-source framework for building secure, reliable, and performant peer-to-peer applications.
A library for evaluating TensorFlow models on large datasets with distributed computation and slicing analysis.
A Python framework for scalable time series forecasting using machine learning models, designed for production environments.
A DataFrame-based graph processing library for Apache Spark, enabling scalable graph analytics and algorithms.
A high-performance .NET data access layer inspired by MyBatis, offering XML-managed SQL, caching, read/write splitting, and dynamic repositories.
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources.
An open-source, in-memory, distributed batch and stream processing engine for Java applications.
An open-source machine learning system for the end-to-end data science lifecycle from data preparation to model serving.
An open-source, Python-based data analysis tool with specialized data types and methods for genomic data at scale.
A genomics analysis platform that uses Apache Spark to parallelize genomic data processing across clusters, replacing traditional file-based workflows.
A validated, scalable, community-developed pipeline for variant calling, RNA-seq, and small RNA analysis in genomic sequencing.
An R interface for Apache Spark that enables distributed data processing, machine learning, and SQL queries using familiar R syntax.
C# and F# language bindings and extensions for Apache Spark, enabling .NET developers to write Spark driver programs and data processing operations.
A unified Python interface for constructing and managing workflows across engines like Argo Workflows, Tekton Pipelines, and Apache Airflow.
A Pythonic orchestration tool for AI/ML, HPC, and quantum computing workflows across heterogeneous compute environments.
A scalable machine learning library for training Generalized Linear Models and GLMix models on Apache Spark.
A self-hosted web platform for distributed video encoding using HandBrake across multiple headless devices.
A collection of R packages for interacting with Hadoop ecosystems, enabling big data analysis from R.
TensorFlow binding for Apache Spark DataFrames, enabling TensorFlow program execution on Spark data.
Official connector for integrating Apache Spark with MongoDB, enabling distributed data processing on MongoDB data.
A comprehensive learning guide and interview refresher for Apache Spark, covering core concepts, architecture, and performance optimization.
A serverless distributed hash-cracking platform built on AWS, offering pay-as-you-go GPU power with an intuitive UI.
A high-performance C++/DPC++ library for accelerated machine learning on CPUs, GPUs, and distributed systems.
An R package providing a lightweight frontend to use Apache Spark for distributed data processing from R.
A Clojure DSL for Apache Spark that enables distributed data processing using idiomatic Clojure.
An optimized distributed gradient boosting library for fast and accurate machine learning on large datasets.
A Clojure library for writing map-reduce queries that compile to Apache Pig or Cascading, enabling distributed data processing with Clojure syntax.
A library enabling Apache Spark to read from and write to Apache HBase tables as external data sources using DataFrames and SQL.
A library for parsing and querying XML data with Apache Spark SQL and DataFrames.
A Spark Streaming library for mining big data streams with incremental learning algorithms.
A decentralized marketplace and platform for distributed computations, enabling users to buy and sell computing power.
A library for writing Apache Spark applications in Haskell, enabling resilient analytics that scale to thousands of nodes.
A fast, fully-featured, and developer-friendly Clojure API for Apache Spark.
Deploy Hashtopolis on Google Cloud Shell and Colab for free, zero-infrastructure password cracking.
A scalable, pluggable, and distributed queue and resource system for password cracking and other compute-intensive tasks.
A Scala framework for distributed supervised learning of decision tree ensemble models, inspired by Google's PLANET.