Distributed Computing

151 projects

Elephas (Python)

Elephas is a Keras extension for distributed deep learning on Apache Spark, enabling data-parallel training at scale.

#apache-spark #model-training #spark

Stars: 1.6k

Forks: 303

Last commit: 3 years ago

Holochain (Rust)

An open-source framework for building secure, reliable, and performant peer-to-peer applications.

#open-source #distributed-systems #holochain

A library for evaluating TensorFlow models on large datasets with distributed computation and slicing analysis.

#data-slicing #model-evaluation #mlops

Stars: 1.3k

Forks: 279

Last commit: 1 month ago

mlforecast (Python)

A Python framework for scalable time series forecasting using machine learning models, designed for production environments.

#data-science #time-series-forecasting #production-ml

Stars: 1.3k

Forks: 129

Last commit: 2 days ago

GraphFrames (Scala)

A DataFrame-based graph processing library for Apache Spark, enabling scalable graph analytics and algorithms.

#graph-processing #apache-spark #network-motifs

Stars: 1.2k

Forks: 268

Last commit: 13 hours ago

Sparkit-learn (Python)

PySpark + Scikit-learn = Sparkit-learn

#apache-spark #python #scikit-learn

Stars: 1.2k

Forks: 254

Last commit: 5 years ago

SmartSql (C#)

A high-performance .NET data access layer inspired by MyBatis, offering XML-managed SQL, caching, read/write splitting, and dynamic repositories.

#orm #database #high-performance

Stars: 1.1k

Forks: 229

Last commit: 16 hours ago

Hadoop

A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources.

#awesome-list #data-engineering #big-data

Stars: 1.1k

Forks: 254

Last commit: 2 years ago

Hazelcast Jet (Java)

An open-source, in-memory, distributed batch and stream processing engine for Java applications.

#stream-processing #event-processing #hacktoberfest

Stars: 1.1k

Forks: 203

Last commit: 1 year ago

SystemML (Java)

An open-source machine learning system for the end-to-end data science lifecycle from data preparation to model serving.

#federated-learning #apache-spark #data-science

An open-source, Python-based data analysis tool with specialized data types and methods for genomic data at scale.

#scientific-computing #spark #python-library

Stars: 1.1k

Forks: 266

Last commit: 21 hours ago

ADAM (Scala)

A genomics analysis platform that uses Apache Spark to parallelize genomic data processing across clusters, replacing traditional file-based workflows.

#genomic-data #apache-spark #parquet

Stars: 1.1k

Forks: 312

Last commit: 4 months ago

Bcbio (Python)

A validated, scalable, community-developed pipeline for variant calling, RNA-seq, and small RNA analysis in genomic sequencing.

#community-driven #high-throughput-sequencing #genomics

Stars: 1.0k

Forks: 355

Last commit: 1 year ago

sparklyr (R)

An R interface for Apache Spark that enables distributed data processing, machine learning, and SQL queries using familiar R syntax.

#apache-spark #distributed #dplyr

Stars: 971

Forks: 308

Last commit: 22 days ago

Mobius: C# API for Spark (C#)

C# and F# language binding and extensions for Apache Spark, enabling .NET developers to write Spark driver programs and data processing operations.

#rdd #apache-spark #spark

Stars: 947

Forks: 209

Last commit: 7 months ago

Couler (Python)

A unified Python interface for constructing and managing workflows across engines like Argo Workflows, Tekton Pipelines, and Apache Airflow.

#workflow-management #python-sdk #argo-workflows

Stars: 944

Forks: 85

Last commit: 1 year ago

Covalent (Python)

Pythonic orchestration tool for AI/ML, HPC, and quantum computing workflows across heterogeneous compute environments.

#workflow-management #high-performance-computing #pipelines

Stars: 866

Forks: 111

Last commit: 10 days ago

Photon ML (Terra)

A scalable machine learning library for training Generalized Linear Models and GLMix models on Apache Spark.

#apache-spark #gradle #large-scale-training

Stars: 797

Forks: 175

Last commit: 4 years ago

HandBrake Web (TypeScript)

A self-hosted web platform for distributed video encoding using HandBrake across multiple headless devices.

#batch-processing #video-transcoding #media-server

Stars: 787

Forks: 17

Last commit: 3 months ago

RHadoop

A collection of R packages for interacting with Hadoop ecosystems, enabling big data analysis from R.

#mapreduce #data-science #hbase

Stars: 760

Forks: 275

Last commit: 10 years ago

TensorFrames (Scala)

TensorFlow binding for Apache Spark DataFrames, enabling TensorFlow program execution on Spark data.

#apache-spark #python #tensorflow

Stars: 744

Forks: 160

Last commit: 2 years ago

Mongo-Spark (Java)

Official connector for integrating Apache Spark with MongoDB, enabling distributed data processing on MongoDB data.

#apache-spark #connector #spark

Stars: 730

Forks: 320

Last commit: 3 days ago

Carefully Curated 70 Spark Questions with Additional Optimization Guides (First in the series)

A comprehensive learning guide and interview refresher for Apache Spark, covering core concepts, architecture, and performance optimization.

#apache-spark #spark #performance-optimization

Stars: 691

Forks: 80

Last commit: 4 years ago

NPK (JavaScript)

A serverless distributed hash-cracking platform built on AWS, offering pay-as-you-go GPU power with an intuitive UI.

#serverless #hash-cracking #password-recovery

Stars: 660

Forks: 77

Last commit: 11 days ago

Intel® oneAPI Data Analytics Library (C++)

A high-performance C++/DPC++ library for accelerated machine learning on CPUs, GPUs, and distributed systems.

#oneapi #hacktoberfest #ai-machine-learning

Stars: 651

Forks: 227

Last commit: 16 hours ago

SparkR (R)

An R package providing a lightweight frontend to use Apache Spark for distributed data processing from R.

#apache-spark #r-package #data-science

A Clojure DSL for Apache Spark that enables distributed data processing using idiomatic Clojure.

#rdd #apache-spark #mapreduce

Stars: 600

Forks: 83

Last commit: 8 years ago

xgboost (C++)

An optimized distributed gradient boosting library for fast and accurate machine learning on large datasets.

#parallel-computing #gbdt #ml-library

A Clojure library for writing map-reduce queries that compile to Apache Pig or Cascading, enabling distributed data processing with Clojure syntax.

#cascading #clojure #big-data

Stars: 564

Forks: 52

Last commit: 3 years ago

Spark (Scala)

A library enabling Apache Spark to read from and write to Apache HBase tables as external data sources using DataFrames and SQL.

#apache-spark #data-integration #dataframe

Stars: 546

Forks: 273

Last commit: 5 years ago

Spark XML (Scala)

A library for parsing and querying XML data with Apache Spark SQL and DataFrames.

#apache-spark #dataframe #xml-parser

Stars: 513

Forks: 223

Last commit: 1 year ago

streamDM (Scala)

A Spark Streaming library for mining big data streams with incremental learning algorithms.

#classification #stream-mining #data-streams

Stars: 497

Forks: 141

Last commit: 3 years ago

Releases List (Rust)

A decentralized marketplace and platform for distributed computations, enabling users to buy and sell computing power.

#webassembly #cryptocurrency-payments #cloud-computing

Stars: 483

Forks: 87

Last commit: 16 hours ago

sparkle (Haskell)

A library for writing Apache Spark applications in Haskell, enabling resilient analytics that scale to thousands of nodes.

#haskell #apache-spark #functional-programming

Stars: 449

Forks: 27

Last commit: 11 months ago

sparkling (Clojure)

A fast, fully-featured, and developer-friendly Clojure API for Apache Spark.

#apache-spark #functional-programming #data-engineering

Stars: 447

Forks: 68

Last commit: 4 years ago

Cloudtopolis (PowerShell)

Deploy Hashtopolis on Google Cloud Shell and Colab for free, zero-infrastructure password cracking.

#cracking #google-colab #hashes

Stars: 420

Forks: 60

Last commit: 1 year ago
