Executing the loop in parallel
R frontend for Spark
DistributedR is a scalable, high-performance platform for the R language, designed for large-scale data processing across distributed systems. It accelerates machine learning, statistical analysis, and graph processing by distributing computations across a cluster, making it possible to work with datasets that exceed single-machine memory limits.

## Key Features

- **Distributed Data Structures**: Provides distributed arrays, data frames, and lists that store data across a cluster while acting as single abstractions.
- **Parallel Data Loading**: Loads data in parallel from any data source, including specialized loaders for Vertica database integration.
- **Efficient Algorithm Expression**: Uses distributed arrays to efficiently express both machine learning algorithms (matrix operations) and graph algorithms (adjacency matrix manipulation).
- **Cluster Management**: Includes functions to start, monitor, and shut down distributed R sessions across worker nodes.

## Philosophy

DistributedR aims to bring high-performance distributed computing capabilities to the R ecosystem while maintaining familiar R programming patterns, allowing data scientists to scale their analyses without learning entirely new frameworks.
Standard API for Distributed Data Structures in R
Rmpi provides an interface (wrapper) to MPI APIs. It also provides an interactive R slave environment.