A unified R API for writing parallel and distributed applications across different backends such as parallel and HP Distributed R, with SparkR support planned.
ddR is an R package that provides a unified API for distributed data structures and parallel operations, so developers can write scalable applications that run on different backends such as parallel and HP Distributed R (SparkR support is planned). It addresses backend lock-in by abstracting the underlying execution engine, so the same code can run on each supported backend without modification.
R developers and data scientists who need to write parallel or distributed applications for large-scale data processing, machine learning, or scientific computing, and want to avoid being tied to a specific backend.
Developers choose ddR because it offers a consistent, R-friendly interface for distributed computing that reduces the effort required to learn and program across different backends, while providing portability and scalability for their applications.
Standard API for Distributed Data Structures in R
Provides a single interface to write distributed applications that run on multiple backends like parallel and HP Distributed R, reducing the learning curve for different engines as highlighted in the README.
Uses R-style objects such as dlist, dframe, and darray, along with functions like dmapply, making it intuitive for R developers to transition to distributed computing.
Allows seamless switching between execution engines using useBackend(), enabling code portability and flexibility across environments without rewriting logic.
Offers the nparts parameter to control how many partitions a distributed object is split into, and the parts() function to process data one partition at a time, giving developers control over the granularity of distributed operations.
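The points above can be illustrated with a minimal sketch. It assumes the ddR package is installed and uses the default parallel backend. The executor count, the input values, and the variable names are illustrative choices, not taken from the project's documentation.

```r
library(ddR)

# Select the 'parallel' backend; executors = 2 is an illustrative choice.
useBackend(parallel, executors = 2)

# Apply a function element-wise. The result is a distributed list (dlist),
# and nparts asks for it to be split into 2 partitions.
squares <- dmapply(function(x) x^2, 1:4, nparts = 2)

# Bring the distributed result back into an ordinary R list.
local_squares <- collect(squares)

# Work on whole partitions: parts() returns one handle per partition,
# so the function below receives each partition (a list) in turn.
part_sums <- dmapply(function(p) sum(unlist(p)), parts(squares))
local_sums <- collect(part_sums)
```

Running the same logic on another supported engine should only require changing the useBackend() call, provided the corresponding backend package is installed.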
Currently supports only the parallel and Distributed R backends; SparkR support is listed as planned and its current status is unconfirmed, which limits the package's usefulness in modern big data ecosystems.
The project describes active development for summer 2016, which suggests it may since have stagnated, with few recent updates or little community engagement. This could affect compatibility and bug fixes.
Using backends like Distributed R requires installing additional packages (e.g., distributedR.ddR) with dependencies like Rcpp and XML, increasing setup complexity and potential installation issues.
The layer of abstraction for backend agnosticism can introduce performance penalties compared to native backend usage, especially for compute-intensive tasks that require optimal efficiency.