Julia package providing easy access to 700+ standard R datasets for data analysis and statistical learning.
RDatasets.jl is a Julia package that provides access to over 700 standard datasets from R and its popular packages. It ports the well-known Rdatasets collection to Julia, so benchmark datasets for statistical analysis and machine learning can be loaded with a single function call.
It is aimed at Julia users working in data analysis, statistics, econometrics, or machine learning who need standard datasets for testing algorithms, teaching, or reproducing R-based analyses.
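As a minimal sketch of that single function call, the snippet below loads the classic iris data. It assumes RDatasets.jl is already installed (for example via `import Pkg; Pkg.add("RDatasets")`) and that the dataset is registered under the R package name `datasets` with the name `iris`.

```julia
using RDatasets  # assumption: the package is already installed

# Load Fisher's iris data from R's core "datasets" package.
# The first argument is the R package name, the second the dataset name.
iris = dataset("datasets", "iris")

# The result is a DataFrame, so the usual Base/DataFrames functions apply.
println(size(iris))      # (rows, columns)
println(names(iris))     # column names
println(first(iris, 5))  # first five rows
```

Any other dataset in the collection should load the same way, by swapping in its package and dataset names.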
Developers choose RDatasets.jl because it offers a seamless way to access a wide range of curated datasets without leaving the Julia ecosystem, simplifying workflow transitions from R and providing reliable data for benchmarking and experimentation.
Julia package for loading many of the data sets available in R
Offers over 700 datasets from R's core `datasets` package and from widely used packages such as `boot` and `MASS`, giving a broad resource for statistical analysis and benchmarking without external downloads.
Uses a single `dataset()` function that takes a package name and a dataset name, for example `dataset("datasets", "iris")`, and returns a DataFrame.
Includes `RDatasets.packages()` and `RDatasets.datasets()` functions to browse available datasets and view details like row and column counts, aiding in dataset selection.
Mirrors dataset availability from popular R packages, making it easier for users transitioning from R to Julia to reproduce analyses or tutorials without data sourcing hassles.
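The browsing functions mentioned above can be sketched as follows. `RDatasets.packages()` and `RDatasets.datasets()` are the functions named in this entry; the column names `Package`, `Rows`, and `Columns` used for filtering and sorting are an assumption about the returned index table and may differ between versions.

```julia
using RDatasets  # assumption: the package is already installed

# One row per R package that has datasets in the collection.
pkgs = RDatasets.packages()
println(first(pkgs, 5))

# One row per dataset, with descriptive details such as row and column counts.
sets = RDatasets.datasets()

# Assumed column name `Package`: keep only the datasets that come from MASS.
mass_sets = sets[sets.Package .== "MASS", :]

# Assumed column name `Rows`: list the largest MASS datasets first.
println(first(sort(mass_sets, :Rows, rev = true), 5))
```

Browsing the index first is a cheap way to pick a dataset of suitable size before loading it with `dataset()`.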
Focuses solely on data loading without built-in tools for data cleaning, transformation, or analysis, requiring users to integrate additional Julia packages for practical workflows.
Datasets are sourced from R packages and may not be updated regularly, potentially lacking newer data or revisions, which limits relevance for cutting-edge research.
The README assumes that all datasets can be distributed under GPL-3. Some datasets may in fact carry more restrictive or incompatible licenses, which creates a potential legal risk when the data is redistributed.
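On the first limitation above, that the package only loads data, a typical workflow pairs it with other Julia packages. The sketch below is one illustrative combination, not a recommendation from the project: it assumes DataFrames.jl is installed alongside RDatasets.jl and uses the `Statistics` standard library to summarize the iris data by species.

```julia
using RDatasets   # data loading only
using DataFrames  # assumption: installed; provides groupby and combine
using Statistics  # standard library; provides mean

iris = dataset("datasets", "iris")

# Group by species and compute the mean sepal length for each group.
by_species = combine(
    groupby(iris, :Species),
    :SepalLength => mean => :MeanSepalLength,
)

println(by_species)
```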