An R data package providing an excerpt from Gapminder's global development data for teaching and examples.
gapminder is an R data package that provides a curated excerpt of the Gapminder dataset: country-level life expectancy, GDP per capita, and population, recorded every five years from 1952 to 2007. It gives instructors a clean, real-world dataset for teaching data analysis and visualization in R. The data ships as a tibble that is usable immediately, with no data wrangling required.
R educators, data science instructors, and students learning data manipulation, visualization, or statistical analysis who need a reliable, well-documented dataset for examples and exercises.
Users choose gapminder because it provides a tidy dataset designed specifically for teaching. It comes bundled with country metadata (ISO codes) and premade color schemes that simplify building educational visualizations and analyses.
Excerpt from the Gapminder data, as an R data package and in plain text delimited form
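Because the data also ships in plain text, it can be read without R at all. The sketch below assumes a local copy of the tab-delimited file with a header naming the six columns country, continent, year, lifeExp, pop and gdpPercap. The function name read_gapminder_tsv and the file name gapminder.tsv are illustrative assumptions, not part of the package. Only Python's standard csv module is used.

```python
import csv
from typing import Dict, Iterable, List, Union

# Header expected in the tab-delimited file (an assumption of this sketch).
COLUMNS = ("country", "continent", "year", "lifeExp", "pop", "gdpPercap")

Row = Dict[str, Union[str, int, float]]


def read_gapminder_tsv(lines: Iterable[str]) -> List[Row]:
    """Parse tab-delimited Gapminder text into dicts with typed values.

    `lines` can be an open text file or any iterable of text lines.
    Raises ValueError if any expected column is missing from the header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows: List[Row] = []
    for record in reader:
        rows.append({
            "country": record["country"],
            "continent": record["continent"],
            "year": int(record["year"]),
            "lifeExp": float(record["lifeExp"]),
            # Parse via float so a value written as "123.0" is also accepted.
            "pop": int(float(record["pop"])),
            "gdpPercap": float(record["gdpPercap"]),
        })
    return rows


# Example (hypothetical local file name):
# with open("gapminder.tsv", newline="", encoding="utf-8") as f:
#     rows = read_gapminder_tsv(f)
```

Accepting any iterable of lines, rather than a path, keeps the parser testable on in-memory text and works unchanged with an open file handle.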
The gapminder data frame is a tibble, so it works immediately with tidyverse packages such as dplyr and ggplot2. The README demonstrates aggregation with base R's aggregate() and filtering with dplyr.
Designed specifically for teaching data science. The structure is simple: one row per country and year, six columns, no missing values, and every country observed in the same years. Courses and tutorials can therefore start analysis without extensive cleaning.
Includes premade color schemes for countries and continents (country_colors, continent_colors) and a table of ISO country codes (country_codes). As the README notes, these simplify creating consistent plots in R.
Available on CRAN, so installation is a single install.packages("gapminder") call, and a vignette helps educators and students get started with minimal setup.
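Two of the points above can be sketched in Python for readers working from the plain-text file. The first is the README-style filter-then-aggregate (median life expectancy per continent, optionally for a single year). The second is a check that the data really is a clean, balanced panel. Both functions are hypothetical helpers, not package API. They assume rows are dicts keyed by the six TSV column names, with values either still strings or already converted to numbers.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, Iterable, List, Mapping, Optional, Set

FIELDS = ("country", "continent", "year", "lifeExp", "pop", "gdpPercap")


def median_life_exp_by_continent(
    rows: Iterable[Mapping[str, Any]], year: Optional[int] = None
) -> Dict[str, float]:
    """Median lifeExp per continent, optionally restricted to one year."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        # Filter step: skip rows outside the requested year, if one was given.
        if year is not None and int(row["year"]) != year:
            continue
        groups[str(row["continent"])].append(float(row["lifeExp"]))
    # Aggregate step: one median per continent, keyed in alphabetical order.
    return {name: median(values) for name, values in sorted(groups.items())}


def check_balanced_panel(rows: Iterable[Mapping[str, Any]]) -> List[str]:
    """List structural problems; an empty list means the panel looks clean."""
    problems: List[str] = []
    years_by_country: Dict[str, Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        empty = [f for f in FIELDS if row.get(f) in (None, "")]
        if empty:
            problems.append(f"row {i}: empty fields {empty}")
            continue
        country, yr = str(row["country"]), int(row["year"])
        if yr in years_by_country[country]:
            problems.append(f"row {i}: duplicate {country} {yr}")
        years_by_country[country].add(yr)
    # In a balanced panel every country is observed in the same set of years.
    if len({frozenset(ys) for ys in years_by_country.values()}) > 1:
        problems.append("countries are observed in different sets of years")
    return problems


# Example, given rows from a TSV loader:
# median_life_exp_by_continent(rows, year=2007)
# check_balanced_panel(rows)
```

Returning a list of problems rather than raising on the first one lets a student see every structural issue at once, which suits a classroom data-checking exercise.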
The dataset stops at 2007 (it covers 1952 to 2007 at five-year intervals) and is not updated. It is therefore unsuitable for analyzing recent global developments; the README states this fixed date range.
It focuses solely on life expectancy, GDP per capita, and population, lacking other socioeconomic or environmental variables that might be needed for in-depth analysis.
Although the data is also provided as plain text, features such as the tibble format and the color schemes are R-specific, which limits seamless use in other programming environments.
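The limitations above are easy to confirm, or partly work around, in code. The Python sketch below uses hypothetical helper names and assumes dict rows keyed by the TSV column names. The first function reports the first and last year in the data and the distinct gaps between consecutive years. For the five-yearly 1952 to 2007 excerpt described above, that should mean a single gap of 5. The second function derives total GDP as gdpPercap times pop, a common way to get an extra variable out of the three provided. This recombines existing columns rather than adding new information, and the column name gdp is our own choice.

```python
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def year_coverage(rows: Iterable[Mapping[str, Any]]) -> Tuple[int, int, Tuple[int, ...]]:
    """Return (first year, last year, distinct gaps between consecutive years)."""
    years = sorted({int(row["year"]) for row in rows})
    if not years:
        raise ValueError("no rows to inspect")
    gaps = tuple(sorted({later - earlier for earlier, later in zip(years, years[1:])}))
    return years[0], years[-1], gaps


def add_total_gdp(rows: Iterable[Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the rows with a derived 'gdp' column.

    gdp = gdpPercap * pop, in the same currency unit as gdpPercap.
    The input rows are left unmodified.
    """
    return [
        {**row, "gdp": float(row["gdpPercap"]) * float(row["pop"])}
        for row in rows
    ]


# Example, given rows from a TSV loader:
# first, last, gaps = year_coverage(rows)
# with_gdp = add_total_gdp(rows)
```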