An R package for reading and writing SAS, SPSS, and Stata data files with tidyverse integration.
Haven is an R package designed to read and write data files from proprietary statistical software packages like SAS, SPSS, and Stata. It solves the problem of data interoperability by providing reliable import/export functions that preserve metadata such as value labels, missing values, and date formats. The package integrates with the tidyverse, outputting data as tibbles for consistent and efficient data manipulation in R.
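As a rough illustration of the import/export round trip described above, the sketch below writes a small data frame to a temporary Stata file and reads it back. The column names and values are made up for the example; `write_dta()` and `read_dta()` are haven's Stata functions, and the SAS and SPSS equivalents follow the same pattern.

```r
library(haven)

# Illustrative data; the column names are assumptions for this example.
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write to a temporary Stata file, then read it back into R.
path <- tempfile(fileext = ".dta")
write_dta(df, path)
roundtrip <- read_dta(path)

# The imported object is a tibble rather than a base data.frame.
class(roundtrip)
```

Swapping in `write_sav()`/`read_sav()` or `write_xpt()`/`read_xpt()` should give the same shape of workflow for SPSS and SAS transport files.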
Haven is aimed at data analysts, statisticians, and researchers who receive data from SAS, SPSS, or Stata and need to analyze it in R. It is particularly useful in academic, government, or industry settings where multi-software workflows are common.
Developers choose Haven for its robust backend (ReadStat), tidyverse compatibility, and semantic accuracy in handling statistical data formats. It offers a more modern and reliable alternative to older R packages like 'foreign', with better support for newer file versions and enhanced metadata preservation.
Read SPSS, Stata and SAS files from R
Leverages the ReadStat C library to reliably read and write SAS, SPSS, and Stata files, including niche formats like SAS transport (.xpt) and older SPSS .por files.
Outputs data as tibbles with improved printing and works naturally with tidyverse tools like dplyr and ggplot2, streamlining modern R workflows.
Preserves value labels as a labelled class and handles special missing values, maintaining semantic fidelity from the original statistical software.
Converts dates and times to R date/time classes automatically, reducing manual data cleaning for temporal analysis.
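The metadata handling listed above can be sketched with a small round trip. This is a minimal example under assumed data: the variable names, label text, and dates are invented, and a Stata file is used only because it is convenient; `labelled()`, `as_factor()`, `tagged_na()`, and `na_tag()` are haven functions.

```r
library(haven)

# Illustrative data: a labelled numeric column and a Date column.
df <- tibble::tibble(
  sex   = labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2)),
  visit = as.Date(c("2020-01-15", "2020-02-20", "2020-03-05"))
)

path <- tempfile(fileext = ".dta")
write_dta(df, path)
roundtrip <- read_dta(path)

# Value labels come back as a labelled vector; as_factor() turns
# the codes into their label text when a plain factor is preferred.
sex_factor <- as_factor(roundtrip$sex)

# Dates come back as R Date objects, not raw day counts.
visit_class <- class(roundtrip$visit)

# Tagged missing values keep a one-letter tag alongside NA.
x <- c(1, tagged_na("a"))
tags <- na_tag(x)
```

Keeping the column as a labelled vector until analysis time preserves both the numeric code and its label; converting with `as_factor()` is a one-way step toward ordinary R factors.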
Supports Stata files only up to version 15, according to the README, so files in newer formats may not be readable until the package is updated.
Relies on the ReadStat C library, so building the package from source requires a compiler; this can cause setup errors on systems without one or in locked-down corporate environments.
Focuses solely on SAS, SPSS, and Stata; lacks built-in support for other common statistical tools or direct database connections, requiring additional packages for broader interoperability.
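On the Stata version limit noted above: when a file has to be opened by an older Stata release, `write_dta()` accepts a `version` argument. The sketch below assumes a target of Stata 13 and uses made-up data; which versions are accepted depends on the installed haven release.

```r
library(haven)

# Illustrative data; the column names are assumptions for this example.
df <- data.frame(id = 1:2, wage = c(12.5, 18.0))

# Ask for a Stata 13 compatible file rather than the default version.
path <- tempfile(fileext = ".dta")
write_dta(df, path, version = 13)

# Reading does not need the version; it is detected from the file.
older <- read_dta(path)
```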