Fast and portable character string processing in R using the Unicode ICU library.
stringi is an R package for fast and portable character string processing, with functions for text manipulation, pattern searching, and natural language processing. It addresses inconsistent and slow string operations in R by building on ICU (International Components for Unicode), which gives reliable behavior across platforms and locales.
stringi is aimed at R developers and data scientists who need robust, high-performance string manipulation for text analysis, data cleaning, natural language processing, and internationalization tasks.
Developers choose stringi for its speed, comprehensive Unicode support, and consistent behavior across platforms. It also serves as the backend of the popular stringr package.
Fast and Portable Character String Processing in R (with the Unicode ICU)
Full integration with the ICU library gives consistent string behavior across locales and platforms, which the README emphasizes for portability and internationalization.
Optimized C++ implementation delivers fast string operations, making it well suited to data-intensive tasks such as text cleaning and NLP, as highlighted in the key features.
Includes a wide range of functions, from pattern searching and transliteration to collation and normalization, covering most string processing needs, as detailed in the features list.
Has served as the backend of the popular stringr package since stringr 1.0.0, a sign of reliability and broad adoption in the R community for string manipulation.
Requires ICU4C >= 61, which can complicate installation on some systems and may require manual compilation, as noted in the system requirements and INSTALL file.
Function names and parameters are inspired by an older version of stringr, so they may feel less intuitive to users accustomed to modern tidyverse conventions, despite the comprehensive tutorial.
Includes a custom subset of ICU source code, increasing the installation footprint and memory usage, which could be a concern for resource-constrained environments.
dplyr: A grammar of data manipulation
data.table: R's data.table package extends data.frame
Easily install and load packages from the tidyverse
Tidy Messy Data