An R package for robust UTF-8 text processing, fixing bugs in R's native Unicode handling.
A Node.js utility to remove UTF-8 byte order mark (BOM) characters from strings.
A Node.js utility that converts Unicode emoji characters to HTML image tags with high-resolution sprites.
An Elixir/Erlang library providing low-level operations for handling Emoji glyphs in the Unicode standard.
A pure Common Lisp library for charset encoding and decoding, similar to GNU libiconv.
A PHP library for parsing, validating, and highlighting URLs in text strings, including HTML and Markdown conversion.
A fast, safe, and efficient regular expression library for Object Pascal with Unicode support and multiple optimized subengines.
A Node.js implementation of Martin Porter's stemming algorithm for removing morphological endings from English words.
A faster, multi-threaded, feature-rich alternative to the rli utility for line removal, deduplication, and frequency analysis on large text files.
An Elixir NIF binding for cmark, a CommonMark-compliant Markdown parser library written in C.
A Go library and CLI tool for converting strings into URL-friendly slugs with Unicode support.
A Rust library for simple string matching with single- and multiple-wildcard operators.
A step-by-step guide to parsing with Haskell parser combinators, with practical examples for version numbers and SRT subtitles.
A Go library providing utilities for Persian language text processing, including digit conversion, keyboard layout switching, and currency formatting.
A multilingual Ruby gem for splitting strings into tokens with extensive language support and configurable options.
A Go package for calculating Levenshtein distance and similarity metrics with customizable edit costs and prefix bonuses.
A Go library for generating random strings that match a given regular expression pattern.
Ruby bindings for Stanford NLP tools providing part-of-speech tagging and named entity recognition capabilities.
An Elixir module for translating between simplified and traditional Chinese, converting to pinyin, and slugifying Chinese text.
A Ruby port of the NLTK Punkt algorithm for unsupervised, language-independent sentence boundary detection.
A Clojure library for building complex regexes using a fluent, composable API without writing regex syntax.
A Go library for Unicode text segmentation at word boundaries as defined by Unicode Standard Annex #29.
An Elixir library that converts Markdown to HTML using a NIF binding to the Hoedown C library.
A Crystal implementation of Mustache logic-less templates, compliant with the Mustache spec v1.1.2+λ.
An Elixir module for decoding and encoding HTML entities in strings.
A jQuery plugin that automatically generates URL slugs from input fields as users type, similar to Django's slugify function.
A Go library for encoding and decoding text to and from Morse code.
A Go library and CLI tool for aligning delimited text with customizable justification, padding, and column filtering.
Simple sentiment analysis for Elixir based on AFINN-165 with emoji, booster, and negator support.
A collection of fuzzy string matching algorithms and phonetic metrics for Elixir, including Levenshtein, Jaro-Winkler, Soundex, and more.
A Ruby gem for filtering stopwords from text with built-in support for multiple languages via Snowball lists.
A Go library for transliterating Unicode text to ASCII equivalents, similar to Python's unidecode.
A fast Ruby FFI gem providing C implementations of string edit distance algorithms such as Levenshtein, Jaro-Winkler, and N-Gram.
A Go package that expands regular expressions into all possible matching strings.
A Go package for n-gram based text categorization and language detection with UTF-8 support.
A terminal-based speed reading tool that implements the Spritz method for reading text faster.