Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


Text Processing

267 projects

Showing 36 of 267 projects

utf8 (C)

An R package for robust UTF-8 text processing, fixing bugs in R's native Unicode handling.

#emoji #unicode #r-package
Stars: 112
Forks: 6
Last commit: 15 days ago

strip-bom (JavaScript)

A Node.js utility to remove UTF-8 byte order mark (BOM) characters from strings.

#npm-package #string-utilities #text-processing
Stars: 112
Forks: 19
Last commit: 5 years ago

emojize (CSS)

A Node.js utility that converts Unicode emoji characters to HTML image tags with high-resolution sprites.

#unicode #html-generation #emoji-conversion
Stars: 110
Forks: 21
Last commit: 10 years ago

exmoji (Elixir)

An Elixir/Erlang library providing low-level operations for handling Emoji glyphs in the Unicode standard.

#functional-programming #emoji #elixir
Stars: 105
Forks: 29
Last commit: 1 year ago

babel (Common Lisp)

A pure Common Lisp library for charset encoding and decoding, similar to GNU libiconv.

#unicode #pure-lisp #encoding-decoding
Stars: 102
Forks: 30
Last commit: 9 months ago

Url highlight (PHP)

A PHP library for parsing, validating, and highlighting URLs in text strings, including HTML and Markdown conversion.

#html-highlighting #regex #linkify
Stars: 102
Forks: 1
Last commit: 22 days ago

FLRE (Pascal)

A fast, safe, and efficient regular expression library for Object Pascal with Unicode support and multiple optimized subengines.

#unicode #object-pascal #c-api
Stars: 101
Forks: 25
Last commit: 1 month ago

porter-stemmer (JavaScript)

A Node.js implementation of Martin Porter's stemming algorithm for removing morphological endings from English words.

#commonjs #information-retrieval #natural-language-processing
Stars: 101
Forks: 12
Last commit: 5 years ago

Rling (C)

A faster, multi-threaded, feature-rich alternative to the rli utility for line removal, deduplication, and frequency analysis on large text files.

#deduplication #command-line-tool #gzip-support
Stars: 98
Forks: 15
Last commit: 2 months ago

cmark (C)

Elixir NIF binding for cmark (C), a CommonMark-compliant Markdown parser library.

#cmark #hex #elixir
Stars: 97
Forks: 13
Last commit: 2 years ago

go-slugify (Go)

A Go library and CLI tool for converting strings into URL-friendly slugs with Unicode support.

#url-friendly #open-source #slug-generation
Stars: 97
Forks: 9
Last commit: 6 years ago

wildmatch (Rust)

A Rust library for simple string matching with single- and multiple-wildcard operators.

#globbing #rust-lang #matching-algorithm
Stars: 97
Forks: 19
Last commit: 1 month ago

Parsing With Haskell Parser Combinators (Haskell)

A step-by-step guide to parsing using Haskell parser combinators, with practical examples for version numbers and SRT subtitles.

#parsing #haskell-learning #haskell
Stars: 95
Forks: 3
Last commit: (not listed)

persian (Go)

A Go library providing utilities for Persian language text processing, including digit conversion, keyboard layout switching, and currency formatting.

#farsi #number #currency-formatting
Stars: 94
Forks: 14
Last commit: 2 years ago

pragmatic_tokenizer (Ruby)

A multilingual Ruby gem for splitting strings into tokens with extensive language support and configurable options.

#nlp-library #text-analysis #multilingual
Stars: 93
Forks: 11
Last commit: 1 year ago

levenshtein (Go)

A Go package for calculating Levenshtein distance and similarity metrics with customizable edit costs and prefix bonuses.

#string-similarity #spell-checking #similarity-metric
Stars: 92
Forks: 8
Last commit: 5 years ago

goregen (Go)

A Go library for generating random strings that match a given regular expression pattern.

#developer-tools #regex #go-library
Stars: 92
Forks: 24
Last commit: 4 years ago

ruby-nlp (Ruby)

Ruby bindings for Stanford NLP tools providing part-of-speech tagging and named entity recognition capabilities.

#part-of-speech-tagging #nlp-tools #natural-language-processing
Stars: 92
Forks: 14
Last commit: 12 years ago

chinese_translation (Elixir)

An Elixir module for translating between simplified and traditional Chinese, converting to pinyin, and slugifying Chinese text.

#pinyin #elixir #unicode
Stars: 91
Forks: 11
Last commit: 8 years ago

punkt-segmenter (Ruby)

A Ruby port of the NLTK Punkt algorithm for unsupervised, language-independent sentence boundary detection.

#nlp-library #sentence-boundaries #nltk
Stars: 91
Forks: 9
Last commit: 8 years ago

Verbal-Exprejon (Clojure)

A Clojure library for building complex regexes using a fluent, composable API without writing regex syntax.

#functional-programming #regex-builder #dsl
Stars: 90
Forks: 2
Last commit: 10 years ago

segment (Go)

A Go library for Unicode text segmentation at word boundaries as defined by Unicode Standard Annex #29.

#unicode #word-boundaries #ragel
Stars: 89
Forks: 15
Last commit: 3 years ago

Markdown (C)

An Elixir library that converts Markdown to HTML using a NIF binding to the Hoedown C library.

#elixir #hoedown #library
Stars: 88
Forks: 18
Last commit: 6 years ago

crustache (Crystal)

A Crystal implementation of Mustache logic-less templates, compliant with the Mustache spec v1.1.2+λ.

#template #crystal-library #template-engine
Stars: 88
Forks: 13
Last commit: 2 years ago

html_entities (Elixir)

An Elixir module for decoding and encoding HTML entities in strings.

#elixir #text-processing #decoding
Stars: 87
Forks: 22
Last commit: 5 years ago

Slugify (HTML)

A jQuery plugin that automatically generates URL slugs from input fields as users type, similar to Django's slugify function.

#web-forms #slug-generation #url-slug
Stars: 87
Forks: 40
Last commit: 10 years ago

morse (Go)

A Go library for encoding and decoding text to and from Morse code.

#morse-code #library #morse
Stars: 87
Forks: 11
Last commit: 3 years ago

align (Go)

A Go library and CLI tool for aligning delimited text with customizable justification, padding, and column filtering.

#open-source #library #alignment
Stars: 84
Forks: 7
Last commit: 4 years ago

veritaserum (Elixir)

A simple sentiment analysis library for Elixir based on AFINN-165, with emoji, booster, and negator support.

#hex #emoji-analysis #elixir
Stars: 83
Forks: 10
Last commit: 3 years ago

the_fuzz (Elixir)

A collection of fuzzy string matching algorithms and phonetic metrics for Elixir, including Levenshtein, Jaro-Winkler, Soundex, and more.

#phonetic-algorithms #metaphone #similarity-measurement
Stars: 82
Forks: 10
Last commit: 10 months ago

stopwords-filter (Ruby)

A Ruby gem for filtering stopwords from text with built-in support for multiple languages via Snowball lists.

#stopwords #ruby-gem #natural-language-processing
Stars: 80
Forks: 54
Last commit: 2 years ago

gounidecode (Go)

A Go library for transliterating Unicode text to ASCII equivalents, similar to Python's unidecode.

#ascii-conversion #unicode #internationalization
Stars: 80
Forks: 21
Last commit: 10 years ago

hotwater (Ruby)

A fast Ruby FFI gem providing C implementations of string edit distance algorithms such as Levenshtein, Jaro-Winkler, and N-Gram.

#n-gram #ffi #ruby-gem
Stars: 80
Forks: 1
Last commit: 13 years ago

genex (Go)

A Go package that expands regular expressions into all possible matching strings.

#regex #domain-discovery #combinatorics
Stars: 76
Forks: 8
Last commit: 6 years ago

textcat (Go)

A Go package for n-gram based text categorization and language detection with UTF-8 support.

#text-categorization #n-gram #natural-language-processing
Stars: 73
Forks: 11
Last commit: 1 year ago

SpeedRead (Ruby)

A terminal-based speed reading tool that implements the Spritz method for reading text faster.

#reading-aid #productivity #ruby-gem
Stars: 72
Forks: 2
Last commit: 12 years ago

Related Tags

#Unicode (58)
#Markdown (33)
#Natural Language Processing (32)
#Go Library (31)
#Regex (30)
#Cli Tool (30)
#Go (28)
#Markdown Parser (27)
#Developer Tools (27)
#Golang (26)
#String Manipulation (25)
#Nodejs (23)