Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Text Processing

Showing 36 of 267 projects

bbmustache (Erlang)

A binary pattern match-based Mustache template engine for Erlang/OTP that avoids regular expressions.

#otp #template-engine #command-line-tool
Stars: 186
Forks: 51
Last commit: 2 years ago
Stringy (PHP)

A PHP string manipulation library with multibyte support, optimized for performance and PHP 7+.

#hacktoberfest #composer #library
Stars: 179
Forks: 24
Last commit: 22 days ago
string-length (JavaScript)

A JavaScript library that accurately calculates string length by handling astral symbols and ANSI escape codes.

#developer-tools #unicode #npm-package
Stars: 173
Forks: 12
Last commit: 4 months ago
one-more-re-nightmare (Common Lisp)

A fast regular expression engine for Common Lisp that compiles regexes to machine code using derivative-based DFA compilation.

#compiler #lisp #regex
Stars: 168
Forks: 9
Last commit: 8 months ago
words_counted (Ruby)

A Ruby natural language processor for tokenizing and analyzing text with flexible filtering and custom regex support.

#nlp-library #word-counter #text-analysis
Stars: 164
Forks: 28
Last commit: 4 years ago
turtle (Go)

A Go library and CLI for emoji lookup, search, and categorization with GitHub emoji support.

#search #emoji #developer-tools
Stars: 164
Forks: 12
Last commit: 4 years ago
gh-emoji (JavaScript)

A lightweight, dependency-free JavaScript library for parsing and rendering GitHub emoji in text.

#text-processing #javascript-library #github-api
Stars: 159
Forks: 7
Last commit: 9 years ago
blacktex

A command-line tool that cleans up LaTeX files by removing comments and correcting common anti-patterns.

#academic-writing #document-formatting #linter
Stars: 159
Forks: 11
Last commit: 2 years ago
GEmojiSharp.Blazor (C#)

A .NET library and toolset for working with GitHub emoji aliases and Unicode characters across C#, ASP.NET Core, Blazor, and command-line tools.

#gemoji #emoji #dotnet-tool
Stars: 157
Forks: 9
Last commit: 3 months ago
GEmojiSharp (C#)

A .NET library and toolset for working with GitHub emoji aliases and Unicode characters across C#, ASP.NET Core, Blazor, and command-line tools.

#gemoji #emoji #dotnet-tool
Stars: 157
Forks: 9
Last commit: 3 months ago
snakecase (R)

A systematic R package for parsing strings and converting them to snake_case, camelCase, and other naming conventions.

#data-cleaning #pascalcase #variable-names
Stars: 155
Forks: 10
Last commit: 2 years ago
NLP4J (Java)

A natural language processing framework for JVM languages with comprehensive linguistic analysis tools.

#coreference-resolution #java-nlp #semantic-role-labeling
Stars: 155
Forks: 32
Last commit: 5 years ago
re2 (Ruby)

Ruby bindings to RE2, a fast, safe, thread-friendly alternative to backtracking regex engines like PCRE.

#regex #regular-expression #ruby-gem
Stars: 155
Forks: 14
Last commit: 18 days ago
stemmer (Elixir)

An English (Porter2) stemming implementation in Elixir for reducing words to their base forms.

#nlp-library #elixir #information-retrieval
Stars: 154
Forks: 10
Last commit: 2 years ago
damerau-levenshtein (Ruby)

A Ruby gem for calculating edit distance between strings using Levenshtein, Damerau-Levenshtein, and Boehmer & Rees algorithms.

#algorithm #ruby-gem #text-processing
Stars: 151
Forks: 18
Last commit: 1 year ago
regroup (Go)

A Go library that maps regex named groups into struct fields using struct tags and automatic parsing.

#go-library #text-processing #go-utilities
Stars: 150
Forks: 13
Last commit: 1 year ago
Markup.ml (OCaml)

Error-recovering streaming HTML5 and XML parsers for OCaml with lazy, non-blocking, and one-pass processing.

#ocaml-library #functional-programming #error-recovery
Stars: 150
Forks: 20
Last commit: 1 year ago
SimMetrics.Net (C#)

A .NET library implementing various string similarity and distance metrics like Levenshtein, Jaro-Winkler, and Soundex.

#string-similarity #phonetic-algorithms #soundex
Stars: 148
Forks: 21
Last commit: 4 months ago
strip-indent (JavaScript)

Strip leading whitespace from each line in a string, removing redundant indentation based on the least-indented line.

#text-processing #nodejs #javascript-library
Stars: 146
Forks: 17
Last commit: 7 months ago
go-unidecode (Go)

A Go library for converting Unicode text to ASCII transliterations, inspired by python-unidecode.

#unicode #unidecode #internationalization
Stars: 145
Forks: 19
Last commit: 3 years ago
Paasaa (Elixir)

An Elixir library for natural language and script detection using statistical analysis without AI.

#statistical-analysis #elixir #language-identification
Stars: 143
Forks: 14
Last commit: 5 months ago
TF (AutoHotkey)

An AutoHotkey library for manipulating text files and strings with over 40 functions for line operations, search/replace, and formatting.

#formatting #file-manipulation #remove-lines
Stars: 142
Forks: 37
Last commit: 5 years ago
jingoo (OCaml)

An OCaml template engine with near-complete compatibility with Jinja2 syntax and features.

#jinja2 #functional-programming #server-side-rendering
Stars: 142
Forks: 23
Last commit: 1 month ago
tokenizer (Go)

A high-performance, regex-free Go tokenizer for parsing strings, slices, and infinite streams into customizable tokens.

#parsing #parse #unicode
Stars: 139
Forks: 11
Last commit: 1 year ago
stemmer (JavaScript)

A fast implementation of the Porter stemming algorithm for English word normalization in natural language processing.

#stemmer #stemming #text-analysis
Stars: 137
Forks: 8
Last commit: 3 years ago
colibri-core (C++)

A C++ and Python library for efficient extraction and analysis of n-grams, skipgrams, and flexgrams from large corpora.

#c-plus-plus-library #computational-linguistics #pattern-modeling
Stars: 130
Forks: 20
Last commit: 4 months ago
Camomile (OCaml)

A comprehensive Unicode library for OCaml providing character handling, string encodings, collation, and locale-sensitive operations.

#library #unicode #internationalization
Stars: 126
Forks: 26
Last commit: 2 years ago
markd (Crystal)

A fast, CommonMark-compliant markdown parser written in Crystal with syntax highlighting and customization options.

#static-site-generator #commonmark #markdown-parser
Stars: 124
Forks: 34
Last commit: 5 months ago
nickel (Ruby)

A Ruby gem that extracts structured date, time, and message information from naturally worded text.

#datetime #reminders #time-parsing
Stars: 118
Forks: 17
Last commit: 8 years ago
indent-string (JavaScript)

A Node.js utility to add consistent indentation to each line of a string with customizable options.

#npm-package #string-formatting #text-processing
Stars: 116
Forks: 16
Last commit: 4 years ago
Boost.Regex (C++)

A C++ regular expression library that is the ancestor of std::regex and offers extended functionality.

#regex #standalone #pattern-matching
Stars: 115
Forks: 109
Last commit: 17 days ago
LPegLJ (Lua)

A pure LuaJIT implementation of LPeg v1.0, a PEG pattern matching library for Lua, with added left recursion support.

#luajit #ffi #memoization
Stars: 114
Forks: 10
Last commit: 4 years ago
Microsoft.PowerShell.UnixCompleters (C#)

A collection of PowerShell modules for remoting, secret management, and text utilities, published to PowerShellGallery.com.

#remoting #text-processing #cmdlets
Stars: 113
Forks: 24
Last commit: 4 years ago
Ambrosia (Go)

A cross-platform CLI tool for cleaning and improving text datasets for machine learning, with fast operations and LLM-based filtering.

#go-application #llm-filtering #cli-tool
Stars: 113
Forks: 2
Last commit: 3 years ago
lemmatizer (Ruby)

A Ruby gem for lemmatizing English text, converting inflected words to their base dictionary forms.

#text-analysis #nlp-tools #lemmatization
Stars: 112
Forks: 15
Last commit: 4 years ago
JEmoji (Java)

A lightweight, auto-generated Java library for working with Unicode emojis, featuring type-safe constants and comprehensive utility methods.

#gradle #emoji #emojis
Stars: 112
Forks: 11
Last commit: 6 days ago
Page 6 of 8

Related Tags

#Unicode (58)
#Markdown (33)
#Natural Language Processing (32)
#Go Library (31)
#Regex (30)
#Cli Tool (30)
#Go (28)
#Markdown Parser (27)
#Developer Tools (27)
#Golang (26)
#String Manipulation (25)
#Nodejs (23)