Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Text Processing

267 projects

Showing 36 of 267 projects

uni-algo (C++)

A C/C++ library implementing Unicode algorithms with an emphasis on security, performance, and portability, including correct handling of ill-formed UTF sequences.

#utf16 #c-plus-plus-library #unicode
Stars: 323
Forks: 27
Last commit: 2 years ago
stringi (C++)

Fast and portable character string processing in R using the Unicode ICU library.

#unicode #regex #stringi
Stars: 317
Forks: 48
Last commit: 5 days ago
rust-encoding (Rust)

A Rust library for character encoding conversion based on the WHATWG Encoding Standard.

#unicode #whatwg-encoding #text-processing
Stars: 288
Forks: 57
Last commit: 2 years ago
fuzzy-string-match (Ruby)

A fast fuzzy string matching library for Ruby that implements the Jaro-Winkler distance algorithm.

#similarity #lucene-port #text-processing
Stars: 287
Forks: 37
Last commit: 6 years ago
readability (Elixir)

An Elixir library for extracting and curating the primary readable content from webpages.

#readability #hacktoberfest #elixir
Stars: 283
Forks: 64
Last commit: 7 months ago
suffix (Rust)

A Rust library providing fast suffix arrays built in linear time and space, with full Unicode support.

#suffix-tree #unicode #search-algorithms
Stars: 281
Forks: 31
Last commit: 2 years ago
Sedlex (OCaml)

A Unicode-aware lexer generator for OCaml that embeds lexer specifications directly in OCaml source files.

#parsing #unicode #text-processing
Stars: 271
Forks: 43
Last commit: 1 month ago
cat-ascii-faces (JavaScript)

A Node.js package that provides a collection of cat-themed ASCII emoticons for use in CLI tools and JavaScript projects.

#developer-tools #fun-utilities #npm-package
Stars: 268
Forks: 20
Last commit: 11 years ago
LuLPeg (Lua)

A pure Lua port of LPeg, a Parsing Expression Grammar library for pattern matching and text processing.

#parsing #luajit #lua-5.1
Stars: 268
Forks: 35
Last commit: 4 years ago
Tempura (Clojure)

A simple, developer-friendly text localization library for Clojure and ClojureScript applications.

#hiccup #gettext #clojurescript
Stars: 264
Forks: 18
Last commit: 2 years ago
Chalk (Scala)

A Scala library for natural language processing with functional and actor-based pipelines.

#nlp-library #functional-programming #pipeline-architecture
Stars: 260
Forks: 48
Last commit: 9 years ago
godocx (Go)

A pure Go library for programmatically reading from and writing to Microsoft Word DOCX files.

#go-modules #document #microsoft
Stars: 258
Forks: 42
Last commit: 10 months ago
Re (OCaml)

A pure OCaml regular expression library supporting Perl, POSIX, Emacs, and glob patterns with DFA-based matching.

#parsing #functional-programming #dfa
Stars: 255
Forks: 69
Last commit: 17 days ago
humanize-url (JavaScript)

A JavaScript library that converts URLs into human-readable formats by removing protocol and www prefixes.

#clean-urls #string-formatting #text-processing
Stars: 252
Forks: 7
Last commit: 3 years ago
lowcharts (Rust)

A Rust tool for drawing low-resolution graphs directly in the terminal for quick data analysis from logs and text files.

#plot #terminal-graphics #bar-chart
Stars: 251
Forks: 5
Last commit: 5 months ago
Stringy (Go)

A comprehensive Go library for string manipulation including case conversion, padding, truncation, and special character handling.

#manipulation #string-helper #camel-case
Stars: 249
Forks: 20
Last commit: 1 year ago
decamelize (JavaScript)

Converts a camelCase string to lowercase with a custom separator, for example unicornRainbow → unicorn_rainbow.

#formatting #camelcase #npm-package
Stars: 244
Forks: 28
Last commit: 3 months ago
chomp (Rust)

A fast monadic-style parser combinator library for stable Rust, enabling expressive and performant parsing.

#parsing #functional-programming #text-processing
Stars: 241
Forks: 19
Last commit: 4 years ago
cl-nlp (Common Lisp)

A comprehensive and extensible natural language processing toolkit for Common Lisp, supporting custom pipelines and experimentation.

#pos-tagging #natural-language-processing #text-processing
Stars: 236
Forks: 28
Last commit: 6 years ago
runiq (Rust)

An efficient command-line tool and library for filtering duplicate lines from textual input, optimized for speed and memory usage.

#duplicate-removal #memory-efficiency #command-line-tool
Stars: 228
Forks: 22
Last commit: 5 months ago
Markdown Processor for Pascal (Pascal)

A Pascal library for converting Markdown to HTML with support for multiple dialects including CommonMark and GitHub Flavored Markdown.

#free-pascal #commonmark #pascal
Stars: 228
Forks: 68
Last commit: 1 year ago
jaro_winkler (Ruby)

A Ruby gem providing a fast, accurate, and encoding-aware implementation of the Jaro-Winkler string similarity algorithm.

#string-similarity #algorithm #encoding-support
Stars: 223
Forks: 35
Last commit: 3 months ago
abbreviate (Go)

A Go tool that shortens strings using common abbreviations and smart word boundary detection for DevOps resource naming.

#hacktoberfest #devops #naming
Stars: 223
Forks: 18
Last commit: 1 year ago
coregex (Go)

A pure Go regex engine with SIMD optimizations, described as production-grade and claiming a 3-3000x speedup over the standard library.

#multi-engine #dfa #high-performance
Stars: 219
Forks: 6
Last commit: 2 months ago
rex (Go)

A Go library for constructing regular expressions using a human-friendly, composable builder pattern.

#developer-tools #regex-builder #dsl-syntax
Stars: 212
Forks: 5
Last commit: 6 months ago
Rosetta (Jupyter Notebook)

A Python toolkit for text-focused data science on medium-sized datasets, bridging the gap between in-memory and cluster-scale processing.

#stream-processing #multiprocessing #scientific-computing
Stars: 207
Forks: 45
Last commit: 3 years ago
strutil (Go)

A comprehensive Go library providing string manipulation functions for formatting, transformation, and analysis.

#go-package #camelcase #go-library
Stars: 206
Forks: 27
Last commit: 1 year ago
lispbuilder-sdl (Common Lisp)

An umbrella project providing cross-platform Common Lisp libraries for building large, interactive applications including game development.

#text-processing #3d-graphics #game-development
Stars: 203
Forks: 27
Last commit: 1 month ago
detect-indent (JavaScript)

Detects the indentation type and amount from a string of code to maintain consistent formatting.

#developer-tools #npm-package #text-processing
Stars: 199
Forks: 26
Last commit: 4 months ago
Pluralize.swift (Swift)

A Swift string extension for intelligent pluralization with support for irregular nouns, uncountable nouns, and custom rules.

#macos-development #text-processing #pluralization
Stars: 196
Forks: 47
Last commit: 3 years ago
go-porterstemmer (Go)

A native Go implementation of the Porter Stemming algorithm for NLP and machine learning tasks.

#stemming #natural-language-processing #golang-library
Stars: 193
Forks: 45
Last commit: 5 years ago
Unicode Text Search (Python)

Find and paste Unicode symbols using a Python script or Alfred workflow.

#alfred-workflow #productivity-tools #unicode-symbols
Stars: 193
Forks: 10
Last commit: 3 years ago
TRegExpr (Pascal)

A pure Object Pascal regular expressions engine for Delphi and Free Pascal.

#free-pascal #regex #regex-engine
Stars: 192
Forks: 69
Last commit: 7 months ago
Decoda (PHP)

A lightweight lexical string parser for BBCode-style markup that converts custom tags to HTML.

#hooks #decoda #user-content
Stars: 192
Forks: 49
Last commit: 3 years ago
simsearch (Rust)

A simple and lightweight fuzzy search engine that works in memory, searching for similar strings.

#levenshtein-distance #text-processing #string-matching
Stars: 189
Forks: 28
Last commit: 1 month ago
ltex_extra.nvim (Lua)

A Neovim plugin that implements the LTeX language server's off-spec code actions for dictionary management and rule handling.

#spell-checking #language-server #vim
Stars: 188
Forks: 28
Last commit: 7 months ago
Page 5 of 8

Related Tags

#Unicode (58)
#Markdown (33)
#Natural Language Processing (32)
#Go Library (31)
#Regex (30)
#Cli Tool (30)
#Go (28)
#Markdown Parser (27)
#Developer Tools (27)
#Golang (26)
#String Manipulation (25)
#Nodejs (23)