Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Text Processing

Showing 29 of 65 projects

iconv-lite (JavaScript)

A pure JavaScript library for converting character encodings with no native dependencies.

#unicode #text-processing #character-encoding
Stars: 3.2k
Forks: 294
Last commit: 23 days ago
Mistune (Python)

A fast yet powerful Python Markdown parser with renderers and plugins.

#open-source #markdown-parser #html-generation
Stars: 3.0k
Forks: 280
Last commit: 11 days ago
CommonMark PHP (PHP)

A highly extensible PHP Markdown parser that fully supports the CommonMark and GitHub-Flavored Markdown specs.

#hacktoberfest #content-management #commonmark
Stars: 2.9k
Forks: 208
Last commit: 2 days ago
markdown2 (Python)

A fast and complete Python implementation of Markdown with optional extras for extended syntax.

#python-library #syntax-highlighting #html-generation
Stars: 2.8k
Forks: 446
Last commit: 27 days ago
commonmark-java (Java)

A Java library for parsing and rendering Markdown text according to the CommonMark specification, with support for extensions.

#java-library #library #commonmark
Stars: 2.6k
Forks: 329
Last commit: 14 days ago
JavaVerbalExpressions (Java)

A Java library that simplifies constructing complex regular expressions using a fluent builder API.

#library #regex #java
Stars: 2.6k
Forks: 240
Last commit: 23 days ago
chardet (Python)

A Python library that automatically detects the character encoding of text files and byte streams with high accuracy and speed.

#encoding-detection #unicode #python-library
Stars: 2.6k
Forks: 297
Last commit: 11 days ago
pulldown-cmark (Rust)

A fast, safe, and versatile pull parser for CommonMark and GitHub Flavored Markdown, written in Rust.

#simd #commonmark #markdown-parser
Stars: 2.5k
Forks: 279
Last commit: 2 days ago
RE (C)

A modern, flexible regular expression library supporting multiple character encodings and syntaxes.

#c-library #gnu-regex #multi-encoding
Stars: 2.5k
Forks: 350
Last commit: 1 year ago
Oniguruma (C)

A modern, flexible regular expression library supporting multiple character encodings and syntaxes.

#c-library #gnu-regex #multi-encoding
Stars: 2.5k
Forks: 350
Last commit: 1 year ago
retext (JavaScript)

A natural language processor powered by plugins that transforms and analyzes text using syntax trees.

#open-source #retext #text-analysis
Stars: 2.4k
Forks: 92
Last commit: 1 year ago
emoj (TypeScript)

A command-line tool that finds relevant emoji from text input using a local emoji database.

#emoji #productivity #terminal
Stars: 2.4k
Forks: 64
Last commit: 2 months ago
Snarkdown (JavaScript)

A dead simple 1kb Markdown parser written in JavaScript for constrained use cases.

#regex #markdown-parser #html-generation
Stars: 2.4k
Forks: 111
Last commit: 3 years ago
jellyfish (Jupyter Notebook)

A Python library for approximate and phonetic string matching, implementing algorithms like Levenshtein distance and Soundex.

#data-cleaning #hacktoberfest #phonetic-algorithms
Stars: 2.2k
Forks: 165
Last commit: 17 days ago
html2text (Python)

A Python library and CLI tool that converts HTML into clean, readable Markdown-formatted plain text.

#python-library #markdown-parser #plain-text
Stars: 2.1k
Forks: 292
Last commit: 5 months ago
floki (Elixir)

A simple HTML parser for Elixir that lets you search for nodes using CSS selectors.

#css-selector #elixir #css-selectors
Stars: 2.1k
Forks: 163
Last commit: 3 days ago
sprintf.js (JavaScript)

A complete open source JavaScript sprintf implementation for browser and Node.js environments.

#open-source #string-formatting #text-processing
Stars: 2.1k
Forks: 284
Last commit: 2 years ago
ReText (Python)

A simple but powerful desktop editor for Markdown, reStructuredText, Textile, and AsciiDoc markup languages.

#desktop-application #restructuredtext #restructuredtext-editor
Stars: 2.0k
Forks: 208
Last commit: 5 days ago
cmark (C)

A C reference implementation of CommonMark for parsing and rendering Markdown documents to multiple formats.

#c-library #high-performance #commonmark
Stars: 2.0k
Forks: 650
Last commit: 7 days ago
utf8.h (C)

A single-header library providing UTF-8 string functions for C and C++, mirroring the standard string.h API.

#library #unicode #c
Stars: 1.9k
Forks: 139
Last commit: 1 month ago
emoji-regex (JavaScript)

A regular expression to match all emoji symbols and sequences as per the Unicode Standard.

#emoji #unicode #regex
Stars: 1.9k
Forks: 174
Last commit: 6 months ago
HTML to Markdown (PHP)

A PHP library that converts HTML to Markdown with configurable options for clean, editable output.

#dom-manipulation #hacktoberfest #composer
Stars: 1.9k
Forks: 216
Last commit: 4 days ago
string.js (JavaScript)

A lightweight JavaScript library providing extra string methods with a chainable, jQuery-like API.

#lightweight #text-processing #utility-library
Stars: 1.8k
Forks: 232
Last commit: 4 years ago
emojify.js (JavaScript)

A JavaScript library that converts emoji keywords and emoticons into images or styled elements.

#emoji #user-content #text-processing
Stars: 1.8k
Forks: 240
Last commit: 7 years ago
simdutf (C++)

A high-performance C++ library for Unicode validation and transcoding (UTF-8/16/32, Latin1, Base64) using SIMD instructions.

#utf16 #transcoding #sse2
Stars: 1.8k
Forks: 131
Last commit: 2 days ago
go-pinyin (Go)

A Go library and CLI tool for converting Chinese characters to Hanyu Pinyin with tone support.

#pinyin #internationalization #go-library
Stars: 1.8k
Forks: 205
Last commit: 1 month ago
kramdown (Ruby)

A fast, pure Ruby Markdown superset converter with strict syntax and common extensions.

#kramdown #html-generation #ruby-gem
Stars: 1.8k
Forks: 276
Last commit: 2 months ago
python-slugify (Python)

A Python library that converts Unicode strings into URL-friendly slugs with extensive customization options.

#unicode #slug-generation #url-slugs
Stars: 1.6k
Forks: 118
Last commit: 3 months ago
emojify (Shell)

A shell script that converts emoji aliases (like :smile:) into actual emoji characters on the command line.

#emoji #productivity #terminal
Stars: 1.6k
Forks: 69
Last commit: 2 years ago
Related Tags

#Markdown (22)
#Unicode (16)
#Markdown Parser (16)
#Nodejs (11)
#Commonmark (11)
#Javascript (11)
#Cli Tool (10)
#Python Library (10)
#Python (9)
#Html Generation (9)
#Regex (8)
#Javascript Library (8)