Text Processing

#c-library #embedded-systems #perl-compatible

Forks: 114

Last commit: 1 year ago

PCRE2 (C)

A portable C library implementing Perl-compatible regular expression pattern matching with Unicode support and optional JIT compilation.

#deep-learning #question-answering #natural-language-processing

Forks: 272

Last commit: 23 days ago

DeepMind QA Corpus (Python)

Script to generate question/answer pairs from CNN and Daily Mail articles for machine reading comprehension research.

#c-library #unicode #internationalization

Forks: 239

Last commit: 9 years ago

utf8proc (C)

A clean C library for Unicode normalization, case-folding, and UTF-8 processing.

#regex #extract-urls #go-library

Forks: 173

Last commit: 2 days ago

xurls (Go)

A Go library and command-line tool to extract URLs from text using regular expressions.

#emoji #unicode #mobile-devices

Forks: 118

Last commit: 3 months ago

php-emoji (PHP)

A PHP library for converting emoji between native mobile formats and HTML display.

Stars: 1.2k

Forks: 285

#scala-js #text-processing #domain-specific-languages

FastParse (Scala)

A Scala library for building fast parsers using parser combinators with minimal boilerplate.

#developer-tools #css-selectors #syntax-highlighting

Forks: 165

Last commit: 6 months ago

xq (Go)

A command-line tool for formatting, highlighting, and extracting content from XML and HTML documents.

#url-slug #slugifier #speakingurl

Forks: 35

Last commit: 2 days ago

speakingurl (JavaScript)

A JavaScript library to generate URL slugs with transliteration and extensive customization options.

Forks: 84

Pidgin (C#)

A lightweight, fast, and flexible parser combinator library for C#.

#parsing #parse #csharp

#computational-linguistics #ruby-gems #pos-tag

Forks: 75

Last commit: 25 days ago

NLP with Ruby (Ruby)

A curated list of awesome resources, libraries, and tools for natural language processing (NLP) in Ruby.

Forks: 70

#string-slugification #url-sanitization #text-processing

node-slug (CoffeeScript)

A Node.js library that converts strings to URL-safe slugs, handling Unicode characters and symbols.

#performance-optimized #commonmark #markdown-parser

Forks: 90

Last commit: 7 years ago

CommonMark.NET (C#)

A C# implementation of the CommonMark specification for converting Markdown to HTML, optimized for performance and portability.

Stars: 1.0k

Forks: 144

Last commit: 6 years ago

Cebe Markdown (HTML)

A super fast, highly extensible markdown parser for PHP supporting multiple flavors like GitHub, Markdown Extra, and traditional Markdown.

#hacktoberfest #markdown-parser #markdown-flavors

Stars: 1.0k

Forks: 137

#c-library #open-source #zero-dependency

Hoedown (C)

A standards-compliant, fast, and secure C library for parsing and rendering Markdown to HTML.

Stars: 991

Forks: 124

Last commit: 6 years ago

libfsm (C)

A library for deterministic finite automata (DFA) regular expressions and lexical analysis tools.

#c-library #dfa #regexes

Stars: 983

Forks: 57

Last commit: 13 days ago

Awesome Unicode (JavaScript)

A curated collection of Unicode resources, quirks, and creative uses for developers and enthusiasts.

#emoji #developer-tools #unicode

Stars: 979

Forks: 69

#c-library #posix-compliant #portable

TRE (C)

A POSIX-compliant regex library with approximate (fuzzy) matching and predictable performance.

Stars: 916

Forks: 146

Last commit: 2 months ago

fasttemplate (Go)

A minimal Go template engine focused solely on high-speed placeholder substitution without escaping.

#template #fast #high-performance

Stars: 909

Forks: 83

#elixir #eex-templates #markdown-parser

earmark (Elixir)

A pure Elixir library for parsing Markdown into HTML and AST with extensive customization options.

Stars: 895

Forks: 153

Last commit: 1 month ago

ES Reverser (JavaScript)

A Unicode-aware string reverser for JavaScript that correctly handles combining marks and astral symbols.

#unicode #astral-symbols #cli-tool

Stars: 888

Forks: 31

#pre-commit #codeformatter #python-library

mdformat (Python)

An opinionated, CommonMark-compliant Markdown formatter and Python library for enforcing consistent style.

Stars: 797

Forks: 62

Last commit: 11 days ago

auto_html (Ruby)

A Ruby gem that transforms plain text into HTML using a pipeline of composable filters.

#emoji #rails #pipeline

Stars: 797

Forks: 183

Last commit: 1 month ago

trie (Go)

A Go implementation of a trie data structure with algorithms for extremely fast prefix and fuzzy string searching.

#search-algorithms #go-library #data-structures

Stars: 791

Forks: 116

Last commit: 6 months ago

Code Points

A curated list of interesting Unicode characters with unique features, quirks, and fun uses.

#emoji #developer-tools #unicode

Stars: 778

Forks: 22

#developer-tools #r-package #string-interpolation

glue (R)

Interpreted string literals for R that embed expressions in curly braces for easy data interpolation.

Stars: 748

Forks: 63

Last commit: 3 months ago

hck (Rust)

A sharp cut(1) clone with regex delimiters, column reordering, and automatic decompression for data exploration.

#regex-delimiter #unix-tools #cut-clone

Stars: 743

Forks: 18

Last commit: 1 month ago

leven (JavaScript)

Fastest JavaScript implementation of the Levenshtein distance algorithm for measuring string similarity.

#algorithm #npm-package #levenshtein-distance

Stars: 734

Forks: 31

Last commit: 10 months ago

go-runewidth (Go)

A Go library for measuring the display width of characters and strings, handling East Asian fullwidth characters.

#wcwidth #unicode #go-library

Stars: 714

Forks: 101

Last commit: 17 hours ago

python-nameparser (Python)

A simple Python module for parsing human names into their individual components.

#text-processing #text-parser #python

Stars: 712

Forks: 112

Last commit: 13 hours ago

camelcase (JavaScript)

Convert dash/dot/underscore/space separated strings to camelCase or PascalCase with Unicode support.

#formatting #unicode #camelcase

Stars: 699

Forks: 100

Last commit: 3 days ago

whatlanggo (Go)

A natural language detection library for Go that identifies 84 languages and scripts with no external dependencies.

#multilingual-support #text-analysis #script-recognition

Stars: 693

Forks: 68