Industrial-strength natural language processing library for Python, featuring pretrained pipelines for 70+ languages and production-ready training.
A lightweight JavaScript library for natural language processing that transforms text into structured data with a modest, pragmatic approach.
Fast, state-of-the-art tokenizers that cover both tokenizer training and text tokenization, optimized for research and production.
A modular natural language processing library for Node.js and React Native, designed for building multilingual chatbots and language utilities.
A Python NLP library built on spaCy for text preprocessing, feature extraction, and analysis tasks.
A Rust library for natural language detection using trigram models, focusing on simplicity and performance.
A self-contained Japanese morphological analyzer written in pure Go, tokenizing text into words and analyzing parts of speech.
A multilingual command-line sentence tokenizer written in Go, ported from NLTK's Punkt system.
A Julia package providing standard tools and models for text analysis and natural language processing.
A Scala library for natural language processing with functional and actor-based pipelines.
A comprehensive natural language processing (NLP) library for the Crystal programming language.
A Ruby natural language processor for tokenizing and analyzing text with flexible filtering and custom regex support.
A rule-based question classification system for Node.js that categorizes questions by type and answer format.
An English (Porter2) stemming implementation in Elixir for reducing words to their base forms.
A natural language processing library for Uralic and other languages, offering morphological analysis, generation, lemmatization, and lexical information.
A multilingual Ruby gem for splitting strings into tokens with extensive language support and configurable options.
A Ruby port of the NLTK Punkt algorithm for unsupervised, language-independent sentence boundary detection.
A rule-based Unicode tokenizer that separates words from punctuation and splits sentences for NLP preprocessing.
A Ruby wrapper for the spaCy NLP library via PyCall, enabling tokenization, POS tagging, NER, and OpenAI integration.
An Elixir natural language processor for tokenization, counting, and string similarity analysis.