An unsupervised text tokenizer and detokenizer for neural network-based text generation systems with subword units.
Fast, state-of-the-art tokenizers that support both training and tokenizing, optimized for research and production use.
A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
A Python NLP library built on spaCy for text preprocessing, feature extraction, and analysis tasks.