A curated list of resources dedicated to Natural Language Processing (NLP), including libraries, datasets, tutorials, and research.
A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
A visual roadmap and keyword mind map for students learning Natural Language Processing, from the basics to state-of-the-art (SOTA) models.
A curated collection of R tutorials, packages, and resources for Data Science, NLP, and Machine Learning.
A spaCy pipeline and models specifically designed for processing scientific and biomedical documents.
A curated list of awesome information retrieval resources including books, courses, datasets, software, and conferences.
An efficient R package for text analysis and NLP with fast vectorization, topic modeling, and word embeddings.
A fast, open-source platform for topic modeling using Additive Regularization of Topic Models (ARTM).
An R package for creating interactive web-based visualizations of Latent Dirichlet Allocation (LDA) topic models.
A curated list of open-access resources and tools for Natural Language Processing (NLP) focused on the German language.
A curated list of resources for Biomedical Information Extraction (BioIE), including datasets, tools, libraries, and research.
A medical text mining and information extraction framework built on spaCy for rapid prototyping and training of predictive NLP models.
A curated list of free tools, datasets, models, and resources for Hungarian Natural Language Processing.
Course materials for GWU's Data Mining and Machine Learning classes covering preprocessing, modeling, and practical Kaggle applications.
A corpus of academic papers about COVID-19 and related coronavirus research for text mining and NLP.
A modular NLP framework for extracting information from French clinical notes, compatible with spaCy and PyTorch.
A Python toolbox using deep belief networks for topic modeling on document data, producing latent representations for content-based recommendation.
A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm for extracting keywords from text.
A Julia package providing multiple algorithms for non-negative matrix factorization (NMF), including multiplicative updates, alternating least squares (ALS), coordinate descent, and separable NMF.
A biomedical text corpus with 97 full-text articles annotated for concepts, coreferences, and structural elements.
A Go implementation of the Count-Min-Log sketch for more accurate approximate counting of low-frequency events.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.