Text Mining

28 projects

A curated list of resources dedicated to Natural Language Processing (NLP), including libraries, datasets, tutorials, and research.

#ai #nlp-resources #text-analysis

Stars: 18.8k

Forks: 2.9k

Last commit: 11 days ago

trafilatura (Python)

A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.

#text-extraction #readability #article-extractor

Stars: 6.3k

Forks: 397

Last commit: 4 days ago

NLP Roadmap

A visual roadmap and keyword mind map for students learning Natural Language Processing, from basics to SOTA models.

#roadmap #keyword #data-science

Stars: 3.3k

Forks: 514

Last commit: 6 years ago

Curated list of R tutorials for Data Science, NLP and Machine Learning (R)

A curated collection of R tutorials, packages, and resources for Data Science, NLP, and Machine Learning.

#data-science #statistics #r-programming

A spaCy pipeline and models specifically designed for processing scientific and biomedical documents.

#biomedical-nlp #scientific-text #spacy

Stars: 2.0k

Forks: 258

Last commit: 7 months ago

Information Retrieval

A curated list of awesome information retrieval resources including books, courses, datasets, software, and conferences.

#research-datasets #information-retrieval #awesome-list

Stars: 1.2k

Forks: 142

Last commit: 3 years ago

text2vec (R)

An efficient R package for text analysis and NLP with fast vectorization, topic modeling, and word embeddings.

#parallel-computing #word2vec #r-package

Stars: 875

Forks: 133

Last commit: 7 months ago

BigARTM (C++)

A fast, open-source platform for topic modeling using Additive Regularization of Topic Models (ARTM).

#additive-regularization #sparse-modeling #python-library

Stars: 674

Forks: 119

Last commit: 5 months ago

LDAvis (JavaScript)

An R package for creating interactive web-based visualizations of Latent Dirichlet Allocation (LDA) topic models.

#statistical-visualization #r-package #text-analysis

Stars: 570

Forks: 130

Last commit: 2 years ago

German NLP resources

A curated list of open-access resources and tools for Natural Language Processing (NLP) focused on the German language.

#german-language #computational-linguistics #language-resources

Stars: 528

Forks: 67

Last commit: 1 year ago

Biomedical Information Extraction

A curated list of resources for Biomedical Information Extraction (BioIE), including datasets, tools, libraries, and research.

#biomedical-language #biomedical-nlp #biomedical-data

Stars: 461

Forks: 40

Last commit: 1 month ago

medaCy (Python)

A medical text mining and information extraction framework built on spaCy for rapid prototyping and training of predictive NLP models.

#spacy #clinical-text #metamap

Stars: 438

Forks: 92

Last commit: 3 years ago

awesome-hungarian-nlp

A curated list of free tools, datasets, models, and resources for Hungarian Natural Language Processing.

#computational-linguistics #hungarian #information-retrieval

Stars: 280

Forks: 19

Last commit: 3 months ago

GWU: Data Mining (Decision Sciences 6279) (Jupyter Notebook)

Course materials for GWU's Data Mining and Machine Learning classes covering preprocessing, modeling, and practical Kaggle applications.

#data-science #kaggle #educational-materials

A corpus of academic papers about COVID-19 and related coronavirus research for text mining and NLP.

#document-embeddings #semantic-scholar #natural-language-processing

Stars: 187

Forks: 23

Last commit: 1 year ago

EDS_NLP (Python)

A modular NLP framework for extracting information from French clinical notes, compatible with spaCy and PyTorch.

#medical-text #spacy #fast

Stars: 164

Forks: 44

Last commit: 11 days ago

Deep Belief Nets for Topic Modeling (Python)

A Python toolbox using deep belief networks for topic modeling on document data, producing latent representations for content-based recommendation.

#deep-belief-networks #research-tool #document-analysis

A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm for extracting keywords from text.

#rake-algorithm #information-retrieval #text-analysis

Stars: 124

Forks: 19

Last commit: 1 year ago

NMF (Julia)

A Julia package providing multiple algorithms for non-negative matrix factorization, including multiplicative updates, ALS, coordinate descent, and separable NMF.

#image-analysis #julia #statistics

Stars: 95

Forks: 32

Last commit: 21 days ago

CRAFT (Clojure)

A biomedical text corpus with 97 full-text articles annotated for concepts, coreferences, and structural elements.

#coreference-resolution #mondo-ontology #biomedical-text-corpus

Stars: 80

Forks: 19

Last commit: 2 years ago

count-min-log (Go)

A Go implementation of the Count-Min-Log sketch for improved approximate counting of low-frequency events.

#probabilistic-data-structures #stream-processing #go-library

Stars: 70

Forks: 6

Last commit: 1 year ago

Woolly (Elixir)

A text mining and natural language processing API for the Elixir programming language.

#elixir #api #text-analysis

Stars: 54

Forks: 8

Last commit: 5 years ago

Pubrunner (Python)

A framework for keeping biomedical text mining tools running on the latest publications from PubMed.

#scientific-computing #research-tools #natural-language-processing

Stars: 42

Forks: 6

Last commit: 6 years ago

TabInOut (Python)

A framework and GUI wizard for extracting structured information from tables in scientific literature, particularly biomedical publications.

#rule-engine #gui-wizard #rule-based

Stars: 41

Forks: 10

Last commit: 7 years ago

rake-rs (Rust)

A multilingual Rust implementation of the RAKE algorithm for automatic keyword extraction from text.

#rake-algorithm #algorithm #text-analysis

Stars: 36

Forks: 8

Last commit: 1 year ago

tfidf (Elixir)

An Elixir library for calculating tf-idf (term frequency–inverse document frequency) scores to identify important words in text.

#nlp-library #elixir #information-retrieval

Stars: 18

Forks: 5

Last commit: 6 years ago

Bio-SCoRes (Java)

A modular Java framework for coreference resolution in biomedical text, supporting multiple coreference types and resolution strategies.

#biomedical-nlp #modular-architecture #coreference-resolution

Stars: 9

Forks: 1

Last commit: 6 years ago

Colombian Political Speeches

A collection of Latin American corpora, dictionaries, and text resources for natural language processing and text mining.

#corpora #computational-linguistics #latin-america

Stars: 6

Forks: 3

Last commit: 13 years ago
