Computational Linguistics

35 projects


awesome-chinese-nlp

A curated list of resources, tools, datasets, and learning materials for Chinese Natural Language Processing.

#computational-linguistics #text-corpus #research-tools

Stars: 7.9k

Forks: 1.7k

Last commit: 3 years ago

pkuseg-python (Python)

A multi-domain Chinese word segmentation toolkit offering higher accuracy and domain-specific models.

#part-of-speech-tagging #computational-linguistics #python-library

Stars: 6.7k

Forks: 981

Last commit: 3 years ago

100 NLP Papers

A curated list of 100 foundational and influential papers in natural language processing for students and researchers.

#literature-review #computational-linguistics #research-papers

Stars: 3.8k

Forks: 559

Last commit: 5 years ago

textacy (Python)

A Python NLP library built on spaCy for text preprocessing, feature extraction, and analysis tasks.

#nlp-library #computational-linguistics #spacy

Stars: 2.2k

Forks: 247

Last commit: 2 years ago

Phonemizer (Python)

A Python library and CLI tool for converting text to phonetic transcriptions (phones) across multiple languages using various backends.

#computational-linguistics #python-library #ipa

Stars: 1.6k

Forks: 200

Last commit: 1 year ago

treat (Ruby)

A comprehensive natural language processing framework for Ruby with support for text extraction, parsing, and machine learning.

#text-extraction #computational-linguistics #text-analysis

Stars: 1.4k

Forks: 124

Last commit: 1 year ago

NLP with Ruby (Ruby)

A curated list of awesome resources, libraries, and tools for natural language processing (NLP) in Ruby.

#computational-linguistics #ruby-gems #pos-tag

Stars: 1.1k

Forks: 70

Last commit: 3 years ago

Awesome NLP with Ruby (Ruby)

A curated list of awesome resources, libraries, and tools for natural language processing (NLP) in Ruby.

#computational-linguistics #ruby-gems #text-analysis

Stars: 1.1k

Forks: 70

Last commit: 3 years ago

Computational Neuroscience

A curated directory of academic institutions and principal investigators in computational neuroscience worldwide.

#academia #computational-linguistics #neuroscience

Stars: 978

Forks: 87

Last commit: 2 years ago

quanteda (R)

An R package for the quantitative analysis of textual data, providing comprehensive tools for natural language processing and text management.

#computational-linguistics #parallel-computing #r-package

A curated list of open-access resources and tools for Natural Language Processing (NLP) focused on the German language.

#german-language #computational-linguistics #language-resources

Stars: 528

Forks: 67

Last commit: 1 year ago

PaperRobot: Incremental Draft Generation of Scientific Ideas (Python)

An AI system that incrementally generates scientific paper drafts by predicting links between concepts and generating text sections.

#computational-linguistics #knowledge-graphs #paper-generation

PyNLPl

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

#evaluation-metrics #computational-linguistics #library

Stars: 476

Forks
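The PyNLPl entry mentions extracting n-grams and frequency lists. As a rough illustration of those two ideas, and not of PyNLPl's own API, here is a standard-library Python sketch; `ngrams` and `frequency_list` are hypothetical helper names, and plain whitespace splitting stands in for real tokenization.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield each contiguous n-gram of `tokens` as a tuple (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 windows of size n (none if L < n).
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(tokens: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs (hypothetical helper)."""
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    # Naive whitespace tokenization, assumed only for this example.
    tokens: List[str] = "the cat sat on the mat the cat".split()
    for gram, count in frequency_list(tokens, 2).most_common(3):
        print(" ".join(gram), count)
```

Using `Counter` keeps the frequency list to a single line and gives `most_common` for free; a real toolkit would add tokenization, normalization, and memory-efficient storage for large corpora.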

levenshtein (Go)

A high-performance Go library for calculating Levenshtein distance between strings, including Unicode support.

#algorithm #computational-linguistics #unicode

Stars: 471

Forks: 31

Last commit: 4 months ago
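Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the textbook dynamic-programming (Wagner-Fischer) formulation in Python, keeping only two rows of the table. It is not the Go library's API or implementation. Python strings index by Unicode code point, so a precomposed "é" counts as one character, while a decomposed "e" plus a combining accent counts as two.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed over Unicode code points."""
    # The distance is symmetric, so put the shorter string in the inner loop
    # to keep each row as small as possible.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("café", "cafe"))
```

Keeping two rows instead of the full matrix reduces memory from O(len(a) * len(b)) to O(min(len(a), len(b))), at the cost of not being able to recover the actual edit sequence.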

Linguistics

A curated list of resources, tools, datasets, and communities for linguistics and natural language processing.

#computational-linguistics #nlp-resources #natural-language-processing

Stars: 447

Forks: 34

Last commit: 5 months ago

Spanish

A curated collection of linguistic resources, tools, and datasets for Natural Language Processing and Computational Linguistics on Spanish.

#computational-linguistics #pos-tagging #machine-translation

Stars: 351

Forks: 42

Last commit: 2 years ago

awesome-spanish-nlp

A curated collection of linguistic resources, datasets, and tools for Natural Language Processing and Computational Linguistics on Spanish.

#computational-linguistics #text-analysis #nlp-datasets

Stars: 351

Forks: 42

Last commit: 2 years ago

awesome-hungarian-nlp

A curated list of free tools, datasets, models, and resources for Hungarian Natural Language Processing.

#computational-linguistics #hungarian #information-retrieval

Stars: 281

Forks: 19

Last commit: 3 months ago

BLLIP Parser (GAP)

BLLIP reranking parser (also known as the Charniak-Johnson parser, Charniak parser, or Brown reranking parser). See http://pypi.python.org/pypi/bllipparser/ for the Python module.

#parsing #nlp-library #ai

Stars: 227

Forks: 53

Last commit: 4 years ago

OpenCCG (Java)

A Java library for parsing and generating text using combinatory categorial grammar and hybrid logic dependency semantics.

#computational-linguistics #java-library #grammar-parsing

Stars: 219

Forks: 45

Last commit: 5 years ago

TGen (Python)

A statistical natural language generator for spoken dialogue systems, supporting both A*-search and seq2seq algorithms.

#tgen #computational-linguistics #sequence-to-sequence

Stars: 207

Forks: 61

Last commit: 4 years ago

colibri-core (C++)

A C++ and Python library for efficient extraction and analysis of n-grams, skipgrams, and flexgrams from large corpora.

#c-plus-plus-library #computational-linguistics #pattern-modeling

Stars: 131

Forks: 20

Last commit: 5 months ago
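The colibri-core entry lists skipgrams alongside n-grams. Assuming the common definition of a skipgram as an n-token window in which one or more interior tokens are replaced by a gap, a plain-Python sketch could look like this. It does not reproduce colibri-core's own pattern encoding or API; `GAP` and `skipgrams` are hypothetical names, and flexgrams (gaps of variable length) are not covered.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple

GAP = "_"  # hypothetical gap marker chosen for this sketch


def skipgrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield skipgrams spanning n tokens (hypothetical helper).

    Each window of n tokens keeps its first and last token; every non-empty
    subset of the interior positions is replaced by GAP. Windows shorter
    than 3 tokens have no interior positions, so they yield nothing.
    """
    interior = range(1, n - 1)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # Choose how many interior positions to blank out, then which ones.
        for k in range(1, len(interior) + 1):
            for gaps in combinations(interior, k):
                yield tuple(GAP if i in gaps else tok for i, tok in enumerate(window))


if __name__ == "__main__":
    for sg in skipgrams("to be or not to be".split(), 3):
        print(" ".join(sg))
```

Anchoring both ends of the window keeps each skipgram distinct from a shorter n-gram; counting the yielded tuples with `collections.Counter` would give a skipgram frequency list.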

UralicNLP (Python)

A natural language processing library for Uralic and other languages, offering morphological analysis, generation, lemmatization, and lexical information.

#sami #nlp-library #computational-linguistics

Stars: 100

Forks: 8

Last commit: 4 months ago

Word Tokenizers (Julia)

A Julia package providing high-performance, configurable tokenizers and sentence splitters for natural language processing.

#julia #computational-linguistics #sentence-splitting

Stars: 99

Forks: 25

Last commit: 4 years ago

frog (C++)

A tagger, lemmatizer, morphological analyzer, and dependency parser for Dutch using memory-based NLP modules.

#c-plus-plus-library #computational-linguistics #memory-based-learning

Stars: 82

Forks: 12

Last commit: 1 month ago

ucto (C++)

A rule-based Unicode tokenizer that separates words from punctuation and splits sentences for NLP preprocessing.

#nlp-library #computational-linguistics #rule-based

Stars: 72

Forks: 14

Last commit: 1 month ago
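The ucto entry describes rule-based tokenization: separating words from punctuation, then splitting sentences. The sketch below illustrates that idea with one hypothetical regular expression and a naive rule that a sentence ends at `.`, `!`, or `?`. It is not ucto's rule set or configuration format, and it will wrongly split after abbreviations such as `Dr.`.

```python
import re
from typing import List

# Hypothetical rules for this sketch: a word is a run of Unicode word
# characters, optionally joined by internal hyphens or apostrophes;
# any other non-space character becomes its own punctuation token.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> List[List[str]]:
    """Split text into sentences, each a list of tokens (hypothetical helper)."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in TOKEN_RE.findall(text):
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in tokenize("Hello, world! It's a well-known café."):
        print(sentence)
```

In Python 3, `\w` on a `str` pattern matches Unicode letters by default, so words like "café" stay intact; a production tokenizer would layer abbreviation lists, URL and number rules, and language-specific exceptions on top of this.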

EasyCCG (Java)

A CCG parser that implements all combinators and supports parsing to logical form and parameter estimation for probabilistic CCG.

#probabilistic-models #computational-linguistics #nlp-research

Stars: 62

Forks: 20

Last commit: 8 years ago

python-ucto (Cython)

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).

#nlp-library #computational-linguistics #text-processing

Stars: 32

Forks: 5

Last commit