A curated list of free tools, datasets, models, and resources for Hungarian Natural Language Processing.
Awesome Hungarian NLP is a curated GitHub repository listing free and open-source resources for Natural Language Processing in Hungarian. Instead of leaving these resources scattered across many sources, it gathers them into a single directory of tools, datasets, models, and academic materials tailored to the linguistic characteristics of Hungarian. This helps researchers and developers quickly find specialized resources for building Hungarian NLP applications.
It is aimed at NLP researchers, computational linguists, data scientists, and software developers who work with, or are interested in, the Hungarian language. It is particularly useful for people building language models, text analysis tools, or academic projects that need Hungarian linguistic data.
Developers choose it because it brings a broad, community-maintained collection of Hungarian NLP resources together in one place, which saves time that would otherwise go into searching. Its focus on free and open-source materials lowers the barrier to entry and supports reproducible research and development for a language that has far fewer dedicated resources than English.
Covers everything from tokenizers and morphological analyzers to LLMs and annotated corpora, as shown by the extensive table of contents with over 20 categories.
Displays GitHub badges for automated link checking and star count, which suggest active maintenance and community interest, and names a clear maintainer.
Lists only free resources and flags those with commercial-friendly licenses using markers such as '🚀 Commercial-friendly license' throughout the README.
Addresses the agglutinative morphology of Hungarian by including specialized tools such as HuSpaCy and magyarlanc, along with Hungarian-specific datasets such as the Hungarian Webcorpus.
Is a plain list of links with no tutorials or best practices for combining resources, so users must work out tool compatibility and pipeline setup on their own.
Relies on external links that can break or go stale despite automated checks, and some listed tools, such as HunPos, are older and receive few updates.
Does not rank, review, or benchmark resources, so users cannot tell which tools perform best on a given task without running their own tests.