A collaboratively maintained, reverse-chronological list of datasets and corpora for natural language processing tasks.
A large-scale multi-domain dataset of over 20k annotated task-oriented dialogues for training and evaluating virtual assistants.
An open-source benchmark toolkit for Natural Language Generation in spoken dialogue systems, featuring multiple RNN-based models and datasets.
Neural machine translation between Shakespearean and modern English using TensorFlow.
A dataset of NBA game summaries aligned with box- and line-scores for data-to-text generation research.
A characteristic-rich dataset for factoid question answering with explicit question specifications to enable fine-grained QA system evaluation.
An enriched dataset for Natural Language Generation research, providing intermediate representations for pipeline tasks like lexicalization and aggregation.
A CCG parser that implements all combinators, parses to logical form, and supports parameter estimation for probabilistic CCG.
A Julia package providing lazy-loading iterators for various NLP corpora with automatic data dependency management.
A dataset for context-aware natural language generation in task-oriented spoken dialogue systems for public transport information.
A repository for planning and training German transformer language models from scratch.
A curated list of initiatives and projects for adding new or low-resource languages to open-source machine translation models.