Open-Awesome

WebNLG

Python

An enriched dataset for Natural Language Generation research, providing intermediate representations for pipeline tasks like lexicalization and aggregation.

Overview

The enriched version of the WebNLG corpus, described at INLG 2018.

Quick Stats

Stars: 70
Forks: 22
Contributors: 0
Open Issues: 5
Last commit: 5 years ago
Created: 2018

Tags

#pipeline-architecture #nlp-research #data-to-text #academic #text-generation #natural-language-generation #dataset #benchmark #corpus

Included in

Natural Language Generation (480)
Auto-fetched 1 day ago

Related Projects

The Schema-Guided Dialogue Dataset

Stars: 604
Forks: 134
Last commit: 2 years ago
Box-score data

This dataset provides structured NBA basketball game data paired with human-written summaries, enabling research in data-to-document generation. It serves as a benchmark for training and evaluating models that convert structured statistics into coherent natural language narratives.

## Key Features

- **Aligned Summaries and Statistics**: Each human-written game summary is paired with corresponding box-scores and line-scores.
- **Dual Source Coverage**: Includes data from Rotowire (2014–2017) and SBNation (2006–2017) with distinct writing styles.
- **Structured JSON Format**: Data is provided in a consistent JSON schema with team, player, and game details.
- **Preprocessed for NLP**: Summaries are tokenized and cleaned, with numeric values standardized as integers.
- **Standard Splits**: Data is divided into training, validation, and test sets for machine learning experiments.

## Philosophy

The dataset is designed to support reproducible research in natural language generation, focusing on the challenge of transforming structured sports data into fluent, informative text.

Stars: 115
Forks: 25
Last commit: 4 years ago
YelpNLG

YelpNLG provides resources for natural language generation of restaurant reviews.

Stars: 0
Forks: 0
Last commit: (not listed)
WikiBio - wikipedia biography dataset

This dataset gathers 728,321 biographies from Wikipedia. It is intended for evaluating text generation algorithms.

Stars: 0
Forks: 0
Last commit: (not listed)