YelpNLG provides resources for natural language generation of restaurant reviews.
The Schema-Guided Dialogue Dataset
This dataset provides structured NBA basketball game data paired with human-written summaries, enabling research in data-to-document generation. It serves as a benchmark for training and evaluating models that convert structured statistics into coherent natural language narratives.

## Key Features

- **Aligned Summaries and Statistics**: Each human-written game summary is paired with corresponding box-scores and line-scores.
- **Dual Source Coverage**: Includes data from Rotowire (2014–2017) and SBNation (2006–2017) with distinct writing styles.
- **Structured JSON Format**: Data is provided in a consistent JSON schema with team, player, and game details.
- **Preprocessed for NLP**: Summaries are tokenized and cleaned, with numeric values standardized as integers.
- **Standard Splits**: Data is divided into training, validation, and test sets for machine learning experiments.

## Philosophy

The dataset is designed to support reproducible research in natural language generation, focusing on the challenge of transforming structured sports data into fluent, informative text.
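
As a rough illustration of how a summary aligned with structured statistics might be used, the sketch below checks which numbers in a tokenized summary are backed by the structured data. The record, its field names (`home_team`, `visiting_team`, `players`, `summary`), and all values are hypothetical placeholders chosen for this example. They are not the dataset's confirmed schema, so the keys would need adjusting to match the actual files.

```python
import json

# Hypothetical record: every field name and value below is an illustrative
# assumption, not the dataset's documented schema.
SAMPLE = json.loads("""
{
  "home_team": {"name": "Home", "points": 110},
  "visiting_team": {"name": "Away", "points": 102},
  "players": [
    {"name": "Player A", "team": "Home", "points": 27},
    {"name": "Player B", "team": "Away", "points": 31}
  ],
  "summary": ["The", "Home", "team", "beat", "the", "Away", "team",
              "110", "-", "102", "."]
}
""")


def mentioned_numbers(record):
    """Return the integer tokens found in the tokenized summary, in order."""
    return [int(tok) for tok in record["summary"] if tok.isdigit()]


def supported_scores(record):
    """Return summary numbers that match a team's point total in the data."""
    team_points = {
        record["home_team"]["points"],
        record["visiting_team"]["points"],
    }
    return [n for n in mentioned_numbers(record) if n in team_points]


def top_scorer(record):
    """Return the name of the player with the most points in the record."""
    return max(record["players"], key=lambda p: p["points"])["name"]


if __name__ == "__main__":
    print(mentioned_numbers(SAMPLE))
    print(supported_scores(SAMPLE))
    print(top_scorer(SAMPLE))
```

A check along these lines is one simple way to see whether generated text stays faithful to the input statistics. Evaluation in practice would likely need to cover player-level figures and number words as well.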
This dataset gathers 728,321 biographies from Wikipedia. It is intended for evaluating text generation algorithms.
Computer-generated weather forecasts from weather.gov (US public forecast), along with the corresponding weather data.