Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

NarrativeQA

Apache-2.0 · Shell

A reading comprehension dataset with Wikipedia summaries, full stories, and question-answer pairs for narrative understanding.

513 stars · 69 forks · 0 contributors

What is NarrativeQA?

NarrativeQA is a reading comprehension dataset created by DeepMind for evaluating machine understanding of entire narratives. It consists of documents with Wikipedia summaries, links to full stories (books and movie scripts), and corresponding question-answer pairs. The dataset challenges models to comprehend long-form text rather than just extract factual information from short passages.

Target Audience

Researchers and developers working on natural language processing, particularly in reading comprehension, question answering, and narrative understanding tasks. It's especially relevant for those building or evaluating models that need to understand long documents and complex storylines.

Value Proposition

Unlike many QA datasets that focus on extracting factoid answers from short passages, NarrativeQA asks questions that require understanding entire narratives, which makes it a harder and more realistic test. Because its answers are free-form rather than copied spans, it provides a benchmark for reading comprehension that goes beyond span matching.

Overview

This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
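As a rough illustration of how these pieces fit together, the Python sketch below groups question-answer pairs under their documents for one split. It assumes the column names document_id, set, question, answer1, answer2 and wiki_title, which match our reading of the documents.csv and qaps.csv headers; check them against your copy. The helper names load_rows and group_qa_by_document are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_rows(handle):
    """Read a CSV file handle into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def group_qa_by_document(documents, qaps, split="train"):
    """Map document_id -> (document row, [(question, [answer1, answer2]), ...]).

    Assumes both files carry "document_id" and "set" columns and that
    qaps.csv stores two reference answers in "answer1" and "answer2".
    """
    docs = {d["document_id"]: d for d in documents if d["set"] == split}
    grouped = defaultdict(list)
    for qa in qaps:
        if qa["set"] == split and qa["document_id"] in docs:
            grouped[qa["document_id"]].append(
                (qa["question"], [qa["answer1"], qa["answer2"]])
            )
    return {doc_id: (docs[doc_id], pairs) for doc_id, pairs in grouped.items()}


if __name__ == "__main__":
    # Tiny in-memory stand-ins for documents.csv and qaps.csv.
    documents = load_rows(io.StringIO(
        "document_id,set,kind,wiki_title\n"
        "d1,train,gutenberg,Example Novel\n"
        "d2,test,movie,Example Film\n"
    ))
    qaps = load_rows(io.StringIO(
        "document_id,set,question,answer1,answer2\n"
        "d1,train,Who narrates the story?,The captain,The ship's captain\n"
        "d2,test,Where does it end?,In Paris,Paris\n"
    ))
    for doc_id, (doc, pairs) in group_qa_by_document(documents, qaps).items():
        print(doc_id, doc["wiki_title"], pairs)
```

With the real files, open each one with open(path, newline="", encoding="utf-8") and pass the handle to load_rows.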

Use Cases

Best For

  • Training reading comprehension models on long-form narratives
  • Benchmarking question-answering systems on complex stories
  • Research on narrative understanding and story comprehension
  • Evaluating NLP models' ability to track plot and character relationships
  • Developing models that can answer questions about entire books or movies
  • Comparing machine comprehension against human narrative understanding
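For benchmarking, answers are free-form text, so systems are usually scored with n-gram overlap metrics; the NarrativeQA paper reports BLEU, METEOR and ROUGE-L. The sketch below computes a balanced ROUGE-L F1 over lowercase word tokens and keeps the best score across the reference answers. It is a simplified stand-in, not the official evaluation: standard ROUGE-L implementations may weight recall differently and tokenize differently, so its numbers will not match published results.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Balanced (beta = 1) ROUGE-L F-score between two strings."""
    cand, ref = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_rouge_l(candidate, references):
    """Score a prediction against several reference answers, keeping the best."""
    return max(rouge_l_f1(candidate, ref) for ref in references)


if __name__ == "__main__":
    print(best_rouge_l("the ship's captain", ["The captain", "A sailor"]))
```

Averaging best_rouge_l over every question in a split gives a single benchmark figure for a system.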

Not Ideal For

  • Projects requiring quick factoid question answering on short texts, such as answering questions about short news articles or simple information retrieval
  • Teams with limited storage or bandwidth, as downloading full stories (books, movie scripts) externally can be resource-intensive
  • Applications needing up-to-date or domain-specific content, since the dataset is static and limited to a fixed set of fictional narratives
  • Research focused solely on extractive QA or span prediction, as answers in NarrativeQA are abstractive and require narrative inference
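One hypothetical diagnostic for how far a span-prediction model could get is to count how many reference answers occur verbatim, as a contiguous run of tokens, in the text the model reads (for example a summary). The helpers below, is_extractive_span and extractive_fraction, are illustrative names rather than part of the repository, and use a simplified lowercase word tokenization; a low fraction would suggest that many reference answers cannot be reproduced exactly by copying a span.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def is_extractive_span(answer, context):
    """True if the answer's tokens appear as one contiguous run in the context."""
    ans, ctx = tokenize(answer), tokenize(context)
    if not ans:
        return False
    n = len(ans)
    return any(ctx[i:i + n] == ans for i in range(len(ctx) - n + 1))


def extractive_fraction(pairs):
    """Share of (answer, context) pairs whose answer is a verbatim span of its context."""
    if not pairs:
        return 0.0
    hits = sum(is_extractive_span(answer, context) for answer, context in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    sample = [
        ("Paris", "After the war she moves to Paris."),
        ("Because he is afraid of the dark", "He runs from the house at night."),
    ]
    print(extractive_fraction(sample))
```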

Pros & Cons

Pros

Long-Form Comprehension Focus

Specifically designed to test understanding of entire narratives, unlike datasets such as SQuAD that use short passages, which makes it well suited to benchmarking advanced reading comprehension on complex stories.

Rich Metadata and Summaries

Includes Wikipedia summaries, plus detailed per-document metadata in documents.csv such as word counts and source information, providing additional context for model training and evaluation.
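As an example of putting that metadata to use, the sketch below reports the median story length per document kind. It assumes documents.csv has kind, set and story_word_count columns, with kind distinguishing books from movie scripts (for instance "gutenberg" and "movie"); verify the exact column names and values in your copy before relying on it.

```python
import csv
import io
import statistics
from collections import defaultdict


def median_word_count_by_kind(handle, split=None):
    """Median story_word_count per document kind, optionally for one split only."""
    counts = defaultdict(list)
    for row in csv.DictReader(handle):
        if split is not None and row["set"] != split:
            continue
        value = row["story_word_count"].strip()
        if not value:
            continue  # skip rows without a word count
        counts[row["kind"]].append(int(value))
    return {kind: statistics.median(values) for kind, values in counts.items()}


if __name__ == "__main__":
    sample = io.StringIO(
        "document_id,set,kind,story_word_count\n"
        "d1,train,gutenberg,60000\n"
        "d2,train,gutenberg,80000\n"
        "d3,test,movie,25000\n"
    )
    print(median_word_count_by_kind(sample))
```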

Pre-Tokenized Data

Offers tokenized versions of questions and answers in qaps.csv, easing integration into NLP pipelines and reducing preprocessing overhead for researchers.
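Because the tokenized columns are already split, a vocabulary can be built by splitting on whitespace without running a tokenizer. The sketch assumes qaps.csv exposes question_tokenized, answer1_tokenized and answer2_tokenized columns containing space-separated tokens; the helper names and the <pad>/<unk> special tokens are illustrative choices, not part of the dataset.

```python
import csv
import io
from collections import Counter

TOKEN_COLUMNS = ("question_tokenized", "answer1_tokenized", "answer2_tokenized")
SPECIALS = ["<pad>", "<unk>"]


def build_vocab(handle, columns=TOKEN_COLUMNS, min_count=1):
    """Map each token seen at least min_count times to an integer id.

    Special tokens come first; remaining ids follow descending frequency,
    with ties broken alphabetically.
    """
    counter = Counter()
    for row in csv.DictReader(handle):
        for column in columns:
            counter.update(row[column].split())
    kept = sorted(
        (tok for tok, count in counter.items()
         if count >= min_count and tok not in SPECIALS),
        key=lambda tok: (-counter[tok], tok),
    )
    return {tok: idx for idx, tok in enumerate(SPECIALS + kept)}


def encode(tokenized_text, vocab):
    """Turn a space-separated token string into ids, mapping unknowns to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokenized_text.split()]


if __name__ == "__main__":
    sample = io.StringIO(
        "question_tokenized,answer1_tokenized,answer2_tokenized\n"
        "who is he ?,the king,a king\n"
    )
    vocab = build_vocab(sample)
    print(encode("who is the queen ?", vocab))
```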

High-Quality Curation

Created by DeepMind and described in a peer-reviewed paper, which lends it reliability and academic rigor; this shows in the structured files and citation guidelines.

Cons

External Story Dependency

Full stories are not included in the repository; users must download them separately with download_stories.sh, which can be time-consuming and is prone to link rot, and then verify the downloads with compare.sh.
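Since downloads can fail quietly when links rot, a quick completeness check before training can help. The sketch below is a hypothetical helper, not part of the repository, and complements rather than replaces compare.sh: it only flags documents whose story file is absent or empty. It assumes stories are saved one file per document, named by document_id plus a suffix (".content" here), in a single directory; adjust story_dir and suffix to match where download_stories.sh actually writes its output.

```python
import csv
import tempfile
from pathlib import Path


def missing_stories(documents_csv, story_dir, suffix=".content"):
    """Return document_ids whose story file is missing or empty in story_dir.

    The <document_id><suffix> naming scheme is an assumption; adjust it to
    match the files your download actually produced.
    """
    story_dir = Path(story_dir)
    missing = []
    with open(documents_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            path = story_dir / f"{row['document_id']}{suffix}"
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(row["document_id"])
    return missing


def demo():
    """Build a throwaway directory with one downloaded and one missing story."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "documents.csv").write_text(
            "document_id,set\nd1,train\nd2,test\n", encoding="utf-8"
        )
        Path(tmp, "d1.content").write_text("Once upon a time...", encoding="utf-8")
        return missing_stories(Path(tmp, "documents.csv"), tmp)


if __name__ == "__main__":
    print(demo())
```

A non-empty result lists the documents to re-download before building a training set.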

Limited Direct Accessibility

The repository itself provides summaries and QA pairs while the stories are hosted externally, so it is less self-contained than datasets that bundle all text and needs extra steps before it can be used in full.

Static and Niche Content

Focuses on a fixed set of books and movies, which may not generalize to other domains such as technical or real-time narratives, and has not been updated since its release.

Quick Stats

Stars: 513
Forks: 69
Contributors: 0
Open issues: 0
Last commit: 6 years ago
Created: 2017

Tags

#text-analysis #deep-learning #question-answering #natural-language-processing #reading-comprehension #dataset #machine-learning

Included in

Question Answering (767)

Related Projects

DeepMind QA Corpus

Question answering dataset featured in "Teaching Machines to Read and Comprehend"

Stars: 1,296
Forks: 240
Last commit: 9 years ago

ELI5

Scripts and links to recreate the ELI5 dataset.

Stars: 324
Forks: 42
Last commit: 4 years ago

NewsQA

Tools for using Maluuba's NewsQA Dataset (public version)

Stars: 257
Forks: 56
Last commit: 3 years ago