Question 1

How do I set up RC-Data on a Linux system?

Accepted Answer

Install Python 2.7, virtualenv, and the development packages libxml2-dev and libxslt-dev. Then follow the README steps: create a virtual environment, install the requirements into it, and run the download step followed by the generate step. The dependencies are old, so expect some of them to fail to install or build on a current distribution.
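Before creating the environment, it can help to confirm that the command-line prerequisites are on the PATH. The sketch below is an illustration and not part of RC-Data: `missing_tools` is a hypothetical helper, the executable name `python2.7` is an assumption about your distribution, and the check only finds executables, so it cannot detect development packages such as libxml2-dev. It uses `shutil.which`, so the checker itself needs Python 3.3 or later.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that `which` cannot locate on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    # Executable names are assumptions; adjust them to your distribution.
    absent = missing_tools(["python2.7", "virtualenv"])
    print("missing:", ", ".join(absent) if absent else "none")
```

The `which` parameter is injectable only so the helper can be exercised without touching the real PATH.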

Question 2

RC-Data vs SQuAD: which is better for QA research?

Accepted Answer

Neither is better in general. RC-Data builds cloze-style questions from news articles and anonymizes the entities, so a model has to answer from the given context rather than from prior knowledge. SQuAD is built on Wikipedia, with a diverse set of questions written by human annotators. RC-Data is larger and news-specific, while SQuAD is more widely adopted in recent benchmarks. Choose based on the domain and question style you care about.

Question 3

What should I do if RC-Data fails to download articles?

Accepted Answer

The README recommends downloading the preprocessed datasets from http://cs.nyu.edu/~kcho/DMQA/ as an alternative when the Wayback Machine is down or individual URLs are unavailable. That way you can still get the dataset without running the download script.
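If you automate the setup, the same fallback can be expressed as "try each source in order and keep the first that works". The following is a minimal sketch of that pattern only. `first_successful`, `wayback_download`, and `preprocessed_download` are hypothetical names, and the two download functions are stand-ins that simulate an outage rather than performing real network calls.

```python
def first_successful(sources):
    """Try each (label, fetch) pair in order and return (label, result)
    for the first fetch that does not raise."""
    errors = []
    for label, fetch in sources:
        try:
            return label, fetch()
        except Exception as exc:  # broad on purpose: any failure moves on
            errors.append("%s: %s" % (label, exc))
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def wayback_download():
    # Stand-in for the download step; simulates the Wayback Machine being down.
    raise IOError("Wayback Machine unavailable")


def preprocessed_download():
    # Stand-in for fetching the preprocessed data from the URL above.
    return "preprocessed archive"


label, data = first_successful([
    ("wayback", wayback_download),
    ("preprocessed", preprocessed_download),
])
print("used source:", label)
```

Collecting the error messages instead of discarding them makes it easier to see why the primary source failed when the fallback is used.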

Question 4

Can I use RC-Data with Python 3?

Accepted Answer

Not out of the box. The prerequisites list Python 2.7, and the dependencies may not be compatible with Python 3. Either run the script in a virtual environment built on Python 2.7, or port the code yourself, which can be a substantial effort.
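A small guard can make a wrong interpreter fail early with a clear message. This is a sketch, not code from RC-Data; `is_supported` and `REQUIRED` are hypothetical names, and the snippet is written to run on both Python 2.7 and Python 3.

```python
import sys

REQUIRED = (2, 7)  # the version named in the prerequisites


def is_supported(version_info=None):
    """Return True when (major, minor) of the interpreter equals REQUIRED."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == REQUIRED


if __name__ == "__main__":
    if not is_supported():
        sys.stderr.write(
            "RC-Data expects Python 2.7; found %d.%d\n" % sys.version_info[:2]
        )
```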

Question 5

How many question/answer pairs does RC-Data generate?

Accepted Answer

For the Daily Mail corpus, the script generates approximately 1,000,000 small files. That points to a large-scale dataset of context-question-answer triples, suitable for extensive model training and evaluation.
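To check how many files a run actually produced, you can count the entries under the output directory. A minimal sketch follows; `count_files` is a hypothetical helper, and the directory is taken from the command line because the output layout is not described here.

```python
import os
import sys


def count_files(root):
    """Count non-directory entries under `root`, including subdirectories."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(root))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the output directory as the first argument.
    print(count_files(sys.argv[1]))
```

With around a million small files, a walk like this can take a while on slow disks, and the directory may be awkward to browse with ordinary file managers.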

Question 6

What is the entity mapping in RC-Data output files?

Accepted Answer

The entity mapping is the record of the anonymization step. Named entities are replaced with placeholders in the questions and answers, and each output file includes the mapping from placeholder back to the original entity, so you can reconstruct the original text or analyze the entities. The details are in the dataset format section of the README.
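As an illustration of how such a mapping can be applied, the sketch below restores entity names in an anonymized string. The placeholder form `@entityN` and the mapping line form `@entityN:Name` are assumptions made for this example, so check them against the dataset format section before relying on them. `parse_mapping` and `deanonymize` are hypothetical helpers, and the names in the sample data are invented.

```python
import re

# Assumed placeholder shape: "@entity" followed by digits.
PLACEHOLDER = re.compile(r"@entity\d+")


def parse_mapping(lines):
    """Build {placeholder: name} from lines of the assumed form '@entity3:Name'."""
    mapping = {}
    for line in lines:
        placeholder, sep, name = line.strip().partition(":")
        if sep:  # skip blank lines and lines without a colon
            mapping[placeholder] = name
    return mapping


def deanonymize(text, mapping):
    """Replace each placeholder with its name; unknown placeholders stay as-is."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


mapping = parse_mapping(["@entity1:Alice Example", "@entity10:Example City"])
print(deanonymize("@entity1 visited @entity10", mapping))
```

Matching whole placeholders with a regular expression, rather than chaining `str.replace` calls, avoids rewriting `@entity10` as the name for `@entity1` followed by a stray `0`.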
