Open-Awesome

DrQA

Python

A PyTorch implementation of the DrQA model for reading comprehension and open-domain question answering.

401 stars · 110 forks · 0 contributors

What is DrQA?

DrQA is a PyTorch implementation of a reading comprehension model designed to answer open-domain questions by processing evidence text. It solves the problem of extracting precise answers from natural language paragraphs, as benchmarked by the SQuAD dataset, with a focus on simplicity and strong performance.

Target Audience

Researchers and developers in natural language processing who need a clean, modifiable codebase for experimenting with reading comprehension models and question-answering systems.

Value Proposition

Developers choose this implementation for its lightweight, focused design that strips away unnecessary complexity, making it easier to understand, modify, and iterate on compared to bulkier official versions or chatbot frameworks.

Overview

A PyTorch implementation of the paper "Reading Wikipedia to Answer Open-Domain Questions".

Use Cases

Best For

  • Experimenting with reading comprehension model architectures
  • Training and evaluating models on the SQuAD benchmark
  • Rapid prototyping of new ideas in question answering
  • Educational purposes for learning NLP model implementation
  • Comparing performance of grammatical feature pipelines (e.g., spaCy vs. CoreNLP)
  • Single-GPU research environments requiring a simplified codebase

Not Ideal For

  • Projects requiring integrated document retrieval for open-domain question answering
  • Teams needing multi-GPU training for scaling experiments or faster iteration
  • Applications demanding a production-ready interactive API or chatbot framework

Pros & Cons

Pros

SQuAD Benchmark Competitiveness

Achieves Exact Match (EM) and F1 scores close to those of the original paper and the official implementation; the project's results table reports EM 69.64 and F1 78.76.
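For readers unfamiliar with the two metrics, the sketch below shows how SQuAD-style Exact Match and token-level F1 are commonly computed for a single question. It follows the usual SQuAD v1.1 normalization (lowercasing, dropping punctuation and the articles "a", "an", "the"), but it is an illustration only, not this project's evaluation code. The function names `normalize_answer`, `exact_match`, `f1`, and `score` are hypothetical.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction: str, truths: list[str]) -> tuple[float, float]:
    """Best EM and best F1 over all reference answers for one question."""
    return (
        max(exact_match(prediction, t) for t in truths),
        max(f1(prediction, t) for t in truths),
    )
```

Dataset-level figures such as the ones quoted above are the mean of these per-question scores, expressed as percentages.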

Clean, Focused Codebase

Omits features beyond the reader itself, such as document retrieval, leaving a lightweight implementation that is easy to read, modify, and experiment with on SQuAD tasks, in line with the project's stated philosophy.

Efficient Grammatical Processing

Uses spaCy for lemma, POS, and named entity tags, which is faster than Stanford CoreNLP while maintaining similar accuracy, as noted in the detailed comparisons section.

Simplified Single-GPU Setup

Optimized for single-GPU environments, reducing configuration complexity and making it accessible for research on standard hardware, unlike bulkier official versions.

Cons

Missing Core Features

Lacks the document retriever and interactive inference API present in the full DrQA system, so it only applies when the evidence text is already provided; the project's own comparison acknowledges this.

No Multi-GPU Support

Only supports single-GPU training, which can slow down training times and hinder scalability compared to implementations that leverage multiple GPUs, as noted in the README.

High Memory Footprint

Preprocessing requires approximately 9GB of memory with default settings, as the setup instructions warn, which may be prohibitive for machines with limited RAM.

Based on Older Architecture

Implements a 2017 model without updates for modern advancements like transformers, making it less competitive with newer models that surpass DrQA on benchmarks.

Quick Stats

Stars: 401
Forks: 110
Contributors: 0
Open issues: 8
Last commit: 4 years ago
Created: 2017

Tags

#research-tool #squad #spacy #question-answering #natural-language-processing #reading-comprehension #squad-dataset #machine-learning #pytorch

Built With

  • MsgPack
  • spaCy
  • Python
  • NumPy
  • PyTorch

Included in

Question Answering (767)

Related Projects

BERT

TensorFlow code and pre-trained models for BERT

Stars: 40,021
Forks: 9,715
Last commit: 1 year ago

BiDAF

Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.

Stars: 1,541
Forks: 670
Last commit: 3 years ago

QANet

A Tensorflow implementation of QANet for machine reading comprehension

Stars: 985
Forks: 298
Last commit: 8 years ago

R-Net

Tensorflow Implementation of R-Net

Stars: 577
Forks: 209
Last commit: 7 years ago