A TensorFlow implementation of QANet for machine reading comprehension on the SQuAD dataset.
This project is a TensorFlow implementation of the QANet neural network architecture for machine reading comprehension. It addresses the task of answering questions about a provided text passage, and is trained and evaluated on the Stanford Question Answering Dataset (SQuAD). The model replaces traditional recurrent layers with convolutional and self-attention mechanisms to improve training speed and performance.
Machine learning researchers and developers working on natural language processing, particularly those focused on question answering, reading comprehension tasks, or experimenting with hybrid convolutional-attention models.
Developers choose this implementation for a practical, open-source TensorFlow version of QANet that includes training pipelines, an interactive demo, and documented adaptations for hardware constraints, offering an accessible starting point for SQuAD-based projects.
A TensorFlow implementation of QANet for machine reading comprehension
Replaces RNNs with depthwise separable convolutions and self-attention layers for faster training and inference, in keeping with the efficiency goals stated in the original paper.
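To make the idea concrete, here is a minimal sketch of a depthwise separable convolution over a sequence, written against TF1-style APIs to match the project's Python 2.7 requirement; the function and variable names are illustrative, not taken from the repo:

```python
import tensorflow as tf

def depthwise_separable_conv(inputs, kernel_size, num_filters, scope="sep_conv"):
    """Depthwise convolution along the sequence, then a pointwise (1x1)
    projection -- the building block QANet uses in place of RNN layers."""
    with tf.variable_scope(scope):
        # Treat [batch, length, dim] as a height-1 image: [batch, 1, length, dim].
        x = tf.expand_dims(inputs, axis=1)
        dim = x.get_shape().as_list()[-1]
        depthwise = tf.get_variable("depthwise_filter",
                                    shape=[1, kernel_size, dim, 1])
        pointwise = tf.get_variable("pointwise_filter",
                                    shape=[1, 1, dim, num_filters])
        x = tf.nn.separable_conv2d(x, depthwise, pointwise,
                                   strides=[1, 1, 1, 1], padding="SAME")
        # Back to [batch, length, num_filters].
        return tf.squeeze(tf.nn.relu(x), axis=1)
```

The depthwise pass convolves each channel independently, so the parameter count is roughly dim * (kernel_size + num_filters) instead of the dim * kernel_size * num_filters of a standard convolution, which is where the speedup comes from.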
Includes scripts for data preprocessing, training, testing, and an interactive demo server, with the pipeline adapted from an R-Net implementation; each mode is selected via config.py.
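Assuming the mode dispatch works as described above (config.py takes a --mode flag), a typical workflow might look like the following; the exact mode names are an assumption and should be checked against the repo:

```
python config.py --mode prepro   # build training records from SQuAD
python config.py --mode train    # train the model
python config.py --mode test     # evaluate EM/F1 on the dev set
python config.py --mode demo     # launch the interactive demo server
```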
Provides a results table comparing performance with the original paper and explains adaptations like reduced hidden size and single-head attention due to GPU constraints.
Employs dropout, stochastic depth (layer dropout), and an exponential moving average of the weights to prevent overfitting and stabilize training, as detailed in the implementation notes.
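A minimal sketch of two of these techniques follows: stochastic depth on a residual sublayer, plus an exponential moving average of the trainable variables. The decay, dropout probability, and toy loss are illustrative, not the repo's configuration:

```python
import tensorflow as tf

def layer_dropout(outputs, residual, drop_prob):
    """Stochastic depth: with probability drop_prob skip the sublayer and
    pass the residual through; otherwise apply ordinary dropout to the
    sublayer output before the residual connection."""
    keep = tf.random_uniform([]) >= drop_prob
    return tf.cond(keep,
                   lambda: residual + tf.nn.dropout(outputs, 1.0 - drop_prob),
                   lambda: residual)

# Exponential moving average: shadow copies of the trainable variables are
# updated after every optimizer step and swapped in at evaluation time.
w = tf.get_variable("w", shape=[4], initializer=tf.ones_initializer())  # toy weight
loss = tf.reduce_sum(tf.square(w))                                      # toy loss
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
ema = tf.train.ExponentialMovingAverage(decay=0.9999)
with tf.control_dependencies([train_op]):
    train_op = ema.apply(tf.trainable_variables())
```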
Uses single-head attention and a hidden size of 96 instead of the paper's 8 heads and 128 due to GPU memory limits, yielding lower EM/F1 scores (70.8/80.1 vs. the paper's 73.6/82.7), as acknowledged in the README.
The TODO list shows features still missing, such as data augmentation and training with the paper's full hyperparameters, so the implementation does not yet match the capability of the original QANet architecture.
Requires Python >= 2.7 and pinned legacy libraries such as spacy==2.0.9, which may cause compatibility issues on modern systems and add setup complexity.
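Given those pins, it is safest to create the environment explicitly. This is a hedged sketch under a Python 2.7 virtualenv; only the spacy version is documented above, and the use of spaCy for tokenization is an assumption:

```
pip install "spacy==2.0.9"
python -m spacy download en   # spaCy 2.x English model, assuming spaCy handles tokenization
```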
TensorFlow code and pre-trained models for BERT
The Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.