An open-source project for neural question generation using transformers, providing simplified training and inference pipelines.
Question Generation is an open-source project that uses transformer models to automatically generate questions from text passages. It implements several approaches, including answer-aware question generation, multi-task QA-QG, and end-to-end question generation, and provides easy-to-use pipelines and training scripts that make neural question generation more accessible and practical.
NLP researchers, machine learning engineers, and developers working on educational technology, content automation, or quiz generation who need to automatically create questions from textual data.
It offers a simplified, end-to-end approach to question generation using state-of-the-art transformers, with pre-trained models and reproducible scripts that reduce the complexity typically associated with QG systems. The multi-task model consolidates answer extraction, QG, and QA into a single pipeline, streamlining deployment.
Neural question generation using transformers
Consolidates answer extraction, question generation, and question answering into a single model, reducing system complexity as described in the multitask QA-QG section.
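The multi-task idea can be sketched in a few lines: a single text-to-text model serves all three tasks, distinguished only by a prefix in the input string. The prefix strings below are illustrative assumptions, not necessarily the exact tokens the project uses.

```python
# Illustrative sketch of multi-task routing for one text-to-text model.
# Prefix strings are assumptions for illustration; the project's exact
# prefixes may differ.

TASK_PREFIXES = {
    "answer-extraction": "extract answers:",
    "question-generation": "generate question:",
    "question-answering": "question:",
}

def build_input(task: str, text: str) -> str:
    """Route every task through the same model by prefixing the input."""
    return f"{TASK_PREFIXES[task]} {text}"

# All three tasks share one model; only the input format changes.
passage = "Python was created by Guido van Rossum."
for task in TASK_PREFIXES:
    print(build_input(task, passage))
```

Because task identity lives entirely in the input text, deployment needs only one checkpoint and one tokenizer, which is where the reduction in system complexity comes from.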
Provides Hugging Face-style pipelines for quick deployment across different QG tasks, with clear usage examples in the README for answer-aware and end-to-end generation.
Includes data processing and fine-tuning scripts for T5 models, supporting custom datasets and ensuring experimental reproducibility with tools like wandb integration.
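To make the preprocessing step concrete, here is a hedged sketch of turning one SQuAD-style record into a (source, target) pair for answer-aware QG fine-tuning, using the highlight format. The field names follow the standard SQuAD JSON schema; the project's actual scripts may process records differently.

```python
# Hedged sketch: one SQuAD-style record -> (source, target) training pair
# for answer-aware QG with a T5-style model, using the highlight format.
# Field names follow the SQuAD schema; <hl> is an assumed highlight token.

def make_qg_example(record: dict) -> tuple:
    context = record["context"]
    answer = record["answers"]["text"][0]
    start = record["answers"]["answer_start"][0]
    end = start + len(answer)
    # Mark the answer span in place with <hl> tokens.
    highlighted = f"{context[:start]}<hl> {answer} <hl>{context[end:]}"
    source = f"generate question: {highlighted}"
    target = record["question"]
    return source, target

record = {
    "context": "The Eiffel Tower is located in Paris.",
    "question": "Where is the Eiffel Tower located?",
    "answers": {"text": ["Paris"], "answer_start": [31]},
}
src, tgt = make_qg_example(record)
```

Mapping a dataset of such records through a function like this yields the parallel source/target text files that seq2seq fine-tuning scripts typically consume.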
Supports both prepend and highlight formats for answer-aware QG, allowing users to choose based on task requirements, as detailed in the initial experiments section.
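The difference between the two formats comes down to where the answer appears in the model input. A minimal sketch, assuming conventional token choices (the project's exact separator tokens may differ):

```python
# Minimal sketch of the two answer-aware input formats.
# The "answer: ... context:" prefix and the <hl> token are common
# conventions assumed here for illustration.

def prepend_format(answer: str, context: str) -> str:
    # Prepend: the answer text goes ahead of the full context.
    return f"answer: {answer} context: {context}"

def highlight_format(answer: str, context: str) -> str:
    # Highlight: the answer span is marked in place with <hl> tokens.
    return context.replace(answer, f"<hl> {answer} <hl>", 1)

ctx = "Marie Curie won two Nobel Prizes."
print(prepend_format("Marie Curie", ctx))
print(highlight_format("Marie Curie", ctx))
```

Highlighting preserves the answer's position in the passage, which can help when the same string occurs more than once; prepending keeps the context untouched and is simpler to construct.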
Pins transformers==3.0.0, an old release that lacks the optimizations, features, and security updates of newer versions, so using the project alongside a modern stack requires manual compatibility work.
Relies on T5 models, which are resource-intensive to train and run, making the project challenging to use in environments with limited GPU memory or budget.
Pre-trained models are fine-tuned primarily on SQuAD, so performance may degrade on texts from specialized domains without additional fine-tuning; this is a deliberate trade-off given the project's stated focus on simplicity over breadth.