An open-source benchmark toolkit for Natural Language Generation in spoken dialogue systems, featuring multiple RNN-based models and datasets.
RNNLG is an open-source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue systems. It provides a collection of datasets and implementations of RNN-based models that generate fluent, human-like text from structured meaning representations such as dialogue acts. The toolkit addresses the need for standardized evaluation and reproducible research in NLG for conversational AI.
Researchers and developers working on Natural Language Generation, particularly in spoken dialogue systems, who need benchmark datasets and model implementations for experimentation and evaluation.
RNNLG offers a unified framework with multiple RNN-based generators, baseline models, and cross-domain datasets, enabling reproducible benchmarking and domain adaptation research without building infrastructure from scratch.
RNNLG is an open-source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen of the Cambridge Dialogue Systems Group under the Apache License 2.0.
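To make "structured meaning representation" concrete, here is a minimal sketch of turning a dialogue act string into an act type and slot-value pairs. The `act(slot=value;slot=value)` notation, the example string, and the `parse_dialogue_act` helper are assumptions made for illustration; they are not taken from the toolkit's code and may differ from its actual data format.

```python
import re


def parse_dialogue_act(da):
    """Split a string like "inform(name='x';food='y')" into (act, slots).

    Hypothetical helper for illustration only; not part of RNNLG.
    """
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", da)
    if match is None:
        raise ValueError(f"not a dialogue act: {da!r}")
    act, body = match.groups()

    slots = {}
    # Slot-value pairs are assumed to be separated by ';'.
    for pair in filter(None, (p.strip() for p in body.split(";"))):
        slot, _, value = pair.partition("=")
        # A slot without a value (e.g. "request(area)") maps to None.
        slots[slot.strip()] = value.strip().strip("'\"") or None
    return act, slots


act, slots = parse_dialogue_act("inform(name='red door cafe';food='chinese')")
print(act, slots)
```

A generator in this setting would take the resulting `(act, slots)` pair as input and produce a sentence that realizes it.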
Includes datasets for four original domains plus counterfeited datasets for cross-domain adaptation, supporting the standardized, reproducible evaluation described in Wen et al., 2016.
Provides RNN-based generators such as the Semantically Conditioned LSTM and the Attentive Encoder-Decoder LSTM, plus non-neural baselines (kNN, N-gram), allowing for comparative studies in NLG.
Supports both Maximum Likelihood training and Discriminative Training (Expected BLEU), giving a choice of optimization objectives that is exposed through the configuration parameters.
Pairs the counterfeited datasets with adaptation methods for cross-domain scenarios, which are central to research on multi-domain spoken dialogue systems.
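The Expected BLEU objective mentioned above can be pictured as follows. This is a generic sketch of the idea, not RNNLG's implementation: it assumes the model's log-probabilities for a handful of candidate sentences are renormalized over that candidate list, and that each candidate already has a sentence-level quality score such as BLEU. The `expected_score` function and the numbers fed to it are hypothetical.

```python
import math


def expected_score(log_probs, scores):
    """Expectation of `scores` under the softmax of `log_probs`.

    log_probs: model log-probability of each candidate sentence.
    scores:    a quality score per candidate (e.g. sentence-level BLEU).
    Hypothetical helper for illustration only; not part of RNNLG.
    """
    if len(log_probs) != len(scores) or not log_probs:
        raise ValueError("need one score per candidate, and at least one candidate")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_probs)
    weights = [math.exp(lp - peak) for lp in log_probs]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, scores))


# Three candidates with made-up log-probabilities and made-up scores.
value = expected_score([-1.2, -0.7, -2.5], [0.41, 0.63, 0.20])
print(round(value, 3))
```

Maximum Likelihood training raises the probability of the reference sentence only, whereas an objective of this shape rewards shifting probability mass toward whichever candidates score well, so training would maximize this expectation (or minimize its negative).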
Depends on Theano 0.8.2, which is no longer maintained, causing compatibility issues with modern Python environments and libraries.
Limits its neural models to RNN-based architectures, missing contemporary ones such as Transformers that dominate current NLG research and applications.
Requires manual tweaking of numerous config parameters and specific package versions, making initial setup error-prone and time-consuming.
Datasets are limited to specific dialogue domains and haven't been updated since 2016, restricting use cases and failing to reflect modern NLG challenges.