TensorFlow implementation of character-aware neural language models using CNN, highway networks, and LSTM.
lstm-char-cnn-tensorflow is a TensorFlow implementation of character-aware neural language models that process text at both character and word levels. It runs convolutional neural networks over character embeddings to build a representation of each word, passes that representation through highway networks, and feeds the result to LSTM layers, so the model can capture morphological patterns within words as well as long-range dependencies across them.
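The character-to-word pipeline described above can be illustrated with a minimal pure-Python sketch. It is not taken from the repository: the function names, the tanh and ReLU nonlinearities, the zero-padding of short words, and the toy dimensions are all assumptions chosen for illustration, and the LSTM stage is omitted.

```python
import math
import random


def char_cnn_word_vector(char_vectors, kernels):
    """Slide each kernel over the character embeddings, then max-pool over time.

    char_vectors: list of per-character embedding vectors for one word.
    kernels: list of kernels; each kernel is a list of `width` weight rows.
    Returns one feature per kernel (hypothetical helper, not the repo's API).
    """
    dim = len(char_vectors[0])
    features = []
    for kernel in kernels:
        width = len(kernel)
        # Assumption: pad short words with zero vectors so every kernel fits.
        padded = char_vectors + [[0.0] * dim] * max(0, width - len(char_vectors))
        responses = []
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            total = sum(
                w * c
                for w_row, c_row in zip(kernel, window)
                for w, c in zip(w_row, c_row)
            )
            responses.append(math.tanh(total))
        features.append(max(responses))  # max-over-time pooling
    return features


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = t * relu(W_H x + b_H) + (1 - t) * x."""
    out = []
    for i in range(len(x)):
        pre_h = sum(w * v for w, v in zip(w_h[i], x)) + b_h[i]
        pre_t = sum(w * v for w, v in zip(w_t[i], x)) + b_t[i]
        h = max(0.0, pre_h)   # candidate transform (ReLU assumed)
        t = sigmoid(pre_t)    # transform gate in (0, 1)
        out.append(t * h + (1.0 - t) * x[i])
    return out


# Toy usage: a 4-character word, 4-dimensional character embeddings,
# and three kernels of widths 1, 2 and 3 (all values random placeholders).
random.seed(0)
chars = [[random.uniform(-1, 1) for _ in range(4)] for _ in "cats"]
kernels = [
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(width)]
    for width in (1, 2, 3)
]
word_vector = char_cnn_word_vector(chars, kernels)

n = len(word_vector)
zeros = [[0.0] * n for _ in range(n)]
# A strongly negative gate bias makes the layer carry its input through.
mixed = highway(word_vector, zeros, [0.0] * n, zeros, [-30.0] * n)
```

In a full model, the highway output for each word would be fed to the LSTM in sequence. The negative gate bias in the usage lines shows the carry behaviour that lets a highway layer start out close to the identity.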
Machine learning researchers and NLP practitioners interested in character-level language modeling, particularly those working with TensorFlow who want to experiment with hybrid character-word architectures.
This implementation provides a clean, modular codebase for the influential Character-Aware Neural Language Models paper, offering researchers a starting point for experimenting with character-level representations in language models without building from scratch.
in progress
Provides a clean TensorFlow reimplementation of the architecture from the Character-Aware Neural Language Models paper, including the character-level CNNs, highway networks, and LSTM layers it describes.
The code is organized into separate components for CNN, highway network, and LSTM, making it easy to modify and experiment with different aspects of the model.
Combines character embeddings for morphological features with optional word embeddings, selected through configurable flags, so the model can capture both subword and word-level linguistic patterns.
Supports LSTM and LSTMTDNN model variants with numerous command-line parameters for adjusting embedding dimensions, CNN kernels, and dropout, enabling customization for research.
The README explicitly states that the implementation has not matched the paper's perplexity scores on the Penn Treebank (PTB), and results were still marked 'in progress' as of 2016, which undermines its reliability for benchmarking.
Issue #3 highlights significant efficiency problems in the implementation, affecting training speed and making it unsuitable for large-scale or time-sensitive experiments.
Relies on Python 2.7, which reached end of life in January 2020, and on older TensorFlow versions, so it is incompatible with many modern tools and is likely to need migration work before it runs in a current environment.
Lacks extensive tutorials or updates, and the author directs users to another repository for reproduced results, indicating the project is not actively maintained.