A GPT-2 variant that generates plausible fake words, definitions, and usage examples from scratch.
This Word Does Not Exist is a machine learning project that uses a GPT-2 variant to generate entirely new words, complete with definitions, parts of speech, and example sentences. It tackles the creative challenge of inventing vocabulary that sounds authentic but is not a real word.
It is aimed at AI researchers, NLP enthusiasts, and developers interested in creative language generation and the limits of machine learning models.
It provides a fully open-source pipeline for training and deploying word-invention models, with pre-trained models available for immediate use and tools for custom training on various dictionary sources.
This Word Does Not Exist
Generates entirely new, pronounceable words with plausible definitions and example sentences, as demonstrated by outputs like 'incromulentness' in the README.
Supports both forward (word to definition) and inverse (definition to word) models, so you can either define a made-up word or coin a word for a given definition; pre-trained models are available for download.
Provides tools to train models on custom dictionary sources, such as Apple dictionaries or Urban Dictionary, using scripts and notebooks outlined in the training section.
Offers downloadable pre-trained models and a simple WordGenerator class for quick word and definition generation without training, as shown in the inference code snippet.
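As a rough illustration of the quick-start path described above, inference might look like the sketch below. The import path `title_maker_pro.word_generator`, the constructor keywords, and the three `generate_*` method names are assumptions recalled from the project's README and should be checked against the repository. The helper names `load_generator` and `sample_outputs` are hypothetical and exist only for this example.

```python
def load_generator(forward_model_path, inverse_model_path, blacklist_path, quantize=False):
    """Build a CPU generator; needs the project's code and the downloaded models."""
    # Assumption: import path and keyword arguments as recalled from the README.
    # The import is deferred so this file loads even without the project installed.
    from title_maker_pro.word_generator import WordGenerator

    return WordGenerator(
        device="cpu",
        forward_model_path=forward_model_path,
        inverse_model_path=inverse_model_path,
        blacklist_path=blacklist_path,
        quantize=quantize,
    )


def sample_outputs(generator, made_up_word, definition):
    """Exercise the three generation modes on any object exposing these methods."""
    return {
        # A brand-new word generated from scratch.
        "new_word": generator.generate_word(),
        # Forward direction: a definition for a word you supply.
        "definition": generator.generate_definition(made_up_word),
        # Inverse direction: a new word for a definition you supply.
        "word_for_definition": generator.generate_word_from_definition(definition),
    }
```

With the models downloaded, you would call `load_generator` with your own local paths to the forward model, inverse model, and blacklist file, then pass the result to `sample_outputs`.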
Requires managing Python environments, downloading large model files, and configuring paths. The separate CPU deploy environment and model download steps make quick deployment cumbersome.
Relies on notebooks and code snippets rather than comprehensive tutorials, making it less accessible for beginners or those unfamiliar with machine learning workflows.
Inference and training are computationally heavy, especially on CPU. Options such as quantization are needed to keep performance manageable, which limits use in low-resource environments.
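To make the quantization trade-off concrete, here is a minimal pure-Python sketch of symmetric 8-bit quantization. It is not the project's code, and this page does not say which scheme the project uses. The function names and sample weights are hypothetical. The idea is to store each weight as a small integer plus one shared scale factor, giving up a little precision for less memory and cheaper arithmetic.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights only; real models hold millions of such values.
weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Each restored value differs from the original by at most half a step.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
```

Small weights such as `0.003` collapse to zero after rounding, which is the precision loss that makes quantization a trade-off rather than a free speed-up.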