A latent text-to-image diffusion model that generates detailed images from text prompts, running on GPUs with at least 10GB VRAM.
Stable Diffusion is an open-source latent text-to-image diffusion model that generates detailed images from natural language prompts. It achieves high-quality image synthesis efficiently by operating in a compressed latent space rather than in pixel space, which makes it runnable on GPUs with as little as 10GB of VRAM. The model is conditioned on text embeddings from a frozen CLIP ViT-L/14 encoder and is trained on subsets of the LAION-5B dataset.
AI researchers, machine learning engineers, and developers working on generative art, creative tools, or applications requiring text-to-image synthesis. It's also suitable for hobbyists with capable GPU hardware.
Developers choose Stable Diffusion for its open-source nature, relatively lightweight architecture compared to alternatives like Imagen, and strong community support through integrations like diffusers. It offers a balance of high-quality output and computational efficiency.
Stable Diffusion is an open-source alternative to the following products: