A Python tool that uses OpenAI's Whisper to automatically generate subtitle files for YouTube videos.
yt-whisper is a Python command-line tool that automatically generates subtitle files for YouTube videos using OpenAI's Whisper speech recognition model. It solves the problem of manually creating subtitles by automating the entire process from video download to transcription output. The tool makes videos more accessible and enables multilingual subtitle generation through translation features.
yt-whisper is aimed at content creators, video editors, accessibility specialists, and developers who need to generate accurate subtitles for YouTube videos efficiently. It's particularly useful for those working with multilingual content or requiring automated transcription workflows.
Developers choose yt-whisper because it combines reliable YouTube downloading with state-of-the-art speech recognition in a simple, single-command tool. Unlike manual transcription services or complex video processing pipelines, it is open source, runs locally, and lets users trade speed for accuracy by choosing among Whisper model sizes.
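The download-to-transcription flow described above can be sketched roughly as follows. This is a minimal illustration, not yt-whisper's actual source: the function names (`format_timestamp`, `segments_to_vtt`, `download_audio`, `transcribe_to_vtt`) and the `small` default model are assumptions made for the example. It relies on the public `yt_dlp.YoutubeDL` and `whisper.load_model` / `transcribe` calls, imported lazily so the formatting helpers can be tried without either package installed.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: Iterable[Mapping]) -> str:
    """Render Whisper-style segments (start, end, text) as WebVTT text."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line terminates each cue
    return "\n".join(lines)


def download_audio(url: str, out_dir: str = ".") -> str:
    """Download the best available audio stream and return its local path."""
    import yt_dlp  # lazy import: only needed for the download step

    opts = {
        "format": "bestaudio/best",
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)


def transcribe_to_vtt(audio_path: str, model_name: str = "small",
                      task: str = "transcribe") -> str:
    """Transcribe (or translate) an audio file and return WebVTT text."""
    import whisper  # lazy import: loading a model needs ffmpeg and PyTorch

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, task=task)
    return segments_to_vtt(result["segments"])
```

Chaining `download_audio` and `transcribe_to_vtt` and writing the returned string to a `.vtt` file would approximate what the command-line tool does in one step; passing `task="translate"` asks Whisper for English output.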
Installable via pip with a single command, and the README provides clear instructions for setting up dependencies like ffmpeg across different operating systems.
Supports all Whisper model sizes from tiny to large, allowing users to balance transcription speed and accuracy based on their needs, as detailed in the usage examples.
Can translate subtitles into English using the --task translate flag, leveraging Whisper's capabilities for multilingual content without additional tools.
Licensed under MIT (see the LICENSE file), it offers a free alternative to paid transcription services, with no usage limits.
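To illustrate the model-size and translation options mentioned above, the following sketch assembles a `yt_whisper` command line. The helper `build_command` and the `WHISPER_MODELS` tuple are hypothetical names introduced for this example; only the `--model` and `--task translate` flags come from the tool's documented usage, and the tuple lists just the five base sizes, assuming `small` as the default.

```python
import shlex
from typing import List

# The five base Whisper sizes, from fastest to most accurate.
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def build_command(url: str, model: str = "small",
                  translate: bool = False) -> List[str]:
    """Build the argument list for one yt_whisper invocation."""
    if model not in WHISPER_MODELS:
        raise ValueError(f"unknown model size: {model!r}")
    cmd = ["yt_whisper", url, "--model", model]
    if translate:
        # Ask Whisper to translate the speech into English subtitles.
        cmd += ["--task", "translate"]
    return cmd


if __name__ == "__main__":
    cmd = build_command("https://www.youtube.com/watch?v=jXoGJxMVLKA",
                        model="medium", translate=True)
    # shlex.join quotes the URL so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to pass directly to `subprocess.run` without shell quoting concerns.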
Requires separate installation of ffmpeg, which can be complex on some systems, and relies on yt-dlp for video downloading, adding setup overhead.
Only generates subtitle files in VTT format; common alternatives like SRT are not supported, potentially requiring extra conversion steps for compatibility.
Processes one video at a time with no native batch functionality, forcing users to script loops for multiple videos, which isn't covered in the README.
Larger Whisper models (e.g., 'large') improve accuracy but are slow and resource-intensive, making them impractical for long videos or low-end hardware.
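Since the tool is described as handling one video at a time, a batch run can be scripted with a short loop. The sketch below is one possible approach, assuming `yt_whisper` is installed and on the `PATH`; `transcribe_many` and its `runner` parameter are hypothetical names for this example, with `runner` existing only so the loop can be exercised without actually launching the tool.

```python
import subprocess
from typing import Callable, Iterable, List


def transcribe_many(urls: Iterable[str], model: str = "small",
                    runner: Callable = subprocess.run) -> List[str]:
    """Run yt_whisper once per URL and return the URLs that failed."""
    failed = []
    for url in urls:
        cmd = ["yt_whisper", url, "--model", model]
        try:
            # check=True raises CalledProcessError on a non-zero exit code.
            runner(cmd, check=True)
        except (subprocess.CalledProcessError, OSError):
            # OSError covers the case where yt_whisper is not installed.
            failed.append(url)
    return failed
```

Collecting failures instead of stopping at the first error lets a long batch finish and be retried selectively. For long videos or low-end hardware, a smaller model such as `tiny` or `base` keeps each iteration fast at some cost in accuracy.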