Show HN: A CLI tool to transcribe and clean YouTube videos with Whisper and LLMs (github.com)
Hi HN,
I built a simple command-line tool that quickly transcribes YouTube videos into clean, readable text. It uses OpenAI's Whisper for transcription and the LLM of your choice to clean up the transcripts: removing filler words, correcting grammar, and improving readability.
Some highlights include:
- Downloads the audio directly from YouTube.
- Supports multiple output formats (TXT, SRT, VTT).
- LLM-driven transcript cleaning tailored for presentations, conversations, or lectures.
- Easy setup and straightforward CLI usage.
My main motivation to build this is that I read faster than I listen, and I'm often interested in only a short segment of a (long) video, so it's easier to just cmd-F and jump into that section of the transcript.
Feedback welcome!
> My main motivation to build this is that I read faster than I listen
Yes! However, I occasionally find it useful to refer to the original video (especially when I want to share it at a certain timestamp). Searchable transcripts are a great way to navigate a video if each passage links to the relevant timestamp in the video.
So I designed a special file format and web app based on oTranscribe + Markdown:
- https://raw.githubusercontent.com/Leftium/oTranscribe/refs/h...
- https://otranscribe.netlify.app/?vsl=definedefine
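A rough Python sketch of the search-then-jump idea, assuming segments with `start` (seconds) and `text` keys, and using YouTube's `t` URL parameter to start playback at a given second. `timestamp_link`, `search_transcript`, and the video ID are hypothetical placeholders, not part of oTranscribe or the CLI tool.

```python
from urllib.parse import urlencode


def timestamp_link(video_id: str, seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the given whole second."""
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def search_transcript(segments: list[dict], video_id: str, term: str) -> list[tuple[str, str]]:
    """Return (link, text) pairs for every segment whose text contains `term`, ignoring case."""
    needle = term.casefold()
    return [
        (timestamp_link(video_id, seg["start"]), seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]
```

Truncating to whole seconds keeps the links short; starting a second early is usually harmless when sharing a clip.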
I made a tool to convert YouTube SBV/TTML caption files (https://github.com/Leftium/otrgen); it should be possible to add support for one of your output formats as well.
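A minimal sketch of reading SBV captions into start/end/text segments (the same keys Whisper's segments use), assuming the common SBV layout: a `H:MM:SS.mmm,H:MM:SS.mmm` line followed by caption text, with cues separated by blank lines. This is an illustration only, not how otrgen is implemented, and it does not handle TTML.

```python
def _sbv_seconds(stamp: str) -> float:
    """Convert an SBV timestamp like '0:01:02.500' (H:MM:SS.mmm) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_sbv(text: str) -> list[dict]:
    """Parse SBV captions: blank-line-separated cues, each a 'start,end' line then text lines."""
    segments = []
    # Normalize Windows line endings so the blank-line split works either way.
    for block in text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines:
            continue  # skip extra blank lines between cues
        start, end = lines[0].split(",")
        segments.append({
            "start": _sbv_seconds(start),
            "end": _sbv_seconds(end),
            # Multi-line captions are joined into a single line of text.
            "text": " ".join(line.strip() for line in lines[1:]),
        })
    return segments
```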
---
There was a similar Show HN[1] that opened my eyes to OpenAI Whisper; however, your Python script provides a better starting point than that project's bash script. I'll probably reference both projects when I build my own (including a beat-aware YouTube player that needs the audio data for beat-detection analysis).
[1]: https://hw.leftium.com/#/item/41473379