Re: Best automatic speech-to-text for challenging audio? Whisper? Something else?
Posted: Sat Feb 24, 2024 10:41 pm
I really like Whisper, which I use through MacWhisper (https://goodsnooze.gumroad.com/l/macwhisper). After fooling around with a couple of clumsy ways to install Whisper, I discovered an early version of MacWhisper, and I've used nothing else since. I've been delighted both by the program and by the attention the programmer pays to continually improving it. He seems to update it every week or so, and he has a roadmap showing the improvements he plans to make.
A use I've found for it that I haven't seen mentioned before on this site: Now, when I want to read and listen to a book, I buy only the audiobook and make the e-book by running the audio through MacWhisper. MacWhisper can handle batch transcriptions, so if the audiobook is divided into chapters, then the program outputs a separate transcription file for each chapter.
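For anyone who'd rather script the same chapter-by-chapter idea outside MacWhisper, here's a rough sketch in Python. It is not how MacWhisper does it internally; I'm only assuming the open-source openai-whisper package (plus ffmpeg) is installed. The function names chapter_files, transcribe_chapters, and whisper_transcriber are made up for illustration, as is the list of file extensions.

```python
from pathlib import Path
from typing import Callable

# Assumed set of audiobook file types; adjust to whatever your files use.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".m4b", ".wav", ".flac"}


def chapter_files(audio_dir: Path) -> list[Path]:
    """Return the audio files in audio_dir, sorted by file name.

    Sorting is plain alphabetical, so zero-pad chapter numbers
    (01, 02, ... 10) if you want them in reading order.
    """
    return sorted(
        p for p in audio_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_chapters(
    audio_dir: Path,
    out_dir: Path,
    transcribe: Callable[[Path], str],
) -> list[Path]:
    """Write one .txt transcript per chapter file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in chapter_files(audio_dir):
        text = transcribe(audio)
        target = out_dir / (audio.stem + ".txt")
        target.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(target)
    return written


def whisper_transcriber(model_name: str = "large") -> Callable[[Path], str]:
    """Build a transcribe function backed by the openai-whisper package.

    Imported lazily so the rest of the script works without Whisper installed.
    """
    import whisper  # pip install openai-whisper (also needs ffmpeg)

    model = whisper.load_model(model_name)

    def run(audio: Path) -> str:
        # With no language given, Whisper detects the language itself.
        return model.transcribe(str(audio))["text"]

    return run
```

You'd call it with something like transcribe_chapters(Path("audiobook"), Path("ebook"), whisper_transcriber()), where both folder names are placeholders. Passing the transcriber in as a function keeps the file handling separate from Whisper, so you can swap in a smaller model or a different engine without touching the rest.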
I've given it some audio in Spanish or German that I've listened to a bunch of times without being able to understand a key part, and it's given an accurate transcript.
In the past, Whisper has had problems with audio containing multiple languages. It would either transcribe everything in just one of the languages or translate the entire transcription into English (Whisper itself has built-in translation capability). I tested multilingual audio again today, and I was impressed.
For kicks and giggles, I gave MacWhisper audio with Norwegian, Swedish, and Danish speakers in conversation (https://www.youtube.com/watch?v=IOnetGsUtLE). For YouTube and some other audio or video files online, you can just paste the URL into MacWhisper and let it do the transcription from there; there's no need to download the audio in a separate step. Using the Large model and the input language set to Auto, I got a reasonably good transcription that changed languages with the speakers. (I checked the first 10 minutes or so.)
It wasn't perfect: when someone broke in with a quick comment in a different language, the transcription was mangled. And at least once, it introduced a Norwegian- or Danish-influenced misspelling of a common word into a long stretch of clear Swedish. Apart from that and some other small errors, it was quite accurate.
For subtitles in two languages, or bilingual parallel texts, MacWhisper will translate its transcription with the help of the user's free DeepL API key.
I like MacWhisper so much that I upgraded my 2015 MacBook Pro to a 2020 Mac mini last year, years before I otherwise would have replaced the MBP, in large part because I wanted to be able to do MacWhisper transcriptions quickly.