The Spanish subtitles were generated using Whisper, and the English translations using the GPT-3.5-Turbo API (in "function calling" mode). The two subtitle tracks were combined using "substudy combine". The total API costs for this episode were under US$0.20.
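For the curious, the combine step boils down to pairing cues from the two tracks by time overlap. Here's a rough Python sketch of the idea, not substudy's actual implementation, and the toy parser only handles well-formed SRT:

```python
import re

def parse_srt(text):
    """Parse well-formed SRT into (start_ms, end_ms, text) tuples."""
    def to_ms(ts):
        h, m, rest = ts.split(":")
        s, ms = rest.split(",")
        return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        start, end = [to_ms(t.strip()) for t in lines[1].split("-->")]
        cues.append((start, end, "\n".join(lines[2:])))
    return cues

def combine(l2_cues, l1_cues):
    """Attach each L1 cue to the L2 cues it overlaps, bilingual-subs style."""
    out = []
    for start, end, text in l2_cues:
        overlaps = [t for s1, e1, t in l1_cues if min(end, e1) > max(start, s1)]
        out.append((start, end, text + ("\n" + " ".join(overlaps) if overlaps else "")))
    return out
```

In practice the real tool also has to cope with cues that span multiple lines on the other track, which is why I'm happy to let substudy do it.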
(I imagine performance will drop off sharply with smaller languages, and I've been focusing on good "beginner" shows for now. Your mileage may vary. Results may not be typical. Professional driver on a closed course.)
Carl wrote: This was just one anecdote. But if it's a general issue, maybe you could simply feed ChatGPT shorter chunks of subtitles at a time, rather than, say, a whole episode's worth?
I'm feeding it 10-15 lines at a time, using carefully constructed prompts and the "function calling" API. Seems moderately robust so far?
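Roughly, that means batching the subtitle lines and declaring a function the model must "call" with its answer, so the reply comes back as structured JSON instead of free text. A sketch of the shape of it (the function name, schema, and prompt wording here are my guesses, not necessarily what I shipped):

```python
import json

def chunk(lines, size=15):
    """Split subtitle lines into batches small enough for the model to track."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def build_request(batch, model="gpt-3.5-turbo"):
    """Build a Chat Completions payload that forces a function-call reply.

    Asking for one translation per input line keeps the line counts on
    the two subtitle tracks aligned, which is the whole battle.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Translate these Spanish subtitle lines to English."},
            {"role": "user", "content": json.dumps(batch, ensure_ascii=False)},
        ],
        "functions": [{
            "name": "report_translations",
            "description": "Report one English translation per input line.",
            "parameters": {
                "type": "object",
                "properties": {
                    "translations": {
                        "type": "array",
                        "items": {"type": "string"},
                    },
                },
                "required": ["translations"],
            },
        }],
        "function_call": {"name": "report_translations"},
    }
```

Each payload goes to the chat completions endpoint, and the `arguments` string on the returned function call gets parsed as JSON.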
sfuqua wrote: This is very exciting stuff. I wasted a few days trying to get subtitles for Children of the Sea (Kaijuu no Kodomo) before I gave up... I eventually got a subs2srs deck for the movie where the audio and the written stuff barely matched...
Yeah, it's tough. For audio cards to really work, you need:
- Native L2 audio, complete with inflection, emotions and a story. This should be considered non-negotiable for any language with a media industry. The inflection will be burned into your brain. The emotions and story make it interesting, because you're going to wind up seeing this stuff a lot.
- Mostly accurate L2 subtitles. If a subtitle doesn't match the audio, you'll almost always need to throw that card out.
- A basically decent translation for the L1 subtitles. This is the most forgiving part.
But ah, if you could only take an interesting native video, wave a magic wand, and get solid bilingual subs in a couple of minutes for US$0.20. Even if it were only for the biggest languages and only for intermediate audio, it would be a win. Let's see how far this will go.
(Binaries will be available soonish, as usual.)