Whooo! Whisper works! I now have some very reasonable-looking subs for Avatar 01.03, which I've never had. When I started this project, I had good subs for episodes 01.01, 01.02, 01.05 and 01.06, and nothing else. This was never enough to really do the study experiment right; I could have used another 2-4 episodes.
But Whisper is working!
[Image: avatar_01_03_good_subs.jpg]
These subs actually match the audio, unlike the SRT files I originally used. It takes less than a minute to process a 20-minute episode, and it costs about US$0.15. I really think this changes the game, at least for popular languages and clearly enunciated audio.
And there are tons of sites that offer automatic subtitle translation once you have an SRT file. So this means you can get bilingual subtitles!
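For anyone curious what the Whisper→SRT step involves, here is a minimal sketch of the formatting half. It assumes the transcription comes back as a list of segments, each with `start` and `end` times in seconds and a `text` field; the function names are made up for illustration and this is not the actual substudy code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn segments (assumed dicts with 'start', 'end', 'text') into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 44.3, "end": 47.1, "text": " Han pasado cien años"},
        {"start": 47.1, "end": 49.08, "text": " está alcanzando la victoria"},
    ]
    print(segments_to_srt(demo))
```

The resulting SRT text is what you would hand to a translation site to get the second language.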
Subtitle wrapping and splitting. The other thing I've been working on is splitting subtitles. For example, this subtitle is too long in most video players, and it's a bit too long to make an ideal audio card:
Han pasado cien años y la nación del fuego está alcanzando la victoria\nen esta guerra.
But I found a Knuth-Plass line-breaking library and experimented until I could get clean breaks:
12
00:00:44,300 --> 00:00:47,100
Han pasado cien años
y la nación del fuego

13
00:00:47,100 --> 00:00:49,080
está alcanzando la victoria
en esta guerra.
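To give a feel for the idea behind Knuth-Plass breaking, here is a heavily simplified sketch. It is not the library used above: it only does the "total fit" part, choosing all the break points at once so that the sum of squared leftover space per line is as small as possible, with no hyphenation or stretchable spaces. The function name `break_lines` and the `max_width` parameter are assumptions for illustration, and unlike TeX it also penalizes a short last line, which seems reasonable for subtitles.

```python
def break_lines(text: str, max_width: int) -> list[str]:
    """Break text into lines of at most max_width characters, minimizing
    the sum of squared unused space over all lines (dynamic programming)."""
    words = text.split()
    n = len(words)
    best_cost = [float("inf")] * (n + 1)  # best_cost[i]: cost of laying out words[i:]
    next_break = [n] * (n + 1)            # next_break[i]: where the line starting at i ends
    best_cost[n] = 0.0
    for i in range(n - 1, -1, -1):
        width = -1  # cancels the space counted before the first word
        for j in range(i, n):
            width += 1 + len(words[j])
            if width > max_width and j > i:
                break  # line is full; an overlong single word still gets its own line
            slack = max(max_width - width, 0)
            cost = slack ** 2 + best_cost[j + 1]
            if cost < best_cost[i]:
                best_cost[i] = cost
                next_break[i] = j + 1
    lines = []
    i = 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


if __name__ == "__main__":
    sub = ("Han pasado cien años y la nación del fuego "
           "está alcanzando la victoria en esta guerra.")
    for line in break_lines(sub, 28):
        print(line)
```

A greedy wrapper would break "aaa bb cc ddddd" at width 6 into "aaa bb" / "cc" / "ddddd", leaving "cc" stranded; this version picks "aaa" / "bb cc" / "ddddd" because it weighs all the lines together. Splitting the wrapped lines into separate timed subtitles, as in the example above, is a further step not shown here.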
Remember, when these get turned into cards, I always add (1) the previous and next lines, and (2) 1.5 seconds of audio padding before and after. This makes it feasible to study cards with sentences cut in two.
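As a rough illustration of that card layout, here is a small sketch. The `Subtitle` class, the `build_card` function and the dictionary keys are hypothetical names, not substudy's real data model; only the 1.5-second padding and the previous/next context come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # seconds
    end: float    # seconds
    text: str


PADDING = 1.5  # seconds of extra audio before and after each subtitle


def build_card(subs: list[Subtitle], i: int, padding: float = PADDING) -> dict:
    """Build one card: padded audio range plus the neighboring lines as context."""
    sub = subs[i]
    return {
        "audio_start": max(0.0, sub.start - padding),  # never before the start of the file
        "audio_end": sub.end + padding,
        "previous": subs[i - 1].text if i > 0 else None,
        "current": sub.text,
        "next": subs[i + 1].text if i + 1 < len(subs) else None,
    }


if __name__ == "__main__":
    subs = [
        Subtitle(44.3, 47.1, "Han pasado cien años\ny la nación del fuego"),
        Subtitle(47.1, 49.08, "está alcanzando la victoria\nen esta guerra."),
    ]
    print(build_card(subs, 0))
```

With the padding and the neighboring lines, the card for the first half of the split sentence still carries the second half in both audio and text.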
Next challenges. The Whisper→SRT code isn't quite good enough yet. There are two problems:
- "Phantom" subtitles. These tend to be short, common phrases. But they don't actually exist in the audio track, and the guesstimated timing is garbage. I need to strip these out, somehow. I get a half dozen of these per episode.
- Subtitles which start early/end late. Whisper is surprisingly good about knowing what words it heard in what order. But oftentimes, when it's looking at tiny words like "el" and "en", it knows it heard them, but can only say, "Somewhere in that 5 seconds, I think?" So for most subtitles, the timing is tight. But occasionally you get a subtitle which starts 5 seconds too early, or ends 5 seconds too late. I think I can clean this data up with a few heuristics.
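One possible heuristic for the second problem, purely as a sketch: if word-level timings are available (an assumption here), treat a leading or trailing word that sits far away from its neighbor as badly timed, and snap the subtitle boundary to the nearest well-timed word plus a small allowance. The function name, the thresholds `max_gap` and `allowance`, and the `(text, start, end)` tuple layout are all made up for illustration and untested on real episodes.

```python
def tighten_timing(words, max_gap: float = 1.5, allowance: float = 0.4):
    """Return a (start, end) pair for a subtitle, ignoring isolated edge words.

    words: non-empty list of (text, start, end) tuples, in spoken order.
    """
    if not words:
        raise ValueError("need at least one word")
    first, last = 0, len(words) - 1
    # A leading word is suspicious if the next word starts much later.
    while first < last and words[first + 1][1] - words[first][1] > max_gap:
        first += 1
    # A trailing word is suspicious if it ends much later than the word before it.
    while last > first and words[last][2] - words[last - 1][2] > max_gap:
        last -= 1
    start = words[0][1]
    if first > 0:
        start = max(start, words[first][1] - allowance)
    end = words[-1][2]
    if last < len(words) - 1:
        end = min(end, words[last][2] + allowance)
    return start, end


if __name__ == "__main__":
    words = [("en", 10.0, 10.2), ("esta", 15.0, 15.3), ("guerra.", 15.3, 15.9)]
    print(tighten_timing(words))
```

The text of the subtitle is left alone; only the time range changes, so a vaguely placed "en" still shows up on the card without dragging in five seconds of unrelated audio.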
So I'm really optimistic—this is getting close to the point where it's a big win. Pick a clean TV series in a major language, feed the audio into Whisper & substudy (& optionally a translator), spit out a deck of really quite decent audio cards. And the more tools I can combine into one, the easier it gets.
Oh, and all the time that I'm working on this code? I'm listening to Spanish dialog, reading Spanish subtitles, and checking carefully to make sure they match. So some studying is getting done by accident.