Stefan wrote: I believe I've got .sub/.idx for 10 movies if it would be of interest?
That would be great!
Anybody who has some sub/idx files to share with me is welcome to upload them here. Please let me know in advance if you're going to upload more than a gigabyte!
There's a problem with the subs2srs/substudy approach that tuckamore describes well:
tuckamore wrote: I had used Subs2srs in the past with huge success. Now, I'm toeing the line from intermediate to advanced (or at least I'd like to think I am) and, if I were to use audio+subs+Anki, I would have the specific goal of mining native audio for specific vocabulary that I want to hear in different contexts. So, as you say, I would "need to prepare far more video for the same results", and this is where I'm questioning the practicality of my proposal.
When you're just starting out, you can usually find TV series or films where there's something new and interesting in almost every line of dialog. So spending a couple of hours to rip a few DVDs, OCR the subtitles, and load everything into Anki makes sense. But when you reach a higher level, only a small fraction of the dialog is interesting. (I would maybe get 10 cards out of an episode of Buffy at this point. More if I went with Engrenages or Le Trône de fer, of course!) Even if I could somehow prepare a deck of cards from a film with a single button press, it would still be too much work just to go through them and delete 95%.
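One way to shrink that 95% might be to filter subtitle lines automatically before they ever become cards, keeping only lines with exactly one unfamiliar word. This is a hypothetical approach, not a description of subs2srs, substudy, or any other existing tool. It assumes you have a set of words you already know, and the regex tokenizer is deliberately crude (it splits French elisions like "qu'il" into "qu" and "il").

```python
import re

# Crude tokenizer: runs of Unicode letters only (no digits, no underscores).
WORD_RE = re.compile(r"[^\W\d_]+")

def unknown_words(line: str, known: set[str]) -> list[str]:
    """Return the words in `line` that are not in `known` (case-insensitive)."""
    return [w for w in WORD_RE.findall(line.lower()) if w not in known]

def candidate_lines(lines: list[str], known: set[str], max_unknown: int = 1) -> list[str]:
    """Keep lines with at least one, but at most `max_unknown`, unknown words.

    Lines with zero unknown words teach nothing new; lines with many are
    usually too hard to learn from a single card.
    """
    return [
        line for line in lines
        if 1 <= len(unknown_words(line, known)) <= max_unknown
    ]

known = {"je", "ne", "sais", "pas", "il", "est", "là", "le"}
lines = ["Je ne sais pas.", "Il est là, le salaud.", "Il mijote quelque chose."]
print(candidate_lines(lines, known))
```

Here only "Il est là, le salaud." survives: the first line is fully known, and the third has three unknown words.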
So I'm making no promises. But I think the general idea of audio cards made from native media is still very useful all the way to C1 and beyond. It's just that the process of making cards needs to be vastly simplified for it to be worth the effort. In a perfect world, I'd love to have a tool that does for video what readlang does for written text, except with a spaced repetition system as the central feature. Watch your videos normally on your computer, and when you miss a bit of dialog, hit a button to rewind, then make a card with a single click.
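The "rewind and make a card with a single click" step is mostly a lookup: given the current playback position, find the subtitle cue the viewer just missed and cut an audio clip around it. A minimal sketch, assuming cues sorted by start time; the `Cue` type, the 0.25-second padding, and the card dictionary are all hypothetical, and a real player would supply the playback position from its own API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cue:
    start: float  # cue start, in seconds
    end: float    # cue end, in seconds
    text: str     # subtitle text

def last_started_cue(cues: list[Cue], t: float) -> Optional[Cue]:
    """Return the most recent cue that started at or before time t.

    Assumes `cues` is sorted by start time. If the viewer pauses just after
    a line of dialog, this is the line they most likely missed.
    """
    starts = [c.start for c in cues]
    i = bisect_right(starts, t)
    return cues[i - 1] if i > 0 else None

def make_card(cues: list[Cue], t: float, padding: float = 0.25) -> Optional[dict]:
    """Build a hypothetical card: an audio clip range plus the subtitle text."""
    cue = last_started_cue(cues, t)
    if cue is None:
        return None
    return {
        "audio_start": max(0.0, cue.start - padding),  # pad so speech isn't clipped
        "audio_end": cue.end + padding,
        "back": cue.text,
    }

cues = [Cue(1.0, 2.5, "Bonjour."), Cue(3.0, 5.0, "Tu viens ce soir ?"), Cue(6.0, 7.0, "Non.")]
print(make_card(cues, 5.5))
```

Pausing at 5.5 s, just after the second line ends, yields a card for "Tu viens ce soir ?" with audio from 2.75 s to 5.25 s.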
However, I don't currently believe there is much money in this (for example, readlang was an enormous amount of work, and I understand it wasn't very profitable relative to the effort invested), so I don't plan to build the whole thing any time soon. Here are the different things I imagine would need to happen:
1. Video and subtitle ripping. HandBrake does an excellent job of this already, except that you need to fiddle around to get the subtitle tracks ripped alongside the video. You may also need a region-free DVD drive for your computer; I'm not sure.
2. Subtitle OCR. You can do this with Subtitle Edit (and a dozen other programs), but all of them require some manual fiddling, editing and cleanup. This step could be significantly better, and there are some nice juicy technical challenges here. I would really love to write an open source Rust library that advanced the state of the art in automatic subtitle OCR.
3. A video player GUI. I've made several partial sketches of this idea, including here (using tools that ultimately proved too eccentric and annoying) and here (which is an actual native, cross-platform app, solving a ton of design issues!).
4. Spaced repetition support. Anki is still the gold standard for this, but readlang had a tiny, built-in SRS tool that made it easy for people to get started. And indeed, I spend so much time explaining how to use Anki (delete! delete! automated card creation! don't fail more than a tiny handful of cards! delete leeches!) that it might be useful to have a simpler tool for beginners with some of this built in.
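Before any OCR happens, the subtitle step has to read the .idx half of a sub/idx pair to learn when each subtitle image appears and where its bitmap lives in the .sub file. The sketch below assumes the usual VobSub layout, where each subtitle is one line like `timestamp: 00:01:02:345, filepos: 000001a00` (hours:minutes:seconds:milliseconds, then a hexadecimal byte offset into the .sub file), and `id: fr, index: 0` lines start a new language track. It ignores every other directive, so treat it as a starting point rather than a complete parser.

```python
import re
from dataclasses import dataclass

TIMESTAMP_RE = re.compile(
    r"^timestamp:\s*(\d+):(\d{2}):(\d{2}):(\d{3}),\s*filepos:\s*([0-9A-Fa-f]+)"
)
ID_RE = re.compile(r"^id:\s*([^,\s]*)")

@dataclass
class IdxEntry:
    track: str      # language code from the most recent "id:" line
    seconds: float  # when the subtitle image is shown
    filepos: int    # byte offset of the image data in the .sub file

def parse_idx(text: str) -> list[IdxEntry]:
    """Collect (track, time, offset) entries from VobSub .idx text."""
    entries = []
    track = ""
    for raw in text.splitlines():
        line = raw.strip()
        m = ID_RE.match(line)
        if m:
            track = m.group(1)  # following timestamps belong to this track
            continue
        m = TIMESTAMP_RE.match(line)
        if m:
            h, mi, s, ms, pos = m.groups()
            seconds = int(h) * 3600 + int(mi) * 60 + int(s) + int(ms) / 1000
            entries.append(IdxEntry(track, seconds, int(pos, 16)))
    return entries

sample = """# VobSub index file, v7
id: fr, index: 0
timestamp: 00:01:02:345, filepos: 000001a00
timestamp: 01:00:00:000, filepos: 000003000
"""
for entry in parse_idx(sample):
    print(entry)
```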
Right now, I'm interested in the technical challenges of (2), and in the libraries and techniques needed to do a good job of (3). I'm not committing to solving the entire problem! But any sub/idx files you upload here will help motivate me to work on (2).