substudy: Make Anki cards and other resources from video & bilingual subtitles (command-line)

Post by **emk** » Fri Dec 08, 2017 6:04 pm

rdearman wrote:Don't suppose you could try to include hunalign?

hunalign is written in C++, which unfortunately means that I would need to compile a separate version of substudy for each Linux distribution, because I would no longer be able to cross-compile it using the musl-gcc toolchain. And at the moment, I'm focusing on making substudy very easy to install.

I might consider including some sort of feature where the user can supply their own hunalign, aeneas, etc., and substudy will generate all the necessary command lines to call it, then post-process the output into a format substudy can read. Feedback is welcome!

Post by **crush** » Fri Dec 08, 2017 6:29 pm

emk wrote:Yay! A new substudy has been released! Among other things, this one includes official command-line binaries, a progress bar, and support for SRT files generated by the Aeneas audiobook aligner.

Very cool! I just downloaded the previous version about half an hour ago, so this was great timing. It's all working great for me on Linux. I just found subtitles for some Welsh shows in Welsh/English (in VTT), so I'll be testing it out after I convert them over to SRT. It seems like the only differences between VTT and SRT are the index number and the use of a period instead of a comma to separate the milliseconds, so it should be pretty simple to convert. I just wrote a simple Python script for converting a VTT file to an SRT file, tested it out with substudy, and it works great! Thanks for all your hard work, emk!
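For anyone who wants to try the same conversion, here is a rough sketch of the idea in Rust (substudy's own language) rather than the Python script mentioned above, which isn't shown in this thread. It assumes simple VTT files where cues are separated by blank lines. The function names are made up for illustration, and it ignores styling, notes, and other VTT features:

```rust
/// Convert a VTT timestamp ("00:01.000" or "00:00:01.000") to SRT form.
fn srt_timestamp(ts: &str) -> String {
    // VTT may omit the hours field; SRT always includes it.
    let with_hours = if ts.matches(':').count() == 1 {
        format!("00:{}", ts)
    } else {
        ts.to_string()
    };
    // SRT separates milliseconds with a comma instead of a period.
    with_hours.replace('.', ",")
}

/// Convert simple WebVTT text to SRT text (hypothetical helper, not part of substudy).
fn vtt_to_srt(vtt: &str) -> String {
    let mut out = String::new();
    let mut index = 0;
    // Normalize Windows line endings, then split into blank-line-separated blocks.
    let normalized = vtt.replace("\r\n", "\n");
    for block in normalized.split("\n\n") {
        let lines: Vec<&str> = block.lines().collect();
        // Blocks without a timing line (the "WEBVTT" header, notes) are skipped.
        let pos = match lines.iter().position(|l| l.contains("-->")) {
            Some(pos) => pos,
            None => continue,
        };
        // Keep only "start --> end"; cue settings after the end time are dropped.
        let mut parts = lines[pos].split_whitespace();
        let (start, end) = match (parts.next(), parts.next(), parts.next()) {
            (Some(start), Some("-->"), Some(end)) => (start, end),
            _ => continue,
        };
        // SRT needs a running index number before each timing line.
        index += 1;
        out.push_str(&format!(
            "{}\n{} --> {}\n",
            index,
            srt_timestamp(start),
            srt_timestamp(end)
        ));
        for text in &lines[pos + 1..] {
            out.push_str(text);
            out.push('\n');
        }
        out.push('\n');
    }
    out
}
```

Calling `vtt_to_srt` on the contents of a VTT file returns the SRT text as a string, which can then be written to a `.srt` file and passed to substudy.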

Regarding hunalign, it would be very cool to have that as well, even if it meant having to supply our own binaries.

Post by **kelciour** » Fri Dec 08, 2017 8:52 pm

kunsttyv wrote:emk, would it be sufficient if in VLC, while watching a movie, we could just push a button to create a timestamped bookmark? I imagine the best thing would be if it created a simple text file alongside the video file upon the first bookmark, and then continued to append to the file on subsequent bookmarks. And then we could call substudy with an option to only export bookmarked dialogues: "substudy export csv bookmarks [video] [sub1] [sub2]".

~~VLC~~ → mpv (or IINA for macOS). For more information: https://mpv.io/manual/master/

For example, bookmarks.lua and mpv.conf (just in case).
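On the substudy side, the "only export bookmarked dialogues" idea from the quote could be sketched roughly as follows. This is purely illustrative: the bookmark file format (one timestamp in seconds per line), the `Subtitle` struct, and both function names are assumptions, and nothing like this exists in substudy as far as this thread shows.

```rust
/// A subtitle with its start and end time in seconds (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Subtitle {
    start: f64,
    end: f64,
    text: String,
}

/// Parse a bookmark file, assuming one timestamp in seconds per line.
/// Blank lines and lines that are not numbers are ignored.
fn parse_bookmarks(text: &str) -> Vec<f64> {
    text.lines()
        .filter_map(|line| line.trim().parse::<f64>().ok())
        .collect()
}

/// Keep only the subtitles that contain at least one bookmark. Each subtitle's
/// span is widened by `slack` seconds on both sides, since the button press
/// will rarely land exactly inside the line.
fn bookmarked<'a>(subs: &'a [Subtitle], bookmarks: &[f64], slack: f64) -> Vec<&'a Subtitle> {
    subs.iter()
        .filter(|s| {
            bookmarks
                .iter()
                .any(|&b| b >= s.start - slack && b <= s.end + slack)
        })
        .collect()
}
```

The selected subtitles could then be fed to the normal export step instead of the full subtitle list.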

Post by **Yuurei** » Mon Dec 11, 2017 2:17 pm

Hey emk,

just wanted to thank you for this awesome tool! Installed the Mac binary yesterday and tried it out and everything worked like a charm (on OSX 10.11). I haven't actually done anything with the resulting Anki cards yet, but I definitely love how easy it was to produce and I have to say I very much appreciate the formatting with previous/next lines as well!

One thing I noticed while previewing a couple of cards is that there seems to be a pretty generous audio buffer around the targeted line, which in my case (where the timing of the subs was pretty close to perfect, I think) led to a substantial part of the previous and next lines' audio being part of the card, which makes it a lot harder to parse the audio if you're still in the beginner realm, at least. Any chance of there being an advanced setting for fine-tuning the audio buffer in the future? And how did other users deal with that? (@rdearman, I remember you did some substudy work with Finnish, right?)

Post by **emk** » Tue Dec 12, 2017 12:46 pm

Yuurei wrote:just wanted to thank you for this awesome tool! Installed the Mac binary yesterday and tried it out and everything worked like a charm (on OSX 10.11). I haven't actually done anything with the resulting Anki cards yet, but I definitely love how easy it was to produce and I have to say I very much appreciate the formatting with previous/next lines as well!

I'm really glad that substudy has been able to make some great cards for you! And thank you for confirming that the Mac binaries worked.

Yuurei wrote:One thing I noticed while previewing a couple of cards is that there seems to be a pretty generous audio buffer around the targeted line, which in my case (where the timing of the subs was pretty close to perfect, I think) led to a substantial part of the previous and next lines' audio being part of the card, which makes it a lot harder to parse the audio if you're still in the beginner realm, at least. Any chance of there being an advanced setting for fine-tuning the audio buffer in the future? And how did other users deal with that? (@rdearman, I remember you did some substudy work with Finnish, right?)

Actually, the extra audio buffer is deliberate. :-)

But the reasons might not be immediately obvious:

1. The extra audio buffer means that you don't need to be ultra-precise when aligning subtitles. If you're off by a third of a second, no big deal: most of your cards will still be salvageable.
2. But more importantly, the extra audio buffer is not there for the first time you see the cards! The first time you see all the cards using Anki, all the audio will be in order, and you'll have plenty of context. And that extra buffer will seem weird. But once you start reviewing the cards, you'll mark some of them as easy, and some of them as hard, and so the review intervals will spread out. Three weeks from now, those cards will be completely shuffled, and you'll be trying to remember the context, and it will get much harder to understand isolated sentence fragments.

So the extra audio buffer on either end is mostly there for (2). Yes, it does seem weird when you're first learning the cards. But at least in my experience, that extra little bit of context is a lifesaver once the cards get shuffled and you've half-forgotten the scene.
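To make the first point concrete, here's a tiny sketch of how padding protects against slightly misaligned subtitles. The function names and the 1.5-second padding are made up for the example; they are not the values or code substudy actually uses.

```rust
/// Widen a subtitle's time span (in seconds) by `padding` on each side,
/// without letting the start go below zero.
fn padded_span(start: f64, end: f64, padding: f64) -> (f64, f64) {
    ((start - padding).max(0.0), end + padding)
}

/// True if `clip` fully contains the span where the line is actually spoken.
fn covers(clip: (f64, f64), spoken: (f64, f64)) -> bool {
    clip.0 <= spoken.0 && spoken.1 <= clip.1
}
```

For example, if a subtitle is timed at 10.0 to 12.0 seconds but the line is really spoken from 10.33 to 12.33, then `covers(padded_span(10.0, 12.0, 1.5), (10.33, 12.33))` is true, while the unpadded clip `padded_span(10.0, 12.0, 0.0)` would cut off the end of the line.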

So, now for some programmer ramblings...

Substudy has relatively few options, because I've found that adding too many options to a program makes it much too easy for programmers to settle for mediocre defaults. :-/ But if there are very few options, then we're forced to work extremely hard to pick the right defaults.

So if after reviewing your deck for a month or two, you still think the standard buffers are too long, I would love to hear about that, and I would be happy to consider changing the padding values for everybody. But adding a new command-line option has a much higher threshold for me than changing the defaults, as weird as that might seem!

Post by **vidale3** » Thu Dec 14, 2017 12:13 am

hey emk, really love the program substudy and it's created lots of amazing cards. The only issue I have is that some movies I'm using it for only have subtitles for my native language, and my target language subtitles are actually in the video images themselves, but the images are too small to read. Is there a way I can make the images that appear in my Anki cards larger, or is this a feature you could add in the near future? At this point I can't review them, since I can't read the subtitles :/ Really hoping to have this option to resize images, thanks emk!

Post by **Yuurei** » Thu Dec 14, 2017 8:44 am

emk wrote:Actually, the extra audio buffer is deliberate. But the reasons might not be immediately obvious:

1. The extra audio buffer means that you don't need to be ultra-precise when aligning subtitles. If you're off by a third of a second, no big deal: most of your cards will still be salvageable.
2. But more importantly, the extra audio buffer is not there for the first time you see the cards! The first time you see all the cards using Anki, all the audio will be in order, and you'll have plenty of context. And that extra buffer will seem weird. But once you start reviewing the cards, you'll mark some of them as easy, and some of them as hard, and so the review intervals will spread out. Three weeks from now, those cards will be completely shuffled, and you'll be trying to remember the context, and it will get much harder to understand isolated sentence fragments.

So the extra audio buffer on either end is mostly there for (2). Yes, it does seem weird when you're first learning the cards. But at least in my experience, that extra little bit of context is a lifesaver once the cards get shuffled and you've half-forgotten the scene.

Ah, okay, I'll give it some more time then.

emk wrote:So, now for some programmer ramblings...

Substudy has relatively few options, because I've found that adding too many options to a program makes it much too easy for programmers to settle for mediocre defaults. :-/ But if there are very few options, then we're forced to work extremely hard to pick the right defaults.

So if after reviewing your deck for a month or two, you still think the standard buffers are too long, I would love to hear about that, and I would be happy to consider changing the padding values for everybody. But adding a new command-line option has a much higher threshold for me than changing the defaults, as weird as that might seem!

That actually makes a lot of sense when you put it like that. I hadn't considered looking at it that way.

Post by **emk** » Thu Dec 14, 2017 12:01 pm

vidale3 wrote:hey emk, really love the program substudy and it's created lots of amazing cards. The only issue I have is that some movies I'm using it for only have subtitles for my native language, and my target language subtitles are actually in the video images themselves, but the images are too small to read. Is there a way I can make the images that appear in my Anki cards larger, or is this a feature you could add in the near future? At this point I can't review them, since I can't read the subtitles :/ Really hoping to have this option to resize images, thanks emk!

Hmm, this is a tricky one! Part of the reason that the images are so small is that:

If I make the images bigger, the file sizes get really huge, really fast. This means that they'll take up a lot more space on Anki's servers when you try to sync them. Substudy already uses quite a bit of space for the audio files. I've spoken to the author of Anki about this, and he hasn't complained yet, but if substudy gets more popular and we make everybody's files even bigger...
Even if we make the image files bigger, you'll probably need to review them on a computer, and not a phone, because phone screens are simply too small. So maybe you don't even want to sync after all in this case.
You'll probably need to make your own card templates, and substudy might refuse to pick the right audio track.

If you want to experiment with this idea, you could try making a local copy of substudy and tweaking it. This requires some technical knowledge, but it's not impossible! First, you would need to install Rust, as described here. On Linux or the Mac, this would be:

Code: Select all

# Install Rust.
curl https://sh.rustup.rs -sSf | sh

# Check out substudy's source code
git clone https://github.com/emk/substudy
cd substudy

Then open up the file "substudy/src/video.rs" and edit the following line:

Code: Select all

format!("scale=iw*min(1\\,min({}/iw\\,{}/ih)):-1", 240, 160);

Change the "240, 160" to something like "720, 576", which should allow full-size NTSC or PAL frames, at the cost of making the images 10 times bigger.
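To see what that filter expression does to a frame, here is a rough sketch of the same arithmetic in plain Rust. It assumes that "-1" makes the height follow the width so the aspect ratio is preserved, and it rounds to the nearest pixel, which may differ slightly from what ffmpeg does. The function name is made up for the example.

```rust
/// Approximate the output size of
/// "scale=iw*min(1\,min(max_w/iw\,max_h/ih)):-1"
/// for an input frame of `iw` x `ih` pixels.
fn scaled_size(iw: u32, ih: u32, max_w: u32, max_h: u32) -> (u32, u32) {
    let (iw_f, ih_f) = (iw as f64, ih as f64);
    // Shrink just enough to fit inside the max_w x max_h box, but never
    // enlarge the frame (the outer min(1, ...) in the filter).
    let factor = 1.0_f64.min((max_w as f64 / iw_f).min(max_h as f64 / ih_f));
    // The height is scaled by the same factor, keeping the aspect ratio.
    let w = (iw_f * factor).round() as u32;
    let h = (ih_f * factor).round() as u32;
    (w, h)
}
```

With the default "240, 160", a 720x576 PAL frame comes out at about 200x160, and a 1920x1080 frame at 240x135. With "720, 576", the PAL frame is left at its full 720x576 size.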

But this may only fix half the problem, because substudy uses the "foreign_subs" argument to pick the right audio track. So you may also need to edit this line in "substudy/src/export/csv":

Code: Select all

let foreign_lang = exporter.foreign().language;

You may need to change this to something like:

Code: Select all

use lang::Lang;
let foreign_lang = Lang::iso639("fr").unwrap();

...where "fr" is the two-letter ISO 639 code for the language you're learning. This will force substudy to always pick the audio track for that language. Then you can run:

Code: Select all

# From the "substudy" directory! (Newer versions of Cargo require an explicit --path.)
cargo install -f --path .

If this fails, I can try to answer questions. If it works, though, I'd be very interested to hear about:

The size of the resulting deck, in megabytes.
If it's really huge, what happens if you try a smaller image size? Can you still read the cards?
How long it takes to sync the first time, if you decide to sync it.
What it's like to review cards with burned in subtitles. What card format works best for you? Was it a good experience?

Post by **emk** » Sun Dec 24, 2017 8:09 pm

I've been doing lots of work behind the scenes on substudy lately, laying the groundwork for a more serious project. :-)

Some highlights:

substudy and subtitles-rs have been merged. I've merged substudy with my "Rust subtitle utilities" project. (This includes things like a vobsub2png converter, and some very preliminary work on subtitle OCR.) You can find the combined code here on GitHub.
Lots of library updates. A lot of the subtitles-rs and substudy code was relying on older libraries. I've gone through and updated almost everything to the latest libraries, with the exception of the "nom" binary format parser, which is going to be trickier. In particular, error-message handling has undergone a big overhaul.
Working on an aligned media format. Recently, davidzweig proposed a common library for working with aligned media. I spent a couple of weekends working on a proposed file format that would allow aligning videos, subtitles, audiobooks, regular ebooks, and many other kinds of media. I put together a proposal here. However, after speaking with davidzweig, he doesn't seem to be interested in formats which support "baseTrack" (which I'll explain below), but he is instead interested in formats that are 100% focused on short, aligned sections. Unfortunately, substudy really needs a more general format than that.

"baseTrack": Because I want to build a media player, and not just a "sentences" tool

As I've mentioned before, the current version of substudy works great at the beginner level, when almost everything is new and unfamiliar. But I've noticed that at more advanced levels, I might prefer to watch an episode normally, and to only make 20 cards from the most interesting sentences.

For this to work, I would need to combine a tool like substudy (or subs2srs) with a media player like Lingo. Here's the example screen shot I posted earlier, which shows a primitive prototype of what this might look like:

Now, the actual input to a player could be an mp4 video file and some *.srt files, and that would work fine. But what if we wanted to use hunalign and Aeneas to produce an audiobook with bilingual text? So I decided that we really need an "aligned media" format, which could specify both an optional media file, and one or more aligned sections of text. Here's an example of what it might look like:

Code: Select all

{
  "baseTrack": {
    "type": "media",
    "lang": "fr",
    "file": "episode1.mp4"
  },
  "alignments": [
    {
      "span": [
        10,
        15.5
      ],
      "tracks": [
        {
          "type": "html",
          "lang": "fr",
          "html": "<i>Jean &amp; Luc:</i> On y va !"
        },
        {
          "type": "html",
          "lang": "en",
          "html": "<i>Jean &amp; Luc:</i> Let's go!"
        }
      ]
    }
  ]
}

You can find more examples on GitHub.

For my purposes, the "baseTrack" portion of the file is really important. That's because I'm not just working with a heap of unrelated sentences, but I'm instead working with an actual media file. If we left out "baseTrack", and instead we gave each subtitle its own media file, then that would make it much harder to build tools which just played the episode straight through.
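As a rough illustration of why a single base track is convenient for a player, here is a sketch of how the example above might be represented and queried in Rust. All of the type and function names are hypothetical, it only models the "html" track type, it skips JSON parsing entirely, and it assumes that "span" holds start and end times in seconds on the base track:

```rust
/// The optional whole-episode media file ("baseTrack" in the example).
struct BaseTrack {
    lang: String,
    file: String,
}

/// One piece of aligned text (only the "html" track type is sketched here).
struct Track {
    lang: String,
    html: String,
}

/// A time span, assumed to be in seconds, plus the tracks aligned to it.
struct Alignment {
    span: (f64, f64),
    tracks: Vec<Track>,
}

struct AlignedMedia {
    base_track: Option<BaseTrack>,
    alignments: Vec<Alignment>,
}

impl AlignedMedia {
    /// Find the text in `lang` to show while the base track is at `time`.
    /// Spans are treated as half-open: start <= time < end.
    fn text_at(&self, time: f64, lang: &str) -> Option<&str> {
        self.alignments
            .iter()
            .find(|a| a.span.0 <= time && time < a.span.1)?
            .tracks
            .iter()
            .find(|t| t.lang == lang)
            .map(|t| t.html.as_str())
    }
}

/// Build the same data as the JSON example, by hand.
fn example() -> AlignedMedia {
    let track = |lang: &str, html: &str| Track {
        lang: lang.to_string(),
        html: html.to_string(),
    };
    AlignedMedia {
        base_track: Some(BaseTrack {
            lang: "fr".to_string(),
            file: "episode1.mp4".to_string(),
        }),
        alignments: vec![Alignment {
            span: (10.0, 15.5),
            tracks: vec![
                track("fr", "<i>Jean &amp; Luc:</i> On y va !"),
                track("en", "<i>Jean &amp; Luc:</i> Let's go!"),
            ],
        }],
    }
}
```

A player would open the one file named in `base_track`, play it straight through, and call something like `text_at` with the current playback time to decide which subtitles to show. Without a base track, every alignment would need to carry its own media clip.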

So this is why I'm going to pull out of the common library effort and go off and do something slightly different: I want to focus on tools that work with whole media files.

So what's the plan?

Here are some ideas I'd like to pursue:

Create an open source Rust library which reads and writes the "*.aligned/metadata.json" format.
Create a second library which converts to and from the "*.aligned" media format. I'll initially start by re-using parts of substudy's input and output code.
Create a sub/idx converter which can extract VobSub/MPEG-2 subtitles and turn them into PNGs. This might make it possible to do useful things with subtitles without OCRing them first.
Look into adding support for Aeneas audio/text alignment, and hunalign text/text alignment, but only for people who have already installed those tools.
Figure out how to store aligned subtitles in a database and make them searchable. Yay!

And of course, the goal would be to support all these different formats with substudy and a media player at some point. But that could take a while!

Anyway, if you're interested in the "aligned media" format, please let me know! Ideas and suggestions are very welcome, and I'd be happy to modify the format to make it work with a larger set of language learning tools.

Post by **crush** » Sun Dec 24, 2017 8:48 pm

What exactly is the issue with having a baseTrack object? The objects in the alignment array could at least follow the same format.

I'm definitely interested in where this could go. Would there be legal repercussions to putting an aligned subtitle database online (like other subtitle sites)?
