I've been doing lots of work behind the scenes on substudy lately, laying the groundwork for a more serious project.
Some highlights:
- substudy and subtitles-rs have been merged. I've merged substudy with my "Rust subtitle utilities" project. (This includes things like a vobsub2png converter, and some very preliminary work on subtitle OCR.) You can find the combined code here on GitHub.
- Lots of library updates. A lot of the subtitles-rs and substudy code was relying on older libraries. I've gone through and updated almost everything to the latest versions, with the exception of the "nom" binary format parser, which is going to be trickier. In particular, error-message handling has undergone a big overhaul.
- Working on an aligned media format. Recently, davidzweig proposed a common library for working with aligned media. I spent a couple of weekends working on a proposed file format that would allow aligning videos, subtitles, audiobooks, regular ebooks, and many other kinds of media. I put together a proposal here. However, after speaking with davidzweig, he doesn't seem to be interested in formats which support "baseTrack" (which I'll explain below), but he is instead interested in formats that are 100% focused on short, aligned sections. Unfortunately, substudy really needs a more general format than that.
"baseTrack": Because I want to build a media player, and not just a "sentences" toolAs I've mentioned before, the current version of substudy works great at the beginner level, when almost
everything is new and unfamiliar. But I've noticed that at more advanced levels, I might prefer to watch an episode normally, and to only make 20 cards from the most interesting sentences.
For this to work, I would need to combine a tool like substudy (or subs2srs) with a media player like
Lingo. Here's the example screen shot I posted earlier, which shows a primitive prototype of what this might look like:
Now, the actual input to a player could be an mp4 video file and some *.srt files, and that would work fine. But what if we wanted to use hunalign and Aeneas to produce an audiobook with bilingual text? So I decided that we really need an "aligned media" format, which could specify both an optional media file, and one or more aligned sections of text. Here's an example of what it might look like:
Code:
{
  "baseTrack": {
    "type": "media",
    "lang": "fr",
    "file": "episode1.mp4"
  },
  "alignments": [
    {
      "span": [
        10,
        15.5
      ],
      "tracks": [
        {
          "type": "html",
          "lang": "fr",
          "html": "<i>Jean & Luc:</i> On y va !"
        },
        {
          "type": "html",
          "lang": "en",
          "html": "<i>Jean & Luc:</i> Let's go!"
        }
      ]
    }
  ]
}
You can find more examples on GitHub.
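To make the structure a bit more concrete, here's a minimal Python sketch that reads a file shaped like the example, using only the standard library. It assumes the field names shown in the example ("baseTrack", "alignments", "span", "tracks") and assumes that "span" means [start, end] in seconds. The helper name `describe` is made up for illustration, and the real format may well carry more fields than this handles.

```python
import json

# The example document, inlined so the sketch is self-contained.
EXAMPLE = """
{
  "baseTrack": {"type": "media", "lang": "fr", "file": "episode1.mp4"},
  "alignments": [
    {
      "span": [10, 15.5],
      "tracks": [
        {"type": "html", "lang": "fr", "html": "<i>Jean & Luc:</i> On y va !"},
        {"type": "html", "lang": "en", "html": "<i>Jean & Luc:</i> Let's go!"}
      ]
    }
  ]
}
"""

def describe(metadata):
    """Return one summary line per alignment (hypothetical helper)."""
    lines = []
    for alignment in metadata.get("alignments", []):
        # Assumption: "span" is [start, end] in seconds on the base track.
        start, end = alignment["span"]
        langs = ", ".join(track["lang"] for track in alignment["tracks"])
        lines.append(f"{start:.1f}-{end:.1f}s: {langs}")
    return lines

metadata = json.loads(EXAMPLE)
base = metadata.get("baseTrack")  # the media file is optional
print(base["file"] if base else "(no base track)")
for line in describe(metadata):
    print(line)
```

Treating "baseTrack" as optional here matches the idea that the format can describe text-only alignments as well as alignments against a media file.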
For my purposes, the "baseTrack" portion of the file is really important. That's because I'm not just working with a heap of unrelated sentences, but I'm instead working with an actual media file. If we left out "baseTrack", and instead we gave each subtitle its own media file, then that would make it much harder to build tools which just played the episode straight through.
So this is why I'm going to pull out of the common library effort and go off and do something slightly different: I want to focus on tools that work with whole media files.
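As a rough illustration of why a single base track helps a player, here's a small Python sketch. With one media file and a list of spans, the player only needs to ask "which alignment covers the current playback position?" while the episode plays straight through. The function name `alignment_at` and the second sample sentence are made up for illustration, and the sketch assumes spans are in seconds, sorted by start time, and non-overlapping.

```python
from bisect import bisect_right

# Hypothetical alignments in the same shape as the example format.
ALIGNMENTS = [
    {"span": [10, 15.5],
     "tracks": [{"type": "html", "lang": "fr", "html": "On y va !"}]},
    {"span": [20, 22],
     "tracks": [{"type": "html", "lang": "fr", "html": "Attends !"}]},
]

def alignment_at(alignments, position):
    """Return the alignment whose span contains `position`, or None.

    Assumes alignments are sorted by start time and do not overlap.
    """
    starts = [a["span"][0] for a in alignments]
    # Index of the last alignment that starts at or before `position`.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < alignments[i]["span"][1]:
        return alignments[i]
    return None

for position in (5, 12, 17, 21):
    hit = alignment_at(ALIGNMENTS, position)
    text = hit["tracks"][0]["html"] if hit else "(nothing to show)"
    print(position, text)
```

If each subtitle had its own media clip instead, there would be no shared timeline to look positions up on, and gaps between subtitles (like second 17 here) would simply be missing.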
So what's the plan?
Here are some ideas I'd like to pursue:
- Create an open source Rust library which reads and writes the "*.aligned/metadata.json" format.
- Create a second library which converts to and from the "*.aligned" media format. I'll initially start by re-using parts of substudy's input and output code.
- Create a sub/idx converter which can extract VobSub/MPEG-2 subtitles and turn them into PNGs. This might make it possible to do useful things with subtitles without OCRing them first.
- Look into adding support for Aeneas audio/text alignment, and hunalign text/text alignment, but only for people who have already installed those tools.
- Figure out how to store aligned subtitles in a database and make them searchable. Yay!
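To give a feel for the conversion idea in the list above, here's a Python sketch that turns *.srt text into the "aligned media" shape. It is only an illustration, not the planned Rust library: the function names are made up, the output fields are copied from the example format, spans are assumed to be in seconds, and SRT styling tags such as <i> are simply escaped rather than interpreted.

```python
import html
import json
import re

# Matches SRT timestamps such as 00:00:15,500 (hours, minutes, seconds, fraction).
TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def to_seconds(h, m, s, frac):
    """Convert the captured timestamp parts to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(frac) / 10 ** len(frac)

def srt_to_aligned(srt_text, lang, media_file):
    """Build an aligned-media document from SRT text (hypothetical helper)."""
    alignments = []
    # SRT cues are separated by blank lines: index, timing line, then text.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed cues
        stamps = TIMESTAMP.findall(lines[1])
        if len(stamps) != 2:
            continue  # timing line did not have a start and an end
        text = "<br>".join(html.escape(line, quote=False) for line in lines[2:])
        alignments.append({
            "span": [to_seconds(*stamps[0]), to_seconds(*stamps[1])],
            "tracks": [{"type": "html", "lang": lang, "html": text}],
        })
    return {
        "baseTrack": {"type": "media", "lang": lang, "file": media_file},
        "alignments": alignments,
    }

SRT = """1
00:00:10,000 --> 00:00:15,500
On y va !

2
00:00:20,000 --> 00:00:22,000
Attends !
"""

doc = srt_to_aligned(SRT, "fr", "episode1.mp4")
print(json.dumps(doc, ensure_ascii=False, indent=2))
```

A second subtitle file in another language could be merged in by adding a track to each matching span, which is roughly the job that subtitle alignment would have to do.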
And of course, the goal would be to support all these different formats with substudy and a media player at some point. But that could take a while!
Anyway, if you're interested in the "aligned media" format, please let me know! Ideas and suggestions are very welcome, and I'd be happy to modify the format to make it work with a larger set of language learning tools.