- A Software Library for Creating Language Learning Materials (such as bilingual texts)
- substudy: Make Anki cards and other resources from video & bilingual subtitles (command-line)
- Rust subtitle utilities project
What's this all about?
Recently, davidzweig made a really great proposal for creating a library that works with aligned sentences and audio. I would love to have something like this, because I would love to make substudy work with many more kinds of media. But after talking with him for a while, it turned out that we had very different goals: he was mostly interested in individual sentences, whereas I needed something more ambitious that focused on entire video files and similar media. He suggested that I might need another format for use with substudy, so I decided to go ahead and make one.
This is still very much a work in progress, but I have now created "first drafts" of many of the critical pieces:
- A specification (details here).
- Example metadata.json files for subtitles and audiobooks.
- A Rust implementation of the specification.
- A web-based validation tool that you can use in your browser (no data is uploaded)! This was ridiculously fun to build, because it was my first serious WebAssembly project.
So next, some concrete examples!
Sample 1: A video with subtitles
This file could be used as an input to the substudy-based video player that I'm working on. It specifies a video file and a set of aligned HTML subtitles.
{
"baseTrack": {
"type": "media",
"lang": "fr",
"file": "episode1.mp4"
},
"alignments": [
{
"timeSpan": [
10,
15.5
],
"tracks": [
{
"type": "html",
"lang": "fr",
"html": "<i>Jean & Luc:</i> On y va !"
},
{
"type": "html",
"lang": "en",
"html": "<i>Jean & Luc:</i> Let's go!"
}
]
}
]
}
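To give a feel for how a tool might consume a file like this, here is a minimal sketch in Python that parses the metadata and runs a couple of basic sanity checks. The field names (`alignments`, `timeSpan`, `tracks`) come from the sample above, but the checks themselves are just illustrative and are not the official validator's rules:

```python
import json

def check_alignments(metadata):
    """Run a few basic sanity checks on a parsed metadata.json dict.

    These checks are illustrative only: a timeSpan should be increasing,
    and each alignment should carry at least one track.
    """
    problems = []
    for i, alignment in enumerate(metadata.get("alignments", [])):
        span = alignment.get("timeSpan")
        if span is not None:
            start, end = span
            if not start < end:
                problems.append(f"alignment {i}: timeSpan {span} is not increasing")
        if not alignment.get("tracks"):
            problems.append(f"alignment {i}: no tracks")
    return problems

# The same data as the sample above, parsed from a JSON string.
sample = json.loads("""
{ "baseTrack": {"type": "media", "lang": "fr", "file": "episode1.mp4"},
  "alignments": [
    { "timeSpan": [10, 15.5],
      "tracks": [
        {"type": "html", "lang": "fr", "html": "<i>Jean & Luc:</i> On y va !"},
        {"type": "html", "lang": "en", "html": "<i>Jean & Luc:</i> Let's go!"}
      ] }
  ] }
""")

print(check_alignments(sample))  # an empty list: no problems found
```

The real validator does much more than this, of course; the point is just that the format is ordinary JSON that any language can pick apart in a few lines.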
Sample 2: The same video after being processed by substudy
But what if we didn't care about the video as a whole, and just wanted individual sentences with audio and images? Well, we could feed the above file through a tool like substudy, and it might output:
{
"alignments": [
{
"timeSpan": [
10,
15.5
],
"tracks": [
{
"type": "html",
"lang": "fr",
"html": "<i>Jean & Luc:</i> On y va !"
},
{
"type": "html",
"lang": "en",
"html": "<i>Jean & Luc:</i> Let's go!"
},
{
"type": "image",
"file": "episode1_12_75.jpg"
},
{
"type": "media",
"lang": "fr",
"file": "episode1_9_00_16_50.mp3"
}
]
}
]
}
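Once the data looks like this, turning each alignment into a flashcard is mostly a matter of picking fields out of the tracks. Here is a rough sketch of that step; the track `type` and `lang` keys come from the sample above, but the output field names (`front`, `back`, `image`, `audio`) and the hard-coded language pair are just one possible card layout, not anything defined by the format:

```python
def alignment_to_note(alignment, foreign_lang="fr", native_lang="en"):
    """Flatten one alignment into a dict of card-style fields.

    Illustrative only: picks the foreign-language HTML as the front,
    the native-language HTML as the back, plus any image and audio files.
    """
    note = {}
    for track in alignment["tracks"]:
        kind = track["type"]
        if kind == "html" and track.get("lang") == foreign_lang:
            note["front"] = track["html"]
        elif kind == "html" and track.get("lang") == native_lang:
            note["back"] = track["html"]
        elif kind == "image":
            note["image"] = track["file"]
        elif kind == "media":
            note["audio"] = track["file"]
    return note

# The alignment from the sample above, as a Python dict.
alignment = {
    "timeSpan": [10, 15.5],
    "tracks": [
        {"type": "html", "lang": "fr", "html": "<i>Jean & Luc:</i> On y va !"},
        {"type": "html", "lang": "en", "html": "<i>Jean & Luc:</i> Let's go!"},
        {"type": "image", "file": "episode1_12_75.jpg"},
        {"type": "media", "lang": "fr", "file": "episode1_9_00_16_50.mp3"},
    ],
}

note = alignment_to_note(alignment)
print(note["audio"])  # episode1_9_00_16_50.mp3
```

A real exporter would also need to copy the image and audio files into the deck's media folder, but the mapping itself is this simple.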
This is roughly analogous to the format that was proposed in the other thread.
Implementing support
I want to implement support for generating these files in several ways:
- Using substudy to convert a video and either one or two subtitle files.
- Using hunalign and Aeneas to produce an audiobook with fully synchronized L1 and L2 text, in a format that can be converted to Anki cards, or played directly using the new substudy media player.
Contributions and ideas welcome!
If you're the author of a language-learning tool (or even if you just have some scripts lying around that you use to prepare your materials), I'd be very interested in your feedback. In particular, if you'd like to propose a change to the format, you can just submit an issue or a PR on GitHub.
Now I'm almost ready to add a "substudy export aligned" command. But first I want to improve the validator a bit, just to ensure that I actually generate valid data!