crush wrote:What exactly is the issue with having a baseTrack object? The objects in the alignment array could at least follow the same format.
The "baseTrack" element and the elements in the "tracks" array actually do have the same format. For more details, see:
The idea is that the "baseTrack" describes the main episode, film or audiobook. Then for each subtitle, we produce an "alignment", which specifies a span of time. And each "alignment" can contain one or more tracks—typically one subtitle track per language, but there could also be an image track or a short audio file, etc., if we're making cards. It looks like this:
mediaFile
  baseTrack
  alignments (repeated)
    span: The start and end time for this alignment.
    tracks: Individual subtitles, images, etc., for this alignment.
This probably doesn't make much sense, which is why I have a bunch of examples.
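To make the outline above a bit more concrete, here is a rough sketch of how it might map onto Rust types. This is not the actual implementation or the real schema: only the names baseTrack, alignments, span and tracks come from the outline. Everything else (the `TrackKind` variants, the `lang`, `text` and `file` fields, times as seconds in an `f64`, the `episode.mp4` file name and the sample sentences) is made up for illustration.

```rust
/// What a track holds. The variant names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum TrackKind {
    Media, // the main episode, film or audiobook
    Subtitle,
    Image,
    Audio,
}

/// One track. The base track and the tracks inside an alignment share this
/// type, mirroring the point that they have the same format.
/// All field names here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Track {
    pub kind: TrackKind,
    pub lang: Option<String>,
    pub text: Option<String>,
    pub file: Option<String>,
}

/// Start and end time of an alignment (assumed to be seconds).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub begin: f64,
    pub end: f64,
}

/// A span of time plus the tracks that belong to it.
#[derive(Debug, Clone, PartialEq)]
pub struct Alignment {
    pub span: Span,
    pub tracks: Vec<Track>,
}

/// Top-level element: one base track and a repeated list of alignments.
#[derive(Debug, Clone, PartialEq)]
pub struct MediaFile {
    pub base_track: Track,
    pub alignments: Vec<Alignment>,
}

impl Track {
    /// Convenience constructor for a subtitle track in one language.
    pub fn subtitle(lang: &str, text: &str) -> Track {
        Track {
            kind: TrackKind::Subtitle,
            lang: Some(lang.to_string()),
            text: Some(text.to_string()),
            file: None,
        }
    }
}

impl MediaFile {
    /// Collect (span, text) for every subtitle track in the given language.
    pub fn subtitles(&self, lang: &str) -> Vec<(Span, &str)> {
        let mut out = Vec::new();
        for alignment in &self.alignments {
            for track in &alignment.tracks {
                if track.kind == TrackKind::Subtitle && track.lang.as_deref() == Some(lang) {
                    if let Some(text) = track.text.as_deref() {
                        out.push((alignment.span, text));
                    }
                }
            }
        }
        out
    }
}

/// A made-up two-line episode with Spanish and English subtitles.
pub fn example_episode() -> MediaFile {
    MediaFile {
        base_track: Track {
            kind: TrackKind::Media,
            lang: Some("es".to_string()),
            text: None,
            file: Some("episode.mp4".to_string()), // hypothetical file name
        },
        alignments: vec![
            Alignment {
                span: Span { begin: 12.0, end: 14.5 },
                tracks: vec![
                    Track::subtitle("es", "¿Dónde estás?"),
                    Track::subtitle("en", "Where are you?"),
                ],
            },
            Alignment {
                span: Span { begin: 15.0, end: 17.0 },
                tracks: vec![Track::subtitle("es", "Estoy aquí.")],
            },
        ],
    }
}
```

With this sketch, `example_episode().subtitles("en")` would return only the first alignment's English line together with its span, while the second alignment has a single Spanish track. An image or short audio clip for a card would just be another `Track` in the same `tracks` list.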
You could also use this format for something more exotic, like a PDF of a graphic novel, and then create one "alignment" element for each frame of the comic. I'll eventually post a few more examples.
crush wrote:I'm definitely interested in where this could go, would there be legal repercussions to putting up an aligned subtitle database online (like other subtitles sites)?
That's a really good question, and the practical answer may vary considerably from country to country. Historically, very few copyright holders have complained about the Open Subtitles site or the subtitles in the OPUS research corpus. That doesn't mean they couldn't, though. In any case, I wouldn't personally distribute other people's copyrighted works, and I encourage other people to follow their local laws.
I would, however, love to prepare some interesting public domain examples.
But for now, I need to work on a Rust implementation. Talking is fun, code is better.