substudy: Make Anki cards and other resources from video & bilingual subtitles (command-line)

All about language programs, courses, websites and other learning resources
emk
Black Belt - 1st Dan
Posts: 1690
Joined: Sat Jul 18, 2015 12:07 pm
Location: Vermont, USA
Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish.
Language Log: viewtopic.php?f=15&t=723

Re: substudy: Make Anki cards and other resources from video & bilingual subtitles (command-line)

Postby emk » Sun Dec 24, 2017 10:45 pm

crush wrote: What exactly is the issue with having a baseTrack object? The objects in the alignment array could at least follow the same format.

The "baseTrack" element and the elements in the "tracks" array actually do have the same format. For more details, see:

The idea is that the "baseTrack" describes the main episode, film or audiobook. Then for each subtitle, we produce an "alignment", which specifies a span of time. And each "alignment" can contain one or more tracks—typically one subtitle track per language, but there could also be an image track or a short audio file, etc., if we're making cards. It looks like this:

mediaFile
  baseTrack
  alignments (repeated)
      span: The start and end time for this alignment.
      tracks: Individual subtitles, images, etc., for this alignment.

This probably doesn't make much sense, which is why I have a bunch of examples. :-)
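To make the nesting concrete, here is a minimal Python sketch of a `mediaFile` with one `baseTrack` and a list of `alignments`, each holding a `span` and its `tracks`. Only the names `baseTrack`, `alignments`, `span` and `tracks` come from the post. The fields inside each track (`type`, `lang`, `src`, `text`) and inside the span (`start`, `end`) are hypothetical and may not match substudy's real JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Track:
    # The base track and the per-alignment tracks share one shape.
    # These field names are hypothetical, chosen for illustration only.
    type: str                   # e.g. "video", "subtitle", "image"
    lang: Optional[str] = None  # language code for subtitle tracks
    src: Optional[str] = None   # path to a media file, if any
    text: Optional[str] = None  # subtitle text, if any


@dataclass
class Span:
    start: float  # seconds from the start of the base track
    end: float


@dataclass
class Alignment:
    span: Span
    tracks: List[Track] = field(default_factory=list)


@dataclass
class MediaFile:
    base_track: Track
    alignments: List[Alignment] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists;
        # we only rename base_track to the camelCase key "baseTrack".
        data = {
            "baseTrack": asdict(self.base_track),
            "alignments": [asdict(a) for a in self.alignments],
        }
        return json.dumps(data, indent=2, ensure_ascii=False)


episode = MediaFile(
    base_track=Track(type="video", src="episode01.mkv"),
    alignments=[
        Alignment(
            span=Span(start=12.5, end=14.75),
            tracks=[
                Track(type="subtitle", lang="es", text="¿Dónde está?"),
                Track(type="subtitle", lang="en", text="Where is it?"),
            ],
        ),
    ],
)
print(episode.to_json())
```

Because the same `Track` type is used for both `baseTrack` and every entry in `tracks`, code that reads one can read the other, which is the point about the two having the same format.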

You could also use this format for something more exotic, like a PDF of a graphic novel, and then create one "alignment" element for each frame of the comic. I'll eventually post a few more examples.

crush wrote: I'm definitely interested in where this could go. Would there be legal repercussions to putting up an aligned subtitle database online (like other subtitle sites)?

That's a really good question, and the practical answer may vary considerably from country to country. Very few copyright holders have historically complained about the Open Subtitles site or the subtitles in the OPUS research corpus. That doesn't mean they couldn't, though. In any case, I wouldn't personally distribute other people's copyrighted works, and I encourage other people to follow their local laws.

I would, however, love to prepare some interesting public domain examples.

But for now, I need to work on a Rust implementation. Talking is fun, code is better. ;-)

MorkTheFiddle
Black Belt - 2nd Dan
Posts: 2132
Joined: Sat Jul 18, 2015 8:59 pm
Location: North Texas USA
Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
Language Log: https://forum.language-learners.org/vie ... 11#p133911

Postby MorkTheFiddle » Wed Dec 27, 2017 8:31 pm

emk wrote:
crush wrote: I'm definitely interested in where this could go. Would there be legal repercussions to putting up an aligned subtitle database online (like other subtitle sites)?

That's a really good question, and the practical answer may vary considerably from country to country. Very few copyright holders have historically complained about the Open Subtitles site or the subtitles in the OPUS research corpus. That doesn't mean they couldn't, though. In any case, I wouldn't personally distribute other people's copyrighted works, and I encourage other people to follow their local laws.

I would, however, love to prepare some interesting public domain examples.


Could a legal objection be that someone with an aligned subtitle file plus the video effectively has a pirated copy of the video?
Or, as a variation, could someone reverse-engineer a subtitle file with video and audio back into the original movie?
Many things which are false are transmitted from book to book, and gain credit in the world. -- attributed to Samuel Johnson

John Alexander
Posts: 4
Joined: Thu Nov 09, 2017 8:44 am
Languages: English (N), Mandarin (A2)

Postby John Alexander » Mon Jun 18, 2018 10:41 am

Might be helpful for some here.

It has been a huge pain trying to get Chinese movies with subs. So many have only traditional-character, hard-coded subs, with no soft subs to be found.

Recently I've been using FlixGrab to grab shows from Netflix. I just downloaded the entire series of Avatar with a Mandarin dub and soft subs. Throw it all into substudy and you get great cards, just like the OP's original ones in Spanish.

https://www.freegrabapp.com/flixgrab

(Note that you do have to trust FlixGrab with your Netflix password. Not a big deal for me, but FYI.)

Uncle Roger
Orange Belt
Posts: 154
Joined: Tue Sep 05, 2017 8:53 am
Languages: Italian (Native), English (as good as you see me write it here?), Norwegian (C1?), French (B2), Swedish (B1?)

Postby Uncle Roger » Mon Jun 18, 2018 5:55 pm

I'm new to this thread, but I'm a big fan of Subs2SRS, Anki and the LIE approach.

I was wondering if there has ever been an attempt to create a repository of high-quality, peer-reviewed Anki decks made from movies with bilingual subtitles.

Subtitles in the audio language would be word-for-word transcriptions and should therefore be checked.
The translation into the target language should also be checked.
Subtitle timings should also be checked and probably extended a bit, since AnkiDroid (for instance) tends to cut off the last 0.3 seconds of audio clips.
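As a rough illustration of that timing fix, here is a minimal Python sketch that extends each subtitle's end time by 0.3 seconds without running into the next subtitle. The `Cue` type and `pad_cues` function are hypothetical and not part of substudy or Subs2SRS; 0.3 s is simply the figure mentioned for AnkiDroid.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Cue:
    # Hypothetical subtitle cue; times are in seconds.
    start: float
    end: float
    text: str


def pad_cues(cues: List[Cue], pad: float = 0.3) -> List[Cue]:
    """Extend each cue's end by `pad` seconds, stopping at the next cue's start."""
    ordered = sorted(cues, key=lambda c: c.start)
    padded = []
    for i, cue in enumerate(ordered):
        new_end = cue.end + pad
        if i + 1 < len(ordered):
            # Clamp to the next cue's start, but never shorten the cue itself.
            new_end = min(new_end, max(cue.end, ordered[i + 1].start))
        padded.append(replace(cue, end=new_end))
    return padded


cues = [Cue(0.0, 2.0, "Hola."), Cue(2.1, 4.0, "¿Qué tal?"), Cue(5.0, 6.0, "Bien.")]
for cue in pad_cues(cues):
    print(f"{cue.start:.2f}-{cue.end:.2f} {cue.text}")
```

Clamping against the next cue's start keeps padded clips from overlapping, at the cost of getting less padding during rapid-fire dialogue.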

So yes, each submission would be a bit of work, but pooling our efforts would pay off. Whoever wants to join the circle should contribute one such submission.

Too crazy? Too idealistic?
«If you want to get laid, go to college. If you want an education, go to the library.»
Frank Zappa

rdearman
Site Admin
Posts: 7255
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836

Postby rdearman » Mon Jun 18, 2018 6:10 pm

Copyright violation is the reason.
Read 150 books in 2024: 26 / 150

My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter

I post on this forum with mobile devices, so excuse short msgs and typos.

Uncle Roger

Postby Uncle Roger » Mon Jun 18, 2018 6:17 pm

What if we are not making money with it (which was never part of my proposal)?

Shouldn't that be enough to keep the corporate lawyers at bay?

rdearman

Postby rdearman » Mon Jun 18, 2018 7:50 pm

Uncle Roger wrote:What if we are not making money with it (which was never part of my proposal)?

Shouldn't that be enough to keep the corporate lawyers at bay?

Errr, no. It doesn't matter that you give it away for free. The Pirate Bay gave them away for free, and those guys went to jail.

Uncle Roger

Postby Uncle Roger » Mon Jun 18, 2018 7:58 pm

Yeah, but in that case they were interfering with the movie's revenue stream. We wouldn't be. But I guess they'd find something else to jail us for!

rdearman

Postby rdearman » Mon Jun 18, 2018 8:02 pm

Basically, it is their property and you would need their permission.

arthaey
Brown Belt
Posts: 1080
Joined: Sat Jul 18, 2015 9:11 pm
Location: Seattle, WA, USA
Languages:
EN (native);
ES (adv receptive, int productive);
FR (false beginner);
DE (lapsed beg);
ASL (lapsed beg);
HU (tourist)
Language Log: viewtopic.php?f=15&t=3864&view=unread#unread

Postby arthaey » Mon Jun 18, 2018 8:06 pm

Your risk tolerance is up to you, but I wouldn't share any substudy decks based on ripped copyrighted material without consulting a lawyer. :/
Posts in: French, German, Hungarian, Spanish
NaNoWriMo: 10,000 words
Corrections welcome in any language; I prefer an informal register.

