substudy: Make Anki cards and other resources from video & bilingual subtitles (command-line)

kunsttyv
Yellow Belt
Posts: 93
Joined: Mon Aug 03, 2015 11:24 am
Location: Trondheim
Languages: Norwegian (native)
Spanish (learning)
x 178

Postby kunsttyv » Tue Dec 05, 2017 8:34 pm

Hey emk, thanks a lot for this amazing software. I tried it out just now, and everything worked perfectly on the first try. It's super fast at extracting the images and sound clips, too. I used the Linux binary.

Look at this greatness:

Image

Just one tiny thing. Sorry if this has been asked before, but is it possible to reference a subfolder within collection.media for the media files? So that I don't have to deal with tens of thousands of files in the same folder.

(And one forum-related question: does the img tag have any resizing attribute? That way I won't have to fire up GIMP every time I want to post an image to the forum.)

Again, thanks for a kick-ass application!
1 x

emk
Brown Belt
Posts: 1287
Joined: Sat Jul 18, 2015 12:07 pm
Location: Vermont, USA
Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish.
Language Log: viewtopic.php?f=15&t=723
x 4076

Postby emk » Tue Dec 05, 2017 10:02 pm

kunsttyv wrote:Look at this greatness:

That is a very cool card! And thank you for all your kind words!

kunsttyv wrote:Just one tiny thing. Sorry if this has been asked before, but is it possible to reference a subfolder within collection.media for the media files? So that I don't have to deal with tens of thousands of files in the same folder.

You'd have to ask the author of Anki. As far as I know, that folder is internally managed by Anki, and you should never need to do more than just throw new files in. If I recall correctly, Anki can even find and delete "orphan" files that aren't linked to a card.

kunsttyv wrote:(And one forum-related question: does the img tag have any resizing attribute? That way I won't have to fire up GIMP every time I want to post an image to the forum.)

Not that I'm aware of. I use a dodgy shell script that uses "convert -resize" and "aws s3 sync" to upload images to a bucket on S3.
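For anyone curious what such a script might look like, here is a minimal Python sketch of the same idea. It is not emk's actual script: the function names, the 800-pixel width, the file names and the bucket URI are all made up for illustration. It assumes ImageMagick's `convert` and the AWS CLI are installed and on the PATH.

```python
import subprocess


def resize_command(src, dst, width):
    """Build an ImageMagick command that scales `src` to `width` pixels wide.

    The geometry "800x" (width only) keeps the aspect ratio.
    """
    return ["convert", src, "-resize", f"{width}x", dst]


def sync_command(local_dir, s3_uri):
    """Build an AWS CLI command that uploads new or changed files to S3."""
    return ["aws", "s3", "sync", local_dir, s3_uri]


def resize_and_upload(src, dst, width, local_dir, s3_uri):
    """Run both steps, stopping with an exception if either one fails."""
    subprocess.run(resize_command(src, dst, width), check=True)
    subprocess.run(sync_command(local_dir, s3_uri), check=True)


# Hypothetical file names and bucket, shown only to illustrate the commands.
print(" ".join(resize_command("card.png", "out/card.png", 800)))
print(" ".join(sync_command("out", "s3://example-bucket/forum-images")))
```

Only `resize_and_upload` actually runs the external tools; the two print lines just show the commands that would be executed.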
1 x

kelciour
White Belt
Posts: 11
Joined: Wed Feb 01, 2017 11:39 pm
Languages: Russian (N), English (self-study)
x 17

Postby kelciour » Wed Dec 06, 2017 10:41 am

kunsttyv wrote:Just one tiny thing. Sorry if this has been asked before, but is it possible to reference a subfolder within collection.media for the media files? So that I don't have to deal with tens of thousands of files in the same folder.
Unfortunately, Anki and AnkiMobile don't support subfolders in the collection.media folder (see the official support forum). AnkiDroid (and the desktop version of Anki) can play files from subfolders, but you have to sync media manually instead of using AnkiWeb, you may need to use '/' as the path separator instead of a backslash ('\'), and the "Check Media..." and "Export as .apkg" options won't work either.

[sound:Mulan_1998.media/Mulan_1998_0.09.05.754-0.09.23.355.mp3]
0 x

Postby emk » Wed Dec 06, 2017 10:57 am

kelciour wrote:Unfortunately, Anki and AnkiMobile don't support subfolders in the collection.media folder (see the official support forum). AnkiDroid (and the desktop version of Anki) can play files from subfolders, but you have to sync media manually instead of using AnkiWeb, you may need to use '/' as the path separator instead of a backslash ('\'), and the "Check Media..." and "Export as .apkg" options won't work either.

As far as I'm concerned, the "collection.media" subdirectory belongs to Anki, which is fully responsible for playing, synchronizing and deleting the files therein. And since Anki prefers to keep all files in a single top-level directory, that's what substudy generates. :-)

In a future version, I hope to provide a new substudy export format and an Anki import plugin that work together, so that users don't have to mess around directly with CSVs, media files and Anki card templates. The goal is to just click a couple of buttons and have everything work. I already wrote an Anki import plugin for my SRS Collector tool, so I can probably just recycle that with a few modifications.
0 x

Postby kunsttyv » Wed Dec 06, 2017 12:20 pm

Great, I'll just keep everything in the top-level directory then, since I'm using AnkiWeb and I want it to continue to sync everything as flawlessly as it has been doing up until this point.

emk: an Anki import plug-in could be handy, but I just want to point out that it took me maybe less than a minute from the time the .csv and media files were created until I had a working Anki deck of 776 cards. The instructions were crystal clear. So I don't know how crucial such a plug-in would be. But then again I have previous experience with deck creation and the process might not be as obvious to everyone.

My Spanish has gotten relatively advanced lately, and I mostly work extensively with native media. I had been thinking that this type of card is better suited to a more elementary level. But now I wonder... Now that I've realized how quickly and painlessly I can create these decks, maybe it's worth giving it a go to see what it can do for my Spanish.

Here are some rough thoughts. It will take me no more than five minutes total to create a deck from one of my movies. Then I'll end up with let's say about 1000 cards. A lot of them will be broken, too easy, not interesting, annoying, unclear or not any fun. I'll just insta-delete them as I go. Let's assume I delete 80% of them. Then I will end up with 200 cards. With 20 new cards a day, the deck will last me 10 days, and then I can just throw a new movie in there.
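The back-of-the-envelope numbers above can be checked with a few lines of Python. This is just a sketch of the arithmetic; the function name and parameters are made up for illustration.

```python
import math


def deck_days(total_cards, delete_percent, new_per_day):
    """Return (cards kept, days the deck lasts) for a given deletion rate."""
    kept = total_cards * (100 - delete_percent) // 100
    return kept, math.ceil(kept / new_per_day)


# 1000 cards, 80% deleted, 20 new cards a day.
print(deck_days(1000, 80, 20))  # → (200, 10)
```

Changing `delete_percent` shows how quickly a deck shrinks when more of the cards turn out to be too easy.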

This might be a good way to expand my vocabulary and maybe especially to expand my knowledge of idioms and colloquialisms. But then again, at an advanced level, it might be better to just keep up the extensive activities. What do you think?
0 x

Postby emk » Wed Dec 06, 2017 3:21 pm

kunsttyv wrote:emk: an Anki import plug-in could be handy, but I just want to point out that it took me maybe less than a minute from the time the .csv and media files were created until I had a working Anki deck of 776 cards. The instructions were crystal clear. So I don't know how crucial such a plug-in would be. But then again I have previous experience with deck creation and the process might not be as obvious to everyone.

I'd really love to make it easy for people who don't know much about Anki! I was pretty happy with how my old SRS Collector worked: every time you started Anki, it would check for new cards to import, and handle everything automatically. It would even create any necessary templates.

kunsttyv wrote:My Spanish has gotten relatively advanced lately, and I mostly work extensively with native media. I had been thinking that this type of card is better suited to a more elementary level. But now I wonder... Now that I've realized how quickly and painlessly I can create these decks, maybe it's worth giving it a go to see what it can do for my Spanish.

OK, so here's what I'm wrestling with. :-) I've found that subs2srs-style audio cards work great even at advanced levels. Most of my experiments with this involved MC Solaar rap songs. I tried three different techniques:

  1. Listening repeatedly with a transcript. This helped me understand the song for a little while, but I quickly forgot the harder details and I couldn't understand the tricky parts a few months later.
  2. Listening repeatedly without a transcript, transcribing the audio, and checking against a transcript. This worked much better than (1). It takes a fair bit of time, and it can be slightly frustrating, but I understood these songs better and continue to understand most of the tricky bits long afterwards.
  3. Turning a song into subs2srs cards. This worked even better than (2), and it burnt the songs into my memory. Years afterwards, I can still hear virtually every single syllable, no matter how fast the speech or how obscure the cultural references.

I strongly recommend (2). It's a great technique, especially without access to technology. But at least for me, (3) has worked even better. There's something interesting about spaced repetition and sound. Somehow spaced repetition allows me to go beyond the standard echoic memory (where you can hear an "echo" of what you just heard a few seconds ago), and drop specific phrases into either my long-term auditory memory or perhaps my music-related memory. Once a sound card has "matured" after 25 to 30 days, I can "hear" the dialog in my head the same way I can hear the lyrics to 80s pop songs: My brain preserves the rhythm, the intonation, even the nuances of unfamiliar vowels.

This isn't a miracle technique or anything, but it does offer a really nice boost. It helps with "decoding" fast audio and with remembering lots of useful fragments of speech. Basically, I'm treating language-learning as a brute-force memory exercise, with a focus on accurate sound memory.

But there's a practical problem at advanced levels. At the beginner level, it makes sense to just turn an entire episode into cards. But once I reach B1, or even C1, I can already understand a large chunk of the audio with no problems. And at that point, spending two hours cleaning and importing an episode is a waste of time. After all, I'm going to throw out 95% of the cards.

So what I really want is to watch the episode normally, and just make maybe 10 cards from dialog I didn't understand. But this only makes sense if all the tools get a lot better, and if all the pointless manual work gets automated. Ideally, you should just open up a video file (with embedded image subtitles!) in a media player, watch normally, mark a few cards, and have everything show up magically in your spaced repetition software.

I would love to do this with difficult French series like Engrenages and Kaamelott, and with Le Trône de fer in French. I can watch all of these series without subtitles and enjoy them, but I still miss more of the dialog than I'd like. And since (3), above, is the single most effective technique I've tried for "intensive" listening, I want to try to apply it.

Of course, I do often wonder whether I'm just optimizing everything to work well for me, personally, and ignoring what works best for other people. I figure the only way to find out is to build something and let people try it!
4 x

crush
Green Belt
Posts: 259
Joined: Mon Nov 30, 2015 3:35 pm
Languages: :
Speak:
--English, Spanish, Mandarin
Study:
--Basque, Cantonese
x 328

Postby crush » Thu Dec 07, 2017 3:54 am

emk wrote:Of course, I do often wonder whether I'm just optimizing everything to work well for me, personally, and ignoring what works best for other people. I figure the only way to find out is to build something and let people try it!

Even if it just works really well for you, that's still a great accomplishment ;) I've gone through my more advanced languages (Spanish/Mandarin) with Subs2SRS and I love the format, I love the cards, but like you said, it takes me much longer each day because I'm throwing out 50+ cards and only studying 10. Being able to watch a movie or even listen to a song and mark which sentences you want to learn would be like a Readlang for listening comprehension.
0 x

Postby kunsttyv » Thu Dec 07, 2017 9:04 am

emk, would it be sufficient if in VLC, while watching a movie, we could just push a button to create a timestamped bookmark? I imagine the best thing would be if it created a simple text file alongside the video file upon the first bookmark, and then continued to append to the file on subsequent bookmarks. And then we could call substudy with an option to only export bookmarked dialogues: "substudy export csv bookmarks [video] [sub1] [sub2]"

We would have to write a VLC plugin for this to work of course. The bad thing is that this will clutter up this sleek software with external dependencies. And that we will have to watch the movie with VLC on our computer.
0 x

substudy v0.4.5: Lots of minor "quality-of-life" fixes

Postby emk » Fri Dec 08, 2017 4:35 pm

Yay! A new substudy has been released! Among other things, this one includes official command-line binaries, a progress bar, and support for SRT files generated by the Aeneas audiobook aligner.

The Mac binaries are still compiling over on TravisCI, and might not be available until later today. As always, your feedback on the binaries is welcome.

crush wrote:Being able to watch a movie or even listen to a song and mark which sentences you want to learn would be like a Readlang for listening comprehension.

Yes! A "readlang/LWT for listening comprehension" has been my dream, even back before readlang existed. :-) And yes, I really do want to have the ability to just select a few cards while watching a movie.

kunsttyv wrote:emk, would it be sufficient if in VLC, while watching a movie, we could just push a button to create a timestamped bookmark? I imagine the best thing would be if it created a simple text file alongside the video file upon the first bookmark, and then continued to append to the file on subsequent bookmarks. And then we could call substudy with an option to only export bookmarked dialogues: "substudy export csv bookmarks [video] [sub1] [sub2]"

You could probably do this now if somebody wrote a script to parse an SRT file and discard every subtitle that wasn't near a bookmark.
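A rough Python sketch of such a script is below. It is only an illustration, not part of substudy: it assumes the bookmarks are already available as a list of times in seconds (the bookmark file format is not defined anywhere in this thread), and the names `parse_srt`, `near_bookmarks`, `to_srt` and the 2-second `margin` are all made up for the example.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:09:05,754 --> 00:09:23,355".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Parse SRT text into cues, skipping blocks without a timing line."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def near_bookmarks(cues, bookmarks, margin=2.0):
    """Keep cues whose time span, widened by `margin` seconds, contains a bookmark."""
    return [
        cue for cue in cues
        if any(cue.start - margin <= b <= cue.end + margin for b in bookmarks)
    ]


def _stamp(t):
    """Format a time in seconds as an SRT timestamp."""
    ms = round(t * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues):
    """Serialize cues back to SRT text, renumbering from 1."""
    blocks = [
        f"{n}\n{_stamp(c.start)} --> {_stamp(c.end)}\n{c.text}"
        for n, c in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

The same list of bookmarks would need to be applied to both subtitle files, and the filtered files could then presumably be passed to substudy in place of the full ones.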

Personally, I do want to build an Electron/HTML 5 video player with a bunch of nice features. I know how to do this, and it can be made to work very nicely cross-platform. Plus, adding new UI features would be easier than with VLC, at least for me.

Also, davidzweig and I are going to discuss a file format for parallel subtitles & audiobooks, and see if we can't get a bunch of language learning tools to all support the same underlying format.
0 x

rdearman
Site Admin
Posts: 2728
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
French (studies), Italian (studies), Mandarin (studies),
Esperanto TAC (Only god knows why), Finnish (only in it for the cookies)
Language Log: viewtopic.php?f=15&t=1836
x 5688

Postby rdearman » Fri Dec 08, 2017 5:53 pm

Don't suppose you could try to include hunalign?
0 x
"Never blame on malice that which can be explained by stupidity."