Homepage
substudy is a command-line tool for working with subtitles and video, similar to subs2srs. You can use it to make:
- Anki cards with audio, bilingual text and images.
- Bilingual subtitles.
- MP3 audio tracks with only the dialog from a video.
- "Review" pages showing the text of all the subtitles.
Original Post
Do you use MacOS X or Linux? Are you familiar with the command line? Would you like to watch TV with bilingual subtitles? If so, check this out.
Last year, I wrote a command-line tool called substudy, which should make it a lot easier to generate high-quality bilingual subtitle files. Available options include:
Subtitle processing tools for students of foreign languages
Usage: substudy clean <subtitles>
       substudy combine <foreign-subtitles> <native-subtitles>
       substudy --help
For now, all subtitles must be in *.srt format. Many common encodings
will be automatically detected, but try converting to UTF-8 if you
have problems.
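If automatic detection fails and you need to convert a file to UTF-8 yourself, something like the following Python sketch may help. It is not part of substudy: the function name `to_utf8` and the list of fallback encodings are assumptions chosen for illustration, and you may need to adjust the list for your language.

```python
from pathlib import Path

# Fallback encodings to try, in order (an assumption; adjust for your language).
# "latin-1" accepts any byte sequence, so it must come last.
CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")


def to_utf8(src, dst, encodings=CANDIDATES):
    """Re-encode the file at src as UTF-8 in dst; return the encoding that worked."""
    raw = Path(src).read_bytes()
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeError:
            continue  # wrong guess, try the next encoding
        # Write bytes directly so line endings are left exactly as they were.
        Path(dst).write_bytes(text.encode("utf-8"))
        return enc
    raise ValueError(f"could not decode {src} with any of {encodings}")
```

Calling `to_utf8("episode.es.srt", "episode.es.utf8.srt")` (hypothetical file names) writes a UTF-8 copy and tells you which encoding it guessed. Check the result by eye, because a wrong guess can still decode without raising an error.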
substudy has a lot of useful features, including:
- It automatically detects the encoding of the subtitle files, so you don't have to think (too much) about mixing and matching encodings.
- It tries to align and combine parallel subtitles, so that the two languages stay more-or-less in sync.
- It adjusts subtitle timing, so that subtitles appear earlier and stick around longer, giving you as much time to read and listen as possible.
- It cleans up crufty subtitles, including sound effects, speaker names, and other common clutter.
- It's free and open source.
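To give a feel for what the cleanup step involves, here is a minimal Python sketch. It is not substudy's actual code: the regular expressions and the name `clean_text` are assumptions for illustration, and real subtitle files contain more kinds of clutter than these two patterns catch.

```python
import re

# Speaker labels such as "KATARA:" at the start of a line (assumed pattern).
SPEAKER = re.compile(r"^[A-Z][A-Z .'-]*:\s*")
# Sound effects in brackets or parentheses, e.g. "[door slams]" or "(sighs)".
SOUND = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def clean_text(text):
    """Strip speaker labels and sound effects from one subtitle's text."""
    kept = []
    for line in text.splitlines():
        line = SOUND.sub("", line)
        line = SPEAKER.sub("", line)
        line = line.strip()
        if line:  # drop lines that are empty after cleaning
            kept.append(line)
    return "\n".join(kept)


print(clean_text("KATARA:\n<i>Water.</i>"))  # prints "<i>Water.</i>"
```

A pattern this simple will also strip legitimate all-caps text followed by a colon, so treat it as a starting point only.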
For example, suppose you have English subtitles:
1
00:00:01,968 --> 00:00:04,837
KATARA:
<i>Water.</i>
2
00:00:04,838 --> 00:00:07,240
<i>Earth.</i>
3
00:00:07,240 --> 00:00:09,208
<i>Fire.</i>
4
00:00:09,209 --> 00:00:12,178
<i>Air.</i>
5
00:00:12,178 --> 00:00:14,713
<i>My grandmother used</i>
<i>to tell me stories</i>
And you have matching Spanish subtitles:
1
00:00:03,100 --> 00:00:05,091
Agua
2
00:00:05,561 --> 00:00:06,557
Tierra
3
00:00:08,230 --> 00:00:08,841
Fuego
4
00:00:10,326 --> 00:00:11,819
Aire
5
00:00:13,684 --> 00:00:16,583
Mi abuela solia contarme historias de
tiempos antiguos.
You can run:
substudy combine avatar_01_01.es.srt avatar_01_01.en.srt > avatar_01_01.bilingual.srt
...which will output:
1
00:00:01,100 --> 00:00:05,091
<i>Water.</i>
<font color="yellow">Agua</font>
2
00:00:05,092 --> 00:00:06,557
<i>Earth.</i>
<font color="yellow">Tierra</font>
3
00:00:06,558 --> 00:00:08,841
<i>Fire.</i>
<font color="yellow">Fuego</font>
4
00:00:08,842 --> 00:00:11,819
<i>Air.</i>
<font color="yellow">Aire</font>
5
00:00:11,820 --> 00:00:16,583
<i>My grandmother used to tell me</i>
<i>stories about the old days; a</i>
<i>time of peace,</i>
<font color="yellow">Mi abuela solia contarme historias de</font>
<font color="yellow">tiempos antiguos.</font>
Notice how the subtitle timings have been adjusted, and text like "KATARA:" has been stripped out entirely.
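The adjusted timings in this example are consistent with a simple rule: pull each subtitle's start up to two seconds earlier, but never past the end of the previous subtitle. The Python sketch below illustrates that rule. It is not substudy's actual code: the two-second figure is inferred from this one example, and the names `extend_starts`, `to_ms` and `to_ts` are made up for illustration.

```python
def to_ms(ts):
    """Convert an SRT timestamp such as "00:00:03,100" to milliseconds."""
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def to_ts(ms):
    """Convert milliseconds back to an SRT timestamp."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def extend_starts(cues, lead_ms=2000):
    """Start each (start_ms, end_ms) cue earlier without overlapping the previous one."""
    adjusted = []
    prev_end = None
    for start, end in cues:
        new_start = max(start - lead_ms, 0)
        if prev_end is not None:
            # Leave a 1 ms gap after the previous subtitle.
            new_start = max(new_start, prev_end + 1)
        new_start = min(new_start, start)  # never move a start later
        adjusted.append((new_start, end))
        prev_end = end
    return adjusted


# Timings of the first three Spanish subtitles above.
spanish = [("00:00:03,100", "00:00:05,091"),
           ("00:00:05,561", "00:00:06,557"),
           ("00:00:08,230", "00:00:08,841")]
for start, end in extend_starts([(to_ms(a), to_ms(b)) for a, b in spanish]):
    print(to_ts(start), "-->", to_ts(end))
# 00:00:01,100 --> 00:00:05,091
# 00:00:05,092 --> 00:00:06,557
# 00:00:06,558 --> 00:00:08,841
```

Run on the Spanish timings, this happens to reproduce the start and end times in the combined output, but the real alignment logic also has to decide which English and Spanish subtitles belong together, which this sketch does not attempt.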
Who is this for?
This will be most useful around CEFR beginner levels A1 and A2, before you're ready to use just foreign-language subtitles, or no subtitles at all. It's especially useful in combination with subs2srs and Anki: used all together, they will allow you to watch one or two episodes of an easy TV series within a month of starting your studies.
To use this program, you will need some experience with Unix-like command lines, which means that it's probably limited to MacOS X and Linux users (and exceptionally ambitious Windows developers). For installation instructions, see the README page on GitHub.
Please feel free to use this as a tech support thread. If you can't get it working, or if it produces weird results, please let me know. Enjoy!