A Software Library for Creating Language Learning Materials (such as bilingual texts)
- emk
- Black Belt - 1st Dan
- Posts: 1708
- Joined: Sat Jul 18, 2015 12:07 pm
- Location: Vermont, USA
- Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish. - Language Log: viewtopic.php?f=15&t=723
- x 6737
- Contact:
Re: A Software Library for Creating Language Learning Materials (such as bilingual texts)
The OPUS corpus also has an XML format for parallel text and subtitles. It might be worth looking at what they do, to see if they have any good ideas. It's a fairly messy format in some ways, but they provide a large amount of data, including both parallel text and a comprehensive archive of subtitles.
- emk
I've written up a proposed substudy data format, and emailed it to davidzweig. Here's a very rough JSON example:
Code:
{
  "title": "My favorite series episode 1",
  "attachment": {
    "mimeType": "video/mp4",
    "relpath": "favorite_series_01.mp4"
  },
  "syncItems": [
    {
      "span": { "begin": 10.5, "end": 15.9 },
      "syncTracks": [
        {
          "type": "text",
          "language": "fr",
          "text": "Hé, les gars !"
        },
        {
          "type": "text",
          "language": "en",
          "text": "Hey, guys!"
        }
      ]
    }
  ]
}
But there's a lot more that isn't included in the example, including features for things like images. I'm trying to design something that would work well with substudy, Anki, hunalign, aeneas, etc. So feel free to have a look, and let me know what you think!
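As a rough illustration of how a tool might consume a file shaped like this example, here is a minimal Python sketch. It assumes only the field names shown in the example (`syncItems`, `span`, `syncTracks`, `language`, `text`); the helper names `text_by_language` and `bilingual_rows` are hypothetical and not part of substudy or any proposed spec. A strict JSON parser also needs well-formed input, so the embedded sample has no trailing commas.

```python
import json

# Sample document using the field names from the rough example above.
EXAMPLE = """
{
  "title": "My favorite series episode 1",
  "attachment": {"mimeType": "video/mp4", "relpath": "favorite_series_01.mp4"},
  "syncItems": [
    {
      "span": {"begin": 10.5, "end": 15.9},
      "syncTracks": [
        {"type": "text", "language": "fr", "text": "Hé, les gars !"},
        {"type": "text", "language": "en", "text": "Hey, guys!"}
      ]
    }
  ]
}
"""


def text_by_language(sync_item):
    """Map language code -> text for the text tracks of one sync item."""
    return {
        track["language"]: track["text"]
        for track in sync_item.get("syncTracks", [])
        if track.get("type") == "text"
    }


def bilingual_rows(doc, foreign, native):
    """Yield (begin, end, foreign_text, native_text) for items that have both languages."""
    for item in doc.get("syncItems", []):
        texts = text_by_language(item)
        if foreign in texts and native in texts:
            span = item["span"]
            yield span["begin"], span["end"], texts[foreign], texts[native]


doc = json.loads(EXAMPLE)
rows = list(bilingual_rows(doc, "fr", "en"))
for begin, end, fr, en in rows:
    print(f"[{begin:.1f}-{end:.1f}] {fr} | {en}")
```

Items that lack one of the two requested languages are simply skipped here; a real tool might prefer to report them instead.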
- emk
This past week, I've been corresponding with davidzweig via email.
I've made an updated proposal available online on the language-learners GitHub account!
This format is designed to support aligned audio, video, text and images, and to allow this data to be shared between language-learning tools. There are comments describing how the metadata should be stored, and I plan to spend today creating some example files.
If you're interested in giving feedback, I'd be happy to read your notes, or to accept pull requests directly on GitHub.
- rdearman
- Site Admin
- Posts: 7260
- Joined: Thu May 14, 2015 4:18 pm
- Location: United Kingdom
- Languages: English (N)
- Language Log: viewtopic.php?f=15&t=1836
- x 23317
- Contact:
Re: A Software Library for Creating Language Learning Materials (such as bilingual texts)
I wondered about files like epub which are compressed files? Presumably this is just a base track and the program must know how to deal with the compression and extraction?
: Read 150 books in 2024
My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter
I post on this forum with mobile devices, so excuse short msgs and typos.
- emk
rdearman wrote:I wondered about files like epub which are compressed files? Presumably this is just a base track and the program must know how to deal with the compression and extraction?
Yeah, you could use an epub as a base track, and it's the job of the program using it to figure out how to read the file, or to give an error if it can't.
But honestly, I would probably just omit the "baseTrack" and represent it as follows:
Code:
{
  "alignments": [
    {
      "tracks": [
        {
          "type": "html",
          "lang": "fr",
          "html": "<i>Jean & Luc:</i> On y va !"
        },
        {
          "type": "html",
          "lang": "en",
          "html": "<i>Jean & Luc:</i> Let's go!"
        }
      ]
    }
  ]
}
One puzzle here is that I haven't specified which language is the base and which is the target. I don't know whether this would matter. Also note that you could obviously align 3 languages if you wanted to.
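One way to see why the base/target question may not need an answer in the file itself: the consuming tool can pick the column order when it reads the data. The Python sketch below assumes the `alignments` / `tracks` / `lang` / `html` field names from the example above; `rows_by_language` is a hypothetical helper, and the Spanish track is invented here purely to show a third language.

```python
def rows_by_language(doc, languages):
    """Return one tuple per alignment, with columns in the requested language order.

    A language missing from an alignment yields None in that column.
    """
    rows = []
    for alignment in doc.get("alignments", []):
        by_lang = {
            track["lang"]: track["html"]
            for track in alignment.get("tracks", [])
            if track.get("type") == "html" and "lang" in track
        }
        rows.append(tuple(by_lang.get(lang) for lang in languages))
    return rows


doc = {
    "alignments": [
        {
            "tracks": [
                {"type": "html", "lang": "fr", "html": "<i>Jean & Luc:</i> On y va !"},
                {"type": "html", "lang": "en", "html": "<i>Jean & Luc:</i> Let's go!"},
                # Hypothetical third language, not in the original example.
                {"type": "html", "lang": "es", "html": "<i>Jean & Luc:</i> ¡Vamos!"},
            ]
        }
    ]
}

fr_first = rows_by_language(doc, ["fr", "en"])        # French treated as the "base"
three_way = rows_by_language(doc, ["en", "fr", "es"])  # three aligned languages
with_gap = rows_by_language(doc, ["fr", "de"])         # German is absent -> None
```

With this approach the same file serves a French learner and an English learner alike; only the list passed to the reader changes.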
A full-fledged substudy file might look like:
Code:
{
  "baseTrack": {
    "type": "media",
    "lang": "fr",
    "file": {
      "relPath": "episode1.mp4"
    }
  },
  "alignments": [
    {
      "span": [10, 15.5],
      "tracks": [
        {
          "type": "html",
          "lang": "fr",
          "html": "<i>Jean & Luc:</i> On y va !"
        },
        {
          "type": "html",
          "lang": "en",
          "html": "<i>Jean & Luc:</i> Let's go!"
        },
        {
          "type": "image",
          "file": {
            "relPath": "episode1_12_75.jpg"
          }
        },
        {
          "type": "media",
          "lang": "fr",
          "file": {
            "relPath": "episode1_9_00_16_50.mp3"
          }
        }
      ]
    }
  ]
}
For more examples like this, see GitHub. It's actually a pretty basic format, but I want to make sure we go with something that's easy to generate, and that covers the needs of several different kinds of tools.
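To make the "several different kinds of tools" point concrete, here is a hedged Python sketch of how a flashcard exporter might turn one such alignment into front/back fields. It assumes the track types shown above (`html`, `image`, `media`) and uses Anki's `[sound:...]` and `<img src="...">` field conventions; the `card_fields` helper is hypothetical, and the `span` and `baseTrack` values are ignored here for brevity.

```python
def card_fields(alignment, foreign, native):
    """Build (front, back) card fields from one alignment.

    Front: foreign-language text followed by any image/audio references.
    Back: native-language text.
    """
    text = {}
    media = []
    for track in alignment.get("tracks", []):
        kind = track.get("type")
        if kind == "html":
            text[track["lang"]] = track["html"]
        elif kind == "image":
            media.append('<img src="{}">'.format(track["file"]["relPath"]))
        elif kind == "media":
            # Anki plays audio (and video) referenced with the [sound:...] tag.
            media.append("[sound:{}]".format(track["file"]["relPath"]))
    front = " ".join([text.get(foreign, "")] + media).strip()
    back = text.get(native, "")
    return front, back


alignment = {
    "span": [10, 15.5],
    "tracks": [
        {"type": "html", "lang": "fr", "html": "<i>Jean & Luc:</i> On y va !"},
        {"type": "html", "lang": "en", "html": "<i>Jean & Luc:</i> Let's go!"},
        {"type": "image", "file": {"relPath": "episode1_12_75.jpg"}},
        {"type": "media", "lang": "fr", "file": {"relPath": "episode1_9_00_16_50.mp3"}},
    ],
}

front, back = card_fields(alignment, "fr", "en")
```

A real exporter would also need to copy the referenced files into the flashcard program's media folder, which this sketch does not attempt.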
-
- White Belt
- Posts: 30
- Joined: Sun Apr 23, 2017 4:58 pm
- Languages: Spanish, Farsi, Russian, and a few more.
- x 75
Re: A Software Library for Creating Language Learning Materials (such as bilingual texts)
Sorry for the silence. I have been working on the hardware for the player recently, and haven't had much time to work on the software for creating materials. I will make another version of the hardware, and try to finish the basic firmware by the Chinese New Year, in a couple of weeks. When the new version is ready, I'll make a video showing what it actually does. I'm working out of Lab Zero, Shenzhen, China (https://lab0x0.com/), if anyone is around. Then I plan to make a small batch of the players, maybe 20-100 units. Here are some more photos from the development:
https://photos.google.com/share/AF1QipP ... Q3amVNQXNn
As Eric wrote, we have been corresponding about the common data format, but we have a slightly different focus, and it probably makes more sense for us to pursue our ideas separately at the moment.
-
- White Belt
- Posts: 30
- Joined: Sun Apr 23, 2017 4:58 pm
- Languages: Spanish, Farsi, Russian, and a few more.
- x 75
Re: A Software Library for Creating Language Learning Materials (such as bilingual texts)
Time passes and this idea's time has come around again.
I was quite happy to find these old notes here still.
We're working on extending 'Language Learning with Netflix' (soon to be 'Language Reactor', I think) to include tools for working with text and audio. A skeleton version is here: https://www.languagereactor.com/text
Reading the old notes, I'm thinking about how to generalise it and allow import/export etc.
NLP processing is done by a slightly more recent version of this code: https://github.com/hobodrifterdavid/dioco-nlp
Translation is done by models running on our servers. I have browser TTS working on the dev version, but hope to get new natural-sounding TTS hooked up (tacotron etc). The TTS in the MS Edge browser is very good, however.
The bulk of the code is not open currently, but I'm considering it. It's React/Material-UI/MobX/TypeScript on the frontend. mcthulhu, I was interested to see that you are still working on Jorkens. I sent you a PM.
-
- Orange Belt
- Posts: 228
- Joined: Sun Feb 26, 2017 4:01 pm
- Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
- x 590
Re: A Software Library for Creating Language Learning Materials (such as bilingual texts)
I'll probably still be working on Jorkens on the day I die. I still have a long to-do list, and there are always things to improve and streamline. It's Electron now though, instead of NW.js.
I'm a fan of LLN, and recently finished watching a series using it. The Language Reactor site looks good. Is the Chrome browser extension there the same as the LLN extension so far?
BTW, are you still working on the hardware player?
-
- Posts: 4
- Joined: Sat May 29, 2021 5:57 am
- Languages: Polish (N), English (C1), Spanish (beginner), German (beginner), French (beginner), Danish (beginner)
- x 3
Re: A Software Library for Creating Language Learning Materials (such as bilingual texts)
How do I use Language Reactor with YouTube, as described on the Language Reactor website? I have installed the extension, but I still cannot see the subtitles next to the video on YouTube. I don't understand how to use it for learning a language with videos. davidzweig, could you help?