A Software Library for Creating Language Learning Materials (such as bilingual texts)

All about language programs, courses, websites and other learning resources
emk
Black Belt - 1st Dan
Posts: 1620
Joined: Sat Jul 18, 2015 12:07 pm
Location: Vermont, USA
Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish.
Language Log: viewtopic.php?f=15&t=723

Postby emk » Tue Dec 05, 2017 10:41 pm

The OPUS corpus also has an XML format for parallel text and subtitles. It might be worth looking at what they do, to see if they have any good ideas. It's a fairly messy format in some ways, but they provide a large amount of data, including both parallel text and a comprehensive archive of subtitles.

Postby emk » Fri Dec 08, 2017 6:02 pm

I've written up a proposed substudy data format, and emailed it to davidzweig. Here's a very rough JSON example:

{
  "title": "My favorite series episode 1",
  "attachment": {
    "mimeType": "video/mp4",
    "relpath": "favorite_series_01.mp4"
  },
  "syncItems": [
    {
      "span": { "begin": 10.5, "end": 15.9 },
      "syncTracks": [
        {
          "type": "text",
          "language": "fr",
          "text": "Hé, les gars !"
        },
        {
          "type": "text",
          "language": "en",
          "text": "Hey, guys!"
        }
      ]
    }
  ]
}

But there's a lot more that isn't included in the example, including features for things like images. I'm trying to design something that would work well with substudy, Anki, hunalign, aeneas, etc. So feel free to have a look, and let me know what you think!
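
As a minimal sketch of how a tool might consume the rough example above, here is a Python reader that walks the sync items and collects the text tracks by language. The helper name `iter_sync_items` is hypothetical and not part of substudy, and the field names (`syncItems`, `syncTracks`, `span`, `language`, `text`) are taken from this rough example only, so they may not match the final proposal.

```python
import json

# A copy of the rough example above, lightly tidied so it parses as strict JSON.
EXAMPLE = """
{
  "title": "My favorite series episode 1",
  "attachment": {"mimeType": "video/mp4", "relpath": "favorite_series_01.mp4"},
  "syncItems": [
    {
      "span": {"begin": 10.5, "end": 15.9},
      "syncTracks": [
        {"type": "text", "language": "fr", "text": "Hé, les gars !"},
        {"type": "text", "language": "en", "text": "Hey, guys!"}
      ]
    }
  ]
}
"""


def iter_sync_items(doc):
    """Yield (begin, end, {language: text}) for each sync item.

    Hypothetical helper: only "text" tracks are read; any other
    track type is skipped rather than treated as an error.
    """
    for item in doc.get("syncItems", []):
        span = item.get("span", {})
        texts = {
            track["language"]: track["text"]
            for track in item.get("syncTracks", [])
            if track.get("type") == "text"
        }
        yield span.get("begin"), span.get("end"), texts


doc = json.loads(EXAMPLE)
for begin, end, texts in iter_sync_items(doc):
    print(f"[{begin}-{end}] {texts['fr']} / {texts['en']}")
```

Keying the texts by language code means the reader does not care about track order, which seems like the safer assumption for a format that several tools would generate.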

Postby emk » Sat Dec 16, 2017 2:03 pm

This past week, I've been corresponding with davidzweig via email.

I've made an updated proposal available online on the language-learners GitHub account!

This format is designed to support aligned audio, video, text and images, and to allow this data to be shared between language-learning tools. There are comments describing how the metadata should be stored, and I plan to spend today creating some example files.

If you're interested in offering feedback, I'd be happy to read your notes, or to accept pull requests directly on GitHub.

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836

Postby rdearman » Sat Dec 16, 2017 4:32 pm

I wondered about files like epub which are compressed files? Presumably this is just a base track and the program must know how to deal with the compression and extraction?

Postby emk » Sat Dec 16, 2017 5:20 pm

rdearman wrote:I wondered about files like epub which are compressed files? Presumably this is just a base track and the program must know how to deal with the compression and extraction?

Yeah, you could use an epub as a base track, and it's the job of the program using it to figure out how to read the file, or to give an error if it can't.

But honestly, I would probably just omit the "baseTrack" and represent it as follows:

{
  "alignments": [
    {
      "tracks": [
        {
          "type": "html",
          "lang": "fr",
          "html": "<i>Jean &amp; Luc:</i> On y va !"
        },
        {
          "type": "html",
          "lang": "en",
          "html": "<i>Jean &amp; Luc:</i> Let's go!"
        }
      ]
    }
  ]
}

One puzzle here is that I haven't specified which language is the base and which is the target. I don't know whether this would matter. Also note that you could obviously align 3 languages if you wanted to.
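
To illustrate the point about aligning more than two languages, here is a small Python sketch that builds the same `alignments` structure from already-aligned sentence lists, and leaves the choice of base and target language to whoever reads the data. The helper names `make_alignments` and `pick` are hypothetical, and the Spanish sentence is invented sample data; none of this comes from substudy itself.

```python
import html
import json


def make_alignments(sentences_by_lang):
    """Build an {"alignments": [...]} document from parallel sentence lists.

    `sentences_by_lang` maps a language code to a list of plain-text
    sentences. The lists are assumed to be aligned already, so they
    must all have the same length.
    """
    lengths = {len(sents) for sents in sentences_by_lang.values()}
    if len(lengths) > 1:
        raise ValueError("all languages need the same number of sentences")
    count = lengths.pop() if lengths else 0
    alignments = []
    for i in range(count):
        tracks = [
            # Escape &, < and > so plain text is safe inside an "html" track.
            {"type": "html", "lang": lang, "html": html.escape(sents[i], quote=False)}
            for lang, sents in sentences_by_lang.items()
        ]
        alignments.append({"tracks": tracks})
    return {"alignments": alignments}


def pick(alignment, lang):
    """Return the html of the first track in `lang`, or None if absent."""
    for track in alignment["tracks"]:
        if track.get("lang") == lang:
            return track.get("html")
    return None


doc = make_alignments({
    "fr": ["Jean & Luc: On y va !"],
    "en": ["Jean & Luc: Let's go!"],
    "es": ["Jean & Luc: ¡Vamos!"],  # invented third language, for illustration
})
print(json.dumps(doc, ensure_ascii=False, indent=2))
```

With a reader like `pick(alignment, "fr")`, the file itself never has to say which language is the base and which is the target; each tool can decide that when it loads the data.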

A full-fledged substudy file might look like:

{
  "baseTrack": {
    "type": "media",
    "lang": "fr",
    "file": {
      "relPath": "episode1.mp4"
    }
  },
  "alignments": [
    {
      "span": [
        10,
        15.5
      ],
      "tracks": [
        {
          "type": "html",
          "lang": "fr",
          "html": "<i>Jean &amp; Luc:</i> On y va !"
        },
        {
          "type": "html",
          "lang": "en",
          "html": "<i>Jean &amp; Luc:</i> Let's go!"
        },
        {
          "type": "image",
          "file": {
            "relPath": "episode1_12_75.jpg"
          }
        },
        {
          "type": "media",
          "lang": "fr",
          "file": {
            "relPath": "episode1_9_00_16_50.mp3"
          }
        }
      ]
    }
  ]
}

For more examples like this, see GitHub. It's actually a pretty basic format, but I want to make sure we go with something that's easy to generate, and that covers the needs of several different kinds of tools.
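
As a rough sketch of how a flashcard exporter might use a file like the one above, the following Python flattens each alignment into one tab-separated row (front text, back text, image path, audio path). The function names `alignment_to_row` and `to_tsv` and the column layout are assumptions made for illustration, not something the proposal defines.

```python
import csv
import io

# The alignment from the full-fledged example above, as a Python dict.
DOC = {
    "baseTrack": {"type": "media", "lang": "fr", "file": {"relPath": "episode1.mp4"}},
    "alignments": [
        {
            "span": [10, 15.5],
            "tracks": [
                {"type": "html", "lang": "fr", "html": "<i>Jean &amp; Luc:</i> On y va !"},
                {"type": "html", "lang": "en", "html": "<i>Jean &amp; Luc:</i> Let's go!"},
                {"type": "image", "file": {"relPath": "episode1_12_75.jpg"}},
                {"type": "media", "lang": "fr", "file": {"relPath": "episode1_9_00_16_50.mp3"}},
            ],
        }
    ],
}


def alignment_to_row(alignment, front_lang, back_lang):
    """Return [front_html, back_html, image_path, audio_path] for one alignment.

    Missing tracks become empty strings; if a type occurs more than
    once, the last matching track wins (a simplification for this sketch).
    """
    front = back = image = audio = ""
    for track in alignment.get("tracks", []):
        kind = track.get("type")
        if kind == "html" and track.get("lang") == front_lang:
            front = track["html"]
        elif kind == "html" and track.get("lang") == back_lang:
            back = track["html"]
        elif kind == "image":
            image = track["file"]["relPath"]
        elif kind == "media":
            audio = track["file"]["relPath"]
    return [front, back, image, audio]


def to_tsv(doc, front_lang, back_lang):
    """Serialize every alignment in `doc` as one tab-separated line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for alignment in doc.get("alignments", []):
        writer.writerow(alignment_to_row(alignment, front_lang, back_lang))
    return out.getvalue()


print(to_tsv(DOC, front_lang="fr", back_lang="en"), end="")
```

The base track is ignored here on purpose: each row only needs the per-alignment clips, which is one reason to keep pre-cut image and audio tracks next to the text.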

davidzweig
White Belt
Posts: 30
Joined: Sun Apr 23, 2017 4:58 pm
Languages: Spanish, Farsi, Russian, and a few more.

Postby davidzweig » Fri Feb 02, 2018 6:41 am

Sorry for the silence. I have been working on the hardware for the player recently, and haven't had much time to work on the software for creating materials. I will make another version of the hardware, and try to finish the basic firmware by Chinese New Year, in a couple of weeks. When the new version is ready, I'll make a video showing what it actually does. :-) I'm working out of Lab Zero, Shenzhen, China (https://lab0x0.com/), if anyone is around. After that, I plan to make a small batch of the players, maybe 20-100 units. Here are some more photos from the development:

https://photos.google.com/share/AF1QipP ... Q3amVNQXNn

As Eric wrote, we have been corresponding about the common data format, but we have a slightly different focus, and it probably makes more sense for us to pursue our ideas separately at the moment.

Postby davidzweig » Tue May 11, 2021 12:31 am

Time passes and this idea's time has come around again. :shock: :D

I was quite happy to find these old notes here still.

We're working on extending 'Language Learning with Netflix' (soon to be 'Language Reactor', I think) to include tools for working with text and audio. A skeleton version is here: https://www.languagereactor.com/text Reading the old notes, I'm thinking about how to generalise it and allow import/export etc.

NLP processing is done by a slightly more recent version of this code: https://github.com/hobodrifterdavid/dioco-nlp
Translation is done by models running on our servers. I have browser TTS working on the dev version, but hope to get new natural-sounding TTS hooked up (tacotron etc). The TTS in the MS Edge browser is very good, however.

The bulk of the code is not open currently, but I'm considering opening it. It's React/Material-UI/MobX/TypeScript on the frontend. mcthulhu, I was interested to see that you are still working on jorkens. 8-) I sent you a PM.

mcthulhu
Orange Belt
Posts: 228
Joined: Sun Feb 26, 2017 4:01 pm
Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish

Postby mcthulhu » Tue May 11, 2021 8:21 pm

I'll probably still be working on Jorkens on the day I die. I still have a long to-do list, and there are always things to improve and streamline. It's Electron now though, instead of NW.js.

I'm a fan of LLN, and recently finished watching a series using it. The Language Reactor site looks good. Is the Chrome browser extension there the same as the LLN extension so far?

BTW, are you still working on the hardware player?

jackwhite44
Posts: 4
Joined: Sat May 29, 2021 5:57 am
Languages: Polish (N), English (C1), Spanish (beginner), German (beginner), French (beginner), Danish (beginner)

Postby jackwhite44 » Sat May 29, 2021 1:35 pm

How do I use Language Reactor with YouTube, as described on the Language Reactor website? I have installed the extension, but I still cannot see the subtitles next to the video on YouTube, and I don't understand how to use it for learning a language with videos. davidzweig, could you help?