Aligning Bilingual Texts with Machine Learning

Post by kundalini » Sat Mar 11, 2023 3:28 pm

I just came across a Python library that uses machine learning to generate an aligned bilingual text in HTML. I think it looks very promising! More details are in this article:
https://medium.com/@averoo/how-to-make-a-parallel-book-for-language-learning-part-1-python-and-colab-version-cff09e379d8c

The Google Colab notebook is here: https://colab.research.google.com/drive/1_ics0YzWg5qIZIPhA1X_Wbfg0XZzRO-p

If you aren't familiar with Colab and want to give it a whirl, save a copy of the notebook to your own Google Drive, replace the Harper Lee text in the notebook with bilingual texts of your own, and make sure that the language codes ('en' and 'ru' in the example) match your texts. Then click the play button on each cell in the notebook, from top to bottom, to run the code.

I gave it a go with the first chapter of The Count of Monte Cristo, using texts from https://www.gutenberg.org/cache/epub/17989/pg17989-images.html#I and https://archive.org/stream/thecountofmontec01184gut/crsto12.txt, and the result is shown below:

[Attachment: dumas.png]
Last edited by kundalini on Sat Mar 11, 2023 5:37 pm, edited 1 time in total.
Iliad: 12 / 24

Re: Creating Bilingual Texts with Machine Learning

Post by Cainntear » Sat Mar 11, 2023 4:27 pm

Oooh... I read too quickly and didn't realise that this wasn't a book translator thing, but actually a program that aligns human-translated texts with the original human text. That's a genuinely intriguing prospect that basically removes a massive labour-intensive step from the process of making these things. That's very interesting.

Re: Creating Bilingual Texts with Machine Learning

Post by kundalini » Sat Mar 11, 2023 5:39 pm

Cainntear wrote: Oooh... I read too quickly and didn't realise that this wasn't a book translator thing, but actually a program that aligns human-translated texts with the original human text. That's a genuinely intriguing prospect that basically removes a massive labour-intensive step from the process of making these things. That's very interesting.


Yes, I changed the thread title to clarify this point. It seems that lingtrain_aligner, the Python library behind the notebook, uses machine learning to align sentences based on how likely they are to be semantically similar.
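
To give a rough feel for what "aligning by semantic similarity" could mean, here is a toy sketch in plain Python. It is not lingtrain_aligner's actual code or API. It assumes each sentence has already been turned into a numeric vector by some multilingual embedding model (the vectors below are made up by hand), and it pairs sentences in order using cosine similarity. The function names and the min_sim threshold are my own inventions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(src_vecs, tgt_vecs, min_sim=0.5):
    """Pair source and target sentences in order, maximising total similarity.

    Sentences may be skipped on either side; a pair is only allowed when its
    similarity is at least min_sim (a hypothetical threshold).
    Returns a list of (source_index, target_index) tuples.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]

    # best[i][j] = best total score using the first i source
    # and the first j target sentences.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = max(best[i - 1][j], best[i][j - 1])
            if sim[i - 1][j - 1] >= min_sim:
                score = max(score, best[i - 1][j - 1] + sim[i - 1][j - 1])
            best[i][j] = score

    # Walk back through the table to recover the chosen pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = sim[i - 1][j - 1]
        if s >= min_sim and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Made-up vectors: three source sentences, two target sentences
# (imagine the translator dropped the middle sentence).
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
tgt = [[0.9, 0.2, 0.0], [0.0, 0.1, 0.9]]
print(align(src, tgt))
```

With these made-up vectors the script prints [(0, 0), (2, 1)]: the first sentences are paired, the second source sentence is left unmatched, and the third source sentence is paired with the second target sentence. A real aligner presumably also has to cope with sentences that were merged or split in translation, which this sketch ignores.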

