Following up on a recent discussion here, I've been working on adding an option to Jorkens to convert Global Voices articles to parallel text epubs. Jorkens prompts for the URL of a Global Voices article, downloads it, looks for a version in your native language, downloads that too, and then generates an epub book with the two languages merged in alternating paragraphs. I'd originally planned on a two-column view, but now have the translation interlaced with the original. The translation is hidden by default (because I want to avoid looking at the reference version until I need to). When you hover over the original paragraph, the translation becomes visible just below it, in blue. I'm quite pleased with how this looks so far (at least to me), though I still have some tweaks to make before I'm ready to upload this to GitHub.
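Here is a minimal sketch of the hover-to-reveal layout described above, in Python. It is not taken from the Jorkens source; the class names `orig` and `trans` and the helper names `interlace` and `build_page` are made up for illustration, and it assumes the two versions have already been split into matching paragraph lists.

```python
from html import escape

# Hypothetical stylesheet: the translation paragraph is hidden until the
# reader hovers over the original paragraph immediately before it.
CSS = """
p.orig + p.trans { display: none; color: blue; }
p.orig:hover + p.trans { display: block; }
"""


def interlace(originals, translations):
    """Alternate original and translated paragraphs as XHTML <p> elements.

    zip() stops at the shorter list, so unmatched trailing paragraphs
    are dropped in this sketch.
    """
    parts = []
    for orig, trans in zip(originals, translations):
        parts.append(f'<p class="orig">{escape(orig)}</p>')
        parts.append(f'<p class="trans">{escape(trans)}</p>')
    return "\n".join(parts)


def build_page(title, originals, translations):
    """Wrap the interlaced paragraphs in a single XHTML content document."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{escape(title)}</title>\n"
        f"<style>{CSS}</style></head>\n"
        f"<body>\n{interlace(originals, translations)}\n</body>\n</html>\n"
    )
```

The adjacent-sibling selector keeps the markup flat (one `<p>` per paragraph), so a reader that ignores the stylesheet would simply show both languages in alternation.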
So far I've tested the generated epubs in another desktop epub reader, but haven't tried them on mobile yet. (I'm not sure what the mobile equivalent of a hover should be, whether a tap or a swipe, and I'm also unsure about conflicts with mobile epub readers' default behavior for those actions.) In any case, I do want the epubs to be usable in other readers.
One possible improvement to this tool later on might be to download more data, since one article isn't a very long book. Maybe get the last 100 news articles in a given language for which the desired equivalents exist, and treat each article as a chapter in a book of parallel texts? It might take a while to build the book but that might be easier than making an epub for each article.
Another would be to split paragraphs into sentences, try matching those between versions, and emit pairs of span elements within each paragraph. I might save this for a later version.
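A rough sketch of what the sentence-level idea could look like, again in Python and again hypothetical rather than anything in Jorkens. It assumes a naive regex splitter and pairs sentences purely by position, which is a stand-in for real alignment: it will misalign whenever a translator merges or splits sentences, and the splitter will trip over abbreviations.

```python
import re
from html import escape
from itertools import zip_longest

# Naive boundary: whitespace that follows ".", "!" or "?".
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph):
    """Split a paragraph into sentences with a simple punctuation rule."""
    return [s for s in _SENT_END.split(paragraph.strip()) if s]


def pair_spans(orig_par, trans_par):
    """Pair sentences by index and wrap each pair in adjacent <span>s.

    When one side has fewer sentences, the missing ones become empty spans.
    """
    pairs = zip_longest(
        split_sentences(orig_par), split_sentences(trans_par), fillvalue=""
    )
    out = [
        f'<span class="orig">{escape(o)}</span>'
        f'<span class="trans">{escape(t)}</span>'
        for o, t in pairs
    ]
    return "<p>" + " ".join(out) + "</p>"
```

For example, `split_sentences("One. Two! Three?")` gives three sentences, and `pair_spans` pads the shorter side so the spans always come in pairs.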
And of course the plan is to generalize this eventually to produce parallel text epubs easily for other sources of parallel data as well, ideally any two versions of the same book in different languages.
working on a parallel text epub generator
-
- Orange Belt
- Posts: 228
- Joined: Sun Feb 26, 2017 4:01 pm
- Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
- x 590
-
- Green Belt
- Posts: 404
- Joined: Sat Jul 18, 2015 6:21 pm
- Languages: German (N)
- x 807
Re: working on a parallel text epub generator
You might want to consider adding an import filter for the Aglona PBO file format.
The format isn't documented; however, since .pbo files are .xml files, you should be able to reverse-engineer it from this source code file.
BTW, if you're still looking for a suitable target format, maybe you could base it on the Doppeltext epub3 format.
0 x
-
- Orange Belt
- Posts: 228
- Joined: Sun Feb 26, 2017 4:01 pm
- Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
- x 590
Re: working on a parallel text epub generator
Doitsujin, thanks for the suggestions. I'd looked at Aglona Reader in the past, and I looked at it a little more today. It's an interesting tool. There are some things I liked about the reader itself, like the multiple views, and some things I didn't, like the use of a format that no other reader can handle. It seems like the preparation of parallel books in that non-standard format is a very labor-intensive process, about the same level of effort as NOVA Text Aligner, which is also a GUI for an entirely manual, sentence-by-sentence approach. That might be why there seem to be so few books available in that format, all old public-domain works, and in very few languages (three, I think). Am I mistaken?
I think the author might have been hoping for an army of volunteers to contribute books in this format, but that does not appear to have happened. I agree with him that hand-aligned parallel texts are of higher quality, but personally, I don't have the time, and a semi-automated aligner is time-consuming enough for me. Anyway, it would probably be possible to import .PBO files and convert them into epubs, but given the small number (as far as I can tell) of .PBO books available to convert, I'm not sure this would be worth the investment of time, at least right now. Maybe later. A higher priority for me might be importing the parallel novels on Farkas's site, which cover 15 languages.
The Doppeltext format, though a bit fancier and using clicks instead, looks very much like my first effort, which used HTML title attributes to display translations when hovering over the original text. It might be possible to keep this as one of several possible views, and let a reader choose among them.
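For comparison, the title-attribute approach mentioned above can be sketched in a few lines of Python. The function name `with_title` is made up for illustration; the only real requirement is that the translation is escaped for use inside a quoted attribute.

```python
from html import escape


def with_title(orig, trans):
    """Return a paragraph whose tooltip (title attribute) is the translation."""
    # quote=True also escapes double quotes, which matters inside an attribute.
    return f'<p title="{escape(trans, quote=True)}">{escape(orig)}</p>'
```

This relies on the reading system showing `title` tooltips, which touch-screen readers generally have no hover gesture to trigger.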
0 x
-
- Orange Belt
- Posts: 228
- Joined: Sun Feb 26, 2017 4:01 pm
- Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
- x 590
Re: working on a parallel text epub generator
This is now working, more or less, and the source code has been updated on GitHub.
1 x
- EGP
- White Belt
- Posts: 31
- Joined: Sun Mar 28, 2021 8:36 pm
- Location: Australia
- Languages: English (N), Macedonian (B2), German (A1)
- x 51
Re: working on a parallel text epub generator
You might be interested in something like AntPConc too if you like analysis.
2 x
I research English grammar and vocabulary in corpora.
-
- Orange Belt
- Posts: 228
- Joined: Sun Feb 26, 2017 4:01 pm
- Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
- x 590
Re: working on a parallel text epub generator
Thanks, EGP. I'm aware of AntPConc, though I've used AntConc more. I have both of them and they are very nice tools.
0 x
-
- Orange Belt
- Posts: 116
- Joined: Sun Jan 24, 2021 8:17 pm
- Languages: English (C), Greek (low intermediate)
- x 362
Re: working on a parallel text epub generator
I was able to get the new version up and running on my Mac. I could advance to the next page but not go backwards, though this isn't such a big deal. Looks great!
1 x
Iliad:
French Super Challenge Books: (0/5000 pages)
French Super Challenge Films: (0/9000 minutes)