Does anyone have any experience making Goldendict dictionaries?

indeclinable
Yellow Belt
Posts: 76
Joined: Thu Mar 01, 2018 7:57 pm
Languages: Spanish (N), English (C2), German (C1), Latin (C1), French (B2), Ancient Greek (B1), Italian (A2).

Want to study: Japanese & Russian
Language Log: https://forum.language-learners.org/vie ... =15&t=8803

Postby indeclinable » Sat May 12, 2018 9:49 pm

I have several old, out-of-copyright dictionaries that I’d like to transcribe, but I have no experience with any sort of programming. I’m not computer-illiterate, but I feel I could be biting off more than I can chew, so I’d like to know if there’s a veteran here who might offer some advice.
Omnis lingua usu potius discitur quam praeceptis, id est audiendo, legendo, relegendo, imitationem manu et lingua temptando quam creberrime. – Iohannes Amos Comenius

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836

Re: Does anyone have any experience making Goldendict dictionaries?

Postby rdearman » Sat May 12, 2018 10:07 pm

indeclinable wrote:I have several old, out-of-copyright dictionaries that I’d like to transcribe, but I have no experience with any sort of programming. I’m not computer-illiterate, but I feel I could be biting off more than I can chew, so I’d like to know if there’s a veteran here who might offer some advice.

First, are they electronic, or are they on paper? If they're on paper, you'll need to get them scanned before you can manipulate them.
Read 150 books in 2024: 0 / 150

My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter

I post on this forum with mobile devices, so excuse short msgs and typos.

Re: Does anyone have any experience making Goldendict dictionaries?

Postby indeclinable » Sat May 12, 2018 10:21 pm

They’re scanned, but because of their age, the typography and the type of paper, OCR doesn't give decent results. So I’m willing to invest my time in transcribing them, but before I begin I’d like to know which file format (txt, odt, plain text) and which formatting are needed.

And, of course, how to convert the result into a GoldenDict-readable format.

(Note one of them is an Ancient Greek dictionary so that means polytonic Greek)

Re: Does anyone have any experience making Goldendict dictionaries?

Postby rdearman » Sat May 12, 2018 10:30 pm

You might want to approach the Distributed Proofreading team for Project Gutenberg (https://www.pgdp.net/c/). They use volunteer labour to transcribe public domain books: lots of people each look at small sections of scanned and OCR'ed books and fix the errors. You might need to become the project manager for the book, but you'd get a lot of help, and it would be published on the Gutenberg site for everyone to access.

Re: Does anyone have any experience making Goldendict dictionaries?

Postby indeclinable » Sat May 12, 2018 11:52 pm

Thanks for the suggestion. I'll do that as a first step, but ultimately I'll want to turn the transcribed text into a GoldenDict-compatible file. I've already registered and begun doing some volunteer work too; it seems one has to wait a while before becoming a project manager and starting one's own project.

白田龍
Orange Belt
Posts: 242
Joined: Wed Mar 21, 2018 6:54 pm
Languages: English, Portuguese, Spanish, Catalan, French, Persian, Arabic, Mandarin, Japanese.

Re: Does anyone have any experience making Goldendict dictionaries?

Postby 白田龍 » Sun May 13, 2018 1:31 pm

If you can get the data into a simple tab-separated text file, such as WORD/PRONUNCIATION/DEFINITION, you can probably use this tool to convert it:

https://github.com/ilius/pyglossary
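As a rough Python sketch of preparing such a file: I'm assuming PyGlossary's tab-file reader expects exactly two columns (headword, definition) and that a line break inside a definition is written as a literal \n, so the pronunciation is folded into the definition here. The sample rows and the names `to_tab_line` and `write_tabfile` are hypothetical; please check PyGlossary's own docs for the exact conventions.

```python
from typing import Iterable, Tuple

# Hypothetical sample rows: (word, pronunciation, definition).
ENTRIES = [
    ("book", "bʊk", "A written or printed work bound in covers."),
    ("abet", "", "Encourage: ἐπικελεύειν.\nHave a hand in: συμπράσσειν."),
]


def to_tab_line(word: str, pron: str, definition: str) -> str:
    """Build one 'headword<TAB>definition' line.

    Assumption: the reader wants two columns, so the pronunciation is
    folded into the definition, and an embedded newline is written as a
    literal backslash-n so each entry stays on a single line.
    """
    text = f"[{pron}] {definition}" if pron else definition
    # A stray tab would create a third column, so turn it into a space.
    text = text.replace("\t", " ").replace("\n", "\\n")
    return f"{word}\t{text}"


def write_tabfile(path: str, rows: Iterable[Tuple[str, str, str]]) -> None:
    # Plain UTF-8, one entry per line, LF line endings.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word, pron, definition in rows:
            f.write(to_tab_line(word, pron, definition) + "\n")


# Usage: write_tabfile("dictionary.txt", ENTRIES)
```

Polytonic Greek is just ordinary UTF-8 text here, so it needs no special handling.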

Doitsujin
Green Belt
Posts: 402
Joined: Sat Jul 18, 2015 6:21 pm
Languages: German (N)

Re: Does anyone have any experience making Goldendict dictionaries?

Postby Doitsujin » Sun May 13, 2018 3:44 pm

@indeclinable: Since GoldenDict supports both StarDict and Babylon BGL files, you could use either StarDict Editor or Babylon Glossary Builder to generate dictionaries. The source file needs to be a UTF-8 encoded text file, which may contain HTML 3.2 tags and attributes, and it must have a byte-order mark (BOM).

I've attached a sample file (ende.gls) that you can play with.

A minimal dictionary file looks like this:

Code:

#stripmethod=keep
#sametypesequence=h
#bookname=English-German Test Dictionary

book|books
<p>A written or printed <span style="color: red;">work</span> consisting of pages <i>glued</i> or <b>sewn</b> together along one side and bound in covers.</p><p>Buch, das (n)</p>

book|books|booked|booking
<p>Reserve (<a  href="bword://accommodation" >accommodation</a>, a place, etc.); buy (a ticket) <span style="color: blue;">in advance</span>.</p><p>buchen (v)</p>

accommodation|accommodations
<p>A room, group of rooms, or building in which someone may live or stay.</p><p>Unterkunft, die (n); Übernachtungsmöglichkeit, die (n)</p>



Note that there mustn't be line-breaks in the definition and the last entry must be followed by two empty lines. (Internal hyperlinks must be prefixed by bword://.)
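If you'd rather generate such a file than type the markup by hand, here is a minimal Python sketch that follows the rules above: the three preamble lines, headword|inflections on one line, the one-line definition on the next, and two empty lines after the last entry. `gls_text` and `write_gls` are hypothetical names; Python's `utf-8-sig` codec is what writes the byte-order mark. I've assumed LF line endings; if a builder complains, CRLF would be worth trying.

```python
from typing import Iterable, List, Sequence, Tuple

# (headword, inflections, one-line HTML definition)
Entry = Tuple[str, Sequence[str], str]


def gls_text(bookname: str, entries: Iterable[Entry]) -> str:
    """Build the text of a Babylon-style source file as described above."""
    lines: List[str] = [
        "#stripmethod=keep",
        "#sametypesequence=h",
        f"#bookname={bookname}",
        "",  # blank line between the preamble and the first entry
    ]
    for headword, inflections, definition in entries:
        # A line break inside the definition would split the entry, so refuse it.
        if "\n" in definition or "\r" in definition:
            raise ValueError(f"line break in definition of {headword!r}")
        lines.append("|".join([headword, *inflections]))
        lines.append(definition)
        lines.append("")  # empty line after each entry
    lines.append("")  # second empty line after the last entry
    return "\n".join(lines) + "\n"


def write_gls(path: str, bookname: str, entries: Iterable[Entry]) -> None:
    # "utf-8-sig" makes Python write the UTF-8 byte-order mark first.
    with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
        f.write(gls_text(bookname, entries))


# Usage (hypothetical data):
# write_gls("ende.gls", "English-German Test Dictionary",
#           [("book", ["books"], "<p>Buch, das (n)</p>")])
```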

The workflow is as follows:

StarDict Editor

1. Select the Compile tab.
2. Click Browse and select ende.gls.
3. Select Babylon file and click Compile.

Babylon Glossary Builder

1. Select New Project and click Next.
2. Enter the metadata information.
3. Click Advanced and check the first 3 options.
4. Select GLS from the Data Source dialog box, click Browse and select ende.gls.
5. Click Build.

vinnie
White Belt
Posts: 30
Joined: Mon Apr 30, 2018 9:24 pm
Languages: Italian (N), English (beginner), German (beginner)

Re: Does anyone have any experience making Goldendict dictionaries?

Postby vinnie » Sun May 13, 2018 10:36 pm

For the sake of portability, I don't recommend the Babylon format since, as far as I know, GoldenDict is the only program that supports it.

Re: Does anyone have any experience making Goldendict dictionaries?

Postby indeclinable » Mon May 14, 2018 11:15 pm

Now that's what I call an expert's answer @Doitsujin. Have you done this before?

Most of the dictionaries I want to convert have to be transcribed first (with the help of the folks at Gutenberg, that might be easier now). But this one, for example, is already transcribed into a 2,272-page .docx document (yes, all of it, word by word); at the time, I was unaware of GoldenDict.

The result I want is more or less like this:

Abet v. trans.
Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν, ὁρμᾶν, V. ὀτρύνειν; see encourage, aid. Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν. Her father Menelaus abets his daughter herein: V. πατήρ τε θυγατρὶ Μενέλεως συνδρᾷ τάδε (Eur., And. 40).


If I understand your instructions correctly, should I write something like this? (Forgive my incompetence in technical things; I can't even get LaTeX right.)

Code:

#stripmethod=keep
#sametypesequence=h
#bookname=English-Greek Dictionary

Abet
<p>v. trans. Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν, ὁρμᾶν, V. ὀτρύνειν; see <a  href="bword://encourage" >encourage</a>, <a  href="bword://aid" >aid</a>. Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν. Her father Menelaus abets his daughter herein: V. πατήρ τε θυγατρὶ Μενέλεως συνδρᾷ τάδε (Eur., And. 40).</p>



What if I want to start each definition in a new line?

Abet v. trans.
Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν, ὁρμᾶν, V. ὀτρύνειν; see encourage, aid.
Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν.
Her father Menelaus abets his daughter herein: V. πατήρ τε θυγατρὶ Μενέλεως συνδρᾷ τάδε (Eur., And. 40).
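One way to get from that multi-line layout to the single definition line the source file needs is to wrap each line in its own <p>...</p> and turn "see X, Y" cross-references into bword:// links automatically. This is a hypothetical Python sketch; it assumes the headword is the first word of the first line, that the rest of that line is the grammatical label, and that a cross-reference list runs from "see " to the next period or semicolon. Multi-word headwords, or a "see" that isn't a cross-reference, would still need fixing by hand, and the link targets may need to match the case of your headwords.

```python
import html
import re
from typing import Tuple

# Assumed cross-reference shape: "see " followed by comma-separated English
# headwords, ending at the next period or semicolon (or end of line).
SEE_RE = re.compile(r"\bsee ([A-Za-z][A-Za-z ,'-]*?)(?=[.;]|$)")


def link_cross_refs(text: str) -> str:
    """Wrap each target after 'see' in a bword:// link."""
    def repl(m):
        targets = [t.strip() for t in m.group(1).split(",") if t.strip()]
        links = ", ".join(f'<a href="bword://{t}">{t}</a>' for t in targets)
        return "see " + links
    return SEE_RE.sub(repl, text)


def parse_block(block: str) -> Tuple[str, str]:
    """Turn 'Headword label' + one sense per line into (headword, definition)."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    headword, _, label = lines[0].partition(" ")
    paragraphs = ([label] if label else []) + lines[1:]
    # Escape &, < and > first, then add the link markup.
    body = "".join(
        f"<p>{link_cross_refs(html.escape(p, quote=False))}</p>" for p in paragraphs
    )
    return headword, body
```

The returned pair can then be written out as the headword line and the definition line of one entry.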


What's the purpose of

Code:

<p>Buch, das (n)</p>


anyway? May I skip it?

Sadly, I don't have Windows, so I can't use StarDict Editor. Is there a Mac alternative?

Thanks for the help.

Re: Does anyone have any experience making Goldendict dictionaries?

Postby Doitsujin » Tue May 15, 2018 7:56 am

indeclinable wrote:@Doitsujin. Have you done this before?
Many years ago I converted the Arabic Buckwalter Corpus files to a StarDict dictionary.

indeclinable wrote:If I understand your instructions correctly, should I write something like this? (Forgive my incompetence in technical things; I can't even get LaTeX right.)
It should work as long as you don't use line-breaks in the headword and definition lines.

To recap, you'll need:

1. A commented-out preamble:

Code:


#stripmethod=keep
#sametypesequence=h
#bookname=Name of the dictionary



2. Definitions

Code:

headword1|inflection1|inflection2|inflection3
<p>first definition</p><p>second definition</p><p>third definition</p>

headword2|inflection1|inflection2|inflection3
<p>first definition</p><p>second definition</p><p>third definition</p>

lastheadword|inflection1|inflection2|inflection3
<p>first definition</p><p>second definition</p><p>third definition</p>




3. An additional empty line after the last entry.
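A stray line break or a missing BOM is easy to introduce when transcribing thousands of entries, so a small checker run before compiling could save some frustration. This Python sketch (`check_gls` is a hypothetical name) tests only the rules from this thread: BOM present, each entry exactly two lines (headword line and one-line definition), and two empty lines after the last entry. It doesn't know anything else the builders might reject.

```python
from typing import List

BOM = b"\xef\xbb\xbf"


def check_gls(raw: bytes) -> List[str]:
    """Return a list of problems found in a .gls source file (empty if none)."""
    problems: List[str] = []
    if not raw.startswith(BOM):
        problems.append("missing UTF-8 byte-order mark")
    text = raw.decode("utf-8-sig").replace("\r\n", "\n")
    lines = text.split("\n")

    # Skip the #-preamble and any blank lines after it.
    i = 0
    while i < len(lines) and (lines[i].startswith("#") or lines[i] == ""):
        i += 1

    # Every entry should be exactly two non-empty lines: headword(s), definition.
    while i < len(lines):
        if lines[i] == "":
            i += 1
            continue
        start = i
        while i < len(lines) and lines[i] != "":
            i += 1
        if i - start != 2:
            problems.append(
                f"line {start + 1}: entry {lines[start]!r} has {i - start} lines, "
                "expected 2 (headword line + one-line definition)"
            )

    if not text.endswith("\n\n\n"):
        problems.append("last entry is not followed by two empty lines")
    return problems


# Usage:
# with open("ende.gls", "rb") as f:
#     for p in check_gls(f.read()):
#         print(p)
```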

indeclinable wrote:What if I want to start each definition in a new line?
IIRC, GoldenDict will display each new <p>...</p> section on a new line. However, StarDict requires a <br /> tag.
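If you want the same source to work in StarDict as well, a definition written with <p> sections could be converted to the <br /> form mechanically. A minimal Python sketch, assuming the <p> tags carry no attributes and aren't nested (`p_to_br` is a hypothetical name); whether a given client actually needs this is based only on the note above.

```python
import re


def p_to_br(definition: str) -> str:
    """Turn '<p>a</p><p>b</p>' into 'a<br />b'.

    Assumes plain <p> tags without attributes and no nested <p>.
    """
    # Each boundary between two paragraphs becomes a single line break.
    inner = re.sub(r"</p>\s*<p>", "<br />", definition.strip())
    # Drop the outer <p>...</p> wrapper, if present.
    if inner.startswith("<p>") and inner.endswith("</p>"):
        inner = inner[len("<p>"):-len("</p>")]
    return inner
```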

indeclinable wrote:What's the purpose of

Code:

<p>Buch, das (n)</p>
It's the German translation of "book." :-)
indeclinable wrote:Sadly, I don't have Windows, so I can't use StarDict Editor. Is there a Mac alternative?
There should be one, but I haven't found one. However, since the source code is available on GitHub and there's a macOS version of the dictionary client, a macOS developer should be able to compile it for you.
Otherwise, you might have to use PyGlossary, which should work on Macs, too. If you plan to use PyGlossary, you'll need to create a two-column spreadsheet with the headword in the first column and the definition in the second, and save it as a tab-delimited UTF-8 text file.
I don't know whether PyGlossary supports HTML tags, though.
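If you've already written a .gls-style source file, it could be converted into that two-column, tab-delimited file instead of retyping it in a spreadsheet. A hypothetical Python sketch: it skips the # preamble, keeps only the first headword (dropping the |inflections, since I'm not sure how PyGlossary treats them), and leaves any HTML in the definition untouched, which may or may not render depending on the answer to the question above.

```python
from typing import List


def gls_to_tab(text: str) -> str:
    """Convert .gls-style source text into 'headword<TAB>definition' lines."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    rows: List[str] = []
    block: List[str] = []
    # The extra "" at the end flushes the last entry even without trailing blanks.
    for line in text.split("\n") + [""]:
        if line.startswith("#") and not rows and not block:
            continue  # preamble line such as #bookname=...
        if line == "":
            if block:
                headword = block[0].split("|")[0]  # drop |inflections
                definition = " ".join(block[1:]).replace("\t", " ")
                rows.append(f"{headword}\t{definition}")
                block = []
            continue
        block.append(line)
    return "\n".join(rows) + "\n"


# Usage:
# with open("ende.gls", encoding="utf-8-sig") as src, \
#      open("ende.txt", "w", encoding="utf-8", newline="\n") as dst:
#     dst.write(gls_to_tab(src.read()))
```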