Does anyone have any experience making Goldendict dictionaries?
- indeclinable
- Yellow Belt
- Posts: 76
- Joined: Thu Mar 01, 2018 7:57 pm
- Languages: Spanish (N), English (C2), German (C1), Latin (C1), French (B2), Ancient Greek (B1), Italian (A2).
Want to study: Japanese & Russian - Language Log: https://forum.language-learners.org/vie ... =15&t=8803
- x 184
Does anyone have any experience making Goldendict dictionaries?
I have several old, out-of-copyright dictionaries that I’d like to transcribe, but I have no experience in any sort of programming. I’m not computer-illiterate, but I feel I may be biting off more than I can chew, so I’d like to know if there’s a veteran here who might offer some advice.
0 x
Omnis lingua usu potius discitur quam praeceptis, id est audiendo, legendo, relegendo, imitationem manu et lingua temptando quam creberrime. – Iohannes Amos Comenius
- rdearman
- Site Admin
- Posts: 7231
- Joined: Thu May 14, 2015 4:18 pm
- Location: United Kingdom
- Languages: English (N)
- Language Log: viewtopic.php?f=15&t=1836
- x 23128
- Contact:
Re: Does anyone have any experience making Goldendict dictionaries?
indeclinable wrote:I have several old, out-of-copyright dictionaries that I’d like to transcribe, but I have no experience in any sort of programming. I’m not computer-illiterate, but I feel I may be biting off more than I can chew, so I’d like to know if there’s a veteran here who might offer some advice.
First, are they electronic or on paper? If they're on paper, you'll need to get them scanned before you can manipulate them.
1 x
: Read 150 books in 2024
My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter
I post on this forum with mobile devices, so excuse short msgs and typos.
- indeclinable
- Yellow Belt
- Posts: 76
- Joined: Thu Mar 01, 2018 7:57 pm
- Languages: Spanish (N), English (C2), German (C1), Latin (C1), French (B2), Ancient Greek (B1), Italian (A2).
Want to study: Japanese & Russian - Language Log: https://forum.language-learners.org/vie ... =15&t=8803
- x 184
Re: Does anyone have any experience making Goldendict dictionaries?
They’re scanned, but because of the age, typography and type of paper, OCR doesn’t give decent results, so I’m willing to invest my time in transcription. Before I begin, though, I’d like to know the file format (txt, ott, plain) and the formatting required.
And of course how to convert the result into a GoldenDict-readable format.
(Note: one of them is an Ancient Greek dictionary, so that means polytonic Greek.)
0 x
Omnis lingua usu potius discitur quam praeceptis, id est audiendo, legendo, relegendo, imitationem manu et lingua temptando quam creberrime. – Iohannes Amos Comenius
- rdearman
- Site Admin
- Posts: 7231
- Joined: Thu May 14, 2015 4:18 pm
- Location: United Kingdom
- Languages: English (N)
- Language Log: viewtopic.php?f=15&t=1836
- x 23128
- Contact:
Re: Does anyone have any experience making Goldendict dictionaries?
You might want to approach the Distributed Proofreading team for Project Gutenberg (https://www.pgdp.net/c/). They use volunteer labour to transcribe public domain books. It means lots of people looking at small sections of scanned and OCR'ed books and then fixing the errors. You might need to become the project manager for the book, but it would mean you'd get a lot of help, and it would get published on the Gutenberg site for everyone to have access to.
3 x
: Read 150 books in 2024
My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter
I post on this forum with mobile devices, so excuse short msgs and typos.
- indeclinable
- Yellow Belt
- Posts: 76
- Joined: Thu Mar 01, 2018 7:57 pm
- Languages: Spanish (N), English (C2), German (C1), Latin (C1), French (B2), Ancient Greek (B1), Italian (A2).
Want to study: Japanese & Russian - Language Log: https://forum.language-learners.org/vie ... =15&t=8803
- x 184
Re: Does anyone have any experience making Goldendict dictionaries?
Thanks for the suggestion. I'll do that as a first step, but I'll ultimately want to transform the transcribed text into a GoldenDict-compatible file. I have already registered and begun doing some volunteer work too; it seems one must wait a while before one can become a project manager and start one's own project.
0 x
Omnis lingua usu potius discitur quam praeceptis, id est audiendo, legendo, relegendo, imitationem manu et lingua temptando quam creberrime. – Iohannes Amos Comenius
-
- Orange Belt
- Posts: 242
- Joined: Wed Mar 21, 2018 6:54 pm
- Languages: English, Portuguese, Spanish, Catalan, French, Persian, Arabic, Mandarin, Japanese.
- x 444
Re: Does anyone have any experience making Goldendict dictionaries?
If you can get the data into a simple tab-separated text file, such as WORD/PRONUNCIATION/DEFINITION, you can probably use this tool to convert it:
https://github.com/ilius/pyglossary
2 x
- Doitsujin
- Green Belt
- Posts: 402
- Joined: Sat Jul 18, 2015 6:21 pm
- Languages: German (N)
- x 801
Re: Does anyone have any experience making Goldendict dictionaries?
@indeclinable: Since GoldenDict supports both StarDict and Babylon BGL files, you could use either StarDict editor or Babylon Glossary Builder to generate dictionaries. The source file needs to be a UTF-8 encoded text file, which may contain HTML 3.2 tags and attributes and must have a byte-order-mark (BOM).
I've attached a sample file (ende.gls) that you can play with.
A minimal dictionary file looks like this:
Code: Select all
#stripmethod=keep
#sametypesequence=h
#bookname=English-German Test Dictionary
book|books
<p>A written or printed <span style="color: red;">work</span> consisting of pages <i>glued</i> or <b>sewn</b> together along one side and bound in covers.</p><p>Buch, das (n)</p>
book|books|booked|booking
<p>Reserve (<a href="bword://accommodation" >accommodation</a>, a place, etc.); buy (a ticket) <span style="color: blue;">in advance</span>.</p><p>buchen (v)</p>
accommodation|accommodations
<p>A room, group of rooms, or building in which someone may live or stay.</p><p>Unterkunft, die (n); Übernachtungsmöglichkeit, die (n)</p>
Note that there mustn't be line-breaks in the definition and the last entry must be followed by two empty lines. (Internal hyperlinks must be prefixed by bword://.)
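For anyone who would rather generate the source file with a script than type it by hand, here is a minimal Python sketch of the rules above: UTF-8 with a BOM, no line breaks inside a definition, and two empty lines after the last entry. The names `build_gls` and `write_gls` are made up for illustration, and I am assuming that entries are separated by one blank line, which the listing above does not show explicitly.

```python
def build_gls(bookname, entries):
    """Return GLS source text for an iterable of (headwords, definition_html).

    headwords is a list such as ["book", "books"]; the first item is the
    headword and the rest are inflections, joined with "|".
    """
    lines = [
        "#stripmethod=keep",
        "#sametypesequence=h",
        f"#bookname={bookname}",
        "",  # assumption: one blank line between the preamble and the entries
    ]
    for headwords, definition in entries:
        # The format does not allow line breaks inside a definition.
        if "\n" in definition or "\r" in definition:
            raise ValueError(f"line break in definition of {headwords[0]!r}")
        lines.append("|".join(headwords))
        lines.append(definition)
        lines.append("")  # assumption: one blank line between entries
    # End the last definition line, then leave two empty lines after it.
    return "\n".join(lines) + "\n\n"


def write_gls(path, text):
    """Write the text as UTF-8 with a byte-order mark (BOM)."""
    # The "utf-8-sig" codec prepends the BOM automatically.
    with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
        f.write(text)


if __name__ == "__main__":
    sample = [
        (["book", "books"], "<p>Buch, das (n)</p>"),
        (["accommodation", "accommodations"], "<p>Unterkunft, die (n)</p>"),
    ]
    print(build_gls("English-German Test Dictionary", sample))
```

Calling `write_gls("ende.gls", build_gls(...))` should give a file that can be fed to the workflow described in this post; I have not tried the output in either editor, so treat it as a starting point.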
The workflow is as follows:
StarDict Editor
1. Select the Compile tab
2. Click Browse and select ende.gls
3. Select Babylon file and click Compile.
Babylon Glossary Builder
1. Select New Project and click Next.
2. Enter the metadata information.
3. Click Advanced and check the first 3 options.
4. Select GLS from the Data Source dialog box, click Browse and select ende.gls.
5. Click Build.
You do not have the required permissions to view the files attached to this post.
2 x
- vinnie
- White Belt
- Posts: 30
- Joined: Mon Apr 30, 2018 9:24 pm
- Languages: Italian (N), English (beginner), German (beginner)
- x 11
Re: Does anyone have any experience making Goldendict dictionaries?
For the sake of wider compatibility, I do not recommend using the Babylon format since, as far as I know, GoldenDict is the only program that supports it.
0 x
- indeclinable
- Yellow Belt
- Posts: 76
- Joined: Thu Mar 01, 2018 7:57 pm
- Languages: Spanish (N), English (C2), German (C1), Latin (C1), French (B2), Ancient Greek (B1), Italian (A2).
Want to study: Japanese & Russian - Language Log: https://forum.language-learners.org/vie ... =15&t=8803
- x 184
Re: Does anyone have any experience making Goldendict dictionaries?
Now that's what I call an expert's answer @Doitsujin. Have you done this before?
Now, most of the dictionaries I want to convert I have to transcribe first (with the help of the guys at Gutenberg it might be easier). But this one, for example, is already transcribed into a 2272-page .docx document (yeah, all of it transcribed word by word); at the time I was unaware of GoldenDict.
The result I want is more or less like this:
Abet v. trans.
Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν, ὁρμᾶν, V. ὀτρύνειν; see encourage, aid. Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν. Her father Menelaus abets his daughter herein: V. πατήρ τε θυγατρὶ Μενέλεως συνδρᾷ τάδε (Eur., And. 40).
If I get your instructions right... should I write code like this? (Forgive my incompetence in technical things, I can't even get LaTeX right).
Code: Select all
#stripmethod=keep
#sametypesequence=h
#bookname=English-Greek Dictionary
Abet
<p>v. trans. Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν, ὁρμᾶν, V. ὀτρύνειν; see <a href="bword://encourage" >encourage</a>, <a href="bword://aid" >aid</a>. Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν. Her father Menelaus abets his daughter herein: V. πατήρ τε θυγατρὶ Μενέλεως συνδρᾷ τάδε (Eur., And. 40).</p>
What if I want to start each definition on a new line?
Abet v. trans.
Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν, ὁρμᾶν, V. ὀτρύνειν; see encourage, aid.
Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν.
Her father Menelaus abets his daughter herein: V. πατήρ τε θυγατρὶ Μενέλεως συνδρᾷ τάδε (Eur., And. 40).
What's the purpose of
Code: Select all
<p>Buch, das (n)</p>
anyway? May I skip it?
Sadly, I don't have Windows, so I cannot use the StarDict Editor. Is there a Mac alternative?
Thanks for the help.
0 x
Omnis lingua usu potius discitur quam praeceptis, id est audiendo, legendo, relegendo, imitationem manu et lingua temptando quam creberrime. – Iohannes Amos Comenius
- Doitsujin
- Green Belt
- Posts: 402
- Joined: Sat Jul 18, 2015 6:21 pm
- Languages: German (N)
- x 801
Re: Does anyone have any experience making Goldendict dictionaries?
indeclinable wrote:@Doitsujin. Have you done this before?
Many years ago I converted the Arabic Buckwalter Corpus files to a StarDict dictionary.
indeclinable wrote:If I get your instructions right... should I write code like this? (Forgive my incompetence in technical things, I can't even get LaTeX right).
It should work as long as you don't use line-breaks in the headword and definition lines.
To recap. You'll need:
1. A commented out preamble:
Code: Select all
#stripmethod=keep
#sametypesequence=h
#bookname=Name of the dictionary
2. Definitions
Code: Select all
headword1|inflection1|inflection2|inflection3
<p>first definition</p><p>second definition</p><p>third definition</p>
headword2|inflection1|inflection2|inflection3
<p>first definition</p><p>second definition</p><p>third definition</p>
lastheadword|inflection1|inflection2|inflection3
<p>first definition</p><p>second definition</p><p>third definition</p>
3. An additional empty line after the last entry.
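If the definitions start out as separate pieces of text, a small helper can produce the single-line form shown in item 2. This is only a sketch: `definition_line` and `bword_link` are hypothetical names, and the senses passed in are assumed to be HTML fragments already (plain text would need escaping first).

```python
from html import escape


def bword_link(headword):
    """Return an internal cross-reference link using the bword:// prefix."""
    target = escape(headword, quote=True)
    return f'<a href="bword://{target}">{escape(headword)}</a>'


def definition_line(senses):
    """Wrap each sense in <p>...</p> and join everything on one line."""
    parts = []
    for sense in senses:
        # Collapse any line breaks or runs of whitespace, because the
        # definition must stay on a single line.
        single_line = " ".join(sense.split())
        parts.append(f"<p>{single_line}</p>")
    return "".join(parts)


if __name__ == "__main__":
    senses = [
        "Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν; see "
        + bword_link("encourage") + ", " + bword_link("aid") + ".",
        "Have a hand in: P. and V. συμπράσσειν, V. συμφυτεύειν.",
    ]
    print("Abet")
    print(definition_line(senses))
```

Keeping one sense per list item means the source can stay readable while the output still obeys the one-line rule.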
indeclinable wrote:What if I want to start each definition in a new line?
IIRC, GoldenDict will display each new <p>...</p> section on a new line. However, StarDict requires a <br /> tag.
indeclinable wrote:What's the purpose of
Code: Select all
<p>Buch, das (n)</p>
It's the German translation of "book."
indeclinable wrote:Sadly I have no Windows so I cannot use the StarDict Editor, is there a Mac alternative?
There should be one, but I haven't found one. However, since the source code is available on Github and there's a macOS version of the dictionary client, a macOS developer should be able to compile it for you.
Otherwise you might have to use PyGlossary, which should work on Macs, too. If you plan to use PyGlossary, you'll need to create a two-column spreadsheet with the headword in the first column and the definition in the second column and save it as a tab-delimited UTF-8 text file.
I don't know whether PyGlossary supports HTML tags, though.
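If a spreadsheet is inconvenient, the same two-column layout can be written directly from a script. The sketch below only produces the tab-delimited UTF-8 file described above; `to_tab_lines` and `write_tab_file` are invented names, and nothing here assumes anything about how PyGlossary itself treats HTML tags or escape sequences.

```python
def to_tab_lines(entries):
    """Yield one 'headword<TAB>definition' line per (headword, definition)."""
    for headword, definition in entries:
        # A stray tab or line break inside a field would break the
        # two-column layout, so collapse all whitespace to single spaces.
        fields = [" ".join(field.split()) for field in (headword, definition)]
        yield "\t".join(fields)


def write_tab_file(path, entries):
    """Write the entries as a tab-delimited UTF-8 text file."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in to_tab_lines(entries):
            f.write(line + "\n")


if __name__ == "__main__":
    sample = [
        ("Abet", "v. trans. Encourage: P. and V. ἐπικελεύειν, παρακαλεῖν."),
        ("Aid", "v. trans. P. and V. ὠφελεῖν."),
    ]
    for line in to_tab_lines(sample):
        print(line)
```

Python strings are Unicode throughout, so polytonic Greek needs no special handling as long as the file is written as UTF-8.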
0 x