Mork the Fiddle's Log (Ancient Greek, Spanish, French)

MorkTheFiddle
Black Belt - 2nd Dan
Posts: 2132
Joined: Sat Jul 18, 2015 8:59 pm
Location: North Texas USA
Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
Language Log: https://forum.language-learners.org/vie ... 11#p133911
x 4869

Mork the Fiddle's Log (Ancient Greek, Spanish, French)

Postby MorkTheFiddle » Sun Nov 29, 2015 8:08 pm

Looking up the definition of a word the first time and writing or pasting it into the vocabulary box produces a hitch in using LWT. How convenient it would be if the definition were already there. For Ancient Greek, two problems stand in the way: none of the extant dictionaries was designed to be convenient to use, and they provide only headwords.

As a start, I downloaded from the Perseus Project all the vocabulary from the works I expect to read, some 48,510 words, along with (for the most part) their definitions. There were some "duplications": ἄλλος, for example, appears once as defined by LSJ, Middle Liddell and Autenrieth, and again as defined by Slater. So far as my inspection can tell, all the words are lemmas, or headwords: more precisely, the first person singular active or middle of verbs, the nominative singular of nouns, the masculine nominative singular of adjectives, plus conjunctions, prepositions and interjections.

The inflected forms those lemmas can take are not there. I've contemplated the complex task of programmatically listing the word forms, but the sheer number of rules, especially for verbs, kept me from starting.

A few weeks ago I downloaded Diogenes from the University of Chicago, and a week or so ago I looked at the downloaded files that come with Diogenes. In them were files called Greek-lemmata.txt and Greek-analytical.txt. These two files turned out to be all the lemmas of Ancient Greek as housed by the Perseus Project and all the word forms as well.

To work with so much data I decided to buy Microsoft Access, a desktop database application. I had never worked with Access before, but I used to work with similar databases, so the learning curve was slight, at least for what I wanted to use it for. There are other databases I could have used. MySQL is one, but my copy is dedicated to LWT, and I did not want to risk messing it up somehow. Other databases, like MS SQL Server, seemed like overkill. I don't know Visual Basic; otherwise I could have programmed a solution to the conversion. And there are other ways, too, but all I want is a one-time process.

After importing the two Diogenes text files into Access, the next step was separating out the words (lemmas in one case, word forms in the other). A query quickly picked out the word forms of the Greek-analytical.txt file. Access would not cooperate in exporting a clean text file, so I used a two-step method: I exported the word forms to Excel, and from there to a text file.

From the resulting text file I could take chunks of the data and import them into LibreOffice Writer. There is a macro for Writer that converts Beta Code, which is what the Greek in the Perseus files is written in, into Unicode. Greek in Unicode allows for a straightforward search without having to fiddle with Beta Code. The macro quickly let me know that it had a size limit, which turned out to be about 35,000 rows. The conversion process is still ongoing because of the size of the file: more than 900,000 rows in the Greek-analytical.txt file. So far, about 100,000 rows have been converted. It takes about an hour to complete 35,000 rows.
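For readers curious what such a conversion involves, here is a minimal Python sketch. It is not the LibreOffice macro used in this log, and it covers only the basic letters, the common diacritics and the `*` capital marker of Beta Code; the function name `beta_to_unicode` is made up for illustration. It relies on Unicode NFC normalization to merge each letter with its combining marks.

```python
import re
import unicodedata

# Beta Code letters in Greek alphabetical order, paired with alpha..omega.
# U+03C2 (final sigma) is skipped here and handled separately below.
BETA_LETTERS = "abgdezhqiklmncoprstufxyw"
GREEK_LETTERS = [chr(c) for c in range(0x3B1, 0x3CA) if c != 0x3C2]
LETTERS = dict(zip(BETA_LETTERS, GREEK_LETTERS))

# Beta Code diacritic signs mapped to Unicode combining marks.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}

# A sigma not followed by another word character is word-final.
FINAL_SIGMA = re.compile("\u03c3(?!\\w)")


def beta_to_unicode(beta: str) -> str:
    """Convert a simple Beta Code string to precomposed Unicode Greek."""
    out = []
    pending = []      # diacritics written between '*' and a capital letter
    capital = False
    for ch in beta.lower():
        if ch == "*":
            capital = True
        elif ch in DIACRITICS:
            # After '*', marks precede the letter, so hold them back.
            (pending if capital else out).append(DIACRITICS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if capital else letter)
            out.extend(pending)
            pending.clear()
            capital = False
        else:
            out.append(ch)  # spaces, punctuation, digits pass through
    text = unicodedata.normalize("NFC", "".join(out))
    return FINAL_SIGMA.sub("\u03c2", text)


print(beta_to_unicode("a)/llos"))  # ἄλλος
```

A script like this has no row limit of its own, though real Beta Code has further conventions (special sigma codes, punctuation, editorial symbols) that the sketch ignores.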
Many things which are false are transmitted from book to book, and gain credit in the world. -- attributed to Samuel Johnson

Postby MorkTheFiddle » Mon Nov 30, 2015 9:52 pm

Now 300,000 Greek words have been converted from Beta Code to Unicode. What remains is to double-check the results, probably by random sampling.
Presently, my only plan for the converted word list is to upload it, along with the definitions and parts of speech columns, to LWT. I don't know how well MySQL will respond to so many rows. :?:
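Random sampling of the converted rows could look something like the sketch below. The row layout (an original Beta Code form paired with its converted form) and the sample size are assumptions for illustration, not details from the log.

```python
import random


def sample_rows(rows, k=200, seed=None):
    """Pick up to k rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical stand-in data: (original form, converted form) pairs.
rows = [(f"beta{i}", f"unicode{i}") for i in range(1000)]

for original, converted in sample_rows(rows, k=5, seed=42):
    print(original, "->", converted)  # eyeball each pair by hand
```

Checking a few hundred pairs by eye will not prove the whole file is clean, but it should reveal any systematic conversion error quickly.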

I am still working on Antigone, but I pick up and read a few pages of Daphnis & Chloe from time to time. Goethe was so taken by the book that he recommended reading it every year.

In other languages, I have scrapped La Catedral del mar by Ildefonso Falcones for the second time. Not plausible enough for my taste. Its place will be taken by El Tango de la guardia vieja by Arturo Pérez-Reverte. Previously I read his La Reina del sur. Although I did not care for the second half, the first half did keep my interest. Depending on how El Tango goes, I might try some of his later books as well as some of his other historical novels. He has written a lot.

The French lot I am reading comprises Tout ce que j'aimais by Siri Hustvedt and Mémoire infidèle by Elizabeth George, both translated from English, Le Docteur Jivago from the Russian, and La Révolution française by Albert Soboul. All are interesting, but I have just gotten started.

Postby MorkTheFiddle » Fri Dec 04, 2015 7:58 pm

I'm some 50 or 60 pages into El Tango de la guardia vieja by Arturo Pérez-Reverte. So far we know that the main character Max Costa is a professional ballroom dancer and that he has become interested in the professional world of chess. A couple of other key characters are a song composer and his wife. I have not seen much of the chess player, who seems to be modeled a lot on Bobby Fischer. So far, so good.

I am only inching along in Hustvedt's novel and George's. Both are still interesting, but I can't say much more than that. In real life, Hustvedt is also an art critic, and art (namely, painting) is playing a large role in her novel. The narrator is an art critic, and one of the main characters is a charismatic painter. The other Hustvedt novel I read was The Enchantment of Lily Dahl (in English), which I liked very much. George's book is a murder mystery, though no one has died yet.

Years ago I devoured the English translation of Dr. Zhivago. I am enjoying the re-read of the French translation, though my pace has been slow. I am reading too many books. Once I am through with these novels, I will go back to reading French novels written in French. This has been a Big Fat Book campaign, to help speed up the pace of my reading French. Documentation for all this reading, such as it is, is in Goodreads, where my nom-de-plume is Cornpone.

La Révolution française continues to fascinate me. It is refreshing to read about the underpinning of the revolution, putting aside for the moment the larger than life characters. The price of bread kept rising throughout the century, wages did not keep pace, and the taxes were crushing and borne mostly by the working class.

Daphnis & Chloe and Antigone proceed apace, still more language exercises than literature at this point.

More than 700,000 words of the Perseus Greek lexicon have been converted from Beta Code to Unicode, and some 60,000 of them have been uploaded along with their meanings and parts of speech to the LWT database. Although that number is confined to proper names and word forms beginning with α-, a few of them occurred in this morning's Antigone reading. In theory, when done, I should be able to upload a totally new text to LWT and have all the words in it highlighted and defined. Let it be noted that only the word-form column was converted. Greek also appears in the rest of the data, including repetitions of the word forms themselves, and that I did not try to convert. Too complicated at this point. Maybe something like Python or Perl can deal with such things, but I don't know either and don't want to learn them for a one-off project.

This process of loading the LWT database ahead of time should work for other languages, too, provided a suitable lexicon exists.

Postby MorkTheFiddle » Sun Dec 06, 2015 7:55 pm

The LWT database of Ancient Greek definitions.

All the word forms from the Perseus lexicon have been transcoded into Unicode, and all of them have been aligned with their definitions in spreadsheets.
All the spreadsheets have been converted to text files. Some 800,000 word forms and their definitions have been uploaded into LWT. Roughly 100,000 word forms remain to be uploaded.
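The alignment step, done here in spreadsheets, amounts to a lookup of each word form's lemma in a table of definitions. A rough Python sketch follows; the column layout (word form, lemma, definition) and the sample rows are assumptions, not the actual structure of the Diogenes or Perseus files.

```python
import csv
import io


def align(forms, definitions):
    """Yield (word form, lemma, definition) rows.

    forms: iterable of (word_form, lemma) pairs.
    definitions: dict mapping lemma -> definition text.
    A lemma with no entry gets an empty definition.
    """
    for word_form, lemma in forms:
        yield word_form, lemma, definitions.get(lemma, "")


def write_tsv(rows, handle):
    """Write rows as tab-delimited text, one row per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical sample data, transliterated for readability.
forms = [("logou", "logos"), ("logon", "logos"), ("xyz", "unknown")]
definitions = {"logos": "word, speech"}

buffer = io.StringIO()
write_tsv(align(forms, definitions), buffer)
print(buffer.getvalue())
```

Writing to a real file instead of `io.StringIO` would produce the kind of tab-delimited text file that an import step can read.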

The definitions show up clearly enough in the reading I am doing, the Antigone by Sophocles, and in Memorabilia by Xenophon. There are a couple of glitches. The data downloaded from Diogenes is not always complete or clear. Sometimes the definition is merely a reference to a work. Sometimes the definition is missing. Besides that, the database is now too big to back up properly. The backup process takes so long that something times out. There is a 300 second limit. I know where the 300 lurks in the code, but I do not know enough about the code to risk a bold fix. Since I am able to make a copy of the raw MySQL files, I can get my backup that way.

Why does MySQL take more than 5 minutes to execute a query that a less than optimal app like Access can execute in a second or so? LWT has been set up to run off an Apache server set up on the desktop, and LWT operates through a browser. Is it the server or the browser slowing things down? Don't know and, for now, don't care. This process has taken enough time already, and everything is working. Even if the definitions are not very pretty, having them in place saves an appreciable amount of time.

LWT and Lexique

Lexique 380, a French lexical database, has more than 140,000 word forms and some 27,000 lemmas. Enough to make a decent LWT dictionary, except there are no definitions. But the word forms can be fed into Google Translate a chunk at a time. Google can handle 15,000 or so words at a time.
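Splitting a long word list into chunks of a fixed size is simple to script. The sketch below only does the splitting; it makes no call to any translation service, and the 15,000 figure is just the rough limit mentioned above.

```python
def chunks(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical stand-in for the list of word forms.
word_forms = [f"mot{i}" for i in range(140_000)]

batches = list(chunks(word_forms, 15_000))
print(len(batches), "batches; last one has", len(batches[-1]), "words")
```

The same helper would serve for any tool with a row limit, such as a macro that can only take a certain number of rows at once.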

Tout ce que j'aimais

Ancient Greek still looks like a foreign language to me, but French and Spanish do not. Still and all, they are harder for me to read than English. To have something easy to read in my "down time" from Greek, I switched from the French translation of What I Loved to the English original. I'm still not very far, just to page 50. Hustvedt explores "making," both the making of art and the making of children, as well as the makers, their partners and their friends. I think it must be more difficult to describe a "fictional" painting than to describe the dance or a dance move, as Pérez-Reverte does in El tango de la guardia vieja.

Why do her characters seem more real to me than Pérez-Reverte's characters? So far, there is nothing cliché about his characters. The heroine in El tango, such as she is, is beautiful, but Bill in What I Loved is described, perhaps not in so many words, as sexy. To go further, am I convinced by the characters in Antigone? Not except for Haemon, who makes a gallant challenge of his father Creon's condemnation of Antigone. What about Mémoire infidèle? There is a violinist, well picked out as personally unappealing. The rest of the characters, I don't know enough about yet. The author George is switching from one to another rather quickly, and after all I am only on page 33 (of 981). The Rougon family in Zola's work, which I am reading on Kindle and have not even mentioned here yet, is painted in a Dickensian way in a most unflattering light. I believe them as people, though perhaps something might have been said about their good qualities.

I might be trying to answer a question here about Hustvedt, George and Pérez-Reverte. Do they stand in the first rank of novelists?

Postby MorkTheFiddle » Tue Dec 08, 2015 9:13 pm

First, I dropped out of Goodreads. Too many Amazon fingerprints all over it for me. I went to Bookfinds, but I could not figure out even how to add a book, so I skedaddled. LibraryThing costs money, so that is not an option either. I exported, or at least I think I exported, my data from Goodreads. Now that I own Access, I can write my own little book tracking app.

The Ancient Greek vocabulary upload project is finished. More than 800,000 vocabulary items were added to the MySQL database that LWT uses. Now no more fiddling with dictionary look-ups and typing and/or copying and pasting of definitions into LWT. Both a time saver and a riddance of the constant interruptions.

The downsides, all relatively minor:

1. Not all the words uploaded. I am finding that some 2% to 3% of the words in new readings remain undefined. It may be a question of accents. The Perseus lexicon distinguishes acute accents from grave accents, say. The LWT upload process can be set to reject terms already in the database. This is how I set it. But LWT cannot recognize differences in accents. To LWT, for example, a word ending in -α is the same as a word ending in -ά or -ὰ. So, if the upload process found a word already existing in the database but with a different accent, it would not upload the new word. This is what I think happened.

I would like to know more about how MySQL gets installed on a computer, about where exactly it is and about how its databases can be copied. For the sake of security, I would like to copy the LWT database to Access and MS SQL Server.

2. The database is still so overloaded that it cannot do a proper backup and times out. Instead, I am writing the Ancient Greek vocabulary to a tab-delimited text file. The data still gets kept, though words from other languages are not backed up. No matter, so long as I don't add to them. So far, I am concentrating on Ancient Greek in LWT, so I am okay there. If I go to other languages with LWT, they would be Old Norse and Old English.

3. Enough unnecessary crud is in the definitions that I have to work my way through the garbage and unnecessary information to get to the meaning and/or metadata for the word.

4. The dictionary look-up no longer works. I have been using Logeion, but the word never shows up. Luckily, it does not much matter anymore.
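The accent explanation suspected in item 1 can be checked outside LWT. The Python sketch below strips combining marks to imitate an accent-insensitive comparison and then groups word forms that would collide. It only approximates what a database collation does, and the function names are invented for illustration.

```python
import unicodedata
from collections import defaultdict


def fold_accents(word):
    """Drop accents, breathings and other combining marks, then lowercase."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).lower()


def find_collisions(words):
    """Group words that become identical once their accents are ignored."""
    groups = defaultdict(set)
    for word in words:
        groups[fold_accents(word)].add(word)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


# alpha, alpha with acute, alpha with grave, beta (written as escapes).
sample = ["\u03b1", "\u03ac", "\u1f70", "\u03b2"]
for key, forms in find_collisions(sample).items():
    print(key, "<-", sorted(forms))
```

If the number of colliding groups in the full word list came out near the 2% to 3% of undefined words, that would support the explanation; if not, the cause lies elsewhere.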

Iversen
Black Belt - 4th Dan
Posts: 4782
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027
x 15020

Postby Iversen » Fri Dec 11, 2015 10:49 am

MorkTheFiddle wrote:Lexique 380 has more than 140,000 word forms and some 27,000 lemmas. Enough to make a decent LWT dictionary, except there are no definitions. But the word forms can be fed into Google translate a chunk at a time. Google can handle 15,000 or so words at a time.


I didn't get this - Google Translate can't translate Ancient Greek, so what in your project is it you let it translate?

Apart from that I am impressed by that project - basically you are creating your own popup dictionary from scratch (if I have understood it right).

Postby MorkTheFiddle » Fri Dec 11, 2015 7:23 pm

All credit for developing LWT must go to HTLAL member lwtproject. His first post about LWT appeared on HTLAL on 4 July 2011.

Using LWT, one does in fact create a popup dictionary, one word at a time. My project was to speed up the process a bit by uploading all the definitions at once, obviating the necessity of looking them up one by one during the reading process. By and large this has been successful.

I point LWT to The University of Chicago's Logeion for an Ancient Greek dictionary. Credit this discovery to HTLAL member gregf. I say in a post above that the Logeion link stopped working, but it is functional again. I don't know why it stopped working and don't know why it started again. :?

Postby MorkTheFiddle » Sun Dec 13, 2015 6:58 pm

Yesterday afternoon I ported my LWT app from my laptop to my desktop, where for various reasons it works better. Lucky I did. When I turned on my laptop this morning, I got a message saying that Cortana and the start button did not work. If I restarted, it said, "it" would try to fix it. I did, but it did not. Not to be bothered with that, I dug my Mac laptop out of mothballs and used it for what I needed. Later, after a quick Internet search, I found out that many others have had the same bug and, more important, that there was a fix for it. Reboot in Safe Mode, then reboot again in regular mode. Worked for me, though for some maybe it does not work.

Backing up the LWT database got solved, too. Running from cmd, one line downloads the database directly from MySQL. Took less than 5 seconds, where LWT took 5 minutes and then timed out.
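The post does not say which command was used. One common way to dump a MySQL database from the command line is the `mysqldump` client, and the sketch below wraps it in Python under that assumption. The database name, user name and output file are placeholders, not values from this log.

```python
import subprocess


def build_dump_command(database, user, outfile):
    """Build a mysqldump command line as an argument list."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",                 # no value given: prompt for it
        f"--result-file={outfile}",   # write the SQL dump to this file
        database,
    ]


def dump_database(database, user, outfile):
    """Run mysqldump; raises CalledProcessError if the dump fails."""
    subprocess.run(build_dump_command(database, user, outfile), check=True)


if __name__ == "__main__":
    # Placeholder names; substitute the real database and account.
    print(" ".join(build_dump_command("lwt_database", "root", "lwt_backup.sql")))
```

A dump made this way talks to the MySQL server directly rather than going through the web application, which would be consistent with the large difference in speed reported above.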

Past the halfway mark of Antigone, I must admit a lot of it mystifies me. In part that's because Sophocles alludes a lot to Greek mythology. Also at times his poetic expression strikes me as ridiculous.

In the other reading that I am doing, I am finding out more and more about Max, the hero of El tango de la guardia vieja. From here on out what I say about these readings is going to have spoilers. Max had a checkered past before he became a professional ballroom dancer, though we don't know quite what it consisted of. We learn as well he has a knack for breaking into hotel rooms.

What I Loved has moved through the divorce of painter Bill. In fact, not a lot is happening in the novel except the ruminations of the narrator, the art critic. It is retaining my interest, nevertheless.

Dr Jivago moves on at its leisurely pace. Cossack cavalry has just broken up a street demonstration. One little old lady who lives on the street thinks the demonstration was directed specifically at her, and she cannot understand what she did to deserve such wrath.

Mémoire infidèle has dropped a notch from an interesting beginning to the clichéd characters and situations of police thrillers. By now one surely understands that police detectives sometimes don't get a lot of sleep. Detectives are not the only ones. Try working in IT and being on call all night and getting a call at 2:00 in the morning to fix something someone else (who is sleeping soundly) screwed up.

Is the psychological novel tradition that I once read was begun by Constant's Adolphe still with us? Do we still judge good novels by how closely they adhere to psychological reality? By those standards and those standards alone, all three of the novels I am reading fit the bill. Do the plots have to be plausible, at least within the premises of the individual works? Again, check for all three. But what else? Why can the quiet novels of Jane Austen stand shoulder to shoulder with the boisterous work of Tolstoy and Dostoevsky, say, while the books of many popular novelists from any era cannot?

gregf
Posts: 9
Joined: Thu Oct 29, 2015 7:22 pm
Location: Paris
Languages: English (native), German (fluent), French (fluent); Studying: Italian, Modern Greek, Ancient Greek
x 7

Postby gregf » Tue Dec 15, 2015 2:43 pm

Very happy to have stumbled on this log! :)

I know I've pointed this out to you before, but the Noet software offers the Perseus project for download for free. The texts are morphologically tagged (though automatically, meaning there are still a fair number of ambiguous forms and thus extra information), which, coupled with a dictionary like the Middle Liddell, makes for a wonderful Greek and Latin reading experience.

Since discovering Noet/Logos, I've pretty much given up on LWT for reading ancient languages. But I'm looking forward to seeing how your experiments with LWT work out.

(My first post on the new forum!)
: 510 / 651 Anabasis (Ancient Greek)

Postby gregf » Tue Dec 15, 2015 3:18 pm

I just saw from a different thread (on your daily routine) that you're already using Noet, so I'll post something else you might find interesting.

After a lot of cajoling, I and a few other users managed to convince the LingQ staff to add a Modern Greek slot, so I've been studying a lot of Modern Greek in the hopes that the comprehensible input model will bear fruit for my Ancient Greek as well.

Too early to tell what kind of effects learning MG has had on my AG, but I'm certainly reading a lot of Greek, in one form or another. More updates soon.