Iversen's Guide to Learning Languages (version 3b)


Postby Iversen » Thu Jan 28, 2016 3:09 pm

1.9. Can you learn a language only by reading?

Can you learn a language only by reading? No, of course not. As a minimum you need to know how to pronounce it inside your head - otherwise it will just amount to rebus solving. The only people who may not use sounds in their thinking are people who have been deaf from birth, and they can substitute sign language for the sounds - but once you have learnt your native language as a spoken language there is no way back.

I would like to quote a few lines from the story of Christina Hartmann, who learned to read through sign language:

I started learning sign language when I was six months old. Specifically, I learned Signed Exact English (SEE) which incorporates the same grammar and syntax as English with a heavy emphasis on fingerspelling. I'd sign a sentence like this (parentheses denote the signs): "(I) (see) (the) (baby) (cry)(ing)." Fingerspelling is forming individual letters with hand shapes. So, even before I learned how to read, I understood the concept of letters. In a way, I think that was an advantage when I started reading.
My mom read to me, just like many other children. She would sign with the book propped upon her lap and she'd give me some time to look at the book first and then start signing. I could shift my attention between her and the book. Actually, my mom told me that she had a great time and so did I.

It is obvious that writing doesn't have the same status as speech. Almost every human being can speak at least one language, but writing was invented late and still has to be taught to children long after they have learned to speak. But the written language is constructed with the spoken language as its model, and much to the dismay of those who deny that written language is a language, a large part of the study of languages is based on written sources. And languages with alphabetical writing systems even mimic the way spoken languages are built with a phonetic and a phonemic layer, which represent respectively the actual sounds emitted and the logical units which are combined to form meaning-bearing entities.

It is interesting that reading in antiquity apparently was mostly a fairly loud affair, but a few select readers nevertheless learned to switch from overt verbalization to silent muttering:

Augustine's description of Ambrose's silent reading (including the remark that he never read aloud) is the first definite instance recorded in Western literature. Earlier examples are far more uncertain. In the fifth century BC, two plays show characters reading on stage: in Euripides' Hippolytus, Theseus reads in silence a letter held by his dead wife; in Aristophanes' The Knights, Demosthenes looks at a writing-tablet sent by an oracle and, without saying out loud what it contains, seems taken aback by what he has read. According to Plutarch, Alexander the Great read a letter from his mother in silence in the fourth century BC, to the bewilderment of his soldiers.
(quoted from A.Manguel: "A History of Reading" ch.2)

Speed reading should in principle not be confused with skimming, where you deliberately skip irrelevant information. When done properly it does resemble extensive reading, which I described in the first part of this guide as follows: "The other kind of reading is the extensive reading. Here the goal is not to understand everything, but to acquire a kind of momentum while reading, and to get through as much genuine stuff as possible." In practice speed readers will skip those elements of a text which don't contribute to its general meaning. The trick in speed reading is to decrease the number of fixation points for the eye and read more text at each fixation (this can be seen in a very graphical way here). A slow reader stops in many places and sometimes has to skip backwards to reread lost passages; a fast reader makes few and suitably spaced jumps and has trained his/her ability to pick up the relevant information the first time a passage is read. Besides, speed readers learn to minimize their subvocalizing, which takes speed reading a further step away from the spoken language.

So what role does true speed reading have in language learning? Well, if you read something to find something specific then you can of course do speed reading, but if your goal is to learn a language then it is just about the silliest thing you can do. The principle behind speed reading is to pick out just enough of the text to be able to piece together the meaning - and you should certainly not start looking at the formal side of the text. In other words you skip exactly those elements in the text that are relevant for a language learner.

Reading for general content is another matter, and I have done my share of that. The most extreme case was probably when I was writing my final dissertation at the university about a grammatical topic. If I needed an illustration of a certain phenomenon and I didn't have a suitable example in my notes then I would look through book after book, turning the pages at a rate of about one page every 2-3 seconds, first looking at the right side pages, then the left side pages. But this was skimming, and I didn't even care about the content of the pages I ran through.

The fastest true 'pseudoreading' I have done happened while I studied literature and came unprepared for a lesson where I should have read a whole book (it happened fairly often as my interest in literature was waning already during my study years). In this situation you can actually zip through a few hundred pages of a standard paperback novel in 15-20 minutes, catching some of the plot, noting down some pages where there are things that probably will be discussed, getting a sense of the writing style in general and so on. This was actually enough to be able to participate in the discussion at a university level course, and paradoxically I still remember some of the content of books I have perused in this way. But it is clearly not enough to really learn anything new, and certainly not to learn anything of the language because you already have to know it well to speed-read like that. Speed reading has its uses, but not in language learning.

So to summarize: you can learn many aspects of your languages from written sources, but the more you exclude speech the more your learning will result in some kind of rebus solving or construction, rather than the free and unhindered use of a living language.

For me there is one state of consciousness which is almost a necessity if I want to activate a language. I think of it as 'the buzz' (no Danish name, sorry). This state occurs when you get so much input that your head starts spinning. If you accept this chaotic state you can try to turn it in the direction of organized thinking, which is just one step ahead of speaking. I primarily achieve this with a combination of extensive reading and listening to comprehensible sources. The spoken sources are necessary because they represent the kind of information transport which is called "push" in the computer (and the marketing) world. A 'push' comes to you, not the other way round. In contrast reading is essentially a "pull" action. For bookworms hours of concentrated reading may function the same way, but you have to be really caught by a book to feel that it bombards you. 'Normal' readers have to literally drag the information out of a written text - if they close their eyes or get distracted the stream of input stops. And therefore it is harder to achieve the 'buzz' I mentioned before. In contrast your ears are always open, and the problem is only to find someone or something that delivers a steady stream of speech.

You may have seen references to something called the 'din', and it seems that Elizabeth Barber described this thing in 1980, followed by Krashen in 1981 (and 1983). It is supposed to be "an involuntary mental rehearsal of a language that occurs after we have had extensive comprehensible input in that language." The most comprehensive discussion I have seen so far (with references to later research) is in a book called "Inner Speech - L2" by Guerrero, which can be partially seen on Youbooks - but alas, it costs more than $100 as an ebook, and I'm definitely not going to pay that much. It was briefly discussed on HTLAL in 2010, but the links in that discussion have died since then. There is another article by the same author at archive.org, but I haven't had time to read that one through. Some of Krashen's own reactions to research in the field can be found here.

I'm slightly uneasy with the claim that the din thing is an involuntary rehearsal (and maybe even a rehearsal strictly as part of a Krashen-like acquisition process). Inner speech was discussed long before Barber and Krashen, and for me the rehearsal happens in its clearest form when you decide to do inner speech. The mumbling background I call buzz is only half voluntary - it is more like a stream of consciousness which has been set in motion by too much input and now runs mad inside your head. And a large proportion of it consists of borrowed elements from the language you have been bombarded with. However this turmoil would have to be consciously directed before I would agree to call it a rehearsal.

The snag is that without a lot of preparation none of those sources would be comprehensible at all - and intensive work primarily with written sources is my way to collect the information that makes them comprehensible.

Postby Iversen » Thu Jan 28, 2016 5:08 pm

2. Second part - How to learn words and expressions

2.1. How many words do you need to learn?

The main obstacle to reading and listening fluently is lack of vocabulary. For some people it may be difficult to remember words without contexts, but my own experience with wordlists has shown me that I can learn words much faster by using structured methods. By this I mean that it is not enough just to read a long list of words with translations and maybe repeat each combination fifty times. You can use different methods, but writing the lists in groups of 5-7 words and memorizing them as a group, followed by a control in both directions, does the trick for me, and then afterwards I can use the same methods that I would use on passive words to make the words stick.

Extensive reading and listening is of course necessary to get the nuances, construction possibilities and idiomatic uses, but all that is much easier when you already have a nodding acquaintance with each word.

The funny thing is that for once there is a thing in language learning which is measurable: I originally started my concentrated work on dictionaries and word lists because I wanted to assess my passive vocabulary in Romanian. But then I discovered that my vocabulary thundered upwards just by working with a dictionary. Later I experimented with techniques to learn new words, and this resulted in the wordlist technique which I'll describe below.

10000 active, 20000 passive words would be a good estimate of where you are leaving basic fluency and moving towards advanced fluency, but of course only in conjunction with a firm grasp on grammar and idiomatics, plus easy active use of the language in question (at least if you want to claim active fluency, not only passive fluency). And to get there dictionaries, word lists and flashcards are not enough - you have to meet real living language (and produce it yourself).

Not to discourage anybody, but try to look up every unknown word from some ordinary text in a language you don't already know too well. How many did you find? My own experience with scientific magazines in several languages is that maybe half the words I didn't know aren't included in my dictionaries either - which proves that even 20-30.000 words aren't enough. And here I'm not talking about specific scientific terms, because they often are international and therefore not among those I had to look up. So a daily word intake of at least 100 words is not only possible, but it really is what you MUST aim for if you want to learn a language within a reasonable time - and this will of course be more difficult if it isn't closely related to something you already know.

There has of course been a lot of research on vocabulary sizes and word frequencies. One of the most important discoveries has been that a small number of words cover most of any standard text, whereas the vast majority of words are so rare that they may occur only a few times in a corpus containing millions of words from thousands of text clips. The curves below are based on the Kilgarriff corpus where the standard unit is words, but 'clipped' words and numbers are included.

[Image: 011-Kilgariff-frequencies,curve.jpg]

As you can see from the blue curve the most common word in English is "the" with almost 8%, but from there the coverage of each word falls very quickly. You will notice that "be", "is", "was" and "are" are counted separately. You could argue that they belong to the same verb, but in this case there are four separate roots represented, and it is fair enough to count them as separate items. However later in the list you will find "house" and "houses" as separate items even though they not only belong to the same paradigm, but also share one single root. Many researchers therefore prefer statistics based on so-called word families, where not only "house" and "houses", but also "housing" are counted as one unit. This will of course cut down on the total number of unique items (and the effect will be even more drastic in languages with more morphology than English), but the basic shape of the curve will not change.
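The two kinds of curves discussed here (the share of each word form and the accumulated coverage) can be computed from any text. The following is only a minimal sketch in Python: the tokenizer is a crude assumption, the names coverage_table and forms_needed are made up for the illustration, and the sample sentence is a toy text, not the corpus behind the curves.

```python
import re
from collections import Counter


def coverage_table(text):
    """Return (word form, share, cumulative share) rows, most frequent first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    rows = []
    running = 0
    for word, n in counts.most_common():
        running += n
        rows.append((word, n / total, running / total))
    return rows


def forms_needed(rows, target):
    """Number of top-ranked word forms needed to reach the target coverage."""
    for rank, (_, _, cumulative) in enumerate(rows, start=1):
        if cumulative >= target:
            return rank
    return len(rows)


# Toy text, only to show the shape of the output.
sample = "the cat saw the dog and the dog saw the cat run"
rows = coverage_table(sample)
for word, share, cumulative in rows:
    print(f"{word:5s} {share:6.1%} {cumulative:6.1%}")
print("forms needed for 50% coverage:", forms_needed(rows, 0.5))
```

Counting word families or headwords instead of word forms would only require mapping each token to its family or headword before counting; that mapping is the hard part and is left out here.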

My own word counts are mostly based on dictionaries, so for me the natural unit is the 'headword' as you'll find it in any standard dictionary. Here "house" and "houses" are counted as one unit and "housing" as another. And even "be", "is", "was" and "are" are counted as one unit because they are seen as forms of just one headword, in this case a verb. It is clear that there will be a lot of fairly arbitrary decisions to be made if you want to count word families, but even the notion of 'headword' is somewhat fuzzy unless you refer to one specific dictionary. For instance "being" is a verbal form, but it can also be a substantive, and then most dictionaries would treat it as an independent headword. You can choose to follow a certain dictionary blindly, but that just means that you leave it to the lexicographer team to make those arbitrary decisions.

The red curve above is cumulative in the sense that it shows how much of the word stock in a corpus is covered by a certain number of words, ranked by frequency. A number of researchers have asked themselves how much coverage you need to understand a text, and the numbers quoted are generally quite high. One of the leading figures in this research has been Paul Nation. In a paper written with Marcella Hu Hsueh-Chao, "Unknown Vocabulary Density and Reading Comprehension", the readers are first reminded that 1% unknown words will mean roughly one unknown word per 10 lines in an average book. How many unknown words can you cope with and still claim to understand the text? The authors tested this by presenting a group of native Anglophones (all fluent readers) with a story in which a certain percentage of the words were replaced by nonsense words, starting with the words with the lowest frequency. Their comprehension was tested by asking them 14 multiple choice questions. 12 correct answers would be accepted as full understanding. A few got all questions right, but have a look at this graphic:

[Image: 015-Hu-Nation-UnknownVocabularyDensity_b.jpg]

1 On average, learners' comprehension scores increase to a predictable degree as the coverage of known words increases.
2 No learners reading the 80% coverage version of the text gained adequate comprehension. All learners in this group gained uniformly low comprehension scores.
3 The range of scores of learners in the 95% and 90% coverage groups was wide.
4 It was possible for some learners in the 90% group and a few more in the 95% group to gain adequate or close to adequate comprehension, but the majority of learners did not.

(..) It seems that around 98% coverage may be needed for most learners to gain adequate comprehension. 98% coverage would have yielded an average score of 11.53 on the multiple choice test and a score of 70.82 on the cued written recall test

There are definitely some people who are able to get the 'gist' of a text based on less than 80% coverage. But this doesn't mean that they would be able to answer all 14 questions of Hu and Nation. They may be better at guessing, better at organizing scattered fragments into a general meaning and better at dealing with uncertainties than your average Joe, but they still have to get their facts from somewhere. So in spite of the feat accomplished by 'master guessers', research reports like this one still support the thesis that you need almost complete coverage to make a text comprehensible. And that means learning a lot of words.
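To get a feeling for what the coverage percentages mean on the page, here is a small sketch in Python. It rests on two assumptions of mine: a crude tokenizer, and about 10 words per line (which is what "1% unknown words means one unknown word per 10 lines" implies). The function names are hypothetical.

```python
import re


def known_word_coverage(text, known_words):
    """Share of the running words in a text that are in the known vocabulary."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


def lines_per_unknown_word(coverage, words_per_line=10):
    """Average number of lines between unknown words at a given coverage.

    words_per_line=10 is an assumed average, not a measured value.
    """
    unknown_share = 1.0 - coverage
    if unknown_share <= 0:
        return float("inf")
    return 1.0 / (unknown_share * words_per_line)


print(known_word_coverage("the cat sat on the mat", {"the", "cat", "on"}))
for coverage in (0.80, 0.90, 0.95, 0.98, 0.99):
    lines = lines_per_unknown_word(coverage)
    print(f"{coverage:.0%} coverage: one unknown word every {lines:.1f} lines")
```

Under these assumptions 98% coverage means an unknown word about every fifth line, while 80% coverage means roughly two unknown words in every single line.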

But which words? If you look at the curves above then it becomes clear that there basically are two kinds of words: those that are so common that you have to learn them - but also so common that you will meet them in almost any text. And the rest of the words are so rare that you might consider doing something special in order to remember them - it may take a long time before you see them again. And then there are of course the very rare and maybe even outdated words which you don't really have to learn at all unless you aspire to do crosswords in other languages, but knowing at least those rare words which belong to semantic spheres within your field of interest will add to your feeling of knowing a foreign language really well. If you see a word that appeals to you then by all means learn it - even if it is rare and precious.

Postby Iversen » Thu Jan 28, 2016 5:28 pm

2.2. Wordcounts and active/passive vocabulary

In 2009 and again in 2014 I did some research based on my own "Multiconfused log" at HTLAL, which grew to monstrous proportions from when I started it in 2008 until I stopped adding to it in October 2015 - all in all close to 4000 messages in a number of languages, but first and foremost in English. The numbers below only refer to English passages written by me, excluding quotes from other HTLAL messages:

[Image: a2.jpg]

As you can see, I collected one corpus of roughly 15000 English words in 2009 and two corpora in 2014, each with around 36000 words (or rather word forms). Wordforms are shown along the x axis. In both rounds I reduced the original corpora to unique wordforms in a spreadsheet, and these I reduced manually to headwords and got the results you see above along the y axis (there are more details about the methods in a message in my HTLAL log from May 10, 2014).

When I made my first analysis in 2009 it seemed that the conclusion was that you could survive on a fairly small core vocabulary - after all I had written about a bewildering lot of themes in the three-month period covered by the sample. But looking at the diagram above it seems that the size of the vocabulary used is growing as a linear function of the sample size. At some point there must be some kind of saturation point where the curve starts to level out, but we are not near that with just 6000 unique headwords. However the most interesting thing is that each of the two corpora from 2014 only shares roughly half its words with the other sample. Or in other words: roughly a third of all the words are common to the two samples, and two thirds of the words are found in just one of the samples. Among the 2000 or so common words you will definitely find the high frequency words of English, which include the 'grammar words' and - given the context - words that have something to do with language learning. But there will definitely also be some words which I use often, but other authors only would use once in a blue moon.

If you look back at the cumulative curve in the preceding chapter, you will see that the 98% coverage recommended by Hu and Nation corresponds to almost 7000 wordforms (more precisely: 6718 wordforms according to my statistics). It is anyone's guess how much that would be in word families, but given that English substantives have singular and plural (and a genitive), and that most verbs have three finite forms and some non-finite ones, the figure for the Kilgarriff sample of some 10 million words might hover around 3-4.000 word families or 5-6.000 headwords - just my guess, but with some justification in the results I got during my vocabulary analysis in 2014. Paul Nation quotes some estimates of the vocabulary needed for "unassisted comprehension of written and spoken English" in his article "How Large a Vocabulary Is Needed For Reading and Listening?":
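The step from "each sample shares half its words with the other" to "a third of all the words are common" is plain set arithmetic. A minimal sketch in Python, using invented stand-in data in round numbers (two samples of 4000 headwords each with 2000 in common, which gives 6000 in total); the function name overlap_stats is hypothetical, and the integers merely stand in for headwords.

```python
def overlap_stats(sample_a, sample_b):
    """Compare the unique items of two samples."""
    a, b = set(sample_a), set(sample_b)
    if not a or not b:
        raise ValueError("both samples must contain at least one item")
    common = a & b
    union = a | b
    return {
        "unique_total": len(union),
        "common": len(common),
        "shared_of_a": len(common) / len(a),       # part of A also found in B
        "shared_of_b": len(common) / len(b),       # part of B also found in A
        "common_of_all": len(common) / len(union), # part of all words found in both
    }


# Illustrative stand-ins for headwords: 4000 per sample, 2000 in common.
sample_2014_a = range(0, 4000)
sample_2014_b = range(2000, 6000)
stats = overlap_stats(sample_2014_a, sample_2014_b)
print(stats)
```

With real data the two sets would simply be the headword lists extracted from each corpus.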

[Image: a3.jpg]

But as my own results with data from my log showed, the overlap between the vocabulary in one book or magazine and another, even by the same author, may be fairly low. This means that you can't just read the tables above and conclude that 9000 words is enough to read all English novels - you'll definitely need more words to do that. It may come as a shock to learners that the numbers are so high. Luckily there are some caveats. First: do you really need to know each and every word in a book to enjoy it? Every plant, every flower, every part of a medieval fortress? Probably not, and if you see an unknown word you can try to look it up. No need to panic. Secondly: these numbers all refer to passive vocabulary, but even native speakers won't be able to remember all their words at the drop of a hat. In many cases it is enough that you can recognize the words when you see them.

I doubt that there is any scientifically sound method to evaluate the size of a person's active vocabulary, but my own unscientific gut feeling is that I could see myself using most of the words I know in Danish and a fair share of those I know in English - heaven knows how much, but let's just say two thirds. And from there it goes downwards to the languages which I can read, but hardly speak because I haven't spent enough time and effort on using them actively. An added difficulty is that the context in which you are expected to produce a certain word has a lot to say about how difficult or easy it is. For instance I remember a lot more words in Spanish when I'm in the country and hear the language around me all the time than I do at home, where I don't think or speak or write in Spanish all the time. This effect hits the active skills - including your ability to produce suitable words in a concrete context - much harder than it hits the passive skills like reading and listening.

So I can't give reliable figures for the size of my active vocabulary, but I have made a lot of assessments concerning my passive vocabulary in different languages, and I do these word counts using a dictionary. There are alternatives for some languages on the internet, where your vocabulary size is measured using methods that build on the assumption that people with large vocabularies know a lot of extremely rare and learned words. But if you have learnt a bit of Latin this may lead to exaggerated results. With badly constructed tests you may even get higher scores by making more or less wild guesses, but the better ones incorporate non-existing words to catch such attempts to cheat the system. The Testyourvocab.com homepage has not only got an online test, but also a blog with some relevant hints both as to the results and as to the methodology behind the test. Some of their results look as follows:

Most adult native test-takers range from 20,000–35,000 words
Average native test-takers of age 8 already know 10,000 words
Average native test-takers of age 4 already know 5,000 words
Adult native test-takers learn almost 1 new word a day until middle age
Adult test-taker vocabulary growth basically stops at middle age

The most common vocabulary size for foreign test-takers is 4,500 words
Foreign test-takers tend to reach over 10,000 words by living abroad
Foreign test-takers learn 2.5 new words a day while living in an English-speaking country

The most shocking figure here is the extremely low score for foreign test takers living at home - 4500 words is a dismal score by any standard. And it is even more shocking when you consider that people who spend time taking vocabulary tests on the internet supposedly are more interested in languages than your average Joe. A complete distribution curve for foreign learners can be seen at the testyourvocab site:

[Image: a4.jpg]

Here it strikes you that the distribution isn't bell-shaped - most learners end up at the bottom of the scale, but the curve also shows that the sky is the limit for the 'good' learners - whatever that is.

The same source also has some extremely interesting information about the role of reading fiction, cf. the graphic below. Other sources suggest that reading quality non-fiction texts is less efficient, but my own gut feeling is that the language in science mags isn't too different from the one you find in ordinary mainstream novels. It would however be relevant to see whether reading books on paper generally gives more than reading messages and blogs on the internet, but unfortunately I haven't seen any direct comparisons between users of these two communication channels.

[Image: a5.jpg]

I have done a number of online tests, mostly for English. With Testyourvocab I got 40.900 words, while another test put 51.000 words on my scoreboard, and the test at Plenilune resulted in estimates of 77717 known, 16973 inferred and 8933 familiar words. However my own dictionary based estimates have so far ended up somewhere between 30.000 and 40.000 words, and that's according to a measuring technique which I can apply to other languages.

The method used with dictionaries is based on choosing random pages in a dictionary and dividing all the headwords on each page into known, unknown and 'dubious', marked by three colours (like blue for OK, green for the middle group and red for the unknown or wrongly translated words). In my earliest tests I didn't have this middle category, but it is nice to have a place to put words which you think you just guessed correctly, or which you definitely could translate if need be, but which deviate in some way from what you would have expected - for instance in the spelling or by having another end vowel. When you have done this for a suitable number of pages, you can divide the figures by the number of pages you have used and multiply by the total number of pages in the dictionary. And by adding the three categories you can also estimate the total number of words in the dictionary, which is necessary for calculating percentages.
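The calculation described above can be written out as a few lines of Python. This is only a sketch: the function name estimate_vocabulary is made up, and the page counts and the dictionary size of 800 pages are invented figures for illustration, not results from any of my counts.

```python
def estimate_vocabulary(page_counts, total_pages):
    """Scale per-page counts up to a whole dictionary.

    page_counts: one (known, dubious, unknown) tuple per sampled page.
    total_pages: number of pages in the dictionary.
    """
    sampled = len(page_counts)
    if sampled == 0:
        raise ValueError("at least one sampled page is needed")
    known = sum(page[0] for page in page_counts)
    dubious = sum(page[1] for page in page_counts)
    unknown = sum(page[2] for page in page_counts)
    # Divide by the number of sampled pages, multiply by the total page count.
    est_known = known * total_pages / sampled
    est_dubious = dubious * total_pages / sampled
    est_unknown = unknown * total_pages / sampled
    # Adding the three categories gives the estimated size of the dictionary.
    est_total = est_known + est_dubious + est_unknown
    return {
        "known": est_known,
        "dubious": est_dubious,
        "unknown": est_unknown,
        "dictionary_size": est_total,
        "percent_known": 100 * est_known / est_total,
    }


# Invented example: three sampled pages from an 800-page dictionary.
result = estimate_vocabulary([(20, 5, 25), (30, 5, 15), (25, 5, 20)], 800)
print(result)
```

In this invented example the estimate is 20000 known headwords out of an estimated 40000, i.e. 50%.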

Now you will of course be aware that the translations in the dictionary are visible while you look at the headwords, but I have tested the damage done by this by taking the papers with all the words in the three colors I mentioned. If I couldn't give a translation for all the 'known' words on the paper I would have cheated myself. But in practice it seems that the identifications are reasonably trustworthy. I may have problems with a few words, but this is compensated by the fact that I now know some of the previously unknown words. This isn't rock hard science, but it functions.

Another problem: you can obviously not compare a wordcount from a mini dictionary with 5.000 words with one made on a monster with 100.000 words. But I have done so many tests now that I can say with some confidence that the percentages of known and unknown words don't depend much on the size of the dictionary - except with extremely large dictionaries, which are full of rare and arcane and maybe even dialectal words which you don't have to care about. Let's take an example - my Spanish word counts:

[Image: a6.jpg]

As you can see my estimated passive vocabulary fluctuates between 17000 and 34400, where the highest number by far came by using the monster dictionary of Bratli. The percentages fluctuate far less, but you can see that even the 17% with Bratli results in a higher score than the counts based on midsized dictionaries. With even smaller dictionaries it is evident that you can't get high absolute scores, but the percentages still yield some relevant information. And as an added bonus you can see how much influence it has to land on some pages rather than others just by looking at the estimated total number of words for a given dictionary. So ideally you should count not 5 or 10 pages but maybe 100 pages to get reliable results, but it would take forever, and besides there are other, mostly subjective factors which can push the scores up or down - like being in a hurry or having spent a lot of time on the language in question lately.
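Why more pages give more reliable results can be illustrated with a standard statistical yardstick, the standard error of the mean. This is not part of the method I actually used, just a textbook formula applied to it, and it ignores the subjective factors mentioned above. The function name and the page counts below are invented for the example.

```python
import math
import statistics


def estimate_with_margin(known_per_page, total_pages):
    """Return (estimated known headwords, rough standard error of that estimate).

    Uses the ordinary standard error of the mean; no correction is made
    for sampling a large share of a small dictionary.
    """
    pages = len(known_per_page)
    if pages < 2:
        raise ValueError("at least two sampled pages are needed")
    mean_known = statistics.mean(known_per_page)
    standard_error = statistics.stdev(known_per_page) / math.sqrt(pages)
    return mean_known * total_pages, standard_error * total_pages


# Invented counts of known headwords on 5 pages of an 800-page dictionary.
estimate, margin = estimate_with_margin([20, 30, 25, 35, 15], 800)
print(f"estimate {estimate:.0f}, give or take about {margin:.0f}")
```

Because the standard error shrinks with the square root of the number of pages, you need about four times as many pages to halve the uncertainty, which is why 100 pages would be so much more trustworthy than 5 or 10.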

So what fun is it to make estimates of your vocabulary size in different languages - except that I like to count things? Well, to be totally honest: this kind of activity is something few learners would like to spend their precious time on, and there is absolutely no reason they should. If your vocabulary is too restricted you will feel it every time you try to read a book - and that's all you need to know. Maybe I just did it to satisfy a deeply buried remnant of my scientific mind.

Postby Iversen » Thu Jan 28, 2016 5:55 pm

2.3. Learning words from context

At HTLAL and Language Learners we have had many discussions about vocabulary learning, and a lot of members declare that they don't use formal methods like Anki or wordlists at all to learn vocabulary - they get it all from things they read or listen to. For me that borders on miraculous - unless you already are so advanced that everything is comprehensible to you. How did that happen?

[Image: a7.jpg]

The table above summarizes a number of studies of 'incidental learning' (quoted in "Second Language Reading and Incidental Vocabulary Learning" by Waring and Nation, but adapted from Waring & Takaki: "Reading in foreign languages"). The scary part of it is the right hand column, which demonstrates how low the gain in vocabulary has been from that kind of learning in a number of studies.

And there is worse to come: with one exception all the studies in the table have used multiple-choice tests to check whether people had learned something during the experiments - and multiple-choice tests are not the hardest in the world. Actually they only test recognition of foreign vocabulary. I would like to present another report, where the test subjects were tested on reading, listening and a combination - and they were checked with both multiple-choice (MC) and a translation test, which gives us a chance to compare the scores on both test types (Brown, Waring and Donkaewbua: "Incidental vocabulary acquisition from reading, reading-while-listening, and listening to stories"):

a8.jpg (62.22 KiB) Viewed 953 times

As you can see people got reasonably high scores on the multiple-choice test, but when they were asked what the test words MEANT the illusion was broken: they had barely learnt anything. Especially from listening alone the results were pathetic. If we can take this as a general tendency then the already low scores in the earlier results become even more depressing. Incidental learning simply seems to be an illusion.

How come then that a number of successful language learners claim that they learn a lot of words from their reading and listening? Quite simply: they don't read or listen as part of an experiment, they do it because they like to read and listen, and they have supposedly chosen things they were interested in. If you are mad about orcs and wizards and dragons and oliphaunts and hobbits and read Tolkien & co. day in and day out then you will inevitably learn tons of names and other weird terms, but unless an unknown word is essential for the plot you aren't likely to leave Frodo & co. to battle alone just because you have to consult your dictionary - you just skip that particular word and hope that you get the explanation later in the text. But you can skip a word thirty times without learning it. On the other hand, if you decide that the word "phial" is essential for the action (which it actually is in the chapter called "Shelob's lair") OR if you are a budding polyglot with an insatiable lust for new words then you may take the long walk to your book shelf (or use Google), and THEN you may learn it. However most readers will just take a well-meant guess and read on. These readers may never really learn what a phial is, but maybe they'll learn enough about the word to see them through a multiple-choice test.

Postby Iversen » Thu Jan 28, 2016 6:24 pm

2.4. Context and associations

So you decide that a certain word is worth learning and somehow you find out what it means. What now? At your disposal are a number of memorization techniques.

Many of the types of associations that I mention below are not easily available to a total newbie: etymology, similar words in the target language, quotes from publicity and TV series etc. As a general rule a newbie will have to base his/her 'memory hooks' primarily on things outside the target language - but this phase will of course be shorter with an easier language (which typically is a language that closely resembles one you already know).

First problem: how do you as a beginner discover the meaning of a word or word combination? Well, somebody can point to a four-legged animal and say "cheval". That's what French parents do, and after 100 repetitions (with different kinds of horses) their child has constructed a mental image of what a "cheval" is. But if I'm learning French as an adult chances are that I will see the word in a textbook with a wordlist, and if not, then I'll look the word up in a dictionary and get the information that a "cheval" is a 'horse' (or in Danish: 'en hest'). And then I take over the Danish concept "hest" and apply it to the French word "cheval". And that's a liability, but it is nevertheless the easiest way to grasp the meaning of "cheval" fast.

When you already have learnt some very simple words you can use them to learn less obvious words, as "ici" in "cheval ici". This is however not just a long-term training of reflexes as postulated by the behaviourists, but rather an active construction process which you feed with chunks of language, almost like you feed a child with cheeseburgers and carrots and chocolate.

Among the associations which you can use as 'memory hooks', the simple imagery method is the one that is closest to the way the word "cheval" was taught above. And one of the simple things you can do as a language learner is to imagine the things you learn. So while memorizing the word "cheval", draw mental images of big horses, tiny horses and famous horses in your mind. With verbs like "fight" or "eat" you can imagine small video clips showing mighty battles or heavy eating, etc. The only catch is that some persons seem to have problems making up those mental images. Mental imaging is also possible if you make wordlists from dictionaries, and it might take out some of the potential dreariness of that method. When you use dictionaries a simple trick is to 'see' the spelled word at the same time as you imagine (or actually say) how it sounds.

Does it help? Well, some learners benefit more from imagery than others, but there is a catch. I once read about an experiment which was supposed to test the claim that there are visual learners, and that these learn better if they use pictures during the memorization process. The researchers therefore told a test group to memorize words while they showed them pictures. A control group wasn't supplied with pictures, and its members learned words at least as efficiently - maybe even better. So the conclusion was that visual learners either don't exist or that they didn't learn better with imagery than without.

The problem is that the poor test kids were expected to learn not one, but two separate things (a word and a picture), whereas the control group only had to learn one thing (a word). To function properly a mental image must be fused with the word you try to learn so that they form one single item. These uses of imagery resemble, but aren't quite the same thing as, the use of puns to remember words and expressions. A pun is not an image of the thing itself, but a joke based on its written or spoken form.

HTLAL member Fanatic once gave an example of this method:

"Cochon is pig in English. How do we remember that? Cochon sounds like cushion. We join the sound alike, cushion, to the meaning, pig. I picture having small pigs on my lounge instead of cushions and I tell my visitors, pull up a pig and take a seat. That reminds me of the meaning of cochon. It tells me that cochon is French for pig."

As you see it is the outer shape of the word "cochon" that is used for the pun, which then is visualized in the hope that the image will stay in your mind forever. I do believe that many people can profit from this method, but personally I find it somewhat disruptive and cumbersome to have to invent those irrelevant puns while studying a list of words. For me simple imagery has the same effect, but it is worth trying both methods.

[Image: a9.jpg]

Once upon a time at HTLAL someone wanted to remember the German word "kläffen" which means 'to bark'. If you want to remember this then you should imagine a barking dog which says the word, not a barking dog which says "bow wow" and a word "kläffen" somewhere else. I have tried to illustrate this in the painting above, but your complex visualizations (if possible with sound, smell and texture) should be forged by YOU and not by me if they are to function as intended.

There are other memory hooks which aren't based on imagery: a word may remind you of other words in the target language or some other language, or you may remember it because you associate it with a situation, like seeing it used in a specific book.

For the advanced language learner memory tags that are based on target-language information will become ever more important as you add more and more background knowledge. The most obvious application of this is of course the formation of word families (cf. the chapter on word counts), but also 'families' based on affixes. For instance, when you already know twenty words with the prefix "ter-" in Indonesian you just have to remember that such a derivation exists for a new root - you don't have to remember the actual word.

With this in mind it is clear that wordlists function best for learners who already are at least intermediate and maybe more. One reason is of course that the words you choose aren't really unknown - you may have seen them many times, but didn't remember them. But the main reason is that a given word or expression may remind you of other target language words and expressions - for instance through shared prefixes, etymology or similar meaning.

This is not very different from memorizing a word with its translation(s) as seen in a dictionary. Higher skill level means that you can avoid references to other languages, but sometimes it pays to keep those references to other languages - like for instance when you have to memorize the gender of French nouns. If you know the corresponding Italian or Spanish words then the difference between -o (masculine) and -a (feminine) normally tells you the correct gender even in French. This is an example of an intralinguistic association, but more often you use these to remember - or infer - the meaning of a word. For instance a "telpon" in Bahasa Indonesia is a telephone - the similarity is clear. Once you have learnt this word you can use it to learn other words, like "menelpon" - 'call somebody'. The "men-" is a prefix which has a tendency to change or eat the first sound in the root that follows - here the t- of "telpon" disappears and leaves "-elpon" as part of the new compound.

In other cases you only exploit a similarity in sound, as with Fanatic's "cochon" above. It is generally not necessary to find phrases that sound like the whole of the word or word combination you want to memorize - it is just as efficient to mimic the beginning of it. But once you have found a suitable 'memory hook', it's your job to create the kind of fusion between that thing and your problem word. Memory artists have typically memorized long series of elements, and they then forge associations between these predetermined elements and the words or things they want to remember. And in this way they can remember not only a lot of elements, but also remember them in the right order. As already mentioned the 'Memory Palace' technique is an application of this. But the order is generally irrelevant for language learning, and as far as I can judge from myself and testimony from other learners these techniques aren't used much. Instead we use isolated and unsystematic associations, either of the intralinguistic kind or of the sound-based kind.

And then what about situational associations? It has somehow become popular to claim that vocabulary can't be learned without a context. I admit that a relevant context actually can help you to forge the kind of syncretistic complex I just described, but it is an exaggeration to claim that you ONLY can learn vocabulary with a context. I would like to refer you to a classic mass myth debunking done by the Dutchman Mondria in the article "Myths about vocabulary acquisition". He mentions seven myths, and for pedagogical reasons let's have a look at the whole list, even though it is the question of context which is under scrutiny here:

Myth 1: “Knowing a relatively small number of words takes you far.”
Myth 2: “Word lists are of limited value.”
Myth 3: “Presenting words in semantic sets facilitates learning.”
Myth 4: “Words should always be learned in context.”
Myth 5: “Words whose meanings have been inferred from context are retained better.”
Myth 6: “Words learned productively are retained better.”
Myth 7: “Vocabulary knowledge should not be tested separately.”

About myth no. 4 he writes:

For example, someone can learn the French word 'canne' with the help of the sentence 'Le vieil homme marche à l’aide d’une canne.' (...) However, there are two caveats to this ‘rule of thumb’. First, many (concrete) words can be learned efficiently without context. Presenting such words without a context – for example when a learner asks for them – can be a practical method that prevents the teacher from having to invent an interesting or useful context, (...) Second, and this is actually the main point, learning a word in a particular context may result in a learner knowing the word only in that context.

About myth no. 5 he writes:
In order to investigate whether inferring is an effective learning strategy, I carried out a learning experiment with Dutch pupils in secondary education (Mondria 1996, 2003). They had to learn French words (French-Dutch) with the aid of four different learning methods: (...)The learning effect of inferring per se is rather limited: after two weeks, only 6% of the inferred word meanings were remembered. The addition of a verifying stage led to an extra retention of 9%. However, it is only when the word meanings are intentionally memorized that the learning effect becomes substantial, as shown by the retention figures of the meaning-inferred method (47%) and the meaning-given method (50%).

[Image: a10.jpg]

So the idea that you should avoid dictionaries is obviously wrong. On the contrary: if you are in doubt about the meaning of a word then get the problem solved as fast as possible.

Mondria's fear that a context may be too strongly attached to a given word is less well-founded. It is definitely a good idea to use short snippets of text with just enough information as tools to remember things like the gender of substantives or the construction possibilities of verbs. For instance you can memorize German substantives whose gender isn't obvious with an article and maybe an adjective, and I can't see why that particular article or adjective should become inseparable from the substantive. For instance it is safe to memorize that "Zeit" (time) is feminine by memorizing the name of the newspaper "Die Zeit" - the article won't cling to the word in situations where you don't need it. But there is no need to do the same with "Stunde" (hour), because the final -e is a clear marker that the word is feminine. It is enough to do memory tricks to remember the words that don't follow a simple rule of thumb.

Longer sentences are generally irrelevant for this purpose. If your goal is to remember that it is "più di" in Italian then just memorize a passage with 3-4 words. Trying to memorize it as part of a long complicated quote from some famous author will just remove the focus from the essential information.

In some memory systems you try to establish associations that follow a previously memorized pattern, and this can be used for fast memorization of long rows of data - for instance packs of cards or numbers. This is the idea behind the so-called memory palaces, where you associate a series of things you want to remember with places in a building or street you know well. To aid you to remember less visually impressive information there are systems to help you - for instance some memory artists have systems to convert numbers into images, which they then can use in making associations. But at the end of the day I'm sceptical about systems based on prelearned structures when it comes to ordinary language learning. Apart from numerals the order of the words you need to learn is completely irrelevant, and you don't have to remember each and every word in a random collection - better to learn a number of words or expressions outside the collection than to worry about those that were in the original selection. If you do want to use such techniques and want some suggestions about ways to do it you might have a look at Metivier's 'magnetic method' and the memory palaces of Matteo Ricci.

So what do you really need? In the first place most people should train the ability to form loose associations on the fly from the sound of words - or even of parts of them. It is better to make an association based on just the beginning of a word if you can do it fast than it is to discover a really sneaky association several minutes later. The other thing which should be trained is actually using those associations during memorization instead of mere repetition. If you make it a habit of yours to make associations then your mind will automatically create more and better associations.

Besides the types already mentioned there is a whole world of situational associations: you can remember a word because you didn't know what to say in a certain situation, or you may remember it because some special person used it, or you saw it in a specific text. But there are people who are much more dependent on human interaction during their language learning - and therefore probably also better at utilizing it to provide situational memory hooks. I do know that I remember words better when I use them, but that's about as close as I get to using that kind of hook.

Postby Iversen » Thu Jan 28, 2016 7:42 pm

2.5. Memorization through controlled repetition

For me the use of associations is one of the two main pillars of memorization. The other pillar is repetition. You can choose to rely on chance to produce multiple returns of rare words, but as we have seen most words are so rare that you can read thousands of pages without seeing them more than once. And if that's true for words, it is even more true for expressions.

But authors sometimes have pet words, and then you can meet the same rare word 10 times on a page. You can also choose books and films about subjects where certain types of words are more likely to occur, like culinary terms in cookery books. If you still don't feel that this is enough to give you the large vocabulary described in chapter 4.1 you can try one of the formalized vocabulary learning methods in this chapter. The common element in all these methods is that they in some way try to manipulate the repetitions of words or expressions in ways that are expected to lead to successful memorization. But they are not also supposed to activate the words you learn - that's not their job. The only way to activate words is actually to use your target language. And finding ways to do this can be even more challenging than finding ways to extend your passive vocabulary.

One traditional technique is the use of flashcards. Such cards are small pieces of paper which you keep in a small box. They can be two-sided or one-sided. Each two-sided card has a translation or other clue on the front and the corresponding foreign word or expression on the backside, sometimes with an example or other information. You take the first card inside the box and look at it. If you know the word you can put it aside, and if not you turn it and read the explanation, and after that the card is placed behind all other cards of the box. With one-sided cards you look at the front side and subjectively judge whether you know the word you see - which obviously is MUCH easier than providing the word on the back of a two-sided card. But in both systems the period between each exposure to the card depends both on the time you spend on the task and on the size of the 'deck'. This is a crude way of making certain that all cards are reviewed once in a while, but it was fairly popular before we got all those electronic gadgets.
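The two-sided box routine described above can be sketched in a few lines of Python. This is only an illustration: the function name `run_flashcard_box`, the `knows` callback and the toy learner are made up for the example, and a real learner is of course not a simple rule.

```python
from collections import deque

def run_flashcard_box(cards, knows):
    """Work through a box of two-sided cards until every card has been put aside.

    cards: list of (clue, answer) pairs, front of the box first.
    knows: callable deciding whether the learner can produce the answer
           right now (a hypothetical stand-in for the real learner).
    Returns how many times each clue was looked at.
    """
    box = deque(cards)
    exposures = {clue: 0 for clue, _ in cards}
    while box:
        clue, answer = box.popleft()        # take the first card in the box
        exposures[clue] += 1
        if not knows(clue, answer, exposures[clue]):
            box.append((clue, answer))      # unknown: goes behind all other cards
        # known cards are put aside, i.e. they never return to the box
    return exposures

# Toy learner (an assumption): knows "horse" at once,
# and gets any other card right the third time it turns up.
def toy_learner(clue, answer, times_seen):
    return clue == "horse" or times_seen >= 3

cards = [("horse", "cheval"), ("pig", "cochon"), ("to bark", "kläffen")]
exposures = run_flashcard_box(cards, toy_learner)
print(exposures)
```

Note that the gap between two exposures to the same card is not controlled at all here: it simply depends on how many other cards are still in the box, which is exactly the crudeness mentioned above.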

As Michel Erard discovered while doing research for his book about hyperpolyglots ("Babel no more"): even the great Cardinal Mezzofanti used flashcards. But today we have other and more sophisticated ways to achieve that, insofar as Anki and other SRS systems can be seen as the digital successors to the venerable flashcard box. SRS means "spaced repetition system", and this refers to the way the time interval between the successive confrontations with a given card is manipulated. The theoretical foundation for this goes back to the German psychologist Hermann Ebbinghaus, who spent most of his life studying not how to remember, but how to forget - with himself as guinea pig. Wikipedia writes:

Ebbinghaus would memorize a list of items until perfect recall and then would not access the list until he could no longer recall any of its items. He then would relearn the list, and compare the new learning curve to the learning curve of his previous memorization of the list. The second list was generally memorized faster, and this difference between the two learning curves is what Ebbinghaus called "savings".

[Image: a11.jpg]

The graph (from Wikipedia) shows intervals of 1 day on the horizontal axis. Strictly speaking only the red curve represents the forgetting curve of Ebbinghaus. The green curves reflect a process where relearning is started regularly at 1-day intervals, and as you can see the forgetting process runs slower after each repetition. After a number of repetitions it lies almost flat, indicating that the items now for all practical purposes have been stored in your long-term memory. If you draw a horizontal line through the grey vertical lines then it will meet the curves at longer and longer intervals, and that is actually the point of the SRS methods: instead of presenting the material at regular intervals (for instance once per day) you present it again and again with longer and longer time between the repetitions. As if you bumped it each time your recall came below a certain threshold.
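The 'horizontal line' argument can be illustrated with a small calculation. The sketch below assumes a simple exponential forgetting model, where recall after t days is exp(-t / stability) and every repetition multiplies the stability by a fixed factor. These are textbook-style assumptions chosen for the example, not Ebbinghaus' own data, and the function name and default values are invented.

```python
import math

def review_intervals(stability_days=1.0, growth=2.0, threshold=0.5, reviews=5):
    """Days to wait before each review under an assumed exponential model.

    Assumption: recall after t days is exp(-t / stability), and each
    review multiplies the stability by `growth`. A review is due as soon
    as recall has dropped to `threshold` (the horizontal line).
    """
    intervals = []
    stability = stability_days
    for _ in range(reviews):
        # Solve exp(-t / stability) = threshold for t.
        intervals.append(-stability * math.log(threshold))
        stability *= growth     # forgetting runs slower after each repetition
    return intervals

intervals = review_intervals()
print([round(days, 2) for days in intervals])
```

With these made-up numbers each waiting period is twice as long as the previous one, which is the pattern of 'longer and longer time between the repetitions' that the SRS methods try to exploit.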

Now this explanation doesn't actually deal with single items - it is more like a statistical claim referring to a lot of similar items. But if you introduce the concept of graduations of recall then the idea also functions with single items. It is then not a question of recalling a word perfectly or not at all, but of recalling it more or less well.

The following glowing tribute to Supermemo actually comes from a competitor, namely the Anki manual:

The biggest developments in the last 30 years have come from the authors of SuperMemo, a commercial flashcard program that implements spaced repetition. SuperMemo pioneered the concept of a system that keeps track of the ideal time to review material and optimizes itself based on the performance of the user. (...) In SuperMemo’s spaced repetition system, every time you answer a question, you tell the program how well you were able to remember it – whether you forgot completely, made a small mistake, remembered with trouble, remembered easily, etc. The program uses this feedback to decide the optimal time to show you the question again. Since a memory gets stronger each time you successfully recall it, the time between reviews gets bigger and bigger – so you may see a question for the first time, then 3 days later, 15 days later, 45 days later, and so on.

I don't personally use SRS programs, but I can see that Anki has become the most popular SRS program among forum members, and it has the elements mentioned above, including the choice between different recall levels. The basic idea is that you feed the program with something akin to virtual flashcards, and the machine chooses when you see each individual card again by applying some kind of mathematical algorithm. If you indicate that you don't remember a word at all you will see it again soon, whereas if you claim to remember it really well the next repetition may be scheduled for once in a blue moon. Which is logical. You can make 'cards' with just a target language word and its translation, or you can make more elaborate 'backsides' with example sentences. You can even download complete sets made by others, but that is a problematic feature because you then don't have any influence on which words are in the set, and you didn't get the first round of exposure which normally would contribute to establishing a word in your memory. Another problem is that you may end up with so many cards that you can't possibly keep up with the scheduled review periods - and then reviewing may become a burden rather than an agreeable session with your favorite hobby. In that case the logical solution is to throw most of the cards out.
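To make the scheduling idea concrete, here is a minimal sketch of a scheduler with graded recall. It is NOT the algorithm used by Anki or SuperMemo: the grade names, the multipliers and the helper names (`Card`, `review`, `due_cards`) are all hypothetical and only show the principle that forgotten cards come back soon and well-known cards are pushed far into the future.

```python
import math
from dataclasses import dataclass

# Hypothetical multipliers for self-reported recall levels (not Anki's values).
MULTIPLIER = {"hard": 1.2, "good": 2.5, "easy": 4.0}

@dataclass
class Card:
    front: str
    back: str
    interval_days: int = 0   # 0 means the card has never been reviewed
    due_day: int = 0

def review(card, grade, today):
    """Reschedule `card` after a review on day `today` with the given grade."""
    if grade == "forgot":
        card.interval_days = 1          # forgotten: see it again soon
    else:
        base = max(card.interval_days, 1)
        # Grow the interval, but always by at least one day.
        card.interval_days = max(base + 1, math.ceil(base * MULTIPLIER[grade]))
    card.due_day = today + card.interval_days
    return card

def due_cards(cards, today):
    """Cards the machine would show you today."""
    return [card for card in cards if card.due_day <= today]

card = Card("horse", "cheval")
review(card, "good", today=0)    # next review on day 3
review(card, "good", today=3)    # interval grows to 8 days, due on day 11
```

The problem with oversized decks is easy to see in such a model: every card you add sooner or later turns up in `due_cards`, whether you have time for it or not.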

My own main reason for not using the system is that I don't like the concept, which is that the machine pokes a card into your face and demands an explanation - as if you were being interrogated by the KGB or you had got a temporary employment as assistant for the knife thrower in a circus. Actually that is also your situation when you read any arbitrary text, but there you can at least enjoy the context, and you may even have chosen the text yourself. The actual learning with Anki only occurs during the shock phase after you have been confronted with a card and have to admit that you have forgotten that particular card. It may work, but I prefer wordlists.

OK, what about wordlists?

We have probably all used a standard text book with short texts and a wordlist in each lesson containing the new words (and sometimes expressions) in that text. What did you do to learn the words in that list? You had a context in the form of a text, but the goal was not to help you translate one specific text, but to teach you a word for later use. I went through many school years and on top of that several years at the university, and to the best of my knowledge I didn't receive any tips about list memorization, apart from suggested associations for specific words. But basically you would use all the memorization tricks mentioned in the preceding chapter, including crude repetition.

I have reserved the following chapter for my own wordlist setup, but there is one serious contender for a more sophisticated use of wordlists, namely David James' Goldlist system. David James may be better known under his pseudonyms Huliganov and Uncle Davey, but his main contribution to language teaching apart from the goldlists is an immense course in Russian in the form of video clips on YouTube.

If you really want to learn more about goldlists you should go to David James' own homepage, where the method is described in detail. Here you'll only get a rough sketch. Basically you choose 25 words and spend 20 minutes on writing them down with a translation and relevant grammatical details in a (quote) "beautiful hardback book" with ample free space. You should also read the list aloud. You can do several such sets in one day, but not more than 10 in a row. After at least 2 weeks, but less than 2 months (where you don't even touch the lists) you do a first distillation, i.e. you copy the list minus a third of the words to a new list in the same book. What about the words you leave out? Well, David James writes:

Note that we tend to lose and spend time looking for things which we intended to keep and often put in a special hiding place, but we rarely forget the things that we have thrown away or given away. We don’t usually think we still have them and look around for them. So the very conscious act of discarding tricks the subconscious memory, namely the long-term memory, into being sure it jolly well has got those discarded bits. So if in doubt, discard rather than merge, when distilling.

Or in other words, the fear of losing the discarded words will make you remember them. You do this at least two times more, and each time you discard about 30% of the words. You can use a circular layout, starting in the uppermost left corner, so that all the lists are visible at once, but you are not supposed to refer back to the discarded words. And after the 3 distillations you can copy the few remaining words to a new book and include them in a new cycle.
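The arithmetic of the distillations is simple enough to sketch. The function below is just an illustration with made-up names; it assumes that exactly 30% of the words (rounded down) are discarded in each round, whereas in practice the proportion is only approximate.

```python
def distillation_sizes(start=25, discard_percent=30, rounds=3):
    """List sizes before and after each distillation round.

    Assumption: each round discards `discard_percent` percent of the
    current list, rounded down to a whole number of words.
    """
    sizes = [start]
    for _ in range(rounds):
        discarded = sizes[-1] * discard_percent // 100   # whole words only
        sizes.append(sizes[-1] - discarded)
    return sizes

sizes = distillation_sizes()
print(sizes)   # [25, 18, 13, 10]
```

So under this assumption a headlist of 25 words leaves about 10 words after three distillations, and those are the ones that are carried over to a new book.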

Now the big question is: does it work? And as with other methods it may work for some persons and not for others.

Sadly I belong to the second category, and I think I know why. I made an experiment with four sets of 25 Russian words each, and as a control I made one of my own wordlists with 100 other Russian words plus a reference with 100 words which I just left untouched. The illustration to the right is one of my goldlists after the second distillation, but I also did a control round on all three systems after 1½ months. And it turned out that I remembered about two thirds of my 'own' words, but only a third of those from the four goldlists - almost exactly the same ratio as I got with the 100 words I hadn't seen for more than a month.

[Image: a12.jpg]

I'm fairly sure this fiasco is a result of the first two directives in James' blog:

1. No reliance on mnemonics and no creation of strange methods to try and “visualise” words in contexts. No “think of a cat in a cot and you’ll remember that Polish for ‘cat’ is ‘cot’ “. – These are the ways by the way that course makers like Daniels gets phenomenal results over two weeks but they never last. Just as well, if they did, they would create a learner who, when he came to fluency, would not be able to say “kot” without thinking about a baby’s bed. Ridiculous. Oszustwo. Don’t let the oszusty deceive you by filling your shoes with the letter O at tea time.

2. No cramming, no learning against the clock. No learning for next week, or for tomorrow, or for a test, or for an exam. No conscious “memorizing”. The long-term memory is not a conscious function. Its samples are taken automatically and subconsciously out of the material which is run through the conscious.

When I tell my brain not to make associations it is tantamount to telling it to forget the stuff I'm looking at. And telling it not to memorize consciously just puts the last nail into the coffin. I have thought about doing a set of goldlists where I consciously use mnemonics ('memory hooks') to memorize the words in the lists, but it wouldn't be the goldlist method then. However I'm about the worst test person in the world for an experiment like this because I have spent thousands of hours doing wordlists according to my own recipe, which is built on almost the opposite set of ideas. So I would just suggest that you try several techniques and see what works for YOU.

Postby Iversen » Thu Jan 28, 2016 8:23 pm

2.6. Three-column wordlists

In this chapter I'll limit myself to one wordlist system, namely the one I developed around January 2007 after I had discovered that even counting the Romanian words I knew using a dictionary had made a lot of old forgotten vocabulary come to life again (see the HTLAL thread "Super-fast learning techniques"). Then maybe a more systematic use of dictionaries could do something for my other languages I thought, and I did some hard thinking about memorizing which resulted in the present system, which also is presented at the learnanylanguage wikia, albeit in a slightly outdated form.

First of all you need a source for your wordlists, and I have two main sources for mine: intensively studied texts and dictionaries. There could be a third, namely words jotted down from extensively read texts and audio, but I simply don't get enough new words from these sources to warrant a special treatment so they just go into whatever list I'm working on.

When I work intensively with texts I often copy passages in them to folded A4 sheets (more handy than unfolded sheets), and I then reserve a right margin of about 5 cm for new words (more about this in chapter 4.4, "Copying by hand"). Sometimes I just guess the meaning, sometimes it comes from the translation part in a bilingual printout and sometimes I look a word up. So the 'data quality' may be suspect, but in principle I check things properly when I do the wordlist so it doesn't matter. I write "in principle" because I have lately become somewhat uneasy about the time I waste fiddling around in all parts of a dictionary because the words I want to look up are spread all over the alphabet (or they may be missing). So if I am also doing wordlists directly based on dictionaries around the same time then it will be more efficient to drop some of the dubious words from the text and learn five from a dictionary instead.

Warning: I do NOT recommend that beginners do wordlists based on dictionaries. In the absence of a textual context you have to rely on intralinguistic cues to form associations, and that's much easier when you already have some experience with the language in question.

With dictionaries I typically choose a random page and choose some, but not all of the words from that page to be used in a wordlist. In some cases I have initiated a campaign where I go through the whole dictionary from A to Z (or alpha to omega, or whatever) - like I did with Serbian in 2014 in preparation for the polyglot conference at Novi Sad. And I can feel a very clear effect of such campaigns - I may have worked with a language on an unsystematic basis for years, but the progress I experience after a campaign that gives me thousands of new words in a short time is in a totally different class. But such campaigns take time away from other language learning projects, so most of my wordlists use words from a few random pages, and then the next list will be in a totally different language.

So what is a wordlist according to my system? Well, there is a first phase, where the original three-column wordlist is written, and the format of this kind of wordlist has not changed much since I designed it in early 2007. And then there are the repetition rounds, where I have experimented a lot (more about that below).

So nowadays I fold a sheet of paper and divide it into three vertical sections, each of which is supposed to contain three columns. With my small handwriting I can fit around 30 words in each column, but people with larger handwritings may have to divide the sheet into just two sections, each with three columns. I fold the paper to make it more handy, but also to make the columns shorter and less intimidating. I have experimented with a layout where I made four sections: two for the main wordlist (with three internal columns each) and two for the first repetition round (with just two internal columns). I did this because I wanted to make certain that I remembered to do the repetitions - otherwise I might throw the original wordlist away by mistake before I had made a repetition round. But now I prefer keeping the original lists on one sheet and the repetitions on another, and I have learned to keep better track of my main wordlists which has solved the problem.

In the beginning I used triple columns, separated by solid lines. As you see in the sample above I use one colour for the columns in the target language (in this case Serbian) and another for the middle column in my base language (for me mostly Danish, but sometimes mixed with isolated words in other languages). But now I prefer using different colours to keep things nicely sorted out, and then I can drop the vertical lines, as illustrated below (target: Indonesian, base: Danish):

[Attachment: a13.jpg]

The three columns of a wordlist contain 1) words in the target language, 2) a translation, 3) the original word again. Very short word combinations can be integrated in the layout (maybe by doing a word wrap), but the layout is not suited to longer phrases. If you want to memorize an expression then give it its own line (or two if necessary). Some indications of morphology may be included, but only the barest minimum. For instance I sometimes write the characteristic consonant of the aorist with Greek verbs, and I sometimes indicate the gender of a German substantive if there are several possibilities. But adding too much information would make it harder to remember the word itself, and there isn't space for a long text. For the same reason only the central meanings should be quoted. It is tempting to add the whole lot, but your memory functions better if you start out focussing on just one or max. two core meanings.

By the way: you may have noticed that I recalled the first two items wrongly. Normally I would have corrected it when I made the first column visible again, but apparently I forgot to do so in this case. Which goes to show that you should check that the items in your third column are correctly rendered if there is the slightest risk that there are mistakes - but don't make it a time consuming chore.

And finally the most important advice: learn the words in groups of 5 to 7 words. Why? Because you don't learn much just by repeating a word in your head (cheval horse, cheval horse... or even worse: cheval, cheval, cheval... horse). Think of something else to break the repetition, and then return to the word - that's where long term learning occurs. Each sequence of words should be difficult and long enough to strain, but not tire out your memory. In the following example I try to learn Serbian words (the translations are in Danish because I'm Danish):

[Attachment: 051-wordlists-how.jpg]

So first you write the first 5-7 foreign words in the left column, check that you can give a simple translation for each and every one of these words, and THEN - not before - you write the translations in the middle column with another colour. Now you make certain that you can 'reconstruct' each one of the original words. You can test yourself by temporarily covering the left column, but only when you trust that you can come up with all the foreign words do you cover the left column and write the original words in the right column. After that you proceed to the next block of words.
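If you keep your source words on a computer before writing the list out by hand, the block structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_into_blocks` and `format_three_columns`, the default block size of 6 and the sample Serbian-English pairs are all hypothetical, and the memorizing itself still has to happen in your head and on paper.

```python
from typing import List, Tuple


def split_into_blocks(words: List[str], block_size: int = 6) -> List[List[str]]:
    """Split a word list into consecutive blocks of block_size words.

    The last block may be shorter; in practice you would merge a very
    short remainder into the preceding block by hand.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def format_three_columns(pairs: List[Tuple[str, str]]) -> List[str]:
    """Render (target word, translation) pairs as three columns:
    target word | translation | target word again."""
    if not pairs:
        return []
    width_target = max(len(target) for target, _ in pairs)
    width_base = max(len(base) for _, base in pairs)
    return [
        f"{target:<{width_target}} | {base:<{width_base}} | {target}"
        for target, base in pairs
    ]


# Hypothetical sample block (Serbian -> English)
sample = [("pas", "dog"), ("kuća", "house"), ("voda", "water")]
for line in format_three_columns(sample):
    print(line)
```

The printout shows only the finished layout. On paper the three columns are filled in one at a time, block by block, which is the whole point of the exercise.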

If you do find that you have forgotten a word (or a translation) you can take a peek in the source or the left column, but then you shouldn't write the solution down immediately. Think about something else before you do it or take that word together with the next block. The whole point of the exercise is to force the word into some kind of long term storage by telling your brain to remove it from the immediate working memory - but not to put it so far away that it can't retrieve it.

And then relax a day or so, but not much longer. The first repetition should come while you still remember what you did to memorize each word in the first place - associations, maybe thoughts running through your mind while writing, maybe your reasons for choosing some words and leaving others aside. Everything counts.

I have been quite loyal to the three-column setup for the original wordlist since I got the idea in 2007, but I have tried many different methods to do the repetition - and as I mentioned above, for most of this time I have just done one repetition, but now I'm more open to doing two or more.

I have normally just done one repetition one day after the main event. The rationale behind this is that I found in 2007 that I remembered more or less the same number of words after round 3 or 4 as after round 2. Lately I have however come to doubt the wisdom of this, because I discovered during the analysis of my Serbian campaign that the words I remembered or had forgotten in each repetition round weren't the same ones. Somehow I ended up with around 20% forgotten words in every round, but less than half of these forgotten words were the same as in the preceding round. Since then I have done two or three repetitions, though not as systematically as I do the first one, which is absolutely crucial to the learning process.
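The observation about overlap can be checked with simple set arithmetic if you note which words you forgot in each round. The sketch below is a hypothetical illustration: the function name `forgotten_overlap` and the placeholder labels `word01`, `word02` and so on are assumptions, not data from the Serbian campaign.

```python
from typing import Set


def forgotten_overlap(previous_round: Set[str], current_round: Set[str]) -> float:
    """Return the share of words forgotten in the current round that were
    also forgotten in the previous round (0.0 if nothing was forgotten now)."""
    if not current_round:
        return 0.0
    return len(previous_round & current_round) / len(current_round)


# Placeholder labels only; these are not real campaign results.
round1_forgotten = {"word01", "word02", "word03", "word04"}
round2_forgotten = {"word03", "word05", "word06", "word07"}

share = forgotten_overlap(round1_forgotten, round2_forgotten)
print(f"{share:.0%} of the round 2 losses were already lost in round 1")
```

A share below one half corresponds to the situation described above: the total loss per round looks stable, but most of the forgotten words are new ones each time.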

The main repetition methods in logical order are:

1) The use of an SRS system like Anki (see the comment below!)
2) Revisiting the source to check that you now know all the words on the list (obviously this only functions with lists based on texts)
3a) A simplified wordlist with just two columns, one with the translations, one with the original words.
3b) The same as 3a, but with the target language words in the first column.
3c) The same as 3b, i.e. copy the foreign words, but only add the translation if you are in doubt about the meaning of a word.

With 3c you can cover up the translation column in the original list to make this a genuine control of your results, but you can also make it easier by reading the original list through first. Sometimes learning is better than controlling.

Well, I haven't tried out possibility no. 1 (Anki or similar systems), but it has been suggested by others, and it seems like a really good combination. The only problem is that I write all my wordlists by hand, and then it is not easy to transfer them to an electronic device (except maybe by scanning a list and chopping it up into suitable pieces). I write my lists by hand because I feel I remember things much better that way. Youngsters who have grown up with a keyboard in their hands may have a closer relationship with it than I have.

Possibility 2 (checking the words in the source text) has one problem, namely that it is very easy to be complacent about the control aspect and just start rereading the original text again for fun.

Possibility 3a is the one I suggested first, and I even devised a page setup with the original wordlist and the first and only repetition on the same sheet. It is illustrated below with words from the illustration above:

[Attachment: a15_rep1.jpg]

I copy 5-7 translations from the original wordlist, and now my task is to reconstruct the original words. This is a fairly hard task because there may be more than one translation of a Danish (or English) word, but you can get some help from the fact that most of the foreign words belong to the same part of the alphabet.

Fairly recently it occurred to me that this setup also would be practical in those cases where I go through a whole dictionary from A to Z (or at least a large portion of it). I visited Spain around New Year 2014-15, and there I spent some hours in the evening doing Spanish wordlists based on a tiny dictionary. As Spanish is also a language I know fairly well I dropped the usual three-column format and wrote 5-7 translations in column 1 (as in pattern 3a), and then I reconstructed the original Spanish words from the dictionary in column 2 without looking. Because this saves a lot of time I managed to do the obligatory repetition round, and for that purpose I used format 3c.

Warning: During a holiday in Cuba two years ago I did a similar exercise with a small Greek dictionary. However there I used the usual set of three columns, but skipped the repetition round. And that was a bad idea. My Greek reading skills improved considerably, but the words I learned during this campaign have seeped away. The repetition round is essential.

In pattern 3b you copy the original words instead of the translations, and this is less strenuous, but probably also less efficient. However it can be used if you go through large numbers of words during one session.

Repetition pattern 3c (the one where you copy the foreign words, but only add the translation with unknown or dubious words) was not new to me, but I first used it systematically during my Serbian vocabulary campaign in 2014, where I also wanted to produce some statistical results. I realized that I had to find a way to show in an unequivocal way whether I remembered the original words in the wordlist or not, so I decided just to list all the foreign words and then add a translation in those cases where I couldn't remember the meaning of a word or only remembered it faintly.

During the 'Serbian campaign' in 2014 I did not one, but two or sometimes even three repetition rounds, and there I used a dot system to indicate that I still didn't remember a certain word even after doing the first repetition. So by inference a green translation indicates a word which I remembered at the first control, but not at no. 2. So in the illustration below I would have forgotten the words with a red translation in repetition round 1, and those with a green dot or a green translation in repetition round 2.

[Attachment: a15_rep3c.jpg]

With this system it became quite easy to calculate some statistics. The results aren't supposed to be scientific - I was both the test leader and the lone test person, and my assessments of how well I remembered the Serbian words were primarily based on my gut feeling (though I did consult the original list in doubtful cases). And there is one more error in the test design: I had the translations written on the same sheet as the Serbian words so I can't claim I didn't see them (although the use of different colours makes it easier to avoid it). In spite of these misgivings I found one interesting fact, namely that my forgotten words in round 1, 2 and 3 weren't the same ones - check out the figures below for overlap! The 'loss' from round 1 to rounds 2 and 3 goes down, which indicates that some learning took place, primarily during the first repetition round, but also during round 2 (to compensate for the new lost words). The red figures below refer to all the memorized words, the blue ones are the forgotten or dubious ones.

(PS I used both a Serbian-English and a Serbian-Italian dictionary for the first two letters).
[Attachment: 055_wordlist_complete.jpg]

Postby Iversen » Fri Jan 29, 2016 9:59 am

2.7. Bi- or monolingual dictionaries

For the wordlists I always use bilingual dictionaries, and always in the direction from a target language into a base language. I have experimented with the opposite direction, but it didn't function nearly as well - partly because you lose the neat alphabetical order, which functions as a memorizing aid in itself, partly because the focus shifts from the target language to the base language.

Some language teachers are adamant that you should avoid using your own language in any way during your studies. But there is no justification for this extreme standpoint, at least not as a general rule. I have earlier discussed the distinction between intensive and extensive studies. If you want to understand all details in a short text or speech sample you will be studying it intensively. If you deal with long texts, films or video, but are willing to skip some details, you will be working extensively. In this last situation your main purpose is to train your ability to keep the momentum - and in this case large amounts of babble in your own language would almost certainly break your concentration. However the length is not decisive here: if you expect speech in your target language and get something in your own language instead (or another base language) then you have been served stones for bread. But this doesn't apply to intensive study: here you need first and foremost concise and precise information, and then it is less important what language you get it in as long as you can understand it to the last iota. And a monolingual dictionary which relies on circumlocutions and verbose explanations can't be as concise and precise as a bilingual dictionary. If it were then it would be an encyclopedia - and much thicker.

But even I own a number of monolingual dictionaries. In Danish I have got two, Nudansk ordbog and Retskrivningsordbogen, plus a few special dictionaries like the etymology dictionary from Politiken. I also have monolingual dictionaries in other languages, like English, German, French, Portuguese and Italian - well, even an old Romanian Academy dictionary from Ceausescu's days. This last one has one thing none of my other dictionaries have, namely information about the use of affix or no affix for Romanian verbs. This is an absolutely essential piece of information for learners, but it is missing even from my otherwise excellent Teora dictionary. The common thing for all these monolingual dictionaries is that I may use them to learn more about words I already know. But they are definitely not the ones I grab first if I run into an unknown word.

The lack of essential information in dictionaries often has something to do with the intended user profile. Many of my Slavic dictionaries 'forget' to mark perfective vs. imperfective verbs. Why? Because native speakers don't need that information. These dictionaries were written for Slavic native speakers who want to understand foreign languages, not for somebody like me who wants to learn a couple of Slavic languages.

The same problem arises with dictionaries from your own language into a target language. It is more the rule than the exception that dictionaries from something to something else lack morphological information about the latter, the 'to' language. The morphological information is only given for the headwords. Which is rather strange because you might think that this information would be quite useful for someone who looks a word up in a dictionary in order to find a useful translation. Some lexicographers seem to think that you don't intend to use the translations in their works for your own writings, but only to read old texts or do the exercises in dusty old textbooks!

Extinct or near-extinct languages are a special case. You can easily find good dictionaries from Latin to other languages, but it seems that most dictionaries from these languages into Latin have been constructed by turning the direction in a Latin-something dictionary - i.e. taking all the small labels with Latin words and their translations and sorting them in alphabetical order after the translations without even caring about the possibility that someone might want to say something in Latin.

I normally cling to my dictionaries, but I became so disgusted with my Gyldendal Danish-Latin dictionary that I threw it away. It simply NEVER had the words I wanted to look up. I have kept an older dictionary (Ove Kjær) in an edition from 1979, but mostly for nostalgic reasons. If I really want to find a Latin word for something from my own time I have one favorite source: "The New College" Latin - English dictionary, which I almost by accident saw in a bookstore in Manila and bought for fun. Besides I have found a few wordlists on the internet with suggestions for modern phenomena, including the "Lexicon recentis Latinitatis" published by the Vatican (but marred by circumlocutions), the "Lexicon Latinum" by David Morgan and the "Vocabula computatralia" at obta.uw.edu.pl.

Assume that we want to find the Latin word for "handkerchief". Ove Kjær gives the information that the Romans didn't have pockets - they used a fold in their clothes to store things. But his dictionary does provide one word: "sudarium" (from the word for sweat). New College lists both "sudarium" and "mucinnum" (from the word for snot), "Neues Latein Lexikon" has "sudarium", "linteoleum" and "ricinium", and Morgan beats them all with

" handkerchief / Schweisstuch: sudarium [s.18] -- Taschentuch: linteolum [Apul.]; mucinnium [Arnob.]; nasitergium* [s.16] (HELF.) ]] sudarium, linteolum (LRL) ]] ûcinnium, linteolum [Alb. II] ]] orarium; mucinium; (for sweat) sudarium (LEV.) ".

That's what I want from a dictionary going from a modern language into an old one - that its authors have tried to find words for notions a modern reader might need and not only for the things which Romans or medieval monks spoke about long ago. In some cases that means that a word which actually was used by the old folks has to be dropped by us because we get wrong associations from it (like calling modern astronomers 'astrologi'), and in cases where there isn't a good genuine Latin word for something the savvy people should propose one - and not propose a long circumlocution. For instance "Neues Latein Lexikon" proposes "instrumentum televisificum" or "scrinium televisificum" for "Fernseher" (television). Why not just "televisificum"? "New College" has "televisio" for the notion and "televisorium" for the gadget, while "televisificum" is reserved for TV programs, and Morgan has two proposals: "instrumentum televisificum" (once again!) and "televisorium". It is not very important which of these possibilities becomes the new NeoLatin standard - as long as it isn't a long and cumbersome construction like "instrumentum televisificum".

Postby Iversen » Fri Jan 29, 2016 11:08 am

2.8. Other kinds of dictionaries and vocabulary lists

There are several kinds of vocabulary lists and dictionaries in existence:

Monolingual dictionaries
Bilingual dictionaries
Multilingual dictionaries
Etymological dictionaries
Dictionaries for specific fields of knowledge
Thematic wordlists
Frequency tables and Swadesh lists
Pictorial dictionaries
Dictionaries with tons of genuine quotes on the internet

I have already given my reasons for preferring bilingual dictionaries over monolingual dictionaries in the preceding chapter. As for the multilingual dictionaries they run into the problem that semantic fields differ among languages, and it would be very difficult to describe these differences in a concise way for the complete vocabulary of x languages. The only place where they have a justification is in thematic wordlists - like for instance lists of the names of birds or tools. And even there I would only use them for reference - it would be quite confusing to learn the names in ten languages for strawberries or blackbirds in one go. I have owned a five language dictionary since the 70s, but I have never used it.

Swadesh lists (named after linguist Morris Swadesh) and frequency lists generally share one characteristic that makes them unsuitable for memorization projects: they lack translations so you are supposed already to know the words before you consult them. But they could have fulfilled one important aspiration of all language learners, namely to know at least all the most important words in their target languages. As you saw in chapter 2.1 the number of very common words is exceedingly small. In the Kilgarriff corpus only 96 English wordforms have a share of more than 0.1%, and only 9 have one of more than 1%.
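Counts of this kind are easy to reproduce on any text you have at hand. The following Python sketch is an illustration only: the function names `wordform_shares` and `forms_above`, the simple letters-only tokenization and the toy sentence are assumptions, and the result for a real corpus will depend on how the corpus was compiled and tokenized.

```python
import re
from collections import Counter
from typing import Dict, List


def wordform_shares(text: str) -> Dict[str, float]:
    """Return each wordform's share of all running words in the text.

    Tokenization is deliberately crude: lowercase, runs of letters only.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {form: n / total for form, n in counts.items()}


def forms_above(shares: Dict[str, float], threshold: float) -> List[str]:
    """List the wordforms whose share is strictly above the threshold."""
    return sorted(form for form, share in shares.items() if share > threshold)


# Toy text, far too small to say anything about a real language.
shares = wordform_shares("The cat and the dog and the bird")
print(forms_above(shares, 0.2))
```

With a large corpus you would call `forms_above(shares, 0.001)` and `forms_above(shares, 0.01)` to get the lists corresponding to the 0.1% and 1% thresholds mentioned above.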

How many words (or word forms) are so common that EVERY language learner should know every one of them? I don't know the answer, but if you take all the 'grammatical' words including forms representing the different roots of irregular verbs etc. I doubt that you would reach 500 in any European language. You could ask another question: take a lot of separate corpora. Which - and how many - words or word forms are represented in almost all the samples? Again I have to admit that I don't know the answer, but that group of words or word forms constitutes the basis of understanding and using any language.
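The second question can at least be made concrete. If each sample is reduced to a list of word forms, you can count in how many samples each form occurs and keep those that appear in almost all of them. The Python sketch below is a hypothetical illustration: the function name `forms_in_most_samples`, the 90% default cut-off and the toy samples are assumptions, and it does not answer the question for any real language.

```python
from collections import Counter
from typing import Iterable, List


def forms_in_most_samples(samples: List[Iterable[str]],
                          min_fraction: float = 0.9) -> List[str]:
    """Return the word forms that occur in at least min_fraction of the samples.

    Each sample is an iterable of word forms; repeated occurrences inside
    one sample count only once.
    """
    sample_count = Counter()
    for sample in samples:
        sample_count.update(set(sample))
    needed = min_fraction * len(samples)
    return sorted(form for form, n in sample_count.items() if n >= needed)


# Toy samples, standing in for separate corpora.
samples = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "barks"],
    ["the", "cat", "and", "the", "bird"],
]
print(forms_in_most_samples(samples, min_fraction=1.0))
```

Lowering `min_fraction` widens the group, so the size of the 'core' depends on where you decide to draw the line between "all" and "almost all".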

The Swadesh lists were constructed for use in lexicostatistics (the quantitative assessment of the genealogical relatedness of languages) and glottochronology (the dating of language divergence), and that's definitely a good reason - for linguists, less so for language learners. Such a list is composed of very common and supposedly central words because these words are supposed to be more stable than the rarer words, and therefore they can be used in language comparisons. But because of these characteristics they also contain words which every language learner must learn from the beginning - and not only that, but also learn to use in a multitude of ways and with great ease. Other linguists have compiled Swadesh lists for other languages so that it now almost has become a generic term.

You can learn a lot about the grammar words from a grammar book, and if words are really common you'll also automatically run into them in your study material - though maybe not into all of their special meanings or the expressions they are used in. But it is tempting to start a 'mopping up' operation to fill out the holes, and the same applies to the words which follow further down on the frequency tables. One tempting way to do this is to find and memorize thematic lists, but at least for wordlists I have found that this is a bad idea: an attempt to memorize a large number of words with almost similar meanings invites confusion - but strangely enough I haven't had the same kind of problems with alphabetically ordered words, so semantic similarities are apparently more likely to cause confusion, at least for me.

If you want to learn the names of 27 species of crocodiles you should first find some information about each of these species so that you get on nodding terms with each species. There is no point in cramming names of croc species which you only know as mere names. And even when you have learnt something about each and every species it is a bad idea to try to memorize all the names in one go - take max. 3-4 in one session, not more. The same applies to related words or idiomatic expressions you find in dictionaries: just learn 3-4 of these in any one session.

What then about pictorial dictionaries? I own one old pictorial 5-language dictionary, "Hvad hedder det" from Politiken 1947 (sample below), and there are pictures with references to all the details in several of my 'normal' dictionaries - mostly in those of my monolingual dictionaries which have secret ambitions about becoming true encyclopedias.

[Attachment: a14.jpg]

But why use pictures in dictionaries at all?

There are two cases: pictures with just one motif and pictures that illustrate a 'semantic field'. For me there is no doubt that the latter can be useful. For example you can show an old ship with sails, and you can point to each sail and give its name. Even though you didn't know that those sails even had names you now learn both that they have names and what those names are. That's useful information and such a picture with comments may be more valuable for a language learner than a thematic wordlist with the same words because you also get some information about the words which you might not even have in your own language. But in the example you still don't know whether the terminology is general or restricted to ships of a given type or from a certain period. In a true lexicon like Wikipedia you expect to get that information too.

Now move this idea to a bilingual dictionary and assume that you give the name for each sail in both languages (with a comment if the naming conventions in the two languages aren't parallel or if there are restrictions on period and type of setting). In this way you fuse just about every kind of memory hooks into one complex source, and that's just about everything a learner with a taste for images can wish for.

What about pictures of just one thing? This is a completely different story. If you don't know the thing on the picture it doesn't help you much. A multiword picture wouldn't help you much either if you didn't already know the general setting, but if you could see for instance five different kinds of Greek vases with a name for each one you would be able to distinguish them later. With just one vase you can see that it is some kind of pottery, oh yeah, but you don't have a clue whether the word attached to the picture is a general word for pottery or a specific word for, for instance, a big vase from Attica with two handles. In the bilingual case you can at least be lucky to know the name in your own language, and then you may be so extremely lucky that the foreign word covers the same set of uses - but you are just as likely to misinterpret the picture unless there also is an explanation. For abstract notions an explanation in words will normally be more effective (or maybe a film AND an explanation).

So it is a hit or miss operation to show pictures with just one item - they may contain information you didn't have or they may just serve as decoration. Or they could easily show an irrelevant picture. In the bilingual case both are useful because you have a lifeline back to your base language in the form of a translation, but the multi-item picture is by definition more useful because it allows for comparisons, subdivision of semantic fields and for filling out lacunas in otherwise known semantic fields - even when these don't correspond exactly with the organization of the semantic field in your base language.

With pictures that don't show concrete things the usefulness is even more questionable. The old adage - the one that claims that one picture says as much as 1000 words - may be caused by cases where the picture left you totally perplexed, and even 1000 words of babble couldn't hide that. Which is one reason why systems like Rosetta Stoned don't function - if you don't know exactly which element or interpretation you should look for in a picture then it has failed its task. And the more abstract the notion you want to illustrate is, the less likely it is that any picture can help you.

We have discussed learner types above, and one of the types that has been discussed is the 'visual' learner. In the few tests I have seen described the researchers believed that this could be tested by showing a learner a picture of something and saying a word (as an alternative to saying a word and giving a verbal explanation or a translation). But I have seen no indication that this would work. It would be much better to test the effectiveness of visual clues by showing five Greek vases as images and the corresponding words, as opposed to five vases with long descriptions in words.

This discussion leads to the question about the usefulness of thematic wordlists and dictionaries. I recently passed a bookstore which had a 4-5 cm thick medical dictionary in the window ("Medicinsk-Odontologisk Ordbog Dansk-Engelsk/Engelsk-Dansk") - 838 pages! Gosh! I don't know how many headwords it contains, but I suspect that each of them is explained in painstaking detail - otherwise the book wouldn't be nearly as thick. And that means that it moves in the direction of encyclopedias and lexica. And that's fine. I once upon a time invested in the complete Grove's Dictionary of Music and Musicians in 29 volumes, which has its articles in alphabetical order. But if you do want to learn the words which are relevant for a certain area of knowledge then it would be more logical to read a book about that area. I have most of my knowledge about bird names in English from field guides, where the birds are ordered according to their biological relationships, but most language learners would probably prefer reading less 'formal' genres and then just hope that they can pick up the relevant vocabulary in a less systematic fashion. That's OK, but the thing that isn't OK is to waste your time on thematic wordlists without those facts that make each item worth memorizing.

And finally: with the advent of the internet we have got a new kind of dictionary, where each word is illustrated by a number of quotes - sometimes even in a bilingual layout. The following image shows some of the quotes related to the Danish word "medicinsk" in the glosbe.com dictionary:

[Attachment: a16.jpg]

In good paper dictionaries there will also be examples, but mostly very short ones. And that's actually a good idea because they otherwise would be so thick that you would need a crane to move them around. On the internet you don't have to be so fussy about space and weight, and that makes it possible to get dictionaries where you can see the words and their translations in certified human-made translations. And that may clear up some of the questions you might have after a peek in a traditional dictionary. But you can also be led astray. In one of the examples above the Danish expression "Den seneste medicinske forskning" is juxtaposed with "Recent medical research" in English. The Danish version is actually a little bit more restrictive: the research is not only recent, but the most recent, if you write "den seneste". However such small issues are unavoidable even with texts taken from official sources. At the Esperanto World Congress in Lille 2015 I attended a lecture where official EU translations were compared, and the differences between the versions were in some cases no less than shocking.

Wikipedia can be used for a similar purpose. If you can find an article in your base language about a certain topic, then the mechanism that gives you links to articles in other languages about the same thing can give you access not only to one word, but to a host of other words from the same semantic field. But again: this information is presented in a less strict way and there is no guarantee that the articles correspond in their choice of words. You have been equipped with a splendid new tool, but it doesn't make your old-fashioned alphabetic dictionaries superfluous.

And just one last warning: Google Translate should NOT be used as a dictionary. It does provide lists with possible translations for specific words, but these lists are far from complete, and sometimes they contain glaring errors. Use a real dictionary.

Postby Iversen » Fri Jan 29, 2016 11:23 am

2.9. Learning expressions (and 'chunks')

I have written a lot about memorization of single words and word combinations through word lists. I have written less about stylistics because it is much harder to devise techniques for learning it in a systematic way. There are books with advice about style, but they can only tell you in loose terms how to deal with genuine language (apart from a few tips and tricks that have more to do with psychology than with language). But in between there is the dark uncharted land of the idiomatic expressions.

I have read some of the pages in a French Dictionary of Idiomatic Expressions (in the series Livre de Poche), and it struck me that it was very amusing, but I didn't learn much. So I started to speculate about what the problem was. And I found at least one thing that irritated me, namely that both examples and explanations were in the same language. The problem is that the two expressions compete: I let the original expression slip away because the explanation takes its place.

OK, one logical reaction to this would be to point out an equivalent expression in another language. This is a very relevant technique, but from the other side: I may use a certain expression frequently in Danish or English ... so what would a Frenchman say in the same situation (probably with totally different words)? I should long ago have started a collection of such expressions, preferably on my computer so that I could make full text searches, but nobody is perfect. The material is in principle not too difficult to find in ordinary good dictionaries, which are full of idiomatic (or at least fixed) expressions, and in principle you could learn them with something like wordlists with extra wide columns. And I have actually included some of them into ordinary wordlists by using two lines. But somehow idiomatic expressions with more than 2-3 words need a special treatment.

One day I got an idea which on paper seemed to be worth exploring, namely using hyperliteral translations. The point is that if I could just remember those pesky expressions I would probably also remember their unexpected meaning (it is the unexpectedness that makes an expression idiomatic, otherwise it would just be a fixed expression). By having a hyperliteral translation I point out its weirdness, so to speak, and that seems to function as an effective memory crutch - and the worse the translation is, the more effective it is probably going to be.

For example "compter sans son hôte" is explained as "se tromper". But I'm much more likely to remember the expression with the help of the English translation "count without one's host", precisely because it is nonsense. As you may remember I criticized in chapter 2.4 (the one about associations and context) some researchers for thinking that presenting a word AND a picture would help visual learners more than just presenting them with a word. Now we have a similar situation: presenting "compter sans son hôte" and "se tromper" together is like showing the kids in chapter 2.4 the word "dog" and some random dog. In contrast, the hyperliteral translation IS the original sentence, just in another language.

There are cases where the classical word list method is relevant, namely where the expression contains at least one unusual word. For example "sous la houlette" means 'sous la conduite de' ('under the control of'). The 'houlette' is the shepherd's crook (or a little garden spade); if I can learn that word then it will be difficult NOT to remember its use in the expression "sous la houlette". The problem is rather those many expressions that don't have such a 'gimmick word', and that's where I think that a funny hyperliteral translation might help.
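
The searchable collection on a computer mentioned earlier could be sketched roughly as follows in Python. This is only a minimal illustration under my own assumptions, not a finished tool: the names Expression, normalize and search are made up for the occasion, and the hyperliteral rendering of "sous la houlette" is my own illustrative guess. Each entry holds the expression, its meaning and a hyperliteral translation, and the search ignores case and accents so that typing "hote" also finds "hôte".

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Expression:
    original: str      # the expression in the target language
    meaning: str       # what it actually means
    hyperliteral: str  # deliberately word-for-word translation


def normalize(text: str) -> str:
    """Lowercase the text and strip accents (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(collection, query):
    """Return every entry where the query occurs in any of the three fields."""
    needle = normalize(query)
    return [
        entry for entry in collection
        if any(needle in normalize(field)
               for field in (entry.original, entry.meaning, entry.hyperliteral))
    ]


# Two entries taken from the examples in this chapter.
collection = [
    Expression("compter sans son hôte", "se tromper",
               "count without one's host"),
    Expression("sous la houlette", "sous la conduite de",
               "under the shepherd's crook"),  # illustrative hyperliteral version
]

for hit in search(collection, "hote"):
    print(hit.original, "=", hit.meaning, "/", hit.hyperliteral)
```

Because the search runs over all three fields, you can start either from the foreign expression or from a word in your own language, which is the direction I actually need when I wonder what a Frenchman would say in a given situation.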

There is one thing more to say about expressions, namely that I'm squarely against the idea that you should learn expressions without caring about the elements they are composed of. This would mean that you should learn a language with extremely long 'words', and long words are difficult to remember.

The rule is: learn all the individual words in an expression before or at the latest together with the expression. The same applies to extremely long words: they are nearly always composed of several elements, and if you know the elements it is much easier to memorize the combinations that produce the words. In that way you can also enjoy a glimpse of the logic that lies buried in the details of your target language. Learning expressions as interminably long indivisible units doesn't give you that.

The notion of 'comprehensible input' is mostly seen from the perspective that it permits you to guess the meaning of unknown words or expressions without having to resort to a dictionary or grammar or whatever. And that may be correct in the sense that you often can guess the meaning of something more or less correctly and more or less precisely. But you can't know beforehand whether a certain combination of words is in common use by natives - it takes time to get a feeling for this, and comprehensible input may be your best source for these expressions, given the scarcity of good collections of those expressions.

Of course you can find many expressions in good dictionaries, but those cited there are mostly those that couldn't be guessed from the words alone. Fortunately many expressions are fairly easy to decode - the problem is which word combinations the natives actually use among all the possible combinations. The Free Dictionary defines an idiomatic expression like this:

idiomatic expression - an expression whose meanings cannot be inferred from the meanings of the words that make it up

But this is not quite true. There is usually some kind of logic behind an idiomatic expression - and if you can't see it the most likely reason is that some of the words have become obsolete or changed their meaning since the expression was coined. When it was coined there was a visible logic.

I have just opened my big red Barron's book of 12.000 Spanish and English expressions at a random page. Is there just one single expression here which is absolutely incomprehensible or unrelated to its parts? No. Is "ser libre como el viento" incomprehensible? No - 'be free as the wind' is poetical, but totally limpid. Or "haciéndose viejo" for 'trying to avoid all work'? No, you just imagine some old retired men on a bench, doing absolutely nothing. And of course "los vicios son los hijos del ocio" (the vices are the children of idleness). In this case the English counterpart provided by Barron's is slightly more colourful: "An idle brain is the Devil's workshop". But even this is understandable. So in spite of the prevalent definitions idiomatic expressions are not devoid of logic. It would be more relevant to point to the fact that you can't guess beforehand why an idiomatic expression became popular instead of the alternatives which might have been equally logical and entertaining.

So idiomatic expressions land somewhere in the void between lexicography and syntax, and a fair number of authors have published lists of such expressions. For instance there are collections of proverbs for many languages, but these are complete sentences which mostly are quoted in their entirety, and therefore they really belong in the genre known as "bevingede ord" in Danish ('winged words', i.e. collections of quotes). Slang is also an area favoured by language book authors, probably because these expressions often are funny, rather drastic or downright dirty, and that makes such books entertaining even for the natives. But idiomatic expressions don't have to be colourful and funny, even though most collections leave that impression.

'Chunks' are actually just short snippets of speech or writing, but there is a subgroup among them which is essential to 'gluing' a conversation or written text together. This technique is something that even native speakers use, but for a newbie who wants to communicate it is extremely important to learn these prefabricated elements before spending precious time learning proverbs and quotes from famous plays - or even the kind of colourful expressions which dominate idiomatic dictionaries. It is much more important that you can say "by the way" or "tell me" now than that you can quote the whole of Hamlet by heart next year. Because these expressions are so common you can learn them by reading conversations in literature or by listening to actual conversations. Dictionaries can contain them, but for once they would not be the best and most trustworthy source, dominated as they are by much rarer items.

I own a few books that purport to list idiomatic expressions. For instance I own an old French "Dictionnaire des expressions idiomatiques" (Livre de Poche). My problem with this book and its companions in French or other languages is that I doubt that the expressions are used often enough to make them indispensable. How often will a French person say "se mettre dans le cornet" (=eat)? Google reports 1960 hits for "s'est mis dans le cornet", including the following:

"Avec ce qu'il s'est mis dans le cornet et dans les narines plus les médicaments qu'on lui file à haute dose… C'est une putain d'arme chimique le mec !" .

Native Frenchmen will probably know these kinds of expressions, but they have to be classified as 'rare et précieuses' if you look at how often they really are used. And short of checking each expression with Google you have absolutely no way of knowing whether a given expression still is in active use.

Another old book in my collection, A. Bryson Gerrard's "Beyond the dictionary in Spanish", is close to the thing I am looking for. It is in principle an ordinary Spanish-English dictionary, but with comparatively few words that are explained in depth. For example "avisar, aviso" gets this commentary: "Unreliable; they only mean 'to advise', and 'advice' in the sense of 'to inform', and even then imply warning. "Avisar" is the normal verb for 'to warn', and "aviso" is the word for an official notice which lays down the law; no question of giving advice. (etc etc....) 'To advise' in the sense of giving advice is aconsejar ('advice', consejo) ...." (15 lines). You certainly need ordinary dictionaries too, but they should be accompanied by such systematic in-depth guides to troublesome words and expressions.

There are also sites on the internet that have specialized in listing such expressions. One of these is FluentCzech's (aka Anthony Lauder's) "Connectors Starter Pack", where the connector phrases are divided into ten major groups. A quote from that site:

Opening connectors are used when somebody has just asked a question, and you want to start answering it.

thank you heartily : děkuji srdečně
that is a good question : to je dobrá otázka
that is such a difficult question : to je taková těžká otázka
once upon a time, long ago : kdysi, dávno

Couldn't such collections be published as old-fashioned paper books? Oh yes, they certainly could, but then it should be in the format of small language guides because the content is something you typically need to look up for a concrete purpose, and your thick paper books are probably standing at home on your bookshelf, not lodged permanently in your pocket.

But you can't expect to be spoon-fed like this, and therefore you need to absorb the information from your casual reading and TV and from natives you meet. A paper notebook for jotting down expressions is a valuable tool, because just as with single words any normal person needs repeated exposure to remember expressions. An electronic notebook or some other smart gadget with a camera would be even better because it gets easier to record interesting expressions in print - especially if you also can run the recordings through an OCR program (Optical Character Recognition) so that you can make your collection searchable. But it may be too much to expect every language student to make his/her own collections, and you can't do it while having conversations - unless you want to come across as extremely absentminded and unsociable.

