Routledge Frequency dictionaries — 5k limit

Postby **einzelne** » Thu Jun 16, 2022 4:38 pm

Iversen, I appreciate your contribution to the discussion, although I think your example is somewhat irrelevant. First, it's an N=1 experiment (and the power of frequency lists lies precisely in the size of the corpora they use); second, you assessed your active vocabulary.

Frequency dictionaries are powerful tools when it comes to building your receptive vocabulary. I went through Routledge's German and French dictionaries at a higher intermediate stage and knew almost all the words, although by that time my reading was limited to very narrow topics and maybe a couple of fiction books. There were around 200 French 'gap' words, as you put it, and around 400 German ones (which simply reflected the fact that I hadn't read newspapers). So all of these words were common enough that I had picked them up in spite of my narrow focus and limited reading experience.

I'm working on my Spanish right now and I've been using the Routledge frequency dictionary right from the start. Anyone who has access to the book can judge for themselves whether the words beyond the first 1k are really that rare and domain-specific:

1001 coger v to hold, take, catch

1002 suyo pron his, hers, yours (−fam), theirs

1003 tratamiento nm treatment, processing

1004 conservar v to conserve, preserve

1005 raro adj strange, rare, scarce

1006 puesto nm job, place, position

1007 retirar v to take away, retire

1008 exterior adj exterior, outside

1009 ay interj oh no!, oh my!

1010 acá adv here, around here

Etc.

Postby **Iversen** » Thu Jun 16, 2022 7:09 pm

The main problem with my experiments is that the corpora are relatively small and entirely taken from a specific kind of non-fiction by one author (myself), but I still think that my conclusion stands, namely that the lexical overlap between all unique headwords from two corpora is smaller than most people would expect, and that you will therefore continue to run into unknown words long after you have learnt far more than five thousand headwords.
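For readers who want to try this kind of comparison on their own texts, here is a minimal Python sketch. It assumes that lowercase word forms are an acceptable stand-in for headwords (a real headword count would need a lemmatizer, which this sketch leaves out), and the function names are hypothetical. It reports how much of corpus B's vocabulary is already covered by corpus A, plus the Jaccard index of the two vocabularies.

```python
import re


def word_types(text: str) -> set[str]:
    """Unique lowercase word forms (letters only, Unicode-aware).

    A crude stand-in for headwords: inflected forms are NOT merged.
    """
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def overlap_report(text_a: str, text_b: str) -> dict[str, float]:
    a, b = word_types(text_a), word_types(text_b)
    shared = a & b
    union = a | b
    return {
        "types_a": len(a),
        "types_b": len(b),
        "shared": len(shared),
        # Share of B's vocabulary you would already know if you knew all of A's
        "coverage_of_b": len(shared) / len(b) if b else 0.0,
        # Jaccard index: shared types divided by all types seen in either corpus
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    print(overlap_report("the cat sat on the mat", "the dog sat on the log"))
```

Because inflected forms are counted separately, the numbers will not match a headword-based count such as the one described above.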

I don't claim that the 5,000 words in the Routledge book, or in another one of the same size, are rare and/or domain-specific; on the contrary. And I don't say that learning 5,000 words from such a list is a waste of time. What I do say is that after you have passed the first 500 or 1000 words on such a list the listed order becomes rather unimportant, so you could just as well start learning them backwards from no. 5000 to no. 1001. Or you could learn words no. 5001 to 6000 if you had a list with more words than the Routledge one.

So my recommendation is that you take the first 1000 words or so very seriously and take care to learn them at an early stage, whereas it's less important which words you learn after that. If you get your words from texts that represent your personal lifestyle and hobbies, then that selection will be at least as relevant FOR YOU as the general-purpose list in a frequency dictionary. A frequency dictionary can still be an excellent tool to locate and eliminate lexical lacunae, but preferably at a time when you already know most of the words. Otherwise you could just as well have culled your words from an ordinary dictionary (as I do with my wordlists).

And you shouldn't shy away from rare and technical words just because they aren't listed on a frequency list. Such words might be relevant for YOU.

Postby **einzelne** » Thu Jun 16, 2022 8:17 pm

Iversen wrote:What I do say is that after you have passed the first 500 or 1000 words on such a list the listed order becomes rather unimportant, so you could just as well start learning them backwards from no. 5000 to no. 1001. Or you could learn words no. 5001 to 6000 if you had a list with more words than the Routledge one.

Quite possible. It has never been my concern, since I know that eventually you need to know all 5k and many, many more. What I really care about is getting a curated list of the 5,000 to 10,000 most frequent words, but it seems that, sadly, they don't provide one.

Postby **SpanishInput** » Thu Jun 16, 2022 10:25 pm

I just wanted to add a couple of points:

1) Even before the 5,000-word mark, any frequency list will start to include words that not all native speakers know. For example, in my list gathered from Netflix subs, the word "güey" is in the top 5,000, even though my mom has no idea what it means (it's a Mexican slang word). And in Routledge's frequency dictionary of Spanish there's the word "reseña", which, again, my mom doesn't know (it means "review", as in an Amazon review; when she was young, online reviews weren't a thing). Of course, a native speaker with a wide knowledge of the world will have no trouble with any word in a "top 5,000" list.

2) This is why Imron from the "Chinese the hard way" blog (and also administrator of Chinese Forums) says that after you reach 1,200 words (HSK 4), it's waaaay more efficient to focus on words that are frequent in the book you're reading right now.

Here's his blog post, "learning from general word lists is inefficient":
https://www.chinesethehardway.com/artic ... efficient/
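The strategy described here, focusing on the words that are frequent in the book you are currently reading, can be sketched in a few lines of Python. This is an illustration only, not Imron's own tool: it assumes you have the book as plain text and a set of words you already know (both hypothetical inputs), and it counts word forms rather than lemmas.

```python
import re
from collections import Counter


def study_candidates(book_text: str, known: set[str],
                     top_n: int = 20) -> list[tuple[str, int]]:
    """Most frequent word forms in the book that are not yet in `known`.

    Word forms are lowercased but not lemmatized, so "banano" and
    "bananos" are counted separately.
    """
    counts = Counter(re.findall(r"[^\W\d_]+", book_text.lower()))
    # most_common() with no argument returns every word, highest count first;
    # ties keep the order in which words were first encountered
    unknown = [(w, c) for w, c in counts.most_common() if w not in known]
    return unknown[:top_n]


if __name__ == "__main__":
    book = "el banano y el guineo; el banano es rico"
    print(study_candidates(book, {"el", "y", "es"}, top_n=2))
```

Ranking by in-book frequency means the words you study first are the ones most likely to recur before you finish the book.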

On the other hand, neither Routledge's dictionary nor my list from Netflix includes the word "banana" among the top 5,000, despite the fact that you'll find bananas in any Ecuadorian supermarket and it's the #1 fruit we eat.

Postby **luke** » Thu Jun 16, 2022 11:14 pm

SpanishInput wrote:On the other hand, neither Routledge's dictionary nor my list from Netflix includes the word "banana" among the top 5,000, despite the fact that you'll find bananas in any Ecuadorian supermarket and it's the #1 fruit we eat.

Nor does the Routledge frequency dictionary have plátano or guineo. All three words appear in Cien años de soledad: banano (17 times), plátano (6 times) and guineo (twice).

By the way, there are two different Anki decks for Spanish on AnkiWeb based on the Routledge 5000. One has pictures, Yandex audio and 10k cards. The other has the example sentence fragments and 5,000 cards.

Iversen wrote:A frequency dictionary can still be an excellent tool to locate and eliminate lexical lacunae, but preferably at a time when you already know most of the words.

You've got me thinking about adding cards from one of those decks to my Anki routine someday soon.

Postby **einzelne** » Fri Jun 17, 2022 12:30 am

SpanishInput wrote:For example, in my list gathered from Netflix subs the word "güey" is in the top 5,000, even though my mom has no idea what it means.

That's why corpus size matters, as do its sources (literature, newspapers, TV/radio/cinema, speech). Curated frequency lists take that into account.

SpanishInput wrote:This is why Imron from the "Chinese the hard way" blog (and also administrator of Chinese Forums) says that after you reach 1,200 words (HSK 4), it's waaaay more efficient to focus on words that are frequent in the book you're reading right now.

I don't know the situation with Chinese and other character-based languages. But by the time I start reading unadapted books in Western languages, I already know the core 5k words, and at that stage there aren't many frequently occurring unknown words left. And when there are, you don't have to focus on them deliberately, since their repetition catches your attention anyway. The problem is usually with hapaxes (hapax legomena), plus dis legomena, tris legomena and tetrakis legomena, i.e. words that occur only once, twice, three or four times. For instance, take a look at the statistical distribution of words in the Bible, which, I think, reflects the distribution of words in an average book. 2k words appear only once! If you put hapax, dis, tris and tetrakis legomena together, they give you a whopping 3,500 words (out of 5,300 in total). You won't be able to store them in your memory by extensive reading of the Bible alone; there simply aren't enough repetitions. The question then is: which of these 3,500 would you rather concentrate on first? As Iversen rightfully said, only a small fraction of these hapaxes will overlap with the words in your next book. How to choose?

In this case, intuition is not a reliable guide. Yes, you see bananas in your supermarket every day, but how often do you discuss them in real life? How often do you read about them in newspapers or books? People don't discuss bananas in everyday life and don't write novels about them. Yet this is what you usually get in textbooks: long lists of fruits, vegetables, clothes, furniture, professions and so on. As a result, as some studies suggest, students are ill-equipped for reading actual texts because of the size and sampling of textbook vocabulary:

The results of the frequency analysis of the vocabulary used in three current textbooks for beginners of German are somewhat disheartening. In all three books the percentage of vocabulary less frequent than the frequency rank 4,000 is high (29-44%). These percentages may be partly due to issues of practicality in creating a textbook, like classroom management vocabulary, students' interest, chapter topics, story line of the book, etc. But this is certainly only partially the case. Not all low-frequency words used in these books are connected to either classroom management or students' interests. To be sure, it can be debated how many low-frequency words should be included in first-year German textbooks. One should also keep in mind that, psycholinguistically, these words might contribute to an overload of students' capacities and lead to frustration.

Most importantly, learners should be familiar with high-frequency words. As far as sufficient text coverage and further vocabulary learning are concerned, the most-frequent 1,000 words are of such importance in language learning that teaching these words now appears to be absolutely essential. It is striking that only 64% and 61% of the most-frequent 1,000 words are included in Deutsch heute and Neue Horizonte, respectively. Even more noteworthy is the fact that Kontakte teaches only 53% of the most high-frequency words.

Don't get me wrong, frequency dictionaries have their methodological limitations, and I'm perfectly aware of them. But when used appropriately and for the right purpose (i.e. reading), they are fantastic tools. I find them indispensable for developing reading skills, since they significantly accelerate the process.
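The kind of rare-word profile cited above for the Bible can be computed for any text. The sketch below is a minimal Python version, assuming a plain-text input and counting lowercase word forms rather than lemmas, so its output on a given edition of the Bible may differ from the figures quoted above. The function name is hypothetical.

```python
import re
from collections import Counter

# Traditional names for words that occur exactly 1, 2, 3 or 4 times in a text
LEGOMENA = {
    1: "hapax legomena",
    2: "dis legomena",
    3: "tris legomena",
    4: "tetrakis legomena",
}


def rare_word_profile(text: str) -> dict[str, int]:
    """Count how many distinct word forms occur exactly 1-4 times."""
    counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))
    # Frequency of frequencies: how many distinct words occur exactly k times
    freq_of_freq = Counter(counts.values())
    profile = {name: freq_of_freq.get(k, 0) for k, name in LEGOMENA.items()}
    profile["distinct words"] = len(counts)
    profile["occurring 1-4 times"] = sum(profile[name] for name in LEGOMENA.values())
    return profile


if __name__ == "__main__":
    print(rare_word_profile("a b b c c c d d d d e e e e e"))
```

Dividing "occurring 1-4 times" by "distinct words" gives the share of a book's vocabulary that extensive reading of that book alone barely repeats.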

Postby **Iversen** » Fri Jun 17, 2022 1:50 am

I think and/or read the word "banan" (banana in Danish) every time I enter a Danish supermarket. What more can you ask for? But unless enough authors include banana eating in their books, it won't enter the frequency lists. And I'm sure that the technical words for the body parts of a fish or for watery environments in South America won't ever be included either, but right now, while I'm compiling my own personal overview of the ray-finned fish (using Wikipedia), I see them quite often. So I would definitely support the idea of learning words from the texts you use. On the other hand, I'm warming up to the idea of using a frequency list as the basis for a wordlist (or an Anki deck if you like that system). Right now I base some of my wordlists on the texts I study, but I supplement that source with words taken directly from a standard bilingual dictionary. And there I pluck the words I fancy, either because I think I might use them myself or because they somehow ring a bell (I may have seen them somewhere). But a bilingual frequency list such as the one quoted by einzelne would also be a possible source of words... IF you find a way to supplement it with words that reflect your personal needs and interests.

The first 1000 or so words are of course important for everyone, and any textbook system should make sure that they are taught. They include not only the grammar words (and their many irregular forms), but also some basic words for objects and actions that every human being is confronted with: things like houses and toilets and saying hello. Some conversational phrases should also be on the to-do list. But you can't expect a textbook system to cater to your personal interests, and if it tried, some readers would feel that it was veering off in a totally silly direction. The solution to this problem is to supplement it with bilingual texts of your own choosing as early as possible. And then I would pick up words for birdies and baroque dances and quantum mechanics and medieval castles, where others would get words for sports and politics and small talk about human relations.

Postby **luke** » Fri Jun 17, 2022 4:58 am

einzelne wrote:
luke wrote:We could almost use a poll on the utility or non-utility of frequency dictionaries.

If used appropriately, I found them extremely effective and useful.

Please help me with ideas or selections that you think would be useful in a poll on frequency dictionaries and vocabulary study.

Postby **BeaP** » Fri Jun 17, 2022 6:37 am

I haven't discussed bananas yet, but I've asked for them several times in markets.

A1-A2 (beginner) textbooks tend to concentrate on speaking (communication in everyday situations), although in theory a balance of the four skills is often considered essential. Yes, it's true that no one experiences all of these situations; for example, I don't talk to the staff in clothes shops. The biggest shortcoming of current textbooks (for me) is that 'real' texts (like the ones you can read in a newspaper or a book) only start to appear at B1, and I have to wait until B2 to get anything of substantial length. Until that point it's mostly train tickets, brochures, e-mails, SMSs and forum posts: fragmented sentences, abbreviations, no real text cohesion.

They're also written with two assumptions in mind: 1) The student will continue the process and reach at least B1/B2; what they don't learn now, they'll learn in the next volume. 2) The student is helped by a tutor, who provides the extra words needed for the communicative tasks.

Also, I wouldn't draw final conclusions from a study of German textbooks that only includes books written by academics at (mostly) American universities. Textbooks written by German teachers or methodology experts who still live in a German-speaking environment and meet an immense number of students (not just university students, but also immigrants, for example) could be very different, simply because the purpose of these books, and the goals they want to achieve, are different.

When I use textbooks, I don't really lack words, because they can easily be looked up. What I sometimes lack are expressions: the 'natural way' to say something instead of the word-for-word translation I would come up with.

Maybe there are no 10k lists because the differences in frequency between individual words become very small (2,000 words would share practically the same rank), and the ranking is affected so much by the type of texts that it can't be done in a scientific way. (Think about Iversen's post.) On the other hand, maybe we can find a frequency list of, for example, 19th-century French literature, i.e. something that concentrates on one area.

In my experience it's often useful to be flexible and think outside the box. So if you let go of the frequency thing and Nation for just a couple of hours and describe your goals, you might get some ideas from other members that help you more. Who knows? If it's reading literature, I know you've already tried, but it might be worth asking the same question every now and then. Also, people who study solely for reading (Latin or Ancient Greek) can come up with good ideas that are totally missing from the perspective of Nation and similar researchers.

Postby **Haselnuss** » Fri Jun 17, 2022 6:50 am

I agree that the Routledge Frequency Dictionaries are very useful tools, particularly if you want to jumpstart your way into reading material like newspapers. I've used both the 1st and 2nd editions of the Routledge Frequency Dictionary of German. When I was first building my vocabulary I used the 1st edition, since that was the most current at the time. Then, after the 2nd edition came out, I bought it because, as was mentioned earlier in this thread, it has 1,000 more words in it than the 1st edition. (As it turns out, since each edition uses a different corpus, some of the words that appear in the 1st edition actually fell off the list in the 2nd edition, despite the fact that the 2nd edition has 1,000 more words.)

The frequency dictionary provides a systematic way to acquire vocabulary in the most efficient way possible, since the first couple of thousand words are so important in learning the language. While it would be interesting to see Routledge publish words 5,001 to 10,000, I doubt that they will.

The key issue is that after you're done learning the frequency dictionary you still need an efficient, systematic way to learn the next tranche of several thousand words. A good tool for this is a high-quality thematic vocabulary book. Cambridge University Press publishes a set of books like this for the FIGS. I have the one for German, which is called Using German Vocabulary. The vocabulary is divided across 20 thematic areas, and within each thematic unit there are three levels graded according to relative frequency of the words. This kind of book is an efficient means of acquiring extra vocabulary if you don't want to take the trouble of composing your own word lists.

It's worth noting that Using German Vocabulary is required for students in the German language programs at Oxford and Cambridge Universities, so it must be highly regarded in academia.

A language learners’ forum
