The size of vocabulary to set as a goal.

AcademiaNut
White Belt
Posts: 47
Joined: Mon Jan 04, 2021 9:54 pm
Location: U.S.A.
Languages: English (N).
Spanish (beginner), French (beginner).
Medium interest: Latin, Dutch, German.
Mild interest: Japanese, Danish, Swedish, Portuguese, Greek, Hawaiian.
x 32

The size of vocabulary to set as a goal.

Postby AcademiaNut » Sun Jan 10, 2021 3:17 am

I thought I'd share some interesting things I found out recently about vocabulary size, since in my experience vocabulary seems to be the biggest hurdle to attaining proficiency in any given foreign language, by far.

Here's a table I made of how many words can be learned per year, based on the number of words learned per day:

Words learned:
per day   per week   per month   per year
      1          7          30        365
   2.74       19.2        82.2      1,000
      5         35         150      1,825
   5.48       38.4         164      2,000
   8.22       57.5         247      3,000
     10         70         300      3,650
  10.96      76.72       328.8      4,000
  13.70      95.90         411      5,000
     15        105         450      5,475
  16.44      115.1       493.2      6,000
  19.18      134.3       575.4      7,000
     20        140         600      7,300
  21.92      153.4       657.6      8,000
  24.66      172.6       739.8      9,000
     25        175         750      9,125
  27.40      191.8         822     10,000
     30        210         900     10,950
  30.14      211.0       904.2     11,000
  32.88      230.2       986.4     12,000
     35        245       1,050     12,775
  35.62      249.3       1,069     13,000
  38.36      268.5       1,151     14,000
     40        280       1,200     14,600
  41.10      287.7       1,233     15,000
  43.84      306.9       1,315     16,000
     45        315       1,350     16,425
  46.58      326.1       1,397     17,000
  49.32      345.2       1,480     18,000
     50        350       1,500     18,250
  52.05      364.4       1,562     19,000
  54.79      383.5       1,644     20,000
     55        385       1,650     20,075
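The arithmetic behind a table like this can be reproduced in a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes a flat 7-day week, 30-day month and 365-day year, which is what the round-number rows suggest.

```python
# Assumed calendar constants (a flat 30-day month, as the round-number rows suggest).
DAYS_PER_WEEK = 7
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365


def rates_from_daily(words_per_day):
    """Return (per week, per month, per year) for a given daily rate."""
    return (
        words_per_day * DAYS_PER_WEEK,
        words_per_day * DAYS_PER_MONTH,
        words_per_day * DAYS_PER_YEAR,
    )


def daily_rate_for_yearly_goal(words_per_year):
    """Return the daily rate needed to hit a yearly word target."""
    return words_per_year / DAYS_PER_YEAR


# Print one row per yearly goal, in the same column order as the table.
for goal in (1000, 5000, 10000, 20000):
    per_day = daily_rate_for_yearly_goal(goal)
    per_week, per_month, per_year = rates_from_daily(per_day)
    print(f"{per_day:7.2f}   {per_week:8.1f}   {per_month:9.1f}   {per_year:8,.0f}")
```

Changing the tuple of goals (or calling rates_from_daily directly with a daily rate such as 10) gives the other rows.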


This raises the important question: How big a vocabulary is needed to feel comfortable in a language? I found some answers to that, too:

----------
1000 words allow you to understand about 80% of the language which surrounds you, as long as it is not too specialized (Hwang, 1989; Hirsh and Nation, 1992; Sutarsyah, Nation and Kennedy, 1994)

3000 words allow you to understand about 95% of most ordinary texts (Hazenberg and Hulstijn, 1996).

5000 words allow you to understand about 98% of most ordinary texts (Nation (1990) and Laufer (1997)). Such a vocabulary size also warrants accurate contextual guessing (Coady et al., 1993; Hirsh & Nation, 1992; Laufer, 1997).

10000 words allow you to understand about 99% of most texts (Nation (1990) and Laufer (1997)). It is the pinnacle of language learning — a counterpart to having the vocabulary of a college graduate.

https://universeofmemory.com/how-many-w ... ould-know/
----------

Clearly it's a situation of diminishing returns. The quantities of words needed are measured in the thousands, but with each additional few thousand words one's percentage of understanding increases by smaller and smaller increments and can never hit 100%: 80%, 95%, 98%, 99%, etc. (Nobody can reach 100% since that would be like memorizing a huge set of dictionaries.)

But this raises another question: For which percentage should one aim, in practical terms? I found the answer to that on the same web site, using the official language levels:

Language level   Number of base words needed
A1                  500
A2                1,000
B1                2,000
B2                4,000
C1                8,000
C2               16,000


Via my chart, that means level C2 can be reached, as far as vocabulary, in one year by learning 44 words per day, or in two years by learning 22 words per day, assuming that one can study grammar and other language topics in parallel or on the side. That's a lot of memorization, which is no fun. It reminds me of what happens every time I get interested in chess: ultimately it becomes a matter of memorizing openings, which becomes a turnoff after several months, whereupon I drop chess for another several years.
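For anyone who wants to try other levels or time frames, the same calculation can be sketched in Python. The dictionary simply copies the level table above; the function name and the 365-day year are my own assumptions.

```python
# Base-word targets per CEFR level, copied from the table above.
BASE_WORDS_BY_LEVEL = {
    "A1": 500,
    "A2": 1000,
    "B1": 2000,
    "B2": 4000,
    "C1": 8000,
    "C2": 16000,
}


def words_per_day(level, years, days_per_year=365):
    """Daily rate needed to learn a level's base words in the given number of years."""
    return BASE_WORDS_BY_LEVEL[level] / (years * days_per_year)


# Show the daily rate for each level over one and two years.
for level in BASE_WORDS_BY_LEVEL:
    one_year = words_per_day(level, 1)
    two_years = words_per_day(level, 2)
    print(f"{level}: {one_year:5.1f} words/day for 1 year, {two_years:5.1f} for 2 years")
```

This counts only first exposure to each word; it says nothing about the review time needed to keep them.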

The above chart closely matches other statistics I've seen that claim the average active vocabulary size for English speakers is 20,000 words.
https://7esl.com/vocabulary/
Still, other problems crop up in this topic, especially:
(1) Active vocabulary, which is the set of words the person can use accurately in a sentence, is only about half of the passive vocabulary, which is the set of words the person only somewhat understands. This can throw off the estimates by 50%.
(2) The definition of "word" is highly flexible, and may or may not include conjugations, noun cases, prefixes, suffixes, and so on.
(3) How exactly does a person review the vocabulary they have already learned, so as not to forget the older knowledge? There must be common strategies for doing this that don't involve repeated perusal of the entire list, but I don't know what strategies those would be.
(4) Which words should be learned first? This is a big topic on which I found a few answers, but it's too much to get into here.

I'm finding this topic surprisingly interesting, especially since it seems to be the key topic that kept me from gaining proficiency in any of the languages I studied (although learning enough grammar was also rough).

Does anybody want to share their experiences or opinions on learning large numbers of words? Or recommendations of how to review words learned months ago? How does learning of conjugations or noun cases fit into the memorization of random words? Which sets of words do you learn? (How are they organized: by topic? frequency? by text book? as they are encountered?) How is learning words combined with learning grammar? (at the same time? in different sessions? by priority? by textbook order?)
9 x

lusan
Green Belt
Posts: 463
Joined: Sat Aug 15, 2015 1:25 pm
Location: Greensboro, NC, USA
Languages: Spanish(Native)
English (Naïve)
French(Intermediate)
Italian(Intermediate)
Polish(In Alcatraz)
x 985

Re: The size of vocabulary to set as a goal.

Postby lusan » Sun Jan 10, 2021 5:17 am

AcademiaNut wrote: That's a lot of memorization, which is no fun. It reminds me of what happens every time I get interested in chess: ultimately it becomes a matter of memorizing openings, which becomes a turnoff after several months, whereupon I drop chess for another several years.


It depends. If one masters tactics and strategy, it is possible to play at an expert level. No need for heavy memorization. I love chess. Look at Capablanca's life. He did pretty well.
1 x
Italian, polish, and French dance
FSI Basic French Lessons : 10 / 24 17 of 24 goal

AcademiaNut
White Belt
Posts: 47
Joined: Mon Jan 04, 2021 9:54 pm
Location: U.S.A.
Languages: English (N).
Spanish (beginner), French (beginner).
Medium interest: Latin, Dutch, German.
Mild interest: Japanese, Danish, Swedish, Portuguese, Greek, Hawaiian.
x 32

Re: The size of vocabulary to set as a goal.

Postby AcademiaNut » Sun Jan 10, 2021 6:26 am

lusan wrote: It depends. If one masters tactics and strategy, it is possible to play at an expert level. No need for heavy memorization. I love chess. Look at Capablanca's life. He did pretty well.


After he retired, Bobby Fischer said outright that he hated chess now because of the amount of memorization required, but also how important memorization is for chess nowadays. Here are similar quotes by Bobby Fischer on that topic:
https://www.dailychess.com/forum/only-c ... ght.103438

I figure that if anyone had good wisdom about the pursuit of chess, Bobby Fischer had it, so that discouraged me. In my opinion it was difficult to know which strategies to use when I wasn't familiar with the type of position after "book" ran out, and in some openings the type of position was very hard to predict.

Fortunately for me, my language goals are very modest: I'd be quite happy with reaching level B1 or B2 in any foreign language. One problem is that I was overambitious years ago, and tried to learn too many languages at once, so I failed at all of them. Another problem is that language education is awful in the U.S., and probably in other countries, too, judging by the awful accents I hear coming from people from certain countries when they try to speak English. I don't want to sound like that.

One set of statistics that is often quoted is that it takes 600-700 hours of study to reach reasonable professional proficiency in one of the easier languages, and up to 1,100 hours in the most difficult languages. At 8 hours/day, that higher figure corresponds to 138 days, or about 5 months. That sounds very low to me, but maybe with a very good teacher or school it would be reasonable.
https://linguapath.com/how-many-hours-learn-language/
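As a quick check on that conversion, here is a small Python sketch. The function name is hypothetical and the 30-day month is my own simplification; the 1,100-hour figure is the one quoted above.

```python
import math


def study_days(total_hours, hours_per_day):
    """Number of days needed to log total_hours, rounding up to whole days."""
    return math.ceil(total_hours / hours_per_day)


# Compare a full-time pace with more typical part-time paces.
for hours_per_day in (8, 2, 1):
    days = study_days(1100, hours_per_day)
    months = days / 30  # assumed flat 30-day month
    print(f"{hours_per_day} h/day: {days} days (about {months:.1f} months)")
```

At a more typical 1 or 2 hours per day, the same total stretches from months into years.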
Linguist Steve Kaufmann recommends B2 as the level to aim at, which sounds about right to me, and roughly matches my goal:
https://blog.thelinguist.com/how-long-s ... -language/
2 x

ryanheise
Green Belt
Posts: 459
Joined: Tue Jun 04, 2019 3:13 pm
Location: Australia
Languages: English (N), Japanese (beginner)
x 1681

Re: The size of vocabulary to set as a goal.

Postby ryanheise » Sun Jan 10, 2021 7:32 am

There are all sorts of factors that make it difficult to measure the comprehensibility of a text merely in terms of vocabulary size. The simplest ones are that it is dependent on the specific language and how you count words (Nation refers to "word families" in his own statistics). But one of the more subtle problems with Nation's analysis is that percentages of known words can be misleading. If you understand 90% of the words, that does not equate to 90% comprehension.

In the past, I took an excerpt from Alice's Adventures in Wonderland, Chapter IV, and simulated what reading would be like with a vocabulary of the top 2000 most frequent English words. Here was the result:

  • It was the White ?????? ?????? slowly ?????? again and looking ?????? about as it went as if it had lost something and she heard it ?????? to itself The ?????? The ?????? ?????? my dear ?????? ?????? my ?????? and ?????? She ?????? get me ?????? as sure as ?????? are ?????? Where CAN I have ?????? them I wonder ?????? ?????? in a moment that it was looking for the ?????? and the ?????? of white ?????? ?????? and she very good ?????? began ?????? about for them but they were ?????? to be seen everything seemed to have changed since her ?????? in the ?????? and the great hall with the glass table and the little door had ?????? ??????

(Try the simulator out yourself here)

Would you say you understand 90% of what's going on?

It's quite a common myth brought up in casual conversation among people who have ever run into these statistics before that if you only learn the top "X" most frequent words, you will understand "Y"% of a text. But in reality, the 10% that you don't know are typically the most important words that tell you what the sentence is actually about. What matters is not the percentage of words that you might know, but how many sentences you can understand, or to put it another way, whether your known words happen to cover the important words in a sentence. This is highly dependent on the properties of each individual text and is not something that a general vocabulary size can tell you.
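To make the difference between word coverage and sentence coverage concrete, here is a minimal Python sketch. It is not the simulator's actual code: the function names, the tokenizer and the tiny known-word set are all assumptions made up for illustration, where a real run would use the top N entries of a frequency list.

```python
import re


def tokenize(text):
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)


def mask_unknown(text, known_words, mask="??????"):
    """Replace every word outside known_words with a mask, dropping punctuation."""
    return " ".join(w if w.lower() in known_words else mask for w in tokenize(text))


def token_coverage(text, known_words):
    """Fraction of running words that are known."""
    tokens = tokenize(text)
    return sum(1 for w in tokens if w.lower() in known_words) / len(tokens)


def sentence_coverage(text, known_words):
    """Fraction of sentences in which every single word is known."""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    fully_known = sum(
        1 for s in sentences if all(w.lower() in known_words for w in tokenize(s))
    )
    return fully_known / len(sentences)


# Hypothetical known-word set standing in for a "top N" frequency list.
known = {"it", "was", "the", "white", "she", "heard", "to", "itself"}
sample = "It was the White Rabbit. She heard it muttering to itself."

print(mask_unknown(sample, known))
print(f"words known:     {token_coverage(sample, known):.0%}")
print(f"sentences known: {sentence_coverage(sample, known):.0%}")
```

In this toy sample 9 of the 11 words are known, yet neither sentence is fully known, because the two missing words are exactly the ones that carry the meaning.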

So I would flip the importance of vocabulary size with the importance of the text you choose to read, and say that the more important question is not "how many words should I aim to learn?" but rather "which texts should I be reading?"

So how do you more accurately measure comprehensibility of a text? This is something I've been really curious about for the past year or so. You can find a trail of my adventures with vocabulary and comprehension below:


Currently I'm working on a much larger analysis of 40,000 English podcast episodes to see if I can sort them by language difficulty. In the process, I happen to have built quite a large corpus of English words typically used in speech, which I'll probably publish here once I'm done.

(The next step will be to analyse grammatical properties of the text.)
22 x

tungemål
Blue Belt
Posts: 949
Joined: Sat Apr 06, 2019 3:56 pm
Location: Norway
Languages: Norwegian (N)
English, German, Spanish, Japanese, Dutch, Polish
Language Log: https://forum.language-learners.org/vie ... 15&t=17672
x 2192

Re: The size of vocabulary to set as a goal.

Postby tungemål » Sun Jan 10, 2021 9:32 am

Oh words, our favourite subject!

Ryanheise makes a good point.

But so as not to discourage new learners - stories like "Alice in Wonderland" are normally very word rich and therefore hard to read for a learner. Many of the words that fell outside the 2000 words in your excerpt are words like rabbit, paw, fur, whiskers, and ferret (I've never heard about ferrets). These are words that a learner would not prioritise, and are words that are not likely to come up in an ordinary conversation. Other missing words are "muttering", "trotting", "anxiously" - common enough in books, but not that important in conversations. "Back" is also one of the words missing - that I find strange as I'm sure that would be one of the first words to learn.

Nation refers to "word families" - important to know that this will give a lower number than what most of us think of as words.

In my own experience:
When I studied Polish in 2018 I learned about 1500 words - based on what I entered into anki. Probably maximum 1000 word families. That was enough for very simple conversations, but I could not read any ordinary text, and I could not understand ordinary speech intended for natives. For Spanish, learning 1500 words would go much farther, since there are so many words that are similar to English.

(edited details and grammar)
7 x

Iversen
Black Belt - 4th Dan
Posts: 4782
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027
x 15020

Re: The size of vocabulary to set as a goal.

Postby Iversen » Sun Jan 10, 2021 11:20 am

The main authority on this subject is Paul Nation, but kudos to those who actually do their own home research instead of just quoting some percentages.

As for the necessary number of words, we have discussed this topic a number of times, and I would put the requirements higher than some authors have done - probably because I don't see small talk about daily life and where-do-you-come-from as the relevant standard, but rather the ordinary written stuff you would find for instance in a newspaper or magazine article (or some non-technical article on Wikipedia for that matter). And then the goal to go for should probably be set at no less than 10,000 headwords, although you can survive on less if you can live with a few holes in your reading.

On the other hand you can ask for directions to the nearest toilet with a total vocabulary of just four words (and hope for the answer to be a gesture rather than an explanation). In Russian even two words would be enough - or three if you add "пожалуйста", which you should do.
9 x

tiia
Blue Belt
Posts: 751
Joined: Tue Mar 15, 2016 11:52 pm
Location: Finland
Languages: German (N), English (?), Finnish (C1), Spanish (B2??), Swedish (B2)
Language Log: viewtopic.php?t=2374
x 2061

Re: The size of vocabulary to set as a goal.

Postby tiia » Sun Jan 10, 2021 1:45 pm

tungemål wrote: But so as not to discourage new learners - stories like "Alice in Wonderland" are normally very word rich and therefore hard to read for a learner. Many of the words that fell outside the 2000 words in your excerpt are words like rabbit, paw, fur, whiskers, and ferret (I've never heard about ferrets). These are words that a learner would not prioritise, and are words that are not likely to come up in an ordinary conversation. Other missing words are "muttering", "trotting", "anxiously" - common enough in books, but not that important in conversations. "Back" is also one of the words missing - that I find strange as I'm sure that would be one of the first words to learn.

I'd like to add that there are two problems here: not only does it matter which corpus is used to generate those word lists, but a learner also does not necessarily learn the most frequent words first. The word rabbit they would probably learn rather early, because talking about your pets is usually considered a beginner's topic.

ryanheise wrote:(Try the simulator out yourself here)

I had a bit of fun trying out the simulator for a very easy introduction. Something that I may be able to say after just 1-3 lessons:

Hello, my name is tiia. I live in Finland. I am a civil engineer. I speak German, English, Finnish, Spanish and Swedish.


Now with a simulated vocabulary size of 2000 words we get:

?????? my name is ?????? I live in ?????? I am a civil ?????? I speak German English ?????? Spanish and ??????


We can argue about understanding the words Finnish and Swedish, depending on your location. (Same goes actually for German.) However, especially "Finland" is nearly the same word in most languages, so you may guess it, though it's not that frequent.

But how can it be that the simulator assumes someone knows civil, but not engineer (nor engineering)? I mean, I cannot even tell you how often I have had to explain to some non-native speaker of English what civil engineering is. The problem is never understanding engineering. It's understanding the civil part. So this is probably the hardest word to understand in this introduction, but it is only blanked out once we go below the 1000 word mark.
I changed civil engineer into mechanical engineer. And voilà, it doesn't even know mechanical. (Though I've never seen any mechanical engineer having to explain that word.)

And now for the most irritating issue: I have never seen a learner who doesn't know the word hello. It's probably one of the first 5 words you learn. However, hello only appears again when you simulate a vocabulary size of 6200+.

In other frequency lists hello appears much earlier, but obviously not in the list used here.


But in general I do agree with the point, that if you understand 90% of the words this doesn't mean you'd understand 90% of a text.
7 x
Corrections for entries written in Finnish, Spanish or Swedish are welcome.
Project 30+X: 25 / 30

s_allard
Blue Belt
Posts: 985
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2370

Re: The size of vocabulary to set as a goal.

Postby s_allard » Sun Jan 10, 2021 2:55 pm

As the veteran participants of this forum will remember, I have written extensively here on the topic of vocabulary size and language proficiency. If I get warmed up, worked up or maybe riled up, I might go into more details but here are the key points of my position:

1. There are some fundamental methodological issues in the definition of what constitutes a word family. For example, there are what are called function or grammatical words and content words. There is the problem of idiomatic expressions. How does one take into account the multiple meanings of a word in different contexts? How do you count compound or multiple part words such as English prepositional verbs? Can one really separate vocabulary from grammar? What about proper nouns, place names and historical or cultural references? Etc.

This, by the way, is why learning the list of the most common words of a language is a very inefficient strategy. That said, I do believe in focusing on the most commonly used grammatical-lexical structures.

2. Trying to correlate vocabulary size, text coverage and comprehension is useless. As was pointed out in a previous post, knowing 90% of the words in a text does not mean 90% comprehension. Anything lower than around 98% coverage is very frustrating.

3. Although we can roughly measure vocabulary or word-family size of written texts, I know of no studies that have actually counted the number of so-called words that people actually speak. For example, I could take all the posts I have written here in the last two months and count the number of word families. Iversen is the only person I know who has done something comparable.

Many vocabulary size tests you can find on the Internet, though, use a form of frequency list interval method in which you are asked if you know certain words ranked by frequency. So, for example, if you know word 1000, the system assumes you know all the more frequent words ranked before it. Thus, with a tiny sample of key words, the system can estimate your vocabulary size.

Although I recognize some validity in this approach which is basically all we have, it doesn't make fundamental distinctions between word families whose form I recognize, word families I have actually seen before, word families whose meaning I fully know and word families I have actually used myself accurately.
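The interval method described in point 3 can be sketched roughly as follows in Python. This is an illustration under my own assumptions, not the algorithm of any particular online test: the function name is hypothetical, the probe ranks are invented, and real tests typically sample several words per frequency band.

```python
def estimate_vocabulary_size(responses):
    """Estimate vocabulary size from yes/no answers on probe words.

    responses maps a probe word's frequency rank to whether the test taker
    says they know it. Knowing the probe at rank r is taken to imply knowing
    every more frequent word, so the estimate is the rank of the last probe
    known before the first miss.
    """
    estimate = 0
    for rank in sorted(responses):
        if not responses[rank]:
            break
        estimate = rank
    return estimate


# Invented answers: probes at ranks 1000-3000 known, rank 4000 unknown.
answers = {1000: True, 2000: True, 3000: True, 4000: False, 5000: True}
print(estimate_vocabulary_size(answers))
```

A single yes/no per probe also shows the limitation: the estimate cannot distinguish a word merely recognized from one that can be used accurately.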

4. More vocabulary is obviously better than less but it is misleading to associate numbers with the CEFR proficiency tests. Let's put aside the receptive tests of reading and listening for a moment and look at the productive vocabulary necessary. Let's say you have 45 minutes to write a minimum 250-word essay on a given topic. How many word families will you use? I would suggest 150 - 200. That's not a lot but we all know that the real challenge is how to put all those words together in proper, idiomatic, fluid form that demonstrates your level of proficiency. So, how many word families do you need here? Not more than 200.

As for the speaking test, in a 15-minute conversation with the examiner, how many word families will come out of your mouth? I don't know and I think the question is irrelevant. The important thing is how well you can use the word families that you know. Which is better: 100 words beautifully put together idiomatically with impeccable grammar, or 200 words full of mistakes and often incomprehensible?

As for the receptive part of the tests, you will be confronted with only a very small selection of vocabulary but you don't know what selection. Here is where knowing more is better than less. Lots of reading and listening are very important obviously but I think it's important to study strategically. You have to be able to recognize word families and even guess or derive meaning from context.

5. While I agree that more vocabulary is better than less, I personally emphasize mastering a small number of elements rather than using a large number poorly. In the past, this idea has gotten me into epic arguments when I suggested that one only needs to use about 300 words to pass a C2 exam. Note I said "use" not "know".
7 x

Deinonysus
Brown Belt
Posts: 1222
Joined: Tue Sep 13, 2016 6:06 pm
Location: MA, USA
Languages:  
• Native: English
• Advanced: French
• Intermediate: German,
   Spanish, Hebrew
• Beginner: Italian,
   Arabic
x 4635

Re: The size of vocabulary to set as a goal.

Postby Deinonysus » Sun Jan 10, 2021 3:52 pm

s_allard wrote:5. While I agree that more vocabulary is better than less, I personally emphasize mastering a small number of elements rather than using a large number poorly. In the past, this idea has gotten me into epic arguments when I suggested that one only needs to use about 300 words to pass a C2 exam. Note I said "use" not "know".

That reminds me of an apocryphal story about Nikola Tesla visiting one of Henry Ford's factories to solve a particularly tough problem. He walks onto the floor, takes one look at the troubled instrument, marks the problem area with an x in chalk, and leaves. When Ford receives Tesla's bill of $10,000 he asks him to itemize it, and then he gets a letter the next day:
  • Marking the x: $1
  • Knowing where to put it: $9,999.
I have never taken a CEFR exam of any level, but I imagine it's the same case. 0.01% of the exam is writing the 300 words, and 99.99% is knowing which 300 words to use and how to use them.

But, I couldn't agree more about mastering a small core of elements rather than scratching the surface of a large number. To quote Bruce Lee, "I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times."
11 x
/daɪ.nə.ˈnaɪ.səs/

lusan
Green Belt
Posts: 463
Joined: Sat Aug 15, 2015 1:25 pm
Location: Greensboro, NC, USA
Languages: Spanish(Native)
English (Naïve)
French(Intermediate)
Italian(Intermediate)
Polish(In Alcatraz)
x 985

Re: The size of vocabulary to set as a goal.

Postby lusan » Sun Jan 10, 2021 4:40 pm

AcademiaNut wrote:
lusan wrote: It depends. If one masters tactics and strategy, it is possible to play at an expert level. No need for heavy memorization. I love chess. Look at Capablanca's life. He did pretty well.


After he retired, Bobby Fischer said outright that he hated chess now because of the amount of memorization required, but also how important memorization is for chess nowadays. Here are similar quotes by Bobby Fischer on that topic:
https://www.dailychess.com/forum/only-c ... ght.103438

I figure that if anyone had good wisdom about the pursuit of chess, Bobby Fischer had it, so that discouraged me. In my opinion it was difficult to know which strategies to use when I wasn't familiar with the type of position after "book" ran out, and in some openings the type of position was very hard to predict.

Fortunately for me, my language goals are very modest: I'd be quite happy with reaching level B1 or B2 in any foreign language. One problem is that I was overambitious years ago, and tried to learn too many languages at once, so I failed at all of them. Another problem is that language education is awful in the U.S., and probably in other countries, too, judging by the awful accents I hear coming from people from certain countries when they try to speak English. I don't want to sound like that.

One set of statistics that is often quoted is that it takes 600-700 hours of study to reach reasonable professional proficiency in one of the easier languages, and up to 1,100 hours in the most difficult languages. At 8 hours/day, that higher figure corresponds to 138 days, or about 5 months. That sounds very low to me, but maybe with a very good teacher or school it would be reasonable.
https://linguapath.com/how-many-hours-learn-language/
Linguist Steve Kaufmann recommends B2 as the level to aim at, which sounds about right to me, and roughly matches my goal:
https://blog.thelinguist.com/how-long-s ... -language/


I study chess every day and I play monthly competitions. Though the initial objective is winning, the beauty is NOT winning but the intellectual experience. Chess is not a memorization thing, but a chain of intentions and ideas that unfolds with 32 pieces on a 64-square board. It is not memory but mental pleasure. The openings are set in stone. The real game starts in the middle game. Going through games of Capa, Magnus, Karpov, etc. gives me the same feeling as when I first learned thermodynamics. Pure intellectual pleasure.

Language-wise, for me, learning a language is similar to falling in love. I jumped in bed with French when, one day, listening to Piaf brought tears to my eyes. Right now I am listening to French music, Francis Cabrel, and I am happy. Fall in love and then grammar, words, and sentence structures are no longer chores but the pleasure of sharing with a loved one. Choose!
9 x
Italian, polish, and French dance
FSI Basic French Lessons : 10 / 24 17 of 24 goal

