The size of vocabulary to set as a goal.

General discussion about learning languages
einzelne
Blue Belt
Posts: 804
Joined: Sat Mar 17, 2018 11:33 pm
Languages: Russian (N), English (Working knowledge), French (Reading), German (Reading), Italian (Reading on Kindle)
x 2884

Re: The size of vocabulary to set as a goal.

Postby einzelne » Wed Jan 20, 2021 8:26 pm

When it comes to reading, here's an interesting n=1 experiment. The guy decided to learn German by reading Der Spiegel. He meticulously put every new word or expression into an Excel file. Over 2 years he collected more than 20 000 entries. Here's a quote (you can also find his other posts on Quora, where he describes his experience):

As I read German newspapers, I write out unfamiliar words and phrases into my German-English wordlist. On July 5, I added the 23500th word to my wordlist. At that point I decided to find out how close I was to reading German newspapers without a dictionary.

So, between July 5 and July 30, 2016, I read 100 articles in various German newspapers (Der Spiegel, Süddeutsche Zeitung, Frankfurter Allgemeine Zeitung, and others). The combined length of the articles constitutes 63687 words. Of these, as it turned out, 333 words and phrases were new to me (that’s on average 5 unfamiliar words for every 1000 words of newspaper text).

More specifically, 22 articles had 0 new words; 63 articles had between 1 and 5 new words; 9 articles had between 6 and 9 new words; and 6 articles had in excess of 10 new words (one of these had 33 new words).

Thus, even with a vocabulary of 23500 words and phrases, occasional use of a dictionary when reading German newspapers will still be necessary.


Personally, I think 20k words and expressions (not lemmas!) is about right for comfortably reading an average magazine article or contemporary fiction and non-fiction.

Dragon27
Blue Belt
Posts: 619
Joined: Tue Aug 25, 2015 6:40 am
Languages: Russian (N)
English - best foreign language
Polish, Spanish - passive advanced
Tatar, German, French, Greek - studying
x 1382

Re: The size of vocabulary to set as a goal.

Postby Dragon27 » Thu Jan 21, 2021 5:52 am

einzelne wrote:
The combined length of the articles constitutes 63687 words. Of these, as it turned out, 333 words and phrases were new to me (that’s on average 5 unfamiliar words for every 1000 words of newspaper text).

More specifically, 22 articles had 0 new words; 63 articles had between 1 and 5 new words; 9 articles had between 6 and 9 new words; and 6 articles had in excess of 10 new words (one of these had 33 new words).

Thus, even with a vocabulary of 23500 words and phrases, occasional use of a dictionary when reading German newspapers will still be necessary.


5 new words per 1000 words is 99.5% coverage. That's far more than the 98% that many people (most notably Alexander Arguelles) recommend for unassisted extensive reading for pleasure. So is it necessary to use a dictionary at this point, or is it just to satisfy the habit of using the dictionary? Maybe that one article with 33 new words was a tougher nut to crack, I agree. But that would be very occasional use. I encounter new words reading novels in my native language all the time (and look them up in a dictionary, guilty).
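The coverage figure follows directly from the quoted numbers. A minimal Python sketch, assuming each of the 333 new items was met only once (so they count as 333 of the 63687 running words); the `coverage` helper is just an illustration:

```python
def coverage(total_tokens: int, unknown_tokens: int) -> float:
    """Share of running words that the reader already knows."""
    return 1 - unknown_tokens / total_tokens

# Figures from the quoted experiment. Assumption: every new word/phrase
# occurred exactly once, so 333 unknown items = 333 unknown tokens.
total, unknown = 63687, 333

per_thousand = unknown / total * 1000
print(f"unknown words per 1000: {per_thousand:.1f}")
print(f"coverage: {coverage(total, unknown):.1%}")

# The 98% threshold mentioned in the thread, expressed the same way.
print(f"unknown words per 1000 at 98% coverage: {(1 - 0.98) * 1000:.0f}")
```

If some of the new items recurred, the unknown-token count would be higher and coverage somewhat lower.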

Besides, 23500 words (or however many of these are actually words) in two years? That is way too optimistic a pace. How much of it has been forgotten?

Beli Tsar
Green Belt
Posts: 384
Joined: Mon Oct 22, 2018 3:59 pm
Languages: English (N), Ancient Greek (intermediate reading), Latin (Beginner) Farsi (Beginner), Biblical Hebrew (Beginner)
Language Log: https://forum.language-learners.org/vie ... =15&t=9548
x 1294

Re: The size of vocabulary to set as a goal.

Postby Beli Tsar » Thu Jan 21, 2021 10:42 am

Cainntear wrote:
Arizakai wrote:
Gustav Aschenbach wrote: Interesting video by someone who took the same approach (memorizing vocabulary as a first step):

Interesting video, indeed. But memorising all the vocabulary from a textbook before starting the textbook is a bit extreme imho. What I, and probably most people, do is memorise the vocab for the unit you are working on at that moment. This way you achieve the same results with less pain.

So yes, you need to do something more to make the material stick, but memorising all the vocab before starting seems mind-numbingly tedious, and would actually really overcompensate for the density of the material, leaving me with too little work to do when working through the lessons, so I'd get really bored.

I've been trying to recall when I did this for the last few days (thus demonstrating my immense powers of memory...) and it's suddenly come back to me. I used Memrise to memorise all the vocab in my main Hebrew textbook - 578 items - before starting the textbook. It was two years ago, so I was young and foolish...

So the results of the experiment were interesting - it worked surprisingly well. I was working on something else (Farsi? Greek?) at the time, so I didn't have headspace for Hebrew. 25 new words on the bus, though, was manageable. Yes, some words didn't work that well - anything that wasn't a noun or a verb was a bit harder - but it stuck. I'm not sure it would have worked with Anki instead of Memrise - it's a bit cruder for learning and you need the exposure more. Even though I started Hebrew a lot later than planned, leaving nearly a year between stopping the vocab and starting the textbook, most of it was (and is) still there.

The benefit was twofold. Firstly, I gained a lot of simple reading practice in a really awkward script before starting. Secondly, it was one less thing to worry about while doing the book. For me, those two were really helpful - it reduced the mental overhead working through each chapter substantially. Because I didn't have to devote any mental space to the script or vocabulary, I could really concentrate my mind on the grammar. It's just a pity I ran into sickness, wanderlust, and academic pressures and didn't complete the grammar!

Do you get the same benefit by doing it before each chapter? I don't think so - you'd get other benefits, like cementing it in place with different exposure - but that ignores the benefit of letting things sink into your mind over the long term. As I said, I'd retained most of it nearly a year later, because I'd had the time to work through all the vocab and let it sink in properly. Time is very powerful with SRS - you know something so much better after one month, and so much better still after three.

Will I do it again? Well, I'm not planning on any new languages any time soon, so probably not. I did do a bit with Latin, which may be helping me currently, but Latin's so transparent (comparatively speaking!) that it is hard to really test. But with a limited set of vocab I might consider it, depending very much on what energy and time I had available. It let me work a little on the language before starting. Of course, just doing Duolingo or Clozemaster might have similar effects.

Not sure it's something to recommend, but I was nonetheless pleasantly surprised at how well it worked.
0 / 50 1/2 Super Challenge - Latin Reading
0 / 50 1/2 Super Challenge - Latin 'Films'

einzelne

Re: The size of vocabulary to set as a goal.

Postby einzelne » Thu Jan 21, 2021 6:28 pm

Dragon27 wrote:
So is it necessary to use a dictionary at this point, or is it just to satisfy the habit of using the dictionary?

Besides, 23500 words (or however many of these are actually words) in two years? That is way too optimistic a pace. How much of it has been forgotten?


It really depends on the context. Statistics might be misleading. For instance, if 3 of these 5 unknown words are in the same sentence, there is a high chance that the sentence will be totally unclear to you and you'll have to use a dictionary. Quite often it is the so-called low-frequency words that carry the meaning of the whole sentence.

As for 20k words in 2 years, I don't know. But he describes his experience elsewhere. If I remember correctly, he read for about 3 hours every day and then reviewed the new words during the day. But he states that he doesn't know all these words by heart and that he hardly remembers hapaxes.

Dragon27

Re: The size of vocabulary to set as a goal.

Postby Dragon27 » Sat Jan 23, 2021 7:27 am

einzelne wrote: It really depends on the context. Statistics might be misleading.

Of course, it was only average statistics. And we don't have much to go on other than these numbers. And the numbers tell me that it's likely that 99% of the text was already known words (I'm not sure I can even trust this 5-unfamiliar-words-per-1000 statistic: what if each of these new words was encountered more than once?).

einzelne wrote: For instance, if 3 of these 5 unknown words are in the same sentence, there is a high chance that the sentence will be totally unclear to you and you'll have to use a dictionary. Quite often it is the so-called low-frequency words that carry the meaning of the whole sentence.

And if we like, we can strategically place the unfamiliar words in such a way that the entire text becomes almost incomprehensible. But how likely is it that, with 333 unknown words/phrases in a set of articles totalling 63687 words, we will encounter a sentence that has a whole bunch of these rare words in it at once?
My intuition tells me that these rare words usually aren't used in such a way that they are key to understanding the meaning without our being able to at least guess what they could mean from context (which is the entire point of extensive reading).
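One rough way to put a number on this question is to treat each running word as independently unknown with probability 333/63687 and ask how often a sentence contains three or more unknown words. This is a sketch under strong assumptions: the 20-word sentence length is hypothetical, and real unknown words are not independent (they tend to cluster in specialised articles, as the 33-new-word article suggests), so this model probably understates clustering.

```python
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

p_unknown = 333 / 63687   # share of unknown tokens in the quoted sample
sentence_len = 20         # hypothetical average sentence length
sentences = 63687 // sentence_len  # rough number of sentences in the sample

# Independence assumption: each word is unknown with the same probability.
p3 = p_at_least(3, sentence_len, p_unknown)
print(f"P(3+ unknown words in one sentence) = {p3:.2e}")
print(f"expected such sentences in the sample = {p3 * sentences:.2f}")
```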

einzelne wrote: As for 20k words in 2 years, I don't know. But he describes his experience elsewhere. If I remember correctly, he read for about 3 hours every day and then reviewed the new words during the day. But he states that he doesn't know all these words by heart and that he hardly remembers hapaxes.

Well, in a big enough corpus, if we extract distinct words and count their frequencies, roughly half of them turn out to be hapaxes. Is this a large enough corpus (it doesn't seem like too big a corpus to me, tbh)? Whatever the case, memorizing 23.5k words in two years is an improbable task. Native speakers in their prime years of vocabulary bursts don't learn words that fast (not even close). And I highly doubt that memory techniques and deliberate learning can significantly improve on that (if we're talking about long-term vocabulary growth).
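The "about half are hapaxes" claim is easy to check on one's own reading corpus. A minimal sketch; `hapax_share` is a hypothetical helper, and its naive regex tokenisation counts inflected forms separately (word forms, not lemmas), which inflates the hapax share for a language like German:

```python
import re
from collections import Counter

def hapax_share(text: str) -> float:
    """Fraction of distinct word forms that occur exactly once in `text`."""
    tokens = re.findall(r"\w+", text.lower())  # naive tokenisation, no lemmatising
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)

sample = "Der Hund und der Mann und die Frau"
print(f"{hapax_share(sample):.2f}")  # 4 of 6 distinct forms occur once
```

For a real test, run it over the text of the articles you have read, e.g. `hapax_share(open("articles.txt", encoding="utf-8").read())`, where `articles.txt` is a placeholder file name.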

einzelne

Re: The size of vocabulary to set as a goal.

Postby einzelne » Wed Jan 27, 2021 6:17 pm

Dragon27 wrote: Whatever the case, memorizing 23.5k words in two years is an improbable task. Native speakers in their prime years of vocabulary bursts don't learn words that fast (not even close). And I highly doubt that memory techniques and deliberate learning can significantly improve on that (if we're talking about long-term vocabulary growth).


Well, he didn't claim that he knew all these words. As he wrote himself: "For instance, from my 27000-word list I remember the words I most often come across in German newspapers. And the words I encountered just once, I most likely do not remember." And elsewhere: "I’ve written out more than 23500 German words into my German-English wordlist. Of these I can probably recognize in written form about 75%". And lastly: "When I began to self-study German by reading the German press in February 2014, I was writing out between 50 and 80 words per day. I repeated them several times during the day and then the next day. And then in roughly 2 weeks a bunch of 500-700 words that have piled up over this period. Overall, I repeated each word no less than 18 times. But still I could retain long-term only about 60% of the words in my memory."

But in principle, I don't see why it's not doable. For instance, if you grew up bilingual, already know English at C2 level and are very dedicated (he read German every day for 2-4 hours, putting each new word in his list, reviewing them during the day, and reading grammar books and listening to the radio on top of that; that's real determination!), I don't see why it's not possible. It's about 30 new words per day, and all you need is to recognize them in a text, not to produce them (so no need to worry about gender, declensions, conjugation etc.).
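The per-day figure, and what the quoted retention estimates imply, can be checked with a few lines of arithmetic. A sketch only: the two-year span is the thread's round number, and the 75% (recognition) and 60% (long-term retention) rates are his own estimates, made at different times, so the results are ballpark figures:

```python
# Round figures quoted in the thread.
listed_words = 23500
days = 2 * 365

new_per_day = listed_words / days
recognised = listed_words * 0.75  # his estimate: recognises ~75% in written form
retained = listed_words * 0.60    # his estimate for the early phase: ~60% retained

print(f"new entries per day: {new_per_day:.1f}")
print(f"recognised in writing: ~{recognised:.0f}")
print(f"retained long-term at 60%: ~{retained:.0f}")
```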

Dragon27

Re: The size of vocabulary to set as a goal.

Postby Dragon27 » Wed Jan 27, 2021 7:50 pm

einzelne wrote: It's about 30 new words per day, and all you need is to recognize them in a text, not to produce them.

Nobody was talking about production. It's an order of magnitude more incredible to learn to actively use 23500 words in two years.

30 new words per day quickly add up when you have to repeat them multiple times. You can't just learn 30 words in a day, memorize them forever and then continue doing that every day. Those 30 new words quickly turn into mountains of words, and your brain won't be able to handle all of them. If we take the example (and word) of the guy in the experiment, he had to repeat hundreds of words that had accumulated over weeks, repeat each word at least 18 times, and he still forgot a large part of them. And he will forget even more in the future.

You don't see why it's not doable, I don't see how it's doable.
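How fast the reviews pile up can be simulated. A minimal sketch with a made-up schedule: the only figures taken from the thread are about 30 new words a day and at least 18 repetitions per word; the review offsets and the `daily_reviews` helper are invented for illustration and are not the method the learner actually used:

```python
# Hypothetical schedule: days after first meeting a word on which it is
# reviewed. 18 entries match "no less than 18 times"; the offsets are invented.
SCHEDULE = [0, 0, 0, 1, 1, 2, 4, 7, 14, 14, 21, 30, 45, 60, 90, 120, 180, 240]

def daily_reviews(new_per_day: int, days: int,
                  schedule: list[int] = SCHEDULE) -> list[int]:
    """Number of reviews due on each day if `new_per_day` words start daily."""
    load = [0] * days
    for start in range(days):
        for offset in schedule:
            if start + offset < days:
                load[start + offset] += new_per_day
    return load

load = daily_reviews(30, 730)
print("day 1:", load[0], "| day 30:", load[29], "| day 365:", load[364])
print("total reviews over two years:", sum(load))
```

With this schedule the daily load climbs from 90 reviews on the first day to 540 (30 words times 18 reviews) once every offset is in play, and stays there as long as 30 new words keep coming in.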
Last edited by Dragon27 on Thu Jan 28, 2021 5:12 am, edited 1 time in total.

einzelne

Re: The size of vocabulary to set as a goal.

Postby einzelne » Wed Jan 27, 2021 8:34 pm

Dragon27 wrote: You don't see why it's not doable, I don't see how it's doable.


For 99.99% of people it is undoable, no need to argue about it. But when I started to work on my German vocabulary systematically, I could easily add 20 new words per day, and that was while I was a grad student overburdened with coursework and TA obligations (so you can imagine my cognitive workload). When I was learning French, I went through 5k words in two months after I finished Assimil. And I can easily review the new vocabulary from a book (usually about 1k words) in a couple of days, provided that I worked with the book closely (i.e. wrote translations in the margins).

I doubt you'll forget the words in the future, provided that you continue to read every day (and why wouldn't you, if you started to learn German for reading in the first place, like this guy?). If anything, your vocabulary will only increase. I used to print out PDFs when I started to read books in English, so I could highlight new words and write down translations in the margins. And I still remember coming across these printouts about 3-4 years later while cleaning up. All the highlighted words were familiar; my only thought was: "It's such an obvious word! How could I not know it?"

But if you stop reading (or never started to read and listen to unadapted texts in copious amounts in the first place) then, hell, yes, you will start forgetting even the most basic stuff.

Daniel_Zar
White Belt
Posts: 10
Joined: Tue Jan 29, 2019 9:55 am
Languages: Friulian N, Italian N, English C2, Spanish C2, French C1, German B2, Czech B1.
x 5

Re: The size of vocabulary to set as a goal.

Postby Daniel_Zar » Wed Jan 27, 2021 9:51 pm

"5000 words allow you to understand about 98% of most ordinary texts" is an overestimation.

The mistaken assumption is: since those 5000 words make up 98% of a given text, knowing those words I'll get 98% of the text.

In reality it is more like 80%, and I may be optimistic. As someone pointed out, the 2% you don't know carries much more of the meaning.

Le Baron
Black Belt - 3rd Dan
Posts: 3578
Joined: Mon Jan 18, 2021 5:14 pm
Location: Koude kikkerland
Languages: English (N), fr, nl, de, eo, Sranantongo,
Maintaining: es, swahili.
Language Log: https://forum.language-learners.org/vie ... 15&t=18796
x 9564

Re: The size of vocabulary to set as a goal.

Postby Le Baron » Wed Jan 27, 2021 10:32 pm

I'll admit I've not read every post in this thread (I read a lot though!), but I don't believe people accrue words by rote, at least not in a relatively short period and with continued positive returns. It doesn't match how or why we learn words - even in native languages - which is through use in context (the bulk of useful everyday words in a social context, others from reading, TV etc.). One can argue that they are looking up words and learning them 'in context' when reading a book, but is that what is really happening? When you read a book in your native language, depending on the size of your vocabulary, you will either look up a word you don't know, perhaps guesstimate its actual meaning from context, or, more likely, just skip over it because it makes little difference to understanding what you are reading.

It's pretty much established that reading lists of words and looking up words in dictionaries does not fix them into your mind (you'll remember at best a handful). What fixes words in your mind is hearing and using them again and again and again in lots of situations. We remember what we need to remember, so if you're e.g. a nurse learning Spanish for the workplace you'll likely remember máquina de rayos X, whereas the average person - unless he breaks his arm - probably won't. For more passive words, they also reflect exposure to culture and very often your circle of interests.

What's the point of setting a numbered goal? You know what you know as your understanding broadens. Words learned link up with related words which you don't need to actively learn. Many sink into dormancy. I used to love those Grafisk Forlag Easy Readers and I have a pile of them in a few languages. The level 'D' readers only bring you up to about 3000 words, and though a person with an eye on something more like 8-10,000 might consider this "low", just open the D books and see how rapidly you can read them in your current target language at an A2 (max early B1) level.

There's a lot of stuff out there for enthusiasts about 'research' and methods for classifying words and word groups for learning, but much of it has little direct practical application for the average language learner working alone on self-study. It obviously gets used by course designers and editors/compilers of graded books. Like someone said in the first few pages, the average daily active vocabulary of people (even native speakers) is not enormous, and the main value of a larger dormant vocabulary is instant recognition when these words make their appearances. Anyone gunning for 10,000 words in a year of study is living in cloud cuckoo land.
Pedantry is properly the over-rating of any kind of knowledge we pretend to.
- Jonathan Swift

