CEFR Levels and vocabulary size

emk
Black Belt - 1st Dan
Posts: 1692
Joined: Sat Jul 18, 2015 12:07 pm
Location: Vermont, USA
Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish.
Language Log: viewtopic.php?f=15&t=723
x 6622

Re: CEFR Levels and vocabulary size

Postby emk » Fri Aug 19, 2016 9:32 am

Just a few days before I sat my B2 exam, I took a 5,000-word frequency dictionary of French and picked several pages (one near the front, one near the end, a couple in between, etc.). I counted how many words I could roughly define on each page, and I still knew a large fraction of the words between 4,900 and 5,000. So if the book had included words 5,001–10,000, I would have known another good chunk of words from those pages as well. Based on how rapidly my performance was falling off, I guesstimated that I knew about 7,000 words in French at the time.
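For anyone who wants to repeat this kind of page-sampling estimate, here is a minimal sketch in Python. It assumes you have tested a sample of words from each frequency band and scales the known fraction up to the size of the band. The function name, the band boundaries and all the counts are hypothetical, not my actual numbers, and the sketch only covers the ranks the dictionary contains; anything past the last band still has to be guessed from how fast the known fraction is falling.

```python
def estimate_known_words(samples):
    """Estimate known words from per-band samples.

    samples: list of (band_start, band_end, known, tested) tuples, where
    `tested` words were sampled from ranks band_start..band_end and
    `known` of them could be roughly defined.
    """
    total = 0.0
    for band_start, band_end, known, tested in samples:
        band_size = band_end - band_start + 1
        # Scale the known fraction of the sample up to the whole band.
        total += band_size * known / tested
    return round(total)


# Hypothetical page counts, for illustration only.
samples = [
    (1, 1000, 50, 50),     # a page near the front
    (1001, 3000, 45, 50),  # a page in the middle
    (3001, 5000, 35, 50),  # a page near the end
]
print(estimate_known_words(samples))  # 4200
```

The estimate is only as good as the sample: a handful of pages gives a rough order of magnitude, not a precise count.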

(This was before the Super Challenge and the several million words of reading I did, and before thousands of sentence cards with Anki. The last time I tested my French vocab a year ago, using a test for native speakers, I scored above 15,000, but the test tended to count related forms separately, which makes a big difference past 10,000. 10,000–15,000 words is supposedly pretty typical after a year or two of serious immersion. For comparison purposes, this site estimates I know about 34,000 words in English.)

Anyway, with my vocab on the rough order of 7,000 words, I wound up guessing the meaning of quite a few words on the DELF B2 exam, and this was sometimes the critical factor in answering a question. Nonetheless, I passed comfortably (78/100, with 50/100 as the minimum passing grade). The DELF B2 exam includes lots of "highbrow" newspaper articles, including subjects like poaching in Africa, and the oral exam asked me to defend an opinion on congestion charging for cars in Paris.

So I'd imagine that sitting the DELF B2 with a passive vocabulary of much less than 5,000 words might be fairly stressful. And it's not like you could publish an official word list—the exam designers basically just pick interesting articles from real newspapers, and the vocabulary used is whatever was in the original articles. You're expected to cope with it.
8 x

s_allard
Blue Belt
Posts: 985
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2370


Postby s_allard » Fri Aug 19, 2016 12:55 pm

As usual, emk's post is interesting. I shall ignore some nasty comments from the self-styled thread police. When I see any kind of figures for vocabulary size, I always ask what the unit of measurement is. Are we talking about tokens, types, lemmas or word families? For example, the excellent Frequency Dictionary of French by Lonsdale and Le Bras contains lemmas, not word families.
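To show how much the unit changes the number, here is a small sketch in Python that counts the same sentence three ways. The lemma table is a tiny hand-made mapping invented for this example, not taken from any dictionary, and a real count would need a proper lemmatizer. Word families would group the forms further still and are not shown.

```python
import re

# Tiny hand-made lemma table, for illustration only (hypothetical).
LEMMAS = {"fais": "faire", "fait": "faire", "faisons": "faire"}


def count_units(text, lemmas):
    """Return (tokens, types, lemmas) counts for a text."""
    tokens = re.findall(r"\w+", text.lower())  # every running word
    types = set(tokens)                        # distinct word forms
    lemma_set = {lemmas.get(t, t) for t in types}  # forms grouped by headword
    return len(tokens), len(types), len(lemma_set)


text = "Je fais ce que tu fais, il fait ce que nous faisons."
print(count_units(text, LEMMAS))  # (12, 9, 7)
```

The same twelve running words come out as nine types but only seven lemmas, which is why two vocabulary figures cannot be compared unless they use the same unit.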

In passing, I should point out that this dictionary does present words in thematic lists, a feature I find very useful, as I alluded to in my previous post.

From my cursory perusal of the pdf, I didn't see any mention of idioms and figurative usage, but I may be wrong.

Another major issue is the notion of knowing a word. Is that the ability to recognize the word? So if I see "Je me suis fait à l'idée" I recognize the verb "faire". Do I know the meaning in the present context? - receptive knowledge. And, lastly, do I know how to use faire like this and do I know how to use faire in its many uses? - productive knowledge.

Although emk's post does not address this issue directly, I wonder what differences in vocabulary size we find at the C level relative to the B level. Given that, as emk points out, the examiners pick articles from real newspapers, how would the articles in the C-level tests differ? Would they have more vocabulary? Or would the vocabulary usage be more sophisticated? For example, wouldn't one be expected to know more idiomatic usage, word play and metaphoric or figurative uses?

I believe that there probably isn't much difference in raw - i.e. recognition - vocabulary size between a B2 and a C2 candidate. But there is a world of difference in receptive and especially productive knowledge.
2 x

James29
Blue Belt
Posts: 758
Joined: Mon Jul 20, 2015 11:51 am
Languages: English (Native)
Spanish (C1-ish)
French (Beginner)
Portuguese (Thinking about it)
x 1741


Postby James29 » Fri Aug 19, 2016 2:16 pm

s_allard wrote:
Another major issue is the notion of knowing a word. Is that the ability to recognize the word? So if I see "Je me suis fait à l'idée" I recognize the verb "faire". Do I know the meaning in the present context? - receptive knowledge. And, lastly, do I know how to use faire like this and do I know how to use faire in its many uses? - productive knowledge.



This is extremely important and I'm glad you raised it. What does "know" mean when everyone talks about vocabulary "size" and "knowing" 98% of the words in the text? Where does something cross the line from unknown to known? Does that line change places depending on the context of the discussion? If you know the most common definition do you "know" the word? What if you know the three or four most common definitions but do not know a number of idiomatic expressions with the word?

I've read things in my target language and "understood" "100%" of the words. Then, I re-read the same things and "understood" them on a much deeper level. This happens in native languages too.

"Understanding" depends on so many more things than what people are talking about here. The level of understanding depends on the person's intelligence, life experiences and background knowledge. It depends on more than just the text and words. Any native English speaker can read a theory of stock investing and "understand" it perfectly fine. However, they will never "understand" it as well as Warren Buffett will "understand" it. I recently read a children's civil war book to a smart fifth grader. We both "understood" "100%" of the words, but I certainly "understood" the book at a much deeper level because I could draw on my knowledge of history, warfare, human nature, etc.

People are so interested in counting and quantifying things in ways that they really should not be quantified. "Knowledge" is not black and white. It is not something that can be counted as such.

I'd be really interested in s_allard's (and anyone else's) thoughts on this subject. When do you define a word as "known" for counting things like the 300 words and for "understanding" "98%" of the text, etc.?
3 x

Aozora
Orange Belt
Posts: 234
Joined: Mon Feb 22, 2016 3:46 pm
Location: Canada
Languages: English(N), Japanese (N2)
Language Log: https://forum.language-learners.org/vie ... 15&t=17971
x 203


Postby Aozora » Fri Aug 19, 2016 2:50 pm

James29 wrote:
I'd be really interested in s_allard's (and anyone else's) thoughts on this subject. When do you define a word as "known" for counting things like the 300 words and for "understanding" "98%" of the text, etc.

I'll give my view... When I say I "know" 300 words, I mean I understand them on at least one level, and could recognize them in reading. It may be that I understand those words in one context but not in another. It will take longer for me to understand them more thoroughly, but that's okay; I'm still learning. For "understanding 98% of a text"... I think most people mean they know 98% of the words on the page. I would distinguish understanding the meaning of the text from the number of known vs. unknown words. When it comes to "understanding", things are not clear-cut, especially since we are self-reporting our understanding and may have understood less than we think we did. It's possible to "know" every word on a page but still have poor comprehension. On the other hand, I may miss a word here or there but feel like I understood very well. If I'm talking about my own understanding of a text, I wouldn't put a percentage on it.
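For what it's worth, the "98% of the words on the page" figure is easy to compute, which is probably part of its appeal. Here is a rough sketch in Python, assuming "known" simply means the word form appears in a list you supply. The function name, the sentence and the word list are all made up for the example.

```python
import re


def token_coverage(text, known_words):
    """Fraction of running words in `text` found in `known_words`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)


# Hypothetical text and known-word list, for illustration only.
text = "The cat sat on the mat near the bonnet"
known_words = {"the", "cat", "sat", "on", "mat", "near"}
print(f"{token_coverage(text, known_words):.0%}")  # 89%
```

This measures recognition of word forms only. It says nothing about whether the meaning of the text was understood, which is the part I wouldn't put a number on.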
2 x
Super Challenge Books: 14 / 100
Super Challenge Films: 63 / 100

Cavesa
Black Belt - 4th Dan
Posts: 4978
Joined: Mon Jul 20, 2015 9:46 am
Languages: Czech (N), French (C2) English (C1), Italian (C1), Spanish, German (C1)
x 17680


Postby Cavesa » Fri Aug 19, 2016 4:17 pm

We could spend decades endlessly discussing "what does it mean to know a word" and such things in every thread, but we have already discussed the issue many times and came to quite clear results. A rough definition we more or less agreed on in the past was "can understand" for passive knowledge and "can correctly use" for active knowledge. Those are terms we have been using in this thread as well. Of course not everyone will agree on how many uses of the word you need to know before "claiming to know it" and such things, and that is ok.

I am not that fond of the notion of all learners being naive and overestimating their level of comprehension. The numbers were always said to be estimates, and all that "understanding words", "understanding text" and similar matters have been discussed ad nauseam. Couldn't we stick to the topic for once?

I think there are only a few possible outcomes of this endless "we know nothing" discussion, and I find most of them useless for language learning.

So, a question back on topic. I am considering using one of my languages as a guinea pig for "vocab counting and its usefulness/uselessness". Would anyone else like to do such an experiment with me, so that we could plan something we could compare and draw some conclusions from?
4 x

s_allard
Blue Belt
Posts: 985
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2370


Postby s_allard » Fri Aug 19, 2016 7:13 pm

Since James29 so kindly asked, I'd like to look ever so briefly at this question of the various ways of knowing words. I'll take three examples. When I read British newspapers I always see mention of "high street". I certainly know the words "high" and "street". As a matter of fact, I use them regularly. For the North American me, a high street is a street in the higher part of town, as opposed to a low street. I recognize "high street", but as British readers here know, the term has a number of uses in British English that have nothing to do with the height of the street. I had to look it up in the dictionary to find out what it meant. That's receptive knowledge. As for productive knowledge, I have never used this term in my life, although I suppose I could.

In another British novel I was reading I came across the word "deerstalker". I figured out from the context that it was a sort of hat. Was it a hat for stalking deer? I had to finally look it up. Now I know what a deerstalker looks like. Nothing to do with stalking deer. Again, I have never used this word in my life.

Finally, a bit of grammar. French uses extensively something that is improperly called the reflexive verb. Technically it should be called the pronominal verb construction. Common examples are se laver, se réveiller, s'asseoir, etc. The interesting thing is that this same construction is used in five distinct ways. Of these five, a couple are very sophisticated uses that even native speakers do not often use. So you can see something you recognize, i.e. there's the "reflexive" verb "se faire" but it may be used in a special way, e.g. je me suis fait voler ma voiture. The B2 and C2 speakers can have exactly the same verbs but the C2 speaker will call on the more complex uses.

The point of all this is, as other posters have pointed out, that when people say they understand something, it's very difficult to know exactly what they mean because it is all self-reporting. The real proof of the pudding is the ability to actually use the words that one claims to understand. As any teacher can attest, students will say "I understand everything", but when the time comes to actually use the words, it's a different story.

I understand that people like to count things like words. It sounds tangible. If I say I know 10,000 words in French, it sounds better than knowing only 5,000 words. But my interest is what you can do with those words. The essential idea of the CEFR system is the "can-do" statement. What can you do in the language? Don't tell me how many words you know, show me what you can do in the language. Can you write a professional letter with few mistakes? Can you introduce a table of panelists at a conference? Can you debate a topic with other speakers? Can you tell a joke in the target language? Nowhere does the CEFR speak of vocabulary or grammar size. It is assumed that you must have the necessary vocabulary and grammar. How much is basically up to you and how well you can use what you've got.
4 x

tastyonions
Black Belt - 1st Dan
Posts: 1609
Joined: Sat Jul 18, 2015 5:39 pm
Location: Dallas, TX
Languages: EN (N), FR, ES, DE, IT, PT, NL, EL
x 3999


Postby tastyonions » Fri Aug 19, 2016 7:27 pm

Translation also works well as a check of understanding. A translation of "elle s'est fait arnaquer" as "she made herself conned" or "she conned herself" clearly indicates a defect in comprehension.
0 x

Ezy Ryder
Orange Belt
Posts: 146
Joined: Tue Aug 25, 2015 8:22 am
Languages: PL (rusting Native)
EN (Advanced)
中文 (Lower Intermediate)
日本語 (Beginner, not studying)
台語 (Dabbling)
Language Log: viewtopic.php?t=1164
x 214


Postby Ezy Ryder » Fri Aug 19, 2016 7:29 pm

s_allard wrote:The real proof of the pudding is ability to actually use the words that one claims to understand. As any teacher can attest, students will say I understand everything but when time comes to actually use the words, it's a different story.

Productive and receptive vocabulary don't really seem to form a perfect gradation. It's not unheard of, for example, to be able to use a word but not understand it when others use it (even in the exact same way). Or to be able to write a Chinese character by hand, but not be able to "read" it in a book.

s_allard wrote:I understand that people like to count things like words. It sounds tangible. If I say I know 10,000 words in French, it sounds better than knowing only 5,000 words. But my interest is what you can do with those words. The essential idea of the CEFR system is the "can-do" statement. What can you do in the language? Don't tell me how many words you know, show me what you can do in the language. Can you write a professional letter with few mistakes? Can you introduce a table of panelists at a conference? Can you debate a topic with other speakers? Can you tell a joke in the target language? Nowhere does the CEFR speak of vocabulary or grammar size. It is assumed that you must have the necessary vocabulary and grammar. How much is basically up to you and how well you can use what you've got.

I think the whole idea is to estimate how much you need to learn, to be able to do those things... I can only speak for myself, but my ultimate goal in Mandarin is not to learn/acquire 20-25k lemmata, but rather to get to a level comparable to (or preferably better than) my English. Some evidence suggests 20-25k could be a decent, ball-park estimate for how much I need to learn/acquire for that, however it's more of a (not even the) means, than an end.
0 x
阿波
10k SRS Challenge: 1250 / 10000
4,808 漢字 (handwriting): 3750 / 4808

s_allard
Blue Belt
Posts: 985
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2370


Postby s_allard » Fri Aug 19, 2016 8:10 pm

Ezy Ryder wrote:...
I think the whole idea is to estimate how much you need to learn, to be able to do those things... I can only speak for myself, but my ultimate goal in Mandarin is not to learn/acquire 20-25k lemmata, but rather to get to a level comparable to (or preferably better than) my English. Some evidence suggests 20-25k could be a decent, ball-park estimate for how much I need to learn/acquire for that, however it's more of a (not even the) means, than an end.

It is certainly true that much of the thinking behind this idea of vocabulary size per CEFR level is the belief that we can determine goals or requirements in order to reach a given CEFR level, i.e. I need to learn 5,000 words to reach B2. In the same way, one could say that I have to read an entire grammar book to get all the grammar I need.

I actually think it's a good idea to systematically learn vocabulary, but we must always keep in mind, as was pointed out, that it is a means, not an end. When I say that counting is a waste of time, what I'm saying is a) methodologically it is in fact very difficult and b) it is irrelevant because you should keep learning until you decide that you have had enough for your proficiency goals, regardless of any figures.
2 x

Ezy Ryder
Orange Belt
Posts: 146
Joined: Tue Aug 25, 2015 8:22 am
Languages: PL (rusting Native)
EN (Advanced)
中文 (Lower Intermediate)
日本語 (Beginner, not studying)
台語 (Dabbling)
Language Log: viewtopic.php?t=1164
x 214


Postby Ezy Ryder » Fri Aug 19, 2016 8:28 pm

s_allard wrote:When I say that counting is a waste of time, what I'm saying is a) methodologically it is in fact very difficult and b) it is irrelevant because you should keep learning until you decide that you have had enough for your proficiency goals, regardless of any figures.

a) For this particular purpose, I think being consistent is enough. Think of it this way: 1 km isn't equal to 1 mile, and the difference between 500 km and 500 miles is even larger. But as long as a group is consistent about which unit they use, it shouldn't really cause confusion. In other words, as long as you specify clearly enough what you mean by "know" and "word", it should do the trick.
b) I don't know about you, but if I were to run a long distance, I'd find "halfway there" a much better motivation to keep running than just "not yet there". Ideally we shouldn't need to have an idea of how much more we have to acquire/learn, but it's better to think about what's most likely to get you to your destination than about what ideally should get you there.
3 x
阿波
10k SRS Challenge: 1250 / 10000
4,808 漢字 (handwriting): 3750 / 4808

