CEFR Levels and vocabulary size

outcast
Blue Belt
Posts: 585
Joined: Sat Dec 05, 2015 3:41 pm
Location: Florida, USA
Languages: ~
FLUENCY
Native: ENglish, ESpañol
Advanced: -
High Basic: DEutsch (rust), FRançais (rust), ZH中文
Basic: -
~
ACQUIRING
Formally: KO한국말, ITaliano, HI हिन्दी
Dabbling: HRvatski, GW粵語
Dormant: POrtuguês
~
Plan to learn: I BETTER NOT GO HERE FOR NOW
~
x 679

Postby outcast » Mon Aug 15, 2016 10:24 am

I have seen many websites copy the CEFR descriptions word for word, almost always keeping the official wording. Once in a while I run into a site that has added something to the level descriptions or, more rarely, changed the wording. The result is that, while the descriptions are very detailed about the tasks one must be able to perform to be certified at each level (and indirectly hint at the grammar structures needed), I have never seen an actual CEFR description that lists the vocabulary size required to more or less attain each specific milestone.

A few weeks ago I wrote this in the "Intermediate level" thread:

"Most language tests agree: a B2 test requires you know between 4,000 and 5,000 terms to understand what is on the test. A C1 test between 6000 and 9000. What most people think of as "good" speakers of a foreign language, based on anecdotal evidence I have read over time, have an average vocabulary in their L2 of about 8,000 to 10,000 words, which is still half to 1/3 of native speakers. I would assume a C2 test would require at least 15,000 words to get through without major challenges. Let's remember even natives have problems passing such tests."

I forgot to mention at the time that these figures were based on a combination of sources: anecdotal evidence about the level of "fluent" L2 speakers, two tests done on this subject a few years back, my own extrapolation from those studies and from the average vocabulary size that gets thrown around for natives, and the vocabulary lists of tests like the HSK, the Chinese proficiency test for foreigners. Ironically, while I personally used the HSK levels and their vocabulary sizes as a landmark of sorts to estimate CEFR vocabulary size per level, the HSK itself is criticized by European teachers of Chinese as inaccurate about the language level each HSK level purportedly certifies. Basically, they claim the HSK grossly overestimates, since Hanban (the designers of the HSK) makes a one-to-one correspondence between HSK 1 through 6 and the six CEFR levels A1 to C2.

As an advanced student of Mandarin Chinese, I basically concur with the European teachers of Chinese that the HSK 5 is definitely not C1, but rather a mid B1. And the HSK 6 is certainly high B2, maybe borderline C1 (I say this because I have taken mock tests of the HSK 6 and they include some rather obvious topic jargon: from various diseases like Lou Gehrig's, to neutron stars in articles about astronomy, to various financial terms in interviews with a CEO). The interesting thing is that when the German association of teachers of Chinese, the Fachverband Chinesisch, put out its letter of refutation, it actually said the following:

The Fachverband welcomes the new HSK Chinese Proficiency Test that was published by the People’s Republic of China (PRC) earlier this year, especially insofar as it certifies elementary knowledge of Chinese for beginners with a vocabulary of 150 to 300 lexical units on the basis of the Hanyu Pinyin transcription system. It thus serves as a valuable motivator for students of Chinese.

However, in the interests of a proper and realistic assessment of Chinese language proficiency, we at the Fachverband Chinesisch, after examining the documents, consider it our duty to categorically deny the linking between the new HSK levels, as set out in the official HSK documents, and those of the Common European Framework of Reference for Languages (CEFR):

At present, the vocabulary size required for level A1 in all foreign languages is about 500 lexical units, for A2 about 1,000, for level B1 about 2,000. The new HSK suggests that just one-third of this vocabulary size would be needed to achieve the same levels of proficiency.
The official data given by the Hanban envisage that level B2 (HSK 4) will be reached after just 2 years of learning with 2-4 hours of lessons per week (160-320 hours). These figures are out of the question, even for European languages. In this context we would like to refer once again to the resolution taken by the Fachverband in 2005, according to which we estimated that between 1,200 and 1,600 hours of instruction (+ private study time) are required to attain oral and written proficiency in Chinese that is comparable to level B2.


http://www.fachverband-chinesisch.de/si ... ungHSK.pdf

I slightly disagree (how dare I) with their view that HSK 6 is strictly a B2 level, since there is some fairly technical material in there, and I have the feeling that you need to know more than the 5,000 "lexical units" (see below) to really feel comfortable with the listening and reading. I would estimate you need north of 6,000 words to pass the test.

If you extrapolate those figures, you would get something like this (I may be wrong, of course):

A1 = 500
A2 = 1,000
B1 = 2,000
B2 = 4,000
C1 = 8,000
C2 = 16,000
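
The list above amounts to a doubling rule anchored at the Fachverband's A1 figure of 500. Here is a minimal Python sketch of that extrapolation, assuming each level needs twice the vocabulary of the one before it. The function name `doubling_estimates` is made up for illustration, and the B2 to C2 values are extrapolations, not official CEFR figures.

```python
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def doubling_estimates(a1_size=500):
    """Return {level: estimated vocabulary size}, doubling at each level.

    Assumption: the A1 -> A2 -> B1 doubling pattern keeps going up to C2.
    """
    return {level: a1_size * 2 ** i for i, level in enumerate(CEFR_LEVELS)}


estimates = doubling_estimates()
for level, size in estimates.items():
    print(f"{level} = {size:,}")
```

Changing `a1_size` shows how sensitive the upper levels are to the starting point: every extra 100 words assumed at A1 adds 3,200 at C2.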

I am not trying to claim I am a genius; I may just have made an educated guess, but these figures match very closely what I wrote in my quote above: 4,000 words for B2, which is the level at which most people feel some "freedom of expression" in a foreign language, and which is twice the famous figure of 2,000 words needed to communicate in a basic sense in most languages. They also match what I have read about what people and experts consider "fluent foreign speakers" of an L2, who have a vocabulary of "just 8,000" words, which matches the C1 estimate above. The C2 vocabulary level is just my guess; it could be more. From what I have read, most native speakers who have finished high school have vocabularies larger than 16,000 words, so in theory they should be able to pass C2 exams. But then it also depends on their test-taking skills and other factors. I would assume university graduates could pass a C2 exam, but who knows.

Finally, back to "lexical units". Is this a lemma? Or a word? If it is a word, is it that a lexical unit is "one definition", or multiple ones of the same word? That is my major doubt that could throw all this yapping I am doing about vocabulary size and language levels for a loop.

Anyways, I am just sharing this small info and my thoughts with anyone who is interested in this kind of stuff.
14 x
"I can speak wonderfully and clearly in zero languages, and can also fluently embarrass myself in half a dozen others."

The End of Language learning: 10 / 10000

aokoye
Black Belt - 1st Dan
Posts: 1818
Joined: Sat Jul 18, 2015 6:14 pm
Location: Portland, OR
Languages: English (N), German (~C1), French (Intermediate), Japanese (N4), Swedish (beginner), Dutch (A2)
Language Log: https://forum.language-learners.org/vie ... 15&t=19262
x 3310

Re: CEFR Levels and vocabulary size

Postby aokoye » Mon Aug 15, 2016 1:58 pm

outcast wrote:Finally, back to "lexical units". Is this a lemma? Or a word? If it is a word, is it that a lexical unit is "one definition", or multiple ones of the same word? That is my major doubt that could throw all this yapping I am doing about vocabulary size and language levels for a loop.


That is really interesting. My previous attempts at finding word counts for CEFR levels have been pretty unsuccessful, but it appears at least one textbook company (Cornelsen) thinks that 5,000 words for B2 is accurate. Once it's a more logical time for me to be doing any sort of research (it's not even 7am here) I'll do some Googling (and Google Scholaring? - yay for verbs being an open class). In terms of lexical unit, I am almost positive they mean lemmas.
0 x
Preferred gender pronouns: Masculine

s_allard
Blue Belt
Posts: 984
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2361

Re: CEFR Levels and vocabulary size

Postby s_allard » Mon Aug 15, 2016 2:07 pm

As the old-timers from HTLAL know, this topic of vocabulary size and the CEFR levels is right up my alley and is something I have written about extensively. I don't have the time right now to make a long post but I will throw a few ideas out for discussion.
First, the number one problem is how to define a word. The researcher best known for his work on vocabulary is Paul Nation, and he uses the idea of a word family, which is basically a sort of minimal unit of functional form and meaning from which various word forms can be derived. These various forms are what we see on a page, and each occurrence can be called a token. So a page may contain 400 tokens or words (as counted by MS-Word) but only 200 word families. That 50,000-word novel may contain only 5,000 word families.
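
To make the token versus word family distinction concrete, here is a toy Python sketch. The sentence and the `FAMILY_OF` mapping are invented for illustration; a real count would need a proper word-family list rather than a hand-made dictionary.

```python
# Hypothetical, hand-made mapping from inflected form to base word.
FAMILY_OF = {
    "watches": "watch",
    "watched": "watch",
    "watching": "watch",
    "birds": "bird",
}

text = "she watches birds and he watched a bird while watching the birds"

tokens = text.split()                              # every running word
types = set(tokens)                                # distinct word forms
families = {FAMILY_OF.get(t, t) for t in tokens}   # distinct word families

print(len(tokens), len(types), len(families))
```

The same sentence gives three different "word counts" depending on the unit chosen, which is why vocabulary size figures are hard to compare unless the unit is stated.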

One simple approach is to take the headwords of a good dictionary as the basic countable units.

There are all sorts of complications here that I can barely touch upon. How important is meaning when we consider that often the most common words can have very many different meanings? What about languages like German and to a lesser extent English where one can create compound words by stacking units? What about English phrasal verbs? What about idiomatic expressions?

Second, it is very important to understand how these estimates of the vocabulary size necessary for a given task are calculated. The key concept here is coverage, or the proportion of words in a document that you know. So, to be able to read contemporary literature aimed at young people with 99% coverage, you may need a vocabulary of 10,000 word families. But wait a minute: it is very important to understand that this does not mean that every book in the sample contains 10,000 words. Quite the contrary, it means that the sum total of all the different words of all the different books is 10,000. Any given book will contain only, let's say, 3,000 different words, but every book will have some unique words. What this means, of course, is that as the sample size increases, so does the total number of word families.
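
A small Python sketch of both ideas, coverage of a single text and the growth of the combined vocabulary as the sample gets bigger. The three one-line "books" and the `known` set are toy data invented for illustration, and plain word forms stand in for word families.

```python
def coverage(text_tokens, known_words):
    """Fraction of running words (tokens) in a text that the reader knows."""
    if not text_tokens:
        return 0.0
    known = sum(1 for tok in text_tokens if tok in known_words)
    return known / len(text_tokens)


# Toy "books": each uses only a few distinct words.
books = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "a bird sang in the tree".split(),
]

# Hypothetical reader vocabulary.
known = {"the", "cat", "sat", "on", "mat", "dog"}
per_book = [coverage(book, known) for book in books]

# Distinct words seen so far as more books are added to the sample.
seen = set()
cumulative = []
for book in books:
    seen.update(book)
    cumulative.append(len(seen))

print(per_book)
print(cumulative)
```

No single book here has more than 6 distinct words, yet the combined total keeps climbing with each book added, which is the effect described above.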

Third, and this is my favourite topic, how many words or word families do you need to pass a C2-level test? I'll concentrate on the speaking test, but the reasoning is similar for the other receptive and productive skills. You may have a large productive vocabulary, but how many actual words will you be using in that 20-minute discussion with the examiner? You need a somewhat large vocabulary because you do not know what the subject of the discussion will be, but in fact the number of word families that will come out of your mouth during the exam is quite limited. Assuming that the examiner speaks 25% of the time and that you speak English at a rate of 150 words a minute, in 15 minutes of actual talking you will emit about 2,250 word tokens. Given that the spoken language is quite repetitive, and assuming a ratio of 10 tokens to 1 word family, we have a rough estimate of 225 different words used during the exam. Let's be generous and say 350 different words.
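
The back-of-the-envelope arithmetic can be written out as a short Python sketch. All four parameters are the assumptions stated in the paragraph above (20-minute exam, examiner talking 25% of the time, 150 words a minute, 10 tokens per word family); the function name `exam_word_estimate` is made up for illustration.

```python
def exam_word_estimate(exam_minutes=20, examiner_share=0.25,
                       words_per_minute=150, tokens_per_family=10):
    """Rough estimate of tokens and word families a candidate produces."""
    candidate_minutes = exam_minutes * (1 - examiner_share)
    tokens = candidate_minutes * words_per_minute
    families = tokens / tokens_per_family
    return candidate_minutes, tokens, families


minutes, tokens, families = exam_word_estimate()
print(minutes, tokens, families)
```

Even with a much less repetitive ratio, say `tokens_per_family=5`, the estimate only rises to 450 word families, still a small fraction of a 10,000-word store.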

We may quibble about the math, but the plain fact is that in the exam you are not going to use 1000 different words from your store of 10,000 words. You will only use a small fraction of the words you know. Which ones? Voilà la question.

Fourth, the question of the number of words you use during the test is actually quite irrelevant. What is important is how you use the words. You will not be quizzed on vocabulary size. The examiner will not count the words you know. They want to see how you can control what comes out of your mouth. Could you ace the speaking test with only 150 word families? I think you probably could if you can show great dexterity with the words.

You have to understand what the examiner is looking for. They're looking for fluent well-crafted phrases with precise vocabulary and grammar that demonstrate the ability to make nuances and convey subtle meaning. Think of how a C2 candidate and B2 candidate would answer a simple question like: What is your favourite hobby and why?

In conclusion, I believe that vocabulary size is important in the sense that more is better but I think that the ability to use one's vocabulary is far more important.
Last edited by s_allard on Mon Aug 15, 2016 2:54 pm, edited 3 times in total.
11 x

Re: CEFR Levels and vocabulary size

Postby aokoye » Mon Aug 15, 2016 2:15 pm

S_allard - I've read your posts before (or at least skimmed them) and am generally left with this question - what do you mean by word family? Do you mean headword, or do you mean something else, given that there are multiple definitions of "word family"?
0 x

Re: CEFR Levels and vocabulary size

Postby s_allard » Mon Aug 15, 2016 2:33 pm

aokoye wrote:S_allard - I've read your posts before (or at least skimmed them) and am generally left with this question - what do you mean by word family? Do you mean headword, or do you mean something else, given that there are multiple definitions of "word family"?

Here is a pretty good definition in:
http://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/1993-Bauer-Word-families.pdf

From the point of view of reading, a word family consists of a base word and all its derived and inflected forms that can be understood by a learner without having to learn each form separately. So, watch, watches, watched, and watching may all be members of the same word family for a learner with a command of the inflectional suffixes of English. As a learner's knowledge of affixation develops, the size of the word family increases. The important principle behind the idea of a word family is that once the base word or even a derived word is known, the recognition of other members of the family requires little or no extra effort. Clearly, the meaning of the base in the derived word must be closely related to the meaning of the base when it stands alone or occurs in other derived forms, for example, hard and hardly would not be members of the same word family.
1 x

Re: CEFR Levels and vocabulary size

Postby aokoye » Mon Aug 15, 2016 3:10 pm

Great - that's more or less what I thought you meant but I wasn't sure.
0 x

aloysius
White Belt
Posts: 35
Joined: Mon Jul 20, 2015 8:49 pm
Location: Stockholm
Languages: Swedish (N), English, German.
Studying: French, Russian, Italian, Spanish.
x 43

Re: CEFR Levels and vocabulary size

Postby aloysius » Mon Aug 15, 2016 3:12 pm

Those are the exact numbers I tend to use.

A1 = 500
A2 = 1,000
B1 = 2,000
B2 = 4,000
C1 = 8,000
C2 = 16,000


I know I've seen them before over at HTLAL. Maybe in one of these very long threads:

http://how-to-learn-any-language.com/forum/forum_posts.asp?TID=35075&PN=25&TPN=1

http://how-to-learn-any-language.com/forum/forum_posts.asp?TID=39213&PN=8&TPN=1
1 x

Hork
White Belt
Posts: 11
Joined: Sun Jan 10, 2016 6:57 pm
x 8

Re: CEFR Levels and vocabulary size

Postby Hork » Mon Aug 15, 2016 4:39 pm

Here's a vocabulary vs CEFR level description for German: http://www.bellingua.ch/en/levels/intensive-course-morning
2 x

jeff_lindqvist
Black Belt - 3rd Dan
Posts: 3153
Joined: Sun Aug 16, 2015 9:52 pm
Languages: sv, en
de, es
ga, eo
---
fi, yue, ro, tp, cy, kw, pt, sk
Language Log: viewtopic.php?f=15&t=2773
x 10537

Re: CEFR Levels and vocabulary size

Postby jeff_lindqvist » Mon Aug 15, 2016 5:17 pm

Interesting chart. How are we supposed to interpret the numbers in the weeks column? 9 weeks to A1 - no problem. Another 10 weeks to A2 - fine. 70 weeks in total to GZ C2?
0 x
Leabhair/Greannáin léite as Gaeilge: 9 / 18
Ar an seastán oíche: Oileán an Órchiste
Duolingo - finished trees: sp/ga/de/fr/pt/it
Finnish with extra pain : 100 / 100


Cavesa
Black Belt - 4th Dan
Posts: 4974
Joined: Mon Jul 20, 2015 9:46 am
Languages: Czech (N), French (C2) English (C1), Italian (C1), Spanish, German (C1)
x 17637

Re: CEFR Levels and vocabulary size

Postby Cavesa » Mon Aug 15, 2016 6:41 pm

I'd be careful about automatically transferring HSK lists to CEFR levels. From what I've read, many Mandarin learners and teachers do not agree that the new HSK levels 1-6 equal the individual CEFR levels. As far as I've read, the consensus seemed to be "the highest HSK=B2 and the new system has been dumbed down to become more popular and make more money". I am not saying HSK1 is B2, I cannot tell, but I just think the JLPT and HSK lists are of limited use when talking about the levels of European languages.

The table by Aloysius is not precise, in my opinion. My Lernwortschatz Deutsch book, by Hueber, contains 4000 words and the annotation "Léxico correspondiente al nuevo Zertifikat Deutsch" ("Vocabulary corresponding to the new Zertifikat Deutsch"), and ZD is B1. Hork's source seems to agree, and goes up to C2=10000 words. I think German is a good example, as it is the only language in which I've found full lists of "this is what you're expected to know at this level", usually for the B1 exam. The Germans seem much less afraid to quantify such things :-) In French and Spanish, workbooks like Vocabulaire progressif, or normal courses sorted by level, are supposed to give you all the vocab (of course not for the high levels, it gets much more complicated there), but counting would require someone with more free time than I've got, as all the series have huge overlaps in their wordlists and, on the other hand, most books do not include everything used in the wordlists.

Perhaps someone who has worked more with the Cervantes level guide could bring other (and well-founded) numbers to the discussion.
0 x

