CEFR Levels and vocabulary size

Cavesa
Black Belt - 4th Dan
Posts: 4974
Joined: Mon Jul 20, 2015 9:46 am
Languages: Czech (N), French (C2), English (C1), Italian (C1), Spanish, German (C1)

Re: CEFR Levels and vocabulary size

Postby Cavesa » Fri Feb 08, 2019 12:27 pm

Excellent post by Ser. I think it is hard to draw conclusions without knowing more about the methodology. Also, let's not forget that English has the huge phenomenon of relatively few words being used for tons of stuff (phrasal verbs are the nightmare of most beginner/intermediate learners), while French uses distinct individual words, or a word with a prefix or suffix, and German builds many words out of other ones.

So yes, these are my amateur observations, but I think they show that, as always, the methodology could make a ton of difference. I am also careful about this because the amount and complexity of vocabulary is a rather explosive argument. Basically any time in history when one group wanted to discredit another group based on its language (the closest example to me is Czech vs German in the A-U Empire), the "they have much less vocabulary" argument almost always shows up! It comes down to "they should learn the better language instead of their own, for their own good, so as not to be limited by it". That's why I would be very careful about such statements.

Inst wrote: TBH, it's just my bad memory. What I do recall was a comparison between the vocabulary requirements for C2 French and C2 English (TOEFL), with the latter being significantly larger.


I totally believe that, but I'd point to a different factor that takes part of the "blame". I am not a native English speaker, and I do not hesitate to say that English is taught much more seriously than other languages in Europe. It is not just about the number of classes in mainstream schools. The attitudes of "well, you are not expected to learn French anyway" and "English is vital" are so prevalent that they may skew even the expectations placed on C2 learners.

Based on my individual experience (and I haven't found any reliable sources on that), I am convinced that the expectations of what learners can achieve at each level are not equal among languages, despite there being a single CEFR scale intended to level the playing field.

So, when we see a table with a lower number of words required of a French learner at a certain level than of an English learner (or, even more obviously, with the real number of words the French and English learners have learnt at that level), I think it is caused at least as much by the curriculum and the different expectations (which do get into coursebooks and teachers' heads) as by the characteristics of the languages themselves.


Inst wrote: The way I assumed the synthetic/analytic difference worked was that a synthetic language usually has a complex grammar with many inflections. For instance, English nouns are rarely gendered, and gender has little grammatical effect. French, in contrast, has gender on its nouns, and someone on this board claimed that verbs have up to 30 forms, although many fall into verb classes. German has three genders and four cases. What I thought the end effect would be is that it'd be harder for native speakers to expand their vocabulary; not only would they need to grasp the meaning, pronunciation, and sometimes orthography of the new word, they would also need to learn its grammatical features.


No. That's not how it works. Yes, small children will make different mistakes depending on their native language. But as a native speaker of a language with conjugations, declensions (7 cases), three genders, etc., I can assure you that this has absolutely nothing to do with acquiring more vocabulary. Natives do not struggle with that. After the initial phases in childhood and a few years of learning to write correctly at school (the fact that English natives seem to have fewer grammar classes doesn't mean they don't need them just as much), you grasp the grammar related to new words automatically. The same is true of French and German natives.

By your argument, one could claim the opposite: English natives should struggle to learn new words because of the crazily irregular orthography.

Inst wrote: A better illustration might be Chinese, although it is analytic.

Not really. CEFR is difficult to apply to non-European languages. As you've said yourself, HSK 6 is not C2. The difficulties are different, and the way learners progress is different.


Inst wrote: But this is just a conjecture, anyway. I can't get clear and definitive data on the following topics:

- Rates at which children learn words/word families up to adulthood in a given language
- Vocabulary size of young adults in a given language, varying by education level


Such research would still be of limited value, as researchers tend to leave out the factors that make the most enormous differences between children: how much the parents talk with and teach them, the child's IQ, and how early and eagerly (or not) the child reads.

And in my opinion, it would still be irrelevant for adult language learners.

Querneus
Blue Belt
Posts: 841
Joined: Thu Dec 01, 2016 5:28 am
Location: Vancouver, Canada
Languages: Speaks: Spanish (N), English
Studying: Latin, French, Mandarin

Re: CEFR Levels and vocabulary size

Postby Querneus » Fri Feb 08, 2019 1:20 pm

Inst wrote: The way I assumed the synthetic/analytic difference worked was that a synthetic language usually had a complex grammar with many inflections. What I thought the end effect would be is that it'd be harder for native speakers to expand their vocabulary; not only would they need to grasp the meaning, pronunciation, and sometimes orthography of the new word, they would also need to learn its grammatical features.

It is a common misconception among non-linguists that grammar is about morphology, especially inflections. It is also about syntax, a lot of it. Learning the grammatical features of every single word is necessary in all languages upon Earth.

For example, learning a noun in English involves learning its grammatical countability (we say "many chairs" but not "many furnitures") and the prepositional complements it can take (it's "a cup of coffee", not "a cup with coffee"). Learning a verb in English involves learning what complements it can take (a direct object; an indirect object with "from" or "to" or "for" or all three; adverbial adjuncts with a variety of prepositions; one or both types of infinitives; one or more of several types of clauses), how natural it is to use it without any complements at all, whether it must take a subject of a particular number, whether it can form compound tenses freely (this is limited in verbs like "to love"), and, technically, whether it can undergo verb-subject inversion without do-support (this list is off the top of my head and is not exhaustive).

I don't think inflectional morphology is such a hindrance to learning a word in more synthetic languages either, because grammatical knowledge, conscious or not, can correctly produce acceptable inflections in most verbs with a minimum of information. For example, I am confident I have never come across the conditional form of Spanish estrujar 'to wring [a rag]', but I can tell you it is estrujaría. Similarly, I have never seen the future subjunctive of proveer 'to provide', but I know it's proveyere, as all I need is to know the stem provey- (found in more common forms such as preterite proveyeron 'they provided').
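To make the estrujaría/proveyere point concrete, here is a toy sketch of how two regular rules let you produce forms you have never actually seen. It is my own illustration, not anything from the post: the function names are hypothetical, it only covers the regular conditional (whole infinitive plus -ía endings) and the future subjunctive built on the 3rd-person-plural preterite minus -ron, and irregular stems like tener/tendría are not handled.

```python
# Toy sketch of two *regular* Spanish formation rules (hypothetical helpers);
# irregular stems (tener -> tendría, etc.) are deliberately not handled.

# Person index: 0=yo, 1=tú, 2=él/ella, 3=nosotros, 4=vosotros, 5=ellos/ellas
CONDITIONAL_ENDINGS = ["ía", "ías", "ía", "íamos", "íais", "ían"]
FUTURE_SUBJ_ENDINGS = ["re", "res", "re", "remos", "reis", "ren"]
STRESS_ACCENT = {"a": "á", "e": "é"}


def conditional(infinitive: str, person: int = 0) -> str:
    """Regular conditional: the whole infinitive plus an -ía ending."""
    return infinitive + CONDITIONAL_ENDINGS[person]


def future_subjunctive(preterite_3pl: str, person: int = 0) -> str:
    """Future subjunctive: strip -ron from the 3rd-person-plural preterite
    (proveyeron -> proveye-) and add -re, -res, -re, -remos, -reis, -ren."""
    if not preterite_3pl.endswith("ron"):
        raise ValueError("expected a preterite form ending in -ron")
    stem = preterite_3pl[:-3]
    if person == 3:
        # The nosotros form carries a written accent on the stem's final
        # vowel: proveyéremos, habláremos.
        stem = stem[:-1] + STRESS_ACCENT[stem[-1]]
    return stem + FUTURE_SUBJ_ENDINGS[person]


print(conditional("estrujar"))               # estrujaría
print(future_subjunctive("proveyeron"))      # proveyere
print(future_subjunctive("proveyeron", 3))   # proveyéremos
```

The only lexical fact each function needs is one form of the verb (the infinitive, or the preterite ellos form), which is exactly the "minimum of information" the post describes.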

Inst wrote: A better illustration might be Chinese, although it is analytic. A Chinese speaker who meets a word containing unfamiliar characters (typical character recognition is between 3500 and 5000 characters) would have to guess the meaning from context, as in any other language, but would also need to guess the pronunciation. Although Chinese is analytic, it seems as though Chinese vocabularies are relatively sparse in terms of "words", because if speakers pick up a word from conversation, they also need to learn how to write it, and if they pick up a word from written text, they also need to figure out how to pronounce it.

I have no idea what you're trying to convey here about Chinese being word-sparse, but I just want to mention that a character recognition of 3500 characters seems a bit low if you want to handle typical modern Chinese text. I recently ran some statistics on Liu Cixin's The Three-Body Problem trilogy (published in the mid-2000s; all three complete books, 770K Chinese characters of text), and found that it has almost exactly 3600 distinct characters. And that's just one author in one genre (science fiction); to read things in general, including texts in your own field of expertise, you probably need to know characters towards the 5000 end of the scale.
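For anyone who wants to run this kind of count on their own reading, here is a minimal Python sketch. It is not the script used for the Three-Body figures; it assumes the text is already a UTF-8 string and that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as "Chinese characters". The sample sentence is a made-up placeholder.

```python
from collections import Counter


def han_counts(text: str) -> Counter:
    """Frequency table of Han characters in `text`.
    Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
    counts; punctuation, digits and Latin letters are skipped."""
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def coverage_of_top(counts: Counter, n: int) -> float:
    """Share of the running text covered by the n most frequent characters."""
    total = sum(counts.values())
    covered = sum(c for _, c in counts.most_common(n))
    return covered / total if total else 0.0


sample = "我们都是地球人，我们都在地球上。"  # placeholder text, not from the novel
counts = han_counts(sample)
print(sum(counts.values()), len(counts))    # 14 running characters, 9 distinct
print(round(coverage_of_top(counts, 5), 3))  # 0.714
```

For a whole book you would pass in something like `open(path, encoding="utf-8").read()`. The block range is the main judgment call; a stricter count might also include the CJK extension blocks, and `coverage_of_top(counts, 3500)` gives a rough idea of how much of a given text a 3500-character reader would recognize.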

I think it is very possible that many Chinese people who don't read that much live with a character recognition of around 3500 characters though, perhaps even less. I am not contesting that.

Inst
Orange Belt
Posts: 128
Joined: Thu Feb 07, 2019 9:43 pm
Languages: English (Primary), 普通话 (Mainland Mandarin Chinese, B2)

Re: CEFR Levels and vocabulary size

Postby Inst » Fri Feb 08, 2019 2:17 pm

The point I was trying to make about word sparsity in Chinese is that the effective lexicon is smaller due to features of the language. The position I am holding is that "some languages have smaller vocabulary sizes for educated speakers than others". I brought up the claim that French and German should require a smaller vocabulary for C2 fluency than English as an example, and you shot it down successfully.

Applying my position to the general thread, expecting CEFR proficiency to match a set vocabulary size (even if we ignore how to determine the vocabulary size) for all languages is silly. Languages vary, and the definition of vocabulary-based proficiency should be based on the language and its speakers, not on some notion that all languages express the same ideas with the same number of words, just different ones.

As a thought experiment, let's say certain polities institute language reforms that force their citizens/subjects to limit their speech to a small but fully functional lexicon (say, one stripped of synonyms), on pain of prosecution. The idea that C2 fluency in that language would then require as many words as C2 in other languages measured by CEFR is nonsense because of the deliberate lexical trimming. Or we could just go to Esperanto, which currently has a dictionary of about 5258 words.

Iversen
Black Belt - 4th Dan
Posts: 4776
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027

Re: CEFR Levels and vocabulary size

Postby Iversen » Fri Feb 08, 2019 2:52 pm

... says who? According to the relevant version of Wikipedia, the Ilustrita Vortaro of Esperanto has some 15200 headwords and 39400 lexical units, and apparently there is an Esperanto-German dictionary with 80.000 lexical units. One idea behind Esperanto and some other conlangs (like Toki Pona) was to reduce the number of roots and build up the vocabulary through derivation instead, but it is hard to keep users from adding new word roots.

reineke
Black Belt - 3rd Dan
Posts: 3570
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Fox (C4)
Language Log: https://forum.language-learners.org/vie ... =15&t=6979

Re: CEFR Levels and vocabulary size

Postby reineke » Fri Feb 08, 2019 9:38 pm

"Luis Martínez-Fernández, writing in the Chronicle of Higher Education, claims that "a person whose native language is not English can adopt the English language as a means of communication for a variety of reasons," and that one of them is "the need to use a more precise language with a richer vocabulary. (English has about 900,000 words, while French, for example, has fewer than 100,000.)"

Where do people get this stuff? It rather looks as if Martínez-Fernández may have swallowed the self-promoting Paul Payack's specious claim that the number of words in English is creeping up toward one million. But what about the support for the claim that poor old French can only muster a hundred grand? (I know I once claimed French is a miserable and inadequate language. But I was only kidding.)

It scarcely matters what number you give in contexts like this. The sort of people who are prepared to believe that you get greater richness and precision when you have more available words will believe anything, so you can feed them any numbers you like. Not long ago a significant number of totally clueless journalists heard that a gigaword corpus had been collected and ran away with the notion that all the words in it were different, so they trumpeted that English had a billion words. (Confusing a corpus with a dictionary is roughly comparable to confusing the set of all cars now driving on American roads with the set of distinct car models available in the catalogs of US manufacturers.)

Why does the number of lexical entries in the dictionary matter to people, as opposed to the number of fax machines, or the number of lost socks? Teresa Cunningham, who pointed me to the Martínez-Fernández article, lives in Europe, where she has plenty of experience in talking to people in languages other than English, and she remarks: "I have never had anybody turn to me and ask to ‘borrow’ an English word so they can express their thoughts more precisely while speaking another language." Quite so.

Precision, richness, and eloquence don't spring from dictionary page count. They're a function not of how well you've been endowed by lexicographical history but of how well you use what you've got. People don't seem to understand that vocabulary-size counting is to language as penis-length measurement is to sexiness."

http://itre.cis.upenn.edu/~myl/languagelog/archives/003871.html

reineke

Re: CEFR Levels and vocabulary size

Postby reineke » Fri Feb 08, 2019 9:43 pm

How many words are there in the English language?

There is no single sensible answer to this question. It's impossible to count the number of words in a language, because it's so hard to decide what actually counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (e.g. dogs = plural noun, dogs = present tense of the verb). Is dog-tired a word, or just two other words joined together? Is hot dog really two words, since it might also be written as hot-dog or even hotdog?

It's also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Teenage slang? Abbreviations?

The Second Edition of the 20-volume Oxford English Dictionary, published in 1989, contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. Over half of these words are nouns, about a quarter adjectives, and about a seventh verbs; the rest is made up of exclamations, conjunctions, prepositions, suffixes, etc. And these figures don't take account of entries with senses for different word classes (such as noun and adjective).

This suggests that there are, at the very least, a quarter of a million distinct English words, excluding inflections, and words from technical and regional vocabulary not covered by the OED, or words not yet added to the published dictionary, of which perhaps 20 per cent are no longer in current use. If distinct senses were counted, the total would probably approach three quarters of a million.

https://en.oxforddictionaries.com/explo ... -language/

Iversen

Re: CEFR Levels and vocabulary size

Postby Iversen » Fri Feb 08, 2019 10:12 pm

I have a Spanish-Danish dictionary with around 200.000 words, and I know that there are even larger dictionaries out there for languages other than English, so believing that French has only 100.000 words is quite naive. If you count all the varieties of English and all the technical words and company names and misspelled foreign place names etc. etc., it is possible that English can boast more words in total than French, but who cares? Nobody can learn all the words of a language, and I have seen no proof that those who speak a language endowed with an enormous dictionary know more words than those whose country hasn't produced such a behemoth. Besides, French has more inflected forms of each word than English, so what should we count then? Word families, headwords or word forms? Or only headwords plus irregularly inflected forms? Roots?
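The "what should we count?" question can be made concrete with a toy sketch. The lemma table below is hypothetical and hand-made for this one example (a real count would need a proper French lemmatizer); the point is only that the same short French text gives three different "vocabulary" sizes depending on whether you count running words (tokens), distinct word forms, or distinct headwords. It is the same distinction behind the corpus-versus-dictionary confusion in the Language Log piece quoted earlier in the thread.

```python
import re

# Hypothetical, hand-made lemma table covering only this example;
# a real count would need a proper French lemmatizer.
LEMMAS = {
    "chante": "chanter", "chantes": "chanter", "chantons": "chanter",
    "suis": "être", "es": "être", "est": "être",
}


def vocabulary_counts(text: str) -> dict[str, int]:
    """Count the same text three ways: running words (tokens),
    distinct word forms, and distinct headwords."""
    tokens = re.findall(r"\w+", text.lower())
    word_forms = set(tokens)
    headwords = {LEMMAS.get(tok, tok) for tok in tokens}
    return {
        "tokens": len(tokens),
        "word_forms": len(word_forms),
        "headwords": len(headwords),
    }


print(vocabulary_counts("Je chante, tu chantes, nous chantons. Je suis, tu es, il est."))
# {'tokens': 12, 'word_forms': 10, 'headwords': 6}
```

Counting word forms rewards a language with rich inflection, counting headwords largely cancels that out, and neither says anything about how many words a speaker actually knows.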

When I succumb to the temptation of assessing my passive vocabulary, I am actually more interested in percentages than in absolute numbers. If I understand a third of the words the natives (or a dictionary) throw at me, then I'm in a better position than if I only knew ten percent, and then it doesn't matter whether this particular language separates its words like English or glues them together like German. A third is a third.

That being said, I do sometimes wonder how large the vocabulary of people like Littré, who more or less single-handedly wrote a big French dictionary, or Bratli, who wrote the aforementioned Spanish dictionary, really was, and whether even educated people nowadays will ever manage to stuff as many words into their heads as the good ol' pre-internet fellas.

FyrsteSumarenINoreg
Yellow Belt
Posts: 90
Joined: Fri Jan 01, 2016 10:10 am
Location: Adriatic
Languages: Croatian (N), proficient in Brazilian Portuguese, fluent in English (C1 IELTS band 8.0), conversant in Italian and Spanish, learning Norwegian Nynorsk, Bengali & Malayalam

Re: CEFR Levels and vocabulary size

Postby FyrsteSumarenINoreg » Sat Feb 09, 2019 10:42 am

I got 17,600 words, which is in line with my lowish C1 level.
http://testyourvocab.com

FyrsteSumarenINoreg

Re: CEFR Levels and vocabulary size

Postby FyrsteSumarenINoreg » Sat Feb 09, 2019 10:58 am

Cavesa wrote:
Why should huge languages like French or German be poorer in vocabulary than English?



Etymology.

English has so many Germanic/Romance pairings like feeling/emotion, harbor/port, freedom/liberty...
(The only comparable language is Malayalam, with its Dravidian/Sanskritic pairs like tingal/chandran (for moon), etc.)
English wins over German even when German compound nouns are counted as separate words (while English multi-word nouns like "shopping mall" are not), and when phrasal verbs are not counted but German verbs like aufstehen are.

Cavesa

Re: CEFR Levels and vocabulary size

Postby Cavesa » Sat Feb 09, 2019 1:57 pm

Inst wrote: Applying my position to the general thread, expecting CEFR proficiency to match a set vocabulary size (even if we ignore how to determine the vocabulary size) for all languages is silly. Languages vary, and the definition of vocabulary-based proficiency should be based on the language and its speakers, not on some notion that all languages express the same ideas with the same number of words, just different ones.


I am convinced that, when it comes to real-world language learning, the vocabulary size associated with each CEFR level is shaped much more by the different expectations placed on learners than by the inherent qualities of the languages themselves.


Inst wrote: As a thought experiment, let's say certain polities institute language reforms that force their citizens/subjects to limit their speech to a small but fully functional lexicon (say, one stripped of synonyms), on pain of prosecution. The idea that C2 fluency in that language would then require as many words as C2 in other languages measured by CEFR is nonsense because of the deliberate lexical trimming. Or we could just go to Esperanto, which currently has a dictionary of about 5258 words.


That would destroy the language if it were done abruptly. It is possible for a language to lose a lot of vocabulary naturally, but any fast change is a problem. I believe that could actually be one of the few ways to make natives move down the CEFR scale. A fast, obligatory change would shake the very idea of how a native language is used, and the result would simply not allow enough nuance, register, etc.

reineke wrote:"Luis Martínez-Fernández, writing in the Chronicle of Higher Education, claims that "a person whose native language is not English can adopt the English language as a means of communication for a variety of reasons," and that one of them is "the need to use a more precise language with a richer vocabulary. (English has about 900,000 words, while French, for example, has fewer than 100,000.)"


What century is it? The A-U Empire, Prague, the 1840s, and a newspaper instead of the internet. "Czechs wanting to use their language instead of German are naive, as Czech has a much poorer vocabulary than German and therefore cannot be used with the same precision and for all purposes" could have been the headline. It was the general attitude of the vast majority of the German-speaking inhabitants and also of part of the bilingual population. While I personally regret that some Czechs decided to exhume and upgrade the language to prove everyone wrong, I cannot deny that they succeeded and proved the point.

When German was accused of being poorer than Latin, that still didn't stop the A-U authorities from pushing German into the universities as the only language of instruction. The emperor wanted it, and from that point on the German language simply had to be up to the task.

Really, such arguments have been around forever. And they have been abused many times, as the interpretation of the situation depends on a lot of factors (political decisions, devoted intellectuals, the general population). Some languages just get tons of new vocabulary invented (and Iversen gives us a great example: even Esperanto users like to invent new words), some die out naturally, and some are forced out by bigger languages.

Let me tell you why people with a very unimportant language learn English:
- More people speak it. No matter how many words we do or don't have, the rest of the world is not learning them.
- More books. We have enough words to fill them, just not a big enough market to publish all of them.
- More money. The English that the vast majority of ESL speakers use is not rich in vocabulary (truth be told, I have already had to dumb down my English vocabulary several times while talking to other non-natives). But it is enough to get paid for knowing English. And of course 99% of them are able to express themselves about anything with much more precision in their native languages.

So, the idea that people might be learning English to finally get enough vocabulary for their needs and desires is absolutely ridiculous. :-D

reineke wrote: The sort of people who are prepared to believe that you get greater richness and precision when you have more available words will believe anything, so you can feed them any numbers you like.

I don't think that is true when you compare languages.

But when it comes to language learning, it is exactly right. Up to a certain point, it is about knowing a bit of vocabulary and finding your way around the rest. But from B2 on, you simply have to learn more words, or you have no way to reach the level of richness and precision expected of you.

I would definitely be willing to believe that it doesn't make much of a difference whether a very advanced learner knows 30000 words or 35000 words. But most people advocating the "vocabulary size doesn't matter" approach pretend that there is no difference between a learner knowing 5000 words and one knowing 8000. There are even very common opinions like "just learn the 2000 most common words". That is obviously nonsense.

