CEFR Levels and vocabulary size

Cavesa
Black Belt - 4th Dan
Posts: 4978
Joined: Mon Jul 20, 2015 9:46 am
Languages: Czech (N), French (C2) English (C1), Italian (C1), Spanish, German (C1)
x 17678

Re: CEFR Levels and vocabulary size

Postby Cavesa » Wed Aug 17, 2016 3:04 pm

s_allard wrote:Now, we're told vehemently that grammar can be measured just like vocabulary. I'm really curious to see how. What sort of measurable units are we talking about? How many units of grammar can we associate with a text? We can perhaps talk about beginning, intermediate or advanced grammar but that is not measuring grammar. That's classifying by level of difficulty. We're told by the Instituto Cervantes that one needs 30,000 words for C2 Spanish. What would be the equivalent figure for grammar?

Perhaps the thing that comes remotely close - an oxymoron if ever there was one - is the readability score or index that determines the level of difficulty of reading a text. This looks at the number of syllables in words, the number of words in a sentence and the number of passive voice constructions to determine a level of difficulty. This is useful information but it is not a measure of grammar.


The equivalent could be the content of all three volumes of Gramática de uso del español.

Of course I was not talking about "counting grammar", no need for this strawman. I was talking about definitions of the grammar needed for a particular level vs definitions of the vocabulary needed for the same level.

Grammar per level has been defined quite clearly. We can see it in most courses, graded readers, and grammar books. Vocabulary not so much, despite similar attempts having been made. The vocabulary sources are all over the scale, from crappy courses claiming you'll be fluent with 300 words up to huge dictionaries of 250,000 words. For example, all the B1 courses are likely to include the same grammar points to the same depth (but of course some will teach it better and some worse, that is a different matter). But the differences in the vocabulary taught will be more than noticeable. Sure, the topics of individual units (and the vocabulary) are gonna be the same or very similar, but the amount of it and the particular words are gonna differ significantly.

And that is why learners and some institutions or publishers use the numbers. For example, let's take a learner trying to reach the B1 level. If he/she buys a course/grammar for B1 and learns all that is expected from a B1 speaker, no problem. But for vocabulary, there are three options:
1. The learner finds out a number of words to learn, such as the Instituto Cervantes number of passive words (7000 for B1) and Meara and Milton's for active (2750-3250 for B1). And they critically consider the fact that their course offers only 1500/1800/500 or so words (all three are real examples). Reaction: Time to look for additional sources.
2. They don't care about any numbers, trust their coursebook/teacher, and either are lucky to learn enough vocabulary or not. In the latter case, which is extremely common among mainstream learners, the vocabulary will be one of their major problems, especially while speaking. Should they try an exam, they'll be likely to get lower scores due to poor and imprecise vocabulary, despite the fact that the examiners won't be counting anything.
3. They start believing a number that is too low, fail to get the desired results, and give up.
5 x

s_allard
Blue Belt
Posts: 985
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2370

Re: CEFR Levels and vocabulary size

Postby s_allard » Wed Aug 17, 2016 6:53 pm

Three points. First, it seems pretty clear that we can't measure grammar units like vocabulary. If I write a novel in Microsoft Word, it will tell me at the bottom of the screen that I have something like 125,367 words in the document. This is a raw count, of course, of all the sequences of letters separated by white space. Using whatever methodology I subscribe to, I can later calculate how many "different" words are in the text. Let's say 7895 word families or headwords.

Can we do this for grammar? Could we somehow determine that my novel contains let's say 4000 grammar families? Obviously not.
What one poster and I have alluded to is the idea of determining grammatical complexity, an idea that is used in graded readers. We certainly observe that certain phrases or sentences are grammatically more complex than others because they use constructions that are less frequent. If we are writing a book meant for grade 1 students (around age 6) in North America, we will obviously limit the vocabulary and the grammar to things that we think a 6-year-old child is familiar with. And so on with readers for the other grades.

But this does not alter the fact that there is no way of counting grammar units like we can do with lexical units.
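To make the lexical side of that contrast concrete, here is a minimal sketch of the two counts mentioned above: the raw number of running words and the number of distinct word forms. The function name `count_words` and the tokenising rule (runs of letters, lower-cased) are assumptions made for illustration; they are not how Microsoft Word or any particular study counts.

```python
import re
from collections import Counter

def count_words(text):
    """Return (raw token count, number of distinct word forms).

    Assumption: a "word" is a run of letters, optionally joined by
    apostrophes, compared case-insensitively.
    """
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return len(tokens), len(Counter(tokens))

sample = "He takes the bus. She took the bus. They take the bus."
raw, distinct = count_words(sample)
print(raw, distinct)
```

Note that "takes", "took" and "take" still count as three different forms here; collapsing them into one family needs a further decision about what belongs together.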

Second, I think it's very important to realize how difficult it is to actually define what a word is for purposes of counting different words. When we see for example that the Cervantes Institute gives a figure of 30,000 words associated with the C2 proficiency level in Spanish and another author says 4500 for C2 in English, we can ask whether Spanish and English really function that differently and how these numbers are calculated.

What I've always found intriguing is the fact that vocabulary counting for language learning is a highly developed field in English, a relatively uninflected language, whereas this is not a major area of activity in languages like French or Spanish that I'm familiar with. Vocabulary counting is basically something associated with lexicography or dictionary making.

The big problem of course is how to define what to put into the same word. We can say that take, takes, took, taken, taking are different forms of the verb take. And then we have the prepositions off, up, down, to. What do we do about verb combinations like take to, take off, take up and take down? Should difference, different, differently be counted as one word or three? Is the idiom "hang out to dry" one unit or actually made up of four units? If you take a highly inflected language like French there are other issues. The list of complications goes on and on.
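A small sketch shows how much the total depends on that grouping decision. The `FAMILIES` table below is hypothetical and hand-written purely for illustration (it is not taken from any published word-family list), and `count_units` is an invented helper name.

```python
# Hypothetical family table: each entry maps a form to the headword it is
# grouped under. Every mapping here is an illustrative assumption.
FAMILIES = {
    "takes": "take", "took": "take", "taken": "take", "taking": "take",
    "different": "difference", "differently": "difference",
}

def count_units(tokens, families=None):
    """Count distinct units; forms listed in `families` collapse to their headword."""
    families = families or {}
    return len({families.get(token, token) for token in tokens})

tokens = ["take", "takes", "took", "taken", "taking",
          "difference", "different", "differently"]
print(count_units(tokens))            # every form counted separately
print(count_units(tokens, FAMILIES))  # forms grouped into families
```

The same eight forms come out as eight units or as two, depending only on the table. Phrasal verbs and idioms would need yet another policy, which this sketch does not attempt.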

This issue comes to the fore when one attempts to use frequency lists for language learning. If you're learning a language, wouldn't it make sense to start by learning let's say the 500 most common words of the language? This never works. I have yet to meet anybody who has used a frequency list to learn a language. The main reason is that to learn to "speak" a language you have to learn how to use the words and not the words themselves. Learning the words in isolation is a pure waste of time.

That said, frequency word lists are important for the design of textbooks, graded readers and dictionaries.

I would also like to state that although I say that counting words is a waste of time, I certainly see the utility of wordlists as a learning tool for specific themes of vocabulary. If I'm interested in automobile vocabulary, I would certainly make a list of relevant terms.

Third, as for vocabulary size and performance on CEFR tests, I consider those vocabulary size numbers totally useless. The Cervantes Institute says that 30,000 words are required for the C2 test. Am I supposed to count my words up to 30,000 before attempting this test? This is the silliest thing I've heard in a long time. What I suggest is a simple test: are you able to read one of the well-known Spanish newspapers easily and paraphrase any article in good quality and fluent Spanish?

Do you have to know every single word in the entire newspaper? No. You just have to be comfortable in the language. Right now I'm reading a British novel and every few pages I come across words I don't know in English. No problem, that's what dictionaries are for.

Instead of worrying about the number of words you know for the test, you should be studying what is actually required of you. In all cases, you will not be tested on your vocabulary. It is your use of vocabulary that will be tested. For example, you may be asked to write a formal letter on a certain topic such as a cover letter for a job application. How do you prepare for this? Do you study 30 words a day for a month? How about writing two letters a week for a month instead so that you get a good feel of how letter-writing works in the language? I know which strategy I would use.
Last edited by s_allard on Thu Aug 18, 2016 5:25 pm, edited 1 time in total.
4 x

Ezy Ryder
Orange Belt
Posts: 146
Joined: Tue Aug 25, 2015 8:22 am
Languages: PL (rusting Native)
EN (Advanced)
中文 (Lower Intermediate)
日本語 (Beginner, not studying)
台語 (Dabbling)
Language Log: viewtopic.php?t=1164
x 214

Re: CEFR Levels and vocabulary size

Postby Ezy Ryder » Wed Aug 17, 2016 7:52 pm

s_allard wrote:This issue comes to the fore when one attempts to use frequency lists for language learning. If you're learning a language, wouldn't it make sense to start by learning let's say the 500 most common words of the language? This never works. I have yet to meet anybody who has used a frequency list to learn a language. The main reason is that to learn to "speak" a language you have to learn how to use the words and not the words themselves. Learning the words in isolation is a pure waste of time.

Has anyone actually suggested or implied that learning isolated vocabulary would on its own "teach" one a language?
4 x
阿波
10k SRS Challenge: 1250 / 10000
4,808 漢字 (handwriting): 3750 / 4808

Cainntear
Black Belt - 3rd Dan
Posts: 3526
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8793

Re: CEFR Levels and vocabulary size

Postby Cainntear » Wed Aug 17, 2016 8:48 pm

s_allard wrote:Three points. First, it seems pretty clear that we can't measure grammar units like vocabulary. If I write a novel in Microsoft Word, it will tell me at the bottom of the screen that I have something like 125,367 words in the document. This is a raw count, of course, of all the sequences of letters separated by white space. Using whatever methodology I subscribe to, I can later calculate how many "different" words are in the text. Let's say 7895 word families or headwords.

Can we do this for grammar? Could we somehow determine that my novel contains let's say 4000 grammar families? Obviously not.

There are plenty of computer-based tools out there that do grammatical analysis. Most modern work on grammar is built on corpora, and corpora rely on automated part-of-speech taggers. The biggest limitation of these taggers is that their models of grammar are learnt by inference and not directly queryable, but their output still represents an enumerable measure of grammar.

s_allard wrote:The Cervantes Institute says that 30,000 words are required for the C2 test. Am I supposed to count my words up to 30,000 before attempting this test? This is the silliest thing I've heard in a long time.

You're right -- it's very silly. So why did you say it? No-one else did.

If I tell you that a race you're training for is 10 miles long, does that mean I expect you to go out with a trundle wheel and measure out every practice run you do? Of course it doesn't! But if you have that information, you have an idea of what you're training for.

Personally, I'm not convinced it's useful. I really have no idea of what 30,000 words feels like. I also recognise that there are people who fixate on words to the detriment of everything else... but I haven't seen any evidence that that applies to anyone in this conversation.
4 x

Marais
x 7660

Re: CEFR Levels and vocabulary size

Postby Marais » Wed Aug 17, 2016 9:45 pm

Personally I think reading content, listening intensively, reading some more, speaking practice, looking up grammar points, speaking, doing some writing... is all more important than fixating on arbitrary numbers. And it's definitely more important than debating it.
1 x

aokoye
Black Belt - 1st Dan
Posts: 1818
Joined: Sat Jul 18, 2015 6:14 pm
Location: Portland, OR
Languages: English (N), German (~C1), French (Intermediate), Japanese (N4), Swedish (beginner), Dutch (A2)
Language Log: https://forum.language-learners.org/vie ... 15&t=19262
x 3310

Re: CEFR Levels and vocabulary size

Postby aokoye » Thu Aug 18, 2016 1:00 am

s_allard wrote:I'll ignore that nasty comment about this thread turning into the usual hell. And I won't bite this bait and get into a mudslinging contest for the umpteenth time. It seems to me that things were going swimmingly until now.


You know I don't think anyone was baiting you, but I do think this turned into what one could call "the usual hell" or at least some sort of (perhaps small) firestorm (I wouldn't go so far as to call it a flamewar).
2 x
Preferred gender pronouns: Masculine

outcast
Blue Belt
Posts: 585
Joined: Sat Dec 05, 2015 3:41 pm
Location: Florida, USA
Languages: ~
FLUENCY
Native: ENglish, ESpañol
Advanced: -
High Basic: DEutsch (rust), FRançais (rust), ZH中文
Basic: -
~
ACQUIRING
Formally: KO한국말, ITaliano, HI हिन्दी
Dabbling: HRvatski, GW粵語
Dormant: POrtuguês
~
Plan to learn: I BETTER NOT GO HERE FOR NOW
~
x 679

Re: CEFR Levels and vocabulary size

Postby outcast » Thu Aug 18, 2016 1:51 am

Ezy Ryder wrote:
s_allard wrote:This issue comes to the fore when one attempts to use frequency lists for language learning. If you're learning a language, wouldn't it make sense to start by learning let's say the 500 most common words of the language? This never works. I have yet to meet anybody who has used a frequency list to learn a language. The main reason is that to learn to "speak" a language you have to learn how to use the words and not the words themselves. Learning the words in isolation is a pure waste of time.

Has anyone actually suggested or implied that learning isolated vocabulary would on its own "teach" one a language?


I plan to use the vocabulary list from a beginner's textbook with audio for the first time with Korean words, I guess a sort of frequency list. Even before learning the language. My main focus will be to memorize those terms that are of Korean origin. But my main purpose with this exercise is to practice pronunciation of Korean words AND reading Hangul, all while I am familiarizing myself with unknown words, and seeing how Sino-Korean and English loans are said in Korean for the other words on the list. The key is I need to memorize the Korean words, and familiarize myself with the other ones (meaning, pronunciation, and Hangul), as fast as possible.

It's a bit of an experiment really, but indeed I know I won't learn Korean this way. This exercise is not to learn the language, it is to hopefully learn the language quicker when I actually hit the books. When I go to read texts, I want to focus on the grammar, the context of where a word appears, and the pronunciation of Korean when the words are strung together, rather than on the meaning of each word and how to read it.
1 x
"I can speak wonderfully and clearly in zero languages, and can also fluently embarrass myself in half a dozen others."

The End of Language learning: 10 / 10000

s_allard
Blue Belt
Posts: 985
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2370

Re: CEFR Levels and vocabulary size

Postby s_allard » Thu Aug 18, 2016 6:03 pm

Now that the question of determining vocabulary size for CEFR levels seems to have been either resolved or laid by the wayside, it might be interesting to revisit the question of how to systematically acquire receptive and productive vocabulary. For this, there are in my opinion two sets of tools. One is thematic dictionaries where words are grouped by subject matter. I am currently using the Vocabulaire de l'espagnol moderne in the Langues pour tous series. Under the heading La familia there are four pages of terms related to family and kinship, followed by four pages of example sentences, three pages of notes and explanations, and finally some exercises. This is great stuff for preparing CEFR exams.

The other great tool for more technical terminology is the visual dictionary, where you can see all kinds of objects and the necessary terminology.
1 x

Cainntear
Black Belt - 3rd Dan
Posts: 3526
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8793

Re: CEFR Levels and vocabulary size

Postby Cainntear » Thu Aug 18, 2016 6:23 pm

s_allard wrote:Now that the question of determining vocabulary size for CEFR levels seems to have been either resolved or laid by the wayside, it might be interesting to revisit the question of how to systematically acquire receptive and productive vocabulary.

I think that would merit its own thread, rather than being tacked onto the end of this one.
4 x

Cavesa
Black Belt - 4th Dan
Posts: 4978
Joined: Mon Jul 20, 2015 9:46 am
Languages: Czech (N), French (C2) English (C1), Italian (C1), Spanish, German (C1)
x 17678

Re: CEFR Levels and vocabulary size

Postby Cavesa » Fri Aug 19, 2016 9:00 am

s_allard wrote:Now that the question of determining vocabulary size for CEFR levels seems to have been either resolved or laid by the wayside, it might be interesting to revisit the question of how to systematically acquire receptive and productive vocabulary. For this, there are in my opinion...


No it wouldn't. This thread is called "CEFR Levels and vocabulary size" and has included lots of useful information about exactly this subject, including a wide range of valuable opinions on the matter, including yours. Please, don't turn this into another repetition of the other threads with this fate. The question you mention is important and has been discussed many times already and, truth be told, everyone knows your name and opinions by now, your main goal has been reached.

To people interested in the subject of systematically acquiring vocabulary, I recommend searching the forum first, the threads have been numerous, long, exhaustive, very informative. You might find especially interesting the experience and methods shared by Iversen and EMK.
4 x

