CEFR Levels and vocabulary size

General discussion about learning languages
blaurebell
Blue Belt
Posts: 840
Joined: Thu Jul 28, 2016 1:24 pm
Location: Spain
Languages: German (N), English (C2), Spanish (B2-C1), French (B2+ passive), Italian (A2), Russian (Beginner)
Language Log: viewtopic.php?f=15&t=3235
x 2240

Re: CEFR Levels and vocabulary size

Postby blaurebell » Tue Aug 16, 2016 5:15 pm

s_allard wrote:After a few minutes we can probably also tell their level of education and probably their age, geographic and social class origin. Much of this lies in accent of course, but in reality we only need a relatively small sample of speaking or writing to say that "if the person can do this then they can probably do everything to a given level".


Although that's generally true, I know several people who are so damn good at mimicking the educational level of just about everyone they meet that one wouldn't be able to tell what level of education they really have. Judging by speech is the worst way in which we stereotype each other, based on the flimsiest pattern-matching. The craziest example was one of my Turkish-German course mates at university who would speak the worst migrant slang, with half sentences and more than questionable grammar, when talking to his mates. He seemed like someone who had hardly even finished high school. From one moment to the next he would then give coherent, eloquent and grammatically perfect presentations in class, and he was always a very good student. Basically "migrant speak" was his dialect, just like the Swiss can pull out High German from one moment to the next.

As foreigners we are constantly being judged on this level, and mostly unfairly. Vocabulary has very little to do with any of that; it's mainly grammar. If one manages to speak with perfect grammar and a minimal accent, people automatically assume a high proficiency level. I've had people throw all sorts of unintelligible stuff at me, just for saying too perfectly that I don't speak a language. They thought I was joking. I believe a focus on vocabulary is definitely the wrong strategy when aiming to pass a proficiency level test. This is probably also why they are so reluctant to provide estimates: a certain vocabulary size usually accompanies a certain proficiency level, but having it doesn't necessarily mean that you're going to be able to pass the test. I probably have a vocabulary between C1 and C2 in Spanish, but wouldn't be able to pass the C1 test because I fumble around too much with grammar in my spoken Spanish.
7 x
: 20 / 100 Дэвид Эддингс - В поисках камня
: 14325 / 35000 LWT Known

: 17 / 55 FSI Spanish Basic
: 100 / 116 GdUdE B
: 8 / 72 Duolingo reverse Spanish -> German

aokoye
Black Belt - 1st Dan
Posts: 1818
Joined: Sat Jul 18, 2015 6:14 pm
Location: Portland, OR
Languages: English (N), German (~C1), French (Intermediate), Japanese (N4), Swedish (beginner), Dutch (A2)
Language Log: https://forum.language-learners.org/vie ... 15&t=19262
x 3309

Re: CEFR Levels and vocabulary size

Postby aokoye » Tue Aug 16, 2016 5:38 pm

s_allard wrote:Now, we're told vehemently that grammar can be measured just like vocabulary. I'm really curious to see how. What sort of measurable units are we talking about? How many units of grammar can we associate with a text? We can perhaps talk about beginning, intermediate or advanced grammar but that is not measuring grammar. That's classifying by level of difficulty. We're told by the Instituto Cervantes that one needs 30,000 words for C2 Spanish. What would be the equivalent figure for grammar?


You would measure grammar in a text by identifying all of the grammar constructions and counting them, which would take significantly longer than a word count but is very possible. For example, you could take a text in English and look at all of the different types of verb constructions (something that I had to do as part of a paper I wrote earlier this year). There are some automated tools that do this but I don't know of any good free ones for English (I stumbled on one earlier this year when I was searching for something else for my English grammar class). There are actually unofficial grammar lists for the JLPT levels so it's not like this is unheard of. The Cervantes institute also has a similar (but official) list. If you click through the levels, it'll give you a more detailed view of what they expect of people at various levels.
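As a rough illustration of the counting idea above (this is not the automated tool mentioned, and not a real parser), here is a minimal Python sketch that tallies a few English verb constructions with regular expressions. The pattern names and the patterns themselves are assumptions made up for this example; they are deliberately crude and will miss irregular participles and produce false positives on real text.

```python
import re
from collections import Counter

# Hypothetical, very rough patterns for a handful of English verb constructions.
# A serious analysis would use a part-of-speech tagger or parser instead.
PATTERNS = {
    "present perfect": r"\b(?:have|has)\s+\w+(?:ed|en)\b",
    "past perfect": r"\bhad\s+\w+(?:ed|en)\b",
    "progressive": r"\b(?:am|is|are|was|were)\s+\w+ing\b",
    "will future": r"\bwill\s+\w+\b",
}


def count_constructions(text):
    """Return a Counter mapping each construction name to its number of matches."""
    lowered = text.lower()
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(re.findall(pattern, lowered))
    return counts


sample = (
    "She has finished the report. They were walking home. "
    "He had eaten. We will see."
)
for name, n in count_constructions(sample).items():
    print(f"{name}: {n}")
```

Even this toy version shows why the job takes longer than a word count: every construction needs its own definition before anything can be tallied.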
1 x
Preferred gender pronouns: Masculine

galaxyrocker
Brown Belt
Posts: 1120
Joined: Mon Jul 20, 2015 12:44 am
Languages: English (N), Irish (Teastas Eorpach na Gaeilge B2), French, dabbling elsewhere sometimes
Language Log: viewtopic.php?f=15&t=757
x 3334

Re: CEFR Levels and vocabulary size

Postby galaxyrocker » Tue Aug 16, 2016 6:29 pm

jeff_lindqvist wrote:Interesting chart. How are we supposed to interpret the numbers in the weeks column? 9 weeks to A1 - no problem. Another 10 weeks to A2 - fine. 70 weeks in total to GZ C2?


To go back to this post, I think it means 70 weeks in a structured immersion environment. And I'm assuming they mean courses as well, so I can sorta understand that.
1 x

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: CEFR Levels and vocabulary size

Postby s_allard » Tue Aug 16, 2016 6:50 pm

aokoye wrote:...
You would measure grammar in a text by identifying all of the grammar constructions and counting them, which would take significantly longer than a word count but is very possible. For example, you could take a text in English and look at all of the different types of verb constructions (something that I had to do as part of a paper I wrote earlier this year). There are some automated tools that do this but I don't know of any good free ones for English (I stumbled on one earlier this year when I was searching for something else for my English grammar class). There are actually unofficial grammar lists for the JLPT levels so it's not like this is unheard of. The Cervantes institute also has a similar (but official) list. If you click through the levels, it'll give you a more detailed view of what they expect of people at various levels.

I had a look at the Cervantes lists for Spanish, and, unsurprisingly, they consist of all the points of grammar that one is supposed to know at the various CEFR proficiency levels. The interesting thing is that for each point of grammar, the lists say what one has to know at the A level, the B level and the C level. So for the Imperativo verb form, you have Imperativo A1-A2, Imperativo B1-B2, Imperativo C1-C2. The level of complexity or sophistication of the Imperativo increases as one goes up the scale.

Whereas we can, supposedly, count the words in a text, how do we go about counting the grammar units in a text? Does a C2 level text have more grammar units than a B1 text? Not necessarily. But the units are definitely more sophisticated. In fact, when you look at the entire list, you see that the points of grammar are nearly identical for all three levels of proficiency. The differences are in the level of sophistication in each point.

Well, if you think counting words is fraught with difficulties, counting grammar is probably even more difficult. You'd have to identify the points of grammar and then the level of difficulty or complexity and add that all up into some sort of scale that measures grammatical complexity. Good luck.

In any case, I still believe that counting words, not to mention attempting to count grammar, is a waste of time for language learners.
2 x

aokoye
Black Belt - 1st Dan
Posts: 1818
Joined: Sat Jul 18, 2015 6:14 pm
Location: Portland, OR
Languages: English (N), German (~C1), French (Intermediate), Japanese (N4), Swedish (beginner), Dutch (A2)
Language Log: https://forum.language-learners.org/vie ... 15&t=19262
x 3309

Re: CEFR Levels and vocabulary size

Postby aokoye » Tue Aug 16, 2016 8:55 pm

s_allard wrote:Whereas we can, supposedly, count the words in a text, how do we go about counting the grammar units in a text? Does a C2 level text have more grammar units than a B1 text? Not necessarily. But the units are definitely more sophisticated. In fact, when you look at the entire list, you see that the points of grammar are nearly identical for all three levels of proficiency. The differences are in the level of sophistication in each point.

Well, if you think counting words is fraught with difficulties, counting grammar is probably even more difficult. You'd have to identify the points of grammar and then the level of difficulty or complexity and add that all up into some sort of scale that measures grammatical complexity. Good luck.


I actually don't think that counting words is difficult (I didn't say anywhere that I did) and I think I said that while counting units of grammar is totally doable it is time consuming. I would also concede that it's not necessarily easy to do for someone who isn't a native/expert speaker of the language. It would also be possible to look at the complexity of the clauses used but again, not something that I would expect out of someone who is below C1, maybe B2, but I would expect a teacher, native speaker, or someone trained in assessing the proficiency level of learners in X language to be able to learn how to do that.

Have you stopped to think what graded readers for native or second language learners consist of? The writing style is based in part on "some sort of scale that measures grammatical complexity." The complexity of the grammar and of the words used in a book aimed at 3rd graders is going to be different from that of a book aimed at 8th graders. Grading things by, in part, complexity of grammar is very common and something that is very teachable. One of my professors actually thought that comparing the complexity of clauses in The NY Times vs The Guardian would be an interesting project for me (I went in with the idea of analyzing the different types of verb constructions between two similar articles by the two media organizations).
2 x
Preferred gender pronouns: Masculine

Ezy Ryder
Orange Belt
Posts: 146
Joined: Tue Aug 25, 2015 8:22 am
Languages: PL (rusting Native)
EN (Advanced)
中文 (Lower Intermediate)
日本語 (Beginner, not studying)
台語 (Dabbling)
Language Log: viewtopic.php?t=1164
x 214

Re: CEFR Levels and vocabulary size

Postby Ezy Ryder » Wed Aug 17, 2016 7:47 am

s_allard wrote:In any case, I still believe that counting words, not to mention attempting to count grammar, is a waste of time for language learners.

I wanted to respond with an argument I probably have used already on the previous forum. Also, it's fairly similar to one of those listed by Cavesa.

Say I want to get to level x. Without having any kind of idea as to just how much vocabulary I would need, after learning/acquiring amount y, I'll only know it's either enough, or not. Let's say it proves insufficient, and I decide to double my lexicon, just to find out it's still not enough. That's not very motivating, is it? Going through all the effort of further increasing my vocabulary, I still end up with the same result: not enough.

It most certainly isn't rational thinking, but seeing too many unknown* words despite continuously broadening my vocabulary leaves me with a sense of futility. As if no matter how many words I learn/acquire, it's still not gonna cut it. Of course one can reason with oneself, but prevention is better than cure, and having some sort of reliable approximation can help with that. If according to a trustworthy source, level x is generally associated with vocabulary size z, and I've learnt/acquired 0.25*z, I know there's still more or less 3 times as much again to get through. Should I double my vocabulary, I'd know I should be give or take halfway there.

Now, of course the breadth of one's vocabulary is not the only hindrance. One also needs to understand variations in usage, be familiar enough with the words to actually understand them in use (at 165 WPM, that's ~364 ms per word on average), and handle grammar, various phonetic processes, etc. But I don't see why the number of words not being the only problem should stop us from discussing it.

*Ones I fail to understand.
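
For anyone who wants to check the arithmetic in this post, here is a small Python sketch. The function names and the example target size `z = 10_000` are assumptions chosen purely for illustration, not figures from any source.

```python
def ms_per_word(words_per_minute):
    """Average time available per word, in milliseconds, at a given speech rate."""
    return 60_000 / words_per_minute


def remaining_multiple(known, target):
    """How many times the current vocabulary is still left to learn."""
    return (target - known) / known


def progress(known, target):
    """Fraction of the target vocabulary already covered."""
    return known / target


z = 10_000          # hypothetical target vocabulary size for level x
known = 0.25 * z    # a quarter of the target has been learnt

print(round(ms_per_word(165)))       # milliseconds per word at 165 WPM
print(remaining_multiple(known, z))  # multiples of the current vocabulary still to go
print(progress(2 * known, z))        # share of the target after doubling
```

The point of having a target figure is exactly this: the same amount of effort can be read as measurable progress instead of just "still not enough".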
3 x
阿波
: 1250 / 10000 10k SRS Challenge
: 3750 / 4808 4,808 漢字 (handwriting)

Cainntear
Black Belt - 3rd Dan
Posts: 3470
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8667

Re: CEFR Levels and vocabulary size

Postby Cainntear » Wed Aug 17, 2016 9:07 am

outcast wrote:Finally, back to "lexical units". Is this a lemma? Or a word? If it is a word, is it that a lexical unit is "one definition", or multiple ones of the same word? That is my major doubt that could throw all this yapping I am doing about vocabulary size and language levels for a loop.

There are as many academic opinions on that as there are academics. All conjugations of a regular verb are generally considered the same thing, but there seems to be some debate over highly irregular verbs. You have to learn am, is, are, was, were and be as independent forms, so some would count them individually; but once learned they function as a unit so others count them as a single item.

But the definition is necessarily vague, because each language needs its own rules. I would consider "intend", "intent" and "intention" to be three lexical units, because despite the clear shared root, they don't follow from each other. In the original Latin, you might well consider them one unit, because they all derive from a single verb root using regular rules, and if you know the verb, you know the other forms.

However, if you start counting that way, you find that the number of lexical units needed varies from language to language. This sparks the argument between those who think that we should aim to define "lexical units" in such a way that all languages have similar numbers and those who don't. Both sides hold that no language is more complicated than any other. The first camp suggests that this means vocabulary will be equally complicated in any two languages. The second camp suggests that some languages are less complicated in vocabulary but more complicated in grammar, and so all languages even out.

Basically, it can only be a rough measure anyway, so don't lose any sleep over it.
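
To make the point about definitions concrete, here is a minimal Python sketch showing how the same text gives different "lexical unit" counts depending on whether the forms of "be" are counted individually or as one item. The `BE_FORMS` set, the `count_units` function and the token list are all made up for this illustration; real counts would need a full lemmatiser and a decision on every such case.

```python
# The irregular forms listed in the post, treated here as one possible "unit".
BE_FORMS = {"am", "is", "are", "was", "were", "be"}


def count_units(tokens, merge_be):
    """Count distinct units, optionally collapsing all forms of 'be' into one."""
    units = set()
    for token in tokens:
        word = token.lower()
        if merge_be and word in BE_FORMS:
            units.add("be")
        else:
            units.add(word)
    return len(units)


tokens = ["I", "am", "here", "you", "are", "here", "she", "was", "here"]

print(count_units(tokens, merge_be=False))  # each form of "be" counted separately
print(count_units(tokens, merge_be=True))   # all forms of "be" counted as one unit
```

Neither number is wrong; they just answer different questions, which is why any published vocabulary figure is only meaningful alongside its counting rules.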
2 x

emk
Black Belt - 1st Dan
Posts: 1622
Joined: Sat Jul 18, 2015 12:07 pm
Location: Vermont, USA
Languages: English (N), French (B2+)
Badly neglected "just for fun" languages: Middle Egyptian, Spanish.
Language Log: viewtopic.php?f=15&t=723
x 6337

Re: CEFR Levels and vocabulary size

Postby emk » Wed Aug 17, 2016 9:34 am

Cainntear wrote:But the definition is necessarily vague, because each language needs its own rules. I would consider "intend", "intent" and "intention" to be three lexical units, because despite the clear shared root, they don't follow from each other. In the original Latin, you might well consider them one unit, because they all derive from a single verb root using regular rules, and if you know the verb, you know the other forms.

This also shows up in how different languages define "irregular verbs." In French, some textbooks claim "manger is irregular", because it has a spelling change to preserve the soft g in many forms: mangeais. To me, this seems perfectly regular if you know anything at all about French orthography, but I suppose that's not a good way to teach it to bored students.

But in Old Norse, I've seen verbs called "regular" because the weird vowel mutations in the stem were technically regular if you knew what the verb endings used to be in Proto-Norse and you remembered all the different kinds of umlaut sound shifts affecting the Norse family and when they occurred. And once again, this might be reasonable if you assume that nearly everybody who learns Old Norse is a graduate student studying dead Germanic languages.

(As a random aside, French verbs are actually far less irregular than they look, once you accept that they have something like 7 principal parts and a dozen or two minor conjugations. See my conjugator scripts and the script that compares them against a lexical database. That whole claim that "French has hundreds of irregular verbs" depends, once again, on how much of the underlying logic you know. And a great many of the most annoying verbs are avoided by native speakers.)
6 x

Elexi
Green Belt
Posts: 271
Joined: Wed Aug 19, 2015 9:39 pm
Languages: English (N), French (B1), German (A2), Latin (eternal beginner), Dutch (Aspires to find the time).
x 645

Re: CEFR Levels and vocabulary size

Postby Elexi » Wed Aug 17, 2016 11:40 am

On the subject of counting grammar units - in terms of state examinations this is pretty old hat stuff. If you want to see how it is systematically applied in exam grading, get one of those 'How to Pass Your Examination [e.g. GCSE in the UK] in French, etc.' books. These actively tell you which grammar units the exam board sets as necessary for which grades. For example, the English system has always been obsessed with verb tenses as a marker of linguistic sophistication and so GCSE students have to demonstrate in their speech and writing tests a whole variety of verb tenses - the more the better, as use of each one carries a mark. In the old 'O level' use of the subjunctive mood was also necessary, although from the pass guides I looked at, it appears the subjunctive has been largely moved to the A level exams.

It would appear also that CEFR requires the same kind of graded use of grammar units - at least in its teaching and examining application. The EU 'Competences' tables may not have them but they are present in the national exams that implement CEFR. I spent a few minutes looking through various French as a Foreign Language books for the DELF that go from A1 to B2 (e.g. Echo, Edito and Latitudes, Reussir le DELF) and grammar unit progression is highly evident - and they kind of match what S Allard says about the Cervantes lists. I seem to remember that there used to be lists of these grammar units on the internet as well.
1 x

Iversen
Black Belt - 4th Dan
Posts: 4768
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027
x 14962

Re: CEFR Levels and vocabulary size

Postby Iversen » Wed Aug 17, 2016 1:28 pm

I count words because unlike most other factors in language they can be counted, and then I just cross fingers and hope that an increase in vocabulary is accompanied by an increase in grammatical proficiency. And please notice: I didn't write anything about an increase in grammatical rules known.

It is well known that most of the extremely common words have some grammatical function (prepositions, pronouns, auxiliary verbs etc.). And very frequent words are also more likely to be irregular than the rarer ones - although there are some exceptions to this rule. So once you have learnt those words and endings and whatever plus the systems that govern them then there is simply nothing more left to count.

So if you want to make a quantitative assessment of your command of the grammar of a language then you have to find something outside the language itself to count, and the only thing I can see is counting the errors you make while writing or speaking freely. And since I stopped taking language courses somewhere around 1980 that method is not applicable to me - I don't have a teacher with a red pencil, and I wouldn't be able to identify all my own errors. And even if I could, I couldn't see the point in counting them. I leave that to the true hardcore masochists.
6 x

