CEFR Levels and vocabulary size

General discussion about learning languages
Inst
Orange Belt
Posts: 128
Joined: Thu Feb 07, 2019 9:43 pm
Languages: English (Primary), 普通话 (Mainland Mandarin Chinese, B2)
x 101

Re: CEFR Levels and vocabulary size

Postby Inst » Sat Feb 09, 2019 4:02 pm

One thing I think I'd tirade against would be the notion that "the more lexical units there are in a language, the better".

The unwanted implication is that Ovid and Homer are "inferior" poets because their lexicon was more limited than, say, Shakespeare's, who purportedly had a passive vocabulary of 65,000-70,000 words.

I found a blog post arguing about how the Japanese have "regressed" in a discussion of Haiku. That is to say, classical Haiku worked because each of the language speakers had strong emotional ties to a particular word, such that each word, in a sparse poem of 17 mora, could trigger powerful emotional responses. When you have a passive vocabulary of 40,000 words, is the same thing possible? And if it is, does that apply to all 40,000 words, or just a select group of commonly used words? Mother. Liberty. Fireworks. Beer. These aren't C2 English words, but wouldn't their power be more potent precisely for that reason?

And moreover, say, Chinese, which I've stated has a restricted lexicon due to its orthography. If you note the scientific language, the point is the creation of neologisms in phrases through combination of existing words. Within the language itself, despite having only 22,000 words within the lexicon of an undergraduate, there are numerous synonyms where there's a relatively slight variation in meaning. How much would an additional 20,000 words add to the richness of the language?

Another riposte would be, say, how Harvardians use language. If you Google around, it seems that their expository writing classes teach students to use simple words whenever possible because the intent is not to show off, but to be expressive, precise, and eloquent. On the Harvard Crimson, there's an editorial where the comments involve people trashing the writer because he used "peregrinator" in his text; people claimed that "they had gotten a full score on the SAT and had to look up that word".

The elite trend in English, it seems, is hiding your eliteness and using words appropriate, and thus not overly simple or "erudite", to your needs for communication. From this, we can begin to question the usefulness of a particularly large lexicon in a language.

Hemingway, for instance, used an extremely limited lexicon and tended toward short sentences. The Swedish Academy saw fit to grant him a Nobel, despite his seeming eschewal of the richness of the English lexicon. In other words, it's not about how big a vocabulary you have, it's about how you use it.
1 x

reineke
Black Belt - 3rd Dan
Posts: 3570
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Fox (C4)
Language Log: https://forum.language-learners.org/vie ... =15&t=6979
x 6554

Re: CEFR Levels and vocabulary size

Postby reineke » Sat Feb 09, 2019 4:27 pm

Word Anomalies

"It would be interesting to know for a given book what words are used uncommonly often or, likewise, uncommonly infrequently. To compute this, the relative frequency of each words is sampled from the database at large and then compared to the frequency in each book. Not surprisingly, these 'Anomalous Word Summaries' paint an incredibly accurate picture of the work.

Sample of Word Anomalies

The Bible (King James Edition); Anonymous / Various
Frequent: unto, lord, isreal, shall, god, moses, jesus, david, offering, tabernacle
Infrequent: girl, boy, school, success, condition, listen, princess

Wonderful Wizard of Oz; Baum, Frank
Frequent: woodman, scarecrow, witch, tin, emerald, monkeys, kansas, brains, winged
Infrequent: mother, money, soul, natural

White Fang; London, Jack
Frequent: musher, beaver, sled, dogs, cherokee, snarl
Infrequent: letter, person, window, green, sweet, loved, party, paper

The Republic; Plato
Frequent: guardians, unjust, true, injustice, state, gymnastic, rulers, democractical
Infrequent: miss, girl, boy, prince

Alice's Adventures In Wonderland; Carroll (C.L. Dodgson), Lewis
Frequent: gryphon, turtle, caterpiller, mock, dodo, mouse, rabbit, hedgehog
Infrequent: death, country, happy, fair, common

Origin of the Species; Darwin, Charles
Frequent: species, varieties, subaerial, selection, sterility, plants, modification, forms, variability
Infrequent: person, government, love, thinking, god, evil, fire

Communist Manifesto; Marx, Karl/Engels, Friedrich
Frequent: bourgeois, proletariat, communists, antagonisms, revolutionising, socialism, production, class, feudal, reactionary, exploitation, conditions, crises
Infrequent: said, love, why, heart, mother, poor, felt

Paradise Lost; Milton, John
Frequent: wonderous, heaven, satan, dominations
Infrequent: country, church, horses, sister

Apology; Plato
Frequent: corrupter, accusers, demigods, socrates, oracle, indictment
Infrequent: she, work, morning, replied, body

Gargantua and Pantagruel; Rabelais, Francis
Frequent: codpiece, catchpole, ballocks, dingdong, fart, chitterlings, gymnast, arse
Infrequent: smile, existence, feelings, british, professor, suffering

1st Inaugural Speech; Roosevelt, Franklin Delano
Frequent: foreclosure, interdependence, uneconomical, leadership, outgo, unsolvable, values, redistribution, national, emergency
Infrequent: you, her, his

The Jungle; Sinclair, Upton
Frequent: packingtown, packers, stockyards, fertilizer, slaughterhouses, streetcar, lituanian
Infrequent: influence, village, pray, gods, example

20,000 Leagues Under The Sea; Verne, Jules
Frequent: manometer, canadian, captain, frigate, harpoon, cuttlefish, submarine
Infrequent: garden, justice, ladies, laughed, wife

Time Machine; Wells, H. G.
Frequent: psychologist, sphinx, traveller, machine, i, lever, dimension
Infrequent: mother, dear, money, friends, horse, peace

War of the Worlds; Wells, H. G.
Frequent: martians, leatherhead, artilleryman, londonward, cylinder, pit, scullery
Infrequent: love, king, truth, gentleman, joy, youth

Moby Dick; Melville, Herman
Frequent: whale, sperm, harpooner, pequod, leviathan, fishery"

http://www.mine-control.com/zack/guttenberg/
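
For anyone curious how such "anomalous word" lists could be computed, here is a minimal sketch in Python. It is not the linked site's actual code: the function names, the simple regex tokenizer, and the small smoothing constant are all my own assumptions. It only illustrates the idea described in the quote, which is to compare each word's relative frequency in one book against its relative frequency in a larger reference corpus.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all word tokens in the text."""
    # Assumption: a crude tokenizer (lowercase letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_anomalies(book_text, corpus_text, top_n=5, smoothing=1e-6):
    """Return (frequent, infrequent) words of a book relative to a corpus.

    The score is the ratio of book frequency to corpus frequency.
    The smoothing constant is a hypothetical choice that avoids division
    by zero for words missing from one side.
    """
    book = relative_frequencies(book_text)
    corpus = relative_frequencies(corpus_text)
    vocabulary = set(book) | set(corpus)
    ratio = {
        word: (book.get(word, 0.0) + smoothing) / (corpus.get(word, 0.0) + smoothing)
        for word in vocabulary
    }
    # Sort by ratio, breaking ties alphabetically so results are deterministic.
    ranked = sorted(vocabulary, key=lambda word: (ratio[word], word))
    infrequent = ranked[:top_n]
    frequent = ranked[::-1][:top_n]
    return frequent, infrequent


if __name__ == "__main__":
    book = "whale whale whale the sea"
    corpus = "the girl the boy the whale the sea the girl"
    frequent, infrequent = word_anomalies(book, corpus, top_n=2)
    print("Frequent:", frequent)
    print("Infrequent:", infrequent)
```

On real texts you would want a much larger reference corpus and probably a minimum-count cutoff, since very rare words produce extreme ratios.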
3 x

reineke
Black Belt - 3rd Dan
Posts: 3570
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Fox (C4)
Language Log: https://forum.language-learners.org/vie ... =15&t=6979
x 6554

Re: CEFR Levels and vocabulary size

Postby reineke » Sat Feb 09, 2019 4:45 pm

Voltaire used words sparingly. Rabelais' Gargantua and Pantagruel has more unique words than the CIA world factbook.

Gargantua and Pantagruel
Lexile score: 1340
Unique words: approx 26,000
High vocabulary density
This is one of the most vocabulary dense books in the Project Gutenberg database.

French audiobook:
http://www.litteratureaudio.com/livres- ... s-rabelais

English translation
http://gutenberg.readingroo.ms/1/2/0/12 ... image-0005

Introduction to Rabelais' work:

"Had Rabelais never written his strange and marvellous romance, no one would ever have imagined the possibility of its production. It stands outside other things — a mixture of mad mirth and gravity, of folly and reason, of childishness and grandeur, of the commonplace and the out-of-the-way, of popular verve and polished humanism, of mother-wit and learning, of baseness and nobility, of personalities and broad generalization, of the comic and the serious, of the impossible and the familiar. Throughout the whole there is such a force of life and thought, such a power of good sense, a kind of assurance so authoritative, that he takes rank with the greatest; and his peers are not many...

"Rabelais’ style has many different sources. Besides its force and brilliancy, its gaiety, wit, and dignity, its abundant richness is no less remarkable. It would be impossible and useless to compile a glossary of Voltaire’s words. No French writer has used so few, and all of them are of the simplest. There is not one of them that is not part of the common speech, or which demands a note or an explanation. Rabelais’ vocabulary, on the other hand, is of an astonishing variety. Where does it all come from? As a fact, he had at his command something like three languages, which he used in turn, or which he mixed according to the effect he wished to produce."


https://ebooks.adelaide.edu.au/r/rabela ... ction.html

Don't get intimidated:

http://www.litteratureaudio.com/livre-a ... gruel.html
1 x

Inst
Orange Belt
Posts: 128
Joined: Thu Feb 07, 2019 9:43 pm
Languages: English (Primary), 普通话 (Mainland Mandarin Chinese, B2)
x 101

Re: CEFR Levels and vocabulary size

Postby Inst » Sat Feb 09, 2019 5:14 pm

Reineke: thanks for the better substantiation.

I guess I've not well-argued that different languages have different lexicon sizes. I've seen claims that educated Francophones only have a vocabulary of 30,000 words, but what's the problem then? Is someone so daft as to claim that French is a literarily-impoverished language? This is actually an encouragement for learners; that is to say, if you want to reach the level of a half-educated speaker, you only need to learn (by which I mean memorize for passive vocabulary) 15,000 words. You'll struggle through Rabelais, as Reineke has shown, but real proficiency, that is to say, the C2 level and above, isn't that inaccessible.

On the other hand, I'm saying this because I don't want people to be "happy with crappy", that is to say, that you can get away with a crap vocabulary with a given language. The important thing is that to be, say, a little bit like Nabokov, who despite having English as a second or third language was a major English writer, is not impossible.

But, I'm just, as the Chinese say, 吹牛-ing (chui1 niu2, blowing the cow, BS-ing) here. I have to go back to my primary task of developing C2 and above proficiency in Chinese, instead of wasting time arguing badly on forums.
0 x

reineke
Black Belt - 3rd Dan
Posts: 3570
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Fox (C4)
Language Log: https://forum.language-learners.org/vie ... =15&t=6979
x 6554

Re: CEFR Levels and vocabulary size

Postby reineke » Sat Feb 09, 2019 5:51 pm

Inst wrote:Reineke: thanks for the better substantiation.

I guess I've not well-argued that different languages have different lexicon sizes..

On the other hand, I'm saying this because I don't want people to be "happy with crappy", that is to say, that you can get away with a crap vocabulary with a given language.


"They were both very smart, but otherwise they had quite different personalities. :ugeek: Adams was unbending and outspoken and argumentative, Franklin, charming and taciturn and flirtatious. Adams was rigid in his personal morality and lifestyle, Franklin famously playful. Adams learned French by pouring over grammar books and memorizing a collection of funeral orations; Franklin (who cared little about grammar) learned the language by lounging on the pillows of his female friends and writing them amusing little stories."

Isaacson, Benjamin Franklin, p. 351

"As time passed and his French improved, Adams further realized that Franklin spoke the language very poorly and understood considerably less than he let on ... When he did speak in French, he was, one official told Adams, almost impossible to understand. He refused to bother his head with French grammar, Franklin admitted to Adams, and to his French admirers this, with his odd pronunciation, were but another part of his charm, which only added to Adam's annoyance. Try as he might, Adams could never feel at ease in French society. Franklin, always at ease, never gave the appearance of trying at all."

McCullough, John Adams, p. 199

"Franklin was always industrious, and in America he famously believed in giving the appearance of being industrious. But in France, where the appearance of pleasure was more valued, Franklin knew how to adopt the style. As Claude-Anne Lopez notes, "In colonial America it was sinful to look idle, in France it was vulgar to look busy."

Isaacson, Benjamin Franklin, p. 353

"What appears to have pleased Adams no less was the discovery during his parting call at Versailles that his French had so improved he could manage an extended conversation and speak as rapidly as he pleased. (This was after about a year in France - April 1778 - March 1779)."

McCullough, John Adams, p. 213

"Even the founding fathers of our nation had different approaches to learning languages and adjusting to new cultures."

Foreign Language Teaching Forum (archives)
3 x

Inst
Orange Belt
Posts: 128
Joined: Thu Feb 07, 2019 9:43 pm
Languages: English (Primary), 普通话 (Mainland Mandarin Chinese, B2)
x 101

Re: CEFR Levels and vocabulary size

Postby Inst » Sat Feb 09, 2019 5:55 pm

To the indirect comments, I will interpret them as "different language learners have different needs and different wants".

I agree with that, but I've also met many people who've used the difficulty of acquiring a foreign language as an excuse for being incompetent.
0 x

PeterMollenburg
Black Belt - 3rd Dan
Posts: 3239
Joined: Wed Jul 22, 2015 11:54 am
Location: Australia
Languages: English (N), French (B2-certified), Dutch (High A2?), Spanish (~A1), German (long-forgotten 99%), Norwegian (false starts in 2020 & 2021)
Language Log: https://forum.language-learners.org/vie ... 15&t=18080
x 8066

Re: CEFR Levels and vocabulary size

Postby PeterMollenburg » Wed Feb 13, 2019 10:06 am

I cannot seem to find the information now, but I do recall hearing/reading (or both) somewhere, from a non-hearsay, informative type of source, that while the size of the English dictionary increases and that of the French dictionary supposedly decreases, it is claimed that this is at least in part for the following reason:

English keeps adding new words to the Oxford English Dictionary without ever deleting those which have fallen out of use (and may be completely archaic and simply no longer understood by the vast majority of English speakers).

While French on the other hand literally deletes words that are no longer in use from their dictionaries.

How true this is, I'm not sure. I imagine the truth could be somewhere in the middle. Perhaps the English language purveyors are indeed less inclined to delete archaic terminology, but perhaps not altogether opposed to it? Perhaps the French do indeed delete archaic words no longer in use, but perhaps not at the rate that might explain the size difference of the dictionaries.

Just some food for thought ;)
3 x

mdubes13
Posts: 1
Joined: Fri Jun 07, 2019 6:23 pm
Languages: Mandarin Chinese
x 2

Re: CEFR Levels and vocabulary size

Postby mdubes13 » Fri Jun 07, 2019 6:26 pm

Just for reference if anyone else finds themselves investigating this topic in 2019 or later as I am, this topic is very satisfactorily addressed on the website 'englishprofile dot org'. It has vocabulary items listed for each level A1 to C2 based empirically on instances of actual learner language production. A total goldmine for EFL teachers and autodidacts alike.

Hope this helps somebody! :D
2 x

an onyme
Yellow Belt
Posts: 65
Joined: Tue May 28, 2019 10:09 pm
Languages: American (N); le franc,ais (B1-C1); 한구거 (B1); 忠文 (A2), ዓማርኛ (A1)
Language Log: https://forum.language-learners.org/vie ... 15&t=15312
x 113

Re: CEFR Levels and vocabulary size

Postby an onyme » Fri Jun 07, 2019 7:18 pm

mdubes13 wrote:Just for reference if anyone else finds themselves investigating this topic in 2019 or later as I am, this topic is very satisfactorily addressed on the website 'englishprofile dot org'. It has vocabulary items listed for each level A1 to C2 based empirically on instances of actual learner language production. A total goldmine for EFL teachers and autodidacts alike.

Hope this helps somebody! :D

This is a great resource! I love lists and wish I were an English learner just to have this be useful to me. Are there any equivalents for other languages around?
1 x
Just your friendly neighbourhood onyme

Serpent
Black Belt - 3rd Dan
Posts: 3657
Joined: Sat Jul 18, 2015 10:54 am
Location: Moskova
Languages: heritage
Russian (native); Belarusian, Polish

fluent or close: Finnish (certified C1), English; Portuguese, Spanish, German, Italian
learning: Croatian+, Ukrainian; Romanian, Galician; Danish, Swedish; Estonian
exploring: Latin, Karelian, Catalan, Dutch, Czech, Latvian
x 5181

Re: CEFR Levels and vocabulary size

Postby Serpent » Fri Jun 07, 2019 8:00 pm

PeterMollenburg wrote:English keeps adding new words to the Oxford English Dictionary without ever deleting those which have fallen out of use (and may be completely archaic and simply no longer understood by the vast majority of English speakers).

While French on the other hand literally deletes words that are no longer in use from their dictionaries.
A previous post included separate counts for contemporary and archaic words.

One more thing about English is that it's used in so many countries, and there are so many words/expressions specific to one country or even a part of it. Due to the cultural influence they're known passively by people who don't use them, and of course people often move to another country within the English-speaking world (while for example a Spanish speaker may well also move to an English-speaking country). In many cases both British and American terms are well-known among native and fluent speakers.

"New" concepts also become known worldwide via English, whether it's something like selfie or hygge. Not all of them truly assimilate in other languages.
4 x
LyricsTraining now has Finnish and Polish :)
Corrections welcome

