Vocabulary Tests?

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: Vocabulary Tests?

Postby s_allard » Tue Jan 30, 2018 8:23 pm

Although I have been forever typecast as the small vocabulary guy, I want to dispel the idea that a small number of word families means a limited vocabulary or limited means of expression. Quite the contrary, it's a question of depth over breadth.

While certain topics require specific technical terminologies, many meanings and nuances are conveyed by the same words used in many different ways. This is particularly striking with verb forms in French. The use of the various tenses, collocations, idioms and other formulaic combinations can transform those common verbs like faire, savoir and donner into very rich tapestries of meaning.

A sign of a very high level of proficiency in French is precisely this ability to use these common words in sophisticated French constructions such as faire tenir, faire savoir, faire faire, se laisser faire, s'en faire, s'y faire, etc. Another sign of a high-proficiency user is the presence of the pronouns en and y. Plus, of course, advanced use of pronominal verbs, not of the run-of-the-mill type called reflexive verbs, but in the form of the pseudo-passive or in impersonal constructions. And I hardly need to mention the various uses of the subjunctive mood.

All of these indicators of mastery of French have nothing to do with large sets of word families. Quite the contrary: we are looking at a very tiny number of word families. This is what I call depth. Breadth is of course necessary for terminology, but depth is the key to proficiency.
3 x

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 23127
Contact:

Re: Vocabulary Tests?

Postby rdearman » Wed Jan 31, 2018 9:28 am

rdearman wrote:They say you can estimate the size of your vocabulary as pN where N is the number of words in the dictionary and p is an estimate of the proportion of words in the dictionary that you know. If, for example, you sample 40 words from a dictionary of 100 thousand words, and you know the meaning of half of them, then the estimated size of your vocabulary is 50 thousand words.

OK, I tried this and the results were surprising.

I took my monolingual French dictionary with 48,000 definitions and flipped through the pages at random. To keep the selection random, I always took the word in the top right corner of the right-hand page. I took 39 samples, of which I knew 15 words. This works out to 18,461 out of 48,000 words. So according to the charts I've seen:

[Image: table of vocabulary size by CEFR level]

This makes me somewhere between C1 and C2. I can tell you with some certainty that this is bollocks; I'm not Cx in French.

Another problem I had with this sample was that a large portion of the words I knew were simply cognates of English words. This of course is the "William the Conqueror" bonus (or, if you prefer, William the Bastard), where I know lots of French words because the French occupied England for a while. I would say something like 50-60% of the words I knew were cognates. Unfortunately I didn't write down the words; I just flipped through the dictionary and looked at them.

So is my passive vocabulary 18k+ words? I don't really know. I would have assumed less. Perhaps I just got lucky and found a lot of cognates. I don't have the dictionary to hand at the moment, but I think tonight I might do 3 more samples, taking words from the other corners of the pages and do the maths again. I know that statistically it will be more accurate the more samples I take. I think this is probably the best method for figuring out personal vocabulary size from what I've found.
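To put a number on "more accurate the more samples I take", here is a minimal Python sketch of the pN estimate with a rough margin of error. The function name and the normal-approximation interval are my own assumptions, not part of the method as quoted; with only 39 samples the interval is crude and should be read as an order of magnitude.

```python
import math

def estimate_vocabulary(known, sampled, dictionary_size, z=1.96):
    """Return (estimate, low, high) for vocabulary size.

    The estimate is p * N, where p = known / sampled. The interval uses
    the normal approximation for a proportion (an assumption; z=1.96
    corresponds to roughly 95% confidence).
    """
    p = known / sampled
    margin = z * math.sqrt(p * (1 - p) / sampled)
    low = max(0.0, p - margin) * dictionary_size
    high = min(1.0, p + margin) * dictionary_size
    return p * dictionary_size, low, high

# The figures from the post: 15 known out of 39 sampled, 48,000 entries.
estimate, low, high = estimate_vocabulary(known=15, sampled=39, dictionary_size=48_000)
print(f"estimate: {estimate:,.0f} words (roughly {low:,.0f} to {high:,.0f})")
```

Running it again with the pooled counts from several passes through the dictionary should show the interval narrowing as the sample grows.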
1 x
: 0 / 150 Read 150 books in 2024

My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter

I post on this forum with mobile devices, so excuse short msgs and typos.

smallwhite
Black Belt - 2nd Dan
Posts: 2386
Joined: Mon Jul 06, 2015 6:55 am
Location: Hong Kong
Languages: Native: Cantonese;
Good: English, French, Spanish, Italian;
Mediocre: Mandarin, German, Swedish, Dutch.
.
x 4876

Re: Vocabulary Tests?

Postby smallwhite » Wed Jan 31, 2018 10:51 am

rdearman wrote:Another problem I had with this sample was that I'd say a large portion of the words I knew were simply cognates of English words.

Everyone's saying this but I don't get it. What's so "but" about knowing a French word because it's a cognate of English? Why shouldn't cognates count?

rdearman wrote:I can tell you with some certainty that is bollocks, I'm not Cx in French.

What do you think is your vocabulary level in French?

rdearman wrote:So is my passive vocabulary 18k+ words? I don't really know. I would have assumed less. Perhaps I just got lucky and...

How many words do you think you know?

What do you think the CEFR-vocabulary table should look like?
A1 = ____ words
A2 = ____ words
B1 = ____ words
B2 = ____ words
C1 = ____ words
C2 = ____ words
0 x
Dialang or it didn't happen.

Cainntear
Black Belt - 3rd Dan
Posts: 3469
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8664
Contact:

Re: Vocabulary Tests?

Postby Cainntear » Wed Jan 31, 2018 11:42 am

smallwhite wrote:Everyone's saying this but I don't get it. What's so "but" about knowing a French word because it's a cognate of English? Why shouldn't cognates count?

Cognates should count (because they are a real thing) but the scale should account for the number of cognates in any given language pair.

Estimated vocabulary levels (whether against the CEFR or any other scale) are normally determined by taking a group of learners with a certified language level and then testing their vocabulary -- the average and range is then your estimate of what a speaker at each level usually knows. It's not a target, and it's not something we expect to measure.
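As a sketch of that procedure: group test results by certified level, then report the average and range per level. The function name is hypothetical and the scores below are invented placeholders that only show the shape of the calculation; they are not real measurements.

```python
from collections import defaultdict
from statistics import mean

def summarize_by_level(results):
    """results: iterable of (cefr_level, vocabulary_score) pairs.

    Returns {level: (mean, min, max)} with levels in sorted order.
    """
    by_level = defaultdict(list)
    for level, score in results:
        by_level[level].append(score)
    return {lvl: (mean(s), min(s), max(s)) for lvl, s in sorted(by_level.items())}

# Placeholder scores, made up purely for illustration.
sample = [("B1", 2400), ("B1", 3100), ("B2", 3900), ("B2", 5200), ("B2", 4400)]
for level, (avg, low, high) in summarize_by_level(sample).items():
    print(f"{level}: mean {avg:.0f}, range {low}-{high}")
```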

Also, the CEFR is about functional skill and passive/receptive recognition of a word is not a functional skill -- it's a more abstract skill. That means the number is not a target, and vocabulary cannot be given a true CEFR score.

So we measure what we expect someone at that level to know. We expect that an absolute beginner of Spanish in Italy will have a very large passive vocabulary, and if they don't then by A1/A2 we expect that they'll have learned enough about sound equivalences (things like TT<->T, T<->D) that they'll suddenly have a passive vocabulary in Spanish that's probably more than half of their passive vocabulary in Italian. We expect a monolingual Japanese learner of Spanish to have a much, much smaller passive vocabulary.

To aggregate figures of people with such different backgrounds is worthless, as it doesn't tell us anything about what to expect from any real person.
3 x

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 23127
Contact:

Re: Vocabulary Tests?

Postby rdearman » Wed Jan 31, 2018 12:11 pm

Cainntear wrote:
smallwhite wrote:
Everyone's saying this but I don't get it. What's so "but" about knowing a French word because it's a cognate of English? Why shouldn't cognates count?

Cognates should count (because they are a real thing) but the scale should account for the number of cognates in any given language pair.

Yes, you are right they should count even if they are free. I had intended to make that point myself. One estimate is that an English speaker knows 29% of the French language before they even start studying! https://medium.com/@andreas_simons/the- ... b2db3542b3

Being a lazy person, I'm happy to take a 29% free bonus. Also this is passive knowledge, I don't use many of these words in French for the simple reason I don't use them in English. So although I know them when I read them, or hear them, I don't use the majority of them.

Cainntear wrote:To aggregate figures of people with such different backgrounds is worthless, as it doesn't tell us anything about what to expect from any real person.

Agreed. I'm only interested in determining the approximate size of the vocabulary in my brain, not in determining a CEFR level.

smallwhite wrote:What do you think is your vocabulary level in French?

I think my vocabulary is probably somewhere between 5,000-20,000 words passive, and actively probably 500-5000 words. But this is just guess work, which is why I'd really like to find an objective measure. :)

smallwhite wrote:How many words do you think you know?

See above

smallwhite wrote:What do you think the CEFR-vocabulary table should look like?
A1 = ____ words
A2 = ____ words
B1 = ____ words
B2 = ____ words
C1 = ____ words
C2 = ____ words

I agree with Cainntear that there probably isn't a direct correlation between vocabulary and CEFR level, because the description of the CEFR levels is about how well you use the language as opposed to how many words you know.

If you'll permit me an analogy with shooting: having a lot of guns or bullets doesn't make you a marksman; what counts is the skill and accuracy you have when using your arsenal. I think of CEFR levels as marksmanship levels, not the size of the armoury.

I think I'm going to try this other self-test this weekend when I have more time.
Download a corpus sorted by frequency and mark each of the first 10,000 words as known or unknown. Then find the median point where known and unknown diverge.

I have a wonderful database corpus which EMK found ages ago, a French word database from the Université de Savoie, and I have the DB. The link to EMK's IPython page.
I figure I can dump the first 10-20k words to Excel and mark them as known or unknown. I'd use the method shown in the link I gave earlier.
rdearman wrote:Here is a good explanation of how a test was created for English vocabulary. http://testyourvocab.com/details

Hopefully this, plus the additional samples from the dictionary, would give me a reasonably reliable number of words. A bit more painful than using the online tests of course, but hopefully more accurate.
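For the frequency-list self-test, a rough Python sketch of finding the point where known and unknown diverge. This is one possible reading of that step, not the testyourvocab.com method: it slides a window over the marked words and reports the rank where the share of known words first drops below half. The function name, the window size and the 50% threshold are all assumptions.

```python
def known_boundary(known_flags, window=200, threshold=0.5):
    """Return the approximate frequency rank where knowledge drops off.

    known_flags[i] is True if the word at frequency rank i + 1 is known.
    The result is the midpoint of the first window whose share of known
    words is below `threshold`, or len(known_flags) if none is.
    """
    if len(known_flags) < window:
        raise ValueError("need at least `window` marked words")
    count = sum(known_flags[:window])  # known words in the first window
    for start in range(len(known_flags) - window + 1):
        if start > 0:
            # Slide the window one word to the right.
            count += known_flags[start + window - 1] - known_flags[start - 1]
        if count / window < threshold:
            return start + window // 2
    return len(known_flags)

# Synthetic marks for illustration: every word known up to rank 3,000, none after.
flags = [True] * 3000 + [False] * 7000
print("knowledge drops off around rank", known_boundary(flags))
```

Real marks will be noisier than this, so a larger window gives a steadier answer at the cost of a blurrier boundary. The total of known marks, sum(flags), is also worth reporting alongside it.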
1 x

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: Vocabulary Tests?

Postby s_allard » Wed Jan 31, 2018 12:25 pm

rdearman wrote:...
I took my mono-lingual French dictionary with 48,000 definitions and I flipped through pages randomly. I thought I would always try the word in the top right corner of the right side page so that it was randomly selected. I took 39 samples of which I knew 15 words. This works out to 18,461 out of 48,000 words. So according to the charts I've seen:

Image

This makes me somewhere between C1/C2. I can tell you with some certainty that is bollocks, I'm not Cx in French.
...
So is my passive vocabulary 18k+ words? I don't really know. I would have assumed less. Perhaps I just got lucky and found a lot of cognates. I don't have the dictionary to hand at the moment, but I think tonight I might do 3 more samples, taking words from the other corners of the pages and do the maths again. I know that statistically it will be more accurate the more samples I take. I think this is probably the best method for figuring out personal vocabulary size from what I've found.


I hate to say I told you so, but this is precisely the problem with all this vocabulary size testing stuff: it's useless. Putting aside for the moment questions of methodology for measuring vocabulary size using such a simple dictionary sample, the question is what does a score of 18,461 French words known mean?

What sort of ability to speak French does this score indicate? Does it map to a CEFR level? No. In fact, the CEFR system doesn't indicate vocabulary size at all. I have no doubt that greater proficiency implies knowing more words, but knowing a bunch of words does not mean proficiency.

The real measure of proficiency is the ability to do things in the language. So, instead of telling me how many words one knows in language X, talk to me in the language for two minutes about the problem of food wastage in the world today.
3 x

smallwhite
Black Belt - 2nd Dan
Posts: 2386
Joined: Mon Jul 06, 2015 6:55 am
Location: Hong Kong
Languages: Native: Cantonese;
Good: English, French, Spanish, Italian;
Mediocre: Mandarin, German, Swedish, Dutch.
.
x 4876

Re: Vocabulary Tests?

Postby smallwhite » Wed Jan 31, 2018 12:32 pm

We can build a Core 5000 Swadesh List of our own. Or build 2 or 3 to cater for different schools of thought.
0 x
Dialang or it didn't happen.

tarvos
Black Belt - 2nd Dan
Posts: 2889
Joined: Sun Jul 26, 2015 11:13 am
Location: The Lowlands
Languages: Native: NL, EN
Professional: ES, RU
Speak well: DE, FR, RO, EO, SV
Speak reasonably: IT, ZH, PT, NO, EL, CZ
Need improvement: PO, IS, HE, JP, KO, HU, FI
Passive: AF, DK, LAT
Dabbled in: BRT, ZH (SH), BG, EUS, ZH (CAN), and a whole lot more.
Language Log: http://how-to-learn-any-language.com/fo ... PN=1&TPN=1
x 6093
Contact:

Re: Vocabulary Tests?

Postby tarvos » Wed Jan 31, 2018 2:45 pm

s_allard wrote:
The real measure of proficiency is the ability to do things in the language. So, instead of telling me how many words one knows in language X, talk to me in the language for two minutes about the problem of food wastage in the world today.


At what level?
2 x
I hope your world is kind.

Is a girl.

smallwhite
Black Belt - 2nd Dan
Posts: 2386
Joined: Mon Jul 06, 2015 6:55 am
Location: Hong Kong
Languages: Native: Cantonese;
Good: English, French, Spanish, Italian;
Mediocre: Mandarin, German, Swedish, Dutch.
.
x 4876

Re: Vocabulary Tests?

Postby smallwhite » Wed Jan 31, 2018 3:22 pm

RD: you can count the words in your flashcards and written texts as a start.

Alternatively, you can take a 48,000-word English dictionary, preferably one of the same brand as your French one, sample it the same way and count how many English words you know, say 30 out of 39, and assume your French vocabulary is 15/30 of your English vocabulary. (The 15 is from your 15/39.)
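A small Python sketch of that comparison, in case the two samples end up being different sizes. The function name is made up, and the 30 out of 39 for English is only the hypothetical figure from the example above.

```python
def relative_vocabulary(known_target, sampled_target, known_reference, sampled_reference):
    """Return the target vocabulary as a fraction of the reference one.

    Compares the known proportion in each sample, so the two samples
    do not have to be the same size.
    """
    if sampled_target <= 0 or sampled_reference <= 0:
        raise ValueError("sample sizes must be positive")
    if known_reference == 0:
        raise ValueError("no known words in the reference sample")
    return (known_target / sampled_target) / (known_reference / sampled_reference)

# French: 15 known out of 39. English: a hypothetical 30 known out of 39.
ratio = relative_vocabulary(15, 39, 30, 39)
print(f"French vocabulary is about {ratio:.0%} of the English one")
```

This only gives a ratio; turning it into a word count still needs an independent estimate of the English vocabulary.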
0 x
Dialang or it didn't happen.

reineke
Black Belt - 3rd Dan
Posts: 3570
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Fox (C4)
Language Log: https://forum.language-learners.org/vie ... =15&t=6979
x 6554

Re: Vocabulary Tests?

Postby reineke » Wed Jan 31, 2018 3:28 pm

rdearman wrote:
rdearman wrote:They say you can estimate the size of your vocabulary as pN where N is the number of words in the dictionary and p is an estimate of the proportion of words in the dictionary that you know. If, for example, you sample 40 words from a dictionary of 100 thousand words, and you know the meaning of half of them, then the estimated size of your vocabulary is 50 thousand words.

OK, I tried this and the results were surprising.

I took my mono-lingual French dictionary with 48,000 definitions and I flipped through pages randomly. I thought I would always try the word in the top right corner of the right side page so that it was randomly selected. I took 39 samples of which I knew 15 words. This works out to 18,461 out of 48,000 words. So according to the charts I've seen:

Image

This makes me somewhere between C1/C2. I can tell you with some certainty that is bollocks, I'm not Cx in French.

Another problem I had with this sample was that I'd say a large portion of the words I knew were simply cognates of English words. This of course is the "William the Conqueror" bonus, (or if you prefer William the Bastard), where I know lots of French words because the French occupied England for a while. I would say something like 50-60% of the words I knew were cognates. Unfortunately I didn't write down the words, I just flipped through the dictionary and looked at the words.

So is my passive vocabulary 18k+ words? I don't really know. I would have assumed less. Perhaps I just got lucky and found a lot of cognates. I don't have the dictionary to hand at the moment, but I think tonight I might do 3 more samples, taking words from the other corners of the pages and do the maths again. I know that statistically it will be more accurate the more samples I take. I think this is probably the best method for figuring out personal vocabulary size from what I've found.


That table was apparently put together by "Big Dog" aka leosmith. Researchers have identified nine aspects of vocabulary knowledge. You're fixated on written form-(possible) meaning connections involving cognates.

Cognates may share some meanings but not the meaning required in a given context, some words may be false friends and close cognates may have low levels of phonological transparency. In short, I believe you (and other forum members) are looking at this problem simplistically.
0 x

