Iversen wrote: I have spent a fair amount of time on counting my words in different languages, and I have taken my precautions against some of the pitfalls you may experience with this kind of activity. For instance, I don't count word families - I count headwords because that's what you can find in dictionaries. I have generally skipped proper names because some dictionaries have them and others don't, but for instance in English it would be logical to count "Leghorn" for Italian Livorno because it definitely isn't the form used in the source language. My counts would however suddenly swell if I now decided to include them. And if my English dictionary prints a word combination in bold then I include it, but not if it is printed in the same type as idiomatic expressions. I have in some cases tried to count expressions (separately, of course), but it was cumbersome, boring and pointless, given the very different profiles of different dictionaries. So I dropped the idea, and after many word counts I now have a fairly consistent set of rules of thumb which I can apply across languages.
But there is one more trick - and it is probably the most important one: except in my earliest counts I have always included percentages. You can argue that there are more words in a big dictionary and consequently you should be able to tick more words, but apart from the pitifully tiny and the colossal dictionaries the percentages seem to be surprisingly consistent across dictionary sizes and languages. And I do feel that there is something substantial in the claim that I knew 36% of the words in my Afrikaans Prisma dictionary in 2009 and 66% in 2014. But unfortunately it is also a fact that scores can vary wildly for no apparent reason, like when I got 77% and 68% with two different Dutch dictionaries in 2014 - and the lower percentage was found with the smaller of the two dictionaries. Maybe I was just tired, maybe the small dictionary has fewer 'international' loanwords, or maybe it was just a statistical hapax.
So I know how wary you have to be with vocabulary estimates. But my conclusion is nevertheless clear: vocabulary size IS relevant. My capability to read confidently with only a few holes is directly proportional to the percentage of words I know in a language, and I have to know at least half the words from a standard size dictionary to be able to read without feeling strained - and two thirds is better. If I just know some 20-25% (i.e. 8,000-10,000 known words out of 40,000 headwords in a typical midsize dictionary) I prefer having a dictionary within reach, unless I'm dealing with a bilingual printout.
But there are three caveats. The first is that reading can be more or less extensive, and if I'm just skimming a newspaper article I may not need to know the meaning of all the abbreviations and institution names. If I'm studying a text intensively then every word should be understood, and it is a much more serious business not to know a certain word.
The second caveat is that just knowing a word may not be enough if it is used in an idiomatic expression.
And finally, word counts say very little about your ability to express yourself - especially at the lower levels. I have discussed this a lot with s-allard here and on HTLAL, and the funny thing is that I basically agree with him: knowing 2500 words well is more useful than knowing 25000 words purely passively. And if your native conversation partners aren't too dim they will know that they have to adapt to your dismally low level and not use words or constructions that are too difficult. The argument is NOT that you only get the chance to use 163 headwords in a given conversation - the relevant factor is that these words have been selected from a well-rehearsed subset of words from your target language. And surviving on a subset consisting of 2500 headwords is not unrealistic.
I actually wonder whether anyone counts words in any way other than counting headwords as they appear in a dictionary. I somehow cannot imagine anyone counting a word family as one word, or the opposite, counting a conjugated verb as a dozen words. The only exception, which could confuse some beginning learners, could be tools like readlang, which add any word you click on as a flashcard as it is, and then just show you the number of flashcards.
Yes, the colossal dictionaries are bound to be more precise. I should definitely try your counting method sometime soon.
To the last part: I don't think there has ever been anyone saying the opposite, that you'd need many thousands of words to express yourself at the lower levels. The argument that was repeatedly so wrong was the claim that a C2 learner doesn't need that many words, just to know a small number of them really, really well. 2500 or 3500 is a ridiculously small number of words for a C1 or C2 learner, even if you can use each of them in a dozen ways. In order to choose the appropriate vocabulary, you need a larger pool, and the choice of appropriate vocabulary is judged during the language exams too. Both a B1 and a C2 learner only get the chance to use 163 headwords in a given conversation, but it should be obvious that the C2 learner is choosing them from a much larger pile of options, taking the most appropriate one from it.
Surviving on a subset of 2500 headwords is not unrealistic; I've never said the opposite, nor can I remember anyone else saying it. But living like a normally intelligent and educated person on it is unrealistic. It's just like the difference between surviving on bread and water and hoping to be well nourished and satisfied in the long term on such a diet.
I also wouldn't go to the extreme of saying that a small number of actively known words is more important than a large number of passively known ones. We need both; the natives are not going to stick to our tiny vocabulary. And any interaction, spoken or written, consists of production and reception. And out of the two, comprehension can be trickier and much more important. I am not pointing at anyone personally here (definitely not Iversen), but it sometimes seems like many learners are much more concerned with their ability to share their genius thoughts with others, who should be grateful for them no matter how hard they are to understand due to the butchering of the language, than with the ability to listen and perfectly understand the thoughts of others.