Le Baron wrote: ryanheise wrote: And if you're interested in what the words were in Shrek that were beyond 98% comprehension, they were:
measuring, gent, homey, hideous, ballad, freshness, caterer, dignified, bachelorette, sparkling, unorthodox, isle, decorator, meteor, firewood, wed, ail, jackass, rescuer, redhead, valiant, reek, huff, magnetism, knights, sonnet, gingerbread, stench, enchantment, saucy, sharpest, leaver, decapitate, dazzling, thine, preposterous, shilling, pitchfork, minty, brimstone, raincoat, damsel, ta, beset, twinge, colada, rickety, veal, steed, stalwart, uninvited, pheromone, deride, cruelly, highness, pocus, asthmatic, chatterbox, hocus, yonder, rotisserie, rescuing, bonehead, parfait, camping, resettlement, slobber, Knights, tartare, eking, tush, compensating, gumdrop, hmph, dolt, backstreet, toadstool, slobbery, housefly, superfly, outdrew, tubbing
It's interesting that "measuring" appeared in that list, even though the words were lemmatised. It turns out that in this instance, measuring was used as a noun, so it was counted as a distinct word: "I'll let you do the measuring when you see him tomorrow."
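To see why a lemmatised count can still list "measuring" separately, here is a minimal sketch. It assumes, and this is my guess rather than a description of the tool used in the thread, that words are counted per (lemma, part of speech) pair. The tagged tokens are hypothetical and typed in by hand; a real pipeline would get them from a lemmatiser and POS tagger.

```python
from collections import Counter

# Hypothetical, hand-tagged tokens: (surface form, lemma, part of speech).
# These are illustrative only, not output from any real tagger.
tokens = [
    ("measure", "measure", "VERB"),
    ("measured", "measure", "VERB"),
    ("measuring", "measuring", "NOUN"),  # "do the measuring": noun use
    ("measuring", "measure", "VERB"),    # "I was measuring": verb use
]

def count_word_types(tagged):
    """Count occurrences per (lemma, POS) pair, so a noun use of
    'measuring' is kept apart from the verb 'measure'."""
    return Counter((lemma, pos) for _surface, lemma, pos in tagged)

counts = count_word_types(tokens)
for (lemma, pos), n in sorted(counts.items()):
    print(f"{lemma:<10} {pos:<5} {n}")
```

The three verb tokens collapse into one entry, while the noun "measuring" stays a separate word type. A tool keyed on lemma alone would behave the same way here, provided its lemmatiser returns "measuring" as the lemma of the noun.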
I'm wondering how words like 'housefly' or 'raincoat' end up outside the top level of comprehension. They seem fairly self-evident as compounds of simple words!
This is a very important observation, and it goes to the heart of the theory and method at hand. First of all, I alluded in my previous post to a confusion in the use of the terms comprehension and word coverage. They are obviously not the same thing. All serious vocabulary studies use the term coverage to mean the percentage of the running words in a text or medium that the reader knows. We then have to determine what percentage of coverage is required for comprehension, which is the subjective appreciation of the meaning or significance of the message. We typically hear of 98% word coverage for unassisted comprehension of printed or scripted materials. Things are obviously more complicated with spontaneous conversations, or even movies and presumably podcasts, where the nature of the voices and other elements must be taken into account.
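To make the distinction concrete: coverage is just a ratio of known tokens to all tokens, and 98% coverage still means about one unknown word in every 50. Below is a minimal sketch of that ratio. The crude regex tokeniser and the hand-picked "known" word set are assumptions for illustration only.

```python
import re

def lexical_coverage(text, known_words):
    """Return the percentage of running words (tokens) in `text` whose
    lowercased form appears in `known_words`."""
    # Crude tokeniser (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return 100.0 * known / len(tokens)

line = "I'll let you do the measuring when you see him tomorrow."
# Hypothetical learner who knows every word here except "measuring".
known = {"i'll", "let", "you", "do", "the", "when", "see", "him", "tomorrow"}
print(f"{lexical_coverage(line, known):.1f}%")  # 10 of 11 tokens known: 90.9%
```

Note that the number says nothing about whether the learner actually understood the sentence. That judgement is the separate, subjective step of comprehension.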
As you can imagine, I get very irritated when I see loose talk of 90%, 80% or 50% comprehension, but I usually just let it slide. However, the distinction is particularly important for the issue raised here. How can some pretty simple words like freshness, raincoat or housefly be beyond 98% comprehension?
The answer, simply put, is that it has nothing to do with comprehension. It’s all about word frequencies. Freshness, raincoat and housefly, like caterer, firewood and many others, didn’t make the cut for 98% word coverage.
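Here is a minimal sketch of how a frequency cut-off like this could work. It assumes, and I am guessing at the method rather than reproducing the one used in the thread, that a learner is taken to know words in order of general frequency rank. Under that assumption, the "98% word" is the rarest word needed to reach 98% of the tokens, and anything ranked rarer falls "beyond 98%". The ranks and counts below are invented toy values.

```python
from collections import Counter

def coverage_cutoff(tokens, freq_rank, target=0.98):
    """Assume words are learned in order of general frequency rank.
    Return (rank_needed, words_beyond): the smallest rank N such that
    knowing every word ranked 1..N covers at least `target` of the
    tokens, and the text's words ranked beyond N.
    Words absent from `freq_rank` count as rarer than any ranked word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return None, []
    unranked = len(freq_rank) + 1

    def rank(w):
        return freq_rank.get(w, unranked)

    # The text's distinct words, from most to least frequent in general use.
    ordered = sorted(counts, key=lambda w: (rank(w), w))
    covered = 0
    for word in ordered:
        covered += counts[word]
        if covered / total >= target:
            rank_needed = rank(word)
            return rank_needed, [w for w in ordered if rank(w) > rank_needed]
    return None, []

# Toy 100-token "script" with invented general-frequency ranks.
tokens = (["the"] * 60 + ["you"] * 30 + ["donkey"] * 6 + ["ogre"] * 2
          + ["raincoat"] + ["housefly"])
freq_rank = {"the": 1, "you": 2, "donkey": 4000, "ogre": 9000,
             "raincoat": 15000, "housefly": 20000}
print(coverage_cutoff(tokens, freq_rank))  # (9000, ['raincoat', 'housefly'])
```

In the toy run, 98 of the 100 tokens are covered once "ogre" is known, so "raincoat" and "housefly" land beyond the cut-off purely because of their rank. How transparent they are as compounds plays no part.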
In fact, once we accept the assumption that 98% coverage is necessary for unassisted comprehension, comprehension no longer enters into the picture. This leads to some interesting observations. We are told that, according to chart 1 in this thread, Shrek rates a 13,999, meaning, if I understand the chart correctly, that it requires more vocabulary than approximately 36000 of the 40000 podcasts in the corpus here. The same goes for difficulty.
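For what it's worth, 36000 of 40000 is 90%, so my reading puts Shrek at roughly the 90th percentile of the podcast corpus. The sketch below computes the share of a corpus scoring strictly below a given score. The ten podcast scores are invented for illustration; they are not the thread's data.

```python
from bisect import bisect_left

def share_below(score, corpus_scores):
    """Return the fraction of corpus items whose score is strictly lower than `score`."""
    ordered = sorted(corpus_scores)
    return bisect_left(ordered, score) / len(ordered)

# Hypothetical vocabulary ratings for ten podcasts (invented values).
podcast_scores = [2000, 4000, 6000, 8000, 10000, 12000, 13000, 15000, 16000, 20000]
print(share_below(13999, podcast_scores))  # 7 of 10 score lower: 0.7
```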
I find it intriguing that a movie meant for children requires more vocabulary and is more difficult than the vast majority of 40000 podcasts aimed at adults. At least that’s my interpretation of the charts – I might be wrong.
The big question, of course, is what children and even adults understand when watching this movie, and how they do so. Obviously the movie is more than just a list of words, and this is why I always wonder what the significance of all these vocabulary studies is for us language hobbyists. I can see their utility in formal language classes and in the design of learning materials, but anybody who has tried to learn a language from a word-frequency list knows that it doesn’t work very well.
The only conclusion I can draw is that a learner of English would do very well to watch Shrek many times and study every word. It may be more useful than watching thousands of podcasts.