s_allard wrote:
ryanheise wrote:
So slightly easier, but still on the difficult end (and looking at the vocabulary in Shrek compared to a typical conversational podcast, I am not too surprised by that.)
I’m really intrigued by the fact that the children’s movie Shrek requires more vocabulary than around 35000 adult podcasts, most of which probably last 20–30 minutes each, whereas Shrek lasts only 95 minutes.
Since I already explained before why length isn't significant, this time I'll try a demonstration rather than an explanation.
The experiment: I cut the movie Shrek into two halves.
Now, what do you expect the analysis to reveal? Do you expect that, because each split is half the length, each split will be more comprehensible to people who have a smaller vocabulary?
Let's see. The original vocabulary size required for the whole was 11064.
After splitting it in two halves, the vocabulary size required for each half is:
1st half: 11869
2nd half: 10218
So the first half used slightly rarer words, but if we take the average of the two scores (about 11044), we get something closely resembling the original number.
Now I'm not saying that each half should have roughly the same vocabulary requirement, only that we should expect the average to come close to the original. One half of the movie may well be more difficult to understand than the other. So if we examine the whole, we'll get one overall figure, and if we examine each half, we'll get more localised figures.
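For anyone who wants to check the arithmetic, here is a minimal sketch using only the three figures quoted above. The variable names are mine, and this only averages the reported numbers; it does not reproduce the analysis that generated them.

```python
# Vocabulary sizes reported in this post (whole movie and each half).
whole = 11064
halves = {"1st half": 11869, "2nd half": 10218}

# Simple (unweighted) mean of the two half-movie figures.
mean_of_halves = sum(halves.values()) / len(halves)

# How far the mean of the halves is from the whole-movie figure.
gap = whole - mean_of_halves
relative_gap = abs(gap) / whole

print(f"mean of halves: {mean_of_halves}")            # 11043.5
print(f"gap to whole:   {gap}")                       # 20.5
print(f"relative gap:   {relative_gap:.2%}")          # 0.19%
```

So halving the length moved the average by well under one percent, even though the individual halves differ from each other by more than 1600 words.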
Sure, podcasts can be very conversational, but isn’t a movie all conversational?
(edit: someone else already made this point that movies are more carefully scripted, but I can't remember who, sorry!)
I know that when I'm trying to be eloquent, or if I'm writing a script for a talk or lecture, I will search for the best word to use even if it is a rarely used word. But if I'm speaking casually off the top of my head, the words I tend to find right there at the top of my head are the ones I use or hear frequently. A movie is carefully scripted.
The figure of 11094 words of required vocabulary reported here can be explained by differences of methodology that I will leave to the author to explain. But the question remains: what words of Shrek are so different from those of 35000 podcasts?
You may have some idea from the words pasted above. If a movie is set in the Middle Ages with kings and queens, and uses words like "beset", do you expect those to be words people use frequently in casual conversation?
The interesting tidbit that strikes me is that with 357 words one gets 80% coverage in Shrek. So few words for a lot of coverage and probably 0 comprehension. And to go from 95% to 93% coverage you have to more than double the number of words. Fascinating indeed.
Even at 98% coverage, it's not really a great way to determine comprehension.
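To make the coverage idea concrete, here is a rough sketch of how one could count the number of distinct words needed to cover a given share of a text's running words. This is an assumption about one simple way to do it, not the method behind the figures in this thread: the function name, the tokenizer, and the toy sentence are all hypothetical, and it ranks words by their frequency within the text itself rather than against an external frequency list.

```python
import re
from collections import Counter


def words_needed_for_coverage(text: str, target: float) -> int:
    """Return how many of the text's most frequent distinct words are
    needed so that they account for at least `target` (0..1) of all
    running words in the text."""
    # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0
    counts = Counter(tokens)
    total = len(tokens)
    covered = 0
    # Walk from most to least frequent word, accumulating coverage.
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return n
    return len(counts)


# Toy example, not real movie data.
sample = "the ogre and the donkey and the dragon beset the castle"
for target in (0.5, 0.8, 1.0):
    print(target, words_needed_for_coverage(sample, target))
# prints: 0.5 2, then 0.8 5, then 1.0 7
```

Even in this toy sentence, two words ("the" and "and") cover half of the text while carrying none of its meaning, which is the same reason a high coverage figure on its own says little about comprehension.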