Iversen wrote:There is a reason that all major translation systems are now based on statistics. This doesn't invalidate the attempt to create a comprehensive collection of grammatical rules and write them down in a grammar book, but the role of such a grammar book is mostly to summarize the linguistic behaviour of an unruly bunch of speakers in a pedagogical way. And that's one good reason to distrust any attempt to construct a context-free grammar, shimmering above the ugly world like a Platonic idea.
As I mentioned recently in another thread, I own a rather
delightful 1,800 page doorstop of an English grammar. It focuses on descriptive linguistics—the goal isn't to promote some grand unified theory of linguistics, but rather to carefully and systematically catalog all the raw evidence that any such theory would need to account for.
And the book is full of all kinds of weird little details. I frequently find myself saying, "Huh, I never noticed that before. But it makes sense! What an odd construction. But I've always followed that rule." The grammar of (standard written) English is highly structured. And native speakers agree on the written dialect, despite the vast majority of the rules never being taught. As do many advanced non-native speakers. And, rather curiously, the larger GPT systems can all obey these rules, too, despite lacking any kind of Language Acquisition Device or Universal Grammar.
So there really is a rich, logical structure to grammar. And people who read a lot agree on the "standard written" dialect to a remarkable extent. But nobody teaches the whole thing, except possibly to linguistics graduate students. And even they are delighted when they consciously notice a grammatical pattern that they have been unfailingly obeying all along.
But even though grammar is highly structured, the only algorithms that can really cope with that structure rely heavily on probability and statistics. And this should be unsurprising to those who follow obscure branches of parsing theory.
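To make that concrete, here is a toy sketch of how a statistical parser copes with ambiguity. The grammar fragment and the probabilities are invented for illustration (they are not drawn from any real treebank); the point is only the mechanism: each rule carries a probability, a derivation's probability is the product of its rules, and the parser prefers the likelier derivation of an ambiguous phrase like "saw the man with the telescope":

```python
# A toy probabilistic context-free grammar (PCFG). Each rule maps to an
# invented probability; rules for the same left-hand side compete.
PCFG = {
    ("VP", ("V", "NP", "PP")): 0.2,   # attach "with the telescope" to the verb
    ("VP", ("V", "NP")):       0.5,
    ("NP", ("NP", "PP")):      0.3,   # attach it to the noun phrase instead
    ("NP", ("Det", "N")):      0.7,
}

def derivation_probability(rules):
    """Probability of a derivation = product of its rule probabilities."""
    p = 1.0
    for rule in rules:
        p *= PCFG[rule]
    return p

# Two legal parses of "saw the man with the telescope":
attach_to_verb = [("VP", ("V", "NP", "PP")), ("NP", ("Det", "N"))]
attach_to_noun = [("VP", ("V", "NP")), ("NP", ("NP", "PP")), ("NP", ("Det", "N"))]

# A purely logical parser can only report that both parses are grammatical.
# A statistical parser compares 0.2 * 0.7 against 0.5 * 0.3 * 0.7 and bets
# on the higher-probability derivation.
if derivation_probability(attach_to_verb) > derivation_probability(attach_to_noun):
    print("prefer: attach the PP to the verb")
else:
    print("prefer: attach the PP to the noun")
```

With these made-up numbers the verb attachment wins; retrain the probabilities on different data and the preference can flip, which is exactly the kind of graded judgment a rule-only grammar cannot express.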
To quote Norvig (emphasis added):
Peter Norvig wrote:In 1967, Gold's Theorem showed some theoretical limitations of logical deduction on formal mathematical languages. But this result has nothing to do with the task faced by learners of natural language. In any event, by 1969 we knew that probabilistic inference (over probabilistic context-free grammars) is not subject to those limitations (Horning showed that learning of PCFGs is possible).
Now, "probabilistic context-free grammars" are an over-simplified model of natural language. But even with plain CFGs, it's quite impossible to acquire language via strict logical deduction. You need to pay attention to frequencies, to carefully weigh conflicting evidence, and to reach tentative (but well-supported) conclusions. And to do all that, your natural tools are probability and statistics. Chomsky's "poverty of the stimulus" is basically a way of saying "even with lots of data, you can't rule out many conflicting hypotheses, because you can't prove a negative." But if you see a pattern 1,000 times, and if you never see any counter-evidence, well, then you can't prove anything. But you sure know which way to bet.
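That betting intuition can be sketched as a simple Bayesian update. This is a toy illustration of my own (the uniform Beta(1, 1) prior and the example counts are assumptions, not anything from Horning's proof):

```python
from fractions import Fraction

def posterior_mean(successes, failures, prior_a=1, prior_b=1):
    """Beta-binomial update: start from a uniform Beta(1, 1) prior over
    "how reliably does this construction follow the rule?" and update on
    each observed example."""
    a = prior_a + successes   # examples that fit the rule
    b = prior_b + failures    # counter-examples
    return Fraction(a, a + b)

# 1,000 sightings with zero counter-examples never *prove* the rule,
# but they concentrate belief very close to certainty:
print(posterior_mean(1000, 0))  # 1001/1002, i.e. roughly 0.999
# whereas a learner who has seen the pattern only three times
# should still hedge:
print(posterior_mean(3, 0))     # 4/5
```

No number of positive examples ever drives the posterior all the way to 1, which is the "can't prove a negative" point; but the learner still knows, quantitatively, which way to bet.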
So we learn these elaborate and surprisingly logical structures. And we ultimately do most of it via unconscious acquisition, because the grammar we learn is too damn big, and the majority of it is never taught anywhere. Chomsky chalks this mystery up to a poorly-described "Language Acquisition Device", which he sometimes treats as if it had appeared in our brains by magic. But he severely underestimates how rich a grammar can be acquired by statistically inferring patterns from examples.
Iversen wrote:Next step: once clever grammarians have summarized a set of empirically based rules it is totally absurd not to learn from it. If I know how to inflect the majority of Bulgarian nouns it is not because I have listened to thousands of hours of Bulgarian or read 10.000 pages (I haven't) - it's because I have looked on some tables (and worked with them to produce my own summaries in the form of green sheets), and because of that I now know how to interpret the things I meet in my Bulgarian study texts.
Indeed. The central problem of an adult language learner is how to start the entire process. You can't just watch TV and hope to learn Mandarin from scratch in any sort of timely fashion, because it's just a wall of incomprehensible input. Children have the benefit of parents who patiently repeat "Please put on your pants" and "No, we do not put peas on the dog" thousands of times. (It's nature's own Anki deck!) And so for adults, it can certainly be practical to memorize a bunch of words and a bunch of grammatical rules, and use that as a starting point.
But those methods will not ultimately allow you to learn the 1,800 pages of grammar that native speakers and even advanced students know. (If only because nobody actually teaches most of those rules.) At some point, you can start reading freely, and listening to people speak. And then you'll start to pick up on all sorts of subtle patterns via sheer exposure.
Now, personally I am quite happy to acquire much of my grammar from a Subs2SRS deck and a grammatical overview printed on a laminated placemat. But that's not the only method that works to start the process!
Iversen wrote:So to revert to Krashen: I have adopted his formulation about comprehensible language, but rejected his way of using it (and his distrust of formal studies in particular).
My take on Krashen is that he's a man with one extremely good idea, and a gift for promoting it. Large amounts of comprehensible input really do help students to internalize a language, and many important rules are learned by sheer exposure. He's far from the first person to notice this—Alphonse Chérel certainly had some similar notion in mind when he drew up Assimil in 1929. And judging from the few accounts I've seen of language students before 1900, even the most brutal teachers of classical Greek who used the grammar-translation method still tended to assign massive amounts of translation. They might have used "intensive" methods, but they certainly used "extensive"
quantities of texts.
But Krashen took his admittedly valuable idea, and he promoted it to the exclusion of all else. He overlooked the role that practicing output plays in teaching
most people to speak. He cast aside many commonly-used tools that have long helped students gain that initial foothold.
None of this particularly bothers me, but that's because I have a personal soft spot for people with one very good idea. Even when they're ultimately wrong about many things, they can still contribute to better theories.