SpanishInput wrote:@Cainntear: Yup, sometimes these common n-grams can only be used with far more complex structures.
So you agree with my statement... but haven't said anything about my conclusion, which I believe undermines your main point.
I think what you've done here is simplify your message to the point where you weren't saying what you actually believe, which we all do at times, but the problem is that if you're not arguing what you believe, then anyone arguing against you is simultaneously wrong and right...!
SpanishInput wrote:Funny enough, videos by him are what inspired me to create a corpus of subtitles to be able to extract actual data instead of relying on instinct and personal opinions. I had a student who watched his channel and this student kept coming with idioms that I had not heard in my entire life. I showed my student, with data, that those idioms were exclusive to Spain and weren't even that common within Spain.
I agree with you on this (as a general principle -- I'm not familiar with the series in question):
All too often "how people speak" is used as a justification for an arbitrary series of phrases that are of limited use, specific to certain geographies or are just plain out of date.
However, I think the most important misunderstanding is that "how people speak" claims that there's something more important than grammar, when in fact grammar is how people speak. Try going a day in any European language without using the conditional mood or a single subordinate clause... it's not easy!
As stated in the white paper "Bert Cappelle, Natalia Grabar. Towards an n-grammar of English. Constructionist Approaches to Second Language Acquisition and Foreign Language Teaching, 2016. hal-01426700",
"Corpus-based vocabulary teaching prevents certain ‘pet’ expressions in ESL/EFL, such as raining cats and dogs, from being taught too vigorously, and common but less favorite ones, such as right up your (or his, her, etc.) alley, from being ignored altogether."
So more research on n-grams should prevent the problem of teachers and course creators relying too much on their local dialects and personal biases and could lead to the creation of Spanish courses that better reflect a more universal view of the language.
Indeed, and what was missing from your first post is that you're talking about teachers, not about learners. I've got nothing against discussing what teachers should do, but as this is first and foremost a forum for learners, you do need to be explicit when you're talking about teachers.
I must clarify that n-grams are just raw data. Of course no learner should approach them without guidance. Ideally, course creators would use them to inform what to include in the content, particularly in listening exercises. Not necessarily incorporating them as "things to learn", but as things that are just there, in the content of the course.
Exactly -- raw data, and quite a naïve type of data at that.
Because n-grams only capture adjacency relationships, they don't properly capture all collocations: not all collocations are n-grams, at least not in all their realisations.
For example, we can look at English's separable multi-part verbs -- "pick it up", "pick the box up", "pick up (something else)" -- a single collocation that is represented by a near-infinite number of n-grams. While the bigrams are common, they understate the frequency of the full collocation, because an n-gram model is going to pick up "pick up", "picks up" and "picked up" as different n-grams.
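To make the undercounting concrete, here is a minimal Python sketch. The sentences are invented toy data, the LEMMAS table is a hand-written stand-in for a real lemmatiser, and the three-word window is an arbitrary assumption, so treat it as an illustration rather than a proper corpus method.

```python
from collections import Counter

# Toy sentences invented for illustration; not drawn from any real corpus.
sentences = [
    "pick up the box",
    "pick it up",
    "she picks up the phone",
    "he picked the box up",
    "they picked up the pieces",
]

# Hypothetical hand-written lemma table; a real study would use a lemmatiser.
LEMMAS = {"pick": "pick", "picks": "pick", "picked": "pick"}


def bigram_counts(sents):
    """Count adjacent word pairs exactly as they appear on the surface."""
    counts = Counter()
    for s in sents:
        words = s.split()
        counts.update(zip(words, words[1:]))
    return counts


def count_pick_up(sents, max_gap=3):
    """Count 'pick ... up' in any inflection, with up to max_gap words between."""
    total = 0
    for s in sents:
        words = s.split()
        for i, w in enumerate(words):
            window = words[i + 1 : i + 2 + max_gap]
            if LEMMAS.get(w) == "pick" and "up" in window:
                total += 1
    return total


bigrams = bigram_counts(sentences)
print(bigrams[("pick", "up")])   # the surface bigram "pick up" only
print(count_pick_up(sentences))  # the collocation across its realisations
```

On this toy data the surface bigram is counted once, while the gapped, lemma-aware count finds the collocation in all five sentences.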
The problem is arguably worse with Spanish verbos pronominales (or in fact any transitive verbs) because now we've got multiple unigrams (cuidarse, cuidarme, cuídate, cuídateme, cuídese, etc.), multiple bigrams (me cuido, te cuidas, etc.) and multiple longer and rarer n-grams that together still represent a single construct.
For example, "No te me vayas a dormir hijita" has three words between the reflexive pronoun and the verb it qualifies, and it's still the same construct as "dormirse".
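A crude Python sketch of what it takes to catch that construct: the function has to look for a clitic either attached to the infinitive or standing free a few words before the verb. The DORMIR_FORMS set is a hand-picked, incomplete list and the three-word window is an assumption; without a parser this will also produce false positives (any clitic that happens to sit near a form of "dormir").

```python
import re

# Clitic pronouns that can mark a pronominal verb, attached or free-standing.
CLITICS = ("me", "te", "se", "nos", "os")

# Hypothetical, hand-picked surface forms of "dormir" for this sketch only.
DORMIR_FORMS = {"dormir", "duermo", "duermes", "duerme", "duermas"}


def has_dormirse(sentence, max_gap=3):
    """Heuristically detect 'dormirse', allowing words between clitic and verb."""
    words = re.findall(r"\w+", sentence.lower())
    for i, w in enumerate(words):
        # Enclitic: pronoun attached to the infinitive, e.g. "dormirte".
        if any(w == "dormir" + c for c in CLITICS):
            return True
        # Proclitic: free pronoun followed by a form of dormir within the window.
        window = set(words[i + 1 : i + 2 + max_gap])
        if w in CLITICS and DORMIR_FORMS & window:
            return True
    return False


print(has_dormirse("No te me vayas a dormir hijita"))
print(has_dormirse("Voy a dormir"))
```

The first sentence matches even though no bigram or trigram in it pairs the pronoun with the verb; the second has the verb but no clitic, so it does not match.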
N-grams definitely have their uses (for years, Google Translate got passable results from an approach that was almost entirely based on them!) but it just seems to me that you're overselling them here. They are part of the story in informing teachers about what to teach and in what order, but only a part.