The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

SpanishInput
Yellow Belt
Posts: 97
Joined: Sun Sep 26, 2021 3:11 pm
Location: Ecuador
Languages: Spanish (N), English (C2), Mandarin (HSK 5)

Postby SpanishInput » Tue Nov 09, 2021 3:08 pm

An n-gram is just a general term for any sequence of n consecutive words.
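To make that concrete, here is a minimal Python sketch that lists the n-grams of a sentence for a given n, and then counts the 1-word and 2-word ones together in a single ranking. It is only an illustration under assumptions, not the method behind any figures in this post: the function names (tokenize, ngrams, combined_ranking) and the sample sentence are made up, and the tokenizer is deliberately crude (it ignores punctuation, so in longer texts n-grams can span sentence boundaries).

```python
import re
from collections import Counter

def tokenize(text):
    # Crude assumption: lowercase, then keep only runs of (Spanish) letters.
    return re.findall(r"[a-záéíóúüñ]+", text.lower())

def ngrams(tokens, n):
    # Every run of n consecutive tokens, joined back into one string.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def combined_ranking(text, sizes=(1, 2)):
    # Count n-grams of all requested sizes in one shared frequency list.
    tokens = tokenize(text)
    counts = Counter()
    for n in sizes:
        counts.update(ngrams(tokens, n))
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "lo que quiero es lo que tú quieres"  # made-up sentence
print(ngrams(tokenize(sample), 2)[:3])  # ['lo que', 'que quiero', 'quiero es']
print(combined_ranking(sample)[:3])     # [('lo', 2), ('lo que', 2), ('que', 2)]
```

On real subtitles you would probably want to split the text into sentences or subtitle lines first, so that n-grams don't run across boundaries.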
Individual words are 1-word n-grams.
"lo que" is the most common 2-word n-gram in spoken Spanish. It's actually more common than these individual words: "todo, como, muy, su, o, al, así, esta", and way, way more common than almost any verb form you might have been forced to memorize: the only two verb forms more common than "lo que" are "es" and "está". "Lo que" is also orders of magnitude more frequent than words usually introduced in the first lessons of a course, such as "hombre" and "mujer". If we make a list of all n-grams in spoken Spanish, including 1-word n-grams (aka individual words), "lo que" is #31. This means any course should be teaching it in the very first few lessons. But I've seen learners who know a lot of grammar terminology and verb conjugations and have been learning for months or even years, and yet are tripped up when they see or hear "lo que".

"Lo que" is just the tip of the n-gram iceberg. You can find lots of common n-grams that include only the 100 most common Spanish words, and yet can trip people who've been learning for months:
no tiene nada que ver
a mí me parece que
a no ser que
siempre y cuando
hace mucho que
lo de siempre
a lo mejor
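
If you want to check which phrases are built entirely from very frequent words, a rough Python sketch could look like this. Everything here is an assumption for illustration: the helper name uses_only_common_words is made up, and the small `common` set is a hand-picked stand-in, not real frequency data. You would load an actual top-100 list from whatever corpus you trust.

```python
def uses_only_common_words(phrase, common_words):
    # True when every word of the phrase is in the given set of frequent words.
    return all(word in common_words for word in phrase.lower().split())

# Hypothetical stand-in for a real top-100 word list (NOT real frequency data).
common = {"a", "lo", "de", "que", "no", "mejor", "siempre", "y", "cuando"}

phrases = ["a lo mejor", "siempre y cuando", "no tiene nada que ver"]
easy_looking = [p for p in phrases if uses_only_common_words(p, common)]
print(easy_looking)  # ['a lo mejor', 'siempre y cuando']
```

Phrases that pass this kind of filter are the ones that look easy on paper, because every word is familiar, but still need to be learned as a unit.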

As Michael Lewis states in The Lexical Approach, not much of what native speakers say can be considered original. We just unconsciously put together a bunch of n-grams when we speak. We don't grab a bunch of vocabulary ingredients and cook them with a bunch of grammar rules. We just use these pre-cooked chicken nuggets.

The nice thing about n-grams is that they include both vocabulary and grammar in nice little bite-sized packages.

I've seen learners waste time with theoretical verb conjugations that are actually almost never (if ever) used. Those are like theoretical elements in the periodic table. They're just included in apps and books for completeness. They might be possible, but they're not probable. And if you want to use your study time efficiently, you must focus on the probable, not the possible. This is a living language, not theoretical physics, after all.

So, what can you do? You could use AntConc (a free tool for linguists) to analyze the text you're reading or the subtitles of the TV show you're watching and find the most common n-grams. The data is going to be raw and without explanations, so you might want to go through it with your teacher. You can then use AntConc (I have no connection to AntConc, BTW, in case you guys think I'm spamming here) to find real examples for further clarification. This is much better than the theoretical approach. Here's a list of the most common 5-word n-grams in Spanish to get you started. I extracted this from a corpus of Netflix subtitles. All of these appear in at least 20% of the shows in the corpus:

lo que pasa es que
no tiene nada que ver
es la primera vez que
la verdad es que no
tiene nada que ver con
y eso es lo que
lo que tienes que hacer
no sé de qué me
todo va a estar bien
lo único que quiero es
vamos a hacer una cosa
yo no tengo nada que
a mí no me gusta
después de todo lo que
me di cuenta de que
lo que tengo que hacer
yo no me voy a
eso no va a pasar
no no no no no
no tengo nada que ver
por qué no me lo
te das cuenta de que
a mí me parece que
no voy a permitir que
no voy a dejar que
voy a decir una cosa
no es la primera vez
no me voy a ir
lo que te voy a
todo va a salir bien
y qué vas a hacer
por qué no me dijiste
de una vez por todas
no va a pasar nada
lo que vamos a hacer
no te das cuenta de
que pasa es que no
a mí no me importa
y la verdad es que
que a mí no me
ya te dije que no
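
For anyone who prefers a script to a GUI tool, the "appears in at least 20% of the shows" filter can be sketched in Python roughly as follows. This is an illustration under assumptions, not the AntConc workflow that produced the list above: the function names, the `shows` dictionary (show name mapped to a list of subtitle lines) and the tiny sample data are all hypothetical. N-grams are built per subtitle line here, so they never cross line boundaries.

```python
import re
from collections import Counter

def tokenize(line):
    # Crude assumption: lowercase, then keep only runs of (Spanish) letters.
    return re.findall(r"[a-záéíóúüñ]+", line.lower())

def ngrams_in_show(lines, n=5):
    # Distinct n-grams of one show; a set, so each show counts an n-gram once.
    found = set()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            found.add(" ".join(tokens[i:i + n]))
    return found

def widespread_ngrams(shows, n=5, min_share=0.2):
    # Count in how many shows each n-gram occurs, then keep the widespread ones.
    show_counts = Counter()
    for lines in shows.values():
        show_counts.update(ngrams_in_show(lines, n))
    threshold = min_share * len(shows)
    kept = [gram for gram, count in show_counts.items() if count >= threshold]
    # Most widespread first; ties are broken alphabetically.
    return sorted(kept, key=lambda gram: (-show_counts[gram], gram))

# Made-up miniature "corpus" of three shows.
shows = {
    "show_a": ["Lo que pasa es que no sé."],
    "show_b": ["Todo va a estar bien.", "Lo que pasa es que sí."],
    "show_c": ["Todo va a salir bien."],
}
print(widespread_ngrams(shows, n=5, min_share=0.5))  # ['lo que pasa es que']
```

Counting shows rather than raw occurrences keeps one repetitive series from dominating the list, which is the point of a "percentage of shows" threshold.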

MorkTheFiddle
Black Belt - 2nd Dan
Posts: 2132
Joined: Sat Jul 18, 2015 8:59 pm
Location: North Texas USA
Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
Language Log: https://forum.language-learners.org/vie ... 11#p133911

Postby MorkTheFiddle » Tue Nov 09, 2021 7:01 pm

Good idea, and not just for Spanish.
BTW, several posts in this forum mention AntConc. I didn't know how to use it to find n-grams, but this YouTube video helped.
Thanks for the post.
Many things which are false are transmitted from book to book, and gain credit in the world. -- attributed to Samuel Johnson

luke
Brown Belt
Posts: 1243
Joined: Fri Aug 07, 2015 9:09 pm
Languages: English (N). Spanish (intermediate), Esperanto (B1), French (intermediate but rusting)
Language Log: https://forum.language-learners.org/vie ... 15&t=16948

Postby luke » Tue Nov 09, 2021 7:06 pm

SpanishInput wrote:An n-gram is just a general term for any group of words.
Individual words are 1-word n-grams.

You can use AntConc (I have no connections to AntConc, BTW, in case you guys think I'm spamming here) to then find real examples for further clarification. This is much better than the theoretical approach. Here's a list of the most common 5-word n-grams in Spanish to get you started. I extracted this from a corpus of Netflix subtitles. All of these appear in at least 20% of the shows in the corpus:

lo que pasa es que
.
.
.
ya te dije que no

That is a very interesting post! (Not just the part I quoted).

I'd noticed that mic in your avatar and decided to investigate :)

I'm surprised your YouTube channel is not in your personal info.

I should put this in my log, but since I'm here and writing and seeing you've got some very interesting content, I'll plug this one:



And apparently someone has a website to go along with it:

SpanishInput - https://www.spanishinput.com/

I like what I've seen and heard so far!

Le Baron
Black Belt - 3rd Dan
Posts: 3578
Joined: Mon Jan 18, 2021 5:14 pm
Location: Koude kikkerland
Languages: English (N), fr, nl, de, eo, Sranantongo,
Maintaining: es, swahili.
Language Log: https://forum.language-learners.org/vie ... 15&t=18796

Postby Le Baron » Tue Nov 09, 2021 7:17 pm

SpanishInput wrote:I've seen learners waste time with theoretical verb conjugations that are actually almost never (if ever) used. Those are like theoretical elements in the periodic table. They're just included in apps and books for completeness. They might be possible, but they're not probable. And if you want to use your study time efficiently, you must focus on the probable, not the possible.

Most definitely. I didn't know the n-grams name until now, but I stumbled into the idea of it quite some time ago, as I'm sure others have. The fact of the matter is that only a handful of verb conjugations ever get used. This also goes for complicated vocabulary and even collocations/linking phrases that tend to be used less than people think (even if it is worth knowing them to sound 'educated').

I always tell people that native speakers of any language generally talk in patterns, with many ready-made blocks of language put together like Lego. That is everyday language, though. There are people who speak beyond that and can extemporise in a more florid manner; nevertheless, they still use the common blocks to link all this together, and being familiar with these gets you a long way.
Pedantry is properly the over-rating of any kind of knowledge we pretend to.
- Jonathan Swift

tungemål
Blue Belt
Posts: 949
Joined: Sat Apr 06, 2019 3:56 pm
Location: Norway
Languages: Norwegian (N)
English, German, Spanish, Japanese, Dutch, Polish
Language Log: https://forum.language-learners.org/vie ... 15&t=17672

Postby tungemål » Tue Nov 09, 2021 7:53 pm

"n-gram" seems to be a term used in computational linguistics. Isn't that what we in language learning refer to as collocations, expressions, idioms and so on? I noticed when I studied Spanish that there are a lot of these small expressions that one needs to learn, and that it's not enough to know single words. Maybe more than in other languages.

But contrary to popular belief, I think all the Spanish verb conjugations are actually regularly used, so you need to eventually learn them. Maybe except the future subjunctive - I only saw that one once in the 3-4 books that I read.

Iversen
Black Belt - 4th Dan
Posts: 4782
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027

Postby Iversen » Tue Nov 09, 2021 8:07 pm

The items called n-grams have (under different names) been known for a long time, and people have been advised to learn them at an early stage as a supplement to single words. The new thing in SpanishInput's input is that it apparently gives people (linguists) some tools to extract these thingies from TV shows and similar sources, and nobody can be against that. On the other hand, there are some things which I personally am not quite happy about.

Point one: learning a bunch of short phrases is like learning a bunch of long words, and for that reason I would not want to learn any expression unless I knew all its elements - partly because I then remember them better, partly because I want to understand the thought behind each expression - and here I have to say that very few expressions are purely nonsensical. The criterion for naming something an idiomatic expression is not that it is incomprehensible, but more that you couldn't have guessed that things were to be said in that way - and therefore you have to learn those expressions by heart. The n-grams are not all unguessable, but still something you have to learn - sooner or later.

The other problem is that the n-grams on the list come from things like Netflix. My preferred input is not films or series, but documentaries and non-fictional written materials. I don't say that the n-grams shouldn't be learned (and learned early), but just that my preferred input contains other communication patterns. And that includes 'rare' verbal forms etc. If I read more literature I would definitely have to learn even more supposedly rare forms because sometimes they do pop up, and they pop up even in non-fictional materials, even though the first and second person forms are less numerous there than in casual small talk. And speaking of grammar: maybe it is a personal preference, but I like to know the landscape before I venture into any part of it. It's like having a map of an area instead of just an explanation of how to go from A to B. I also want to know where C and D are, even though I may not go there soon.

I have the same relationship with the rare words in a dictionary: I don't intend to speak to anybody before I can have a real discussion, and that means that I have to be able to understand what the other person or persons say. And that also means that I have to know a lot of rare words and constructions, at least passively.

So the n-grams and the method to distill them from a stream of Netflix babble are definitely valuable, but they should not be the only thing you learn at an early stage.

Le Baron
Black Belt - 3rd Dan
Posts: 3578
Joined: Mon Jan 18, 2021 5:14 pm
Location: Koude kikkerland
Languages: English (N), fr, nl, de, eo, Sranantongo,
Maintaining: es, swahili.
Language Log: https://forum.language-learners.org/vie ... 15&t=18796

Postby Le Baron » Tue Nov 09, 2021 8:08 pm

tungemål wrote:I noticed when I studied Spanish that there are a lot of these small expressions that one needs to learn, and that it's not enough to know single words. Maybe more than in other languages.

It's almost identical for French/Italian and other Romance languages (though I have no direct knowledge of Portuguese, Catalan, etc.). In French you have all the same sorts of expressions (bien que, autant que, etc.) that are pretty much cognate with Spanish. And those collocations extend to other language groups.

Cainntear
Black Belt - 3rd Dan
Posts: 3526
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc

Postby Cainntear » Tue Nov 09, 2021 8:20 pm

SpanishInput wrote:"Lo que" is just the tip of the n-gram iceberg.

Indeed, but the problem with icebergs is that they tend to sink large ships...

SpanishInput wrote:As Michael Lewis states in The Lexical Approach, actually not a lot of what native speakers say can be considered original. We just unconsciously put together a bunch of n-grams when we speak. When we speak, we don't grab a bunch of vocabulary ingredients and cook them with a bunch of grammar rules. We just use these pre-cooked chicken nuggets.

I agree that "lo que" is generally left until far too late in most courses, but the reason it isn't introduced earlier is that it relies on grammatical concepts that go far beyond n-grams. Relative clauses don't need to be considered complex (we all have them in our L1s), but I still think "lo que" makes much more sense if you've been dealing with the plain conjunction "que" first. If you go by n-grams, "lo que" is almost certainly more common than any other n-gram containing the conjunction "que", so it leads to teaching in the wrong order.

SpanishInput wrote:The nice thing about n-grams is that they include both vocabulary and grammar in nice little bite-sized packages.

Well, some grammar. The problem is that there is more grammar involved in connecting the collocation/phrase to the rest of the sentence than there is inside the phrase. Pretty much everything with "lo que" is on the outside.

AllSubNoDub
Orange Belt
Posts: 172
Joined: Thu Aug 26, 2021 10:44 pm
Languages: English (N)
Speaks: Spanish (B1+), German (B2 dormant)
Learns: Japanese (Kanji only)
Language Log: https://forum.language-learners.org/vie ... 15&t=17191

Postby AllSubNoDub » Tue Nov 09, 2021 8:40 pm

SpanishInput wrote:

Sorry to derail, but may I ask how "easy" you are making this to comprehend for foreigners? I understand 100% of this, so I'm selfishly trying to gauge my own listening comprehension. A lot of my interests are language- or Spanish-related (the Spanish language itself), and I think many of the videos I watch speak in a slow, didactic manner with simple vocabulary. Me gusta tu acento, es muy clarito.

BeaP
Green Belt
Posts: 405
Joined: Sun Oct 17, 2021 8:18 am
Languages: Hungarian (N), English, German, Spanish, French, Italian

Postby BeaP » Tue Nov 09, 2021 9:02 pm

I share Iversen's opinion. This is an old concept. I think many of you know the French in Action programme. If you've used the audio and the book as well, you must have recognised that it's based on the acquisition of blocks of words that are combined with each other in different ways. What's very important is that these blocks were written by a professional, experienced teacher with a university behind him. I would never use Netflix to guide my learning, and I'd never recommend it to beginners. Netflix is good for comprehensible input up to a certain point, but that's it. If you compare the language of a typical Netflix show with that of a talk show on YouTube, you'll see the difference. I know that a lot of people hate cramming conjugations, but they are not useless. We had to learn all the German conjugations and the cases of the articles by heart at high school. I hated it, but after a year I used them without having to think about them, and I owe my German fluency in large part to this knowledge. I also don't think that conjugated verbs are rare in the Spanish language. If you don't just want to speak, but also want to say something meaningful, you will need conjugated verbs. You will need blocks like the ones collected by Capretz, not the ones that are the most frequent on Netflix according to a computer programme.