The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

General discussion about learning languages
SpanishInput
Yellow Belt
Posts: 97
Joined: Sun Sep 26, 2021 3:11 pm
Location: Ecuador
Languages: Spanish (N), English (C2), Mandarin (HSK 5)
x 469

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby SpanishInput » Tue Nov 09, 2021 9:18 pm

Iversen wrote: The other problem is that the n-grams on the list come from things like Netflix. My preferred input is not films or series, but documentaries and non-fictional written materials.


Iversen, thank you for commenting! I've read a lot of your posts! I read your entire series of posts on the different approaches to language acquisition and I've learned a lot from it.

Yes, Spanish-language shows on Netflix are not exactly my cup of tea either. It's just that at the moment Netflix happens to be the number one source of Spanish content with parallel English translations, and it has also been the easiest source of text to harvest for my corpus.

I plan to start collecting Spanish texts to create a similar corpus of written Spanish. Do you have any favorite authors/newspapers/books/magazines in Spanish?

@AllSubNoDub: In that particular video I'm not doing anything special besides trying to speak clearly. In other videos I did use software tools while writing the script to make sure I wasn't using too many uncommon words. My accent is just a mix of both sides of my family, which come from different regions.

@BeaP: That's good! Is there a similar course for Spanish?

@luke: Thanks! I didn't mention any of that anywhere because I first want to get a feel of the forum and understand the rules well.

@LeBaron: So true. The other day a student asked me about a particular verb conjugation he was learning. It was a compound conjugation that included a form of haber. I did a search on the entire Netflix corpus, and could not find a single instance where it's used.

@tungemål: Some n-grams are idioms, but most are just raw data that needs to be further explained.

@Cainntear: Yup, sometimes these common n-grams can only be used with far more complex structures.
2 x

Iversen
Black Belt - 4th Dan
Posts: 4787
Joined: Sun Jul 19, 2015 7:36 pm
Location: Denmark
Languages: Monolingual travels in Danish, English, German, Dutch, Swedish, French, Portuguese, Spanish, Catalan, Italian, Romanian and (part time) Esperanto
Ahem, not yet: Norwegian, Afrikaans, Platt, Scots, Russian, Serbian, Bulgarian, Albanian, Greek, Latin, Irish, Indonesian and a few more...
Language Log: viewtopic.php?f=15&t=1027
x 15040

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby Iversen » Wed Nov 10, 2021 12:09 am

SpanishInput wrote:I plan to start collecting Spanish texts to create a similar corpus of written Spanish. Do you have any favorite authors/newspapers/books/magazines in Spanish?


I'm probably not the right person to ask about text collecting - I read homepages on the internet or listen to lectures on YouTube, but I don't collect and store them. The exception would be science magazines on paper, but only when I'm travelling - it would cost a fortune to get them shipped to Denmark. And being on paper they wouldn't really be useful for digital processing. And they don't have a translation - though airline magazines might have one, and many tourist websites have parallel versions. The bilingual texts I use for intensive study are mostly produced with Google Translate, and I do store them when I have used them - but I throw my own scratchy study notes away when the piles become too high...

As for grammatical rarities in languages: I studied French (and a couple of other languages) at the university level during the 70s so I do know about grammars that get lost in details. For instance we used the famous Grevisse grammar "Le Bon Usage", and it struck me that many of its examples came from authors who were reputed for their contrived writing styles (people like André Gide and Céline, just to mention a few names). The problem is that many lesser minds have tried to write in the same way so as a student of French you need to know about those murky corners of the French grammar - but I had the impression that some of the things that book described in painstaking detail ONLY existed in two or three books of such authors - they had never caught on in the community at large, and they should only have been mentioned in footnotes as aberrant cases of language (mis)use.

But let me take a Spanish example: there are two past tense subjunctives, and in some cases only one of them can be used. I want to know from the start that they exist and what they look like, and maybe also that there are cases where only one is possible, but I wouldn't need to know the exact rules at that point (let alone being ready to recite those rules while standing on one leg in the middle of the night). I can learn the details when I start writing my own things, and then I'll be able to remember the rules because I can relate them directly to my own projects.

By the way: my favorite Spanish magazine is "Muy Interesante" (and in Portuguese its counterpart "Super Interessante").
4 x

luke
Brown Belt
Posts: 1243
Joined: Fri Aug 07, 2015 9:09 pm
Languages: English (N). Spanish (intermediate), Esperanto (B1), French (intermediate but rusting)
Language Log: https://forum.language-learners.org/vie ... 15&t=16948
x 3632

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby luke » Wed Nov 10, 2021 1:25 am

BeaP wrote:I share Iversen's opinion. This is an old concept. I think many of you know the French in Action programme. If you've used the audio and the book as well, you must have recognised that it's based on the acquisition of blocks of words, that are combined with each other in different ways. What's very important is that these blocks were written by a professional, experienced teacher with a university behind him.

You will need blocks like the ones collected by Capretz, not the ones that are the most frequent on Netflix according to a computer programme.

I strongly agree with both of you, but want to go on a tangent on how these 5 n-gram snippets could be put into a useful program.

I'm more familiar these days with FSI drills than French in Action, but they share several common traits.

* Extensive full courses
* Mixture of readings, listenings, drills, and writing exercises.

I mention FSI because there are a couple of drill types that would be perfect for this sort of NetFlix n-gram focused training.

By the way, many of those n-grams are in FSI. I haven't looked for them all, but "no no no no no" is not in there :lol:

I'm just getting into NetFlix shows because I want to improve my understanding of colloquial speech. It's more challenging than usual because it's hard to find characters worthy of emulation. Maybe I haven't looked hard enough. If there are any tips on that point, please share. :)

On to my idea, if it isn't clear already... Using an example, "lo único que quiero es"...

Replacement drill style:
Lo único que quiero es un nuevo auto. (example phrase)
abrigo (prompt)
Lo único que quiero es un nuevo abrigo. (response)

But if one has the actual NetFlix sentences, one could pull out 7-21 real examples that could be used as the "prompts", and they're probably more exciting than a new overcoat. :)

Similarly, the Variation drill style could also be used:
The only thing I want is a puppy. (prompt)
Lo único que quiero es un perrito. (response)
The only thing I want is for you to be happy. (prompt)
Lo único que quiero es que estés feliz. (response)

Of course, there again you could use actual sentences from NetFlix.
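Both drill styles above are simple enough to generate automatically. What follows is only a minimal sketch, not anything from FSI or from SpanishInput's tooling: the function names `replacement_drill` and `corpus_drill` are made up for illustration, the example sentences are the ones from this post plus one invented non-matching line, and the default of 7 examples just takes the low end of the "7-21" suggested above.

```python
import re

def replacement_drill(template, prompts):
    """Replacement drill: each prompt word fills the slot in the model sentence.

    The template must contain one "{}" slot. Nothing here checks gender or
    number agreement, so the prompts have to fit the frame ("un nuevo ...").
    """
    return [(prompt, template.format(prompt)) for prompt in prompts]

def corpus_drill(sentences, ngram, limit=7):
    """Drill from real sentences: the cue stops right after the n-gram.

    Returns (cue, full_sentence) pairs for up to `limit` sentences that
    contain the n-gram, matched case-insensitively.
    """
    pattern = re.compile(re.escape(ngram), re.IGNORECASE)
    drills = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            cue = sentence[:match.end()] + " ___"
            drills.append((cue, sentence))
        if len(drills) == limit:
            break
    return drills

# Sentences from the post above, plus one invented line that should not match.
SENTENCES = [
    "Lo único que quiero es un perrito.",
    "No sé qué decir.",
    "Lo único que quiero es que estés feliz.",
]

if __name__ == "__main__":
    for prompt, response in replacement_drill(
        "Lo único que quiero es un nuevo {}.", ["abrigo"]
    ):
        print(prompt, "->", response)
    for cue, answer in corpus_drill(SENTENCES, "lo único que quiero es"):
        print(cue, "->", answer)
```

With a real subtitle corpus, `SENTENCES` would be replaced by the lines of the exported transcripts, so the cues come from actual dialogue rather than textbook sentences.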

I agree with another comment that SpanishInput has a great accent.

And to tie this in with tutoring, imagine you have this sort of exercise available in e-format and the student has access ahead of time. They can practice in advance, then the drilling can happen live in the tutoring session. That's ideal, because you can tailor it on the fly to your pupil and their interests and abilities.

The pupil wins because they're getting common collocations, assuming they want to improve their NetFlix comprehension. The tutor wins because there's a fixed topic and they can go into pronunciation or semantics or whatever the pupil needs.

Where does one find NetFlix transcripts? Subtitles are helpful, but it would be nice to be able to read them away from the TV.
5 x
: 124 / 124 Cien años de soledad 20x
: 5479 / 5500 5500 pages - Reading
: 51 / 55 FSI Basic Spanish 3x
: 309 / 506 Camino a Macondo

AllSubNoDub
Orange Belt
Posts: 172
Joined: Thu Aug 26, 2021 10:44 pm
Languages: English (N)
Speaks: Spanish (B1+), German (B2 dormant)
Learns: Japanese (Kanji only)
Language Log: https://forum.language-learners.org/vie ... 15&t=17191
x 475

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby AllSubNoDub » Wed Nov 10, 2021 1:48 am

luke wrote:Where does one find NetFlix transcripts? Subtitles are helpful, but it would be nice to be able to read them away from the TV.


FSI was the first thing that came to mind for me too (as well as using cloze exercises in an SRS).

https://subscene.com/ has tons of subtitles in SRT format. They basically read like a script when you open them up in a text editor, except they have timestamp info above each line (which you could remove if it really bothers you).
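If the timestamps do bother you, removing them takes only a few lines. This is a rough sketch that assumes ordinary SRT layout (a cue number, a timing line containing "-->", then the dialogue); the function name `srt_to_text` and the sample cues with their timestamps are invented for illustration.

```python
import re

# An SRT timing line looks like "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Simple inline formatting tags such as <i>...</i>.
TAG = re.compile(r"<[^>]+>")

def srt_to_text(srt):
    """Return only the dialogue lines from the contents of an SRT file."""
    raw = [line.strip() for line in srt.lstrip("\ufeff").splitlines()]
    dialogue = []
    for i, line in enumerate(raw):
        if not line or TIMING.match(line):
            continue  # blank separator or timestamp line
        following = raw[i + 1] if i + 1 < len(raw) else ""
        if line.isdigit() and TIMING.match(following):
            continue  # cue number: a digit-only line right before a timing line
        line = TAG.sub("", line).strip()
        if line:
            dialogue.append(line)
    return dialogue

# Invented sample in SRT layout; the timestamps are made up.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Lo único que quiero es un perrito.

2
00:00:04,100 --> 00:00:07,000
<i>No, no, no, no, no.</i>
"""

if __name__ == "__main__":
    print("\n".join(srt_to_text(SAMPLE)))
```

A digit-only line is treated as a cue number only when a timing line follows it, so a spoken line such as "1984" is kept.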
1 x

SpanishInput
Yellow Belt
Posts: 97
Joined: Sun Sep 26, 2021 3:11 pm
Location: Ecuador
Languages: Spanish (N), English (C2), Mandarin (HSK 5)
x 469

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby SpanishInput » Wed Nov 10, 2021 1:50 am

luke wrote:Where does one find NetFlix transcripts? Subtitles are helpful, but it would be nice to be able to read them away from the TV.


Easiest way: Use Language Reactor or Glotdojo to export the text.

Harder, more complete way: Use the Netflix Subtitle Downloader script with Greasemonkey or Tampermonkey to download all episodes of a show. Then use the batch processing feature of Subtitle Edit to strip the formatting and convert the files from vtt to srt, and run it again to convert from srt to txt. Optional: Use the subtitle tools website to also remove song lyrics, which often appear at the beginning of each episode if the show has a theme song. Once you have your small collection of .txt files, you can use AntConc to find the most common n-grams in that collection, and also to find examples of any n-gram you're interested in. This, in a nutshell, is what I did, but for thousands of episodes.
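For anyone who would rather script the last step, the n-gram counting can be roughly approximated in Python. This is a stand-in sketch, not how AntConc works internally: the function names, the letters-only tokenizer and the folder layout (a directory of UTF-8 .txt files) are all assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Runs of letters only, so "único" and "estés" stay whole and
# punctuation and digits are dropped.
TOKEN = re.compile(r"[^\W\d_]+")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())

def count_ngrams(lines, n):
    """Count word n-grams line by line (n-grams never span a line break)."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def count_ngrams_in_folder(folder, n):
    """Add up n-gram counts over every .txt file in a folder."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            counts.update(count_ngrams(handle, n))
    return counts

# Tiny invented corpus standing in for the exported subtitle files.
DEMO_LINES = [
    "Lo único que quiero es un perrito.",
    "Lo único que quiero es que estés feliz.",
    "No, no, no, no, no.",
]

if __name__ == "__main__":
    for ngram, freq in count_ngrams(DEMO_LINES, 5).most_common(3):
        print(freq, ngram)
```

Counting per line keeps n-grams from running across unrelated subtitle cues, at the cost of missing phrases that a subtitle splits over two lines.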
3 x

luke
Brown Belt
Posts: 1243
Joined: Fri Aug 07, 2015 9:09 pm
Languages: English (N). Spanish (intermediate), Esperanto (B1), French (intermediate but rusting)
Language Log: https://forum.language-learners.org/vie ... 15&t=16948
x 3632

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby luke » Wed Nov 10, 2021 2:59 am

AllSubNoDub wrote:
luke wrote:Where does one find NetFlix transcripts? Subtitles are helpful, but it would be nice to be able to read them away from the TV.

https://subscene.com/ has tons of subtitles in SRT format. They basically read like a script when you open them up in a text editor, except they have timestamp info above each line (which you could remove if it really bothers you).

Oooh, nice. They don't have Spanish, but this is clearly Chava (one of the main characters of Club de Cuervos) - even if the subtitles didn't say so:

[Chava] My dad took all of that anger we felt
and turned it into this beautiful city we now live in.
Into a highway, into modern hotels,
into an airport...
and above all, into this stadium.
A beautiful stadium...
[crowd cheering]
... and I know he's watching us from above. :)
1 x

luke
Brown Belt
Posts: 1243
Joined: Fri Aug 07, 2015 9:09 pm
Languages: English (N). Spanish (intermediate), Esperanto (B1), French (intermediate but rusting)
Language Log: https://forum.language-learners.org/vie ... 15&t=16948
x 3632

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby luke » Wed Nov 10, 2021 3:05 am

SpanishInput wrote:Easiest way: Use Language Reactor or Glotdojo to export the text.

Harder, more complete way: Use the Netflix Subtitle Downloader script with Greasemonkey or Tampermonkey to download all episodes of a show.

Holy smokes! We've got a cottage industry going on here.
1 x

BeaP
Green Belt
Posts: 405
Joined: Sun Oct 17, 2021 8:18 am
Languages: Hungarian (N), English, German, Spanish, French, Italian
x 1990

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby BeaP » Wed Nov 10, 2021 7:56 am

SpanishInput wrote:@BeaP: That's good! Is there a similar course for Spanish?

I wanted to give you a short answer, but I feel that some things need to be made clear regarding this topic. Sorry for that. Please don't take it as an offence, but I believe that experienced learners with C2 languages under their belt must weigh their responsibility when suggesting methods to beginners or long-time learners with little success. So when you present something, you need to be very careful and also address the possible problems and pitfalls. If you don't, sooner or later others will come and either do it for you or ask you specific questions and press you to do it. I also need to state that I am not an expert in teaching theory or methodology. I hope those who are will come and correct me.
1. Regarding the title
The concept of learning words in blocks is not widespread in the training of listening comprehension; it's an element of speaking practice. If you can't understand spoken Spanish, it's not because you haven't learned enough n-grams. That's a totally different topic.
2. Terminology
It's a huge problem that this concept is not defined in language learning, at least not in general knowledge. This is why it is always presented as something new, and a lot of learners can't connect the dots or use their older resources for the same goal. It's never totally the same, but it's always very similar. I also thought about FSI, but I haven't learned from it, so I didn't want to mention it here. Anyway, the old-school grammar books did include a lot of sentence drills that helped you to automatise word blocks. As I see it, although language teaching is unfortunately governed mainly by trends, the fine-tuning of the communicative method is in progress, and some old-school elements are being brought back. I've studied Spanish from Difusión coursebooks, and they have recently started to emphasise the teaching of these blocks of words. A quote from the C de C1 book: "El apartado Trabajar el léxico (...) se presta especial atención a las colocaciones y combinaciones frecuentes de palabras con el propósito de que el estudiante sea capaz de manejar léxico específico y adquiera riqueza y precisión." So the key words here are collocations and frequent combinations of words. I don't know of a Spanish course that is completely based on this concept, but using it at least partially is widespread.
3. Unimportant grammar
We have to make a clear distinction between grammars aimed at learners and academic grammars.
Gramática básica del estudiante de español (Difusión)
Uso de la gramática española (Edelsa)
Gramática de uso del español (SM ELE)
None of these contain structures that are not used in everyday conversations. The value of a good grammar book is in giving you good example sentences. Why would a beginner or pre-intermediate student learn from academic books and descriptive grammars?
4. Are we usually taught the less common words?
Well, I don't think that course book writers want to stop us from learning a language. Although there are a lot of books of questionable professionalism, most of them can be used successfully to learn a language. In my experience, if you study an A1 course book, you will get by as a tourist; you won't be painfully missing anything. Actually, the word blocks you've collected from Netflix are very close to 'muletillas' (filler phrases). They don't carry specific information, but they give you some time to think about what to say. Also, words and expressions are normally taught in the order they are taught in because there are other methodological aspects to consider besides frequency. While learning, especially in the beginner phase, I have to feel safe; I have to feel that I understand everything clearly. System and understanding are a normal need of the human mind.
2 x

BeaP
Green Belt
Posts: 405
Joined: Sun Oct 17, 2021 8:18 am
Languages: Hungarian (N), English, German, Spanish, French, Italian
x 1990

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby BeaP » Wed Nov 10, 2021 8:44 am

luke wrote:I want to go on a tangent on how these 5 n-gram snippets could be put into a useful program.
I'm just getting into NetFlix shows because I want to improve my understanding of colloquial speech.

If you want to improve your understanding of colloquial speech, I recommend watching the videos on the Spanish with Vicente YouTube channel. There are some videos where Vicente analyses talk shows. If you want more material, you can take a look at other videos from the same talk shows; they are available for free and contain a lot of valuable material. Vicente also repeats the most useful expressions throughout his videos to help you acquire them. To those with a lower level of Spanish (A2-B1) I can recommend the Español con Juan YouTube channel. It's also based on the repetition of useful phrases and word combinations, plus the help of visual clues and situations. Vicente also has courses you have to pay for. I think they are not necessary because there are plenty of free videos, but if you're interested, he has a course that teaches colloquial expressions extracted from Aquí no hay quien viva and Buscando el norte. You can also find videos from these TV shows on YouTube. I agree with Vicente that if you want to save time and learn colloquial Spanish as quickly as you can, you need to watch TV shows that take place in the present and have everyday people leading everyday lives as protagonists. Sometimes I think that I've seen more Spanish TV series in my life than a Spanish couch potato, and I strongly believe that I didn't learn the same amount of expressions from each of them. On the contrary, the differences were huge. I can second Vicente's recommendations based on my own experience.
5 x

Monty
White Belt
Posts: 47
Joined: Sun Mar 28, 2021 12:09 pm
Languages: ...
x 108

Re: The importance of n-grams or why you still don't understand spoken Spanish even though you've learned it for years

Postby Monty » Wed Nov 10, 2021 9:31 am

luke wrote:Where does one find NetFlix transcripts? Subtitles are helpful, but it would be nice to be able to read them away from the TV.


Subadub Chrome extension.
2 x

