Iversen wrote: The other problem is that the n-grams on the list come from things like Netflix. My preferred input is not films or series, but documentaries and non-fictional written materials.
Iversen, thank you for commenting! I've read a lot of your posts! I read your entire series of posts on the different approaches to language acquisition and I've learned a lot from it.
Yes, Spanish-language shows on Netflix are not exactly my cup of tea either. It's just that at the moment Netflix happens to be the number one source of Spanish content with parallel English translations, and it has also been the easiest source of text to harvest for my corpus.
I plan to start collecting Spanish texts to create a similar corpus of written Spanish. Do you have any favorite authors/newspapers/books/magazines in Spanish?
@AllSubNoDub: In that particular video I'm not doing anything special besides trying to speak clearly. In other videos I did use software tools while writing the script to make sure I wasn't using too many uncommon words. My accent is just a mix of both sides of my family, which come from different regions.
@BeaP: That's good! Is there a similar course for Spanish?
@luke: Thanks! I didn't mention any of that anywhere because I first want to get a feel for the forum and understand the rules well.
@LeBaron: So true. The other day a student asked me about a particular verb conjugation he was learning. It was a compound conjugation that included a form of haber. I searched the entire Netflix corpus and could not find a single instance of it being used.
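A search like that can be sketched roughly as below. This is only an illustration, not the actual tool used: the function name, the sample lines, and the choice of the future perfect (habré + participle) as the target form are all assumptions, since the post does not say which conjugation it was.

```python
import re

# Hypothetical target: a future-tense form of "haber" followed by a
# regular-looking past participle (-ado / -ido). Irregular participles
# (hecho, dicho, visto, ...) are not covered in this sketch.
HABER_FUTURE = r"(?:habré|habrás|habrá|habremos|habréis|habrán)"
PATTERN = re.compile(rf"\b{HABER_FUTURE}\s+\w+(?:ado|ido)\b", re.IGNORECASE)

def find_compound_forms(lines):
    """Return (line_number, matched_text) for every hit in the given lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(0)))
    return hits

# Made-up sample lines standing in for subtitle text.
sample = [
    "Para mañana habré terminado el informe.",
    "No he visto esa serie.",
    "¿Habrán llegado ya?",
]
print(find_compound_forms(sample))
```

An empty result over a whole corpus would be the "not a single instance" case described above.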
@tungemål: Some n-grams are idioms, but most are just raw data that needs to be further explained.
@Cainntear: Yup, sometimes these common n-grams can only be used with far more complex structures.