How not to learn Spanish: Building too much stuff, not studying enough

Postby **emk** » Thu Mar 21, 2024 12:13 am

Today's study. After several evenings of coding, I worked on importing a new song (see below). But mostly just card reviews.

52 cards in 19.5 minutes (while walking).
15 new cards.

Major new feature: DIRECT EXPORT TO ANKI! We now support the full power of AnkiConnect. This means you no longer need to create models and templates, and you no longer need to manually import CSVs and media files! Between this and the new AI-based transcription and translation features, it has never been easier to use substudy. Compared to a month ago, I think I've eliminated at least 90% of the messing around.

Things you still need to know:

Installing and working with (unsigned) command-line tools.
Installing Anki plugins and setting up AnkiConnect. This may involve changing your local firewall rules.
Basic Anki concepts.

Let me walk you through the process. Let's start with a song I bought years ago. This one has lots of simple, clear audio:

If I have a DRM-free music file, I can transcribe it, translate it, and turn it into Anki cards in under a minute. Let's start with a transcription:

Code: Select all

substudy transcribe eres_para_mí.m4a --example-text eres_para_mí.txt > eres_para_mí.es.srt

At this point, I opened up the SRT file in a subtitle editor and double-checked it. I found about 5 errors this time, most of them involving Lo oigo todo el tiempo. After fixing these and saving, I translated the subtitles and exported them to Anki:

Code: Select all

substudy translate eres_para_mí.es.srt --native-lang=en > eres_para_mí.en.srt

substudy export anki eres_para_mí.m4a eres_para_mí.es.srt eres_para_mí.en.srt --deck "Español::Música" --tag "substudy" --skip-duplicates

This will take care of creating the necessary Anki model and card templates, if you don't have them already. And if you're working with songs, then you can just pass "--skip-duplicates", and you won't get 3 copies of the refrain. You can pass "--tag" multiple times if you like to add lots of tags.

And the cards all appear in Anki!

You can download the newest substudy from the usual places!
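For readers curious what a direct export involves, here is a minimal Python sketch of the kind of JSON request AnkiConnect accepts. It is not substudy's actual implementation. The model name, the field names, and the duplicate check (a simple normalized-text comparison) are all assumptions made for illustration; only the "addNotes" action, the version number, and the default address come from AnkiConnect itself.

```python
import json

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_add_notes_request(lines, deck, tags, skip_duplicates=True):
    """Build an AnkiConnect "addNotes" request body from (foreign, native) pairs."""
    seen = set()
    notes = []
    for foreign, native in lines:
        key = " ".join(foreign.lower().split())  # normalize case and whitespace
        if skip_duplicates and key in seen:
            continue  # e.g. a refrain that repeats several times in a song
        seen.add(key)
        notes.append({
            "deckName": deck,
            "modelName": "Example Audio Card",  # hypothetical model name
            "fields": {"Foreign": foreign, "Native": native},  # hypothetical fields
            "tags": list(tags),
        })
    return {"action": "addNotes", "version": 6, "params": {"notes": notes}}


# Made-up example lines; the first one repeats like a refrain.
request = build_add_notes_request(
    [("Eres para mí", "You are for me"),
     ("Lo oigo todo el tiempo", "I hear it all the time"),
     ("Eres para mí", "You are for me")],
    deck="Español::Música",
    tags=["substudy"],
)
print(json.dumps(request, ensure_ascii=False, indent=2))
```

Sending this body as an HTTP POST to ANKI_CONNECT_URL while Anki is running would create the notes. The sketch stops short of the network call so that it runs without Anki installed.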

Highlights from today's reviews. Lots of good stuff today during my walk. But lots of hard cards from a few days ago are already becoming easier.

The first card is kind of a garbage card: It has four loosely-related vocabulary words, pronounced very quickly. If I didn't like this song so much, I'd just delete the card. Deletion is always a good idea! Instead, I added a fun emoji hint, and I lowered my standards for this card. :lol:

If I can mostly understand it while staring at the emoji, I'll pass it.

Remember, no card ever matters very much. I'm distilling interesting input, not killing myself to memorize isolated facts perfectly. I have faith that if something's important, I'll see it again soon. If I don't see it again, then I didn't need to know it!

The second card has seis mil millones. So Spanish maybe does "thousand million" instead of "billion", like many languages? Meh, what's a factor of a thousand among friends? :roll:

I do notice lots of little details like this when I review these cards—this kind of substudy deck can be a very intensive activity if you keep your eyes open.

I'm still getting beat up by Spanish's [β], and by relative clauses with no subject pronoun. The real-time, automatic-listening parts of my brain don't really believe in either yet. But, I mean, it's fine, I'm making progress, and I can make lots more cards very easily. And my goal here isn't just to consciously understand these features, it's to make them automatic.

It might give the impression that I'm just goofing off with music and TV, and not actually studying. :lol:

But I pay a lot of attention to what things sound like, how they're said, and anything that looks weird. I'm not even trying to explain everything—I still haven't looked up my conjugation tables since restarting Spanish—but I'm definitely noticing.

Postby **emk** » Thu Mar 21, 2024 11:56 am

emk wrote:It might give the impression that I'm just goofing off with music and TV, and not actually studying. But I pay a lot of attention to what things sound like, how they're said, and anything that looks weird. I'm not even trying to explain everything—I still haven't looked up my conjugation tables since restarting Spanish—but I'm definitely noticing.

Just to elaborate on this, I think this is another subtle barrier to widespread adoption of these tools: I'm doing a bunch of stuff that isn't obvious. What do I pay attention to when reviewing cards? When do I decide to pass a card for the day? How much grammar am I figuring out because I've already studied grammar in other languages?

I don't think I can just hand an arbitrary student an Anki deck and automatically get good results. My mental engagement and my knowledge of languages are important parts of the process. (I don't think this would be a problem for anyone reading this log! But it might be an obstacle for a mass audience.)

What I'm doing is basically an Assimil passive wave, except with much faster audio, and with heavy use of spaced repetition to efficiently "overlearn" the material. Or alternatively, this is just a variant of the L-R approach. It's all the same idea, really. And if I miss Assimil's grammar notes, I can dig up my copies of a laminated poster or a short grammar.

What's missing here, though, is Assimil's active wave. I'm just enough of a Krashenite to focus on learning from massive input. No matter what some people insist, you really can learn vast amounts by osmosis. But unlike Krashen, I don't believe that output is automatic. If I want output, I'm going to have to train it. Sure, I'm already getting fragments of output. But I'll eventually need to focus on it.

And the active wave is where you can't fool yourself any longer. Were you really paying attention? Did you ultimately internalize any of the details, or were you just settling for the gist?

So I will probably be experimenting with ideas for an active wave. Khatzumoto had some clever ideas for bulk production of cloze cards that I don't think anyone ever really talked about. Or I could just tag 100 promising cards and mark them for active production of a single line.

Postby **emk** » Fri Mar 22, 2024 12:56 am

In order to support an actual substudy UI, I'm going to need some way to store sets of media and subtitle files together. The plan has always been to use the aligned media format.

I was just testing Anthropic's Claude 3 models, to see if they were viable translators. (Spoiler: Claude 3 Haiku, their smallest and cheapest model, translates MC Solaar quite well. And it doesn't lose track in long songs as easily as GPT 3.5. And it costs less to use.)

Anyway, I tossed in the command-line parsing code from Rust, and asked Claude 3 Opus (the expensive slow one) to summarize the current arguments. It gave me:

Code: Select all

substudy clean <subs>
substudy combine <foreign_subs> <native_subs>
substudy export aligned <video> <foreign_subs> [<native_subs>] [-o|--out <out>]
substudy export anki <video> <foreign_subs> [<native_subs>] --deck <deck> [--tag <tags>...] [--skip-duplicates]
substudy export csv <video> <foreign_subs> [<native_subs>]
substudy export review <video> <foreign_subs> [<native_subs>]
substudy export tracks <video> <foreign_subs>
substudy import whisper-json <whisper_json>
substudy list tracks <video>
substudy transcribe <video> --example-text <example_text> [--format <format>]
substudy translate <foreign_subs> --native-lang <native_lang>

I had a little discussion with it about various ideas for reorganizing these around the new "*.aligned" format. This gave us the following proposal:

Code: Select all

substudy import <video> <foreign_subs> [<native_subs>] [-o|--out <out_aligned_dir>]

substudy export anki <aligned_dir> --deck <deck> [--tag <tags>...] [--skip-duplicates]
substudy export csv <aligned_dir> [-o|--out <out_dir>]
substudy export review <aligned_dir> [-o|--out <out_dir>]
substudy export srt <aligned_dir> --lang <lang> [-o|--out <out_subs>]
substudy export tracks <aligned_dir> [-o|--out <out_dir>]

(this is the odd one out)
substudy list tracks <video>

substudy subs clean <subs> [-o|--out <out_subs>]
substudy subs combine <foreign_subs> <native_subs> [-o|--out <out_subs>]
substudy subs import whisper-json <whisper_json> [-o|--out <out_subs>]
substudy subs transcribe <video> --example-text <example_text> [--format <format>] [-o|--out <out_subs>]
substudy subs translate <foreign_subs> --native-lang <native_lang> [-o|--out <out_subs>]

The idea would be that if you wanted to use a future UI to work with a media file, you'd want to put it into "aligned" format. This would give us the ability to easily display it, to attach notes, to tag data for export, etc.
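To make the shape of the proposal concrete, here is a small Python argparse mock-up of two of the proposed commands, "export anki" and "subs translate". This is only a sketch of the nested layout, not substudy's real argument parser, and the attribute names chosen for the parsed values (command, format, action, tags) are assumptions.

```python
import argparse


def build_parser():
    """Mock up part of the proposed nested command layout."""
    parser = argparse.ArgumentParser(prog="substudy")
    commands = parser.add_subparsers(dest="command", required=True)

    # substudy export <format> <aligned_dir> ...
    export = commands.add_parser("export").add_subparsers(dest="format", required=True)
    anki = export.add_parser("anki")
    anki.add_argument("aligned_dir")
    anki.add_argument("--deck", required=True)
    anki.add_argument("--tag", action="append", default=[], dest="tags")  # repeatable
    anki.add_argument("--skip-duplicates", action="store_true")

    # substudy subs <action> ...: commands that work on plain subtitle files
    subs = commands.add_parser("subs").add_subparsers(dest="action", required=True)
    translate = subs.add_parser("translate")
    translate.add_argument("foreign_subs")
    translate.add_argument("--native-lang", required=True)
    translate.add_argument("-o", "--out")
    return parser


args = build_parser().parse_args(
    ["export", "anki", "song.aligned", "--deck", "Español::Música",
     "--tag", "substudy", "--tag", "music", "--skip-duplicates"]
)
print(args.format, args.aligned_dir, args.tags, args.skip_duplicates)
```

Grouping the subtitle-only commands under "subs" leaves the top-level "export" commands free to assume an aligned directory as their only input.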

And now that I actually have Anki export smooth, I've made my decision: The first piece of UI I want to build is the media player, with support for subs and for selecting cards to generate. But this may take a while. Still, this has the potential to turn into a nice little set of language-learning tools. Maybe not this decade, but...

And I do want to prep for the Super Challenge. I really think I should aim for the 2,500 page Challenge, and start by using interlinear texts. But I'm very happy with my sound-happy focus for now. There's some more stuff I want to internalize before I allow a "reading voice" to overly contaminate my phonology.

(Actually, the OpenAI voices are far more realistic than early TTS voices, and I think they'd actually be OK for doing L/R for French. And I'm not someone to say that lightly at all.)

Postby **rdearman** » Fri Mar 22, 2024 6:53 am

Have you tried the open source Grok model?

https://www.geeky-gadgets.com/grok-1-llm/

Postby **emk** » Fri Mar 22, 2024 12:05 pm

rdearman wrote:Have you tried the open source Grok model?

I have not. It's an interesting idea, but I don't plan to pursue it. Let me explain my thinking!

There are two big challenges, both described here:

The model’s size makes it challenging to host locally, but it can be accessed through a subscription service on Twitter, with plans for a quantized version in the future.

I don't plan to add any substudy features that require a high-end (expensive) nVidia GPU plus a working copy of the CUDA development tools. I can run small models locally—I have a pretty high end laptop with CUDA support, but I only have about 4 GB of GPU RAM. Plus, getting any of this stuff to work is far more annoying than all the rest of the rough edges of substudy and Anki combined.

Then there's the paid version, which would require making some kind of paid Twitter development account. I have pretty much abandoned Twitter at this point, and they have a long history of burning third party developers. Plus, I wouldn't believe any marketing copy from one of Musk's companies anyways—he's been promising "autopilot" soonish on his Tesla cars for many years now, and yet they've driven straight into barriers at high speed. Musk is an unreliable hype artist, to put it politely.

Right now, all the AI features of substudy can be accessed using nothing more than an OPENAI_API_KEY. This is easy to sign up for, does everything I need so far, and I'm still using the first $10 that I added to my account.

The one competitor I might consider is Anthropic. Their Claude 3 Haiku model appears to be a better translator than GPT-3.5-Turbo, and it's substantially cheaper. But:

They make it harder to sign up for an API key. It took me three tries to make it through the process.
They don't have anything like OpenAI's excellent speech recognition or surprisingly natural text-to-speech.
Their models don't yet support "function calling", which makes it harder to get structured output out. They're working on this.

So the only reason I'd add Anthropic support right now would be for possibly translating longer written texts. Specifically, Haiku's lower cost, and its ability to handle longer passages consistently without losing track of what it's doing, are both potentially very useful.

This will sound terrible, but in 2024, I am honestly pretty happy I'm not a translator.

Professional translators have been under heavy pressure from AI for years, but the next few years will make it worse. The trend-line on these models is getting more than a bit scary, and I think people are underestimating how many jobs will be disrupted in the next 5-10 years. This isn't some small, localized issue like the Luddites and the textile automation. In fact, my job is one of the very next to be affected. And indeed, one of the reasons substudy development just sped up so much is that I'm very good at splitting my labor with CoPilot: I know how to set up initial contexts so that it gets the right answer the first time, and I can proofread code at a glance. I'm at least 50% more productive when I use these tools. (And substudy has always been a project I use to try out new techs.)

substudy, at least, isn't putting any translators out of work, I don't think. It's such a niche idea, personalized transcripts and translations for language learners who are doing native input from the very beginning. You'd have to be ridiculously rich to pay someone to transcribe (say) a French movie just to make flashcards you couldn't share anyways.

Anyway, there's always a danger of this log straying too far into software development. So let's do another lyrics video!

Postby **emk** » Fri Mar 22, 2024 3:23 pm

Graphic novel test. Just a little test of Claude 3 Opus (the expensive one, from Anthropic). I'm going to use the Spanish version of Blacksad, a terrific noir series that's on my list to read once I can actually read Spanish. :lol:

(I mean, I can read this bit, but I mean in general.)

emk wrote:This is a scene from a Spanish-language graphic novel titled "Blacksad", about a cat PI. Could you please look at this image, transcribe the text in each panel, translate it to English, and explain what's happening here?

Claude 3 Opus wrote:The image shows two characters from the graphic novel "Blacksad" - an anthropomorphic cat wearing a trench coat and hat who appears to be a detective, and an otter or weasel character smoking a cigarette.

The text in the first speech bubble reads: "SOY WEEKLY, DEL WHAT'S NEWS. Y TU PARA QUE PERIODICO TRABAJAS, AMIGO?"
Translated to English, this means: "I'm Weekly, from What's News. And what newspaper do you work for, friend?"

In the second speech bubble, the cat detective replies: "PARA EL "BAD NEWS". TE RECOMIENDO QUE LEAS MI ULTIMO ARTICULO TITULADO "NO SOMOS AMIGOS"."
In English: "For the "Bad News". I recommend you read my latest article titled "We are not friends"."

The scene depicts the cat detective having a tense interaction with the otter/weasel character, who seems to be a journalist. The detective sarcastically tells the other character they are not friends, implying some animosity or rivalry between them, likely related to their respective professions in journalism and detective work.

There's no cigarette in the image, but this is otherwise accurate.

That cost me maybe US$0.10, so doing this would add up quickly! But these models can be really quite startlingly clever, provided you pick the right model, and you stick to the things it's good at.

Some very old experiments with cloze cards. See my French log for some examples of clozing bits of Spanish text. I want to resume this experiment, and others, fairly soon.

Postby **CDR** » Sun Mar 24, 2024 4:10 am

emk wrote:Major new feature: DIRECT EXPORT TO ANKI! We now support the full power of AnkiConnect

This convinced me to try out Substudy; I wrote about it more in my log so as not to clog up yours. :lol: Overall it was cheap to mess around with: even when some of the music I attempted to transcribe had the text file come back blank, it has cost me just 20 cents so far!

emk wrote:Some very old experiments with cloze cards. See my French log for some examples of clozing bits of Spanish text. I want to resume this experiment, and others, fairly soon.

I am sure you are aware, but I wanted to bring up that Gabriel Wyner in "Fluent Forever" talks quite a bit about cloze cards. I think it is pretty much covered in your old post, but: Part of his method is to get sentences from a grammar book and ask yourself the following questions:

• Do you see any new words here?
• Do you see any new word forms here?
• Is the word order surprising to you?

He uses an example sentence:
My homework was eaten by my dog

And makes three cards (these are the fronts). [Square brackets are my comments.]

My homework was eaten ___ my dog [New word, by; the author believes these kinds of grammatical words should be learned via cloze instead of regular flash cards]
My homework was ___ by my dog. (to eat) [learning the word form]
My homework by my dog (was eaten) [learning word order]

I think this is already pretty well covered in your previous post, but thought I would at least mention that others agree with you. :lol:

These also remind me of JLPT problems where one is given a sentence with many blanks, and you have to select which word belongs in the position of the star, and each possible selection of the multiple choice is part of the sentence:

Postby **emk** » Sun Mar 24, 2024 7:40 pm

CDR wrote:This convinced me to try out Substudy, I wrote about it more in my log to not clog up yours. Overall it was cheap to mess around with, even when some of the music I attempted to transcribe had the text file come back blank, it cost me just 20 cents so far!

I saw the difficulties you had with Portuguese and Japanese audio, and I have released substudy 0.6.4. This offers a new transcription mode, "whisper-srt":

Code: Select all

substudy transcribe song.m4a --example-text=lyrics.txt --format=whisper-srt

This mode will bypass substudy's own internal conversion from Whisper's "Verbose JSON" to SRT format, and it will just ask Whisper to generate the SRT directly. When I tested this:

The new mode produces worse output for Spanish, with more artifacts and other weirdness. Sometimes it credits random fan subtitle sites, because it has learned that the closing credit music of certain series is usually accompanied by the phrase ¡Suscríbete al canal!. (If you look at the JSON data, it's 95% certain there's no actual speech there, but the built-in SRT generator errs on the side of including all sorts of nonexistent dialog.)
I have no idea whether it will be better or worse with Portuguese.
I suspect it will give much better results than 0.6.3 does for Japanese. This is mostly because 0.6.3 doesn't produce output for Japanese at all, probably because it's looking for spaces between words and Japanese doesn't do that.

But still, since your major problems involved the transcription, giving you a second transcription mode might help, especially for languages that don't use spaces.
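As a rough illustration of the conversion being discussed, here is a Python sketch that turns Whisper-style segments into SRT and drops segments that the model itself flags as probably not speech. This is not substudy's internal converter. The 0.9 cutoff and the sample segments are assumptions; the start, end, text, and no_speech_prob fields are the ones Whisper's verbose JSON reports for each segment.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments, max_no_speech_prob=0.9):
    """Turn Whisper-style segments into SRT text, dropping probable non-speech."""
    blocks = []
    for seg in segments:
        text = seg["text"].strip()
        # Skip empty lines and segments the model thinks contain no speech.
        if not text or seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            continue
        index = len(blocks) + 1  # SRT cues are numbered from 1
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Made-up sample data: one real line, one hallucinated credit over music.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Lo oigo todo el tiempo", "no_speech_prob": 0.02},
    {"start": 61.0, "end": 64.0, "text": " ¡Suscríbete al canal!", "no_speech_prob": 0.95},
]
print(segments_to_srt(segments))
```

Because this sketch works on whole segments and never splits the text on spaces, it treats languages written without spaces the same way as Spanish.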

CDR wrote:My homework was eaten ___ my dog [New word, by; the author believes these kinds of grammatical words should be learned via cloze instead of regular flash cards]

Yeah, cloze exercises are classics for a reason. They're basically just school worksheets! :lol:

But generations of teachers have asked people to "fill in the blanks" because it works. You can attach some audio, add in spaced repetition, pull your sentences from personally interesting media, and all that good stuff. But the underlying method is timeless. You're given context, and you're asked to recall a small fact correctly. This exploits the testing effect, which measurably improves learning. (Even my "comprehension" audio cards exploit this—I'm given the audio, and I'm testing myself on the meaning.)

And I certainly agree with Gabriel Wyner about prepositions: You might as well go ahead and learn them in context. Because prepositions are cursed in every single language I've studied. (Warning: rant ahead!)

English has its infamous phrasal verbs. French uses de when I expect à, and vice versa. Middle Egyptian even gets in on the fun, with odd choices of preposition in particular phrases.

Latin relies a bit less on prepositions, and more on cases. It's the usual tradeoff between fusional and isolating languages, but it's all the same thing. The ablative case, in particular, is even worse than sorting out the idiomatic use of de and à. You have so many weird ways to use the ablative:

The Ablative Of Separation, Privation, and Want
The Ablative Of Source
The Ablative Of Cause
The Ablative Of Agent
The Ablative Of Comparison
[...insert 10 more, and that's not a complete list, I don't think...]

In Latin class, the teacher would ask, "Which ablative is this?" and I'd quietly think, "It's the Ablative of Annoying Students".

Effectively, you can only translate a preposition, or a Latin ablative, in the context of a specific sentence. As soon as you say "à is to", you've overlooked a dozen important situations where it isn't. Prepositions are a minor frustration, but a long-lasting one.

I also think I might actually experiment with some slightly harder cloze cards this time though. Instead of clozing a word, I'll pick out some cards with very useful, very basic phrases, and cloze the entire line of dialog. This might feel a bit more like Assimil's active wave, and I'd like to see whether it helps get some entire phrases into my active vocabulary. If they're too hard, I'll suspend them and make easier cards.
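The difference between clozing a single word and clozing a whole line can be sketched in a few lines of Python using Anki's cloze deletion markup. The helper names cloze_word and cloze_line are hypothetical, and this is just one possible way to generate such cards, not a feature of substudy.

```python
import re


def cloze_word(sentence, word, hint=None):
    """Hide the first whole-word occurrence of `word` using Anki's cloze syntax."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    body = f"{word}::{hint}" if hint else word
    # A function replacement avoids any backslash processing in the new text.
    result, count = pattern.subn(lambda m: "{{c1::" + body + "}}", sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} not found in {sentence!r}")
    return result


def cloze_line(line):
    """Hide an entire line of dialog, for active-wave-style recall."""
    return "{{c1::" + line + "}}"


print(cloze_word("My homework was eaten by my dog.", "by"))
print(cloze_word("My homework was eaten by my dog.", "eaten", hint="to eat"))
print(cloze_line("Lo oigo todo el tiempo"))
```

A whole-line cloze gives no surrounding context on the card itself, so in practice it would lean on the audio or the translation as the prompt.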

Postby **emk** » Mon Mar 25, 2024 2:08 am

OK, time for some more assorted cards!

This one was originally recognized as pedidas, which didn't make any sense. But careful listening suggested it should have been despedidas, which fits a lot better. This is the first obvious problem I've noticed in this song. And if I'm listening carefully, a few bad transcriptions aren't a huge problem—I'll notice most of them, and either delete them or figure out how to fix them. The essential part is that most are correct.

For the most part, I could get 85% of the dialog on these cards the first time they came up. Te ves is a nice little expression to pick up. Keeping with my earlier rant about prepositions being annoyingly arbitrary, I see that it's No conozco a nadie et enseñe de.

My current frustration. But my current source of frustrations is Sí. Debe ser porque sé muy poco acerca de ese mundo de los espíritus. There's a lot going on here:

I haven't seen Debe a lot, and Debe ser is new.
Sé is one of my current headaches, a subclause without a subject pronoun. Also, sé interferes with French c'est: not severely, but enough that I don't always pick it up correctly in rapid speech.
Muy poco is still slightly new.
So is acerca de.

Hearing all of these less familiar bits all at once at high speed makes this the hardest card I've seen in days. I eventually resorted to reading it very slowly out loud, multiple times. And I finally did manage to actually hear all of the words at full speed and match them to the correct meaning. But it wasn't easy, and it won't be super easy tomorrow.

But this is a low stress situation—if I get bored with this card, I'll just suspend it. And I'll see plenty more examples of literally everything which appears here, probably spoken more slowly and surrounded by familiar words.

It also occurs to me that there's a ridiculously simple exercise I could start doing: repeating the text on certain cards out loud, or even "shadowing" the audio as it's played. I, uh, probably should have thought of this by now. But maybe I just needed to get into the groove first.

Postby **elAmericanoTranquilo** » Mon Mar 25, 2024 10:09 am

Very cool stuff, emk! It's really interesting to think about what might be possible now that we have access to all of these fantastic underlying tools. By the way, have you checked out knowclip?

I went down a similar path last year (inspired by your previous work!), using knowclip to extract video with extra help from whisper / deepl and then importing the resulting cards into Flashcards Deluxe. I ended up pausing my work on it though, once I found Language Reactor. It ended up being more fun for me to step through the video in Language Reactor versus working through the cards. But a big part of that might be because I've never been able to get into the habit of studying with SRS cards.

Another thing I came across back when I was working on this is that there are various forks of whisper. In particular, I was using stable-ts in order to get more accurate timestamps. I don't know if whisper itself has since made improvements in this area.
