Re: Lilly's log - French, Russian, Spanish and Italian
Posted: Sun Jul 16, 2017 7:13 pm
Russian
So, today I've been messing about with text analysis to see how my Russian and French LWT numbers relate to each other. In French I felt comfortable reading with good precision once I hit roughly 28,000 word forms in LWT. After some fiddling I managed to export all my known terms from LWT as a .txt file. I ran this list through AntConc with a French lemma list, and it came up with 9,800 separate words, a ratio of ~3:1. So, roughly 10,000 words; that kinda makes sense. For Russian there was no ready-made lemma list, and I didn't want to do any programming for this, so I found a neat little Python script that does the job for Russian. After I ran my 11,960 known Russian word forms through it, it came up with a list of 1,500 words. That would be roughly 8:1! Well, that's not very surprising, since Russian has an awful lot of cases. So for 9,800 words in Russian I would have to deal with about 78,400 word forms, yikes! So far I'm at 193h with 24,000 word forms in the database in total, so if they keep growing at the same rate, getting to those kinds of numbers would take a total of ~630h of reading (193h × 78,400 / 24,000). I still have a way to go. The question is, will I need more or fewer words to reach 85%+ known words? I will let you know when I get there!
EDIT: The numbers for Russian are wrong because the script wasn't counting all the words in default mode. Total number of words in the database: 9,799, with 24,484 word forms. Number of words learned: 4,373! Phew! That sounds a little less depressing.
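For anyone who wants to redo the back-of-the-envelope numbers, here is a minimal Python sketch of the ratios and the hours projection above. It assumes, as the ~630h estimate does, that word forms in the LWT database grow roughly linearly with hours read. The pairing of the 4,373 learned words with the 11,960 known forms is also an assumption on my part. The function names are just illustrative.

```python
# Back-of-the-envelope numbers from the post.
# Assumption: word forms in the LWT database grow roughly linearly with hours read.

def forms_per_lemma(word_forms: int, lemmas: int) -> float:
    """Average number of distinct word forms per dictionary word (lemma)."""
    return word_forms / lemmas

def projected_hours(target_forms: float, current_forms: int, hours_so_far: float) -> float:
    """Scale reading hours linearly from the current database size to the target."""
    return hours_so_far * target_forms / current_forms

french = forms_per_lemma(28_000, 9_800)          # roughly 3:1
russian_first = forms_per_lemma(11_960, 1_500)   # the miscounted run, roughly 8:1
target = 9_800 * round(russian_first)            # word forms needed at 8:1
hours = projected_hours(target, 24_000, 193)     # total reading hours at that rate

# Corrected figures from the EDIT
russian_db = forms_per_lemma(24_484, 9_799)      # across the whole database
# Assumption: the 4,373 learned words come from the same 11,960 known forms
russian_known = forms_per_lemma(11_960, 4_373)

print(f"French {french:.1f}:1, first Russian run {russian_first:.1f}:1")
print(f"Target {target:,} forms -> about {hours:.0f} h of reading")
print(f"Corrected Russian: {russian_db:.1f}:1 (database), {russian_known:.1f}:1 (known)")
```

With the corrected figures the Russian ratio comes out at roughly 2.5:1 to 2.7:1, much closer to the French 3:1 than the first run suggested.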
It made me wonder: would using Anki be more efficient? Let's do a bit of math. Once upon a time I learned about 1,200 words with Anki in about 50h (and promptly forgot them again, but that's a different story). At ~24 words per hour, 9,800 words would take about 400h of Anki. Reading 5,000 pages takes me about 150h in French, so I'd consider that a bare minimum on top, and even at a pretty decent reading speed I'd already be at 550h. Add to that some puzzling over things I'm still inexperienced with, like strange word order and the more arcane participle constructions, plus words I don't recognise, etc., and we're at around 600-650h. In the end it's probably about the same, except that I couldn't bear 400h of Anki, and I probably remember the words better this way because of all the context. It also seems that after almost 200h reading is actually starting to become proper fun, whereas the other strategy would mean 400h of guaranteed torture. The dialogue-heavy chapters are already fun at about 10-12% new word forms, and even the description-dense chapters are down to 16%, close to becoming comfortable. In any case, it's definitely enough to keep me going for 6h+ just to see what happens next in the story.
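The Anki comparison works out like this as a quick Python sketch. It uses only the numbers given above and assumes the old Anki rate of 1,200 words in 50h would hold for all 9,800 words. The extra 50-100h of puzzling is left out because it's a rough guess rather than a calculation.

```python
# Rough time budget for the "Anki first, then read" route, using the post's numbers.
# Assumption: the old Anki rate would hold for the whole 9,800-word target.
ANKI_WORDS, ANKI_HOURS = 1_200, 50   # past Anki experience
TARGET_WORDS = 9_800                 # French lemma count used as the benchmark
READING_HOURS = 150                  # ~5,000 pages at French reading speed

words_per_hour = ANKI_WORDS / ANKI_HOURS      # words learned per hour of Anki
anki_hours = TARGET_WORDS / words_per_hour    # hours of Anki for the whole target
total = anki_hours + READING_HOURS            # before any extra puzzling time

print(f"{words_per_hour:.0f} words/h -> {anki_hours:.0f} h of Anki")
print(f"Anki + minimum reading: {total:.0f} h")
```

That comes out at about 408h of Anki and 558h in total, in line with the rounded ~400h and ~550h figures.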
I think I'm almost done with the difficult part! For the next 6WC I will definitely do a lot of Russian reading now that it's becoming less of a strain. It has taken a bit longer than I had hoped (we're talking months), because reading for long stretches was too tiring, but I'm definitely getting there now.