ryanheise wrote:
s_allard wrote:
ryanheise wrote:
There are techniques that can be applied to idioms (frequency dictionaries, BERT models), but the bigger picture is that a perfect system is not realistic for one person who can only put in a finite amount of effort, and without Google's resources. My goal is only to build an approximate system that trades off accuracy for feasibility, and I hope that the less accurate system that's "possible" today is more useful than the accurate system that's "impossible" today but possible if you wait 20 years.
This is an excellent example of why I wrote “I keep getting all these abstract and theoretical answers to my questions or examples that take us off on wild-goose chases”. How does this explanation help us evaluate the difficulty of understanding the podcast I presented? Is this podcast at a B2, C1 or C2 level?
Why are you still asking me to do this difficult calculation for you? Can you really wait 20 years? I think you've overstayed your welcome, if that is the case.
I just told you that this is a huge amount of effort. Techniques exist to do what you're trying to do, yes, and I even gave you their names in case you are interested in pursuing them yourself, but for reasons explained, I am only one person with limited time and energy, and so I only have the capacity to do the simpler, approximate calculations that provide immediate benefit. Doing the more accurate calculations is beyond my time and energy capacity. It is unfair of you to expect me to spend what time I have helping you with a calculation that is A) difficult to develop and fine-tune, B) even more difficult to explain, C) even more difficult to explain to someone who is allergic to the abstract and theoretical, and D) not actually aligned with my own goals. Even if I did the 20 years of development and fine-tuning, and I wrote a 20-page research paper explaining how it works on concrete examples, you would still not understand it without the appropriate theoretical knowledge. It is fine for you to have these goals, though, and that is why I at least tried to point you in the right direction in the above comment by giving you the names of techniques you could look into. That is really something you could have been grateful for, instead of being dissatisfied that I didn't meet your unrealistic expectations.
Also appreciate that the first time you asked me to explain a calculation, and a relatively simple one at that, I invested 1 hour of my time to design an example that was able to illustrate the essential features of the calculation. You then dismissed it out of hand, and said you don't want something that abstract. So I then invested another 3 hours of my time crafting another example and explanation to your new requirements. At the end of it, you said thanks but no thanks, and then gave up trying to understand it. And now you want me to go through that again, this time not to explain how my current calculation works, but to do the difficult work of adding a new calculation that I don't currently do, and then explain that to you. To top it off, when I tell you this is difficult stuff, it's not feasible, I'm only one man, it will take 20 years, and "only" give you some pointers which you can take or leave, you pin this up as an "excellent example" of being unhelpful. It is easy to be dissatisfied when you don't get what you want, but you'd do better to respect the time of others.
…
Let me first apologize if my request requires a new calculation and 20 years of work. Being in a university environment myself, I know what goes into the practice of science. I’m also familiar with the process of peer review whereby other scientists look at my proposed articles and decide if they meet current standards and contribute to the advancement of the field.
So, please, don’t bother with my sample podcast. Don’t do anything. I can take care of it myself after pursuing some of the leads or ideas that you suggested. I actually find the whole thing very enlightening and intriguing.
My starting point is a simple question that everybody here knows well: why are certain forms of the spoken target language easier or more difficult for us to understand? This is extremely important because understanding is necessary for speaking.
Last night I was listening to a speech by a Mexican government official. I had the impression I understood every single word. I also felt as if I were in this official’s shoes, speaking just like that, because I was familiar with the subject and, very importantly, familiar with this style of formal Spanish.
On the other hand, I mentioned in an earlier post how I struggled with the recording of a lively discussion between three Mexican journalists because of a combination of difficulty deciphering exactly what was said, a series of proper nouns unknown to me and many idioms from informal Mexican Spanish.
In my exploration of the issues and thanks to recommendations by the OP I came across the following quote from a publication by IBM on natural language processing:
What makes speech recognition especially challenging is the way people talk—quickly, slurring words together, with varying emphasis and intonation, in different accents, and often using incorrect grammar.
https://www.ibm.com/cloud/learn/natural ... processing
That last part about « often using incorrect grammar » really caught my eye. Does this have major implications for us language learners?
One idea that comes to mind is that difficulty of understanding for language learners lies not in the text or spoken document itself but in our level of knowledge of the language and the subject. In other words, there is no such thing as beginner, intermediate or advanced native speech. It’s the learner who is beginner, intermediate or advanced.
Returning to the example of a podcast of which I transcribed the first 1.5 minutes, it doesn’t take 20 years of calculations or complex analyses of recursive grammatical structures to see that the difficulty is in the mind of the listener. I don’t see anything difficult in this example. On the other hand, if you are not an educated native speaker of English, there are probably a couple of things, including some « incorrect grammar » and the use of the carrot and stick metaphor, that must be properly decoded.
So when I saw a title like "The statistical distribution of language difficulty", my curiosity was piqued. And given a study of 40,000 podcasts, which must include a lot of unscripted speech, I’m even more interested to see what insights this can bring to the language learning community.
It seems that the major insight is that natural speech samples fall into three buckets: beginner, intermediate and advanced. There are no concrete examples of this, but lots of artificial models that, we are told, took a lot of time to develop just for me.
I don’t see the utility of this stuff, but I will admit that I don’t have the training in AI and computational linguistics to fully understand it. So I reserve final judgement.
On the other hand, if someone were to produce ten 4-minute podcasts of informal and formal spoken Russian with really accurate transcripts, a good annotated translation and, above all, an in-depth discussion of the linguistic features that learners should notice, I would be very grateful and willing to pay a good price.