The statistical distribution of language difficulty

ryanheise
Green Belt
Posts: 459
Joined: Tue Jun 04, 2019 3:13 pm
Location: Australia
Languages: English (N), Japanese (beginner)
x 1681

Re: The statistical distribution of language difficulty

Postby ryanheise » Mon Aug 09, 2021 2:31 am

luke wrote:
s_allard wrote:the presence of proper nouns ... are key to understanding the content.

You made some distinctions between when proper nouns matter a lot and when they may matter less. I'm just using that bit of quote to note the comprehension is not "all" or "nothing".

Example: Discussion that mentions some specific plants grown in a garden (in a wider discussion - I don't mean in a podcast on gardening). Knowing that certain words are plants from a garden is a certain level of comprehension. Today I was at a market that had fresh vegetables. I noticed a couple roots that I'd heard and read before in the context of garden. Even in translation they didn't mean a lot to me, but knowing they are plants from a garden has been enough to understand pretty well what the speaker/author's point was. (Dad wasn't helping in the garden).

Do I have to know the plant is a root before I understand? Do I have to taste the root before I understand? Do I have to know how easy or hard to grow the root might be? Do I need to know if there is any lore about medicinal properties of these particular roots? Do I have to have years of experience eating these roots as a staple as a child to understand?

Having personal answers to all those questions may enrich the story, but isn't critical for getting the author/speaker's point.


I really like the way you presented this example, although I would just point out that you're talking about common nouns rather than proper nouns.

Thinking of a similar type of example for proper nouns, I recall that when I watch documentaries on TV in my own native language, English, about the life of some inspirational person who grew up in some foreign country, they tend to mention things like the name of the town where they were born, the names of their parents, the name of the school they went to, the names of people who were inspirational in their lives, the names of many other places where important events happened, etc. And despite not recognising the names of ANY of the people or places, I at least know the English language well enough to understand that these proper nouns are labels for real world entities, and the longer I listen to the presenter talk about those people or places, the more information I am learning about those real world entities. This sort of learning, to me, is not language learning, but rather general knowledge learning, which is why I do not mind if it is left out of the language difficulty measure. We can always come up with examples where certain general knowledge is helpful for comprehension, but for my purposes I feel it is not necessary to model this to be useful.
1 x

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: The statistical distribution of language difficulty

Postby s_allard » Mon Aug 09, 2021 5:40 am

Out of curiosity and since I keep getting all these abstract and theoretical answers to my questions or examples that take us off on wild-goose chases, I decided to have a look at some podcasts myself and I transcribed the first 1 minute 29 seconds of the following podcast dated Aug 7, 2021 from the PBS network in the USA :

https://podcasts.apple.com/us/podcast/from-carrots-to-sticks-why-vaccine-mandates-may-work/id78304589i=1000531340371
While new COVID cases surge, differing opinions over lockdowns, masks and vaccine mandates prevail. Yesterday, United Airlines announced that it will require all of its U.S. employees to be vaccinated as a condition of employment. I spoke with Juliette Kayyem, former Assistant Secretary at the Department of Homeland Security and professor at the Harvard Kennedy School of Government, who says that similar vaccine requirements should apply to airline passengers as well. Juliette, here we are at this point where there seems to be a pull between whether we have more mandates and more lockdowns or we can focus our energies on trying to get more people vaccinated. Why are you saying that maybe we should have a different tack ?
- About twenty per cent of Americans are wholly against vaccinations. These are what we would call the anti-vaxxers but that the remaining eighty per cent it’s actually quite complicated. They’re diverse, they have different reasons for why they are not moving forward. Some has to do with access, some has to do with science but a lot of it has to do with they simply feel like they can just wait and see, that the waiting is ok and I think what you’ve seen in the last two weeks in the United States is « we’re done waiting » and that we have to move from the carrots of luring people, talking to them about the science, you know, giving them extra pay, lottery systems to a system of sticks where there will be burdens. Privileges will be denied. (1.29)


From what I see, many podcasts have a scripted part where the journalist is speaking and an unscripted portion, as is the case here in the second paragraph, where the non-journalists are speaking spontaneously. In the case at hand, we see a rather typical example of unscripted spoken English, albeit from a professor at Harvard University.

When seen in written form this speech can seem somewhat disjointed and not always easy to understand or analyze.

I’ll skip the question of vocabulary size required for 98.08% coverage for full unassisted comprehension of this podcast. There are a bunch of proper nouns that are important. What do we do about them ? Ignore them? Treat them the way Paul Nation does ?
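
For anyone who wants to see what a coverage figure means in practice, here is a minimal sketch in Python. It only illustrates the two options raised above (counting proper nouns like any other word, or leaving them out of the count). The function name `coverage`, the tiny word list and the sample sentence are all made up for illustration, and the crude tokenizer is an assumption, not anybody's actual method.

```python
import re

def coverage(text, known_words, ignore=frozenset()):
    """Share of running words in `text` that appear in `known_words`.

    Words listed in `ignore` (e.g. proper nouns) are left out of the
    count entirely. Tokenization is a crude lowercase letter match.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counted = [t for t in tokens if t not in ignore]
    if not counted:
        return 0.0
    known = sum(1 for t in counted if t in known_words)
    return known / len(counted)

# Hypothetical learner vocabulary and sample sentence.
known = {"says", "vaccine", "may", "work"}
sentence = "Juliette says vaccine mandates may work"

with_names = coverage(sentence, known)                          # 4 of 6 words
without_names = coverage(sentence, known, ignore={"juliette"})  # 4 of 5 words
```

With the name counted, coverage is 4 out of 6 words. With the name left out, it is 4 out of 5, so the choice of how to treat proper nouns visibly moves the number.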

What I’m really curious to see is how the grammatical difficulty of the professor’s speech is evaluated, especially by studying the ‘size of a recursive grammatical structure’. Is it easy, medium or difficult ? Actually the grammar seems very simple to me.

But my main issue here is the professor’s extended use of the idiom « carrots and sticks », two very common words. She says : I think what you’ve seen in the last two weeks in the United States is « we’re done waiting » and that we have to move from the carrots of luring people, talking to them about the science, you know, giving them extra pay, lottery systems to a system of sticks where there will be burdens.

Why is the professor talking about carrots and sticks ? And how do we incorporate this into our assessment of the difficulty of understanding this recording ? Do we just ignore the whole issue and treat these words like any others? So in the end this is quite an easy text to understand for learners of English.

Edit 1: Slight modification of the transcript and insertion of a question mark
Last edited by s_allard on Mon Aug 09, 2021 10:38 am, edited 1 time in total.
0 x

ryanheise
Green Belt
Posts: 459
Joined: Tue Jun 04, 2019 3:13 pm
Location: Australia
Languages: English (N), Japanese (beginner)
x 1681

Re: The statistical distribution of language difficulty

Postby ryanheise » Mon Aug 09, 2021 9:33 am

A grammar is recursive when it is possible to embed an expression of some type within another expression of the same type. An example is relative clauses, where it is possible to embed a relative clause within another relative clause within another relative clause and so on. See What's special about human language? The contents of the "narrow language faculty" revisited, which discusses the hypothesis that it is the human brain's ability to deal with recursion in language that sets it apart from, say, a chimpanzee's brain. So while a chimpanzee can learn a form of sign language, it will have a much simpler grammar without recursion. The chimpanzee will be able to process sentences like "I want banana", but not "The chimp that the chimp I saw yesterday saw saw me steal their banana." And to be honest, even a human would have trouble with that sentence because it is deeply recursive. You can try to draw a tree diagram of all of the dependencies in that sentence, and basically measure how many branches it has, and how deep the branches go (branches from branches from branches). Of course recursion is only one grammatical feature. Building an effective system is about choosing among different features, keeping the ones that are effective predictors and throwing out the ones that either aren't effective or are too costly.
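
As a rough illustration of measuring such a tree, here is a small Python sketch. Everything in it is an assumption made for the example: the `Clause` class, the `depth` and `branches` functions, and the hand-drawn bracketing of the chimp sentence. A real system would need a parser to produce the tree, and this is not claimed to be the calculation used for the podcast ranking.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Clause:
    """A clause plus the clauses embedded directly inside it."""
    text: str
    children: List["Clause"] = field(default_factory=list)

def depth(clause: Clause) -> int:
    """Levels of embedding below this clause (0 = nothing embedded)."""
    if not clause.children:
        return 0
    return 1 + max(depth(child) for child in clause.children)

def branches(clause: Clause) -> int:
    """Total number of embedded clauses anywhere below this clause."""
    return sum(1 + branches(child) for child in clause.children)

# "I want banana": a single clause with nothing embedded.
simple = Clause("I want banana")

# Hand-made bracketing (an assumption, not parser output) of
# "The chimp that the chimp I saw yesterday saw saw me steal their banana."
nested = Clause("The chimp ... saw me ...", [
    Clause("that the chimp ... saw", [
        Clause("I saw yesterday"),
    ]),
    Clause("steal their banana"),
])

simple_depth = depth(simple)
nested_depth = depth(nested)
nested_branches = branches(nested)
```

Under this bracketing the simple sentence has depth 0, while the chimp sentence has three embedded clauses and a depth of 2, coming from the relative clause inside a relative clause.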

There are techniques that can be applied to idioms (frequency dictionaries, BERT models), but the bigger picture is that a perfect system is not realistic for one person who can only put in a finite amount of effort, and without Google's resources. My goal is only to build an approximate system that trades off accuracy for feasibility, and I hope that the less accurate system that's "possible" today is more useful than the accurate system that's "impossible" today but possible if you wait 20 years.
1 x

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: The statistical distribution of language difficulty

Postby s_allard » Mon Aug 09, 2021 11:24 am

ryanheise wrote:A grammar is recursive when it is possible to embed an expression of some type within another expression of the same type. An example is relative clauses, where it is possible to embed a relative clause within another relative clause within another relative clause and so on. See What's special about human language? The contents of the "narrow language faculty" revisited, which discusses the hypothesis that it is the human brain's ability to deal with recursion in language that sets it apart from, say, a chimpanzee's brain.


There are techniques that can be applied to idioms (frequency dictionaries, BERT models), but the bigger picture is that a perfect system is not realistic for one person who can only put in a finite amount of effort, and without Google's resources. My goal is only to build an approximate system that trades off accuracy for feasibility, and I hope that the less accurate system that's "possible" today is more useful than the accurate system that's "impossible" today but possible if you wait 20 years.


This is an excellent example of why I wrote “ I keep getting all these abstract and theoretical answers to my questions or examples that take us off on wild-goose chases”. How does this explanation help us evaluate the difficulty of understanding the podcast I presented? Is this podcast of a B2 or C1 or C2 level? Honestly, I don’t know. What I do know is that the grammar of the Harvard professor’s speaking requires the listener to reconstruct a more understandable standard grammar. In fact, I had some difficulty transcribing the speech. For example, it was difficult to determine sentence length. Here intonation and accent of the voice are important. Native speakers can deal with all this of course. I wonder what learners perceive.
1 x

ryanheise
Green Belt
Posts: 459
Joined: Tue Jun 04, 2019 3:13 pm
Location: Australia
Languages: English (N), Japanese (beginner)
x 1681

Re: The statistical distribution of language difficulty

Postby ryanheise » Tue Aug 10, 2021 8:25 am

s_allard wrote:
ryanheise wrote:There are techniques that can be applied to idioms (frequency dictionaries, BERT models), but the bigger picture is that a perfect system is not realistic for one person who can only put in a finite amount of effort, and without Google's resources. My goal is only to build an approximate system that trades off accuracy for feasibility, and I hope that the less accurate system that's "possible" today is more useful than the accurate system that's "impossible" today but possible if you wait 20 years.


This is an excellent example of why I wrote “ I keep getting all these abstract and theoretical answers to my questions or examples that take us off on wild-goose chases”. How does this explanation help us evaluate the difficulty of understanding the podcast I presented? Is this podcast of a B2 or C1 or C2 level?


Why are you still asking me to do this difficult calculation for you? Can you really wait 20 years? I think you've overstayed your welcome, if that is the case.

I just told you that this is a huge amount of effort. Techniques exist to do what you're trying to do, yes, and I even gave you their names in case you are interested in pursuing them yourself, but for reasons explained, I am only one person with limited time and energy and so I only have the capacity to do the simpler, approximate calculations that provide immediate benefit. Doing the more accurate calculations is beyond my time and energy capacity. It is unfair of you to expect me to spend what time I have helping you with a calculation that is A) difficult to develop and fine-tune, B) even more difficult to explain, C) even more difficult to explain to someone who is allergic to the abstract and theoretical, and D) not actually aligned with my own goals. Even if I did the 20 years of development and fine-tuning, and I wrote a 20 page research paper explaining how it works on concrete examples, you would still not understand it without the appropriate theoretical knowledge. It is fine for you to have these goals, though, and that is why I at least tried to point you in the right direction in the above comment by giving you the names of techniques you could look into. That is really something you could have been grateful for, instead of being dissatisfied that I didn't meet your unrealistic expectations.

Also appreciate that the first time you asked me to explain a calculation, and a relatively simple one at that, I invested 1 hour of my time to design an example that was able to illustrate the essential features of the calculation. You then dismissed it out of hand, and said you don't want something that abstract. So I then invested another 3 hours of my time crafting another example and explanation to your new requirements. At the end of it, you said thanks but no thanks, and then gave up trying to understand it. And now you want me to go through that again, but this time not on the way my current calculation works, but you want me to do the difficult work of adding a new calculation that I don't currently do, and then explain that to you. To top it off, when I tell you this is difficult stuff, it's not feasible, I'm only one man, it will take 20 years, and "only" give you some pointers which you can take or leave, you pin this up as an "excellent example" of being unhelpful. It is easy to be dissatisfied when you don't get what you want, but you'd do better to respect the time of others.

I need to get back to what's practical in the here and now, so I will need to leave this thread for now. I will post an update after I have something like the finished list to share, or a corpus for people to download. Please save any further questions until after that point, as then it will be more interesting to see how a lot of these theories actually work in practice, and I also think it will answer a lot of these unresolved questions about how important each of these various types of calculations are. That will also be the easiest time to consider adding new calculations to improve any inadequate results in the ranking. Which factor is the biggest issue? Proper nouns? Idioms? Grammar? Multiple definitions for the same word? We will find out after seeing all of this theory put into action.
1 x

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: The statistical distribution of language difficulty

Postby s_allard » Wed Aug 11, 2021 1:16 pm

Let me first apologize if my request requires a new calculation and 20 years of work. Being in a university environment myself, I know what goes into the practice of science. I’m also familiar with the process of peer review whereby other scientists look at my proposed articles and decide if they meet current standards and contribute to the advancement of the field.

So, please, don’t bother with my sample podcast. Don’t do anything. I can take care of it myself after pursuing some of the leads or ideas that you suggested. I actually find the whole thing very enlightening and intriguing.

My starting point is a simple question that everybody here knows well : why are certain forms of the spoken target language easier or more difficult for us to understand ? This is extremely important because understanding is necessary for speaking.

Last night I was listening to a speech by a Mexican government official. I had the impression I understood every single word. I also felt as if I were in this official’s shoes and speaking just like that because I was familiar with the subject and, very importantly, familiar with this style of formal Spanish.

On the other hand, I mentioned in an earlier post how I struggled with the recording of a lively discussion between three Mexican journalists because of a combination of difficulty deciphering exactly what was said, a series of proper nouns unknown to me and many idioms from informal Mexican Spanish.

In my exploration of the issues and thanks to recommendations by the OP I came across the following quote from a publication by IBM on natural language processing :

What makes speech recognition especially challenging is the way people talk—quickly, slurring words together, with varying emphasis and intonation, in different accents, and often using incorrect grammar.
https://www.ibm.com/cloud/learn/natural ... processing

That last part about « often using incorrect grammar » really caught my eye. Does this have major implications for us language learners ?

One idea that comes to mind is that difficulty of understanding for language learners lies not in the text or spoken document itself but in our level of knowledge of the language and the subject. In other words, there is no such thing as beginner, intermediate or advanced native speech. It’s the learner who is beginner, intermediate or advanced.

Returning to the example of a podcast of which I transcribed the first 1.5 minutes, it doesn’t take 20 years of calculations or complex analyses of recursive grammatical structures to see that the difficulty is in the mind of the listener. I don’t see anything difficult in this example. On the other hand, if you are not a native educated speaker of English there are probably a couple of things including some « incorrect grammar » and the use of the carrot and stick metaphor that must be properly decoded.

So when I saw a title like The statistical distribution of language difficulty, and a study of 40,000 podcasts which must include a lot of unscripted speech, my curiosity was piqued. I’m even more interested to see what insights this can bring to the language learning community.

It seems that the major insight is that natural speech samples fall into three buckets, beginner, intermediate and advanced. There are no concrete examples of this but lots of artificial models that, we are told, took a lot of time to develop just for me.

I don’t see the utility of this stuff but I will admit that I don’t have the training in AI and computational linguistics to fully understand this. So I defer final judgement.

On the other hand, if someone were to produce 10 4-minute podcasts of informal and formal spoken Russian with really accurate transcripts, a good annotated translation and above all an in-depth discussion of the linguistic features that learners should notice, I would be very grateful and willing to pay a good price.
0 x

luke
Brown Belt
Posts: 1243
Joined: Fri Aug 07, 2015 9:09 pm
Languages: English (N). Spanish (intermediate), Esperanto (B1), French (intermediate but rusting)
Language Log: https://forum.language-learners.org/vie ... 15&t=16948
x 3631

Re: The statistical distribution of language difficulty

Postby luke » Wed Aug 11, 2021 1:59 pm

s_allard wrote:One idea that comes to mind is that difficulty of understanding for language learners lies not in the text or spoken document itself but in our level of knowledge of the language and the subject. In other words, there is no such thing as beginner, intermediate or advanced native speech. It’s the learner who is beginner, intermediate or advanced.

You introduced this passage with, "One idea that comes to mind". In that, you're admitting - in the sense of being open to - that things are complicated. Nonetheless, it's comforting to find simple - in the sense of fundamental - truths or answers.

Then things started getting split into buckets. "Native" vs "learner", and "beginner, intermediate, or advanced".

Although "native" and "learner" are fairly clear cut, "beginner, intermediate, or advanced" is really more of a continuum.

One challenge that might be faced by an academic is the need to categorize things for efficient communication or insight. That can be done with the three words you used, or CEFR exams, etc. But labels - which may be concrete in the real world, "did you pass the B2 exam?" and have a binary answer, yes or no - are sometimes fuzzy.

In your post, you asked the question, "why are certain forms of the spoken target language easier or more difficult for us to understand ?"

My first thought was, it would be very helpful for everyone if you gave what you think is a complete answer.

Then you introduced the notion I quoted at the top. I couldn't help but think of the earliest stuff I remember reading. "See Dick run. Run Dick run."

My mind couldn't help but contrast that with something I read much later and found difficult, "After an unequivocal experience of the inefficacy of the subsisting Federal Government, you are called upon to deliberate on a new Constitution for the United States of America. The subject speaks its own importance; comprehending in its consequences, nothing less than the existence of the UNION, the safety and welfare of the parts of which it is composed, the fate of an empire, in many respects, the most interesting in the world." That's the opening of Federalist Paper #1, BTW. Two sentences each for the two examples :)

Back to the humble exercise I suggested, which is, provide a concrete answer to your, "why are certain forms of the spoken target language easier or more difficult for us to understand ?" (and I don't mean answering why the two 2 sentence examples I used are different, but your personal answer to your question).

I think if you do write an answer, it will help you and everyone gain a greater appreciation for the challenge of coming up with a "statistical distribution of language difficulty".
1 x

s_allard
Blue Belt
Posts: 969
Joined: Sat Jul 25, 2015 3:01 pm
Location: Canada
Languages: French (N), English (N), Spanish (C2 Cert.), German (B2 Cert)
x 2305

Re: The statistical distribution of language difficulty

Postby s_allard » Wed Aug 11, 2021 2:40 pm

Just a few posts back I gave an example of a podcast for which I asked the OP to determine the level of difficulty. I was told that the calculation would take many hours. Let me quote it again :

While new COVID cases surge, differing opinions over lockdowns, masks and vaccine mandates prevail. Yesterday, United Airlines announced that it will require all of its U.S. employees to be vaccinated as a condition of employment. I spoke with Juliette Kayyem, former Assistant Secretary at the Department of Homeland Security and professor at the Harvard Kennedy School of Government, who says that similar vaccine requirements should apply to airline passengers as well. Juliette, here we are at this point where there seems to be a pull between whether we have more mandates and more lockdowns or we can focus our energies on trying to get more people vaccinated. Why are you saying that maybe we should have a different tack ?
- About twenty per cent of Americans are wholly against vaccinations. These are what we would call the anti-vaxxers but that the remaining eighty per cent it’s actually quite complicated. They’re diverse, they have different reasons for why they are not moving forward. Some has to do with access, some has to do with science but a lot of it has to do with they simply feel like they can just wait and see, that the waiting is ok and I think what you’ve seen in the last two weeks in the United States is « we’re done waiting » and that we have to move from the carrots of luring people, talking to them about the science, you know, giving them extra pay, lottery systems to a system of sticks where there will be burdens. Privileges will be denied. (1.29)


I suggest people listen to the recording. Is this spoken language beginner, intermediate or advanced ? To my ears it’s very easy to understand. For one thing, I’m familiar with all the proper nouns and with the idioms and metaphors. But it may be difficult for somebody else. Obviously, I haven’t analyzed 40,000 podcasts but I tend to think that many sort of sound like this. The difficulty lies in the beholder. I'm not familiar with the language of the Federalist papers or the American constitution, therefore I will find that kind of written language difficult.

But let me give an example of a podcast that I could hardly decipher, much less understand. This is a group of young British female podcasters carrying on.

https://www.bbc.co.uk/sounds/play/p09rhbdy

Why is this so difficult for me ? First of all, the accent. I just can’t make out much of the language being spoken. But that’s me. And the podcast is full of British young people slang. So, is the podcast difficult per se or is it just me ?

Le Baron
Black Belt - 3rd Dan
Posts: 3513
Joined: Mon Jan 18, 2021 5:14 pm
Location: Koude kikkerland
Languages: English (N), fr, nl, de, eo, Sranantongo,
Maintaining: es, swahili.
Language Log: https://forum.language-learners.org/vie ... 15&t=18796
x 9393

Re: The statistical distribution of language difficulty

Postby Le Baron » Wed Aug 11, 2021 3:39 pm

s_allard wrote:But let me give an example of a podcast that I simply could hardly decipher, much less understand. This is a group of young British female podcasters carrying on.

https://www.bbc.co.uk/sounds/play/p09rhbdy

Why is this so difficult for me ? First of all, the accent. I just can’t make out much of the language being spoken. But that’s me. And the podcast is full of British young people slang. So, is the podcast difficult per se or is it just me ?


:lol: I see why you say this. The first 10 seconds were an adjustment (since it's not from my region), but after that it was plain sailing for me. To be fair (and I don't know your background, so this is guesswork), it might just be something you're not familiar with (black cultural issues, girl talk...London youth talk..."mad cheeky innit?" :D ). But the accent is the least of this podcast's problems for someone who might be learning English, because there are a lot of cut-off sentences due to it being a group talking.

The language use itself is very simplistic though. This is not to say: 'therefore everyone should understand it', because, as I think you're saying, there are many more obstacles to listening than numbers of words.

In one way it soothes me, because when I now go to listen to Spanish podcasts (or some French or German) and end up feeling the way some others might feel listening to the one above, I'll know that it isn't just my lack of knowledge, but rather that very informal language in some registers is simply hard to get used to.

luke
Brown Belt
Posts: 1243
Joined: Fri Aug 07, 2015 9:09 pm
Languages: English (N). Spanish (intermediate), Esperanto (B1), French (intermediate but rusting)
Language Log: https://forum.language-learners.org/vie ... 15&t=16948
x 3631

Re: The statistical distribution of language difficulty

Postby luke » Wed Aug 11, 2021 7:09 pm

luke wrote:Back to the humble exercise, which is, provide a concrete answer to your, "why are certain forms of the spoken target language easier or more difficult for us to understand ?" (your personal answer to your question).

s_allard wrote:https://www.bbc.co.uk/sounds/play/p09rhbdy

Why is this so difficult for me ? First of all, the accent. I just can’t make out much of the language being spoken. And the podcast is full of British young people slang. So, is the podcast difficult per se or is it just me ?

I get why ryanheise has better things to do than answer your questions ;)

That may be a good example, but I was hoping for a more complete answer.

That's why I like analogies.

s_allard wrote:we tend to get trapped in the idea that a language is a collection of words and that learning a language is tantamount to memorizing a bunch of words. So if we learn x number of words a day with y repetitions in an SRS app, we can count the number of months to learn the 6000 most common words and become « fluent » in the target language.

luke wrote:seems similar to the mother of the bride criticizing her daughter for wanting a good wedding cake. And then the mother saying in a huff, "you think a good marriage is all about wedding cake and as long as the wedding cake is good, the marriage will be successful, well, you're wrong". And I'm thinking the bride is not so delusional but does want a good cake.

s_allard wrote:I have to say that I didn’t really understand the paragraph containing the analogy with the bride and the wedding cake. Isn’t this an excellent example of exactly what we are talking about? I think I know 100% of the words in the paragraph but I don’t comprehend or understand how it relates to what I said about this focus on learning a specific number of words.

mother-of-the-bride: s_allard :)
marriage: learning a language
wedding cake: memorizing a bunch of words
as long as the wedding cake is good: So if we learn x number of words a day with y repetitions in an SRS app, we can count the number of months to learn
good marriage: « fluent » in the target language
saying in a huff, "well, you're wrong": left as an exercise for the reader

If your husband or wife says, "let's have a nice cake for our kid's birthday", they're not saying, "we can make up for all of our parental blunders with the right cake". :lol:

It seems like you're thinking the title of the thread is "The difficulty of creating a statistical distribution of language difficulty". :) The difficulty is a given.

Everyone knows a good marriage or raising a healthy child is more than the right cake. We're just trading recipes and trying to make more delicious cakes.

