low-frequency words that are unexpectedly frequent

StringerBell
Yellow Belt
Posts: 62
Joined: Mon Jul 23, 2018 3:30 am
Languages: English (n)
Italian: ~ intermediate
Polish : ~ lower intermediate
x 156

low-frequency words that are unexpectedly frequent

Postby StringerBell » Sat Aug 11, 2018 9:13 pm

When I started learning Italian and Polish, I decided to focus on high-frequency words first and leave less frequent words for later. There were a few words that I initially decided to ignore because I figured they were a little too "specific" or infrequent, but then I was surprised to see how often they recurred in a variety of sources, so I ended up learning them without even trying.

For Italian, one example is il campanile (the bell tower).
For some reason, this word kept coming up over and over...in the Veleno podcast serial, in a general conversation podcast between two people discussing a trip abroad where they climbed a bell tower, in a travel blog post, in a tv show...it seems like almost everything I listen to/watch manages to find a way to mention this word.

In Polish, one example is na brzegu (on the shore).
Somehow, everything I read and listen to always manages to involve being on the shore. I originally thought I would ignore this one for a while, but I've seen it so many times at this point that it's one of my best-remembered vocab words.

Which words have surprised you with their frequency?
1 x
Polish 1st goal: 1100 hours : 714 / 1100
Italian 1st goal: 730 hours : 730 / 730 COMPLETED! YAY!
Italian 2nd goal: read 150 articles/blog posts in 3 months : 26 / 100

jeff_lindqvist
Brown Belt
Posts: 1437
Joined: Sun Aug 16, 2015 9:52 pm
Languages: sv, en
de, es
ga, eo
---
fi, yue, ro, tp, cy, kw, pt, sk
Language Log: viewtopic.php?f=15&t=2773
x 2717

Re: low-frequency words that are unexpectedly frequent

Postby jeff_lindqvist » Sat Aug 11, 2018 10:03 pm

Isn't this an example of the Baader-Meinhof phenomenon?

Dog-owners suddenly see other dog-owners everywhere, first-time parents see other parents pushing strollers, owners of red cars see an increased number of red cars on the road... (so they say - none of this applies to me).

One thing that has occurred to me is that written sources sometimes seem to change over time. All of a sudden you see something in a book/CD booklet/movie that you haven't seen before, despite actively looking for it. Explanations which are suddenly there, new actors on cast lists/guest musicians on recordings appear (although you had it all memorized). When you're ready for the information. :?
10 x
Leabhair/Greannáin léite as Gaeilge: 9 / 18
Ar an seastán oíche: Oileán an Órchiste
Duolingo - finished trees: sp/ga/de/fr/pt/it
Finnish with extra pain : 100 / 100

Hashimi
Yellow Belt
Posts: 66
Joined: Sun Jan 10, 2016 12:45 pm
x 92

Re: low-frequency words that are unexpectedly frequent

Postby Hashimi » Sat Aug 11, 2018 11:15 pm

Actually, I'm surprised by the contrary. For example, common words like "headache", "boyfriend", "airport", "bathroom", "motorbike", "underground", "midnight", "classroom", "bedroom", "girlfriend", "birthday", "timetable", "weekend", "upstairs", "suitcase", "motorcycle", "homework", "businessman", "website", and even "forever" are not in the list of the most frequent 25,000 words in the British National Corpus and the Corpus of Contemporary American English!
3 x

BalancingAct
Yellow Belt
Posts: 83
Joined: Thu Jan 12, 2017 6:37 am
Languages: Mandarin (N), Cantonese (N), English (prof.), French (Adv.), German (Adv. receptive; L. Int. active), Italian (Adv. receptive; Int. active), Spanish (on hold & rusty)
x 140

Re: low-frequency words that are unexpectedly frequent

Postby BalancingAct » Sun Aug 12, 2018 12:05 am

StringerBell wrote:For Italian, one example is il campanile (the bell tower).
Which words have surprised you with their frequency?


I frequently meet "campanello" (bell). In fact, I have just come across it in "un campanello d'allarme dell'aumento di un generalizzato disagio sociale" (alarm bell). Just the other day I saw it used as "door bell". I had previously thought that it would only mean the kind of bell in a church bell tower.
1 x

StringerBell
Yellow Belt
Posts: 62
Joined: Mon Jul 23, 2018 3:30 am
Languages: English (n)
Italian: ~ intermediate
Polish : ~ lower intermediate
x 156

Re: low-frequency words that are unexpectedly frequent

Postby StringerBell » Sun Aug 12, 2018 12:48 pm

jeff_lindqvist wrote:Isn't this an example of the Baader-Meinhof phenomenon?

Dog-owners suddenly see other dog-owners everywhere, first-time parents see other parents pushing strollers, owners of red cars see an increased number of red cars on the road... (so they say - none of this applies to me).

One thing that has occurred to me is that written sources sometimes seem to change over time. All of a sudden you see something in a book/CD booklet/movie that you haven't seen before, despite actively looking for it. Explanations which are suddenly there, new actors on cast lists/guest musicians on recordings appear (although you had it all memorized). When you're ready for the information. :?


No, this isn't what I'm talking about. I do experience that phenomenon: when I spend time learning a new expression or idiom, it then feels like I hear it everywhere. What I'm talking about here is different. As a beginner or early intermediate learner, I'm confronted with a certain number of highly specific words that I choose not to spend time learning or really care about, because I assume I won't come across them in the near future. Yet a handful of them repeat so consistently that I automatically remember them because I see them so often.
1 x

StringerBell
Yellow Belt
Posts: 62
Joined: Mon Jul 23, 2018 3:30 am
Languages: English (n)
Italian: ~ intermediate
Polish : ~ lower intermediate
x 156

Re: low-frequency words that are unexpectedly frequent

Postby StringerBell » Sun Aug 12, 2018 1:05 pm

Hashimi wrote:Actually, I'm surprised by the contrary. For example, common words like "headache", "boyfriend", "airport", "bathroom", "motorbike", "underground", "midnight", "classroom", "bedroom", "girlfriend", "birthday", "timetable", "weekend", "upstairs", "suitcase", "motorcycle", "homework", "businessman", "website", and even "forever" are not in the list of the most frequent 25,000 words in the British National Corpus and the Corpus of Contemporary American English!


I would consider most of those words to be high frequency. I don't use any "official" high-frequency word lists; I basically make my own judgement call when I come across a word, based on how likely I think I am to see it again soon or want to use it myself.

Because I've never used language learning books or programs and am 100% self-directed, I am constantly making the decision on what I consider to be essential vocabulary words and whether I want to learn/remember them. And since I listen to a massive amount of native (or "near" native) material very early on, the words that tend to repeat are true high frequency words.

The fact that these words you listed don't appear on "official" high frequency lists probably means those lists are garbage, because some of the very first things I learned to say and then subsequently use on a regular basis (and also repeatedly hear and read) are words like "headache", "bathroom", "bedroom", "suitcase", "website", and "boyfriend/girlfriend".

So I'm not really talking about words that aren't on a high-frequency list but then appear a lot because they are actually useful everyday words. I mean words that seem too highly specific to come up a lot but then actually do resurface consistently. This would be different for every person, since it depends on the individual input a person is using (someone else might see "bell tower" or "on the shore" only once, or even never early on, if they are watching, listening to, and reading different stuff than I am).
0 x

Adrianslont
Green Belt
Posts: 433
Joined: Sun Aug 16, 2015 10:39 am
Location: Australia
Languages: English (N), Learning Indonesian and French
x 726

Re: low-frequency words that are unexpectedly frequent

Postby Adrianslont » Sun Aug 12, 2018 2:14 pm

Hashimi wrote:Actually, I'm surprised by the contrary. For example, common words like "headache", "boyfriend", "airport", "bathroom", "motorbike", "underground", "midnight", "classroom", "bedroom", "girlfriend", "birthday", "timetable", "weekend", "upstairs", "suitcase", "motorcycle", "homework", "businessman", "website", and even "forever" are not in the list of the most frequent 25,000 words in the British National Corpus and the Corpus of Contemporary American English!
Yes, these words are high frequency. I know I used the Indonesian equivalents of 17/20 of them when I was in Indonesia recently - and most of those were used multiple times over three weeks. And I guess my Indonesian vocabulary is only a few thousand words - wild guess.

However, I’m not surprised that they don’t appear in the top 25,000 of those two corpora because I know how the corpora are made - they are 80% or 90% written texts, depending on which one - including a large chunk of academic texts, newspapers etc.

And the spoken texts used are from tv.

If you want to see a list of the words that people use on a day to day basis you would need to find a corpus that was compiled from recordings made by following people around in their day to day activities. This, of course, doesn’t happen because it’s such a huge undertaking - you would need to follow how many people to make a 40 million word corpus? For how long? And then transcribe it all. And then there are ethical and permission issues about recording people.

It’s relatively - I stress, relatively - trivial to make a corpus from the written word. And it’s actually very simple to make your own specialist corpus and frequency lists from written sources such as ebooks and web sites. Look for the AntConc software. And Ant’s other software.
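To give a rough idea of how little is needed for a do-it-yourself frequency list, here is a minimal Python sketch. It is only an illustration and not how AntConc works internally. The function name `frequency_list`, the crude regex tokenizer and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def frequency_list(text, top_n=10):
    """Return the top_n most common lowercase word tokens in text.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    # Crude tokenizer (an assumption): runs of Unicode letters, optionally
    # joined by internal apostrophes, so "d'allarme" stays one token.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return Counter(tokens).most_common(top_n)


# Made-up sample text; in practice you would read an ebook or a saved web page.
sample = (
    "Siamo saliti sul campanile. Dal campanile si vede il mare. "
    "Il campanile è alto."
)

for word, count in frequency_list(sample, top_n=3):
    print(word, count)
```

Run over your own reading material instead of the sample string, a list like this shows which words are frequent in your personal input, which may differ a lot from the big published lists.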

I think there is some value in corpora and frequency lists for learners but they will always lack a large chunk of frequently used day to day vocabulary because of the reasons described above.
0 x

devilyoudont
Yellow Belt
Posts: 60
Joined: Tue Jun 26, 2018 1:34 am
Location: Philadelphia
Languages: English (Native), Esperanto (high intermediate/advanced), Japanese (intermediate), Spanish (beginner)
Language Log: https://forum.language-learners.org/vie ... =15&t=8485
x 140

Re: low-frequency words that are unexpectedly frequent

Postby devilyoudont » Sun Aug 12, 2018 3:23 pm

I wonder if there will one day be like an Amazon Alexa corpus haha. Just what it hears people saying around the house.
3 x

Adrianslont
Green Belt
Posts: 433
Joined: Sun Aug 16, 2015 10:39 am
Location: Australia
Languages: English (N), Learning Indonesian and French
x 726

Re: low-frequency words that are unexpectedly frequent

Postby Adrianslont » Sun Aug 12, 2018 5:52 pm

devilyoudont wrote:I wonder if there will one day be like an Amazon Alexa corpus haha. Just what it hears people saying around the house.

My cousin has Alexa. She mainly swears at it. :lol: I don’t think it understands her particular British accent. :roll:
1 x

Hashimi
Yellow Belt
Posts: 66
Joined: Sun Jan 10, 2016 12:45 pm
x 92

Re: low-frequency words that are unexpectedly frequent

Postby Hashimi » Sun Aug 12, 2018 6:26 pm

Hashimi wrote:Actually, I'm surprised by the contrary. For example, common words like "headache", "boyfriend", "airport", "bathroom", "motorbike", "underground", "midnight", "classroom", "bedroom", "girlfriend", "birthday", "timetable", "weekend", "upstairs", "suitcase", "motorcycle", "homework", "businessman", "website", and even "forever" are not in the list of the most frequent 25,000 words in the British National Corpus and the Corpus of Contemporary American English!


Now I understand why these common words are not on the list of the most frequent 25K words in the BNC-COCA. They are all considered two-word compounds, so they were removed from the list!
1 x

