Native fluency: only 1.5 MB of information needed?

tommus
Blue Belt
Posts: 957
Joined: Sat Jul 04, 2015 3:59 pm
Location: Kingston, ON, Canada
Languages: English (N), French (B2), Dutch (B2)
x 1937

Native fluency: only 1.5 MB of information needed?

Postby tommus » Mon Apr 01, 2019 12:09 pm

Kids store 1.5 megabytes of information to master their native language

And the link at the bottom of that page to Humans store about 1.5 megabytes of information during language acquisition

It sounds so easy. Only 1.5 MB. That can't be so much that an adult couldn't do it quickly?
2 x
Dutch: 01 September -> 31 December 2020
Watch 1000 Dutch TV Series Videos : 40 / 1000

Wurstmann
Yellow Belt
Posts: 50
Joined: Sun Jul 26, 2015 12:32 pm
Location: Germany
Languages: German (N), Mandarin (intermediate?), Spanish (beginner)
x 37

Re: Native fluency: only 1.5 MB of information needed?

Postby Wurstmann » Mon Apr 01, 2019 12:26 pm

If it were only used for vocabulary, 1.5 MB would be around 349525 words. So still a lot of information xD
3 x

DaveAgain
Black Belt - 1st Dan
Posts: 1988
Joined: Mon Aug 27, 2018 11:26 am
Languages: English (native), French & German (learning).
Language Log: https://forum.language-learners.org/vie ... &start=200
x 4079

Re: Native fluency: only 1.5 MB of information needed?

Postby DaveAgain » Mon Apr 01, 2019 1:18 pm

tommus wrote:Kids store 1.5 megabytes of information to master their native language

And the link at the bottom of that page to Humans store about 1.5 megabytes of information during language acquisition

It sounds so easy. Only 1.5 MB. That can't be so much that an adult couldn't do it quickly?

One question that comes out of this is: where do we store it all?

Some people have perfect recall, so the information must be there somewhere.

PS
(A lecture about how babies learn language suggested statistical analysis of everything they heard was the process.)
1 x


Re: Native fluency: only 1.5 MB of information needed?

Postby tommus » Mon Apr 01, 2019 1:27 pm

Wurstmann wrote:If it were only used for vocabulary, 1.5 MB would be around 349525 words. So still a lot of information xD

I would estimate 180,000 words. Still a lot.

The 3000 most common English words take about 25 KB to store. And 1500/25 = 60. Then 3000 * 60 = 180,000

3000 most common English words
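
For anyone who wants to check the arithmetic, here is a minimal Python sketch of that scaling estimate. It assumes 1.5 MB is rounded to 1500 KB and the word list to 25 KB, as in the post; the function name `scale_word_count` is made up for illustration.

```python
def scale_word_count(list_words, list_kb, budget_kb):
    """Scale a word list of known size up to a larger storage budget."""
    return list_words * (budget_kb / list_kb)

# Figures from the post: 3000 words in about 25 KB, budget of about 1500 KB.
estimate = scale_word_count(list_words=3000, list_kb=25, budget_kb=1500)

# Implied storage cost per word, taking 1 KB as 1000 bytes (assumption).
bytes_per_word = 25 * 1000 / 3000

print(round(estimate))            # 180000
print(round(bytes_per_word, 1))   # 8.3
```

The implied cost of roughly 8.3 bytes per word is what separates this estimate from the higher one quoted above.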
0 x


Re: Native fluency: only 1.5 MB of information needed?

Postby Wurstmann » Mon Apr 01, 2019 2:55 pm

tommus wrote:
Wurstmann wrote:If it were only used for vocabulary, 1.5 MB would be around 349525 words. So still a lot of information xD

I would estimate 180,000 words. Still a lot.

The 3000 most common English words take about 25 KB to store. And 1500/25 = 60. Then 3000 * 60 = 180,000

3000 most common English words


I was going by this. It says the average length of an English word is 4.5 letters.
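
The 349525 figure can be reproduced with a quick Python sketch. It assumes 1.5 MB means 1.5 × 1024 × 1024 bytes and one byte per letter with no separators between words; both are assumptions that happen to match the number, not something the linked page states. The function name `words_that_fit` is hypothetical.

```python
AVG_LETTERS_PER_WORD = 4.5  # average English word length cited in the post

def words_that_fit(budget_bytes, bytes_per_word):
    """Count how many whole words of a given size fit into a byte budget."""
    return int(budget_bytes // bytes_per_word)

binary_mb = 1.5 * 1024 * 1024   # 1.5 MB read as binary megabytes (assumption)
decimal_mb = 1.5 * 1000 * 1000  # 1.5 MB read as decimal megabytes (assumption)

print(words_that_fit(binary_mb, AVG_LETTERS_PER_WORD))   # 349525
print(words_that_fit(decimal_mb, AVG_LETTERS_PER_WORD))  # 333333
```

Either reading of "MB" lands well above the 180,000 estimate, so the gap comes from the bytes-per-word figure rather than from the definition of a megabyte.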
0 x


Re: Native fluency: only 1.5 MB of information needed?

Postby tommus » Mon Apr 01, 2019 3:28 pm

Wurstmann wrote:I was going by this. It says the average length of a English word is 4.5 letters.

Interesting.

For the 3000 words (above), they take 25,222 bytes which means 8.4 bytes per word on average. But in regular text, a lot of small words show up very often, words like a, I, in, at, on, the, and, etc. So in regular text, the average word length would probably be smaller. So I took some plain text articles from today's BBC news, and put them together in a plain text file. In that sample, the average word length was 6.1 bytes. So perhaps the 4.5 letters per word would be for simple regular text, not word lists where all the common short words occur only once.
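
The effect is easy to see on a toy example. Below is a minimal Python sketch that compares the average word length of running text (every occurrence counted) with that of the deduplicated word list built from the same text. The sample sentence and the helper `avg_word_bytes` are made up for illustration; this is not the BBC sample or the 3000-word list from the post.

```python
import re

def avg_word_bytes(words):
    """Average UTF-8 size in bytes per word, ignoring separators."""
    return sum(len(w.encode("utf-8")) for w in words) / len(words)

# Made-up sample: short function words repeat, long content words do not.
sample = ("The government said the minister would answer "
          "the question in the parliament")

tokens = re.findall(r"[a-z']+", sample.lower())  # running text, repeats kept
unique = sorted(set(tokens))                     # word list, each word once

print(round(avg_word_bytes(tokens), 2))  # 5.42
print(round(avg_word_bytes(unique), 2))  # 6.22
```

Because "the" is counted four times in the running text but only once in the list, the running-text average comes out lower, which is the same direction as the gap described above.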
0 x

Deinonysus
Brown Belt
Posts: 1222
Joined: Tue Sep 13, 2016 6:06 pm
Location: MA, USA
Languages:
• Native: English
• Advanced: French
• Intermediate: German, Spanish, Hebrew
• Beginner: Italian, Arabic
x 4635

Re: Native fluency: only 1.5 MB of information needed?

Postby Deinonysus » Mon Apr 01, 2019 3:41 pm

Proof that English is easier: English has no diacritics so your brain can store the vocabulary as ASCII text, only one byte per letter. Other languages need Unicode encoding which takes up more neurons.
13 x
/daɪ.nə.ˈnaɪ.səs/


Re: Native fluency: only 1.5 MB of information needed?

Postby tommus » Mon Apr 01, 2019 4:14 pm

Deinonysus wrote:Proof that English is easier.

Indeed. In some other languages like Dutch and German, where long compound words are common, average word lengths must be considerably longer. Such as:

Dutch
Hottentottensoldatententententoonstellingsbouwterrein
meaning: "construction ground for the Hottentot soldiers' tents exhibition"

German
Bundespräsidentenstichwahlwiederholungsverschiebung
meaning: "deferral of the second iteration of the federal presidential run-off election"

Afrikaans
Tweedehandsemotorverkoopsmannevakbondstakingsvergaderingsameroeperstoespraakskrywerspersverklaringuitreikingsmediakonferensieaankondiging
meaning: "issuable media conference's announcement at a press release regarding the convener's speech at a secondhand car dealership union's strike meeting"

And of course, these are very common words that often occur in normal conversation!

One would tend to think that these long words don't put an extra burden on learning the language because they are made up of a bunch of short, simple words. That works to some extent in reading and listening. However, in writing or speaking, it is not so easy to know or remember which words to put together and in which order. So this just increases the difficulty for active usage versus passive usage. I often wonder about native speakers using such long compound words. Do they even think about them being run together, or do they just say them as an English native would use a series of separate words? I think it depends on the mostoftenusedenglishcommonwordcombinations.

Check out the Wikipedia longest words by language.
0 x

Cainntear
Black Belt - 3rd Dan
Posts: 3527
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 8794

Re: Native fluency: only 1.5 MB of information needed?

Postby Cainntear » Mon Apr 01, 2019 4:58 pm

Deinonysus wrote:Proof that English is easier: English has no diacritics so your brain can store the vocabulary as ASCII text, only one byte per letter. Other languages need Unicode encoding which takes up more neurons.

Only if you use boring old 7-bit ASCII. 8-bit ASCII with a Latin Extended charset did most of Europe well enough in the 90s.
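
To put numbers on the joke, here is a small Python sketch that measures how many bytes one word takes under three encodings. It uses Python's standard codec names `ascii`, `latin-1` (ISO 8859-1, standing in here for the 8-bit Latin charsets of the 90s) and `utf-8`; the helper name `encoded_size` is hypothetical, and the German word is the one quoted earlier in the thread.

```python
def encoded_size(word, encoding):
    """Return the encoded size in bytes, or None if the word cannot be encoded."""
    try:
        return len(word.encode(encoding))
    except UnicodeEncodeError:
        return None

word = "Bundespräsidentenstichwahlwiederholungsverschiebung"

for encoding in ("ascii", "latin-1", "utf-8"):
    # ascii fails on "ä"; latin-1 uses one byte for it; utf-8 uses two.
    print(encoding, encoded_size(word, encoding))
```

In an 8-bit Latin charset the "ä" costs one byte like any other letter, so the diacritic only adds a byte under UTF-8.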

Now excuse me... I’m off to download my languages onto 3.5” floppy disks...
6 x

zenmonkey
Black Belt - 2nd Dan
Posts: 2528
Joined: Sun Jul 26, 2015 7:21 pm
Location: California, Germany and France
Languages: Spanish, English, French trilingual - German (B2/C1) on/off study: Persian, Hebrew, Tibetan, Setswana.
Some knowledge of Italian, Portuguese, Ladino, Yiddish ...
Want to tackle Tzotzil, Nahuatl
Language Log: viewtopic.php?f=15&t=859
x 7032

Re: Native fluency: only 1.5 MB of information needed?

Postby zenmonkey » Tue Apr 02, 2019 1:09 am

The brain is not a computer, nor does it use bits. It’s not surprising that the article referenced von Neumann’s 1958 “The Computer and the Brain” for its estimation.

All these discussions about how many bits a word takes ... we process language as sounds, not as text.
10 x
I am a leaf on the wind, watch how I soar

