Language chunks to ease language activation

tommus
Blue Belt
Posts: 511
Joined: Sat Jul 04, 2015 3:59 pm
Location: Kingston, ON, Canada
Languages: English (N), French (B2), Dutch (B2), German (A1), Spanish (A1), Esperanto (A1)
x 785

Re: Language chunks to ease language activation

Postby tommus » Sat Apr 01, 2017 2:19 pm

Cainntear wrote:The reason we talk about chunking is exactly that limit of 7 things at a time.

I ran my "chunking" program for 7-word chunks to see what that would look like. It was on the 2 million words of Dutch news. There was a surprising number of 7-word chunks occurring 4 or more times. The news included the weather forecast which tends to have repetitive phrases and the weather did dominate.

I relaxed the minimum word length from 4 letters down to 2 letters to allow for connecting words, etc. In some languages, it might be better to go down to 1-letter words to include, for example, "I" in English and "U" in Dutch, which would occur in useful chunks.

Amongst the 7-word chunks, there are very obvious recurring patterns which would be useful for language learners to note and remember. Many of these should not be hard to memorize, and would probably be a lot more useful than individual words. Here is the relatively long list of all 60 7-word expressions, which I think is useful for seeing the repetition and the patterns.

14 het is niet de eerste keer dat
11 in een groot deel van het land
9 in de loop van de nacht wordt
8 de loop van de nacht wordt het
8 het is niet voor het eerst dat
8 is het op de meeste plaatsen droog
7 in de loop van de dag komt
7 morgen in de loop van de dag
7 om een einde te maken aan de
7 op de meeste plaatsen blijft het droog
7 vooral in het zuiden van het land
6 de loop van de dag neemt de
6 de straat op om te demonstreren tegen
6 het heeft te maken met een lagedrukgebied
6 het koelt af naar een graad of
6 het vertrek van de britten uit de
6 het ziet er niet naar uit dat
6 in de loop van de dag neemt
6 in de nacht van zaterdag op zondag
6 in het oosten van het land is
6 in het zuiden van europa is het
6 loop van de dag neemt de bewolking
6 vooral in het noorden van het land
6 we hebben te maken met een hogedrukgebied
6 we hebben te maken met een lagedrukgebied
5 aan de andere kant van de grens
5 de loop van de dag wordt het
5 de politie houdt er rekening mee dat
5 het is nog maar de vraag of
5 het is voor het eerst dat een
5 het koelt af tot een graad of
5 het noorden en oosten van het land
5 ik denk dat het goed is om
5 in de loop van de dag gaat
5 in de loop van de dag wordt
5 in de nacht van donderdag op vrijdag
5 in de rest van het land is
5 maar in de loop van de dag
5 maar in de loop van de nacht
5 van de dag neemt de bewolking toe
5 van de hoofdverdachten van de aanslagen in
4 dat er niets aan de hand is
4 dat is goed te zien op de
4 de loop van de dag komt er
4 de oorzaak van het ongeluk is nog
4 een aanslag in het centrum van de
4 een hogedrukgebied bij ons in de buurt
4 ik denk dat het belangrijk is dat
4 in de eerste helft van dit jaar
4 in de loop van de middag gaat
4 in de loop van de nacht gaat
4 in de tweede helft van de nacht
4 in de westelijke helft van het land
4 is niet de eerste keer dat er
4 je moet er niet aan denken dat
4 niet van plan om op te stappen
4 te geloven dat het niet anders kon
4 wil dat er een einde komt aan
4 zegt er alles aan te doen om
4 aan de andere kant van de wereld


Re: Language chunks to ease language activation

Postby tommus » Sat Apr 01, 2017 2:31 pm

jeffers wrote:The only problem with your solution is restricting it to words of at least 3 letters (evidently to cut down processing time). I imagine a high proportion of interesting word groups contain words of one and two letters.

I agree. I changed it to 2 letters for the 7-word chunks that I just tried. I can easily do it for 1-letter words.

When I started building this program, I envisioned taking each chunk (say 3 words: words 1, 2, 3), then searching every 3-word window in the rest of the text (advancing by one word each time) for a match, then moving on to words 2, 3, 4 and repeating the full search. However, that is not required. I simply pass through the entire text only once, keeping a running count of how many times each 3-word sequence occurs. So the processing time is only a few seconds, even for 2 million words.
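For anyone who wants to try this, here is a minimal sketch of the single-pass idea in Python. It is not the actual program: the function name `count_chunks`, the regex tokenization, the lowercasing, and the rule that a chunk is dropped if any of its words is shorter than the minimum length are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_chunks(text, n=3, min_word_len=2, min_count=4):
    """Count every run of n consecutive words in one pass over the text.

    Returns (chunk, count) pairs with count >= min_count, most frequent first.
    """
    # Assumption: words are runs of letters; case is ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Assumption: a chunk is discarded if any of its words is too short.
        if all(len(w) >= min_word_len for w in window):
            counts[" ".join(window)] += 1
    return [(chunk, k) for chunk, k in counts.most_common() if k >= min_count]
```

Calling `count_chunks(text, n=7, min_word_len=2, min_count=4)` on a corpus would correspond roughly to the 7-word settings described in this thread. Each position in the text is visited once and the counts live in a hash table, which is why the running time stays close to linear in the number of words.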
reineke
Black Belt - 2nd Dan
Posts: 2221
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Engrish
x 3606

Re: Language chunks to ease language activation

Postby reineke » Sat Apr 01, 2017 3:43 pm

I think this bears repeating:

Chunking Language
"What: Miller (1956) introduced the concept of “chunking” in his paper entitled The magical number seven, plus or minus two. Chunking refers to a strategy for making more efficient use of short-term memory by breaking down large amounts of information into smaller chunks. Chase and Simon (1973) suggested that the capacity of short-term (working) memory is limited to seven items, or chunks, hence the formula 7 ± 2."

"Even though it is believed that short-term memory is limited to seven items only, the notion of vocabulary items or chunk varies. Chunking can mean both the breaking down of large amounts of information as well as grouping small chunks into larger categories. It does not necessarily mean that our mind can process only seven words at a time. A chunk can represent seven sentences, seven verses, or seven lines."

"Why: The ability to break large language chunks into smaller ones, and to group small chunks into larger ones extends the process of retention of information and allows for greater compression of information in working memory (Kalivoda, 1981). Such compression enhances the limited capacity of working memory and allows the learner to retain more information."

"How: Arranging vocabulary into semantic clusters of seven to ten related items rather than presenting a list of unrelated words in isolation will enhance retention."

The Essentials of Vocabulary Teaching: From Theory to Practice

And now, a study!

The Frequency and Use of Lexical Bundles in Conversation and Academic Prose

"Even before the use of computer-assisted techniques in lexicography and linguistics, scholars interested in language use recognized the importance of recurring patterns. Firth (1957, 195) noted that patterns in the surrounding context were important for understanding the meaning of a word, stating “you shall know a word by the company it keeps”. In looking at the social functions of language, Hymes (1968, 126) claimed that “a vast portion of verbal behavior … consists of recurrent patterns, of linguistic routines”. Branches of lexicology, too, for decades have investigated the status of multi-word units (see review in Moon 1997, 48–50). Nevertheless, lexicography continues to emphasize the individual word as the basic unit of discourse. The very fact that dictionaries are arranged by individual head words gives primacy to the individual word, and suggests that phrases and clauses of a language are built from these individual units."
...
Identification and frequency counts of lexical bundles
We define lexical bundles as the most frequent recurring fixed lexical sequences in a register. The more common a lexical bundle, the more useful it would appear to be in building discourse, but precisely where to set a frequency cut-off is somewhat arbitrary. We give an overall summary of the frequency of 3- and 4-word lexical bundles considering bundles with a frequency of at least 10 per million words in the register...

The fact that most of the lexical bundles are not structurally complete has likely contributed to their being overlooked in previous research, since traditionally linguists have focused on grammatical phrases and clauses, rather than lexical units that cut across grammatical structures. Furthermore, most of the bundles are quite transparent in meaning. As such, they have also been overlooked by researchers who consider idiomaticity a requirement for language that is non-compositional, although there is no reason that semantically transparent sequences could not also be processed as whole chunks (see further Erman/Warren 2000, 54).

Although the majority of words do not occur within recurrent sequences in either conversation or academic prose, the frequency and functions of lexical bundles demonstrate that speakers and writers use them regularly in building discourse. While much further study is needed—particularly from a psycholinguistic perspective and in more registers—lexical bundles already deserve attention in thorough lexicographic descriptions...

Considering the structures that account for at least 10% of the 4-word bundles in each register illustrates the contrast between the registers. (The Longman Grammar, chapter 13, provides a complete review of the structures.) Three structural types account for almost 70% of the 4-word bundles in conversation (Table 2), and all three include a verb. However, these structures account for only a negligible proportion of the bundles in academic prose. Rather, over 60% of the 4-word bundles in academic prose are covered by two structural types that incorporate noun phrase components; these structures account for only about 7% of the bundles in conversation...

The difference in lexical bundle structures between the registers is consistent with word, phrase and clause category differences between these registers generally. Conversation tends to have more verbs, more personal pronouns, and more questions, while academic prose has more nouns and prepositional phrases (Longman Grammar, chapters 2, 8, 14). More importantly, these structural differences reflect differences in the functions that the bundles serve. The structures typical of conversation are used for more personal expressions, particularly expressions of attitudes and desires, with bundles such as I don’t know what or you want me to. The structures typical of academic prose are useful for specifying aspects of information with bundles such as the nature of the, the extent to which, and as a result of. These functional differences provide greater insight into lexical bundles’ role in building discourse,

The function of common lexical bundles in conversation
The functional types of bundles that are common in conversation reflect the communicative purposes and contexts of typical conversation in British English—a focus on interaction and conveying personal thoughts and attitudes, and the concern for politeness and not imposing on others. The most striking aspect of conversation’s use of lexical bundles is the high proportion of personal stance expressions. They are used for epistemic stance (usually expressing lack of certainty or knowledge); expressing personal desires and inquiring into others’ desires; directing others, releasing them from obligations, or inquiring into one’s own obligations; and discussing intentions. Examples include:
I don’t know how you got on that list. [epistemic stance]
I don’t want to go by myself. [attitude/modality—desire]
You sure you want to go? [attitude/modality—desire]
As soon as you’ve finished just go, you don’t have to stay for your full three hours, nobody’s gonna know [attitude/modality—obligation/directive]
A: She can’t cope.
B: Oh dear. What are we going to do now then? [attitude/modality—intention/prediction]

1 In some cases, 4-word bundles are parts of 5-word or 6-word bundles (e.g. at the end of and the end of the are both part of at the end of the). These longer bundles are far less common and, for brevity, are not covered here...

https://www.google.com/url?sa=t&source= ... aIIOSh-xTw

Investigating the usefulness
of lexical phrases in contemporary
coursebooks
Mark Koprowski

Over the past decade, lexical theory, corpus statistics, and psycholinguistic
research have pointed to the pedagogical value of lexical phrases. In response,
commercial publishers have been quick to import these insights into their
materials in a bid to accommodate consumers and to profit from the ‘lexical
chunk’ phenomenon. Contemporary British coursebooks now routinely offer a
generous and diverse mix of multi-word lexical items: collocations, compounds,
idioms, phrasal verbs, binomials, fixed and semi-fixed expressions. But while
designers have been enthusiastic about adding chunks to the syllabus, the
process of selecting items has been highly subjective and conducted without
reference to corpus data. By analyzing the usefulness of lexical phrases in three
contemporary coursebooks, this paper offers a lexical profile of the items
specified for each course. It is shown that nearly a quarter of the multi-word
lexical items specified may be of limited pedagogic value to learners...

https://www.google.com/url?sa=t&source= ... oKwLhTcwCQ

Cainntear
Blue Belt
Posts: 853
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 1741

Re: Language chunks to ease language activation

Postby Cainntear » Sat Apr 01, 2017 5:39 pm

tommus wrote:
Cainntear wrote:The reason we talk about chunking is exactly that limit of 7 things at a time.

I ran my "chunking" program for 7-word chunks to see what that would look like. It was on the 2 million words of Dutch news. There was a surprising number of 7-word chunks occurring 4 or more times. The news included the weather forecast which tends to have repetitive phrases and the weather did dominate.

If you look closely, you'll find that some of those show layers of chunking:

For example:
((het is niet (de eerste keer)) dat)
"de eerste keer" is likely to appear often as a three word chunk in several other settings (for the first time, since the first time) and the phrase "it is not the first time" is also a chunk containing that chunk. The conjunction "that" arguably doesn't count as a word within the chunk because it's a grammatically regular use of a conjunction, and therefore doesn't need to be "remembered" at all.
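The layering can be checked on a corpus by comparing how often the inner chunk occurs overall with how often it occurs inside the longer chunk. Below is a rough Python sketch of that check; the helper names `count_phrase` and `outside_share` are made up for illustration, and the input is assumed to be a plain list of lowercase words.

```python
def count_phrase(words, phrase):
    """Count occurrences of a space-separated phrase in a list of words."""
    target = phrase.split()
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def outside_share(words, inner, outer):
    """Fraction of the inner chunk's occurrences that fall outside the outer chunk."""
    inner_total = count_phrase(words, inner)
    if inner_total == 0:
        return 0.0
    # Every occurrence of the outer chunk contains the inner chunk
    # as many times as the inner chunk appears within the outer phrase.
    inside = count_phrase(words, outer) * count_phrase(outer.split(), inner)
    return (inner_total - inside) / inner_total
```

A high value of `outside_share(words, "de eerste keer", "het is niet de eerste keer")` would suggest that the three-word chunk lives independently in other settings, while a value near zero would suggest it mostly shows up as part of the longer phrase.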

Then we've got chunks that look like they might class as "semi-fixed":
we hebben te maken met een hogedrukgebied
we hebben te maken met een lagedrukgebied
That's the same construction with one noun different, but both nouns are grammatically similar, judging from the suffixes.

in de loop van de nacht wordt
de loop van de nacht wordt het
in de loop van de dag komt
in de loop van de middag gaat
in de loop van de nacht gaat
de loop van de dag neemt de
Again here we have patterns that recur but with different content.

This is why I reckon that chunks, however important they are, can only ever be part of the story, and too many proponents of chunked language keep trying to suggest that we shouldn't look at grammar and individual lexis at all, only chunks.

Daniel N.
Orange Belt
Posts: 145
Joined: Mon Oct 12, 2015 12:44 pm
Languages: Croatian (N), English (C1), German (beginner)
x 236

Re: Language chunks to ease language activation

Postby Daniel N. » Wed Apr 05, 2017 1:08 pm

I would like to draw attention to construction grammar, where everything in language is basically the same thing: a construction. The past tense is one construction, widely used, while English the + comparative, the + comparative (e.g. the faster, the better) is another construction, just used less often.

Instead of chunks, maybe a better concept would be "small templates", that is, chunks where it's clear what's variable, and what isn't.
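One simple way to get at such templates from a list of frequent chunks is to group chunks that are identical except for a single word position. The Python sketch below does only that; the function name `one_slot_templates` and the `___` slot marker are invented for illustration, and templates with two or more variable slots (such as "in de nacht van zaterdag op zondag" versus "in de nacht van donderdag op vrijdag") are not detected.

```python
from collections import defaultdict


def one_slot_templates(chunks, slot="___"):
    """Group chunks that differ in exactly one word position.

    Returns a dict mapping each template to the sorted list of words
    seen in its variable slot (only templates with 2+ fillers are kept).
    """
    fillers = defaultdict(set)
    for chunk in chunks:
        words = chunk.split()
        for i, word in enumerate(words):
            # Blank out position i and record which word filled it.
            template = " ".join(words[:i] + [slot] + words[i + 1:])
            fillers[template].add(word)
    return {t: sorted(ws) for t, ws in fillers.items() if len(ws) > 1}
```

Fed with chunks from the 7-word list earlier in this thread, such as the "we hebben te maken met een hogedrukgebied" / "lagedrukgebied" pair, this kind of grouping makes the fixed part and the variable part explicit.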