Language chunks to ease language activation

reineke
Black Belt - 2nd Dan
Posts: 2221
Joined: Wed Jan 06, 2016 7:34 pm
Languages: Engrish
x 3606

Re: Language chunks to ease language activation

Post by reineke » Thu Mar 30, 2017 4:03 am

neofight78 wrote:
I've encountered the effects of ignoring Russian grammar and the results are more often than not awful. A purely lexical approach without grammar or vocabulary study doesn't work in all situations for all people.


The lexical approach is about studying "grammaticalized lexis", i.e. it does away with the dichotomy of grammar and vocabulary as separate study subjects.

Anyone actively studying lexical chunks cannot possibly be ignoring vocabulary. While grammar plays second fiddle, it is hardly ignored. Paying attention to lexical chunks could prove especially useful while studying a highly inflected language like Russian. Nobody's forcing students, teachers or self-learners to adopt this approach. What's more, the lexical approach is apparently largely ignored:

"A quick glance at any commercially available EFL textbook reveals that a traditional grammar syllabus, the main object of Lewis's attack, is still alive and kicking, albeit more cleverly disguised."

https://www.google.com/amp/s/amp.thegua ... revolution

As Nick Ellis (2011, p. 656) puts it:

"Adult language knowledge consists of a continuum of linguistic constructions of different levels of complexity and abstraction. Constructions can comprise concrete and particular items (as in words and idioms), more abstract classes of items (as in word classes and abstract constructions), or complex combinations of concrete and abstract pieces of language (as mixed constructions). Consequently, no rigid separation is postulated to exist between lexis and grammar."

https://scottthornbury.wordpress.com/20 ... struction/
Last edited by reineke on Wed Apr 12, 2017 7:36 pm, edited 1 time in total.

jeffers
Orange Belt
Posts: 152
Joined: Sat Aug 22, 2015 4:12 pm
Location: UK
Languages: Speaks: English (N), Hindi (A2-B1)

Learning: The above, plus French (A2-B1), German (A1), Ancient Greek (?), Sanskrit (beginner)
Language Log: viewtopic.php?f=15&t=2612
x 284

Re: Language chunks to ease language activation

Post by jeffers » Thu Mar 30, 2017 9:24 am

This thread made me wonder if it would be feasible to write a program which finds chunks in texts. The obvious use would be to create a frequency list of chunks for any language, and use it much like people do with vocabulary frequency lists.

The complex part would be to find pairs of words used more than a set threshold. You would have to do something like check every word against every word, a problem which at first glance grows quadratically, not exponentially, as the text file increases [O(n^2) for you computer science boffins]. Once you have pairs, finding chunks of 3, chunks of 4, etc., would be quick, because every longer chunk has to start with a shorter chunk that already passed the threshold, so each subsequent set can be no larger than a fraction of the previous one.
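If a "chunk" just means words sitting next to each other, the every-word-against-every-word comparison can be avoided: one pass over the text with a hash map counts all adjacent pairs, and longer chunks only need counting when their first words already form a frequent chunk. A minimal Python sketch of that idea follows; the function name `frequent_chunks` and its parameters are made up for illustration and are not from any existing tool.

```python
from collections import Counter

def frequent_chunks(words, max_n=3, min_count=2):
    """Return {n: {chunk: count}} for chunks of 2..max_n adjacent words.

    Hypothetical helper: a chunk of n words is only counted when its
    first n-1 words already form a frequent chunk, so each pass looks
    at fewer candidates than the one before.
    """
    result = {}
    previous = None  # frequent chunks of the previous length
    for n in range(2, max_n + 1):
        counts = Counter()
        for i in range(len(words) - n + 1):
            chunk = tuple(words[i:i + n])
            if previous is None or chunk[:-1] in previous:
                counts[chunk] += 1
        previous = {c: k for c, k in counts.items() if k >= min_count}
        result[n] = previous
    return result

text = "twee jaar geleden was het twee jaar geleden dat ik twee jaar wachtte"
found = frequent_chunks(text.split(), max_n=3, min_count=2)
print(found[2])
print(found[3])
```

Each pass is linear in the length of the text, so a corpus of a few million words should be well within reach of an ordinary laptop.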

There could be "heuristic" solutions (i.e. approximations which shorten the processing). Off the top of my head, you could find the most common pairs in several small to medium size texts, then only search for these pairs in a massive corpus. Another approach might be to only look for pairs where both words are among the 2000 (or any other arbitrary number) most frequent words in the corpus.
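The second heuristic could look roughly like this in Python. It is only a sketch: `pairs_among_top_words`, `top_k` and `min_count` are names invented here, and the cut-off values are arbitrary.

```python
from collections import Counter

def pairs_among_top_words(words, top_k=2000, min_count=2):
    """Count adjacent pairs, keeping only pairs of very frequent words."""
    word_counts = Counter(words)
    # The top_k most frequent words in this corpus.
    top_words = {w for w, _ in word_counts.most_common(top_k)}
    pair_counts = Counter(
        (a, b)
        for a, b in zip(words, words[1:])
        if a in top_words and b in top_words
    )
    return {pair: k for pair, k in pair_counts.items() if k >= min_count}

sample = "niet meer en niet meer dan dat maar niet meer".split()
pairs = pairs_among_top_words(sample, top_k=2, min_count=2)
print(pairs)
```

The trade-off is that any chunk containing a rarer word is never seen, however often the chunk itself occurs.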

I feel a summer project coming on.......

DaveBee
Blue Belt
Posts: 769
Joined: Wed Nov 02, 2016 8:49 pm
Location: UK
Languages: English (native). French (studying).
x 1005

Re: Language chunks to ease language activation

Post by DaveBee » Thu Mar 30, 2017 10:09 am

reineke wrote:
neofight78 wrote:
I've encountered the effects of ignoring Russian grammar and the results are more often than not awful. A purely lexical approach without grammar or vocabulary study doesn't work in all situations for all people.


The lexical approach is about studying "grammaticalized lexis" i.e. it does away with the dichotomy of grammar and vocabulary as separate study subjects.

Anyone actively studying lexical chunks cannot possibly be ignoring vocabulary. While grammar plays second fiddle, it is hardly ignored. Paying attention to lexical chunks could prove especially useful while studying a highly inflected language like Russian. Nobody's forcing students, teachers or self-learners to adopt this approach. What's more, the lexical approach is apparently largely ignored:

"A quick glance at any commercially available EFL textbook reveals that a traditional grammar syllabus, the main object of Lewis's attack, is still alive and kicking, albeit more cleverly disguised."

https://www.google.com/amp/s/amp.thegua ... revolution

The author of Polish for Dummies, Daria Gabryanczyk, recommends learning phrases of Polish, rather than words:

"So, what’s the best approach when it comes to learning Polish? Simply relax, take things easy, don’t worry if you can’t always get the endings right, don’t look too much ahead but just let yourself gradually dive into Polish, follow simple tips your teacher gives you during your classes, take every opportunity to speak Polish and listen to the language and keep repeating full phrases. In the case of the Polish language, learning isolated words is not a good idea, as you might not know how to put them together. Therefore, especially at the beginning, you should always focus on memorising full phrases as this is the best way to learn Polish. And you will soon notice that the more Polish phrases you already know the easier it becomes for you to take in more and more and in no time you will realise it’s actually not that difficult."

Re: Language chunks to ease language activation

Post by reineke » Thu Mar 30, 2017 2:39 pm

Leoxicon
"Leo Selivan's blog for EFL/ESL teachers. Activities, ideas and useful tips with a lexical touch."

Essential lexical tools

http://leoxicon.blogspot.com/p/essentia ... l#section1

http://www.scoop.it/t/tools-for-lexical-teachers

You'll find these links and a few others in the language tools section of this forum:

http://forum.language-learners.org/view ... =19&t=2915

tommus
Blue Belt
Posts: 511
Joined: Sat Jul 04, 2015 3:59 pm
Location: Kingston, ON, Canada
Languages: English (N), French (B2), Dutch (B2), German (A1), Spanish (A1), Esperanto (A1)
x 785

Re: Language chunks to ease language activation

Post by tommus » Sat Apr 01, 2017 12:49 am

jeffers wrote:
This thread made me wonder if it would be feasible to write a program which finds chunks in texts.

Excellent idea. You inspired me to put together what turned out to be a rather simple Java program to find chunks. Here are some Dutch results.

Corpus: 2 years of daily Dutch news from NOS Journaal
Number of words: over 2 million
Condition: words longer than 3 letters
Processing time: less than a minute
2-word chunks, occurring at least 10 times each: 5,118
3-word chunks, occurring at least 10 times each: 327
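
For anyone who wants to try the same settings on their own text, here is a rough Python sketch. It is not the Java program itself, and it rests on one guess: that a chunk is thrown away whenever any of its words has 3 letters or fewer (the other possible reading, deleting short words first and then pairing up whatever is left, would give different chunks). The name `chunk_report` and its parameters are invented for this example.

```python
import re
from collections import Counter

def chunk_report(text, n=2, min_len=4, min_count=10):
    """Return (count, chunk) rows, most frequent first."""
    # Lower-case the text and keep runs of letters only; punctuation
    # and sentence boundaries are ignored in this sketch.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Assumption: drop the chunk if any word is 3 letters or shorter.
        if all(len(w) >= min_len for w in window):
            counts[" ".join(window)] += 1
    rows = [(k, chunk) for chunk, k in counts.items() if k >= min_count]
    return sorted(rows, key=lambda row: (-row[0], row[1]))

demo = "Vorig jaar was het druk. Vorig jaar, twee jaar geleden en vorig jaar weer."
rows = chunk_report(demo, n=2, min_count=2)
for count, chunk in rows:
    print(count, chunk)
```

The rows come back in the same "count, then chunk" layout as the lists in this post.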

Examples, showing frequency of occurrence, excluding proper names

Top 30 two-word chunks

1300 niet meer
756 jaar geleden
726 vorig jaar
709 heel veel
665 niet alleen
651 nieuws vandaag
650 steeds meer
604 veel mensen
532 veel meer
396 alleen maar
395 miljoen euro
337 niet voor
315 maar niet
315 niet goed
310 volgend jaar
309 mensen zijn
283 terug naar
276 twee jaar
276 vorige week
268 onder meer
266 miljard euro
262 ieder geval
256 deze week
254 daar zijn
254 helemaal niet
254 volgende week
243 laten zien
243 niet veel
229 afgelopen jaren
229 fijne avond

Top 30 three-word chunks

91 twee jaar geleden
80 heel veel mensen
73 mensen raakten gewond
63 twee weken geleden
60 steeds meer mensen
55 zich zorgen over
53 eerder deze week
53 voor veel mensen
47 zich grote zorgen
44 niet alleen voor
42 maakt zich zorgen
41 veel meer over
40 paar jaar geleden
40 vijf jaar geleden
39 maken zich zorgen
39 prettige avond verder
38 jaar geleden werd
37 maar niet iedereen
37 meeste plaatsen droog
36 goed nieuws voor
35 heel veel geld
35 over twee weken
35 vier jaar geleden
33 niet alleen maar
32 veel mensen zijn
31 vrij veel bewolking
29 paar weken geleden
28 eind vorig jaar
27 even terug naar
27 komen steeds meer

Re: Language chunks to ease language activation

Post by tommus » Sat Apr 01, 2017 12:50 pm

Here are links for the Dutch two- and three-word chunks and their frequencies from two years of NOS Journaal. They include proper names such as people and places. They may contain some chunks that are unique to journalism.

Corpus: 2 years of daily Dutch news from NOS Journaal
Number of words: over 2 million
Condition: words longer than 3 letters
Processing time: less than a minute

2-word chunks, occurring at least 2 times each: 56,833

3-word chunks, occurring at least 2 times each: 13,156

Cainntear
Blue Belt
Posts: 853
Joined: Thu Jul 30, 2015 11:04 am
Location: Scotland
Languages: English(N)
Advanced: French,Spanish, Scottish Gaelic
Intermediate: Italian, Catalan, Corsican
Basic: Welsh
Dabbling: Polish, Russian etc
x 1741

Re: Language chunks to ease language activation

Post by Cainntear » Sat Apr 01, 2017 1:20 pm

I realise this is quite an old post, but I think the point raised demands a response.
lusan wrote:
I believe, as some research showed, that we remember best no more than 7 items at a time. Very long sentences fail in that respect.

The reason we talk about chunking is exactly that limit of 7 things at a time.

The idea is that by "chunking" multiple words together, we can see them as one thing, and then we can incorporate them into larger and larger constructs.

But it's a complex process, and the contents of the chunks usually still follow the grammatical rules of the language, so I don't see learning chunks as an alternative to learning grammar, but a supplement. I'm sure you can learn some grammar through studying chunks, but grammar is a series of generalisable rules, and you can't learn a generalisable rule from only one or two examples.

Re: Language chunks to ease language activation

Post by tommus » Sat Apr 01, 2017 1:24 pm

Here are links for the Dutch two- and three-word chunks and their frequencies from several years of some Dutch TV series. They contain mainly conversational material.

Corpus: 2 years of Dutch TV series.
Number of words: over 278,000
Condition: words longer than 3 letters
Processing time: less than 10 seconds

2-word chunks, occurring at least 2 times each: 6,635

3-word chunks, occurring at least 2 times each: 1,121

Re: Language chunks to ease language activation

Post by tommus » Sat Apr 01, 2017 1:31 pm

If there is interest, I will process other corpus material in other languages into 2-word and 3-word chunks. What I need is a link to plain text material. It should include at least 200,000 words, and preferably 1 million words or more, in Latin script.

Re: Language chunks to ease language activation

Post by jeffers » Sat Apr 01, 2017 2:12 pm

tommus wrote:
If there is interest, I will process other corpus material in other languages into 2-word and 3-word chunks. What I need is a link to plain text material. It should include at least 200,000 words, and preferably 1 million words or more, in Latin script.


Thanks for setting that up. I was probably over-complicating the process in my mind.

The only problem with your solution is restricting it to words longer than 3 letters (evidently to cut down processing time). I imagine a high proportion of interesting word groups contain words of one and two letters. The obvious example in French is "il y a", but we could imagine others ("What a crock!").
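
To show what I mean, here is a toy Python sketch (my own, with the invented name `top_chunks`, and assuming the length filter discards any chunk that contains a short word). The same sentence is counted once with the filter and once without:

```python
from collections import Counter

def top_chunks(words, n, min_len=1, min_count=2):
    """Count chunks of n adjacent words whose words all have min_len+ letters."""
    counts = Counter(
        tuple(words[i:i + n])
        for i in range(len(words) - n + 1)
        if all(len(w) >= min_len for w in words[i:i + n])
    )
    return {" ".join(c): k for c, k in counts.items() if k >= min_count}

french = "il y a un chat et il y a un chien mais il y a aussi un oiseau".split()
with_filter = top_chunks(french, 3, min_len=4)     # words longer than 3 letters
without_filter = top_chunks(french, 3, min_len=1)  # every word allowed
print(with_filter)
print(without_filter)
```

With the filter nothing survives; without it "il y a" comes out on top, at the price of also picking up less useful fragments such as "y a un".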