32000 words of intensive reading - an accidental experiment

Postby **mcthulhu** » Sun Jan 28, 2018 6:10 pm

I have a friend who used to use color highlighting to parse complex sentences, e.g. verb phrases were highlighted in green. He had a complete color system worked out. You might consider this a streamlined form of sentence diagramming, without actual diagrams. He thought it helped him to see the syntactic structure much more easily. Other people's reactions varied - some thought it was a crutch he should abandon, and others thought it was just way too much work... He didn't seem to mind investing the time, though, and I don't think it really slowed him down that much.

He only did this on paper, not on a computer. His syntax highlighting system probably could have been more elaborate if he hadn't been limited to the physical color highlighters he had readily available in those days. It was rather colorful as it was, though.

Postby **Cainntear** » Sun Jan 28, 2018 7:35 pm

I personally find that the small amount of time I did explicit parsing of English taught me enough about the process that I can now do it in my head, and can "see" divisions in text without marking them, both in English and in languages that I'm studying.

Postby **Tristano** » Mon Jan 29, 2018 3:43 pm

Apart from thanking you for the nice post and congratulating her on both her effort and her results, I'm intruding here with a side question.
I usually find my English horrible. Which CEFR level would the quoted excerpts and some of the posts in this thread correspond to? That would be my level in passive English, and one level below would roughly represent my active skills.

Postby **lsilvaj** » Thu Feb 08, 2018 11:53 pm

I'm taking a similar approach with Greek, using the so-called advanced texts from GreekPod101. By my estimate, the 50 available texts amount to approximately 15,000 words. I copy them into Google Docs and add notes to the words I'm not totally familiar with. I can feel some progress in my understanding, but I'm not even halfway through.

Postby **luke** » Fri Feb 09, 2018 2:12 am

s_allard wrote:
Axon wrote:...
In political articles there are monsters of sentences like this one:
In a speech at Georgetown University, she laid out the U.S. military maneuvers over the past several months—including a nuclear-powered submarine heading to South Korea, the movement of three aircraft carriers to the Western Pacific, and the Army testing out “mobilization centers” for deploying troops and training soldiers to fight in tunnels like those beneath North Korea—that inform this worry.

I'd be curious to see what the interactive process was to arrive at a complete understanding of that sentence.

One should be able to join the parts before and after the dashes.

In a speech at Georgetown University, she laid out the U.S. military maneuvers over the past several months that inform this worry. ("This worry" refers to a sentence that precedes the quoted monster.)

Then one can treat the part between the dashes as support for the main idea ("this worry"):

including a nuclear-powered submarine heading to South Korea, the movement of three aircraft carriers to the Western Pacific, and the Army testing out “mobilization centers” for deploying troops and training soldiers to fight in tunnels like those beneath North Korea (Refers to the "U.S. military maneuvers" in the main sentence).

So, formulaically, in good English (like the Atlantic article that was quoted), one can parse a sentence with dashes something like this:

A -- B -- C.

as

AC. B.
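luke's "A -- B -- C as AC. B." rule is mechanical enough to sketch in code. The Python below is a minimal illustration, not a parser: the function name `split_dash_aside` is hypothetical, and it assumes the sentence contains exactly one pair of dashes (an em dash or a double hyphen) around the aside. Single dashes, dashes used for ranges, or nested asides would need more care.

```python
import re

# An em dash (U+2014) or a double hyphen, swallowing any surrounding spaces.
DASH = re.compile(r"\s*(?:\u2014|--)\s*")


def split_dash_aside(sentence: str) -> tuple[str, str]:
    """Rewrite 'A -- B -- C.' as ('A C.', 'B').

    Assumes exactly one pair of dashes encloses the aside B.
    """
    parts = DASH.split(sentence)
    if len(parts) != 3:
        raise ValueError("expected exactly one pair of dashes")
    a, b, c = (p.strip() for p in parts)
    main = f"{a} {c}"  # join the parts before and after the dashes
    return main, b


sentence = ("In a speech she laid out the U.S. military maneuvers"
            "\u2014including a submarine heading to South Korea\u2014"
            "that inform this worry.")
main, aside = split_dash_aside(sentence)
print(main)   # In a speech she laid out the U.S. military maneuvers that inform this worry.
print(aside)  # including a submarine heading to South Korea
```

Reading `main` first and `aside` second gives luke's "AC. B." order. Single hyphens, as in "nuclear-powered", are deliberately left alone.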

Postby **reineke** » Sat Feb 10, 2018 4:26 pm

Axon wrote:
In political articles there are monsters of sentences like this one:
In a speech at Georgetown University, she laid out the U.S. military maneuvers over the past several months—including a nuclear-powered submarine heading to South Korea, the movement of three aircraft carriers to the Western Pacific, and the Army testing out “mobilization centers” for deploying troops and training soldiers to fight in tunnels like those beneath North Korea—that inform this worry.

Look how long that middle part is! And less obvious but equally confusing: what are the mobilization centers for? Deploying troops and training soldiers. To do what? Fight in tunnels. Tunnels like what? Like the tunnels beneath North Korea. What does "inform a worry" really mean, and how can it be used? I'm a native speaker with a degree in a writing-heavy subject and I'm positive I've never used that phrase before.

You cut out important details.

'The Military Has Seen the Writing on the Wall'
The United States is preparing for a war with North Korea that it hopes never to have to fight

"When Senator Tammy Duckworth returned from a recent trip to South Korea and Japan, she brought back a sobering message: “Americans simply are not in touch with just how close we are to war on the Korean peninsula.” In a speech at Georgetown University, she laid out the U.S. military maneuvers over the past several months—including a nuclear-powered submarine heading to South Korea, the movement of three aircraft carriers to the Western Pacific, and the Army testing out “mobilization centers” for deploying troops and training soldiers to fight in tunnels like those beneath North Korea—that inform this worry. In an interview with me, she said the U.S. military seems to be operating with the attitude that a conflict “‘will probably happen, and we better be ready to go.’”

https://www.theatlantic.com/internation ... ea/551381/

Your text: When Senator Tammy Duckworth returned from a recen ...

Flesch Reading Ease score: 44.3 (text scale), difficult to read
Gunning Fog: 15 (text scale), hard to read
Flesch-Kincaid Grade Level: 13.2, college
Coleman-Liau Index: 11, eleventh grade
SMOG Index: 12, twelfth grade
Automated Readability Index: 13.6, 21-22 yrs. old (college level)
Linsear Write Formula: 16.6, college graduate and above

http://www.readabilityformulas.com
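Most of these scores combine average sentence length with average word length. Here is a rough Python sketch of three of them (Flesch Reading Ease, Flesch-Kincaid Grade Level, and the Automated Readability Index) using their usual published coefficients. The syllable counter is a crude vowel-group heuristic (an assumption, not the site's method) and sentence splitting is naive, so the numbers will not exactly match readabilityformulas.com.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (y included),
    dropping a silent final 'e'. Real tools use dictionaries or better rules."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "make" -> 1, but "table" stays 2
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    # Naive sentence count: each run of . ! ? ends a sentence ("U.S." counts twice).
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z0-9']+", text)
    n_words = max(len(words), 1)
    n_syll = sum(count_syllables(w) for w in words)
    n_chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)

    wps = n_words / sentences  # average words per sentence
    spw = n_syll / n_words     # average syllables per word
    cpw = n_chars / n_words    # average letters/digits per word
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "automated_readability_index": 4.71 * cpw + 0.5 * wps - 21.43,
    }


print(readability("The cat sat on the mat."))
```

Long sentences and long words push Flesch Reading Ease down and the grade-level scores up, which is why the Duckworth paragraph, with its dash-enclosed list, scores as college-level text.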

Postby **Axon** » Sun Feb 11, 2018 7:36 am

Thanks, reineke! I didn't know there were so many different reading level calculators. I agree, you do need to know what "this worry" refers to in order to truly grasp the meaning of the sentence.

She's up past article 40 now, and one new thing I've noticed is that a few of the words she's asking me for help with are words that I would not really be able to use confidently in my own writing. Words like stolid, renal, putatively, bumptious, comport. Other educated native speakers, can you give definitions of these out of context off the top of your head? Have they recently or ever appeared in your writing?

Some might worry: If you don't have a tutor or native speaker around to tell you, then how can you be sure that the words you're learning from native sources are actually good words to use? I'd say more extensive reading will give you good intuition in the right direction. Also, if you vary the register of your readings, you'll notice which words appear in highbrow or more general writing.

Here (with her permission) is a sample of words that she's recently written down. This is a nice view into a self-learner's process of improving English literacy.

inhospitable, blunder, minted, akin to, ferocious, clout, well versed in, overthrow, a mob of, unleash, memorandum, hard-nosed, stemming from, croon, uninhibited, zealot, subordinate, atrocity, redoubled, undulating, slab, bohemian, hub, gripe, fret, cobblestone, sundering, allege, centrist, lag, pidgin, fizz

I searched the forum for several of these at random and found very few results. I think that goes to show the value of finding varied sources for reading practice. You could read this forum for weeks on end and only read "cobblestone" twice, but any novel set in London or any Minecraft video will expose you to that word dozens of times.
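The point about source variety is easy to check on your own reading material. Below is a minimal Python sketch, assuming you already have plain-text copies of the sources. It counts case-insensitive exact-form matches of each target word per source, so "cobblestones" would not count toward "cobblestone". The function name and the two sample texts are invented placeholders, not real forum or novel excerpts.

```python
import re
from collections import Counter


def target_counts(sources: dict[str, str], targets: list[str]) -> dict[str, dict[str, int]]:
    """For each source, count whole-word hits of each (lowercase) target word."""
    result = {}
    for name, text in sources.items():
        tokens = Counter(re.findall(r"[a-z']+", text.lower()))
        result[name] = {t: tokens[t] for t in targets}  # Counter gives 0 for misses
    return result


# Placeholder texts, invented for illustration only.
sources = {
    "forum": "I think extensive reading helps. I think so too.",
    "novel": "The cab rattled over the cobblestone street; cobblestone after cobblestone.",
}
print(target_counts(sources, ["cobblestone", "think"]))
# {'forum': {'cobblestone': 0, 'think': 2}, 'novel': {'cobblestone': 3, 'think': 0}}
```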

Postby **s_allard** » Sun Feb 11, 2018 2:14 pm

I think this last post illustrates once again two fundamental truths about vocabulary learning. Firstly, outside the core set of function or grammar words and the basic vocabulary of everyday life (including work, of course), most of the words in a language are rarely heard, read or used by most people.

This is a broad statement that demands lots of explanation that I don't have time to get into now, but the point is that what we actually use is only a tiny portion of what is out there. I watch about an hour of television every day in English and I'll say that I hear a new word or saying every day. Just last night, I heard "If wishes were horses, beggars would ride." That was completely new to me. Who knows when I will hear it again.

The second point is that we will acquire new vocabulary as needed. That is exactly what reading or exposure in general does. This is of course how professional or occupational vocabulary is learned. A bus driver, a lawyer, an engineer, each will have their own subset of words that I do not know.

Postby **tommus** » Sun Feb 11, 2018 2:59 pm

s_allard wrote:I watch about an hour of television every day in English and I'll say that I hear a new word or saying every day

I watch half an hour of Dutch news every day. Yesterday, I analyzed the subtitles from the last 1,000 days of that news. I removed all the proper nouns and numbers, keeping every form of the remaining words (so not word families but the much more numerous word variations). There were 55,000 of these in total. I then graphed the cumulative number of distinct words against days. After a brief steep rise at the beginning, the growth was almost linear, about 55 new words per day, and it was still adding about 55 words per day after 1,000 days (about 3 years). Allowing for the fact that some of these (I don't yet know how many) will be variations of words I have seen in the last 3 years, each new half hour of daily news contains about 55 words (or forms of words) that I have not seen before. Now I am going to look more closely at just what these 55 per day are.
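tommus's procedure can be sketched in Python under a few stated assumptions: each day's subtitles are one plain string, numbers are dropped with a digit check, and proper nouns are approximated as forms that never appear in lowercase anywhere in the corpus. That filter is a heuristic chosen for this sketch, not necessarily how tommus did it. The function name and the Dutch sample sentences are invented. Like his analysis, it counts distinct word forms, not lemmas or word families.

```python
import re
from itertools import accumulate


def daily_new_forms(days: list[str]) -> list[int]:
    """Return how many previously unseen word forms each day's text adds."""
    tokenized = [re.findall(r"\w+", text) for text in days]
    # Heuristic proper-noun filter: keep a form only if it occurs in lowercase
    # somewhere in the corpus (so sentence-initial "Het" counts via "het").
    lowercase_forms = {tok for toks in tokenized for tok in toks if tok.islower()}
    seen: set[str] = set()
    new_per_day = []
    for toks in tokenized:
        new = 0
        for tok in toks:
            if any(ch.isdigit() for ch in tok):
                continue  # drop numbers such as "1000" or "55"
            form = tok.lower()
            if form not in lowercase_forms:
                continue  # never written in lowercase: treated as a proper noun
            if form not in seen:
                seen.add(form)
                new += 1
        new_per_day.append(new)
    return new_per_day


# Invented three-day sample, standing in for daily subtitle files.
days = [
    "Het kabinet in Den Haag besluit vandaag.",
    "Het kabinet besluit niet. Het regent in Den Haag.",
    "Het regent weer en het kabinet zwijgt.",
]
new = daily_new_forms(days)
total = list(accumulate(new))  # cumulative distinct forms, the curve tommus graphed
print(new, total)  # [5, 2, 3] [5, 7, 10]
```

On real data, the slope of `total` over the later days is the roughly constant rate tommus describes. A heuristic this simple also drops ordinary words that only ever appear sentence-initially, so it slightly undercounts.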

Postby **reineke** » Sun Feb 11, 2018 3:32 pm

Most adult native test-takers range from 20,000–35,000 words
Average native test-takers of age 8 already know 10,000 words
Average native test-takers of age 4 already know 5,000 words
Adult native test-takers learn almost 1 new word a day until middle age
Adult test-taker vocabulary growth basically stops at middle age
The most common vocabulary size for foreign test-takers is 4,500 words

http://testyourvocab.com/blog/

In another vocabulary test study, researchers found that "an average 20-year-old native speaker of American English knows 42,000 lemmas and 4,200 non-transparent multiword expressions, derived from 11,100 word families..." "The numbers range from 27,000 lemmas for the lowest 5% to 52,000 for the highest 5%. The knowledge of the words can be as shallow as knowing that the word exists. In addition, people learn tens of thousands of inflected forms and proper nouns (names), which account for the substantially high numbers of ‘words known’ mentioned in other publications."

http://journal.frontiersin.org/article/ ... 01116/full
http://vocabulary.ugent.be/wordtest/start

Russian
http://www.myvocab.info/articles/slovar ... razovaniya

However...

What Is Advanced-Level Vocabulary? The Case of Chunks and Clusters
http://www.tesol.org/docs/default-sourc ... .pdf?sfvrs

".. we move from the notion of advanced vocabulary as a set of words to the notion of advanced vocabulary as sets of words in combination. Once again, corpus analysis will be employed to help us search for patterns and frequencies. However, when we expand our search criteria to look for groupings of more than one word, things become more complicated, and there are clear lessons to be learned about how we describe the vocabulary of a language, as well as implications for what teachers teach in their vocabulary lessons and how learners approach the task of acquiring vocabulary and developing fluency. Throughout this paper we work from, but also hope to challenge, the understanding of many teachers, researchers, and learners that vocabulary means no more than all the single words of a language."

"Using a 4.7-million-word sample of North American English conversation from the Cambridge International Corpus (CIC), and applying corpus analytical software to obtain a frequency count for recurrent chunks, the following totals emerge for chunks occurring more than 20 times:

two-word chunks 19,509
three-word chunks 12,681
four-word chunks 2,953
five-word chunks 385

Chunks and Single Words

Only 14 items in a single-word frequency list occur more often than the most frequent chunk (i.e., you know, which occurs 45,873 times). Of the first 100 items in the overall frequency list, 11 are two-word chunks, including I think and I mean. By the time we reach 500 items, there are 177 two-word chunks and 7 three-word chunks, that is, 35% of the most frequent items are chunks, not single words."
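The chunk totals above come from corpus software run over a 4.7-million-word corpus. The basic idea, counting recurring n-word sequences and keeping those above a frequency cutoff, can be sketched in Python as below. The function name, the lowercase letters-only tokenizer, and the toy text are assumptions for illustration; real corpus tools handle punctuation, speaker turns, and case more carefully (this sketch happily counts chunks across sentence boundaries).

```python
import re
from collections import Counter


def recurrent_chunks(text: str, n: int, min_count: int) -> dict[str, int]:
    """Count n-word sequences and keep those occurring more than min_count times."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {g: c for g, c in grams.items() if c > min_count}


sample = "you know I think so. you know what I mean? I think you know."
print(recurrent_chunks(sample, 2, 1))  # {'you know': 3, 'i think': 2}
```

With `n` set to 2 through 5 and `min_count=20`, the same function mirrors the "more than 20 times" cutoff used for the totals in the quote.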

So, in order to achieve advanced-level vocabulary, you may need to know items like "up," "yours," and "up yours," which represents a considerable learning burden.

Also...

"experience changes the quality of lexical representations, and does so differently for different words and different individuals. Some aspects of this relationship are well-described, including the logarithmic relationship between word frequency of occurrence and behavioral correlates of word recognition: ten exposures to an infrequent word may have a similarly strong impact on the quality of that word’s mental representation as 100 exposures to a word that is well entrenched in one’s mental lexicon...
Importantly, it may not be simply the number of exposures to a word – larger for good readers, smaller for poor ones, due to their differences in reading experience – that would give rise to individual variability. It may be that poor readers are not able to use the exposures they do get to create the kind of high quality lexical representations that skilled readers have...

For example, readers who make fewer phonological discriminations due to poor phonological processing skills will not end up with the same quality of lexical representation after 100 exposures than someone without phonological problems would end up with, even if their level of reading experience is matched. The same holds true for readers with a limited learning capacity or a compromised long-term lexical memory, or any other behavioral or organic characteristic that impedes the entrenchment of mental lexical representation: in all these cases the readers would have to have a larger number of exposures to a word than readers without those characteristics to create a representation of the same quality. None of these scenarios can be accounted for by general-use corpora, however large and genre-balanced they are..."

https://www.ncbi.nlm.nih.gov/pmc/articl ... po=2.33161
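The "ten exposures versus 100 exposures" comparison falls out of any model in which representation quality grows with the logarithm of the exposure count. As a toy illustration only (this bare log model and the starting counts of 1 and 10 are assumptions, not the paper's fitted model): if quality after n exposures is log(n), then adding 10 exposures to a word seen once gives the same gain as adding 100 to a word seen 10 times, because both multiply the count by 11.

```python
import math


def log_gain(current: int, extra: int) -> float:
    """Gain in a toy log-exposure model: log(current + extra) - log(current)."""
    return math.log(current + extra) - math.log(current)


rare = log_gain(current=1, extra=10)          # infrequent word: 1 -> 11 exposures
entrenched = log_gain(current=10, extra=100)  # familiar word: 10 -> 110 exposures
print(round(rare, 3), round(entrenched, 3))   # 2.398 2.398
```

In this model only the ratio of new to old exposure counts matters, which is one way to read why an infrequent word benefits so much from a handful of encounters.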

A language learners’ forum