I think it's worth pointing out that this paper takes an information-theoretic approach.
Meaning the 1.5 MB of data is the information one extracts after a hard day's work, every day, over 18 years.
Not at all like reading a 200,000-word book.
More like taking 12.5 million left or right turns while flying through space to end up on planet Earth.
You go through terabytes of data every day over those 18 years to extract that day's average of ~1,900 bits.
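To make that arithmetic concrete, here's a quick back-of-the-envelope sketch (plain Python, using only the figures quoted above; the 1 MB = 2^20 bytes convention is my assumption, roughly what the 12.5 million figure implies):

```python
# Rough arithmetic behind the figures above (assumption: 1 MB = 2**20 bytes).
total_bits = 1.5 * 2**20 * 8        # ~12.6 million bits, the "left or right turns"
days = 18 * 365                     # 6,570 days of learning
bits_per_day = total_bits / days    # ~1,915 bits/day, i.e. the ~1,900-bit average

print(f"{total_bits:,.0f} bits total, {bits_per_day:,.0f} bits/day over {days} days")
```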
In the same way, you can download Google Translate for your language pair and it takes only ~100 MB, but that doesn't mean that's how much data was needed to train the model; the training data is easily 10,000 times larger, not to mention the time and effort it takes to process all of it.
To quote the paper itself:
To put our lower estimate in perspective, each day for 18 years a child must wake up and remember, perfectly and for the rest of their life, an amount of information equivalent to the information in this sequence,
011010000110100101100100011001
000110010101101110011000010110
001101100011011011110111001001
100100011010010110111101101110
And that's just the minimum estimate of 120 bits; the average estimate is ~1,900 bits and the upper estimate is ~6,200 bits (about 50 sequences like the one above), all of it to be extracted and memorized perfectly, for the rest of your life, every day over 6,570 days.
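A quick sanity check of those numbers (my own sketch, not from the paper; it just rescales the 120-bit sequence quoted above over 6,570 days):

```python
# Check the quoted per-day estimates against the 120-bit sequence and 6,570 days.
sequence = (
    "011010000110100101100100011001"
    "000110010101101110011000010110"
    "001101100011011011110111001001"
    "100100011010010110111101101110"
)
assert len(sequence) == 120                      # the lower estimate: one sequence per day

days = 18 * 365                                  # 6,570 days
for label, bits_per_day in [("lower", 120), ("average", 1900), ("upper", 6200)]:
    megabytes = bits_per_day * days / 8 / 2**20  # total over 18 years
    print(f"{label:>7}: ~{bits_per_day / 120:.0f} sequence(s)/day, ~{megabytes:.1f} MB total")
```

The average estimate is the one that adds up to the headline ~1.5 MB.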
P.S. I also think it's interesting to point out that ~96% of all the work goes into lexical semantics, i.e. learning the meanings of things in concept space.