einzelne wrote: Anyways, I was just daydreaming.
Let's continue the daydream: in the name of efficiency, we could rewrite great works and remove the useless repetitions in the prose.
I just ran Camus's L'Étranger through a bit of Python.
Here are the 30 most frequently used words (having gotten rid of the stopwords*).
[('Raymond', 91), ('moment', 90), ('demandé', 88), ('bien', 81), ('rien', 77), ('Marie', 72), ('répondu', 65), ('temps', 61), ('fois', 58), ('été', 56), ('air', 53), ('avocat', 52), ('homme', 51), ('maman', 48), ('faire', 45), ('soleil', 43), ('femme', 42), ('jour', 41), ('chose', 39), ('sommes', 38), ('-il', 36), ('jamais', 36), ('pensé', 35), ('yeux', 35), ('voulait', 35), ('beaucoup', 34), ('petit', 33), ('visage', 33), ('fallait', 32), ('regardé', 32)]
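For anyone who wants to try the same thing, here is a minimal sketch of how such a count could be produced. It is not the script used for the list above: the tiny `STOPWORDS` set, the `top_words` helper and the letters-only tokenizer are all assumptions made for illustration, and a real run would use a fuller French stopword list.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list used above;
# a real run would need a much fuller French list).
STOPWORDS = {"le", "la", "les", "un", "une", "de", "des", "du", "et", "à",
             "je", "il", "elle", "que", "qui", "ne", "pas", "a", "ai", "est",
             "m", "l", "d", "j", "n", "s", "c", "t"}

def top_words(text, n=30, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    # Runs of letters only (accents included); punctuation, digits,
    # apostrophes and hyphens all act as separators.
    tokens = re.findall(r"[^\W\d_]+", text)
    # Compare in lowercase, but keep the original spelling in the result.
    kept = [tok for tok in tokens if tok.lower() not in stopwords]
    return Counter(kept).most_common(n)

sample = "Aujourd'hui, maman est morte. Ou peut-être hier, je ne sais pas."
print(top_words(sample, n=5))
```

Because this tokenizer splits on hyphens, it would not produce a '-il' token the way the run above did; the price is that words like "peut-être" get cut in two.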
He would mention 'Raymond' only once. We'd get rid of the repetitive melancholy of 'maman', and of his weak desire for 'Marie' and 'femme'. What a lighter read we could have!
Alas, we'd use 'culotte' or 'brodée' only once, but that is just like the original.
*Probably needs a little more cleaning: that '-il' makes me think the lemmatization function may need the '-' removed first.
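One possible cleanup, sketched under the assumption that the '-il' comes from inverted verb forms such as "dit-il" or "a-t-il" (I have not checked where it really comes from): detach the hyphenated subject pronoun before tokenizing, and leave ordinary hyphenated words alone. The `split_inversions` name and the pronoun list are made up for this example.

```python
import re

# Hyphen, optional euphonic "t-", then a subject pronoun at a word boundary.
# The pronoun list is an illustrative assumption, not an exhaustive one.
_INVERSION = re.compile(
    r"-(t-)?(je|tu|il|elle|on|nous|vous|ils|elles|ce)\b",
    flags=re.IGNORECASE,
)

def split_inversions(text):
    """Turn 'dit-il' into 'dit il' and 'a-t-il' into 'a il'."""
    # Keep only the pronoun (group 2), separated from the verb by a space;
    # the euphonic 't' is dropped.
    return _INVERSION.sub(r" \2", text)

print(split_inversions("Pourquoi, a-t-il demandé, peut-être ?"))
```

After this step the pronoun is a plain "il" and gets caught by the stopword list like any other, while "peut-être" keeps its hyphen.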