Word frequency checker

Hork
White Belt
Posts: 11
Joined: Sun Jan 10, 2016 6:57 pm
x 8

Word frequency checker

Postby Hork » Sun Jan 10, 2016 7:03 pm

I have a German word frequency list (not lemmatized) and a German book (both as .txt files). Would it be possible to develop a simple OFFLINE text/book reader program where words could be highlighted (e.g. by different color or underscoring?) according to their frequency band/level?


UPDATE AGAIN:
I've found a program that at last fulfilled my demands: Notepad++
The key feature is arcanely called "User Defined Language"
Here's the screenshot of a German book where four frequency levels highlight word forms that occur in half the corpus texts.

frek31.PNG
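
For anyone who wants to try the same idea, here is a minimal Python sketch of how a frequency list might be split into four bands. It is not the exact setup used above. It assumes a list with one "word count" pair per line, where the count is the number of corpus books containing the word; the function names, the sample data and the thresholds (one half, one third, one fourth, one fifth of the corpus) are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def parse_frequency_lines(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word count' lines (an assumed file format) into a dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines instead of failing on them.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def band_keywords(
    counts: Dict[str, int],
    corpus_size: int,
    fractions: Sequence[float] = (1 / 2, 1 / 3, 1 / 4, 1 / 5),
) -> List[List[str]]:
    """Put each word into the first (highest) band whose threshold it meets."""
    bands: List[List[str]] = [[] for _ in fractions]
    for word, n in sorted(counts.items()):
        for i, fraction in enumerate(fractions):
            if n >= corpus_size * fraction:
                bands[i].append(word)
                break  # words below every threshold are left out
    return bands


# Hypothetical sample data: counts out of a 1000-book corpus.
sample = ["und 1000", "Haus 520", "Fenster 340", "Schatten 260", "Zwerg 12"]
counts = parse_frequency_lines(sample)
bands = band_keywords(counts, corpus_size=1000)
for number, words in enumerate(bands, start=1):
    print(f"Band {number}: {' '.join(words)}")
```

Each printed line is a space-separated word list. Such a list could presumably be pasted into one keyword group of a Notepad++ User Defined Language, with a different style per group, but the thread does not show the exact configuration, so treat that step as an assumption.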
Last edited by Hork on Wed Jan 13, 2016 4:14 am, edited 4 times in total.
1 x

rdearman
Site Admin
Posts: 7259
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 23308
Contact:

Re: Word frequency checker

Postby rdearman » Sun Jan 10, 2016 7:27 pm

Hork wrote:I have a German word frequency list (not lemmatized) and a German book (both as .txt files). Would it be possible to develop a simple offline text/book reader program where words could be highlighted (e.g. by different color or underscoring?) according to their frequency band/level?

Anything is possible. You could do it in a text editor like vim or emacs which supports highlighting. In emacs you'd probably have to write a minor-mode like the ones used for syntax highlighting. Are you planning to write this yourself? Or are you asking for someone to write it for you?

What is the benefit though? If you read enough books you'll soon learn all the common words anyway.
1 x
: 26 / 150 Read 150 books in 2024

My YouTube Channel
The Autodidactic Podcast
My Author's Newsletter

I post on this forum with mobile devices, so excuse short msgs and typos.

Re: Word frequency checker

Postby Hork » Sun Jan 10, 2016 7:48 pm

I'm not versed in programming at all.
Here's how it should work: one opens a (fiction) book in the reader and has it highlight the most frequent words (at different levels), so that while reading one can instantly decide which new words are worth learning.
E.g. by showing words which statistically appear in at least half the books, or in one third, one fourth, etc.
It would be useful to someone like me who wants to learn a few new words from each chapter but wants to avoid infrequent ones.
0 x

tomgosse
Brown Belt
Posts: 1143
Joined: Tue Aug 25, 2015 11:29 am
Location: Les Etats Unis
Languages: Anglais (langue maternelle)
Français (A1)
Language Log: viewtopic.php?f=15&t=1185
x 2378
Contact:

Re: Word frequency checker

Postby tomgosse » Sun Jan 10, 2016 8:12 pm

I don't know if this site will help, but you can enter a text and it will sort the words by frequency of use.
3 x
Rejoignez notre groupe français ! Les Voyageurs

Re: Word frequency checker

Postby rdearman » Sun Jan 10, 2016 8:26 pm

Hork wrote:I'm not versed in programming at all.
Here's how it should work: one opens a (fiction) book in the reader and has it highlight the most frequent words (at different levels), so that while reading one can instantly decide which new words are worth learning.
E.g. by showing words which statistically appear in at least half the books, or in one third, one fourth, etc.
It would be useful to someone like me who wants to learn a few new words from each chapter but wants to avoid infrequent ones.


The argument for and against learning the most frequent words first comes up all the time. I am in the "learn the most frequent words first" camp; others aren't. But I'm of the opinion that word frequency lists have diminishing value the further down the list you go and the more you read. I personally would advise you to take your second file, the frequency list, and learn the first 100-200 words. Then start reading books, lots of books. The rest of the words you'll learn anyway over time. Perhaps the best software for you in this case is an e-reader with a built-in dictionary. I use something called FBReader, which works on just about every platform, and the smartphone version has Google translation integrated.

You could of course use Kindle or other software with a translation facility. This works better in my opinion because you start reading (slowly and painfully), looking up each word as you go along. However, the most frequent words (like "the", "of", "to", "and", etc.) will be on every page, and after the fourth or fifth time looking them up you'll know them and won't have to look them up again. Other, less frequent words you'll look up once in order to understand the sentence and never bother with again. Eventually you'll get to the point where you're only looking up a few words per paragraph or even per page. That time will come quicker than you'd think.

But the answer to your original question is that I don't think there is an app or piece of software that does exactly what you want, which is comparing a frequency list against a text file.
3 x

Montmorency
Brown Belt
Posts: 1035
Joined: Tue Oct 06, 2015 3:01 pm
Location: Oxfordshire, UK
Languages: English (Native)
Maintaining: German (active skills lapsed somewhat).
Studying: Welsh (advanced beginner/intermediate);
Dabbling/Beginner: Czech

Back-burner: Spanish (intermediate) Norwegian (bit more than beginner) Danish (beginner).

Have studied: Latin, French, Italian, Dutch; OT Hebrew (briefly) NT Greek (briefly).
Language Log: viewtopic.php?f=15&t=1429
x 1184

Re: Word frequency checker

Postby Montmorency » Sun Jan 10, 2016 8:43 pm

Hork wrote:I have a German word frequency list (not lemmatized) and a German book (both as .txt files). Would it be possible to develop a simple offline text/book reader program where words could be highlighted (e.g. by different color or underscoring?) according to their frequency band/level?


To the other useful suggestions that have been posted, you might also look at "Readlang".

I have only so far used this on a PC web browser, but I believe these days he has a mobile app version.

It can use online dictionaries (which you can choose) as well as Google Translate. At one time, there was an experimental option to produce word frequency lists for a given book. I'm not sure if he ever incorporated that as a standard option, but it's perhaps worth looking into.
In any case, it has some other useful features.
Last edited by Montmorency on Wed Jan 13, 2016 2:25 pm, edited 1 time in total.
1 x

Stefan
Green Belt
Posts: 379
Joined: Sun Dec 20, 2015 9:59 pm
Location: Sweden
Languages: -
x 920
Contact:

Re: Word frequency checker

Postby Stefan » Mon Jan 11, 2016 8:47 am

Not really what you're asking for, but LWT might be a solution. It highlights all the words you don't know, and there's a frequency table listing all the words (it can show only new ones), which you can export to Anki. "WCnt Txts" shows how many times each word occurs in your imported texts.

1 x

MorkTheFiddle
Black Belt - 2nd Dan
Posts: 2141
Joined: Sat Jul 18, 2015 8:59 pm
Location: North Texas USA
Languages: English (N). Read (only) French and Spanish. Studying Ancient Greek. Studying a bit of Latin. Once studied Old Norse. Dabbled in Catalan, Provençal and Italian.
Language Log: https://forum.language-learners.org/vie ... 11#p133911
x 4884

Re: Word frequency checker

Postby MorkTheFiddle » Mon Jan 11, 2016 7:54 pm

Hork wrote:I have a German word frequency list (not lemmatized) and a German book (both as .txt files). Would it be possible to develop a simple offline text/book reader program where words could be highlighted (e.g. by different color or underscoring?) according to their frequency band/level?


UPDATE:
I (kind of) managed to find a temporary solution. It required a text editor with incremental search and regular expression support. The downside is that it can highlight only one frequency level/band at a time.
Here's the screenshot of a German book with highlighted word forms that occur in at least half the corpus texts.
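
The regular-expression approach mentioned in this update can be sketched roughly as follows. This is only an illustration under assumptions, not the pattern that was actually used: the function names are hypothetical, and square brackets stand in for the editor's highlighting. All word forms of one band are joined into a single whole-word alternation, which is also why only one band can be marked per search.

```python
import re
from typing import Iterable


def band_pattern(words: Iterable[str]) -> re.Pattern:
    """Build one regex matching any of the given word forms as whole words."""
    # re.escape protects any characters that have a special meaning in a regex.
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:" + alternatives + r")\b")


def mark(text: str, words: Iterable[str]) -> str:
    """Wrap every whole-word match in square brackets (stand-in for highlighting)."""
    return band_pattern(words).sub(lambda match: f"[{match.group(0)}]", text)


# Hypothetical band containing just two word forms.
marked = mark("Das Haus und der Hausmeister", ["Haus", "und"])
print(marked)  # Das [Haus] [und] der Hausmeister
```

The \b word boundaries keep "Haus" from matching inside "Hausmeister". The same alternation (without the Python wrapper) could be pasted into the search box of an editor that supports regular expressions, although very long word lists may exceed what some editors accept.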

I also know of no app or program that does what you want it to do.

But I have a couple of questions:
1. What is the word frequency list you use? A non-lemmatized list sounds valuable indeed.
2. What program or reader did you use to highlight the words in the example page you showed?

The for-pay site LingQ and the free app LWT both offer limited frequency lists. The limitation is that they count only words you have already uploaded into their readers, so I would think neither would adequately meet your needs. I am familiar with both because I currently use LWT myself and I was a paying customer of LingQ for 3 or 4 years. I have tried Readlang, but only very cursorily. A query to the developer of Readlang might induce him to add this functionality to his app. I'm not speaking for him, BTW, because I have no idea what such a feature would entail. The developer is, I believe, a member of this forum, but for the life of me I cannot recall his name, and a quick scan of his site did not turn it up.

For someone who is not a programmer, you have so far done something that strikes me as extraordinary.
0 x
Many things which are false are transmitted from book to book, and gain credit in the world. -- attributed to Samuel Johnson

Re: Word frequency checker

Postby Hork » Tue Jan 12, 2016 5:41 pm

1. I got inspired by Google ngram datasets. They allow you to compile a separate English literature word list but unfortunately not for German so I made my own.
2. Look at the upper left


As for "diminishing value over the length of the list": I ranked words by the number of books they appear in, out of a corpus of 1000 fiction books, and not by their overall frequency in those books. That way proper nouns and words that are frequent in only a few books go to the bottom of the list. So the most frequent word can score at most 1000, regardless of whether it appears 1000 or 1000000 times in total. A word that appears 1000 times but in just 5 books scores 5 instead of 1000.
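
In case it helps anyone reproduce this kind of list, the counting scheme can be sketched in a few lines of Python. This is a minimal illustration, not the script that built the actual list: the tokenizer, the function name and the three sample "books" are assumptions, and upper and lower case are kept distinct because the list is not lemmatized (whether the real list folds case is not stated).

```python
import re
from collections import Counter
from typing import Iterable

# Runs of Unicode letters, so German forms with ä, ö, ü and ß stay intact.
WORD_RE = re.compile(r"[^\W\d_]+")


def document_frequency(books: Iterable[str]) -> Counter:
    """Count, for each word form, the number of books it appears in."""
    df: Counter = Counter()
    for text in books:
        # A set counts each word form once per book, however often it repeats.
        df.update(set(WORD_RE.findall(text)))
    return df


# Three tiny stand-ins for whole books (hypothetical data).
books = [
    "Der Zwerg Zwerg Zwerg sah das Haus",
    "Das Haus war alt",
    "Der Hund sah das Haus",
]
df = document_frequency(books)
print(df["Zwerg"], df["Haus"])  # 1 3
```

"Zwerg" occurs three times but only in one book, so it scores 1, while "Haus" occurs once in each of the three books and scores 3.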
0 x

Re: Word frequency checker

Postby Montmorency » Wed Jan 13, 2016 2:35 pm

Hork wrote:I have a German word frequency list (not lemmatized) and a German book (both as .txt files). Would it be possible to develop a simple OFFLINE text/book reader program where words could be highlighted (e.g. by different color or underscoring?) according to their frequency band/level?


UPDATE AGAIN:
I've found a program that at last fulfilled my demands: Notepad++
The key feature is arcanely called "User Defined Language"


Thanks for the update. I've used Notepad++ for other things, and it's pretty cool, but I would not have thought of using it in that way.
Good find!
0 x

