Georgian Word Frequency List

tomos1729
White Belt
Posts: 41
Joined: Sat May 16, 2020 10:47 am
Languages: Cymraeg (Native)
Deutsch (Fluent)
Italiano (probably around B2/C1)
ქართული (probably around B1)
Español (maybe B1)
English (Fluent)
x 40

Georgian Word Frequency List

Postby tomos1729 » Wed Aug 26, 2020 3:22 pm

Does anyone know where I can find one?

There's one on Wikipedia, but there are some weird things in it (word number 2 is "edit" for example). Does anyone know of any others?

Thanks!
0 x

genini1
Yellow Belt
Posts: 93
Joined: Mon Aug 24, 2020 12:21 am
Languages: English (N), Japanese
x 305

Re: Georgian Word Frequency List

Postby genini1 » Wed Aug 26, 2020 7:26 pm

I don't have one, but I can tell you that the Wikipedia frequency lists are generated from all the Wikipedia entries in that language. That is why 'edit' is number two: it appears multiple times in every entry. The rest of the list should be generally accurate for a scholastic word list.
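
If the stray entries come from interface text like this, one way to clean the list is to filter it. The following is only a minimal sketch: it assumes the unwanted items are written in a non-Georgian script (for example Latin "edit") and that the list is available as (word, count) pairs. The function name and the sample data are made up for illustration. Interface words written in Georgian script would need a hand-made stoplist instead.

```python
import re

# Modern Georgian (Mkhedruli) letters sit in the Unicode range U+10D0..U+10FA.
GEORGIAN_WORD = re.compile(r"[\u10D0-\u10FA]+")

def keep_georgian(entries):
    """Keep only (word, count) pairs whose word is entirely in Georgian script."""
    return [(word, count) for word, count in entries if GEORGIAN_WORD.fullmatch(word)]

# Hypothetical sample data, not taken from the real Wikipedia list.
sample = [("და", 1200), ("edit", 1100), ("არის", 900)]
print(keep_georgian(sample))
```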
5 x

Longinus
Yellow Belt
Posts: 76
Joined: Tue Oct 27, 2015 6:56 pm
Location: United States
Languages: English(N)
Currently studying: Latin, Scottish Gaelic
Maintenance work: German, Russian, BCS, Albanian, Estonian
Basic level, 3-6 months worth of work each: Persian, Mongolian, Old Irish, Polish, Macedonian
Wish list: Lithuanian, Icelandic, Hungarian
x 209

Re: Georgian Word Frequency List

Postby Longinus » Wed Aug 26, 2020 11:28 pm

Here, I made you a frequency list of the 1000 most frequent words in a web-crawl of Georgian web pages.

https://drive.google.com/file/d/1SoNal0sg5dppMRsAw9SH3u0GqFjcwuBj/view?usp=sharing

The file is in .ods format for OpenOffice Calc. I think Excel will convert it, but if not, you can always download OpenOffice or LibreOffice for free and do it that way.
1 x


Re: Georgian Word Frequency List

Postby Longinus » Wed Aug 26, 2020 11:32 pm

Here's another one, with the 1000 most common strings of 3-4 words in Georgian.
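
For anyone curious how a list of word strings like this can be produced from raw text, here is a rough sketch. It is not how the linked file was made. It assumes plain Georgian text as input, splits on anything that is not a Georgian letter, and ignores sentence boundaries, so some strings will straddle two sentences. The function name is hypothetical.

```python
import re
from collections import Counter

# Tokens are runs of modern Georgian letters (U+10D0..U+10FA); everything else is a separator.
TOKEN = re.compile(r"[\u10D0-\u10FA]+")

def ngram_counts(text, sizes=(3, 4)):
    """Count every run of 3 and 4 consecutive words in the text."""
    tokens = TOKEN.findall(text)
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Toy input made of single letters, just to show the shape of the result.
counts = ngram_counts("ა ბ გ ა ბ გ")
print(counts.most_common(1))
```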

https://drive.google.com/file/d/1XAlwoQtTcgNG3zI__MNdHofjHw1bhbnI/view?usp=sharing

Please let me know if you have any trouble with the links.
0 x

mcthulhu
Orange Belt
Posts: 228
Joined: Sun Feb 26, 2017 4:01 pm
Languages: English (native); strong reading skills - Russian, Spanish, French, Italian, German, Serbo-Croatian, Macedonian, Bulgarian, Slovene, Farsi; fair reading skills - Polish, Czech, Dutch, Esperanto, Portuguese; beginner/rusty - Swedish, Norwegian, Danish
x 590

Re: Georgian Word Frequency List

Postby mcthulhu » Thu Aug 27, 2020 3:09 am

Do you happen to have a Georgian book in epub format? Jorkens can generate a word frequency list from a book and save it to a .csv file.
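
The general idea of turning a text into a frequency list and saving it as .csv can be sketched in a few lines. This is not Jorkens' code or its output format. It assumes the book text has already been extracted to a plain string, and the function names and column names are made up for illustration. It also counts surface forms only, so different inflected forms of the same word are counted separately.

```python
import csv
import re
from collections import Counter

# Words are runs of modern Georgian letters (U+10D0..U+10FA).
WORD = re.compile(r"[\u10D0-\u10FA]+")

def word_frequencies(text):
    """Count how often each Georgian word form occurs in the text."""
    return Counter(WORD.findall(text))

def save_csv(counts, path, limit=None):
    """Write rank, word and count to a UTF-8 .csv file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "word", "count"])
        for rank, (word, count) in enumerate(counts.most_common(limit), start=1):
            writer.writerow([rank, word, count])

# Toy input; a real run would read the book text from a file instead.
counts = word_frequencies("და არის და")
print(counts.most_common(2))
# save_csv(counts, "frequency.csv", limit=3000)  # hypothetical output path
```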
0 x


Re: Georgian Word Frequency List

Postby tomos1729 » Thu Aug 27, 2020 9:11 am

Longinus wrote: Here, I made you a frequency list of the 1000 most frequent words in a web-crawl of Georgian web pages.

https://drive.google.com/file/d/1SoNal0sg5dppMRsAw9SH3u0GqFjcwuBj/view?usp=sharing

The file is in .ods format for OpenOffice Calc. I think Excel will convert it, but if not, you can always download OpenOffice or LibreOffice for free and do it that way.


um, that's amazing?

thanks ^ ^
1 x


Re: Georgian Word Frequency List

Postby tomos1729 » Thu Aug 27, 2020 9:14 am

genini1 wrote: I don't have one, but I can tell you that the Wikipedia frequency lists are generated from all the Wikipedia entries in that language. That is why 'edit' is number two: it appears multiple times in every entry. The rest of the list should be generally accurate for a scholastic word list.


ye i realised that afterwards xD
0 x


Re: Georgian Word Frequency List

Postby tomos1729 » Thu Aug 27, 2020 9:15 am

mcthulhu wrote: Do you happen to have a Georgian book in epub format? Jorkens can generate a word frequency list from a book and save it to a .csv file.


i don't think so, but i'll google jorkens and epub :D
0 x


Re: Georgian Word Frequency List

Postby tomos1729 » Thu Aug 27, 2020 2:51 pm

Longinus wrote: Here, I made you a frequency list of the 1000 most frequent words in a web-crawl of Georgian web pages.

https://drive.google.com/file/d/1SoNal0sg5dppMRsAw9SH3u0GqFjcwuBj/view?usp=sharing

The file is in .ods format for OpenOffice Calc. I think Excel will convert it, but if not, you can always download OpenOffice or LibreOffice for free and do it that way.



Is there any chance you could find the next couple of thousand words as well? (Not that the first 1000 haven't helped me - they've been really great! My 2000-word list only includes substantives, adjectives and adverbs, so I think only around half of the 1000 words are in my list.)

But if this is a lot of work then don't worry! (I just meant in case you picked 1000 as a "random" limit.) And as I said, the first 1000 words have already helped a lot! :)

Or, how difficult/easy is it to learn to do this? I think it would be quite a useful skill for me to learn :D
0 x


Re: Georgian Word Frequency List

Postby Longinus » Fri Aug 28, 2020 1:15 am

tomos1729 wrote:
Is there any chance you could find the next couple of thousand words as well? (Not that the first 1000 haven't helped me - they've been really great! My 2000-word list only includes substantives, adjectives and adverbs, so I think only around half of the 1000 words are in my list.)

But if this is a lot of work then don't worry! (I just meant in case you picked 1000 as a "random" limit.) And as I said, the first 1000 words have already helped a lot! :)

Or, how difficult/easy is it to learn to do this? I think it would be quite a useful skill for me to learn :D



Unfortunately, I cannot. I am not studying Georgian myself, so I had to use a proprietary database that limits downloads to 1000 items.

The service I am using is called Sketch Engine (https://www.sketchengine.eu/). It is a wonderful corpus linguistics tool, and it is also extremely useful for language learners. For the sake of time I'll give just one example here, but you can do much more. You can take a big novel that you want to read, in just about any file format, upload it to Sketch Engine and process it. Then you can create frequency lists from your novel (not limited to 1000 entries). You can click on individual word entries and make concordances of every sentence containing that word, and even order the sentences so that the simplest, clearest examples show up first. Then, of course, you can copy some of these example sentences into Anki or another SRS program. You can also look for words or phrases that appear in your novel much more frequently than they do in a general corpus of the same language; these are obviously very useful to study. You can also examine all the collocations of individual words and see which other words they are most commonly associated with. Or, if you're interested in speaking rather than reading, you could upload one of those subtitle databases in your language of interest instead of a novel, since a subtitle database contains mostly conversations.
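
The comparison of a novel against a general corpus can be approximated by hand once you have two frequency lists. The sketch below is an illustration only and makes no claim to match Sketch Engine's own scoring. It assumes both lists are available as word-to-count mappings, converts them to per-million frequencies, and ranks words by a smoothed ratio. The function names and the `smoothing` parameter are hypothetical.

```python
from collections import Counter

def per_million(counts):
    """Convert raw counts to frequencies per million words."""
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}

def keyness(focus_counts, reference_counts, smoothing=1.0):
    """Rank focus-text words by how much more frequent they are than in the reference.

    The smoothing constant avoids division by zero for words that are
    missing from the reference corpus.
    """
    focus = per_million(focus_counts)
    reference = per_million(reference_counts)
    scores = {
        word: (freq + smoothing) / (reference.get(word, 0.0) + smoothing)
        for word, freq in focus.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy counts: "მეფე" is common in the novel but rare in the general corpus.
novel = Counter({"და": 50, "მეფე": 40, "არის": 10})
general = Counter({"და": 60, "არის": 39, "მეფე": 1})
for word, score in keyness(novel, general):
    print(word, round(score, 2))
```

Words at the top of the ranking are the ones worth studying before reading the book; a larger `smoothing` value pushes very rare words further down.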

Anyway, it's very much worth the annual membership price of about 60 euros. I use it just about every day. It supports a very large number of languages.
0 x

