Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

SpanishInput
Yellow Belt
Posts: 97
Joined: Sun Sep 26, 2021 3:11 pm
Location: Ecuador
Languages: Spanish (N), English (C2), Mandarin (HSK 5)
x 469

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby SpanishInput » Tue Sep 28, 2021 10:27 pm

Hash wrote:If you don't mind, please share the three lists of word forms for both languages.


Hi! I'm new here so I don't think I can post links yet, but I'll copy-paste the top 50 results for each language here. The first column is the raw frequency. The second column is the contextual diversity (aka "range" or "document frequency" in AntConc terms). The third column is the word form, and the last column is how much coverage you can get so far. I believe the huge difference is due to Russian affixing case markers to words (I don't know the proper grammatical term), thus creating many more word forms.
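For anyone who wants to reproduce the first two columns, here is a minimal Python sketch of how raw frequency and range could be counted. It is not AntConc's actual implementation: the tokenizer regex and the `count_forms` name are assumptions made for illustration, and each "document" is simply a string.

```python
import re
from collections import Counter

def count_forms(documents):
    """Return (frequency, doc_range) Counters for a list of text strings."""
    frequency = Counter()   # total occurrences of each word form
    doc_range = Counter()   # number of documents containing each word form
    for text in documents:
        # Naive tokenizer (an assumption): lowercase runs of letters only
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        frequency.update(tokens)
        doc_range.update(set(tokens))  # count each form once per document
    return frequency, doc_range

# Three toy "documents" (invented, not taken from the real corpus)
docs = ["No sé qué hacer", "Yo no sé nada", "Qué bien"]
frequency, doc_range = count_forms(docs)

# Sort the way the lists in this post are sorted: range first, then raw frequency
for form in sorted(frequency, key=lambda f: (-doc_range[f], -frequency[f])):
    print(frequency[form], doc_range[form], form)
```

Because every conjugated or inflected form is counted as its own entry, a language with more inflection will naturally produce a longer list from the same amount of text.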

@Sfuqua: I know conjugations are supposed to count as the same word, but in my experience with Spanish learners, conjugated forms can be unrecognizable to anyone but advanced students. I guess once you're at an advanced level you can rely on lemmatized word lists. Thankfully, German-style compound words are rare in Spanish. The RAE wants us to write ex preso (ex-prisoner) as expreso (exprisoner, but also express), but I'm a rebel.

@Einzelne, thanks for commenting. I created the Russian list when I was learning a bit of Russian last year, but got disheartened by the statistics. I guess most of the words are compound words of some kind, maybe including case markers, so it's actually not that big a difference in vocabulary as it seems. BTW, something interesting about Russian is how some words are pronounced almost exactly as they are in Spanish. I would dare say some cognates are easier to recognize in spoken Russian than in spoken French, despite French being a closer relative to Spanish.

*****Spanish*****

1258295 376 que 4.37 %
1051015 376 no 8.01 %
915078 376 a 11.19 %
767170 376 de 13.85 %
622668 376 la 16.01 %
570860 376 y 17.99 %
448317 376 me 19.55 %
435842 376 es 21.06 %
410289 376 lo 22.48 %
402304 376 qué 23.88 %
381298 376 el 25.20 %
342302 376 en 26.39 %
315485 376 por 27.49 %
273007 376 se 28.43 %
265823 376 con 29.36 %
250949 376 un 30.23 %
197721 376 ya 30.91 %
193379 376 para 31.58 %
190311 376 mi 32.24 %
188555 376 una 32.90 %
173597 376 está 33.50 %
168349 376 los 34.08 %
152414 376 si 34.61 %
122822 376 las 35.04 %
121733 376 más 35.46 %
80159 376 del 35.74 %
316200 375 yo 36.84 %
217892 375 pero 37.59 %
179939 375 le 38.22 %
142582 375 eso 38.71 %
113079 375 todo 39.11 %
107742 375 como 39.48 %
99473 375 muy 39.82 %
92225 375 su 40.14 %
89402 375 o 40.45 %
87623 375 al 40.76 %
83775 375 así 41.05 %
57044 375 esta 41.25 %
35685 375 son 41.37 %
325899 374 te 42.50 %
127360 374 bien 42.94 %
101864 374 porque 43.30 %
95926 374 nada 43.63 %
84298 374 sé 43.92 %
69819 374 hacer 44.17 %
63472 374 tiene 44.39 %
58213 374 hay 44.59 %
54820 374 ahora 44.78 %
50390 374 ser 44.95 %
47713 374 mucho 45.12 %


*****Russian*****

48075 98 не 2.47 %
43774 98 и 4.72 %
43618 98 я 6.96 %
41715 98 в 9.10 %
29340 98 а 10.61 %
23778 98 на 11.83 %
18706 98 с 12.79 %
18584 98 да 13.75 %
15311 98 у 14.53 %
14806 98 как 15.29 %
12412 98 так 15.93 %
12180 98 все 16.56 %
11554 98 он 17.15 %
10286 98 мы 17.68 %
9873 98 меня 18.18 %
9665 98 мне 18.68 %
8592 98 за 19.12 %
8111 98 нет 19.54 %
7482 98 же 19.92 %
6780 98 из 20.27 %
6623 98 тебя 20.61 %
6496 98 к 20.95 %
6134 98 очень 21.26 %
5985 98 если 21.57 %
5562 98 она 21.85 %
5257 98 там 22.12 %
4907 98 о 22.38 %
4763 98 бы 22.62 %
4533 98 нас 22.85 %
4499 98 вас 23.08 %
4426 98 от 23.31 %
4352 98 они 23.54 %
3978 98 где 23.74 %
2477 98 ли 23.87 %
2433 98 нам 23.99 %
22121 97 ты 25.13 %
15309 97 то 25.91 %
11076 97 вы 26.48 %
5727 97 здесь 26.78 %
5724 97 сейчас 27.07 %
5357 97 тебе 27.35 %
4763 97 для 27.59 %
3921 97 вам 27.79 %
3447 97 или 27.97 %
3082 97 был 28.13 %
3013 97 тут 28.28 %
2735 97 даже 28.42 %
2501 97 быть 28.55 %
2234 97 теперь 28.67 %
1855 97 куда 28.76 %
2 x

Hash
White Belt
Posts: 33
Joined: Mon May 18, 2020 3:17 pm
Languages: Arabic (N)
x 56

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Hash » Thu Sep 30, 2021 11:33 pm

SpanishInput wrote:
Hash wrote:If you don't mind, please share the three lists of word forms for both languages.


Hi! I'm new here so I don't think I can post links yet, but I'll copy-paste the top 50 results for each language here. The first column is the raw frequency. The second column is the contextual diversity (aka "range" or "document frequency" in AntConc terms). The third column is the word form, and the last column is how much coverage you can get so far. I believe the huge difference is due to Russian affixing case markers to words (I don't know the proper grammatical term), thus creating many more word forms.


Send the links to me in a private message, and I'll post them here. Alternatively, you can ... [EDITED: I suggested another solution, but it doesn't comply with the rules of the forum]

By the way, I don't understand the 4th column. The coverage for "por" is 27.49%? How come?
Last edited by Hash on Fri Oct 01, 2021 6:20 pm, edited 1 time in total.
0 x

tacerto1018
Yellow Belt
Posts: 53
Joined: Mon May 06, 2019 10:00 pm
Languages: MA - French
Currently studying - Icelandic
Studied in the past to different levels - Portuguese, Norwegian, Italian, Spanish, Russian, Japanese, Dutch, German
Language Log: https://forum.language-learners.org/vie ... 15&t=15400
x 97

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby tacerto1018 » Fri Oct 01, 2021 1:01 am

smallwhite wrote:
Agglutiv...ness would be a factor as well but I'm not familiar with that.

I think you're looking for agglutination
1 x

Dragon27
Blue Belt
Posts: 616
Joined: Tue Aug 25, 2015 6:40 am
Languages: Russian (N)
English - best foreign language
Polish, Spanish - passive advanced
Tatar, German, French, Greek - studying
x 1375

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Dragon27 » Fri Oct 01, 2021 5:00 am

Hash wrote:By the way, I don't understand the 4th column. The coverage for "por" is 27.49%? How come?

SpanishInput wrote:the last column is how much coverage you can get so far

i.e. including all the previous words.
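
A tiny Python sketch of that running total, in case it helps. The counts and the corpus size below are made up for illustration (the real corpus total isn't given in this thread), and `cumulative_coverage` is a hypothetical helper name.

```python
def cumulative_coverage(frequencies, total_tokens):
    """Return the running coverage (in %) after each word form, in list order."""
    running = 0
    coverage = []
    for freq in frequencies:
        running += freq  # add this form's count to everything before it
        coverage.append(round(100 * running / total_tokens, 2))
    return coverage

# Made-up numbers: three word forms in a 1,000-token corpus
rows = [("que", 50), ("no", 40), ("a", 30)]
column = cumulative_coverage([freq for _, freq in rows], total_tokens=1000)
for (form, freq), cov in zip(rows, column):
    print(freq, form, f"{cov:.2f} %")
```

So the figure next to any word is the share of the corpus covered by that word plus every word listed before it, not by that word alone.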
1 x

rdearman
Site Admin
Posts: 7231
Joined: Thu May 14, 2015 4:18 pm
Location: United Kingdom
Languages: English (N)
Language Log: viewtopic.php?f=15&t=1836
x 23125

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby rdearman » Fri Oct 01, 2021 6:37 am

Hash wrote:justpaste DOT it / antconc_lists
(replace DOT with . and remove the spaces)

I DOT will DOT delete DOT those DOT messages. So don't try to circumvent the rules because I will delete the message and possibly ban you from the forum.

A legitimate way would be to PM the administrator and ask permission, and they can make an exception for you. Circumventing the rules will get you in trouble. Don't do it.
3 x
I post on this forum with mobile devices, so excuse short msgs and typos.

pityus
Posts: 8
Joined: Sat Oct 05, 2019 9:52 pm
Location: Biel/Bienne - Samedan, CH
Languages: i can read Harry Potter books in Russian, English and German. maybe French too, someday.
x 11

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby pityus » Fri Oct 01, 2021 10:27 am

Hash wrote:For example, it doesn't differentiate between "last", "latest", "latter", "final", "ultimate", etc. they are all "posledny"!


well, choose your sources carefully... some synonyms of "last" are listed here:

https://kartaslov.ru/%D1%81%D0%B8%D0%BD ... 0%B8%D0%B9
2 x

Hash
White Belt
Posts: 33
Joined: Mon May 18, 2020 3:17 pm
Languages: Arabic (N)
x 56

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Hash » Fri Oct 01, 2021 6:23 pm

rdearman wrote:I DOT will DOT delete DOT those DOT messages.

Sorry for suggesting that solution to him. I thought it was a technical issue and didn't realize it was an intended policy.
1 x

SpanishInput
Yellow Belt
Posts: 97
Joined: Sun Sep 26, 2021 3:11 pm
Location: Ecuador
Languages: Spanish (N), English (C2), Mandarin (HSK 5)
x 469

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby SpanishInput » Fri Oct 01, 2021 10:20 pm

Hi, yes, it's cumulative coverage, not per-word coverage.
And yes, it's because of forum rules. I read them all even before creating an account.
I believe a raw list without context or explanations isn't that useful for a learner, anyway, unless one is only looking to plug holes in one's vocabulary. I also prefer to keep the full Spanish list private for now, as I'm using it... for something.

BTW, you can find similar word lists elsewhere on the net. For example, the Spanish Royal Academy has a list called "CREA", based on texts from 1975 to 2004. It's a frequency list with no contextual diversity information. 50% of the texts come from Spain and 50% come from America (the continent, not the country). CREA even has an interface that allows you to search the corpus, and this can help you identify when a word is only used in certain countries.

Another list you can find is the Subtlex list. Subtlex is a project or movement that involved several researchers creating word lists from movie subtitles for several languages. It was inspired by Paul Nation's work. The English subtlex list and the Chinese subtlex list do contain contextual diversity information.

With any of these lists and a bit of Excel-fu you can calculate how many word forms you need to reach a certain coverage in any language, at least for the domain (movies, books, news) the list is related to.
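
If Excel isn't your thing, the same calculation can be sketched in a few lines of Python. This is only an illustration under some assumptions: the counts below are invented (not from CREA or Subtlex), `forms_needed` is a hypothetical helper name, and coverage is measured against the total of the list itself, so it only means something if the list covers the whole corpus.

```python
from itertools import accumulate

def forms_needed(frequencies, target):
    """Number of top word forms needed to reach `target` coverage (0 to 1), or None."""
    ordered = sorted(frequencies, reverse=True)  # most frequent first
    total = sum(ordered)
    for n, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return n
    return None  # empty list: the target can't be reached

# Toy frequency list with invented counts
toy = {"que": 50, "no": 40, "a": 30, "de": 20, "la": 10}
for target in (0.50, 0.75, 1.00):
    print(f"{target:.0%} coverage needs {forms_needed(toy.values(), target)} word forms")
```

Running this on the full list for each language and comparing the results at the same target is one way to put a number on how many more word forms one language needs than another.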
1 x

FyrsteSumarenINoreg
Yellow Belt
Posts: 90
Joined: Fri Jan 01, 2016 10:10 am
Location: Adriatic
Languages: Croatian (N), proficient in Brazilian Portuguese, fluent in English (C1 IELTS band 8.0), conversant in Italian and Spanish, learning Norwegian Nynorsk, Bengali & Malayalam
x 56

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby FyrsteSumarenINoreg » Fri Oct 08, 2021 8:41 pm

tungemål wrote:The Norwegian naob.no has 225,000 words, so we beat Russia and France! And I have even searched for words in that dictionary that they haven't included.

And Norsk Ordbok has 330,000 words. ;)
http://no2014.uib.no/perl/ordbok/no2014.cgi
1 x

FyrsteSumarenINoreg
Yellow Belt
Posts: 90
Joined: Fri Jan 01, 2016 10:10 am
Location: Adriatic
Languages: Croatian (N), proficient in Brazilian Portuguese, fluent in English (C1 IELTS band 8.0), conversant in Italian and Spanish, learning Norwegian Nynorsk, Bengali & Malayalam
x 56

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby FyrsteSumarenINoreg » Fri Oct 08, 2021 8:44 pm

Is there a corpus of "everyday speech" at all? The only one I know of, in the case of English, is the Corpus of American Soap Operas.
https://www.english-corpora.org/soap/
2 x

