Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

QueenBee
White Belt
Posts: 20
Joined: Tue Sep 14, 2021 11:32 am
Languages: English (N), Russian (N), French (advanced), Japanese (low N1 ?), Hebrew (~B2), Persian (just started)

Previously learned or dabbled in: Spanish, Thai, Palestinian Arabic
x 89

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby QueenBee » Thu Sep 23, 2021 6:47 am

Hebrew is the only language I've studied (to an advanced level) that really seemed to have "less" vocabulary. There are probably a few reasons for this:

1. Like other Semitic languages, Hebrew creates words from 3-letter roots. There is a lot of redundancy in vocabulary and a lot of "high-level" vocabulary is visibly similar to simple, everyday vocabulary.

2. Hebrew was also dead for close to 2,000 years. There hasn't been a community of speakers that has developed the language, played with it, created words in it, etc. the way that there has been for other languages.

So, if OP really struggles with learning vocabulary, and is looking for an "easy" language to learn in that regard, then Hebrew is one option. (Hebrew would also be easy for OP, since his/her native language is Arabic...)

Otherwise, your best bet would be a language spoken in a poor country, where most people never go to university. People in this scenario would probably stick to "simple" vocabulary, even if their language (theoretically) offers more. Any major world language with a literary tradition is going to be rich in vocabulary, although of course, you can find people who stick to simpler words.
2 x

Hash
White Belt
Posts: 33
Joined: Mon May 18, 2020 3:17 pm
Languages: Arabic (N)
x 56

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Hash » Thu Sep 23, 2021 3:53 pm

Let's take a look at large dictionaries of different languages:

Webster's International Dictionary contains around half a million English entries and English Wiktionary has around one million. On the other hand, the largest Russian dictionaries, like the Great Academy Dictionary of Russian language (Большой академический словарь русского языка), have only 150,000 entries or fewer, the famous Spanish dictionary by La Real Academia Española has 93,000 entries, the Gran Diccionari de la llengua catalana has 88,500 entries, and the Dictionnaire de l’Académie française has only 55,000 entries!

Doesn't this mean anything to you?
1 x

vonPeterhof
Blue Belt
Posts: 879
Joined: Sat Aug 08, 2015 1:55 am
Languages: Russian (N), English (C2), Japanese (~C1), German (~B2), Kazakh (~B1), Norwegian (~A2)
Studying: Kazakh, Mandarin, Coptic
Language Log: viewtopic.php?f=15&t=1237
x 2833

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby vonPeterhof » Thu Sep 23, 2021 5:03 pm

Hash wrote:Let's take a look at large dictionaries of different languages:

Webster's International Dictionary contains around half a million English entries and English Wiktionary has around one million. On the other hand, the largest Russian dictionaries, like the Great Academy Dictionary of Russian language (Большой академический словарь русского языка), have only 150,000 entries or fewer, the famous Spanish dictionary by La Real Academia Española has 93,000 entries, the Gran Diccionari de la llengua catalana has 88,500 entries, and the Dictionnaire de l’Académie française has only 55,000 entries!

Doesn't this mean anything to you?

The English Wikipedia article on the English language used to have a subsection on the number of words in English, and there was a paragraph in there that I like to quote, with all the internal and external links intact, whenever this subject comes up.

Comparisons of the vocabulary size of English to that of other languages are generally not taken very seriously by linguists and lexicographers. Besides the fact that dictionaries will vary in their policies for including and counting entries,[88] what is meant by a given language and what counts as a word do not have simple definitions. Also, a definition of word that works for one language may not work well in another,[89] with differences in morphology and orthography making cross-linguistic definitions and word-counting difficult, and potentially giving very different results.[90] Linguist Geoffrey K. Pullum has gone so far as to compare concerns over vocabulary size (and the notion that a supposedly larger lexicon leads to "greater richness and precision") to an obsession with penis length.[91]
10 x

Cavesa
Black Belt - 4th Dan
Posts: 4960
Joined: Mon Jul 20, 2015 9:46 am
Languages: Czech (N), French (C2), English (C1), Italian (C1), Spanish, German (C1)
x 17566

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Cavesa » Thu Sep 23, 2021 7:24 pm

Thanks, vonPeterhof, this was probably the best way to explain this.

But we should also not forget that there can be differences even within one language. Even my rather small Le Robert contains 90,000 words, while the Académie's dictionary (mentioned by Hash) has "only" 55,000, so neither of them is clearly the limit of all French vocabulary, or of all everyday French vocabulary.

Hash wrote:Doesn't this mean anything to you?


It means that there are clearly many various methods you can use to put together a dictionary and I find that pretty interesting.

smallwhite wrote:So we think that the size of English’s everyday vocabulary = that of some Amazon hunting tribe’s everyday vocabulary?


Not at all. As I've written, languages that don't fulfill all the functions are clearly likely to have a smaller vocabulary. But that is definitely not the case for Spanish, Russian, or Turkish.

QueenBee wrote:Otherwise, your best bet would be a language spoken in a poor country, where most people never go to university. People in this scenario would probably stick to "simple" vocabulary, even if their language (theoretically) offers more. Any major world language with a literary tradition is going to be rich in vocabulary, although of course, you can find people who stick to simpler words.


A much more common situation is a country whose language sits alongside a different (usually colonial) language that is used for higher education, or for all education. But the local language may still be very rich in the "everyday language" area. Really, it all depends on what the learner considers "everyday vocabulary".
2 x

Axon
Blue Belt
Posts: 775
Joined: Thu Jun 16, 2016 12:29 am
Location: California
Languages: Native English, in order of comfort: Mandarin, German, Indonesian, Spanish, French, Russian, Cantonese, Vietnamese, Polish.
Language Log: viewtopic.php?f=15&t=5086
x 3288

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Axon » Fri Sep 24, 2021 6:29 am

I took a stab at answering the original question in a scientific way, albeit with some big caveats. I found Turkish, Russian, and Spanish subtitles for three world-famous Chinese movies, the idea being that translators working from a totally different culture and language wouldn't be tempted to use cognates.

I tried for a while to find a stemmer/lemmatizer for Turkish but ended up taking the easy route and simply counting all words for all languages in their conjugated forms, taking the very big logical leap that all three are inflected in some way or another. I have no idea what kind of mental processing speed is involved with parsing inflected words in each language: is recognizing a novel Turkish compound word just as easy for the brain as recognizing a Russian noun in a case you've never seen it in before, or a Spanish verb in a tense you've never seen it in before?

Averaging the unique and total word counts for all three movies, I found that Turkish used 2512 unique words on average to translate each film, while Russian needed 2186 and Spanish just 1896. Turkish and Russian used just about the same number of words in total (roughly 5900), while Spanish used its more limited lexicon more frequently, reaching 7192 words on average per film.

These results would indicate that one can understand dialogue in Spanish with a smaller vocabulary than would be needed for equivalent dialogue in Russian or Turkish, with Turkish requiring the most words by a slight amount. Again, this counts conjugations/declensions as equally difficult to learn as new word roots. More study is needed!
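
For anyone who wants to repeat this kind of count, here is a minimal sketch in Python. It is not the script used above; the regex tokenizer, the function names, and the toy sentences are all assumptions made for illustration. Like the count described here, it treats every inflected surface form as its own word.

```python
import re

# Assumption: a "word" is a run of letters, optionally joined by an
# apostrophe or hyphen. Real subtitle files would also need timestamps
# and markup stripped before counting.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def count_wordforms(text):
    """Return (unique, total) counts of lowercased surface word forms."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    return len(set(tokens)), len(tokens)


def average_counts(texts):
    """Average the unique and total counts over several films."""
    counts = [count_wordforms(t) for t in texts]
    n = len(counts)
    avg_unique = sum(u for u, _ in counts) / n
    avg_total = sum(t for _, t in counts) / n
    return avg_unique, avg_total


# Toy stand-ins for subtitle text (hypothetical, not the real subtitles).
films = ["Te quiero. Yo te quiero mucho.", "No lo sé. No quiero saberlo."]
print(average_counts(films))
```

Dividing the unique average by the total average gives a rough type-token ratio. With the figures reported above that comes to about 0.26 for Spanish (1896 / 7192), against roughly 0.43 for Turkish and 0.37 for Russian if the "roughly 5900" total is taken at face value.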
9 x

tungemål
Blue Belt
Posts: 947
Joined: Sat Apr 06, 2019 3:56 pm
Location: Norway
Languages: Norwegian (N)
English, German, Spanish, Japanese, Dutch, Polish
Language Log: https://forum.language-learners.org/vie ... 15&t=17672
x 2181

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby tungemål » Fri Sep 24, 2021 9:45 am

Hash wrote:Let's take a look at large dictionaries of different languages:

Webster's International Dictionary contains around half a million English entries and English Wiktionary has around one million. On the other hand, the largest Russian dictionaries, like the Great Academy Dictionary of Russian language (Большой академический словарь русского языка), have only 150,000 entries or fewer, the famous Spanish dictionary by La Real Academia Española has 93,000 entries, the Gran Diccionari de la llengua catalana has 88,500 entries, and the Dictionnaire de l’Académie française has only 55,000 entries!

Doesn't this mean anything to you?

The Norwegian naob.no has 225,000 words, so we beat Russia and France! And I have even searched for words in that dictionary that they haven't included.

I think it's clear that English has more words than other languages, but that doesn't mean that the average everyday vocabularies are any different.

I guess there is one main reason for the huge English vocabulary:
- English is used in far more domains than any other language (probably in all domains). Take computer programming, for instance: the default language is English, and not all languages have bothered to invent new words; many just use the English ones.

However, there might be more English everyday words: because of English's heavy import of French/Latin-derived words, there are often doublets. One example:
unbelievable (Germanic) / incredible (Latin).
They mean exactly the same thing, so one of them is totally superfluous, while German has just "unglaublich" and Spanish just "increíble".

On the other hand, English doesn't have as many conjugations as the other European languages.
3 x

SpanishInput
Yellow Belt
Posts: 97
Joined: Sun Sep 26, 2021 3:11 pm
Location: Ecuador
Languages: Spanish (N), English (C2), Mandarin (HSK 5)
x 469

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby SpanishInput » Sun Sep 26, 2021 9:26 pm

Hi! New here. I can offer a few numbers for Russian and Spanish.

I gathered Russian-language subtitles from YT channels and from movies and TV shows on both YT and Netflix. I grouped them into 100 TXT files. Each TXT file represents one "context": for example, one YT channel, one movie, or one TV show (all episodes in one file).

I did the same with Spanish. I gathered more than 5,000 subtitle files from movies and TV shows, mostly from Netflix, but also a few from YT. I then merged these files into around 400 TXT files. Again, each TXT file represents one "context". One movie or one TV show (all episodes merged together).

I tried to gather subtitles only from native TV shows and movies. For example, I have "La casa de papel (Money Heist)" in my Spanish corpus, but I don't have "Fuller House". The word frequency lists you find online were obviously gathered, in large part, from translated content, as they often have English names among the top words. I didn't want this for my lists. I wanted only native content, a "truer" reflection of the language.

I then loaded these TXT files into AntConc, a corpus analysis tool. I wanted to sort words by contextual diversity instead of frequency. Why? Because when I tried sorting by frequency, 300+ episode telenovelas ended up having too much of an influence due to the sheer number of repetitions of certain words and names. So, for example, «Yo soy Betty, la fea» and «El señor de los cielos» ended up having too much of an influence in the results. Also, years ago I found a paper that explained that contextual diversity, instead of frequency, is a better predictor of whether a native speaker can quickly recognize a word or not (aka "lexical decision time" experiments). It's just common sense that as a beginner you might want to focus on words that appear in a wide variety of telenovelas and movies instead of those that appear a lot in a few telenovelas. I also found that using contextual diversity (aka "range" in AntConc) helps keep proper names away from the top words.
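
To make the difference between frequency and contextual diversity concrete, here is a small Python sketch. It does not reproduce AntConc's internals; it assumes that "range" simply means the number of context files a word form appears in, and the tokenizer, function names, and toy contexts are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a word form is a lowercased run of letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    return [t.lower() for t in WORD_RE.findall(text)]


def frequency_and_range(contexts):
    """contexts maps a context name to its text.

    Returns two Counters: total occurrences per word form, and the
    number of contexts each word form appears in (its "range").
    """
    freq, rng = Counter(), Counter()
    for text in contexts.values():
        tokens = tokenize(text)
        freq.update(tokens)       # every occurrence counts
        rng.update(set(tokens))   # each context counts at most once
    return freq, rng


# Toy corpus (hypothetical): one long telenovela and two short movies.
contexts = {
    "telenovela": "Betty Betty Betty quiero Betty",
    "movie_a": "quiero casa",
    "movie_b": "quiero agua",
}
freq, rng = frequency_and_range(contexts)

by_freq = sorted(freq, key=lambda w: (-freq[w], w))
by_range = sorted(rng, key=lambda w: (-rng[w], -freq[w], w))
print(by_freq[0], by_range[0])
```

In this toy corpus the name repeated inside a single long show tops the frequency ranking, while the range ranking puts the word found in all three contexts first, which is the effect described above.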

As for what a "word" is, I prefer to use wordforms instead of lemmas or word families. Why? Because it's easier to work with them, and because once you decide to work with lemmas things start to become fuzzy: Should "being" be considered just another form of "to be"? Should "querido" be considered just another form of "querer"? You avoid these ambiguities by only working with wordforms. Besides, the software to lemmatize is just not there yet. The guys at Language Reactor (formerly LLN) used a lemmatizer that even resulted in lots of made-up Spanish words for the frequency list they used behind their extension's word highlighting feature. I hope they have solved it.

Another reason for working with wordforms is that, IMHO, you can't consider yourself fluent until you can retrieve each wordform from your memory as a whole unit, and not as something you must try to calculate on the fly following conjugation rules and adding prefixes/suffixes/pronouns/case markers.

So, answering the OP question, let's say you want to reach 80% coverage in each language. Enough for basic conversations and to watch telenovelas with the help of a popup dictionary and a translation.

In Spanish, you can achieve this level with the top 1,148 most common (not most frequent, mind you) word forms.
In Russian, you need the top 7,453 most common word forms to achieve the same level.

If you want to reach 90% coverage, which will make for a much more enjoyable TV-watching experience (but still an intensive one if you want to understand everything), you'll need:

5,068 words in Spanish
25,885 words in Russian.

To reach 95% coverage:

13,960 words in Spanish
53,400 words in Russian.
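
For readers who want to run similar numbers on their own corpus, here is a hedged Python sketch of a coverage calculation. It assumes "coverage" means the share of all running words (tokens) accounted for by the top-ranked word forms; the function name and the toy counts are hypothetical and are not taken from the lists above.

```python
from collections import Counter


def forms_needed(freq, ranked, target):
    """Smallest number of top-ranked word forms whose occurrences
    make up at least `target` (a fraction, e.g. 0.8) of all tokens.

    freq   -- Counter of occurrences per word form
    ranked -- word forms in learning order (by range or by frequency)
    Returns None if the ranked list never reaches the target.
    """
    total = sum(freq.values())
    covered = 0
    for n, word in enumerate(ranked, start=1):
        covered += freq[word]
        if covered / total >= target:
            return n
    return None


# Toy counts (hypothetical), 100 tokens in total.
freq = Counter({"a": 50, "b": 30, "c": 10, "d": 5, "e": 5})
ranked = ["a", "b", "c", "d", "e"]

for target in (0.80, 0.90, 0.95):
    print(target, forms_needed(freq, ranked, target))
```

The ranking is passed in separately from the counts, so the same function works whether the list is ordered by contextual diversity or by raw frequency.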
7 x

Hash
White Belt
Posts: 33
Joined: Mon May 18, 2020 3:17 pm
Languages: Arabic (N)
x 56

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby Hash » Mon Sep 27, 2021 10:54 pm

SpanishInput wrote:In Spanish, you can achieve this level with the top 1,148 most common (not most frequent, mind you) word forms.
In Russian, you need the top 7,453 most common word forms to achieve the same level.

If you want to reach 90% coverage, which will make for a lot more enjoyable TV watching experience (but still intensive if you want to understand everything), you'll need:

5,068 words in Spanish
25,885 words in Russian.

To reach 95% coverage:

13,960 words in Spanish
53,400 words in Russian.


This is a huge difference between Spanish and Russian. I didn't expect that!

If you don't mind, please share the three lists of word forms for both languages.
1 x

sfuqua
Black Belt - 1st Dan
Posts: 1642
Joined: Sun Jul 19, 2015 5:05 am
Location: san jose, california
Languages: Bad English: native
Samoan: speak, but rusty
Tagalog: imperfect, but use all the time
Spanish: read
French: read some
Japanese: beginner, obsessively studying
Language Log: https://forum.language-learners.org/vie ... =15&t=9248
x 6299

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby sfuqua » Tue Sep 28, 2021 1:26 am

I think we are back to the "what is a word?" question. :D
Interesting numbers, though. I guess that proves that one needs to learn the grammatical structure of a language like Russian to get very far with the language, so that one doesn't have to learn all of those lexical items as unrelated words. :o
Are compound words words?
If you know the language, affixes are transparent, so why not treat them as separate words?
I mean it is very complicated.

As far as how many words people know in their native language---

A few years ago Iversen, a nonnative speaker of English, took a vocabulary test in English. I took the same test and then kept the results very quiet. :o
I am a native speaker of English with postgraduate degrees in linguistics and computer science who has worked decades in a professional field.
Iversen, the nonnative speaker, scored higher than me in English vocabulary.
Oh well... :lol:
6 x
荒海や佐渡によこたふ天の川

the rough sea / stretching out towards Sado / the Milky Way
Basho[1689]

Sometimes Japanese is just too much...

einzelne
Blue Belt
Posts: 804
Joined: Sat Mar 17, 2018 11:33 pm
Languages: Russian (N), English (Working knowledge), French (Reading), German (Reading), Italian (Reading on Kindle)
x 2882

Re: Which language has less vocabulary in everyday speech: Russian, Turkish, or Spanish?

Postby einzelne » Tue Sep 28, 2021 2:22 am

SpanishInput wrote:In Spanish, you can achieve this level with the top 1,148 most common (not most frequent, mind you) word forms.
In Russian, you need the top 7,453 most common word forms to achieve the same level.


One can criticize your methodological approach from many perspectives (and, yes, that pesky question of "what counts as a word" again!). You don't avoid all ambiguities: even if you can conjugate laisser (let) and pousser (push) without even thinking, that still doesn't guarantee you will understand the phrase laisser pousser (to grow a beard).

But if we bracket all philosophical debates on what counts as a word and start to approach language learning pragmatically, I think the upshot of your analysis is not really that one "needs to know" 7k or 50k words in order to be fluent in Russian, but rather that you need to pay more attention to grammar when you deal with synthetic languages.
6 x

