Bex's Spanish log 2018. Chapter 2: Improbable ambitions

smallwhite
Black Belt - 2nd Dan
Posts: 2386
Joined: Mon Jul 06, 2015 6:55 am
Location: Hong Kong
Languages: Native: Cantonese;
Good: English, French, Spanish, Italian;
Mediocre: Mandarin, German, Swedish, Dutch.
.
x 4879

Re: Bex's Spanish log 2018. Chapter 2: Improbable ambitions

Postby smallwhite » Wed Apr 18, 2018 2:47 pm

Bex wrote:Does anyone know roughly how many Spanish words you need to know to be around a high B1 level, which is where I'd like to be at the end of the year.

I can't parse the description "high B1 level" but I've read part of HP1. What % of known words do you want to be at with HP1? 90% or 95% or?

http://eurosla.org/monographs/EM01/211-232Milton.pdf
Original document page 224, pdf file page 14.
Dialang or it didn't happen.

Bex
Blue Belt
Posts: 562
Joined: Thu Sep 15, 2016 7:10 am
Languages: English (N), Spanish (A2)
Language Log: https://forum.language-learners.org/vie ... 77#p157977
x 1538

Postby Bex » Wed Apr 18, 2018 3:12 pm

smallwhite wrote:
Bex wrote:Does anyone know roughly how many Spanish words you need to know to be around a high B1 level, which is where I'd like to be at the end of the year.

I can't parse the description "high B1 level" but I've read part of HP1. What % of known words do you want to be at with HP1? 90% or 95% or?

http://eurosla.org/monographs/EM01/211-232Milton.pdf
Original document page 224, pdf file page 14.

I would like to know 90% which I believe is 5,118.3 words (90% of 5,687)

According to http://eurosla.org/monographs/EM01/211-232Milton.pdf
B1 Preliminary English Test 2750 - 3250
B2 First Certificate in English 3250 - 3750
C1 Cambridge Advanced English 3750 - 4500
C2 Cambridge Proficiency in English 4500 - 5000

So I would say 3250 is a high B1/Low B2 but this is English and not Spanish isn't it? I assume there is no data for Spanish.

Interesting to think that HP1 in English has 5,687 unique words and that would be classed as C2 on the CEFR scale if you knew every word or do the two not correlate like that?
Kwiziq
A0: 100 / 100
A1: 100 / 100
A2: 100 / 100
B1: 91 / 100
B2: 53 / 100

eido
Blue Belt
Posts: 842
Joined: Tue Jan 30, 2018 8:31 pm
Languages: English (N), Spanish (C1)
x 3189

Postby eido » Wed Apr 18, 2018 4:08 pm

Does unique word mean like, one instance of "on" is 'unique', the second isn't? Or do they count phrasal verbs in the mix?

I thought a rough estimate for any language could be this:
A1 = 500
A2 = 1,000
B1 = 2,000
B2 = 4,000
C1 = 8,000
C2 = 16,000

Source here.

I've read a few times too that it depends on what words you're focusing on, too. Like if you're trying to learn the phrase "he got on the horse and rode away" it might be more applicable to a B1 level than "he approached the fetching young lass and laid upon her a sweet, practiced kiss", but a B1 might know it. You can have different skills. I know I know some words that I probably shouldn't for my level, but I needed to know them to understand the text I was reading. What say you, or y'all?

Postby smallwhite » Thu Apr 19, 2018 4:23 am

Bex wrote:
Interesting to think that HP1 in English has 5,687 unique words and that would be classed as C2 on the CEFR scale if you knew every word or do the two not correlate like that?

1. "5,687 unique words and that would be classed as C2 on the CEFR scale" is not true. That's not what the table represents. The document says "Milton and Meara (2003) tested students taking and passing Cambridge exams at every level of the CEFR and estimated their vocabulary sizes using the XLex tests". That is, the students who got C2 happened to have vocabs of ~5000. They could well happen to be nurses or happen to like Metallica but none of that is related to C2.

2. 5,687 unique words likely means that inflected forms such as comería, comerías, comeríamos, comeríais and comerían get counted as 5 separate unique words rather than as one word (comer). In the document they likely count differently.
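
To make the counting difference concrete, here is a minimal sketch in Python. The tiny `LEMMAS` table and the `count_unique` helper are made up for illustration; they are not taken from the Milton document or from whatever tool produced the 5,687 figure. Note that the yo and él forms of the conditional are spelled the same, so the six-person paradigm yields five distinct spellings.

```python
# Hypothetical mini-lemmatizer: maps each inflected form to its headword.
# A real analysis would need a full Spanish lemmatizer; this is only a toy table.
LEMMAS = {
    "comería": "comer",
    "comerías": "comer",
    "comeríamos": "comer",
    "comeríais": "comer",
    "comerían": "comer",
}


def count_unique(tokens, lemmas=None):
    """Count distinct items, optionally after mapping each token to its headword."""
    if lemmas is None:
        return len(set(tokens))
    return len({lemmas.get(token, token) for token in tokens})


# Conditional of "comer" for all six persons (yo and él share "comería").
forms = ["comería", "comerías", "comería", "comeríamos", "comeríais", "comerían"]

surface_count = count_unique(forms)           # distinct spellings
headword_count = count_unique(forms, LEMMAS)  # distinct headwords

print(surface_count, headword_count)
```

Counting spellings inflates a "unique words" figure relative to counting headwords, which is why the two kinds of numbers can't be compared directly.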

Bex wrote:
I would like to know 90% which I believe is 5,118.3 words (90% of 5,687)

1.
90% wouldn't be 90% x 5,687 because some words appear more than others. Easy words like "que" would appear many times. Knowing 5,118 words would most likely produce a known words figure higher than 90%.

2.
HTLAL thread "Experimenting with French word frequency" by emk, Message 48 of 55.

The pic is missing but it says 90% coverage = 4117 words. Definition of "word" can likely be found earlier in that thread.

3.
I don't have data for HP1 at 90%. I started reading later. Some of my data here: Spanish Group, page 23.
You will know far more English-Spanish cognates than I do.

4.
For Greek, which has far fewer cognates with English, after studying 4305 flashcards, I knew 88.9% of the words of my first translated crime fiction for adults. My flashcards had headwords (comer rather than comeríamos). The words were roughly 50% from courses and shared decks, 25% from LingQ, 25% extracted from reading non-fiction. I know more than what's in my flashcards.
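
A small sketch of point 1 above, i.e. why the share of unique words you know is not the share of words on the page that you know. The toy text and the `coverage` function are assumptions for illustration only, not data from HP1.

```python
from collections import Counter


def coverage(tokens, known):
    """Return (share of unique words known, share of running words known)."""
    counts = Counter(tokens)
    total_tokens = sum(counts.values())
    known_types = sum(1 for word in counts if word in known)
    known_tokens = sum(n for word, n in counts.items() if word in known)
    return known_types / len(counts), known_tokens / total_tokens


# Toy text: two very common words and one rare one.
tokens = ["que"] * 6 + ["el"] * 3 + ["lechuza"]
known = {"que", "el"}

type_share, token_share = coverage(tokens, known)
print(f"unique words known: {type_share:.1%}, running words known: {token_share:.1%}")
```

Here only two of the three unique words are known, but because those two are the frequent ones, they cover nine of the ten running words.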

Bex wrote:Clozemaster...

I would like to get through the 3000 top words and then I hope my reading comprehension will be better.

I am not sure why but I find it so much more enjoyable than actual reading. I am viewing it like a graded reader...

Please bear in mind that the way Clozemaster is designed, when you do 3000 top words, you are only ever exposed to 3000 top words, no 3001st or 5999th at all.
Last edited by smallwhite on Thu Apr 19, 2018 7:10 am, edited 3 times in total.

Postby Bex » Thu Apr 19, 2018 6:46 am

eido wrote:Does unique word mean like, one instance of "on" is 'unique', the second isn't? Or do they count phrasal verbs in the mix?

I thought a rough estimate for any language could be this:
A1 = 500
A2 = 1,000
B1 = 2,000
B2 = 4,000
C1 = 8,000
C2 = 16,000

Source here.

I've read a few times too that it depends on what words you're focusing on, too. Like if you're trying to learn the phrase "he got on the horse and rode away" it might be more applicable to a B1 level than "he approached the fetching young lass and laid upon her a sweet, practiced kiss", but a B1 might know it. You can have different skills. I know I know some words that I probably shouldn't for my level, but I needed to know them to understand the text I was reading. What say you, or y'all?

I know many words which are above B1 level but then I find the levels in themselves very confusing. I can't speak well still but I'm working on it and I know that I understand a lot of words but I can't produce them...I am still at a basic level for speaking, I don't have much trouble with general conversational vocabulary but I do struggle with grammar, especially verb conjugations.

Similar to you I have learned words I probably shouldn't because I've needed to but now I have great big holes in my basic comprehension/vocabulary. I am hoping that learning these 3000 words on Clozemaster will fill some of the holes. There were words in the 500-1000 section that I didn't know (subjunctive :roll:) so it's definitely covering areas that I really should know by now rather than continuing to learn stuff which I'll rarely see again.

Postby smallwhite » Thu Apr 19, 2018 7:01 am

Deleted and reposted below.
Last edited by smallwhite on Thu Apr 19, 2018 9:14 am, edited 2 times in total.

Postby Bex » Thu Apr 19, 2018 7:09 am


Very interesting read....

smallwhite wrote:Please bear in mind that the way Clozemaster is designed, when you do 3000 top words, you are only ever exposed to 3000 top words, no 3001st or 5999th at all.

Yes and this is exactly the narrow view I want at the moment, I have gaping holes in what would be considered beginner levels and little language learning experience and so no idea how to fill them.

I know many words outside the 3000 level after nearly 4 years of living in Spain but my basics are bad, I feel like an idiot at times...I know people would love to be in my situation.

A frequency list and iTalki lessons combined seems like a good way to fill these holes without learning conjugations which are rarely used.

If you or anyone knows a better way please do tell, I'd love to know.

Postby smallwhite » Thu Apr 19, 2018 9:14 am

I deleted my original post as I have overhauled the calculations.

I inserted a paragraph into a previous post to say that knowing 90% of the unique words doesn't mean knowing 90% of the words in the book (the figure we usually mean).

Bex wrote:
Also I didn't have the unique words figure and so that's really useful to know...seems I had better aim for 6,000 rather than 3,000.

I analysed what I think is Chapter 1 of HP1 in Spanish; the first 4828 words anyway. I compared it with the 6000 most frequent Spanish words from movie subtitles from Wiktionary (which may not be the frequency list that Clozemaster uses).

Total words in Chapter 1 including inflected words = 4828 words
Unique words including inflected words = 1366

857 (62.7%) of these were within the top 6000 freq list,
33 (2.4%) of these were English words,
476 (34.8%) of these were rare words beyond the top 6000 freq list.
~> 857 + 33 + 476 = 1366 unique words in Chapter 1

The 857 words, being frequent words, appeared more frequently in the book than the 476 rarer words.

The 857 freq unique words made up 4019 words (83.2%) in Chapter 1,
The 33 Eng unique words made up 241 words (5.0%) in Chapter 1,
The 476 rare unique words made up 568 words (11.8%) in Chapter 1.
~> 4019 + 241 + 568 = 4828 words in Chapter 1

So, if you knew the most frequent 6000 Spanish words and nothing else, plus English, you would know 83.2 + 5.0 = 88.2% of the words in Spanish HP1 Chapter 1.
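
For anyone who wants to re-check the arithmetic, here is a short Python sketch that recomputes the percentages from the counts given above. Only the counts come from this post; the group labels and variable names are made up for the example.

```python
# (unique words, running words) per group, as reported for Chapter 1.
groups = {
    "top 6000": (857, 4019),
    "English": (33, 241),
    "rare": (476, 568),
}

total_types = sum(types for types, _ in groups.values())
total_tokens = sum(tokens for _, tokens in groups.values())

for name, (types, tokens) in groups.items():
    print(
        f"{name}: {types / total_types:.1%} of unique words, "
        f"{tokens / total_tokens:.1%} of running words"
    )

# Share of running words known with the top 6000 list plus English.
known_share = (groups["top 6000"][1] + groups["English"][1]) / total_tokens
print(f"known: {known_share:.1%}")
```

The frequent group is about 63% of the unique words but about 83% of the running words, which is the whole point of the comparison.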

Postby Bex » Thu Apr 19, 2018 9:27 am

smallwhite wrote:
Bex wrote:
Also I didn't have the unique words figure and so that's really useful to know...seems I had better aim for 6,000 rather than 3,000.

I inserted a paragraph into my previous post to say that knowing 90% of the unique words doesn't mean knowing 90% of the words in the book (the figure we usually mean).

-

I analysed what I think is Chapter 1 of HP1 in Spanish; the first 4828 words anyway. I compared it with the 6000 most frequent Spanish words from movie subtitles from Wiktionary (which may not be the frequency list that Clozemaster uses).

Total words in Chapter 1 including inflected words = 4828 words
Unique words including inflected words = 1366

857 (62.7%) of these were within the top 6000 freq list,
509 (37.3%) of these were not (ie. were rarer).
857 + 509 = 1366 unique words in Chapter 1

The 857 words, being frequent words, appeared more frequently in the book than the 509 words.

The more frequent 857 unique words made up 4019 words (83.2%) of Chapter 1,
The less frequent 509 unique words made up 809 words (16.8%) of Chapter 1.
4019 + 809 = 4828 words in Chapter 1

So, if you knew the most frequent 6000 Spanish words and nothing else, you would know 83.2% of the words in Spanish HP1 Chapter 1.

Thanks smallwhite, it appears that I have been looking at these frequency lists from completely the wrong angle.

I wonder if there is much difference between the 83.2% you would know in the top 6000 words compared to how much % coverage you would get from just the top 3000 words?

ETA: I was surprised that only 83% was covered by the 6000 most frequently used words but then I realised your comparison is between subtitles and literature, I am assuming that this is because there are verb forms in the book that simply aren't used in speech.

Also ETA: Please feel free to correct any and all of my assumptions/opinions, anywhere on this forum. I know not what I speak of and I am never easily offended.
Last edited by Bex on Thu Apr 19, 2018 9:47 am, edited 1 time in total.

Postby smallwhite » Thu Apr 19, 2018 9:35 am

Bex wrote:
I wonder if there is much difference between the 83.2% you would know in the top 6000 words compared to how much % coverage you would get from just the top 3000 words?

Excellent question!

6k => 857 unique words made up 4019 words (83.2%) of Chapter 1
3k => 710 unique words made up 3786 words (78.4%) of Chapter 1

857 - 710 = 147 more unique words from ranks 3001-6000
4019 - 3786 = 233 more words in Chapter 1 (about 4.8% of the chapter)

The 5% from English words mentioned in my last post remains 5%.
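
The comparison can be laid out as a quick Python sketch. The counts are the ones quoted in this thread; the variable names are made up, and adding the English words on top of each list is my own way of combining the figures.

```python
TOTAL_WORDS = 4828   # running words in Chapter 1
ENGLISH_WORDS = 241  # running words that are English (about 5%)

# Running words covered by each frequency list.
covered = {"top 3000": 3786, "top 6000": 4019}
# Unique Chapter 1 words found in each list.
unique_in_list = {"top 3000": 710, "top 6000": 857}

for name, words in covered.items():
    with_english = (words + ENGLISH_WORDS) / TOTAL_WORDS
    print(f"{name}: {words / TOTAL_WORDS:.1%} alone, {with_english:.1%} with English")

# What the second 3000 list words add on top of the first 3000.
extra_unique = unique_in_list["top 6000"] - unique_in_list["top 3000"]
extra_words = covered["top 6000"] - covered["top 3000"]
gain = extra_words / TOTAL_WORDS
print(f"ranks 3001-6000 add {extra_unique} unique words, {extra_words} words, {gain:.1%}")
```

In this chapter, doubling the list from 3000 to 6000 entries buys a bit under five percentage points of coverage.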