Learning a language, at least in its initial phase, comes down to learning new vocabulary. One of the first questions to ask yourself to cut this overwhelming task down to size is how many words you really need to learn in order to understand and speak a foreign language. The answer may not be as formidable as you thought.
How many words are there in a language?
Counting the number of words in a language is a pain in the butt.
First of all, every dictionary has a different number of words. For example, Merriam-Webster proudly announces that it has 470,000 entries. The Oxford English Dictionary seems to have “just” 171,476 words. However, both insist that English has around a million distinct words.
“Why aren’t they in dictionaries?” you’d ask. The answer is simple: these words include all sorts of slang, dialectisms (region-specific words), technical terms, and names of chemical compounds such as benzo[a]pyrene and methylene diphenyl diisocyanate, plus other things you probably don’t want to hear about.
It’s the same with foreign languages. The Trésor de la langue française noted 539,413 distinct entries for French. But the cunning Trésor actually included a bunch of inflections like “belle, belles”, and they certainly do not add anything to the meaning. Nevertheless, the estimated number of French words is around 1,200,000. Same with Spanish: the Diccionario de la lengua española has just 88,000 entries, while the real spoken language across the whole Hispanosphere is ten times richer. So let’s give up this task.
How many words do you think you need for your everyday linguistic needs? No more than a few thousand.
Even Shakespeare didn’t use more than 31,534 different words in all his works.
So you’re fine.
How many words you need to speak some French: A case study.
312,000 words: is it a lot or not?
(OK, just for a quick reference: it’s something like The Lord of the Rings.)
In any case, an original French study of a speech corpus of 312,000 words revealed that just 8,000 of them were different. Moreover, more than half of these 8,000 words (4,564) occurred in speech no more than three times.
But, hey, it’s only half of the story.
On the other side, the 38 most frequent words showed up no fewer than 157,943 times. That corresponds to 50.6%. Half of French speech consists of just 38 words. Here’s the famous list:
And as you can see, there’s not much sense in any of these 38 words. Personally, I see a bunch of function words: prepositions, pronouns and a couple of verbs. Just dry, sheer grammar.
However, applause to Dr. Dedier (1964) for this meticulous work.
What is important, however, is what we can get out of this data. Here are 2 conclusions:
- You need just 8,000 words to keep going for days and nights. Oh come on, 300,000 words equals a 30-hour audiobook;
- You use just 38 words for half of the time you speak.
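If you want to run the same kind of count on your own transcript, here is a minimal Python sketch. It is not the method used in the study: the function name `corpus_stats` and the crude letters-only tokenizer are my own assumptions, so an elided form like “l’homme” gets split into two words.

```python
from collections import Counter
import re

def corpus_stats(text, top_n=38, rare_max=3):
    """Count running words, distinct words, rare words and top-N coverage."""
    # Assumed tokenizer: lowercase runs of letters (accents included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Occurrences accounted for by the top_n most frequent words.
    top_hits = sum(count for _, count in counts.most_common(top_n))
    return {
        "tokens": total,
        "distinct": len(counts),
        "rare": sum(1 for count in counts.values() if count <= rare_max),
        "top_coverage": top_hits / total if total else 0.0,
    }

# The study's own figures: 157,943 hits for 38 words in 312,000 running words.
study_coverage = 157_943 / 312_000
print(f"{study_coverage:.1%}")  # 50.6%
```

Feed `corpus_stats` any long text and compare `top_coverage` with the 50.6% above. The exact numbers will depend heavily on how you split the text into words.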
What is the core vocabulary?
Another French linguist, Paul Rivenc, conducted a similar study of (again) spoken French. However, his findings were organized in a more beautiful way:
This “sun” is basically a representation of a language from the lexical point of view. Vocabulary was found to consist of five major parts:
As the previous study has shown, the language nucleus consists of a very small number of grammatical words: determiners (a, the), auxiliary verbs (have, be, do), pronouns of all sorts (I, you, his, mine), prepositions (in, at) and so on. These words show up in every single sentence because this is what they do: they help to build a grammatical one.
However, as you can guess, even a hundred of these nucleus words won’t let you express a meaningful thought. Without an inflow of fresh words you will hit the ceiling fairly soon. This is mine, not yours, got it?
Only 1,000-2,000 words show up frequently enough to be counted as the core of a language.
However, not all high-frequency words are “service” ones. Those notorious grammatical words make up just 25% of the core. The rest is distributed between nouns (40%), verbs (22%) and adjectives (11%). They are indispensable. You won’t be able to go a day without using them.
Core vocabulary is simply the popular and useful words that appear a lot in our speech. They are not specific at all. All they can refer to are very general and vague ideas such as “to like”, “to want”, “to have”, “to come” or “to work”. Or, when it comes to nouns, to fairly common things like “day”, “thing”, “home”, “water”, “life” or “people”. People are a fairly common thing in this life, aren’t they?
By the way, all the nasty things like irregular verbs, the remains of old declensions and so on show up on this level. So if something is irregular, damn, you have to learn it.
Frequent vocabulary (Rays)
Here, things become complicated.
“Ray” vocabulary slips from our mouths all the time but… it’s domain-specific. Do you know how many words you need, for example, to ask for directions, to buy stuff in a grocery store, or to order something edible in a foreign language? The range of things you have to know how to name is really immense:
- types of transport: a car, a bus, a taxi;
- directions: left, right, and a Toronto-specific “North-West”;
- time: tomorrow, Saturday, in a week;
- edible stuff: 1 kg of tomatoes, chicken breasts, ice cream;
- restaurant-specific: please, vegetarian, deep-fried.
And so on, just think about it! Of course, you don’t need to know how to say “vegetarian” in French when you are lost in Lyon. But the time will come, my friend, when you get hungry.
The “ray” vocab penetrates all possible domains of our lives, and it’s very easy to have gaps on this level. Why? Because it’s estimated to cover around 4,000-5,000 words. And this is a lot when you’re a beginner.
“A man with a scant vocabulary will almost certainly be a weak thinker. The richer and more copious one’s vocabulary and the greater one’s awareness of fine distinctions and subtle nuances of meaning, the more fertile and precise is likely to be one’s thinking. Knowledge of things and knowledge of the words for them grow together. If you do not know the words, you can hardly know the thing.”
― Henry Hazlitt, Thinking as a Science
That’s the most prominent problem of all language learners. Basic vocabulary doesn’t give us much flexibility of expression; more refined vocabulary takes a hell of a lot of time to learn. Meanwhile, some native speakers look at us as if we were some kind of funny creatures with undeveloped speech. Not all, thankfully.
In any case, if you want to pass for a learned person in a foreign country, you have to use high-end words and expressions. This level consists of around 25,000 words that are mostly redundant to your basic vocabulary. All those synonyms of “good” and “bad”, the 79th and 132nd meanings of “to go”, Latin derivations and so on park exactly on this level.
As you can guess, the only way to memorize all this vocabulary is to read, and read a lot.
However, high in the sky, there’s yet another level, and most of us never reach it even in our native language.
These other 200,000-300,000 words of highly technical or scientific vocabulary are so domain-specific that they almost never overlap. Linguists attack you with formidable terms like “suprasegmental properties” or “degemination”; mathematicians with “manifold” and “quaternions”; nuclear scientists with “antineutrino” and “photomultipliers”. I hope that’s enough.
There is absolutely no chance for this vocabulary to be used in everyday speech. But again, you will come across it in many technical books on a related topic.
How many words do you need for a new language?
You can’t learn it all, that’s evident.
We don’t know even 50% of the words of our mother tongue, let alone a foreign language. Certain tools like Test Your Vocab, for example, are there to illustrate this idea. And it’s fine.
Learning a language is not about cramming new words for the rest of your life. As the Pareto principle goes, “20% of the effort gives 80% of the results”, and the kick-ass you doesn’t want to waste a single minute on covering the other 80%. And since an average adult native speaker knows around 20-35 thousand words, your 20% target would be to cover 4,000-7,000 words.
And if we quickly refer back to Rivenc’s vocabulary sun, we will see that core and frequent vocabulary together comprise around 5,000-7,000 words. These are exactly the words you want to concentrate on.
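If you like to see the arithmetic spelled out, here is a tiny Python sketch of the two estimates above. The ranges are the ones quoted in this article; the function name `pareto_target` is made up for illustration.

```python
def pareto_target(known_low, known_high, share=0.20):
    """Apply the Pareto share to a native speaker's vocabulary range."""
    return round(known_low * share), round(known_high * share)

# Figures quoted in the article.
native_range = (20_000, 35_000)  # words an average adult native speaker knows
core = (1_000, 2_000)            # core vocabulary
rays = (4_000, 5_000)            # frequent, domain-specific "ray" vocabulary

target = pareto_target(*native_range)
core_plus_rays = (core[0] + rays[0], core[1] + rays[1])

print(target)          # (4000, 7000)
print(core_plus_rays)  # (5000, 7000)
```

The two ranges overlap almost completely, which is the whole point: the 20% worth learning first is roughly the core plus the rays.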
General Service Lists
So you have found out how many words you need to learn. Now, how do you do this?
You have to know your enemy. You can’t learn thousands of words without knowing what they look like. And the perfect tool here is the General Service List. Briefly, the GSL is a list of the 2,000 most frequent words in English, compiled by Dr. West back in 1953.
Wow, that’s an old one, you would say. And I would totally agree, just like hundreds of linguists and lexicostatisticians. Luckily for us, a number of people went beyond simply saying that West’s list is outdated.
Dr. Charles Browne and his team published a new, updated (and a little bit expanded) version of the GSL in 2013. The new list includes 2,800 words carefully extracted from an immense 273-million-word written corpus of English. With the New General Service List, learners can understand 92% of English texts (on general topics, obviously).
I can feel your question flying around.
“But I don’t need a list of English words, I need one for [insert your target language here]!!!”
In this case, you want to visit the blog of Hermit Dave, who created frequency word lists for 39 languages based on the OpenSubtitles.org database. Most of the lists include more than 50 thousand words. The one for Russian, for example, includes 450,029 unique words. And, since it’s all subtitles, you get access to true spoken language.
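Once you have such a list, you can ask it the question from the title. Here is a rough Python sketch. It assumes each line of the list looks like “word count” (one word, a space, then a number); check the file you download, because that layout is my assumption. The function names and the five-line sample are invented for illustration.

```python
def parse_frequency_lines(lines):
    """Parse 'word count' lines into (word, count) pairs, most frequent first."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        pairs.append((parts[0], int(parts[1])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs

def words_needed(pairs, coverage):
    """Smallest number of top words whose counts reach the given share."""
    total = sum(count for _, count in pairs)
    running = 0
    for rank, (_, count) in enumerate(pairs, start=1):
        running += count
        if running >= coverage * total:
            return rank
    return len(pairs)

# Invented toy data, not taken from any real list.
sample = ["de 50", "je 30", "est 10", "chat 6", "maison 4"]
pairs = parse_frequency_lines(sample)
print(words_needed(pairs, 0.5))   # 1
print(words_needed(pairs, 0.85))  # 3
```

With a real list you would pass the lines of the downloaded file instead of `sample` and try a few coverage values, say 0.5, 0.8 and 0.9, to see how quickly the numbers grow.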
The GSL’s 2,000 words fall well short of the number of words you need to go about your business in real life. Nevertheless, you won’t be able to build a wide vocabulary without this base, so I recommend that you start here. And as you move along, you will acquire new vocab naturally through reading, listening and real-time conversations.
I hope this article was helpful to you! If so, simply press any of the share buttons and let others know!
People might be interested in Clozemaster, which uses the GSL lists (AFAIK) and lets you learn and practice vocab for the target language in context and by frequency. I’ve found it quite a useful tool. (Clozemaster.com)