October 9, 2011
This article asserts that English is relatively easy to learn, and that the average speaker of English has a vocabulary with about half again as many words as the average speaker of, say, French or German, although it gives no source for this point. It also claims that the English version of a lengthy text is always substantially shorter than versions in other languages. I did a very scientific test: I looked up the Treaty establishing the European Community, available in html here, and found 47,165 words in the English version, but only 46,101 in French and 39,612 in German. By word count, at least, the English version was the longest of the three. Still, it’s only one data point, so I won’t disagree with the author just yet.
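For anyone who wants to repeat that kind of count, here is a minimal sketch in Python. The function names, the regular expression that decides what counts as a "word", and the three short sample sentences are all illustrative assumptions of mine, not the method or text behind the figures above. A real treaty page would also need its HTML stripped first.

```python
import re

# Assumption: a "word" is a run of letters, optionally joined by
# apostrophes or hyphens (so "don't" and "well-known" count once each).
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def count_words(text):
    """Return the number of words in a plain-text string."""
    return len(WORD_RE.findall(text))


def compare_versions(versions):
    """Map language code -> text into (language, word count) pairs,
    sorted from the shortest version to the longest."""
    counts = {lang: count_words(text) for lang, text in versions.items()}
    return sorted(counts.items(), key=lambda pair: pair[1])


# Hypothetical sample sentences, for illustration only.
SAMPLE = {
    "en": "The Community shall have legal personality.",
    "fr": "La Communauté a la personnalité juridique.",
    "de": "Die Gemeinschaft besitzt Rechtspersönlichkeit.",
}

if __name__ == "__main__":
    for lang, n in compare_versions(SAMPLE):
        print(lang, n)
```

Word count is only one way to measure length. German compounds pack several English words into one, so a character count could rank the same texts differently.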
Google uses the texts of the EU Commission (which are translated into 20 or so languages) to drive its Google Translate service. To translate between more obscure languages, it uses English as the pivot:
A good number of English-language detective novels, for example, have probably been translated into both Icelandic and Farsi…This means that John Grisham makes a bigger contribution to the quality of GT’s Icelandic-Farsi translation device than Rumi or Halldór Laxness ever will. And the real wizardry of Harry Potter may well lie in his hidden power to support translation from Hebrew into Chinese
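The pivot idea can be sketched in a few lines of Python. This is a toy word-for-word lookup under my own assumptions, not how Google Translate actually works: the two tiny dictionaries, the transliterated Farsi words, and the `pivot_translate` name are all made up for illustration.

```python
# Toy dictionaries (illustrative assumptions only).
ICELANDIC_TO_ENGLISH = {"bók": "book", "hundur": "dog", "hús": "house"}
ENGLISH_TO_FARSI = {"book": "ketab", "dog": "sag"}  # transliterated


def pivot_translate(word, source_to_pivot, pivot_to_target):
    """Translate a word via a pivot language.

    Returns None if either leg of the journey is missing, which is
    exactly where a pivot system loses coverage.
    """
    pivot_word = source_to_pivot.get(word)
    if pivot_word is None:
        return None
    return pivot_to_target.get(pivot_word)


if __name__ == "__main__":
    for w in ("bók", "hús", "fjall"):
        print(w, "->", pivot_translate(w, ICELANDIC_TO_ENGLISH, ENGLISH_TO_FARSI))
```

No direct Icelandic-Farsi dictionary is needed, which is the whole point: any text translated into English and into both other languages helps the pair.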
The chart above is based on work by Francis and Kucera in 1982 on frequency analysis of English usage in which they found that a vocabulary of 2,000 words was sufficient to provide comprehension of 80% of the words in their texts.
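To make the coverage idea concrete, here is a minimal Python sketch that asks how many of the most frequent distinct words are needed to cover a given share of all the words in a text. The `words_needed_for_coverage` name and the toy sentence are my own assumptions. This is not Francis and Kucera's procedure, just the same kind of calculation on a tiny example.

```python
from collections import Counter


def words_needed_for_coverage(tokens, target=0.8):
    """Return how many of the most frequent distinct words are needed
    so that together they account for at least `target` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk down the frequency ranking, accumulating token counts.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered >= target * total:
            return rank
    return len(counts)


if __name__ == "__main__":
    # Toy text, purely illustrative.
    tokens = "the cat sat on the mat and the dog sat on the cat".split()
    print(len(set(tokens)), "distinct words in", len(tokens), "tokens")
    print(words_needed_for_coverage(tokens, 0.8), "words cover 80%")
```

On a real corpus the ranking is far more lopsided than in a toy sentence, which is why a small vocabulary goes such a long way.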
These days, with the internet and better computing power, you can quickly test the size of your own vocabulary at http://www.testyourvocab.com. The test uses 120 words and some clever stats: it starts off with 40 words to get your probable vocabulary range, then drills down with a further 80 to end up with an error of +/-10% in your vocabulary size. Here’s a graph of the database of results they’ve compiled relating vocabulary to age:
They find that whilst the Oxford English Dictionary may list 300,000 words, after the first 45,000 they’re pretty much all either archaic, scientific/technical, or otherwise inapplicable to any kind of “general” vocabulary test. Of the 100,000,000-word British National Corpus that they used, the top 3 words are…
Answer under the fold!