
How words get the message across

Languages are adapted to deliver information efficiently and smoothly.

The length of words is related to how much information they convey.

Longer words tend to carry more information, according to research by a team of cognitive scientists.

It's a suggestion that might sound intuitively obvious, until you start to think about it. Why, then, the difference in length between 'now' and 'immediately'? For many years, linguists have tended to believe that the length of a word was associated with how often it was used, and that short words are used more frequently than long ones. This association was first proposed in the 1930s by the Harvard linguist George Kingsley Zipf [1].

Zipf believed that the relationship between word length and frequency of use stemmed from an impulse to minimize the time and effort needed for speaking and writing, as it means we use more short words than long ones. But Steven Piantadosi and colleagues at the Massachusetts Institute of Technology in Cambridge say that, to convey a given amount of information, it is more efficient to shorten the least informative — and therefore the most predictable — words, rather than the most frequent ones.

Zipf's original association is roughly correct, as implied by how much more often 'a', 'the' and 'is' are used in English than, say, 'extraordinarily'. And this relationship of length to use seems to hold up in many languages. Because written and spoken length are generally similar, it applies to both speech and text.

But after analysing word use in 11 different European languages, Piantadosi and colleagues found that the length of words was more closely correlated with their information content than with how often they are used. They describe their results in the Proceedings of the National Academy of Sciences [2].

"This is a landmark study," says linguist Roger Levy of the University of California at San Diego. "Our understanding of the relationship between word frequency and length has remained relatively static since Zipf's discoveries," he says, and he feels that this new study may now supply "the largest leap forward in 75 years" in our understanding of how languages evolve.

Method madness

Measuring the information content of a word isn't easy, especially because it can vary depending on the context. But Piantadosi and colleagues make the assumption that the more predictable a word is, the less informative it is. So the word 'nine' in 'A stitch in time saves nine' contains less information than it does in the phrase 'The word that you will hear is nine', because in the first case it is highly predictable: when it comes, it doesn't significantly add to the information already in the phrase.

The MIT group devised a method for estimating the information content of words in digitized texts by looking at how each word is correlated with, and thus predictable from, the preceding words. For just a single preceding word, Piantadosi explains, "we count up how often all pairs of words occur together in sequence, such as 'the man', 'the boy', 'a man', 'a tree' and so on. Then we use this count to estimate the probability of a word conditioned on the previous word, or more generally, the probability of any word conditioned on any preceding sequence of a given number of words." According to information theory, the information content is then proportional to the negative logarithm of this probability.
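The single-preceding-word case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual code: the function name `average_information`, the toy corpus, the use of base-2 logarithms (bits) and the step of averaging a word's information over all the contexts in which it appears are choices made here for the example.

```python
import math
from collections import Counter, defaultdict

def average_information(tokens):
    """Return {word: mean information in bits}, conditioning on one preceding word.

    Hypothetical helper for illustration; the probability estimate is a plain
    relative frequency with no smoothing.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))  # counts of (previous, word) pairs
    context_counts = Counter(tokens[:-1])           # how often each word acts as a context
    total_bits = defaultdict(float)
    occurrences = Counter()
    for (prev, word), n in pair_counts.items():
        p = n / context_counts[prev]                # estimate of P(word | previous word)
        total_bits[word] += n * -math.log2(p)       # information = negative log probability
        occurrences[word] += n
    return {w: total_bits[w] / occurrences[w] for w in total_bits}

# Toy corpus, invented for this example only
corpus = "the man saw the boy and a man saw a tree".split()
info = average_information(corpus)
```

In this toy corpus 'saw' always follows 'man', so it is fully predictable and carries no information, whereas 'man' follows 'the' or 'a' only half the time and so carries one bit. A real analysis would need a large corpus, and could condition on longer preceding sequences, as the quotation notes.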

However, physicist Damián Zanette of the Centro Atómico Bariloche in San Carlos de Bariloche, Argentina, who has studied Zipf-type relationships in linguistics, is not persuaded that the MIT group's method accurately captures the real information content of a word in context. This, he says, is typically determined by several hundred surrounding words, not just a few [3].

Piantadosi and colleagues suggest that the relationship of word length to information content might not only make it more efficient to convey information linguistically but also make language cognition a smoother ride for the reader or listener. If shorter words carry less information, then the density of information throughout a phrase or sentence will be smoothed out, so that it is delivered at a roughly steady rate rather than in lumps. In this way, the results suggest how the structure of language might aid communication.

Surprising though it may seem, some linguists, such as Noam Chomsky, have suggested that communication might not be the primary purpose of language, and that it might, for example, be primarily about establishing social relations. Yet according to cognitive scientist Florian Jaeger at the University of Rochester in New York, these new results "suggest that communication is a sufficiently important aspect of language to shape it over time".


  1. Zipf, G. The Psychobiology of Language (Routledge, 1936).

  2. Piantadosi, S. T., Tily, H. & Gibson, E. Proc. Natl Acad. Sci. USA doi:10.1073/pnas.1012551108 (2011).

  3. Montemurro, M. A. & Zanette, D. H. Adv. Complex Syst. 13, 135-153 (2010).


Ball, P. How words get the message across. Nature (2011).
