Language does not fossilize. For all that it was one of the great transitions in evolution [1], the advent of language has left no obvious equivalent to fossil teeth and bones, and seems inaccessible to enquiry. But it is not hard to imagine the emergence of a set of signals to label objects, the combinatorial nature of which allowed an infinite repertoire of sentences to be constructed from a finite set of words. An essential part of the process must have been the acknowledgement of a set of rules to combine words in such a way as to make sentences meaningful. These rules are the syntax that we all easily learn as children, but students of language evolution have a tough time explaining its origins.

So, how did syntax originate? Some authors have conjectured that it resulted from some pre-adaptation of the human brain [2]: that the first step was the building of a rich lexicon upon a cognitive system predisposed to formulate rules able to exploit the underlying combinatorial features of word associations. According to a study by Ramon Ferrer i Cancho and colleagues [3], however, a simple word–object association matrix can provide the basis for syntax almost for free.

The authors use a simple model in which signals (words) are associated with objects. Such association can be referential (such as “meat” referring to ‘edible organic matter’) or non-referential (“eat” is associated with ‘action of eating’). Some words display both associations: “eat”, for example, is also linked to ‘edible organic matter’. A bipartite set of connections is obtained linking the two sets (Fig. 1a).
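
The word–object association matrix can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the toy lexicon, the extra word “run” and the helper name `association_matrix` are hypothetical, and only the “eat”/“meat” associations come from the text.

```python
# Toy lexicon (hypothetical except for "eat" and "meat", which the text uses).
words = ["eat", "meat", "run"]
objects = ["edible organic matter", "action of eating", "action of running"]

# Each word maps to the set of objects it is associated with.
associations = {
    "eat": {"action of eating", "edible organic matter"},
    "meat": {"edible organic matter"},
    "run": {"action of running"},
}


def association_matrix(words, objects, associations):
    """Return a binary matrix A where A[i][j] = 1 if words[i] is linked to objects[j]."""
    return [
        [1 if obj in associations.get(word, set()) else 0 for obj in objects]
        for word in words
    ]


A = association_matrix(words, objects, associations)

# The row sum is the number of objects a word is linked to.
links_per_word = {word: sum(row) for word, row in zip(words, A)}
```

Each row of `A` is a word and each column an object, so a word with many ones in its row is a less specific signal.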

Figure 1: Building a protolanguage network.

a, A bipartite set of connections can be built by linking signals (words, red) to objects (blue). Most words are specific, referring to only one or two objects, whereas a few of them have many links. b, A new network is obtained by linking words that share at least one object of reference. The resulting network has many words with few links, but some acting as hubs. Ferrer i Cancho et al. [3] believe that syntactic rules might have emerged from such a scale-free network architecture.

Ferrer i Cancho et al. show that most of the architecture of this network stems from a universal feature common to all languages, Zipf's law. Briefly stated, Zipf's law establishes that if we take all the words in a text and order them by rank, from the most common to the rarest, the frequency (number of appearances) of each word is inversely proportional to its rank [4]. Most words are rare, whereas a few (such as “the”, “of”, “and”, “to”, “I”) are very common. Common words are less specific and can be linked to many objects, whereas rare ones are more specific and have one or a few links. The number of links of a given word, not surprisingly, follows the same law.
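
As a rough illustration of the rank–frequency relation, the sketch below ranks the words of a text by how often they appear and computes the frequencies an idealized Zipf's law would predict. The function names `rank_frequency` and `zipf_expected` are hypothetical, and splitting on whitespace is a simplifying assumption rather than a proper tokenizer.

```python
from collections import Counter


def rank_frequency(text):
    """Return (word, count) pairs ordered from the most common word to the rarest."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    return counts.most_common()


def zipf_expected(top_frequency, n_ranks):
    """Idealized Zipf frequencies: the word of rank r appears top_frequency / r times."""
    return [top_frequency / rank for rank in range(1, n_ranks + 1)]


ranked = rank_frequency("the cat and the dog and the bird")
expected = zipf_expected(12, 4)
```

Comparing the observed counts in `ranked` with the values from `zipf_expected` for a long text gives a first impression of how closely that text follows the law; a short sentence like the one above is far too small to show it.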

This set of linkages does not provide any obvious clue to the origins of syntax. But it allows another network to be built that incorporates a primitive form of word–word association. The new network naturally links names and actions, because pairs such as “eat” and “meat” will be connected. This is simply done: two words are linked if they share at least one object of reference (Fig. 1b). This is of course a very rough way of associating symbols, but, as Ferrer i Cancho et al. show, the architecture of the resulting network is surprisingly close to that of real linguistic networks in many of its features.
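
The linking rule just described (two words are connected if they share at least one object of reference) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy associations, including the words “fruit” and “run”, and the function names `word_network` and `degrees` are hypothetical.

```python
from itertools import combinations

# Toy word-object associations (hypothetical apart from "eat" and "meat").
associations = {
    "eat": {"action of eating", "edible organic matter"},
    "meat": {"edible organic matter"},
    "fruit": {"edible organic matter"},
    "run": {"action of running"},
}


def word_network(associations):
    """Link every pair of words that shares at least one object of reference."""
    edges = set()
    for w1, w2 in combinations(sorted(associations), 2):
        if associations[w1] & associations[w2]:  # non-empty intersection
            edges.add((w1, w2))
    return edges


def degrees(associations, edges):
    """Count the number of links attached to each word."""
    degree = {word: 0 for word in associations}
    for w1, w2 in edges:
        degree[w1] += 1
        degree[w2] += 1
    return degree


edges = word_network(associations)
degree = degrees(associations, edges)
```

In this toy case “eat” and “meat” end up linked through ‘edible organic matter’, whereas “run” stays isolated. With a realistic lexicon, the distribution of the values in `degree` is the quantity that would be compared with real syntactic networks.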

Previous studies [5] have demonstrated that networks obtained by linking words exhibit seemingly universal patterns of organization, provided the words are syntactically related: two words are linked if they have been syntactically combined in a collection of sentences. Different languages share the same scale-free structure, with most words having few syntactic links and a few of them being connected to many others.

Ferrer i Cancho and colleagues' network displays the same structure as its real counterparts. Exactly the same distribution of links is found, suggesting the possibility that early ‘protolanguage’ might have been ready-made for the development of a full syntax. If this is so, the sometimes illogical and quirky appearance of syntactic rules might be nothing but a by-product of scale-free network architecture. The study also suggests that Zipf's law could have been a precondition for syntax and symbolic communication. Once such a condition was met, the basis for the combinatorial explosion characteristic of human language was ready for selection to shape it. The new theory will be subject to debate, but the remnants of the communicative Big Bang are evidently hiding somewhere inside modern language networks.