Distribution of Word Frequencies


First, we must apologize for having evidently given the impression in our original communication, especially in our assumption (iv), that we intended to use information theory in treating the problem. Our aim, however, was to attack the problem without recourse to the assumptions of this theory. These necessarily involve regarding language as consisting of messages, and words as being, for all statistical purposes, packets of information. This model is unrealistic as applied to real language situations, on the basis of which the faculty of speech is learned and which, even in the most sophisticated communities, dominate its natural evolution. We accordingly sought an approach which would avoid the need for the concept of ‘information’ in the technical sense; our condition (iv) should have been put in the form: “Languages will tend to evolve in such a way that the time required to recall a given number of distinct words is a minimum”. Our unusual mathematical method was devised because of the difficulty of giving an exact formulation of this condition.

PARKER-RHODES, A., JOYCE, T. Distribution of Word Frequencies. Nature 179, 595–596 (1957).

