Nature 179, 595 (16 March 1957); doi:10.1038/179595a0
Distribution of Word Frequencies
I. J. GOOD
25 Scott House, Princess Elizabeth Way, Cheltenham.
The purpose of this communication is to explain, in terms of the theory of information, the implications of the Zipf distribution of word frequencies¹. The distribution is formally identical with the Pareto income and Willis taxonomic distributions, but the present discussion is restricted to word frequencies. The discussion resembles that of Mandelbrot² but is simpler. The discussion by Parker-Rhodes and Joyce³ also resembles Mandelbrot's, but is fallacious.
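As a rough illustration of the quantity under discussion, the sketch below builds a Zipf-type distribution, in which the probability of the word of rank r is proportional to 1/r, and computes its Shannon entropy per word. The function names, the finite vocabulary size and the `exponent` parameter are assumptions made for this example and are not taken from the communication itself.

```python
import math


def zipf_probabilities(vocab_size, exponent=1.0):
    """Return p(r) proportional to 1 / r**exponent for ranks r = 1..vocab_size.

    A finite vocabulary is assumed so that the weights can be normalised.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical vocabulary of 10,000 words.
probs = zipf_probabilities(10_000)
print(f"p(rank 1) / p(rank 2) = {probs[0] / probs[1]:.2f}")
print(f"entropy per word = {entropy_bits(probs):.2f} bits")
```

With `exponent=1.0` the most frequent word is twice as probable as the second; other exponents can be passed to explore neighbouring distributions.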
Letters to Nature
Nature 178, 1308 (08 December 1956); doi:10.1038/1781308a0
A Theory of Word-Frequency Distribution
A. F. PARKER-RHODES & T. JOYCE
Cambridge Language Research Unit, 20 Millington Road, Cambridge.
The object of this communication is to show that a certain remarkably simple experimental relation governing word-frequencies in language can be explained by a simple model of the process of searching for information, about each word heard or read, in the memory of words employed in the language faculty.
Cover, T.; King, R.
Abstract
In his original paper on the subject, Shannon found upper and lower bounds for the entropy of printed English based on the number of trials required for a subject to guess subsequent symbols in a given text. The guessing approach precludes asymptotic consistency of either the upper or lower bounds except for degenerate ergodic processes. Shannon's technique of guessing the next symbol is altered by having the subject place sequential bets on the next symbol of text...
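The guessing bounds mentioned at the start of the abstract can be sketched as follows. The abstract does not state the formulas; the ones used here are the standard forms of Shannon's bounds, with q_i the relative frequency with which the correct symbol is found on the i-th guess: an upper bound of -Σ q_i log2 q_i and a lower bound of Σ i (q_i - q_{i+1}) log2 i. The function name, the 27-symbol alphabet default and the guess data are hypothetical, chosen only to illustrate the calculation.

```python
import math
from collections import Counter


def shannon_guess_bounds(guess_numbers, alphabet_size=27):
    """Return (lower, upper) entropy bounds in bits per symbol.

    guess_numbers[k] is the number of guesses (1-based) the subject
    needed to identify the k-th symbol of the text.
    """
    n = len(guess_numbers)
    counts = Counter(guess_numbers)
    # q[i - 1] is the relative frequency of needing exactly i guesses.
    q = [counts.get(i, 0) / n for i in range(1, alphabet_size + 1)]
    q.append(0.0)  # convention: q_{A+1} = 0

    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * math.log2(p) for p in q if p > 0)

    # Lower bound: sum of i * (q_i - q_{i+1}) * log2(i).
    # The derivation assumes q_i is non-increasing in i; small empirical
    # samples may violate this and make individual terms negative.
    lower = sum(
        i * (q[i - 1] - q[i]) * math.log2(i)
        for i in range(1, alphabet_size + 1)
    )
    return lower, upper


# Hypothetical record of guesses for twelve successive symbols.
guesses = [1, 1, 1, 2, 1, 3, 1, 1, 2, 4, 1, 1]
lower, upper = shannon_guess_bounds(guesses)
print(f"lower bound = {lower:.3f} bits, upper bound = {upper:.3f} bits")
```

Both bounds depend on the text only through the frequencies of the guess numbers, which is why this sketch needs nothing more than the list of guess counts.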