Abstract
This paper describes a population model for word frequency distributions based on the Zipf-Mandelbrot law,
corresponding to the word frequency distribution induced by a random character sequence. The model, which
has convenient analytical and numerical properties, is shown to be adequate for the description of language data
extracted by automatic means from large text corpora. It can thus be used to study the problems faced by the
statistical analysis of such data in the field of natural-language processing.