Abstract

We have created three testbeds of web data for use in controlled experiments in collection modeling. This short paper examines how well Zipf's and Heaps' laws apply to web data. Observed vocabulary growth agrees extremely closely with Heaps' law. Zipf's law agrees reasonably well with the data for medium- to low-frequency terms, but it is a poor predictor for high-frequency terms. These findings hold for all three testbeds, although we restrict ourselves to one here due to space limitations.
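
For readers unfamiliar with the two laws: Heaps' law is usually stated as V(n) = K * n^beta, where V(n) is the number of distinct terms seen after n tokens, and Zipf's law as f(r) proportional to 1 / r^alpha, where f(r) is the frequency of the term of rank r. The sketch below shows one way such quantities might be measured on a tokenized collection. It is illustrative only and is not the procedure used in the paper: the function names, the whitespace tokenization in the usage note, and the least-squares fit in log-log space are all assumptions.

```python
from collections import Counter
import math


def vocabulary_growth(tokens):
    """Return (n, V(n)) pairs: distinct terms seen after the first n tokens."""
    seen = set()
    growth = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        growth.append((n, len(seen)))
    return growth


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, rank 1 being the most frequent term."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(points):
    """Least-squares slope of log(y) against log(x).

    Assumed fitting method (hypothetical, not taken from the paper).
    For Heaps' law the slope estimates beta; for Zipf's law it estimates -alpha.
    Requires at least two points with distinct, positive x values.
    """
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Given a token list such as `text.lower().split()`, `loglog_slope(vocabulary_growth(tokens))` gives a rough estimate of the Heaps exponent, and `-loglog_slope(rank_frequency(tokens))` a rough estimate of the Zipf exponent. Restricting the rank-frequency points to a rank range before fitting would be one way to compare high-frequency terms against medium- and low-frequency ones.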
