The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers, gathered from blogger.com in August 2004. The corpus contains a total of 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per blogger.
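The per-blogger averages follow from dividing the stated totals by the number of bloggers. The short Python sketch below reproduces that arithmetic. Because the word count is given only as "over 140 million", 140,000,000 is used as an assumed lower bound, so the words-per-blogger figure is a lower bound as well. The constant and function names are illustrative, not part of any corpus tooling.

```python
# Totals as stated in the corpus description.
NUM_BLOGGERS = 19_320
NUM_POSTS = 681_288
NUM_WORDS = 140_000_000  # assumption: "over 140 million", treated as a lower bound


def per_blogger(total: int, bloggers: int = NUM_BLOGGERS) -> float:
    """Return the mean count per blogger (a plain arithmetic average)."""
    return total / bloggers


posts_avg = per_blogger(NUM_POSTS)
words_avg = per_blogger(NUM_WORDS)

print(f"posts per blogger: {posts_avg:.2f}")  # 35.26
print(f"words per blogger: {words_avg:.0f}")  # 7246 (lower bound)
```

Rounded, these give the "approximately 35 posts and 7,250 words" quoted above; they are means over all bloggers and say nothing about how posts are distributed across individual blogs.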