The dataset genres.json contains (sub)genre classifications for novels published between 1770 and 1915. The genres covered are:
gothic novels
"silver fork" novels
national tale novels
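As a rough illustration of how such a file might be tallied, the sketch below counts records per genre within the 1770 to 1915 range. The record layout (a list of objects with `genre` and `year` keys) and the sample entries are assumptions made up for this example; the actual schema of genres.json is not described here, so the field names would need adjusting.

```python
import json
from collections import Counter

# Invented sample standing in for genres.json; the real schema may differ.
SAMPLE = """[
  {"id": "vol1", "genre": "gothic", "year": 1794},
  {"id": "vol2", "genre": "silver fork", "year": 1826},
  {"id": "vol3", "genre": "national tale", "year": 1806},
  {"id": "vol4", "genre": "gothic", "year": 1818}
]"""


def count_genres(records, first_year=1770, last_year=1915):
    """Count records per genre, keeping only years inside the given range."""
    return Counter(
        r["genre"] for r in records
        if first_year <= r["year"] <= last_year
    )


records = json.loads(SAMPLE)
genre_counts = count_genres(records)
print(genre_counts)
```

With a real file, `json.loads(SAMPLE)` would be replaced by `json.load` on the opened file, once the actual key names are known.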
The project combines two sources of information. The word counts come from the HathiTrust Research Center (HTRC), which has tabulated them at the page level in 4.8 million public-domain volumes. Information about genre comes from a parallel project led by Ted Underwood and supported by the National Endowment for the Humanities and the American Council of Learned Societies.
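Because the counts are tabulated per page, a volume-level view requires summing across pages. A minimal sketch follows, assuming each page is represented as a plain mapping from word to count; this is a simplified stand-in, not the HTRC file format, and the function name `volume_counts` is hypothetical.

```python
from collections import Counter


def volume_counts(pages):
    """Sum page-level word counts into a single volume-level Counter."""
    total = Counter()
    for page in pages:
        # Counter.update adds the counts from a mapping rather than replacing them.
        total.update(page)
    return total


# Invented two-page example.
pages = [
    {"the": 3, "castle": 1},
    {"the": 2, "ghost": 1},
]
totals = volume_counts(pages)
print(totals.most_common(1))
```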
Introduction: Since Amazon.com was founded in the 1990s, books have been a major component of its business. In fact, books were the first items Amazon ever sold, before other product categories were added. More recently, Amazon.com moved into publishing by letting people self-publish their work, and this has created a vast…