
Summarizing Topical Content with Word Frequency and Exclusivity

Proceedings of the 29th International Conference on Machine Learning, pp. 9–16. Madison, WI, USA, Omnipress, (2012)


Recent work in text analysis commonly describes topics in terms of their most frequent words, but the exclusivity of words to topics is equally important for communicating content. We introduce Hierarchical Poisson Convolution (HPC), a model which infers regularized estimates of the differential use of words across topics as well as their frequency within topics. HPC uses known hierarchical structure on human-labeled topics to make focused comparisons of differential usage within each branch of the hierarchy of labels. We then infer a summary for each topic in terms of words that are both frequent and exclusive. We develop a parallelized Hamiltonian Monte Carlo sampler that allows for fast and scalable computation.
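To make the idea of words that are "both frequent and exclusive" concrete, here is a minimal sketch that scores words from a raw topic-by-word count matrix. It is not the HPC model: there is no hierarchy, no regularization, and no sampling. The function name `frequency_exclusivity_scores`, the `weight` parameter, the toy vocabulary, and the choice to combine within-topic ranks with a weighted harmonic mean are all assumptions made for illustration. The sketch also assumes every topic and every word has at least one count.

```python
import numpy as np


def frequency_exclusivity_scores(counts, weight=0.5):
    """Score each word in each topic by frequency and exclusivity.

    counts: (n_topics, n_words) array of word counts per topic.
    weight: share of the score given to exclusivity (hypothetical knob).
    Assumes no all-zero rows or columns.
    """
    counts = np.asarray(counts, dtype=float)

    # Frequency: share of each word within its topic (rows sum to 1).
    freq = counts / counts.sum(axis=1, keepdims=True)

    # Exclusivity: a word's rate in this topic divided by the sum of its
    # rates over all topics (columns sum to 1).
    excl = freq / freq.sum(axis=0, keepdims=True)

    def within_topic_rank(x):
        # Rank words inside each topic and rescale to (0, 1].
        order = x.argsort(axis=1, kind="stable")
        ranks = order.argsort(axis=1, kind="stable") + 1
        return ranks / x.shape[1]

    f_rank = within_topic_rank(freq)
    e_rank = within_topic_rank(excl)

    # Weighted harmonic mean: a word scores high only if both ranks are high.
    return 1.0 / (weight / e_rank + (1.0 - weight) / f_rank)


# Toy data (illustrative only): rows are topics, columns are words.
vocab = ["the", "game", "team", "vote"]
counts = np.array([
    [50, 30, 20, 1],   # a sports-like topic
    [50, 2, 1, 40],    # a politics-like topic
])
scores = frequency_exclusivity_scores(counts)

# Compare the most frequent word with the top-scoring word in topic 0.
most_frequent = vocab[int(np.argmax(counts[0]))]
top_scored = vocab[int(np.argmax(scores[0]))]
```

In this toy data the most frequent word in the first topic is one that the second topic uses just as often, so ranking by frequency alone says little about what distinguishes the topic. The combined score instead favors a word that is both common in the topic and rare elsewhere.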
