Article

A Two-Level Topic Model Towards Knowledge Discovery from Citation Networks

Z. Guo, Z. Zhang, S. Zhu, Y. Chi, and Y. Gong.
IEEE Transactions on Knowledge and Data Engineering, 26 (4): 780--794 (April 2014)
DOI: 10.1109/TKDE.2013.56

Abstract

Knowledge discovery from scientific articles has received increasing attention recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: document itself and a citation of other documents. In the existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper, we propose a Bernoulli process topic (BPT) model which considers the corpus at two levels: document level and citation level. In the BPT model, each document has two different representations in the latent topic space associated with its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. An efficient computation algorithm is proposed to overcome the difficulty of matrix inversion. In addition to conducting the experimental evaluations on the document modeling and document clustering tasks, we also apply the BPT model to well-known corpora to discover the latent topics, recommend important citations, detect the trends of various research areas in computer science between 1991 and 1998, and investigate the interactions among the research areas. The comparisons against state-of-the-art methods demonstrate a very promising performance. The implementations and the data sets are available online.
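The abstract does not spell out the generative process or the efficient algorithm, so what follows is only a toy sketch under stated assumptions, not the BPT model or the authors' method. It assumes a hypothetical "stop or follow a citation" Bernoulli process: at each document, with probability `lam` the process moves to a uniformly chosen cited document, and otherwise (or when the document cites nothing) it stops and uses that document's own topic distribution. Summing over citation paths of every length gives a geometric series whose closed form contains a matrix inverse, (I - λP)^{-1}; the sketch shows one generic way to avoid forming that inverse, namely fixed-point iteration. The function `effective_topics` and its parameters are illustrative names, not taken from the paper.

```python
# Toy illustration only (NOT the BPT model from the paper): a hypothetical
# "stop or follow a citation" Bernoulli process over a citation graph.
# At document d, with probability lam we jump to a uniformly chosen cited
# document; otherwise (or if d cites nothing) we stop and use the current
# document's own topic distribution. The resulting distributions satisfy
#     theta = b + lam * P theta,  i.e.  theta = (I - lam * P)^{-1} b,
# where b_d = (1 - lam) * own_d if d cites something, else own_d.
# We solve this by fixed-point iteration instead of inverting the matrix.
from typing import Dict, List


def effective_topics(
    own: List[List[float]],
    cites: Dict[int, List[int]],
    lam: float,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> List[List[float]]:
    """Return per-document topic distributions under the toy process."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1) for the iteration to converge")
    n, k = len(own), len(own[0])
    theta = [row[:] for row in own]  # initial guess: each document's own topics
    for _ in range(max_iter):
        new = []
        for d in range(n):
            targets = cites.get(d, [])
            if not targets:
                # No citations: the walk always stops at this document.
                new.append(own[d][:])
                continue
            # Average of the cited documents' current estimates.
            avg = [sum(theta[c][t] for c in targets) / len(targets) for t in range(k)]
            new.append([(1 - lam) * own[d][t] + lam * avg[t] for t in range(k)])
        delta = max(abs(new[d][t] - theta[d][t]) for d in range(n) for t in range(k))
        theta = new
        if delta < tol:
            return theta
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Document 0 cites 1 and 2; document 1 cites 2; document 2 cites nothing.
    own = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    cites = {0: [1, 2], 1: [2]}
    print(effective_topics(own, cites, lam=0.5))
    # [[0.6875, 0.3125], [0.25, 0.75], [0.5, 0.5]]
```

Because each row of P sums to at most one and λ < 1, each sweep is a contraction in the max-norm, so the iteration converges geometrically; one sweep costs time proportional to the number of citation links times the number of topics, rather than the cubic cost of a dense inversion.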

BibTeX key: guo2014twolevel
Entry type: article
Year: 2014
Month: apr
Journal: IEEE Transactions on Knowledge and Data Engineering
Number: 4
Pages: 780--794
Publisher: IEEE
Volume: 26
ISSN: 1041-4347
DOI: 10.1109/TKDE.2013.56
URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6494572&tag=1

Cite this publication

%0 Journal Article
%1 guo2014twolevel
%A Guo, Zhen
%A Zhang, Zhongfei
%A Zhu, Shenghuo
%A Chi, Yun
%A Gong, Yihong
%D 2014
%I IEEE
%J IEEE Transactions on Knowledge and Data Engineering
%K citation model network sota topic
%N 4
%P 780--794
%R 10.1109/TKDE.2013.56
%T A Two-Level Topic Model Towards Knowledge Discovery from Citation Networks
%U http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6494572&tag=1
%V 26
%X Knowledge discovery from scientific articles has received increasing attention recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: document itself and a citation of other documents. In the existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper, we propose a Bernoulli process topic (BPT) model which considers the corpus at two levels: document level and citation level. In the BPT model, each document has two different representations in the latent topic space associated with its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. An efficient computation algorithm is proposed to overcome the difficulty of matrix inversion. In addition to conducting the experimental evaluations on the document modeling and document clustering tasks, we also apply the BPT model to well-known corpora to discover the latent topics, recommend important citations, detect the trends of various research areas in computer science between 1991 and 1998, and investigate the interactions among the research areas. The comparisons against state-of-the-art methods demonstrate a very promising performance. The implementations and the data sets are available online.

@article{guo2014twolevel,
  abstract = {Knowledge discovery from scientific articles has received increasing attention recently since huge repositories are made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations and one document plays two different roles in the corpus: document itself and a citation of other documents. In the existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper, we propose a Bernoulli process topic (BPT) model which considers the corpus at two levels: document level and citation level. In the BPT model, each document has two different representations in the latent topic space associated with its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. An efficient computation algorithm is proposed to overcome the difficulty of matrix inversion. In addition to conducting the experimental evaluations on the document modeling and document clustering tasks, we also apply the BPT model to well-known corpora to discover the latent topics, recommend important citations, detect the trends of various research areas in computer science between 1991 and 1998, and investigate the interactions among the research areas. The comparisons against state-of-the-art methods demonstrate a very promising performance. The implementations and the data sets are available online.},
  added-at = {2014-04-29T08:06:18.000+0200},
  author = {Guo, Zhen and Zhang, Zhongfei and Zhu, Shenghuo and Chi, Yun and Gong, Yihong},
  biburl = {https://www.bibsonomy.org/bibtex/23a4fdf58f02ec72f11af4ef4f2582515/jaeschke},
  doi = {10.1109/TKDE.2013.56},
  interhash = {6f5dcd94fc8906e5981cade3897d1f0d},
  intrahash = {3a4fdf58f02ec72f11af4ef4f2582515},
  issn = {1041-4347},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  keywords = {citation model network sota topic},
  month = apr,
  number = 4,
  pages = {780--794},
  publisher = {IEEE},
  timestamp = {2014-07-28T15:57:31.000+0200},
  title = {A Two-Level Topic Model Towards Knowledge Discovery from Citation Networks},
  url = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6494572&tag=1},
  volume = 26,
  year = 2014
}
