Most text classification methods treat each document as an
independent instance. However, in many text domains, documents
are linked and the topics of linked documents are correlated.
For example, web pages of related topics are often
connected by hyperlinks and scientific papers from related
fields are commonly linked by citations. We propose a
unified probabilistic model for both the textual content and
the link structure of a document collection. Our model is
based on the recently introduced framework of Probabilistic
Relational Models (PRMs), which allows us to capture correlations
between linked documents. We show how to learn
these models from data and use them efficiently for classification.
Since exact methods for classification in these large
models are intractable, we utilize belief propagation, an approximate
inference algorithm. Belief propagation automatically
induces a very natural behavior, where our knowledge
about one document helps us classify related ones, which in
turn help us classify others. We present preliminary empirical
results on a dataset of university web pages.
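The abstract gives no implementation detail; as a rough, illustrative sketch (not the authors' code), loopy belief propagation over linked pages can be written as below, assuming each page has a local potential over categories (e.g. from a text model) and each link a fixed category-compatibility matrix:

# Illustrative sketch only (assumed setup, not the paper's implementation):
# loopy belief propagation for collective classification of linked pages.
import numpy as np

def loopy_bp(phi, edges, psi, iters=20):
    """phi: dict page -> local category potential (length-K numpy vector);
    edges: list of (page, page) link pairs between keys of phi;
    psi: (K, K) link compatibility matrix. Returns per-page beliefs."""
    K = len(next(iter(phi.values())))
    neighbours = {p: [] for p in phi}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # messages m[(i, j)]: what page i currently tells page j about j's category
    msgs = {(i, j): np.ones(K) for a, b in edges for i, j in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            prod = phi[i].copy()              # local text evidence for page i
            for k in neighbours[i]:
                if k != j:
                    prod *= msgs[(k, i)]      # messages from i's other neighbours
            m = psi.T @ prod                  # marginalise over i's category
            new[(i, j)] = m / m.sum()         # normalise for numerical stability
        msgs = new
    beliefs = {}
    for p in phi:
        b = phi[p].copy()
        for k in neighbours[p]:
            b *= msgs[(k, p)]
        beliefs[p] = b / b.sum()
    return beliefs

Clamping the potentials of already-labelled pages to near-certain vectors reproduces the chained behaviour described in the abstract: a confidently labelled page pulls its neighbours toward compatible categories, and those pages in turn influence their own neighbours.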
IJCAI Workshop on "Text Learning: Beyond Supervision"
year
2001
month
August
comment
Main.ThomasHofmannCiteSeer
---
task: classify web pages as student, course, faculty, or project using hyperlinks, anchor text, and the hub property.
idea of the task: train on manually classified pages from one university and apply the model to other universities.
Method: PRM with existence uncertainty + belief propagation (partial knowledge about some pages influences inference for their unknown neighbours)
---
"Because instances are not independent, information about some instances can be used to reach conclusions about others."
---
"Note that during classification, existence of links and anchor words in the links are used as evidence to infer categories of the web pages."
---
determined classes: Page [.category, .hub, .word1, ..., .wordn]
undetermined: Links [.fromPage, .toPage, .anchor, .exists]; Anchor [.word]
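A minimal sketch of how the relational schema in these notes might be laid out as data structures (attribute names follow the notes; the class layout, field types and everything else here are assumptions, not the paper's code):

# Illustrative sketch of the relational schema from the notes above.
# Page.category is hidden at classification time; Link.exists is the
# existence-uncertainty variable scoring whether a candidate link exists.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Page:
    url: str
    words: List[str]                 # .word1 ... .wordn (bag of words on the page)
    hub: bool = False                # .hub property
    category: Optional[str] = None   # .category: student / course / faculty / project

@dataclass
class Anchor:
    word: str                        # .word appearing in a link's anchor text

@dataclass
class Link:
    from_page: Page                  # .fromPage
    to_page: Page                    # .toPage
    anchors: List[Anchor] = field(default_factory=list)  # .anchor
    exists: bool = True              # .exists: the existence-uncertainty variable

At classification time Page.category is unobserved, while Link.exists and the anchor words serve as evidence about the connected pages, matching the quoted remark above about links and anchor words being used to infer page categories.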
@inproceedings{citeulike:344951,
added-at = {2006-06-16T10:34:37.000+0200},
address = {Seattle, WA},
author = {Getoor, Lise and Segal, Eran and Taskar, Ben and Koller, Daphne},
biburl = {https://www.bibsonomy.org/bibtex/2ec406c66aec88652cc15438fd53e140f/ldietz},
booktitle = {IJCAI Workshop on "Text Learning: Beyond Supervision"},
citeulike-article-id = {344951},
interhash = {2712a221b906243be0d1f92874e48380},
intrahash = {ec406c66aec88652cc15438fd53e140f},
keywords = {mustread relationalmodels socialnets},
month = {August},
priority = {0},
timestamp = {2006-06-16T10:34:37.000+0200},
title = {Probabilistic Models of Text and Link Structure for Hypertext Classification},
url = {http://ai.stanford.edu/~erans/publications/ijcai01-ws.pdf},
year = 2001
}