Most text classification methods treat each document as an
independent instance. However, in many text domains, documents
are linked and the topics of linked documents are correlated.
For example, web pages of related topics are often
connected by hyperlinks and scientific papers from related
fields are commonly linked by citations. We propose a
unified probabilistic model for both the textual content and
the link structure of a document collection. Our model is
based on the recently introduced framework of Probabilistic
Relational Models (PRMs), which allows us to capture correlations
between linked documents. We show how to learn
these models from data and use them efficiently for classification.
Since exact methods for classification in these large
models are intractable, we utilize belief propagation, an approximate
inference algorithm. Belief propagation automatically
induces a very natural behavior, where our knowledge
about one document helps us classify related ones, which in
turn help us classify others. We present preliminary empirical
results on a dataset of university web pages.
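The abstract gives no implementation detail; as a rough, illustrative sketch (not the authors' code), loopy belief propagation over linked pages can be written as below, assuming each page has a local potential over categories (e.g. from a text model) and each link a fixed category-compatibility matrix:

# Illustrative sketch only (assumed setup, not the paper's implementation):
# loopy belief propagation for collective classification of linked pages.
import numpy as np

def loopy_bp(phi, edges, psi, iters=20):
    """phi: dict page -> local category potential (length-K numpy vector);
    edges: list of (page, page) link pairs between keys of phi;
    psi: (K, K) link compatibility matrix. Returns per-page beliefs."""
    K = len(next(iter(phi.values())))
    neighbours = {p: [] for p in phi}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # messages m[(i, j)]: what page i currently tells page j about j's category
    msgs = {(i, j): np.ones(K) for a, b in edges for i, j in ((a, b), (b, a))}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            prod = phi[i].copy()              # local text evidence for page i
            for k in neighbours[i]:
                if k != j:
                    prod *= msgs[(k, i)]      # messages from i's other neighbours
            m = psi.T @ prod                  # marginalise over i's category
            new[(i, j)] = m / m.sum()         # normalise for numerical stability
        msgs = new
    beliefs = {}
    for p in phi:
        b = phi[p].copy()
        for k in neighbours[p]:
            b *= msgs[(k, p)]
        beliefs[p] = b / b.sum()
    return beliefs

Clamping the potentials of already-labelled pages to near-certain vectors reproduces the chained behaviour described in the abstract: a confidently labelled page pulls its neighbours toward compatible categories, and those pages in turn influence their own neighbours.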
IJCAI Workshop on "Text Learning: Beyond Supervision"
year
2001
month
August
comment
Main.ThomasHofmannCiteSeer
---
task: classify web pages as student, course, faculty, or project using hyperlinks, anchor text, and the hub property.
idea of the task: train on manually classified pages from one university and apply the model to other universities.
Method: PRM with existence uncertainty + belief propagation (partial knowledge about some pages influences inference for their unknown neighbours)
---
"Because instances are not independent, information about some instances can be used to reach conclusions about others."
---
"Note that during classification, existence of links and anchor words in the links are used as evidence to infer categories of the web pages."
---
determined classes: Page [.category, .hub, .word1, ..., .wordn]
undetermined: Links [.fromPage, .toPage, .anchor, .exists]; Anchor [.word]
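A minimal sketch of how the relational schema in these notes might be laid out as data structures (attribute names follow the notes; the class layout, field types and everything else here are assumptions, not the paper's code):

# Illustrative sketch of the relational schema from the notes above.
# Page.category is hidden at classification time; Link.exists is the
# existence-uncertainty variable scoring whether a candidate link exists.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Page:
    url: str
    words: List[str]                 # .word1 ... .wordn (bag of words on the page)
    hub: bool = False                # .hub property
    category: Optional[str] = None   # .category: student / course / faculty / project

@dataclass
class Anchor:
    word: str                        # .word appearing in a link's anchor text

@dataclass
class Link:
    from_page: Page                  # .fromPage
    to_page: Page                    # .toPage
    anchors: List[Anchor] = field(default_factory=list)  # .anchor
    exists: bool = True              # .exists: the existence-uncertainty variable

At classification time Page.category is unobserved, while Link.exists and the anchor words serve as evidence about the connected pages, matching the quoted remark above about links and anchor words being used to infer page categories.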
@inproceedings{citeulike:344951,
added-at = {2006-06-16T10:34:37.000+0200},
address = {Seattle, WA},
author = {Getoor, Lise and Segal, Eran and Taskar, Ben and Koller, Daphne},
biburl = {https://www.bibsonomy.org/bibtex/2ec406c66aec88652cc15438fd53e140f/ldietz},
booktitle = {IJCAI Workshop on "Text Learning: Beyond Supervision"},
citeulike-article-id = {344951},
interhash = {2712a221b906243be0d1f92874e48380},
intrahash = {ec406c66aec88652cc15438fd53e140f},
keywords = {mustread relationalmodels socialnets},
month = {August},
priority = {0},
timestamp = {2006-06-16T10:34:37.000+0200},
title = {Probabilistic Models of Text and Link Structure for Hypertext Classification},
url = {http://ai.stanford.edu/~erans/publications/ijcai01-ws.pdf},
year = 2001
}