@inproceedings{citeulike:393238,
abstract = {The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human-rated) quality of the pages returned, while remaining efficient enough to be used in today's large search engines.},
added-at = {2006-06-16T10:34:37.000+0200},
author = {Richardson, M. and Domingos, P.},
biburl = {https://www.bibsonomy.org/bibtex/2a0175e4eb1058fa2eff7643cce22d4fb/ldietz},
booktitle = {NIPS},
citeulike-article-id = {393238},
  comment = {Google's PageRank algorithm applies a query-insensitive ranking to the result set. In this paper, the authors incorporate a relevance measure R_q(j) of a page j to the query q into the transition probabilities P_q(i -> j), and thereby into the eigenvector computation of the original PageRank algorithm. They note that the relevance measure R_q(j) can be chosen arbitrarily, e.g. TF-IDF or a measure based on latent semantic indexing.
---
According to http://pr.efactory.de/e-pagerank-themes.shtml, problems with this approach are:
- vulnerability to spam
- scalability (pre-computation for 100,000 terms requires 100-200 times the space and time of the original PageRank) [the space cost is not so bad compared to the reverse index]},
interhash = {f47be03b0e387ba30dfbff13f09b4574},
intrahash = {a0175e4eb1058fa2eff7643cce22d4fb},
keywords = {pagerank community},
priority = {0},
timestamp = {2006-06-16T10:34:37.000+0200},
  title = {The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank},
url = {http://books.nips.cc/papers/files/nips14/AP17.pdf},
year = 2001
}
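The query-dependent PageRank noted in the comment field above — weighting both the random jump and the outlink transitions P_q(i -> j) by a per-page relevance score R_q(j) — can be sketched as follows. This is an illustrative power-iteration sketch, not the paper's implementation; the function and variable names, the dictionary-based graph representation, and the handling of zero-relevance outlink sets are assumptions.

```python
# Sketch of the "intelligent surfer" query-dependent PageRank
# (Richardson & Domingos, NIPS 2001). Assumed, simplified interfaces:
# the relevance scores R_q(j) are supplied precomputed, as the paper's
# crawl-time precomputation would provide them for each query term.

def intelligent_surfer_pagerank(outlinks, relevance, beta=0.85, iters=50):
    """Rank pages for a single query q.

    outlinks:  dict page -> list of pages it links to
    relevance: dict page -> R_q(page), a non-negative relevance score
    beta:      probability of following a link (1 - beta: relevance-biased jump)
    """
    pages = list(outlinks)
    total_rel = sum(relevance[p] for p in pages) or 1.0
    # Jump distribution: proportional to relevance, not uniform as in PageRank.
    jump = {p: relevance[p] / total_rel for p in pages}
    rank = dict(jump)  # start the iteration from the jump distribution

    for _ in range(iters):
        new = {p: (1 - beta) * jump[p] for p in pages}
        for i in pages:
            links = outlinks[i]
            # Transition P_q(i -> j): outlinks weighted by relevance R_q(j).
            denom = sum(relevance[j] for j in links)
            if denom == 0:
                # Dangling page or all-zero-relevance outlinks:
                # redistribute this page's mass via the jump distribution.
                for j in pages:
                    new[j] += beta * rank[i] * jump[j]
            else:
                for j in links:
                    new[j] += beta * rank[i] * relevance[j] / denom
        rank = new
    return rank
```

On a small graph, a page with zero relevance to the query receives essentially no score even if it is well linked, which is the intended contrast with query-insensitive PageRank; it is also the source of the spam and scalability concerns listed in the comment, since scores must be maintained per query term.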