Untangling compound documents on the web

N. Eiron and K. McCurley. HYPERTEXT '03: Proceedings of the fourteenth ACM conference on Hypertext and hypermedia, pages 85--94. New York, NY, USA: ACM, 2003.
DOI: http://doi.acm.org/10.1145/900051.900070

Abstract

Most text analysis is designed to deal with the concept of a "document", namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the World Wide Web tend to have a much smaller granularity than text documents. We claim that the notions of "document" and "web node" are not synonymous, and that authors often tend to deploy documents as collections of URLs, which we call "compound documents". In this paper we present new techniques for identifying and working with such compound documents, and the results of some large-scale studies on such web documents. The primary motivation for this work stems from the fact that information retrieval techniques are better suited to working on documents than individual hypertext nodes.

Links and resources

BibTeX key: 900070
entry type: inproceedings
address: New York, NY, USA
booktitle: HYPERTEXT '03: Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
year: 2003
pages: 85--94
publisher: ACM
location: Nottingham, UK
isbn: 1-58113-704-4
DOI: http://doi.acm.org/10.1145/900051.900070
url: http://portal.acm.org/citation.cfm?id=900051.900070
