
On the resemblance and containment of documents

A. Broder. Compression and Complexity of Sequences, pp. 21--29. Salerno, Italy, IEEE Computer Society Press, (June 1997)

Abstract

Given two documents A and B we define two mathematical notions: their resemblance r(A, B) and their containment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document. This paper discusses the mathematical properties of these measures and the efficient implementation of the sampling process using Rabin (1981) fingerprints.
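
The two measures can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. Documents are assumed to be tokenized into lowercase words and represented by the set S(D) of contiguous w-word shingles. Resemblance is taken as |S(A) ∩ S(B)| / |S(A) ∪ S(B)| and containment as |S(A) ∩ S(B)| / |S(A)|. The fixed-size sample keeps the s smallest fingerprints of each document. All function names are hypothetical, and a 64-bit BLAKE2b hash stands in for Rabin fingerprints.

```python
import hashlib
import re


def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Set of contiguous w-word shingles (assumed tokenization: lowercase \\w+ words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Short document: treat the whole word sequence as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Exact resemblance: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def containment(a: set, b: set) -> float:
    """Exact containment of A in B: fraction of A's shingles also found in B."""
    return len(a & b) / len(a) if a else 1.0


def fingerprint(shingle: tuple[str, ...]) -> int:
    # Assumption: a 64-bit BLAKE2b digest stands in for a Rabin fingerprint.
    data = " ".join(shingle).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def bottom_sketch(sh: set, s: int) -> set[int]:
    """Fixed-size sample: the s smallest fingerprints of one document's shingles."""
    return set(sorted(fingerprint(x) for x in sh)[:s])


def estimate_resemblance(sk_a: set[int], sk_b: set[int], s: int) -> float:
    """Estimate r(A, B) from two independently computed sketches."""
    # The s smallest fingerprints of the union are recoverable from the two sketches.
    union_min = set(sorted(sk_a | sk_b)[:s])
    if not union_min:
        return 1.0
    # Count how many of those also appear in both documents' samples.
    return len(union_min & sk_a & sk_b) / len(union_min)
```

As a usage example with w = 2, "a rose is a rose is a rose" yields 3 distinct shingles and "a rose is a flower which is a rose" yields 6, of which 3 are shared. The exact resemblance is therefore 0.5, containment of the first in the second is 1.0, and the reverse is 0.5. When s is at least the size of the union, the sketch estimate equals the exact value barring fingerprint collisions. Each document's sketch is computed without looking at the other document, which is the property the abstract emphasizes.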

Description

Not previously uploaded

Links and resources

BibTeX key: Broder1997
entry type: inproceedings
address: Salerno, Italy
booktitle: Compression and Complexity of Sequences
year: 1997
month: June
pages: 21--29
publisher: IEEE Computer Society Press
priority: 3
citeulike-article-id: 562668
Document: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.779&rep=rep1&type=pdf
