stroeh > duplicate

bookmarks (7)

w-shingling - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/W-shingling
13 years ago by @stroeh
tags: detection, ähnlichkeitsmaß, shingle, duplicate, shingling, w-shingling
Bibliographic Hash Key – Verbund-Wiki GBV
http://www.gbv.de/wikis/cls/Bibliographic_Hash_Key
13 years ago by @stroeh
tags: hashkey, bibkey, bibliographic, duplicate, hash
Dublette (Bibliothek) – Wikipedia
http://de.wikipedia.org/wiki/Dublette_(Bibliothek)
14 years ago by @stroeh
tags: dublette, doublet, duplicate, definition
Near-duplicates and shingling
… can now generate all pairs $i,j$ for which $x_i^\pi$ is present in both of their sketches. From these we can compute, for each pair $i,j$ with non-zero sketch overlap, a count of the $x_i^\pi$ values they have in common. Applying a preset threshold tells us which pairs $i,j$ have heavily overlapping sketches. For instance, with a threshold of 80%, the count must be at least 160 for any $i,j$ (a figure that implies sketches of 200 values each). As we identify such pairs, we run union-find to group documents into near-duplicate "syntactic clusters". This is essentially a variant of the single-link clustering algorithm introduced in Section 17.2.
13 years ago by @stroeh
tags: near, shingle, duplicate, shingling
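The excerpt above describes counting shared sketch values per document pair, thresholding the counts, and merging qualifying pairs with union-find. A minimal Python sketch of those steps follows. It is an illustration under stated assumptions, not the source's implementation: the function name `cluster_near_duplicates`, the dictionary-of-sets input, and the use of an absolute count as the threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def cluster_near_duplicates(sketches, threshold):
    """Group documents whose sketches share at least `threshold` values.

    sketches:  dict mapping a document id to a set of sketch values
               (assumed input format for this illustration).
    threshold: minimum number of shared sketch values, as an absolute count.
    """
    # Invert the sketches: sketch value -> ids of documents containing it.
    docs_by_value = defaultdict(list)
    for doc_id, sketch in sketches.items():
        for value in sketch:
            docs_by_value[value].append(doc_id)

    # Count shared values for every pair with non-zero sketch overlap.
    overlap = defaultdict(int)
    for doc_ids in docs_by_value.values():
        for a, b in combinations(sorted(doc_ids), 2):
            overlap[(a, b)] += 1

    # Union-find (with path halving) over document ids.
    parent = {doc_id: doc_id for doc_id in sketches}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair whose overlap reaches the threshold.
    for (a, b), count in overlap.items():
        if count >= threshold:
            parent[find(a)] = find(b)

    # Collect the resulting clusters.
    clusters = defaultdict(set)
    for doc_id in sketches:
        clusters[find(doc_id)].add(doc_id)
    return list(clusters.values())
```

Because merging is transitive, two documents can end up in the same cluster without exceeding the threshold themselves, as long as a chain of heavily overlapping pairs connects them. That is the single-link behaviour the excerpt mentions.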
Signature Based Duplicate Detection in Digital Libraries - Powered by Google Text & Tabellen
http://docs.google.com/viewer?a=v&q=cache:hSTBthSicWIJ:www.bibalex.org/icudl06/presentation/(SreenivasRao)_Signature_Based_Duplicate_Detection_in_Digital_Libraries.ppt+libraries+duplicate+detection&hl=de&pid=bl&srcid=ADGEESgF2l4SzhchmQ3FqkiNZAN5FpHI5hrC8ybDPwrTuM6TyoWvg-Ckfrt5VnMEJrs39uS4FW9g--_n8XiaZX0j7edrQ8ifAobN3uoDG9oXGqcWTeaFEpVGKLHLv5QwBmJB-5AzDc3x&sig=AHIEtbSKjdkrBwI2MLnoQBKPO-c2LxS12w
14 years ago by @stroeh
tags: library, detection, signature, duplicate
tutorial 4 (Duplicate Detection).pdf (application/pdf object)
http://webcourse.cs.technion.ac.il/236621/Winter2010-2011/ho/WCFiles/tutorial%204%20(Duplicate%20Detection).pdf
13 years ago by @stroeh
tags: detection, webcourse, tutorial, duplicate
Deduplication - Solr Wiki
http://wiki.apache.org/solr/Deduplication
13 years ago by @stroeh
tags: searchengine, solr, ir, deduplication, duplicate


publications (44)

Accurate discovery of co-derivative documents via duplicate text detection.
Y. Bernstein and J. Zobel. Inf. Syst., 31 (7): 595-609 (2006)
13 years ago by @stroeh
tags: detection, co-derivative, duplicate, derivative
Adaptive duplicate detection using learnable string similarity measures.
M. Bilenko and R. Mooney. KDD, pages 39-48. ACM (2003)
13 years ago by @stroeh
tags: detection, similarity, duplicate
Copy Detection Mechanisms for Digital Documents.
S. Brin, J. Davis, and H. Garcia-Molina. SIGMOD Conference, pages 398-409. ACM Press (1995)
13 years ago by @stroeh
tags: detection, copy, duplicate
Algorithms for duplicate documents
A. Broder. (2005)
13 years ago by @stroeh
tags: detection, algorithms, duplicate
Identifying and Filtering Near-Duplicate Documents.
A. Broder. CPM, volume 1848 of Lecture Notes in Computer Science, pages 1-10. Springer (2000)
13 years ago by @stroeh
tags: detection, near, duplicate
On the resemblance and containment of documents
A. Broder. Compression and Complexity of Sequences, pages 21-29. Salerno, Italy, IEEE Computer Society Press (June 1997)
13 years ago by @stroeh
tags: detection, resemblance, duplicate
Duplicate Data Detection
A. Chowdhury. (2004)
13 years ago by @stroeh
tags: detection, data, duplicate
Collection statistics for fast duplicate document detection.
A. Chowdhury, O. Frieder, D. Grossman, and M. McCabe. ACM Trans. Inf. Syst., 20 (2): 171-191 (2002)
13 years ago by @stroeh
tags: detection, document, duplicate
Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface.
P. Christen. KDD, pages 1065-1068. ACM (2008)
13 years ago by @stroeh
tags: detection, deduplication, duplicate
Online duplicate document detection: signature reliability in a dynamic retrieval environment.
J. Conrad, X. Guo, and C. Schriber. CIKM, pages 443-452. ACM (2003)
13 years ago by @stroeh
tags: detection, document, signature, duplicate
Managing déjà vu: Collection building for the identification of nonidentical duplicate documents.
J. Conrad and C. Schriber. JASIST, 57 (7): 921-932 (2006)
13 years ago by @stroeh
tags: detection, document, collection, building, duplicate
Duplicate detection and record consolidation in large bibliographic databases: the COPAC database experience (vol 24, pg 231, 1998)
S. Cousins. Journal of Information Science, 24 (6): 393 (1998)
13 years ago by @stroeh
tags: database, detection, metadata, bibliographic, duplicate
Detecting phrase-level duplication on the world wide web
D. Fetterly, M. Manasse, and M. Najork. SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 170-177. New York, NY, USA, ACM Press (2005)
13 years ago by @stroeh
tags: detection, document, duplicate
Finding similar files in large document repositories.
G. Forman, K. Eshghi, and S. Chiocchetti. KDD, pages 394-400. ACM (2005)
13 years ago by @stroeh
tags: similar, detection, near, document, duplicate
Detecting Near-Duplicates in Large-Scale Short Text Databases.
C. Gong, Y. Huang, X. Cheng, and S. Bai. PAKDD, volume 5012 of Lecture Notes in Computer Science, pages 877-883. Springer (2008)
13 years ago by @stroeh
tags: detection, near, duplicate
Adaptive near-duplicate detection via similarity learning.
H. Hajishirzi, W. Yih, and A. Kolcz. SIGIR, pages 419-426. ACM (2010)
13 years ago by @stroeh
tags: detection, near, similarity, duplicate
Unsupervised deduplication using cross-field dependencies.
R. Hall, C. Sutton, and A. McCallum. KDD, pages 310-317. ACM (2008)
13 years ago by @stroeh
tags: detection, deduplication, duplicate
Achieving both high precision and high recall in near-duplicate detection.
L. Huang, L. Wang, and X. Li. CIKM, pages 63-72. ACM (2008)
13 years ago by @stroeh
tags: detection, near, duplicate
Similarity and Duplicate Detection System for an OAI Compliant Federated Digital Library
H. Khan, K. Maly, and M. Zubair. Research and Advanced Technology for Digital Libraries, volume 3652 of Lecture Notes in Computer Science. Springer, Berlin/Heidelberg (2005)
13 years ago by @stroeh
tags: library, detection, bibliographic, oai, similarity, duplicate
Lexicon randomization for near-duplicate detection with I-Match
A. Kolcz and A. Chowdhury. J. Supercomput. (September 2008)
13 years ago by @stroeh
tags: detection, near, duplicate


BibSonomy
