Description

Syntactic clustering of the Web. An approach for finding very similar documents on the web. The main steps are:
* Shingling of each document.
* Computation of a digest based on the shingles.
* Computation of super shingles.
* Filtering.
* Clustering in parts (division into tiles, then merging).
About 30,000,000 documents were analyzed. This approach is used in the paper of Vedran
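
The first two steps, shingling and the shingle-based digest, can be illustrated with a minimal Python sketch. It is not the implementation from the paper. The function names, the shingle width `w=4`, the digest size `s=8`, and the use of SHA-1 as the fingerprint function are all assumptions chosen for illustration. A shingle here is a run of `w` consecutive words, and the digest keeps the `s` smallest shingle fingerprints.

```python
import hashlib


def shingles(text, w=4):
    """Return the set of w-word shingles of a text (w is an assumed default)."""
    tokens = text.lower().split()
    if len(tokens) < w:
        # Short texts yield one shingle containing all tokens, or none if empty.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Jaccard overlap of the two shingle sets, a value between 0.0 and 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def fingerprint(shingle):
    """Map a shingle to a 64-bit integer (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(" ".join(shingle).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(text, w=4, s=8):
    """Digest of a document: its s smallest shingle fingerprints, sorted."""
    return sorted(fingerprint(sh) for sh in shingles(text, w))[:s]


if __name__ == "__main__":
    doc_a = "a rose is a rose is a rose"
    doc_b = "a rose is a rose is a flower"
    print(len(shingles(doc_a)))
    print(resemblance(doc_a, doc_b))
    print(sketch(doc_a) == sketch(doc_a))
```

Comparing the short digests instead of the full shingle sets is what makes the comparison feasible for millions of documents: two documents whose digests share many fingerprints are likely to have a high resemblance.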
