G. Piatetsky-Shapiro. SIGKDD Explor. Newsl., 1 (2): 59--61 (2000)
Summary: Good summary of and introduction to KDD. Emphasis is placed
on data mining and more "data" (in contrast to document) oriented
applications. Starting point for KDD.
DOI: http://doi.acm.org/10.1145/846183.846197
Abstract
Similarity search in text has proven to be an interesting problem
from the qualitative perspective because of inherent redundancies
and ambiguities in textual descriptions. The methods used in search
engines in order to retrieve documents most similar to user-defined
sets of keywords are not applicable to targets which are medium
to large size documents, because of even greater noise effects stemming
from the presence of a large number of words unrelated to the overall
topic in the document. The inverted representation is the dominant
method for indexing text, but it is not as suitable for document-to-document
similarity search, as for short user-queries. One way of improving
the quality of similarity search is Latent Semantic Indexing (LSI),
which maps the documents from the original set of words to a concept
space. Unfortunately, LSI maps the data into a domain in which it
is not possible to provide effective indexing techniques. In this
paper, we investigate new ways of providing conceptual search among
documents by creating a representation in terms of conceptual word-chains.
This technique also allows effective indexing techniques so that
similarity queries can be performed on large collections of documents
by accessing a small amount of data. We demonstrate that our scheme
outperforms standard textual similarity search on the inverted representation
both in terms of quality and search efficiency.
Note
Summary: Good summary of and introduction to KDD. Emphasis is placed
on data mining and more "data" (in contrast to document) oriented
applications. Starting point for KDD.
%0 Journal Article
%1 Piatetsky00KDDTenYears
%A Piatetsky-Shapiro, Gregory
%D 2000
%I ACM Press
%J SIGKDD Explor. Newsl.
%K Data Indexing, Mining, Concept Similarity Metrics
%N 2
%P 59--61
%R http://doi.acm.org/10.1145/846183.846197
%T Knowledge discovery in databases: 10 years after
%V 1
%X Similarity search in text has proven to be an interesting problem
from the qualitative perspective because of inherent redundancies
and ambiguities in textual descriptions. The methods used in search
engines in order to retrieve documents most similar to user-defined
sets of keywords are not applicable to targets which are medium
to large size documents, because of even greater noise effects stemming
from the presence of a large number of words unrelated to the overall
topic in the document. The inverted representation is the dominant
method for indexing text, but it is not as suitable for document-to-document
similarity search, as for short user-queries. One way of improving
the quality of similarity search is Latent Semantic Indexing (LSI),
which maps the documents from the original set of words to a concept
space. Unfortunately, LSI maps the data into a domain in which it
is not possible to provide effective indexing techniques. In this
paper, we investigate new ways of providing conceptual search among
documents by creating a representation in terms of conceptual word-chains.
This technique also allows effective indexing techniques so that
similarity queries can be performed on large collections of documents
by accessing a small amount of data. We demonstrate that our scheme
outperforms standard textual similarity search on the inverted representation
both in terms of quality and search efficiency.
@article{Piatetsky00KDDTenYears,
abstract = {Similarity search in text has proven to be an interesting problem
from the qualitative perspective because of inherent redundancies
and ambiguities in textual descriptions. The methods used in search
engines in order to retrieve documents most similar to user-defined
sets of keywords are not applicable to targets which are medium
to large size documents, because of even greater noise effects stemming
from the presence of a large number of words unrelated to the overall
topic in the document. The inverted representation is the dominant
method for indexing text, but it is not as suitable for document-to-document
similarity search, as for short user-queries. One way of improving
the quality of similarity search is Latent Semantic Indexing (LSI),
which maps the documents from the original set of words to a concept
space. Unfortunately, LSI maps the data into a domain in which it
is not possible to provide effective indexing techniques. In this
paper, we investigate new ways of providing conceptual search among
documents by creating a representation in terms of conceptual word-chains.
This technique also allows effective indexing techniques so that
similarity queries can be performed on large collections of documents
by accessing a small amount of data. We demonstrate that our scheme
outperforms standard textual similarity search on the inverted representation
both in terms of quality and search efficiency.},
added-at = {2006-07-16T10:28:56.000+0200},
author = {Piatetsky-Shapiro, Gregory},
biburl = {https://www.bibsonomy.org/bibtex/2f696ed24ccfe7b731e306a01a2fd3dbd/grani},
doi = {10.1145/846183.846197},
interhash = {fb51f5854b03b7a35677869be1536284},
intrahash = {f696ed24ccfe7b731e306a01a2fd3dbd},
journal = {SIGKDD Explor. Newsl.},
keywords = {Data Indexing, Mining, Concept Similarity Metrics},
note = {Summary: Good summary of and introduction to KDD. Emphasis is placed
on data mining and more "data" (in contrast to document) oriented
applications. Starting point for KDD.},
number = 2,
pages = {59--61},
pdf = {C:\Dokumente und Einstellungen\mgrani.KNOW\Eigene Dateien\private\PhD\Text\Representation.and.Metrics\similarityMeasures\publications\aggarwal01effectiveSimilaritySearch.pdf},
publisher = {ACM Press},
timestamp = {2006-07-16T10:28:56.000+0200},
title = {Knowledge discovery in databases: 10 years after},
volume = 1,
year = 2000
}