
Significant Phrases Detection

M. Hart and M. Bautin. (2007)

Abstract

The problem of determining key words and phrases which best characterize a text document has important applications such as building a compact index for a large-scale text processing system, or using a keyword set for summarization and topic detection. We approached this problem from two perspectives. Our knowledge-poor approach is based on statistical collocation detection using the t-test and likelihood ratio, and applying latent semantic analysis to identify terms important in a particular document. The knowledge-rich approach addresses the problem using noun phrase chunking and coreference resolution. Both approaches use a decision tree classifier to decide whether a given phrase is a key word, based on the set of calculated features. We have built prototypes and compared results of these two approaches.

SIPs are not infallible and may produce phrases that have no bearing on the content in general. Therefore, it is clear that there are phrases in text that are significant inasmuch as they signify the content of the document. We pose this research question: by selecting and displaying significant phrases, are we able to give users a sense of the general ideas, better understanding, and increased search power of the text? What properties do significant phrases possess and how can we identify them? We will approach this problem from two perspectives: Knowledge Poor and Knowledge Rich. Knowledge Poor techniques rely on shallow text processing, which primarily utilizes information about word and collocation frequencies. From the Knowledge Rich perspective, we hope to use many computational linguistic techniques to intelligently parse documents and rank words to discover meaningful phrases. We have compared these two approaches in selecting significant phrases, and found that they should be combined to augment each other. The knowledge poor approach is robust and fast, but the knowledge rich approach has the advantage of tackling phrases relevant to the contents more precisely.
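
The abstract names the t-test as one of the statistics behind the knowledge-poor collocation detection. A minimal sketch of how such a score could be computed for adjacent word pairs is given here. It follows the standard textbook form of the t-test for collocations and is not taken from the authors' prototype; the function name `bigram_t_scores` and the toy sentence are hypothetical, and tokenization is assumed to be plain whitespace splitting.

```python
import math
from collections import Counter


def bigram_t_scores(tokens):
    """Return {(w1, w2): t-score} for every adjacent word pair in tokens.

    Assumption: the null hypothesis is that w1 and w2 occur independently,
    so the expected bigram probability is P(w1) * P(w2).
    """
    n = len(tokens)
    if n < 2:
        return {}
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        observed = count / n                       # sample mean of the bigram indicator
        expected = (unigrams[w1] / n) * (unigrams[w2] / n)
        variance = observed * (1 - observed)       # Bernoulli variance
        scores[(w1, w2)] = (observed - expected) / math.sqrt(variance / n)
    return scores


if __name__ == "__main__":
    text = "new york is big new york is old the big old city is new"
    ranked = sorted(bigram_t_scores(text.split()).items(),
                    key=lambda item: item[1], reverse=True)
    for pair, score in ranked[:3]:
        print(pair, round(score, 3))
```

A higher score means the pair co-occurs more often than independence would predict. On a corpus this small the values are far below any conventional significance threshold, so the sketch only illustrates the ranking idea.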

Description

Algorithm for keyword detection based on SIPs (Statistically Improbable Phrases)

Links and resources

BibTeX key: BautinHart2007
Entry type: article
Year: 2007

Cite this publication

%0 Journal Article
%1 BautinHart2007
%A Hart, Michael
%A Bautin, Mikhail
%D 2007
%K algorithms detection idiom keywords phrases similarity
%T Significant Phrases Detection
%X The problem of determining key words and phrases which best characterize a text document has important applications such as building a compact index for a large-scale text processing system, or using a keyword set for summarization and topic detection. We approached this problem from two perspectives. Our knowledge-poor approach is based on statistical collocation detection using the t-test and likelihood ratio, and applying latent semantic analysis to identify terms important in a particular document. The knowledge-rich approach addresses the problem using noun phrase chunking and coreference resolution. Both approaches use a decision tree classifier to decide whether a given phrase is a key word, based on the set of calculated features. We have built prototypes and compared results of these two approaches. SIPs are not infallible and may produce phrases that have no bearing on the content in general. Therefore, it is clear that there are phrases in text that are significant inasmuch as they signify the content of the document. We pose this research question: by selecting and displaying significant phrases, are we able to give users a sense of the general ideas, better understanding, and increased search power of the text? What properties do significant phrases possess and how can we identify them? We will approach this problem from two perspectives: Knowledge Poor and Knowledge Rich. Knowledge Poor techniques rely on shallow text processing, which primarily utilizes information about word and collocation frequencies. From the Knowledge Rich perspective, we hope to use many computational linguistic techniques to intelligently parse documents and rank words to discover meaningful phrases. We have compared these two approaches in selecting significant phrases, and found that they should be combined to augment each other. The knowledge poor approach is robust and fast, but the knowledge rich approach has the advantage of tackling phrases relevant to the contents more precisely.

@article{BautinHart2007,
  abstract = {The problem of determining key words and phrases which best characterize a text document has important applications such as building a compact index for a large-scale text processing system, or using a keyword set for summarization and topic detection. We approached this problem from two perspectives. Our knowledge-poor approach is based on statistical collocation detection using the t-test and likelihood ratio, and applying latent semantic analysis to identify terms important in a particular document. The knowledge-rich approach addresses the problem using noun phrase chunking and coreference resolution. Both approaches use a decision tree classifier to decide whether a given phrase is a key word, based on the set of calculated features. We have built prototypes and compared results of these two approaches. SIPs are not infallible and may produce phrases that have no bearing on the content in general. Therefore, it is clear that there are phrases in text that are significant inasmuch as they signify the content of the document. We pose this research question: by selecting and displaying significant phrases, are we able to give users a sense of the general ideas, better understanding, and increased search power of the text? What properties do significant phrases possess and how can we identify them? We will approach this problem from two perspectives: Knowledge Poor and Knowledge Rich. Knowledge Poor techniques rely on shallow text processing, which primarily utilizes information about word and collocation frequencies. From the Knowledge Rich perspective, we hope to use many computational linguistic techniques to intelligently parse documents and rank words to discover meaningful phrases. We have compared these two approaches in selecting significant phrases, and found that they should be combined to augment each other. The knowledge poor approach is robust and fast, but the knowledge rich approach has the advantage of tackling phrases relevant to the contents more precisely.},
  added-at = {2010-12-23T18:55:37.000+0100},
  author = {Hart, Michael and Bautin, Mikhail},
  biburl = {https://www.bibsonomy.org/bibtex/27c4b26cc63190a1bc27161b5f425b2f4/dzibold},
  description = {Algorithm for key words detection based on SIPs (Statistically Improbable Phrases)},
  interhash = {635c90448b1f1a8c5b000ea4f578c06a},
  intrahash = {7c4b26cc63190a1bc27161b5f425b2f4},
  keywords = {algorithms detection idiom keywords phrases similarity},
  timestamp = {2010-12-23T18:55:38.000+0100},
  title = {Significant Phrases Detection},
  year = 2007
}

BibSonomy
