Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words.

V. Sridhar. VS@HLT-NAACL, pages 192-200. The Association for Computational Linguistics, (2015)
DOI: 10.3115/v1/W15-1526

Abstract

We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA) need aggregation of short messages to avoid data sparsity in short documents, our framework works on large amounts of raw short texts (billions of words). In contrast with other topic modeling frameworks that use word cooccurrence statistics, our framework uses a vector space model that overcomes the issue of sparse word co-occurrence patterns. We demonstrate that our framework outperforms LDA on short texts through both subjective and objective evaluation. We also show the utility of our framework in learning topics and classifying short texts on Twitter data for English, Spanish, French, Portuguese and Russian.

Cite this publication

@inproceedings{conf/naacl/Sridhar15a,
  abstract = {We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA) need aggregation of short messages to avoid data sparsity in short documents, our framework works on large amounts of raw short texts (billions of words). In contrast with other topic modeling frameworks that use word cooccurrence statistics, our framework uses a vector space model that overcomes the issue of sparse word co-occurrence patterns. We demonstrate that our framework outperforms LDA on short texts through both subjective and objective evaluation. We also show the utility of our framework in learning topics and classifying short texts on Twitter data for English, Spanish, French, Portuguese and Russian.},
  added-at = {2020-02-06T12:48:32.000+0100},
  author = {Sridhar, Vivek Kumar Rangarajan},
  biburl = {https://www.bibsonomy.org/bibtex/250b9889e5cfd4bf64a70aa5544435ef8/ghagerer},
  booktitle = {VS@HLT-NAACL},
  crossref = {conf/naacl/2015vs},
  doi = {10.3115/v1/W15-1526},
  editor = {Blunsom, Phil and Cohen, Shay B. and Dhillon, Paramveer S. and Liang, Percy},
  ee = {https://www.aclweb.org/anthology/W15-1526/},
  interhash = {f5daad5965c9f5d6a819da085082c05b},
  intrahash = {50b9889e5cfd4bf64a70aa5544435ef8},
  isbn = {978-1-941643-46-4},
  keywords = {bag-of-concepts clustering document-embeddings gmms topic-modeling unsupervised word-vectors},
  pages = {192-200},
  publisher = {The Association for Computational Linguistics},
  timestamp = {2020-06-24T14:49:07.000+0200},
  title = {Unsupervised Topic Modeling for Short Texts Using Distributed Representations of Words.},
  url = {https://www.aclweb.org/anthology/W15-1526.pdf},
  year = 2015
}

BibSonomy
