Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs

S. Sharifirad, B. Jafarpour, and S. Matwin. Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 107--114. Brussels, Belgium, Association for Computational Linguistics, (October 2018)
DOI: 10.18653/v1/W18-5114

Abstract

Text classification models have been heavily utilized for a wide range of natural language processing problems. Like any other machine learning model, these classifiers depend heavily on the size and quality of the training dataset. Insufficient and imbalanced datasets lead to poor performance. One solution to poor datasets is to take advantage of world knowledge in the form of knowledge graphs to improve the training data. In this paper, we use ConceptNet and Wikidata to improve sexist tweet classification by two methods: (1) text augmentation and (2) text generation. In our text generation approach, we generate new tweets by replacing words using data acquired from ConceptNet relations in order to increase the size of our training set. This method is very helpful with frustratingly small datasets, preserves the label, and increases diversity. In our text augmentation approach, the number of tweets in each class remains the same, but each tweet is extended by concatenating its words with words extracted from their ConceptNet relations and with their descriptions extracted from Wikidata. Our experiments show that our approach improves sexist tweet classification significantly across all of our machine learning models. Our approach can be readily applied to any other small dataset, such as hate speech or abusive language, and to any text classification problem using any machine learning model.
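
The two methods described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionaries `TOY_RELATIONS` and `TOY_DESCRIPTIONS` are hypothetical in-memory stand-ins for ConceptNet relations and Wikidata descriptions, the function names `generate` and `augment` are invented for illustration, and whitespace tokenization is an assumption.

```python
from itertools import product

# Hypothetical toy stand-in for ConceptNet relations (not real ConceptNet data).
TOY_RELATIONS = {
    "movie": ["film"],
    "great": ["excellent", "wonderful"],
}

# Hypothetical toy stand-in for Wikidata descriptions (invented text).
TOY_DESCRIPTIONS = {
    "movie": "visual work shown on a screen",
}


def generate(text, relations, limit=10):
    """Text generation: build new texts by replacing words with related words.

    The label of the original text is assumed to carry over to each variant.
    """
    tokens = text.split()
    # For each token, the original word plus any related words are candidates.
    options = [[tok] + relations.get(tok.lower(), []) for tok in tokens]
    variants = []
    for combo in product(*options):
        candidate = " ".join(combo)
        if candidate != text:  # skip the unchanged original
            variants.append(candidate)
        if len(variants) >= limit:
            break
    return variants


def augment(text, relations, descriptions):
    """Text augmentation: keep one text, but concatenate related words
    and descriptions onto it, so the number of texts stays the same."""
    extra = []
    for tok in text.split():
        key = tok.lower()
        extra.extend(relations.get(key, []))
        if key in descriptions:
            extra.append(descriptions[key])
    return " ".join([text] + extra)


if __name__ == "__main__":
    sample = "the movie was great"
    for variant in generate(sample, TOY_RELATIONS):
        print(variant)
    print(augment(sample, TOY_RELATIONS, TOY_DESCRIPTIONS))
```

In this sketch, generation turns one input into several training examples, while augmentation returns exactly one longer example per input, which mirrors the distinction the abstract draws between the two methods.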

Links and resources

BibTeX key: sharifirad-etal-2018-boosting
entry type: inproceedings
address: Brussels, Belgium
booktitle: Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
year: 2018
month: oct
pages: 107--114
publisher: Association for Computational Linguistics
DOI: 10.18653/v1/W18-5114
url: https://www.aclweb.org/anthology/W18-5114

Cite this publication

%0 Conference Paper
%1 sharifirad-etal-2018-boosting
%A Sharifirad, Sima
%A Jafarpour, Borna
%A Matwin, Stan
%B Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)
%C Brussels, Belgium
%D 2018
%I Association for Computational Linguistics
%K proposal-knowledge
%P 107--114
%R 10.18653/v1/W18-5114
%T Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs
%U https://www.aclweb.org/anthology/W18-5114
%X Text classification models have been heavily utilized for a slew of interesting natural language processing problems. Like any other machine learning model, these classifiers are very dependent on the size and quality of the training dataset. Insufficient and imbalanced datasets will lead to poor performance. An interesting solution to poor datasets is to take advantage of the world knowledge in the form of knowledge graphs to improve our training data. In this paper, we use ConceptNet and Wikidata to improve sexist tweet classification by two methods (1) text augmentation and (2) text generation. In our text generation approach, we generate new tweets by replacing words using data acquired from ConceptNet relations in order to increase the size of our training set, this method is very helpful with frustratingly small datasets, preserves the label and increases diversity. In our text augmentation approach, the number of tweets remains the same but their words are augmented (concatenation) with words extracted from their ConceptNet relations and their description extracted from Wikidata. In our text augmentation approach, the number of tweets in each class remains the same but the range of each tweet increases. Our experiments show that our approach improves sexist tweet classification significantly in our entire machine learning models. Our approach can be readily applied to any other small dataset size like hate speech or abusive language and text classification problem using any machine learning model.

@inproceedings{sharifirad-etal-2018-boosting,
  abstract = {Text classification models have been heavily utilized for a slew of interesting natural language processing problems. Like any other machine learning model, these classifiers are very dependent on the size and quality of the training dataset. Insufficient and imbalanced datasets will lead to poor performance. An interesting solution to poor datasets is to take advantage of the world knowledge in the form of knowledge graphs to improve our training data. In this paper, we use ConceptNet and Wikidata to improve sexist tweet classification by two methods (1) text augmentation and (2) text generation. In our text generation approach, we generate new tweets by replacing words using data acquired from ConceptNet relations in order to increase the size of our training set, this method is very helpful with frustratingly small datasets, preserves the label and increases diversity. In our text augmentation approach, the number of tweets remains the same but their words are augmented (concatenation) with words extracted from their ConceptNet relations and their description extracted from Wikidata. In our text augmentation approach, the number of tweets in each class remains the same but the range of each tweet increases. Our experiments show that our approach improves sexist tweet classification significantly in our entire machine learning models. Our approach can be readily applied to any other small dataset size like hate speech or abusive language and text classification problem using any machine learning model.},
  added-at = {2020-07-28T14:17:05.000+0200},
  address = {Brussels, Belgium},
  author = {Sharifirad, Sima and Jafarpour, Borna and Matwin, Stan},
  biburl = {https://www.bibsonomy.org/bibtex/2f373379a078fc69d51cd20d63e77047e/albinzehe},
  booktitle = {Proceedings of the 2nd Workshop on Abusive Language Online ({ALW}2)},
  doi = {10.18653/v1/W18-5114},
  interhash = {0c8edd1ca961ecdb57aa8715b636c1e9},
  intrahash = {f373379a078fc69d51cd20d63e77047e},
  keywords = {proposal-knowledge},
  month = oct,
  pages = {107--114},
  publisher = {Association for Computational Linguistics},
  timestamp = {2020-07-28T14:17:05.000+0200},
  title = {Boosting Text Classification Performance on Sexist Tweets by Text Augmentation and Text Generation Using a Combination of Knowledge Graphs},
  url = {https://www.aclweb.org/anthology/W18-5114},
  year = 2018
}
