
Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation

Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 200–208. Online: Association for Computational Linguistics, November 2020
DOI: 10.18653/v1/2020.wnut-1.26

Abstract

Data augmentation has been shown to be effective in providing more training data for machine learning and in producing more robust classifiers. However, for some problems there may be multiple augmentation heuristics, and the choice of which one to use can significantly impact the success of training. In this work, we propose a metric for evaluating augmentation heuristics; specifically, we quantify the extent to which an example is "hard to distinguish" by considering the difference between the distributions of the augmented samples of different classes. Experiments with multiple heuristics on two prediction tasks (positive/negative sentiment and verbosity/conciseness) validate our claims by revealing the connection between the distribution difference of different classes and the classification accuracy.
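The abstract's core idea, measuring how far apart the augmented samples of different classes are, can be illustrated with a small sketch. The code below is a hedged approximation, not the paper's actual metric: it assumes token-frequency distributions as the class representation and Jensen-Shannon divergence as the distance, both of which are illustrative choices; the `aug_pos`/`aug_neg` samples are invented.

```python
import math
from collections import Counter

def token_distribution(texts):
    """Empirical unigram distribution over a list of texts."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, bounded in [0, 1])
    between two discrete distributions given as dicts."""
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)
    keys = set(p) | set(q)
    # tiny smoothing so every key has nonzero mass in both distributions
    p = {k: p.get(k, 1e-12) for k in keys}
    q = {k: q.get(k, 1e-12) for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical augmented samples for the two sentiment classes.
aug_pos = ["great movie really great", "loved it"]
aug_neg = ["terrible movie really bad", "hated it"]

# Larger divergence = classes easier to distinguish after augmentation.
score = js_divergence(token_distribution(aug_pos),
                      token_distribution(aug_neg))
```

Under this reading, an augmentation heuristic whose output yields a small divergence produces examples that are "hard to distinguish" across classes, which the paper connects to classification accuracy.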

Description

Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation - ACL Anthology

Tags

community
