Revisiting Feature Selection with Data Complexity

2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), pages 211-216. IEEE, October 2020.
DOI: 10.1109/BIBE50027.2020.00042

Abstract

The identification of biomarkers or predictive features that are indicative of a specific biological or disease state is a major research topic in biomedical applications. Several feature selection (FS) methods, ranging from simple univariate methods to recent deep-learning methods, have been proposed to select a minimal set of the most predictive features. However, the main question of which method to use, and when, remains unanswered. We study this problem from the perspective of data complexity and ask whether data complexity measures can be used to guide the selection of the most suitable method. We perform a comparative study of 11 feature selection methods over 27 publicly available datasets, evaluated over a range of numbers of selected features, using classification as the downstream task. We show empirically that, with regard to classification, the performance of all studied feature selection methods is highly correlated with the error rate of a nearest-neighbor-based classifier. We also argue that the studied complexity measures are unsuitable for determining the optimal number of relevant features. After looking closely at several other aspects, we provide recommendations for choosing a particular FS method for a given dataset.
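To make the nearest-neighbor error rate mentioned in the abstract concrete, here is a minimal sketch. It assumes a leave-one-out 1-nearest-neighbor classifier with Euclidean distance; the abstract does not specify the exact variant, so that choice is an assumption. The function names `loo_1nn_error` and `select_columns` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def loo_1nn_error(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier.

    Assumption: Euclidean distance; ties go to the first neighbor found.
    """
    n = len(X)
    if n < 2:
        raise ValueError("need at least two samples")
    errors = 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i:
                continue  # leave the query sample out
            d = math.dist(X[i], X[j])
            if d < best_d:
                best_j, best_d = j, d
        if y[best_j] != y[i]:
            errors += 1
    return errors / n


def select_columns(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[c] for c in cols] for row in X]


# Made-up toy data: feature 0 separates the classes, feature 1 is noise
# on a much larger scale.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.2], [1.2, 8.9]]
y = [0, 0, 0, 1, 1, 1]

err_all = loo_1nn_error(X, y)                       # both features
err_f0 = loo_1nn_error(select_columns(X, [0]), y)   # informative feature only
err_f1 = loo_1nn_error(select_columns(X, [1]), y)   # noise feature only
print(err_all, err_f0, err_f1)
```

On this toy data the noisy feature dominates the distance, so the error rate with both features is high, while keeping only the informative feature drives it to zero. This only illustrates how such an error rate responds to the chosen feature subset; it does not reproduce the study's methods or results.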
