Abstract
Machine learning is a rapidly evolving technology with
manifold benefits. At its core lies the mapping between samples and corresponding target labels (SL-Mappings). Such
mappings can originate from labeled dataset samples or from
predictions generated during model inference. The correctness
of SL-Mappings is crucial, both during training and for model
predictions, especially when considering poisoning attacks.
Existing works from the dataset cleaning and prediction confidence scoring
domains are standalone: none offers a dual-use tool that provides an SL-Mapping score for both tasks, so covering both requires separate tools, which is impractical. Moreover,
these works have drawbacks, e.g., dependence on specific
model architectures, reliance on large datasets that may
not be accessible, or the lack of a meaningful confidence score.
In this paper, we introduce LabelTrust, a versatile tool designed to generate confidence scores for SL-Mappings. We
propose pipelines facilitating dataset cleaning and confidence
scoring, mitigating the limitations of existing standalone approaches from each domain. To this end, LabelTrust leverages
a Siamese network trained via few-shot learning, which requires
only a few clean samples and is agnostic to datasets and model
architectures. We demonstrate LabelTrust’s efficacy in detecting poisoning attacks within samples and predictions alike,
with a modest one-time training overhead of 34.56 seconds
and an evaluation time of less than 1 second per SL-Mapping.