Article

Studying bias in visual features through the lens of optimal transport

, , , , and .
ECML-PKDD 2023 Journal Track, 38 (1): 281--312 (2023)
DOI: 10.1007/s10618-023-00986-w

Abstract

Computer vision systems are employed in a variety of high-impact applications. However, making them trustworthy requires methods for the detection of potential biases in their training data, before models learn to harm already disadvantaged groups in downstream applications. Image data are typically represented via extracted features, which can be hand-crafted or pre-trained neural network embeddings. In this work, we introduce a framework for bias discovery given such features that is based on optimal transport theory; it uses the (quadratic) Wasserstein distance to quantify disparity between the feature distributions of two demographic groups (e.g., women vs. men). In this context, we show that the Kantorovich potentials of the images, which are a byproduct of computing the Wasserstein distance and act as “transportation prices”, can serve as bias scores by indicating which images might exhibit distinct biased characteristics. We thus introduce a visual dataset exploration pipeline that helps auditors identify common characteristics across high- or low-scored images as potential sources of bias. We conduct a case study to identify prospective gender biases and demonstrate theoretically-derived properties with experiments on the CelebA and Biased MNIST datasets.

Keywords: Fairness and bias · Computer vision · Optimal transport · Dataset exploration · Tools and frameworks · Wasserstein distance
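As a rough illustration of the approach summarized above (not the authors' released code), the sketch below uses the POT library to compute the squared 2-Wasserstein distance between the feature clouds of two demographic groups and reads off the dual (Kantorovich) potentials as per-image bias scores. The group sizes, feature dimensionality, and the random placeholder features are assumptions made purely for the example.

# Illustrative sketch: Kantorovich potentials as per-image bias scores.
# Assumes the POT library (pip install pot); features are random placeholders.
import numpy as np
import ot

rng = np.random.default_rng(0)
X_a = rng.normal(size=(300, 64))            # features of group A (e.g., "women")
X_b = rng.normal(loc=0.3, size=(200, 64))   # features of group B (e.g., "men")

a = np.full(len(X_a), 1.0 / len(X_a))       # uniform weights on group-A images
b = np.full(len(X_b), 1.0 / len(X_b))       # uniform weights on group-B images

M = ot.dist(X_a, X_b)                       # squared Euclidean cost matrix

# Solve the discrete OT problem; log=True also returns the dual potentials.
plan, log = ot.emd(a, b, M, log=True)
w2_squared = log['cost']                    # squared 2-Wasserstein distance
pot_a, pot_b = log['u'], log['v']           # per-image "transportation prices"

# Images with the highest potentials are candidates for manual inspection.
top_a = np.argsort(pot_a)[::-1][:10]
print(f"W2^2 = {w2_squared:.4f}; top-scored group-A indices: {top_a}")

In this reading, the exploration pipeline would surface the images behind the highest- and lowest-scored indices so that an auditor can look for shared characteristics that might explain the distributional disparity.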

Users

  • @entoutsi
  • @aiml_group
  • @l3s
