Abstract
It is well understood that classification algorithms, such as those used to
decide on loan applications, cannot be evaluated for fairness without taking
context into account. We examine what can be learned from a fairness oracle
equipped with an underlying understanding of ``true'' fairness. The oracle
takes as input a (context, classifier) pair satisfying an arbitrary fairness
definition, and accepts or rejects the pair according to whether the classifier
satisfies the underlying fairness truth. Our principal conceptual result is an
extraction procedure that learns the underlying truth; moreover, the procedure
can learn an approximation to this truth given access to a weak form of the
oracle. Since every ``truly fair'' classifier induces a coarse metric, in which
those receiving the same decision are at distance zero from one another and
those receiving different decisions are at distance one, this extraction
process provides the basis for ensuring a rough form of metric fairness, also
known as individual fairness. Our principal technical result is a higher
fidelity extractor under a mild technical constraint on the weak oracle's
conception of fairness. Our framework permits the scenario in which many
classifiers, with differing outcomes, may all be considered fair. Our results
have implications for interpretability -- a highly desired but poorly defined
property of classification systems that endeavors to permit a human arbiter to
reject classifiers deemed to be ``unfair'' or illegitimately derived.
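
The coarse metric mentioned in the abstract can be illustrated with a minimal sketch. This is only a reading of the sentence "those receiving the same decision are at distance zero ... those receiving different decisions are at distance one", not the paper's extraction procedure. The names `induced_coarse_metric` and `approve_loan`, and the income threshold, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Hashable, TypeVar

X = TypeVar("X")


def induced_coarse_metric(
    classifier: Callable[[X], Hashable],
) -> Callable[[X, X], int]:
    """Build the 0/1 distance induced by a classifier's decisions.

    Two individuals are at distance 0 if they receive the same decision
    and at distance 1 otherwise.
    """

    def distance(x: X, y: X) -> int:
        return 0 if classifier(x) == classifier(y) else 1

    return distance


# Hypothetical toy classifier (an assumption, not from the paper):
# approve a loan when income is at least 50,000.
def approve_loan(income: int) -> bool:
    return income >= 50_000


d = induced_coarse_metric(approve_loan)
print(d(30_000, 45_000))  # both rejected, so distance 0
print(d(30_000, 60_000))  # different decisions, so distance 1
```

Because distinct individuals can sit at distance zero, this is strictly speaking a pseudometric; it is symmetric and satisfies the triangle inequality because it only compares decision classes.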
Description
[2004.01840] Abstracting Fairness: Oracles, Metrics, and Interpretability