Abstract
Interpretable representations are the backbone of many black-box explainers.
They translate the low-level data representation necessary for good predictive
performance into high-level human-intelligible concepts used to convey the
explanation. Notably, the explanation type and its cognitive complexity are
directly controlled by the interpretable representation, making it possible to
target a particular audience and use case. However, many explainers that rely on
interpretable representations overlook their merit and fall back on default
solutions, which may introduce implicit assumptions, thereby degrading the
explanatory power of such techniques. To address this problem, we study
properties of interpretable representations that encode presence and absence of
human-comprehensible concepts. We show how they are operationalised for
tabular, image and text data, discussing their strengths and weaknesses.
Finally, we analyse their explanatory properties in the context of tabular
data, where a linear model is used to quantify the importance of interpretable
concepts.
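
To make the setting concrete, below is a minimal, hedged sketch (not the authors' implementation) of the pipeline the abstract describes for tabular data: features are mapped to binary human-readable concepts (presence = 1, absence = 0), perturbed instances are scored by the black box, and a linear surrogate quantifies concept importance. All names, thresholds and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy tabular data and an opaque black-box model (stand-in for any predictor).
X = rng.normal(size=(500, 3))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Interpretable representation: one binary "concept" per feature, here simply
# indicating whether the value exceeds the feature's median (an assumed binning).
medians = np.median(X, axis=0)

def to_concepts(data):
    return (data > medians).astype(int)

# Perturb the explained instance, encode the perturbations as concept vectors,
# and query the black box for its predicted probability of the positive class.
instance = X[0]
perturbed = instance + rng.normal(scale=0.5, size=(1000, 3))
Z = to_concepts(perturbed)
preds = black_box.predict_proba(perturbed)[:, 1]

# Linear surrogate fitted in the interpretable (binary concept) space;
# its coefficients serve as importance scores of the concepts.
surrogate = Ridge(alpha=1.0).fit(Z, preds)
for i, w in enumerate(surrogate.coef_):
    print(f"concept 'feature {i} above median': importance {w:+.3f}")
```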