Integreat will propose new methods to couple statistical models with deep learning, in order to exploit the advantages of both. We will blend statistical sampling approaches with backpropagation-based feature-explainability mechanisms, and study regularisations that impose interpretability. Integreat will work on attributing significance to each factor in explaining individual predictions made by machine learning (ML) algorithms.
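One simple backpropagation-based attribution mechanism of the kind referred to above is gradient × input. A minimal sketch for a logistic model, where the gradient is available in closed form (all weights and inputs here are illustrative, not from any real model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_times_input(w, b, x):
    """Gradient x input attribution for a logistic model p = sigmoid(w.x + b).

    Since dp/dx_i = p * (1 - p) * w_i, the attribution of feature i is
    x_i * p * (1 - p) * w_i.
    """
    p = sigmoid(w @ x + b)
    return x * p * (1.0 - p) * w

# Illustrative weights and input
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.8, 0.3, 1.2])
attr = grad_times_input(w, b, x)
```

For deep networks the same quantity is obtained by backpropagation instead of a closed-form gradient; the sign of each attribution indicates whether the feature pushed the prediction up or down.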
The methods we will develop originate in classical statistical testing, in-silico controlled knock-outs, and game theory, for example Shapley values, which we have begun to extend to dependent features. We believe that the currently most promising approach is to use non-parametric vine copulae. A further important element of the trustworthiness of ML models is robustness to model assumptions and inference, as well as to adversarial attacks. We will leverage knowledge about specific artefacts (e.g., adversarial backdoor attacks and Clever Hans effects) to develop robust systems that are self-explanatory. We will develop new ways to understand deep learning systems, utilising information-theoretic concepts and knowledge-infused learning. The general limits of explainability in “black boxes” and the role of uncertainty quantification in providing explainability are further themes of our research.
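For small feature sets, Shapley values can be computed exactly by enumerating coalitions. The sketch below uses the standard baseline-imputation value function, which implicitly assumes independent features; it is precisely this imputation step that extensions to dependent features (e.g. via vine copulae) would replace. The model and numbers are illustrative:

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition S are set to their baseline value
    (an independence assumption; dependent-feature methods replace
    this imputation with a conditional distribution).
    """
    d = len(x)
    phi = np.zeros(d)

    def value(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in itertools.combinations(others, k):
                weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Illustrative linear model: here phi_i reduces to w_i * (x_i - baseline_i)
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)
phi = exact_shapley(f, x, baseline)
```

The efficiency property holds by construction: the attributions sum to f(x) minus f(baseline), which is what makes Shapley values attractive for explaining individual predictions.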
For fairness, one should not simply discard factors that can bias decisions, such as gender or race, because the bias would persist in other, correlated variables and become less easily quantifiable. Instead, we will propose approaches that detect and control such bias, starting from counterfactual fairness, which explicitly intervenes on protected features to quantify potential bias. Furthermore, to determine which counterfactuals to test, we will detect factors that influence large numbers of variables in the causal graph representing the domain knowledge.
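The counterfactual intervention on a protected feature can be sketched with a toy linear structural causal model (all coefficients and values are illustrative). Even though the predictor never sees the protected attribute A directly, a nonzero counterfactual gap reveals bias transmitted through the correlated feature X:

```python
# Toy structural causal model:
#   A    : protected attribute (e.g. 0/1)
#   X    = alpha * A + U_x   (correlated, non-protected feature)
#   Yhat : predictor that uses X only
alpha = 1.2  # illustrative causal effect of A on X

def predictor(x):
    return 2.0 * x + 0.5

def counterfactual_prediction(a_obs, x_obs, a_cf):
    """Abduct the noise U_x from the observation, intervene do(A=a_cf),
    then push the counterfactual X through the predictor."""
    u_x = x_obs - alpha * a_obs   # abduction
    x_cf = alpha * a_cf + u_x     # action + prediction
    return predictor(x_cf)

a_obs, x_obs = 1.0, 2.0
y_factual = predictor(x_obs)
y_cf = counterfactual_prediction(a_obs, x_obs, a_cf=0.0)
# Nonzero gap: the prediction depends on A through the correlated X
bias_gap = y_factual - y_cf
```

This illustrates why discarding the protected feature is not enough: the bias reappears through X, and only the explicit intervention do(A=a') makes it quantifiable.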
Key researchers in this research theme: