Integreat will propose new methods to couple statistical models with deep learning, in order to exploit the advantages of both. We will blend statistical sampling approaches with backpropagation-based feature-explainability mechanisms, and study regularisations that impose interpretability. Integreat will work on attributing significance to each factor in explaining individual predictions made by machine learning (ML) algorithms.
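One simple backpropagation-based attribution mechanism of the kind referred to above is gradient × input. A minimal sketch for a logistic model, where the gradient is available in closed form (all weights and inputs here are illustrative, not from any real model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_times_input(w, b, x):
    """Gradient x input attribution for a logistic model p = sigmoid(w.x + b).

    Since dp/dx_i = p * (1 - p) * w_i, the attribution of feature i is
    x_i * p * (1 - p) * w_i.
    """
    p = sigmoid(w @ x + b)
    return x * p * (1.0 - p) * w

# Illustrative weights and input
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.8, 0.3, 1.2])
attr = grad_times_input(w, b, x)
```

For deep networks the same quantity is obtained by backpropagation instead of a closed-form gradient; the sign of each attribution indicates whether the feature pushed the prediction up or down.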
The methods we will develop originate in classical statistical testing, in-silico controlled knock-outs, and game theory, for example Shapley values, which we have begun to extend to dependent features. We believe that the currently most promising approach is to use non-parametric vine copulae. A further important element of the trustworthiness of ML models is robustness to model assumptions and inference, as well as to adversarial attacks. We will leverage knowledge about specific artefacts (e.g., adversarial backdoor attacks and Clever Hans effects) to develop robust systems that are self-explanatory. We will develop new ways to understand deep learning systems, utilising information-theoretic concepts and knowledge-infused learning. The general limits of explainability in “black boxes” and the role of uncertainty quantification in providing explainability are further themes of our research.
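For small feature sets, Shapley values can be computed exactly by enumerating coalitions. The sketch below uses the standard baseline-imputation value function, which implicitly assumes independent features; it is precisely this imputation step that extensions to dependent features (e.g. via vine copulae) would replace. The model and numbers are illustrative:

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition S are set to their baseline value
    (an independence assumption; dependent-feature methods replace
    this imputation with a conditional distribution).
    """
    d = len(x)
    phi = np.zeros(d)

    def value(S):
        z = baseline.copy()
        z[list(S)] = x[list(S)]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in itertools.combinations(others, k):
                weight = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Illustrative linear model: here phi_i reduces to w_i * (x_i - baseline_i)
w = np.array([2.0, -1.0, 0.5])
f = lambda z: float(w @ z)
x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)
phi = exact_shapley(f, x, baseline)
```

The efficiency property holds by construction: the attributions sum to f(x) minus f(baseline), which is what makes Shapley values attractive for explaining individual predictions.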
For fairness, one should not simply discard factors that can bias decisions, such as gender or race, because the bias would persist in other, correlated variables and become less easily quantifiable. Instead, we will propose approaches that detect and control such bias, starting from counterfactual fairness, which explicitly intervenes on protected features to quantify potential bias. Furthermore, to determine which counterfactuals to test, we will detect factors that influence large numbers of variables in the causal graph representing the domain knowledge.
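The counterfactual intervention on a protected feature can be sketched with a toy linear structural causal model (all coefficients and values are illustrative). Even though the predictor never sees the protected attribute A directly, a nonzero counterfactual gap reveals bias transmitted through the correlated feature X:

```python
# Toy structural causal model:
#   A    : protected attribute (e.g. 0/1)
#   X    = alpha * A + U_x   (correlated, non-protected feature)
#   Yhat : predictor that uses X only
alpha = 1.2  # illustrative causal effect of A on X

def predictor(x):
    return 2.0 * x + 0.5

def counterfactual_prediction(a_obs, x_obs, a_cf):
    """Abduct the noise U_x from the observation, intervene do(A=a_cf),
    then push the counterfactual X through the predictor."""
    u_x = x_obs - alpha * a_obs   # abduction
    x_cf = alpha * a_cf + u_x     # action + prediction
    return predictor(x_cf)

a_obs, x_obs = 1.0, 2.0
y_factual = predictor(x_obs)
y_cf = counterfactual_prediction(a_obs, x_obs, a_cf=0.0)
# Nonzero gap: the prediction depends on A through the correlated X
bias_gap = y_factual - y_cf
```

This illustrates why discarding the protected feature is not enough: the bias reappears through X, and only the explicit intervention do(A=a') makes it quantifiable.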
Key researchers in this research theme: