Integreat's Four Inter-related Scientific Objectives

Integreat’s research is organised in nine interconnected research themes (RTs), each focusing on specific methods and challenges of knowledge-driven machine learning. Integreat will reshape the foundations of machine learning to fulfil its immense promise through four inter-related scientific objectives.

Objective 1: Accurate knowledge-driven machine learning

Information is usually introduced into machine learning (ML) implicitly, by means of massive training data. Instead, Integreat will develop new methods to inject available knowledge directly into analyses, so as to increase the precision and robustness of solutions while reducing the learning effort. ML driven by domain knowledge will leverage knowledge about the structures, processes and dynamics of the modelled system and about the data-generating mechanisms. We will formulate new frameworks that combine exact domain knowledge (e.g., in the form of must-link and cannot-link relationships, logical formalisations and formal ontologies), more imprecise and subjective information (e.g., most-likely links, prior beliefs and stochastic relations) and data in a coherent probabilistic theory, leading to provably more accurate discoveries, predictions and decisions.
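
As a minimal illustration of how such a combination can work, the sketch below fuses a hard constraint (exact knowledge), a subjective prior and data into one posterior. The constraint, the Beta prior and the toy data are assumptions made up for this example, not Integreat methods or results.

    # Combining a hard constraint, a soft prior belief and data in one posterior.
    # Everything here (constraint p >= 0.5, Beta(7, 3) prior, 12/20 data) is illustrative.
    import numpy as np
    from scipy import stats

    p_grid = np.linspace(0, 1, 2001)

    # Exact domain knowledge: the success probability is known to be at least 0.5.
    hard_constraint = (p_grid >= 0.5).astype(float)

    # Imprecise, subjective knowledge: a mild prior belief centred around 0.7.
    soft_prior = stats.beta.pdf(p_grid, a=7, b=3)

    # Data: 12 successes in 20 trials.
    likelihood = stats.binom.pmf(12, n=20, p=p_grid)

    # Coherent probabilistic combination: posterior proportional to constraint x prior x likelihood.
    posterior = hard_constraint * soft_prior * likelihood
    posterior /= np.trapz(posterior, p_grid)

    data_only = likelihood / np.trapz(likelihood, p_grid)

    def sd(density):
        # Standard deviation of a density evaluated on the grid.
        m = np.trapz(p_grid * density, p_grid)
        return np.sqrt(np.trapz((p_grid - m) ** 2 * density, p_grid))

    print(f"sd with knowledge: {sd(posterior):.3f}  vs  data only: {sd(data_only):.3f}")

In this toy case the knowledge-informed posterior is visibly more concentrated than the data-only one, which is the kind of precision gain the objective aims for.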

Furthermore, to increase the accuracy of ML, we will study methods to enlarge training data on the basis of knowledge, which requires understanding the coverage of the data, its tail behaviour and optimal sample design. Smaller, noisier or even biased data can become more useful when knowledge drives ML, and knowledge-driven ML will be designed to work with such data as well.
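
One way knowledge can enlarge training data is through known invariances of the modelled system; the toy sketch below assumes such a symmetry (two interchangeable sensor channels) purely for illustration.

    # Knowledge-driven data augmentation under an assumed symmetry:
    # if domain knowledge says the outcome is unchanged when two sensor channels
    # are swapped, every observed sample yields an extra, equally valid one.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))                 # columns 0 and 1: interchangeable sensors
    y = (X[:, 0] + X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)

    X_swapped = X[:, [1, 0, 2, 3]]                # apply the known symmetry
    X_aug = np.vstack([X, X_swapped])             # enlarged training set
    y_aug = np.concatenate([y, y])                # labels are unchanged under the symmetry

    print(X_aug.shape)                            # (400, 4): twice the data, no extra labelling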

Objective 2: Sustainable and green machine learning

The carbon impact of data-centric ML is growing. State-of-the-art ML algorithms can consume an immense amount of energy, and models are often trained on costly, large, manually annotated data sets. To achieve a linear gain in performance, it is necessary to train an exponentially larger neural network.

The footprint of data-centric ML can to some extent be reduced by more efficient hardware and the use of renewable energy, but not sufficiently to make it economically, technically and environmentally sustainable in the future. Integreat will add a new dimension by developing knowledge-driven methods for saving energy. Our research will contribute to this aim by, for example, reusing and integrating data, by transfer learning, by developing causal and logical models (as these persist) and by developing more parsimonious models. We will document in detail the energy footprint of our algorithms and methods, to allow for more complete cost-benefit analyses. Because storage also consumes energy, Integreat will propose knowledge-driven ways to compress, project and reduce data while managing the loss of information. Furthermore, we will develop active learning methods to determine how training data can be enlarged systematically and sustainably. UN Sustainable Development Goals 12 and 13 are directly addressed by Integreat, and knowledge-driven ML will indirectly contribute to many of the other goals as well.
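
As a small illustration of the active-learning idea, the sketch below queries labels only for the pool points the current model is least certain about; the toy data, the logistic model and the batch size are illustrative choices, not Integreat's methodology.

    # Pool-based active learning by uncertainty sampling: label only the most
    # informative points instead of the whole pool, saving annotation and energy.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(1000, 5))                        # unlabelled pool
    y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)     # oracle labels (queried on demand)

    # Small seed set containing both classes.
    labelled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])

    for _ in range(5):
        model = LogisticRegression().fit(X_pool[labelled], y_pool[labelled])
        proba = model.predict_proba(X_pool)[:, 1]
        uncertainty = -np.abs(proba - 0.5)                     # closest to 0.5 = most uncertain
        candidates = np.argsort(uncertainty)[::-1]
        new = [i for i in candidates if i not in labelled][:10]
        labelled.extend(new)                                   # query only these labels

    print(f"labelled {len(labelled)} of {len(X_pool)} points")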

Objective 3: Fair, explainable and trustworthy machine learning

Today, ML can perpetuate and even amplify discrimination, because models are trained on historical data, which are often biased and non-representative, and because data labels can reflect outdated policies. Integreat will develop knowledge-driven methods to de-bias data sets and models. We will further work towards theoretical guarantees of fairness, exploiting a priori domain knowledge. For example, we will characterise the fairness of algorithm-based decisions by representing how variables change in response to knowledge-driven counterfactual interventions and by exploiting logic-based knowledge representations.

Explainability of black-box algorithms, such as deep learning, is today a major challenge for ML. As ML systems begin making decisions previously entrusted to humans, it becomes critical to explain the reasons behind their conclusions and to make them fully interpretable by humans. Current ML is unable to explain the rationale behind its solutions, while Integreat's knowledge-based approaches will be explainable. Our new solutions will enable understanding of the inner workings of deep learning and other black- and grey-box approaches by augmenting them with knowledge-based, explainable white-box components.

Integreat will examine the fairness and explainability of ML algorithms as a necessary component of building public trust in them. ML algorithms are easily fooled into misclassifications by minor perturbations of the data and by out-of-distribution data; we will detect and address this lack of robustness in traditional ML approaches in order to achieve greater trustworthiness.
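
As a simplified illustration of the counterfactual checks mentioned above, the sketch below flips a hypothetical protected attribute and counts how many automated decisions change; a full analysis would also propagate the intervention through a causal model to downstream variables. The data, model and attribute are made up for illustration.

    # A naive counterfactual fairness check: intervene on a protected attribute
    # and measure how many decisions flip. Purely illustrative, synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 500
    protected = rng.integers(0, 2, size=n)                        # hypothetical binary group label
    income = rng.normal(loc=1.0 + 0.5 * protected, size=n)        # historically biased feature
    X = np.column_stack([protected, income])
    y = (income + 0.3 * protected + rng.normal(scale=0.5, size=n) > 1.2).astype(int)

    model = LogisticRegression().fit(X, y)

    X_cf = X.copy()
    X_cf[:, 0] = 1 - X_cf[:, 0]                                   # counterfactual intervention
    flipped = (model.predict(X) != model.predict(X_cf)).mean()
    print(f"decisions that change under the intervention: {flipped:.1%}")
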
The foundations of fairness, explainability and trustworthiness reside in ethical analysis. It is necessary to examine, from a philosophical position, how these concepts are defined and practised, how researchers, including ourselves, face the dilemma between accuracy and transparency, and how our results remain committed to respecting each individual. Ethics research in Integreat will develop answers to these questions.

Objective 4: Machine learning with quantified uncertainty

Data are most often incomplete, noisy and inconsistent, knowledge and models are imperfect, and algorithms are approximate. Estimates, generalisations, predictions and decisions produced by ML are therefore intrinsically uncertain. To provide the trust needed to inform decisions, the uncertainty of results must be quantified. Current ML struggles to assess uncertainty in a precise and coherent way. At Integreat we will therefore represent and model multiple sources of uncertainty probabilistically, thus quantifying the reliability of results. Importantly, uncertainty quantification can also reveal disagreement between data and knowledge, as well as between different data sources. We will develop methods that automatically alert us to such discrepancies, a safeguard for our knowledge-driven methods. By combining knowledge and data, Integreat will reduce the uncertainty of solutions considerably.
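
The sketch below is a minimal example of this idea under standard conjugate assumptions: domain knowledge enters as a normal prior, the posterior quantifies the remaining uncertainty, and a simple prior-predictive check raises an alert when data and knowledge disagree. The prior values, the data and the threshold are illustrative assumptions.

    # Uncertainty quantification plus an automatic knowledge-data conflict alert,
    # using a conjugate normal model with known measurement sd. Illustrative only.
    import numpy as np

    mu0, tau = 10.0, 1.0                           # domain knowledge: prior mean and prior sd
    sigma = 2.0                                    # assumed known measurement sd
    x = np.array([14.1, 13.8, 15.2, 14.6, 13.9])   # observed data
    n, xbar = len(x), x.mean()

    # Conjugate normal posterior: knowledge and data combined, uncertainty quantified.
    post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau**2 + n * xbar / sigma**2)
    print(f"posterior: {post_mean:.2f} +/- {np.sqrt(post_var):.2f}")

    # Discrepancy alert: is the data mean plausible under the prior predictive?
    z = (xbar - mu0) / np.sqrt(tau**2 + sigma**2 / n)
    if abs(z) > 2:
        print(f"warning: data and prior knowledge disagree (z = {z:.1f})")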

