Research Themes

Published 3 July 2023 10:56

The reliance on large quantities of labelled training data currently represents a severe bottleneck for machine learning (ML). Training data sets are often tailored to each application, and ML models are typically trained to perform a single task only; they often have to be retrained from scratch rather than building on what has previously been learned. Transfer learning refers to the sharing of knowledge from a source task with target tasks in order to improve generalisation. However, generalisability across domains is often poor, and the usefulness of current transfer learning methods remains unpredictable and brittle.

Published 3 July 2023 10:56

In Bayesian inference, Markov chain Monte Carlo (MCMC) algorithms are considered the gold standard, but they are far too slow for big data and complex models. Recent developments, from superefficient subsampling and quasi-stationary Monte Carlo (MC) to non-reversible processes, are computationally infeasible in high dimensions. A different approach is to approximate the original model with one that can be sampled from efficiently.
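To make the cost argument concrete, here is a minimal random-walk Metropolis sketch, the simplest MCMC algorithm, offered as an illustration rather than as any method proposed by Integreat. The target is assumed to be a standard normal; in a real Bayesian model `log_target` would be the log prior plus a sum of log-likelihood terms over every observation, so each step of the chain touches the whole data set. The function names and tuning values (`step`, the burn-in length) are illustrative assumptions.

```python
import math
import random


def log_target(x: float) -> float:
    # Illustrative assumption: unnormalised log-density of a standard normal.
    return -0.5 * x * x


def random_walk_metropolis(log_density, n_samples: int, step: float = 1.0,
                           x0: float = 0.0, seed: int = 0) -> list[float]:
    """Return a Markov chain whose stationary distribution is proportional to exp(log_density)."""
    rng = random.Random(seed)
    x, log_px = x0, log_density(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_pp = log_density(proposal)       # one full target evaluation per step
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_pp - log_px:
            x, log_px = proposal, log_pp
        chain.append(x)                      # on rejection the current state is repeated
    return chain


samples = random_walk_metropolis(log_target, 50_000)[5_000:]  # drop burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be close to 0 and 1
```

Successive samples are correlated, so the chain needs many more iterations than independent draws would; combined with a full-data likelihood evaluation at every step, this is what makes plain MCMC slow for big data and complex models.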

Published 3 July 2023 10:55

Natural language allows us to communicate about matters of unlimited complexity and as such constitutes a very powerful representation of knowledge. Access to the knowledge contained in vast amounts of text and audio requires high-quality natural language processing (NLP) tools. The use of deep neural networks, based on sequential modelling of text, has led to a paradigm shift in NLP. So far, NLP research has been dominated by classification and sequence labelling techniques combined with language models that are pre-trained by predicting surface properties of vast amounts of raw text. Linguistically, however, this development is paradoxical: language structure is hierarchical in nature, so sequential models can at best be approximations.

Published 3 July 2023 10:55

Integreat will integrate different forms of knowledge into joint analyses, and will also combine knowledge with a wide variety of data types. Knowledge can be expressed by means of stochastic models, differential equations, semantic technologies, graphical representations, logic-based ontologies and more. The gap between these approaches is considerable today, but Integreat will study how they can be used in combination.

Published 3 July 2023 10:55

Most real-life data are structured: they are often relational and stored in databases, hierarchical as in XML, or in the form of knowledge graphs in formats such as RDF and Property Graphs. Moreover, logic is used to systematise and formalise complex domain knowledge, e.g., in the form of ontologies, and logical inference can be used to enrich explicit structured data (e.g., relational databases or knowledge graphs) with implicit information that logically follows from the explicit data and the ontology. Machine learning (ML) can benefit tremendously from taking both the data structure and the logical knowledge into account. However, structured input is challenging for classical ML approaches, which cannot work with such data directly, and the challenge increases in the presence of logical knowledge.
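The kind of logical enrichment described above can be illustrated with naive forward chaining over a tiny knowledge graph. This is a minimal sketch, not a reasoner used by Integreat: triples are plain Python tuples, the predicate names `type` and `subClassOf` are simplified stand-ins for `rdf:type` and `rdfs:subClassOf`, only two RDFS-style entailment rules are applied, and the example data are hypothetical.

```python
Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply two RDFS-style rules until a fixpoint is reached (naive forward chaining).

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new: set[Triple] = set()
        for a, b in sub:                      # Rule 1: transitivity of subClassOf
            for b2, c in sub:
                if b == b2:
                    new.add((a, "subClassOf", c))
        for x, p, a in facts:                 # Rule 2: type inheritance
            if p == "type":
                for a2, b in sub:
                    if a == a2:
                        new.add((x, "type", b))
        if new <= facts:                      # nothing new: the closure is complete
            return facts
        facts |= new


# Hypothetical explicit data: one instance and a two-level class hierarchy.
kg = {
    ("Oslo", "type", "City"),
    ("City", "subClassOf", "Settlement"),
    ("Settlement", "subClassOf", "Place"),
}
for triple in sorted(rdfs_closure(kg) - kg):
    print(triple)  # implicit facts that follow from the data and the "ontology"
```

None of the printed triples, such as `("Oslo", "type", "Place")`, was stated explicitly. Practical systems use dedicated reasoners and more efficient strategies (for example semi-naive evaluation); the sketch only shows what "implicit information that logically follows" means.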

Published 3 July 2023 10:55

Deep learning is a powerful way of approximating functions. However, its solutions are often “black boxes”, which prevents their use in life- and safety-critical applications in health, transport, law and welfare. “White-box” and “grey-box” statistical models, such as generalised additive models and vine copula models, are explainable and allow non-linear effects and complex dependence, but they sacrifice complex interactions for interpretability.

Published 3 July 2023 10:55

Integreat will pursue research on the ethical aspects of knowledge-driven machine learning (ML) in order to analyse its fundamental ethical dilemmas. We will employ the methods of analytical philosophy, together with experimental philosophy, to examine three interrelated themes that pertain to all the other research themes and define the ethical foundation of our four objectives.

Published 3 July 2023 10:55

When humans reason, they typically think in terms of cause and effect, an ability that current machine learning (ML) models generally lack. This may prove to be a fundamental limitation: the lack of robustness and generalisation in current ML models has recently been linked to their inability to identify and exploit causal structures. We can assume that the systems under study are modular, built up of independent causal components, and that some of these components remain unchanged when moving between domains. A system that recognises cause-and-effect relations is therefore able to transfer knowledge and is more explainable.
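The idea that causal components stay unchanged across domains can be illustrated with a toy structural causal model. This sketch assumes, purely for illustration, a linear model X → Y in which only the distribution of the cause X differs between a hypothetical source and target domain, while the mechanism Y := 2X + noise is shared. The causal regression of Y on X then transfers unchanged, whereas the anti-causal regression of X on Y does not. All names, coefficients and seeds are hypothetical.

```python
import random


def simulate(n: int, x_scale: float, seed: int) -> tuple[list[float], list[float]]:
    """Sample from the SCM  X := x_scale * N(0, 1),  Y := 2 X + N(0, 1)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = x_scale * rng.gauss(0.0, 1.0)  # cause: its distribution differs by domain
        y = 2.0 * x + rng.gauss(0.0, 1.0)  # mechanism: identical in every domain
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(u: list[float], v: list[float]) -> float:
    """Least-squares slope of v regressed on u (with intercept)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var = sum((a - mu) ** 2 for a in u)
    return cov / var


source = simulate(100_000, x_scale=1.0, seed=1)
target = simulate(100_000, x_scale=3.0, seed=2)
for name, (xs, ys) in [("source", source), ("target", target)]:
    print(name,
          "Y on X:", round(ols_slope(xs, ys), 3),   # causal direction
          "X on Y:", round(ols_slope(ys, xs), 3))   # anti-causal direction
```

In theory the causal slope is 2 in both domains, while the anti-causal slope is 2·Var(X) / (4·Var(X) + 1), i.e. 0.4 in the source and 18/37 ≈ 0.49 in the target, so only the model aligned with the causal direction can be reused as is.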

Published 26 June 2023 13:19

Quantifying uncertainty in machine learning (ML) is an active research area, because current methods are insufficient and often lack a theoretical basis. To improve predictive precision and quantify uncertainty, Integreat will inject knowledge into ML by further developing the Bayesian perspective on ML through new forms of knowledge-based informative priors, penalties, complex latent structures, model ensembles and network designs.
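As a minimal illustration of how an informative prior adds knowledge and reduces uncertainty, the sketch below uses the textbook conjugate normal model with known noise variance. It is a toy example under stated assumptions, not one of the new prior constructions this theme refers to; the data, prior parameters and the function name `normal_posterior` are hypothetical.

```python
def normal_posterior(prior_mean: float, prior_var: float,
                     data: list[float], noise_var: float) -> tuple[float, float]:
    """Posterior (mean, variance) of mu under mu ~ N(prior_mean, prior_var)
    and y_i | mu ~ N(mu, noise_var) with noise_var known (conjugate normal model)."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var   # precisions add
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


data = [1.0, 1.0, 1.0, 1.0]  # hypothetical observations
informative = normal_posterior(0.0, 1.0, data, noise_var=1.0)
vague = normal_posterior(0.0, 100.0, data, noise_var=1.0)
print("informative prior:", informative)  # mean 0.8, variance 0.2
print("vague prior:", vague)
```

With the informative prior the posterior precision is 1 + 4 = 5, so the variance is 0.2 and the mean is pulled towards the prior (0.8); with the nearly flat prior the estimate stays close to the sample mean (4/4.01 ≈ 0.998) and the variance is larger (≈ 0.249). Whether the extra precision is warranted depends entirely on whether the prior knowledge is correct.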