Probabilistic Modelling and Uncertainty

Quantifying uncertainty in machine learning (ML) is an active research area, because current methods are insufficient and often lack a theoretical basis. To strengthen predictive precision and uncertainty quantification, Integreat will inject knowledge into ML by further developing the Bayesian perspective on ML, through new forms of knowledge-based informative priors, penalties, complex latent structures, model ensembles and network designs.
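The idea of a knowledge-based informative prior can be illustrated with a minimal sketch. All numbers below are illustrative assumptions: a conjugate Beta-Binomial update where domain knowledge (a prior belief that a success rate is around 0.7) is encoded as a Beta(14, 6) prior and compared with a flat Beta(1, 1) prior.

```python
from scipy import stats

# Hypothetical observed data: 8 successes out of 10 trials.
successes, trials = 8, 10

# Flat prior Beta(1, 1) vs. a knowledge-based informative prior Beta(14, 6),
# the latter encoding a prior belief that the success rate is around 0.7.
priors = {"flat": (1.0, 1.0), "informative": (14.0, 6.0)}

for name, (a, b) in priors.items():
    # Conjugate Beta-Binomial update: posterior is Beta(a + k, b + n - k).
    post = stats.beta(a + successes, b + trials - successes)
    lo, hi = post.interval(0.95)
    print(f"{name:11s} posterior mean={post.mean():.3f} "
          f"95% interval width={hi - lo:.3f}")
```

With these numbers the informative prior yields a visibly narrower credible interval than the flat prior, which is one concrete sense in which injected knowledge sharpens uncertainty quantification.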

Most current ML methods are not specifically designed for their intended use, which is itself a form of knowledge. One may for instance want to estimate conditional distributions in order to assess how some variables affect others. In other situations, rare events are of interest, whose occurrence is described by the tails of the model. The methods and models that give the overall most accurate fit to the data are not necessarily the best at describing specific features, such as conditional probabilities or tails. We will build ML models and inference methods that are optimised towards specific aims, e.g. via the Focused Information Criterion.
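The gap between overall fit and focused fit can be sketched with synthetic data. This is not the Focused Information Criterion itself, only an illustration of the underlying point: two models fitted to the same heavy-tailed sample are scored once on all observations and once on the upper tail only, and the rankings need not agree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic heavy-tailed data (an illustrative stand-in for real observations).
data = stats.t(df=3).rvs(size=2000, random_state=rng)

# Two candidate models, both fitted to the full sample.
mu, sigma = stats.norm.fit(data)
df_, loc, scale = stats.t.fit(data)
normal = stats.norm(mu, sigma)
student = stats.t(df_, loc, scale)

# Overall fit: mean log-likelihood on all data.
# Focused fit: mean log-likelihood restricted to the upper 5% tail,
# mimicking an evaluation that targets rare events.
tail = data[data > np.quantile(data, 0.95)]
for name, model in [("normal", normal), ("student-t", student)]:
    print(f"{name:9s} overall={model.logpdf(data).mean():8.3f} "
          f"tail={model.logpdf(tail).mean():8.3f}")
```

The light-tailed normal model pays a heavy penalty on the extreme observations, which a criterion focused on tail behaviour would register even when overall scores look similar.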

Often, stationarity and equilibrium of systems cannot be assumed; new and different data become available; and knowledge changes because of new understanding or varying opinions. Building on our work on anomaly and change detection, we will model knowledge change in order to make optimal predictions and decisions. Integreat will also study the sustainability of probabilistic modelling. To obtain energy savings, we will develop new sustainability penalisations and study how to combine them with classical performance measures. For example, because uncertainty is expected to shrink as good-quality data grow or additional knowledge becomes available, Bayesian uncertainty quantification makes it possible to detect when a model can learn no further, allowing energy to be saved.
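A minimal sketch of the last point, under assumed toy settings (known noise level, batch size 20, a hypothetical stopping threshold): sequential conjugate updating of a normal mean, where data acquisition stops once the posterior standard deviation no longer shrinks by a useful amount per batch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sequential conjugate update of a Normal mean with known noise sd = 1:
# posterior precision grows by 1 per observation. We stop acquiring data
# once the reduction in posterior sd per new batch drops below a threshold,
# a simple stand-in for "detecting the incapacity to learn further".
prec, mean = 1.0, 0.0        # assumed prior N(0, 1)
threshold = 0.005            # assumed minimal useful sd reduction per batch
batches_used = 0

for _ in range(100):         # up to 100 batches of 20 points each
    batch = rng.normal(0.5, 1.0, size=20)
    sd_before = prec ** -0.5
    new_prec = prec + batch.size          # noise sd 1 -> precision += n
    mean = (prec * mean + batch.sum()) / new_prec
    prec = new_prec
    batches_used += 1
    if sd_before - prec ** -0.5 < threshold:
        break                # uncertainty no longer shrinking: stop early

print(f"stopped after {batches_used} batches, "
      f"posterior mean={mean:.3f}, sd={prec ** -0.5:.4f}")
```

Here the stopping rule ends acquisition after a handful of batches instead of running all 100, which is the kind of trade-off a sustainability penalisation would formalise.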

Integreat will develop theoretical guarantees for the properties of new methods, in addition to the traditional ML approaches of empirical validation on benchmark data, cross-validation and assessing predictability on new data. Too often, benchmarks represent outdated situations and may result from biased or unrealistic labelling. We will critically assess benchmark quality and generalisability by exploiting knowledge about the application domain. Theoretical properties include statistical consistency, efficiency, asymptotic and small-sample properties, robustness and convergence.
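As a reference point for the empirical side, a minimal sketch of k-fold cross-validation on synthetic data (all data and model choices below are illustrative assumptions): held-out error estimates predictability on new data.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=100)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.size)

# 5-fold cross-validation of a degree-3 polynomial fit: mean squared
# error on held-out folds estimates predictability on unseen data.
idx = rng.permutation(x.size)
folds = np.array_split(idx, 5)
errors = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    coef = np.polyfit(x[train_idx], y[train_idx], 3)
    pred = np.polyval(coef, x[test_idx])
    errors.append(np.mean((y[test_idx] - pred) ** 2))

print(f"cross-validated MSE: {np.mean(errors):.4f}")
```

Such estimates are only as trustworthy as the data-generating process behind the benchmark, which is why they are complemented by theoretical guarantees.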

Integreat will equip model comparison, selection and averaging with knowledge-based performance metrics that include uncertainty. We will work with the concept of inferential focus, i.e. the operational view that the domain's questions influence the optimal combination of models and their analysis.
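Model averaging that carries a notion of model uncertainty can be sketched with Akaike weights; this is one simple weighting scheme used for illustration, not the centre's proposed metric, and the data below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, size=x.size)   # synthetic data

# Candidate models: polynomial fits of degree 1 and 2.
models = {}
for deg in (1, 2):
    coef = np.polyfit(x, y, deg)
    sigma2 = (y - np.polyval(coef, x)).var()
    # Gaussian log-likelihood and AIC (deg + 2 parameters incl. the noise sd).
    loglik = -0.5 * x.size * (np.log(2 * np.pi * sigma2) + 1)
    models[deg] = {"coef": coef, "aic": 2 * (deg + 2) - 2 * loglik}

# Akaike weights: relative model plausibilities that propagate model
# uncertainty into an averaged prediction instead of picking one winner.
aics = np.array([m["aic"] for m in models.values()])
w = np.exp(-0.5 * (aics - aics.min()))
w /= w.sum()

pred = sum(wi * np.polyval(m["coef"], 0.5)
           for wi, m in zip(w, models.values()))
print(f"weights={np.round(w, 3)}, averaged prediction at x=0.5: {pred:.3f}")
```

The averaged prediction hedges between candidates in proportion to their support, rather than committing to a single selected model.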

Integreat will evaluate the usefulness of knowledge for reaching our four objectives: which knowledge is relevant, and when, where and how it should be infused into models. Simulation-based comparisons and cross-validation will accompany theoretical approaches. We will study robustness and stability with respect to knowledge assumptions, and investigate the effects of false knowledge and of divergence between knowledge, models and data.
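The effect of false knowledge can be probed by simulation. A minimal sketch under assumed toy settings: a deliberately wrong, confident prior on a normal mean is compared with a weak prior, showing how strongly the false knowledge distorts inference at small sample sizes and how the data gradually override it.

```python
import numpy as np

rng = np.random.default_rng(3)
true_mean = 2.0              # ground truth used to simulate the data

# A deliberately false, confident prior N(-3, 0.5^2) on the mean,
# vs. a weak prior N(0, 10^2); the noise sd is taken as known, equal to 1.
priors = {"false, confident": (-3.0, 0.5), "weak": (0.0, 10.0)}

for n in (10, 100, 1000):
    data = rng.normal(true_mean, 1.0, size=n)
    row = []
    for name, (m0, s0) in priors.items():
        prec = 1.0 / s0**2 + n           # conjugate normal-normal update
        post_mean = (m0 / s0**2 + data.sum()) / prec
        row.append(f"{name}: {post_mean:.2f}")
    print(f"n={n:4d}  " + "  ".join(row))
```

At n=10 the false prior pulls the posterior far from the truth, while at n=1000 both posteriors essentially agree with the data; robustness studies of this kind quantify how much damage false knowledge can do and how quickly it is corrected.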


Key researchers in this research theme:

Published 26 June 2023 13:19 - Last modified 6 Sep. 2023 18:15