Scalable Approximation of Models and Algorithms

In Bayesian inference, Markov chain Monte Carlo (MCMC) algorithms are considered the gold standard, but they are far too slow for big data and complex models. Even recent developments, from superefficient subsampling and quasi-stationary Monte Carlo (MC) to non-reversible processes, remain computationally infeasible in high dimensions. A different approach is to approximate the original model with one that can be sampled from efficiently.

Integreat will develop approximation theories that preserve structure and knowledge, moving beyond simplistic approximations such as mean-field models. In the spirit of normalising flows, we will propose knowledge-based, explainable Variational Bayes approximations using causal graphs and vine copulae, which flexibly decompose any high-dimensional model and capture complex multivariate tail dependence well. Their particular structure makes them computationally tractable. Similarly, we will investigate how to make active use of knowledge in dimension reduction, matrix factorisation and vector space embeddings, for example through informative penalisations. A further direction is extending sequential MC methods to high dimensions, combining efficient sampling with online updating of static parameters.
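To make the vine-copula idea concrete, below is a minimal, self-contained sketch of a three-dimensional C-vine density built from Gaussian pair copulas. The standard-normal margins, the particular vine structure and the correlation parameters (rho12, rho13, rho23_1) are illustrative assumptions, not a description of Integreat's actual method.

```python
# Minimal C-vine sketch: f(x1,x2,x3) decomposed into marginals and
# (conditional) pair copulas, here all Gaussian for simplicity.
import numpy as np
from scipy.stats import norm

def gaussian_pair_copula_density(u, v, rho):
    """Density c(u, v; rho) of the bivariate Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    det = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y)
                  / (2 * det)) / np.sqrt(det)

def h_function(u, v, rho):
    """Conditional distribution h(u | v) for the Gaussian pair copula."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho ** 2))

def cvine_density(x, rho12, rho13, rho23_1):
    """C-vine density with standard-normal margins:
    f = f1*f2*f3 * c12(F1,F2) * c13(F1,F3) * c23|1(h(F2|F1), h(F3|F1))."""
    u = norm.cdf(x)                      # marginal CDF values
    marg = np.prod(norm.pdf(x))          # product of marginal densities
    c12 = gaussian_pair_copula_density(u[0], u[1], rho12)
    c13 = gaussian_pair_copula_density(u[0], u[2], rho13)
    # second-tree copula acts on h-transformed (conditioned) arguments
    c23_1 = gaussian_pair_copula_density(h_function(u[1], u[0], rho12),
                                         h_function(u[2], u[0], rho13),
                                         rho23_1)
    return marg * c12 * c13 * c23_1

print(cvine_density(np.array([0.3, -0.5, 1.2]), 0.5, 0.3, 0.2))
```

Because the joint density factorises into bivariate building blocks, each block can be evaluated, differentiated and tailored (e.g. to heavy tails) independently, which is what makes the decomposition computationally tractable.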

When first-principles models of the system exist, Approximate Bayesian Computation (ABC) approaches are very useful: being likelihood-free, they can be scalable, and because they rely on a mechanistic simulator, they can preserve knowledge. Integreat will deepen simulator-based machine learning (ML) by efficiently combining first-principles models with deep learning, to form more theoretically plausible generative models. Integreat will host ELFI (Engine for Likelihood-Free Inference, elfi.ai), a leading open-source inference platform for simulator-based models, which makes efficient use of Bayesian optimisation through emulator models. Inspired by our recent work that uses Gaussian process (GP) emulators with bounded errors to replace the proposal function in MCMC, we will combine GP emulators and neural network architectures. This can be done by algorithms that efficiently use emulator-generated training data (cheap to simulate forward) in deep learning (flexible and accurate given sufficient training). Incorporating knowledge in this way, we will significantly improve the exploration-exploitation decisions of ABC. We will also develop knowledge-steered data sampling procedures that guide optimisation for better performance.
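As an illustration of emulator-based likelihood-free inference, the sketch below uses ELFI's BOLFI method, which fits a GP emulator to the (log) discrepancy and chooses new simulations by Bayesian optimisation. The toy Gaussian simulator, summary statistic, bounds and tuning values are illustrative assumptions; only the general workflow follows ELFI's documented usage.

```python
# BOLFI sketch: a GP emulator of the discrepancy stands in for expensive
# simulations, and Bayesian optimisation decides where to simulate next.
import numpy as np
import scipy.stats
import elfi

def simulator(mu, batch_size=1, random_state=None):
    # toy mechanistic simulator: 30 draws from N(mu, 1) per batch member
    return scipy.stats.norm.rvs(mu, 1, size=(batch_size, 30),
                                random_state=random_state)

y_obs = simulator(1.0, random_state=np.random.RandomState(0))

mu = elfi.Prior('uniform', -2, 4, name='mu')          # mu ~ U(-2, 2)
sim = elfi.Simulator(simulator, mu, observed=y_obs)
s_mean = elfi.Summary(lambda y: np.mean(y, axis=1), sim)
d = elfi.Distance('euclidean', s_mean)
log_d = elfi.Operation(np.log, d)                     # GP models log-discrepancy

bolfi = elfi.BOLFI(log_d, batch_size=1, initial_evidence=20,
                   update_interval=10, bounds={'mu': (-2, 2)}, seed=1)
bolfi.fit(n_evidence=100)          # emulator-guided choice of simulations
result = bolfi.sample(1000)        # sample the approximate posterior
print(result)
```

The exploration-exploitation trade-off lives in the acquisition step: the emulator's predictive uncertainty tells the algorithm whether to probe unexplored parameter regions or refine promising ones.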

Huge data storage demands are emerging as a threat to the climate. Integreat will suggest new ways to approximate and compact data, developing statistical sufficiency and data distillation further so that they incorporate domain knowledge. Learning what to forget, as humans do, also reduces storage needs. We will develop methods that use knowledge to decide which parts of the information in the data need to be synthesised into appropriate summaries for future tasks or transfer learning.
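The following sketch shows what compaction via statistical sufficiency means in the simplest case: for i.i.d. Gaussian data with known variance, the sample size and sum are sufficient for the mean, so the posterior can be computed from the summary alone and the raw data discarded. The conjugate-prior values are illustrative assumptions.

```python
# Sufficiency sketch: replace 100k raw observations with a two-number summary
# that preserves all information the Gaussian likelihood carries about the mean.
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)   # raw data: 100k floats

# compact summary: sufficient statistics for the mean (known variance)
summary = {"n": data.size, "sum": data.sum()}

def posterior_mean_var(summary, prior_mean=0.0, prior_var=10.0, sigma2=1.0):
    """Conjugate normal posterior for the mean from the summary alone."""
    n, s = summary["n"], summary["sum"]
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + s / sigma2)
    return post_mean, post_var

print(posterior_mean_var(summary))   # ~(2.0, 1e-5), same as from the full data
```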

Integreat will explore cloud computing ecosystems, as distributed computing brings agility and adaptability and saves energy. Federated training and distributed inference will be knowledge-driven, and we will propose algorithms that can automatically telescope, scaling up or down depending on accuracy needs.
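A minimal sketch of the federated-training pattern referred to above: each client fits a model on its private data, and the server aggregates only the parameters, weighted by client data size (FedAvg-style). The synthetic clients and the closed-form least-squares local update are illustrative assumptions.

```python
# Federated averaging sketch: raw data never leaves the clients;
# only fitted parameters travel to the server for weighted aggregation.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.5, -0.7])

def make_client(n):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

clients = [make_client(n) for n in (50, 200, 1000)]

def local_fit(X, y):
    # each client solves its own least-squares problem locally
    return np.linalg.lstsq(X, y, rcond=None)[0]

# server step: average client parameters, weighted by client data size
sizes = np.array([len(y) for _, y in clients], dtype=float)
weights = sizes / sizes.sum()
global_w = sum(w * local_fit(X, y) for w, (X, y) in zip(weights, clients))
print(global_w)   # close to true_w, without ever pooling the raw data
```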
 

