Grand Challenges in Mathematical Biology: Integrating Multiscale Modelling and Data
- ^{1}University of Franche-Comté, France
Although mathematical applications to biology and medicine have been discussed since the 18th century (Bernoulli, 1760), or even earlier if we think of Fibonacci's example of rabbit population growth (Sigler, 2002), the field evolved slowly until the 20th century. Unlike mathematical applications to physical problems (where equations can accurately describe physical reality), applications to biological problems are much more complex due to the evolutionary nature of living matter (Hartwell et al., 1999; Nanjudiah, 2003). Moreover, this evolution of biological systems occurs on different spatial and temporal scales (see Figure 1), which makes the modelling much more challenging. The mathematical and statistical investigation of these multi-scale models is also more challenging, due to the complexity of the interactions between the scales.
The multi-scale mathematical models developed to describe biological phenomena can be qualitative or quantitative (Bondavalli et al., 2008; Saadatpour and Albert, 2016). While qualitative models are used to make general predictions about the biological system, quantitative models are more precise and specific about the system, being parametrised with specific data. The advances over the last two decades in collecting various types of single-scale and multi-scale data in ecology, epidemiology, cell biology, immunology, neurobiology, plant biology, social sciences, etc., have led to the development of more and more models parametrised with data and used to obtain new quantitative results (Hasenauer et al., 2015; Tran et al., 2017). In the following we briefly discuss some current research aspects related to multi-scale modelling in mathematical biology, as well as model parametrisation using multi-scale historical data, and conclude by mentioning the digital twin concept – an emerging technology that uses real-time data and will impact most biology/mathematical biology areas.
The last decade has seen an emphasis on the multi-scale aspects of different biological phenomena: from the multi-scale aspects of collective migration in bacteria/cells/animals (Deutsch et al., 2020), to the multi-scale landscape studies investigating the impact of environmental or habitat factors on the abundance or occurrence of species (Holland and Yang, 2016; McGarigal et al., 2016), the multi-scale structure of cellular biological systems (Petridou et al., 2017; Schaffer and Ideker, 2021; Montagud et al., 2021), or the multi-scale aspects of viral infections in the context of immuno-epidemiology (Saad-Roy et al., 2022). However, our mechanistic understanding of these multi-scale aspects of various biological phenomena is still in its infancy, and more quantitative and qualitative modelling studies are necessary to advance the field. As an example, the current SARS-CoV-2 pandemic has highlighted the need to understand the impact of anti-viral immunity (at meso-scale) on virus evolution (at micro-scale) and virus transmission among individuals (at macro-scale), and the epidemiological and evolutionary implications of immune escape (Saad-Roy et al., 2022). Despite the large number of mathematical studies investigating single-scale and multi-scale dynamics of different viruses (e.g., influenza (Rudiger et al., 2019), HIV (Hosseini and Gabhann, 2012)), SARS-CoV-2 brought surprises that could not have been predicted by past modelling approaches, and which remain open questions. Such questions range from the mechanisms underlying the development of long COVID-19, to the number of viral particles a patient is exposed to (and how to quantify it) and the impact of this viral load on immune responses. Moreover, since current climate change will result in the emergence of new pathogens with new characteristics and increased cross-species transmission risks (Carlson et al., 2022), one of the challenges of the future will be the development of new multi-scale models that combine evolutionary aspects of the new pathogens with immunological aspects of pathogen infections, as well as epidemiological and ecological aspects of disease transmission at the level of populations (see Figure 1).
The development of these different multi-scale models is accompanied by challenges related to the development of new mathematical theories required to understand the behaviour of these models. For example, new numerical approaches are needed to better deal with the numerical blow-up of solution densities in a class of (advection-dominated) non-local multi-scale moving-boundary models developed in the context of multi-scale cancer spread (Suveges et al., 2020, 2021). As another example, current bifurcation theory will have to be extended to also consider the bifurcation of patterns at multiple scales, especially when the bifurcation parameter connects the different scales; see Figure 1.
The last few decades, and especially the last few years, have seen an explosion in data collection throughout all biological fields and across multiple spatial and temporal scales (Conde et al., 2019; Farley et al., 2018; Dolinski and Troyanskaya, 2015; Hariri et al., 2019). As an example we mention the molecular-level data collected via super-resolution microscopy (SRM) – whose development was acknowledged with a Nobel Prize in 2014 (Prakash et al., 2022; Sydor et al., 2015). Such SRM methods can be used to generate live imaging molecular maps of protein complexes, and to extract quantitative information on the number, size, distribution, and spatial organisation of various molecules inside cells (Ruan et al., 2021; Sydor et al., 2015; Turkowyd et al., 2020), including viruses such as SARS-CoV-2 (Putlyaeva and Lukyanov, 2021). While live imaging approaches will play more and more important roles across all biological fields (Sydor et al., 2015; Qin et al., 2016), their use opens up new challenges in understanding the huge amount of generated data: from new approaches to deal with high data volumes generated at higher and higher speeds and that could be presented in a variety of forms (structured, semi-structured and/or unstructured data) (Hariri et al., 2019), to new approaches to deal with data heterogeneity (Farley et al., 2018; Hariri et al., 2019), incomplete data (Hariri et al., 2019; Conde et al., 2019) or even irreproducible data – which is a major issue at least in immunology and cell biology (Errington et al., 2021; Hirsch and Schildknecht, 2019), and even challenges in understanding the biological mechanisms behind the data (Lele, 2020). While artificial intelligence techniques (e.g., machine learning, natural language processing, computational intelligence) can provide faster and more accurate results in data analytics compared to classical statistical methods (Hariri et al., 2019) (especially if the training data is not biased in any way), they do not provide us with a mechanistic understanding of the data. Such an understanding can be obtained by using collected data to parametrise mathematical models. However, model parametrisation using poor data can lead to uncertainty in the predictions, which adds to the uncertainty arising from model formulation (e.g., deterministic vs. stochastic models, spatial vs. non-spatial models, simple vs. complex models) and to the uncertainty arising from the numerical approximation of the solution. (As a note, uncertainty can also arise from the whole data analytics process: collecting, organising and analysing the data (Hariri et al., 2019).)
Uncertainty in the model results can be investigated using sensitivity analysis (Renardy et al., 2019).
Sensitivity analysis for single-scale models is a well-accepted approach across various sub-disciplines of mathematical biology: from ecology (Barabás et al., 2014), to cancer research (Dela et al., 2022; Eftimie and Barelle, 2021), immunology (Dela et al., 2022), pharmacology (Zhang et al., 2015), epidemiology (Massard et al., 2022), etc. In the large majority of cases such an analysis (either local – where one parameter is varied at a time, or global – where multiple/all parameters are varied at a time) has been applied mainly to deterministic and stochastic ordinary differential equation models (Marino et al., 2008; Renardy et al., 2019). For models described by partial differential equations, the sensitivity and uncertainty analysis approaches are not always standard, due to challenges caused by the multi-dimensionality of such models. In fact, very few studies perform spatially-explicit sensitivity and uncertainty analysis, and many of these studies focus on various environmental modelling aspects (Lilburne and Tarantola, 2009; Razavi et al., 2021). In regard to the application of sensitivity analysis to multi-scale models, there are various ways to approach this, as summarised in (Renardy et al., 2019): (a) all-in-one sensitivity analysis, which treats the whole model as a black box and evaluates the model outputs after all or a subset of model parameters are varied; (b) intra/inter-compartmental sensitivity analysis, which varies parameters for a given scale and compares the results with the outputs at the same scale or at a different scale; (c) hierarchical sensitivity analysis, which focuses first on the analysis of the top/highest-level model, then on the next lower level sub-model where the outputs of this sub-model are replaced with constant parameters that become inputs for the higher-level model. It should be noted that very large numbers of model parameters, which might even depend on time and/or space, can lead to a computational burden when sensitivity analysis is performed (due to the sampling of the parameter values within specified ranges) (Renardy et al., 2019). Finding systematic approaches to reduce the computational time by reducing model complexity and/or reducing the number of parameters investigated through this analysis, while preserving the biological realism of the model, is still an open problem for many multi-scale mathematical models. Overall, data-driven multi-scale mathematical models are more challenging to parametrise, and it is thus expected that in the future new multi-scale methods for data assimilation will be developed in the context of various biological problems. Some of these data assimilation approaches will likely be imported and adapted from other fields (e.g., from engineering (de Moraes et al., 2020)). Moreover, it is expected that such new data assimilation approaches will focus on automatic parameter estimation (e.g., via Bayesian approaches (Tran et al., 2017; Deshpande et al., 2022)) using collected as well as real-time data, with the goal of making real-time forecasts.
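To make the distinction between local and global sensitivity analysis concrete, the following minimal sketch applies both to a toy single-scale model. Everything in it is an illustrative assumption rather than a method taken from the cited studies: the logistic growth model, the baseline values in `BASELINE`, the read-out time `T_OBS`, the ±50% sampling range, and the use of a plain Pearson correlation as a crude global index (the cited reviews discuss more refined sampling schemes and indices). The model is treated as a black box, in the spirit of the all-in-one approach.

```python
import math
import random

def logistic(t, r, K, N0):
    """Closed-form solution of dN/dt = r*N*(1 - N/K) with N(0) = N0."""
    return K / (1.0 + (K / N0 - 1.0) * math.exp(-r * t))

# Hypothetical baseline parameter values, chosen only for illustration.
BASELINE = {"r": 0.5, "K": 100.0, "N0": 5.0}
T_OBS = 5.0  # assumed time at which the model output is read

def output(params):
    """Black-box model output: population size at time T_OBS."""
    return logistic(T_OBS, params["r"], params["K"], params["N0"])

def local_sensitivity(params, rel_step=0.01):
    """Local analysis: perturb one parameter at a time, keeping the others fixed.
    Returns normalised indices (relative output change / relative input change)."""
    y0 = output(params)
    indices = {}
    for name, value in params.items():
        perturbed = dict(params)
        perturbed[name] = value * (1.0 + rel_step)
        indices[name] = ((output(perturbed) - y0) / y0) / rel_step
    return indices

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def global_sensitivity(params, spread=0.5, n_samples=2000, seed=0):
    """Global analysis: vary all parameters simultaneously, sampling each one
    uniformly within +/- spread of its baseline, then correlate each sampled
    parameter with the model output."""
    rng = random.Random(seed)
    samples = {name: [] for name in params}
    outputs = []
    for _ in range(n_samples):
        draw = {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
                for name, value in params.items()}
        for name in params:
            samples[name].append(draw[name])
        outputs.append(output(draw))
    return {name: pearson(samples[name], outputs) for name in params}

local_idx = local_sensitivity(BASELINE)
global_idx = global_sensitivity(BASELINE)
```

The global loop evaluates the model once per sample, which is where the computational burden mentioned above arises when the number of parameters (and hence the number of samples needed to cover their ranges) becomes large.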
One of the main issues associated with big data in biology is related to the speed at which data is produced (e.g., continuously-produced medical sensor data), which should be matched by the speed at which the data is processed so that fast decisions can be made (Hariri et al., 2019). In this context, the COVID-19 pandemic has led to an explosion of references to the “digital twins” concept across various areas of the mathematical biology field. A digital twin is a computer replica of a real-life system (e.g., cells (Filippo et al., 2020), tissues (Moller and Portner, 2021), or even the natural environment (Blair, 2021)), which allows us to integrate real-time and historical data and information about its functionality with the goal of making predictions about its future. This concept, which initially emerged in the 1960s in the engineering field (Bonney and Wagg, 2022; Guo and Lv, 2022), is not always very clear, mainly due to the interpretation of the connection between the data and the mathematical model. In Figure 2 we summarise the three sub-categories proposed by (Kritzinger et al., 2018) based on the level of data integration, while adapting them to biological systems (Eftimie et al., 2022): (a) in a digital model, the data between the biological object and the digital object is exchanged manually; (b) in a digital shadow the data flow from the biological object to the digital object is automatic, while the reverse flow is manual; (c) in a digital twin there is a bi-directional automatic data flow between the biological object and the digital object.
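The three sub-categories above depend only on whether each direction of data flow is manual or automatic, which can be written as a small decision rule. The sketch below is illustrative: the names `DataFlow` and `classify` are hypothetical, and the combination "manual flow into the digital object, automatic flow back" is not covered by the three categories listed above, so it is returned as unclassified rather than guessed.

```python
from enum import Enum

class DataFlow(Enum):
    """How data moves in one direction between the two objects."""
    MANUAL = "manual"
    AUTOMATIC = "automatic"

def classify(bio_to_digital, digital_to_bio):
    """Classify a system by its level of data integration.

    bio_to_digital: flow from the biological object to the digital object.
    digital_to_bio: flow from the digital object back to the biological object.
    """
    if bio_to_digital is DataFlow.MANUAL and digital_to_bio is DataFlow.MANUAL:
        return "digital model"
    if bio_to_digital is DataFlow.AUTOMATIC and digital_to_bio is DataFlow.MANUAL:
        return "digital shadow"
    if bio_to_digital is DataFlow.AUTOMATIC and digital_to_bio is DataFlow.AUTOMATIC:
        return "digital twin"
    # Manual inflow with automatic outflow is not one of the three categories.
    return "unclassified"

# Example: a model calibrated offline on archived measurements.
offline_model = classify(DataFlow.MANUAL, DataFlow.MANUAL)
```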
The large majority of the published quantitative studies in mathematical biology rely on a manual data flow between the biological object (i.e., molecules, cells, tissues, organs, whole patients) and the corresponding digital object; i.e., these are digital-model/digital-shadow types of models (see Figure 2). Among the very few true digital twins developed in the context of mathematical biology we mention here the artificial pancreas (Brown et al., 2019), where mathematical models developed since the 1970s (Bergman, 2021) have been combined with real-time data to better control blood glucose levels. The slow application of digital twins in biology and medicine is the result of a lack of understanding of the many biological laws that govern the complex single-scale and multi-scale processes in various living systems (Eftimie et al., 2022; Dhar and Giuliani, 2010; Dorato, 2011). Very recently, the concept of digital twins has also started to be discussed in the context of the human immune system (Laubenbacher et al., 2022), as well as different environmental systems (Blair, 2021; Nativi et al., 2021; Zhao et al., 2022), and more biological applications will be identified over the next few years.
To return to the discussion in Section 3, we emphasise that the lack of understanding of biological laws is expected to diminish in the future due to the continuous development of new live imaging methods (Lelek et al., 2021; Moerner, 2020; Sydor et al., 2015). The generation of huge amounts of live imaging data will have a major impact on the future of digital twins. It will lead not only to the development of new (multi-scale) mathematical models to be parametrised in real time by such live data, but also to the development of various other mathematical areas: from the development of new statistical and artificial intelligence methods to analyse the collected data (Khater et al., 2020; Lee et al., 2017), to the development of mathematical methods and computer algorithms for accurate reconstruction of super-resolution images (Lelek et al., 2021; Chui, 2022).
The field of mathematical biology will continue to develop over the next decades, being supported by the development of new methods for real-time multi-scale data acquisition and analysis at different space/time scales and across various biological disciplines, which will then lead to the development of new mathematical models that will be digital twins of real-life biological processes. In turn, this will lead to the development of new analytical and numerical approaches to investigate these models and take our understanding of the real-life biological phenomena even further. The Mathematical Biology section of Frontiers in Applied Mathematics and Statistics aims to promote the development of a variety of mathematical/statistical/computational models that describe various single-scale and multi-scale phenomena in biology, as well as the investigation of the dynamics exhibited by these models, with the overall aim of advancing the field. Also, by supporting the development of joint research topics between mathematics and various other biological disciplines, the Mathematical Biology section aims to emphasise the role of mathematical/statistical/computational approaches to understand life in general.
Keywords: mathematical biology, multi-scale biological interactions, Digital Twins, Data, uncertainty, Mathematical Models
Received: 03 Aug 2022;
Accepted: 30 Aug 2022.
Copyright: © 2022 Eftimie. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence: Prof. Raluca Eftimie, University of Franche-Comté, Besançon, France