Main

Since the outbreak of the novel coronavirus at the beginning of 2020, over 100,000 papers have been published related to COVID-19, with a substantial portion of them focusing on epidemiological models that describe the observed and unobserved dynamics, primarily in postdictive mode, although some models are also used for short-term forecasting. These models represent the lumped dynamics of a big city, a state or a country, but suffer from large uncertainties1, resulting primarily from the lack of identifiability as well as the noise in the available sparse data. This lack of identifiability is related to modeling assumptions (structural identifiability), data availability and the complex biological characteristics of virus transmission, which are largely unknown and hard to measure2 and include various emerging mutations of the virus3. The true scientific challenge is to recognize the large limitations of these (potentially useful) models, identify the multiple sources of uncertainty and suggest flexible models that can deal with seasonal variation in susceptibility, time delays, noisy data, under-determined systems, non-Markovian behavior and inherent stochasticity4. In addition to seasonal variation in transmission, for example, due to weather or mobility, for a given model the data uncertainty is usually propagated into the model parameters, rendering them as random variables/processes with an underlying probability distribution. The uncertainties in the input parameters affect the model predictability adversely, leaving many of these models inadequate for any decision-making, as they lack robustness, which is a measure of the extent to which the forward solvers amplify uncertainties from the input to the output5. In general, quantification of parametric input uncertainty is only based on a single given model, hence ignoring the bigger source of uncertainty associated with the model structure. 
A clear example of uncertainty associated with several different models in analyzing and predicting the dynamics of this complex system can be found in the COVID-19 Forecast Hub6, which is a public online server that serves as a central repository of forecasts and predictions from over 50 international research groups. Even with a 95% confidence interval, one can see that the predictions of these different models can vary drastically (a snapshot of vastly different predictions is shown in Supplementary Section 2).

A parameterized differential equation model is structurally identifiable if the undetermined parameters can be uniquely identified when the exact forms of certain state variables are available. If a model is not structurally identifiable, there will be large uncertainty in the estimated parameter values and the inferred state variable values, even when the model is fitted to noiseless data. Practical identifiability refers to the non-uniqueness issue that occurs when the model is fitted to noisy data on discrete time points. Practical identifiability is a sufficient but not necessary condition of structural identifiability. Parameter identifiability relates to model uncertainty for a specific model, but here we want to pose questions beyond this uncertainty notion, and aim to investigate two fundamentally different approaches to epidemiological modeling. In the classical approach, the models are obtained by reducing the complex system into several compartments governed by a (nonlinear coupled) system of ordinary differential equations (ODEs), where we adjust the (possibly time-dependent) parameters of the model to fit the available data. This is of course all we can do because of the constraint of using pre-fabricated integer-order differential operators that represent fixed rate dynamics at all times. A variation of this is to introduce time delays into the ODEs to account for some memory effects. An opposite way of constructing a model is to tweak the differential operators instead of tweaking the parameters, using, for example, fractional derivatives with their orders (and hence the rate of dynamics) determined directly by the data, while we use simple parameters with some nominal values in front of the operators. 
By considering these two classes, we expand both the structural and practical identifiability as well as predictability analyses in a broader perspective that allows us to investigate a plurality of epidemiological models and test their performance with diverse COVID-19 datasets for cities, states and countries. In particular, we consider several different integer-order, fractional-order and time-delay models, each expressed as a system of ODEs, where the parameters are either biologically determined and fixed or time-dependent. The integer-order and time-delay models fall into the first class of modeling with time-dependent parameters and unknown time-delay duration to fit the data. The fractional-order models are different paradigms of modeling, where we tune the time-dependent operator to fit the available data. Our goal is not to pick a winner out of the multiple available models, but rather to provide a framework to screen which of these models are possible good candidates to fit the data and make reasonable predictions with bounded levels of uncertainty. Specifically, we investigate why we need time-dependent parameters in the lumped epidemiological models, how to extrapolate and forecast beyond the training period, and which is the most robust model for short- and long-term prediction.

In this broader framework, where the unknowns to be determined in the integer-order and time-delay models or the derivative orders in the fractional-order models are functions of time, the inverse problem to solve is particularly challenging. Several numerical methods have been developed to solve the inverse problem of inferring model (mostly constant) parameters from available data. They typically convert the problem into an optimization problem and then formulate a suitable estimator by minimizing an objective function7,8,9,10,11,12,13,14,15. Here we overcome this difficulty by developing physics-informed neural networks (PINNs) both for integer-order and fractional-order models16. PINNs provide a flexible computational tool that encodes the ODEs into the neural networks to satisfy the equations while accurately fitting the data. In other words, parameter inference and simulation of the observed and unobserved dynamics take place simultaneously. The PINN formulation allows the parameters to be time-dependent functions, represented by separate neural networks, and this makes the inverse problem of inferring the parameters and unobserved dynamics from available data readily applicable to several different models. The structural and practical identifiability of integer-order models have been studied recently by Zhang and colleagues17. However, this analysis cannot be applied to the fractional models or even to integer-order models with parameters, which are continuous functions of time. The unique determination of the derivatives’ variable order in the case of a time-fractional linear diffusion equation on a general multidimensional domain was proved by Zheng and colleagues18. To the best of our knowledge, the existence and uniqueness of the solution have not been proven for the inverse problem of identifying variable order for the nonlinear system of equations. 
Through systematic numerical experiments, in this Article we study and discuss the structural and practical identifiability of different time-dependent fractional orders of fractional models by using different amounts of data. If limited to only the reported available data, the inference is not very robust and the inferred fractional orders may exhibit relatively large uncertainty. However, if sufficient data are available for training, the PINN formulation can accurately infer different time-dependent fractional orders for each state of the modified susceptible-infectious-removed (SIR) models. The following steps show the implementation of our formulation:

  1. Pre-processing of the available data.
  2. For each epidemiological model, studying the identifiability.
  3. Using PINNs to infer the unknown parameters of the model and discover the unobserved dynamics.
  4. Using the inferred model to forecast the pandemic.
  5. Propagating the parameter uncertainty to the predicted trajectories of the dynamics.

Results

Model plurality

The pluralization of epidemiological models strives for either developing a variety of compartmental models with time-dependent parameters or employing new fractional operators with different time-dependent fractional orders of the derivatives. This systematic analysis of different models provides a measure of their identifiability, predictability and uncertainty.

A general framework used in modeling the spread of infectious diseases among a fixed population is compartmental modeling, in which individuals are categorized into different epidemiological classes (compartments) according to their disease-related status. The dynamics of the virus spread through the population is governed by a nonlinear system of ODEs. The classical prototypical compartmental model is the SIR model. This simple model can be thought of as a lumped-compartment model, where the main compartments can be decomposed into various other compartments to form a variant of the SIR model to study specific state dynamics in more detail. These variations of the SIR model are usually developed by appending additional compartments to the basic SIR model. The choice of model depends on the purpose of the study and on the data available to calibrate the model parameters, and at present it is made in an ad hoc manner.

We consider different integer-order, fractional-order and time-delay models for different variations of the SIR model. The integer-order models use simple derivative operators that do not account for the effect of memory in the evolution of dynamics; instead, the model parameters are treated as time-varying variables. However, many works have confirmed the power-law scaling feature of the dynamics of COVID-19 transmission19,20, which indicates the presence of memory effects in the spread dynamics. In the classical SIR model, the current number of infectious individuals has an exponential distribution in the infectious period, that is, \({I(t)}={{I}_{0}}{{\rm{e}}^{-\gamma t}}+{{\int\nolimits_{0}^{t}}{\beta S(u)I(u){{\rm{e}}^{-\gamma (t-u)}}{\rm{d}}u}}\), where S and \(\beta\) are the number of susceptible individuals and the transmission rate, respectively. This can be interpreted as the probability for an infected subject to remain infected at time t being \({{\rm{e}}^{-\gamma t}}\), where \({\frac{1}{\gamma }}\) refers to the mean recovery period21,22. Thus, the number of patients has an exponential distribution in the mean infection period. In a general setting, we can extend the SIR model into \({I(t)}={{I}_{0}{{\rm{e}}^{-\gamma t}}}+{\int\nolimits_{0}^{t}}{\beta S(u)I(u)\phi (t-u){{{\rm{d}}}}u}\), where ϕ is a probability function (non-increasing in t, ϕ(0) = 1) and the mean infection period is \({\int\nolimits_{0}^{+\infty }}{\phi (t){{{\rm{d}}}}t}\) (refs. 23,24). If we choose ϕ to be a step function then we arrive at a constant time-delay model. If we choose ϕ to be an exponential function then we recover a classical SIR model22. If we choose ϕ to be a power-law waiting time distribution then we obtain a fractional SIR model25. Other epidemiological models characterized by partial differential equations (PDEs) can be viewed as variations of the standard ODE models that include more variables such as spatial effects, health status and age26,27,28. 
Although they may provide a more realistic picture of disease via spatial diffusion and higher level parameterization, the PDE models lack structural identifiability. More importantly, the spatial data incompleteness and heterogeneity become a major issue for such models, making them impractical at this juncture. In this Article, we consider a total of nine different ODE models. In a general setting, if we let U(t) be the vector of all epidemiological classes considered in a model, then the coupled system of differential equations governing the dynamics of that model can be written as

$${{{{\mathcal{L}}}}{{{\bf{U}}}}(t)}={{{{\mathcal{F}}}}({{{\bf{U}}}},t;\lambda )},$$
(1)

where \({{{\mathcal{L}}}}\) is either an integer-order or a fractional-order temporal differential operator, \({{{\mathcal{F}}}}\) is a nonlinear operator and λ is the set of known/unknown model parameters. The epidemiological classes in the models include the number of individuals in susceptible (S), exposed (E), pre-symptomatic (P), quarantined (Q), infectious (I), asymptomatic (J), hospitalized (H), death (D) and recovered/removed (R) compartments. We consider the auxiliary cumulative compartments (Ic and Hc) to help fit the available data. The three crucial time-dependent parameters17 are the community transmission rate (βI(t)), the proportion of disease-related death from the H class (q(t)) and the proportion of hospitalized individuals (p(t)). The transmission rate of a disease, βI(t), is the per capita rate of infection when a contact occurs. Control measures implemented in different regions and the subsequent relaxation of restrictions can directly impact the incidence curve in different ways. Thus, estimating this value accurately is critical because it represents the effects of public health policies. The percentage of disease-related deaths, q(t), changes over the course of the outbreak mainly due to various public health policies and the discovery of better therapies and treatments, and other parameters take different values29. Similarly, the hospitalization ratio, p(t), also varies due to increased resources being channeled to the healthcare system in the city30,31. In Supplementary Fig. 4, we carry out a systematic study to demonstrate the need for treating the aforementioned three parameters as time-dependent rather than fixed.

Seven of the nine models are shown in Fig. 1 (the other two models are discussed in Supplementary Section 3). The first row in Fig. 1 shows the three integer-order variations of the classic SIR model and the transition graphs between the epidemiological classes in these models. The second row in Fig. 1 shows the three fractional models with different fixed/time-variable fractional orders. In particular, we use the Caputo fractional derivative32,33,34 of variable order κ(t) ∈ (0, 1). Let \(u(t)\in {C}^{1}\) be a differentiable function, the first derivative of which is continuous, and let Γ be the Gamma function; then the Caputo fractional derivative is given as

$${{\ }_{0}^{{{{\rm{C}}}}}{{{{\mathcal{D}}}}}_{t}^{\kappa (t)}u(t)}={\frac{1}{{{\varGamma }}(1-\kappa (t))}}{\int\nolimits_{0}^{t}{(t-s)}^{-\kappa (t)}{u^{\prime}} {(s)}{{{\rm{d}}}}s},$$
(2)

where the variable order κ(t) allows us to adjust the effect of non-locality as the dynamics evolves. A similar definition holds if the order is constant. The third row in Fig. 1 shows the time-delay model and different epidemiological classes. A detailed definition of the corresponding differential equations for all of these models and the complete list of parameters are provided in Supplementary Section 3.
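As a quick sanity check of the Caputo definition, consider the linear function u(t) = t, for which u′(s) = 1. This worked example is ours and is not taken from the Supplementary Information. Because the order κ(t) is evaluated at the current time t, it is constant inside the integral over s from 0 to t, and we obtain

$${{\ }_{0}^{{\rm{C}}}{\mathcal{D}}_{t}^{\kappa (t)}\,t}={\frac{1}{\varGamma (1-\kappa (t))}}{\int\nolimits_{0}^{t}}{(t-s)}^{-\kappa (t)}\,{\rm{d}}s={\frac{{t}^{1-\kappa (t)}}{(1-\kappa (t))\,\varGamma (1-\kappa (t))}}={\frac{{t}^{1-\kappa (t)}}{\varGamma (2-\kappa (t))}},$$

where the last step uses Γ(z + 1) = zΓ(z). As κ(t) → 1 the right-hand side tends to 1, the ordinary first derivative of u(t) = t, which illustrates how the order controls the departure from integer-order dynamics.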

Fig. 1: A plurality of epidemiological models.

Shown are sketches of the compartments as well as their mathematical definitions. Each gray box represents an epidemiological compartment and the arrows show the direction of population flow. Panels 1–3 are integer-order models, panels 4–6 are fractional-order models, and panel 7 is a time-delay model. Model \({{\mathbb{I}}}_{1}\), integer-order SEIJDHR; model \({{\mathbb{I}}}_{2}\), integer-order SEPIJDHR; model \({{\mathbb{I}}}_{3}\), integer-order SEPIJDHQR; model \({{\mathbb{F}}}_{1}\), fractional-order SIR; model \({{\mathbb{F}}}_{2}\), fractional-order SIDR; model \({{\mathbb{F}}}_{3}\), fractional-order SIHDR; model \({{\mathbb{T}}}_{1}\), time-delay SIJHDR. Details of the mathematical definitions are given in Supplementary Section 3.

Experimental set-up

We carried out our analysis based on different datasets for COVID-19 from New York City (NYC)31,35, Michigan state (MI)36,37, Rhode Island state (RI)38 and Italy39. In each case, the datasets were pre-processed by applying a seven-day moving-average window to smooth the weekday–weekend fluctuations in the reporting of the outbreak. Because not all of the epidemiological states are tractable in practice, the reported data of individuals are restricted to only a few compartments. Each dataset reports different epidemiological classes, leading to different formulations of PINNs (a detailed explanation of the different datasets for the PINN formulation is provided in Supplementary Section 6). For example, NYC data report the cumulative infectious Ic, hospitalized Hc and death Dc cases. The MI data instead report the current values of hospitalized individuals H, while the RI data report both the current H and the cumulative hospitalized individuals Hc. The Italy dataset reports the current I, R and D. Typically, the D compartment in the epidemiological models does not have an outflow and therefore we can have Dc = D. Given the cumulative values, we can obtain the daily increase in (new) values for infectious, hospitalized and death cases by Inew(t) = Ic(t) − Ic(t − 1), Hnew(t) = Hc(t) − Hc(t − 1) and Dnew(t) = Dc(t) − Dc(t − 1).
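The two pre-processing operations described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, and we assume a trailing (rather than centered) seven-day window, since the text does not specify how the window is aligned.

```python
import numpy as np

def moving_average(x, window=7):
    """Trailing moving average (assumed alignment).

    The first window-1 entries average over the days available so far.
    """
    x = np.asarray(x, dtype=float)
    csum = np.cumsum(np.insert(x, 0, 0.0))  # csum[i] = sum of x[:i]
    out = np.empty_like(x)
    for i in range(len(x)):
        lo = max(0, i - window + 1)
        out[i] = (csum[i + 1] - csum[lo]) / (i + 1 - lo)
    return out

def daily_new(cumulative):
    """New cases per day, X_new(t) = X_c(t) - X_c(t - 1).

    The first day has no predecessor and is returned as NaN.
    """
    c = np.asarray(cumulative, dtype=float)
    new = np.full_like(c, np.nan)
    new[1:] = np.diff(c)
    return new
```

Applied to a cumulative series such as Ic, `daily_new` gives Inew, which can then be passed through `moving_average` before fitting.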

In the following, we show different aspects of our formulation and the results based on the NYC and Italy datasets. We provide the results based on other datasets in Supplementary Section 4. We consider the effect of vaccination in integer-order and time-delay models by adding an extra connection from S to R with rate \({\frac{V(t)S(t)}{N}}\). Here, V(t) denotes the number of effectively vaccinated individuals at time t, which is calculated by V(t) = 0.52(D1(t) − D2(t)) + 0.95D2(t) (ref. 17). The terms D1 and D2 denote the number of individuals that have received the first and second doses of vaccine, respectively. The first dose of COVID-19 vaccine in the United States was administered on 14 December 2020. Accounting for a two-week delay in building immunity after the first dose of vaccination, we assume the number of effectively vaccinated individuals to be zero before 1 January 2021.
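The effective-vaccination formula can be written as a short helper. This is only a sketch: the function and argument names are hypothetical, and time is represented by an integer day index with a user-supplied cutoff standing in for 1 January 2021.

```python
import numpy as np

def effective_vaccinated(d1, d2, day, first_effective_day):
    """V(t) = 0.52 * (D1(t) - D2(t)) + 0.95 * D2(t), set to zero before the cutoff.

    d1, d2: counts of people who received the first and second dose.
    day: day index of each entry; first_effective_day: assumed cutoff index.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    v = 0.52 * (d1 - d2) + 0.95 * d2
    return np.where(np.asarray(day) < first_effective_day, 0.0, v)
```

For instance, with 200 first doses of which 50 people have also received the second dose, the formula gives 0.52 × 150 + 0.95 × 50 = 125.5 effectively vaccinated individuals.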

Simultaneous inference of unknown parameters and dynamics

The PINN formulation encodes the mathematical model into the network and forms the loss function, comprised of two parts. The first part is the mismatch between the network output and the available data, and the second part comprises the ODE residuals. By minimizing the loss function, we optimize the network parameters to learn the data and simultaneously infer the unobserved dynamics by satisfying the ODEs. The success of PINN in this setting lies in its flexibility in allowing the model parameters to be time-dependent functions represented by separate neural networks. For the integer-order models we carry out structural and practical identifiability analyses in Supplementary Section 5. We note that, in the fractional models, we infer the time-dependent fractional orders as we fix the parameters and let the fractional operators change in time.
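To make the two-part loss concrete, the following sketch evaluates a data-mismatch term and an ODE-residual term for the classical SIR model on a time grid. It is a simplified stand-in under several assumptions of ours: the residuals are those of a plain SIR model with incidence βSI/N rather than the models of Fig. 1, time derivatives are taken by finite differences instead of automatic differentiation of a network output, only the I compartment is observed, and the weights `w_data` and `w_ode` are hypothetical.

```python
import numpy as np

def sir_residuals(t, S, I, R, beta, gamma, N):
    """Residuals of the classical SIR equations on the grid t.

    A PINN would differentiate the network output automatically;
    np.gradient is used here only as an illustrative substitute.
    beta may be a scalar or an array beta(t) sampled on the same grid.
    """
    dS, dI, dR = np.gradient(S, t), np.gradient(I, t), np.gradient(R, t)
    infection = beta * S * I / N
    return (dS + infection,               # dS/dt = -beta*S*I/N
            dI - infection + gamma * I,   # dI/dt = beta*S*I/N - gamma*I
            dR - gamma * I)               # dR/dt = gamma*I

def pinn_style_loss(t, S, I, R, beta, gamma, N, I_data, w_data=1.0, w_ode=1.0):
    """Weighted sum of the data mismatch (on I) and the mean-squared ODE residuals."""
    data_term = np.mean((I - I_data) ** 2)
    ode_term = sum(np.mean(r ** 2) for r in sir_residuals(t, S, I, R, beta, gamma, N))
    return w_data * data_term + w_ode * ode_term
```

In an actual PINN, S, I, R and β(t) would be network outputs and this loss would be minimized over the network weights, so that fitting the data and satisfying the equations happen in the same optimization.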

We train PINNs for integer-order models \({{\mathbb{I}}}_{1}\), \({{\mathbb{I}}}_{2}\) and \({{\mathbb{I}}}_{3}\) and time-delay model \({{\mathbb{T}}}_{1}\) based on the NYC data31. In Fig. 2a we show the fitting of PINN formulations for the four models to the available data. The trained network can accurately fit the different fluctuations in the data. In Fig. 2b,c we show the inferred time-dependent parameters (βI(t), p(t), q(t)) and the inferred unobserved compartments for the four models, respectively. The inferred quantities follow broadly similar trends across the four models throughout the spread of the virus. Some of the models have additional compartments, for example, the Q compartment in model \({{\mathbb{I}}}_{3}\).

Fig. 2: PINNs inference using the integer-order models \({{\mathbb{I}}}_{1}\), \({{\mathbb{I}}}_{2}\) and \({{\mathbb{I}}}_{3}\) and time-delay model \({{\mathbb{T}}}_{1}\) for NYC.

a, Fitting (black line) to the available data (red dots) of daily infectious, hospitalized and death cases. b, Inference of time-dependent parameters (βI(t), p(t), q(t)) for each model. Here the inferred delay is d = 3.10 for the time-delay model \({{\mathbb{T}}}_{1}\). c, Inference of unobserved dynamics. Each graph shows the evolution of one epidemiological compartment. The Q and P compartments only exist in model \({{\mathbb{I}}}_{3}\) and models \({{\mathbb{I}}}_{2}\) and \({{\mathbb{I}}}_{3}\), respectively. d, Prediction and uncertainty quantification of daily infectious, hospitalized and death cases for a four-month time window using model \({{\mathbb{I}}}_{3}\). The vertical line divides the fitting and prediction window. The two curves and shaded area in the prediction window correspond to 10% and 20% standard deviations in βI. We have included the new available data (plus symbols) for the prediction period that were not used in the fitting to show that the predictions are correct.

Forecasting with uncertainty

The inferred model from PINNs yields a predictive model that we use to forecast future dynamics by considering different control measures for each model. In particular, we can predict the evolution of dynamics using the integer-order models \({{\mathbb{I}}}_{1}\), \({{\mathbb{I}}}_{2}\) and \({{\mathbb{I}}}_{3}\) and the time-delay model \({{\mathbb{T}}}_{1}\). Figure 2d shows the prediction of new infectious, hospitalized and death cases for NYC data based on model \({{\mathbb{I}}}_{3}\). In the prediction window, we fix the parameters βI(t), p(t) and q(t) at their final values from the training window. However, we postulate that the effect of several different control measures on the community transmission rate βI(t) can be captured by adding uncertainty bounds of 10% and 20% to its mean value in the prediction window, matching the standard deviations shown in Fig. 2d. By using standard forward propagation techniques, that is, Monte Carlo40 and probabilistic collocation methods41, we propagate the uncertainty into the prediction window. We also consider different vaccination scenarios by considering different vaccination rates (Vac. rate) in calculating the effective vaccination per day (V(t)) in the prediction window by

$${V(t)}=\left\{{\begin{array}{ll}0,&{t\in [0,\,{t}_{{{\rm{Vac.}}\,{\rm{start}}}}}),\\ (\,{{\rm{Vac.}}\,{\rm{rate}}})\times {(t-{t}_{{{\rm{Vac.}}\,{\rm{start}}}}}),&{t\in [{t}_{{{\rm{Vac.}}\,{\rm{start}}}},\,{t}_{{{\rm{Vac.}}\,{\rm{cap}}}})},\\ \,{{\rm{Vac.}}\,{\rm{cap}}}\,,&{t\ge {t}_{{{\rm{Vac.}}\,{\rm{cap}}}}}.\end{array}}\right.$$
(3)

In particular, for the NYC dataset35, we set the starting day of the effective vaccination as tVac.start = 300 and we cap (saturate) the effective value of vaccination by Vac. cap = 30,000. We consider two vaccination rates and adjust tVac.cap; for example, Vac. rate = 600 and tVac.cap = 350.
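The ramp-and-cap scenario of equation (3) can be sketched as a clipped linear function. The function name is hypothetical, and the sketch assumes that tVac.cap is the time at which the linear ramp reaches the cap, which holds for the quoted example because 600 × (350 − 300) = 30,000.

```python
import numpy as np

def vaccination_scenario(t, vac_rate, t_start, vac_cap):
    """Piecewise-linear V(t): zero before t_start, linear ramp, then saturated at vac_cap.

    Assumes t_cap = t_start + vac_cap / vac_rate, so clipping the ramp
    reproduces all three branches of the piecewise definition.
    """
    t = np.asarray(t, dtype=float)
    ramp = vac_rate * (t - t_start)
    return np.clip(ramp, 0.0, vac_cap)

# NYC example from the text: start at day 300, rate 600, cap 30,000 (reached at day 350).
V = vaccination_scenario([299, 300, 325, 350, 400], 600, 300, 30_000)
```

Changing `vac_rate` while keeping the cap fixed shifts the saturation day accordingly, which is how the different scenarios would be generated under this assumption.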

In Supplementary Fig. 5 we investigate three different approaches of extrapolating to predict the short-term dynamics (about two weeks). In addition, using the new available data that we did not use in the original training of the model (beyond 1 March 2021), we compare models \({{\mathbb{I}}}_{1}\), \({{\mathbb{I}}}_{2}\) and \({{\mathbb{I}}}_{3}\) and we find model \({{\mathbb{I}}}_{3}\) to give the best prediction.
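The Monte Carlo propagation of uncertainty in the transmission rate can be illustrated with a toy forward model. The sketch below is not the forecasting code used for Fig. 2d: it assumes a classical SIR model integrated by forward Euler in place of model \({{\mathbb{I}}}_{3}\), a Gaussian distribution for the transmission rate with a standard deviation given as a fraction of its mean, and hypothetical function names and parameter values.

```python
import numpy as np

def simulate_sir(beta, gamma, S0, I0, R0, days, dt=0.1):
    """Forward-Euler integration of the classical SIR model; returns I at each day."""
    N = S0 + I0 + R0
    S, I, R = float(S0), float(I0), float(R0)
    steps_per_day = int(round(1.0 / dt))
    daily_I = [I]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * S * I / N * dt
            new_rec = gamma * I * dt
            S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
        daily_I.append(I)
    return np.array(daily_I)

def monte_carlo_band(beta_mean, rel_std, gamma, S0, I0, R0, days,
                     n_samples=200, seed=0):
    """Sample beta ~ Normal(beta_mean, rel_std * beta_mean) and propagate each sample.

    Returns the mean trajectory and the 2.5th/97.5th percentile envelopes of I(t).
    """
    rng = np.random.default_rng(seed)
    betas = rng.normal(beta_mean, rel_std * beta_mean, size=n_samples)
    betas = np.clip(betas, 0.0, None)  # a transmission rate cannot be negative
    runs = np.array([simulate_sir(b, gamma, S0, I0, R0, days) for b in betas])
    return runs.mean(axis=0), np.percentile(runs, 2.5, axis=0), np.percentile(runs, 97.5, axis=0)
```

Calling `monte_carlo_band` once with `rel_std=0.1` and once with `rel_std=0.2` gives two nested envelopes around the mean forecast, analogous in spirit to the two uncertainty levels shown in the prediction window.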

Fractional models

The structure of PINNs for fractional models is usually more complex than for the other models. This is mainly because of the fractional operators, for which the automatic differentiation technology is not applicable, and we hence need to resort to numerical discretization, such as the L1 scheme42,43. This formulation, namely fractional PINNs (fPINNs), was first developed by Pang et al.44 and we extend it to apply to our problem. The other source of the complexity of the PINN structure for fractional models is the time-dependent fractional orders. The PINN formulation uses a separate neural network to represent each fractional order κi(t). In particular, Fig. 3 shows the fPINNs schematic for the fractional model \({{\mathbb{F}}}_{3}\). We use a (relatively large) neural network to represent the states \({{{{\bf{U}}}}}^{({{\mathbb{F}}}_{3})}{(t)}\) (the green-shaded box) and five separate neural networks to represent the fractional orders κi(t), i = 1, ..., 5 (the red-shaded boxes). We discuss our methodology in the Methods and provide the formulation in Supplementary Section 6. The PINN formulation makes the inverse problem more efficient as it uses the network to represent the time-dependent parameters and integrates data and differential equations through the loss function. Therefore, it provides a flexible computational tool to perform inference in fractional models.
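For readers unfamiliar with the L1 scheme, the sketch below shows its standard form for a Caputo derivative of order in (0, 1) on a uniform grid, with the order evaluated at each node to mimic a variable order κ(t). It is a generic textbook discretization written by us for illustration, with hypothetical names, and is not the fPINN implementation described in Supplementary Section 6.

```python
import math
import numpy as np

def caputo_l1(u, h, kappa):
    """L1 approximation of the Caputo derivative on the grid t_n = n * h.

    u: samples u(t_0), ..., u(t_M); kappa: scalar order or array of per-node
    orders in (0, 1). The value at t_0 is set to zero.
    """
    u = np.asarray(u, dtype=float)
    kappa = np.broadcast_to(np.asarray(kappa, dtype=float), u.shape)
    du = np.diff(u)                      # u_{k+1} - u_k
    out = np.zeros_like(u)
    for n in range(1, len(u)):
        a = kappa[n]
        j = np.arange(n)                 # j = n - k - 1 for k = n-1, ..., 0
        b = (j + 1) ** (1 - a) - j ** (1 - a)
        # du[n-1::-1] lists the increments from the most recent back to the first
        out[n] = h ** (-a) / math.gamma(2 - a) * np.dot(b, du[n - 1::-1])
    return out
```

The history sum over all previous increments is what encodes memory: every earlier time step contributes to the derivative at t_n with a power-law weight. Because the scheme assumes u is piecewise linear between nodes, it is exact for linear functions, which provides a convenient check.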

Fig. 3: Schematic of the physics-informed neural network for fractional model \({{\mathbb{F}}}_{3}\).

‘NN’ represents six different neural networks. The green-shaded NN represents the state \({{{{\bf{U}}}}}^{({{\mathbb{F}}}_{3})}{(t)}\) to fit the available data \({{{{\bf{U}}}}}^{({\mathbb{D}})}{(t)}\) and infer the unobserved dynamics. The red-shaded NNs represent different fractional orders κi(t), i = 1, ..., 5, to infer their time dependence. The ‘ODE residuals’ box represents computation of the residual of fractional model \({{\mathbb{F}}}_{3}\). The box marked ‘Loss’ comprises two parts; the mismatch between available data and NN output and the ODE residual. By minimizing the loss function \({L}^{({{\mathbb{F}}}_{3})}\), the NNs simultaneously fit the data and infer the unobserved dynamics and fractional orders by satisfying the system of ODEs. See Supplementary Fig. 9 in Supplementary Section 6 for similar PINNs for integer-order models.

From the available data for NYC, we only have the new infectious Inew, hospitalized Hnew and death Dnew cases. This is a limited amount of data for the fractional model \({{\mathbb{F}}}_{3}\), which has five unobserved compartments and five time-dependent parameters; the model therefore suffers from an identifiability problem and its parameters cannot be identified uniquely. We describe this issue in Fig. 4 via numerical experiments. We consider the fractional model \({{\mathbb{F}}}_{3}\) and its integer-order counterpart (by letting κi(t) = 1, i = 1, ..., 5 and representing βI(t) by a separate neural network) for comparison. We plot the results of fractional model \({{\mathbb{F}}}_{3}\) against the results of the integer-order counterpart in Fig. 4a–c by using different amounts of data.

Fig. 4: Identifiability study for PINNs for fractional-order model \({{\mathbb{F}}}_{3}\).

Instead of time-dependent parameters, here we aim to infer different time-dependent fractional operators for κi(t), i = 1, ..., 5. a–c, In the first row are plotted the inferred unobserved dynamics from fPINNs against the corresponding simulation results of the integer-order SIHDR model. In the second row, we show the inferred corresponding time-dependent fractional orders for different available data for NYC. In a, we employ only the available data on Inew, Hnew and Dnew for NYC. In b, in addition to the measures in a, we also assume that we have additional simulated data on I and H. In c, we have additional simulated data on S, I, H, D and R. The uncertainty bands shown reflect the robustness of each case. In each panel in the top rows, black curves show the results for model \({{\mathbb{F}}}_{3}\) and red the results for the corresponding integer-order model.

In Fig. 4a we only use the available data (Inew, Hnew, Dnew) to train the network. We can observe that the inferred dynamics and the corresponding fractional order have relatively large uncertainty bounds and are different from the results of the integer-order model. We note, however, that the D compartment has the smallest uncertainty bound. This is because the D compartment does not have any outflow and hence D = Dc. It is counterintuitive, however, that its corresponding fractional order has the largest uncertainty bounds. This is due to coupling of the dynamics of the D compartment with other H and R compartments that contain large uncertainty bounds. In Fig. 4b, in addition to the data in Fig. 4a, we use the simulated data of the I and H compartments. These values are not available in practice for the NYC data, so we obtain them from the corresponding integer-order model and use them to investigate the identifiability of fractional model \({{\mathbb{F}}}_{3}\). We can see that the inference can be improved by including the additional simulated data. In Fig. 4c we show that, if we add extra data we can further reduce the uncertainty bounds in the inferred time-dependent parameters. It is interesting to observe in Fig. 4c that a fractional model with time-dependent fractional order can resemble the dynamics of its integer-order counterparts with time-dependent parameters. This example supports the two alternate paradigms of modeling philosophy, discussed at the beginning of this Article.

We note that, although the fractional models are capable of accounting for memory effects via non-local operators, and their application has been motivated in the literature to a great extent, the crucial but less-studied element is the development of a robust computational framework to implement the inverse problem of inferring the time-dependent fractional orders. For example, Jahanshahi et al.19 suggest the use of time-dependent variable fractional orders; however, in their implementation, the fractional orders are parameterized with a few constant parameters as piecewise constant functions to render the inverse problem feasible in their formulation. The under-parameterization of fractional orders may lead to inaccurate results. More importantly, when different fractional orders are considered, the total population is not conserved automatically by the system of ODEs and should be enforced in the simulation (more details are provided in the Discussion). We show here that our formulation has the capacity to efficiently extend the parameterization of time-dependent fractional orders via neural networks, and we implement an efficient computational framework to infer them from data. For comparison, we consider the identifiability of fractional model \({{\mathbb{F}}}_{2}\) based on the Italy data. This is a practical example of the case discussed in Fig. 4c. The Italy data report the current values I, R and D and S = N − I − D − R, which comprise the four compartments in the fractional model \({{\mathbb{F}}}_{2}\) with time-dependent fractional orders. Because we have access to a large amount of data in this case, we do not fix the infection rate parameter (r(t), Supplementary Tables 1 and 3) and let it be represented by a separate neural network. The left column in Fig. 5a shows the accurate fitting to the available data and the right column shows the inferred time-varying parameters κi(t), i = 1, 2, 3, 4, and r(t). 
These results demonstrate that, even in the case with access to data for all compartments, the inference of fractional orders is not very robust and still exhibits some uncertainties, which is a reflection of the lack of identifiability of the model. The choice of piecewise constant parameters in the work of Jahanshahi et al.19 simplifies the problem but leads to erroneous results.

Fig. 5: PINNs results of the fractional-order model \({{\mathbb{F}}}_{2}\) based on Italy data.

a, fPINN fitting (black) to the available data for S, I, R and D (red). In each graph, we also show the fPINN inference of the corresponding time-dependent fractional order (blue) and compare it with the piecewise fractional orders (dashed blue) from ref. 19. b, Comparison of the piecewise parameter r(t) (blue) and the fPINN inference of the time-dependent parameter r(t) (black). The piecewise fractional orders and parameters are provided in ref. 19. The other parameters of the model are fixed as a = 0.0215 and b = 0.0048 (ref. 19) in the simulation.

Discussion

One of the main drawbacks of lumped-compartment epidemiological models is the lack of uniqueness of the parameters, especially when mathematical models with time-dependent parameters are used to explore possible future scenarios for control measures, evaluate retrospectively the efficacy of specific interventions and identify prospective strategies45. At the present time there are no rigorous a priori identifiability analyses for integer-order models with time-dependent parameters or time-dependent fractional orders. Although this imposes a limitation on the underlying mathematical formulation, in the present work we resorted to systematic numerical experimentation to provide a posteriori some measures of identifiability using uncertainty quantification. We have learned some useful lessons by fitting the epidemic data using nine different epidemiological models (five integer-order, one time-delay and three fractional-order models; Supplementary Section 3). In the following, we summarize and discuss some observations related to our findings.

The different sources of uncertainty are the primary limitations to the efficiency of epidemiological models. The novelty of coronavirus naturally leads to many uncertainties1, there being many unknown factors, including the biological features of transmission, the different mutations of the virus and pathogen behavior and concentrations. The most systematic source of uncertainty that can adversely affect almost all mathematical models’ outputs is that the exact number of infected individuals is unknown. The onset of an outbreak always precedes the start of data reporting, so there is a lack of initial data at the beginning of the outbreak and inaccurate parameter estimation in all models during that initial period, especially for those models with memory effects. For example, if we use the parameters estimated in the paper by Hao et al.46 for fitting data up to the present time, the results are not entirely accurate. The reporting practices for COVID-19 case data are not consistent among different populations and the reported data contain intermittent artifacts as a result of weekday/weekend effects (Fig. 6a). For example, data reporting in NYC differs from that in RI, despite the geographic proximity, and the same model may perform differently if applied to datasets from those two sources. Accordingly, because of the multitude of uncertainties, quantifying the parametric input uncertainty is not sufficient5. Consequently, model selection becomes the major issue, in our opinion, not only in terms of how to best fit the data but also in relation to how robust the model is, given these uncertainties.

Fig. 6: The main challenges for epidemiological models.

a, Number of daily infectious cases (Inew) for NYC, with blue representing the raw reported data and red representing the seven-day average. b, Additional data on currently hospitalized individuals for MI, available only in some states, used to fit the parameter q(t), which expresses the proportion of disease-related deaths from the H class. c, The inherent time delay in the only available data for NYC. d, Inferred time-varying parameters (βI(t), p(t), q(t)) for the integer-order and delay models \({{\mathbb{I}}}_{1}\), \({{\mathbb{I}}}_{2}\), \({{\mathbb{I}}}_{3}\) and \({{\mathbb{T}}}_{1}\) based on NYC data. e, Although there are a priori identifiability analyses for integer-order models, there is no identifiability methodology for fractional models. Shown is a comparison of the inferred dynamics of models \({{\mathbb{F}}}_{1}\) and \({{\mathbb{I}}}_{1}\) for NYC data. The shaded region is the uncertainty of the fractional model. f, Inferred time-dependent fractional orders of model \({{\mathbb{F}}}_{3}\) using simulated data based on the integer-order model for NYC. g, Left: plot of the inferred dynamics from integer-order and time-delay models, which show differences but consistent trends. Right: inferred dynamics of model \({{\mathbb{F}}}_{1}\), which shows an erroneous trend due to the lack of identifiability (window in yellow). The SIHDR model in e is defined as model \({{\mathbb{I}}}_{5}\) in Supplementary Section 3.4.

The use of fixed parameters in lumped-compartment epidemiological models is not appropriate because of the long duration of the pandemic and the associated multi-rate dynamics (Supplementary Fig. 4). Indeed, as the disease symptoms, concentration and behavior of the pathogen are changing throughout the course of the disease47,48, the models should accommodate time-dependent parameters. In our formulation, the integer-order models account for this time variability (Fig. 6d). Additionally, the fractional models include different memory effects for each epidemiological class by considering different time-dependent fractional orders (Fig. 6f). In Supplementary Fig. 4 we show that we can easily infer the fixed parameters by fitting the data; however, due to the non-uniqueness of the problem, running the models forward with the inferred parameters leads to erroneous dynamics. This error is due to the identifiability issue for fractional models. The identifiability of integer-order models with piecewise constant parameters has been systematically investigated in the work by Zhang and colleagues17. However, such an analysis is not available for the fractional models, which may exhibit higher uncertainty (Fig. 6e). Here, we have investigated the robustness of fractional-order models and we have concluded that, due to their fragility, if any fractional model should be considered, its identifiability must be investigated at least numerically. Even if fitted accurately to the available data, a non-identifiable model may still show some serious artifacts that are not intuitively consistent (Fig. 6g). It is also important to comment on dimensional consistency in fractional models. In general, the dynamics of each compartment in the fractional epidemiological model can be obtained by

$${\frac{{({{\Delta }}t)}^{\kappa -1}}{{{\varGamma }}(1+\kappa )}}{}^{\rm{C}}{{{{\mathcal{D}}}}}_{t}^{\kappa }{u(t)}=\,{{\rm{inflow}}}-{{\mbox{outflow}}}\,$$

where the timescale Δt denotes a characteristic time of observation, which amounts to a built-in scale effect. The scale effect vanishes as κ → 1, because \({({{\Delta }}t)}^{\kappa -1}\to {({{\Delta }}t)}^{0}=1\), and thus the equation recovers the classical continuity equation. This scaling factor is also useful in practice as it ensures the dimensional consistency of the units in the fractional model (more details are provided in Supplementary Section 3.3). Another consideration for the fractional models is the conservation of total population. In most integer-order models, the conservation of the total population is satisfied automatically by the system of ODEs. This is not the case for the fractional models when different fractional orders are considered. We have observed that, in many published studies using fractional modeling, this fact has been overlooked and population conservation has not been included in the system of equations. This can result in erroneous model output, especially because compartmental models rely on this assumption.
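As a small numerical illustration of the two points above, the following Python sketch evaluates the prefactor \({({{\Delta }}t)}^{\kappa -1}/{{\varGamma }}(1+\kappa )\) for several fractional orders and imposes population conservation explicitly through S = N − I − R − D. The function names, the value of Δt and the compartment counts are illustrative assumptions, not values taken from this Article.

```python
import math


def fractional_scale_factor(kappa: float, dt: float) -> float:
    """Prefactor (dt)^(kappa - 1) / Gamma(1 + kappa) that multiplies the Caputo derivative."""
    return dt ** (kappa - 1.0) / math.gamma(1.0 + kappa)


def susceptible_from_conservation(n_total, infected, recovered, dead):
    """Impose S = N - I - R - D explicitly rather than relying on the ODE system."""
    return n_total - infected - recovered - dead


dt = 7.0  # hypothetical characteristic observation time (days)
for kappa in (0.7, 0.9, 0.99, 1.0):
    # The factor tends to 1 as kappa -> 1, recovering the integer-order balance.
    print(f"kappa = {kappa:4.2f}  scale factor = {fractional_scale_factor(kappa, dt):.6f}")

# Hypothetical compartment counts, used only to show the conservation constraint.
s_value = susceptible_from_conservation(n_total=1_000, infected=10, recovered=5, dead=1)
print("S from conservation:", s_value)
```

For κ = 1 the prefactor equals exactly 1 for any Δt, which is the limit described in the text; for κ < 1 it depends on the chosen Δt, which is the built-in scale effect.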

The purely statistical approaches are generally not well suited for long-term predictions of epidemiological dynamics (for example, Supplementary Fig. 1). They do not have the ability to account for how the transmission occurs, so most of the forecasting models limit their projections to one week or a few weeks ahead49. The mechanistic models, if their time-dependent parameters are inferred correctly from data, are often helpful to quantify different sources of uncertainty and examine the implications of various assumptions and control measures about a highly nonlinear process that is hard to predict using only intuition or statistical models. However, they are constrained by their limitations, what we assume and what we do not know. For example, the proportion of hospitalized individuals, p(t), may reach an asymptotic value after some time, but the community transmission rate, βI(t), is highly influenced by control measures, mobility, mutant variations and so on, and hence we specify it as a random variable for future predictions (Fig. 2d). More specifically, we have investigated three different approaches to extrapolate for short- and long-term predictions (Supplementary Figs. 5 and 6), and we have found that by using PINNs we can extrapolate the time-dependent parameters, at least for the short term, and this would lead to more accurate predictions compared to other approaches. Furthermore, we have evaluated the integer-order models, which seem to be more robust than the fractional-order models, and among them the model \({{\mathbb{I}}}_{3}\) gives the best prediction (Supplementary Fig. 6) based on the new available data from March to June, which we have not used in our previous training.

Given the observations and limitations discussed above, we recommend using multiple models with time-dependent parameters, tailored to a specific region and to the available data, and quantifying both parametric uncertainty and model uncertainty. The use of both structural and practical identifiability analyses could help to determine the proper parameter ranges and the consistency of the inferred values.

Methods

PINNs were first introduced in the work of Raissi and colleagues16. Since then, PINNs have been successfully applied in solving forward and inverse problems in many practical applications50,51,52,53,54,55,56,57. An analysis and convergence study of PINNs was carried out by Shin and others58. Figure 3 presents a schematic of such PINNs that is almost the same for all the models considered in this Article. The depicted structure in Fig. 3 is specifically related to the fractional-order model \({{\mathbb{F}}}_{3}\) with five time-dependent parameters.

In the PINN formulation, we use separate deep neural networks with input t to represent the states U(t) and the (time-dependent) parameters. Each network is parameterized by a set of parameters Θ, namely the weights and biases of the network. Thus, we let U(t) ≈ UNN(t; ΘU) (green-shaded box, Fig. 3). For integer-order models \({{\mathbb{I}}}_{1}\), \({{\mathbb{I}}}_{2}\) and \({{\mathbb{I}}}_{3}\) and time-delay model \({{\mathbb{T}}}_{1}\), we let \({{\beta }_{I}}{(t)}\approx {{\beta }_{I}}_{\rm{NN}}{(t;\,{{{\varTheta }}}_{\beta })}\), p(t) ≈ pNN(t; Θp) and q(t) ≈ qNN(t; Θq). For fractional-order models \({{\mathbb{F}}}_{1}\), \({{\mathbb{F}}}_{2}\) and \({{\mathbb{F}}}_{3}\), we let \({\kappa }_{i}{(t)}\approx {{\kappa }_{i}}_{\rm{NN}}{({t;}\,{{{\varTheta }}}_{{\kappa }_{i}})}\) for i = 1, ..., 5 (red-shaded boxes, Fig. 3). Using equation (1), we define the residual of the equation as \({{{{\mathcal{R}}}}}_{\rm{NN}}{(t)}={{{\mathcal{L}}}}{{{{\bf{U}}}}}_{\rm{NN}}{(t)}-{{{\mathcal{F}}}}{({{{{\bf{U}}}}}_{\rm{NN}},{t;\lambda} )}\) (orange-shaded box, Fig. 3). This residual measures how well the approximation UNN satisfies the ODEs; ideally, the exact solution is recovered when the residual is identically zero. We define two finite sets of points: the training points \({\{{t}_{\rm{u}}^{j}\}}_{j = 1}^{{N}_{\rm{u}}}\) and the residual points \({\{{t}_{\rm{r}}^{j}\}}_{j = 1}^{{N}_{\rm{r}}}\). The training points are the points where data are available, whereas the residual points are the points where the residual \({{{{\mathcal{R}}}}}_{\rm{NN}}(t)\) is enforced; the latter can be chosen freely anywhere in the computational domain. Therefore, we define the loss function of the PINN as

$$\begin{array}{rcl}{L({{\varTheta }},\lambda )}&=&{{\omega }_{\rm{u}}}\ {{{{{\rm{m.s.e.}}}}}_{\rm{u}}}+{{\omega }_{\rm{r}}}\ {{{{{\rm{m.s.e.}}}}}_{\rm{r}}}\\ &=&{{\omega }_{\rm{u}}}{\frac{1}{{N}_{\rm{u}}}}{\mathop{\sum }\limits_{j=1}^{{N}_{\rm{u}}}}{\left|{{{{\bf{U}}}}}_{\rm{NN}}{({t}_{\rm{u}}^{j})}-{{{\bf{U}}}}{({t}_{\rm{u}}^{j})}\right|}^{2}+{{\omega }_{\rm{r}}}{\frac{1}{{N}_{\rm{r}}}}{\mathop{\sum }\limits_{j=1}^{{N}_{\rm{r}}}}{\left|{{{{\mathcal{R}}}}}_{\rm{NN}}{({t}_{\rm{r}}^{j})}\right|}^{2},\end{array}$$
(4)

where m.s.e. is the mean squared error. The loss function of PINNs is composed of two terms, m.s.e.u and m.s.e.r, which are defined by the second line of the above equation. Parameters {ωu, ωr} denote the weight coefficients in the loss function, which balance the optimization effort between fitting the data and satisfying the ODEs. They may be user-specified or tuned, manually or automatically, for example on the basis of numerical experiments for each problem59.
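The structure of equation (4) can be sketched in a few lines of Python. The example below is only a toy under stated assumptions: a closed-form candidate exp(−rate·t) stands in for the network UNN, the governing equation is a hypothetical scalar decay ODE u′ = −λu rather than any of the epidemiological models, and all names (`pinn_loss`, `candidate`, `loss_for`) are introduced here for illustration.

```python
import numpy as np


def pinn_loss(u_pred, u_data, residual, w_u=1.0, w_r=1.0):
    """Weighted sum of the data misfit and the ODE residual, mirroring equation (4)."""
    mse_u = np.mean(np.sum((u_pred - u_data) ** 2, axis=-1))  # average over training points
    mse_r = np.mean(np.sum(residual ** 2, axis=-1))           # average over residual points
    return w_u * mse_u + w_r * mse_r


lam = 0.5                                  # hypothetical decay rate of the toy ODE u' = -lam * u
t_u = np.linspace(0.0, 4.0, 9)[:, None]    # training points, where data are available
t_r = np.linspace(0.0, 4.0, 41)[:, None]   # residual points, chosen freely in the domain


def candidate(t, rate):
    """Closed-form stand-in for the network output U_NN(t)."""
    return np.exp(-rate * t)


def candidate_dt(t, rate):
    """Time derivative of the stand-in (a PINN would obtain this by automatic differentiation)."""
    return -rate * np.exp(-rate * t)


u_data = candidate(t_u, lam)               # synthetic, noise-free "observations"


def loss_for(rate, w_u=1.0, w_r=1.0):
    """Evaluate the two-term loss for a candidate with the given rate."""
    residual = candidate_dt(t_r, rate) + lam * candidate(t_r, rate)
    return pinn_loss(candidate(t_u, rate), u_data, residual, w_u, w_r)


exact_loss = loss_for(lam)    # candidate matches both the data and the ODE
wrong_loss = loss_for(0.8)    # candidate violates both terms
print(exact_loss, wrong_loss)
```

Changing `w_u` and `w_r` rescales the two contributions without moving the zero of the loss, which is why the weights affect the optimization path rather than the ideal solution.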

In all cases, the derivatives of the network with respect to the input t and the network parameters Θ are computed by applying the chain rule for differentiating compositions of functions, using automatic differentiation60. In particular, we use TensorFlow61 for automatic differentiation and deep learning computations. Detailed derivations of the loss functions for different models and different datasets are provided in Supplementary Section 6.
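To make the chain-rule step concrete, the following Python sketch differentiates a small network with respect to its input t by hand and checks the result against a central finite difference. It does not reproduce the TensorFlow implementation used in this work: the one-hidden-layer tanh architecture, its width and the random weights are assumptions chosen purely for illustration, and automatic differentiation would produce the same derivative without the manual formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer network u(t) = W2 @ tanh(W1 @ t + b1) + b2
W1 = rng.normal(size=(8, 1))
b1 = rng.normal(size=(8, 1))
W2 = rng.normal(size=(1, 8))
b2 = rng.normal(size=(1, 1))


def network(t):
    """Evaluate the network; t has shape (1, n) and the output has shape (1, n)."""
    return W2 @ np.tanh(W1 @ t + b1) + b2


def network_dt(t):
    """du/dt via the chain rule: W2 @ [(1 - tanh^2(z)) * W1], with z = W1 @ t + b1."""
    hidden = np.tanh(W1 @ t + b1)
    return W2 @ ((1.0 - hidden ** 2) * W1)


t = np.linspace(0.0, 1.0, 5)[None, :]
step = 1e-5
finite_difference = (network(t + step) - network(t - step)) / (2.0 * step)
max_error = np.max(np.abs(network_dt(t) - finite_difference))
print("max |chain rule - finite difference| =", max_error)
```

The analytic derivative is what enters the residual at each residual point; the finite-difference comparison is only a sanity check on the formula.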