Tracking the time course of reproduction number and lockdown’s effect on human behaviour during SARS-CoV-2 epidemic: nonparametric estimation

Understanding the SARS-CoV-2 dynamics has been subject of intense research in the last months. In particular, accurate modeling of lockdown effects on human behaviour and epidemic evolution is a key issue in order e.g. to inform health-care decisions on emergency management. In this regard, the compartmental and spatial models so far proposed use parametric descriptions of the contact rate, often assuming a time-invariant effect of the lockdown. In this paper we show that these assumptions may lead to erroneous evaluations on the ongoing pandemic. Thus, we develop a new class of nonparametric compartmental models able to describe how the impact of the lockdown varies in time. Our estimation strategy does not require significant Bayes prior information and exploits regularization theory. Hospitalized data are mapped into an infinite-dimensional space, hence obtaining a function which takes into account also how social distancing measures and people’s growing awareness of infection’s risk evolves as time progresses. This also permits to reconstruct a continuous-time profile of SARS-CoV-2 reproduction number with a resolution never reached before in the literature. When applied to data collected in Lombardy, the most affected Italian region, our model illustrates how people behaviour changed during the restrictions and its importance to contain the epidemic. Results also indicate that, at the end of the lockdown, around \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$12\%$$\end{document}12% of people in Lombardy and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$5\%$$\end{document}5% in Italy was affected by SARS-CoV-2, with the fatality rate being 1.14%. Then, we discuss how the situation evolved after the end of the lockdown showing that the reproduction number dangerously increased in the summer, due to holiday relax, reaching values larger than one on August 1, 2020. Finally, we also document how Italy faced the second wave of infection in the last part of 2020. Since several countries still observe a growing epidemic and others could be subject to other waves, the proposed reproduction number tracking methodology can be of great help to health care authorities to prevent SARS-CoV-2 diffusion or to assess the impact of lockdown restrictions on human behaviour to contain the spread.

www.nature.com/scientificreports/ lockdowns and to predict their impact on people's behaviour is key in order to inform health-care decisions on emergency management. This would allow to design better control strategies on the epidemic curve, by gaining insight on the number of future people who could need medical treatments. Modeling also allows to better assess the total number of infected, including also asymptomatic people, and the fatality rate associated to COVID-19. Motivated by the above arguments, mathematical modeling of SARS-CoV-2 dynamics has been subject of intense research in the last months 7 . An important class is that of compartmental models where the population is assumed well-mixed and divided into categories. A notable example is the SIR model which includes three compartments with susceptible (S), infected (I) and removed (R) individuals 8 . To describe more complex dynamics, SIR variants can be found e.g. in [9][10][11][12][13] where additional phenomena, like the increasing of vaccination rate, are included. More recent extensions focus on COVID-19 pandemic and are described in [14][15][16][17][18] where e.g. public perception of the risk and delays effects in lockdown's setting are studied. An eight-compartmental model, called SIDARTHE, has been also proposed in 19 . By following some ideas introduced to describe SARS dynamics in 2004 20 , it increases SIR complexity to discriminate between detected and undetected cases of infection. Another important class is the so-called spatially explicit models 21,22 . They mitigate homogeneity assumptions by introducing compartments connected through transmission parameters to describe infection along both time and space 23,24 . Spatial models that describe COVID-19 spread can be found, e.g., in 25 and 26 where, beyond epidemiological measurements, information on people mobility is also exploited. More sophisticated models have been also proposed which include single individual dynamics, i.e. the so called network models; however their identification is especially challenging since they contain a large number of unknown parameters [27][28][29] .
A common feature of all the above models is the presence of an important parameter which describes the virus transmission rate and takes into account also the level of social interactions. We will denote it by a(t), stressing its dependence on time t. To grasp its role, for sake of simplicity a time-varying version of the SIR model can be considered. Using b to denote another parameter regulating the healing/death rate, the differential equation which governs the number of infected is One can thus see that the contact rate a(t) establishes the interaction level between susceptible S(t) and infected I(t) people, hence regulating the virus transmission rate. The ratio defines a fundamental epidemiological variable, called reproduction number, which represents the average number of infections per infected case and, thus, measures the disease infectivity level.
Since the restrictions set to contain COVID-19 outbreak aim to reduce the value of the reproduction number, one fundamental issue is to characterize in mathematical terms their impact on a(t). In this regard, compartmental or spatial models use parametric descriptions of a(t) by introducing an unknown vector of finite dimension. In particular, e.g. as described in 6,7,26 , it is common practice to adopt just two parameters a 1 , a 2 to quantify the two different levels of social interactions present before and during the lockdown.
Letting t * indicate the lockdown's instant, the time-course of a(t) is so given by An extension can be found in 30 , where the reproduction number is assumed piecewise constant on a finite number of intervals. The contact rate is allowed to change its value only when new restraints are introduced. Estimation is then performed using a stochastic hierarchical model which however requires significant Bayesian prior information. Finally, an approach whose unique aim is to reconstruct the reproduction number over time (without using compartmental models) can be found in 31 . Some prior distributions needs to be introduced on the regeneration times and estimates based on the observed time of symptom onset are then obtained.
The main novelty present in this work consists of assuming that a(t) belongs to an infinite-dimensional space containing a very rich class of functions able to approximate any continuous time-course. Then, we will develop a new (purely data driven) machine learning technique which, for the first time, reconstructs a continuous-time profile of the contact rate using compartmental models.
Our model is graphically depicted on the top of Fig. 1 where two different kinds of (unknown) variables are introduced. The first one is an unknown finite-dimensional vector θ which can e.g. contain the parameter b entering (1). The second one is the function a(t) which indeed comes from an infinite-dimensional space. These two variables fully define a compartmental model whose outputs include the temporal profile of infected I(t) and of removed R(t) (who represent people who die or heal after being infected). We stress that the function a(t) may complement any kind of compartmental (or also spatial) model, the time-varying SIR being just one example. Hence, Fig. 1 defines an entire novel class of nonparametric compartmental models. The bottom part of the same figure graphically describes a class of nonparametric estimators able to identify such models using epidemiological data like e.g. the number of diagnosed infected or hospitalized.
As a case study to show the potential of this novel non-parametric modeling identification strategy we consider the Italian scenario, and in particular the Lombardy region which has been the most affected (see also Results). Many people in Italy underwent screening for COVID-19 starting from the end of February 2020. Data are publicly available and are collected on a daily basis 32 . Some of them are displayed in Fig. 2 where one can see: 1) number of people diagnosed as infected by COVID-19; 2) number of infected and hospitalized; 3) number of infected people who are in intensive care. However, the data on the number of people diagnosed as infected www.nature.com/scientificreports/ by COVID-19 (point 1) have important limitations. They do not give an accurate information on how many subjects were infected exactly at a certain day due to delays in the swabs processing. In addition, the amount of performed swabs and the criteria used to select people who are tested may vary in time. In contrast, data describing the number of hospitalized people (point 2) and, even more, patients in critical care (point 3) appear more reliable and informative. Hence, to identify the model, our estimator will exploit the number of people in critical care. In particular, an assumption which appears statistically reasonable is to assume that the number of infected is proportional, through a multiplier H, to the number of people in intensive care. Hence, the scalar H represents another unknown component of the parameter vector θ. However, even when a relatively simple time-varying SIR model is adopted, the resulting estimation problem turns out to be ill-posed 33 . In fact, beyond θ , the infinite-dimensional function a(t) must be inferred from a finite number of measurements. Despite these difficulties, in Methods we show that it is possible to design an estimator which is purely data-driven and does not need additional epidemiological or clinical prior information. The problem is set in the framework of regularization in stable reproducing kernel Hilbert spaces (RKHSs) whose importance in machine learning and system identification problems has been recently described in 34,35 . Here we just recall that one fundamental peculiarity of these spaces is that their complexity is regulated by a stability Figure 1. Top The figure illustrates a novel class of nonparametric compartmental models. They depend on a finite-dimensional vector θ and on a function a(t) which models the virus transmission rate and is assumed to belong to an infinite-dimensional space. Hence, the complexity of a(t) is no more regulated by discrete model order, like e.g. the number of exponentials describing its time-course. It is instead established by a few hyperparameters, tunable in a continuous way from data, which provide qualitative information e.g. regarding how fast a(t) is expected to decay to zero during the lockdown. The richness of this model may allow a(t) to better describe how people's social interactions and awareness of infection risk evolve in time after the restrictions' setting. Bottom The nonparametric estimators developed in this paper are able to map epidemiological data, e.g. the number of diagnosed infected or hospitalized people, into the estimates of the variables a(t) and θ entering the nonparametric compartmental model. This allows also to reconstruct the timecourse of the reproduction number γ (t) defined in (2), which is key to monitor the epidemic evolution, as well as the evolution of other crucial variables like the number of infected I(t).  www.nature.com/scientificreports/ parameter that also needs to be estimated from data. In our setting, it regulates how fast a(t), and the reproduction number γ (t) , is expected to decrease during the lockdown. As described in Methods, implementation of the resulting estimator, which is graphically depicted in the bottom part of Fig. 1, is difficult. In fact, state of the art machine learning approaches, like kernel-based ridge regression or support vector machines 36,37 , reconstruct in a nonparametric way (i.e. without postulating a finite-dimensional class of models) an unknown function starting from direct and noisy samples. In our context, the unknown function a(t) and the parameters contained in θ are nonlinearly related to the intensive care data. This requires a non trivial extension of the above mentioned machine learning techniques. Nevertheless, an efficient optimization scheme providing the desired estimates can be defined, together with a Markov chain Monte Carlo scheme 38 which returns confidence intervals around them.

Results
We will now describe results obtained by our nonparametric model identification method, with dynamics of infected people described by (1), estimated by using intensive care data collected in Lombardy. This region contains around ten million people and is a natural candidate to tune our model since most of the Italian outbreak happened there. In the last months it has collected almost 40% of infected and hospitalized Italian people and around 17000 people died in Lombardy due to COVID-19 as of July, 2020, see Fig. 2. The large diffusion is also revealed by some small preliminary studies on antibody responses to virus performed on blood donors. Authors of the work 39 considered a random sample of 789 blood donors in Milan. At the start of the outbreak, on February 24, the overall seroprevalence of SARS-CoV-2 was 4.6% (2.3% − 7.9%).
In the town Castiglione d' Adda, the epicentre of the outbreak, at the beginning of April 70% out of 60 asymptomatic blood donors had the antibodies 40 . Outcomes from Lombardy can then be used to achieve estimates of the number of infected at the Italian level by exploiting the assumption on the multiplier H.
To correctly interpret the following results, it is worth recalling that Italy has been the first country in Europe to set nationwide restrictions by introducing the lockdown to the whole territory on March 9, 2020. Restrictions have then been first further reinforced and then gradually relaxed. Almost all the activities re-opened on May 18, 2020. For this reason, we will first exploit data on the temporal window going from March 1 to May 17, 2020 to study how the lockdown has affected the contact rate and the reproduction number. Data collected starting from May 18, 2020 will be then used to study how the situation evolved after the end of the lockdown. Figure 3 reports the intensive care data collected in Lombardy together with the fit returned by our nonparametric technique. One can see that the model is able to well describe the observational data. Fig. 4 compares the data fit obtained by the nonparametric (solid line) and the parametric model (dotted), this latter using (3) to describe a(t). Apparently, there is no real difference in the data fit but we will show the very different results obtained with the nonparametric technique. Figure 5 plots the estimate of a(t) (solid line) which describes how the level of social interactions in Lombardy evolved in time. The same figure also reports 95% confidence intervals (dashed). Uncertainty bounds are quite asymmetric and not so small, pointing out the difficulty of the problem, but results appear somewhat significant. One can in fact see that, after March 9 (the beginning of the lockdown), a(t) decreases from 0.28 to 0.184. Such value in practice remains constant for almost ten days. After March 20, the consequences of the restrictions become more pronounced and the curve decreases until April 20 (day 50 on the x-axis). The contact rate then diminishes again starting from the beginning of May (61 on the x-axis), reaching a plateau value equal to 0.05 ten days before the lockdown's end. Overall, the reconstructed profile of a(t) seems really realistic, also in view www.nature.com/scientificreports/ of the fact that restrictions were further strengthened after the first lockdown. Beyond delays in observing the restrictions effects, our estimated nonparametric trend surely incorporates also other complex time-varying phenomena, not only biological but also social, that influence the virus infection rate. Examples of such phenomena regard increments in people perception of infection risk and in the use of precautions, like social distancing and protective masks, which greatly helped to control the virus spread. The importance of our new technique is also illustrated in Fig. 6 which compares the nonparametric estimate of a(t) (solid line, the same displayed in Fig. 5) and the parametric one (dashed) where a(t) can assume only two values according to (3). Obviously, the parametric reconstruction is unable to track the time-varying impact of the restrictions in Lombardy. It is interesting to see that, for all the duration of the lockdown, the estimate of the social interactions level is around 0.18, close to that returned by the nonparametric estimator at the beginning of the lockdown. So, all the considerations regarding the evolution of people behaviour are lost using the classical approach. This inevitably leads to overestimation of SARS-CoV-2 transmission rate. Fig. 7 displays the nonparametric estimate of the reproduction number in Lombardy (solid line) as defined in (2). Tracking γ (t) is crucial to understand if the restraints are effective: the epidemic is under control when it is smaller than one. Our results suggest that, before the lockdown, its value was somewhat large, around 3.4.   (1)  www.nature.com/scientificreports/ After March 9 it decreased to 2.3 and values close to one were obtained only after 40 days from the beginning of the lockdown (day 50 on the x-axis). The reproduction number reached its minimum value, around 0.6, just a few days before the end of the lockdown. The same figure also reports the parametric estimate of the reproduction number in Lombardy with a(t) constrained to assume only two values (dotted line). Differently from the nonparametric technique, this approach overestimates the reproduction number and does not allow to understand when the epidemic is under control. Failure of the parametric method has been assessed also using other parametrizations. In particular, beyond results obtained by using (3), the top panel of Fig. 8 also reports the estimate of the contact rate by letting a(t) assume eight values (dashed), with switching times uniformly distributed in time. One can see that the information obtained regarding the reproduction number is unsatisfactory and its value remains larger than one during all the lockdown. The bottom panel reports also the estimate by assuming that a(t) decays exponentially (see Material and Methods for details). Even with this parametrization, one would obtain that the reproduction number was not smaller than one also at the end of the lockdown. This suggests that it is worth adopting a nonparametric approach where a(t) may assume more complex trajectories able to capture all those time-varying (biological and social) phenomena which influence the virus infection rate.  www.nature.com/scientificreports/ After documenting the spread evolution in Lombardy during the lockdown, the nonparametric model is now exploited to understand how the situation evolved in the last months. For this aim, we use the intensive care data collected from May to August, 2020. They are shown in the top panel of Fig. 9 together with the nonparametric model fit. The bottom panel then reports the estimated time-course of the reproduction number in Lombardy after the end of the lockdown. On May, and for the most part of June, the estimate is below the critical threshold. But at the end of June (around day 40 on the x-axis) and in the first part of July the reproduction number was close to one. Then, after decreasing (starting from day 60 on the x-axis), it increased becoming larger than one on the first of August. The maximum value 1.36 was reached on August 5 while on August 17, the last day here considered, the reproduction number was close to 1.1. Our outcomes suggest that the adoption of social distancing measures and protections like masks greatly decreased during the summer season. Policy makers need to carefully consider this to avoid a second wave of SARS-CoV-2 spread in Italy.
In this regard, a few months after the submission of the first version of this paper, Italy was indeed subject to such second wave. Fig. 10 then plots the time-course of the reproduction number in Lombardy estimated by exploiting the new data which became available starting from September 2020. One can see that in October the situation became again critical with the reproduction number reaching a peak with value around 1.8. This forced to introduce new restrictions and a new lockdown (less severe w.r.t. the first one) was then set on November 5, 2020. Its effect is evident in the reconstructed profile: on December 22th, the last day here considered, the value of the reproduction number decreased to 0.8.
The nonparametric approach can also infer the number of infected in Italy. A very recent study has detected the presence of antibodies against SARS-CoV-2 in 2.5% of Italian population and 7.5% in Lombardy, e.g. see 41 .
These estimates are however affected by two sources of uncertainty. The first one is due to the size of the sample (only around 60 thousand out of 60 million people underwent the test in Italy). The second one is the fact that the recent Nature Medicine report 42 shows that antibody levels can drop significantly during recovery. The levels could become undetectable within 2-3 months, especially in asymptomatic or people who showed mild symptoms. To account for these uncertainties, still considering Lombardy as case study, we interpret the percentage of www.nature.com/scientificreports/   www.nature.com/scientificreports/ infected as a nonnegative random variable with mean 7.5. Among the infinite types of probability distributions compatible with such information we then choose that maximizing the entropy 43 , i.e. the least committing one on the basis of the current information available. It turns out that the maximum entropy prior on the percentage of infected deriving from the antibody tests is an exponential distribution whose mean and SD is 7.5 44 . Interestingly, when coupled with such a prior, the parametric approach is unable to describe the intensive care data in an acceptable way. This means that model (3) is not sufficiently flexible to trade off measurements and prior information coming from antibody tests. The nonparametric approach is instead much more versatile and can well describe the data obtaining a fit similar to that displayed in Fig. 3. Our model then predicts that the estimated percentage of infected in Lombardy at the end of the lockdown was close to 12.5% . By projecting such result at national level, using the assumption on the multiplier H, Fig. 11 reports the estimated time-courses of infected I(t) (top panel) and of total infected I(t) + R(t) (bottom) in Italy. The top panel shows that the estimated value of the peak of infected was around 1.8% . The bottom panel shows that our model then predicts that almost 5.1% of Italian population has been infected at the end of the lockdown. The upper bound for the 95% CI is around 13% , suggesting that antibody tests could underestimate significantly the number of total infected. Finally, since around 35000 people died in Italy as of August 17, 2020, our estimate of SARS-CoV-2 fatality rate turns out to be 1.14%.

Discussion
The compartmental and spatial models of SARS-CoV-2 dynamic so far adopted in the literature use parametric descriptions of the contact rate and the reproduction number. This paper shows that these approaches may lead to a wrong assessment on ongoing pandemic since they have difficulties to properly capture lockdown's time-varying www.nature.com/scientificreports/ impact on people behaviour. This may lead to an overestimation of the reproduction number which does not allow to well understand if and when the epidemic is under control. Motivated by these difficulties, in this work we have developed new regularized machine learning techniques which lead to the definition of an entire new class of nonparametric compartmental models. The contact rate is not confined to live in a finite-dimensional space and the temporal profile of the reproduction number is estimated from intensive care data within a very rich family of functions. Our approach, applied to data collected in Italy, shows that the reproduction number can embed many factors really hard to be captured by parametric models. They include significant delays in lockdowns effects since social distancing measures and use of protections like masks can greatly vary over time. The nonparametric estimator is instead able to track these phenomena with a resolution never reached before. This may greatly help in quantifying their impact to contain SARS-CoV-2 spread, allowing to convey useful messages to political decision makers. We have used the Lombardy data, the most affected Italian region, as case study to show the potential of the reproduction number tracking methodology to document how people behaviour changed during the restrictions and its importance to contain the epidemic. We have also illustrated how the situation changed after the end of the lockdown. Obtained results suggest that the adoption of social distancing measures and protections much decreased during the summer season due to holiday relax especially in the younger population and increased migrants arrival. This unfortunately helped to trigger a second wave of SARS-CoV-2 spread in Italy during the subsequent months. Results have been also properly extrapolated to obtain a new nonparametric estimate of the number of infected in Italy. Even if care has to be taken in their interpretation, e.g. in view of the homogeneity assumptions underlying the time-varying SIR model here adopted (the population has to be well mixed), results appear important and describe a level of epidemic diffusion in Lombardy and Italy around 12% and 5% , respectively.
There is no precise appreciation of virus circulation in Lombardy, nor in Italy, from which deriving a real measure of morbidity and lethality. This latter parameter, obtained from total deaths over confirmed positive cases (35,437 over 259,345 as of August 24, 2020) actually accounts for 13.66% , one of the highest estimates in the world. The cause for this exceptional event certainly stems from the extremely severe COVID-19 cumulative incidence at the pandemic onset in Lombardy and in some northern regions. All available intensive care beds were in fact overloaded in Lombardy as well as first aid facilities, a condition that allowed a large and rapid spreading of a considerably high contagion both in the nosocomial and in the community settings. To have a more precise assessment of COVID-19 lethality, only a few studies of seroprevalence have been then conducted in Italy, see e.g. 45 . They were performed independently by some regions or single hospital institutions by means of either accurate CLIA or ELISA blood tests or quick lateral chromatography assays, whose performance is rather questionable 46,47 . The only nation-based study, a CLIA assay measuring anti-N antibodies failed to reach its designed target of 150.000 subjects stratified per age, risk, sex and location. This leads to an estimated fatality rate in Italy around 2.5% . If one considers that specific antibody testing is not always available for new pathogens and that serology can underestimate the total number of SARS-CoV-2 infected individuals 42 , our proposed non parametric compartmental model brings about a very valuable method to measure the real clinical and public health impact of the pandemic and its evolution. Interestingly, it provides an estimate of the fatality rate around 1% which is now in line with those concerning other countries and recently reported in the literature 48 .
In conclusion, a number of mathematical models have been proposed to predict the biological and medical implications of the ongoing COVID-19 pandemic with the aim of optimizing preventive and interventional Table 1. Policy Summary.
Background Substantial research has been recently carried out to understand SARS-CoV-2 dynamics. In particular, an accurate modeling of the lockdown effects on human behaviour and on epidemic evolution is crucial to inform health-care decisions on emergency management. The compartmental and spatial models so far proposed adopt parametric descriptions of the contact rate (e.g. to account for social interactions), often assuming a time-invariant effect of the lockdown with a single parameter adopted to describe people's behaviour during the restrictions. However, the contact rate depends on many factors hard to be captured by parametric models. These include delays in lockdowns effects, also because social distancing measures and use of protections like masks can greatly vary over time.

Main findings and limitations
We show that parametric models of the contact rate may lead to erroneous evaluations of the ongoing pandemia. A new class of nonparametric compartmental models is introduced. To track a realistic profile of the contact rate we develop a model identification strategy able to describe how the impact of the lockdown varies in time with a resolution never reached before in the literature. This permits to reconstruct a continuous-time profile of SARS-CoV-2 reproduction number, a fundamental epidemiological variable to measure the disease infectivity level. Data from Lombardy, the most affected Italian region, are used as case study to show the potential of our methodology by documenting how people behaviour changed during the restrictions and its importance to contain the epidemic. We also illustrate how the situation changed after the end of the lockdown. Finally, the number of infected people in Italy is inferred by incorporating the uncertainty around the estimates obtained by antibodies tests. By giving quantitative results regarding an entire country, our nonparametric estimate seems to confirm studies, like the recent Nature Medicine report 42 , which shows that antibody levels can drop significantly during recovery. Available information on SARS-CoV-2 dynamics is still incomplete, thus the use of Bayesian priors in the modeling, while certainly feasible, bears important uncertainties in model prediction. Hence, our estimate of the reproduction number is purely data-driven. As with all modeling studies, care has to be taken in the interpretation of our outcomes. We use reliable data regarding number of people in intensive care on a daily basis. However, they are available only on a (macroscopic) regional scale. This requires the introduction of homogeneity assumptions (population has to be well mixed) which, in turn, allows the use of compartmental models. In the future, our methodology can be easily adapted to exploit data on the evolution of hospitalized people in smaller districts. This can further increase its validity, permitting also the inclusion of spatial components to infer geographical spread distribution.
Policy implications Several countries still observe a growing epidemic, and may be subject to other waves. Hence, the proposed reproduction number tracking methodology can be of great help to health care authorities to prevent another SARS-CoV-2 diffusion or to assess the effect of the lockdown on human behaviour to contain the spread. Our results also show that the adoption of social distancing measures and protections much decreased during the summer season in Italy. This appears especially dangerous since it could trigger a second wave of SARS-CoV-2 spread in Italy during the next months. www.nature.com/scientificreports/ measures 19,26,30 . Our model, herewith described, could establish a new very precise dynamic estimate of the infection evolution by accurate tracking of the reproduction number, giving in advance the real morbidity and lethality measures. These are extremely helpful parameters for setting up preparedness and responsiveness plans for control of possible recurrent waves of SARS-CoV-2 as well as of future emerging and pandemic infections. Moreover, it is these authors' opinion that the present model could be a critical indicator for establishing sustainability of any health system and preventing its collapse. Finally, for future studies the nonparametric approach here developed can be incorporated also in models which have a broader scope than recovering the reproduction number and the number of infected.

Methods
We describe our nonparametric approach for contact rate estimation in the context of the SIR model used to generate the results described in the paper. This is done without loss of generality. As also clear in the sequel, all the ideas and mathematical results here obtained then apply to any compartmental model, hence defining the entire nonparametric class illustrated in Fig. 1.
Time-varying SIR. Consider, without loss of generality, a population normalized to 1. According to SIR models, the notation S(t) indicates the susceptible people (who can be infected), I(t) denotes the infected people (who have been infected and are able to spread the infection), while R(t) represents the removed people (who were infected but then either healed or died). Healed people acquire immunity, so that S(t) is a decreasing function. In addition, susceptible people can be infected through dynamics depending on the number of contacts between infected and susceptible ones, i.e.
Above, a(t) is the contact rate which is strongly related to the adopted restraints. The function I(t) can both increase, because of susceptible who become infected, and decrease, because of healing and/or death. The decreasing rate is proportional to I(t), i.e.
Finally, R(t) increases accordingly to the healing/death rate, i.e.
In the above equations, b describes average time for healing/death. In absence of effective cures it can be assumed independent of time. We are only interested in positive solutions, i.e. S(t), I(t), R(t) ≥ 0 , which are necessarily bounded since By defining q(t) := b(t) a(t) > 0 , which corresponds to the inverse of the reproduction number γ (t) at the beginning of the epidemic, it easily follows that The integral on the lhs is evaluated for t → −∞ , when no infected and, hence, no removed are present. So, one has S(−∞) = 1 and I(−∞) = R(−∞) = 0 . By defining

I(t) and R(t) become the following functions of S(t):
This permits us to rewrite the differential equations in terms of q(t) as We have seen that the function a(t) has to describe how the level of people social interactions evolves in time, accounting also for lockdowns effects. Before the lockdown's instant t * , such a function is assumed constant. Under these stationary assumptions, i.e. in absence of lockdowns, the function q(t) is thus equal to the constant q and one has δ(t) = 0 . This impliesṠ (t) = −a(t)S(t)I(t), a(t) > 0.   www.nature.com/scientificreports/ Now, let the instant t = 0 be the beginning of our experiment, with S(0) ≈ 1 . The lockdown's instant is instead denoted by t * > 0 (corresponding to March 9, 2020, in Italy). Using (9), the following approximated relationship is obtained A parametric class of time-varying SIR. We start introducing a parametric class of time-varying SIR models instrumental for the building of the nonparametric approach. Our data model is Note that the measurable output y(t) is proportional to the number of infected people through the inverse of the unknown parameter H. The state dynamics then depend on the parameter b and the time-varying contact rate a(t).
Since the system is assumed to be stationary before the lockdown's instant, initial conditions are where in the expression of S(0) we have assumed exact the approximation (10). While a(t) is constant before the lockdown's instant t * , next we assume that it has a discontinuity in t * . Then, during the restrictions, its value could still decrease e.g. due to people's growing awareness of infection risk or because restrictions can be further strengthened after the first lockdown. One simple parametric time-course for a(t) is given by where t end denotes the end of the lockdown (May 18, 2020, in Italy). Note that a(t) would tend to zero if the lockdown would never end ( t end = +∞).
The above model does not follow the paradigm depicted in Fig. 1 because the contact rate is not described through an infinite-dimensional model. It depends on parameters which are the components of the following finite-dimensional parameter vector For our future developments, we need to prove that θ is globally identifiable from data, i.e. it can be reconstructed under the ideal assumption of knowledge of the entire output trajectory y(t). Using differential algebra tools, e.g. see 49 , the system leads to the following characteristic set: If t < t * , one has a(t) = a 1 and ȧ(t) = 0 so that the coefficients of the characteristic set become Hence, since all the three parameters are known to be positive, the values of a 1 , H, b can be univocally determined. If t ≥ t * , one has a(t) = a 2 e −cτ ,ȧ(t) = −ca 2 e −cτ . If τ 1 = t 1 − t * and τ 2 = t 2 − t * are two distinct and known time-instants, the characteristic set permits e.g. to reconstruct and this fact, combined with the knowledge of H, permits to achieve also a 2 and c.
Having shown that such parametric time-varying SIR is globally identifiable, we can use least squares to estimate θ . Let y θ (t i ) denote the output of the SIR model as a function of the unknown parameter vector. Then, two estimators are now considered. The first one uses a subclass of models defined by imposing c = 0 , making the contact rate description equal to (3). The resulting estimator is . The other estimator exploits the entire class and is thus given by Using intensive care data in Lombardy, the estimates of the components of θ turn out while, assuming Gaussian noise, the maximum likelihood estimate of the noise variance is σ 2 =3.5e-12. This corresponds to a standard deviation equal to 18.8 on the intensive care data not normalized w.r.t. whole population in Lombardy which were shown in the right panel of Fig. 2. Thus, the estimate of a(t) for t ≥ t * is 0.19e −0.011t and provides a first hint as how the contact rate decreased during the lockdown in Lombardy. But it is questionable if a mono-exponential is suited to describe a so complex phenomenon. For this reason, in the next section this simple model will be generalized through nonparametric arguments.
Nonparametric model of the contact rate. We will assume that a(t) belongs to a special class of Hilbert spaces H called Reproducing kernel Hilbert spaces (RKHSs) 50,51 . To introduce them, recall that, if X denotes the function domain, K : X × X → R is called positive definite kernel if, for any finite natural number p, it holds that One can then prove that any RKHS is in one-to-one correspondence with a positive definite kernel and inherits the properties of the kernel, e.g. continuous kernels induce spaces of continuous functions. For our developments, the following fact is also important. Given a kernel K, the kernel section K x centered at x is the function X → R defined by Then, one has that any function in H is a linear combination of a possibly infinite number of kernel sections 52 . The question is now which RKHS can be conveniently introduced as hypothesis space for a(t). During a lockdown, this function is expected to have a smooth decay as time progresses. We can then consider the so called first-order stable spline kernel defined by which was originally introduced in the literature to describe impulse responses of stable systems 53 . It depends on the positive scale factor and the scalar α which regulates the decay rate of the functions contained in the associated RKHS. We will fix these two parameters by exploiting the estimates of the mono-exponential decay obtained in the previous section. In particular, we set =â 2 2 and α = 2ĉ . Thinking also of the Bayesian interpretation of regularization, where the kernel is seen as a covariance 54 , this makes our space in some sense centred around exponentials of amplitude â 2 and decay rate ĉ . This fully defines the kernel and, hence, the associated RKHS H . It can be proved that such stable spline space is infinite-dimensional and able to approximate any continuous map. Our nonparametric model for a(t) is then defined by So, the overall model now follows the paradigm in Fig. 1 with θ = [a b H] and the a(t) defined by a and f ∈ H. Estimation of f and θ is however ill-posed. This problem is circumvented using regularization in H with penalty term defined by the RKHS norm � · � H . Specifically, letting y f ,θ (t i ) be the output of the SIR model as a function of f and θ , our estimator is given by where σ 2 is the maximum likelihood estimate of the noise variance already mentioned in the previous section. The objective in (14) includes two different components. The first one is a quadratic loss and penalizes values of θ and f associated to compartmental models unable to well describe the observational data. The second one is the regularizer, defined by the RKHS norm, which restores well-posedness. It excludes non plausible solutions for the contact rate a(t), e.g. defined by too irregular temporal profiles of f(t). The problem thus corresponds to a nonlinear version of a regularization network 36,37 . Its solution exists since, according to the results in 55 , optimization can be restricted to a compact set of the continuous functions equipped with the sup-norm where the map y f ,θ is continuous (see also Appendix of 56 ) and the regularizer is lower semicontinuous. K x (y) = K(x, y) ∀y ∈ X .
(12) K(t, τ ) = e −α max (t,τ ) , 0 < α < 1, ≥ 0 www.nature.com/scientificreports/ However, differently from the classical machine learning problems where f is linearly related to data, the nonlinearities present in our compartmental model makes the solution (14) not available in closed-form. To compute it, the following strategy has been then adopted. By fixing an integer M, we define the following representation f (t) = M i c i K t i (t) given by sum of kernel sections over a uniform grid of temporal instants t i . Next, a sequence of solutions of the problem (14), with the objective restricted to these finite-dimensional subspaces of dimension M, is obtained for increasing values of M. This is done until reaching convergence (which is guaranteed still exploiting the results in 55 ).
The stable spline kernel is useful to describe the contact rate during the lockdown, since it embeds information on smooth decay of the reproduction number. Since γ (t) after the end of the lockdown is not expected to decay, and could also increase, to obtain the results depicted in Fig. 9 we have used the Laplacian kernel 37 which embeds information only on continuity of the time-course. System initial conditions at the end of the lockdown are set to the estimates obtained by the procedure reported above. Next, a parametric model with constant contact rate a(t) is fitted to data, obtaining also a recalibration of the noise variance, and the scale factor is set to its squared value. The kernel width η is then estimated through the concept of Bayesian evidence, exploiting the stochastic interpretation of (14), as also discussed in the next paragraph, and using the Laplace approximation to compute the model posterior probability 57 .
Finally, to complement the estimates with confidence intervals, a Bayesian framework has been adopted resorting to the stochastic interpretation of regularization and the duality between RKHSs and Gaussian processes 54 . The noise affecting the data is assumed to be Gaussian. The components of θ , and also the noise variance, are seen as mutually independent random variables and are assigned poorly informative prior distributions, in practice including only nonnegativity information. The contact rate for t ≥ t * is then seen as a Gaussian process defined by f (t) = M i c i K t i (t) . Here, the c i are the components of the zero-mean Gaussian vector c whose covariance matrix is the inverse of K ∈ R M×M with (i, j) entry given by K(t i , t j ) . In this way, the function f(t) sampled on the t i is indeed Gaussian with covariance matrix K . Markov chain Monte Carlo has been then used to reconstruct the posterior in sampled form 38 . In particular, a random walk Metropolis has been implemented. Covariances of the increments have been tuned through a pilot analysis to obtain an acceptance rate around 30% and then 4 million iterations have been performed.