A stochastic model explains the periodicity phenomenon of influenza on network

Influenza is an infectious disease with obvious periodic changes over time. It is of great practical significance to explore the non-environment-related factors that cause this regularity for influenza control and individual protection. In this paper, based on the randomness of population number and the heterogeneity of population contact, we have established a stochastic infectious disease model about influenza based on the degree of the network, and obtained the power spectral density function by using the van Kampen expansion method of the master equation. The relevant parameters are obtained by fitting the influenza data of sentinel hospitals. The results of the numerical analysis show that: (1) for the infected, the infection period of patients who go to the sentinel hospitals is particularly different from the others who do not; (2) for all the infected, there is an obvious nonlinear relationship between their infection period and the visiting rate of the influenza sentinel hospitals, the infection rate and the degree. Among them, only the infection period of patients who do not go to the sentinel hospitals decreased monotonously with the infection rate (increased monotonously with the visiting rate), while the rest had a non-monotonic relationship.

Influenza is an respiratory infection caused by a virus. Since the outbreak of the Spanish flu in 1918, there has been a large amount of literature on influenza, among which mathematical model is a very important tool for studying influenza [1][2][3][4] . The traditional deterministic mean-field model [5][6][7][8] regards different populations as uniformly mixed and ignores the differences in the contact process and behavior among individuals. In general, this contact process can be viewed as a network in which nodes represent individuals and edges represent contacts between individuals. Network has its unique topology structure, such as degree k (i.e. the number of edges connecting to one node), degree distribution (i.e. P(k) denotes the probability that a randomly chosen node in the network has degree k), clustering coefficient (i.e. the average probability that two neighbors of a node are themselves neighbors) and so on 9,10 . The topological structure of complex network may lead to some results of overthrowing the traditional, for instance, there is no epidemic threshold in a scale-free network which follows a degree distribution of power-law ( P(k) ∝ k −γ when 1 < γ ≤ 3 ), for the SIS model 11 raised by Pastor. So far, there are many epidemic disease modelling methods based on different features of the network, such as pairwise model 12 , edge-based compartmental modelling 13 , degree-based modelling 14 and effective degree modelling 15 .
The recurrent of epidemic brings a big challenge for people to treat the disease 16 . Many diseases exhibit the phenomenon of annual, biennial, multi-annual and irregular oscillations, such as measles, whooping cough, influenza and so on. The prediction of these diseases by using deterministic dynamic model satisfactorily both the internal mechanism and persistence properties exhibited by case report remains elusive, especially the multi-year periodicity phenomenon, which can be regarded as the comprehensive results of nonlinear dynamics, random factors and seasonal forcing 17 . Many works of literature related to seasonal cases using the deterministic dynamic model where the contact rate is temporal 18,19 , and the period is artificial, yet, the endogenous period is not always fixed at 1 year in reality. By using some methods, such as Fourier transform and wavelet analysis, which are used in the following part, we can get the endogenous frequency of these case data, which better conforms to the actual situation and helpful in controlling a pandemic.
In addition, noise is usually considered a nuisance, yet it is unavoidable in reality. However, many researches show that it may play a constructive role. Noise interacting with the deterministic dynamic system may lead to some different results: (1) extinction 20 , the disease may die out even though the basic reproduction number R 0 is larger than one in the deterministic system; (2) fluctuation 21 , the autonomous system is allowed to remain The stochastic SHIRS model on network Stochastic modelling. In this paper, we divide the population N into four classes, in which S denotes the number of individuals who are susceptible to disease, but are not yet infected at the moment; H or I represents the size of those who are infected with the disease and are able to transmit disease to the susceptible people by contacting with them, the only difference between them is whether to go to the sentinel hospitals, in which H denotes the infected people who have gone to the sentinel hospitals, yet I not; R is the size of people who have recovered from the infected and immune to the disease.
The number of contacts (i.e. the degree) is extraordinary different because of the diversity of factors between people like age, profession and so on. For example: people in school due to the factors of enormous population density has a higher degree. So degree is a important indicator to depict the heterogeneity. Similar to the modelling method proposed by Pastor 11,14 , we mark off the whole population into n groups according to the degree, The wavelet spectrum analysis corresponds to time series of the data of (a). High power values are colored in yellow, orange and cyan denote intermediate power, blue denotes low one. Note the black line is the 95% confidence level. Only patterns within these lines which can neglect the edge effects are considered reliable. (c) The average wavelet spectrum (blue line) and the corresponding 95% confidence contour(red). www.nature.com/scientificreports/ i.e. N = N 1 + ... + N n , in which, N k is the size of population with degree k, (k = 1, 2, ..., n) . P(k) = N k N is the degree distribution of the network, k is the mean degree, i.e. �k� = n k=1 kP(k) , similarly, �k 2 � = n k=1 k 2 P(k) is the secondary moment of the degree distribution. Based on the classification of disease we mentioned above, people in one group can be divided to four classes: S k , H k , I k , R k , in which S k denotes the number of the susceptible people with degree k, and H k , I k , R k have the similar meanings. We assume the total population size and degree distribution do not change with time, which implies that the N k is a constant. By using the relationship expression: R k = N k − S k − H k − I k , we can obtain the recovered individual R k under the new (S k , H k , I k ) framework, so the state of the system can be defined by the three integers (S k , H k , I k ).
Though the deterministic ordinary differential equation can be viewed as the approximate solution of the stochastic differential equation when the number of population is large, the stochastic modelling method is more accurate, in which continuous time Markov modelling is used in our paper.
At first, we should understand the whole transfer processes of the disease and the corresponding transfer rate, which has been listed in Table 1. Figure 2 has shown the schematic diagram of our model, from which we can know people in one compartment may enter into another due to the three processes: Infection S k may become infected by contacting with a infected individual in any group. i.e. S k where ρ is the proportion of infected people who have go to the sentinel hospitals, β is the probability of infection per contact, if the correlation of the states of the diseases and the degree have been neglected, under this condition, θ which means the probability of any chosen edge being connected to infected has the following form: Because of the Markov property, the current system state just depends on the one of last time, so we use T(δ ′ |δ) to denote the transfer rate of the system state changing from δ to δ ′ , then the transition rate of the infection process can be written as:  Table 1. List of transition rates.

Event Transition Rate
Infection Figure 2. Schematic diagram of the SHIRS based on degree, the parameters meaning are listed in Table 2. − →R k , where γ 1 and γ 2 are recovery rates. This transition rate can be denoted as: Lose immunity The recovered people will lose immunity after some time, R k α − →S k , in which, α is the rate of lose immunity. The corresponding transition rate of this process is: Now, this continuous time Markov process can be modeled using a master equation, which has the general form: This equation describes the process of state transition of the system: some system states shift from δ ′ to δ , meanwhile δ changes to the other system state δ ′ , p(δ, t) is the probability of the system state in the δ at time t. The explicit form of the master equation involving the three processes is given below: The deterministic limit system. From the last section, the master equation has been obtained, which contains all information of the system, in this section, we will study the deterministic limit system corresponding to the (6).
For the mean values may be obtained by multiplying Eq. (6) by S k , H k , I k in turn, and then summing over all the states of the system, the mean field theory takes the explicit form: When we substitute these transition rates of Eqs. (2)-(4) into the above and introduce the corresponding density variables in the limit N k → ∞, . www.nature.com/scientificreports/ The corresponding deterministic equations of s k , h k and i k can be obtained: in which θ has the following equivalent form: It is easy to obtain that for the system (9), the basic reproduction number �k� is the critical threshold, which means the average number of secondary infections caused by one infected individual in a susceptible population. (9), the following situations hold

Theorem 2.1 For the system
h * k has no concrete expression, but it satisfies the conditions below: and the disease is permanent.

The proof of the theorem is given in the supplementary, and in which
The periodicity in different groups. Through the above analysis, we know that for the deterministic system (9), when R 0 > 1 , the disease ultimately fixed in the point of E * , yet, the influenza exhibits oscillating phenomenon in reality by the influence of noise even neglecting the seasonal factor, which can be seen from the Fig. 3. Hence we use the van Kampen's system-size expansion method to obtain the higher-order terms which can be used to investigate the perturbations around the steady-state solution of the deterministic system. Firstly, the new continuous random variables x k , y k , z k are brought in , which have the following relationship between the discrete variables S k , H k , I k and the corresponding density variables s k , h k , i k The variables x k , y k , z k are corrections of s k , h k , i k , which are in N k − 1 2 terms and can be viewed as the fluctuations around the epidemic equilibrium to be the order of N k −1 , from the aspect of central-limit theorem. Second, in order to study the property of the fluctuation, we define a new probability distribution function π(x k , y k , z k , t) = p(S k , H k , I k , t) , the following equation can be obtained by using the chain rule: The detailed derivation process of (13) is given in the supplementary.
By introducing the step operators below, and then substituting the (2)-(4), (12) and (16) into (15), making a comparison with (13) order by order yields the so-called macroscopic Eq. (9) to leading order and the following Fokker-Plance equation (FPE):   Table 2. The stochastic simulation is implemented by the event algorithm of Gillespie with transition rates listed in Table 1. The deterministic curve is generated according to the Eq. in which the stochastic variables x k , y k , z k are corrections of s k , h k , i k , and η 1 , η 2 , η 3 are Gaussian white noises whose mean are zero, correlation function are in which δ(t − t ′ ) denotes the Dirac delta function. The equivalence prove of LE and FPE can be found in 28 .
Taking the Fourier transform of (18) gets the following results: in which, x k = +∞ −∞ x k (t)e iwt dt and i is imaginary unit, similar to the ỹ k and z k . Because (18) is a OU process, the corresponding limits of the mean depend on the eigenvalues of A, when R 0 > 1 , by means of the Fig. 3, we can see the endemic equilibrium E * is stable, i.e. the eigenvalues of A are negative, so when t → ∞ , �x k , y k , z k � → 0.
Solving the equation of (19), we can obtain: dx k dt = A 11 x k + A 12 y k + A 13 z k + η 1 , Averaging the squared moduli of x k , ỹ k and z k gives the power-spectrum: Up to this point, the analytical expression of PSD have been derived, because of the difficulty of the calculation in high-dimension coupled system, the concrete expressions of the epidemic equilibrium can't be obtained, so we can analysis the frequency of different groups just depending on the numerical simulation.

Results
In this section, we will use the influenza data of Taiyuan sentinel hospitals in the first quarter of 2013 to fit about our model in order to avoid the influence of environment changing, and then discuss the influence of heterogeneity to the periodicity of influenza.

Estimation of parameters.
Based on the statistical bulletion on national economic and social development of Taiyuan, the number of population in 2013 is 4.3 million. Removal rate can be set to γ 1 = 7/3 , γ 2 = 1 and the rate of lose immunity is supposed to be α = 1/52 according to the mean disease course of influenza. The infection probability β is an estimated parameter. As for the degree distribution, the power-law distribution usually conforms to the features of real world in most cases, hence, P(k) = 2m 2 k −v (m = 3, v = 3.5) 29 , is used in this paper. As far as we know, Chinese National Influenza Center (CNIC) has set up 554 sentinel hospitals all over the country. The city of Taiyuan has two of which: Taiyuan central hospital and the first affiliated hospital of Shanxi medical university. We have the influenza data from the two sentinel hospitals, yet, a small enough number compared with those in the whole city of Taiyuan, so ρ as the proportion of Taiyuan influenza patients in sentinel hospitals, is a parameter to be evaluated. (20) x k = −(iw) 2η 1 + (iw)C 1 + D 11η1 + D 21η2 + D 31η3 D(w) , www.nature.com/scientificreports/ We adopt the least-square estimation to obtain the parameters sets � = {β, ρ} , which minimizes the objective function: Y (�) = [Y (t) − ρN max k=min P(k)βks k θ] 2 , in which Y(t) denotes the actually influenza data from sentinel hospitals at week t. The Fig. 4 shows that the model data agrees well with the actual data, and the corresponding value of parameters are summarized in Table 2.

Prevalence of influenza in Taiyuan.
CNIC has a history of more than 60 years since its foundation in 1957, especially the recently decade, the wholesome national influenza surveillance network system has made the influenza data more transparent. Our influenza data from Taiyuan sentinel hospitals are just a very small part of this system. We use the week data from 2013 to 2016 to detect whether there exists a periodic phenomenon. The data reveals a clear periodicity in the outbreaks of influenza, which can be confirmed through the wavelet power spectrum of Fig. 1. The wavelet analysis can reveal the periodic changing of a time-series, which performs the spectral characteristics as a function of time. The oscillations of influenza have a obvious annual variation, yet, the wavelet analysis based on the data shows another significant periodicity: semi-annual.
Influence of stochasticity and network structure on influenza outbreaks. The difference between the deterministic and stochastic simulation is revealed in Fig. 3, which is a particular case of fluctuation who has been mentioned in introduction. It indicates that the demographic noise and network struction can induce rich periodic phenomenon. We can understand the influence of network struction on the expected fluctuations of influenza via our analysis.
The accuracy of the theoretical analysis consequence of PSD via (21) has been verified with the simulated results in Fig. 5, using the parameters listed in Table 2. We can go further to detect how the period of influenza varies with the changes of degree k by using the formula (21). Based on this, the x-coordinate (i.e. ω ) for the maximum value of the PSD can be obtained from the analytical expression, then the corresponding period (i.e. 1/ω ) for different parameters are plotted in Figs. 6, 7, 8, and 9. In order to have a better visual effect, we just present the degree from 5 to 25 in these figures, on the whole, it does not affect our conclusions, because this group accounts for 99 percent of the total population.
When the ρ = 0.004 is fixed, how the period is influenced by the degree and β is showed in Figs. 6 and 7. In Fig. 6, for the H whose degree is less than 16, the period goes up as the β decrease, yet, when the degree is large The minimum (maximum) degree is 5 (500) and the mean degree is 7.5. www.nature.com/scientificreports/ than 16, the relationship between β and period is nonmonotonic, as β increases, the period decreases and then increases. On the other hand, when β 0.4 , as the degree increases, so does the H's period, while β < 0.4 , the period increases at the beginning and then decreases with the augment of the degree. There exists obvious nonlinear relationship between the I's period, degree and β . For the fixed degree, increasing the β , the value of period becomes small, yet, when the β is fixed, the relationship between the degree and I's period is non-monotonous. When the degree is small than k , the period becomes larger as the degree increases, yet, the trend is opposite on the other side of k , which can be obtained from the Fig. 7. This means that, for the infected people I whose degree more close to k , the more safer. When β = 0.8 is fixed, the relationship between the period, ρ and degree is showed in Figs. 8 and 9. From Fig. 8 we can observe that when ρ 0.4 , for the infected people of H, the degree is more bigger, the period is more larger, which means that people are more safer, yet, when ρ > 0.4 , with the degree increasing, the period increases firstly and decreases later, which means the people of H whose degree located intermediate is more safer. This is quite opposite for I in Fig. 9. On the other hand, when H's degree is fixed, the period decreases at the beginning and then increases as the ρ increasing. This phenomenon from a side explains that the importance of the medical resources. For the infected people of I, the relationship is different in Fig. 9. With the increase of ρ , which means more and more infected people go to the sentinel hospitals, leading the I becomes more and more safer, i.e. the period is more and more longer, this change is even more pronounced for I with higher degree.

Discussion
In this paper, we have investigated the effect of network structure on the frequency of the influenza outbreaks, focusing on the power-law networks, by adapting the van Kampen expansion approach from SHIRS disease to develop a fully stochastic model. This leads to a reasonable explanation for the periodic phenomena of influenza, and helps us to understand the complex fluctuations of influenza in Taiyuan with a dominant period of half years without the external factors of seasonal. www.nature.com/scientificreports/ We can have a better understanding of the interaction between the deterministic and stochastic components of the system by using the method of van Kampen expansion. The analytical solution of PSD agrees well with the simulation data. The result reveals that for the infected people who go to the sentinel hospitals, i.e. H, the relationship between the ρ , β and period is nonlinear, even for the hospitals hospitalization rate ρ , just an appropriate one is benefit for the patients. On the other hand, the nonlinear relationship between the three is also suitable for the infected people I in most cases, except that the relationship between ρ (or β ) and period is monotonous when other parameters are fixed.
Our work emphasizes the importance of heterogeneous contact network on the periodic outbreak of influenza, which further validates the truth that the combined action of the non-linear system and stochasticity may results many novel phenomena. On the other hand, our study also show that there exists a very close relation between the phenomenon of multi-cycle and the heterogeneity of contact.

Limitations.
To obtain the specific analytical solutions are difficult for the high dimensional coupling system, so we can not give the explicit formula about the relationship between the period, degree, β and ρ , and just have a intuitive understanding from the numerical simulation.
The influenza date we obtained from the sentinel hospitals are weekly, so we set the time-boxed intervals measured in week. Comparing to the daily one, the weekly contact network will be not very precise and may cause the repeatedly contact problem, which one we omitted in this paper. www.nature.com/scientificreports/ The approach of van Kampen is limited to the following two conditions: a formal treatment of demographic stochasticity; the existence of stable non-zero solution, for simpleness we do not take into account the changing population, seasonal forcing and the correlation between the degree. In essence, the migration of population and behavioral changing in flu season is unavoidable, which is missing in our work.
Possible extensions. It is easy to adapt this method to many diseases exhibited fluctuations, such as malaria, cholera, hand foot and mouth disease and so on. Especially for childhood diseases, similar study may gives a good idea on avoiding frequent illness. At the same time, if there are several strains that coexist for a disease, such as influenza, our method may also deduce the period for each strain, which may be good news for the vaccine's manufacturers.
We can find that from Fig. 10, even among the very nearest cities, like Beijing and Tianjin, Jiangsu and Shanghai, though the climate differences can be ignored, the wavelet power spectrum is greatly different. This may prove the importance of the contact network. On the other hand, even though the climate characteristics of city are different from north part of China (Beijing and Tianjin) to the south part (Jiangsu and Shanghai), it is a surprise phenomenon that Tianjin and Shanghai exhibit the similar periodic law. Furthermore, the same method can also be used to the influenza data of tropical cities, then comparisons with existing temperate data may reveal the dominant factors. Which one is the key factor lead to this? Temperature, humidity, wind speed, haze or human behavior? It deserves our further investigation.