Introduction

As a respiratory disease caused by Bordetella pertussis, pertussis (also known as whooping cough or 100-day cough) is one of the leading causes for serious illnesses in babies, children, teens and adults. According to the 2015 revised Chinese National Guidelines on Diagnosis and Management of Cough, pertussis is considered as a common cause of prolonged cough illness in adolescents and adults while frequently being associated with other symptoms of whooping cough1. Although pertussis is recognized as one of the most common vaccine preventable diseases, yet it still causes nearly 300,000 deaths in children each year worldwide2. In United Kingdom, a high incidence rate of pertussis infection was reported in 37% of school-aged children between 2001 and 20062. In United States, 48,277 cases of pertussis were reported in the most recent peak year of 2012 with many more cases still unreported3. In China, little is known about the occurrence of pertussis due to the scarcity of a large-scale investigation of the incidence of pertussis. A recent study from the capital of Shaanxi province which is located in the northwest of China, reported that only 5.31% of the confirmed pertussis cases were properly diagnosed, with a misdiagnosis rate as high as 94.69%4.

Accurate simulation and forecasting of the incidence of an infectious disease exert a significant impact on resource utilization and planning for future epidemics. However, epidemics of pertussis cannot be easily identified or managed, as some other respiratory pathogens often cause similar symptoms like pertussis. In addition, the change of disease incidence is influenced and constrained by changing trends, periodic changes and random disturbances like other infectious diseases5. Therefore, it is necessary to identify some effective accurate disease forecasting models to predict disease incidence based on the historical data. Specially, time series analysis, which comprises methods for analysing time series data to extract meaningful statistics and other characteristics of the data, is naturally required in this field.

Time series analysis means to implement a model to predict future values based on the previously observed values. Time series analysis consists of several approaches like the Exponential Smoothing (ETS model), Auto-Regressive and Moving Average Model (ARMA model), Neural Network model and some other models are subject to intense methodological developments in recent years6,7,8,9. Some of these models, such as the ARIMA (or seasonal ARIMA) model and the ETS model taking both overall trends and seasonal changes into account were considered useful tools in modeling time series with trend, cyclicity and seasonality10,11. Accordingly, these models were appropriate for analyzing pertussis incidence in China. Inspired by the advantages of these models, these methods were considered to optimal for forecasting the pertussis incidence in mainland China.

Materials and Methods

Data resource

The data of pertussis was obtained from the Chinese Center for Disease Control and Prevention (China CDC) [http://www.chinacdc.cn/] and the Bulletin of the Ministry of Health (from China VIP database) [http://lib.cqvip.com/] and the data from January 2005 to June 2016 were assembled as monthly counts of the reported cases.

Statistical Analysis

As the most well-known branch of decision supporting tools in clinical epidemiology, time series having being increasingly exploited in epidemiological research in recent years. Since ETS model and ARIMA model have become more popular in time series in recent years as mentioned above9,12, in this study, the two were implemented as the epidemiological analysis methods.

ARIMA model

Originally proposed by Box and Jenkins in the early 1970s, ARIMA model has been viewed as one of the most effective models for describing and forecasting time series13,14. An ARIMA model contains three components: The “AR” stands for autoregressive, the “I” stands for Integration and the “MA” stands for moving average. Generally, some time series have some form of cyclicity or seasonality trends (i.e., yearly or monthly). To illustrate these periodically changed data, a seasonal ARIMA model is thus adopted for modeling such data sets. A seasonal ARIMA model would be designed as ARIMA (p,d,q)(P,D,Q)s, (p =  non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order) and s = time span of repeating seasonal pattern. In R software, the seasonal ARIMA model automatically selected the parameters for the best performing model according to either the minimum of Akaike information criterion (AIC), the corrected Akaike information criterion (AICc) or the Bayesian information criterion (BIC)15,16. In the second step, the simulating and forecasting results are given by the chosen model. Finally, the Ljung-Box Q test was used to diagnose whether the residual error sequence was a white-noise sequence.

ETS model

The ETS model considers the error, trend and seasonal components of a given time series and evaluates 30 possible alterative models prior to selecting the best performing model to simulate the data17. The major three parameters are the error, trend and seasonal components which can be additive (A), multiplicative (M) or none (N). The best performing model is chosen according to either the minimum of AIC, AICc or BIC. As an automatic forecasting model incorporating the foundations of exponential smoothing, the ETS technique provided the forecast package for the R software outlined by Hyndman18, the Ljung-Box Q test was also used to diagnose whether the residual error sequence was a white-noise sequence.

The analytic procedure of pertussis incidence

In this paper, the analytic procedure of pertussis incidence in mainland China is divided into the following steps: The first step is the pre-processing step, to make the time series set stationary and to choose the alternative models being considered and to introduce the criteria that will be used to determine how well the alternative models performed. To remove short-term (monthly) fluctuations and determine the long term time-series over multiple years, the Hodrick-Prescott filter method is performed as the cyclical and seasonal decomposition method in this paper19,20,21. The ARCH-LM test is also provided to verify the existence of ARCH effects. The second step is the model processing step to perform in-sample simulating and forecasting, by running the “the “auto.arima()” code in R software, the best performing model ARIMA model with either the minimum of AIC, AICc, BIC is automatically selected, the Ljung-Box Q tests is performed to diagnose whether the residual error sequence the of the best performing models is white-noise sequence; For ETS modeling, the best performing ETS model is also automatically selected by following the principle of minimum of AIC, AICc or BIC. With the implementation of auto regressive conditional heteroscedasticity (ARCH) Lagrangian multiplier (LM) test, the structural break of the residuals is identified to determine the existence of volatility in the series. With testing goodness of in-sample simulating and forecasting, the optimal model with either the minimum of The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Deviation (MAD) or Mean Absolute Percentage Error (MAPE) test is finally determined between the best performing ARIMA model and the best performing ETS model. The third step is to perform an out-of-sample forecasting with the optimal model.

The study was approved by the Affiliated Hospital of Chengdu University. As aggregated data with no personal information were involved. All statistical analysis is conducted through R software (version 3.2.3, The R Foundation for Statistical Computing, Vienna, Austria)17.

Role of the funding and data sources

The opinions reported in this paper are those of the authors, which are independent from the funding sources and no endorsement from China CDC or other official organizations is intended or should be inferred.

Results

General information

According to the monthly reported incidence rates in China, no more than 10, 000 cases of pertussis are reported every year, which is much less than many other countries22 and the disease incidence is also much less than other infectious diseases like hand-foot-mouth disease in China23. In spite of the relative low incidence rates compared to other infectious disease, the incidence rates of pertussis had increased by 66.20%, from 4058 in 2005 to 6744 in 2015 in the last decade (Fig. 1). While using the Hodrick-Prescott filter method to remove short-term (monthly) fluctuations, a substantial rise was observed from 2013–2015, followed by a slight decrease from 2005–2013 (Fig. 2). In 2015, the reported number of cases reached 6744, with a 266.92% increase rate compared with the lowest number of cases reported in 2013 (1743 cases).

Figure 1
figure 1

Yearly and monthly data of pertussis incidence numbers in mainland China from 2005 to 2015.

Figure 2
figure 2

Results from decomposing the monthly pertussis incidence time series using the Hodrick-Prescott filter model with monthly smoothing parameter λ = 14,400.

In-sample simulating and forecasting

By running the auto.ARIMA code, the seasonal ARIMA (0,1,0)(1,1,1)12 model (AIC = 1341.99 AICc = 1342.2 BIC = 1350.3) was automatically selected as the best performing ARIMA model. According to its Ljung-Box Q test which was performed to assess the fitness of the ARIMA (0,1,0) × (1,1,1)12 model, the residual error sequence was closer to achieve white noise (PBox-Ljung>0.05 at 5, 10, 20, 30 and 40 lags, see Table 1 and Fig. 3); While running the ETS code, ETS (M,N,M) model (AIC = 1675.0, AICc = 1678.6, BIC = 1715.4) was automatically selected as the best performing ETS model, the Ljung-Box Q testing result of the ETS (M,N,M) model showed that the residual error sequence was closer to achieve white noise as well (P Box-Ljung>0.05 at 5, 10, 20, 30 and 40 lags, also see Table 1 and Fig. 4). By running the two best performing models, the in-sample-simulating and forecasting results were given as shown in Fig. 5. The ARCH-LM testing results in Table 2 showed the ARCH effect, which existed in the original series, was minimalized to a great extent in residuals of both the ARIMA (0,1,0) × (1,1,1)12 model and the and the ETS (M,N,M) model. In the step of goodness test of in-sample simulating and forecasting, under the principle of the minimum of RMSE, MAE, MAD or MAPE the ETS (M,N,M), the ETS (M,N,M) model (see Table 3), which was proved to provide more accurate forecasts, was finally chosen as the optimal model and was thus presented for out-of-sample forecasting.

Table 1 Ljung-Box Q tests of two best performing models.
Table 2 Testing results of ARCH-effects about original series and best performing ARIMA and ETS models.
Table 3 Parameter estimation for best performing ARIMA and ETS models for pertussis incidence.
Figure 3
figure 3

Goodness tests of in sample simulating and forecasting of ARIMA (0,1,0)(1,1,1)[12] model.

Figure 4
figure 4

Goodness tests of in sample simulating and forecasting of ETS (M,N,M) model.

Figure 5
figure 5

In-sample simulating and forecasting results of best performing models from each alterative model.

Out-of-sample forecasting

The forecasting results of pertussis incidence from July to December 2016 in mainland China were given by running the optimal ETS (M,N,M) model, Table 4 and Fig. 6 present the out-of-sample forecasting results from July to December 2016.

Table 4 Forecasting incidence cases of pertussis from July to December in 2016.
Figure 6
figure 6

Out-of-sample forecasting results of pertussis incidence from July to December in 2016.

Discussion

As a highly infectious respiratory illness, pertussis was a disease with high incidence rate before vaccination. However, with the development and widespread application of effective pertussis vaccines, dramatic changes happened in the epidemiology of pertussis globally24. In the United States, after the routine use of pertussis vaccine in 1943, the reported cases of pertussis dropped dramatically and the number of cases was below 10, 000, which was the best in more than three decades (1965–2002). In China, pertussis immunization was introduced in the early 1960s, with three doses of whole-cell vaccine combined with diphtheria and tetanus toxoids (DTwP). Since 1982, a booster dose injected at 18–24 months has been added4 and the number of reported cases has dramatically decreased. However, in the recent resurgence of pertussis in many countries, a closer investigation on the aspects of pertussis causing its persistence should be prompted. Even with high vaccination rates, many countries have been reporting increasing pertussis cases in both the developed countries (the U.S, U.K, Netherlands, Portugal and Australia) and the developing countries (Cuba, Brazil, Mexico) in recent years22,25. Therefore, pertussis remains endemic worldwide which is still an important public health problem.

In this epidemiological study, the temporal trend of pertussis incidence in mainland China from 2005–2016 is analyzed according to the data reported by China CDC. Based on the descriptive statistics, it is observed that China has retained a lower pertussis reported cases from 2005 to 2013, which has been below 1 case per 100,000, lower than other countries22. In China, pertussis is always clinically diagnosed by physicians through laboratory methods such as culture and PCR, while serologic analysis is not commonly applied for diagnosis. Therefore, compared to other countries, its reported low incidence may be related to the adopted diagnostic criteria, suggesting substantial underreporting. Even though pertussis remains endemic to China, a sharp rise of reported numbers appeared in 2014 and 2015, which was a relatively high level. Whether an upward trend in incidence will be observed in 2016 is still unclear. Therefore, it is necessary to explore flexible and fractional methods for pertussis forecasting in a short term. In spite of time series models have been widely used in economics, environmental sciences and many other fields (eg. cerebrovascular diseases13, respiratory infections26,27, health care management and so on28,29,30), know little about the flexibility of this model for time series analysis of the incidence of pertussis. Therefore, the requirement of this model is highlighted in this epidemiological study. Besides, an apparent cyclicity and seasonality was observed in the pertussis reporting. According to out-of-sample forecasting, the model forecasts a relatively high incidence cases in 2016, which predicates an increasing risk of ongoing pertussis resurgence in mainland China in the near future, indicating that pertussis never goes away completely31,32.

The seasonality of pertussis has been reported in some other countries as well33,34. Though some scholars have explored the mechanisms of pertussis incidence behind temporal information where the seasonal variations were captured with the autocorrelation analysis, the mechanism of pertussis activity in China remains unclear, which highlights a need to identify the factors for clarifying and explaining the cyclicity and seasonality35,36. In this study, the HP filter method was added to detect the cyclic and seasonal variation pertussis incidence and a clear yearly cyclic pattern and seasonal pattern in the report of pertussis cases was found. During the study period, the peaks of seasonal periodicity occurred annually. For instance, the reported cases remained high in June until September. From January to February of the following year, the incidence was at a low level until the next reporting circle. The results of this study are consistent with previous studies, which showed an obvious cyclic and seasonal trend of the times series.

Globally, many countries are reporting the increase in pertussis cases and the results of this study suggest an increasing risk of ongoing pertussis resurgence in mainland China in the near future, though the pertussis flare-up is unlikely to happen with the developed preventive systems for pertussis. Therefore, it is still important to continually remind that pertussis never goes away completely. In addition, effective efforts for controlling the potential pertussis resurgence should focus on professional recommendations and appropriate public health education or instructions for people with high risk and enhancement of incident monitoring. From the methodological aspect, this study reveals that the selected ETS model is an assessable and flexible tool in forecasting the incidence of pertussis and helping decision makers to provide advanced warning of future cases and further optimize distribution of resources based on the advanced warning of disease incidence.

Additional Information

How to cite this article: Zeng, Q. et al. Time series analysis of temporal trends in the pertussis incidence in Mainland China from 2005 to 2016. Sci. Rep. 6, 32367; doi: 10.1038/srep32367 (2016).