Predicting trend of early childhood caries in mainland China: a combined meta-analytic and mathematical modelling approach based on epidemiological surveys

Early childhood caries (ECC) is the most common chronic disease in young children. A reliable predictive model for ECC prevalence is needed in China as a decision supportive tool for planning health resources. In this study, we first established the autoregressive integrated moving average (ARIMA) model and grey predictive model (GM) based on the estimated national prevalence of ECC with meta-analysis from the published articles. The pooled data from 1988 to 2010 were used to establish the model, while the data from 2011 to 2013 were used to validate the models. The fitting and prediction accuracy of the two models were evaluated by mean absolute error (MAE) and mean absolute percentage error (MAPE). Then, we forecasted the annual prevalence from 2014 to 2018, which was 55.8%, 53.5%, 54.0%, 52.9%, 51.2% by ARIMA model and 52.8%, 52.0%, 51.2%, 50.4%, 49.6% by GM. The declining trend in ECC prevalence may be attributed to the socioeconomic developments and improved public health service in China. In conclusion, both ARIMA and GM models can be well applied to forecast and analyze the trend of ECC; the fitting and testing errors generated by the ARIMA model were lower than those obtained from GM.

provinces were covered respectively, and children aged 5 were chosen as a representative age group. The surveys focused on levels of caries, periodontal disease, mucosal disease and dental fluorosis. Since then, no national surveys on ECC have been performed in China. So far there is no publication devoted to the prediction of ECC, specifically on the national level.
To do the point prediction, the time series analysis is the most commonly used method in statistic. Autoregressive integrated moving average (ARIMA) model 15 is one of the common means of the time series analysis with a complete theoretical basis, which can provide middle-long term forecast analysis. Comparing to other statistical models, the characteristic of the grey predictive model (GM) is outstanding 16,17 . It only needs a small sample to establish the model and to predict with a certain precision, which is especially applicable for the system with fuzzy structure or imperfect data. Therefore, these two models are applicable in this study for the prediction.
To establish the optimal model to predicate the trend of ECC in mainland China, we pooled data from existing reports with meta-analysis to calculate the national prevalence of ECC from 1988 to 2013. Then, we forecasted ECC prevalence in mainland China from 2014 to 2018 by the established ARIMA and GM models. The result was expected to provide quantitative basis for allocating medical resources to prevent and control ECC.

Materials and Methods
Data sources. Data used for establishing ARIMA and GM models came from the combined results of a meta-analysis, which was conducted according to the preferred reporting items for systematic review and meta-analyses (PRISMA) checklist. This approach has already been published in the previous literature 2 . No ethical statement was necessary because all data were secondary summary data.
Peer-reviewed articles were searched in the following databases from the date of establishment to March, 2016: PubMed, Embase, Chinese Biomedical Literature database (CBM), Chinese National Knowledge Infrastructure database (CNKI), Chinese Wan Fang database, and Chongqing VIP database, using the key terms 'caries' , 'prevalence' , 'epidemiology' , and 'China' . Two authors screened articles and extracted data independently. Any disagreement was resolved by consensus or the third author. A manual search was also applied to the relevant reference lists of all the eligible articles. Studies were included if they were cross-sectional surveys on ECC using random sampling, at city-level or above in mainland China (except for Hong Kong, Taiwan, and Macao). In order to exclude the effect of age structure, 5-year-olds were chosen as a representative age group. Additionally, studies were based on the general population rather than a specific group. The language of studies was limited to English and Chinese.
To reflect the temporal distribution of ECC, prevalence estimates for ECC in 5-year-olds in each survey year (1988-2013) were calculated by pooling the data from each study, with STATA software 11.1 (Stata, College Station, TX, USA). Statistical heterogeneity was detected by Q-test and I 2 -statistics. A random effects model was adopted in the case of significant heterogeneity (I 2 > 50% or P < 0.1). The quality of the selected studies was assessed using the Reporting of Observational Studies in Epidemiology (STROBE) guideline (Table S2) 18 . Potential publication bias was evaluated by funnel plots and Begg's test; P ≤ 0.05 was considered to be significant.
Combined prevalence rates were divided into two parts to compare the fitting and prediction performances: data from 1988 to 2010 were used to construct the models, while data from 2011 to 2013 were used to test the prediction accuracy of the models. Chow breakpoint test 19 was adopted to identify whether there had been a structural change around 2010; the result was regarded to be significant if P ≤ 0.05.

ARIMA model construction.
ARIMA is a traditional method to study the time series data. Since the sequence of ECC prevalence is a time series and generally have a trend, we chose ARIMA (p, d, q) model to fit it. The following parameters were selected when fitting the ARIMA model: p, the order of auto-regression; d, the degree of difference; q, the order of moving average 12,20 .
The sequence of prevalence usually had a trend which was non-stationary, thus augmented dickey-fuller unit root (ADF) test and KPSS test were chosen to test the stationary of the original sequence. If the sequence was non-stationary, differencing was used to transform it to stationary sequence. Under the circumstance, d = 1. And the tests were made on the differenced sequence to identify whether the trend still existed. If "yes", d = 2, and process went on until the sequence was stationary. Generally, when d = 2, the process could stop 21 .
When the differenced sequence was stationary, the variance and covariance of the sequence did not change over time. Then the autocorrelation function (ACF) graph and partial autocorrelation function (PACF) graph were used to identify the order of auto-regression (AR) and the order of moving average (MA) in the ARIMA model 22 . The model was fitted by the least squares method. The t-statistic was used to test the significance of the parameters and the F-statistic was used to test the significance of the equation. In addition, the Akaike Information Criterion (AIC) was certainly considered to be a comprehensive identifier of the parameters, and the R-squared (R 2 ) was an important index for model testing 23 .
At last, the residual series would be analysed by the Ljung-Box Q-test 24 to verify whether it was a white noise time series or not. The white noise series would indicate that the information has been sufficiently extracted, allowing the model to conduct the predictive analysis. Otherwise, the order re-determination and parameter re-estimation were needed. We used the obtained model to forecast the prevalence of ECC from 2011 to 2018. The flow chart to construct ARIMA model was illustrated in Fig. 1.   GM (1,1) construction. If a system is fuzzy in hierarchy relationship, random in dynamic change, and uncertainty in indicator databases, the system is called grey system. The modeling for a grey system is grey model. GM (1,1) is the typical representative for the grey model, which can be used to fit and forecast in the complex system.
Firstly, an accumulative sequence had to be made on the original sequence. Then the new sequence was assumed to be adopted to the differential equation as follows: Finally, solving the equation, the GM(1,1) was constructed 16 . We also used the established grey model to forecast the future prevalence of ECC from 2011 to 2018.
Performance Statistics Index. The ARIMA model was created with EVIEWS 8 with a significant level of P < 0.05; the GM (1,1) was constructed with Matlab 7.0. In order to compare the performance, two statistics indexes were used to evaluate the fitting and prediction accuracy: the mean absolute error (MAE) and the mean absolute percentage error (MAPE). Their calculation formulas were as follows: where y t and ŷ t denote the original and the predicted value respectively at time t. The smaller these two indexes were, the better the fitness and prediction performances.

Results
Results from meta-analyses. Literature search and quality assessment. A total of 11,776 publications were identified, and 78 eligible articles were included in the meta-analysis (Supplementary Fig. S1). The characteristics of the 78 articles were summarized in Supplementary Table S1. Quality assessment showed that all the studies scored at least 7 out of 10. Publication bias was statistically significant (Begg's test, P < 0.001).
Temporal trends in prevalence of ECC. The pooled prevalence of ECC among 5-year-olds was 66.1% (95% CI: 59.0-73.4%, ranging from 81.2% in 1988 to 56.1% in 2013. Figure 2 illustrated the trends in the prevalence at age 5 over time and a decreasing trend in ECC prevalence was observed during the study period. The result of chow breakpoint test indicated that no structural change occurred in 2010 (P > 5%, Fig. 2). In other words, there was no deviation between the fitting data and the forecasting data.
Simulation Results. ARIMA. The prevalence rates fluctuated between 50% and 80% with a downward trend (Fig. 3a). The result of ADF test (P > 0.05) indicated that the sequence was non-stationary. After 1-order differencing was used, the differenced sequence tended stationary (Fig. 3b). The result of ADF test (P < 0.05) also showed that the sequence was stationary. Then, the figures of ACF and PACF were used to identify the parameter p and q (Fig. 4).
The ARIMA (2,1,3) was chosen. The fitted result showed the significance test of regression was 0.000144, which meant the equation was significant. Moreover, the AIC information was about 6.28 and the R-squared was about 0.71 (Fig. 5). These meant that the effect of the model was good.
Thus, the ARIMA (2,1,3) model on the original sequence has been established.
GM (1,1). Firstly, we generated the accumulative sequence based on the original sequence, shown in Table 1. Then, using the method introduced above, the prediction model was as follows: Thus, the GM (1,1) model was established.
Comparison of the results from fitting and prediction. The established ARIMA (2,1,3) model was compared with GM (1,1) from two aspects of fitting and prediction. The comparison in fitting was shown in Table 2. While the average MAE and MAPE of GM (1,1) were 4.81% and 7.34%; the average MAE and MAPE of ARIMA (2,1,3) were 3.63% and 5.74%. Therefore, ARIMA was better than GM in fitting performance. In addition, the two models were used to predict the prevalence from 2011 to 2013, also showing that the ARIMA was better than GM with lower MAE and MAPE ( Table 3). The fitting and prediction curves of two models were compared with actual curve (Fig. 6), indicating that ARIMA (2,1,3) model was more accurate and stable than the GM (1,1).

Discussion
Oral health in children is an important public issue in China and worldwide. Understanding the temporal trend of ECC may facilitate the allocation of oral health resource. As far as we know, this is the first study to forecast the trend of ECC in mainland China based on the data from meta-analysis. It has broadened a new method for the forecasting field, which may provide the base data and theoretical support to establish and evaluate the prevention measures of ECC. The time series data of prevalence of ECC were forecasted by two prediction models, both of which fitted for the data and can be used in forecasting. According to the fitting and prediction accuracy, ARIMA model outperforms the GM (1,1).
This study demonstrates that the prevalence of ECC has declined over the past 30 years and will continue to decrease in the future. The reasons for this declining trend could be the socioeconomic developments and improved public health service in China recently. Since its reform and opening-up, China has experienced rapid socioeconomic changes. The average annual economic growth rate was as high as 9.8%, and per capita gross domestic product increased from 1,112 RMB in 1987 to 38,420 RMB in 2012 25 . Governmental spending on public health care has grown greatly; the number of dentists has increased 13 times from 1985 to 2008 26, 27 ; more and more oral health education programmes have been organized across the country 28 ; parental awareness toward oral health and children' oral habits have also been improved greatly according to the two national surveys 13, 14 . Our observation is consistent with many researches which have reported the inverse relationship between  socioeconomic status and caries prevalence 29,30 . If effective interventions are implemented in the near future, the prevalence of ECC may continue to decrease.
The fourth national oral health survey is in progress (2015-2016), and the related data (ECC prevalence at age 5) will be available in the near future. In this study, we have adopted ARIMA (2,1,3) model and GM (1,1) to  forecast the ECC prevalence in 2014-2017, which is 53.5% and 52.0% in 2015, respectively. Acuracy of this prediction method based on data from meta-analyses will be further authenticated, compared with the results of the national survey. If the predicting result is close to the actual data, we can develop and popularize this method in the future research and application. We have to mention several limitations in this study. First of all, the data used for forecasting were obtained from the pooled results of regional surveys. Error could not be avoided due to publication bias, sampling size and heterogeneity of the included articles 31 . Especially, pooling 1-6 years together included lots of variation and heterogeneities, and choosing year five was the appropriate time marker to demonstrate the epidemic trend for comparisons across studies. For the other age groups, it is not possible to set up the time series sequence due to limited literatures on other ages. Secondly, given the limited information of existing studies, we just conducted time series analysis without considering the risk factors which could affect the occurrence of ECC, such as medical expenditures, GDP, educational levels of the parents, and so on 4,32,33 . Further research can improve the efficacy of these models and provide more clues to explain the variation of the prevalence. Thirdly, there exist obvious economic and population differences among kinds of provinces and cities in China. However, we could not obtain sufficient annual or monthly ECC prevalence from the current publications, to construct the series analysis stratified by geographical and economic differences in different Chinese regions. Fourthly, we have noticed the constrained forecasting to extrapolate in our study, that is, the longer the forecasting duration, the lower the model's     In general, the oral health status of children in China has improved over time. We have developed a new prediction method based on data from meta-analyses. Both ARIMA model and GM can be used in fitting and forecasting the prevalence of ECC in mainland China. More precise prediction models may be needed to explain the variation of the ECC trend. We aim to promote general awareness in the local Chinese governments to establish the epidemiological database of ECC on a regional level. Then, we can take the economic or population difference among kinds of provinces and cities in the further research, if we have sufficient information. Developing and applying these prediction models could make us better understand the epidemiological characteristics of ECC and be helpful to prevent and control this disease.