Identification and Prediction of Tuberculosis in Eastern China: Analyses from 10-year Population-based Notification Data in Zhejiang Province, China

Tuberculosis (TB), a severe infectious disease caused by Mycobacterium tuberculosis, remains a major global concern. In this study, a total of 331,594 TB cases were notified in Zhejiang Province during 2009–2018, with a male-to-female ratio of 2.16:1. The notified TB incidence showed a continuously declining trend, from 75.38/100,000 to 52.25/100,000. Seasonally, notified TB cases were low in January and February, closely followed by an apparent rise in March and April. Further analysis stratified by gender showed a double-peak pattern, in the younger population (“15–35”) and in the elderly (“>55”). Results from the rate difference (RD) analysis showed that rising TB incidence was concentrated in the young group of “15–20” and the elderly group of “65–70”, implying that measures such as more frequent checkups in specific student groups and strengthened health examination of the elderly could be explored and integrated into available health policy. Finally, SARIMA (2,0,2) (0,1,1)12 was determined as the optimal prediction model, which could be used for further prediction of TB in Zhejiang Province.

decreased TB incidence in this region was reported, the current rate of reduction may not be adequate to meet the targets of the WHO's End TB Strategy and the UN Sustainable Development Goals 5 . During the recent decade, Zhejiang Province reported more than 330,000 notified cases. Exploring the patterns hidden in these data is therefore essential. Based on previous studies, time series models can be used to carry out short-term predictions effectively, providing useful clues and evidence for the control and prevention of TB in the future 6,7 . Among several time series models, the autoregressive integrated moving average (ARIMA) model, including the seasonal ARIMA model, accounts for periodic variation, random factors, and actual fluctuation caused by epidemics 8,9 . The model also has distinct advantages, such as requiring few data variables and offering high prediction accuracy 10 . The ARIMA model has been widely used in the field of infectious diseases, including hemorrhagic fever with renal syndrome (HFRS), hand, foot and mouth disease (HFMD), avian influenza, and TB 11-15 . Thus, this study aimed to explore the underlying burden of TB in the past ten years, identify patterns of TB by gender, discern the target groups, and predict further epidemics in Zhejiang Province. This study might not only contribute to the advancement of health policy for TB control at the regional level but also provide useful references for TB prevention in China.

Materials and Methods
Study area. Zhejiang Province is in the eastern region of China with a land area of nearly 101,800 square kilometers, accounting for 1.06% of China 7 . An economically developed province with a GDP of 6 trillion RMB in 2019, it consists of 11 regional cities: Hangzhou, Ningbo, Wenzhou, Jiaxing, Huzhou, Shaoxing, Jinhua, Quzhou, Zhoushan, Taizhou, and Lishui. As one of the smallest provinces in China, it has two sub-provincial cities and is composed of nearly 90 counties. In 2018, Zhejiang Province reported a total of 57.37 million permanent residents and a migrant population of approximately 26 million, which contributed to the complexity of controlling and preventing TB 4 . The location of Zhejiang Province is shown in Fig. 1.

Data collection. All included data were collected by date of notification from the web-based TB Information Management System (TBIMS) in China, which was established in 2005 16 . In this system, all notified TB cases, including new and relapse cases, were recorded by the designated hospitals at the county, city, and provincial levels and then checked by the local Centers for Disease Control and Prevention (CDC) in Zhejiang Province. In this study, the details of TB cases, including gender, year, date of notification, reported city, etc., were acquired and analyzed. The residential populations of both genders in Zhejiang Province and other sociodemographic information were obtained from the Chinese Information System for Disease Control and Prevention (CISDCP) and the Zhejiang Statistical Yearbook (free access from the official website: http://tjj.zj.gov.cn/col/col1525563/index.html). Permission to access data in CISDCP and TBIMS was granted by the Zhejiang Provincial Center for Disease Control and Prevention. Private information, such as patient name, identification number, address, and contact information, was excluded. The data were checked and screened independently by two reviewers.
Case definition. Notified TB cases included in the TBIMS consisted of laboratory-confirmed pulmonary tuberculosis (PTB), clinically diagnosed PTB, and extrapulmonary tuberculosis (EPTB). All TB cases were classified based on the National Diagnostic Criteria for Pulmonary Tuberculosis (WS288-2008, WS196-2001, and WS288-2017) and the Classification of Tuberculosis (WS196-2017) [17][18][19] . Confirmed PTB cases were defined as people with possible PTB symptoms, such as a continuous cough for more than two weeks, hemoptysis, or night sweats.
Stratified analysis by gender and rate difference (RD) before-after five years. Using the registered permanent population of both genders in Zhejiang Province, all included cases were categorized into 18 age groups, each spanning five years. The notified incidence of TB in each group was calculated.
We used the RD to examine changes in notified TB incidence by comparing each age group with its corresponding senior group, defined as the group five years older observed five years later. A positive RD implied a still-rising risk in that age-specific group, whereas a negative RD indicated a declining risk.
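The RD comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that RD is the incidence of the follow-up group minus that of the baseline group. The function name `rate_difference`, the dictionary layout, and all incidence values are hypothetical and are not taken from the study's tables.

```python
from typing import Dict, Tuple

# Notified incidence per 100,000, keyed by (year, lower bound of the 5-year age group).
# All values below are made up for illustration only.
Incidence = Dict[Tuple[int, int], float]


def rate_difference(rates: Incidence, year: int, age_start: int, lag: int = 5) -> float:
    """RD = incidence of the group `lag` years older, observed `lag` years later,
    minus the incidence of the baseline group (assumed sign convention)."""
    baseline = rates[(year, age_start)]
    follow_up = rates[(year + lag, age_start + lag)]
    return follow_up - baseline


example_rates: Incidence = {
    (2009, 10): 8.0,   # hypothetical: age group starting at 10, in 2009
    (2014, 15): 35.5,  # hypothetical: age group starting at 15, in 2014
    (2009, 30): 70.0,  # hypothetical: age group starting at 30, in 2009
    (2014, 35): 55.0,  # hypothetical: age group starting at 35, in 2014
}

rd_young = rate_difference(example_rates, 2009, 10)  # positive RD: rising risk
rd_adult = rate_difference(example_rates, 2009, 30)  # negative RD: declining risk
print(rd_young, rd_adult)
```

Keying the rates by year and age group makes each RD a lookup of two cells on the same diagonal of a year-by-age table.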
Time-series analysis of ARIMA model. The ARIMA model, first presented by Box & Jenkins in 1970, is specified by three parameters: the order of autoregression (p), the degree of differencing (d), and the order of the moving average (q) 22 . For seasonal series, the model is written as ARIMA (p, d, q) × (P, D, Q)s, in which P denotes the seasonal autoregressive order, D the degree of seasonal differencing, Q the seasonal moving-average order, and s the length of the seasonal period. Given the underlying seasonal feature of notified TB, the seasonal model was selected and fitted in this study. As is common, the stationarity of the data was tested in the first step; if the data were not stationary, appropriate differencing and/or exponential transformation was applied to convert the data into a stationary series. In addition, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) were used to identify q and p 23 . Ljung-Box tests were used to check whether the residuals were white noise, and indicators such as Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were adopted to select the optimal model 24 .
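The preprocessing and diagnostic steps named above (seasonal differencing, the sample ACF, and the Ljung-Box statistic) can be illustrated with a small standard-library sketch. It is not the authors' R workflow. The function names and the monthly counts are hypothetical, and the p-value step (comparing Q with a chi-squared distribution) is left out.

```python
from typing import List


def seasonal_difference(series: List[float], period: int = 12) -> List[float]:
    """Apply one round of seasonal differencing: y_t - y_{t-period}."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def autocorrelation(series: List[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals: List[float], max_lag: int) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..max_lag} r_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical monthly counts: one 12-month pattern repeated over three years.
pattern = [80, 70, 110, 120, 115, 110, 105, 100, 95, 95, 90, 85]
monthly = [float(v) for v in pattern * 3]

diffed = seasonal_difference(monthly)   # the purely seasonal signal is removed
r12 = autocorrelation(monthly, 12)      # strong correlation at the seasonal lag
print(diffed[:3], round(r12, 3))
```

A large autocorrelation at lag 12 in the raw series that disappears after seasonal differencing is the kind of evidence that motivates setting D = 1 with s = 12.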
Ethics statement. This research was approved by the Ethics Committee of the Zhejiang Provincial Center for Disease Control and Prevention. All personal information in this study was kept confidential as required.
Statistical analysis. The descriptive analysis and Mann-Kendall test were performed with R software (version 3.5.3) and Microsoft Excel, and the maps were produced with ArcGIS software (version 10.2, ESRI Inc.; Redlands, CA, USA). The time series model was determined using R software (TSstudio package). Two-sided P values < 0.05 were considered statistically significant.
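For readers unfamiliar with the Mann-Kendall trend test mentioned above, the following is a minimal sketch of its statistic using the normal approximation. It assumes no tied values and is not the R routine used in the study. The function name `mann_kendall` is hypothetical. In the example series, only the first and last values (75.38 and 52.25) come from the text, and the intermediate values are invented.

```python
import math
from typing import List, Tuple


def mann_kendall(values: List[float]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Assumes no tied values, so the simple variance formula applies.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return s, z, p


# Endpoints from the text; intermediate yearly values are hypothetical.
incidence = [75.38, 73.0, 70.5, 68.0, 65.0, 62.5, 60.0, 57.5, 55.0, 52.25]
s_stat, z_stat, p_value = mann_kendall(incidence)
print(s_stat, round(z_stat, 3), p_value < 0.05)
```

A strongly negative S with a small p-value indicates a monotonic downward trend, which is the pattern the test is meant to detect in yearly notification data.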

Results
General epidemiological characteristics of TB. From 2009 to 2018, a total of 331,594 TB cases from Zhejiang Province were notified in the national surveillance system, with a male-to-female ratio of 2.16:1. Male cases outnumbered female cases in every age group. The top 10 ethnic groups were listed, and the Han ethnic group accounted for more than 96% of notified TB cases. A significant declining trend by year was identified in the Han, She, and Mongolian ethnic groups (Supplement Table 1, determined by Mann-Kendall test). Peasants and workers together accounted for nearly 70% of notified cases in the study period. Additionally, the notified TB incidence showed a declining trend from 75.38/100,000 in 2009 to 52.25/100,000 in 2018. The nadir of notified TB cases was identified in February; the number of cases then peaked in April, followed by a persistent decline in the following months. For the regional distribution, Hangzhou, Ningbo, Wenzhou, and Jinhua had more TB cases than other cities, although the recent decade witnessed decreasing case numbers in each prefecture. These results are shown in Fig. 2. Furthermore, relapse cases accounted for nearly 7% of all notified TB cases in the study period (Supplement Table 2).
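The headline figures above can be turned into relative terms with simple arithmetic. The sketch below uses only the two reported incidence values and the reported sex ratio. The average annual change assumes a constant compound rate of decline, which is a simplification and not a claim made in the study.

```python
start_rate, end_rate = 75.38, 52.25  # notified incidence per 100,000 (2009, 2018)
years_elapsed = 2018 - 2009

# Total relative decline over the study period, in percent.
relative_decline = (start_rate - end_rate) / start_rate * 100

# Average annual change in percent, assuming a constant compound rate.
annual_change = ((end_rate / start_rate) ** (1 / years_elapsed) - 1) * 100

# Share of male cases implied by the 2.16:1 male-to-female ratio.
male_share = 2.16 / (2.16 + 1) * 100

print(f"{relative_decline:.1f}% total decline, "
      f"{annual_change:.2f}% per year, {male_share:.1f}% male")
```

Under these assumptions the total decline is close to 31% and roughly two thirds of notified cases were male.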
The stratified analysis of notified TB by age groups in males and females. To obtain an accurate notified TB incidence in each age-specific group, the permanent population during the study period was used to standardize notified TB incidence. As shown in Tables 1 and 2, the notification rate in nearly every age group of both genders showed a declining trend, particularly in the groups of "20-50" and ">75". Additionally, all age groups under 15 showed a low notified TB incidence in both the male and female populations. Notified TB incidence rose sharply and reached its first peak within the age bracket of "15-35" in males and females. The rate then declined slightly and reached a second peak in the ">55" group for both genders. At the second peak, the incidence for males was nearly 3-4 times that for females. The details are shown in Tables 1 and 2.
Rate difference (RD) of notified TB before-after five years among the study population. In this study, all notified TB cases were classified into five-year age groups. We considered age groups in 2014-2018 to be the follow-up of age groups younger by five years in 2009-2013. For these five comparison groups, the overall trend of RD in each year (2014-2018) was nearly the same. For males, the positive RD of notified TB incidence was concentrated in the age groups of "5-25" and "50-70", which was similar to females. The age groups of "25-50" and ">75" presented a negative RD of notified TB incidence, although the differences were diminishing in both genders. These results are shown in Fig. 3.

Time-series analysis of notified TB cases.
In this study, we used R software to identify the prediction model for notified TB cases. Given the obvious periodic feature of TB occurrence, the seasonal ARIMA model was constructed. Ultimately, SARIMA (2,0,2) (0,1,1)12 (AIC = 1465.9, BIC = 1484.7) was determined as the optimal one. The prediction of notified TB cases in 2019 is shown in Fig. 4 and Table 3. Additionally, the estimated parameters for this SARIMA model are presented in Table 4.
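For reference, the selected SARIMA (2,0,2) (0,1,1)12 specification can be written in backshift notation. The form below is a sketch under stated assumptions: it omits any constant or drift term, and it uses the sign convention in which moving-average coefficients enter with a plus sign. The symbols \phi_i, \theta_j, and \Theta_1 stand for the parameters reported in Table 4, whose values are not reproduced here.

```latex
% y_t: notified TB cases in month t;  B: backshift operator, B^k y_t = y_{t-k}
% \varepsilon_t: white-noise error term
(1 - \phi_1 B - \phi_2 B^{2})\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B + \theta_2 B^{2})\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

Here (1 - B^{12}) is the single seasonal difference (D = 1, s = 12), the left-hand polynomial carries the two non-seasonal autoregressive terms (p = 2), and the right-hand side combines the two non-seasonal moving-average terms (q = 2) with one seasonal moving-average term (Q = 1).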

Discussion
Although TB is deemed a preventable and treatable disease, it still causes a considerable global disease burden 25,26 . That is, existing prevention and control strategies and policies still need to be improved. Zhejiang Province, a developed area in China with a GDP comparable to that of the Netherlands, has recently undertaken numerous explorations and endeavors to accelerate the realization of the End TB goal, such as improving etiological diagnosis by popularizing GeneXpert technology, promoting treatment compliance through electronic pillboxes, and implementing health insurance reform for TB patients through payment reform. International cooperation projects such as the Global Fund TB Program and the China-Bill & Melinda Gates Foundation Phase III project also provided a new horizon for controlling TB epidemics in local regions. Thus, identifying and exploring patterns over the past decade is important for summarizing previous practices, identifying existing insufficiencies, and providing advice for optimizing available health policy.
In this study, we included nearly 332,000 notified TB cases from the recent decade. Overall, the incidence of notified TB declined by about 30%, which was inseparable from the actions described above in Zhejiang Province. However, there was an obviously higher proportion of cases in the male population. According to the annual monitoring report in China, the average male-to-female ratio nationwide was about 2.19:1, which was similar to the ratio in Zhejiang Province. The male-to-female ratio also differs among countries and regions. Previous TB prevalence surveys demonstrated a ratio of 1.2 in Ethiopia and 4.5 in Vietnam 27,28 . Given the average sex ratio of 1.9:1 around the globe, this implies a higher burden in the male population in China and Zhejiang Province 29 . This pattern has also been observed in some low-burden countries 30,31 . Interestingly, one study in Germany demonstrated that the prevalence of latent TB infection did not differ by gender, while active TB showed an apparent male predominance 32 . Thus, further study of immune and inflammatory responses should be considered to find the potential mechanism 28 ; meanwhile, more specific public health interventions, community health education, and policy support should be considered to lower TB transmission 20 . The Han ethnic group accounted for the majority of the population in Zhejiang Province, and the continuous decline of notified TB cases in this group was consistent with the trend in the whole population. Similar to previous findings, peasants and workers were the populations most vulnerable to TB in Zhejiang Province, implying that these occupational groups should still be prioritized in the developed area 33 .
The decline in notified cases in January and February, accompanied by the rapid rise around March and April, may be attributable to two factors. First, the Spring Festival, also known as the Lunar New Year in China, falls in January or February, and more patients seek medical care after this grand festival. Because family gatherings are concentrated in the Spring Festival, delayed medical presentation might further aggravate potential TB infection and transmission. Therefore, more health education should be carried out before this period. Second, health checkups for college enrollment examinations around March and April might also contribute to the increased identification of active TB cases with no or mild symptoms.
In terms of spatial distribution, more notified cases were reported in Hangzhou, Ningbo, and Wenzhou, which was partly ascribed to their large populations but was also associated with their developed economies, which attracted more migrants from other regions of Zhejiang Province and from outside the province 33 . Therefore, in line with a previous study, more holistic policies such as a comprehensive health-care policy should be put forward to give full coverage to all people in the community, which could reduce the risk of treatment interruption among patients, particularly in migrant groups with low socio-economic conditions 34 .
The stratified analysis was performed for both genders. Broadly speaking, the notified TB incidence in males was higher than that in females. For both genders, the lowest incidence was concentrated in ages under 15, which might be attributable to the efficacy of Bacillus Calmette-Guerin (BCG) in protecting against childhood and disseminated TB 35 . This finding also suggested, to some extent, that the preventive effect of BCG might cover the first decade of life, which was consistent with previous long-term results 36 . For both genders, the age groups "15-35" and ">55" showed a high notification incidence, especially the senior age group, while the trend for females was relatively flat. The rising incidence in the "15-35" age group might be correlated with the attenuation of protective efficacy and increased exposure to environmental mycobacteria that reduced reactivity to BCG [37][38][39][40] . Thus, for the student population within the "15-35" age range, it is suggested that school infirmaries provide additional care to students with TB symptoms, especially students residing in the same dormitory, where TB clustering is possible. For manual workers, workplaces should provide regular physical examinations and the necessary health promotion and education to enhance early TB detection and reduce clustered epidemics. For the ">55" age group, the increased TB incidence might be attributed to the low immunity in this population, particularly among people with diabetes and poorly controlled blood glucose 20 . Our previous study also demonstrated that active case finding could decrease the active TB incidence in some target populations 20 . In the future, more elaborate and comprehensive actions should be formulated and explored to reverse the TB epidemic in Zhejiang Province.
Table 3. Predicted Notified TB Cases by SARIMA (2,0,2) (0,1,1)12 and the Actual Notified Number in 2019.
The RD of notified TB incidence before and after five years was analyzed in our study. To our knowledge, the notification rate of TB in each age group might be influenced by several factors, such as internal immune levels and existing supervision (underdiagnosis, underreporting, etc.), combined with corresponding external health policy (medical insurance, special government subsidies, etc.) 16,41,42 . Thus, we used the RD to offset some internal effects, such as the efficacy of BCG, and to identify the underlying external reasons, providing evidence for strengthening health strategies and advancing health policies. Based on the available results, although the overall notified TB incidence declined, we still found increased notification of TB among the age groups "15-20" and "65-70" in both genders. The "15-20" age group consisted mostly of students in the period from high school to college, and the relative risk ranged from 19.2 to 10.9 from 2014 to 2018 43 . Despite a successive decline of TB risk in this specific group, the rate of decline was insufficient under the existing health policy. Given that the only uniform medical checkup for students occurs during college enrollment, we recommend that routine checkups be conducted annually for students in this age group to enhance TB identification and that this strategy be integrated into further policies of TB control in Zhejiang Province. For the "65-70" age group, the increased TB notification incidence was attributable to recent efforts in health examination for the elderly, which improved the detection of active TB in this age group and correlated with the further decline in the older age group 20,44,45 . Thus, it is suggested that more comprehensive physical examinations involving TB identification, such as GeneXpert, should be considered in some developed areas with a high TB incidence. Moreover, the other age groups demonstrated declining notified TB incidence, implying the effectiveness of current control and prevention strategies. Yet the diminishing negative RD values and the high absolute incidence in these groups might indicate that novel measures and strategies, combined with the integration of early identification, clinical treatment, and community management for TB control, should be explored.
Ultimately, the SARIMA (2,0,2) (0,1,1)12 model was chosen and applied to the prediction of notified TB cases in Zhejiang Province. Comparing our predictions with the actual number of notified TB cases from the TBIMS in 2019 supported the accuracy of the model's fit. Previous studies in other regions also used the SARIMA model to give short-term predictions of TB epidemics with high predictive precision, which was consistent with our findings 46,47 .
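A comparison between predicted and actual monthly counts, as described above, is often summarized with error metrics such as the mean absolute percentage error (MAPE) and the root mean squared error (RMSE). The text does not state which metric was used, so the following is only an illustrative sketch. The function names and all monthly values are hypothetical and are not the figures in Table 3.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same unit as the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical monthly notified cases, for illustration only.
actual_cases = [2000.0, 1800.0, 2600.0]
predicted_cases = [2100.0, 1710.0, 2470.0]

mape_value = mape(actual_cases, predicted_cases)
rmse_value = rmse(actual_cases, predicted_cases)
print(f"MAPE = {mape_value:.1f}%, RMSE = {rmse_value:.1f} cases")
```

MAPE is scale-free and easy to compare across regions, while RMSE penalizes large monthly misses more heavily, so reporting both gives a fuller picture of forecast quality.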

Limitations
Several limitations of this study should be noted. Firstly, the data we used were notification records. Because notification quality differs among regions, some bias in the results might be unavoidable. Secondly, owing to the paucity of details on socio-economic parameters, latent TB infection, and drug resistance throughout the study period, we did not analyze these factors, so our results may not give a comprehensive picture of TB occurrence. Thirdly, although we drew our conclusions prudently, we did not take into account the potential influence of the migrant population from other provinces, which might also have affected our results. In addition, we tried to explore the epidemiological characteristics of TB notification in eastern China, but data from one province might not give a full description. Finally, the model we chose is a common one with a possibility of overfitting, and potentially more suitable integrated models, such as the ARIMA-NAR hybrid model, were not considered in this study.

Conclusion
In general, the notified TB incidence in Zhejiang Province declined over the past decade, while the male population and the critical months from January to April still need special attention. Measures such as more frequent checkups in specific student groups and strengthened health examination of the elderly could be explored and integrated into available health policy. Ultimately, the SARIMA model fits trends in notified TB cases well and can be used for further prediction of TB in Zhejiang Province.