Meteorological and social conditions contribute to infectious diarrhea in China

Infectious diarrhea in China shows a distinct pattern. Many researchers have tried to reveal its drivers, yet usually only meteorological factors were taken into consideration. Furthermore, the diarrheal data they analyzed were incomplete, and the algorithms they used were ill-suited to capturing realistic relationships. Here, we investigate the impacts of meteorological and social factors on the number of infectious diarrhea cases in China, using a machine learning algorithm called the Random Forest. Our results demonstrate that nearly half of infectious diarrhea cases occurred among children under 5 years old. In general, increasing temperature or relative humidity leads to more cases of infectious diarrhea in China. Nevertheless, people of different age groups or regions have different sensitivities to meteorological factors. The weight of feces that are harmfully treated could be a possible cause of infectious diarrhea among the elderly as well as children under 5 years old. These findings indicate that infectious diarrhea prevention for children under 5 years old remains a primary task in China. Tailored prevention measures ought to be provided for different age groups and regions. It is essential to bring the weight of feces that are harmfully treated to the forefront when considering infectious diarrhea prevention.


Scientific Reports | (2021) 11:23374 | https://doi.org/10.1038/s41598-021-00932-0 | www.nature.com/scientificreports/

[...] was highly associated with risk of typhoid fever in Blantyre, Malawi 53. Septic system densities were associated with infectious diarrhea in central Wisconsin 51. Local forest coverage may mitigate the effects of extreme heat on cholera outbreaks in Bangladesh 24. While the impacts of potential factors on diarrheal incidence have been well documented, existing studies generally suffer from three drawbacks. First, they mainly focused on only one kind of infectious diarrhea 56. Second, social factors that might contribute to the statistical incidence of infectious diarrhea have not received enough attention 24,38,39. Third, because the inter-relations among the predominant factors of infectious diarrhea are quite complicated in the real world, traditional statistical methods may be unsuitable for discovering these complex relationships 22,57-60.
In this study, we explore the link between monthly/yearly variability and all kinds of infectious diarrhea during 2005-2017 in China. The potential factors that we take into consideration are meteorological and social factors. The method for data analysis is a well-established machine learning algorithm. The infectious diarrhea cases that have been reported to the Chinese Center for Disease Control and Prevention (China CDC) are grouped into three classes: Class A (cholera); Class B (bacillary dysentery, typhoid and paratyphoid); and Class C (other infectious diarrhea). We use the most commonly used meteorological factors 11,12, namely precipitation, wind speed, temperature, vapor pressure, and relative humidity.
Based on existing research, we apply 4 categories of social factors to explore the linkage between social factors and infectious diarrhea. The factors we choose are representative of each category. Specifically, we choose the quantity of centers for disease control and prevention (CDCP), the quantity of health supervision institutes (HSI) and the quantity of health technicians (HT) to represent the resource of public health. As water pollution is usually the main threat to the safety of water 61, the three major measures of wastewater discharge are used to represent the safety of water. Those three measures are the amount of wastewater discharge (WD), chemical oxygen demand emissions (CODE) and nitrogen output (NO). The weight of feces that are harmfully treated (FHT) is selected to represent the severity of fecal contamination, and the forest coverage rate (FCR) is used to represent the local forest coverage. The data we use in this study are all publicly available.

The overall situation of infectious diarrhea in mainland China is shown in Fig. 1. The first panel in the figure is the monthly observed number of infected cases during 2005-2017. The second panel is the average number of infectious diarrhea cases over each whole year, which represents the trend of infectious diarrhea over time. The third panel shows the average number of infected cases in each calendar month over the observed 13 years, which represents the seasonal pattern of infectious diarrhea. These monthly averages are then normalized by subtracting the mean of these averages. The fourth panel is the residual, which is obtained by subtracting both the yearly and the monthly averaged numbers from the observed number. The label for each year on the horizontal axis corresponds to July. The cases show a long-term increasing trend, which is visible in the second panel.
In the third panel, a peak from May to August and a nadir from November to February are observed annually. [...] Fig. 2. From this figure, we observe that the population clustered into three age groups: under 5, 5-10, and over 10 years of age. The incidence of infectious diarrhea in the group under 5 years old increased mainly in early autumn and winter, while the incidence in the group over 10 years old increased mainly in summer. In comparison, the incidence of infectious diarrhea in the 5-10 age group stayed quite stable and was less sensitive to seasonal change.
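The four-panel decomposition described for Fig. 1 can be illustrated with a minimal sketch. It assumes a complete monthly series (12 values per year) and takes the residual as the observed count minus the yearly average and minus the centred monthly average; the function name `decompose` and the toy data are hypothetical, not taken from the study.

```python
from statistics import mean


def decompose(monthly_cases, n_years):
    """Split a monthly series into yearly trend, centred seasonal pattern and residual.

    Assumption: monthly_cases holds exactly 12 * n_years values in time order.
    """
    if len(monthly_cases) != 12 * n_years:
        raise ValueError("expected 12 values per year")
    years = [monthly_cases[12 * y: 12 * (y + 1)] for y in range(n_years)]

    # Panel 2: average monthly count within each calendar year (trend).
    yearly_avg = [mean(year) for year in years]

    # Panel 3: average of each calendar month across all years,
    # normalized by subtracting the mean of these twelve averages.
    month_avg = [mean(year[m] for year in years) for m in range(12)]
    grand_mean = mean(month_avg)
    seasonal = [value - grand_mean for value in month_avg]

    # Panel 4: what remains after removing the trend and the seasonal pattern.
    residual = [
        monthly_cases[12 * y + m] - yearly_avg[y] - seasonal[m]
        for y in range(n_years)
        for m in range(12)
    ]
    return yearly_avg, seasonal, residual


# Toy series (hypothetical): a rising yearly level plus a fixed seasonal shape.
toy = [100 * (y + 1) + (m - 5.5) for y in range(3) for m in range(12)]
trend, season, rest = decompose(toy, n_years=3)
```

For the toy series, which is built as an exact sum of a yearly level and a zero-mean seasonal shape, the residual vanishes; real surveillance data would leave a non-zero residual.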

Results
The provincial situations about yearly infectious diarrhea cases are demonstrated in Figs [...]

The influence of social factors with regard to different age groups. The social factors we take into consideration are: the quantity of centers for disease control and prevention (CDCP), the quantity of health supervision institutes (HSI), the quantity of health technicians (HT), the amount of wastewater discharge (WD), chemical oxygen demand emissions (CODE), nitrogen output (NO), the weight of feces that are harmfully treated (FHT), and the forest coverage rate (FCR). The yearly data of social factors as well as the number of infectious cases in different age groups are given in Table 1. The influence of social factors on the number of infectious diarrhea cases in different age groups is shown in Fig. 6, and the detailed data are given in Supplementary Table 2 of the Supplementary material. In general, social factors influence infectious diarrhea cases differently across age groups in mainland China. Some age groups have a principal influential factor while others do not. For adolescents and young adults aged between 10 and 25, HT is the dominant factor. For middle-aged people between 45 and 60, NO seems to affect the group more. For the elderly aged over 75, WD matters the most.
From the perspective of the social factors, CDCP, HSI, HT and WD affect the number of infectious diarrhea cases in almost all age groups. FCR has a more significant impact on ages between 5 and 40 than on other age groups. FHT primarily affects the old and the young. CODE contributes mainly in the age groups 45-50 and 55-60. NO has the clearest influence on people between 70 and 75 years old.
Predicting future numbers of infectious diarrhea cases by meteorological and social factors. In this article, we use meteorological and social factors to predict the future number of infectious diarrhea cases and obtain reasonable results. The monthly data of infectious diarrhea cases in mainland China during 2005-2015, divided by age group and province, are used as training data. The remaining data of 2016-2017 are used as test data for prediction. [...] Table 3 in Supplementary material to obtain the tuned parameters for each age-province combination and Supplementary Table 4 to obtain the exhaustive list of other parameters). After the parameter tuning, we run the train-test procedure 10 times and obtain the mean error and an indication of the variance around that mean (see Supplementary Table 5 in Supplementary material for the detailed data).
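The evaluation protocol above (train on 2005-2015, test on 2016-2017, repeat 10 times) can be sketched as follows. This is only an illustration under stated assumptions: each record is a dictionary with a `year` key, `fit_and_score` is a hypothetical callable that trains a model with a given random seed and returns its test error, and the sample standard deviation stands in for the "indication of the variance", which the text does not specify.

```python
from statistics import mean, stdev


def split_by_year(records, last_train_year=2015):
    """Temporal split: years up to last_train_year train, later years test."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def repeated_evaluation(records, fit_and_score, n_runs=10):
    """Run the train-test procedure n_runs times with different seeds.

    fit_and_score(train, test, seed) is a hypothetical callable returning
    one error value per run (for example an nMAE).
    """
    train, test = split_by_year(records)
    errors = [fit_and_score(train, test, seed) for seed in range(n_runs)]
    # Assumption: spread is reported as the sample standard deviation.
    return mean(errors), stdev(errors)
```

Splitting by calendar year rather than at random keeps all test months strictly later than the training months, which matches the forecasting setting described in the text.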
We use the normalized mean absolute error (nMAE), a normalized metric, to evaluate the predictions. It is defined by 64:

$$\mathrm{nMAE} = \frac{\mathrm{MAE}}{\bar{x}}$$

Here, $\bar{x}$ is the average value of $x$, and MAE is the mean absolute error, which is defined by:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right|$$

where $x_i$ is the observed value, $\hat{x}_i$ is the predicted value, and $n$ is the number of observations.

The average numbers of predicted infectious diarrhea cases approximately coincide with the real numbers in twelve months: April to October and December of 2016, and July to September and November of 2017 (shown in Fig. 7). In those twelve months, the difference between predicted and real data is less than 10%. There are large gaps between predicted and real data in seven months: January, February and November of 2016, and January to March and December of 2017. In those seven months, the difference between the two groups of data is more than 30% (see Supplementary Table 6 in Supplementary material for the detailed data). The best prediction (nMAE = 0.088) and the worst prediction (nMAE = 2.65) are shown in Fig. 8. The detailed data about the worst and best predictions are given in Supplementary Tables 7 and 8 of the Supplementary material, respectively.
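As an illustration of the metric, the sketch below computes MAE and nMAE under the assumption that nMAE is the mean absolute error divided by the mean of the observed values. The function names and the toy numbers are hypothetical.

```python
def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def nmae(observed, predicted):
    """MAE divided by the mean of the observed values.

    Assumption: the normalizing average is taken over the observed series.
    """
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("nMAE is undefined when the observed mean is zero")
    return mae(observed, predicted) / mean_observed


# Toy example (hypothetical counts for three months).
observed = [100, 200, 300]
predicted = [110, 190, 330]
score = nmae(observed, predicted)  # MAE = 50/3, observed mean = 200
```

Because the error is divided by the typical case count, a score such as 0.088 reads as an average miss of roughly 9% of the mean monthly count, which makes provinces and age groups with very different case numbers comparable.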
The four provinces with the most infectious diarrhea cases are examined further (shown in Fig. 9). Meteorological factors contribute more to diarrhea morbidity than social factors do. Vapor pressure is the crucial feature and mainly affects children aged between 2 and 9 in Guangdong province. Temperature has a primary influence on middle-aged people between 40 and 65 in Zhejiang province. Meanwhile, in Anhui and Shandong provinces, younger people are more vulnerable to vapor pressure while older people are more sensitive to temperature. Although the social factors have a weaker connection with diarrhea, some factors still have a strong influence on some age groups. The quantity of health technicians has a considerable effect on people aged between 25 and 35 in Guangdong province. Generally speaking, the quantity of [...]

Discussion
Our observations illustrate the vulnerability of children under 5 years old to infectious diarrhea. This group not only accounts for almost half of infectious diarrhea cases, but is also much more sensitive to harmful exposure. In addition, our findings confirm the distinctive seasonal pattern of infectious diarrhea cases in mainland China and demonstrate the association between climate variation and epidemiological morbidity. Specifically, we take different age groups and provinces into consideration to illustrate the change of diarrheal morbidity over time. Meteorological and social factors have been applied to predict the number of diarrhea cases. The effects of the respective factors are analyzed for different provinces. Furthermore, the influence of those factors on different age groups is evaluated, which underscores their potential use in future diversified public health management.

Infectious diarrhea prevention for children under 5 years old.
Infectious diarrhea is quite common in infancy and early childhood 65. Children's diarrhea is usually caused by a viral infection 66 and is likely to damage their growth, electrolyte, and nutrient balance 67. More seriously, episodes of diarrhea may predispose undernourished children to pneumonia 68, which causes further damage to children's health. Numerous countermeasures have been proven to effectively reduce the high burden of infectious diarrhea in children. These include rotavirus vaccine 69, improvement of zinc nutriture 70, lactobacillus therapy 71, and dietary counseling 72.
Our study suggests that raising the number of health technicians and improving the treatment of feces could be effective in reducing the number of children infected with diarrhea, which agrees with some existing research. For instance, Stephen et al. 73 found that the participation of field workers in promoting hand washing among neighborhoods reduces the incidence of infectious diarrhea. Jai et al. 74 showed that insecticide spraying may reduce diarrhea in children in a region where there was a clear connection between fly numbers and associated diarrhea. Although the authors did not further investigate the origin of the flies, it seems reasonable to recognize a relationship between the appearance of flies and the way feces are treated.

Meteorological factors and their impacts on infectious diarrhea.
Our observation shows that the total number of infectious diarrhea cases mainly increases from May to September, when the temperature and the relative humidity of the whole country increase significantly.
When considering climate-related diarrhea among different sub-populations, some studies suggest that children and older adults are the most susceptible to diarrhea caused by climatic variation 75. Both the intensity and the frequency of climate changes have significant effects on childhood diarrhea 76. Furthermore, the climate effect on infectious diarrhea for men, teenagers, and young adults (10-29 years) was higher than that for other populations 77. Our study complements that research. We found that for children under 5 years old, infectious diarrhea morbidity differs clearly among seasons. The number of young adults (20-35 years) with infectious diarrhea rises evidently in summer and early autumn, which may be related to the lifestyle of young people, although this is not confirmed.
Not only different sub-populations, but also different sub-locations show a distinctive relationship with climate-related diarrhea. For instance, a study in Wuhan (the capital of Hubei province) suggests that the central districts have a high risk compared to other areas 13. In our study, the various locations are categorized by province. Our finding reveals that coastal provinces in central and eastern China have higher numbers of infectious diarrhea cases, among which Guangdong province has the highest number. The high morbidity of infectious diarrhea in those coastal provinces may be explained by relatively frequent contact with seafood, but this needs further exploration.

The association between infectious diarrhea and meteorological conditions may be explained by pathogen activity. Infectious diarrhea can easily be caused by microorganisms, which depend closely on the environmental situation 78. Among those microorganisms, rotavirus, calicivirus, and enteropathogenic and enterotoxigenic E. coli cause more than half of all diarrheal deaths in children under 5 years in the world 79-81. The host that those microorganisms parasitize is directly or indirectly influenced by meteorological conditions. Microorganisms vary in their sensitivity to alterations in temperature, humidity, oxygen, light and nutrients 82-84, which leads to different survival and reproduction rates outside of the host. Subsequently, an appropriate temperature triggers expression of the related genes inside a host 85-87, which results in a successful infection of the host. Variations in rainfall and temperature also affect the possibility of fecal contamination 10,87, which is an immediate cause of infectious diarrhea 48,88. Heavy rainfall and floods alter human exposure patterns and lead to population displacement, with a variety of resultant health impacts 87,89,90.
Since there is a distinct connection between meteorological conditions and infectious diarrhea, it seems natural that meteorological factors can be applied to infectious diarrhea prediction. In fact, many statistical models have been used for infectious diarrhea prediction, such as the linear regression model 57,91, multiple regression model 92, Poisson regression model 21,75,93 and spatial panel regression model 94. However, traditional statistical models have inherent flaws when approximating complicated correlations. In order to better fit the inter-relation between meteorological conditions and infectious diarrhea, we use the Random Forest Algorithm in our study. The Random Forest Algorithm is an "ensemble learning" algorithm, which was first proposed in 2001 by Breiman. It was then widely applied in multiple domains and has been considered one of the most powerful machine learning algorithms 95-98. As a relatively new algorithm, it has seldom been employed in the field of public health. Our study shows the great potential of the Random Forest Algorithm in predicting the case number of infectious diarrhea.

Social factors and their impacts on infectious diarrhea.
The social factors that we investigated are classified into 4 categories: resource of public health, safety of water, fecal contamination and local forest coverage. As shown in Fig. 6, the resource of public health and the safety of water play more important roles in affecting diarrhea than fecal contamination and local forest coverage.
The resource of public health, represented by the quantities of centers for disease control and prevention (CDCP), health supervision institutes (HSI) and health technicians (HT), has an obvious influence on all age groups. The positive influence of those factors may be due to their improvement of sanitation and hygiene, as infectious diarrhea can easily be caused by contaminated food 43,99-101, a poor sanitation environment 54 and a poor health situation 45,46. Those risk factors of infectious diarrhea can be mitigated by health interventions, such as a good habit of hand washing 54,102, appropriate food interventions 47,102, a relatively rapid diagnosis/solution 103 and more medical technicians 41. Increasing the quantity of CDCP, HSI and HT provides more health education for the public and more professional treatment of patients 103,104. Thus, more public health resources result in more effective diarrhea prevention.
As shown in Fig. 6, compared to other kinds of social factors, the safety of water has a considerable impact on the incidence of diarrhea, especially among the elderly. The major concern about water safety is water pollution 61. Existing studies demonstrate that not only polluted drinking water, but also polluted water used for other purposes can cause infectious diarrhea 42,52,53. Water pollution may trigger diarrhea through pathogen contamination and may cause degeneration of physical function. A review study illustrates that the most common polluters of water are pathogens (bacteria, viruses and protozoa) 61. Those pathogens could lead to various kinds of infectious diarrhea. On the other hand, injury from water pollution is likely to cause conditions such as rheumatic and vascular diseases, myocardial infarction and nervous system damage, especially in the elderly 105. Those diseases could make the elderly more vulnerable to diarrhea 106.
The fecal contamination is represented by the weight of feces that are harmfully treated. Fecal contamination may result in diarrhea through pathogen contamination, directly or indirectly. For instance, there are studies showing that fecal contamination is associated with viral and bacterial infection 51,54. Some studies demonstrate that fecal contamination leads to water pollution 22,47, which may further bring about diarrhea infection.
In contrast to other social factors, forest coverage has less impact on infectious diarrhea. The relationship between forest coverage and diarrhea may result from the influence of forests on the environment. For instance, there is research illustrating that forest contributes to diarrhea prevention by improving water quality 55. Another study shows that heatwaves might promote the occurrence of cholera, but that this connection could be modified by forest coverage 24.
Our study shows that different social factors have different effects on specific age groups. Those findings shed new light on infectious diarrhea prevention and may promote more flexible and efficient disease control measures. For instance, our research indicates that the number of health technicians plays a more important role for children under 5 years old, youth aged between 15 and 30, and the elderly over 60 years old. A connection has been observed between the weight of feces that are harmfully treated and the monthly numbers of infectious diarrhea cases among the old as well as children under 5 years old. Based on those findings, health technicians should pay much more attention to those two age groups when considering infectious diarrhea prevention, which may bring about more accurate disease control as well as rational and effective management of human resources. Meanwhile, the weight of feces that are harmfully treated should be brought to the forefront.

Figure 9 further demonstrates that the impact of social factors differs between provinces. For instance, the social factors have a stronger connection with diarrhea in Guangdong province than in Shandong province. The regional difference may result from socio-demographic variation 107. Although the mechanism remains unclear, it seems important that geographical location receive more attention when considering infectious diarrhea 108. Figure 9 also shows that social factors contribute to diarrhea morbidity, but not in a dominant way. This conclusion coincides with some existing research 13,37. Furthermore, one study reveals that the impact social factors have on diarrhea remains unstable 107. Further investigations are needed to explore the major mechanisms underlying the association between infectious diarrhea and certain social factors.

Methods
Diarrhea data. In China, infectious diarrhea (excluding cholera, dysentery, typhoid, and paratyphoid) is an intestinal infectious disease with diarrhea and/or vomiting as the main symptom. Monthly data on infectious diarrhea during 2005-2017 in mainland China were collected from the Nationwide Notifiable Infectious Diseases Reporting Information System (NIDRIS) 109. The information included time, age, and date of onset. The data are publicly available.
Meteorological data. Monthly climate data were obtained from the National Meteorological Information Center of China (http://data.cma.cn/), including precipitation, wind speed, temperature, vapor pressure, and relative humidity. The data are publicly available.

Social factors data.
Monthly social factors data were obtained from the Chinese National Bureau of Statistics (http://www.stats.gov.cn), containing the quantity of centers for disease control and prevention, the quantity of health supervision institutes, the quantity of health technicians, the amount of wastewater discharge, chemical oxygen demand emissions, nitrogen output, the weight of feces that are harmfully treated and the forest coverage rate. The data are publicly available.
Data analysis. In our study, the Random Forest Algorithm was applied for data analysis. Its standard procedure is reproduced as follows 110:
1. Draw n_tree bootstrap samples from the original data.
2. For each of the bootstrap samples, grow an unpruned classification or regression tree, with the following modification: at each node, rather than choosing the best split among all predictors, randomly sample m_try of the predictors and choose the best split from among those variables. (Bagging can be thought of as the special case of random forests obtained when m_try = p, the number of predictors.)
3. Predict new data by aggregating the predictions of the n_tree trees (i.e., majority votes for classification, average for regression).

An estimate of the error rate can be obtained, based on the training data, by the following:
1. At each bootstrap iteration, predict the data not in the bootstrap sample (what Breiman calls "out-of-bag", or OOB, data) using the tree grown with the bootstrap sample.
2. Aggregate the OOB predictions. (On average, each data point would be out-of-bag around 36% of the time, so aggregate these predictions.) Calculate the error rate, and call it the OOB estimate of error rate.
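The procedure above can be sketched in a few dozen lines of plain Python for the regression case. This is a teaching sketch under stated assumptions, not the implementation used in the study: the tree representation (nested tuples), the squared-error split criterion, the `min_size` stopping rule and all function names are illustrative choices, while `n_tree` and `m_try` follow the notation of the text.

```python
import random


def best_split(X, y, features):
    """Best (lowest squared-error) split among the sampled features, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            sse = 0.0
            for idx in (left, right):
                m = sum(y[i] for i in idx) / len(idx)
                sse += sum((y[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    return best


def grow_tree(X, y, m_try, rng, min_size=2):
    """Grow an unpruned regression tree, sampling m_try predictors at each node."""
    mean_y = sum(y) / len(y)
    if len(y) < min_size or len(set(y)) == 1:
        return mean_y  # leaf: predict the mean of the node
    features = rng.sample(range(len(X[0])), m_try)
    split = best_split(X, y, features)
    if split is None:  # sampled predictors are constant in this node
        return mean_y
    _, f, t, left, right = split
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], m_try, rng, min_size),
            grow_tree([X[i] for i in right], [y[i] for i in right], m_try, rng, min_size))


def predict_tree(node, row):
    """Follow the splits down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_tree=50, m_try=1, seed=0):
    """Return the grown trees and the out-of-bag (OOB) mean squared error."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_preds = [], [[] for _ in range(n)]
    for _ in range(n_tree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = grow_tree([X[i] for i in idx], [y[i] for i in idx], m_try, rng)
        trees.append(tree)
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # predict only the points this tree never saw
                oob_preds[i].append(predict_tree(tree, X[i]))
    scored = [(y[i], sum(p) / len(p)) for i, p in enumerate(oob_preds) if p]
    oob_mse = (sum((o - p) ** 2 for o, p in scored) / len(scored)
               if scored else float("nan"))
    return trees, oob_mse


def predict_forest(trees, row):
    """Regression forests aggregate by averaging the tree predictions."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)
```

The "around 36%" figure in the text follows from the bootstrap itself: a given point is missed by one draw with probability 1 - 1/n, so it is absent from a sample of n draws with probability (1 - 1/n)^n, which approaches e^(-1), about 0.368, as n grows. In practice a maintained library implementation would normally be preferred over a hand-written one.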