Etiological, epidemiological, and clinical features of acute diarrhea in China

Nationwide prospective surveillance of all-age patients with acute diarrhea was conducted in China between 2009 and 2018. Here we report the etiological, epidemiological, and clinical features of the 152,792 eligible patients enrolled in this analysis. Rotavirus A and norovirus are the two leading viral pathogens detected in the patients, followed by adenovirus and astrovirus. Diarrheagenic Escherichia coli and nontyphoidal Salmonella are the two leading bacterial pathogens, followed by Shigella and Vibrio parahaemolyticus. Patients aged <5 years had a higher overall positive rate of viral pathogens, while bacterial pathogens were more common in patients aged 18‒45 years. A joinpoint analysis revealed the age-specific positivity rate and how this varied for individual pathogens. Our findings fill crucial gaps in understanding how the distributions of enteropathogens change across China in patients with diarrhea. This allows enhanced identification of the predominant diarrheal pathogen candidates for diagnosis in clinical practice and more targeted application of prevention and control measures.

The manuscript presents findings from surveillance efforts across 217 hospitals to understand the characteristics of clinically attended diarrhoeal disease in China over 10 years. To my knowledge, analyses of diarrhoeal disease surveillance at this geographical and time scale is unprecedented. Diarrhoeal disease remains a global health concern and this work provides a valuable, thorough description of the enteric pathogens and some of the environmental factors that might influence the diarrhoeal disease burden in China.
The analysis focuses on the proportion of samples testing positive for specific enteric pathogens. It would be interesting to understand how this proportion relates to trends in diarrhoeal disease incidence (i.e. absolute number of cases captured by the surveillance system over time and by region).
The discussion could better describe how findings from this work align or contrast with previous, smaller-scale studies seeking to identify enteric pathogens responsible for the diarrhoeal disease burden in other contexts (e.g. GEMS study; Lambisia et al., 2020; …) and whether some of these findings, especially regarding the influence of environmental factors, might be applicable outside China. Clearer recommendations for future research, and for diarrhoeal disease surveillance, could also be formulated.
The manuscript would benefit from grammar and spelling checks. I have noted in comments likely confusions between "tested" and "detected" that alter the meaning of the text and tables.
Other minor comments are noted below.
Lines 112-114: it is unclear why the denominator here is so small (3,330); is that a yearly average?
Lines 114-115: the finding of a significantly longer time between onset and hospital admission for patients infected by a known enteropathogen is not commented on in the Discussion. Do you have any hypothesis regarding why such a difference was observed?
Lines 118-119, 151-152, 178-179: the likelihood of detecting at least one pathogen depends on the number of pathogens tested for. How was this taken into account given that not all samples appear to have undergone the same tests?
Lines 152, 159: abbreviations should be defined upon first occurrence for clarity.
Line 189: how were viral, bacterial and parasitic "co-infection rates" defined? Infection with two or more viruses, bacteria, parasites?
Line 265: suggest to rephrase - this could be attributed to …
Lines 289-290: 20% of all child deaths over which time period?
Line 297: if data available on water supply and sanitation services coverage over the study period, it would be an interesting aspect to consider in the analyses, in addition to the rural/urban categories.
Lines 297-300: better understanding AMR patterns (not only among bacteria) is an important avenue for future research.
Lines 309-313: it is unclear to me what the authors are suggesting here, concretely.
Lines 314-319: do the authors feel that their dataset is sufficient to propose a diagnostic tree to distinguish between bacterial, viral and parasitic infections among Chinese patients with acute diarrhoea? Otherwise, would using findings from their study to refine decision criteria for the selection of laboratory analyses to be performed on clinical samples be a more realistic recommendation?
Line 349: detected or tested?
A summary of the decision criteria used to decide which pathogens to test for in a given sample and whether they might have led to bias in the detection of certain pathogens would be helpful. Reporting the frequency of testing for each pathogen could also address this point.
Supplementary Materials and Methods:
- Ref. 4: the URL doesn't seem to be accessible.
- How was the rural/urban status defined for a given patient address?
Sup. Fig 3:
- Related to my previous comment, were samples tested for parasites also tested for viruses and bacteria in some cases? If not, why?
- The figure appears in contradiction with the statement on page 2 of Supplementary Methods and Materials: "Parasites are not included in the study of pathogen co-infection patterns…"
- Bacterium (singular) and bacteria (plural) should be inverted on this figure.
Sup. Table 1:
- Headings are unclear: what does "detect" mean here? Should it be replaced with "tested", as in Fig 3?
Suggest to include information on rural/urban categories in Supplementary Tables 1, 2, 3 if available.
Sup. Table 4: it seems like the line headings like "All viruses detected" should be replaced with "All viruses tested" (same for viruses, parasites).

Reviewer #1 (Remarks to the Author):
The study "Etiological, epidemiological, and clinical features of acute diarrhea in China, 2009-2018: an active sentinel surveillance study" is a long-term, comprehensive analysis of the pathogens that cause gastroenteritis in all age groups of the population covering widespread regions in China.
Between 2009 and 2018, a national sentinel surveillance programme enrolled 157 883 patients at 217 sentinel hospitals in 31 provinces of China. A total of 152 792 patients met the inclusion criteria. The participating laboratories screened for seven viruses, 13 bacteria and three parasites. A vast amount of data is presented in the paper, evidenced by the 17 supplementary tables and figures that accompany the paper. The authors detected rotavirus as the predominant pathogen overall in the study (20.4%) followed by norovirus (12.47%), diarrheagenic E. coli (6.71%) and nontyphoidal Salmonella (4.41%), adenovirus (3.33%), with the various other pathogens below 3%. Children under 5 years of age tested positive for viral pathogens more frequently, whereas bacterial pathogens were detected more commonly in adults (18-45 years). Interesting joinpoint regression analyses indicated that the detection rate of rotavirus, norovirus and adenovirus peaked at 2-3 years of age and that norovirus displayed a second detection peak at 16 years of age. Rotavirus was the leading pathogen in children and norovirus the leading pathogen in adults. Compared to all other pathogens, norovirus detection rates remained high across the adult age groups. Overall, rotavirus A G9P[8] dominated, followed by G3P[8] and G1P[8]. Norovirus GII was predominant; no strain genotype information was provided for norovirus. China was subdivided into seven ecological regions and seasonal patterns were investigated at the sentinel city level with various meteorological and four sociological factors. Virus infections exhibited winter/spring seasonality and bacterial infections summer/autumn seasonality. The seasonal patterns were less pronounced in the subtropical and tropical regions. The proportion of children in a sentinel city affected virus infections, whereas population density significantly affected bacterial infections.
This is a very informative study, the results are presented in a clear and comprehensive manner. Statistical analysis is well described and appropriate. The methodology is described adequately. The paper is well written. There are however some suggested corrections and a few clarifications are required.
[Response] We appreciate the reviewer's summary, positive tone and helpful comments that we believe have helped us better communicate our findings.
Major comments 1) The total number of included participants was 152 792. All of these patients were not tested for each of the 23 gastroenteritis pathogens. This is understandable due to the size of the study and the number of sites involved. However, it should be stated clearly at the start of the results section to avoid confusion. I suggest that Supplementary Figure 3 be moved to the manuscript to provide a clear overview of the number specimens that were tested for each type of pathogen. The authors might expand on the figure to include the numbers tested for each pathogen. The authors should also move the "Results" section in Supplementary data on page 3/24 to the Results section of the main manuscript, to avoid confusion.
[Response] We appreciate the reviewer's valuable comments. As suggested, we have stated the number of samples tested for each pathogen clearly at the start of the results section (Page 6, Lines 109-113). We have moved "Supplementary Fig. 3" to the main text as "Fig. 1" and expanded the figure to include the numbers tested for each pathogen (Page 30). We have moved the "Supplementary Results" section to the main text "Results" in the revised manuscript (Pages 6-8, Lines 125-161).
2) Another concern is that the authors stated a number of 28,811/85,731 patients who had at least one viral pathogen detected. Since all 85,731 patients were not tested for all the viruses, I do not think this is a very meaningful number, since it could be much higher if all 85,731 specimens were tested. I think it would be more informative if the authors present the numbers of the group of specimens that were tested for all viruses or bacteria, before they present the pathogen specific data.
[Response] We appreciate the reviewer's comments. We agree with the reviewer that the sample groups tested for all viruses or bacteria provided more informative data. As suggested, we have moved the "Supplementary Results" section on Page 3 of the Supplementary Information to the Results section of the main text, ahead of the pathogen specific data in the revised manuscript (Pages 6-8, Lines 125-161).
3) Do the authors have access to data on the length of hospitalization and mortality associated with each admission? In the "Statistical analysis" section on page 8 they state that "outcomes were collected by reviewing medical records". But no mention is made of the number of deaths recorded in the study population. The authors did include data on the time that elapsed between onset of symptoms and hospitalization, but the length of hospital stay and eventual outcome are also of interest. This could be added to Supplementary Table 9, if available.
[Response] We appreciate the reviewer's comments. According to our data, 89,441 patients had a definite outcome recorded, among whom 40 patients (0.04%) died. We have described the outcome data in the revised manuscript (Page 6, Lines 117-118) and supplemented the outcome data in Supplementary Table 1 (Page 7 in the Supplementary Information). We agree that the length of hospitalization would be a useful indicator of disease outcome, but unfortunately, it was not available in this study.
4) The authors should state the proportion of cases from urban and from rural settings in the section that describes the study population in the results.
[Response] We appreciate the reviewer's comments. We have supplemented data on the proportion of cases from urban and from rural settings to the main text (Page 6, Lines 116-117) and Supplementary
Page 7, line 152 and 153: DEC and NTS are used here without definition. The abbreviations are defined in the Methods section, but since that is only at the end of the manuscript, they should rather be defined here at first use.
[Response] We appreciate the reviewer's corrections. We have defined all abbreviations at the first time they appear.
Page 10, Line 239: Please rephrase the first sentence in line 239 to improve clarity.
[Response] We appreciate the reviewer's comments. We have rephrased the first sentence into "The seasonal patterns of enteropathogen also differed regarding the geographical locations." (Page 13, Lines 303-304).
Page 18: Please provide the primer/probe sequences that were used for the detection PCRs/RT-PCRs in the supplementary section.
[Response] We appreciate the reviewer's comments. As suggested, we have provided the primer/probe sequences that were used for the detection PCRs/RT-PCRs in Supplementary Table 13 (Pages 28-33 in the Supplementary Information).

Supplementary appendix
Page 3: At the start of the results paragraphs use "In total" instead of "Totally"
[Response] Revised as suggested (Page 6, Line 126; Page 7, Line 138; Page 7, Line 156).
[Response] We appreciate the reviewer's comments. We have rephrased the sentence into "Overall, DEC and NTS were the leading bacterial pathogens that were identified from pediatric patients <18 years old." (Page 7, Lines 145-147).

Supplementary Figure 2:
A -Norovirus is not listed as a virus tested for in the block PCR or RT-PCR…..etc…. B -Carry to the microbiology laboratory….change to…..Transported to the microbiology laboratory….
[Response] We appreciate the reviewer's comments. As suggested, we have listed norovirus as a virus tested for in the block PCR or RT-PCR and modified "Carry to the microbiology laboratory" into "Transported to the microbiology laboratory" in Supplementary Fig. 7 (Page 34 in the Supplementary Information).
Supplementary Table 2: Please provide the time unit for delay. I assume it is days from onset to admission, but it should be stated for clarity.
[Response] We appreciate the reviewer's comments. Yes, the time unit for delay is days, which has been added to Supplementary Tables 2, 3 and 10 (Page 9; Page 10; Page 21 in the Supplementary Information).
Reviewer #2 (Remarks to the Author): The manuscript presents findings from surveillance efforts across 217 hospitals to understand the characteristics of clinically attended diarrhoeal disease in China over 10 years. To my knowledge, analyses of diarrhoeal disease surveillance at this geographical and time scale is unprecedented. Diarrhoeal disease remains a global health concern and this work provides a valuable, thorough description of the enteric pathogens and some of the environmental factors that might influence the diarrhoeal disease burden in China.
The analysis focuses on the proportion of samples testing positive for specific enteric pathogens. It would be interesting to understand how this proportion relates to trends in diarrhoeal disease incidence (i.e. absolute number of cases captured by the surveillance system over time and by region).
[Response] We appreciate the reviewer's comments. As suggested, we have displayed the absolute number of tested cases and the proportion positive for at least one pathogen for each study year in Supplementary Fig. 1 (Page 8 in the Supplementary Information). However, we found no relation between them.
We have also displayed the absolute number of tested cases and the proportion positive for at least one pathogen for each of the study regions in the figure as follows. Again, we found no relation between them.
The discussion could better describe how findings from this work align or contrast with previous, smaller-scale studies seeking to identify enteric pathogens responsible for the diarrhoeal disease burden in other contexts (e.g. GEMS study; Lambisia et al., 2020; …) and whether some of these findings, especially regarding the influence of environmental factors, might be applicable outside China. Clearer recommendations for future research, and for diarrhoeal disease surveillance, could also be formulated.
[Response] We appreciate the reviewer's helpful comments. We have compared the current findings with previous similar studies in the discussion: "In this study, rotavirus was found to be the predominant pathogen overall, also the leading viral pathogen among children, while norovirus was the leading pathogen in adults. This is consistent with the previous findings 10" (Page 14, Lines 327-330), "We found a slow decrease in the detection of rotavirus across the study years, probably owing to the rotavirus vaccine interventions that had been advocated after the year of 2000 14. This was also in line with previous results showing a similar decreased rate of rotavirus during post-rotavirus vaccine introduction in Coastal Kenya 15." (Page 15, Lines 346-350).
We have also discussed the potential application of the current findings outside China: "In comparison with previous similar studies that were performed on limited enteropathogens, within narrow geographic regions, or with small-scale population 30-33 the current finding, especially regarding the influence of environmental factors, might have wider application than China." (Pages 17-18, Lines 420-423 in the revised manuscript)
Pediatr. Int. 57, 590-596 (2015).
33. Sumi, A. et al. Effect of temperature, relative humidity and rainfall on rotavirus infections in Kolkata, India. Epidemiol. Infect. 141, 1652-1661 (2013).
Recommendations for future research were added to the last paragraph of the discussion: "Continuous longitudinal surveillance is encouraged in order to maintain the insights to date and form a baseline for future epidemiological studies on diarrheal pathogens in China." (Page 19, Lines 452-454).
The manuscript would benefit from grammar and spelling checks. I have noted in comments likely confusions between "tested" and "detected" that alter the meaning of the text and tables.
[Response] We appreciate the reviewer's comments. We have used "tested" for samples that underwent a test assay and "detected" for positive results, and have modified the revised manuscript accordingly.
Lines 112-114: it is unclear why the denominator here is so small (3,330); is that a yearly average?
[Response] We appreciate the reviewer's comments. These 3,330 patients are those who were tested for all 23 gastroenteritis pathogens, including seven viral pathogens, thirteen bacterial pathogens and three parasites. Although 25,239 samples were tested for both viral and bacterial pathogens, this number fell to 3,330 because only a small number of samples (N=11,167) were tested for parasites. We have discussed this limitation in the revised manuscript (Page 18, Lines 429-434).
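The shrinking denominator described in this response is a set intersection: a sample counts toward the full-panel group only if it was tested in every pathogen group. A minimal sketch of that logic follows; the sample IDs and the function name `full_panel_denominator` are hypothetical and stand in for the surveillance records, which are not reproduced here.

```python
# Illustrative only: the sample IDs below are made up, not study data.
def full_panel_denominator(tested_by_group):
    """Return the IDs of samples that were tested in every pathogen group."""
    groups = list(tested_by_group.values())
    return set.intersection(*groups) if groups else set()


tested = {
    "viral": {1, 2, 3, 4, 5, 6},
    "bacterial": {2, 3, 4, 5, 7},
    "parasitic": {4, 5, 8},  # the smallest group caps the overlap
}

viral_and_bacterial = tested["viral"] & tested["bacterial"]
all_three = full_panel_denominator(tested)

print(len(viral_and_bacterial), len(all_three))
```

The full-panel group can never be larger than the least-tested group, which is why a small parasite-tested subset limits the number of samples with complete results.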
Lines 114-115: the finding of a significantly longer time between onset and hospital admission for patients infected by a known enteropathogens is not commented in Discussion. Do you have any hypothesis regarding why such a difference was observed?
[Response] We appreciate the reviewer's valuable comments. We cannot confirm the reason underlying this difference, but we did find a longer delay between onset and hospital admission among pediatric patients than among adults (median delay 3 days in Supplementary Table 2), while the pediatric patients also had a significantly higher positive rate than the other age groups (60.06%, 636/1059, in Supplementary Table 2). This might produce an indirect association between a longer time from onset to hospital admission and infection with a known enteropathogen. We have commented on this finding in the revised manuscript (Page 18, Lines 424-428).
Lines 118-119, 151-152, 178-179: the likelihood of detecting at least one pathogen depends on the number of pathogens tested for. How was this taken into account given that not all samples appear to have undergone the same tests?
[Response] We appreciate the reviewer's valuable comments. We agree with the reviewer that the number of pathogens tested determines the positive rate to a large extent. Reporting the rate of detecting at least one pathogen might be misleading in this situation, so we have removed this rate for virus, bacterium, or parasite in the revised manuscript.
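The reviewer's point can be illustrated numerically. The sketch below assumes, purely for illustration, that detections of different pathogens are independent, so the chance of at least one positive is 1 minus the product of the per-pathogen negative probabilities. The rates are the overall detection rates quoted in Reviewer #1's summary; the independence assumption and the function name `prob_at_least_one` are not from the manuscript.

```python
from math import prod


def prob_at_least_one(positivity_rates):
    """P(at least one positive), assuming independent per-pathogen detection."""
    return 1.0 - prod(1.0 - p for p in positivity_rates)


# Overall detection rates as quoted in the reviewer's summary.
rates = {
    "rotavirus A": 0.204,
    "norovirus": 0.1247,
    "DEC": 0.0671,
    "NTS": 0.0441,
    "adenovirus": 0.0333,
}

# Grow the test panel one pathogen at a time and watch the rate rise.
panel = []
for name, rate in rates.items():
    panel.append(rate)
    print(f"{len(panel)} tested (added {name}): {prob_at_least_one(panel):.3f}")
```

Under this assumption the "at least one pathogen" rate can only increase as the panel grows, so samples tested with different panels are not directly comparable on that measure.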
[Response] Done as suggested.
[Response] We appreciate the reviewer's corrections and suggestions. We have rephrased the sentence and split it into two sentences. (Page 9,.
Lines 152, 159: abbreviations should be defined upon first occurrence for clarity.
[Response] We appreciate the reviewer's corrections and suggestions. We have defined all abbreviations the first time they appear.
Line 189: how were viral, bacterial and parasitic "co-infection rates" defined? Infection with two or more viruses, bacteria, parasites?
[Response] We appreciate the reviewer's corrections and suggestions. For viruses, co-infection rates mean co-infection rates among viruses. For bacteria, co-infection rates mean co-infection among bacteria. For parasites, co-infection rates mean co-infection among parasites. We have modified this expression in the results section of the revised manuscript (Page 10, Lines 239-240) and the Fig. 2 legends (Page 31).
Line 265: suggest to rephrase - this could be attributed to …
[Response] We appreciate the reviewer's comments. We have rephrased the sentence into "This could be in part a function of the continuous mutation and recombination abilities of norovirus, generating novel strains with high potential of causing outbreak events and sporadic cases" (Page 14, Lines 332-334).
Line 297: if data available on water supply and sanitation services coverage over the study period, it would be an interesting aspect to consider in the analyses, in addition to the rural/urban categories.
[Response] We appreciate the reviewer's valuable comments. Relating water supply and sanitation services coverage to diarrhea incidence and enteropathogen detection is an excellent idea. Unfortunately, these data are inaccessible at this moment; this point should be explored in future investigations.
Lines 297-300: better understanding AMR patterns (not only among bacteria) is an important avenue for future research.
[Response] We agree with the reviewer that AMR is an important avenue for future research. We have added this comment to the revised manuscript "Better understanding Anti-Microbial Resistance patterns of enteropathogens is an important avenue for future research." (Page 16, Lines 367-369).
Lines 309-313: it is unclear to me what the authors are suggesting here, concretely.
[Response] We have revised the sentence as "The current knowledge on whether the circulation of one pathogen enhances or diminishes the infection incidence of another might lend to an enhanced estimation of the diagnosis or treatment choice. The elaboration of such interactions may have economic implications through public health planning, and the clinical management of diarrhea disease." (Page 16, Lines 378-383).
Lines 314-319: do the authors feel that their dataset is sufficient to propose a diagnostic tree to distinguish between bacterial, viral and parasitic infections among Chinese patients with acute diarrhoea? Otherwise, would using findings from their study to refine decision criteria for the selection of laboratory analyses to be performed on clinical samples be a more realistic recommendation?
[Response] We appreciate the reviewer's important comments. Because the number of samples tested for parasitic infections was too small to perform a differential diagnosis between bacterial, viral and parasitic infection, we only used data from samples tested for both bacterial and viral pathogens (25,239 cases) to distinguish between bacterial and viral infections. The results have been added to the revised manuscript (Pages 11-12, Lines 260-276): "To attain a differential diagnosis between bacterial and viral infections with acute diarrhea, a binary eXtreme Gradient Boosting model was applied on 5,816 patients with viral single infections and 2,942 patients with bacterial single infections. In total, 11 valid variables regarding demographical and clinical symptoms/syndromes were entered into model, with season, age, mucous stool, vomiting, respiratory symptoms, mushy stool, bloody stool, watery stool, sex, fever, and neurologic symptoms observed. Among them, season was the most important predictor with a relative contribution of 37.46%, followed by age (16.73%). The contributions of mucous stool, vomiting and respiratory symptoms were slightly larger than 5%, and the contribution for each of the others was less than 5%. The area under curve of the model was 0.79 (95% confidence interval: 0.77-0.81), accuracy=74.14%. (Supplementary Fig. 4A and 4B). The mean accuracy of cross validation was 74.52% and the variance was 4.17%.
Based on this multivariate logistic regression model, children patients of acute diarrhea with vomiting and respiratory symptoms occurred in cold season were shown to be likely caused by virus infections, while adult with mucous stool or bloody stool occurred in warm season were probably infected with bacteria (Supplementary Table 11)." We also commented on this finding in the discussion: "These findings may help refine clinician decision criteria for the selection of laboratory analyses to be performed on clinical samples. The degree to which the cost and benefits of such an approach might reduce hospital cost would be a promising area of future research." (Page 16, Lines 387-390).
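The two performance figures quoted in this response, area under the curve and accuracy, can be computed from any binary classifier's predicted scores. The sketch below is a generic illustration with made-up labels and scores (1 = viral, 0 = bacterial); it is not the authors' model or data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of viral infection.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(f"accuracy={accuracy(labels, scores):.3f}, AUC={auc(labels, scores):.3f}")
```

The AUC is threshold-free and measures how well the scores rank viral above bacterial cases, whereas accuracy depends on the chosen cut-off and on the class balance (here, roughly two viral cases per bacterial case in the quoted data).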
Line 349: detected or tested?
[Response] We appreciate the reviewer's comments. It should be "tested", which has been modified in the revised manuscript (Page 18, Line 430).
A summary of the decision criteria used to decide which pathogens to test for in a given sample and whether they might have led to bias in the detection of certain pathogens would be helpful. Reporting the frequency of testing for each pathogen could also address this point.
[Response] We appreciate the reviewer's valuable comments. There were no decision criteria used to decide which pathogens to test for in a given sample. Instead, the participating hospitals or laboratories had a predesigned priority list of pathogens according to their different test capacity and would test all samples for that predesigned list. In addition, patients who received different testing panels were highly comparable in their demographics, suggesting only minor bias from this test strategy. Still, this is a limitation of our study, which has been discussed in the revised manuscript (Page 18, Lines 429-434). Also as suggested by the reviewer, we have described these data in the main text (Page 8, Lines 163-168 for viral pathogens; Page 9, Lines 197-205 for bacterial pathogens; Page 10, Lines 228-237 for parasitic pathogens), and provided the frequency of testing for each pathogen in Supplementary
Sup. Fig 3: - related to my previous comment, were samples tested for parasites also tested for viruses and bacteria in some cases? If not, why?
[Response] We appreciate the reviewer's comments. Yes, some of the samples that were tested for parasites were also tested for viruses and bacteria. This has led to 3,330 patients who were tested for each of the 23 gastroenteritis pathogens. This is a limitation of our study, which has been discussed in the revised manuscript (Page 18, Lines 429-434).
- The figure appears in contradiction with the statement on page 2 of Supplementary Methods and Materials: "Parasites are not included in the study of pathogen co-infection patterns…"
[Response] We appreciate the reviewer's important comments. Co-infection patterns were only explored within parasites, not for parasites-bacteria or parasites-viruses. To clarify this confusion, we have revised this sentence as "Parasites are not studied for their interaction with virus or bacteria due to only small number of samples were tested for parasites." (Page 3 in the Supplementary Information).
-Bacterium (singular) and bacteria (plural) should be inverted on this figure.
[Response] Done as suggested.
Sup. Table 1: - headings are unclear - what does "detect" mean here? Should it be replaced with "tested", as in Fig 3?
[Response] We appreciate the reviewer's corrections and suggestions. We have revised "detected" into "tested".
Suggest to include information on rural/urban categories in Supplementary Tables 1, 2, 3 if available.