Site specific incidence rate of genomic subtypes of enteropathogenic Escherichia coli and association with enteric inflammation and child growth

Little is known about the association of the genomic subtypes of enteropathogenic Escherichia coli (EPEC) with environmental enteric dysfunction (EED) and linear growth during childhood. Genomic subtypes of EPEC were detected by TaqMan Array Cards in stool samples collected from 1705 children enrolled in the MAL-ED birth cohort. We estimated site-specific incidence rates using Poisson regression models, identified risk factors, and estimated the association of the genomic subtypes of EPEC with the composite EED score and linear growth at 24 months of age. Overall, the highest incidence rate (39%) was found for aEPEC infection, and it was greatest in Tanzania (54%). Exclusive breastfeeding and having an improved sanitation facility were found to be protective factors against EPEC infection. In the multivariate models, after adjusting for potential covariates, aEPEC showed a strong positive association with the EED score overall, and tEPEC showed a positive association with poor linear growth at 24 months of age. Our analyses may lay the groundwork for prospective epidemiologic investigations and for vaccine development aimed at reducing the burden of EPEC infections and combating childhood malnutrition.

Diarrhea is responsible for the death of approximately 900,000 children per year worldwide, with the majority of deaths occurring in developing countries 1 . Host factors such as immunocompromise, enteropathy, and dysbiosis of the gut microbiome, which commonly contribute to childhood malnutrition, may in turn predispose malnourished young children to greater severity of diarrheal and other enteric diseases and their long-term sequelae 2,3 . Diarrheal diseases in combination with malnutrition have also been reported to hinder the cognitive development of millions of young children worldwide, with persistent infections occurring during the formative early years when critical brain synaptogenesis occurs 4 .
Findings from the Global Enteric Multicenter Study (GEMS) identify enteropathogenic Escherichia coli (EPEC) as the leading contributor to diarrheal mortality among children aged less than 12 months in low- and middle-income countries (LMICs) 5 . EPEC does not produce heat-labile or heat-stable enterotoxins. It is classified on the basis of a lesion known as the attaching-and-effacing (A/E) phenotype and the EPEC adherence factor (EAF) plasmid; the A/E phenotype depends on the eae gene, which is located on the chromosomal pathogenicity island called the locus of enterocyte effacement (LEE) and encodes the intimin adhesin 6 . The LEE pathogenicity island comprises 41 open reading frames (ORFs) and codes for a distinct type III secretion system that is involved in the pathogenesis of these organisms 7 .
On the basis of the presence of the EAF plasmid encoding bundle-forming pili (BFP), EPEC can be subgrouped into typical EPEC (tEPEC) and atypical EPEC (aEPEC) 8 . Typical EPEC is usually linked to gastroenteritis, including severe diarrhea among infants, while atypical EPEC is associated with a wide array of clinical manifestations, ranging from asymptomatic colonization to prolonged diarrhea, depending on the setting 8-10 . Findings from studies carried out across 13 […] 78% of all EPEC-associated diarrheal cases among children aged less than 5 years 11 . Additionally, EPEC may lead to severe nutrient malabsorption, resulting in nutritional consequences and eventual persistence of diarrhea 12 . A cross-sectional case-control study conducted among Brazilian children aged between 2 and 36 months reported a stronger association of tEPEC infections with clinical severity of diarrhea and undernutrition compared to aEPEC infections 13 . A recent study conducted in Bangladesh reported higher detection rates of both tEPEC and aEPEC among severely malnourished infants with diarrhea aged less than 6 months than among their age-matched well-nourished counterparts, although tEPEC was detected more often than aEPEC in both groups 14 . Moreover, a reanalysis of the GEMS data reported that tEPEC was significantly associated with a higher incidence of moderate-to-severe diarrhea (MSD) among children aged 6-11 months suffering from acute malnutrition 15 .
However, a thorough understanding of the epidemiology of infection by the EPEC subtypes is yet to be established, owing to the lack of proper discrimination in many studies 16 . The main goal of our study was to estimate the site-specific incidence rates of the two genomic subtypes of EPEC (tEPEC and aEPEC) and their possible associations with the composite EED (environmental enteric dysfunction) scores and the consequent growth failure among children at 24 months of age.

Results
General characteristics of the study population and incidence rates of virulence-related genes associated with EPEC. A total of 34,622 monthly stool samples were collected from 1715 participants who completed the follow-up to 24 months. All the stool samples collected over this time from all the participants at the different study sites were assessed for the presence of virulence-related genes associated with EPEC using TaqMan Array Cards (TAC). The general characteristics of the study children are presented in Table 1.
The incidence rates of the two genomic subtypes of EPEC (tEPEC and aEPEC) in the stool samples collected across all 8 study sites over the 24-month study period are shown in Fig. 1. The overall incidence rate of aEPEC was the highest (39%), and the incidence of virulence-related genes associated with aEPEC was highest in Tanzania (54%). The overall incidence of the tEPEC virulence gene was the lowest across all the sites (3%). The incidence of aEPEC infection was higher than that of tEPEC infection across all the study sites (Supplementary Table 1).

Factors associated with the two genomic subtypes of EPEC. Factors associated with infections by the two genomic subtypes of EPEC across all study sites were identified using Poisson regression (Table 2). The incidence rate of EPEC infection in female children was comparable with that in male children. Additionally, exclusive breastfeeding was significantly associated with aEPEC infection [IRR: 0.99 (95% CI: 0.99-1.00); p < 0.001], and improved sanitation was significantly associated with tEPEC infection [IRR: 0.90 (95% CI: 0.81-0.99); p = 0.026]. The incidence rate of aEPEC was higher for the sites of Brazil, Nepal, South Africa, and Peru, while that of the aEPEC was the greatest in Peru. The incidence rate of tEPEC infection was the lowest in Brazil and that of aEPEC was the lowest in Pakistan. The incidence rate of tEPEC infection was higher in India and Peru (Table 2).

Association between the different virulence genes of EPEC and child growth. Infections with tEPEC were associated with poor linear growth (difference in 24-month LAZ: length-for-age z score), with a stronger association observed overall across the study sites (Table 3). In Nepal, infection with tEPEC [−2.44 difference in 24-month LAZ (95% CI: −4.36, −0.52); p = 0.013] had a statistically significant negative association with LAZ. In India, aEPEC [−0.96 difference in 24-month LAZ (95% CI: −1.87, −0.05); p = 0.040] also had a statistically significant negative association with LAZ. In the other countries (except Bangladesh), infections with tEPEC were negatively associated with LAZ, and in three other countries (Tanzania, Brazil, South Africa), infections with aEPEC were negatively associated with LAZ, but these associations were not statistically significant.

Association between genomic subtypes of EPEC and enteric inflammation.
After adjusting for the potential covariates like age, sex, WAMI index (water/sanitation, assets, maternal education, and income); enrollment length-for-age z score; maternal BMI; the number of children in the household, presence of poultry/ cattle in the household, seasonality, serum zinc level, AGP (alpha-1-acid glycoprotein), presence of co-pathogens (Campylobacter, LT-ETEC, ST-ETEC, Shigella/EIEC, and Giardia), site for the overall estimate and age as the time variable in GEE (generalized estimating equations) model for the genomic subtypes of EPEC, aEPEC was also clearly and consistently associated with increased EED score with the overall [Coef. 0. 15

Discussion
To our knowledge, this is the first study investigating the associations of the two prominent genomic subtypes of EPEC (tEPEC and aEPEC) with enteric inflammation and linear growth in children from birth until 2 years of age. Several EPEC virulence genes have been used in case-control and epidemiological studies in recent years, with eae and bfpA being among the most common for detection of the prominent subtypes of EPEC 17 .
We found that exclusive breastfeeding and having an improved sanitation facility had a very small protective effect against both tEPEC and aEPEC. A study conducted among Tanzanian children described similar findings: most of the children younger than 6 months whose stool specimens were negative for EPEC were exclusively breastfed 18 . […] Brazil over 46 women who had given birth to normal-term babies found human colostrum IgA antibodies reacting to enteropathogenic E. coli antigens, and their persistence in the gastrointestinal tract was shown by strong reactivity to the 94-kDa band in the Western blot analysis. These data confirm the role of colostrum antibodies in protecting the neonate against infections caused by EPEC 19 . Hands, soil, and water could all be key sources of exposure to EPEC. The most important reservoirs of pathogens are the ones that children come in contact with most often. Overall, the sanitation intervention seems to have a limited effect on the presence of EPEC on hands and in soil and water 20 . Such assessments of the factors that may be associated with EPEC infections may in turn help in devising new treatment strategies against EPEC infections.
We documented high incidence rates of both tEPEC and aEPEC in Tanzania. The Global Enteric Multicenter Study (GEMS) determined that diarrheal deaths among children can largely be attributed to only a few infectious agents 5 . In particular, diarrhea caused by tEPEC is associated with a 2.6-fold higher hazard of death, the largest reported in the GEMS 5 . In our study, the lowest incidence rate of tEPEC was found in Brazil. Other studies reported aEPEC to have a significant association with diarrhea in several countries, including Brazil 21-23 , where aEPEC was more prevalent than tEPEC. These observations indicate that aEPEC infections have significant […] In diarrheal animal models, animals with positive culture or qPCR results for atypical EPEC (aEPEC) had significantly higher small intestinal and colonic lesion scores than healthy animals. The increase in colonic lesion scores in animals with diarrhea and aEPEC infections was due to increased amounts of inflammatory infiltrate in the lamina propria 24 . In our study, the presence of aEPEC was more strongly associated with EED scores, implying higher intestinal inflammation. The relevance of elevated intestinal inflammation associated with this particular genomic subtype of EPEC is not yet clear, and no prior evidence indicates that a particular genomic subgroup of EPEC is associated with elevated intestinal inflammatory biomarkers and a subsequently increased EED score. Hence, our study is a first attempt to generate evidence-based knowledge of the contribution of the different genomic subgroups of EPEC to enteric inflammation and poor child growth in LMIC settings.
Our study findings also illustrate that the presence of tEPEC and aEPEC was negatively associated with childhood linear growth in Nepal and India, respectively. In Norwegian children, a significant association was observed with diarrhea lasting 14 days or more, a finding that may indicate a role for atypical EPEC in prolonged diarrheal episodes 25 , which might in turn cause chronic malnutrition. When the different genomic subgroups of EPEC adhere to epithelial cells in vitro or in vivo, they cause characteristic changes known as attaching-and-effacing (A/E) lesions. A decrease in the number and height of microvilli, blunting of the borders of enterocytes, loss of the glycocalyx, shortening of villi, and the presence of a mucus pseudomembrane coating the mucosal surface were the abnormalities observed in the majority of children 12 . These ultrastructural derangements may reflect a mechanism by which the different subgroups of EPEC, acquired through contaminated food or water, trigger diarrhea, which if persistent may result in chronic malnutrition in early life.
Although antibiotic therapy is not recommended for cases of mild acute diarrhea 26 , studies determining the antibiotic resistance profile of aEPEC are necessary, because cases of persistent diarrhea associated with aEPEC have been reported 25-27 . Hence, under such clinical manifestations, the administration of antimicrobial therapy may be useful 28 .
Although our findings are potentially promising, our analysis has several possible limitations. As this was an observational cohort study, the causality of the associations between infection with the various genomic subgroups of EPEC and both intestinal inflammation and linear growth cannot be proven, but it can be hypothesized on the basis of several factors, including the appropriate adjustment of the models for possible confounders, the strength and consistency of the associations, and biological plausibility. We were unable to establish a temporal relationship between infections and the outcomes, which would require structured longitudinal models. Moreover, our analysis lacked data on the host gut microbiome and host diet, which have been found to be linked with infection by a wide range of enteropathogens.
The present study concludes that exclusive breastfeeding and an improved sanitation facility are possible protective factors against infections with the different genomic subtypes of EPEC, infections that may in turn lead to compromised linear growth in childhood. The burden of both tEPEC and aEPEC was associated with increased enteric inflammation among children in the first 2 years of life.

[…] 30 . Written informed consent was obtained from the parents or legal guardians of every child. All 8 of the MAL-ED sites contributed their expertise, explained the challenges unique to their site, debated vigorously, compromised, and created 1 set of consensus standard operating procedures (SOPs), which was then implemented at the sites prior to study recruitment. All methods and procedures used in this study were performed in accordance with the relevant guidelines and regulations 31-34 .

Data collection. Anthropometric measurements were taken at monthly intervals up to the age of 24 months using standard scales (Seca GmbH & Co. KG, Hamburg, Germany). Length-for-age z score (LAZ), weight-for-age z score (WAZ), and weight-for-length z score (WLZ) were calculated using the 2006 WHO standards for children 35 . Details of illness and child feeding practices were collected during household visits, conducted twice weekly 32 . Additionally, household demographics, presence of siblings, maternal characteristics, and other data on the child's birth and anthropometry were obtained at enrollment 29 . Beginning at 6 months of age, socioeconomic data were collected every 6 months. The WAMI score (Water, sanitation, hygiene, Asset, Maternal education, and Income index, ranging from 0 to 1) is a socioeconomic status index. It includes access to improved water and sanitation, eight selected assets, maternal education, and household income as a representation of the socioeconomic status of the households 36 . A better socioeconomic status is indicated by a higher WAMI score 37 . Improved water and sanitation were defined following World Health Organization guidelines 38 . Treatment of drinking water was defined as filtering, boiling, or adding bleach 39 .

Table 3. Genomic subtypes of EPEC and burden on child growth at 24 months across each of the study sites (except Pakistan) from November 2009 to February 2012. Adjusted in the linear regression model for sex, birth weight, exclusive breastfeeding, WAMI index (water/sanitation, assets, maternal education, and income), enrollment length-for-age z score, maternal BMI, poultry and cattle in the house, mother having fewer than 3 living children, average serum zinc level, average AGP (alpha-1-acid glycoprotein) level, presence of co-pathogens (Campylobacter, LT-ETEC, ST-ETEC, Shigella/EIEC, and Giardia), and site for the overall estimate. Coef. coefficient, CI confidence interval.

Collection of stool and blood samples.
Non-diarrheal stool samples were collected monthly (at least 3 days before or after a diarrheal episode) from birth to age 2 years, and peripheral blood was collected at 7, 15, and 24 months of age 40 . Raw stool aliquots and blood samples were processed at all sites using harmonized protocols and stored in −80 °C freezers before subsequent laboratory analyses 31 .
In this study, plasma zinc was assessed as the measure of zinc status at the age of 7, 15, and 24 months. Plasma zinc concentration is a proxy marker and recommended for assessment of population zinc status, especially for children in low-income countries 41 . Plasma alpha-1-acid glycoprotein (AGP) level was considered as a biomarker for systemic inflammation and was also assessed at 7, 15, and 24 months 42 .

Assessment of enteropathogens by TaqMan Array Cards (TAC).
Total nucleic acid (both DNA and RNA) was extracted from the fecal samples using the QIAamp Fast DNA Stool Mini kit (Qiagen), following the manufacturer's guidelines. Two external controls, MS2 bacteriophage and phocine herpesvirus (PhHV), were added to the samples to confirm nucleic acid extraction and amplification efficiency 43 .
Enteropathogens were detected by quantitative polymerase chain reaction (qPCR) using a customized TaqMan Array Card (TAC), which comprises compartmentalized probe-based real-time PCR assays and can detect up to 29 pathogens in each sample 44 . A Ct (quantification cycle) value of 35 was set as the threshold for analysis, whereby a Ct > 35 was considered negative, as described elsewhere 43 . In the current study, we investigated the occurrence of putative virulence-related genes (VRG) of EPEC, namely bfpA for tEPEC and eae for aEPEC.
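The Ct cutoff rule described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the rule that a sample positive for eae but negative for bfpA is called aEPEC is an assumption based on conventional EPEC typing; the text itself states only that bfpA was used for tEPEC and eae for aEPEC.

```python
from typing import Optional

CT_CUTOFF = 35.0  # cutoff stated in the text: Ct > 35 is treated as negative


def is_detected(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """Return True when a target amplified with a Ct at or below the cutoff.

    A Ct of None represents a well with no amplification.
    """
    return ct is not None and ct <= cutoff


def epec_subtype(ct_eae: Optional[float], ct_bfpa: Optional[float]) -> str:
    """Assign a subtype label from the two virulence-gene Ct values.

    Assumption (not spelled out in the text): bfpA positive -> tEPEC;
    eae positive without bfpA -> aEPEC; otherwise negative.
    """
    if is_detected(ct_bfpa):
        return "tEPEC"
    if is_detected(ct_eae):
        return "aEPEC"
    return "negative"
```

For example, under these assumptions a sample with an eae Ct of 30 and no bfpA amplification would be labeled aEPEC, whereas an eae Ct of 36 would be treated as negative.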

Assessment of biomarkers of intestinal inflammation. Intestinal inflammation was evaluated by measuring the levels of the biomarkers alpha-1-antitrypsin (Biovendor, Chandler, NC), neopterin (GenWay Biotech, San Diego, CA), and myeloperoxidase (Alpco, Salem, NH) in the stool samples collected from the study participants at 3, 6, 9, 15, and 24 months of age by quantitative ELISA, following the manufacturers' guidelines 45 .

Statistical analysis. All statistical analyses were performed in STATA 15 (Stata Corporation, College Station, TX). Descriptive statistics such as proportion, mean and standard deviation (SD) for symmetric data, and median with inter-quartile range (IQR) for asymmetric quantitative variables were used to summarize the data. Chi-square and proportion tests were used to determine the association between two categorical variables, and the t-test was used to assess the mean difference between two groups for symmetric distributions. Incidence rates were calculated using Poisson regression, where the outcome variable was the number of infections with each genomic subtype of EPEC and the offset variable was the log of the number of follow-up visits. The factors associated with virulence-related genes of EPEC in the monthly stool samples were identified using Poisson regression models. In the final multiple Poisson regression model, the following variables were considered for inclusion using stepwise forward selection: child sex, birth weight, duration of exclusive breastfeeding in months, enrollment weight-for-age z score, length-for-age z score, maternal age in years, maternal education, mother having fewer than 3 living children, maternal BMI, routine treatment of drinking water, improved sanitation, household ownership of cattle/poultry, and fewer than 2 people living per room. We excluded children from the Pakistan site from the growth analysis, owing to bias noted in a subset of this cohort within the study period. Myeloperoxidase (MPO), neopterin (NEO), and alpha-1-antitrypsin (AAT) values were log-transformed before the analysis.
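To make the role of the offset concrete, the following Python sketch shows what a Poisson model with a log(follow-up visits) offset estimates in the two simplest cases. It is an illustration only, not the STATA code used in the study, and the function names and any counts passed to them are hypothetical. With no covariates, the fitted rate equals total infections divided by total visits; with a single binary covariate, the exponentiated coefficient equals the ratio of the two group rates (the IRR), with a Wald standard error of sqrt(1/a + 1/b) on the log scale.

```python
import math
from typing import Tuple


def incidence_rate(infections: int, visits: int) -> float:
    """Rate estimated by an intercept-only Poisson model with offset log(visits)."""
    return infections / visits


def incidence_rate_ratio(
    events_exposed: int,
    visits_exposed: int,
    events_unexposed: int,
    visits_unexposed: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (IRR, lower, upper) for one binary covariate.

    Matches the Poisson-regression estimate exp(beta) and its Wald 95% CI
    when the model contains only this covariate and a log(visits) offset.
    """
    irr = incidence_rate(events_exposed, visits_exposed) / incidence_rate(
        events_unexposed, visits_unexposed
    )
    se_log = math.sqrt(1 / events_exposed + 1 / events_unexposed)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper
```

The adjusted models described in the text include many covariates and would require a full regression fit; this sketch covers only the unadjusted case.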
At each time point, the composite EED score, ranging from 0 to 10, was calculated from the three fecal markers, as described previously by MAL-ED researchers 46,47 . For each marker, categories were assigned values of 0 (low), 1 (medium), or 2 (high), and the composite EED score was computed from these category values using the published formula 48 . Associations between virulence-related genes linked with EPEC and the composite EED score were estimated using generalized estimating equations (GEE) to fit regression models after adjusting for sex, age, the WAMI index (water/sanitation, assets, maternal education, and income), maternal BMI, enrollment length-for-age and weight-for-age z scores, maternal height, poultry/cattle in the house, serum zinc level, the inflammatory biomarker AGP (alpha-1-acid glycoprotein), presence of co-pathogens (Campylobacter, LT-ETEC, ST-ETEC, Shigella/EIEC, and Giardia), seasonality, and site for the overall estimate, with age in months as the time variable 49 . To assess and compare the associations of the burden of virulence-related genes linked with EPEC infection with growth at 24 months of age, we used multivariable linear regression after adjusting for site and the necessary covariates. To detect multicollinearity, the variance inflation factor (VIF) was calculated, and no variable producing a VIF value > 5 was found in the final model. We calculated the strength of association by estimating the coefficient and its 95% CI (confidence interval) in the multivariable analysis. A p value of < 0.05 was considered statistically significant.
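The composite EED score described above can be sketched in Python as follows. The weights (2 for AAT, 2 for MPO, 1 for NEO) follow the composite score as published by MAL-ED investigators and are consistent with the stated 0 to 10 range; the percentile cut-offs used to form the categories (at or below the 25th percentile, between the 25th and 75th, at or above the 75th) and all function names are assumptions for illustration.

```python
def marker_category(value: float, p25: float, p75: float) -> int:
    """Map a (log-transformed) marker value to 0 (low), 1 (medium), or 2 (high).

    Assumption: cut-offs are the 25th and 75th percentiles of the marker's
    distribution, supplied by the caller.
    """
    if value <= p25:
        return 0
    if value < p75:
        return 1
    return 2


def eed_score(aat_cat: int, mpo_cat: int, neo_cat: int) -> int:
    """Composite EED score on a 0-10 scale from the three category values."""
    for cat in (aat_cat, mpo_cat, neo_cat):
        if cat not in (0, 1, 2):
            raise ValueError("each category must be 0, 1, or 2")
    return 2 * aat_cat + 2 * mpo_cat + 1 * neo_cat
```

Under these assumptions, a child in the highest category for all three markers scores 10 and a child in the lowest category for all three scores 0, matching the range given in the text.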

Data availability
A publicly available MAL-ED dataset was analyzed in this study. The data can be obtained from ClinEpiDB [https://clinepidb.org/ce/app/record/dataset/DS_841a9f5259].