Site specific incidence rate of virulence related genes of enteroaggregative Escherichia coli and association with enteric inflammation and growth in children

Information on the possible association of strain carrying genes of enteroaggregative Escherichia coli (EAEC) with environmental enteric dysfunction (EED) and with linear growth during childhood is lacking. Strain carrying genes of EAEC were detected by TaqMan Array Cards in stool samples collected from 1705 children enrolled in the MAL-ED birth cohort. We measured site-specific incidence rates using Poisson regression models, identified risk factors, and estimated the associations of strain carrying genes of EAEC with the composite EED score and with linear growth at 24 months of age. The overall incidence rate was highest (43.3%) among children infected with strains carrying the aggR gene, and it was greatest in Tanzania (56.7%). Low maternal education, lack of an improved floor, and ownership of domestic cattle were found to be risk factors for EAEC infection. In the multivariate models, after adjusting for potential covariates, strain carrying genes of EAEC showed strong positive associations with the EED scores and with poor linear growth at 24 months of age. Our analyses may lay the groundwork for a prospective epidemiologic investigation to support potential vaccine development aimed at reducing the burden of EAEC infections and combating childhood malnutrition.

Factors associated with strain carrying genes of EAEC.
Factors associated with strain carrying genes of EAEC across all study sites were identified using Poisson regression (Table 3). The incidence rate of aaiC and aatA positive strains was higher at the sites in India, Nepal, Peru, and Tanzania, while that of the genomic strain of Aar was the greatest in Tanzania. Conversely, the incidence rate of the aatA genomic strain was the lowest in Brazil and that of aggR was the lowest in Nepal, South Africa, Brazil, and Pakistan. For concomitant infection by the positive strains of aaiC and aatA, the incidence rate was greater in India, Nepal, Peru, and Tanzania compared to the Bangladesh site (Table 3).

Association between the strain carrying genes of EAEC and child growth.
Infections with the strain carrying genes of EAEC, namely Aar, aatA, aggR, and the concomitant presence of aaiC and aatA strains, were associated with poor linear growth (difference in 24 months LAZ: length-for-age z score), with a stronger association observed across all the study sites (Table 4). In Bangladesh, aaiC [−1.32 difference in 24 months LAZ (95% CI: −2.48, −0.16); p < 0.027] and both aaiC and aatA carrying genes [−1.36 difference in 24 months LAZ (95% CI: −1.73, −0.99); p < 0.001] had a negative association with LAZ. In Peru, the Aar strain [−0.96 difference in 24 months LAZ (95% CI: −1.84, −0.09); p = 0.030] was negatively associated with LAZ. In South Africa, aatA [−1.02 difference in 24 months LAZ (95% CI: −2.00, −0.05); p = 0.040] and, in Tanzania, aatA [−2.04 difference in 24 months LAZ (95% CI: −3.55, −0.53); p = 0.009] were negatively associated with LAZ. For the other four countries (South Africa, Brazil, Nepal and India), strain carrying genes of EAEC were negatively associated with LAZ, but the associations were not statistically significant.

Association between strain carrying genes associated with EAEC and enteric inflammation.
After adjusting for potential covariates in the GEE model (age; sex; WAMI index (water/sanitation, assets, maternal education, and income); enrollment length-for-age z score; maternal BMI; the number of children in the household; presence of poultry/cattle in the household; seasonality; serum zinc level; AGP (alpha-1-acid glycoprotein); presence of co-pathogens (Campylobacter, LT-ETEC, ST-ETEC, Shigella/EIEC, and Giardia); site for the overall estimate; and age as the time variable), strain carrying genes of EAEC were clearly and consistently associated with increased EED score (Table 5) across all sites except Pakistan and Brazil. For the overall effect, the presence of every strain carrying gene except aaiC was associated with the EED score. The same findings were observed individually in India, Nepal and Peru, again with the exception of the aaiC strain carrying gene. In Pakistan and Brazil, no statistically significant relationship was found for any of the strain carrying genes. In Tanzania, only the aaiC gene was not significantly associated with the EED score. The Aar and aggR strain carrying genes had a positive relationship with EED score among the children from Bangladesh. In South Africa, all genes except Aar and aaiC were significantly associated with EED score (Table 5). In Nepal and Peru, both aaiC and aatA strain carrying genes had a statistically significant positive relationship with EED score.

Discussion
To our knowledge, this is the first study investigating the association of different strain carrying genes of EAEC with linear growth and enteric inflammation in children from birth until 2 years of age. Several EAEC strain carrying genes have been used in case-control and epidemiological studies in recent years, with aatA, aap, aggR, astA, and aafA being among the most common for EAEC diagnosis [21][22][23]. We documented an overall high incidence rate of aggR gene carrying strains of EAEC across all eight study sites. The different clinical and nutritional outcomes associated with EAEC infection, including poor child growth and development during the study period and changes in the status of intestinal inflammation, pose stern challenges for understanding the pathobiology of EAEC and for proposing potential therapeutic approaches 8. In this scenario, EAEC strain heterogeneity is a key contributor to these effects, but very little is known about the individual genomic strains responsible 8.
In our study, the overall incidence rate for the genomic strain of aaiC was the lowest across all the study sites. Previously, an incidence rate of 38% for the aaiC gene among children under 5 years of age was reported in the Global Enteric Multicenter Study (GEMS) 10. The incidence of aaiC, aggR, aatA, Aar, and the concomitant presence of aaiC and aatA strain carrying genes was higher among the study participants in Tanzania. Similar  15 . Reduced viability of infected G. mellonella larvae with an aggR mutant strain and elevated virulence of atypical EAEC strains were reported in one study, suggesting that EAEC virulence is linked to the AggR regulon 25.
The low incidence of the aaiC gene in our study compared to the other strain carrying genes of EAEC may be additional support for the plasmid-mediated gene transfer of the pAA. In several studies, the authors suggested that aggR was not a feasible virulence marker for the diagnosis of EAEC infections since this gene alone was not a considerably sensitive target in comparison to the aatA gene 26,27 . Our study findings regarding aaiC being an incongruent marker for EAEC identification were found to be consistent with a study conducted in southern Mozambique 23 .
The association between strain carrying genes of EAEC and malnutrition is not entirely clear, although plausible models for pathogenesis can be proposed. The Aar gene, which has been hypothesized to act directly or indirectly as a virulence suppressor, was present among the children in Tanzania 28. Similar findings have also been reported in Mali and Brazil, and these findings may support a significant role for the Aar gene in the epidemiology of EAEC infection 23. We also observed poor linear growth among the children from Peru who were infected with the genomic strain of Aar, which may point towards increased pathogenicity of the Aar gene.
The aggR gene mediates the expression of a large number of other EAEC genes responsible for virulence. The first genes found to be regulated by aggR were those encoding the aggregative adherence fimbriae (AAF) 11. Our findings demonstrate an overall high incidence of the genomic strain of aggR across all the sites, and the linear growth of children was negatively associated with the presence of the aggR gene. However, the role of aggR of EAEC in child growth remains unknown.
Our study findings also illustrate that the concomitant presence of aaiC and aatA strain carrying genes of EAEC was negatively associated with childhood linear growth, except in Nepal and Tanzania. Other studies suggested that the presence of different virulence genes from within and outside the plasmid AA is necessary for complete EAEC virulence. This finding is in line with another concurrent study in which the diagnosis of EAEC without overt diarrhea was associated with length/height-for-age z score decrements and chronic inflammation in children from Brazil 29,30.
Our data show that the presence of improved floors in the house has a protective effect against infection by the aaiC genomic strain of EAEC. We found that higher maternal education was protective against the aaiC and Aar genomic strains of EAEC. Several other studies conducted in the MAL-ED settings also found an association of these factors with EAEC and Campylobacter infection 31,32.
The presence of cattle in the household was associated with aaiC, Aar, and aggR strain carrying genes. Still, there is no evidence regarding the association of the presence of an animal in the household with EAEC infection. The transfer of Giardia lamblia genes, any E. coli virulence gene, and unique E. coli virulence genes from animal feces to the hands of mothers upon animal handling was reported in a study conducted in rural Bangladesh. As a result, domestic animals played a significant role in the spread of enteric pathogens in households 33.
Previous findings have reported that EAEC detection was associated with higher levels of MPO (a marker for intestinal inflammation), NEO (intestinal inflammation), and AAT (permeability) among all 8 sites of MAL-ED 32. In our study, the overall concomitant presence of aaiC and aatA strain carrying genes was more strongly associated with increased EED score, implying higher intestinal inflammation. The relevance of elevated intestinal inflammation associated with virulence-related genes of EAEC is not yet clear, and there has so far been no evidence that virulence-related genes of EAEC were associated with elevated intestinal inflammatory biomarkers and subsequently an increased EED score. Hence, our study is the first attempt towards generating an evidence-based understanding of the contribution of the different virulent genetic strains of EAEC to enteric inflammation and poor child growth in LMIC settings.

Although our findings are potentially promising, our analysis has several possible limitations. As this was an observational cohort study, the causality of the associations between infection with various genomic strains of EAEC and both intestinal inflammation and linear growth cannot be proven, but it can be hypothesized based on several factors, including the appropriate adjustment of the models for possible confounders, the strength and consistency of the associations, and the biological plausibility. We were unable to establish a temporal relationship between infections and the outcomes, which would require structured longitudinal models.

In conclusion, poor maternal education, lack of an improved floor, and ownership of cattle in the household are possible risk factors for EAEC infections by different genomic strains and may thereby lead to compromised linear growth in childhood.
The burden of strain carrying genes of EAEC was associated with increased enteric inflammation among children in the first 2 years of life. EAEC virulence-related genomic strains of aaiC, Aar, and aatA, and the concomitant presence of aaiC and aatA genomic strains, had a stronger association with growth failure among children of Bangladesh, whereas the association with inflammation was strongest for aggR strain carrying genes.

Table 5. Association between strain carrying genes of EAEC and EED score across each of the eight study sites (Bangladesh, India, Nepal, Pakistan, South Africa, Tanzania, Brazil, and Peru) from November 2009 to February 2012. Adjusted in GEE model for sex, age, WAMI index (water/sanitation, assets, maternal education, and income); enrollment length-for-age z score; maternal BMI; the number of children, poultry/cattle in house, seasonality, serum zinc level, AGP (alpha-1-acid glycoprotein), presence of co-pathogens (Campylobacter, LT-ETEC, ST-ETEC, Shigella/EIEC, and Giardia), site for the overall estimate, and age as the time variable. EED score (AAT = alpha-1-anti-trypsin; MPO = myeloperoxidase; NEO = neopterin). Coef.: Coefficient; CI: Confidence interval. The dependent variable was log (MPO, NEO, and AAT); independent variables: the presence of strain carrying genes associated with EAEC at each month.

Data collection.
Anthropometric measurements were done at monthly intervals up to the age of 24 months using standard scales (Seca GmbH & Co. KG, Hamburg, Germany). Length-for-age z score (LAZ), weight-for-age z score (WAZ), and weight-for-length z score (WLZ) were calculated using the 2006 WHO standards for children 35. Details of illness and child feeding practices were collected during twice-weekly household visits 36.
Additionally, household demographics, presence of siblings, maternal characteristics and other data on the child's birth and anthropometry were obtained at enrollment 34 . Beginning at 6 months of age, socioeconomic data were collected every 6 months. The WAMI score (Water, sanitation, hygiene, Asset, Maternal education, and Income index, ranging from 0 to 1) is a socioeconomic status index that includes access to improved water and sanitation, eight selected assets, maternal education, and household income as a representative of the socioeconomic status of the households 37 . A better socioeconomic status is indicated by a higher WAMI score 38 . Improved water and sanitation were defined following World Health Organization guidelines 39 . Treatment of drinking water was defined as filtering, boiling, or adding bleach 40 .

Collection of stool and blood samples.
Non-diarrheal stool samples were collected monthly (at least 3 days before or after a diarrhea episode) from birth to age 2 years, and venous/peripheral blood was collected at 7, 15, and 24 months of age 41. Raw stool aliquots and blood samples were processed at all sites using harmonized protocols and stored in −80 °C freezers before subsequent laboratory analyses 42.
In this study, plasma zinc was assessed as the measure of zinc status at the age of 7, 15, and 24 months. Plasma zinc concentration is a proxy marker and recommended for the assessment of population zinc status, especially for children in low-income countries 43 . Plasma alpha-1-acid glycoprotein (AGP) level was considered as a biomarker for systemic inflammation and was also assessed at 7, 15, and 24 months 44 .

Assessment of enteropathogens by TaqMan array cards (TAC).
Total nucleic acid (both DNA and RNA) was extracted from the fecal samples using the QIAamp Fast DNA Stool Mini kit (Qiagen), following the manufacturer's guidelines. Two external controls, namely: MS2 bacteriophage and Phocine herpesvirus (PhHV) were added to the samples for the confirmation of nucleic acid extraction and amplification efficiency 45 .
For the detection of enteropathogens, quantitative polymerase chain reaction (qPCR) was performed with a customized TaqMan Array Card (TAC), which uses compartmentalized probe-based real-time PCR assays to detect up to 29 pathogens in each sample 41. A Ct (quantification cycle) value of 35 was set as the threshold for analysis, whereby a Ct > 35 was considered negative, as mentioned elsewhere 45. The MAL-ED study investigated the occurrence of putative virulence-related genes (VRG) of EAEC, namely aatA, aggR, Aar, and aaiC. To diagnose EAEC, the genes aatA (dispersin transporter protein) and aaiC (secreted protein) were targeted by PCR; the primers used for EAEC identification were specific for these two genes. Samples were considered positive for EAEC if either one or both of the two diagnostic genes were detected. Only EAEC positive samples were further analyzed by multiplex PCRs to identify 20 EAEC VRG 46. Moreover, cases positive for the concomitant presence of both aaiC and aatA genotypes were further analyzed for co-infection with the pathogens listed in this study, as described elsewhere 42. At the MAL-ED Brazil site, the combinations of EAEC VRGs were compared among positive samples presenting both diagnostic genes (aaiC and aatA). This choice was based on the fact that only the samples presenting both genes were statistically associated with malnourished children 46.
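The detection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the function names, the dictionary layout, and the use of `None` for a target that did not amplify are assumptions made for the example. Only the Ct cutoff of 35 and the "either aaiC or aatA" rule come from the text.

```python
from typing import Dict, Optional

CT_CUTOFF = 35.0  # from the text: a Ct > 35 is considered negative


def is_detected(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """Return True when a target amplified with a Ct at or below the cutoff.

    None stands for "no amplification" (an assumed representation).
    """
    return ct is not None and ct <= cutoff


def classify_eaec(ct_values: Dict[str, Optional[float]]) -> Dict[str, bool]:
    """Apply the diagnostic rule: EAEC positive if aaiC or aatA is detected."""
    aaic = is_detected(ct_values.get("aaiC"))
    aata = is_detected(ct_values.get("aatA"))
    return {
        "aaiC": aaic,
        "aatA": aata,
        "eaec_positive": aaic or aata,   # either diagnostic gene, or both
        "aaiC_and_aatA": aaic and aata,  # concomitant presence
    }


# Hypothetical sample: aaiC amplifies early, aatA falls above the cutoff.
sample = {"aaiC": 31.2, "aatA": 36.8}
print(classify_eaec(sample))
```

Keeping the concomitant-presence flag separate from the overall EAEC flag mirrors the way the analyses treat "both aaiC and aatA" as its own exposure category.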

Assessment of biomarkers of intestinal inflammation.
Intestinal inflammation was evaluated by measuring the levels of the biomarkers alpha-1-anti-trypsin (Biovendor, Chandler, NC), neopterin (GenWay Biotech, San Diego, CA), and myeloperoxidase (Alpco, Salem, NH) in the stool samples collected from the study participants at 3, 6, 9, 15, and 24 months of age by quantitative ELISA, following the manufacturers' guidelines 34.

Statistical analysis.
All statistical analyses were performed in STATA 15 (Stata Corporation, College Station, TX). Descriptive statistics such as proportion, mean and standard deviation (SD) for symmetric data, and median with inter-quartile range (IQR) for asymmetric quantitative variables were used to summarize the data. Incidence rates were calculated using Poisson regression, where the outcome variable was the number of EAEC infections (for the different strain carrying genes) and the offset variable was the log of the number of follow-ups. The factors associated with strain carrying genes of EAEC in the monthly stool samples were identified using Poisson regression models. In the final multiple Poisson regression model, the following variables were considered for inclusion using stepwise forward selection: child sex, birth weight, duration of exclusive breastfeeding in months, enrollment weight-for-age z score, length-for-age z score, maternal age in years, maternal education, mother having fewer than 3 living children, maternal BMI, routine treatment of drinking water, improved sanitation, household ownership of cattle/poultry, and fewer than 2 people living per room. We excluded children from the Pakistan site from the growth analysis, owing to bias noted in a subset of this cohort within the study period. Myeloperoxidase (MPO), neopterin (NEO), and alpha-1-antitrypsin (AAT) values were log-transformed before the analysis.

Scientific Reports | (2021) 11:23178 | https://doi.org/10.1038/s41598-021-02626-z www.nature.com/scientificreports/
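To illustrate how a Poisson model with a log follow-up offset yields an incidence rate, the sketch below covers only the simplest case: an intercept-only model, for which the maximum-likelihood rate is total infections divided by total follow-ups and the standard error of the log rate is one over the square root of the total count. The analysis itself was run in STATA with covariates; the function name and the data here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def incidence_rate(
    events: Sequence[int], follow_ups: Sequence[float]
) -> Tuple[float, float, float]:
    """Rate and Wald 95% CI from an intercept-only Poisson model with offset.

    Model: log E[y_i] = b0 + log(t_i), so exp(b0) is the rate per follow-up.
    The MLE is exp(b0) = sum(y) / sum(t), with SE(b0) = 1 / sqrt(sum(y)).
    """
    total_events = sum(events)
    total_exposure = sum(follow_ups)
    if total_events <= 0 or total_exposure <= 0:
        raise ValueError("need at least one event and positive follow-up")

    log_rate = math.log(total_events / total_exposure)
    se = 1.0 / math.sqrt(total_events)
    z = 1.959963984540054  # 97.5th percentile of the standard normal

    return (
        math.exp(log_rate),
        math.exp(log_rate - z * se),
        math.exp(log_rate + z * se),
    )


# Hypothetical data: infections detected and monthly samples per child.
infections = [3, 5, 2]
samples_tested = [10, 12, 8]
rate, lower, upper = incidence_rate(infections, samples_tested)
print(f"rate per follow-up: {rate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Building the interval on the log scale and exponentiating keeps the lower bound above zero, which is the usual reason to report rate estimates this way.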
At each time point, the composite EED score, ranging from 0 to 10, was calculated from the three fecal markers, as described in the previous literature by MAL-ED researchers 20,47. Categories were assigned values of 0 (low), 1 (medium), or 2 (high). The formula for the composite EED score is as follows 48 :

Associations between strain carrying genes of EAEC and the composite EED score were estimated using generalized estimating equations (GEE) to fit regression models after adjusting for sex; age; WAMI index (water/sanitation, assets, maternal education, and income); maternal BMI; enrollment length-for-age and weight-for-age z score; maternal height; poultry/cattle in house; serum zinc level; inflammatory biomarker AGP (alpha-1-acid glycoprotein); presence of co-pathogens (Campylobacter, LT-ETEC, ST-ETEC, Shigella/EIEC, and Giardia); seasonality; and site for the overall estimate, with age in months as the time variable 49. To assess and compare the associations of the infection burden of strain carrying genes of EAEC with growth at 24 months of age, we used multivariable linear regression after adjusting for the site and the necessary covariates. To detect multicollinearity, the variance inflation factor (VIF) was calculated, and no variable producing a VIF value > 5 was found in the final model. We calculated the strength of association by estimating the coefficient and its 95% CI (confidence interval). A p-value of < 0.05 was considered statistically significant during the multivariable analysis.
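The composite EED score can be sketched as a weighted sum of the three marker categories. The text gives the score range (0 to 10) and the category values (0, 1, 2), but the formula itself is not reproduced here. The weights below (2 for AAT, 2 for MPO, 1 for NEO) are therefore an assumption, chosen because they match the weighting commonly described for the MAL-ED score and reproduce the stated 0 to 10 range; they should be checked against reference 48. The cutoffs passed to `categorize` are likewise hypothetical inputs supplied by the caller.

```python
from typing import Dict

# ASSUMED weights (not stated in this text); they yield a 0-10 range.
WEIGHTS: Dict[str, int] = {"AAT": 2, "MPO": 2, "NEO": 1}


def categorize(value: float, low_cut: float, high_cut: float) -> int:
    """Map a marker concentration to 0 (low), 1 (medium), or 2 (high).

    The cutoffs are supplied by the caller; how they are derived
    (for example from percentiles) is outside this sketch.
    """
    if value <= low_cut:
        return 0
    if value >= high_cut:
        return 2
    return 1


def eed_score(categories: Dict[str, int]) -> int:
    """Weighted sum of the AAT, MPO, and NEO categories."""
    total = 0
    for marker, weight in WEIGHTS.items():
        category = categories[marker]
        if category not in (0, 1, 2):
            raise ValueError(f"{marker} category must be 0, 1, or 2")
        total += weight * category
    return total


# Hypothetical child: medium AAT, low MPO, high NEO.
print(eed_score({"AAT": 1, "MPO": 0, "NEO": 2}))
```

With these assumed weights, a child in the highest category for all three markers scores 10 and a child in the lowest category for all three scores 0, which matches the range given in the text.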