Hormonally active environmental compounds, so-called “endocrine disruptors”, may be among the causes of pubertal disorders1, obesity2, 3, cryptorchidism4, and a variety of other conditions in childhood5. A key aspect of such investigations might be the measurement of endogenous hormones in early life. However, collecting multiple biological samples from well children is usually difficult; most studies reporting hormone levels have few samples, involve hospitalized children, or use pooling or other methods6.

An estimated 25% of infant formula sold in the United States is based on soy protein7. Soy formula is only clearly indicated for children with galactosemia or lactose intolerance, but it has been used in children with a variety of feeding problems for more than 60 years8. Soy infant formula contains plant isoflavones, mostly genistein and daidzein, that have been shown to act as estrogens in experimental studies. They might prolong the effect of maternal estrogen, or interfere with hormonal homeostasis in children9, 10. An infant exclusively fed soy formula receives the estrogenic equivalent of between 0.01 and >1 birth control pills per day, depending on the potency estimate used for converting genistein and daidzein to estrogen equivalents11,12,13. By contrast, almost no phytoestrogens have been detected in dairy-based infant formula or in human milk, even when the mother consumes soy products, and endogenous estrogen, while appearing in breast milk, does so at low concentration14, 15. Although milk is taken from dairy cows during pregnancy, the quantity of estrogens in various kinds of milk is too low (usually less than 10 pg/ml) to demonstrate biological activity16.

However, longitudinally collected data on hormone levels in human infants by feeding method are lacking. There is only one long-term follow-up study of infants fed soy. It followed 811 subjects (85% of the initial study cohort) in their twenties or early thirties who, as infants enrolled in the prospective study, had been given soy formula (120 males and 128 females) or cow milk formula (295 males and 268 females)17. The study conducted interviews by telephone. Those given soy formula did not differ from those given cow-milk-based formula in their answers to general questions about health and reproduction. However, women who had been fed soy formula as infants reported a longer duration of menstrual bleeding (by about 8 hours) and greater discomfort with menstruation; they also reported more use of asthma or allergy drugs and a greater tendency toward sedentary activities. The study found little or no evidence of excess morbidity among the women given soy as infants, nor did it find large differences in measures plausibly related to reproductive function, such as menstrual cycle length17. The result was criticized, however, because the study did not directly measure hormone levels or reproductive function18.

Most hormone data from infants come from urine, because urine is convenient to collect, although urinary concentrations may not reflect concentrations at the site of action. Here, the intention was to estimate hormone levels of infants fed by different methods and also to investigate the degree to which urinary concentration reflects blood or saliva concentration. The latter matrices may represent concentration at the site of action more closely than urine does19.

In a partly cross-sectional, partly longitudinal study to investigate possible hormonal effects for different infant feeding regimens, we collected urine, saliva, and blood from infants of different ages from birth to 1 year, and measured sex hormones, gonadotropins, and SHBG. This study was designed to develop methods, assess feasibility, and give information on the time course of the hormones and the correlation structure from the different sample matrices collected in infants.


The unadjusted geometric mean concentrations of estradiol, estrone, testosterone, LH, FSH and SHBG by matrix and sex are shown in Table 1. In general, analyte concentrations did not differ significantly between feeding regimens in either boys or girls, with three exceptions: estradiol was lower in soy-formula-fed boys than in cow-formula-fed boys in both urine and saliva samples (12.60 vs 17.89 pg/mL, p = 0.013 and 13.90 vs 19.66 pg/mL, p = 0.008, respectively); LH was lower in soy-formula-fed boys than in cow-formula-fed boys in both urine and saliva samples (0.32 vs 0.55 mIU/mL, p = 0.025 and 0.57 vs 0.85 mIU/mL, p = 0.033, respectively); and LH was higher in soy-formula-fed girls than in cow-formula-fed girls in urine samples only (0.62 vs 0.36 mIU/mL, p = 0.006) (Table 1).

Table 1 Geometric mean (95% CI) concentrations of urine, saliva and blood analytes by feeding method and sex.

We found strong correlations between measurements from ELISA and from recycling immunoaffinity chromatography (RIC) for all analytes in all matrices. R² values in all 18 correlations (six analytes in three matrices) exceeded 0.85, and 13 of the 18 exceeded 0.90. The intra- and inter-assay coefficients of variation (CVs) of all analytes for all matrices were less than 5%. Pair-wise Spearman’s correlation coefficients among urine, saliva and blood samples were generally high (Table 2). All three sex hormones (estradiol, estrone and testosterone) and FSH showed strong correlations among the three matrices (Spearman’s r > 0.9). LH had good correlations between urine and saliva samples (Spearman’s r > 0.8) but only moderate correlations between urine or saliva and blood (Spearman’s r between 0.6 and 0.8). Even SHBG, usually measured in serum, showed a strong correlation between urine and saliva samples (Spearman’s r > 0.9) and moderate correlation between urine or saliva and blood samples (Spearman’s r between 0.5 and 0.8). Because strong correlations existed among the matrices and we had urine samples for all visits, the following analyses were based on urine samples.
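As an illustration of the pair-wise comparison between matrices, the sketch below computes Spearman's rank correlation in plain Python by ranking each series (ties receive the average rank) and then taking the Pearson correlation of the ranks. The function names and the urine and saliva values are hypothetical and are not study data; the study's own computations were run in statistical packages.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's r: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired concentrations from the same five visits.
urine = [12.1, 15.3, 9.8, 20.4, 17.0]
saliva = [13.0, 16.1, 10.2, 19.5, 21.0]
rho = spearman(urine, saliva)
```

Because only ranks enter the calculation, the coefficient is unaffected by the monotone transformations (such as the logarithm) applied to concentrations elsewhere in the analysis.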

Table 2 Correlations between analyte concentrations in urine, saliva and blood samples.

In general, variance among subjects was larger than variance among visits by the same subject, though the relative magnitude of these variance components differed among analytes. For example, for testosterone, the within-subject variance was about 25% of the among-subject variance (Fig. 1) whereas for LH, the corresponding proportion was about 66% (Fig. 2).
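The two proportions quoted above can be restated as the share of total variance that lies among infants rather than within them. The sketch below does that arithmetic; the function name is hypothetical, and the resulting shares are derived here for illustration and were not reported in the study.

```python
def among_subject_share(within_to_among_ratio):
    """Share of total variance due to among-subject differences.

    If within-subject variance is a fraction r of among-subject
    variance, the among-subject share of the total is 1 / (1 + r).
    """
    return 1.0 / (1.0 + within_to_among_ratio)


testosterone_share = among_subject_share(0.25)  # within is about 25% of among
lh_share = among_subject_share(0.66)            # within is about 66% of among
print(round(testosterone_share, 2), round(lh_share, 2))  # prints "0.8 0.6"
```

On this reading, repeat visits by the same infant carry more new information for LH than for testosterone.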

Figure 1

Urinary testosterone concentration (ng/dL) as a function of age (d). Plotted points are individual sample values, and those connected by line segments represent multiple visits by the same infant. Plotted nonlinear trajectories were fitted using a generalized mixed model for individual sample values (2 values < 0.5 were excluded).

Figure 2

Urinary LH concentration (mIU/mL) as a function of age (d). Plotted points are individual sample values, and those connected by line segments represent multiple visits by the same infant. Plotted nonlinear trajectories were fitted using a generalized mixed model for log-transformed individual sample values.

Although we saw some evidence of heterogeneity in the temporal trajectories of analyte concentrations from birth to 1 year across feeding regimens, we detected no statistically significant effects of feeding method on any of the analytes. In 12 sex- and analyte-specific linear mixed models that compared analyte concentrations between feeding regimens, adjusting for race, weight, length, and head circumference, 10 had a p-value > 0.30 and the remaining 2 had a p-value > 0.10.

In particular, we failed (p = 0.37) to confirm a previous finding in marmosets of suppressed testosterone in males on soy formula20. We did, however, observe that testosterone trajectories for boys and girls differed in both slope and intercept (p = 0.0056 in a 2-degrees-of-freedom test). Boys had higher testosterone, but their slopes appeared flatter with age (Fig. 1). LH appeared lower in breast-milk-fed and soy-formula-fed boys than in cow-formula-fed boys, adjusting for race, weight, length, and head circumference, whereas it appeared higher in breast-milk-fed and soy-formula-fed girls than in cow-formula-fed girls. Both differences, however, were small and not statistically significant (Fig. 2). FSH, estradiol, estrone and SHBG did not differ significantly between sexes or between feeding regimens (data not shown).

Since no statistically significant difference in analyte concentrations was found between feeding regimens, we combined the three feeding regimens. Sex-specific monthly average concentrations of the six analytes tracked the trajectories fitted for the individual measurements well. Most of the analytes increased with age during the first year; the exceptions were SHBG in both sexes and testosterone in girls, which showed flat trends (Supplemental Fig. 1). The adjusted monthly average concentrations of the analytes were within typical reference ranges21, 22.


There is growing concern that infants exposed to hormonally active agents in the environment and in their diet might experience both immediate and delayed consequences. Andres and Gilchrist et al. investigated the relationship between breast milk, cow-formula or soy-formula feeding and infants’ development and reproductive organ size in a child cohort followed from 3 months to 5 years, and found no feeding effects on reproductive organ volumes or developmental status (mental, motor, and language) in the series of studies23,24,25. However, no hormone levels were reported in these studies, and feasible methods to study hormones in infants are still needed.

In this study, we found that the feeding methods had no significant effect on any of the analytes. This result seemingly differs from a previous finding of suppressed testosterone in male marmosets fed soy formula26. We did, however, see possible suppression of testosterone in girls fed soy. This result is unexpected and needs further exploration. Possible reasons include, but are not limited to, the following. First, marmosets harbor bacteria that convert daidzein to the more potent equol, but human infants do not have those bacteria, and equol was only detected in 35 urine samples. Second, it has been demonstrated that diets lead to different hormonal responses in different species27. Third, among the many homeostatic negative feedback control mechanisms, the central nervous system has the greatest influence in confining gonadotropin-releasing hormone to the so-called “juvenile pause” status in infants. More specifically, the hypothalamic–pituitary–gonadal axis (HPG axis) plays a critical part in the negative feedback control mechanisms by responding sensitively to sex hormone levels in the peripheral blood of infants, so as to maintain the sex hormones and gonadotropins at a rather low level in urine, saliva and blood, despite any significant effects from the diets28. Therefore, feeding methods had limited effects on the hormone levels, as shown in the present study. Interestingly, the data showed heterogeneity in the temporal trajectories of analyte concentrations. Moreover, the variance among subjects was larger than that among visits by the same subject, and the relative magnitude of these variance components differed among analytes. This may be partly explained by the different effects on the HPG axis of genetic factors, environmental changes, physiological status, leptin levels, and other factors28.

Generally, the adjusted monthly average concentrations of the analytes were in accordance with typical reference ranges21, 22. However, sex-specific testosterone trajectories were noted in the present study. More specifically, boys and girls differed in both slope and intercept, and boys had higher testosterone levels and flatter slopes with age. One possible reason, among others, is that boys have a testosterone surge during the first few months of life, when the testosterone level may be as high as that of an adult male29, but the surge disappears later on.

Early life exposure to an exogenous estrogen might have both immediate and delayed consequences. During this period, the infant is thought to be programmed to express male characteristics after puberty, not only in sexual development, but also in setting patterns in the brain characteristic of male behavior30, 31. In monkeys, deficiency of male hormones impairs learning and the ability to perform visual discrimination tasks – such as would be required for reading – and retards the development of spatial perception, which is normally more acute in men than in women32, 33. In a marmoset model, feeding soy formula to infant male monkeys transiently reduced circulating testosterone by half, and juvenile animals still had evidence of Leydig cell hypertrophy26. Girls synthesize estrogen over the first 18 months to two years. An exogenous estrogen might reduce the synthesis of endogenous hormone in either sex, interfering with long term programming34, 35.

For this study, we wanted to characterize a broad array of hormones. To do so, we used a micro-scale system that employed an array of capillary immunoaffinity columns as the isolation step, coupled with laser-induced fluorescence detection of the isolated analytes. It is capable of measuring up to 30 different analytes in a 10-μl sample simultaneously. Values obtained by this method and by conventional high-sensitivity ELISA assays were very similar (R² values in the range of 0.92–0.99)36. We found that most concentrations were in the conventional normal range. The estradiol range was high, but there are few reference values in the literature for infants less than one year of age, and the identity of the analyte was confirmed with mass spectrometry. The high precision and low variation of the assays in the study suggest that this new technique for analyzing multiple analytes in a single, small-volume sample may be the only way to perform this kind of study.

Collecting specimens from relatively large numbers of small children requires compromise. Many analytes are secreted intermittently or with a diurnal peak. Twenty-four hour urine collection, indwelling blood sampling devices, and a bed in a metabolic ward would be optimal, but are not practical in the real world. The samples in the study were collected at approximately the same time of day, in the same order, and at least one hour after feeding. We thus could see general trends in concentrations of hormones but may miss any but gross effects on patterns or peaks of synthesis or excretion. We saw strong correlations among the sample matrices, leading us to believe that, while the kinetics may differ among the matrices, most of the information is obtained by analysis of one or two.

Our methods of sample collection, processing, and analysis appeared suitable for use in longitudinal research. Our prior belief was that saliva collection would always be easier than blood collection. We found, however, that in very young children, especially those breastfed, 2 ml of saliva can be difficult to collect without distressing the child, and oral hygiene might affect salivary hormone levels in infants. We therefore consider urine the most easily collected sample, and urine is predominantly used to present full time-course data for testosterone levels in this study. On the other hand, modern ultra-sharp lancets and sweet suckers made collection of the small amounts of blood needed by the micro methods quite tolerable. It would seem reasonable to substitute blood collection for saliva collection in young infants if two matrices are needed.

This study was a pilot. Although we analyzed about 800 samples, the effective sample size for inference about feeding method is small. With that caveat, we did not see a strong uniform depression of endogenous sex hormones and gonadotropins in human infants fed a soy-formula diet. Our data cannot rule out more complex dietary differences that vary with age. Because oral hygiene habits determine the amount and types of bacteria harbored in the oral cavity, which may affect hormone levels in adults37, the effects of oral hygiene on salivary hormone levels in infants may need further evaluation in future studies. In addition, socioeconomic status (SES) and maternal demographics might affect infant hormone concentrations, given that SES and genetic factors are often related to health outcomes. They might be major predictors of child hormone concentrations. However, no SES or maternal demographic data were available in our study, and we hope future studies will fill this information gap. A potential limitation in terms of studied outcomes is that we did not examine estrogen stimulation of reproductive tissues in girls. We did not do so because Andres and Gilchrist et al. have already provided convincing evidence of no feeding effect on reproductive organ volumes and developmental status in their series of studies23,24,25. However, further research is needed on potential effects on the reproductive tissues that might be related to sex hormones in early life. Moreover, the urinary testosterone data fail to show any evidence of a rise in testosterone levels in boys during the neonatal period, with a decline beyond 3–4 months, or of a male-female difference in testosterone levels during this period. Therefore, urinary testosterone levels may not accurately reflect blood levels during mini-puberty, and further studies are needed to clarify to what extent urinary testosterone levels reflect blood levels during this age-specific window.

The possibility that exogenous substances, such as trace amounts of environmental pollutants or dietary components such as isoflavones, could have hormonal effects in humans is known as the “endocrine disrupter” hypothesis. In 1996, the U.S. Congress enacted two pieces of legislation requiring the US Environmental Protection Agency to screen and test chemicals in food (Food Quality Protection Act of 1996) and water (Safe Drinking Water Act Amendments of 1996) for estrogenic and possibly other hormonal activity38 in the hope of preventing such exposures. In a 1999 consideration of this topic, the National Research Council ranked isoflavone exposure as the highest (in the general population) of all putative endocrine disrupting compounds39. Thus, infants whose diets consist of 100% soy formula are a model group, and any method that fails to find effects in them would be unlikely to detect effects from other agents. We examined the nonlinear time trend of hormone levels by feeding method (see Figs 1 and 2) and wanted to know whether there is a specific threshold or time point at which hormone levels change significantly. However, no statistically significant threshold was found. The reason might be the relatively small sample size and the sparse number of visits (up to four visits for each infant) during the study.

In conclusion, our study shows no difference in urinary testosterone levels between soy-fed boys and other boys, but we cannot resolve whether soy feeding might interfere with testosterone levels, possibly because of methodological limitations. We demonstrated that blood, urine, and saliva samples are readily collectible from infants and suitable for some multi-hormone analyses, and the methods may allow direct examination of hypotheses concerning endocrine effects of environmental or dietary compounds in infants. We believe the presented method could facilitate more powerful and specific investigations of endocrine disrupters than had been possible before, and could be useful for investigating hormonally related phenomena in small children.


Study Design

We used data from the Study of Estrogen Activity and Development (SEAD). A detailed description of SEAD was published elsewhere19. Briefly, SEAD was conducted between 2006 and 2009 at the Children’s Hospital of Philadelphia (CHOP), the Hospital of the University of Pennsylvania (HUP), and affiliated clinics, with the laboratory assays done at the Division of Laboratory Sciences of the U.S. Centers for Disease Control and Prevention (CDC) and the Ultramicro Analytical Immunochemistry Resource of the U.S. National Institutes of Health (NIH). The Institutional Review Boards at CHOP, HUP, and the U.S. National Institute of Environmental Health Sciences (NIEHS) approved the study. All methods in the study were performed in accordance with the relevant guidelines and regulations of the aforementioned institutions.

The study recruited children from the nursery at HUP, the clinics at CHOP, and several CHOP satellite clinics. The researchers used flyers, information sessions targeting the clinic staff, and a computer-generated reminder to physicians when they accessed a potentially eligible patient’s record. Children were eligible if they had been born at term (37–41 weeks), with birth weight 2500–4500 g, met one of the feeding regimens and ages (Table 3), and had no major illness or birth defect. Exclusion criteria included chromosomal anomaly, major malformation, or any endocrinopathy (ambiguous genitalia, congenital hypothyroidism, etc.). Families were compensated for meal and travel expenses and given coupons for local food stores.

Table 3 Feeding regimen specifications.

The signed consent forms were obtained from all the participants for the study participation, use of samples and publication. All the data used in the analysis and publication were de-identified and no personal information was disclosed. No information or images that could lead to identification of a study participant were contained in the manuscript.

For feasibility reasons, we did this study mostly cross-sectionally. Although we did not expect to be able to test hypotheses about differences by feeding method, we wanted to include breast-fed children and children fed both soy and cow milk based formulas in order to inform the planning of a longitudinal study. Since the way infants were fed changed over the course of their first year of life, we set feeding regimens that would provide substantial contrast in the feeding histories of the participants without making recruitment too difficult. The definitions of the feeding regimens were described in detail elsewhere19. In brief, infants in the breast milk, cow formula and soy formula groups were exclusively fed by breast milk, cow formula or soy formula within three months of age. Breast-milk-fed and cow-formula-fed infants could have cow formula or breast milk exclusively or together after three months but were not allowed to have had any foods containing soy in their lifetime. For infants in cow formula or soy formula groups, if a baby was breastfed or cow-formula-fed in the nursery, the baby must have gone home on cow formula or soy formula and have been on cow formula or soy formula exclusively ever since. Such a child could not participate until he/she had been fed exclusively cow formula or soy formula for at least 2 weeks. The feeding methods were recorded at the beginning and throughout the study. A given child was allowed to be in the study for up to 4 visits, so long as they met age and feeding requirements. The requirements were described in detail elsewhere19. All decisions regarding infant feeding were made by families in consultation with their own physicians. The study called for 372 total visits: 2 boys and 2 girls in each of 31 ages (<48 hours of age, at weekly intervals from 1 week to 23 weeks of age, then at monthly intervals from 6 months to 12 months) and three feeding regimens.
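The planned total of 372 visits follows directly from the sampling grid described above. The short sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
# Age slots in the planned sampling grid.
newborn_slots = 1                   # <48 hours of age
weekly_slots = len(range(1, 24))    # weeks 1 through 23
monthly_slots = len(range(6, 13))   # months 6 through 12
age_slots = newborn_slots + weekly_slots + monthly_slots

children_per_cell = 2 + 2           # 2 boys and 2 girls per age and regimen
feeding_regimens = 3                # breast milk, cow formula, soy formula

planned_visits = age_slots * children_per_cell * feeding_regimens
print(age_slots, planned_visits)    # prints "31 372"
```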

Sample Collection and Laboratory Methods

The researchers mailed the parents a special gel-free cotton-blend diaper, which they were to put on the infant. After overnight wear, the diaper was removed on the morning of the clinic visit. The diaper was checked in the clinic, and if it was wet, it was removed and placed in a 50 cc syringe and compressed. If the diaper was badly soiled or 5 cc of urine could not be collected, then girls were re-diapered and boys were bagged. A saliva sample was collected at least 60 minutes after a feeding. If residual formula or breast milk was present, the child’s mouth was swabbed with a sterile 2 × 2 gauze pad. The saliva collection device was made at the NIH clinical center. It was a vacuum device with a soft tube that was placed on the side of the infant’s mouth or under the tongue40. The researchers collected 2 mL of saliva per child. Because of the difficulty of collecting blood from small children, the study planned to focus on urine and saliva as the primary sample matrices, and attempted to collect both from all children at each visit. For validation purposes, blood samples were collected from one boy and one girl in each age interval. Capillary blood was collected by heel stick between 30 and 120 minutes after a morning feeding, and was used to fill four circles on two Guthrie cards.

In total, urine samples were collected in 381 visits (9 more than planned, because of some inadvertent extra scheduling) from 84 boys and 82 girls aged from birth to 12 months (Supplemental Figure 2, panel A), saliva samples in 359 visits (missing mostly newborns, Supplemental Figure 2, panel B), and blood samples in 88 visits (Supplemental Figure 2, panel C).

All samples were transported to CHOP’s General Clinic Research Center (GCRC), where they were frozen and stored in sterile cryotubes in a freezer at −70 °C. For analysis, the samples were thawed, divided into aliquots, and shipped to the aforementioned laboratories.

Urine samples were analyzed at the Division of Laboratory Sciences of CDC. Automated online solid-phase extraction (SPE) coupled to isotope dilution high-performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS) was used for measuring estradiol, estrone, testosterone, luteinizing hormone (LH), follicle-stimulating hormone (FSH) and SHBG. Briefly, the analytes of interest were enzymatically hydrolyzed using β-glucuronidase/sulfatase (Helix pomatia, H1). After hydrolysis, the analytes were preconcentrated by online SPE, separated by reversed-phase HPLC, and detected by isotope dilution atmospheric pressure chemical ionization-MS/MS. The SPE recoveries were 83–94%, and the coefficients of variation were 4–12%. Details of the method and its validation were reported elsewhere41.

Blood and saliva samples were analyzed at the lab of NIH by recycling immunoaffinity chromatography (RIC) using an array of capillary immunoaffinity columns packed with antibody-coated glass beads. Each column contained a single, specific antibody and isolated its specific analyte, allowing the sample to pass to the next column. In this way, all analytes could be isolated from the same sample during the same run. The specificity of each antibody was immunochemically checked by 2-dimensional Western blotting, against all of the analytes of interest to ensure no cross-reactivity, prior to use. Bound analytes were labeled with laser dye and detected by laser-induced fluorescence using a scanning detector and a fiber-optic spectrometer. The concentrations of each analyte were calculated by comparison with standard curves constructed by running known amounts of each analyte through the array under the same conditions. Additionally, the analytes from each column were collected and subjected to characterization by mass spectrometry to ensure specificity.

To validate the RIC assays against ELISA, the NIH lab made triplicate runs of spiked blood spots, urine samples, and saliva samples for all analytes and calculated R² values from linear regression analyses using GraphPad 4 software42. To assess the reproducibility of the RIC assays, we calculated intra- and inter-assay coefficients of variation (CVs) from data obtained by running the same sample 5 times within the same day and on 5 consecutive days. The method was described in detail elsewhere36.
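For readers unfamiliar with the reproducibility measures, the sketch below shows one common way to compute intra- and inter-assay CVs from a 5 × 5 run design. The exact formulas used by the laboratory are not given in the text, so the averaging scheme here is an assumption, and the measurements are made up.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical results for one sample: 5 runs per day on 5 consecutive days.
runs_by_day = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.2, 10.0, 10.1, 10.3, 9.9],
    [10.0, 9.8, 9.9, 10.1, 9.7],
    [10.3, 10.1, 10.2, 10.4, 10.0],
    [9.9, 9.7, 9.8, 10.0, 9.6],
]

# Assumed definitions: intra-assay CV is the mean of the within-day CVs;
# inter-assay CV is the CV of the five daily means.
intra_assay_cv = mean(cv_percent(day) for day in runs_by_day)
inter_assay_cv = cv_percent([mean(day) for day in runs_by_day])
```

With these made-up values both CVs fall well below 5%, the level reported for the assays in this study.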

All samples had all analytes detectable, except for 27 urine LH determinations, 5 urine and 5 saliva FSH determinations.

Data Analysis

This was a primarily descriptive study of children at ages when hormone levels, measured in multiple matrices, were changing. Thus the primary analytical approaches involved the correlation structure of the analyte concentrations in different matrices, the graphical display of the analyte trajectories through time, and the fitting of appropriate regression models. Hormone concentrations are all continuous variables. We transformed them using the natural logarithm when necessary to achieve symmetric, approximately normal distributions of regression residuals, which in turn increased the validity of the estimated confidence intervals. Residuals for all hormones were more symmetric after transformation, and we present the analyses based on the transformed values. Since few samples were reported as below the limit of detection (LOD), we used listwise deletion for missing data43.
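A minimal sketch of the two data-handling steps named above, listwise deletion and the natural-log transformation, is given below. The records and field names are hypothetical; the back-transformed mean of the log values is the geometric mean, the summary reported in Table 1.

```python
import math


def listwise_delete(records):
    """Keep only records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def geometric_mean(values):
    """Exponential of the mean of the natural logs."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


# Hypothetical records; None marks a determination below the LOD.
records = [
    {"estradiol": 1.0, "lh": 0.5},
    {"estradiol": 10.0, "lh": None},
    {"estradiol": 100.0, "lh": 2.0},
]

complete = listwise_delete(records)
gm_estradiol = geometric_mean([r["estradiol"] for r in complete])
```

Listwise deletion drops the whole record when any analyte is missing, which costs little information only when, as here, few values are missing.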

In adults, where the concentration of a urine specimen can vary greatly and creatinine production does not vary strongly with age, creatinine correction removes variability due to differences in urine concentration. Children, especially infants, have relatively less ability to concentrate urine. In addition, creatinine production increases with lean body mass whereas urine production increases less steeply and there is thus a non-linear increase in creatinine concentration with age44. Using the standard method for creatinine correction would then produce a negative slope in age for an analyte that was present at a constant concentration over the first year. We have thus chosen to present results from analysis of urine without creatinine correction.
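The artefact described above can be shown with made-up numbers: an analyte held at a constant concentration acquires a negative age slope once it is divided by a creatinine concentration that rises with age. All values and names below are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


age_months = [0, 3, 6, 9, 12]
analyte = [5.0, 5.0, 5.0, 5.0, 5.0]          # constant concentration (made up)
creatinine = [10.0, 16.0, 20.0, 23.0, 25.0]  # rises non-linearly with age (made up)

# Standard creatinine correction: analyte divided by creatinine.
corrected = [a / c for a, c in zip(analyte, creatinine)]

raw_slope = ols_slope(age_months, analyte)          # exactly zero
corrected_slope = ols_slope(age_months, corrected)  # negative
```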

Unadjusted average concentrations of analytes were compared between feeding regimens, separately by sex, using the Kruskal-Wallis H test, and Bonferroni-adjusted p-values were used for post-hoc multiple comparisons45. We further used locally weighted linear regression to explore the temporal trajectories of the various hormones by sex and feeding method46. For inference, we used linear mixed models to account for possible correlations among hormone levels from multiple visits of the same infant. We fitted linear mixed regression models that included feeding regimen as a fixed effect, with separate intercepts and a common slope with respect to age, adjusting for race, weight, length, and head circumference47. The models accounted for inter-infant differences in hormone levels via random infant-specific intercepts. Because hormone levels are sex-specific and potential estrogenic effects may differ between boys and girls, data were modeled separately by sex. Monthly average concentrations of the analytes were estimated from the linear mixed models, controlling for the race, weight, length, and head circumference of the infants.
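The mixed model described above can be written out as follows. The notation is introduced here for illustration and does not come from the study; the normal distributions for the random intercept and the residual are the usual linear mixed model assumptions rather than details stated in the text.

```latex
\log y_{ij} = \alpha_{f(i)} + \beta\,\mathrm{age}_{ij}
            + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here y_ij is the hormone concentration for infant i at visit j (log-transformed where needed), α_f(i) is the intercept for the feeding regimen f(i) of infant i, β is the common slope with respect to age, x_ij holds the adjustment covariates (race, weight, length, and head circumference), and b_i is the random infant-specific intercept. A separate model of this form is fitted for boys and for girls.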

The statistical analyses were performed using Stata version 12 and SAS version 9.1348, 49. All tests were two-tailed and a p-value less than 0.05 was considered statistically significant.