Introduction

Human adult height is a complex trait with a high heritability of more than 75%.1, 2, 3 Variation in height has been associated with the risk of some common diseases, such as type 2 diabetes,4 osteoporotic fractures,5 cardiovascular diseases,6 and prostate cancer,7 as well as with disease-specific mortality in different populations.8 Investigating human adult height is thus important because it may also provide new insights into the mechanisms of these related diseases.

Human adult height is highly correlated with body growth9, 10 and regulated by multiple genetic factors. Growth hormone (GH) and estrogen are two of the most important hormones regulating body growth. GH acts through the GH/IGF1 (insulin-like growth factor 1) pathway and exerts systemic and local control on the hypothalamus–pituitary–growth plate axis, influencing longitudinal bone growth.9, 11 The estrogen endocrine system has been shown to have pleiotropic effects on many endocrine pathways, such as cell proliferation and differentiation, and skeletal metabolism.10, 12 Furthermore, the biosynthesis and metabolism of these two hormones are regulated by many factors. For instance, the IGF1 gene mediates the growth-promoting effects of GH. The CYP19 (cytochrome P450 19) and CYP17 genes encode key enzymes that catalyze the in vivo biosynthesis of active estrogens from their lipid precursors.13, 14 ESR1 and ESR2 encode estrogen receptors α and β. Therefore, the IGF1, CYP17, CYP19, ESR1, and ESR2 genes may affect human adult height variation. Previous studies have tested some of these genes, including IGF1, CYP17, CYP19, and ESR1, for association with height,15, 16, 17, 18, 19, 20, 21 and showed the potential importance of certain genes.15, 16, 22, 23 Among them, ESR1 and CYP19 were found to be associated with height in our studies.15, 16 In this study, we focused on three genes that have not been intensively studied before, namely IGF1, ESR2, and CYP17. In particular, ESR2 has not so far been investigated in association studies of human adult height. Here we characterized the LD (linkage disequilibrium) patterns and haplotype structures of these three genes using high-density single nucleotide polymorphisms (SNPs) and then tested their associations with adult height variation in 1873 Caucasians from 405 nuclear families.

Materials and methods

Subjects

The study was approved by the Creighton University Institutional Review Board. Signed informed consent documents were obtained from all study participants before they entered the study. In brief, all 1873 participants from 405 nuclear families were US Caucasians of European origin and were recruited randomly with respect to height variation. People with chronic diseases and conditions that might potentially affect height were excluded, as detailed before.24 All height measurements were made without shoes by nurses in the clinic, using a standard wall-mounted stadiometer. The basic characteristics of the study subjects are presented in Table 1. For the 405 nuclear families used in association analyses, the average family size was 4.63±1.78 (mean±SD, standard deviation), ranging from 3 to 12. The overall sample yielded a total of 1512 sib pairs.

Table 1 Characteristics of study subjects genotyped in 405 nuclear families

Genotyping

Genomic DNA was extracted from whole blood using a commercial isolation kit (Gentra Systems, Minneapolis, MN, USA). All SNPs were identified by searching public databases such as HapMap (http://www.hapmap.org/), dbSNP (http://www.ncbi.nlm.nih.gov/SNP/), JSNP (http://snp.ims.u-tokyo.ac.jp/), HGVbase (http://hgv-base.cgb.ki.se/), the SNP Consortium (TSC) (http://snp.cshl.org/), and SNPper (http://snpper.chip.org/bio/snpper-enter), based on the following criteria: (1) validation status, especially in Caucasians, (2) an average density of 1 SNP per 4 kb, (3) degree of heterozygosity, that is, minor allele frequencies (MAFs) >0.05, (4) functional relevance and importance, (5) tagging SNP information (on the basis of data from HapMap), and (6) reporting to dbSNP by various sources. A total of 48 SNPs in or around the IGF1, ESR2, and CYP17 genes were successfully genotyped using the high-throughput BeadArray SNP genotyping technology of Illumina Inc. (San Diego, CA, USA), and 43 were analyzed subsequently (5 rare SNPs were excluded because of insufficient power to analyze them in association studies). The average rate of missing genotype data was reported by Illumina to be 0.05%. The average genotyping error rate, estimated through blind duplicate genotyping, was reported to be less than 0.01%. Information on the 43 analyzed SNPs is summarized in Table 2.

Table 2 Information of the studied SNPs

We used PedCheck25 to check the SNP genotype data for Mendelian inheritance errors, and any inconsistent genotypes were removed. The error-checking option embedded in Merlin26 was then run to identify and discard genotypes flanking excessive recombinants, further reducing genotyping errors. Less than 0.02% of all genotypes were removed for violating either of these two rules. Allele frequencies for each SNP were calculated using the method of Mendel for family data,27 and Hardy–Weinberg equilibrium was tested using the Pedstats procedure in Merlin.
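
To illustrate the idea behind a Hardy–Weinberg equilibrium check, the following is a minimal Python sketch of a 1-degree-of-freedom chi-square test on genotype counts from unrelated individuals. It is not the Pedstats procedure, which is designed for family data; the function name hwe_chi_square and the example counts are hypothetical and serve only as an illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Arguments are observed genotype counts (AA, AB, BB) in unrelated subjects.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequency of A estimated by allele counting.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(hwe_chi_square(25, 50, 25))
    print(hwe_chi_square(50, 0, 50))
```

A SNP with a small P-value here would be flagged as deviating from Hardy–Weinberg proportions; an exact test is usually preferred when genotype counts are small.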

LD and haplotype analyses

Our LD and haplotype analyses were based on the 703 unrelated parents (340 men and 363 women) from the 405 nuclear families. Population haplotypes and their frequencies were inferred using Phase v2.1.1.28 We used HaploBlockFinder29 to identify block structures and select haplotype-tagging SNPs (htSNPs). To generate a graphical representation of the LD structure as measured by D′, we used Haploview,30 which yielded haplotype block structures similar to those from HaploBlockFinder. To infer haplotypes defined by the htSNPs within each block, we adopted the integer linear programming (ILP) algorithm implemented in PedPhase V2.0,31 which is based on an LD assumption and is able to recover phase information at each marker locus with great speed and accuracy, even in the presence of 20% missing data.
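
For readers unfamiliar with the D′ measure mentioned above, the following minimal Python sketch computes it for a pair of biallelic SNPs from haplotype and allele frequencies, using the standard definition of D normalized by its maximum attainable value. It is not the Haploview implementation; the function name d_prime and the example frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium |D'| between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        # At least one SNP is monomorphic, so D' is undefined; report 0.
        return 0.0
    return abs(d) / d_max


if __name__ == "__main__":
    # Hypothetical frequencies, not taken from the study.
    print(d_prime(0.5, 0.5, 0.5))   # alleles always travel together
    print(d_prime(0.25, 0.5, 0.5))  # alleles are independent
```

Values close to 1 indicate that little or no historical recombination is evident between the two SNPs, which is the property block-finding programs rely on.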

Statistical analysis of association

In association analyses, significant covariates, including age and sex, were used to adjust the height data in the total sample. In sex-specific analyses, age was used as a covariate. Normality tests and adjustments were done in MINITAB. The quantitative transmission disequilibrium test (QTDT)32 was used to test the htSNPs and the haplotypes with estimated frequencies greater than 5% for association with height. We adopted the orthogonal model implemented in QTDT, which incorporates the variance components method in the analysis of family data and includes exact estimation of P-values. Monte-Carlo permutation tests33 with 10 000 permutations were performed to correct for multiple testing across the markers and genes tested. The significance threshold was set at 0.0033 for an individual test to achieve a global significance level of 0.05 for our analyses.
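
One common way to adjust a quantitative trait for a covariate such as age is to take the residuals from a least-squares regression of the trait on the covariate. The Python sketch below shows this for a single covariate, as in the sex-specific analyses. It assumes a simple linear model; the exact adjustment model used in MINITAB is not specified above, and the function name age_adjusted_heights and the example values are hypothetical.

```python
def age_adjusted_heights(ages, heights):
    """Return residuals of height after simple linear regression on age."""
    n = len(ages)
    if n != len(heights) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_age = sum(ages) / n
    mean_height = sum(heights) / n
    # Sum of squares of age and cross-product with height.
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("age must vary to estimate a slope")
    sxy = sum((a - mean_age) * (h - mean_height)
              for a, h in zip(ages, heights))
    slope = sxy / sxx
    intercept = mean_height - slope * mean_age
    # Residual = observed height minus height predicted from age.
    return [h - (intercept + slope * a) for a, h in zip(ages, heights)]


if __name__ == "__main__":
    # Hypothetical ages (years) and heights (cm), not taken from the study.
    ages = [30, 45, 60, 75]
    heights = [165.0, 164.0, 162.5, 160.0]
    print(age_adjusted_heights(ages, heights))
```

The residuals, rather than the raw heights, would then be passed to the association test; adjusting for sex as well would require a model with more than one covariate.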

We then performed population-based association analyses by ANOVA for both single SNPs and haplotypes (frequencies greater than 5%) versus height in Minitab software (Minitab Inc., State College, PA, USA). ANOVA tests were repeated in two unrelated samples to validate the results. As we had no additional sample, we extracted unrelated subjects from the entire sample. For the ‘total’ sample, we selected the parental generation as sample 1 (630 subjects) and then randomly selected one child from each family as sample 2 (400 subjects). For the ‘female’ sample, one daughter from each family was randomly selected to generate sample 1 (326 subjects), and then one daughter from the remaining members of each family was randomly selected as sample 2 (312 subjects). For the ‘male’ sample, because our sample contained more female than male subjects and many families had no son or only one son, we randomly selected one son from each family as sample 1 (210 subjects), and the fathers were selected as sample 2 (300 subjects).
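
As an illustration of a single-SNP ANOVA, the following minimal Python sketch groups trait values by genotype and computes the one-way F statistic together with eta-squared, a common measure of the proportion of trait variance explained by the grouping. It is a generic sketch, not the Minitab procedure, and it is an assumption that eta-squared corresponds to any effect size reported in this study; the function name one_way_anova and the example values are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA on a list of groups (each a list of trait values).

    Returns (f_statistic, eta_squared).
    """
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need two or more groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Variation of group means around the grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Variation of individuals around their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_statistic = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / (ss_between + ss_within)
    return f_statistic, eta_squared


if __name__ == "__main__":
    # Hypothetical adjusted heights for three genotype groups (e.g. AA, AC, CC).
    by_genotype = [[-1.2, 0.3, -0.5], [0.1, 0.8, -0.2], [1.5, 0.9, 1.1]]
    print(one_way_anova(by_genotype))
```

The F statistic would be compared with an F distribution on (k − 1, n − k) degrees of freedom to obtain a P-value.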

Bioinformatic analysis

We used the Vista program (http://www-gsd.lbl.gov/VISTA/index.shtml) to compare the genomic sequences of interest from human and mouse; the program visualizes the pairwise percentage identity calculated for every 100 bp. Potential SNP functions, such as transcription factor binding sites (TFBSs), were queried using a web-based bioinformatics tool named FASTSNP (function analysis and selection tool for SNPs, http://fastsnp.ibms.sinica.edu.tw).34

Results

LD and haplotype analyses

Figure 1 shows the LD structures of these three genes. For the IGF1 gene, 7 htSNPs (SNP 2, 3, 4, 13, 16, 17, and 19) were selected to represent three blocks containing 18 SNPs, with sizes of 13, 53, and 17 kb, respectively. SNP 15 could not be assigned to any block because of its low LD with the surrounding SNPs. The average density of these 19 SNPs was 1 SNP per 4.5 kb. For the ESR2 gene, two blocks with high LD were identified, which contained 17 SNPs with an average density of 1 SNP per 4.2 kb. For each block, three htSNPs (block 1: SNP 1, 4, and 6; block 2: SNP 9, 10, and 14) were inferred to represent common haplotypes. Only one block (htSNPs: SNP 1 and 4), 23 kb in size and containing seven SNPs, was identified for the CYP17 gene. The average density was 1 SNP per 3.3 kb.

Figure 1

The linkage disequilibrium (LD) structures of IGF1, ESR2, and CYP17 genes. Squares in black indicate strong LD; *, tag SNPs; numbers in brackets, the length of blocks. For IGF1, SNP 15 had weak LD with some SNPs in Block 2 or Block 3, so it could not be assigned to any of the blocks.

Association analyses

All of the association results are presented in Table 3 (single-SNP analyses) and Table 4 (haplotype analyses). For the IGF1 gene, SNP 4 and SNP 13 were significantly associated with human adult height by QTDT in the total sample, with P-values of 0.0097 and 0.0057, respectively. They remained significant in the female sample (SNP 4: P=0.034; SNP 13: P=0.0015). In the male sample, however, only a marginally significant result was found, for SNP 4 (P=0.081). ANOVA analyses of these two SNPs in both samples 1 and 2 gave consistent and more significant results (Table 3). The contribution of SNP 4/13 to the variation of height was 5.23/6.00% and 3.75/3.83% in the female and male samples, respectively. Interestingly, SNP 4 and SNP 13 were the only two htSNPs in block 2, and the results of the haplotype analyses were consistently significant by ANOVA in all three sample categories (total, female, and male), but not by QTDT (Table 4). For the ESR2 gene in the total sample, SNP 4 and SNP 6 showed marginally significant association with height variation by QTDT (P=0.061 and 0.066, respectively). When sex-specific analyses were conducted, significant association with height was present only in women, by both QTDT and ANOVA (Table 3). In haplotype analyses of the ESR2 gene in the female sample, only block 2-hap8 was nominally significant (P<0.05) by QTDT; it also gave the most significant result in the ANOVA analyses (P=3.6 × 10^−4/0.006 in sample 1/2). In addition, block 1-hap4 and block 2-hap1 were nominally significant (Table 4). No significant result was found for the CYP17 gene by either single-SNP or haplotype analyses using the two statistical methods adopted here (data not shown).

Table 3 Single-SNP analysis for human adult height
Table 4 Haplotype analysis for human adult height

Bioinformatic analysis

The Vista program was used to compare the genomic sequences of the IGF1 and ESR2 genes from human and mouse. SNP 5, 6, 7, 11, 13, and 14 in block 2 of the IGF1 gene and SNP 8 of the ESR2 gene were all located in conserved noncoding regions; the detailed results are shown in Figure 2. In addition, according to the FASTSNP program, two SNPs of ESR2 (rs1256061 and rs7154455) were located in potential transcription factor binding sites.

Figure 2

Conservation of the human and mouse genomic IGF1 and ESR2 gene sequences. The region between the two dashed vertical lines represents the region in one block. The longer black bars on the gene symbol indicate the SNPs we selected in conserved regions. The shorter bars indicate the SNPs we selected in non-conserved regions.

Discussion

IGF1, ESR2, and CYP17 may affect human adult height variation through the GH/IGF1 pathway and the estrogen endocrine system, which regulate body growth. We therefore performed association analyses to investigate whether common variants in these important candidate genes contribute to the variation of adult height in humans. Our work showed that about 12.5% of the genetic variation can be explained by the 16 htSNPs in the three genes analyzed here.

The IGF1 gene encodes somatomedin C, which is important in mediating the effects of GH. Although several of its polymorphisms have been assessed in association studies of human adult height, the results have not been consistent.17, 35 Voorhoeve et al21 found no statistically significant association between IGF1 wild-type carrier status and final height. Frayling et al35 reported that a common allele (Z) of a microsatellite polymorphism 1 kb upstream of the IGF1 gene had no association with adult height, whereas Rietveld et al17 found that a polymorphic CA repeat in the IGF1 gene was associated with adult height. In our sample, subjects carrying the ‘C’ allele of SNP 4 and the ‘A’ allele of SNP 13 were taller than those carrying the ‘A’ allele of SNP 4 and the ‘G’ allele of SNP 13, respectively. Notably, the bioinformatic analyses showed that most of the IGF1 region in block 2 was highly conserved between human and mouse. Interestingly, this region may have biological effects on other traits.36, 37, 38 For example, Johansson et al38 reported that this region was associated with prostate cancer risk. Cheng et al36 revealed that some SNPs belonging to block 2 of this study (ie rs5742637, rs5742639, rs5742657, and rs2072592) were associated with prostate cancer risk. In addition, rs1520220, near SNP 4 in this study, was found to be associated with increased circulating IGF1 levels and increased risk of breast cancer.37 For ESR2, although most of the SNPs we selected were not in regions conserved between human and mouse, two significant SNPs (rs1256061 and rs7154455) were in the core recognition sequences of potential TFBSs and hence may have a role in transcriptional regulation.

We performed, for the first time, an association study of the ESR2 gene with human adult height, based on its biological role as an estrogen receptor. Our results showed that the ESR2 gene had sex-specific effects on height, as the association was detected only in women. This can be explained in two ways. First, the association tests for height in men were statistically less powerful than those in women because the male sample (n=749) was much smaller than the female sample (n=1124). Second, biological differences in height growth exist between men and women. For instance, the physiology of pubertal growth in females differs from that in males, as it begins and ends earlier and has a lower peak velocity. In addition, adult height in women could be more influenced by in vivo estrogen status due to menarche.39

In the present study, we used both family-based and population-based methods, which have their own limitations and merits but can complement each other. Family-based analyses assess association through allele transmission from parents to children and are robust against population stratification.40 In contrast, population-based tests may provide higher power than family-based methods to detect association but may increase the false-positive rate.41 Therefore, using both methods is the optimal solution in this context. We estimated the power of our study sample with the Genetic Power Calculator program (http://pngu.mgh.harvard.edu/~purcell/gpc/qtlassoc.html) at a conservative significance level of α=0.001. Assuming incomplete LD of D=0.9, our sample can reach 90 and 80% power in men and women, respectively, under additive models to detect a QTL responsible for about 4% of the variation in height.

Although the sample we used came from a study of osteoporosis, the subjects were recruited randomly, and we excluded only subjects with bone-related diseases or other diseases influencing bone development. These exclusion criteria are consistent with an analysis of height, as abnormal stature is a basic characteristic of many bone disorders. It is thus reasonable to ignore the effect of the sample source on the analysis of height. Another potential limitation of this study is that some of our adult height data may not represent maximal adult height because of early loss of height caused by osteopenia, vertebral fractures, loss of intervertebral disc turgor and elasticity, and kyphosis. Adjusting height for age may only partly account for the differences between the adult heights of younger and older family members. However, because we lacked the data needed to adjust for environmental effects, this study represents the best we can do under present conditions.

In summary, we identified significant effects of two important genes, IGF1 and ESR2, on adult height variation in Caucasians, and we are the first to suggest a potential sex-specific effect of ESR2 on height in women. However, multiple replication studies are needed to confirm our results and to identify the most likely functional variants for molecular studies.