Introduction

Recently, migration to Korea has increased with influx of foreign workers and international marriages. The Ministry of Health & Welfare of Korea constructed a cohort study for immigrants. We, as one of leading members of the HUGO Pan-Asian SNP Consortium, reported Southeast Asia as a major geographical source of East Asian populations.1 The genetic diversity was seen to decrease from the south to the north of Asia, because of migration from ancient times.

In the past 10 years, the international marriages between Koreans and Southeast Asians have increased markedly and now comprise 11.1% of all marriages in Korea.2 In addition, the number of infants born to families of different ethnic groups represent about 15% of all newborns in Korea.3 For providing them with proper health care, it is important to understand the factors affecting their health/disease status. It is well known that disease patterns of immigrants are expected to be different from that of Koreans by virtue of adapting to different environment and/or having a different genetic background.4 The growth and development of admixture children born to a family of an international marriage may be strongly affected not only by the mixed gene pools but also by environmental factors such as economic status, cultural behavior and so on. Admixture has significant effect on patterns of various disease and complex trait, including hearing and vision problems, behavior problems, allergies5, 6 and anthropometric variation.7 Among them, height is an important factor in assessing health, and is also an indicator of growth, a component of body mass index and an easily and accurately measurable trait. It is meaningful to investigate the growth of admixed infants and early children.

Growth is a longitudinal process and the fastest gain is observed in the first year of life, followed by a period of slower growth, with another peak in puberty.8 At the end of puberty, growth velocity rapidly decreases as a result of growth plate maturation and fusion to the shaft, resulting in termination of longitudinal growth.9 The growth rate in relative and absolute terms varies substantially between these developmental stages as a result of dynamic interaction between genes and environmental factors. Most genetic studies for height have been conducted in an adult population because, although the heritability of height is dynamic, depending on age, it is stable and high in adults: >0.75 and 0.65 in Caucasians10 and Chinese,11 respectively.

Although height has been considered as a good model for identifying the genetic trait, the specific genes that affect height were not identified before the development of array-based genotyping platforms and the introduction of genome-wide association studies (GWAS).12, 13, 14, 15 GWAS on height have been reported since 2007 and our group also reported genetic variants for adult height in the Korean population and apparent ancestral differences between Koreans and Europeans.16 Recently, 54 genetic variants were known to influence the height and all of them were results of studies conducted on adults.17 However, less than half of the genetic variants associated with adult height had a measurable effect on peak height velocity in infancy or puberty in Caucasians.18 This indicates that the genetic regulation of human height may be different at different stages of growth. Therefore, it is necessary to systematically identify genes that affect height during infancy/early childhood in other ethnic groups.

The aim of this study was to explore the genetic variants of length/height during infancy/early childhood in admixed nuclear families at a genome-wide scale. In this study, an initial GWAS was conducted on 165 Vietnamese-Korean families of the International Marriage-based Immigrant Cohort in Korea, using Illumina 370K single-nucleotide polymorphism (SNP) arrays to examine the association of SNPs with length/height during infancy/early childhood.

Materials and methods

Study samples

Enrolment into the International Marriage-based Immigrant Cohort, which was supported by Korea Centers for Disease Control and Prevention, began in 2006 with recruitment of 741 immigrant women and their families (399 Southeast Asian women, 206 spouses and 136 offspring) and the total number of participants was 4779 by 2009. This cohort is an ongoing, prospective, epidemiological study. The primary purpose of the study was to follow up individuals over time to identify the genetic and environment factors affecting their health and diseases. The subjects in this report included 173 families consisting of a Korean father, a Vietnamese mother and offspring (total number of 550 individuals). The family size ranged from 3 to 5, with a mean of 3.16. Married female immigrants from Vietnam who had been in Korea for more than 6 months were eligible for the study. All offspring met the following criteria: (1) a birth weight not exceeding 2 s.d. of mean for the Korean population;19 (2) gestational age between 39 and 41 weeks, determined by reference to the last menstrual period; (3) healthy gestation (mothers were considered healthy if they had no known diabetes, hypertension, medical condition, smoking or addiction associated with impaired fetal growth); (4) availability of phenotype data and DNA for both parents. The study was approved by the institutional review board of the Ewha Womans University School of Medicine. Informed consent was obtained for all study subjects before entering the project. The study was conducted in accordance with the guidelines of the Declaration of Helsinki.

Phenotypes

For all standards involving length or height measurements, recumbent length was used for infants <2 years and standing height for children 2 years. Using a measuring mat for infants and a body meter measuring tape for children (SECA, Hamburg, Germany), a nurse took all height measurements to the nearest 0.5 cm with the children's shoes removed. Quantitative trait family-based association analyses were performed on the phenotypes for height standard deviation scores (SDS, also called Z-scores).

Genotyping

A total of 550 individuals were successfully genotyped. For those who agreed to be genotyped, genomic DNA was extracted from whole-blood samples using a commercial isolation kit (G-DEX II, Intron, Seoul, Korea) according to the manufacturer's protocols. A genome-wide scan was carried out with the Infinum Human CNV 370K-Duo BeadChip (Illumina, San Diego, CA, USA) and BeadStation500 Chip Scanner (Illumina). This array contains 317 000 HapMap phase I and II tag SNPs and 55 000 copy number variation-specific markers. Genotyping was processed using Illumina's Infinium II Assay. After completion of the assay, BeadChips were scanned with a two-color confocal BeadArray reader (Illumina). Image intensities were extracted and genotyped using Illumina's BeadStudio 3.2 software (Illumina). The overall sample success rate, SNP success rate and genotype call rate were 100% (550/550), 98.33% (341 039/346 831) and 99.79%, respectively.

Statistical analysis

As the growth reference of Korean-Vietnamese was not yet available, height measurements of cohort members were transformed into SDS for specific age and sex, according to the three growth references (Korean, World Health Organization (WHO), Vietnamese). Height SDS is the difference between height and mean height of the reference population according to the subject's age and sex, divided by the corresponding population's s.d. Using growth data (0–5 years of age) from the 2006–2008 cohort of the marriage-based immigrant cohort in Korea, height SDS values were derived for length/height, age (in months) and sex from 2005 Korean child growth standards20 and compared with those from 2009 global child growth standards21 and 2003 Vietnamese child references.22 The observed mean height SDS was −0.5. The outliers of height SDS (<−5 and >+3) were excluded according to the WHO recommendation on outlier cutoffs.23 Height SDS standardized by sex and age group was a normally distributed quantitative trait. The height SDS from the three references were compared using one-sample t-tests (SPSS 12.0, SPSS, Chicago, IL, USA). Basic characteristics of the study subjects are presented in Table 1.

Table 1 Characteristics of study subjects genotyped in 165 nuclear families

The family-based association test for quantitative height SDS was conducted using PBAT version 3.6 (http://www.biostat.harvard.edu/~clange/default.htm). FBAT analysis of PBAT was performed under the assumption of an additive model. To ensure asymptotic validity of the FBAT statistic, it was not calculated when fewer than 20 families were informative (families with at least one heterozygous parent). A quantile–quantile (QQ) plot was generated using R statistical software (http://cran.r-project.org) and visually inspected to detect strong inflation of test statistics.

To assess the extent of pairwise linkage disequilibrium (LD) between SNPs, standard D′ and r2 were calculated using Haploview software (http://www.broad.mit.edu/haploview). LD blocks were defined by the algorithm of confidence interval method by Gabriel et al.24 Default settings were used in these analyses, which invoked a one-sided upper 95% confidence bound of D′>0.98 and a lower bound of >0.7 to define SNP pairs in strong LD.

Haplotype-based analyses were performed by two methods: (1) HBAT (the haplotype version of FBAT)25 using haplotypes constructed by tag SNPs that were selected with Haploview and (2) the alternative sliding windows methodology with HaploBuild (http://snp.bumc.bu.edu/modules.php?name=HaploBuild). To calculate exact P-values in haplotype-based analysis, 100 000 permutations were determined by means of the Monte Carlo permutation procedure implemented in HBAT. HaploBuild constructs associated noncontiguous haplotypes of variable length (user-defined maximum haplotype size=4) and permitted the haplotype length to be determined by the strength of an association with a phenotype. The first step of HaploBuild, using HBAT, is testing all combinations of two-SNP haplotypes and then expanding to longer haplotypes within a defined region (50 kb).26 HaploBuild calculates a global P-value from false discovery rate (FDR) analysis using the q-value package in the R statistical language (http://genomine.org/qvalue). The cutoff for significant association at the genome-wide level was set at <0.05 for both the Monte Carlo permutation procedure and the FDR q-value.

Results

Quality control

Sample quality control was performed as a first step. A total of 21 individuals who no longer had biological relatives were excluded; the majority of these individuals were found to have an excessive number of Mendelian errors as identified by pedstats of quantitative transmission disequilibrium tests (QTDT).27 Five half-sibs were excluded. The data of infants/children in early childhood who were between 0 and 5 years of age were used for GWAS; hence, four children >5 years of age were excluded. For SNP quality control, SNPs with a minor allele frequency <0.05 in either fathers or mothers were excluded from further analysis. After data cleaning, 275 675 polymorphic SNPs were analyzed on chromosomes 1–22. A total of 5787 Mendelian error SNPs were found and genotypes of families with Mendelian error have been reset to missing by the FBAT version 2.0.2 (http://www.biostat.harvard.edu/~fbat/default.html). Finally, 165 families of 518 individuals (165 spouse pairs, 93 sons and 95 daughters) and 269 888 SNPs were used for analyses after the exclusion steps.

Data

Of the 188 children in infancy and early childhood, eight participants for whom length/height measurements were unavailable and an additional two participants whose height SDS were biologically implausible (less than −5 or greater than +3)23 were excluded. Thus, the sample for analysis consisted of 178 children in infancy and early childhood (156 families). The general characteristics of the subjects are listed in Table 1.

The raw height measurements of adults were normally distributed, but those of infants and children in early childhood were slightly skewed to the left. However, after converting to SDS, height SDS measurements in both adults and infants/children in early childhood were normally distributed. Analysis of data of infants/children in early childhood showed that subjects had significantly lower height SDS (−0.55, P<0.001) than the Korean group, but higher height SDS (+1.00, P<0.001) than the Vietnamese group. A significant difference could not be found in height SDS from that of the WHO reference group (−0.08, P=0.346). There were no significant differences in height SDS on the basis of sex among infants/children in early childhood (P=0.937, 0.359 and 0.680 for the Korean, WHO and Vietnamese groups, respectively).

GWAS on height SDS in infants/children in early childhood: single-SNP-based analysis

From a quantile–quantile (Q–Q) plot for the distribution of P-values involving 270 000 eligible SNPs in samples of this study (Figure 1), the observed P-values matched the expected P-values over the range of 1<−log10 (P)<4.0, but a few FBAT P-values were more extreme than expected at the tail [−log10 (P)>4.0].

Figure 1
figure 1

Quantile–quantile plots for the 269 888 single-nucleotide polymorphisms (SNPs) from the genome-wide association. (a) Height standard deviation scores (SDS)-Korean, (b) height SDS-World Health Organization, (c) height SDS-Vietnamese. Each black circle represents an observed P-value (defined as −log10 (P)). A linear trend is expected under the null distribution. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 2 presents the SNPs most closely associated with height SDS according to their degree of statistical significance (lowest P-value) in FBAT. Associations were compared among height SDS using Korean (KR), WHO and Vietnamese (VN) growth references. Table 2A lists the 20 SNPs most closely associated with height SDS-KR. The minor allele frequencies for SNPs show an ethnic difference that was similar among Asian ethnic groups but was evidently different in Caucasians. Overall, 13 SNPs yielded a P-value <10−4 in FBAT for height SDS. The SNPs most closely associated with the height trait include rs4255073, rs1121066 and rs5000370 in MAGI2 (membrane-associated guanylate kinase, WW and PDZ domain containing 2); rs7498403 near musculoaponeurotic fibrosarcoma oncogene homolog); and rs4901444 in BMP4 (bone morphogenetic protein 4). Table 2B displays the association of trait with height SDS-WHO. Nineteen SNPs yielded a P-value <10−4 in FBAT. The five most closely associated SNPs included rs241882 in SYN3; rs9396245 and rs9370550 in C6orf65; and rs1121066 and rs5000370 in MAGI2. Table 2C shows that rs1121066, rs5000370 and rs4255073 in MAGI2; rs4078258 in PPFIBP2; and rs7592571 in ALK were associated with height SDS-VN. Only one of these SNPs had a P-value <10−5 (rs749843, P=4.8 × 10−6) in FBAT for all growth references.

Table 2 Summary of association results for height SDS in single-SNP-based analysis

Four genes were found to be overlapped by comparing the suggestive SNPs between the three panels of references for height SDS. MAGI2 (rs4255073, rs1121066 and rs5000370) and BMP4 (rs4901444) ranked as highly suggestive in the top 20 associations with height SDS for the three growth references. MAF (rs7498403) and PTPN7, protein tyrosine phosphatase, nonreceptor type 7, (rs10920338), were associated with both height SDS-KR and SDS-WHO. After Bonferroni's adjustment (significance level: 0.05/269888=1.85 × 10−7) and the FDR method for multiple comparisons, the SNPs with significance could not be found.

Haplotype-based analysis

For height during infancy/early childhood, the highly suggestive genes were MAGI2, MAF, BMP4 and PTPN7 for two or three references among three height SDS references in single-SNP testing. The analysis identified 240 blocks with high LD across the four genes, which confirmed the association at the individual SNP level. These blocks can be represented by the 354 tag SNPs (tSNPs) selected by Haploview. SNPs that could not be assigned to any of the blocks containing more than one SNP were considered as a separate block of size 1 (there were 145 such orphan SNPs in these data). Figure 2 shows the LD pattern and haplotype block structure for SNPs in the region of suggestive genes for height; results are based on single-SNP association analyses. These SNPs were in consistently strong LD and formed a single perfect haplotype block. Within each highly suggestive gene, multiple markers showed an exact P-value <0.05 after Monte Carlo 100 000 permutation testing, with the most significant markers showing P<0.0001 (Table 3). Among the genes for height variation, association results of haplotype blocks defined by LD structures were similar to those revealed by analysis of single SNPs.

Figure 2
figure 2figure 2

Association results for height standard deviation scores (SDS)-Korean using single-single-nucleotide polymorphism (SNP) markers and haplotype windows near suspected genes. (a) MAGI2, (b) MAF, (c) BMP4, (d) PTPN7. The top sector is the graph showing the −log10 (P) significance values of individual SNPs and haplotypes. SNP results (triangles) are FBAT P-values without multiple test correction; tag SNP haplotype results (red bold line). Exact P-values from HBAT analysis were calculated after 100 000 permutations. Only haplotypes with significant evidence (−log10 (P)>1.3, that is, P<0.05; gray dashed line) were plotted; noncontiguous haplotype results (diamond-point line). Empirical P-values were false discovery rate (FDR) q-values. The middle sector is the exon structure of the gene. The linkage disequilibrium (LD) block structures, as depicted by Haploview, are shown at the bottom of the graph. The colors from white to black represent increasing strength of LD. Values for D′>0.9 are dark black boxes and values for D′<0.9 (indicated as original value × 100) are shown as gray and white boxes in the cells. Tag SNPs are marked in front of the rs number by an asterisk (*). The SNPs that had suggestive association in the single-SNP-based analyses were underlined by a solid line. Vertical dashed lines located in the middle sector below the magnified blocks represent regions that were shown to have a suggestive association in the single-SNP-based analysis. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 3 Association of haplotypes using tag SNP with height SDS in HBAT analysis

As MAGI2 is a large gene, 129 blocks were identified, including 72 orphan SNPs with high LD, which ranged in size from 13 bp to 112 kb. Although blocks 11, 26, 33, 34, 35 and 36 for height SDS-KR and blocks 11, 33, 35 and 36 for height SDS-WHO were significant (exact P-value <0.05), only block 40 was significant for all three height SDS values. Haplotype 1 of block 40 showed significant association signals overall (two-sided P=0.0010, 0.0002 and 0.0151 for height SDS-KR, SDS-WHO and SDS-VN, respectively); however, after permutation testing, SDS-KR and SDS-WHO remained significant (exact values of P=0.0044 and 0.0008 for height SDS-KR and SDS-WHO, respectively). The haplotype designed using three SNPs of MAGI2 that had suggestive association in the single-SNP-based analyses showed significant association for all three height SDS (exact values of P=0.0040, 0.0005 and 0.0099 for height SDS-KR, SDS-WHO and SDS-VN, respectively). SNP rs7498403 (MAF) had little LD with any of the other SNPs, yet it was still counted as a haplotype tSNP for simplicity. SNP rs7498403 was suggestive (P=4.8 × 10−6 and P=5.3 × 10−5 for height SDS-KR and SDS-WHO, respectively) for height in the single-marker association but was significant as tSNP at haplotype analysis (exact value of P<0.0001 for the two height SDS references, respectively). The other haplotypes of the MAF gene were not significant for any of the three height SDS references. The associations with height were detected within LD blocks of the BMF4 gene. SNP rs4901444 of the adjacent block 7 had a suggestive association with height SDS-KR, SDS-WHO and SDS-VN, whereas it was significant as a tSNP (exact values of P <0.0001, <0.0001 and 0.0491 for height SDS-KR, SDS-WHO and SDS-VN, respectively). Although SNP rs4901444 was not included in block 7, the haplotype was analyzed because this SNP was located adjacent to block 7 and had high LD (D′=1) with rs8009949 of block 7. The haplotype of rs8009949 and rs4901444 represented a significant association for height (exact P=0.0004, 0.0002 and 0.0272 for height SDS-KR, SDS-WHO and SDS-VN, respectively). The haplotypes constructed with all four tSNPs in block 7 showed significant association signals for only height SDS-KR (exact P=0.0064). SNP rs10920338 (PTPN7) also had a low suggestive association with height (P=1.15 × 10−4 and 1.01 × 10−4 for height SDS-KR and SDS-WHO, respectively), whereas it was significant as a tSNP (exact value of P=0.0002 and 0.0001 for height SDS-KR, SDS-WHO and SDS-VN, respectively). The associations between PTPN7 and height SDS-VN were not observed in the single-SNP-based and haplotype-based analyses.

The algorithm HaploBuild was also used to construct noncontiguous associated haplotypes in the family-based genetic studies. The results of the noncontiguous haplotype analysis with SNPs of suspected loci are shown in Table 4 and Figure 2. Table 4 lists the most closely associated haplotypes (lowest P-value), including SNPs suggestive of an association with height in the single-SNP analysis. Haplotypes constructed with HaploBuild were distributed in a range near the suggestive SNPs or significant haplotypes using tSNP for infancy/early childhood height. For MAGI2, haplotypes were distributed across the gene; for MAF, noncontiguous haplotypes were located at the 3′ untranslated region (UTR), including SNP rs7498403 and 5′ UTR; for BMP4, all haplotypes were distributed in a range at the 3′ UTR within 150 kb of SNP rs4901444. Similar to the results of the contiguous haplotype analysis using tSNP, noncontiguous haplotypes of variable length also showed significant association (FDR q<0.05) for the trait near the suspected loci.

Table 4 Association of noncontiguous haplotypes with height SDS-KRa in HBAT analysis

Discussion

This study used data of infants/children in early childhood who were between 0 and 5 years of age because most of our samples consisted of infants <2 years of age and sample size of 3- to 5-year-old children was insufficient to analyze separately. The correlation between parent and offspring was calculated using additional phenotype data of the same population of our cohort before GWAS and compared according to age group. There is no significant difference in correlation between parent and offspring between the groups of 0–2 and 3–5 years of age (data not shown); hence, the subjects of 0–5 years of age were combined for analyses.

Our subjects are families with members from two different countries: Korean father and immigrant Vietnamese mother. The stratification or structure of the population-based association study usually affects the results. However, the family-based association study provides robustness to the effect of the admixture. The FBAT P-values for height among infants/children in early childhood tended to be less significant than expected, especially the smaller than expected P-values in the QQ plot. This is likely due to the limited statistical power to detect small and modest effects given the small sample size of this study. In addition, as FBAT analysis uses only informative families, the sample size used in the analysis was even smaller. Results of the Framingham heart study were similar to situations of the present study, and the authors reported that FBAT analysis was not significant because it was limited by the number of informative transmissions.28

The results of haplotype GWAS using Monte Carlo permutation were significant after correction. Haplotype-based algorithms were developed to cope with multiple comparison problems in GWAS at multiple SNP loci in LD.29 Permutation testing is another approach to confront the problem of multiple comparisons,30 because this method accounts for the exact correlation between tests and does not overcorrect by assuming that all tests are independent; hence, it is more powerful than family-wise type I error and FDR methods of correction. Recent reports suggest that the application of both haplotype-based and individual-SNP testing to GWAS should be adopted as a routine procedure.31, 32, 33

Most studies of genetic variants of height have been conducted in adult populations, but those for infants/children in early childhood are just a few.18 Recent studies using GWAS for adults have identified 54 loci robustly associated with adult height variation, which include genes involved in chromatin structure (HMGA1, HMGA2, DOT1L, two histone clusters, SCMH1), extracellular matrix proteins that form bone and cartilage (ACAN, FBLN5, EFEMP1, ADAMTS17, ADAMTSL3), BMP signaling (NOG, GDF5, BMP2, BMP6), cell-cycle regulation (CDK6, CABLES1, ANAPC13, NCAPG), hedgehog signaling (IHH, HHIP, PTCH1) and microRNA (HMGA2, CDK6, DOT1L, LIN28B, PAPPA).34 Sovio et al.18 were the first to show the genetic association for longitudinal height growth in a prospective cohort consisting of subjects in stages from birth to adulthood; their results showed that less than half of the genetic variants associated with adult height had a measurable effect on peak height velocity during infancy or puberty.

There was no overlap between previously reported genetic variants of adult height or longitudinal height in GWAS and those from this study for infant/early childhood height. The six SNPs identified in this study were mainly associated with embryogenesis and development. The biological function of MAGI2 is most likely the assembly of a macromolecular structure composed of two types of activin receptor and an intracellular signaling molecule for activin signal transduction through the activin/transforming growth factor-β signaling pathway.35 MAF was specifically expressed in late hypertrophic and terminal chondrocytes, as well as in osteoblasts.36 Total long-bone length was reduced in embryos of c-MAF-null mice compared with wild-type mice. In a 4-week postnatal c-MAF-null mouse, which is almost similar to a 2-year-old child,37 the hypertrophic chondrocyte domain was expanded in length approximately threefold compared with control littermates. This suggests that MAF may regulate bone growth as it initiates terminal differentiation and completes the chondrocyte differentiation program and also influences the height of infants and early children, as well as development of embryos. Several associated SNPs were identified in or near genes related to the transforming growth factor-β signaling pathway, which has a critical role in the regulation of cell growth, differentiation, development, apoptosis and fate in a wide range of biological systems. BMP4 is part of the transforming growth factor-β superfamily38 and has an essential role in osteoblast differentiation, bone formation and fracture healing.39 BMP2 and BMP6 were reported to be associated with adult height in a previous GWAS.40 Both genes are highly conserved, with over 90% similarity in mature proteins.41 There is also a possible biological explanation for height in infants/children in early childhood by PTPN7. PTPN7 is a member of the PTP family. PTPs are known to be signaling molecules that regulate various cellular processes including cell growth, differentiation, mitotic cycle and oncogenic transformation. The SNPs mentioned above might influence the growth and height in infants/children in early childhood as a vestige of fetal development. The ultimate validation of these findings will require replication in other cohorts and functional studies. These results may serve as an initial resource for future replication studies or meta-analysis.

In conclusion, this study identified previously unsuspected loci associated with height in infants/children in early childhood in admixture Vietnamese-Korean families. Most signals associated with height in infants/children in early childhood in this study lie close to genes that are involved in embryogenesis, development and bone growth through growth factor signaling pathways. These results suggest that length/height in infants/children in early childhood might be influenced by genetic variations and regulations that are somewhat different from those affecting adult height. This study might be the first genetic association analysis on cross-sectional height of infants/children in early childhood in an admixture family cohort.