Introduction

The numbers of female and male offspring born in a diploid population should theoretically be equal. In reality, however, the human sex ratio at birth differs from one. For example, sex ratios of European populations have been fairly consistent, with the male-to-female ratio ranging from 1.02 to 1.08.1, 2, 3 Sex ratio can be affected by differential survival time of X-bearing and Y-bearing sperm under a variety of conditions, including gonadotropin/testosterone concentration at conception, ovulation induction, paternal age, parity, birth order, coital rates, infertility, parental illness, maternal malnutrition/smoking/exposure to toxic agents and stress.2, 4, 5, 6, 7 Many genetic factors are suspected to underlie the endogenous conditions. In fact, the sex ratio has been identified as paternally inherited in previous studies, in which men with more brothers had more sons and men with more sisters had more daughters.8, 9, 10 This may not be attributable to any Y chromosomal genes because the selection bias towards producing sons may be accelerated by constraining daughter-producing alleles from being inherited. Autosomal genes may be involved in sex ratio, as shown in a simulation study where sex ratio was kept in equilibrium by frequency-dependent selection acting on the inheritance of polymorphic autosomal genes expressed in the male reproductive system.10

A meta-analysis across 51 genome-wide association studies (GWASs) with Caucasian populations was conducted to identify common autosomal variants influencing sex ratio.11 The study, however, did not reveal any nucleotide variants significantly associated with sex (P>5 × 10⁻⁸). This might have resulted from the dilution of genetic effects by combining populations under different environmental conditions that interacted with genetic effects. We therefore conducted a GWAS to examine genetic associations of common autosomal variants with sex in a large cohort of Koreans.

Materials and methods

Subjects and genotypes

This study used data collected from community-based cohorts by the Korean Association REsource (KARE) project. The cohorts included a total of 10 038 Korean individuals born from 1931 to 1963. Ethical approval was obtained from the Institutional Review Board of the Korea National Institute of Health, and all participants provided written informed consent after the aims and nature of the study were disclosed. All subjects were genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0.12 Bayesian Robust Linear Modeling with the Mahalanobis Distance genotyping algorithm was used to call genotypes.13 After excluding subjects with a genotype call rate below 95%, sample contamination, gender inconsistency, cryptic relatedness or serious concomitant illness, 8842 subjects (4183 males and 4659 females) remained. A series of quality control procedures was also conducted to remove nucleotide variants with a subject call rate below 95% or with a minor allele frequency below 0.05. Variants in Hardy–Weinberg disequilibrium (P<0.001) in both females and males were also excluded. The final data set comprised 1 400 289 nucleotide variants, including 1 118 622 variants imputed using filtered haplotypes of 170 Japanese and Chinese HapMap founders (HapMap 3 phase 2; http://hapmap.ncbi.nlm.nih.gov/) as a reference panel.
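The variant-level filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which Hardy–Weinberg test was used, so a 1-df chi-square test is assumed here, and the function names and example genotype counts are hypothetical.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)

def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg proportions
    (assumed test; the source does not name the one it used)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic variant: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(males, females, call_rate,
              min_call=0.95, min_maf=0.05, hwe_p=0.001):
    """Apply the call-rate, MAF and sex-specific HWE filters to one variant.

    males and females are (n_aa, n_ab, n_bb) genotype counts per sex.
    A variant is dropped for HWE only if it deviates in BOTH sexes.
    """
    if call_rate < min_call:
        return False
    pooled = tuple(m + f for m, f in zip(males, females))
    if minor_allele_frequency(*pooled) < min_maf:
        return False
    deviates_in_males = hwe_chi_square_p(*males) < hwe_p
    deviates_in_females = hwe_chi_square_p(*females) < hwe_p
    return not (deviates_in_males and deviates_in_females)
```

Note that a variant deviating from Hardy–Weinberg proportions in only one sex is retained by this rule, which is what allows sex-specific deviations to surface in the association results.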

Association analysis

Genetic associations were analyzed using a mixed model that employed a genomic relationship matrix for random polygenic effects to account for population stratification14 and incorporated the leaving-one-chromosome-out approach to avoid underestimation of genetic association,15 as follows:

$$Y = \mu \mathbf{1} + \beta x + g + \varepsilon$$

where Y is the vector of dichotomous phenotypes (sex); μ is the overall mean; 1 is the vector of 1's; β is the fixed effect for the minor allele of the candidate nucleotide variant; and x is the vector with elements of 0, 1 and 2 for the homozygote of the minor allele, heterozygote and homozygote of the major allele, respectively. g is the vector of random polygenic effects using the genome without the chromosome housing the candidate variant, $g \sim N(0, G\sigma_g^2)$, where G is the n × n genomic relationship matrix without one chromosome, and $\sigma_g^2$ is the polygenic variance component that should be re-estimated whenever a specific chromosome is excluded from the calculation of the genomic relationship matrix. The genomic relationship matrix consists of pairwise genetic relationship coefficients, which were estimated using genotypes of variants in linkage equilibrium (r² < 0.8) as follows:

$$G_{kl} = \frac{1}{\sum_{i=1}^{N_c} N_m} \sum_{i=1}^{N_c} \sum_{j=1}^{N_m} \frac{(n_{ijk} - 2P_{ij})(n_{ijl} - 2P_{ij})}{2P_{ij}(1 - P_{ij})}$$

where Nc is the number of autosomes (=22), Nm is the number of variants in the ith autosome, Pij is the frequency of the minor allele at the jth variant in the ith autosome, and nijk (nijl) is the number (0, 1 or 2) of minor alleles at the jth variant in the ith autosome for the kth (lth) individual. ɛ is the vector of random residuals, $\varepsilon \sim N(0, I\sigma_\varepsilon^2)$, where $\sigma_\varepsilon^2$ is the residual variance component and I is the n × n identity matrix. The polygenic and residual variance components were estimated using restricted maximum likelihood: they were first estimated by expectation maximization-restricted maximum likelihood, and these estimates were then used as initial values to obtain the average information-restricted maximum likelihood estimates. The fixed effect was then solved with the estimated variance components under the mixed model equations. The statistical analysis was conducted using the Genome-wide Complex Trait Analysis (GCTA) freeware.16
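As a rough illustration of how pairwise relationship coefficients of this kind can be computed, the sketch below averages standardized minor allele counts over variants and optionally leaves one chromosome out. It assumes the commonly used standardized form of the coefficient and is not the GCTA implementation; the function name, argument layout and the handling of monomorphic variants are hypothetical choices made for the example.

```python
def genomic_relationship_matrix(genotypes, chromosomes, exclude_chromosome=None):
    """Build an n x n genomic relationship matrix.

    genotypes[j][k]  : minor allele count (0, 1 or 2) of individual k at variant j
    chromosomes[j]   : autosome label of variant j
    exclude_chromosome: label to skip (leaving-one-chromosome-out), or None
    """
    n = len(genotypes[0])
    grm = [[0.0] * n for _ in range(n)]
    used = 0
    for counts, chrom in zip(genotypes, chromosomes):
        if chrom == exclude_chromosome:
            continue
        p = sum(counts) / (2 * n)          # minor allele frequency in the sample
        scale = 2 * p * (1 - p)            # expected variance of the allele count
        if scale == 0:
            continue                       # skip monomorphic variants
        centered = [c - 2 * p for c in counts]
        for k in range(n):
            for l in range(k, n):
                grm[k][l] += centered[k] * centered[l] / scale
        used += 1
    if used == 0:
        raise ValueError("no polymorphic variants left after exclusion")
    # Average over the variants used and mirror the upper triangle.
    for k in range(n):
        for l in range(k, n):
            grm[k][l] /= used
            grm[l][k] = grm[k][l]
    return grm
```

When testing a variant on a given chromosome, the matrix would be rebuilt with `exclude_chromosome` set to that chromosome, so the candidate variant and its neighbours do not also enter the polygenic term.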

Linkage disequilibrium blocks were constructed to determine whether multiple variants associated with sex represented a single association signal. The linkage disequilibrium blocks were estimated at association signals using Haploview.17
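To give a feel for the pairwise quantity behind such blocks, the sketch below computes r² as the squared correlation between allele counts at two variants. This is a simple genotype-based approximation, not the haplotype-based estimation performed by Haploview; the function names are hypothetical, and the 0.8 default is an assumed cut-off for treating two variants as one signal.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between allele counts (0, 1, 2) at two variants."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return sxy * sxy / (sxx * syy)

def same_signal(x, y, threshold=0.8):
    """Treat two variants as one association signal if r2 exceeds the threshold."""
    return genotype_r2(x, y) > threshold
```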

Replication analysis

Genetic associations with sex ratio were further examined in an independent Korean population to validate the associations revealed in the current GWAS. This population included 935 subjects recruited from routine health checkups at Hallym University Hospital.18, 19 Genomic DNA was extracted from their peripheral blood cells using the QIAamp DNA blood mini kit (Qiagen, Hilden, Germany). Nucleotide variants were genotyped using the TaqMan polymerase chain reaction assay (Applied Biosystems, Foster City, CA, USA). All reactions were carried out following the manufacturer’s protocol, and the reaction products were analyzed using the ABI PRISM 7900HT (Applied Biosystems). Genetic associations were determined by Fisher’s exact test under the assumption of an additive contribution of the minor allele. Bonferroni correction was applied to account for multiple testing.
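A minimal sketch of such a test is given below. It assumes that the additive model is represented by a 2 × 2 table of allele counts (minor versus major allele, men versus women); the text does not specify the table layout, so this is one plausible reading rather than the exact procedure used. The function names are hypothetical.

```python
from math import comb

def allele_counts(n_major_hom, n_het, n_minor_hom):
    """Convert genotype counts into (minor allele count, major allele count)."""
    minor = 2 * n_minor_hom + n_het
    major = 2 * n_major_hom + n_het
    return minor, major

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```

For one variant, the table would be filled with `allele_counts(...)` for men in the first row and for women in the second row before calling `fisher_exact_two_sided`.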

Results

The genome-wide association analysis revealed 14 single-nucleotide polymorphisms associated with sex (P<5 × 10⁻⁸; Figure 1; Table 1). After excluding single-nucleotide polymorphisms in strong linkage disequilibrium (r²>0.8; Figure 2), nine association signals were identified for sex ratio distortion (Table 1). Five of the signals showed a lower minor allele frequency in men than in women. They all deviated from Hardy–Weinberg equilibrium (HWE) in men (P<10⁻³), but not in women (P>10⁻³), except for one signal (rs3013386), which deviated in women (P<10⁻³), but not in men (P>10⁻³). The other four association signals showed a lower minor allele frequency in women than in men. They all deviated from HWE in women (P<10⁻³), but not in men (P>10⁻³).

Figure 1

Manhattan plot for genome-wide association with sex by autosomal position. Red horizontal line indicates genome-wide significance threshold value (P=5 × 10⁻⁸). A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 1 Nucleotide variants associated with sex ratio in the genome-wide association study of a Korean population sample

Figure 2

Linkage disequilibrium (LD) blocks with nucleotide variants in proximity of association signals for sex ratio. Pair-wise linkage disequilibrium estimate (r²) is presented in each cell. Red box indicates nucleotide variant with the most significant association within each LD block. (a) chromosome 1, (b) chromosome 10 and (c) chromosome 15. A full color version of this figure is available at the Journal of Human Genetics journal online.

Of the nine association signals, three were located in genes encoding protein phosphatase 1, regulatory subunit 12B (PPP1R12B, intron 12, rs1819043); dynein, axonemal, heavy chain 11 (DNAH11, intron 61, rs10255013); and ornithine aminotransferase (OAT, intron 6, rs11244715). The rs9788182 on chromosome 12 was an intronic variant within non-coding genes (LOC105369897, LOC105369897). The others were all intergenic variants.

Further genetic association analysis using an independent Korean population confirmed the associations of sex ratio with the intronic variants of the PPP1R12B and DNAH11 genes (P<5.56 × 10⁻³; Table 2). The PPP1R12B variant deviated from HWE in men (P<0.05), but not in women (P>0.05). The DNAH11 variant deviated from HWE in women (P<0.05), but not in men (P>0.05). These sex-specific deviations from HWE were consistent with the results of the genome-wide association analysis.
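The significance threshold quoted here is consistent with a Bonferroni correction of a 0.05 level over the nine association signals carried forward from the GWAS. The short sketch below reproduces that arithmetic; the 0.05 level and the count of nine tests are assumptions inferred from the text, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Assumed inputs: family-wise level 0.05, nine association signals tested.
threshold = bonferroni_threshold(0.05, 9)
print(f"{threshold:.2e}")  # prints 5.56e-03
```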

Table 2 Validation of nucleotide variants associated with sex ratio using an independent data set

Discussion

The current GWAS identified nine common autosomal variants associated with sex ratio in a large cohort of Koreans. This implies sex-specific selection against particular common autosomal genetic variants. The mixed model analysis showed that autosomal genetic variants might explain 7.5% of phenotypic variability as polygenic effects, suggesting that such sex-specific selection may not be negligible. Furthermore, the associations of sex ratio with the intronic variants of the DNAH11 and PPP1R12B genes were confirmed using an independent data set. Genes harboring variants associated with sex ratio are potential candidates for sex ratio distortion. In particular, DNAH11 and PPP1R12B are involved in the reproductive system. DNAH11 is used for the genetic diagnosis of patients with primary ciliary dyskinesia without ciliary ultrastructural defects,20 who show defects in the action of the flagella of sperm cells.21 PPP1R12B engages in protein phosphorylation, which might be influential in reproductive systems. Percoll fractionation analysis of human sperm showed that their morphology, motility and hyperactivation were associated with protein tyrosine phosphorylation (P<0.05).22, 23 A gene enrichment analysis revealed protein phosphorylation to be associated with father-specific transmission distortion in a Yoruba population in Nigeria.24 Mouse sperm function was shown to be regulated by a gene (protein phosphatase 1 gamma 2, PP1γ2) with a role in protein phosphorylation.25 Furthermore, a considerably high level of PPP1R12B transcription was observed in the reproductive system.26

The association signals resulting from this study should be interpreted with caution. We could not exclude the possibility that the associations reflect differential survival by sex after birth, because the current study dealt with the male-to-female sex ratio at ages from the 40s to the 60s rather than at birth. The results may therefore reflect differential survival from birth to the age range examined, although we presumed that such a difference may not be considerable. Further investigations of sex ratio at birth are needed to distinguish differential survival before and after birth.

No genetic associations (P>5 × 10⁻⁸) with sex ratio were discovered in a previous GWAS in which a large-scale meta-analysis was conducted with 51 Caucasian cohorts.11 False negatives might have resulted from the conservative Bonferroni correction for multiple testing. Furthermore, confounding might have been introduced by combining multiple studies of populations with different genetic backgrounds under different environmental conditions. Also, the smaller sex ratio in Caucasians than in Asians1 might make genetic associations more difficult to identify in studies of Caucasians.

The current association signals were identified in Korean populations. Nevertheless, this does not imply that the results are limited to Koreans. Association studies in populations similar to Koreans (e.g., Japanese, Chinese and other Asians) would help establish the applicability of the signals.

Heterogeneity in the genetic variance of a particular trait by sex might be attributed partially to sex-specific selection against common genetic variants that are also associated with the trait. A larger minor allele frequency in one sex may produce a larger genetic variance than in the other sex. Such heterogeneity would be reduced by excluding variants that deviate from HWE. For example, while genetic variances explained by nucleotide variants showed heterogeneity (P<0.05) by sex for body mass index, waist-to-hip ratio, pulse pressure, high-density lipoprotein cholesterol, triglyceride, low-density lipoprotein cholesterol and fasting glucose level, heterogeneity in genetic variances explained by variants in HWE was observed only for body mass index and triglyceride (P<0.05).27

Sex-specific selection against particular genetic variants might be a plausible explanation for the unequal sex ratio in humans, although it is difficult to explain the 1.02–1.08 sex ratio with the current findings. This study showed female-specific selection at the DNAH11 association signal and male-specific selection at the PPP1R12B association signal, with different effect sizes. A plausible scenario is that the sex ratio is maintained by a balance between female-specific and male-specific selection, but is not exactly equal to 1:1.

In conclusion, the current study revealed two novel genetic association signals influencing sex ratio. This implies that genetic variants in proximity to the association signals might influence sex ratio. Fine mapping is needed to track down the causal genetic variants responsible for sex ratio distortion. Additional studies of rare and sex-chromosomal variants would help to clarify the genetic architecture of sex ratio distortion. Further studies are necessary to reveal the mechanisms underlying sex ratio distortion.