Introduction

Endometriosis is an estrogen-dependent disorder observed in 5–10% of women of reproductive age1 and in 20–50% of women with infertility.2 It is characterized by the presence of endometrial glands and stroma outside the uterine cavity, primarily on the pelvic peritoneum and ovaries.3 The main pathological processes associated with the disease are peritoneal inflammation and fibrosis, and the formation of pelvic adhesions and ovarian cysts.4 The common symptoms include dysmenorrhea, dyspareunia, noncyclic pelvic pain and infertility.1, 5

Endometriosis is a common complex disease caused by genetic and environmental factors, including their interactions.4 There is ample evidence that genetic factors have a substantial role in disease risk.6, 7, 8, 9, 10 Its incidence in relatives of affected women in the USA is up to seven times the incidence in women without a family history of endometriosis.11 The relative risk of endometriosis in female siblings of case subjects is estimated to be 5.7 in the Japanese population.12 In addition, twin studies have provided evidence of genetic risk, showing higher concordance rates for endometriosis in monozygotic twins than in dizygotic twins.13, 14 The heritability of human endometriosis was estimated at 51%.9, 10, 13, 14

Genetic components underlying endometriosis have been examined directly by a genome-wide linkage study with a large set of families. A total of 1176 affected sib-pairs were recruited in the Australian and the UK groups, providing sufficient statistical power (80%) to detect a locus with a sibling relative risk of 1.3 using nonparametric linkage analysis. The analysis of the Australian and the UK families demonstrated significant linkage to chromosome 10. When more stringent criteria were applied to the endometriosis families (three or more affected individuals in a family), nonparametric and parametric linkage analysis identified 7p13–15 as a susceptibility locus with near-Mendelian inheritance.15 However, identification of the causal genes at these loci has not been successful thus far.3, 4, 16

An alternative approach to identifying susceptibility genes for common complex diseases such as endometriosis is genome-wide association (GWA) analysis using high-density single nucleotide polymorphism (SNP) arrays, which has recently been exemplified by a number of groups.17 In this study, we carried out a meta-analysis of two GWA studies to search for novel genetic contributors underlying endometriosis in the Japanese population.

Materials and methods

Study subjects

All women with endometriosis were registered at the Niigata University Hospital, the Nagasaki University Hospital, the Kumamoto University Hospital, the Takarazuka City Hospital, the National Hospital Organization Kyoto Medical Center and several hospitals in Niigata, Toyama and Yamagata Prefecture (for details, see Acknowledgements). The affected subjects comprised 728 Japanese women who fulfilled any of the following diagnostic criteria: (i) women who underwent laparotomy or laparoscopic surgery for diseases other than endometriosis, with the procedure also providing biopsy-proven evidence of endometriosis, (ii) women who were verified to have endometriosis by diagnostic laparoscopy and (iii) women who were diagnosed as having ovarian cysts by diagnostic imaging. The mean ages for the 500K and 6.0 array cohorts were 31.1±5.8 years and 34.9±7.9 years, respectively. The clinical characteristics of all the cases studied are summarized in Table 1.

Table 1 Characteristics of women with endometriosis

Control samples for association analysis comprised 834 Japanese women from various resources as follows: (i) 96 fertile women or those with benign gynecological tumors, with no history of endometriosis diagnosed at the Niigata University Hospital (age, 37.1±7.4 years, mean±s.d.), (ii) 241 Affymetrix CEL files (including per-probe intensity values) from the Japanese Integrated Database Project,18 each of which had passed 93% call-rate threshold (GeneChip Human Mapping 500K array), or the 86% threshold (Genome-Wide Human SNP array 6.0) with the Dynamic Model algorithm to generate an initial quality control (QC) call rate (for 500K array cohort, panic disorder control cohort (n=81 females),19 multiple system atrophy control cohort (n=77 females), control database cohort (n=17 females) and for 6.0 array cohort, control database cohort (n=66 females)20) and (iii) Genotype count data for 906 703 SNPs genotyped with 6.0 arrays from the Japanese Integrated Database Project (for 6.0 array cohort, late-onset Alzheimer's disease control cohort (n=497 females)) (Table 2).

Table 2 Sample quality control process

Genomic DNA was extracted from peripheral blood lymphocytes per sample using a QIAamp DNA Blood Maxi Kit (QIAGEN, Tokyo, Japan) according to the manufacturer's protocol. DNA samples used in this study were derived from individuals after they had given written informed consent. The study protocol was approved by the ethical committees of the University of Niigata and the affiliated hospitals.

Genotyping

Genotyping was conducted on two Affymetrix platforms, the GeneChip Human Mapping 500K array or the Genome-Wide Human SNP array 6.0 (Affymetrix, Santa Clara, CA, USA), according to the manufacturer's instructions. In this study, 411 subjects (315 cases and 96 controls with fertility or no endometriosis) were newly genotyped with 500K arrays, and 413 subjects (413 cases) were genotyped with 6.0 arrays (Table 2). Genotype calls were determined with the Bayesian Robust Linear Model using Mahalanobis distance classifier (BRLMM) algorithm21 for 500K arrays or the Birdseed v2 algorithm for 6.0 arrays, embedded in Affymetrix Genotyping Console 3.0.1 (Affymetrix).

Sample QC

The following QC filters were applied to exclude samples in each array cohort (Table 2). We excluded from analysis samples that showed >5% missing genotypes (per-individual call rate <95%) or that were outliers with respect to genome-wide heterozygosity (<21% or >30% heterozygous SNP rate); excess heterozygosity could be attributable to DNA contamination.22
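The two per-sample filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the genotype coding (0, 1 or 2 copies of one allele, None for a no-call) and the function names are hypothetical, and the heterozygous SNP rate is taken here as the fraction of called genotypes that are heterozygous.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of SNPs with a non-missing genotype call."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    return sum(g is not None for g in genotypes) / len(genotypes)


def heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(g == 1 for g in called) / len(called)


def passes_sample_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,
    het_range: tuple = (0.21, 0.30),
) -> bool:
    """Keep a sample only if its call rate is at least 95% and its
    heterozygous SNP rate lies between 21% and 30% inclusive."""
    if call_rate(genotypes) < min_call_rate:
        return False
    low, high = het_range
    return low <= heterozygosity(genotypes) <= high
```

A sample is dropped as soon as either criterion fails, which matches the exclusion rule as worded in the text.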

Samples showing cryptic relatedness were excluded. Cryptic relatedness was examined by pairwise identity-by-state analysis with gPLINK23 using nearly independent SNPs, among which no pair was correlated with r2>0.2. We estimated the degree of relatedness for each pair of samples, and identified duplicates and first- and second-degree relative pairs based on pairwise identity-by-descent sharing. We kept only one subject from each inferred set of duplicate or related samples. Samples that were population outliers were also excluded. Because population substructure is known to exist in the Japanese population,24 we performed multidimensional scaling analysis with gPLINK on our cohort together with 90 HapMap samples (45 JPT and 45 CHB samples) as references, to provide a two-dimensional projection of the data based on pairwise identity-by-state distances. After removal of samples that were clearly separate from the main cluster of our cohorts and HapMap JPT samples, we performed a second multidimensional scaling analysis using only the present cohort samples. In this secondary analysis, samples lying >4 s.d. from the mean along either of the two scaling axes were regarded as population outliers and excluded.
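The final outlier rule (more than 4 s.d. from the mean on either scaling axis) can be illustrated as follows. This is a minimal sketch that assumes the two-dimensional coordinates have already been computed elsewhere (for example, exported from the multidimensional scaling step); the function name and the use of the sample standard deviation are assumptions made for the example.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def mds_outliers(coords: Sequence[Tuple[float, float]], n_sd: float = 4.0) -> List[int]:
    """Return indices of samples lying more than n_sd standard deviations
    from the mean along either of the two scaling axes."""
    flagged = set()
    for axis in range(2):
        values = [c[axis] for c in coords]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # no spread on this axis, so nothing can be an outlier
        for i, v in enumerate(values):
            if abs(v - mu) > n_sd * sd:
                flagged.add(i)
    return sorted(flagged)
```

Because the mean and standard deviation are computed with the outlier included, a single extreme sample can only exceed 4 s.d. when the cohort is reasonably large, which is the case for the cohorts described here.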

After the above sample QC, genotype count data for 497 women, which had been independently quality controlled in late-onset Alzheimer's disease cohort (R. Kuwano, personal communication), were added to perform the following SNP QC (Table 2).

SNP QC

Data underwent QC in each array cohort, and only SNPs fulfilling the following criteria were included: (i) minor allele frequency ≥0.01 in cases and controls, (ii) per-SNP missing rate <4% (for 500K array) or <2% (for 6.0 array) in cases and controls, (iii) P-value of the exact test of the missing rate difference between cases and controls ≥0.05 and (iv) P-value of the exact test of Hardy–Weinberg equilibrium ≥10−5 in cases and controls (Table 3). For these calculations, we used R statistical environment version 2.9.0 (http://www.r-project.org).
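The four inclusion criteria can be combined into a single predicate, sketched below in Python. Several things here are assumptions for illustration only: the MAF and P-value thresholds are read as lower bounds for inclusion, criteria stated "in cases and controls" are applied to each group separately, the exact-test P-values are supplied by the caller rather than computed, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Genotype counts for one SNP in one group (cases or controls)."""
    aa: int
    ab: int
    bb: int
    missing: int

    @property
    def maf(self) -> float:
        n_alleles = 2 * (self.aa + self.ab + self.bb)
        freq_a = (2 * self.aa + self.ab) / n_alleles
        return min(freq_a, 1.0 - freq_a)

    @property
    def missing_rate(self) -> float:
        return self.missing / (self.aa + self.ab + self.bb + self.missing)


def passes_snp_qc(
    cases: GroupCounts,
    controls: GroupCounts,
    p_missing_diff: float,   # exact test of missing-rate difference (precomputed)
    p_hwe_cases: float,      # exact HWE test in cases (precomputed)
    p_hwe_controls: float,   # exact HWE test in controls (precomputed)
    max_missing: float = 0.04,  # 0.04 for the 500K array, 0.02 for the 6.0 array
) -> bool:
    groups = (cases, controls)
    return (
        all(g.maf >= 0.01 for g in groups)
        and all(g.missing_rate < max_missing for g in groups)
        and p_missing_diff >= 0.05
        and min(p_hwe_cases, p_hwe_controls) >= 1e-5
    )
```

Passing `max_missing=0.02` gives the stricter missing-rate filter stated for the 6.0 array cohort.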

Table 3 SNP quality control (QC) process

Statistical analysis

In the cohort-wise association analysis, we tested for single-locus association between each QC-passed SNP (Table 3) and endometriosis under an allele frequency model (1 degree of freedom) using the χ2-test. We then assessed the association by combining the cohort-wise odds ratios (ORs) from the two array cohorts in a meta-analysis of the two GWA studies. To examine between-cohort heterogeneity in the effect size of SNP association, we conducted Cochran’s Q-test25 and took a P-value <0.01 as evidence of significant heterogeneity. When the between-cohort heterogeneity was not significant, a fixed-effects meta-analysis was conducted using the Mantel–Haenszel method.26 For SNPs with significant between-cohort heterogeneity, we applied a random-effects model (the DerSimonian and Laird method27) to combine the results from the two cohorts. From the meta-analysis, we estimated summary log-ORs of risk alleles with standard errors, and obtained P-values from the score test (two-tailed) for the 282 838 SNPs common to the two genotyping platforms. To evaluate the degree of overdispersion of test statistics, the genomic inflation factor was calculated as the ratio of the median of the observed test statistics to that of the expected χ2-values. For general statistical analysis, we used R statistical environment version 2.9.0 (in particular, the R package metafor (http://www.wvbauer.com/)). We used Haploview 4.2 software28 to generate a Manhattan plot of genome-wide significance (P-values) and to draw a linkage disequilibrium map of 2q13, based on HapMap JPT data from the HapMap database (http://hapmap.ncbi.nlm.nih.gov/). We calculated the statistical power of the analysis using Genetic Power Calculator,29 assuming a type I error rate of 5 × 10−7, an endometriosis prevalence of 10%, a genotype relative risk (GRR) of 1.5 or 2.0 and the use of unselected controls (random population samples) in the case–control association analysis.
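The model-selection logic described above (Cochran's Q-test first, then a fixed-effects or random-effects summary) can be sketched for the two-cohort case as follows. This is an illustrative simplification, not the analysis code of the study: the fixed-effects step uses inverse-variance weighting rather than the Mantel–Haenszel method, the summary P-value is a two-tailed Wald test rather than a score test, and all function names are hypothetical. The random-effects step uses the DerSimonian and Laird moment estimator of the between-cohort variance.

```python
import math


def log_or_and_se(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic log odds ratio and its standard error (Woolf's formula)
    from the four allele counts of one cohort."""
    log_or = math.log((case_risk * ctrl_other) / (case_other * ctrl_risk))
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    return log_or, se


def meta_two_cohorts(est1, est2, het_alpha=0.01):
    """Combine two (log_or, se) estimates.

    Uses a fixed-effects (inverse-variance) summary unless Cochran's Q-test
    gives P < het_alpha, in which case the DerSimonian-Laird random-effects
    summary is used instead.
    """
    estimates = [est1, est2]
    weights = [1.0 / se ** 2 for _, se in estimates]
    sum_w = sum(weights)
    fixed = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum_w

    # Cochran's Q; with two cohorts it follows chi-square with 1 df under
    # homogeneity, whose upper tail probability is erfc(sqrt(Q / 2)).
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, estimates))
    p_het = math.erfc(math.sqrt(q / 2.0))

    if p_het < het_alpha:
        # DerSimonian-Laird estimate of between-cohort variance (df = 1).
        c = sum_w - sum(w ** 2 for w in weights) / sum_w
        tau2 = max(0.0, (q - 1.0) / c)
        weights = [1.0 / (se ** 2 + tau2) for _, se in estimates]
        sum_w = sum(weights)
        summary = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum_w
        model = "random"
    else:
        summary, model = fixed, "fixed"

    se_summary = math.sqrt(1.0 / sum_w)
    p_value = math.erfc(abs(summary / se_summary) / math.sqrt(2.0))
    return {"model": model, "log_or": summary, "se": se_summary,
            "q": q, "p_het": p_het, "p_value": p_value}
```

When the two cohort estimates agree, the random-effects branch is never entered and the summary standard error is the usual inverse-variance one; when they disagree strongly, the added between-cohort variance widens the standard error, which is why the random-effects summary is the more conservative of the two.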

Results

Cohort-wise analysis

A total of 586 female subjects (315 cases and 271 controls) were genotyped with the Affymetrix Mapping 500K arrays, and 976 women (413 cases and 563 controls) were genotyped with the Affymetrix SNP 6.0 arrays (Table 2). The newly and previously genotyped data were subjected to well-established sample QC filters. After sample QC, the remaining case–control data consisted of 290 cases and 262 controls in the 500K array cohort, and 406 cases and 563 controls in the 6.0 array cohort (Table 2).

We further performed data cleaning for autosomal and X-chromosomal SNPs (500K array, 499 264 SNPs and 6.0 array, 905 013 SNPs) to extract genotype data that passed SNP QC filters. From the pre-cleaned SNPs, we selected 330 389 and 557 299 SNPs through the SNP QC filters in 500K and 6.0 array cohorts, respectively (Table 3).

We tested allelic association between each QC-passed SNP and endometriosis in each array cohort. The genomic inflation factors were 1.022 and 1.048 for 500K array and 6.0 array cohorts, respectively (Table 3), indicating that systematic inflation of genetic association due to population stratification or undetected genotyping error was unlikely in each of the array cohorts.
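As an illustration of how the quantities in this paragraph are obtained, the sketch below computes the 1-degree-of-freedom allelic χ2 statistic from a 2×2 table of allele counts and the genomic inflation factor as the median observed statistic divided by the median of the χ2 distribution with 1 degree of freedom (approximately 0.4549). The function names are hypothetical and the code is not taken from the study.

```python
from statistics import median
from typing import Sequence

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> float:
    """Pearson chi-square statistic (1 df, no continuity correction)
    for a 2x2 table of allele counts in cases and controls."""
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    return numerator / denominator


def genomic_inflation(chi2_stats: Sequence[float]) -> float:
    """Ratio of the median observed statistic to the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1 means the bulk of the test statistics follow the null distribution, which is the basis for the interpretation given above.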

Meta-analysis of two GWA studies

Among the QC-passed SNPs in each array cohort, 282 838 SNPs on autosomes and the X-chromosome overlapped between the 500K and 6.0 arrays, and were used for a meta-analysis of the two GWA studies. When combining the results from the two array cohorts, we assessed between-cohort heterogeneity in the genetic effects of the respective SNPs using Cochran’s Q-test.25 According to the absence or presence of between-cohort heterogeneity in cohort-wise ORs for each SNP, we selected the appropriate model (fixed-effects or random-effects model, respectively) of meta-analysis for calculating a summary OR. Figure 1 shows a quantile–quantile (QQ) plot of P-values (−log10 scale) for association with endometriosis from the present genome-wide meta-analysis. The genomic inflation factor of 1.031 was similar to those from the two array cohorts (Table 3).

Figure 1

Log quantile–quantile (QQ) plot of P-values from genome-wide meta-analysis of 282 838 SNPs on autosomes and X-chromosome. The genomic inflation factor was 1.031.

P-values for association with endometriosis across the autosomes and the X-chromosome from the GWA meta-analysis are shown in Figure 2. None of the SNPs analyzed surpassed the genome-wide significance threshold of 5 × 10−7.22 This meta-analysis had 80% power to detect a common risk allele (risk allele frequency, 0.1–0.8) with a GRR of 2.0 (Supplementary Figure 1), suggesting that common risk variants with large effect sizes (GRRs >2.0) are unlikely to have roles in the development of endometriosis. On the other hand, we found an excess of SNPs with P-values <10−4 (36 SNPs observed vs 28 SNPs expected by chance). Some of these may prove to be true risk variants as additional cohorts are evaluated. For replication studies, we list the five SNPs with P-values <10−5 for association with endometriosis from the meta-analysis in Table 4. Four of the five SNPs with P-values <10−5 were located in and around IL1A (interleukin 1α) on 2q13, and were in high linkage disequilibrium with each other (Supplementary Figure 2).
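The figure of 28 SNPs expected by chance follows from the number of tests and the P-value threshold: under the null hypothesis, P-values are uniformly distributed, so the expected number below a threshold is the product of the two. A minimal check in Python (the function name is hypothetical):

```python
def expected_hits(n_tests: int, p_threshold: float) -> float:
    """Expected number of tests with P below the threshold when every
    null hypothesis is true (P-values uniform on [0, 1])."""
    return n_tests * p_threshold


# 282 838 SNPs tested at P < 10^-4 gives roughly 28 expected by chance,
# against the 36 observed in the meta-analysis.
expected = expected_hits(282_838, 1e-4)
```

This count treats the tests as exchangeable and says nothing about the variability around the expectation, which is larger when neighbouring SNPs are in linkage disequilibrium.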

Figure 2

Summary of results from meta-analysis of two GWA studies by chromosome. Manhattan plot for single-locus association (y-axis, shown as −log10 P-values) of 282 838 SNPs with endometriosis in the meta-analysis by SNP position along chromosome (x-axis).

Table 4 SNPs showing association with endometriosis at P-values <10−5 from meta-analysis of two GWA studies

As additional data, we provide 31 SNPs with P-values between 10−5 and 10−4 from the meta-analysis in Supplementary Table 1. Furthermore, five SNPs showing association with endometriosis at P-values <10−6 in only one of the two cohorts are shown in Supplementary Table 2. These lists of SNPs may also aid the search for susceptibility loci associated with endometriosis.

Discussion

Endometriosis causes pelvic pain and infertility in millions of women worldwide, and its etiology remains largely unknown. Despite much evidence for the involvement of genetic components in disease risk, neither linkage nor candidate gene-based case–control studies have so far succeeded in identifying replicable genetic susceptibility to endometriosis. We performed a meta-analysis of GWA studies to identify genetic susceptibility underlying endometriosis in the Japanese population using two independent cohorts comprising 696 cases and 825 controls. Population stratification is the first concern, particularly in the current GWA studies, because the case–control samples were recruited by multiple centers distributed across the Honshu and Kyushu islands of Japan, where population structure is genetically evident.24 After removal of samples and SNPs through the sequential QC processes in each array cohort, we observed genomic inflation factors of 1.022 and 1.048 in the 500K array and 6.0 array cohorts, respectively. Thus, the inflation of false-positive rates for genetic association is within an acceptable level (genomic inflation factor <1.1)24 for a case–control association study, indicating that the current QC processes were successful.

We assessed single-locus association of 282 838 SNPs with endometriosis using a genome-wide meta-analysis of the results from the two array cohorts. Despite the use of QC-passed samples of Japanese ethnicity, there might be between-cohort heterogeneity in genetic effects because of sampling bias in each case–control cohort and/or chip-to-chip differences arising from the use of two types of genotyping platform. To accommodate possible between-cohort heterogeneity, we combined the association results using a random-effects model (the DerSimonian and Laird method) of meta-analysis, a more conservative approach,25, 27 only when the between-cohort heterogeneity was significant (P-value of the Cochran's Q-test statistic <0.01); otherwise, we combined the results using a fixed-effects model (the Mantel–Haenszel method) of meta-analysis. This meta-analysis approach caused no marked under- or overdispersion of test statistics for genetic association, because the genomic inflation factor of 1.031 in the combined data set was almost equivalent to those from the two array cohorts. This suggests that the current meta-analysis could provide a genome-wide summary of genetic association with endometriosis.

In the GWA meta-analysis, no association signal reached genome-wide significance (P-values <5 × 10−7). This GWA study was sufficiently powered to detect common risk alleles with a GRR of 2.0 (Supplementary Figure 1), so common susceptibility loci with large effects are unlikely to contribute to the risk of endometriosis. On the other hand, common variants with modest and/or small effects might confer disease risk, as an excess of associated SNPs with P-values <10−4 (36 SNPs observed vs 28 SNPs expected by chance) was observed in the GWA meta-analysis. To examine this possibility using GWA approaches, independent sets of larger case–control cohorts will be required.

Recently, Uno et al.30 reported, using a GWA study, that rs10965235, which is located in CDKN2BAS on chromosome 9p21, is significantly associated with endometriosis in the Japanese population. In our GWA analysis, SNP rs17761446, which is in perfect linkage disequilibrium with rs10965235 (D′=1, r2=1, HapMap JPT population (Phase III, Rel no. 2)), showed P-value=8.9 × 10−3 with summary OR=1.29 (95% CI, 1.10–1.48), suggesting the existence of an endometriosis susceptibility locus at 9p21.

For replication and GWA analysis with other cohorts, we provide a set of SNPs that are associated with endometriosis in the current meta-analysis at the significance level of P-value <10−4 (Table 4 and Supplementary Table 1). We also give another set of SNPs showing association at P-values <10−6 in only one of the two cohorts (Supplementary Table 2), although these SNPs have no significant summary ORs from the meta-analysis owing to large between-cohort heterogeneity. These SNPs might be regarded as candidates for future genetic studies. Among the SNPs listed, four of the top five SNPs with P-values <10−5 (Table 4) map in and around IL1A on 2q13. IL-1α, encoded by IL1A, is a member of the interleukin 1 cytokine family. This cytokine binds to the IL-1 receptor 1, acts as an agonist for the receptor and regulates production of other proinflammatory cytokines and chemokines that drive further inflammation.31 There is general agreement that endometriosis is essentially a pelvic inflammatory process and that these cytokines, including IL-1α, could be important for the development and progression of endometriosis. Kondera-Anasz et al.32 found that IL-1α levels in peritoneal fluid and serum are higher in women with endometriosis than in women without the disease, and therefore this gene might be further regarded as a functional candidate. In any case, we hope that the SNP lists will be useful in an extensive search for susceptibility loci for endometriosis.

In summary, this is a GWA study of endometriosis in the Japanese population. Our data remain preliminary because they come from a single GWA meta-analysis with a limited sample size. It will be necessary to conduct more detailed GWA investigations using independent sets of larger endometriosis cohorts in Japan and other countries, with higher-density platforms for genotyping common and rare alleles. Further genetic studies will expand on our findings and may provide specialized diagnostic options and approaches for the treatment of endometriosis.