Original Article

Genes and Immunity (2011) 12, 605–614; doi:10.1038/gene.2011.40; published online 30 June 2011

Pathway-based analysis of genetic susceptibility to cervical cancer in situ: HLA-DPB1 affects risk in Swedish women

E L Ivansson1, I Juko-Pecirep1, H A Erlich2 and U B Gyllensten1

  1. Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
  2. Roche Molecular Systems, Pleasanton, CA, USA

Correspondence: Professor UB Gyllensten, Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Rudbeck Laboratory, Uppsala SE-751 85, Sweden. E-mail: Ulf.gyllensten@igp.uu.se

Received 20 December 2010; Revised 25 March 2011; Accepted 6 May 2011; Published online 30 June 2011.



We have conducted a pathway-based analysis of genome-wide single-nucleotide polymorphism (SNP) data in order to identify genetic susceptibility factors for cervical cancer in situ. Genotypes derived from Affymetrix 500k or 5.0 arrays for 1076 cases and 1426 controls were analyzed for association, and pathways with enriched signals were identified using the SNP ratio test. The most strongly associated KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were Asthma (empirical P=0.03), Folate biosynthesis (empirical P=0.04) and Graft-versus-host disease (empirical P=0.05). Among the 11 top-ranking pathways were 6 related to the immune response with the common denominator being genes in the major histocompatibility complex (MHC) region on chromosome 6. Further investigation of the MHC revealed a clear effect of HLA-DPB1 polymorphism on disease susceptibility. At a functional level, DPB1 alleles associated with risk and protection differ in key amino-acid residues affecting peptide-binding motifs in the extracellular domains. The results illustrate the value of pathway-based analysis to mine genome-wide data, and point to the importance of the MHC region and specifically the HLA-DPB1 locus for susceptibility to cervical cancer.


cervical cancer; CIN III; genetic risk factors; HLA-DPB1; pathway analysis; GWA



Cervical cancer is the second most common cancer among women worldwide with 274000 deaths and 493000 new cases in year 2002,1 and the first cancer recognized by the WHO (World Health Organization) to be 100% attributable to infection with human papillomavirus (HPV).2 Infection by HPV is very common, and most sexually active women have been infected during their lifetime.3 Persistent infection by an oncogenic HPV is a necessary risk factor for developing cervical cancer, but only a minority of the infected women develop precancerous cervical lesions, and even fewer develop invasive cancer. This indicates that other risk factors, such as genetic predisposition, are also important.

Cervical carcinoma displays familial aggregation and a pattern of decreasing familial relative risk correlating with degree of biological relatedness, supporting the idea that genetic factors are a major cause of the familial aggregation.4 The heritability in cervical cancer is of the same magnitude as seen for other common cancers.5 A number of genetic risk factors have been identified, but the effects of these are generally weak. The most prominent among the known risk factors is the HLA class II DRB1-DQB1 haplotype, such as DRB1*1501-DQB1*0602 and DRB1*1301-DQB1*0603 associated with increased and decreased risk, respectively.6, 7, 8, 9, 10, 11, 12 Most HPV infections are transient and cleared by the immune response,13 but persistent infections may also clear and premalignant lesions may regress. This points to the importance of the immune response toward HPV infection in preventing carcinoma development and explains why genetic variations in genes of immunological pathways, in addition to the HLA class II genes,6, 7, 8, 9, 10, 11, 12 are associated with in situ and invasive cervical cancer.9, 14, 15, 16, 17, 18 However, only a handful of the identified genes show a consistent effect in multiple cohorts, and thus additional risk factors for persistent infection and tumor development are likely to exist. Assuming that variation in genes with a specific function is of particular interest, there are several optional study designs. Wang et al.15 studied the effect of 7146 tag single-nucleotide polymorphisms (SNPs) in 305 genes with a presumed function in DNA repair, viral infection and cell entry, on cervical cancer. The success of this approach relies on the criteria used for identification of candidate genes and on the SNP markers tagging functional variants of these genes. 
A more exploratory approach is to perform a genome-wide association study, which has been successfully used to identify genetic associations across a wide range of traits and disease entities.19 The drawback of the genome-wide association study approach is that unless very large cohorts are used, the power to identify genetic associations is hampered by the need to correct for multiple testing. Even in diseases with a strong genetic component, the P-values for a number of true associations are likely to fall short of the threshold for genome-wide significance. A third approach, which combines the ability to examine sets of genes with a specific function with a scanning of the genome, is a pathway-based analysis.20 A number of different approaches for pathway-based analyses have been developed,21, 22, 23 with most of them based on calculating a P-value from genome-wide association study data for sets of genes involved in predefined functions, based on one or several annotation tools such as KEGG (Kyoto Encyclopedia of Genes and Genomes)24 or Gene Ontology.25

In the current study, we have explored a pathway-based analysis using genome-wide SNP data for cases diagnosed with at least cervical cancer in situ and controls. This approach may provide insight into the enrichment of association signals in the context of biological pathways, rather than individual loci.



After quality control, the genome-wide data set used for analysis consisted of 326977 markers genotyped in 1076 cases and 1426 controls. The cases were from families with several affected members, and controls were merged from three previous studies. Figure 1 displays the first and second dimension of multidimensional scaling (MDS) analysis of cases and controls, and illustrates that the cohorts are drawn from the same population. Genome-wide association was analyzed by logistic regression with the first two dimensions of multidimensional scaling analysis as covariates. After adjusting for genomic control (genomic inflation factor λ=1.33), some systematic bias remained in the full data set, as illustrated by the quantile-quantile plot shown in Supplementary Figure 1. The substructure in the data causing the systematic bias was due to the inclusion of familial cases as well as the fact that controls were genotyped on two different SNP arrays. Analysis of a data set consisting of only one case per family (n=617) and controls genotyped on the same array as the cases (n=512) eliminated the bias (genomic inflation factor λ=1.00) but reduced the sample substantially, such that no associations with genome-wide significance were detected (Supplementary Figure 2).

Figure 1.

MDS plot of cases and controls included in the statistical analysis of the study. The graph plots the first and second dimensions of MDS analysis for each individual and illustrates the lack of population structure in the sample.

In order to explore the genome-wide data set in a functional context we performed pathway analysis. Permutation-based analysis using 216 KEGG pathways and the SNP ratio test resulted in three pathways with empirical P-values ≤0.05 (unadjusted for multiple comparisons). Table 1 summarizes the top-ranking pathways (empirical P<0.1), and details concerning associated SNPs and genes are provided in Supplementary Table 1. The KEGG pathway with the strongest association was Asthma (empirical P=0.033), where the signal was derived from four SNPs in HLA-DPB1, two SNPs in HLA-DQA1 and one SNP in each of HLA-DOB, HLA-DQA2 and HLA-DRA. The second most significant pathway was Folate biosynthesis (empirical P=0.040) with five associated SNPs in ALPL and one SNP in QDPR. The third pathway was that of Graft-versus-host disease (empirical P=0.050) represented by four SNPs in HLA-DPB1, two SNPs in HLA-DQA1 and one SNP in each of FAS, HLA-A, HLA-DOB, HLA-DQA2, HLA-DRA and HLA-G. Of the 11 top-ranking pathways shown in Table 1, 6 are related to the immune system: Asthma, Graft-versus-host disease, Antigen processing and presentation, Allograft rejection, Staphylococcus aureus infection and Intestinal immune network for IgA production. The immune-related pathways all include HLA genes along with a number of other genes located within and outside the major histocompatibility complex (MHC) locus on chromosome 6 and are not mutually independent. The overlapping association signals between pathways consisted mainly of associations within the MHC.

Allele-based association analysis of the 728 SNPs on the array that were located in the MHC, adjusting for genomic control, yielded 126 SNPs with PGC<0.05 and 15 SNPs with PGC<1E–03 (Table 2 and Figure 2a). The strongest association signal was derived from the HLA class II region, with 12 SNPs showing PGC<0.001 (Figure 2b). Most of the associated SNPs were located downstream of HLA class II DPB1. The lack of signal for the DQB1 and DRB1 genes, well known to affect cervical cancer risk, is likely to reflect the lack of SNPs in these regions on the SNP arrays. In the HLA class I region, rs16899646, located 15kb upstream of HLA class I histocompatibility antigen protein P5 and between MICA and MICB, was associated with P=0.0005. In the class III region, rs522162, located in the untranslated 3′ region, and rs550513, located in an intron of the RD RNA binding protein, were both associated with P=0.0002. This locus is just downstream of Complement factor B preproprotein.

Figure 2.

The −log10 P-values from allele-based association analysis, corrected for genomic control, of SNPs genotyped across the MHC region. Each typed SNP is depicted by a bar and the taller the bar the more significant the association. (a) Association signals in the MHC region and (b) around HLA class II loci.

The effect of the strongly associated SNPs was confirmed in the stringent data set, restricted to one case per family (n=617) and controls genotyped on the same array as the cases (n=512) (Supplementary Table 2). All SNPs in the HLA class II region remained significantly associated in this subset. The point estimates were similar and the 95% confidence intervals were somewhat wider as a result of reduced sample size. Restricting the analysis to all cases versus only female controls (n=220) did not affect the results; the effects were still similar and statistically significant (Supplementary Table 2).

Because of the strong signal from parts of the MHC region that have previously not been firmly associated with cervical cancer, we investigated this further. Most of the significant associations were found for SNPs located 3′ of the DPB1 gene. Haplotype analysis using 11 strongly associated SNPs (rs9277542, rs3128963, rs3128965, rs3128966, rs3117229, rs2179920, rs3128921, rs3128923, rs3117230, rs3128930 and rs872956) in a 29kb region resulted in five distinct haplotypes (Table 3a). The most common haplotype A was seen in 67% of cases and 74% of controls and associated with decreased risk (permuted P<1E–4). Haplotype B had a frequency of 19% in cases and 15% in controls and was associated with increased risk (permuted P<1E–4). The frequency of haplotypes C–E did not differ between cases and controls. Haplotype analysis in the restricted data set limited to one case per family and controls genotyped on the same array resulted in the same haplotype frequencies and an association with decreased risk for haplotype A (permuted P=0.0008). The frequencies of haplotype B in the restricted data set were also similar to the full data set, but the difference was not statistically significant, probably because of the smaller number of observations.

To address whether the association was affected by linkage disequilibrium (LD) with DRB1/DQB1, the frequencies of the associated DPB1 haplotypes were assessed in cases carrying DQB1 risk alleles or alleles associated with protection (Table 3b). The frequency of the protective DPB1 haplotype A was similar between cases carrying only DQB1 risk alleles and cases with only DQB1 protective alleles. The protective effect of the DPB1 haplotype was also visible in the group with neither risk nor protective DQB1 alleles, but could not be seen in the group with both risk and protective DQB1 alleles. A similar pattern was observed for the DPB1 haplotype B associated with increased risk. The observation that the DPB1 effects are seen in cases carrying DQB1 risk alleles as well as cases carrying DQB1 alleles that decrease risk argues that the DPB1 locus provides an independent effect.

In order to tie the association of haplotypes based on SNPs in the vicinity of DPB1 to the coding variation between DPB1 alleles, we performed high-resolution typing of DPB1 in cases by reverse line blot genotyping. DPB1 alleles are defined by typing of polymorphic positions in exon 2 and the DPB1 alleles were converted into exon 2 SNP data (Supplementary Table 3). The genotype data of the 11 strongly associated SNPs in the DPB1 region were merged with the exon 2 genotype data and imported into Haploview.47 Seven extended haplotypes across the entire DPB1 region were detected (Table 4), consistent with strong LD between SNPs in exon 2 of DPB1 and SNPs 3′ of the gene (Supplementary Figure 3). Linking the DPB1 SNP haplotypes with the DPB1 alleles, based on exon 2 polymorphisms, shows that the protective haplotype A occurs with the most common DPB1 allele *0401, as well as with *0201 and *0402. The risk haplotype B occurs with *0301 and *0601. Haplotype 3 is associated with *0101 and haplotype 4 with *0501 (Table 4).

We next compared the DPB1 exon 2 allele frequencies in cases with data on allele frequencies in Swedish controls available at www.allelefrequencies.net. For DPB1*0201 and *0402, which were linked to the protective haplotype A, there was a tendency toward a lower allele frequency in cases compared with controls, but there was no apparent difference for the common allele *0401, or for the *0301 and *0601, which were linked to the risk haplotype (Supplementary Table 4).

In order to further understand the molecular nature of the association, we investigated the amino-acid changes in exon 2 alleles linked to the associated haplotypes. The *0201, *0401 and *0402 alleles linked to the protective haplotype A all shared the amino-acid motif Leu-Phe-Gln-Gly at codons 8–11, whereas the *0301 and *0601 alleles, linked to the risk haplotype 2, instead had Val-Tyr-Gln-Leu at these positions (Figure 3). Additional differences at the amino-acid level could also be seen at codon 57 where all alleles linked to the risk haplotype had Asp instead of Glu and at codon 65 where all alleles linked to the risk haplotype had Leu instead of Ile. Finally, there was a difference in the motif at the end of exon 2, residues 84–87, where the alleles linked to risk had Asp-Glu-Ala-Val and the alleles linked to the protective haplotype had Gly-Gly-Pro-Met. This comparison of the pattern of polymorphism in exon 2 and the SNPs and haplotypes in the DPB1 region associated with cervical cancer in situ indicates that the observed association may be due to a small number of amino-acid residues forming critical peptide-binding elements in the extracellular domain of the DPB1 molecule.

Figure 3.

Dissecting the amino-acid motifs in exon 2 for DPB1 alleles linked with risk haplotypes (*0301, *0601) or protective haplotypes (DPB1*0201, *0401, *0402).



We have performed the first pathway analysis in cervical cancer in situ using genome-wide genotype data. This type of analysis considers the joint effect of many contributing loci in a pathway and may therefore improve the statistical power to detect genetic effects in complex diseases, where most single variants have a limited effect on the genetic risk.20 In addition, pathway analyses based on genome-wide data provide insight into the molecular networks that are involved in the susceptibility and can be used to identify candidate loci for genetic and functional follow-up studies. The current study examined 216 KEGG pathways using the SNP ratio test, and the results of the pathway analysis were not corrected for multiple comparisons. The issue of multiple testing is complicated by the fact that the same SNPs and genes occur in several different pathways; the pathways are not independent. Keeping this in mind, we propose that the results of the pathway analysis should be considered as suggestive evidence generating ideas for further follow-up and evaluation.

The top-ranked pathway was Asthma, with the main signal of association coming from the HLA class II region. Asthma and allergy have previously been inversely correlated with the risk of cervical cancer.17, 26 The next strongest pathway was that of Folate biosynthesis, where the associations detected were mainly related to the tissue nonspecific alkaline phosphatase (ALPL) gene. Mutations in this enzyme cause hypophosphatasia, a metabolic bone disease characterized by defects in skeletal mineralization.27, 28 Variants in ALPL have been associated with vitamin B6 serum levels; the enzyme mediates clearance of vitamin B6 with differing efficiency between variants.29 The third strongest pathway was Graft-versus-host disease, again reflecting HLA class II genes and also two SNPs in the class I region and a SNP in the FAS receptor. This was followed by three pathways sharing the same empirical P-value: Antigen processing and presentation, Allograft rejection and Glycosylphosphatidylinositol (GPI)-anchor biosynthesis. Of the 11 top-ranked pathways, 6 were related to the immune response and this illustrates the significance of genes related to the immune response in general and, in particular, the presentation of antigen to natural killer cells and T cells in susceptibility to persistent HPV infection and cervical cancer. We also note that among the top-ranked pathways were the Gap Junction and Tight Junction pathways. This points to genes involved in regulating the adhesion and diffusion barriers of epithelial cells, which is of particular interest as the detailed mechanism of HPV entry into the cell is still unclear.

The association of variants in the MHC with immune-related diseases is well known and several genes within the MHC have previously been associated with cervical cancer. Our group has reported linkage and association with the strongly linked HLA class II loci DRB1/DQB1,7, 30, 31 and it has also been proposed that DPB1 might add an independent effect.30 The current study did not have the power to detect association with DQB1 and DRB1 because of lack of SNPs in these genes but instead clearly indicated the contribution of DPB1. Genotype data from the DPB1 region showed strong associations at both single SNP and haplotype level. The most common haplotype seemed to reduce risk whereas the second most common haplotype increased risk. Further analysis linking the SNP haplotypes in the DPB1 region to the classical exon 2 coding DPB1 alleles revealed that the protective haplotype was associated with DPB1*0201, *0401 and *0402 and the risk haplotype was associated with DPB1*0301 and *0601. A limitation of the study is that DPB1 exon 2 typing data were available in cases only; there were not sufficient data on DPB1 alleles in controls to perform an association analysis of the actual DPB1 alleles.

An interesting question is whether the association of DPB1 is independent of the established associations of DRB1/DQB1 or reflects LD with alleles in the DRB1/DQB1 region. DRB1 and DQB1 are in strong LD in the Swedish population as well as in all populations tested. The genome-wide SNP data set did not provide sufficient coverage of these genes to study the well-known associations with DRB1/DQB1, and there were no data available on the DRB1/DQB1 genotypes for the controls. In order to investigate the independence of the DPB1 association, cases were stratified based on carrier status of DQB1 risk and protective alleles, and the frequencies of DPB1 haplotypes A and B were estimated within the strata. The frequencies of the DPB1 haplotypes were similar in cases carrying DQB1 risk alleles and cases carrying DQB1 protective alleles, supporting the notion that the observed DPB1 association is a separate effect. This inference is strengthened by the observation that, when the database allelefrequencies.net was searched for DRB1/DQB1/DPB1 haplotypes in Caucasians, both the risk haplotype DRB1*1501-DQB1*0602 and the protective haplotype DRB1*1301-DQB1*0603 occurred with DPB1*0201, *0301 and *0401. Thus, there is no evidence supporting a strong LD between the disease-associated DRB1/DQB1 haplotypes and the disease-associated DPB1 alleles. However, the lack of HLA typing in controls makes it difficult to completely rule out an LD effect.

The difficulty of disentangling associated loci in the MHC due to LD is a well-known issue in all diseases with an immunological component. Apart from the statistical challenges involved, most data sets do not have high-resolution typing of all HLA loci along with sufficient coverage of SNPs. Recently, a method that imputes classical HLA alleles based on MHC SNP data32 has been used in an effort to identify the primary association signals in the MHC for seven immune-mediated diseases.33 Unfortunately, this method has not been extended to the HLA-DPB1 locus.

DPB1 encodes the β-chain of the HLA-DP α-β heterodimer, which is expressed on the cell surface of antigen-presenting cells. Exon 2 encodes the peptide-binding groove of the extracellular domain, and genetic variation altering the amino-acid sequence will influence the ability to bind and present different peptide antigens. It is known for other HLA class II β-chain genes, such as DQB1 (ref. 34) and DRB1 (ref. 35), that there is strong LD between the region of exon 2 encoding part of the β-pleated sheet (codons 8–60) and downstream intron sequences, whereas the region of exon 2 encoding the first part of the α-helix (codons 61–78) shows less LD to other parts of the second exon or intron sequences. This pattern has been proposed to reflect extensive gene conversion involving the region of exon 2 encoding the part of the α-helix.35 Consistent with this explanation, we detected strong LD between SNP haplotypes in the DPB1 region and motifs in the second exon encoding part of the β-pleated sheet of the β-chain.

In HLA class II molecules, five main pockets (pockets 1, 4, 6, 7 and 9) accommodate peptide anchoring residues.36 These pockets are lined by polymorphic residues and the variation affects recognition by T cells as well as peptide specificity. Analysis of the binding repertoire of the five HLA-DPB1 alleles with the highest population frequency worldwide revealed that DPB1*0201, *0401 and *0402, which were found to be protective in the current study, together with *0101 and *0501, share largely overlapping peptide-binding motifs. DPB1*0301 and *0601, linked with the risk haplotype in the current study, were not included in the analysis.37 The amino-acid motifs of alleles linked with the protective and risk haplotypes revealed clearcut differences for a small number of residues with locations in the class II molecule structure that are known to affect the repertoire of peptides bound. Diaz et al.38 investigated the functional effects of HLA-DPB1 variants and reported that amino-acid changes at residues 8, 9 and 11 resulted in altered peptide-binding ability and residues 9 and 11 also influenced T-cell allorecognition. Molecular modeling indicated that variation at these sites affects pockets 6 and 9. DPB1 residues 84–87 are located near pocket 1. The substitution of these residues, corresponding to the difference between risk and protective alleles in our study, had a strong effect on both peptide binding and T-cell allorecognition. Molecular modeling suggested that although only residue 84 is involved in the formation of pocket 1, all four residues influence the interaction between the DPB1 α- and β-chains and affect the part of the binding groove that is in contact with peptide residues P-1 and P-2.38 Taken together, these observations support the hypothesis that the differences in risk can be attributed to the amino-acid motifs in the extracellular peptide-binding domain of DPB1.

In conclusion, our study indicated that pathways related to the immune response are of importance for the development of cervical cancer in situ, and underscored the paramount importance of genes involved in antigen presentation for cervical cancer susceptibility. The pathway-based analysis pointed toward the MHC region, and the strongest effect was found to be due to HLA-DPB1. The ability to compare the association of polymorphic residues in the second exon with regional haplotypes allowed us to explain this association in a functional context and identify the set of amino-acid residues that are likely to affect the peptide spectrum presented by the HLA molecule.


Materials and methods

Genome-wide genotype data

Table 5 summarizes the genome-wide data sets used in the present study. The case data set comprised 1140 cases selected from Swedish families with several affected. This cohort has been previously described.30 All patients in this data set have at least one first-degree family member with the same diagnosis. The current study includes 942 individuals with one first-degree relative (mother, sister or daughter) and 39 individuals with two first-degree relatives also participating in the study as well as 159 individuals with no participating family member. The cases were from all over Sweden and diagnosed in 1964–1993 with cervical cancer in situ (cervical intraepithelial neoplasia stage III; n=1104) or invasive carcinoma (n=36). DNA from the patients was extracted from whole blood using standard methods. The samples were genotyped using the Affymetrix Genome-Wide Human SNP Array 5.0 using 500ng genomic DNA (Affymetrix, Santa Clara, CA, USA). Genotypes were called using the Affymetrix Genotyping Console and the BRLMM-P algorithm,39 which analyzes 440794 SNPs. First-pass quality control of genotyping was performed using the Dynamic Model algorithm40 that generates a quality control call rate by evaluating a set of 3022 selected SNPs.

The control group consisted of 1450 control individuals from the Swedish population (220 females and 1230 males). The controls were recruited as part of Swedish studies on type II diabetes and prostate cancer (selected from the CAPS cohort41). Three sets of control genotypes were downloaded from the Nordic Control Allele Frequency and Genotype Database created by the Nordic Center of Excellence in Disease Genetics program. Control data sets 1(ref. 42) and 2(ref. 43) were originally obtained using the Affymetrix GeneChip Human Mapping 500k Array set and genotypes were called with the standard algorithm BRLMM. Control data set 3(ref. 44) was typed using the Affymetrix Genome-Wide Human SNP Array 5.0 and genotypes were called by the corresponding standard BRLMM-P algorithm.

For the present investigation the statistical analysis was based on a merged data set containing cases and the three sets of controls. Quality control and harmonization of the data sets were performed using PLINK.45 For each data set, all individuals with <95% genotypes and individuals with heterozygous haploid genotypes were excluded. Markers displaying Hardy–Weinberg disequilibrium (P<1E–6) in any of the control sets were excluded from the full data set. Genotype data from the four data sets were merged and further quality control was applied to the full data set. Criteria were imposed in order to exclude markers with: (1) low genotype call rate (<90% overall), (2) heterozygous haploid genotypes, (3) multimapping sites, (4) differing allele frequency (P<1E–3) between the different sets of controls and (5) a minor allele frequency of <5%. In addition, SNPs displaying genome-wide significant differences between cases and controls were retyped using a different method in 36 case individuals to confirm the genotyping. SNPs displaying at least one inconsistent genotype between methods were removed.
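The marker-level exclusion criteria listed above can be expressed as a single predicate. The sketch below is illustrative only: the `MarkerStats` record and its field names are hypothetical, and the filtering in the study was carried out with PLINK rather than with code of this kind. Only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class MarkerStats:
    """Hypothetical per-marker summary; field names are assumptions."""
    snp_id: str
    call_rate: float        # fraction of samples with a called genotype
    maf: float              # minor allele frequency in the merged data
    hwe_p_controls: float   # smallest Hardy-Weinberg P across control sets
    control_diff_p: float   # P for allele-frequency difference between control sets
    multimapping: bool      # marker maps to more than one genomic site
    het_haploid: bool       # heterozygous calls observed at haploid positions


def passes_qc(m: MarkerStats) -> bool:
    """Apply the exclusion thresholds stated in the text to one marker."""
    return (
        m.call_rate >= 0.90            # criterion 1: call rate <90% is excluded
        and not m.het_haploid          # criterion 2
        and not m.multimapping         # criterion 3
        and m.control_diff_p >= 1e-3   # criterion 4: P<1E-3 is excluded
        and m.maf >= 0.05              # criterion 5: MAF <5% is excluded
        and m.hwe_p_controls >= 1e-6   # Hardy-Weinberg filter in controls
    )


# Toy marker with a made-up identifier, for illustration only.
example = MarkerStats("rs_example", 0.99, 0.20, 0.50, 0.40, False, False)
print(passes_qc(example))
```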

In order to investigate population structure in the full data set, MDS analysis was performed. The MDS analysis was based on a matrix of identity-by-state pairwise distances, calculated in PLINK using a genome-wide subset of 88070 markers in approximate linkage equilibrium. LD pruning was carried out with r2=0.5 as threshold. Plotting the first and second dimension from the MDS analysis provides an image of how similar the individuals in the data set are genetically (Figure 1). Individuals that appeared to be outliers by visual inspection were removed from the data set (Supplementary Figure 4).
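To make the notion of an identity-by-state (IBS) distance concrete, the following sketch computes one pairwise distance from genotypes coded as minor-allele counts (0, 1 or 2). The coding scheme, the function name and the exact distance definition (one minus the average proportion of alleles shared) are assumptions for illustration; PLINK's internal computation may differ in detail.

```python
def ibs_distance(g1, g2):
    """IBS distance between two individuals.

    g1, g2: sequences of genotypes coded 0/1/2 (minor-allele count),
    with None marking a missing call. Markers missing in either
    individual are skipped.
    """
    shared = 0      # total number of alleles shared identical by state
    compared = 0    # number of markers with both genotypes present
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)   # 2, 1 or 0 alleles shared at this marker
        compared += 1
    if compared == 0:
        raise ValueError("no markers with genotypes in both individuals")
    return 1.0 - shared / (2 * compared)


# Identical genotypes give distance 0; opposite homozygotes give distance 1.
print(ibs_distance([0, 1, 2], [0, 1, 2]))
print(ibs_distance([0, 0], [2, 2]))
```

A matrix of such distances over all pairs of individuals is the kind of input on which MDS operates.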

The final pruned data set used for association analysis consisted of 326977 markers genotyped in 1076 cases and 1426 controls.

Genome-wide association analysis

The genome-wide association analysis was performed using logistic regression with the first two components of the MDS analysis as covariates.46 The case sample includes cases from affected families; this should increase the statistical power to detect associations, but the interrelatedness of the sample might also result in inflated statistics. Another source of potential stratification is that the controls were from three separate data sets. Control data sets 1 and 2 were typed on the Affymetrix 500k arrays and control set 3 was genotyped on the same array as the cases, Affymetrix 5.0. In addition to the full data set, analysis was performed on a stringently selected subset of the data restricted to only one case from each family (n=617) and control set 3 (n=512) with genotypes from the Affymetrix 5.0 array. Quantile-quantile plots were used to check for stratification or systematic bias in the data set and Manhattan plots of −log10 P-values were generated to illustrate the results across the genome. Adjustment for genomic control (GC) was calculated. The Bonferroni threshold for genome-wide significance (P<0.05 corrected for 326977 tests) was estimated at 1.5E–07.
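The two corrections mentioned above reduce to short calculations. The sketch below reproduces the Bonferroni threshold from the stated number of tests and shows one common way of estimating the genomic inflation factor, namely the median of the observed 1-d.f. χ2 statistics divided by the median of the χ2 distribution with 1 d.f. (about 0.455). The function names are hypothetical, and the median-based estimator is an assumption about how λ was obtained, not something the text specifies.

```python
import statistics

# Median of the chi-square distribution with 1 d.f. (standard constant).
CHI2_1DF_MEDIAN = 0.4549364


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def genomic_inflation(chi2_stats):
    """Median-based estimate of the genomic inflation factor lambda."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats, lam):
    """Deflate test statistics by lambda; lambda below 1 is treated as 1."""
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_stats]


threshold = bonferroni_threshold(0.05, 326977)
print(f"{threshold:.2e}")  # 1.53e-07, reported as 1.5E-07 in the text
```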

Pathway analysis

In order to explore the genome-wide SNP data set at another level, pathway analysis using the SNP ratio test was conducted.23 The SNP ratio test identifies enrichment of association signals from genome-wide SNP data in a pathway context. For each pathway an empirical P-value is calculated that indicates how the ratio of associated to nonassociated SNPs in the pathway differs compared with ratios generated by simulated data sets. KEGG pathways24 are connected to SNP identifiers using dbSNP gene annotation, which includes SNPs in upstream (5kb) and downstream (2kb) regions.23 For the current study, 216 pathways (KEGG Feb 2011) were linked to dbSNP build 129 using the Human Genome Organisation gene nomenclature. The SNP ratio test was carried out using 1000 alternative phenotype simulations generated in PLINK and P<0.05 as threshold for significance. The empirical P-values for each pathway were not adjusted for multiple comparisons. Analysis was based on results from allele-based tests of association (χ2 test, 1 d.f.) corrected for genomic control, in the full data set (1076 cases and 1426 controls). For the top-ranked pathways, details regarding the genes and SNPs displaying associations were extracted.
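The logic of the SNP ratio test described above can be sketched in a few lines. This is a simplified illustration, not the published implementation: the function names are hypothetical, and the empirical P-value is assumed here to be the fraction of simulated data sets whose ratio is at least as large as the observed one.

```python
def snp_ratio(p_values, alpha=0.05):
    """Ratio of associated (P < alpha) to non-associated SNPs in one pathway."""
    significant = sum(1 for p in p_values if p < alpha)
    nonsignificant = len(p_values) - significant
    if nonsignificant == 0:
        return float("inf")   # every SNP in the pathway is associated
    return significant / nonsignificant


def empirical_p(observed_ratio, simulated_ratios):
    """Fraction of simulated ratios at least as extreme as the observed ratio."""
    hits = sum(1 for r in simulated_ratios if r >= observed_ratio)
    return hits / len(simulated_ratios)


# Toy example: per-SNP P-values for one pathway in the real data, and the
# ratios obtained for the same pathway in four simulated data sets.
observed = snp_ratio([0.01, 0.20, 0.30])
print(observed)
print(empirical_p(observed, [0.1, 0.5, 0.6, 0.2]))
```

In the study, the simulated ratios would come from the 1000 data sets with alternative phenotypes, each analyzed with the same association test as the original data.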

Analysis of association within the MHC

To further investigate the association signals coming from the MHC region on chromosome 6p22, we examined the classical MHC region, defined as 20kb upstream and downstream of the region between MOG and COL11A2. Gene coordinates were based on information from the University of California Santa Cruz (UCSC) genome browser on the human Mar. 2006 (NCBI36/hg18) assembly. Genotype data were extracted from the genome-wide data set and reanalyzed by allele-based tests of association (χ2 test, 1 d.f.), corrected for genomic control in the full data set (1076 cases and 1426 controls). Odds ratios with 95% confidence intervals were calculated with the minor allele as reference. The results were visualized by plotting −log10 P-values along the chromosome using the UCSC Genome Browser.
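For readers unfamiliar with the allele-based test, the sketch below computes the χ2 statistic (1 d.f.), its P-value and an odds ratio with a Wald 95% confidence interval from a 2×2 table of allele counts. The function name and the example counts are hypothetical, the odds ratio is assumed to be reported for the minor allele, and the Wald interval is an assumption about how the interval was obtained.

```python
import math


def allelic_test(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square (1 d.f.), P-value and odds ratio with Wald 95% CI."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, (ci_low, ci_high)


# Made-up allele counts, for illustration only
chi2, p_value, odds_ratio, ci = allelic_test(30, 70, 20, 80)
```

A genomic control correction would divide chi2 by the inflation factor before the P-value is computed.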

For MHC SNPs with a genomic control-corrected P-value (PGC) <1E–3, association analysis was repeated in the stringently restricted data set comprising one case per family (n=617) versus controls genotyped on the Affymetrix 5.0 array (n=512), as well as in all cases (n=1076) versus female controls (n=220).

HLA-DPB1 analysis

Haploview47 was used to construct haplotypes for strongly associated SNPs in the DPB1 region. Haplotypes with frequency <1% in cases or controls were excluded. Association was analyzed by comparing the frequency of haplotypes in cases and controls. In order to correct for multiple testing, 10000 permutations were used. Association of haplotypes was evaluated in the full data set (1076 cases and 1426 controls) and in the stringently restricted data set (617 cases and 512 controls).
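The principle of permutation-based correction for multiple testing can be illustrated as follows. This is a simplified sketch and not Haploview's implementation: it assumes one haplotype label per chromosome, shuffles case/control status across chromosomes rather than individuals, and compares each observed χ2 with the largest χ2 obtained in each permutation. All names are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Chi-square statistic (1 d.f.) for a 2x2 table of counts."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def haplotype_chi2(haplotypes, is_case):
    """Chi-square for each haplotype versus all other haplotypes pooled."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    stats = {}
    for h in set(haplotypes):
        a = sum(1 for x, s in zip(haplotypes, is_case) if x == h and s)
        c = sum(1 for x, s in zip(haplotypes, is_case) if x == h and not s)
        stats[h] = chi2_2x2(a, n_case - a, c, n_ctrl - c)
    return stats


def permutation_p(haplotypes, is_case, n_perm=1000, seed=0):
    """Corrected P: share of permutations whose best chi-square is at
    least as large as the observed chi-square of each haplotype."""
    rng = random.Random(seed)
    observed = haplotype_chi2(haplotypes, is_case)
    labels = list(is_case)
    exceed = {h: 0 for h in observed}
    for _ in range(n_perm):
        rng.shuffle(labels)
        best = max(haplotype_chi2(haplotypes, labels).values())
        for h, stat in observed.items():
            if best >= stat:
                exceed[h] += 1
    return {h: exceed[h] / n_perm for h in observed}
```

Because every observed statistic is compared with the best statistic across all haplotypes in each permutation, the resulting P-values account for the number of haplotypes tested.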

To evaluate independence of the effect of HLA-DPB1 from previously reported association with HLA-DQB1, the cases were stratified based on carrier status for DQB1*0301, *0402, *0602, previously shown to increase risk in this population, and DQB1*0501 and *0603, which decrease risk.31 All case individuals were assigned to one of the following mutually exclusive groups: ‘risk’, ‘protective’, ‘neither’ or ‘both’ based on their DQB1 alleles. Individuals carrying at least one risk allele but no protective allele were assigned to the ‘risk’ group and individuals with at least one protective allele but no risk allele were assigned to the ‘protective’ group. The third group consisted of cases with neither risk nor protective alleles and the final group consisted of cases with alleles of opposing effect.
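The grouping rule described above can be written compactly. The sketch below is illustrative only; the function name and the representation of alleles as four-digit strings are assumptions.

```python
# DQB1 alleles as listed in the text
RISK_ALLELES = {"0301", "0402", "0602"}
PROTECTIVE_ALLELES = {"0501", "0603"}


def dqb1_group(alleles):
    """Assign a case to one of four mutually exclusive DQB1 groups."""
    has_risk = any(a in RISK_ALLELES for a in alleles)
    has_protective = any(a in PROTECTIVE_ALLELES for a in alleles)
    if has_risk and has_protective:
        return "both"
    if has_risk:
        return "risk"
    if has_protective:
        return "protective"
    return "neither"
```

For example, a case carrying DQB1*0301 and *0501 falls in the 'both' group, whereas a case carrying *0301 and *0201 falls in the 'risk' group.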

HLA-DPB1 typing was performed on the 1076 cases using a linear array of immobilized sequence-specific oligonucleotide probes developed by Roche Molecular Systems (Pleasanton, CA, USA) according to previously described methods.31, 48 DPB1 alleles are defined by polymorphism in exon 2, and the hybridization of specific sets of probes reflects a certain sequence for each allele. Alignments of DPB1 alleles were produced using the sequence alignment viewer in dbMHC, and the sequence of each HLA-DPB1 allele was converted to the corresponding DPB1 exon 2 SNP haplotype. Extended haplotypes linking the genotyped SNPs in the DPB1 region to the genotypes of exon 2 DPB1 alleles were constructed in Haploview. The contribution of different exon 2 amino-acid motifs was investigated.

For comparison, HLA-DPB1 allele frequencies for Swedish controls derived from three separate studies,49, 50, 51 including one from our own group, were downloaded from www.allelefrequencies.net.52


Conflict of interest

The authors declare no conflict of interest.



  1. Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin 2005; 55: 74–108.
  2. Walboomers JM, Jacobs MV, Manos MM, Bosch FX, Kummer JA, Shah KV et al. Human papillomavirus is a necessary cause of invasive cervical cancer worldwide. J Pathol 1999; 189: 12–19.
  3. Baseman JG, Koutsky LA. The epidemiology of human papillomavirus infections. J Clin Virol 2005; 32 (Suppl 1): S16–S24.
  4. Magnusson PK, Sparen P, Gyllensten UB. Genetic link to cervical tumours. Nature 1999; 400: 29–30.
  5. Magnusson PK, Lichtenstein P, Gyllensten UB. Heritability of cervical tumours. Int J Cancer 2000; 88: 698–701.
  6. Apple RJ, Erlich HA, Klitz W, Manos MM, Becker TM, Wheeler CM. HLA DR-DQ associations with cervical carcinoma show papillomavirus-type specificity. Nat Genet 1994; 6: 157–162.
  7. Beskow AH, Josefsson AM, Gyllensten UB. HLA class II alleles associated with infection by HPV16 in cervical cancer in situ. Int J Cancer 2001; 93: 817–822.
  8. Wang SS, Hildesheim A. Chapter 5: viral and host factors in human papillomavirus persistence and progression. J Natl Cancer Inst Monogr 2003; 31: 35–40.
  9. Zoodsma M, Nolte IM, Te Meerman GJ, De Vries EG, Van der Zee AG. HLA genes and other candidate genes involved in susceptibility for (pre)neoplastic cervical disease. Int J Oncol 2005; 26: 769–784.
  10. Sanjeevi CB, Hjelmstrom P, Hallmans G, Wiklund F, Lenner P, Angstrom T et al. Different HLA-DR-DQ haplotypes are associated with cervical intraepithelial neoplasia among human papillomavirus type-16 seropositive and seronegative Swedish women. Int J Cancer 1996; 68: 409–414.
  11. Ghaderi M, Wallin KL, Wiklund F, Zake LN, Hallmans G, Lenner P et al. Risk of invasive cervical cancer associated with polymorphic HLA DR/DQ haplotypes. Int J Cancer 2002; 100: 698–701.
  12. Ivansson EL, Magnusson JJ, Magnusson PK, Erlich HA, Gyllensten UB. MHC loci affecting cervical cancer risk: distinguishing the effects of HLA-DQB1 and non-HLA genes TNF, LTA, TAP1 and TAP2. Genes Immun 2008; 9: 613–623.
  13. Ho GY, Bierman R, Beardsley L, Chang CJ, Burk RD. Natural history of cervicovaginal papillomavirus infection in young women. N Engl J Med 1998; 338: 423–428.
  14. Ivansson EL, Gustavsson IM, Magnusson JJ, Steiner LL, Magnusson PK, Erlich HA et al. Variants of chemokine receptor 2 and interleukin 4 receptor, but not interleukin 10 or Fas ligand, increase risk of cervical cancer. Int J Cancer 2007; 121: 2451–2457.
  15. Wang SS, Gonzalez P, Yu K, Porras C, Li Q, Safaeian M et al. Common genetic variants and risk for HPV persistence and progression to cervical cancer. PLoS One 2010; 5: e8667.
  16. Castro FA, Haimila K, Sareneva I, Schmitt M, Lorenzo J, Kunkel N et al. Association of HLA-DRB1, interleukin-6 and cyclin D1 polymorphisms with cervical cancer in the Swedish population--a candidate gene approach. Int J Cancer 2009; 125: 1851–1858.
  17. Johnson LG, Schwartz SM, Malkki M, Du Q, Petersdorf EW, Galloway DA et al. Risk of cervical cancer associated with allergies and polymorphisms in genes in the chromosome 5 cytokine cluster. Cancer Epidemiol Biomarkers Prev 2011; 20: 199–207.
  18. Ivansson EL, Juko-Pecirep I, Gyllensten UB. Interaction of immunological genes on chromosome 2q33 and IFNG in susceptibility to cervical cancer. Gynecol Oncol 2009; 116: 544–548.
  19. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest 2008; 118: 1590–1605.
  20. Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 2007; 81: 1278–1283.
  21. Holmans P, Green EK, Pahwa JS, Ferreira MA, Purcell SM, Sklar P et al. Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am J Hum Genet 2009; 85: 13–24.
  22. Eleftherohorinou H, Wright V, Hoggart C, Hartikainen AL, Jarvelin MR, Balding D et al. Pathway analysis of GWAS provides new insights into genetic susceptibility to 3 inflammatory diseases. PLoS One 2009; 4: e8068.
  23. O’Dushlaine C, Kenny E, Heron EA, Segurado R, Gill M, Morris DW et al. The SNP ratio test: pathway analysis of genome-wide association datasets. Bioinformatics 2009; 25: 2762–2763.
  24. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 2010; 38 (Database issue): D355–D360.
  25. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000; 25: 25–29.
  26. Ivansson EL, Rasmussen F, Gyllensten UB, Magnusson PK. Reduced incidence of cervical cancer in mothers of sons with allergic rhinoconjunctivitis, asthma or eczema. Int J Cancer 2006; 119: 1994–1998.
  27. Weiss MJ, Cole DE, Ray K, Whyte MP, Lafferty MA, Mulivor RA et al. A missense mutation in the human liver/bone/kidney alkaline phosphatase gene causing a lethal form of hypophosphatasia. Proc Natl Acad Sci USA 1988; 85: 7666–7669.
  28. Mornet E. Hypophosphatasia: the mutations in the tissue-nonspecific alkaline phosphatase gene. Hum Mutat 2000; 15: 309–315.
  29. Tanaka T, Scheet P, Giusti B, Bandinelli S, Piras MG, Usala G et al. Genome-wide association study of vitamin B6, vitamin B12, folate, and homocysteine blood concentrations. Am J Hum Genet 2009; 84: 477–482.
  30. Engelmark M, Beskow A, Magnusson J, Erlich H, Gyllensten U. Affected sib-pair analysis of the contribution of HLA class I and class II loci to development of cervical cancer. Hum Mol Genet 2004; 13: 1951–1958.
  31. Ivansson EL, Magnusson JJ, Magnusson PK, Erlich HA, Gyllensten UB. MHC loci affecting cervical cancer risk: distinguishing the effects of HLA-DQB1 and non-HLA genes TNF, LTA, TAP1 and TAP2. Genes Immun 2008; 9: 613–623.
  32. Leslie S, Donnelly P, McVean G. A statistical method for predicting classical HLA alleles from SNP data. Am J Hum Genet 2008; 82: 48–56.
  33. Rioux JD, Goyette P, Vyse TJ, Hammarstrom L, Fernando MM, Green T et al. Mapping of multiple susceptibility variants within the MHC region for 7 immune-mediated diseases. Proc Natl Acad Sci USA 2009; 106: 18680–18685.
  34. Gyllensten UB, Lashkari D, Erlich HA. Allelic diversification at the class II DQB locus of the mammalian major histocompatibility complex. Proc Natl Acad Sci USA 1990; 87: 1835–1839.
  35. Gyllensten UB, Sundvall M, Erlich HA. Allelic diversity is generated by intraexon sequence exchange at the DRB1 locus of primates. Proc Natl Acad Sci USA 1991; 88: 3686–3690.
  36. Stern LJ, Brown JH, Jardetzky TS, Gorga JC, Urban RG, Strominger JL et al. Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide. Nature 1994; 368: 215–221.
  37. Sidney J, Steen A, Moore C, Ngo S, Chung J, Peters B et al. Five HLA-DP molecules frequently expressed in the worldwide human population share a common HLA supertypic binding specificity. J Immunol 2010; 184: 2492–2503.
  38. Diaz G, Amicosante M, Jaraquemada D, Butler RH, Guillen MV, Sanchez M et al. Functional analysis of HLA-DP polymorphism: a crucial role for DPbeta residues 9, 11, 35, 55, 56, 69 and 84-87 in T cell allorecognition and peptide binding. Int Immunol 2003; 15: 565–576.
  39. Rabbee N, Speed TP. A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 2006; 22: 7–12.
  40. Di X, Matsuzaki H, Webster TA, Hubbell E, Liu G, Dong S et al. Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bioinformatics 2005; 21: 1958–1963.
  41. Zheng SL, Sun J, Wiklund F, Smith S, Stattin P, Li G et al. Cumulative association of five genetic variants with prostate cancer. N Engl J Med 2008; 358: 910–919.
  42. Saxena R, Voight BF, Lyssenko V, Burtt NP, de Bakker PI, Chen H et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 2007; 316: 1331–1336.
  43. Duggan D, Zheng SL, Knowlton M, Benitez D, Dimitrov L, Wiklund F et al. Two genome-wide association studies of aggressive prostate cancer implicate putative prostate tumor suppressor gene DAB2IP. J Natl Cancer Inst 2007; 99: 1836–1844.
  44. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 2010; 42: 937–948.
  45. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.
  46. Tian C, Gregersen PK, Seldin MF. Accounting for ancestry: population substructure and genome-wide association studies. Hum Mol Genet 2008; 17 (R2): R143–R150.
  47. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.
  48. Erlich H, Bugawan T, Begovich AB, Scharf S, Griffith R, Saiki R et al. HLA-DR, DQ and DP typing using PCR amplification and immobilized probes. Eur J Immunogenet 1991; 18: 33–55.
  49. Lindblom B. Proceedings of the 11th International Histocompatibility Workshop 1995.
  50. Aldener-Cannava A, Olerup O. HLA-DPB1 typing by polymerase chain reaction amplification with sequence-specific primers. Tissue Antigens 2001; 57: 287–299.
  51. Allen M, Sandberg-Wollheim M, Sjogren K, Erlich HA, Petterson U, Gyllensten U. Association of susceptibility to multiple sclerosis in Sweden with HLA class II DRB1 and DQB1 alleles. Hum Immunol 1994; 39: 41–48.
  52. Gonzalez-Galarza FF, Christmas S, Middleton D, Jones AR. Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic Acids Res 2011; 39 (Database issue): D913–D919.


This study was supported by grants from the Swedish Cancer Foundation and the Knut and Alice Wallenberg Foundation to Ulf Gyllensten. Henry Erlich is employed by Roche Molecular Systems, which kindly provided reagents and protocols for HLA-DPB1 typing. The population allele and genotype frequencies were based on samples regionally selected from Sweden, obtained from the data source funded by the Nordic Center of Excellence in Disease Genetics. The study from which control set 1 originated was supported by Novartis Pharmaceuticals, Sigrid Juselius Foundation, Folkhälsan Research Foundation and the Swedish Research Council Linné grant. The study from which control sets 2 and 3 originated was supported by grants from the National Cancer Institute (CA106523, CA105055, CA95052, CA112517, CA58236, CA86323); Department of Defense (PC051264); Swedish Cancer Society; and Swedish Research Council.

Supplementary Information accompanies the paper on the Genes and Immunity website.