HLA-associated susceptibility to childhood B-cell precursor ALL: definition and role of HLA-DPB1 supertypes

Childhood B-cell precursor (BCP) ALL is thought to be caused by a delayed immune response to an unidentified postnatal infection. An association between BCP ALL and HLA class II (DR, DQ, DP) alleles could provide further clues to the identity of the infection, since HLA molecules exhibit allotype-restricted binding of infection-derived antigenic peptides. We clustered >30 HLA-DPB1 alleles into six predicted peptide-binding supertypes (DP1, 2, 3, 4, 6, and 8), based on amino acid dimorphisms at positions 11 (G/L), 69 (E/K), and 84 (G/D) of the DPβ1 domain. We found that the DPβ1 11-69-84 supertype GEG (DP2) was 70% more frequent in BCP ALL (n=687; P<10−4) than in controls, and 98% more frequent in cases diagnosed between 3 and 6 years of age (P<10−4), but not in cases diagnosed at <3 or >6 years. Only one of 21 possible DPB1 supergenotypes, GEG/GKG (DP2/DP4), was significantly more frequent in BCP ALL than in controls (P=0.00004). These results suggest that susceptibility to BCP ALL is associated with the DP2 supertype, which is predicted to bind peptides with positively charged, nonpolar aromatic residues at the P4 position, and hydrophobic residues at the P1 and P6 positions. Studies of peptide binding by DP2 alleles could help to identify infection(s) carrying these peptides.

childhood malignancy in developed countries, where it constitutes over 30% of childhood cancers (Stiller et al, 1998; Smith et al, 1999). The striking age-incidence peak between 2 and 5 years of age consists mainly of common, B-cell precursor (BCP) ALL (Greaves et al, 1985, 1993; McKinney et al, 1993; Buckley et al, 1994). Molecular data indicate that BCP ALL can arise in utero in association with acquired chromosomal rearrangements that result in covert preleukaemic clones (Wiemels et al, 1999; Greaves, 2006), but progression to clinical ALL requires additional clonal genetic abnormalities, accumulated during a variable postnatal latent period. These may arise under the influence of an immune response to delayed infection (McNally and Eden, 2004; Greaves, 2006), but the lack of evidence incriminating a specific infectious agent (Greaves, 2006; MacKenzie et al, 2006) has hindered verification of this causal pathway.
Insights into the role of infection in the aetiology of BCP ALL could be provided by associations with HLA class II alleles (Dorak et al, 1995, 1999; Taylor et al, 1995, 1998, 2002). The highly polymorphic HLA-DR, -DQ, and -DP molecules are encoded by genes in the human major histocompatibility complex (MHC) and are responsible for binding and presenting infection-derived peptides to CD4+ T cells, leading to adaptive immune responses to infections (Cooke and Hill, 2001). The affinity of different HLA class II allotypes for infection-derived peptides is influenced by a series of discrete peptide-binding pockets (PBP) embedded in the antigen-binding groove of the HLA class II α/β heterodimer (Hammer et al, 1997).
Because T-cell responses to infection in the presymptomatic phase of BCP ALL are not readily accessible to functional analysis, HLA class II alleles provide a potential PBP 'footprint' of the infection that may be involved in this disease. However, tight linkage between the HLA-DR and DQ loci makes it difficult to distinguish the primary contribution of alleles at these loci. Contrasting patterns of DR-DQ allelic linkage disequilibrium (LD) in different ethnic groups (Oksenberg et al, 2004) could resolve this problem, but such studies have yet to be reported in childhood leukaemia. Since the HLA-DP locus is only weakly linked to DR-DQ (Begovich et al, 1992; Cullen et al, 2002), analysis of DP alleles in BCP ALL should identify associations independent of DR-DQ. We and others have previously reported associations between DP alleles and human leukaemia (Pawelec et al, 1988; Taylor et al, 1995, 2002). Furthermore, DP alleles are known to be associated with, or to act as restriction elements for, a number of parasitic (Meyer et al, 1994; May et al, 1998), microbial, and viral diseases, including hepatitis B and rabies (Celis and Karr, 1989; Celis et al, 1990), herpes simplex (Koelle et al, 2000), streptococcal infection (Dong et al, 1995), dengue virus (Kurane et al, 1993; Okamoto et al, 1998), Epstein-Barr virus (Voo et al, 2002), respiratory syncytial virus (RSV) (De Graaf et al, 2004; De Waal et al, 2004), and HIV (Cohen et al, 2006).
Peptide binding by HLA class II allotypes, including DP, is the outcome of interactions between the amino acid side chains of the peptide and four major peptide-binding pockets (1, 4, 6, and 9; Hammer et al, 1997). Because different alleles can have overlapping peptide-binding properties, depending on the number of PBP that they share (Southwood et al, 1998), DR alleles with the same amino acid polymorphisms lining specific peptide-binding pockets have been clustered into supertypes (Southwood et al, 1998; Doytchinova and Flower, 2005). Using a similar approach, Castelli et al (2002) defined three DP supertype clusters with shared amino acid residues in the P1 (β84) and P6 (β11) PBP. However, the P4 peptide-binding pocket, at position β69, also makes an important contribution to antibody and peptide binding (Arroyo et al, 1995; Chicz et al, 1997), T-cell responses (Berretta et al, 2003; Diaz et al, 2003), and disease susceptibility (Potolicchio et al, 1999; Wang et al, 1999). For this reason, we clustered >30 DPB1 alleles into six supertypes based on polymorphisms in three PBP, at positions 11, 69, and 84 of the β1 domain (i.e., pockets 6, 4, and 1). We compared their frequencies in childhood BCP ALL, non-BCP leukaemia, and solid tumours recruited as part of the UK Childhood Cancer Study (UKCCS, 2000) with those in newborn controls. We discuss the implications of our findings in relation to an infectious aetiology for BCP ALL.
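To make the grouping rule concrete, the sketch below keys an allele on its three pocket-lining residues only. It assumes each allele's DPβ1 protein sequence is available as a one-letter string numbered from residue 1 of the mature domain; `pocket_key` and the toy sequence are hypothetical illustrations, and real DPB1 sequences would have to come from an HLA sequence database.

```python
# Pocket-lining positions in the mature DPbeta1 domain (1-based numbering assumed),
# listed in the 11-69-84 order used to name the supertypes.
POCKET_POSITIONS = (11, 69, 84)  # pockets P6, P4 and P1, respectively


def pocket_key(beta1_seq):
    """Return the residues at beta11, beta69 and beta84 as a triplet such as 'GEG'."""
    if len(beta1_seq) < max(POCKET_POSITIONS):
        raise ValueError("sequence too short to contain position 84")
    # Residue numbers are 1-based; Python string indices are 0-based.
    return "".join(beta1_seq[pos - 1].upper() for pos in POCKET_POSITIONS)


# Invented 90-residue placeholder (NOT a real DPB1 allele) with
# G at position 11, E at position 69 and G at position 84.
toy = ["A"] * 90
toy[10], toy[68], toy[83] = "G", "E", "G"
print(pocket_key("".join(toy)))
```

Every other residue in the sequence is ignored, which is exactly the simplification the supertype scheme makes: alleles that differ elsewhere but share these three residues collapse into one cluster.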

Cases and controls
Childhood leukaemia cases were recruited between 1992 and 1998 as part of the UK Childhood Cancer Study (UKCCS, 2000). Leukaemias were classified as BCP ALL (CD10+, CD19+; n = 687) or non-BCP acute leukaemia. The non-BCP leukaemias comprised Pro-B ALL (CD10−, CD19+), T-ALL (CD2/CD7+, CD19−, DR−), and AML (n = 208 in total). Diagnostic immunophenotyping was carried out according to the protocol for UK Medical Research Council leukaemia trials (UKCCS, 2000). Childhood solid tumour cases (n = 409) were also recruited as part of the UKCCS. Umbilical cord blood samples from a cross-sectional series of normal white UK newborns (n = 864) born in Manchester, UK, between 1991 and 1997 were used as controls (Taylor et al, 2002). Blood sample collection and HLA molecular typing were carried out with national and local ethical approval. UKCCS patient data (diagnoses, gender, ages, ethnic background) were validated by the UKCCS data centre at the Epidemiology and Genetics Unit, University of York.

HLA-DPB1 molecular typing
HLA-DPB1 molecular typing was carried out as previously described in detail (Taylor et al, 2002). A 327 bp exon 2 product was amplified from each case and control genomic DNA sample using a single pair of generic DPB1 PCR primers; aliquots of each PCR product were spotted onto 384-sample nylon filters, and replicate filters were hybridised with a panel of 28 ³²P-labelled sequence-specific oligonucleotide probes. Probe hybridisation was detected by real-time autoradiography, and alleles were assigned from published DPB1 ideograms.
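Allele assignment from a dot-blot is, in essence, a search for the allele pair whose combined probe reactivity reproduces the observed positive probes. The sketch below illustrates that idea under loudly stated assumptions: the four alleles, six probes, and their reactivity patterns are invented, and `assign_genotypes` is a hypothetical helper, not the published DPB1 ideogram or the software used in this study.

```python
from itertools import combinations_with_replacement

# HYPOTHETICAL reference patterns: the set of probes each allele hybridises with.
# Real typing used 28 probes interpreted against published DPB1 ideograms.
REFERENCE = {
    "allele_A": {"p1", "p3", "p5"},
    "allele_B": {"p2", "p3", "p6"},
    "allele_C": {"p1", "p4", "p6"},
    "allele_D": {"p2", "p4", "p5"},
}


def assign_genotypes(positive_probes):
    """Return every allele pair whose combined probe pattern equals the observed one."""
    matches = []
    # Homozygous pairs (a, a) are included alongside heterozygous ones.
    for a, b in combinations_with_replacement(sorted(REFERENCE), 2):
        if REFERENCE[a] | REFERENCE[b] == set(positive_probes):
            matches.append((a, b))
    return matches


print(assign_genotypes({"p1", "p2", "p3", "p5", "p6"}))
```

Returning a list rather than a single call reflects the fact that, with a limited probe panel, more than one allele pair can give an identical hybridisation pattern; such ambiguous samples would need further resolution.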

Data analysis
DPB1 alleles in cases and controls were grouped into the six supertype clusters defined in this study (see Table 2 and Results for further details). Supertype allele and genotype frequencies in cases and controls were compared using global and univariate statistical analysis. As discussed previously (Taylor et al, 2002), ethical constraints precluded the collection of samples from case-matched control children, so we used local white UK newborns as controls. DPB1 alleles with a cumulative frequency of <5% that did not fall within the supertype clustering system were excluded from the analysis. Only sequence variation in the three peptide-binding pockets used for supertype clustering (positions 11, 69, and 84; pockets 6, 4, and 1, respectively) was included in the analysis. Global case-control supertype frequencies were compared using the CLUMP program of Sham and Curtis (1995), a Monte Carlo method that assesses the significance of the Pearson χ² statistic (T1) of the observed table against a series of simulated case-control tables. In univariate analysis, cross-product odds ratios (ORs) and 95% confidence intervals were calculated from case-control supertype and genotype frequencies by the RERI program in the Linkage Utility Package (LINKUTIL), using the Sheehe method. The 2by2 program in LINKUTIL was used to determine two-sided P-values for case-control supertype and genotype differences using Fisher's exact test. To achieve significance at P = 0.05 after correction, six supertypes require an uncorrected P-value <0.008, and 21 supergenotypes an uncorrected P-value <0.002. No correction for the total number of classical DP alleles was applied. POPGENE version 1.31 was used to test for two-locus linkage disequilibrium between DPB1 and DQA1 or DQB1 alleles.
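The global comparison can be illustrated with a simplified Monte Carlo analogue of CLUMP's T1 test. This is not the CLUMP program: it is a hedged sketch that builds null tables by shuffling case/control labels over the pooled allele copies, which keeps both row and column totals fixed, and it reports an empirical P-value for the observed Pearson χ². The function names are hypothetical and the supertype counts in the usage line are invented.

```python
import random


def pearson_chi2(table):
    """Pearson chi-square for a 2 x k table given as [case_counts, control_counts]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty columns
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_t1(cases, controls, n_sim=1000, seed=1):
    """Observed T1 and its empirical P-value from label-shuffled null tables."""
    k = len(cases)
    observed = pearson_chi2([cases, controls])
    # One entry per allele copy, holding its supertype (column) index.
    pool = [j for j in range(k) for _ in range(cases[j] + controls[j])]
    n_case = sum(cases)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pool)
        sim_cases = [0] * k
        for j in pool[:n_case]:
            sim_cases[j] += 1
        sim_controls = [cases[j] + controls[j] - sim_cases[j] for j in range(k)]
        if pearson_chi2([sim_cases, sim_controls]) >= observed:
            hits += 1
    # Counting the observed table as one of the simulations avoids P = 0.
    return observed, (hits + 1) / (n_sim + 1)


# Invented allele counts for six supertypes (DP1, DP2, DP3, DP4, DP6, DP8).
t1, p = monte_carlo_t1([30, 75, 20, 450, 25, 15], [55, 55, 22, 500, 28, 18])
print(f"T1 = {t1:.1f}, empirical P = {p:.4f}")
```

Simulating the null distribution, rather than reading P from the χ² distribution, keeps the test valid when some supertype columns have small expected counts.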

Case and control characteristics
The UKCCS is an epidemiological case-control study designed to test the role of environmental factors in the aetiology of childhood cancer and leukaemia (UKCCS, 2000). As part of the UKCCS, we obtained HLA-DPB1 types for 982 cases of childhood leukaemia (Taylor et al, 2002). Ninety-one percent of the leukaemia cases were classified as white, based on parental information; the remainder were Asian (3.8%), Black (1%), of mixed ethnicity (1.9%), of other ethnic groups (0.5%), or of unknown ethnicity. Of 875 cases of ALL, 559 were identified as BCP ALL, and a further 228 ALL cases were unclassified (Taylor et al, 2002). Subsequent diagnostic information for the unclassified ALL cases enabled us to identify 128 additional BCP ALL, seven Pro-B ALL, and six T-ALL cases. These were included in the present study, which therefore comprises 895 DP-typed cases of childhood leukaemia with a confirmed diagnosis, of which 687 were BCP ALL and 208 were non-BCP leukaemia cases (Table 1). A mixed diagnostic series of childhood solid tumour cases (n = 409), not including childhood lymphoma (Taylor et al, 2002), is included for comparison. Of these, 405 cases had informative ethnic data, and 91% of these were classified as white. Cord blood samples from a cross-sectional series of normal white UK term newborns (n = 864) were used as controls. Male:female ratios were slightly higher in the leukaemia cases (1.22) than in the solid tumours (1.14) and controls (1.01).
The association of BCP ALL with DP2 and DP8 raised the possibility of a chance finding. To test this, supertype frequencies in four BCP ALL case series were compared with controls: (1) cases included in our previous study (n = 559; Taylor et al, 2002); (2) half of the cases in the present study (n = 344); (3) half of the cases in the previous study combined with the 'new' cases (n = 343); and (4) the 'new' cases alone (n = 128). DP2 and DP8 were significant in all four case series, though only DP2 remained significant after correction (Table 5).
To determine the relationship between age at diagnosis of BCP ALL and DP supertype, we compared supertype frequencies in cases diagnosed at <3 years of age, 3-6 years, and >6 years with those in controls. Figure 1 shows that the risk of BCP ALL was increased by 98% in DP2+ cases diagnosed at 3-6 years (OR 1.9, 95% CI 1.4-2.6; P = 10−4), but the increase was not significant in BCP ALL diagnosed at <3 or >6 years. DP4 was significantly increased in BCP ALL diagnosed at <3 years, though not after correction. DP8 was not significant after correction, while DP1 was associated with protection from BCP ALL in all age groups.
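The univariate statistics quoted above (OR, 95% CI, Fisher's exact P) can be reproduced in outline from a 2×2 table of supertype carriers. This sketch is a hedged stand-in for the LINKUTIL programs: it uses Woolf's logit interval rather than the Sheehe method used in the study, so intervals may differ slightly, and it requires all four cells to be non-zero. The function names are hypothetical, and the usage line uses a textbook 2×2 table, not study data.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Cross-product OR with Woolf (logit) CI for the table [[a, b], [c, d]],
    where rows are cases/controls and columns are supertype-positive/negative."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P: total probability of all tables with the
    same margins that are no more probable than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that cell a equals x
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        px = prob(x)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for rounding ties
            total += px
    return min(1.0, total)


print(odds_ratio_woolf(1, 9, 11, 3), fisher_two_sided(1, 9, 11, 3))
```

An OR of 1.98 corresponds to the "98% increase" wording used in the text; the two-sided Fisher P is the quantity that is then compared with the corrected thresholds.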

Linkage disequilibrium analysis
To test whether the DP supertype associations could be explained by LD with HLA-DQ alleles, we analysed the co-occurrence of DP and DQ alleles in 451 BCP ALL cases using POPGENE. Only one DP allele, DPB1*1601, was in LD with a DQ allele (DQB1*0401; χ² = 37.4; uncorrected P < 10−4). Five BCP ALL cases (0.4%) typed positive for DPB1*1601, a frequency not significantly greater than that in controls, indicating that the DP supertype results cannot be explained by LD between DP and DQ alleles.
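For a single pair of alleles, the strength of two-locus LD can be summarised by D and r², and the corresponding 1-df χ² equals the number of haplotypes multiplied by r². The sketch below assumes phase-known haplotype counts, which is a simplification: POPGENE works from genotype data and its estimation procedure may differ. The function name is hypothetical and the counts in the usage line are invented.

```python
def ld_chi2(n_hap, n_ab, n_a, n_b):
    """D, r^2 and 1-df chi-square for LD between allele a (locus 1) and allele b (locus 2).

    n_hap: phase-known haplotypes counted (2N for N individuals)
    n_ab:  haplotypes carrying both a and b
    n_a:   haplotypes carrying a;  n_b: haplotypes carrying b
    """
    p_ab, p_a, p_b = n_ab / n_hap, n_a / n_hap, n_b / n_hap
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2, n_hap * r2  # chi-square (1 df) = (2N) * r^2


# Invented example: 900 haplotypes, allele a on 20, allele b on 90, 12 carry both.
d, r2, chi2 = ld_chi2(900, 12, 20, 90)
print(f"D = {d:.4f}, r2 = {r2:.3f}, chi2 = {chi2:.1f}")
```

Because χ² scales with r², a rare allele such as DPB1*1601 can show highly significant LD with a DQ allele while contributing too few carriers to account for a supertype-level association.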

DISCUSSION
Selective peptide binding by HLA allotypes is a prerequisite for the recognition of antigens by T cells leading to adaptive immunity (Madden, 1995). Such a mechanism may underpin the immune-mediated progression of pre-ALL to overt leukaemia following delayed postnatal infection (Greaves, 2006). In our previous study, we suggested that the presence of a glutamic acid (E) residue in pocket 4, at position 69 of the DPβ1 domain, was associated with BCP ALL (Taylor et al, 2002). However, HLA class II allotype-associated peptide binding is not the property of a single PBP; rather, it is the sum of a series of key PBP forming a DP allotype-associated peptide-binding motif or 'footprint'. Polymorphisms in the PBP accommodating the P1, P4, P6, and P9 amino acid anchors appear primarily to influence the DP allotype footprint (Hammer et al, 1997; Diaz et al, 2003, 2005). Because pocket 9 is shaped by polymorphisms at residues 9, 35, 36, 55, and 56, we excluded this level of complexity. Furthermore, grouping amino acid polymorphisms at positions 36, 56, and 76 failed to define recognised supertypes, and the resulting groups were not associated with leukaemia (data not shown). Clustering of DP alleles into six supertypes based on amino acid dimorphisms at positions 84 (P1 pocket), 69 (P4 pocket), and 11 (P6 pocket) represents an expanded version of the scheme proposed by Castelli et al (2002) based on peptide binding, and a slightly modified version of the hierarchical clustering scheme proposed by Doytchinova and Flower (2005). We have provisionally denoted the six supertypes DP1 (GKD), DP2 (GEG), DP3 (LKD), DP4 (GKG), DP6 (LED), and DP8 (GED), since they broadly resemble the DPw specificities defined in the primed lymphocyte test (PLT) (De Koster et al, 1991). Furthermore, HLA-DPw2 defined by PLT was previously reported to be associated with ALL (Pawelec et al, 1988).
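The six provisional supertypes amount to a lookup from the 11-69-84 residue triplet to a name. In this minimal sketch the dictionary simply restates the assignments above, the function names are hypothetical, and triplets outside the six clusters return `None`, mirroring the exclusion of unclustered alleles from the analysis.

```python
# Provisional supertypes keyed on DPbeta1 residues 11-69-84 (pockets P6, P4, P1).
SUPERTYPES = {
    "GKD": "DP1", "GEG": "DP2", "LKD": "DP3",
    "GKG": "DP4", "LED": "DP6", "GED": "DP8",
}


def supertype(triplet):
    """Return the supertype for an 11-69-84 triplet, or None if it is unclustered."""
    return SUPERTYPES.get(triplet.upper())


def supergenotype(triplet1, triplet2):
    """Order-independent supergenotype label such as 'DP2/DP4', or None."""
    s1, s2 = supertype(triplet1), supertype(triplet2)
    if s1 is None or s2 is None:
        return None  # alleles outside the six clusters are excluded
    # Sort numerically so that GKG/GEG and GEG/GKG give the same label.
    first, second = sorted((s1, s2), key=lambda name: int(name[2:]))
    return f"{first}/{second}"


print(supergenotype("GKG", "GEG"))
```

Keeping supergenotype labels order-independent matters because a child's two DPB1 alleles are unordered; the GEG/GKG genotype must be counted once, as DP2/DP4.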
The DPB1 locus is the second most polymorphic HLA class II locus after DRB1, with at least 120 alleles identified to date (http://anthonynolan.org.uk/HIG/lists/class2list.html). In a rare disease such as BCP ALL, in which there are likely to be multiple aetiological factors, weak HLA associations may require hundreds of cases and controls to survive correction for multiple testing. Supertype analysis, in which alleles are clustered according to common functional (i.e., peptide-binding) properties, mitigates this problem by reducing the number of comparisons. DPB1 alleles comprise combinatorial series of six variable regions (A-F) encoded by exon 2 (Bugawan et al, 1988), in which alleles with the same variable-region polymorphisms have the same peptide-binding pockets. DP alleles with the same polymorphisms at position 11 in variable region A, position 69 in variable region D, and position 84 in variable region F (Bugawan et al, 1988) can be predicted to have similar immune functions, based on identical P6, P4, and P1 PBP, respectively. Our supertype classification includes position 69 (P4 pocket) because this position is known to influence antibody binding (Arroyo et al, 1995), allorecognition and peptide binding (Diaz et al, 2005), and disease susceptibility (Potolicchio et al, 1999; Wang et al, 1999). Furthermore, it allowed us to split β69E alleles into three supertypes (GEG (DP2), LED (DP6), and GED (DP8)) and to compare these with three homologous β69K series (GKG (DP4), LKD (DP3), and GKD (DP1)).

Figure 1 Odds ratios for DPB1 supertype frequencies compared with normal newborns in relation to the age at diagnosis of BCP ALL. Ages at diagnosis: 0-<3 years (white bars), 3-6 years (grey bars), >6 years (checked bars). Vertical limits are 95% confidence intervals. †One-sided, corrected Fisher's P-values: 0-<3 years: DP4 = 0.018, DP3 = 0.012, DP1 = 0.012; 3-6 years: DP2 = 0.0006, DP1 = 0.012; >6 years: …
We observed a 70% increase in BCP ALL risk in children typing for DP2 (GEG), a 98% increase in DP2-associated risk between 3 and 6 years of age, and a 130% increase in risk associated with a single supergenotype, DP2/DP4. This association was not present in BCP ALL diagnosed at <3 or >6 years of age, which leads us to conclude that the age-incidence peak of BCP ALL (Greaves et al, 1985, 1993) may be influenced by the immunological sequelae of age-related interactions between DP2/DP4 and a specific antigenic peptide derived from delayed infection. Analysis of replicate case series, including the 128 BCP ALL cases new to this study, strongly suggests that the association with DP2 is unlikely to be due to chance. Furthermore, DP6, which also has E at position 69, was not associated with BCP ALL, but was associated with non-BCP leukaemia. Phylogenetic analysis suggests that the DPB1 peptide-binding motif may have undergone rapid recent diversification, and β69E alleles, such as DPB1*0201 and DPB1*0601, are not all closely related (Gyllensten et al, 1996). Supertype analysis groups HLA alleles with convergent immunological properties, based on common peptide-binding motifs (Hughes et al, 1996; Trachtenberg et al, 2003), and may therefore be more relevant to BCP ALL aetiology than analysis of individual alleles.
We measured the significance of case-control supertype frequency differences using Fisher's exact tests, corrected for six supertypes or 21 supergenotypes. We did not correct for the total number of DP alleles, because our analysis was informed by the results of our previous study (Taylor et al, 2002) and such a correction would have been unduly influenced by low-frequency alleles. Nevertheless, our results require confirmation in independent case-control series.
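The correction thresholds follow from simple counting. Assuming Bonferroni correction (the text does not name the method, but its thresholds match it), six supertypes give 0.05/6 per test, and the unordered pairs of six supertypes give 21 supergenotypes, 15 of them heterozygous; the quoted thresholds of 0.008 and 0.002 are these values rounded down.

```python
from itertools import combinations_with_replacement

SUPERTYPES = ["DP1", "DP2", "DP3", "DP4", "DP6", "DP8"]
ALPHA = 0.05

# Unordered pairs of supertypes, homozygous pairs included: C(6, 2) + 6 = 21.
supergenotypes = list(combinations_with_replacement(SUPERTYPES, 2))
n_heterozygous = sum(1 for a, b in supergenotypes if a != b)

# Bonferroni: per-test threshold = family-wise alpha / number of tests.
threshold_supertype = ALPHA / len(SUPERTYPES)
threshold_supergenotype = ALPHA / len(supergenotypes)

print(len(supergenotypes), n_heterozygous)  # 21 15
print(f"{threshold_supertype:.4f} {threshold_supergenotype:.4f}")  # 0.0083 0.0024
```

Equivalently, a "corrected" P-value is the uncorrected value multiplied by the number of tests (capped at 1), which is how corrected values such as those in the Figure 1 legend can be read.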
Although associations between childhood ALL and DR, DQ, and DP alleles have been reported in previous studies (Dorak et al, 1995, 1999; Taylor et al, 1995, 2002), the effect of LD between alleles at the different loci has not been tested. We found no evidence that the association of BCP ALL with DP2 could be explained by LD with DQ alleles, suggesting that DP has a primary role in susceptibility to BCP ALL. It is unlikely that the association of BCP ALL with DP2 is due to a defect in the immune response to an oncogenic virus (immune evasion): there is no evidence that childhood BCP ALL is caused by an oncogenic virus (MacKenzie et al, 2006), and the positive association with DP2 suggests that binding of specific peptide(s) and T-cell activation are involved in causation, which is inconsistent with immune evasion by an oncogenic virus. The negative association of DP1 with BCP ALL may be due to the binding and recognition of TEL-AML1 peptide(s) in children with pre-ALL who carry this supertype, as discussed elsewhere (Taylor et al, 2008), since a TEL-AML1 junctional peptide has been shown to elicit a DPB1*0501-restricted (DP1) CD4+ T-cell response (Yun et al, 1999).
The delayed-infection hypothesis for BCP ALL (Greaves, 2006) proposes that a child carrying an in utero-initiated preleukaemic clone is vulnerable to the development of leukaemia if insulated from infection during the early postnatal period but exposed at a later age. We previously reported that the risk of BCP ALL was greater in DPB1*0201 heterozygotes than homozygotes (Taylor et al, 2002), suggesting that BCP ALL might be the rare 'down-side' of the advantage that MHC heterozygosity confers on immune responses to infection. Although the evolution of HLA allelic diversity is thought to favour heterozygotes (Takahata and Nei, 1990), a recent study suggests that this advantage may be allele-specific (Lipsitch et al, 2003). Our finding that only one of 15 heterozygous supergenotypes, DP2/DP4 (GEG/GKG), is associated with BCP ALL fits this model.

Using DPB1*0201 peptide-binding data and molecular modelling (Diaz et al, 2005), it is possible to make predictions about the amino acid anchors at P1, P4, and P6 of peptides binding to DP2. Pocket 4 of DP2 is deeper and more negatively charged than that of DP4, giving it a greater affinity for positively charged nonpolar aromatic residues, such as glutamine (Q), arginine (R), and lysine (K). Furthermore, glycine (G) makes pocket 1 (β84) and pocket 6 (β11) deep and hydrophilic, preferentially binding hydrophobic and aromatic amino acids, notably phenylalanine (F) and tyrosine (Y) (Berretta et al, 2003; Diaz et al, 2003, 2005). This predicts that infectious peptides with the P1-P9 motif FXXKXFXXA/V (where X is any residue and P9 is A or V) are likely to bind to DP2.
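The predicted motif can be written as a pattern and used to scan candidate proteins for 9-residue cores. This is a hedged sketch of a naive screen, not a binding predictor: it assumes the core starts exactly at P1, ignores flanking residues and alternative binding registers, uses a hypothetical function name, and the protein string in the usage line is invented.

```python
import re

# Predicted DP2 core (P1-P9): F at P1, K at P4, F at P6, A or V at P9; any residue elsewhere.
# The zero-width lookahead lets overlapping windows be reported.
DP2_CORE = re.compile(r"(?=(F..K.F..[AV]))")


def find_dp2_cores(protein):
    """Return (0-based start, 9-mer) for every window matching the predicted motif."""
    return [(m.start(), m.group(1)) for m in DP2_CORE.finditer(protein.upper())]


print(find_dp2_cores("MAFGGKLFSTAQ"))  # [(2, 'FGGKLFSTA')]
```

A strict pattern like this treats the prediction as all-or-nothing; in practice anchor preferences are graded, so a screen of this kind would only shortlist candidate peptides for experimental binding studies.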
In this context, Van Steensel-Moll et al (1986) reported a negative (protective) association between childhood ALL and infections in the first year of life, and Rosenbaum et al (2005) documented a weak negative association between childhood ALL and bronchiolitis and pneumonia. Roman et al (2007) found a slight deficit of lower respiratory tract infection in the first year of life in UKCCS ALL cases diagnosed at 2-5 years. Because RSV is a major cause of bronchiolitis and lower respiratory tract infection in infancy, these findings together suggest that the immune response to RSV infection may be a factor in BCP ALL. RSV is a highly contagious, weakly pathogenic, but strongly immunogenic virus that is widely distributed in the childhood population (Handforth et al, 2000; McNamara and Smyth, 2002). The G protein of RSV elicits CD4+ T-cell responses (De Graaf et al, 2004; De Waal et al, 2004), and the peptide D162-N179 contains two overlapping T-cell epitopes, FHFEVFNFV (residues 163-171) and FEVFNFVPC (residues 165-173), that are restricted by DPB1*0401 (DP4) and DPB1*0201 (DP2) (De Graaf et al, 2004). Both peptides have F at P1 and P6, suggestive of binding to GEG (DP2) and GKG (DP4), consistent with the association of BCP ALL with DP2/DP4. While this conclusion is speculative, it points to a need for detailed sero-epidemiological studies of RSV in BCP ALL.
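The anchor reading of the two RSV G-protein epitopes can be checked mechanically. The sketch assumes, as the text implies, that each quoted 9-mer sits in the groove with its first residue at P1; if the true binding register were shifted, the anchors would change. The function name `anchors` is hypothetical.

```python
# 9-mer epitopes quoted in the text (RSV G protein residues 163-171 and 165-173).
EPITOPES = {"163-171": "FHFEVFNFV", "165-173": "FEVFNFVPC"}


def anchors(core):
    """Map the major pocket positions P1, P4, P6 and P9 to residues of a 9-mer core."""
    if len(core) != 9:
        raise ValueError("expected a 9-residue core")
    return {f"P{i}": core[i - 1] for i in (1, 4, 6, 9)}


for span, pep in EPITOPES.items():
    a = anchors(pep)
    print(span, a, "F at P1 and P6:", a["P1"] == a["P6"] == "F")
```

The same helper makes it easy to see where each epitope agrees with, and departs from, the predicted FXXKXFXXA/V core, since P4 and P9 are reported alongside P1 and P6.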