Elevated risk of invasive group A streptococcal disease and host genetic variation in the human leucocyte antigen locus.

Invasive group A streptococcal (GAS) disease is uncommon but carries a high case-fatality rate relative to other infectious diseases. Given the ubiquity of mild GAS infections, it remains unclear why healthy individuals will occasionally develop life-threatening infections, raising the possibility of host genetic predisposition. Here, we present the results of a case-control study including 43 invasive GAS cases and 1540 controls. Using HLA imputation and linear mixed models, we find each copy of the HLA-DQA1*01:03 allele associates with a twofold increased risk of disease (odds ratio 2.3, 95% confidence interval 1.3-4.4, P = 0.009), an association which persists with classical HLA typing of a subset of cases and analysis with an alternative large control dataset with validated HLA data. Moreover, we propose the association is driven by the allele itself rather than the background haplotype. Overall this finding provides impetus for further investigation of the immunogenetic basis of this devastating bacterial disease.


Introduction
Invasive group A streptococcal (GAS) disease is defined by isolation of Streptococcus pyogenes at a normally sterile site. Although uncommon, the incidence rate reaching 3 per 100,000 in Northern Europe [1], the case-fatality rate is high relative to other infections, reaching 20% in some studies [2]. While infection can occur at a variety of sites, soft tissue infections predominate, of which necrotising fasciitis (NF) is a rare but particularly dangerous form often necessitating extensive surgical debridement. This and other forms of invasive GAS disease can be complicated by streptococcal toxic shock syndrome (STSS) characterised by hypotension, multi-organ failure and a case-fatality rate exceeding 40% [1].
Despite growing recognition of the importance of host genetic factors in susceptibility to infectious diseases, limited attention has so far been paid to host genetic susceptibility to invasive GAS disease [3]. The only study to investigate this in humans dates from the candidate gene era focussing on haplotypes in the class II region of the human leucocyte antigen (HLA) locus [4]. Rather than investigating susceptibility itself, this study reported specific haplotypes associated with severe disease defined by the presence or absence of hypotension and multi-organ failure. In particular, the authors found the HLA-DRB1*1501/HLA-DQB1*0602 haplotype to be associated with a fourfold reduced risk of severe disease among previously healthy individuals with invasive GAS disease [4]. Nonetheless, further support for the role of HLA in invasive GAS comes from several studies showing binding of GAS superantigens to HLA-DQ molecules [5]. In particular, streptococcal pyrogenic exotoxin A (SpeA), a key superantigen, binds HLA-DQA1 in a manner dependent on DQA1 polymorphism [6]. Added to this, transgenic mice expressing human HLA-DQ molecules were found to be highly sensitive to superantigens compared with non-transgenic littermates [7], while particular HLA-DQ molecules were associated with enhanced infection of the nasal cavity in a manner dependent on SpeA [8]. Nonetheless, despite rapid recent progress in the field of human genetics, the association between the HLA locus and invasive GAS has not been revisited, likely reflecting the challenges of recruiting patients with what is essentially a rare and extreme phenotype.
In the present study, we investigate the relationship between HLA class II alleles and susceptibility to invasive GAS disease, limiting our analysis to otherwise previously healthy children and young adults. Here, using contemporary methods that are robust to the major confounders of candidate gene approaches, we find the HLA-DQA1*01:03 allele to be associated with a twofold increased risk of susceptibility to invasive GAS disease. While this allele is not part of any of the haplotypes linked to the trait in the candidate gene era [4], it adds weight to the notion that HLA polymorphism contributes to the outcome of invasive GAS infections, perhaps in a manner dependent on GAS superantigens [6]. Overall this finding provides impetus for further investigation of the immunogenetic basis of this devastating bacterial disease.

Results and discussion
After quality control, we included 43 cases of European ancestry aged <65 years without comorbidity (Supplementary Table 1). Of these, 34 had been diagnosed with NF while nine had been diagnosed with other manifestations of invasive GAS disease ( Table 1). The youngest patient was 18 months and the eldest was 63 years (median 35 years, interquartile range 25-41 years). Four of the seven children had preceding varicella, five of the women were postpartum, two after caesarean section, and two other adult patients after other surgery. Otherwise, the patients had no risk factors for invasive GAS disease. For our primary case-control analysis we compared our cases with 1540 healthy children of European ancestry previously recruited to studies of vaccine efficacy undertaken by the Oxford Vaccine Group, University of Oxford, Oxford, UK. For sensitivity analyses, we compared our cases with 430 healthy adults of European ancestry, a subset of the 5544 individuals from the National Institute for Health Research Oxford Biobank, for whom validated HLA data were available [9].
We first considered genotypic associations in the extended major histocompatibility complex based on SNP genotyping. Among 434 directly ascertained genotypes with minor allele frequency (MAF) > 5%, the strongest association signal was found at rs2534816 (P LMM = 0.0013) located in the class I region 37 kb from HLA-E. The strongest signal among 137 variants in the class II region was found at rs9276171 (P LMM = 0.006) located in the intron of HLA-

DQB3.
Of 4585 imputed genotypes the strongest signal was found at rs2524222 (P LMM = 0.0005), located 49 kb from HLA-E, while the strongest class II signal was found at rs1383265 (P LMM = 0.004), located 8.5 kb from HLA-DQB2 ( Supplementary Fig. 1). We then proceeded to analyse associations based on HLA imputation. Of 160 imputed four-digit HLA alleles, of which 19 in the class I region and 27 in the class II region had MAF > 5%, the strongest signal was linked to HLA-DQA1*01:03 allele (Fig. 1a), which was found at MAF 12.7% in cases compared with 5.9% in controls (odds ratio, OR, 2.3, 95% confidence interval, CI, 1.2-4.4, P LMM = 0.009). Consistent with this, the presence of a lysine in place of an arginine at position 41, corresponding to rs36219699, and an alanine in place of a serine at position 130, corresponding to rs41547417, which together define DQA1*01:03, were similarly associated with disease (P LMM = 0.009). The DQA1*01:03 signal was marginally weaker when limiting the analysis to the 34 patients with NF (OR = 2.1, 95% CI 1.0-4.4, P LMM = 0.049) despite the fact it was preserved in an analysis based on the similarly sized subgroup of 32 patients aged <40 years (OR = 2.6, 95% CI Fig. 1 Classical HLA alleles associated with invasive GAS disease. a For each locus the negative common logarithm of the p-value from LMM analysis is plotted with two-digit alleles to the left and four-digit alleles to the right. b The same is plotted for an LMM analysis conditioned on HLA-DQA1*01:03. c For the primary analysis and three sensitivity analyses, effect size estimates for HLA-DQA1*01:03 are shown based on a logarithmic scale with the number of controls in each shown in brackets. The first three analyses use LMM with transformation [36], while the latter based on the entire Oxford Biobank uses logistic regression. d For both HLA-DQA1*01:03 and HLA-DQA1*05:01, effect size estimates are shown, on a logarithmic scale, each conditioned on the other allele 1.3-5.2, P LMM = 0.007). Two additional four-digit alleles and three additional amino acids in the class II region, along with five amino acids in the class I region were associated with susceptibility with P LMM < 0.05 (Supplementary Table  2, Supplementary Table 3). However, after controlling for the presence of DQA1*01:03, none of the four-digit class II alleles remained significant at this level (Fig. 1b).
To validate our findings, we performed classical HLA typing of the DRB1, DQA1 and DQB1 loci in 30 cases for which sufficient DNA was available. Across the 42 alleles observed, concordance with imputation was generally high, ranging from 85.0% for DRB1 through 91.7% for DQA1 to 96.7% for DQB1. Moreover, the six copies of DQA1*01:03 were perfectly imputed while only three of the remaining alleles were imputed with accuracy of 95% or less. We then reran our analyses substituting the available classical for imputed HLA types and found the effect size estimate for DQA1*01:03 unchanged (Fig. 1c).
We next repeated our analyses comparing the 43 cases to the alternative population of adult European controls from the Oxford Biobank among whom the MAF of DQA1*01:03 was also 5.9%. The effect size estimate for DQA1*01:03 remained unchanged in analyses using either logistic regression with all 5544 European individuals for whom validated HLA data were available, or a linear mixed model with the subset of 430 European individuals for whom genome-wide data were available (Fig. 1c), the latter correcting for ancestry and relatedness.
Next, we investigated effects of other DQA1 alleles using both linear mixed-models and logistic regression. Based on likelihood ratio, the best fit was achieved by a model comprising fixed effects parameters for both DQA1*01:03 and DQA1*05:01 (Fig. 1d), the latter having a weak protective effect (OR = 0.62, 95% CI 0.35-1.09). In this scenario, each copy of DQA1*01:03 was associated with a twofold increased risk of invasive GAS disease (OR = 2.1, 95% CI 1.2-4.1), an effect size and allele frequency that would imply a population attributable fraction of 11.6%. In addition, the effect size estimates for the two alleles were highly consistent across a number of alternative analytical approaches including logistic regression with or without principal components [10] and a generalised linear mixedmodel analysis ( Supplementary Fig. 2), also termed logistic mixed-model analysis [11].
Finally, we investigated whether the signal was better explained by haplotypes or individual alleles. Having defined nine three-locus class II haplotypes with MAF > 5%, we tested their association with susceptibility. Of these only the DRB1*13:01-DQA1*01:03-DQB1*06:03 haplotype, which had a MAF of 11.6% in cases compared with 5.6% in controls, was significantly associated with susceptibility (OR 2.2, 95% CI 1.2-4.3, P LMM = 0.015). Interestingly, one copy of the rarer DRB1*15:02-DQA1*01:03-DQB1*06:01 was also present among the cases giving a MAF 1.1% compared with 0.23% among controls, although this difference was not statistically significant (OR = 5.1, 95% CI 0.8-34, P LMM = 0.09). We did not observe the DQA1*01:03 allele in any other haplotype with MAF down to 0.01%. No further class II alleles with MAF > 5% were associated with susceptibility, including the previously implicated DRB1*15:01-DQA1*01:02-DQB1*06:02 haplotype [4], which was present in 14.0% of cases and 15.8% of controls (P LMM = 0.67). However, consistent with the same earlier report, the DRB1*14:01-DQA1*01:01-DQB1*05:03 haplotype, with MAF 5.8% in cases compared with 2.5% in controls, was associated with increased risk of disease (OR = 2.4, 95% CI 0.9-5.9, P LMM = 0.067), a signal that remained apparent after controlling for DQA1*01:03 (OR 2.5, 95% CI 1.0-6.1, P LMM = 0.043). In the earlier report, the DRB1*14:01-DQA1*01:01-DQB1*05:03 haplotype was found at higher frequency in cases of invasive GAS with severe systemic disease than either controls from the general population or cases of invasive GAS without severe systemic disease [4]. While the former comparison is analogous to our analysis, the effect reported in that study was limited to the invasive GAS cases without NF, a finding that was not apparent from our data, with the caveat that the small numbers in both studies prevent a definitive conclusion. In our analysis, the signal at this haplotype is most likely explained by DQB1*05:03, which was excluded from our primary analysis due to MAF 2.6% but showed the same borderline association with susceptibility (P LMM = 0.064). Otherwise none of the previously implicated haplotypes were associated with susceptibility (Supplementary Table 4).
Limited effort has to date been documented investigating host genetic susceptibility to invasive GAS disease. As a starting point to further study in this area, we have demonstrated an association between the HLA-DQA1*01:03 allele and susceptibility to invasive GAS disease in otherwise healthy children and adults. Importantly, we are encouraged by the high level of consistency of the HLA-DQA1*01:03 association across a variety of sensitivity analyses including using data based on classical typing and use of an alternative control dataset. The presence of the rarer HLA-DQA1*01:03 haplotype in one of 43 cases raises the possibility the association is driven by the allele rather than the background haplotype.
Beyond the earlier report linking the class II region to invasive GAS disease [4], HLA has long been implicated in a range of infectious, autoimmune and other diseases [12,13]. Moreover, the class II region has been a key finding in a number of recent GWAS of bacterial diseases including the somewhat analogous syndrome of invasive Staphylococcus aureus infection [14,15]. The DQA1*01:03 allele itself has not previously been implicated in susceptibility to infection but has been linked to several autoimmune and inflammatory diseases including primary sclerosis cholangitis [16], systemic lupus erythematous [17] and idiopathic achalasia [18]. More recently, DQA1*01:03 was part of one of several risk haplotypes that may potentially explain the HLA susceptibility locus in rheumatic heart disease, a post-infective complication of GAS infection [19]. Thus, while further work will be required to finemap the rheumatic heart disease association, it is possible that at least some genetic architecture may be shared across GAS diseases.
Interaction between HLA molecules and GAS superantigens has long been thought to play a key role in the pathogenesis of invasive GAS disease leading to activation of large numbers of T-cells [3]. This process results in massive production of cytokines causing widespread tissue damage, disseminated intravascular thrombosis and organ dysfunction which characterise the clinical picture [20]. Crucially, binding of HLA by superantigens is largely antigen independent and usually occurs at residues outside the peptide-binding cleft [21]. Moreover, SpeA, a key superantigen, binds with higher affinity to cell lines expressing DQA1*01 alpha chains compared with DQA1*03 or DQA1*05 chains, to which very little binding was detected [6]. By analogy to binding of staphylococcal enterotoxin B to DRB1 [22] and streptococcal superantigen to DQA1 [23], binding of SpeA to DQA1 is predicted to centre on a salt bridge formed between the glutamic acid at position 61 of SpeA and the lysine at position 42 of DQA1 [24], the latter widely termed position 39 in the superantigen literature in reference to the sequence of DRA [22][23][24]. Tantalisingly, however, in DQA1*01:03, the preceding arginine at position 41 is replaced by a second lysine, which could plausibly alter SpeA binding. Moreover, although heightened superantigen responsiveness might be expected to augment severity, it is also plausible that superantigens including SpeA may impair the acquisition of immunity to GAS thereby affecting susceptibility [25][26][27][28][29].
Our study has three main limitations. First our sample size is small, especially by the standards of modern genetic research. Despite this, we propose that power is likely to be increased by our focus on patients with an extreme and well-defined phenotype of whom more than three quarters had NF, and by using a large number of controls, giving us an effective sample size of 167 in the primary analysis. In addition, despite our more stringent upper age limit (65 vs 85 years), we include an equivalent number of previously healthy individuals with severe systemic disease (43 vs 44 cases) to that in the only comparable report in the literature [4]. Thus it is of particular note that, although we have not made comparisons between severe and non-severe disease, we see very limited signal at the haplotypes reported to influence susceptibility in that report [4]. One possible explanation for this difference is that, reflecting advances in genetic analysis since the publication of that report, our dataset underwent rigorous quality control including removal of individuals of outlying genetic ancestry limiting the risk of confounding due to issues such as differences in the genetic ancestry of cases and controls [30]. Moreover, we analysed our data using linear mixed-models further curtailing confounding due to ancestry and relatedness [31] which could plausibly contribute to the previously reported signals [4]. This issue is also relevant to a recent study [29] linking the DQB1*06:02 allele to recurrent GAS-associated tonsillitis, the findings of which are difficult to interpret due inclusion of a mixture of Caucasian and Hispanic individuals without correction for ancestry at the analytical stage. Nonetheless, even allowing for the high level of consistency across our sensitivity analyses, it is plausible that, owing to the small sample size, we may be overestimating the effect of DQA1*01:03 while being underpowered to detect other signals, including that linked to the rarer DQB1*05:03 allele which might also influence susceptibility. Overall further studies will be needed to confirm or refute the effect of DQA1*01:03 on susceptibility to invasive GAS disease.
Second it is likely that having ascertained the cases through a patient group and a sample bank from a single institution they are not fully representative of invasive GAS disease in the general population, not least because those recruited through the patient group were all survivors who had predominantly suffered NF. That said, prospective recruitment at multiple institutions would be a costly and challenging endeavour which would have been hard to justify without the preliminary work presented here. Moreover, we consider the ascertainment of 34 otherwise healthy individuals with NF aged <65 years an accomplishment in itself, one that was possible only through the close involvement of a patient group.
Third with our current dataset we are unable to deconvolute whether the HLA-DQA1*01:03 allele drives susceptibility to all invasive GAS disease or has a more specific effect on NF, although an effect on NF alone may be less likely given the weaker signal in the analysis limited to that subgroup. Similarly, due to limited data available on many cases, we are unable to ascertain whether the effect is dependent on variation in the bacteria, including the presence or absence of specific superantigen genes, or is influenced by other factors such as viral coinfections including influenza or varicella. Looking forward, however, we anticipate such questions will become answerable through large-scale prospective studies which will require collaborations involving investigators from multiple institutions and countries.
In summary, we have confirmed an association between class II polymorphism and invasive GAS disease, resolving it to a specific DQA1 allele. Future research into the genetic basis of this devastating disease may bring about much-needed progress in development of vaccines or other therapeutics.

Materials and methods
Genetic data from cases of invasive GAS disease came from a newly genotyped sample collection, while genetic data from controls was from two existing datasets from earlier studies.
Cases aged <65 years without comorbidity were either survivors recruited retrospectively with informed consent through the STREP GENE study (National Research Ethics Service Ref 13./SC/0520; ClinicalTrials.gov NCT01911572) from a patient group called the Lee Spark NF Foundation (www.nfsuk.org.uk) or identified from a bank of samples at Imperial College London linked to limited clinical data that had been prospectively assembled from material surplus to diagnostic requirement (National Research Ethics Service Ref. 06/Q0406/20). Owing to the preliminary nature of this study, the rarity of invasive GAS disease, and uncertainty about the expected effect size, we did not specify a sample size in advance. Those recruited through the patient group had survived an episode of invasive GAS disease at a UK hospital since 1980 with microbiological confirmation obtained either through Public Health England or from the treating hospital. Participants submitted a saliva sample using Oragene® kits (DNA Genotek, Canada) from which DNA was extracted using the accompanying extraction kits. Those identified from the sample bank had been diagnosed with invasive GAS disease at the Imperial College Healthcare NHS Trust, London, UK, since 2006. DNA was extracted from stored tissue or serum using the Gentra® Puregene® Tissue kit (Qiagen®, USA) or the QIAamp® Circulating Nucleic Acid kit (Qiagen). Cases were genome-wide genotyped using either the HumanCore platform (Illumina®, USA) or the Global Screening Array (Illumina). Controls for the primary analysis were children and adolescents recruited to various UK studies of vaccine efficacy for whom samples had been stored by the Oxford Vaccine Centre Biobank, University of Oxford, UK. These individuals had previously been genome-wide genotyped using the HumanOmniExpress platform (Illumina). Additional control data was available from the National Institute for Health Research Oxford Biobank including 5544 individuals for whom validated HLA data were available [9].
Quality control was undertaken using standard approaches [30] but with an additional test [32] aimed at identifying variants that differed between cases and controls due to the different platforms used for genotyping (Supplementary Table 1, Supplementary Fig. 3, Supplementary Fig.  4, Supplementary Fig. 5). During this process, seven cases were excluded, three on the basis of non-European ancestry, in addition to the six earlier exclude due to an age of 65 years or more (n = 4) or comorbidity (n = 2). In total 119,134 variants genotyped on all three platforms were carried forward of which 434 were located in the extended major histocompatibility complex. For HLA imputation we used SNP2HLA software (version 1.0.3) without the default parameters using the prebuilt Type 1 Diabetes Genetics Consortium reference panel [33]. With an overlap of 367 variants, we successfully imputed a total of 160 four-digit HLA alleles and 1097 HLA amino-acid substitutions with imputation accuracy assessed exceeding 0.6 using the Beagle software [34] (v3.0.4) R 2 metric. Of these 47 fourdigit alleles (27 in class II) and 869 amino acid substitutions (429 in class II) had minor allele frequency >0.05. To minimise the effects of population structure and cryptic relatedness, we performed our primary analyses using linear mixed-models implemented in GCTA software [35] (v1.26.0), limited to variants with MAF > 5% and with genotype represented by the dose of the minor allele estimated by imputation. We performed further analyses including estimation of effect sizes by transformation [36] in R (v3.0) using amongst other tools the GenABEL [37] and GMMAT packages [11], and estimated the population attributable fraction as previously defined [38]. To define three-locus haplotypes in the class II region, we phased four-digit alleles using Phase [39] software (v2.1.1) before extracting the probability of one or two copies of a given haplotype in each individual to define the dose of the minor allele. Finally, in a subset of samples, classical HLA typing of the class II locus using sequence-specific primer amplification was performed at the Transplant Immunology Laboratory at the Oxford Transplant Centre as previously described [40,41].

Data availability
Genotype and phenotype data from invasive GAS cases underlying this paper have been deposited in the European Genome-phenome Archive (www.ega-archive.org) under accession number EGAS00001003421 with access permitted for further research on susceptibility to invasive GAS disease. In addition, a preprint of the paper was released through bioRxiv: https://doi.org/10.1101/559161. Centre for Human Genetics for generating the genotyping data, subsidised by Wellcome Trust Core Awards (090532/Z/09/Z and 203141/ Z/16/Z). We also thank the Oxford Biomedical Research Computing facility, a joint development between the Wellcome Centre for Human Genetics and the Big Data Institute supported by Health Data Research UK and the National Institute for Health Research Oxford Biomedical Research Centre with funding from the Wellcome Trust Core Award Grant Number (203141/Z/16/Z). We also acknowledge the support of the National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK, the National Institute for Health Research Imperial Biomedical Research Centre, London, UK and the National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, London, UK. None of these funders had any role in study design, data collection and analysis, decision to publish or preparation of the paper. Moreover, the views expressed here are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research or the Department of Health. We thank the Lee Spark NF Foundation (www.nfsuk.org.uk; Charity No. 1088094) for assistance with recruitment as well as the patients and family members who took part in the STREP GENE study, as well as the volunteers who participated in the both the Oxford Vaccine Group (www.ovg.ox.ac.uk) and the Oxford Biobank (www.oxfordbiobank.org.uk) studies.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.