Introduction

Staphylococcus aureus infections are a leading cause of disease in humans. A growing body of evidence suggests that genetic variation can influence susceptibility to infection with S. aureus.1 Genetic variation is associated with susceptibility to S. aureus infection in inbred mice,2, 3 sheep4 and cattle.5 In humans, first-degree relatives of patients previously hospitalized with S. aureus bacteremia (SAB) were themselves significantly more likely to develop SAB than the population as a whole.6 Our recent genome-wide study of >50 000 white subjects found that two single-nucleotide polymorphisms (SNPs) in the human leukocyte antigen (HLA) class II region on chromosome 6 were associated with susceptibility to S. aureus infection at a genome-wide significant level.7

African Americans (AA) represent a complex mixture of West African and European ancestry, with an average of 20% of each AA genome inherited from European ancestors.8 Such admixture is traditionally regarded as a liability in genetic association studies and is typically avoided by studying genetically homogeneous populations. This, in turn, has contributed to under-representation of AAs in large-scale genetic association studies in general9 and in S. aureus susceptibility studies in particular. For example, all three of the genome-wide association studies (GWASs) performed to evaluate genetic susceptibility to S. aureus were conducted exclusively in individuals of European descent.7, 10, 11 This is problematic, as rates of invasive S. aureus infection are significantly higher among AAs than among European-descended populations.12, 13

Admixture mapping (AM) is an innovative strategy to overcome the issue of genetic admixture in human genotyping studies. For diseases where associated genetic variants differ substantially in frequency between ancestral populations, admixed individuals with disease will have an overrepresentation of ancestry from the population with the higher proportion of risk alleles.9 For this reason, AM can be more powerful than standard genetic association methods in cases where there is substantial genetic heterogeneity at a disease locus.9 Diseases for which risk varies substantially between ancestral populations make particularly good targets for AM studies. For example, AM has been used successfully in hypertension,14 prostate cancer,15, 16 peripheral artery disease17 and type II diabetes.18

In the current study, we evaluated the role of genetic variation in susceptibility to SAB in AAs. We used AM to test the hypothesis that, at particular sites in the genome, cases will be enriched for a given local ancestry relative to that expected given their average genome-wide admixture proportion. Such sites could represent the presence of risk alleles or the absence of protective alleles in the population showing enrichment.

Results

A total of 390 cases and 175 controls were included in the analysis. Compared with AA controls, SAB cases were more likely to be diabetic (48.4 vs 33.7%, P<0.001) and hemodialysis dependent (42.3 vs 5.7%, P<0.001), less likely to have a neoplasm (10.6 vs 23.4%, P<0.001) and less likely to have had a recent surgical procedure (20.7 vs 36.0%, P<0.001) (Table 1). Genome-wide ancestry for each sample was defined as the average of local ancestry estimates across the genome. Genome-wide European ancestry was similar in cases and controls (cases 20.4%, s.d. 1.5%; controls 20.9%, s.d. 2.4%) (Table 2).

Table 1 Characteristics of the admixture mapping population
Table 2 Genome-wide average proportions of European ancestry

No SNPs exhibited genome-wide statistically significant evidence of case–control association (accounting for admixture) using the MIX statistic at the traditional GWAS multiple-comparison corrected threshold of P<5.00e-08. For the admixture score (ADM) test, strong correlations between test statistics due to large blocks of local ancestry make conventional thresholds inappropriate. Therefore, we implemented a permutation scheme to derive empirical multiplicity adjusted P-values (described in the Materials and methods section and in complete detail in the accompanying Supplementary Materials). Using this permutation scheme, the empirical threshold used to declare statistically significant increased admixture was P=9.46e-05, which corresponds to the 95th percentile of the permutation distribution of the maximum ADM test statistic.
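As an illustration of how such a threshold can be read off a permutation distribution, the following minimal Python sketch takes a hypothetical matrix of permuted statistics (one row per permutation replicate, one column per locus) and returns the upper-tail quantile of the per-replicate maximum. The function name, the input layout and the convention that larger statistics are more extreme are assumptions made for illustration; the permutation scheme actually used in this study is described in the Supplementary Materials and is not reproduced here.

```python
import math


def empirical_threshold(perm_stats, alpha=0.05):
    """Return the (1 - alpha) quantile of the per-replicate maximum statistic.

    perm_stats[b][j] is the test statistic at locus j in permutation
    replicate b (hypothetical layout; larger values are more extreme).
    """
    if not perm_stats:
        raise ValueError("at least one permutation replicate is required")
    # One maximum per replicate captures the most extreme locus genome-wide.
    maxima = sorted(max(row) for row in perm_stats)
    # Index of the smallest order statistic covering a (1 - alpha) share.
    k = math.ceil((1 - alpha) * len(maxima)) - 1
    return maxima[k]
```

Under these assumptions, a locus whose observed statistic exceeds the returned value would be declared significant at family-wise level alpha; the same cutoff can equivalently be reported on the P-value scale.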

Using the empirical threshold value, one region on chromosome 6 in the HLA class II region (52 SNPs; from physical position 32377284 to 32660943 (hg19); see Supplementary Materials for list) exhibited increased admixture association at a level of genome-wide significance (P=4.56e-05<9.46e-05) (Figure 1).

Figure 1 ADM score P-values. The blue line indicates a significance threshold of 9.46e-05, corresponding to empirical multiplicity adjusted P-values.

Despite the significant evidence that local admixture proportions differ from the genome-wide average, visually there is no notable difference in the estimated proportion of European ancestry in cases and controls in the region on chromosome 6 (Figure 2).

Figure 2 Average local ancestry of 565 AA samples in chromosome 6. Visually, there is no notable difference in the estimated proportions of European ancestry between cases and controls; however, a region of 52 SNPs (from 32.37728 to 32.66094 Mb) was found to exhibit statistically significant increased European ancestry.

Discussion

The genetic basis of human susceptibility to S. aureus infection is poorly understood.19 The current investigation is the first to evaluate genetic susceptibility to S. aureus in AAs and is one of the first genetic studies of infectious disease risk in AAs. The study was made possible by using AM as an alternative analytical approach to examine the genetic basis of SAB in AA populations. Using this approach, one region on chromosome 6 met genome-wide significance thresholds.

This region overlaps the HLA class II region previously implicated in a GWAS of susceptibility to S. aureus infection.7 HLA has been identified as a potentially important determinant of the innate and adaptive immune response to infections caused by S. aureus and other Gram-positive bacteria.7, 20, 21, 22, 23, 24, 25 The current study adds to the body of evidence identifying HLA as a potential determinant of host susceptibility to S. aureus infection.

In our recent GWAS of 4701 unique white subjects with culture-confirmed S. aureus infection and 45 344 matched controls, two imputed SNPs near the genes encoding HLA-DRA and HLA-DRB1, in the HLA class II region of human chromosome 6, achieved genome-wide significance (rs115231074: odds ratio (OR), 1.22; P=1.3e-10; rs35079132: OR, 1.24; P=3.8e-08) and one adjacent genotyped SNP was nearly genome-wide significant (rs4321864: OR, 1.13; P=8.8e-08).7 In the current study, AA SAB cases show significantly increased European ancestry in the same region, although the strongest signal lies 5′ of the HLA-DRA gene; European ancestry in this region exceeds the genome-wide average in both cases and controls. The current study thus identifies the region implicated in our prior GWAS and extends the association of HLA class II with SAB to multiple ethnic groups. Collectively, these results suggest that efforts to follow up the association of SAB with HLA class II should consider European alleles and haplotypes in both European and African-descended populations.

The findings of the current study are consistent with a growing body of evidence that associates genetic variation in HLA with susceptibility to Gram-positive bacteria, including S. aureus. First, variations within the HLA class II region are the only ones to date to have been implicated on a genome-wide significance level as being associated with susceptibility to S. aureus.7 Second, haplotypes across the HLA class II region have been associated with invasive Streptococcus pyogenes infection20 and determine severity of response to bacterial superantigens from both Streptococcus pyogenes21 and S. aureus.22 Third, toxic shock syndrome toxin (TSST-1), a S. aureus superantigen that binds HLA-DR1,23, 24 is important in the pathogenesis of SAB and endocarditis.26 Fourth, nasal colonization with S. aureus is associated with the HLA-DR3 and HLA-DR7 class II serotypes.25 Taken together, these results suggest that further study is needed to better understand the role of variation within the HLA class II region in the pathogenesis of S. aureus infection.

The impact of host genetic variation on susceptibility to S. aureus infection is unresolved. Rates of S. aureus infections are significantly higher among distinct ethnic populations, including Australian27 and Canadian28 aboriginal populations, New Zealand Maori29 and AAs.12, 13 Although a twin study failed to demonstrate an association between nasal carriage and monozygotic twin status,30 a recent 20-year nationwide cohort study drawing from the entire Danish population of >8 million individuals showed evidence of familial clustering of SAB in first-degree relatives.6 Among the 34 774 individuals with a first-degree relative (index case patient) previously hospitalized with SAB who were followed for a mean of 7.8 years, a higher rate of SAB was observed than in the background Danish population (standardized incidence ratio (SIR): 2.49; 95% confidence interval (CI): 1.95–3.19). This estimate was significantly higher if the index case patient was a sibling (SIR: 5.01; 95% CI: 3.30–7.62) than if the index case patient was a parent, and was highest in siblings of individuals who developed non-hospital acquired SAB (SIR: 5.66; 95% CI: 3.47–9.24). These findings persisted after adjustment for a number of comorbid conditions.

The current study has limitations. Although we identified a single genome-wide significant admixture signal, additional sites may have gone undetected owing to sample size. We estimated a priori that we had 90% power to detect single loci explaining differences between observed AA and European rates,12, 13 but the difference in admixture proportion detected on chromosome 6 was more modest, suggesting that a single-locus explanation is unlikely and that the current sample is underpowered to identify multiple loci with lesser admixture signals. Admixture analysis in a control set is a useful way to filter results that are not associated with the disease trait; however, we cannot be certain that the controls were sufficiently exposed to S. aureus to develop bacteremia if susceptible, and this uncertainty about exposure status limited the utility of admixture proportions in controls. Finally, both cases and controls exhibited evidence of increased European ancestry in this region. However, this does not necessarily argue against the region having a role in susceptibility to infection, as many of the control subjects may simply have had insufficient exposure to S. aureus to develop bloodstream infections.

Despite these limitations, the current manuscript makes a key observation. Using an innovative statistical approach that allows analysis of an underrepresented study population, we have again identified HLA as a promising and biologically plausible determinant of genetic susceptibility to S. aureus at a genome-wide level of statistical significance. These results provide further support for association of SAB susceptibility with the HLA class II region. When placed in context, these results suggest that future studies of HLA-mediated susceptibility to SAB should consider the role of European class II haplotypes in determining risk in multiple ethnic groups with European admixture.

Materials and methods

Study sample

Our study used a case–control design. Data for cases were obtained from the S. aureus Bacteremia Group repository,11, 31 which has prospectively cataloged clinical data, bloodstream S. aureus isolates and/or human DNA from all consenting patients with SAB at Duke University Medical Center since 1994. Cases were unique AA adult inpatients with monomicrobial SAB. Controls were age-matched adult AA inpatients with no current or past S. aureus infection. The study was approved by the Duke University Institutional Review Board (IRB). All participants provided informed consent according to IRB policies. Patients dying of SAB prior to consent were included using IRB-approved policies for decedent research.

Genotyping and quality control (QC)

The Illumina Multi-Ethnic Genotyping Array (MEGA, comprising 1 779 819 markers; San Diego, CA, USA), developed to capture common and rare variation in a multi-ethnic sample and particularly enriched for variations initially detected in African and AA samples, was used to genotype 672 samples (450 AA SAB, 194 AA controls, plus 28 QC samples) at the University of Miami Hussman Institute for Human Genomics (HIHG) Center for Genome Technology. One CEPH standard sample was included on each 96-well plate to ensure reproducibility of results and check for plate rotation errors. Samples were processed 8 per MEGA ex-array and randomized with respect to case/control status prior to laboratory analysis. MEGA array images were analyzed using Illumina GenomeStudio (San Diego, CA, USA), using the manufacturer’s protocols and cluster files. Called genotypes were exported from GenomeStudio for QC analysis.

First, replicate CEPH samples were checked for concordance; plates with discordant CEPH replicates were re-examined and rerun if necessary. Following this step, formatted data analysis files were extracted for subsequent statistical analysis.

Power and sample size

Prior to initiating the study, we conducted analytic power calculations using standard power formulas.32 These formulas involve four model parameters: qA, the average amount of African ancestry; pA, the African risk allele frequency; pE, the European risk allele frequency; and γ, the risk conferred by each copy of the risk allele compared with the non-risk homozygote. A significance threshold of 3e-5 was used to account for the number of hypotheses planned to be tested. All parameters were calibrated using the published rates of invasive multidrug-resistant S. aureus in AAs and European-descended populations.12, 13 Under these assumptions, if a single genetic locus explained the observed differences in AA and European rates,12, 13 we would have >95% power to detect loci conferring increased risk for SAB in a sample of 450 AA cases and 194 AA controls.

QC filtering

Following genotyping, 430 SAB cases and 200 AA controls remained. Several QC filters were applied to the genotyped data: sample and SNP genotyping success rates (95%) (removal of 8 cases, 0 controls and 250 956 markers), Hardy–Weinberg equilibrium (P<0.001) (removal of 7900 markers), gender discrepancies (removal of 11 cases and 13 controls) and identity-by-descent analysis (relatedness estimate 0.10) (removal of 17 cases and 8 controls). Duplicate SNPs were also excluded (24 026 removed) as well as SNPs on chromosomes coded 0 (8422 removed), SNPs on chromosome 26 (265 removed) and SNPs on non-zero chromosomes with base-pair location 0 (17 removed). A total of 390 AA SAB cases, 175 AA controls and 229 860 SNPs remained for statistical analysis after QC filtering.
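The Hardy–Weinberg filter can be illustrated with a minimal Python sketch. It assumes a 1-df chi-square goodness-of-fit test on genotype counts; the text above does not state which HWE test was applied (exact tests are also widely used), and the function name and any counts passed to it are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes
    at a biallelic marker.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic markers).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Under this assumed test, a marker would be removed when the returned P-value falls below the 0.001 cutoff given above.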

Descriptive statistics

Selected baseline characteristics of the SAB cases and AA controls are summarized with frequencies (percentages). Comparisons between SAB cases and AA controls were made using Pearson’s chi-square test when cell frequencies were sufficient; otherwise Fisher’s exact test was used.
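A minimal Python sketch of this decision rule for a 2×2 table follows. The cutoff of 5 for the smallest expected cell count is a common convention assumed here, not a value stated in the text, and the function names are hypothetical. Pearson's test is computed without continuity correction, and Fisher's test is the two-sided version that sums the probabilities of all tables no more likely than the observed one.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """P-value of Pearson's chi-square test (1 df, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2.0))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_2x2(a, b, c, d, min_expected=5.0):
    """Pick the test from the smallest expected cell count; return (name, P)."""
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    if min(expected) >= min_expected:
        return "chi2", pearson_chi2_2x2(a, b, c, d)
    return "fisher", fisher_exact_2x2(a, b, c, d)
```

Here the rows would be cases and controls and the columns the presence or absence of a characteristic; the layout is illustrative only.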

Inference of local ancestry using LAMP-LD

Local ancestry is defined as the genetic ancestry of an individual at a particular chromosomal location, where an individual can have 0, 1 or 2 copies of an allele derived from each ancestral population.33 Several approaches have been proposed for local ancestry estimation in two-way admixtures, with methods that explicitly model the linkage disequilibrium (LD) structure within the ancestral populations showing the highest accuracy in AAs.34, 35, 36, 37, 38 In this analysis, we use LAMP-LD,39 a window-based algorithm combined with a hierarchical Hidden Markov Model to represent haplotypes in the ancestral population. Phased CEU and YRI haplotypes from the HapMap Phase 3 project40 were used as reference panels. A total of 229 860 SNPs were analyzed.
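To make the link between local and genome-wide ancestry concrete, the following Python sketch computes a sample's genome-wide European ancestry as the average of its local ancestry estimates, as defined in Results. It assumes local ancestry is coded as the number of European-derived copies (0, 1 or 2) at each SNP and that SNPs are weighted equally; the function name and this coding are illustrative assumptions, not the output format of LAMP-LD.

```python
def genome_wide_ancestry(local_copies):
    """Mean European ancestry proportion from per-locus copy counts.

    local_copies: sequence of 0, 1 or 2, the number of European-derived
    copies at each analyzed SNP for one sample (assumed coding).
    """
    if not local_copies:
        raise ValueError("no loci supplied")
    if any(c not in (0, 1, 2) for c in local_copies):
        raise ValueError("copy counts must be 0, 1 or 2")
    # Each locus contributes two chromosomes, hence the factor of 2.
    return sum(local_copies) / (2 * len(local_copies))
```

For example, a sample with copy counts 0, 0, 1, 2 and 0 at five loci carries three European-derived copies out of ten, giving a proportion of 0.3.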

Association statistics

The MIXSCORE software41 was used to evaluate two association statistics on the filtered admixed data. First, a case-only admixture score (ADM)41 was computed to test the hypothesis that the proportion of European ancestry at the candidate locus differs from the genome-wide proportion (local admixture enrichment). Second, because the causal SNP may have different allele frequencies in the ancestral populations, a mixed χ2(1df) score (MIX)35 was also calculated to jointly evaluate both differences in SNP allele frequencies in cases vs controls and differential admixture; the P-value cutoff for this statistic was 5.00e-08.
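The idea behind a case-only test of local admixture enrichment can be sketched in a few lines of Python. The code below is a simplified z-score that compares each case's local European ancestry with that case's genome-wide proportion, assuming the two chromosomes at a locus are independent draws under the null. It is an illustrative stand-in and is not the ADM statistic as implemented in MIXSCORE; the function names and input coding are hypothetical.

```python
import math


def local_enrichment_z(local_copies, genome_wide):
    """Case-only z-score for excess European ancestry at one locus.

    local_copies[i]: European-derived copies (0, 1 or 2) in case i.
    genome_wide[i]:  genome-wide European ancestry proportion of case i,
                     assumed to lie strictly between 0 and 1.
    """
    if len(local_copies) != len(genome_wide) or not local_copies:
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed minus expected local ancestry proportion, summed over cases.
    excess = sum(c / 2.0 - t for c, t in zip(local_copies, genome_wide))
    # Null variance of a mean of two Bernoulli(t) draws is t * (1 - t) / 2.
    variance = sum(t * (1.0 - t) / 2.0 for t in genome_wide)
    return excess / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided normal P-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A positive score indicates more European ancestry at the locus than expected from the genome-wide average, which is the direction of the chromosome 6 signal reported in Results.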

Adjustment for multiple testing

As blocks of local ancestry can span multiple loci, resulting in highly correlated tests, a standard Bonferroni correction is overly conservative. Therefore, we calculated multiplicity adjusted P-values using the permutation-based step-down maxT algorithm of Westfall and Young.42 Complete details of the method and its implementation relative to this study are given in Supplementary Materials.
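For readers unfamiliar with the procedure, a generic step-down maxT adjustment can be sketched in Python as follows. The sketch assumes that larger statistics are more extreme and that a matrix of statistics recomputed under permutation is already available; how the permutations were generated for this study is described in the Supplementary Materials and is not reproduced here. The function name and input layout are hypothetical.

```python
def maxt_stepdown(observed, permuted):
    """Step-down maxT adjusted P-values.

    observed[j]:    observed statistic for hypothesis j (larger = more extreme).
    permuted[b][j]: statistic for hypothesis j in permutation replicate b.
    Returns adjusted P-values in the original hypothesis order.
    """
    m = len(observed)
    if m == 0 or not permuted:
        raise ValueError("need at least one hypothesis and one replicate")
    # Rank hypotheses from most to least significant.
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)
    counts = [0] * m
    for row in permuted:
        running_max = float("-inf")
        # Walk from least to most significant, so that rank k is compared
        # with the maximum over ranks k..m-1 only (the step-down part).
        for k in range(m - 1, -1, -1):
            running_max = max(running_max, row[order[k]])
            if running_max >= observed[order[k]]:
                counts[k] += 1
    adjusted = [0.0] * m
    previous = 0.0
    for k in range(m):
        # Enforce monotonicity of adjusted P-values along the ranking.
        previous = max(previous, counts[k] / len(permuted))
        adjusted[order[k]] = previous
    return adjusted
```

Because each hypothesis is compared with the maximum over only the hypotheses ranked at or below it, the adjustment is never larger than a single-step maxT correction, while the joint permutation preserves the correlation between neighboring loci.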