The development of chronic viral infection represents a failure to mount an adequate innate and/or adaptive response to a specific pathogen. Infection with hepatitis C virus (HCV) in humans is a paradigm of a dichotomous outcome of infection: approximately three quarters of acute HCV infections evolve to a chronic state, but one quarter are spontaneously cleared1. It is therefore likely that genetic predispositions, especially at loci that regulate the innate and/or adaptive immune response, contribute strongly to the development of chronicity. A prior genome-wide association study (GWAS) conducted by our consortium demonstrated striking associations of spontaneous resolution of HCV with polymorphisms near the IFN-L3 locus (IL28B) and in the HLA class II locus (Duggal et al.2).

The associations identified in Duggal et al. span large genomic regions, specifically 55,000 base pairs for the IL28B locus and >1 megabase for the HLA class II locus. Recent advances in genomic technologies allow a more precise characterization of genetic associations and help resolve them to much smaller genomic regions. Firstly, ImmunoChip3, a customized array platform with deeper coverage of loci implicated in autoimmune diseases, covers additional genomic variants and offers an opportunity to explore with greater precision the contribution of these loci to the clearance of viral infection. Secondly, additional coverage of the MHC region can be gained using an imputation algorithm that takes into account the long-range linkage disequilibrium in the MHC, together with a large customized reference panel with improved coverage of the MHC region4. Thirdly, fine-mapping algorithms5,6, designed to resolve known genetic associations to smaller sets of variants, can be applied to the high-density genomic data to further improve the precision of the genetic associations.

We therefore conducted an analysis of a large pool of spontaneous resolvers and patients with chronic HCV infection using the ImmunoChip platform, the SNP2HLA algorithm with the T1DGC MHC imputation reference panel4, and a recently developed fine-mapping algorithm6,7 to (1) more precisely define the susceptibility variants within the known associated loci; and (2) identify additional loci associated with clearance. Similar successes have been achieved in other conditions such as inflammatory bowel diseases6,8. Additionally, we explored the hypothesis that there are shared mechanisms that define a “brisk” immunity able to confer both susceptibility to autoimmune disease and improved control of pathogens. We also examined the influence of region (North America versus Europe) upon associations with HLA within the European-ancestry cohort, as previous studies have shown variability of results, especially for the class I locus9,10,11,12.


The final dataset after QC has 166,537 variants for 527 cases/828 controls of European ancestry, and 171,161 variants for 75 cases/171 controls of African ancestry (Table 1). For each ancestry, we performed logistic regression under the additive model using the first two principal components as covariates. The QQ plots (Fig. 1, using common variants with >2% minor allele frequency) and the genomic control (GC) factors (0.98 for the European ancestry and 0.92 for the African ancestry, using designated null variants) indicate effective control of population stratification.

Table 1 Variants and samples in this study.
Figure 1

QQ plot for cohorts of European (a) and African (b) ancestries. The red line indicates the null distribution. Only variants with minor allele frequency >2% were used in this figure.
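As an illustration of how a genomic control factor of this kind can be obtained from association p-values, the following minimal sketch converts two-sided p-values to 1-df chi-square statistics and divides their median by the median of the null distribution. It assumes the p-values come from 1-df tests on the null variants; the function name `genomic_control_lambda` is hypothetical and this is not necessarily the exact procedure used in this study.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom, obtained as
# the square of the 75th-percentile standard normal quantile (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Return the genomic control factor for two-sided, 1-df test p-values.

    Each p-value must lie in (0, 1]. Hypothetical helper for illustration.
    """
    nd = NormalDist()
    # The lower-tail quantile of p/2 gives -|z|; squaring yields the
    # chi-square statistic and stays numerically stable for tiny p-values.
    chi2_stats = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values close to 1 indicate little inflation of the test statistics; values well above 1 would suggest residual stratification.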

For European samples, we identified 8 genome-wide significant variants (p-value < 5E-8) in two loci (Fig. 2 and Table 2). The variant on chromosome 19, rs8099917, shows the strongest association with spontaneous clearance (p-value = 5E-16). Patients carrying the minor allele, G, are on average 2.5x (odds ratio = 0.39) less likely to spontaneously clear the virus compared to those with two copies of the C allele. This variant is roughly 7,000 base pairs upstream of the IL28B gene, and has been previously reported to be associated with HCV spontaneous clearance2 and the response to chronic HCV therapy in Asian populations13. Previous studies have also shown an association between IL28B and interferon-based clearance of HCV14, and an association between a frameshift variant upstream of IL28B and impaired clearance of hepatitis C virus15. Because the IL28B/IFNL4 region was not designed as a high-density locus in ImmunoChip, we could not test other variants in this region for association with HCV spontaneous clearance and were unable to provide better resolution at this locus.

Figure 2

Manhattan plot for cohorts of European (a) and African (b) ancestries. The red horizontal line indicates the genome-wide significance threshold.

Table 2 Genome-wide significant associations.

The other genome-wide significant locus for the European samples is the major histocompatibility complex (MHC) locus. Genome-wide significant variants in this region are reported in Table 2 (before imputation). We used SNP2HLA4 and a customized reference panel from a T1D study to impute missing variants, HLA alleles and amino acid residues for this region. We identified 12 SNPs and 5 amino acids that are genome-wide significant (Fig. 3 and Table 3, boldfaced). No secondary signal in this region exceeded the suggestive significance threshold (1 × 10−5) after conditioning on the primary signal. Therefore, all variants reported in Table 3 account for the same association signal. Using a fine-mapping algorithm described in another study6,7, we constructed the 99% credible set, which is a set of variants that has a 99% probability of containing the causal variant in this locus (Table 3, full). Compared with the previous study2, which localized this association to a region of more than 1 megabase, we mapped this association to a much smaller region of 50,562 base pairs.

Figure 3

Regional association plot for the MHC class II region. Color indicates the linkage disequilibrium with the top associated variant (rs6457620).

Table 3 Associations in the 99% credible set in the MHC region.
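The idea of a credible set can be sketched generically as follows. This is not the fine-mapping algorithm of refs. 6,7; it is a simplified illustration that assumes a single causal variant in the locus, equal prior probabilities for all variants, and per-variant natural-log Bayes factors supplied as input. The function name `credible_set` and its input format are hypothetical.

```python
import math


def credible_set(log_bayes_factors, coverage=0.99):
    """Return the smallest list of (variant, posterior) reaching `coverage`.

    log_bayes_factors: dict mapping variant ID -> natural-log Bayes factor.
    Assumes one causal variant and equal priors (illustrative only).
    """
    # Subtract the maximum before exponentiating to avoid overflow.
    max_log = max(log_bayes_factors.values())
    weights = {v: math.exp(lbf - max_log) for v, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    posteriors = {v: w / total for v, w in weights.items()}

    # Add variants in decreasing order of posterior probability until the
    # cumulative probability reaches the requested coverage.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, prob in ranked:
        selected.append((variant, prob))
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected
```

Under these assumptions, a locus in which a few variants carry almost all of the posterior probability yields a small credible set, which is what allows an association to be narrowed to a shorter genomic interval.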

Neither the MHC nor the IL28B locus was genome-wide significant in the African-ancestry cohort. Using the heterogeneity test (fixed-effect, implemented in the R metafor package), we found that neither the MHC locus nor the IL28B locus has a significantly different effect size (p-values = 0.47 and 0.29, respectively) across the two populations. Therefore, the difference in significance is likely driven by differences in sample size and/or allele frequency.

In addition to the genome-wide significant loci, we examined genes outside the HLA that have been previously associated with HCV spontaneous clearance16. Only genes IFNG-AS1 (p-value = 6E-4) and STAT1 (p-value = 3E-4) showed marginal evidence of association (p-value < 1E-3), and this effect was observed only in the European cohort. No genes reached the marginal p-value threshold in the samples of African ancestry. IFNG-AS1 is a long noncoding RNA that is expressed in CD4 T cells and promotes Th1 responses17. STAT1 is one of the key mediators of the type I, II and III interferon responses.

Since HCV is particularly diverse, with up to a 30% difference at the amino acid level between major viral genotypes, the strain of infecting virus may influence HLA-mediated clearance11,18. Unfortunately, information regarding the virus genotype or subtype was not available in this study, so a direct comparison is not possible. However, an indirect comparison is possible by taking advantage of the observation that North American patients are much more likely to be infected with the 1a virus and European patients are much more likely to be infected with the 1b virus19. We observed that the association in the class II MHC locus, after accounting for the sample size, age, sex and exposure (Methods), is stronger in North American samples than in European samples (Fig. 4), with marginal significance (p = 0.044). This suggests that viral subtype may influence the genetic mechanism underlying the clearance of HCV. Meta-analysis by cohorts confirms this observation (Fig. 5). We also interrogated the potentially protective effect of certain SNPs associated with HLA class I alleles previously implicated in spontaneous clearance. No SNP associated with class I alleles reached genome-wide significance, including those associated with HLA-B*27 subtypes (p-values > 0.05). The association with the SNP most closely linked with HLA-B*57 and control of HIV-1 (rs2395029) was not genome-wide significant but showed a marked difference by continent (North America p-value = 8.6E-4, Europe p-value = 0.078, overall p-value = 1.0E-4), suggesting that any protective effect of this class I allele differs by region.

Figure 4

Regional association plot for the MHC class II region in North American samples (a) and European samples (b). Color indicates the linkage disequilibrium with their respective top associated variant.

Figure 5

Forest plot for the top MHC class II association (rs6457620). Cohorts have been grouped by the geographical location where they were collected: the top panel includes cohorts collected in North America, and the bottom panel includes cohorts collected in Europe.

Autoimmune disorders have been reported to have shared genetic susceptibility loci20,21. For each of 5 major autoimmune diseases (inflammatory bowel disease, systemic lupus erythematosus, rheumatoid arthritis, celiac disease and multiple sclerosis), we listed all variants that reached p-value < 0.001 (or the best variant) in this analysis. We found no shared variant after considering multiple testing. A full exploration of the hypothesis that susceptibility to autoimmunity also confers ability to clear HCV will require a larger sample size. This analysis was only performed in the European cohort because the African cohort has even less power due to its sample size, and GWAS results in samples of African ancestry for other autoimmune disorders are more limited.

An alternative approach, taken by the International Genetics of Ankylosing Spondylitis Consortium22, is to search for the reported associations with other diseases in loci having suggestive evidence (p-value < 1E-5), i.e., the MHC and the IL28B loci in this study. We performed the search only for IL28B because the MHC has already been implicated in many autoimmune disorders. We searched within 0.5 Mb around the lead SNP (rs8099917) in IL28B for associations with other diseases that have been reported in the NHGRI GWAS catalog (accessed on July 1, 2017). This catalog hosts published associations between genetic variants and thousands of diseases/traits, including autoimmune, inflammatory, cardiovascular, metabolic and brain diseases. Three SNPs were found to be in partial linkage disequilibrium (R2 ≥ 0.4) with our lead SNP in IL28B, including rs12980275 (R2 = 0.41) associated with lipid levels in hepatitis C treatment23, rs12979860 (R2 = 0.42) associated with chronic hepatitis C infection/response to hepatitis C treatment14 (discussed in the previous sections), and rs688187 (R2 = 0.40) associated with mucinous ovarian carcinoma24.


We have conducted a genome-wide association study using ImmunoChip to identify genomic variants underlying spontaneous clearance of HCV. Consistent with previous reports2, two loci were found to be significantly associated with spontaneous clearance of HCV in the European cohort. The ImmunoChip design, the imputation pipeline specifically designed for the MHC region and the novel fine-mapping algorithm facilitated the accurate characterization of classical HLA types and allowed us to achieve a higher resolution in the MHC region. Twelve SNPs and 5 amino acids in the MHC region were found to be significantly associated, and no secondary signal remained after conditioning on the best SNP. Fine-mapping narrowed this association to a region of about 50 kilobases, down from more than 1 megabase in the previous study. This fine-mapping analysis was conducted in the European population. We note that if the MHC association is shared across populations, these fine-mapping results will also be generalizable to other populations.

We found no associated variants in the African cohort, probably due to different genetic background (in the case of the IL28B locus) and limited sample size (in the case of the MHC locus). Previous studies18 suggest that spontaneous clearance can be more common with one virus genotype than another25. We noted that the association in the class II MHC locus might be stronger in samples from North America than in those from Europe. While viral subtyping was not available with sufficient numbers in this cohort, the virus subtype 1a is more prevalent in North America than in Europe, where subtype 1b predominates. Previous studies showed that key polymorphisms between viral subtypes may influence HLA-restricted genetic associations underlying the clearance of HCV11,26. In HIV-1, viral mutational escape over the first decades of the epidemic reduced the protective effect of key HLA alleles on a population level27. For HCV, additional evidence, such as virus typing, is needed to confirm this finding.

Limitations of this study include the inability to dissect SNPs near the IL28B/IFNL4 region, as this locus had not been previously implicated in autoimmune GWAS. While the ImmunoChip did include rs8099917 as a surrogate for this region, additional information regarding associations with rs12979860 and ss469415590 is not available15. Also, this study was a fine-mapping exercise that narrowed the MHC association significantly but was not fully independent due to considerable overlap with the previous GWAS.

Previous studies of GWAS data revealed that there are SNPs and loci with evidence of association across multiple immune-mediated diseases20. We found several variants that have suggestive and plausible evidence of associations with both HCV spontaneous clearance and another autoimmune disorder. Despite the observation that none of these variants are significant after the strict Bonferroni correction, they jointly confirm the concept that shared genetic mechanisms underlie autoimmune disorders and suggest the hypothesis that susceptibility to autoimmunity may also confer ability to clear HCV. Fuller exploration of this hypothesis will require further analyses with larger sample sizes.


Overview of samples

1,944 samples from 13 cohorts (ALIVE, BBAASH, HGDS, MHCS, Rosen and colleagues, REVELL, BAHSTION, SWAN, Toulouse, Cramp and colleagues, Hencore, Mangia and colleagues, UK Drug Use Cohort) were genotyped in this study, as previously described2. Self-clearance of HCV was coded as cases (718 samples) and persistence of HCV was coded as controls (1,180 samples). Samples with unidentified clearance status were not used (46 samples). All samples were genotyped using Illumina’s ImmunoChip, a custom Infinium chip with 196,524 SNPs and small indels. A large number of these variants are in 187 high-density regions known to be associated with twelve autoimmune disorders and inflammatory diseases. Variants in these high-density regions include 289 established associations, variants from the 1000 Genomes Project low-coverage pilot 1 study28, and variants discovered in re-sequencing29. In addition, roughly 25,000 variants were included for replication of unrelated diseases as part of the WTCCC2 project; these serve as null SNPs in analyses.

Sample ethnicities

To identify the sample ethnicities, we first constructed the principal component axes using HapMap samples. 988 founders from HapMap phase 3 (draft release 2)30, including samples from ethnicities ASW, CEU, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI and YRI, were used. To calculate the principal components, only common variants that are also present in the ImmunoChip were used, and A/T and G/C SNPs were excluded to avoid ambiguous strand alignment. We performed LD pruning of the variants, resulting in a total of 15,525 variants used to create the principal components. The study samples were then projected onto the principal component axes and assigned ethnicities based on their distance to the HapMap samples. Out of 1,898 samples, 1,416 samples were mapped to European ancestry, 255 samples were mapped to African ancestry and 227 samples were admixed and were not used in this study.
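The distance-based assignment described above could be sketched as follows. The exact distance rule used in this study is not specified, so the nearest-centroid rule, the `max_distance` cutoff and the function names below are all assumptions made for illustration.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def assign_ancestry(sample_pcs, reference_pcs, max_distance):
    """Assign a projected sample to the nearest reference population.

    sample_pcs: tuple of principal component coordinates for one sample.
    reference_pcs: dict mapping population label -> list of PC tuples.
    max_distance: hypothetical cutoff; samples farther than this from every
        reference centroid are labeled "admixed".
    """
    centroids = {label: centroid(pts) for label, pts in reference_pcs.items()}
    label, dist = min(
        ((lab, math.dist(sample_pcs, c)) for lab, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_distance else "admixed"
```

In this sketch, a sample lying between two reference clusters exceeds the cutoff for both and is set aside as admixed, mirroring the exclusion of admixed samples described above.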

Quality control

QC was performed separately on samples of European and African ancestries. Variants that failed the Hardy–Weinberg equilibrium test in controls (p-value ≤ 1E-5) or had a low call rate (≤95%) were identified; 24,820 variants were removed in European samples and 20,196 variants were removed in African samples. The remaining variants were used to perform sample QC. Samples were removed for a low call rate (≤95%) or a high heterozygosity rate (>3 standard deviations from the mean).
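The sample-level filters above can be sketched as below. This is a minimal illustration under stated assumptions: per-sample call rates and heterozygosity rates are already computed, the mean and standard deviation are taken over all samples in the ancestry group, and only heterozygosity above the mean is flagged. The function name `failing_samples` is hypothetical.

```python
from statistics import mean, stdev


def failing_samples(call_rates, het_rates, min_call_rate=0.95, n_sd=3.0):
    """Return the set of sample IDs that fail sample-level QC.

    call_rates, het_rates: dicts mapping sample ID -> rate (same keys).
    A sample fails if its call rate is <= min_call_rate or if its
    heterozygosity rate exceeds the mean by more than n_sd standard deviations.
    """
    mu = mean(het_rates.values())
    sd = stdev(het_rates.values())
    failed = set()
    for sample_id, call_rate in call_rates.items():
        low_call = call_rate <= min_call_rate
        high_het = het_rates[sample_id] - mu > n_sd * sd
        if low_call or high_het:
            failed.add(sample_id)
    return failed
```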

We then created an LD-pruned dataset for calculating the identity by state (IBS) matrix and the principal components. We pruned the variants using a sliding window of 50 variants, a step size of 5 variants and a variance inflation factor threshold of 1.25. There were 20,782 variants in European samples and 21,778 variants in African samples after the pruning. The IBS matrix was calculated using this LD-pruned dataset and checked for sample relatedness. 28 duplicated samples in European cohorts and 9 duplicated samples in cohorts of African ancestry were identified and removed (pi_hat > 0.9). The final dataset has 527 cases and 828 controls for European cohorts, and 75 cases and 171 controls for African cohorts.
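For reference, the variance inflation factor (VIF) threshold can be translated into a correlation threshold. Under the standard definition of the VIF, where R^2 is the multiple correlation coefficient obtained by regressing one variant on the other variants in the window (an assumption here, since the pruning software's exact convention is not stated above), a threshold of 1.25 corresponds to R^2 = 0.2:

```latex
\[
\mathrm{VIF} = \frac{1}{1 - R^{2}}
\quad\Longrightarrow\quad
R^{2} = 1 - \frac{1}{\mathrm{VIF}} = 1 - \frac{1}{1.25} = 0.2
\]
```

In other words, under this definition a variant is pruned when more than 20% of its variance is explained by the other variants in the window.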

To correct for population stratification within the European and African samples, we calculated the principal components for samples of European ancestry and African ancestry, respectively. The first two principal components sufficiently control the population stratification in both ancestries (results not shown) and were used in the association analysis as covariates.


Imputation of the MHC region was performed on QC cleaned data using SNP2HLA4. This software package takes advantage of the long-range linkage disequilibrium between HLA loci and SNP markers across the MHC region and can perform accurate imputation of classical HLA types starting from SNP genotype data. The reference panel was created using the Type 1 Diabetes Genetics Consortium’s high quality HLA reference panel (roughly 5,000 European samples), which includes classical HLA alleles and amino acids at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci.

Association test

All association tests were performed in PLINK (v1.07)31 using logistic regression. We assumed additive models and used the first two principal components as covariates in the regression. HCV spontaneous clearance was coded as case, so an odds ratio >1 indicates that the tested allele increases the probability of spontaneous clearance.
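To make the additive coding and the direction of the odds ratio concrete, here is a simplified, self-contained sketch of a logistic regression with a single genotype predictor, fitted by Newton-Raphson. It is not the PLINK implementation and, unlike the analysis above, it omits the principal component covariates; the function name `fit_additive_logistic` is hypothetical.

```python
import math


def fit_additive_logistic(dosages, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * dosage by Newton-Raphson.

    dosages: copies (0, 1 or 2) of the tested allele for each sample.
    is_case: 1 for spontaneous clearance, 0 for persistence.
    Returns (b0, b1); exp(b1) is the per-allele odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the information matrix (negative Hessian).
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With clearance coded as case, `math.exp(b1)` below 1 means each additional copy of the tested allele lowers the odds of spontaneous clearance, and a value above 1 means it raises them.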

Test of heterogeneity across North American and European samples

To evaluate whether the effect of the MHC association is consistent across samples from North America and Europe, we conducted the association test with age, gender and the HCV exposure (IDU vs. non-IDU) as covariates to control for potential confounding. We only used samples that have non-missing measurements in these variables. For North American samples, we have 173 cases (spontaneous clearance) and 298 controls; for European samples, we have 144 cases and 266 controls. The heterogeneity test was conducted using the odds ratio and standard error from the association test in a fixed-effect model implemented in the R metafor package.
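For two groups, a fixed-effect heterogeneity test of this kind reduces to Cochran's Q with one degree of freedom. The sketch below illustrates that calculation in Python rather than R; it is not the metafor implementation, it assumes the standard errors are on the log-odds scale, and the function name `two_group_heterogeneity` is hypothetical.

```python
import math


def two_group_heterogeneity(or_a, se_a, or_b, se_b):
    """Cochran's Q test for two odds ratios under a fixed-effect model.

    se_a and se_b are assumed to be standard errors of the log odds ratios.
    Returns (pooled_or, q_statistic, p_value); p uses chi-square with 1 df.
    """
    betas = [math.log(or_a), math.log(or_b)]
    weights = [1.0 / se_a ** 2, 1.0 / se_b ** 2]
    # Inverse-variance weighted (fixed-effect) pooled log odds ratio.
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Weighted squared deviations of each estimate from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    # Survival function of a 1-df chi-square: P(Z^2 > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(q / 2.0))
    return math.exp(pooled), q, p_value
```

A small p-value from this test indicates that the two effect sizes differ by more than their standard errors would explain.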

Use of experimental animals, and human participants

No experimental animals were used in this study. The study protocols were approved by the institutional review board (IRB) at each center involved with recruitment (listed at the end). Informed consent and permission to share the data were obtained from all subjects, in compliance with the guidelines specified by the recruiting center’s IRB. All experiments were performed in accordance with relevant guidelines and regulations.

  • Massachusetts General Hospital

  • Johns Hopkins School of Medicine

  • Division of Cancer Epidemiology and Genetics, National Cancer Institute

  • Casa Sollievo della Sofferenza Hospital, Italy

  • Imperial College London

  • Toulouse III University, France

  • Plymouth Hospitals, UK

  • Weill Cornell Medical College

  • Blood Systems Research Institute

  • University of Cambridge

  • University of Colorado

Data availability

The data that support the findings of this study are available from the corresponding authors but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request.