Global prevalence of pre-existing HCV variants resistant to direct-acting antiviral agents (DAAs): mining the GenBank HCV genome data

Direct-acting antiviral agents (DAAs) against hepatitis C virus (HCV) proteins have opened a new era for anti-HCV therapy, but DAA resistance-associated variants (RAVs) could jeopardize the effectiveness of DAAs. We report the global prevalence of DAA RAVs using published GenBank data. Overall, 58.7% of sequences (854/1455) harbored at least one dominant resistance variant, and the highest RAV frequency occurred in Asia (74.1%), followed by Africa (71.9%), America (53.5%) and Europe (51.4%). The highest RAV frequency was observed in genotype (GT) 6 sequences (99%), followed by GT2 (87.9%), GT4 (85.5%), GT1a (56%), GT3 (50.0%) and GT1b (34.3%). Furthermore, RAVs to non-structural (NS) 5A inhibitors and NS3 protease inhibitors were detected in 40.0% and 29.6% of sequences, respectively. However, RAVs to NS5B nucleos(t)ide inhibitors (NIs) and NI-based combinations were uncommon (<4% of sequences). As expected, combinations of multiple RAVs to the interferon (IFN)-free regimens recommended by current guidelines were rarely detected (0.2%–2.0%). Our results showed that the overall global prevalence of DAA RAVs was high irrespective of geography or genotype. However, the NI-based multi-DAA regimens had a low RAV prevalence, suggesting that these regimens are the most promising strategies for cure of long-term HCV infection.


Results
Screening of HCV genomic sequences. We identified 630,407 sequences from the NCBI Nucleotide Database in August 2014 using the key words "hepatitis C virus" or "HCV". After removing sequences shorter than 9000 bp, we narrowed the list to 2307 sequences of interest. After removing duplicates and sequences not derived from patients, we obtained a list of 1459 sequences (Fig. 1). GenBank accession numbers for all sequences are provided in Supplementary Table 1. Among these sequences, 91% (1327/1459) were confirmed to be DAA-naïve by checking their annotations and by reviewing all DAA-related clinical trials since 2003.
To investigate the prevalence of described RAVs in relation to investigational DAAs, we analyzed related amino acid substitutions separately for the 687 GT1a, 361 GT1b, 184 GT2, 48 GT3, 76 GT4 and 99 GT6 HCV sequences. The prevalence of RAVs in GT5 was not assessed because of the small number of available samples (n = 4).
However, there were several exceptions for different genotypes. In the NS3 region, the Q80K variant (associated with resistance to Simeprevir) was the most frequently observed among the GT1a sequences (37.6%, 258/687). In contrast, the Simeprevir-associated variant S122T was the most frequently detected in GT1b sequences (5.5%, 20/361). The variants L31M, P58S and Y93H in the NS5A region, and the variants L159F (Sofosbuvir) and S556G (Dasabuvir) in the NS5B region, were common in GT1b sequences (3.8%-9.7%). For other GTs, the Simeprevir-associated variant S122R in the NS3 region and the Daclatasvir-associated variant H58P in the NS5A region were common in GT2 sequences (45.1%, 78/173 and 50.8%, 88/173, respectively). The Q30K variant in the NS5A region, associated with resistance to Daclatasvir and Ledipasvir, was observed in 29.2% of GT3 sequences. The Q30R variant, associated with resistance to all three NS5A inhibitors, was mainly observed in the GT4 and GT6 sequences (55.3% and 24.2%, respectively). Furthermore, the Boceprevir-associated variant I170V in the NS3 region and the variants M28V and Y93S in the NS5A region, each associated with resistance to at least two NS5A inhibitors, were common in GT6 sequences as well (22.2%-65.7%; Table 1).
Global prevalence of DAA RAVs. The overall prevalence of RAVs to all nine DAAs examined was 58.7% (854/1455). When the analysis was more conservatively restricted to clinically relevant RAVs, 37.9% of the total sequences harbored at least one RAV (Fig. 2A). Geographically, the overall prevalence of RAVs in America, Europe, Asia and Africa was 53.5% (433/810), 51.4% (116/227), 74.1% (275/372) and 71.9% (30/42), respectively. The resistance rates observed in Asia and Africa were much higher than those observed in Europe and America (p < 0.05). The prevalence of clinically relevant RAVs was 48.4% in America, 29.3% in Europe, 18.5% in Asia and 31.3% in Africa. Oceania was excluded from this analysis because of the limited number of samples (four sequences; Fig. 3).

Prevalence of RAVs in various genotypes. In GT1a, the total frequency of RAVs was 56% and the highest prevalence of RAVs was observed in the NS3 region, especially RAVs to Simeprevir. In GT1b, the total frequency of RAVs was 34.3% and the RAVs were mainly detected in the NS5A region, particularly RAVs to Daclatasvir. Notably, the prevalence of RAVs to NS5B-related combinations was low in both GT1a and GT1b (Fig. 4A,B). The most commonly observed clinically relevant RAVs were RAVs to Simeprevir in GT1a and to Daclatasvir in GT1b (41.9% and 12.7%, respectively; Fig. 4C).
In other GTs, the overall prevalence of RAVs in GT2, GT3, GT4 and GT6 was 87.9%, 50%, 85.5% and 99%, respectively (Fig. 4A). The highest prevalence of RAVs in these GTs occurred in the NS5A region (41.7%-80.3%). Additionally, RAVs in the NS3 region were common in GT2 (mainly to Simeprevir) and GT6 (mainly to Boceprevir and Simeprevir) (59% and 92.9%, respectively). However, RAVs to combinations based on the NI Sofosbuvir were uncommon in GT3 and GT4 (2.1%-3.9%) but more frequent in GT2 and GT6 (4.0%-12.1%; Fig. 4A,B). Further analysis indicated that 16.2%, 29.2% and 31.6% of the sequences in GT2, GT3 and GT4, respectively, harbored clinically relevant RAVs. Clinically relevant RAVs in the NS5A region (mainly to Daclatasvir and Ledipasvir) were frequent in these GTs (15.6%-29.2%). Remarkably, none of the sequences in these GTs carried multiple clinically relevant RAVs to NI-related combinations (Fig. 4C,D).
Prevalence of RAVs to IFN-free regimens. IFN-free regimens were recently recommended for the clinical treatment of HCV infections by the Asian Pacific Association for the Study of the Liver (APASL) 14 , the European Association for the Study of the Liver (EASL) 15 and the American Association for the Study of Liver Diseases (AASLD) 16 . These recommended regimens included Sofosbuvir plus Ribavirin for GT2 and GT3 patients; Sofosbuvir plus Simeprevir for GT1 and GT4 patients; Sofosbuvir plus Ledipasvir for GT1, GT4, GT5 and GT6 patients; Sofosbuvir plus Daclatasvir for all GTs; and the combination of Paritaprevir, Ritonavir and Ombitasvir with Dasabuvir (3D) for GT1 naïve patients.
Multiple RAV combinations to these IFN-free regimens were observed, but the frequencies were extremely low. Only a few sequences carried a combination of multiple RAVs associated with resistance to Simeprevir plus Sofosbuvir, Daclatasvir plus Sofosbuvir, Ledipasvir plus Sofosbuvir and Paritaprevir/Ombitasvir plus Dasabuvir (0.9%, 2.0%, 1.3% and 0.1%, respectively; Fig. 5A). Similarly, within individual GTs, the total prevalence of multiple RAV combinations to these regimens was also low. An exception was the combination of multiple RAVs to Sofosbuvir plus Daclatasvir in GT2 and GT6, which was observed in 6.9% and 8.1% of sequences, respectively (Fig. 5B). Remarkably, multiple clinically relevant RAV combinations to these IFN-free regimens were not detected in this study.
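The counting rule behind these regimen-level frequencies can be illustrated with a minimal Python sketch. It assumes one reading of "multiple RAV combinations", namely that a sequence is counted against a regimen only when it carries at least one RAV to every drug in that regimen. The function names and the toy input below are hypothetical and are not taken from the study data.

```python
def resistant_to_regimen(drug_hits, regimen):
    """True only if the sequence carries at least one RAV to every drug in the regimen.

    drug_hits maps a drug name to the list of RAV labels found in one sequence
    (hypothetical data layout chosen for this sketch).
    """
    return all(drug_hits.get(drug) for drug in regimen)


def regimen_prevalence(sequences, regimen):
    """Return (count, total, percent) of sequences resistant to the whole regimen."""
    if not sequences:
        raise ValueError("no sequences supplied")
    count = sum(resistant_to_regimen(hits, regimen) for hits in sequences)
    return count, len(sequences), 100.0 * count / len(sequences)


# Toy input, not study data: four sequences with per-drug RAV calls.
toy_sequences = [
    {"Simeprevir": ["Q80K"], "Sofosbuvir": []},
    {"Simeprevir": ["Q80K"], "Sofosbuvir": ["L159F"]},
    {"Simeprevir": [], "Sofosbuvir": ["S282T"]},
    {},
]
print(regimen_prevalence(toy_sequences, ("Simeprevir", "Sofosbuvir")))
```

Under this rule a sequence with Q80K alone contributes to the Simeprevir frequency but not to the Simeprevir plus Sofosbuvir frequency, which is why combination frequencies can be far lower than single-drug frequencies.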

Discussion
Our current study demonstrated that the global prevalence of DAA RAVs was high (58.7%, 854/1455; between 53.5% and 74.1% in various geographical locations or between 48.4% and 99.0% in the HCV genotypes examined). RAVs in the NS5A and NS3 regions were most frequently observed; however, RAVs in the NS5B region were rare, especially in association with the recommended IFN-free regimens (0.1%-2.0%). When only clinically relevant RAVs were considered, the prevalence in these regions was lower.
RAVs were detected in up to 58.7% of the sequences analyzed in this study. This frequency is considerably higher than that observed in the previous study by Kuntzen et al. 17 , which reported that dominant DAA resistance variants occurred in 8.6% of treatment-naïve HCV genotype 1-infected patients in American and European populations. The large discrepancy between these studies may be the result of several factors. First, the current study included RAVs in the NS3, NS5A and NS5B regions, whereas Kuntzen et al. only included RAVs in the NS3 and NS5B regions. Second, sequences from more genotypes were included in the current study than in the Kuntzen et al. study, further contributing to the discrepancy. Finally, the understanding of HCV DAA RAVs is continuously improving, and more RAVs had been identified at the time of the current study (e.g. the variants at positions 80 and 122 in the NS3 region) than were known at the time of the Kuntzen et al. study. However, Mo et al. 18 reported a prevalence of RAVs in 80 DAA-treated patients with HCV genotype 1 that was considerably higher than that observed in the current study (94% vs. 58.7%). One explanation for this discrepancy may be that HCV adapts its genome to survive and increases its resistance both during and after DAA treatment. Thus, additional variants with increased resistance are expected to occur in DAA-treated patients when compared with DAA-naïve patients 19 .
The current study showed that RAVs to NS5A and NS3 inhibitors were common and occurred at a higher frequency than reported by previous studies 17,20 . This discrepancy might be due to the smaller sample sizes of the previous studies. Furthermore, the body of knowledge concerning DAA RAVs continues to grow, and discrepancies between the current and previous studies may be the result of an increase in the number of known RAVs. The variants L31M and Y93H, which induce resistance to Daclatasvir and Asunaprevir, were recently detected by ultra-deep sequencing analysis 21 . These variants were infrequently detected in the current study (1.8% and 4.3%, respectively). Conversely, the Q80K variant associated with Simeprevir resistance in GT1a patients 22 was more common in the current study (37.6%, 258/687). This result was supported by the results reported in another recent study 18 . The frequency of NS5B inhibitor RAVs was low in this study, especially RAVs to NIs. Notably, the S282T variant in the NS5B region leading to Sofosbuvir resistance 23,24 occurred in just one sequence. This observation was consistent with a previous study 20 .
Monotherapy with NS3 inhibitors resulted in the early emergence of drug resistance variants 25 . Therefore, the use of drug combinations, especially drugs with different mechanisms of action against HCV infection, could lead to a reduction in drug resistance and RAVs. Several clinical trials implementing various DAA combinations have reported increased sustained virological response (SVR), lower resistance rates and better drug safety profiles 26 . In this study, RAVs to the different combinations of DAAs were uncommon, especially RAVs to NI-related combinations of DAAs. Furthermore, when compared with the relatively low SVR and serious adverse effects associated with IFN therapy, the IFN-free regimens were a more effective anti-HCV treatment, especially in patients who could not tolerate IFN or in whom IFN treatment had failed. Some IFN-free regimens have recently been recommended by EASL, APASL and AASLD and have shown extremely high SVR. Combinations of multiple RAVs in the same sequence to the recommended IFN-free regimens were rare in the present study. This indicates that IFN-free regimens are more effective and should be considered the superior choice for clinical anti-HCV therapy.
The current study is novel and has a number of important strengths. First, we utilized full-length HCV genome sequences to analyze DAA resistance. This included all DAA resistance regions (the NS3, NS5A and NS5B regions). Second, we included all up-to-date approved DAA data in our data analysis. However, this study has some limitations as well. HCV genome sequence data were obtained from the NCBI nucleotide database. It is possible that some detailed information is missing from these database entries, so the potential for bias cannot be ruled out. For example, the database contained few Oceanic sequences and GT5 sequences, which hindered further analyses of these subpopulations.
In summary, the global prevalence of DAA RAVs was high, independent of global regions or HCV genotypes. Furthermore, the high frequencies mainly occurred in the NS5A and NS3 regions. However, RAVs to NI-related multiple DAAs were rare, suggesting that NI-based combination therapy is a promising strategy for HCV infection elimination. Our current data support the EASL, APASL and AASLD recommendations of IFN-free regimens for HCV infection control.

Methods
GenBank search strategy. HCV genomic sequences were retrieved from GenBank (http://www.ncbi.nlm.nih.gov/) in August of 2014 using the key words "hepatitis C virus" or "HCV". After the initial search, near full-length HCV sequences (> 9000 bp) were screened and any duplicate sequences or sequences from non-human hosts were discarded (Fig. 1). Finally, the following information was extracted for each sequence: GenBank Accession Number, serum or plasma collection time and geographic data.
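The screening steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Record` type and its `host` field are hypothetical, duplicates are assumed to mean identical nucleotide strings, and the handling of sequences of exactly 9000 bp is a guess because the text only states "> 9000 bp".

```python
from dataclasses import dataclass

MIN_LENGTH = 9000  # near full-length threshold stated in the text


@dataclass(frozen=True)
class Record:
    accession: str  # GenBank accession number
    sequence: str   # nucleotide sequence
    host: str       # hypothetical annotation field for the host organism


def screen(records):
    """Keep near full-length, human-derived, non-duplicate sequences."""
    seen = set()
    kept = []
    for rec in records:
        if len(rec.sequence) <= MIN_LENGTH:
            continue  # shorter than the near full-length threshold
        if rec.host != "Homo sapiens":
            continue  # non-human host
        key = rec.sequence.upper()
        if key in seen:
            continue  # duplicate, assumed here to mean an identical sequence
        seen.add(key)
        kept.append(rec)
    return kept
```

The first occurrence of each duplicated sequence is retained, so the result depends on input order only in which accession number represents a duplicated genome.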
HCV genotypes. HCV genotypes were retrieved and identified with the NCBI viral genotyping tool (http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi).
Variant analyses and definition. All DAA RAVs included in this study were identified from the most current available literature, as summarized in Fig. 6. To facilitate investigation of the prevalence of RAVs, clinically relevant RAVs selected during or after drug treatment in patients and obtained in phenotypic assays were differentiated from drug resistance variants observed in vitro. Little data has been published concerning RAVs for GT2-GT6, thus the information available concerning RAVs for these GTs was limited. Therefore, when information concerning clinically relevant RAVs for GT2-GT6 was missing, RAVs in GT1 were used as substitute in vitro RAVs for GT2-GT6. Sequences were aligned and analyzed with MEGA 5.0 software (Center for Evolutionary Medicine and Informatics, Tempe, AZ, USA). A variant type was described as the replacement of the consensus amino acid in the corresponding genotype with a novel one; for instance, Y93H and Y93N in the NS5A region were described as two variant types.
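The variant definition above can be illustrated with a short Python sketch. It assumes that each protein region has already been aligned to the genotype consensus without gaps, so that string index plus one equals the residue position; the function name, the lookup table layout and the toy sequences are hypothetical. The study itself used MEGA 5.0 for alignment and analysis.

```python
def find_ravs(protein, consensus, rav_table):
    """Return {label: drugs} for each known RAV found in one aligned protein sequence.

    rav_table maps (position, variant amino acid) to a tuple of drug names.
    A label such as 'Y93H' combines the consensus residue, the 1-based
    position and the observed residue.
    """
    if len(protein) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    found = {}
    for pos, (ref, obs) in enumerate(zip(consensus, protein), start=1):
        if obs != ref and (pos, obs) in rav_table:
            found[f"{ref}{pos}{obs}"] = rav_table[(pos, obs)]
    return found


# Toy NS5A-like consensus (not a real sequence): L at position 31, Y at position 93.
toy_consensus = "A" * 30 + "L" + "A" * 61 + "Y" + "A" * 7
toy_sample = "A" * 30 + "L" + "A" * 61 + "H" + "A" * 7

# Illustrative lookup table; a real table would be built from the literature.
toy_table = {
    (31, "M"): ("Daclatasvir",),
    (93, "H"): ("Daclatasvir",),
}
print(find_ravs(toy_sample, toy_consensus, toy_table))
```

Because the table is keyed on both position and observed residue, Y93H and Y93N are treated as two distinct variant types, matching the definition in the text.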
Statistical analyses. All data were presented as rates (%) and analyzed statistically using the chi-squared test with SPSS 17 software (SPSS Inc., Chicago, IL, USA). p values were calculated with two-tailed statistical analysis, and a p value ≤ 0.05 was considered statistically significant.
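As an illustration of the test named above, the following Python sketch computes a Pearson chi-squared statistic and a two-tailed p value for a comparison of two proportions. It is not the SPSS procedure used in the study: it assumes a 2 x 2 table and no continuity correction, and the function name is hypothetical. The example compares the Asian and European counts reported in the Results (275/372 and 116/227).

```python
import math


def chi2_2x2(a_pos, a_total, b_pos, b_total):
    """Pearson chi-squared test (no continuity correction) for two proportions."""
    table = [[a_pos, a_total - a_pos], [b_pos, b_total - b_pos]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("every row and column total must be non-zero")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi2_2x2(275, 372, 116, 227)  # Asia vs. Europe, counts from the Results
print(f"chi2 = {stat:.2f}, p = {p_value:.2g}")
```

For these counts the p value falls well below the 0.05 threshold, in line with the regional difference reported in the Results; a continuity-corrected test would give a somewhat smaller statistic.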