Introduction

Hepatitis C virus (HCV) infection is a global health problem; the latest data estimate 71.1 million viraemic chronic infections (1.1% of the world population)1. HCV displays high genetic heterogeneity and is classified into eight major genotypes (1–8)2 and 86 subtypes3. Worldwide, HCV genotypes (GTs) differ significantly in their epidemic history and may differ in their response to treatment, GT1 being the most prevalent4. Among patients coinfected with HCV and human immunodeficiency virus (HIV), subtype 1a (GT1a) prevails5,6,7.

Despite the high cure rates achieved with the new direct-acting antivirals (DAAs), NS5A resistance-associated substitutions (RASs) may compromise the efficacy of NS5A inhibitors. Recently, Zeuzem and colleagues (2017) observed that pretreatment ledipasvir-specific RASs, identified in 8–16% of patients, may compromise treatment outcome, particularly in treatment-experienced patients with GT1a; they found a sustained virological response (SVR) rate of only 76% in patients with pretreatment RASs8.

The seroprevalence of HCV infection in Spain is 1.1%9, and hepatitis C is the leading cause of infectious disease-related mortality10. GT1 accounts for 67% of all HCV infections, and GT1a causes 40% of all GT1 infections11. By October 2018, 117,452 patients had received DAAs within the national plan for universal access to treatment12 and 95.5% had achieved SVR, while 1.1% of those cured have been reinfected7.

Our group previously evaluated the prevalence of NS5A RASs to elbasvir before its introduction into clinical use in Spain13. However, the spatiotemporal distribution, epidemic history and resistance to all NS5A inhibitors of HCV-GT1a in Spain remain unknown. In this study we aimed to provide the first description of the origin, epidemic history, transmission dynamics and diversity of HCV-GT1a in Spain.

Methods

Study design and patients

Overall, 588 patients harboring HCV-GT1a were included. We used the STROBE checklist14 to design and carry out a cross-sectional survey of individuals chronically infected with HCV-GT1a and naïve to NS5A inhibitors who were attended at 84 health centers distributed throughout the national territory (Supplementary Information SI1). Samples were collected before initiation of anti-NS5A therapy, between October 2014 and October 2015. Genotyping was performed using the RealTime HCV Genotype II assay (Abbott Laboratories, Illinois, USA)15. NS5A RASs were identified in plasma specimens at the National Center of Microbiology (Instituto de Salud Carlos III [ISCIII]). Anonymized samples and a minimum data set were then transferred to the ISCIII National Biobank (REF: 0000984).

Amplification and direct sequencing of NS5A gene

Amplification and direct sequencing of the complete NS5A gene (1,343 nt) were performed as described elsewhere13. Standard Sanger sequencing of the PCR product was carried out on an ABI PRISM 377 DNA sequencer (Applied Biosystems, USA), with a sensitivity threshold of 15%16.

HCV subtyping, phylogenetic tree reconstruction and transmission clusters analysis

HCV subtyping was performed using the COMET HCV subtyping tool17, and HCV-GT1a lineages (clade I and clade II) were confirmed with geno2pheno[HCV] (Bonn, Germany; https://hcv.geno2pheno.org/). Sequences were aligned with the MAFFT algorithm18 and manually edited in MEGA (v7.0.26)19. Reference HCV genotype 1 sequences were retrieved from the Los Alamos HCV database and added to the final alignment (Table SI3). A maximum likelihood (ML) tree was then reconstructed in raxmlGUI (v1.5) under the GTR + Gamma evolutionary model with 1,000 bootstrap replicates20. The ML tree was visualized and annotated in iTOL (v4.0.3)21. Transmission clusters including two or more individuals (TC ≥ 2) were identified on the ML tree using ClusterPickerGUI (v1.2.3)22, with bootstrap and genetic distance thresholds set at 70% and 0.045, respectively.
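
The cluster definition above (bootstrap support ≥ 70% and a genetic distance threshold of 0.045) can be illustrated with a simplified sketch. It is not ClusterPicker's own code: it assumes a rooted tree already parsed into a hypothetical `Node` class, a precomputed table of pairwise genetic distances, and the rule that the maximum pairwise distance within a clade must fall below the threshold (strict "<", as written in the Results). The tree and distances are invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Minimal rooted-tree node: tips have a name, internal nodes carry support."""
    name: str = ""
    support: float = 0.0  # bootstrap support (%) for the clade below this node
    children: list = field(default_factory=list)

    def tips(self):
        if not self.children:
            return [self.name]
        return [tip for child in self.children for tip in child.tips()]


def max_pairwise_distance(tips, dist):
    """Largest pairwise distance among tips; dist maps frozenset({a, b}) -> subs/site."""
    return max((dist[frozenset(pair)] for pair in combinations(tips, 2)), default=0.0)


def pick_clusters(node, dist, min_support=70.0, max_distance=0.045):
    """Return maximal clades with support >= min_support and every pairwise
    distance below max_distance (simplified ClusterPicker-style rule)."""
    if not node.children:
        return []  # a single tip cannot form a cluster
    tips = node.tips()
    if node.support >= min_support and max_pairwise_distance(tips, dist) < max_distance:
        return [sorted(tips)]  # keep the largest qualifying clade; do not split it
    clusters = []
    for child in node.children:
        clusters.extend(pick_clusters(child, dist, min_support, max_distance))
    return clusters


# Hypothetical 4-tip tree ((A,B)95,(C,D)40) with made-up distances.
tree = Node(children=[
    Node(support=95.0, children=[Node("A"), Node("B")]),
    Node(support=40.0, children=[Node("C"), Node("D")]),
])
pairs = {("A", "B"): 0.010, ("C", "D"): 0.020, ("A", "C"): 0.200,
         ("A", "D"): 0.210, ("B", "C"): 0.190, ("B", "D"): 0.205}
dist = {frozenset(p): d for p, d in pairs.items()}

print(pick_clusters(tree, dist))                                      # [['A', 'B']]
print(pick_clusters(tree, dist, min_support=90, max_distance=0.015))  # [['A', 'B']]
```

Passing `min_support=90` and `max_distance=0.015` applies the more restrictive criterion that the Results also report; the (C, D) pair is rejected in both cases because its support is below the threshold.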

Bayesian evolutionary analysis

To improve the convergence of Markov chain Monte Carlo (MCMC) chains during Bayesian phylogenetic analysis in the BEAST v1.10 package23, sequences covering < 50% of the sequence length (n = 14) were excluded, and the remaining Spanish HCV-GT1a sequences were split into clade I (n = 100) and clade II (n = 474) and analyzed separately. Next, an ML tree was built using IQ-TREE24, and temporal signal was assessed using TempEst25. The Bayesian analysis was carried out as described previously26. Briefly, the SRD06 model of nucleotide substitution27 and an uncorrelated lognormal relaxed molecular clock (UCLD) with a Bayesian skyline plot (BSP) coalescent prior were specified in BEAUti (BEAST package). Because our dataset lacked sufficient temporal signal, as is often the case with HCV datasets, and this can result in unreliable and inaccurate estimates, we calibrated the molecular clock using previously published data with a strong temporal signal28,29,30. Hence, we specified a Normal prior distribution on the clock rate (0.001 ± 0.0005 substitutions/site/year)31. To select the molecular clock model, Bayes factors were calculated from marginal likelihoods estimated by path sampling and stepping-stone sampling (Table SI4)32. Three independent MCMC chains were run for 1 × 10⁸ (clade I) and 2 × 10⁸ (clade II) generations and combined after discarding 10% as burn-in. Convergence was monitored in Tracer v1.7 (https://tree.bio.ed.ac.uk/software/tracer/). Maximum clade credibility (MCC) trees were summarized using TreeAnnotator and visualized in FigTree v1.4.3 (https://tree.bio.ed.ac.uk/software/figtree/).
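
The temporal-signal check mentioned above is commonly a root-to-tip regression: root-to-tip genetic distance is regressed on sampling date, the slope approximates the substitution rate, the x-intercept approximates the root date and R² indicates how clock-like the data are. The sketch below is a minimal illustration under stated assumptions: root-to-tip distances are taken as already extracted from a rooted ML tree (TempEst additionally searches for the best-fitting root, which this sketch does not do), and the function name and all data are hypothetical. It also shows why sampling confined to about one year tends to yield little signal.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares regression of root-to-tip distance on sampling date.

    Returns (slope, x_intercept, r_squared): slope is a crude substitution-rate
    estimate (subs/site/year), x_intercept a crude root (tMRCA) date, and
    r_squared a rough measure of clock-like temporal signal."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all tips share one sampling date; no temporal signal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    syy = sum((y - mean_y) ** 2 for y in distances)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_intercept = -intercept / slope if slope != 0 else float("nan")
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, x_intercept, r_squared


# Hypothetical tips sampled 2000-2010 evolving at exactly 0.001 subs/site/year
# from a root in 1950: the regression recovers both values with R^2 = 1.
print(root_to_tip_regression([2000.0, 2005.0, 2010.0], [0.050, 0.055, 0.060]))

# Hypothetical tips sampled within about one year: the narrow date range is
# swamped by the spread in distances, so R^2 is close to zero.
print(root_to_tip_regression([2014.8, 2015.0, 2015.2, 2015.4],
                             [0.110, 0.090, 0.125, 0.095]))
```

When R² is near zero, as in the second call, the slope cannot be trusted as a rate estimate, which is the situation in which an informative prior on the clock rate is used instead.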

Phylogeography analysis

Discrete phylogeographic analyses of HCV-GT1a clades I and II circulating in the Spanish autonomous regions were performed using the Bayesian stochastic search variable selection (BSSVS) procedure with an asymmetric model of among-location transitions to estimate HCV lineage migration rates, as implemented in BEAST v1.10 (Table SI4). The MCMC chains were run as described above. Spatiotemporal visualization and estimation of Bayes factor (BF) support for all geographical location transitions (BF ≥ 3 was considered relevant33) were performed in SpreaD3 v0.9.6 (Spatial Phylogenetic Reconstruction of Evolutionary Dynamics using Data-Driven Documents [D3])34. RASs with demonstrated clinical relevance were georeferenced across the Spanish Autonomous Communities (CCAA) using GPS coordinates in decimal degrees, and prevalence maps were created with the data visualization tool Tableau 9.3 (Tableau Software, Inc., USA)35.
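
The per-transition BF used with BSSVS is, in its usual formulation, the posterior odds that a rate indicator is switched on divided by its prior odds. The sketch below assumes the commonly used Poisson prior on the number of non-zero rates (mean ln 2, offset k − 1, where k is the number of locations); these prior settings, the function names and the example numbers are assumptions to be checked against the actual BEAUti/XML configuration, not values reported in this study.

```python
import math


def bssvs_prior_inclusion(k, asymmetric=True, poisson_mean=math.log(2)):
    """Prior probability q that a single rate indicator is 'on'.

    Assumes a Poisson prior on the number of non-zero rates with mean
    poisson_mean (ln 2) and an offset of k - 1, divided by the number of
    possible rates; check these settings against your own BEAUti/XML."""
    n_rates = k * (k - 1) if asymmetric else k * (k - 1) / 2
    return (poisson_mean + k - 1) / n_rates


def bssvs_bayes_factor(posterior_p, prior_q):
    """Bayes factor = posterior odds / prior odds for one location transition."""
    if posterior_p >= 1.0:
        return math.inf  # every sampled state had the indicator on
    return (posterior_p / (1 - posterior_p)) / (prior_q / (1 - prior_q))


# Hypothetical example: 10 discrete locations, asymmetric model, and an
# indicator that is 'on' in 95% of post-burn-in MCMC samples.
q = bssvs_prior_inclusion(10, asymmetric=True)
bf = bssvs_bayes_factor(0.95, q)
print(f"q = {q:.4f}, BF = {bf:.1f}, supported (BF >= 3): {bf >= 3}")
```

Because the prior odds shrink as the number of locations grows, the same posterior inclusion frequency yields a larger BF in analyses with more locations.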

Resistance-associated substitutions

The NS5A clinically relevant RASs analyzed, according to the latest European guidelines36, were: K24G/N/R, K26E, M28A/G/T/V, Q30C/D/E/G/H/I/K/L/N/R/S/T/Y, L31I/F/M/P/V, P32L/S, S38F, H58D/L/R, A92K/T and Y93C/F/H/L/N/R/S/T/W. Fold-change was evaluated according to the European and American guidelines36,37 and the additional references listed in Table SI5. RASs were categorized as follows: RASs with a high fold-change (> 100×, > 1,000× or > 10,000×) and a probable clinical impact; RASs with an intermediate impact on efficacy (fold-change < 100×); and RASs with a low fold-change (< 20×) and no clinically significant impact38. The regional prevalence of RASs was classified as low (< 5%), intermediate (5–15%) or high (> 15%). Codon sites corresponding to these RASs were removed from the HCV NS5A sequence alignment for phylogenetic analysis.
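
The RAS list and the two classification rules above can be encoded as simple lookup tables. This is a hypothetical sketch: the "WtPosMut" notation (e.g. Y93H), the function names and the handling of boundary values (a fold-change of exactly 20× counted as intermediate, a prevalence of exactly 15% counted as intermediate) are assumptions, since the text gives only open-ended ranges.

```python
import re

# Wild-type residue and resistance-associated residues per NS5A position,
# transcribed from the guideline list above (GT1a numbering).
CLINICAL_RAS = {
    24: ("K", "GNR"), 26: ("K", "E"), 28: ("M", "AGTV"),
    30: ("Q", "CDEGHIKLNRSTY"), 31: ("L", "IFMPV"), 32: ("P", "LS"),
    38: ("S", "F"), 58: ("H", "DLR"), 92: ("A", "KT"), 93: ("Y", "CFHLNRSTW"),
}


def find_ras(substitutions):
    """Keep substitutions written as e.g. 'Y93H' that match the RAS list."""
    hits = []
    for sub in substitutions:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if not m:
            continue
        wt, pos, aa = m.group(1), int(m.group(2)), m.group(3)
        ref = CLINICAL_RAS.get(pos)
        if ref and wt == ref[0] and aa in ref[1]:
            hits.append(sub)
    return hits


def fold_change_category(fc):
    """Assumed boundaries: > 100x high, 20-100x intermediate, < 20x low."""
    if fc > 100:
        return "high"
    if fc >= 20:
        return "intermediate"
    return "low"


def prevalence_level(n_with_ras, n_total):
    """Regional prevalence: low (< 5%), intermediate (5-15%), high (> 15%)."""
    pct = 100.0 * n_with_ras / n_total
    if pct < 5:
        return "low"
    if pct <= 15:
        return "intermediate"
    return "high"


print(find_ras(["Y93H", "A92E", "Q30R", "L31M"]))  # ['Y93H', 'Q30R', 'L31M']
print(prevalence_level(50, 588))                  # -> intermediate, since 50/588 is about 8.5%
```

A92E is dropped because only A92K/T are listed; applied per region, `prevalence_level` reproduces the low/intermediate/high grading used for the maps.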

Statistical analysis

Categorical variables were analyzed with Pearson's chi-square test or Fisher's exact test, and continuous variables were compared with the Mann–Whitney U test. P-values were two-tailed, and the significance level (α) was set at 5%. Analyses were performed with SPSS software v25 (SPSS Inc., Chicago, IL).
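
The two categorical tests named above can be sketched in standard-library Python for a 2 × 2 table; this is an illustrative re-implementation, not the SPSS procedure used in the study. As an example, it uses clade II versus clade I by infection group, with counts back-calculated from the percentages reported in the Results (87.0% of 300 and 78.1% of 288); these counts are therefore an assumption rather than the study's raw table, although they agree with the clade totals of 486 and 102. With them, the uncorrected Pearson statistic should land close to the reported P = 0.004.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].
    Returns (statistic, two-tailed p) from the 1-df chi-square distribution."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 >= statistic)


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of all
    tables with the observed margins that are no more likely than the observed one."""
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):  # P(top-left cell == x | fixed margins)
        return math.comb(r1, x) * math.comb(n - r1, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


# Rows: HCV-monoinfected (n = 300), HIV/HCV-coinfected (n = 288);
# columns: clade II, clade I. Counts back-calculated from 87.0% and 78.1%.
chi2, p = pearson_chi2_2x2(261, 39, 225, 63)
print(f"Pearson chi2 = {chi2:.2f}, P = {p:.4f}")
print(f"Fisher exact P = {fisher_exact_2x2(261, 39, 225, 63):.4f}")
```

Fisher's exact test is the usual fallback when expected cell counts are small (commonly below 5); with cells this large, both tests lead to the same conclusion.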

Ethics statement

The study was approved by the Institutional Review Board and the Research Ethics Committee of ISCIII (Nº CEI PI 43_2015) and was conducted in accordance with the Declaration of Helsinki.

Results

Overall, 588 patients harboring HCV-GT1a were included; 80.8% were men, and the median age was 50 years (IQR: 47–53). HCV-monoinfected and HIV/HCV-coinfected patients were equally represented. Clade II was much more prevalent than clade I (82.7% vs. 17.3%; P < 0.0001) and was more represented in HCV-monoinfected than in HIV/HCV-coinfected patients (87.0% vs. 78.1%; P = 0.004) (Table 1; Figure SI2). Clade assignment by the phylogenetic method was consistent with that of the geno2pheno[HCV] algorithm, with only 4 sequences assigned differently (Table SI6).

Table 1 Epidemiological characteristics of enrolled patients, RASs and susceptibility to DAAs.

Male-dominated transmission pairs and clade II viruses predominate in the Spanish HCV-GT1a population

Twenty-six transmission pairs and three transmission clusters (TC including more than 2 individuals) were identified with the less restrictive criteria (70% bootstrap support; genetic distance threshold < 0.045), compared with ten transmission pairs identified with more restrictive criteria (90% bootstrap support; genetic distance threshold < 0.015) (Fig. 1, Table 2); these represented 10.5% versus 3.4% (n = 62/588 vs. 20/588) of the study population. The transmission pairs and clusters predominantly comprised clade II viruses (79.3%, n = 23/29) and were mostly intra-regional (55.2%, n = 16/29). Fourteen transmission pairs and three clusters (58.6%, n = 17/29) were made up of men only. We further investigated the influence of RASs on the transmission clusters. When the RAS sites were not excluded from the alignment, the same transmission clusters were identified except for three transmission pairs (n = 23 vs. n = 26); of these, only two, clusters 7 and 8, carried transmissible RASs (M28V and Q30R + Y93H, respectively) (Table 2).

Figure 1

Phylogenetic tree reconstruction of GT1a strains circulating in Spain. A maximum likelihood (ML) tree reconstructed under the GTR + Gamma substitution model with 1,000 bootstrap replicates segregates GT1a into two clades, clade I (blue) and clade II (red). Bootstrap supports ≥ 70% are depicted as purple filled circles at the nodes, and transmission clusters (TC ≥ 2) are highlighted in grey.

Table 2 Characteristics of the identified transmission clusters of GT1a clades I and II strains in Spain.

The largest transmission cluster, cluster 29 (TC = 4 patients), also had the most diverse population, encompassing patients from three autonomous regions: Cantabria, Castile and León, and the Basque Country. The highest number of transmission pairs and clusters was found in the Galicia and Cantabria autonomous regions. Overall, these results are consistent with a longstanding HCV epidemic in Spain driven by all modes of transmission.

Origin and spatial-temporal dynamics of GT1a dispersion in Spain

Clade II was introduced into Spain at the beginning of the twentieth century [median estimate of the time of the most recent common ancestor (tMRCA), 1912; 95% highest posterior density (HPD) interval, 1822–1964], whereas clade I was introduced several decades later, in 1952 (95% HPD, 1905–1980) (Fig. 2A, B). Both lineages circulated unnoticed for several decades before undergoing exponential epidemic growth, starting in the 1950s for clade II and two decades later, in the 1970s, for clade I (Fig. 2C, D). Clade II dissemination was more successful than that of clade I and leveled off in the early 1990s at an effective population size (Ne) of ≈ 2 × 10⁵, roughly 17-fold higher than that of the clade I epidemic at its peak (Ne ≈ 1.2 × 10⁴ in the mid-1990s). However, the clade II epidemic is now waning, whereas the clade I epidemic is still ongoing.

Figure 2

Bayesian estimation of the epidemic history of NS5A GT1a clades in Spain. Maximum clade credibility trees for clade I (A) and clade II (B) are presented, with the mean tMRCA (95% HPD interval) estimates in calendar years annotated at the root node of each GT1a clade. Posterior probabilities ≥ 0.9 are annotated at the nodes. The Bayesian skyline plots (BSP) for GT1a clade I (C) and GT1a clade II (D) show epidemic growth over time. The solid blue line represents the median effective population size through time on a log10 scale, and the grey shaded area corresponds to the 95% highest posterior density (95% HPD) interval.

The earliest migration of HCV-GT1a strains in Spain most likely originated in the Basque Country and involved very strongly supported dispersal of clade II viruses from the Basque Country to Andalusia and Madrid (BF support = 192.13 and 170.24, respectively) (Table 3). Subsequently, these clade II viruses were successfully propagated from Andalusia and Madrid (BF support = 181.91 and 42.61, respectively). Moreover, Andalusia and Madrid have remained important sources for the dispersal of clade I viruses, in contrast to the Basque Country, which had more support for being a sink (Andalusia-to-Basque Country and Galicia-to-Basque Country, BF support > 20 and > 3, respectively) than a source of viral dispersion. Thus, the migration pattern of HCV-GT1a strains in Spain, together with the paraphyly of clade I relative to clade II and its later origin, strongly suggests that clade I evolved from clade II (Fig. 1) and is a driver of the ongoing HCV-GT1a epidemic in Spain.

Table 3 Bayes factors and posterior probabilities for all supported migration events.

Baseline polymorphisms and RASs at the NS5A domain

Overall, almost 13,000 baseline mutations were detected, of which 8,602 were considered polymorphisms because they were found in > 2% of the samples (Supplementary Dataset and Table 4). Viruses bearing RASs were present in 50 individuals (8.5%): 22/300 (7.3%) HCV-monoinfected and 28/288 (9.7%) HCV/HIV-coinfected patients. RASs were similarly observed in those infected with clade I (8.8%; n = 9/102) and clade II (8.4%; n = 41/486). Six of these subjects harbored viruses with double RASs and one had triple RASs (14.0%, n = 7/50). In total, 58 individual RASs were identified in the study population, the most common being M28A/T/V (37.9%; n = 22/58), Y93C/F/H/N (24.1%; n = 14/58) and Q30E/H/R (20.7%; n = 12/58). The double mutations 30H + 93H (n = 2), 28V + 30R (n = 2) and 30R + 93H (n = 2) and the triple mutation 30H + 93H + 24R (n = 1) were also observed (Table 1). Regional RAS prevalence was low (< 5%) in four regions, intermediate (5–15%) in thirteen regions and high (20%) in Cantabria (Fig. 3 and Table 5). Among patients harboring RASs, the proportions with mutations conferring high-level resistance were 62.0% (n = 31/50) to daclatasvir, 52.0% (n = 26/50) to ledipasvir, 50.0% (n = 25/50) to ombitasvir, 16.0% (n = 8/50) to elbasvir, 16.0% (n = 8/50) to velpatasvir and 4.0% (n = 2/50) to pibrentasvir; 64.0% (n = 32/50) of patients harbored RASs conferring resistance to more than one DAA. Eleven subjects had RASs to velpatasvir, and one to pibrentasvir, that were not likely to have a clinically significant impact.

Table 4 Polymorphisms and RASs identified in GT1a-infected patients.
Figure 3

Spatial epidemiology of natural polymorphisms in the HCV NS5A gene associated with resistance to NS5A inhibitors among patients infected with HCV genotype 1a in Spain. (A) Total prevalence of the six resistance mutations under analysis in the 17 Spanish Autonomous Communities and the city of Ceuta; (B) prevalence of K24R; (C) prevalence of M28A/T/V; (D) prevalence of Q30E/H/R; (E) prevalence of L31M; (F) prevalence of H58D; (G) prevalence of Y93C/F/H/N.

Table 5 Distribution of RAS polymorphisms in HCV-GT1a-infected patients across the national territory (autonomous communities) of Spain.

Discussion

In this study we performed the first nationwide survey assessing the origin, diversity and spatiotemporal transmission dynamics of GT1a using sequences derived from DAA-treatment-naïve patients. Phylogenetic analysis confirmed the presence of two divergent GT1a clades, clades I and II39. Consistent with our previous observations, clade II was more than four times as prevalent as clade I13,26. These findings illustrate the distinct geographic distributions of the GT1a clades: clade I is more common in the United States, whereas both clades are equally represented in other European countries40,41. Clade distribution differed among patient groups, with clade II more prevalent among HCV-monoinfected than among HIV-coinfected patients. These differences confirm those reported in our previous nationwide cross-sectional surveys of NS5A RASs to elbasvir13 and of NS3 RASs in Spain (2014–2015)42, and may be related to differences in the modes of transmission of HIV and HCV, such as risky sexual behaviors or parenteral drug use43,44.

There were major differences in the origin and spatiotemporal distribution of the two clades. The origin of clade II in Spain dates back to the beginning of the twentieth century, preceding clade I by about 40 years. Clade II transitioned from endemic in the Basque Country to epidemic in Spain. Importantly, the regions that received clade II from the Basque Country, Andalusia and Madrid, were also the sources from which clade I spread to the rest of Spain. Overall, Andalusia and Madrid were the most important sources of clade I dissemination in Spain, while the Basque Country was the major source of clade II dissemination. In contrast, De Luca et al.40 reported an earlier origin of clade I, 1966 (95% HPD, 1952–1972), than of clade II, 1975 (95% HPD, 1961–1989), using a time-scaled phylogeny of NS3 sequences from Italian and German patients plus additional sequences from the Los Alamos National Laboratory HCV sequence database; the latter had sampling dates from 1977 to 2013 and originated mainly from the US but also from other countries such as France and Spain. Overall, these results illustrate fundamental differences in the epidemic behavior of the two GT1a clades, both in Spain and in the rest of Europe.

Importantly, our data also show that the clade II epidemic is now declining, whereas the clade I epidemic has reached equilibrium. The impact of this newer clade I epidemic in Spain may be reflected in the prevalence of NS3 Q80K, which has been reported to be mainly associated with clade I40,41,42,45 and with reduced response to the macrocyclic protease inhibitor simeprevir46,47.

HCV and HIV share common transmission risk behaviors48. In Spain, HCV/HIV coinfection has historically been associated with parenteral drug use. Data from the Spanish National Epidemiological Bulletin show that HIV has mainly been acquired through parenteral drug use since 1982, when the first AIDS cases were reported49,50, while most incident cases since 2000 have occurred among men who have sex with men51. Spain is among the eight European countries with high (> 15%) estimates of HCV/HIV coinfection52. The most prevalent HCV genotypes associated with parenteral drug use in Europe are GT1a and GT3a. In Spain, GT1a dominates among people who inject drugs53,54 and HIV/HCV-coinfected individuals7,55. According to a recent observational analysis, from 1988 to 2015 North-Eastern Spain saw an increase in the prevalence of GT1a, GT3 and GT4, associated with male gender and parenteral drug use (currently the most prevalent route of infection), and a concomitant decline of GT1b, associated with transfusion or parenteral/nosocomial transmission56. In this context, the initial spread of GT1a clade II in Spain during the first half of the twentieth century likely occurred before HIV reached the country, which may explain the higher prevalence of clade II among the HCV-monoinfected individuals of our cohort.

Regarding baseline polymorphisms and RASs in the NS5A domain, a pooled analysis of baseline samples from Phase II/III HCV trials conducted in 22 countries found NS5A RASs in 13% of 3,501 GT1a samples8. Reports of the frequency of naturally occurring NS5A RASs in clinical settings are still scarce. A cross-sectional cohort of ~ 130 GT1a subjects enrolled in São Paulo (Brazil) reported a prevalence of RASs of 14.6% in HCV-monoinfected (M28V and Q30H/R) and 3.9% in HIV/HCV-coinfected subjects (M28T and Q30H/R)57.

A low prevalence of NS5A RASs to the six DAAs evaluated was observed in the present study. RASs were similarly distributed between monoinfected and coinfected patients and between the two GT1a clades in each group. However, the regional data showed marked geographical variability in RAS prevalence. In Spain, few observational regional or sub-regional studies have been carried out, as resistance testing is recommended only for patients being considered for therapy with grazoprevir/elbasvir. A cohort study including 53 GT1a patients from a major hospital in Madrid (Spain) found that 18.9% (n = 10) harbored at least one RAS58. Another study including 166 GT1a patients from a tertiary hospital in A Coruña (Galicia, Spain) found a low (5.5%) prevalence of NS5A RASs using population-based sequencing59. Unlike our analysis, that study did not consider RASs at position K24 clinically relevant.

Thirteen regions showed an intermediate level (5–15%) of baseline RASs to DAAs, and Cantabria showed a high level (20%). This suggests that the proportion of patients who will benefit from these treatments may vary across Spanish regions. It will therefore be important to carefully monitor the response to treatment with NS5A-specific DAAs in patients from these regions, particularly in those bearing RASs. The results of such monitoring will determine whether baseline testing is appropriate for candidates for these new drugs. Nevertheless, results from the observational HCV-TARGET cohort in the USA indicate that longer treatment may overcome the negative impact of baseline RASs on SVR12 in the clinical setting60.

Effective treatment for HCV is currently available in Spain. However, several barriers to disease eradication remain, including deficiencies in screening and diagnosis. Indeed, almost 50,000 of those living with HCV are still unaware of their infection61. Understanding the structural barriers that undermine the implementation of screening programs and the scale-up of diagnosis coverage is fundamental to ensuring linkage to care and implementation of the HCV care cascade62,63.

This study has some limitations. A more comprehensive dataset on the clinical and epidemiological characteristics of enrolled patients would have allowed a deeper understanding of the results. Furthermore, the lack of follow-up prevented evaluation of the possible impact of NS5A RASs on treatment efficacy. Finally, RASs were assessed by population sequencing instead of deep sequencing, which may slightly underestimate RAS prevalence8.

In conclusion, the current HCV-GT1a epidemic in Spain is mainly driven by clade I viruses, which seem to have spreading routes different from those of clade II viruses. With the exception of Cantabria, viruses bearing RASs to NS5A inhibitors were present at low to intermediate levels in HCV-infected patients at baseline. Close surveillance of the response to DAA treatment will be important.