Epidemic history of hepatitis C virus genotypes and subtypes in Portugal

Any successful strategy to prevent and control HCV infection requires an understanding of the epidemic behaviour among the different genotypes. Here, we performed the first characterization of the epidemic history and transmission dynamics of HCV subtypes in Portugal. Direct sequencing of NS5B was performed on 230 direct-acting antiviral drugs (DAA)-treatment naïve patients in Lisbon. Phylogenetic analysis was used for subtyping and transmission cluster identification. Bayesian methods were used to reconstruct the epidemic history of HCV subtypes. Sequences were analysed for resistance-associated substitutions (RAS). The majority of strains were HCV-GT1 (62.6%), GT3 (18.3%, all subtype 3a) and GT4 (16.1%). Among GT1, the most frequent were subtypes 1a (75.5%) and 1b (24.5%). Polyphyletic patterns were found in all but 12 lineages suggesting multiple introductions of the different subtypes in this population. Five distinct epidemics were identified. The first significant HCV epidemic in Portugal occurred between 1930s and 1960s, was caused almost exclusively by GT1b and was likely associated with blood transfusions. Rapid expansion of GT3a occurred in the 1960s and GT1a in the 1980s, associated with intravenous drug use. The most recent epidemics were caused by GT4a and GT4d and seem to be associated with the resurgence of opioid use. The C316N substitution was found in 31.4% of GT1b-patients. Close surveillance of patients bearing this mutation and undergoing dasabuvir-based regimens will be important to determine its impact on treatment outcome.

The huge genetic variability of HCV brings challenges to host immune control, to the management of HCV-infected patients, and to the development of pan-genotypic treatments 18 . Important differences have been identified in the susceptibility of HCV genotypes to antibody neutralization indicating that a successful vaccination strategy has to take into consideration genotype distribution worldwide 19,20 . Regarding treatment, treatment of chronic HCV infection with peginterferon + ribavirin showed variable efficacy depending on the genotype with sustained virological response (SVR) rates of 42-58% [21][22][23] , 80%, 66% 24 or 63-69% 25 for GT1, GT2, GT3 or GT4, respectively. The current treatment landscape based on a combination of direct-acting antiviral drugs (DAA) has increased dramatically the cure rate of chronic HCV infection 26 . Two drugs targeting the viral NS5B RNA polymerase have been approved in Europe and the US for clinical use, the nucleotide analogue sofosbuvir and the nonnucleoside inhibitor dasabuvir 27,28 , while others are in the pipeline 29 . Sofosbuvir-based combination treatment resulted in SVR rates of 50-93% in six phase 3 trials [30][31][32][33] . Since 2014, the combination of sofosbuvir/simeprevir/ribavirin and the new fixed-dose combinations ledipasvir/sofosbuvir and sofosbuvir/velpatasvir resulted in SVR rates of 92-100% in trials enrolling patients infected with HCV GT1 [34][35][36][37][38][39][40] and 85-100% in other genotypes [40][41][42][43] . Dasabuvir co-packaged with ombitasvir/paritaprevir/ritonavir achieved SVR rates of 89-100% 44 . The presence of resistance-associated substitutions (RAS) as natural polymorphisms in the NS5B gene affects HCV susceptibility to sofosbuvir and dasabuvir and may limit the clinical effectiveness of DAA combinations containing these drugs 45,46 .
Despite the enormous progress made in the treatment of HCV, a recent projection performed in Germany indicates that a substantial number of patients (7%) will fail to achieve SVR and will have limited retreatment options 47 . In addition, people who are cured are at risk of HCV reinfection. Incidence of reinfection varying between 7-13/100 person-years has been reported among gay and bisexual men in Europe [48][49][50] . Hence, it is necessary to be continuously vigilant about the main factors that may affect the success of HCV treatment in any given location, which are the genotype of the circulating HCV strains and the presence of natural polymorphisms and mutations associated with drug resistance.
In Portugal, there is scarce epidemiological research on HCV and prevalence data are limited. Among the European countries, Portugal has the highest prevalence rate of HCV infection (83.5%) in people who inject drugs (PWID) 51 , while in the prison population, the prevalence of HCV in 2014 was 10.7% 52 . A recent nationwide cross-sectional survey enrolling 1,685 adults in 2012-2014, reported a low HCV prevalence (0.54%; 0.2-0.9) 53 lower than the previous estimates of 1-1.5% 54,55 . The possible underrepresentation of high-risk groups likely explains the remarkable low HCV prevalence in this recent survey. To comply with the global commitment for hepatitis elimination 56 , at the beginning of 2015, the Portuguese Ministry of Health implemented a national plan of universal access to HCV treatment. In the frame of this programme, named Portal of Hepatitis C, more than 11,700 patients have initiated combinations of sofosbuvir-based treatments and 6,639 patients have been cured, representing 96.5% of those who have already completed the treatment 57 .
Based on four previous studies performed between 1998 and 2014, the majority of HCV infections in Portugal are caused by GT1 followed by GT3, GT4 and GT2 [58][59][60][61][62][63] . The origin, epidemiological history and transmission dynamics of these genotypes in Portugal have not been investigated. Hence, the aims of the present study were to characterize the origin, epidemiological history and transmission dynamics of HCV genotypes and subtypes circulating in Portugal, and to assess the prevalence of natural polymorphisms at the NS5B gene that may impact susceptibility to sofosbuvir and dasabuvir.
The GT1a clade II was significantly more prevalent than clade I (78.7%; n = 85/108 vs. 21.3%; n = 23/108; P < 0.0001). Interestingly, when the GT1a lineages assigned by phylogenetic analysis were analysed with the geno2pheno [HCV] algorithm, 25 sequences had inconsistent clade results (Table S1). Likewise, inconsistent results were observed for two sequences that were classified as GT2 and GT4b by phylogenetic analysis and as GT2b and GT4w by geno2pheno [HCV] . Except for 12 HCV lineages which segregated into six transmission clusters (a pair each for GT1a, GT1b, GT2, GT2a, GT3a and GT4a), polyphyletic patterns were found suggesting multiple and old introductions of the different HCV subtypes in this population (Fig. 1).
Origin and epidemiologic history of HCV strains circulating in Portugal. The dates of the most recent common ancestors (MRCA) of the various HCV subtypes circulating in the Portuguese population were estimated in BEAST under an uncorrelated lognormal relaxed molecular clock with a Bayesian skyline plot (BSP) coalescent demographic model ( Table 2). According to our estimates, the ancestor of the 1a strains present in Portugal dated back to 1950 (95% HPD: 1922HPD: , 1973 whereas that of 1b dated back to 1946 (95% HPD: 1847, 1965) and 3a to 1963 (95% HPD: 1947HPD: , 1977. For genotype 4 strains, similar ancestor dates were obtained [GT4a, 1988(95% HPD: 1980, 1995; GT4d, 1982(95% HPD: 1964, 1995]. To investigate the epidemiologic history of the different HCV subtypes in the Portuguese population, BSP reconstruction was made for each subtype (Fig. 2). The data indicates that the first significant increase in HCV prevalence in Portugal occurred in the first half of the 20 th century (the 1930s) and was caused almost exclusively by GT1b until the 1960s. At this time a second epidemic emerged caused by GT3a; a third epidemic caused by GT1a emerged during the 1980s. The first two epidemics caused by GT1b and 3a reached equilibrium at the end of the 1980s, whereas the 1a epidemics continued to grow rapidly (growth rate: 0.308 Ln of number of effective infections per year) until the end of the 1990s when it reached equilibrium. The most recent epidemics were caused by GT4a and GT4d and while the GT4d epidemics seems to have reached an equilibrium GT4a epidemics is still expanding at the present although at a lower rate relative to the epidemic phase of GT1a and GT3a (Fig. 3).

Discussion
We applied phylogenetic and phylodynamic analysis to NS5b sequences obtained from DAA naïve HCV infected patients to make the first characterization of the origin, diversity and epidemiologic history of HCV in Portugal. Like in previous studies, the most prevalent HCV variants were GT1a, representing almost half of the sequences analyzed, followed by GT3a and GT1b 58-62 . These genotypes are responsible for the majority of the HCV cases globally 63,65,66 .
According to our estimates, GT1b has caused a slowly growing but extended epidemic from the 1930s to the 1990s. This epidemic preceded all others by three decades and its features are consistent with widespread unsafe needle practices at the beginning of the 20 th century and contaminated blood transfusions which stopped only after the discovery of HCV in 1989 and the subsequent screening of all blood donations for HCV 67,68 . The increase in intravenous drug use in the 1960s has likely sparked the second major HCV epidemic in Portugal which was caused by GT3a and went through the beginning of the 1980s. The fastest growing HCV epidemic in Portugal occurred during the 1980s and 1990s and was caused by GT1a. The GT1a was likely driven by PWID which increased dramatically in the late 1980s and 1990s up to 1% of the Portuguese population due to the rapid social and cultural changes that followed the revolution of 1974 and the increasing availability of heroin [69][70][71] .
We observed a relatively high (16.1%) prevalence of GT4 among the study population, that is about three-fold higher than the estimated prevalence for Western Europe 65,66 . Similar results were obtained recently in HCV-infected inmates in Portugal (13% GT4) 62 . In addition to GT4a, the most prevalent in our study group, and GT4b and GT4d that have already been reported in Portugal 61,62,72 , we also identified one isolate classified as GT4f and one as GT4k. To the extent of our knowledge, this is the first report of these variants in Portugal. Globally, GT4 has been more prevalent in some countries of Northern and Central Africa and the Middle East 11 , but there are indications that GT4 is becoming increasingly prevalent in Europe, mainly among PWID 61,73,74 .
It was interesting to find that unlike GT1 and GT3 that caused endemic localized infections during decades before becoming generalized, GT4a became epidemic almost immediately after being introduced in Portugal, likely in 1980, and this epidemic may still be growing at a relatively fast rate. The reason for this is unclear as the mode of transmission was known only for 27% of our patients. GT4a dominates the HCV epidemic in Egypt and the main risk factor for its origin and transmission has been unsafe medical practices 75 . Similar to GT4a, the epidemic history of GT4d indicates a recent introduction and a progressive epidemic in the Portuguese population. In Portugal, healthcare related HCV infections are exceedingly rare and GT4a and GT4d have been found in relatively high frequencies (11.3% and 12.1%, respectively) in PWID analysed in a similar period (2008)(2009) in Lisbon 61 . In addition, 83.3% of GT4a (n = 6) and 75% of GT4d (n = 4) infected patients for which we have information on transmission risk were PWIDs. We, therefore, believe that injecting drug use is the main mode of transmission of GT4 in Portugal.   The prevalence of GT2 in the study group was only 3.2%, about one-third lower than previously reported data from Western Europe 66 . In particular, the prevalence of GT2a, generally considered as an epidemic subtype, was less than 1%. A similar low prevalence was also observed for GT2c, in contrast to the high frequency (37.5%) previously reported among a limited number (n = 64) of liver biopsies at the Portuguese tertiary Hospital Santa Maria, the same healthcare setting attended by our study group 60 .  . Exponential mean growth rates for the most prevalent HCV subtypes (1a, 1b, 3a, 4a and 4d*) found in the current study. Linear regression equations were derived from the mean growth rates within the exponential phase of the Bayesian skyline plots (BSP) of each subtype as shown in Fig. 2 Concerning the other less prevalent HCV variants, we identified one individual harbouring GT1g. To the best of our knowledge, this is the first report of HCV-GT1g in Portugal. In Europe, the first partial sequences of GT1g were derived in 1994-1995 from HCV chronically-infected German patients who were probably immigrants from Egypt and Sudan 76 . More recently, the first complete genome of a GT1g isolate derived from a Spanish patient was published 77 .
Recent studies have highlighted distinct geographic distributions of the two clades of GT1a, with clade I being more prevalent in the United States and both clades equally distributed in Europe 78,79 . Furthermore, a recent in-depth phylogenetic analysis has found three distinct sub-clades within clade I 80 . In the present study, that represents the first assessment of clades in Portugal, we found a four-fold higher prevalence of clade II than clade I, and the majority of polymorphisms associate to the former clade. Interestingly, the prevalence of clade I among the patients included in the present study was similar (21%) to that recently observed in DAA-naïve subjects in Spain 81 but it was significantly lower than in other European countries (e.g. 48% in Italy and 67% in France) 79 . This may indicate a difference in the temporal spread of the HCV-GT1a clades in Portugal as compared to other European countries where both clades are equally distributed.
Baseline NS5B RAS to sofosbuvir has been rarely detected in clinical trial settings 82,83 and only 0.1% of sequences from all genotypes present in the Los Alamos HCV sequence database harbor RAS to this DAA 84 . Consistent with this, we have not found RAS to sofosbuvir in our patients. On the other hand, the C316N RAS was found in 31% of GT1b-infected patients. This mutation confers low-level resistance to dasabuvir and has been reported in patients failing treatment with sofosbuvir 85,86 . The relatively high prevalence of C316N in our patients is in line with previous reports showing a prevalence of C316N ranging from 11% to 40% in patients with GT1b enrolled in clinical trials, and this mutation was more frequent among European subjects than among those from the United States 87,88 . It should be noted that in the AVIATOR trial that evaluated ritonavir-boosted paritaprevir, ombitasvir and dasabuvir, the C316N RAS at baseline had no significant impact on treatment outcome 87 .
We observed the polymorphism L285F in GT2 which has not been previously reported in the Los Alamos databases as RAS 84 . This mutation was reported in 2017 in a study assessing the genetic heterogeneity of NS5B by ultra-deep pyrosequencing in patients harbouring GT3a not achieving SVR. Interestingly, the L285F polymorphism represented a minor substitution at baseline but was enriched after virological failure 89 . The significance of this mutation is still unclear and needs further investigation. Overall, these data suggest that sofosbuvir and dasabuvir can be used in first-line treatment regimens in Portugal.
Two main limitations should be mentioned. First, our study was based on partial sequences of NS5B which prevented us from examining all RAS in NS5B. Second, the mode of HCV transmission was known only for 27% of the study group preventing us from including this variable in the transmission cluster analysis and from making a more detailed investigation of the risk factors driving the different HCV epidemics in Portugal.
In summary, five distinct epidemics caused by different HCV subtypes were identified over time in Portugal. The first was caused by GT1b, occurred during the 1930s and the 1960s and was likely associated with contaminated blood transfusions. The second and third epidemics were caused by GT3a in the 1960s and GT1a in the 1980s and were likely associated with widespread use of intravenous drug use. The most recent HCV epidemics in Portugal were caused by GT4a and GT4d and seem to be associated with the resurgence of opioid use. The C316N substitution causing low-level resistance to dasabuvir was found in 31.4% of GT1b-infected patients. Close surveillance of patients bearing this mutation and undergoing dasabuvir-based regimens will be important to determine its impact on treatment outcome.  Table 3. Baseline polymorphisms and amino acid substitutions reported to reduce susceptibility to sofosbuvir and/or dasabuvir in NS5B of HCV strains (GT1 and GT2) found in the current study. Notes: Only polymorphic sites whose frequency is above 2% are shown; see Table S3 for the complete dataset of all polymorphisms and mutations. AA, amino acid position; asterisks indicate resistance associated substitutions (RAS) or substitution on scored position, as detailed below. V321I and V321I/V, amino acid substitutions on scored position for sofosbuvir (according to data of GT1a). V321S/L, amino acid substitutions on scored position for sofosbuvir (according to data of GT3). C316N, RAS: amino acid substitutions reported to reduce the susceptibility of HCV GT1b to nonnucleoside RdRp palm-1 inhibitor (dasabuvir); EC50 (fold change compared to wild-type replicon) is 2-20.  Table 4. Baseline polymorphisms and amino acid substitutions reported to reduce susceptibility to sofosbuvir and/or dasabuvir in NS5B of HCV strains (GT3 and GT4) found in the current study. Notes: Only polymorphic sites whose frequency is above 2% are shown; see Table S3 for the complete dataset of all polymorphisms and mutations. AA, amino acid position; asterisks indicate resistance associated substitutions (RAS) or substitution on scored position, as detailed below. V321I and V321I/V, amino acid substitutions on scored position for sofosbuvir (according to data of GT1a). V321S/L, amino acid substitutions on scored position for sofosbuvir (according to data of GT3  104 , HKY + 4Г nucleotide substitution model, and an uncorrelated lognormal relaxed molecular clock (UCLD) with a Bayesian skyline plot (BSP) coalescent tree prior were selected for the Bayesian analysis. To estimate the date of the most recent common ancestor (MRCA) and the population growth dynamics of the HCV epidemic in Portugal, we specified a Normal prior distribution on the UCLD mean rate parameter (0.001 ± 0.0001 substitutions/site/year) 11,105,106 . For each dataset, three independent MCMC chains were run for 100 million generations with states sampled every 10,000 generations. Log files were combined using Logcombiner to ensure sufficient convergence (ESS ≥ 200) as monitored in Tracer v1.7 (http://tree.bio.ed.ac.uk/ software/tracer/) with 10% of posterior samples discarded as burn-in. Maximum clade credibility (MCC) trees were summarized using tree annotator and tree visualization was implemented in Figtree v1.4.3 (http://tree.bio. ed.ac.uk/software/figtree/). To approximate the population growth rate of the epidemic history for each HCV subtype, we performed a linear regression analysis during the exponential growth phase of the BSP curves. The mean exponential growth rates were determined as the slope of the linear regression curves obtained by plotting the mean estimates of the effective population size in the natural logarithmic scale against time (years) in the R software.
Resistance-associated substitutions (RAS) and polymorphisms. The NS5B gene was reviewed to determine the amino acid substitutions reported to reduce the susceptibility of different HCV genotypes or subtypes to DAA according to the latest review of Pawlotsky 2016, the European and American guidelines [107][108][109][110] . Sofosbuvir-specific clinically relevant RAS were L159F, S282T/R, L320I/F/V and V321A, depending on genotype/ subtype, and dasabuvir-specific clinically relevant RAS were L314H and C316H/N/Y/W (only for GT1). Other polymorphisms in positions associated with resistance to these drugs were also assessed.
To identify RAS and polymorphisms in NS5B of our patients, the sequences were translated to amino acids and aligned against the following reference sequences: GenBank accession number AF009606 for GT-1a clade I, HQ850279 for GT-1a clade II, EU781827 for GT-1b, AM910652 for GT-1g, KC197237 for GT-2, D00944 for GT-2a, D50409 for GT 2c, D17763 for GT-3a, Y11604 for GT-4a, FJ462435 for GT-4b, DQ418786 for GT-4d, EF589161 for GT-4f, and EU392173 for GT-4k. Degenerate codons were present in some individuals reflecting the presence of quasiespecies. When these codons were present in positions associated with drug resistance all translation possibilities were considered. Statistical Analysis. Categorical variables were analysed using the chi-squared test or Fisher's exact test.
P-values were 2-tailed and statistical significance was defined as P < 0.05. Statistical analyses were performed using the SPSS software Version 22.0 (IBM Corp, Chicago, Armonk, NY, USA).