Co-infections and transmission networks of HCV, HIV-1 and HPgV among people who inject drugs

Co-infections with human immunodeficiency virus type 1 (HIV-1) and human pegivirus (HPgV) are common in hepatitis C virus (HCV)-infected individuals. However, analysis on the evolutionary dynamics and transmission network profiles of these viruses among individuals with multiple infections remains limited. A total of 228 injecting drug users (IDUs), either HCV- and/or HIV-1-infected, were recruited in Kuala Lumpur, Malaysia. HCV, HIV-1 and HPgV genes were sequenced, with epidemic growth rates assessed by the Bayesian coalescent method. Based on the sequence data, mono-, dual- and triple-infection were detected in 38.8%, 40.6% and 20.6% of the subjects, respectively. Fifteen transmission networks involving HCV (subtype 1a, 1b, 3a and 3b), HIV-1 (CRF33_01B) and HPgV (genotype 2) were identified and characterized. Genealogical estimates indicated that the predominant HCV, HIV-1 and HPgV genotypes were introduced into the IDUs population through multiple sub-epidemics that emerged as early as 1950s (HCV), 1980s (HIV-1) and 1990s (HPgV). By determining the difference in divergence times between viral lineages (ΔtMRCA), we also showed that the frequency of viral co-transmission is low among these IDUs. Despite increased access to therapy and other harm reduction interventions, the continuous emergence and coexistence of new transmission networks suggest persistent multiple viral transmissions among IDUs.

Hepatitis C virus (HCV) is a bloodborne virus from the Hepacivirus genus of the family Flaviviridae, commonly implicated to liver cirrhosis and hepatocellular carcinoma in humans 1 . The positive-sense RNA virus continues to cause severe morbidity and mortality among people who are infected through intravenous drug use, contaminated blood products and unsafe medical procedures. To date, there are six confirmed genotypes denoted as genotype 1 through 6 and a provisionally assigned genotype 7 2,3 , with each genotype diverging further into different subtypes with distinct genetic variability. Due to the shared transmission route, co-infection of human immunodeficiency virus type 1 (HIV-1) and human pegivirus (HPgV, formerly known as GB virus C or hepatitis G virus) among HCV-infected individuals are common, especially among people who inject drugs 4 .
Bayesian analyses of the time-stamped NS5B gene of HCV (n = 964) and the HIV-1 gag-pol gene (n = 226) sequences under the relaxed clock model (uncorrelated lognormal) with constant population size revealed a mean evolutionary rates of 0.98-1.08 (0.74-1.30) × 10 −3 substitutions/site/year (across subtypes 3a, 1a, 3b and 1b) and 2.3 (2.1-2.5) × 10 −3 substitutions/site/year, comparable with estimates published previously 15,16 . Based on these evolutionary a priori, the divergence time for HCV subtypes were dated around the mid 1700s to mid 1900s, in line with estimates reported elsewhere 17,18 . There were two distinct 3a lineages circulating in Malaysia since the 1950s that possibly represent the earliest/ oldest lineages in the country. Likewise, the major subtype 1a lineage in the country was dated back to the 1960s, and was closely related to the United States/Brazil (US/BR) strains. Sporadic introduction (or occasional spread) of subtype 1a lineages related to the United States/Switzerland (US/CH) strains was also observed. Subtype 1b lineage closely related to the CH strains was dated around the 1960s too, with a more recent but genetically distinct lineage related to the US/CH isolates introduced into the region in the 1980s. Around the same period, subtype 3b lineage was established in the country. In addition, the time of the most recent common ancestor (tMRCA) of each HCV transmission networks was estimated to be around the early 1990s to mid 2000s, with the oldest network involving subtype 1b emerging as early as in the 1980s ( Fig. 2A). Of note, more recent networks that were detected thereafter involved the major circulating HCV subtypes 3a, 1a and 3b (Fig. 3). Similarly, the origin of HIV-1 subtypes B, B′ and CRF01_AE dated back to between the late 1960s and mid 1980s (Fig. 2B), similar to the dates estimated in other studies [19][20][21][22][23] . The tMRCA of each HIV-1 transmission network observed within CRF33_01B was dated between early 2000s to late 2000s (Fig. 3). For HPgV, the evolutionary rate of the NS5B gene was newly estimated at 6.8 (6.2-7.3) × 10 −3 substitutions/site/year, an order of magnitude difference from the rate estimated elsewhere 24 . The divergence time of HPgV were dated more recently than that of HCV and HIV-1, with HPgV genotypes 2, 3 and 5 were first dated around the 1970s, followed by genotypes 1 and 4 (1980s) (Fig. 2C). Two distinct HPgV genotype 2 lineages were observed, one of which emerged around mid 1990s that might represent the first genotype 2 lineage circulating in the country. Following HPgV introduction, multiple genotype 2 transmission networks continued to spread among the IDUs population. All HPgV transmission networks observed within genotype 2 were dated around the mid to late 2000s (Fig. 3). Difference in divergence times (ΔtMRCA) between viral lineages within individual. We attempted to investigate the frequency of viral co-transmission (or viral co-acquisition) among the co-infected subjects by means of inferring the time of origin or divergence time of viral lineages using the Bayesian coalescent method described above. In the absence of the first seropositive dates in our cohort, we estimated the duration between two transmission events by measuring the difference of the mean divergence times between two viral lineages (Δ tMRCA) within a dually infected subject. In this proof-of-concept approach, if two viruses were co-transmitted to a susceptible individual at a particular point of time, the divergence times of the co-transmitted lineages will be similar, thus the low Δ tMRCA (Fig. 4A). In contrast, if two viruses were transmitted to an individual at two distinct time points, the estimated Δ tMRCA will be high. In our data set, the dually infected subjects (n = 45, based on the availability of NS5B gene of both HCV and HPgV, and gag-pol gene of HIV-1) were grouped into two categories: a) HCV-infected subjects co-infected with either HIV-1 (n = 26) or HPgV (n = 5) and b) HIV-1-infected subjects co-infected with either HCV (n = 8) or HPgV (n = 6) (Fig. 4B). It was estimated that only one HIV-1/HCV co-infected patient (1/45, 2.2%) showed a Δ tMRCA of approximately 73 days, whereas in other subjects the calculated Δ tMRCAs in general exceeded 100 days, suggesting a low frequency of viral co-transmission among IDUs.
To test the impact of HCV and HIV-1 infection on the duration to acquire subsequent viral infections, the quantitative Δ tMRCA estimates were further analyzed by the non-parametric Mann-Whitney U test. The order of viral transmission was deduced based on the divergence times. It was estimated that subjects first infected with HIV-1 (n = 14) experienced a shorter time (lower Δ tMRCA) to acquire subsequent HCV or HPgV infections, as compared to the HCV-infected subjects (n = 31) who later acquired HIV-1 or HPgV infections (P = 0.03) (Fig. 4B). However, among the HIV-1-infected subjects who were then infected with either HCV or HPgV, no significant difference in Δ tMRCA was observed (P = 0.144). Among subjects first infected with HCV, the duration to acquire HPgV infection was significantly longer than that to acquire HIV-1 infection (P < 0.001) (Fig. 4B).

Discussion
Transmission of bloodborne viruses such as HCV among people who inject drugs continues to cause major public-health problems worldwide. Due to the shared mode of transmission with HCV, co-infections of HIV-1 and HPgV among HCV-infected patients are common 25,26 . Although studies have indicated the clinical and epidemiological impact of co-infections involving these viruses, co-analysis on the evolutionary history and phylodynamic profiles of these viruses within a defined high-risk population remains limited. Several recent studies on HIV-1 among men who have sex with men have addressed the population dynamics of the virus, and pointed out the potential role of transmission networks in escalating HIV-1 transmission worldwide 13,14,27,28 . However, such features in HCV and HPgV infections have not been widely studied, particularly in the context of co-infections. In the present study, we analyzed the genetic history and the transmission networks of HCV, HIV-1 and HPgV within a cohort of individuals who inject drugs in Kuala Lumpur, Malaysia, and assessed the possible occurrence of viral co-transmission in the population using phylodynamic data. Viral sequence data has been proven informative in defining transmission networks within a population, particularly those with limited epidemiological data such as sexual history and injecting drug use 15,29 . In addition, studies have also highlighted the usefulness and the importance of exploiting viral genetic information in assessing the epidemic linkages through viral phylodynamics studies 15,30 . A total of 107 HCV NS5B gene, 94 HIV-1 gag-pol gene and 46 HPgV NS5B gene sequence data were phylogenetically analyzed. Bayesian coalescent analysis estimated that HCV subtype 1a was first emerged around the 1700s, followed by 1b (1800s), 3a and 3b (1900s). It is estimated that the most prevalent HCV subtype 3a first introduced into the population who inject drugs in Kuala Lumpur, Malaysia through at least two distinct introductions dating back as early as in the 1950s. This period in the 20th century probably correlated with the practice of non-sterile medical procedures that could have facilitated viral transmission 31 . Bayesian skyline plot for HCV subtype 3a lineage indicated the initial growth phase started around mid-1970s and early 1980s, followed by an exponential growth and reached its peak from 1990s onwards. Shortly after the introduction of subtype 3a, other HCV subtypes were introduced into IDUs between the early 1960s and the early 1980s, and established a steady growth ( Fig. 2A).
In addition, Bayesian coalescent analysis suggested that HIV-1 subtype B, B′ and CRF01_AE emerged between the late 1960s and early 1980s, corroborating with previous reports [19][20][21][22][23] . The Malaysian subtype B′ and CRF01_AE isolates were traced to the divergence times similar to that of the Thai reference strains, indicating their shared ancestry [32][33][34] . The data also suggest that HIV-1 infections among IDUs were primarily driven by HIV-1 CRF01_AE and subtype B′ 35 . However, nearly 80% of study subjects were infected with HIV-1 CRF33_01B, with numerous transmission networks observed between the early to mid-2000s. Bayesian skyline analysis indicated that the primary growth phase was initiated nearly two decades ago and reached its peak around the early 2000s (Fig. 2B) 36 .
The divergence time of HPgV genotypes were estimated, suggesting more recent divergence times than those of HCV and HIV-1. HPgV genotypes exhibited distinct geographical distribution patterns, with genotypes 1 and 5 commonly found in Africa, genotype 2 in Europe and the United States, and genotypes 3 and 4 in Asia 37 . In the present study, the predominant HPgV genotype is genotype 2, suggesting an HPgV infiltration into the country through a link that was probably related to the West (Europe/ United States). Bayesian skyline plot analysis suggested that the primary growth phase was initiated in the late 1990s and reached its plateau a decade later (Fig. 2C). To our knowledge, this is the first report Studies on HIV-1 have highlighted the potential role of transmission networks in fuelling the global epidemic. In the present study, multiple transmission networks were observed (Fig. 3), emphasizing their role in shaping current disease burden among IDUs in Kuala Lumpur. Despite the increased access to antiretroviral therapy and other intervention measures more than a decade ago, new transmission networks continue to emerge thereafter. The present study suggests that the trend of continuous emergence of new transmission networks and coexistence of multiple IDUs subepidemics from various common ancestors are likely to cause continuous onward viral transmission.
We explored the unique approach of using the Bayesian coalescent estimates to deduce the frequency of viral co-transmission within a dually infected subject. In the absence of seroconversion data, the divergence time from a common ancestor of a virus lineage could represent the time of infection or transmission 38 . By measuring the difference in divergence times between two viral lineages (Δ tMRCA), we found that the occurrence of viral co-transmission among the IDUs population in Kuala Lumpur is low (Fig. 4B). However, the preliminary findings should be interpreted with care because the estimated tMRCA is simply an indication of the lineage divergence time, which somehow suggests the oldest date that a virus was transmitted. Actual viral transmission (and probably viral superinfection) could have also occurred at any time point between the estimated tMRCA and the date of collection. As such, the Δ tMRCA could have been mis-estimated, leading to false representation of co-transmission event.
Primarily, our analysis also showed that HIV-1 infections may lead to a shorter time to acquire subsequent HCV or HPgV infections, compared to the HCV-infected subjects who later acquired HIV-1 or HPgV infections. It is possible that immune dysfunction such as impaired CD4 T-cell responses caused by HIV-1 infections may increase the risks to HCV infections [39][40][41] . However, such observation should be interpreted with caution due to the limited sample size studied, and potential case-reporting delay, as indicated by plateau of recent population size of the virus.
Several limitations have been identified in the present study. First, the short sampling period (2009-2010) might potentially underestimate the actual number of transmission networks. In addition, our analysis might not reveal the contemporary transmission patterns of these bloodborne viruses across international borders. This is because the existence of substantial sampling and publication biases could lead to overrepresentation of certain countries and under sampling from another 42 . In addition, the direction of transmission cannot be established from phylogenetic-based network because such transmission are not necessarily synonymous with actual transmission event 42 . Furthermore, the lack of commercially available assay for HPgV detection might have underestimated the actual prevalence of HPgV among IDUs.
To our knowledge, this is the first study that co-analyzed the genetic histories and transmission networks of HCV, HIV-1 and HPgV within a cohort of individuals with history of intravenous drug use, and assessed the characteristics of viral co-transmission in the population using phylodynamic data. Present study highlighted that the current disease burden observed among people who inject drugs is largely due to introduction of multiple HCV, HIV-1 and HPgV sub-epidemics lineages, originating few decades ago. Without effective preventive measures, continuous emergence of new transmission networks and co-existence of multiple sub-epidemics from various common ancestors are likely to fuel the HCV, HIV-1 and HPgV dissemination among people who inject drugs.

Ethics Statement. This study was approved by the University Malaya Medical Centre Medical Ethics
Committee. Standard, multilingual consent forms validated by the Medical Ethics Committee were used. Written informed consent was obtained from all subjects prior to sample collection. All experiments were performed in accordance with relevant guidelines and regulations.
Study Subjects and Specimens. In the present study, a total of 228 consenting IDUs, either HCV-and/or HIV-1-infected, were recruited in Kuala Lumpur, Malaysia between September 2009 and November 2010. Demographic information such as the age, gender and ethnicity were obtained, followed by collection of blood specimens for HCV and HIV-1 serological testing using Cobas 6000 (Roche Diagnostics, USA) 43,44 . HPgV seroprevalence was not determined due to the lack of commercially available serology assay. Viral RNA Extraction, Gene Amplification and Sequencing. Total viral RNA was extracted from 140 μ l of plasma specimens using the NucliSENS easyMAG automated nucleic acid extraction system (bioMérieux, Marcy I'Etoile, France) 45 , as described in the manufacturer's protocol. Subsequently, the viral RNA was reverse transcribed using the SuperScript III reverse transcriptase (Invitrogen, USA). Nested-PCR was performed using previously reported primers sets to amplify the gag-pol gene of HIV-1 (HXB2 position: 1756-3370, 1615 bp) 46 , 5′ -untranslated region (UTR) and NS5B gene of both HCV (H77 position: 62-696, 635 bp and 7977-8614, 638 bp) 47 and HPgV (PNF2161 position: 163-367, 204 bp and 7318-8390, 1073 bp) 48,49 . For HIV-1, the protease gene (HXB2 position: 1793-2581, 789 bp) will be amplified for subtype determination should the longer gag-pol gene was negative. The PCR products were sequenced by an ABI PRISM 3730XL DNA Analyzer (Applied Biosystems, USA). The nucleotide sequence data were then codon aligned along with the respective list of viral reference sequences retrieved from the Los Alamos National Laboratory HIV (www.hiv.lanl.gov) and HCV sequence database (www.hcv.lanl.gov) and GenBank (www.ncbi.nlm.nih.gov/genbank/).

Phylogenetic Reconstructions, Phylodynamic Inference and Divergence Times of Transmission
Networks. To determine the HCV, HIV-1 and HPgV genetic subtype and to identify the potential transmission networks, neighbour-joining trees were first reconstructed based on the 5′ -UTR and NS5B gene of both HCV and HPgV, as well as the gag-pol gene of HIV-1 from the sequence data set using MEGA version 6.0 based on Kimura-2 parameter model 50 . Since the gag-pol gene and NS5B gene have been proven useful in assessing and defining transmission networks 31,51 , these genes were used for subsequent analysis. The reliability of phylogenies were further tested by a more robust maximum likelihood approach, heuristically inferred using subtree pruning and regrafting, as well as nearest neighbour interchange algorithms implemented in PAUP version 4.0 52 . The statistical significance of the branching orders was validated by bootstrap analysis of 1000 replicates. The most appropriate nucleotide substitution model was determined using FindModel, a web implementation of Modeltest. Bayesian maximum clade credibility (MCC) trees were constructed using BEAST 1.7 53 to determine the posterior probability values for each network observed in HCV, HIV-1 and HPgV. Transmission network for HIV-1 is defined as a network consisting of at least 2 isolates from different individuals of the same geographical (i.e country) origin with a genetic distance of ≤ 0.025, which formed a monophyletic phylogenetic clade supported by bootstrap value of more than 90% and Bayesian posterior probability value of 1.0 at the internal tree node, as described previously 27,30,54 . Due to the lack of a definition for a phylogenetic transmission network for HCV and HPgV, similar classification used for HIV-1 was applied to these viruses.
The time of most recent common ancestor (tMRCA) of the respective transmission networks observed in each genomic region of HCV, HIV-1 and HPgV were estimated by the Bayesian coalescent-based relaxed molecular clock model, conducted in BEAST 1.7 53 . Previously reported HCV NS5B and HIV-1 gag-pol gene nucleotide substitution rates (1.0 × 10 −3 and 2.0 × 10 −3 substitutions/site/year, respectively) 15,16 were used as a priori. Since the HPgV nucleotide substitution rates were previously estimated from a small data set sampled over a short study period 55 , we attempted to re-estimate the evolutionary rate based on all available HPgV NS5B sequences (n = 132) sampled between 1993 and 2011. For the coalescent priors, different parametric demographic models namely constant population size, exponential and logistic growth as well as nonparametric Bayesian skyline were tested. The best fit coalescent model was chosen by means of a Bayes factor (Supplementary Table S1). Accordingly, the constant population size and uncorrelated lognormal model 29 nested in general time-reversible (GTR) nucleotide substitution model 56 with a proportion of invariant sites and four rate categories of gamma-distribution model of among site rate heterogeneity 57 was employed to estimate the HCV, HIV-1 and HPgV phylogenies, and to date the tMRCA of each transmission networks 31,51 . The Markov chain Monte Carlo (MCMC) analysis was computed four times at 50 million states sampled every 10,000 states and output was assessed for convergence by means of effective sampling size after a 10% burn-in using Tracer version 1.5 (http://tree. bio.ed.ac.uk/software/tracer).
Sequences. HCV, HIV-1 and HPgV nucleotide sequences generated in the study have been deposited in GenBank under the accession numbers KR108391-KR108696, KR131788-KR131833 and KC477938-KC478065. Similar information can be found in Supplementary Table S2.