The Transmission and Evolution of HIV-1 Quasispecies within One Couple: a Follow-up Study based on Next-Generation Sequencing

Next-generation sequencing (NGS) has been successfully used to trace HIV-1 infection. In this study, we investigated the transmission and evolution of HIV-1 quasispecies in a couple infected through heterosexual behavior. A heterosexual couple in which both partners were infected with HIV-1 was followed up for 54 months. Blood samples, including whole-blood and plasma samples, were collected at various time points. After HIV-1 subtyping, NGS (MiSeq platform) was used to sequence the env region of the HIV-1 quasispecies. Genetic distances were calculated, and phylogenetic trees were generated. We found that both partners were infected with the HIV-1 circulating recombinant form (CRF) CRF65_cpx. The quasispecies distribution was relatively tightly clustered in the phylogenetic tree during early infection. Over time, the distribution of HIV-1 quasispecies gradually became more dispersed, as seen at 12 months, with a progressive increase in gene diversity. By 37 months, the sequences obtained for the two partners formed different clusters in the phylogenetic tree. These results suggest that HIV-1 contact tracing results generated by the MiSeq platform may be more reliable than those of conventional sequencing methods and can provide important information about the transmission and evolution of HIV-1. Our findings may help to better target preventative interventions for promoting public health.

the immune system in individuals. Over time, the similarity of HIV-1 strains between the source and the infected individual lessens, and the paraphyletic relationship becomes blurred, eventually disappearing [8][9][10].
The principal laboratory methods used include the direct sequencing of nested PCR products, the sequencing of cloned PCR products [11] and end-point limiting-dilution PCR [12]. These methods are used to deduce the transmission relationship and the direction of transmission for the HIV-1 gene subtype or quasispecies. They are complex and have several disadvantages, including the generation of limited information. In recent years, next-generation sequencing (NGS) has been widely used for the deep sequencing of HIV-1 quasispecies, for the detection of superinfection [13,14] and for the identification of antiretroviral drug resistance mutations at frequencies below the detection limit of conventional methods [15,16]. We have successfully used NGS (Illumina MiSeq platform), which is simpler, cheaper and more practical than conventional methods, and have developed this technique for tracing HIV-1 infection in individuals thought to have become infected with the virus through sexual activity [17].
In this study, we used NGS (MiSeq platform) to study the transmission and evolution of the HIV-1 env region in a single HIV-1-concordant heterosexual couple over a period of 54 months, and to compare the results obtained from immune cells in whole-blood samples with those obtained from plasma samples. Both members of the couple were in the early stages of infection.

Results
Subtypes and NGS sequencing. HIV-1 gag region fragments from all samples of the HIV-1-infected couple and controls were successfully amplified and then sequenced. The phylogenetic trees confirmed that the viral genotype present in the male partner (M) and his wife (F) was a circulating recombinant form (CRF), CRF65_cpx. The interpersonal genetic distance between the viruses from M and F was 0.1%, which was much smaller than that between F and six other local controls.
For F, amplification was successful for eight plasma samples and unsuccessful for one plasma sample. Amplification was successful for all seven whole-blood samples from F. Viral sequences were successfully amplified from all six plasma and five whole-blood samples from M, the first member of the couple to be infected. All the amplicons obtained were successfully sequenced on the MiSeq platform. The various quasispecies sequences in each sample were counted and ranked in terms of their frequency, from high to low. Quasispecies sequences with counts of more than 50 were selected for further study (Table 1). The number of valid sequences obtained from F was 83,290-180,118 (mean, 131,094) for plasma samples and 83,329-175,811 (mean, 122,444) for the whole-blood samples. The plasma and whole-blood samples contained 23-322 (mean, 125), [...] the distribution of quasispecies was very concentrated at the 5th and 6th time points, with a gradual dispersal observed thereafter (Fig. 1). Whole-blood samples from F and plasma and whole-blood samples from M followed similar patterns. From 1 to 37 months post-infection, the intraspecimen gene diversity gradually increased in both plasma and whole-blood samples from both M and F. For each subject, diversity between the follow-up sample and the first sample tended to increase over time (Fig. 1). Greater sequence dispersal was observed for plasma than for whole-blood samples. At baseline, other than for the whole-blood sample from M (MW), the most frequent quasispecies sequence accounted for more than 50% of the sequences obtained. As the duration of the infection increased, the distribution of the quasispecies populations changed. The overall proportion of the most frequent quasispecies decreased at first, but this proportion increased in MW samples from the 15th week (Fig. 2A).
The effect of infection time on genetic distance. The genetic distance between the populations of HIV-1 quasispecies gradually increased over time in the absence of antiretroviral treatment. From the 1st to the 8th sample, average genetic distance in F increased from 0.7% in plasma and 0.6% in whole blood to 2.3% and 3.1%, respectively. In M, from the 1st to the 6th sample, average genetic distance increased from 0.6% in plasma and 0.8% in whole blood to 0.7% in plasma and 2.7% in whole blood. After 16 months of antiretroviral therapy (corresponding to a time point 54 months after the start of infection), PCR on plasma from F failed, whereas that on whole-blood samples succeeded, with an average genetic distance between HIV-1 quasispecies of 0.6%. This finding suggests that long-term antiretroviral treatment has a significant effect on the composition of the population of HIV-1 quasispecies (Fig. 2B).
Average genetic distance between the viruses in the follow-up samples and those in the first sample also tended to increase over time. This increase was slightly more marked for plasma than for whole blood in M (Fig. 2C). No amplification was achieved with the FP9 sample at 54 months (Fig. 2C). The average genetic distance between the viruses present in the male and female partners increased from 0.7% in both plasma and whole blood in the first month to 5.4% in plasma and 4.7% in whole blood in the 37th month after infection. Figure 2D also illustrates the average genetic distance to the first sample at different follow-up times. Overall, this genetic distance was slightly higher for plasma than for whole-blood samples.
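The within-sample and between-partner averages reported above can be illustrated with a short sketch. It is not the MEGA workflow used in this study: it assumes pre-aligned sequences of equal length, an unweighted mean over unique sequences, and the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The function names jc_distance, mean_within and mean_between are hypothetical.

```python
import math
from itertools import combinations, product


def jc_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where both sequences carry an unambiguous base.
    sites = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def mean_within(seqs):
    """Mean pairwise distance among the unique sequences of one sample."""
    dists = [jc_distance(a, b) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists) if dists else 0.0


def mean_between(group_a, group_b):
    """Mean distance over all pairs drawn from two samples (e.g. M vs F)."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    dists = [jc_distance(a, b) for a, b in product(group_a, group_b)]
    return sum(dists) / len(dists)
```

Calling mean_between on the first sample and a follow-up sample from the same subject gives the "distance to the first sample" measure, under the same assumptions.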
The influence of infection time on the inference of evolutionary mechanism from the phylogenetic tree. Sexual transmission of HIV-1 imposes a genetic bottleneck, as a single genetic variant is responsible for establishing infection, resulting in the paraphyly of source viruses with respect to those of the recipient. HIV-1 was transmitted from M to F very quickly, during acute infection. Thus, phylogenetic analyses showed that the sequences from M and F clustered together, but it was difficult to identify the direction of transmission (Figs 3 and 4). The distribution of the quasispecies population was fairly narrow in early infection, with a close genetic distance, leading to tight clustering on the phylogenetic tree (Fig. 3A-D). However, this distribution gradually became more dispersed over time, resulting in a scattering of samples on the phylogenetic tree. In addition, analysis of the FW9 sample showed a concentration of the distribution after 16 months of antiviral therapy (Fig. 3B). Although the population of quasispecies present in early infection included the quasispecies predominating later in infection, no re-infection or viral recombination was observed within this couple during the study (Figs 3 and 4).
Analysis of the phylogenetic tree obtained at 12 months resulted in correct inference of the evolutionary mechanism from data for plasma, whole-blood or mixed samples. The MP3 population was located at the root of the phylogenetic tree, giving rise to a large number of MP3 quasispecies and a small number of FP3 quasispecies. MW3 and FW3 showed a paraphyletic relationship, and MW3 was located at the tip of the phylogenetic tree (Figs 3 and 4). On the phylogenetic tree obtained 37 months after infection, the distance between the populations obtained from F and M was greater.
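The paraphyly-based reasoning used here can be expressed as a simple topological rule: if the recipient's sequences form a single clade nested within the source's sequences, the source is paraphyletic with respect to the recipient. The sketch below is a simplified illustration of that rule, not the procedure used in this study. It assumes a rooted tree written as nested tuples, unique leaf labels with a hypothetical "HOST_id" naming scheme, and it ignores branch lengths and bootstrap support.

```python
def leaves(node):
    """Return the leaf labels under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def host(leaf):
    """Host code from a label such as 'M_1' (assumed naming scheme)."""
    return leaf.split("_")[0]


def is_monophyletic(tree, h):
    """True if some subtree contains exactly the leaves of host h."""
    target = {leaf for leaf in leaves(tree) if host(leaf) == h}

    def search(node):
        if set(leaves(node)) == target:
            return True
        if isinstance(node, str):
            return False
        return any(search(child) for child in node)

    return bool(target) and search(tree)


def infer_direction(tree, a="M", b="F"):
    """Apply the paraphyly rule; return 'undetermined' when it does not hold."""
    a_mono = is_monophyletic(tree, a)
    b_mono = is_monophyletic(tree, b)
    if b_mono and not a_mono:
        return f"{a}->{b}"  # a is paraphyletic with respect to b
    if a_mono and not b_mono:
        return f"{b}->{a}"
    return "undetermined"


# Toy tree: F sequences form one clade nested among M sequences.
toy_tree = ((("F_1", "F_2"), "M_1"), ("M_2", "M_3"))
print(infer_direction(toy_tree))
```

When the two hosts' sequences are intermingled, or when each forms its own clade, the rule returns "undetermined", which mirrors the difficulty described above for very early and late samples.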

Discussion
We used a recently developed approach based on NGS (MiSeq platform) to analyze the HIV-1 quasispecies present in samples, to study the transmission and evolution of HIV-1 under host immune pressure, and to provide molecular epidemiological data for the tracing of HIV-1 infection. We found that HIV-1 quasispecies richness tended to increase early in infection, but that the diversity of the quasispecies population was low at specific time points (FP5, FP6, FW9, MP8, MW2). The virus may have escaped host immunity through rapid evolution, in which case quasispecies richness might be expected to continue to increase during early infection. Viral adaptation to the host environment resulted in a stable population of quasispecies, with a decrease in quasispecies richness and gene diversity. At this time point, quasispecies richness was high for all the samples except for the whole-blood sample from F. The two participants in this study had recently become infected and chose not to take antiretroviral drugs in the first 37 months post-infection. This situation provided an ideal opportunity for studying the host-virus relationship. However, given the small number of individuals studied, additional investigations will be required to determine whether the trends observed are universal and to strengthen our findings.
There is strong evidence to support the existence of a bottleneck following transmission by heterosexual intercourse. Thus, despite the diversity of the viruses present in the person transmitting the infection, only one or a few quasispecies seem to be transmitted. As shown in Figs 3 and 4, M was infected within one month after non-marital heterosexual exposure and then quickly transmitted the virus to F during acute HIV-1 infection. Thus, the quasispecies population distribution was fairly narrow in early infection, with a close genetic distance, leading to tight clustering on the phylogenetic tree (Fig. 3A-D). Moreover, because HIV-1 evolves rapidly in individuals, this distribution gradually became more dispersed over time, resulting in a scattering of samples on the phylogenetic tree. The intra-patient quasispecies clusters of the two partners intersected each other, and the viral strains evolved independently, so it is difficult to infer the direction of transmission. However, neither viral recombination nor intrapersonal HIV-1 transmission over time was observed in either partner in our study (Fig. 4), suggesting that the quasispecies distribution was relatively tightly clustered during very early infection, with the sequences located within a single cluster in the phylogenetic tree (Figs 3 and 4).
The relationship and direction of transmission have been successfully inferred by genetic distance and paraphyly studies in a number of infection traceability investigations [8,10,18,19]. The duration of HIV-1 infection has a direct impact on the traceability results. The longer the time since infection, the greater the difference is likely to be between the quasispecies populations of the person acting as the source of the infection and the person infected. However, few studies have monitored HIV-1 variants. In this study, we found that the average genetic distance between the first sample and follow-up samples tended to increase over time, although follow-up was cut short by the death of M. These findings are consistent with the view put forward by Kupfer [20] [...] amplification from the plasma sample demonstrates the efficacy of the antiretroviral treatment. For whole blood, average genetic distance with respect to the first sample had increased. Overall, the average genetic distance between the viruses present in the two partners increased over time, and this difference was slightly greater for plasma than for whole blood. It remains unclear how long individuals should be followed up.
Attempts to determine the direction of transmission must take several factors into account: (1) the timing of sample collection relative to the onset of infection. The shorter the duration of infection, the easier the interpretation, although the accuracy achievable at different time intervals has yet to be studied; (2) the interval between the infection events in the two individuals. Determination is easier for longer intervals. However, care is required when trying to infer the direction of transmission early in infection, particularly for longer sampling intervals.
HIV-1 superinfection may also influence evolutionary analysis. In superinfection, a previously infected individual is re-infected with a strain different from that responsible for the first infection. HIV-1 superinfection has been reported to have a worldwide incidence of 0-7.7% per year [21]. Many superinfections may go undetected due to the low sensitivity of conventional methods [21]. If the members of an HIV-1-infected couple, in addition to having sex with each other, also engage in high-risk extramarital behavior, then HIV-1 quasispecies analysis needs to take into account not only the two partners in the couple, but also more distant external associations. The detection of HIV-1 superinfection requires cloning or NGS techniques. Several studies on HIV-1 superinfection have been published [22][23][24][25].
In summary, the implementation of high-throughput NGS using the MiSeq platform can improve the investigation of HIV transmission and of subsequent intrapersonal HIV evolution. This allowed us to study the characteristics of HIV paraphyly for different HIV transmission modes under the pressure of the host immune system, in addition to the molecular epidemiologic relationship between the individuals.

Methods
Ethics statement. The participants provided written informed consent for their information and clinical samples to be stored and used for research. Ethical approval for this study was obtained from the institutional review board of the National Center for AIDS/STD Control and Prevention of the Chinese Center for Disease Control and Prevention, and written informed consent was provided according to the Declaration of Helsinki. The methods were carried out in accordance with approved guidelines and regulations.
Epidemiological Materials. The 46-year-old male partner (M) was found to have seroconverted for HIV-1 within one month after non-marital heterosexual exposure, with a viral load of 110,000 copies/ml. His wife (F), who was 32 years old, was found to have seroconverted 4 days later, with a viral load of 17,000 copies/ml. She had not engaged in any high-risk behavior other than unprotected sex with her husband. Both M and F were identified in early infection and were followed for 37 and 54 months, respectively. M refused antiretroviral therapy and died 37 months after infection. F started antiretroviral therapy 38 months after infection and continued to be followed until 54 months. Seven whole-blood and nine plasma samples were collected from F at nine time points, and five whole-blood and six plasma samples were collected from M at six time points (Table 1). Thus, in total, 12 whole-blood samples (immune cells) and 15 plasma samples were studied.

Laboratory Tests. To investigate the HIV-1 similarity and possible transmission between the couple, another six HIV-1-infected patients were enrolled as controls. RNA was extracted from plasma using the QIAamp MinElute Virus Spin Kit (Qiagen, Germany). The RNA was reverse-transcribed to generate complementary deoxyribonucleic acid (cDNA) using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen, USA). A fragment of the HIV-1 gag gene region (HXB2 positions 781 to 1861) was selected for HIV-1 gene subtype analyses, as previously described [17].
Total nucleic acids and RNA were extracted from the whole-blood and plasma samples of the HIV-1-infected couple, respectively. A target fragment corresponding to HXB2 positions 7170-7515 was amplified by nested PCR. The primers for the first round of amplification have been described elsewhere [18,27]. The primers used for the second round of amplification were X1-ATAAGKATAGGACCAGGACAA (HXB2 positions 7148 to 7168) and X2-ATGGGAGGRGCATACATTGCT (HXB2 positions 7541 to 7521). Sequencing was performed on the MiSeq platform, with a 2 × 300 base pair (bp) read length. Paired-end reads were assembled, and the raw tags were subjected to quality filtering, as previously described [28]. We adopted a conservative approach to ensure high sequencing quality: only HIV-1 quasispecies sequences appearing more than 50 times in whole-blood and plasma samples were analyzed [17,29,30]. We used the HIV-1 Align program (http://www.HIV.Lanl.gov/content/sequence/VIRALIGN/viralign.html) to generate sequence alignments. Repetitive sequences were removed, resulting in a number of unique quasispecies sequences for each sample. We then calculated genetic distances and generated neighbor-joining trees with the Jukes-Cantor model, using MEGA 6.0.6 software. The trees generated were assessed by the bootstrap method, with 1000 replications.
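The counting, ranking and thresholding step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: reads are taken to be already assembled and quality-filtered strings, identical strings are treated as one quasispecies sequence, and the names rank_quasispecies and dominant_fraction are hypothetical. The dominant fraction is computed among retained sequences only, which is one possible reading of "proportion of the most frequent quasispecies".

```python
from collections import Counter

MIN_COUNT = 50  # from the text: keep sequences appearing more than 50 times


def rank_quasispecies(reads, min_count=MIN_COUNT):
    """Collapse identical reads, rank by frequency, keep counts > min_count."""
    counts = Counter(reads)
    # most_common() returns (sequence, count) pairs from high to low count.
    return [(seq, n) for seq, n in counts.most_common() if n > min_count]


def dominant_fraction(ranked):
    """Share of retained reads that belong to the most frequent sequence."""
    total = sum(n for _, n in ranked)
    return ranked[0][1] / total if total else 0.0


# Toy sample: four variants, two of which fall at or below the threshold.
toy_reads = ["AAA"] * 120 + ["AAC"] * 60 + ["AAG"] * 50 + ["AAT"] * 3
ranked = rank_quasispecies(toy_reads)
print(ranked)
print(round(dominant_fraction(ranked), 3))
```

The strict "greater than" comparison follows the wording "more than 50 times", so a variant seen exactly 50 times is discarded in this sketch.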