Evolutionary Dynamics and Complicated Genetic Transmission Network Patterns of HIV-1 CRF01_AE among MSM in Shanghai, China

To explore the evolutionary dynamics and molecular transmission patterns of HIV-1 CRF01_AE in depth among men who have sex with men (MSM) in Shanghai, we constructed phylogenetic tree and genetic transmission networks based on 1, 152 pol sequences from MSM, 282 from other risk groups and 795 references. Phylogenetic analyses identified two distinct major CRF01_AE lineages and a Shanghai-based sub-lineage. The estimated tMRCAs for lineage 1 and 2 were 1996.0 (1992.9–1999.2) and 1997.8 (1994.3–2001.4), respectively. Of the 1, 152 MSM, 681 (59.1%) were identified as belonging to 241 separate networks. Of these 681 individuals in networks, 74.2% were linked to cases diagnosed in different years, 4.3% were linked to heterosexual women, and 0.7% were linked to persons who inject drugs. A total of 71 networks including 180 individuals diagnosed in Shanghai with the same domicile were found. Recent infection (P = 0.022) and sampling year after 2011 (P < 0.001) were significantly associated with potential transmission links among the networks. Besides, a significant transmission of viruses with drug resistant mutations at V179D/E were found in the networks. Given these findings, we propose that genetic transmission analysis is a useful tool in HIV intervention strategies to curb the spread of virus and promoting public health.

strains. We then re-constructed the relatively deeply sampled sub-networks of CRF01_AE to depict the temporal and spatial transmission of networks and explore the linking-associated factors among genetic transmission networks.

Multiple introductions of CRF01_AE strains into Shanghai MSM. Bayesian skyline plot (BSP)
deduced that CRF01_AE strains among Shanghai MSM started its initial growth phase around 1998, followed by an exponential expansion during 2000-2007, and reached a steady growth afterwards (Supplementary material 1). Phylogenetic analysis clearly identified two distinct major lineages, Shanghai lineage 1 (SH-L1) and Shanghai lineage 2 (SH-L2) (constituent ratio: 87.2% and 12.4%, Fig. 1). The estimated time of the most recent common ancestors (tMRCAs) for SH-L1 and SH-L2 were 1996.0 (1992.9-1999.2) and 1997. 8 (1994.3-2001.4), respectively (Supplementary material 2). The estimated mean evolutionary rate was 2.7 × 10 −3 nucleotide substitutions/site/ year. The demographic characteristics did not differ substantially between two lineages (Supplementary material 3). SH-L1 consisted of four independent sub-lineages (SH-L1A-D), while SH-L2 just presented a small monophyletic lineage (Fig. 1). A dozen other minor lineages were dispersed in the phylogenetic tree. Previously seven distinct CRF01_AE phylogenetic lineages were found nationwide by China CDC 14 . Remarkably, three of the SH-L1 sub-lineages (except 1D) clustered together with China-lineage 4, while SH L2 clustered together with China-lineage 5. However, we did not find any of the other Chinese lineages in our cohort 14 (Supplementary material 4). Of interest was that the sub-lineage 1D, accounting for 8.6% (n = 99), did not match with any reported lineages in China, indicating a unique Shanghai-based epidemic. The estimated tMRCA for this new identified sub-lineage was 2003. 5 (2001.0-2005.9), later than all other sub-lineages (Supplementary material 2). Compared to other lineages/sub-lineages, the individuals included in 1D were more likely to be single (82.8% versus 68.5%, P = 0.028), while other demographic characteristics did not differ.
Of the 681 Shanghai MSM in networks, 74.2% (505) were linked to cases diagnosed in different years. As shown in Fig  To explore the potential transmission between recent infection and long-standing infection in the networks, 517 (out of 681, 75.9%) recent infections (< 1 year) were identified using a molecular algorithm (See Methods), of which 417 (80.7%) were interlinked among recent infections, and 162 (31.3%) were linked to the long-standing infections (> 1 year). Of the 164 long-standing infection, 71 (43.3%) were interlinked among long-standing infections, and 117 (71.3%) were linked to recent infections.
To better understand the role that the migrants (living in Shanghai for more than half a year) played in the formation of networks, we investigated the influence of their domicile on transmission. We found the transmission linkage existed not only in the individuals with different domiciles and diagnosed in Shanghai (intra-province), but also in the individuals with same domiciles (townsmen) but diagnosed in the domicile places and Shanghai (inter-province). Seventy-one networks including 180 individuals diagnosed in Shanghai (dots in different colors based on domicile) and those who were diagnosed in their hometowns (squares in different colors based on domicile) are shown in Fig. 4A and Supplementary material 6. In addition, 5 networks including 11 individuals diagnosed in Shanghai were linked with 6 Thai individuals (Supplementary material 7).

Discussion
In this study, we identified two major CRF01_AE lineages circulating among Shanghai MSM individuals and a new sub-lineage that was unreported elsewhere in China. The SH-L1 included most of subjects' sequences (87.2%) and had a close match with China-lineage 4, the most common CRF01_AE lineage in MSM epidemic nationwide, while SH-L2 was closely associated with China-lineage 5, a common lineage related to heterosexuals and injection drug users epidemic 14 . Our previous study and others indicated that about 20% MSM in Shanghai had sex with women and an estimated 8.3% MSM in China was reported to have used illicit drugs in the past 6 months. In this study, the genetic transmission network analysis reconfirmed the interaction between MSM and other groups. Although the introduction of SH-L1 into Shanghai MSM was slightly earlier than SH-L2, it showed an obviously evolutionary divergence of CRF01_AE which had resulted in at least 4 sub-lineages epidemics. We speculate that the new sub-lineage would probably continue to expand given 60.6% of its members are domestic migrants.
As a coastal international metropolis, Shanghai attracted an increasing numbers of domestic migrant people in recent years. As these migrants constantly face difficulties in accessing employment, social welfare, education, and health services locally under the current household registration system, they usually flow between hometowns and different cities 15 . Our network analysis revealed the occurrence of transmission not only inside Shanghai city, but also between Shanghai and other provinces, indicating the complicated transmission patterns and the difficulty in intervention among MSM population. We realize that ongoing HIV infection and the high proportion of networking may also reflect a concentrated uncontrolled MSM epidemic in Shanghai, due to lack of diagnosis, non-treatment in time after diagnosis, failure of viral suppression after ART initiated, and lack of contact with care 11,[16][17][18] . In this study, we found that recent infection was closely related to potential transmission linking among networks, which is consistent with previous studies, suggesting that early HIV-1 infection has an important role in transmission events indeed 19 . This observation, once again, emphasizes the importance of early diagnosis and timely antiretroviral treatment. Previous studies showed that the individuals with more links in the network have a higher probability of spreading the virus to others because of a high viral load and a high rate of partner change 20,21 , and therefore, these individuals may play the role of super-spreaders 22,23 . Thus, our, genetic transmission network analysis confirmed that a minority of super-spreaders indeed drive transmission networks to a significant extend.
An interesting observation in this study was that V179D/E, an NNRTI-associated mutation was dominating Shanghai MSM. Sequences with V179D/E were distributed and networked in most sub-lineages in SH-L1, suggesting that several independent CRF01_AE strains with V179D/E were involved in ongoing transmission in this city. Besides, we found a higher proportion of V179D/E mutation in Shanghai CRF01_AE strains than in all others China CRF01_AE strains. This observation constitutes an alternative molecular evidence for determination of HIV-1 transmission among Shanghai MSM.
Although the present study was intended to include all first-time follow-up MSM after diagnosed, there was still ~20% loss to follow. Besides a misclassification as heterosexuals could be present among MSM 24 . However, analysis of ≥ 50% subjects from the available follow-up subjects in this study was suggested to satisfy the requirements of cluster analysis in the concentrated areas 25 . Therefore, we believed that the size of samples derived from MSM was reasonable and might not significantly bias our analysis.
In conclusion, we elucidated the molecular evolutionary dynamics of HIV-1 CRF01_AE circulating among Shanghai MSM. Genetic transmission network analysis further revealed a complexity of transmission pattern. We thus suggest that network analysis based on a molecular approach, combined with social science and public health approaches could be helpful in effective HIV intervention strategy to curb the spread of virus and promote public health.

Study subjects.
A total of 2, 252 MSM blood samples were collected at the first follow-up after HIV diagnosis during 01/2008-12/2013, representing > 80% of newly HIV-1 diagnosed MSM in Shanghai. A total of 1, 836 sample sequences were acquired after RNA extraction and PCR amplification, among which, 1, 152 CRF01_AE isolate sequences with > 1, 000 bp were selected for further analysis. The corresponding demographic characteristics and CD4 + T cell counts were also collected for analysis. None of the individuals was exposed to antiretroviral treatment (ART) at the time of enrolment. In addition, 130 heterosexual women sequences, 134 heterosexual men sequences and 18 PWID sequences were acquired from individuals diagnosed in Shanghai. We also isolated all closely related publicly available sequences with 1, 152 CRF01_AE strains of MSM from HIV databases of Scientific RepoRts | 6:34729 | DOI: 10.1038/srep34729 Los Alamos National Laboratory. We performed a BLAST search 26 by using the 1, 152 sequences, and selected top 10 sequences with the highest homology with the references for each sequence. A total of 795 reference sequences were selected for analysis after excluding repeated sequences. Eventually, 2, 229 CRF01_AE sequences were included for subsequent analysis. In order to exclude the influence of convergent evolution at drug resistance mutation site on the phylogenetic analysis, 45 sites in protease (PR) and reverse transcriptase (RT) were removed before phylogenetic analysis. HIV-1 Drug resistance mutations were determined using Stanford University HIV Drug Resistance Database tool: HIVdb Program: Sequence Analysis (http://sierra2.stanford.edu/sierra/servlet/ JSierra?action= sequenceInput) and the last updated (Oct-Nov 2015) guidelines from the International AIDS Society Resistance Testing-USA panel 27  Phylogenetic analyses. FastTree 2.3 28 was used to estimate an approximately-maximum likelihood phylogenetic tree for pol sequences using the GTR + G + I nucleotide substitution model. The phylogenetic tree's reliability was determined with local support values based on the Shimodaira-Hasegawa (SH) test 29  Identification and analysis of genetic transmission networks. The flowchart of genetic transmission networks was depicted as Supplementary material 8. Briefly, transmission clusters were extracted from the phylogenetic tree using the software Cluster Picker 30 . Transmission clusters were defined as node support threshold greater than 95% and intra-cluster maximum pairwise genetic distances less than 3.0% nt substitutions per site. As a significant proportion of the sequences (26.3%) were presumably from patients in the stage of long-standing infection (details in below), we chosen a threshold of genetic distance confined to 3.0% [31][32][33] , in order to identify relevant transmission clusters 30 . The pairwise genetic distances of all sequences within the available clusters were calculated. The minimum genetic distances algorithm was used to define the linkages within a cluster (Supplementary material 9). For visualizing and analyzing network, the network data were processed using a custom R script utilizing the network package in the R software 34 .
Analysis of individuals with potential transmission Links. Three groups were compared including (1) individuals with no link to others, (2) individuals who linked to another one, and (3) individuals who linked to ≥ 2 others. Chi-square test was used to determine linking-associated factors among networks. The demographic and laboratory information retrieved included sampling year, recent infection, age, domicile, education, marital status, number of sex partner in the past 6 months, and CD4 + T cell counts. We used a molecular algorithm of a frequency of ambiguous calls in bulk sequencing of pol gene under 0.5% to define a recent infection event < 1 year prior to clustering analysis 35,36 . Ethics statement. The study protocol was reviewed and approved by the Institutional Review Board at the Human Medical Research Ethics Committee of the Shanghai CDC. No additional informed consent from participants was obtained for this special investigation as the data were analyzed retrospectively and anonymously. All research methods in this study were carried out in accordance with the approved guidelines.