A Large-scale Survey of CRF55_01B from Men-Who-Have-Sex-with-Men in China: implying the Evolutionary History and Public Health Impact

The HIV-1 epidemic among men-who-have-sex-with-men (MSM) continues to expand in China, involving the co-circulation of several different lineages of HIV-1 strains, including subtype B and CRF01_AE. This expansion has created conditions that facilitate the generation of new recombinant strains. A molecular epidemiologic survey among MSM in 11 provinces/cities around China was conducted from 2008 to 2013. Based on pol nucleotide sequences, 19 strains (1.95%) belonging to CRF55_01B were identified among 975 MSM; they came from 7 provinces, with prevalence ranging from 1.5% to 12.5%. Near full-length genome (NFLG) sequences from six epidemiologically unlinked MSM were amplified to analyze the evolutionary history of the strain. All six shared an identical genome structure composed of CRF01_AE and subtype B, with four unique recombination breakpoints in the pol region. Bayesian molecular clock analyses of both the CRF01_AE and subtype B segments indicated that the most recent common ancestor of CRF55_01B dates to around the year 2000. Our study found that CRF55_01B has spread through most of the provinces with high HIV-1 prevalence. It highlights the importance of continual surveillance of dynamic changes in HIV-1 strains and of the emergence of new recombinants, as well as the need to implement effective prevention measures specifically targeting the MSM population in China.

infection cases, the proportion of MSM increased from 0.3% before 2005 to 29.4% in 2011. The regions with the highest HIV prevalence among MSM were Guizhou, Sichuan, Guangdong, Jiangsu, Henan, Liaoning and Beijing 4,5.
Our recent nationwide survey revealed that multiple HIV-1 strains circulate among MSM in China. The major strains comprise two lineages of CRF01_AE and one lineage of CRF07_BC 6. These three HIV-1 lineages account for more than 75% of HIV infections among MSM in nine major Chinese cities 6. Co-circulation of multiple lineages of HIV-1 strains has led to the inevitable emergence of various inter-genotype recombinants and of novel circulating recombinant forms (CRFs). In 2006, a CRF01_AE/B recombinant among MSM in Malaysia was reported and designated CRF33_01B; this was the first CRF identified among MSM in Asia. Among Chinese MSM, the first circulating recombinant form, CRF55_01B, was identified by our group in 2013 7. Moreover, a recent paper reported an outbreak of CRF55_01B strains among MSM in Shenzhen, southern China 8.
In the present study, we examine the first CRF (CRF55_01B) detected among MSM in China, focusing on its evolutionary history and public health impact, based on 975 newly diagnosed HIV-1 infected cases from a prospective HIV primary infection cohort and 2 cross-sectional surveys conducted in 11 provinces/cities between 2008 and 2013.

Ethics Statement. The study was approved by the ethics committee of the AIDS Research Center of China Medical University in Shenyang. All methods involving human subjects were carried out in accordance with the relevant approved guidelines and regulations. All study subjects provided informed consent regarding the provision of blood samples and HIV-genotype analyses.
Study Subjects. Blood  number in each site is listed in Table 1. From each case, 10 ml of EDTA-3K anticoagulated peripheral blood was collected; the plasma was separated within 6 hours of collection and frozen at −80 °C for further analysis.
RNA extraction, partial pol gene amplification and sequencing. RNA was extracted from 280 μl of plasma using the QIAamp® Viral RNA Mini Kit (Qiagen, Germany) in a final elution volume of 60 μl. Partial pol gene sequences (HXB2 2253-3318 nt) were amplified using a previously published method 9. Briefly, the region was reverse-transcribed and amplified with the SuperScript™ Polymerase One-Step RT-PCR System (Invitrogen), then subjected to nested amplification using GoTaq DNA Polymerase (Promega). PCR products were purified using the QIAquick Gel Extraction Kit (Qiagen) and sequenced directly with the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit and the same primers used in the previous publication 10.
Single genome amplification and sequencing. The
Phylogenetic tree and recombination breakpoint analyses. All sequences were screened using the HIV BLAST tool to detect laboratory contamination. Valid sequences were aligned with HIV-1 reference strains from the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov). Alignment and manual editing
Evolutionary analysis. The evolutionary rate and the time of the most recent common ancestor (tMRCA) for CRF01_AE lineages were estimated as described previously 10. Bayesian Markov chain Monte Carlo (MCMC) inference under the relaxed lognormal molecular clock was selected as a reliable model for this analysis 11. The MCMC chains were run for 20 million steps and sampled every 1000 steps. Bayesian MCMC output was analyzed using TRACER v1.5, and all parameters were estimated with an effective sample size (ESS) > 200. The trees were summarized in a target tree using the TreeAnnotator program and viewed using the FigTree program v1.3.1.
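As a rough illustration of the sampling scheme and the ESS criterion described above, the following Python sketch counts the states retained from a 20-million-step chain sampled every 1000 steps and estimates an effective sample size from a parameter trace. The 10% burn-in, the function names and the simple autocorrelation-based estimator are assumptions made for illustration; they are not taken from the study and do not reproduce TRACER's own algorithm.

```python
def retained_samples(chain_length, sample_every, burnin_percent=10):
    """Number of logged states kept after discarding an assumed burn-in."""
    total = chain_length // sample_every
    return total - (total * burnin_percent) // 100


def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first non-positive autocorrelation,
    a common simple heuristic (assumption, not TRACER's method).
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # constant trace: nothing to correct for
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


if __name__ == "__main__":
    # 20 million steps, one state logged every 1000 steps, 10% burn-in (assumed)
    print(retained_samples(20_000_000, 1000))
```

A trace whose estimated ESS falls below 200 would, under the criterion stated above, call for a longer chain or less frequent sampling.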

GenBank Accession Numbers. The near-full-length sequences reported in this article are available in GenBank under accession numbers JX574661 to JX574663, KF927150 to KF927151 and KC183777.

Table 1). As shown in Fig. 1A, the CRF01_AE and CRF07_BC strains circulating among the MSM population formed three distinct monophyletic clusters 6,10: CRF01_AE MSM clusters 1 (312, 32%) and 2 (229, 23.5%), and CRF07_BC cluster 3 (238, 24.4%) (see also Table 1). Furthermore, we found an additional phylogenetic cluster (n = 19, 1.9%) with high statistical support (bootstrap value of 100%), which belongs to CRF55_01B, reported by our group in 2013 (Fig. 1A). In this study, CRF55_01B strains were detected in 7 of 11 provinces/cities across China: the highest prevalence was found among MSM in Guangdong (12.5%, 5 of 40), Hunan (7.4%, 5 of 68) and Shandong provinces (7.1%, 3 of 42), followed by Henan (3.4%, 2 of 58), Jiangsu   Fig. 2). Recombination breakpoint analyses of the 1.1-kb pol (pro-RT) sequences of these strains showed that they contain a small subtype B segment within a CRF01_AE backbone (data not shown). To further characterize the recombinant structure of these strains, we determined NFLG sequences using available plasma specimens from the 19 epidemiologically unlinked MSM (Table 1). A total of 6 NFLG sequences from different study subjects (three from Hunan, two from Guangdong, and one from Anhui) were successfully amplified and sequenced. As shown in Fig. 1B, neighbor-joining tree analysis of the NFLG sequences confirmed that these six strains formed a distinct monophyletic cluster with a bootstrap value of 100%.
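The prevalence figures quoted above follow directly from the reported counts. The Python sketch below recomputes them and, as an addition not present in the study, attaches a 95% Wilson score interval to show how wide the uncertainty is for sites with few samples. The function names and the choice of the Wilson interval are illustrative assumptions; the counts are those stated in the text.

```python
import math


def prevalence_percent(cases, sampled):
    """Crude prevalence as a percentage."""
    return 100.0 * cases / sampled


def wilson_interval(cases, sampled, z=1.96):
    """95% Wilson score interval for a proportion, returned in percent.

    Not part of the original analysis; added only to illustrate
    the uncertainty attached to small per-site sample sizes.
    """
    p = cases / sampled
    denom = 1 + z * z / sampled
    centre = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    return max(0.0, 100 * (centre - half)), min(100.0, 100 * (centre + half))


# (CRF55_01B cases, MSM sampled) as reported in the text
COUNTS = {
    "Guangdong": (5, 40),
    "Hunan": (5, 68),
    "Shandong": (3, 42),
    "Henan": (2, 58),
    "All sites": (19, 975),
}

if __name__ == "__main__":
    for site, (cases, sampled) in COUNTS.items():
        low, high = wilson_interval(cases, sampled)
        print(f"{site}: {prevalence_percent(cases, sampled):.1f}% "
              f"(95% CI {low:.1f}-{high:.1f}%)")
```

Because sampling was not random, such intervals describe only the sampling variability of each site's specimens, not population-level prevalence.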

Spread
Recombination breakpoint analyses revealed that these six strains had an identical genome structure: two subtype B segments contained within a CRF01_AE backbone in the pol region (reverse transcriptase and integrase regions) (Fig. 3A,B). This recombinant structure is designated CRF55_01B 7. To further confirm the subtype structure and to estimate the likely parental lineages of CRF55_01B, we performed subregion tree analyses in which the HIV-1 genome was divided into five regions (denoted I, II, III, IV and V, as illustrated in Fig. 3B). As shown in Fig. 3C, the CRF01_AE regions (Regions I, III and V) belonged to the Thai CRF01_AE radiation and did not belong to any other known CRF01_AE variants, including the previously identified Chinese MSM clusters 1 and 2 10 (Fig. 4) using a relaxed molecular clock approach, respectively. Because the tMRCA estimations using individual or combined CRF01_AE regions [Regions I, III and V and the concatenated genome region for CRF01_AE (I + III + V)] yielded essentially similar results, we show the maximum clade credibility (MCC) tree for only the concatenated CRF01_AE region (I + III + V) (Fig. 4A). For the subtype B region, we were not able to obtain a tMRCA estimate with enough statistical support for Region II alone because of the shortness of the nucleotide sequence in this region (209 nt); we therefore provide the MCC tree for the concatenated subtype B segments (Regions II + IV) (Fig. 4B).
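The region-wise concatenation described above can be sketched in a few lines of Python. Four breakpoints split a sequence into five regions (I to V) that alternate between the two parental lineages, and the regions of each parent are then joined. The toy sequence and the breakpoint positions below are hypothetical placeholders chosen for readability; they are not the CRF55_01B breakpoint coordinates, which are not given in this section.

```python
def split_regions(sequence, breakpoints):
    """Cut a sequence at the given 0-based breakpoint positions."""
    edges = [0, *breakpoints, len(sequence)]
    return [sequence[start:end] for start, end in zip(edges, edges[1:])]


def concatenate_by_parent(sequence, breakpoints,
                          backbone="CRF01_AE", insert="B"):
    """Join alternating regions: I + III + V (backbone) and II + IV (insert)."""
    joined = {backbone: "", insert: ""}
    for index, region in enumerate(split_regions(sequence, breakpoints)):
        parent = backbone if index % 2 == 0 else insert
        joined[parent] += region
    return joined


if __name__ == "__main__":
    # Toy example: upper case marks backbone, lower case marks inserted segments.
    toy_sequence = "AAAA" + "bb" + "AAA" + "bbbb" + "AA"
    toy_breakpoints = [4, 6, 9, 13]  # hypothetical positions
    print(concatenate_by_parent(toy_sequence, toy_breakpoints))
```

In a real analysis the same slicing would be applied column-wise to every sequence in the alignment, so that each concatenated segment set can be analyzed separately.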
As shown in Fig. 4A,B, the estimated tMRCAs for the concatenated CRF01_AE regions (Regions I + III + V) and the concatenated subtype B regions (Regions II + IV) were 2000.2 [95% highest posterior density (HPD): 1997.9, 2002.6] and 2000.4 [95% HPD: 1996.5, 2004.1], respectively. The estimated tMRCAs for the CRF01_AE and subtype B regions were in agreement (see also Fig. 4C). This suggests that the recombination that generated CRF55_01B from parental lineages of subtype B and CRF01_AE occurred around the year 2000, consistent with the finding by Zhao et al. from their analysis of CRF55_01B pol fragments from Shenzhen MSM 8. In contrast, the estimated tMRCAs for Chinese CRF01_AE MSM cluster 1 [1991.2 (1988.2, 1994.3)] and cluster 2 [1994.9 (1992.1, 1997.6)] are significantly older than those of CRF55_01B (Fig. 4A).
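The comparison of tMRCA estimates above can be checked with a simple interval-overlap test on the reported 95% HPD bounds. The sketch below is a minimal illustration; treating non-overlapping intervals as an indication that one lineage is older is an assumption made here for illustration, not a statement of the statistical test used in the study.

```python
def intervals_overlap(first, second):
    """True if two closed intervals (low, high) share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


# 95% HPD intervals (calendar years) as reported in the text
HPD = {
    "CRF55_01B, CRF01_AE regions (I+III+V)": (1997.9, 2002.6),
    "CRF55_01B, subtype B regions (II+IV)": (1996.5, 2004.1),
    "CRF01_AE MSM cluster 1": (1988.2, 1994.3),
    "CRF01_AE MSM cluster 2": (1992.1, 1997.6),
}

if __name__ == "__main__":
    reference = "CRF55_01B, CRF01_AE regions (I+III+V)"
    for name, interval in HPD.items():
        if name != reference:
            print(name, "overlaps reference:",
                  intervals_overlap(HPD[reference], interval))
```

The two CRF55_01B segment sets overlap with each other, while both Chinese CRF01_AE MSM clusters end before the CRF55_01B interval for the CRF01_AE regions begins.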

Discussion
Our large-scale molecular epidemiologic survey (Table 1) revealed that CRF55_01B, originally identified among three epidemiologically unlinked MSM in Guangdong and Hunan provinces in southern China 7, has disseminated widely among MSM in major cities of southern, eastern and central China. Although CRF55_01B accounted for only 1.9% (19 of 975) of HIV-1 infections among MSM in this study, Guangdong and Hunan provinces remain the regions with the highest CRF55_01B prevalence (12.5% and 7.4%, respectively) among the 11 provinces/cities, consistent with the regions where CRF55_01B strains were first reported. In addition, Shandong province, located in eastern China, showed a relatively high CRF55_01B prevalence (7.1%). We also detected CRF55_01B strains in MSM from Henan and Anhui in central China, Jiangsu in eastern China and Yunnan in southwestern China, where the prevalence ranged from 1.5% to 3.4%. However, this new CRF was not detected in northern China (0 of 426) (Table 1, Fig. 2). These data imply that CRF55_01B, a recently identified CRF, has spread widely. On the other hand, because this was not a random-sampling investigation and the sample sizes were not proportional to local HIV prevalence, no definitive conclusion can be drawn. Nevertheless, in some regions, such as Liaoning, Beijing and Yunnan, where specimens came from an HIV primary infection cohort as well as 2 cross-sectional studies and outnumbered those from other regions, CRF55_01B strains were rarely detected (0-1.5%), implying that CRF55_01B has had little impact on these regions. In summary, the apparent differences in distribution suggest that CRF55_01B might have originated among MSM in southern Chinese provinces and subsequently come to co-circulate in eastern and central Chinese provinces. A recent study on MSM in Shenzhen, southern   12,23 .
This difference suggests that subtype B of U.S.-European origin entered first into the MSM populations of the aforementioned countries.
The emergence of CRF55_01B is a relatively recent event. As shown in Fig. 4, Bayesian molecular clock analyses estimated the timing of the emergence of CRF55_01B to be around the year 2000 for both the CRF01_AE and subtype B regions. This timing indicates that CRF55_01B was generated early this century via recombination between the CRF01_AE and subtype B strains co-circulating among MSM in southern China. It also makes CRF55_01B significantly younger than the other HIV-1 lineages associated with MSM transmission in China: CRF01_AE MSM cluster 1 (~1991) and cluster 2 (~1994) (Fig. 4) 6. A founder effect, arising because CRF55_01B emerged well before HIV surveillance detected the rapid expansion of HIV infections among MSM in the mid-to-late 2000s, may explain the relatively high prevalence (~10% level) of this young CRF in some Chinese cities. The CRF01_AE subregions of CRF55_01B still belong to Thai CRF01_AE rather than to CRF01_AE cluster 1 or cluster 2, which are spreading among Chinese MSM 6,10 (Fig. 3), and CRF55_01B is younger than these two Chinese MSM CRF01_AE clusters. Nevertheless, we believe that more complex recombinants originating entirely within the Chinese MSM population will emerge quickly, as a result of frequent contact and the co-circulation of various HIV strains among MSM.
The rapid upsurge of HIV infections among MSM in China is fuelled by high-risk behavior, including unprotected sex and exchanging sex for money, and by inadequate knowledge about HIV among Chinese MSM 2. This combination of ignorance and high-risk behavior makes MSM more vulnerable to super-infections. Consequently, the co-existence of multiple HIV-1 strains in individual MSM leads to the inevitable generation of new recombinant strains. Of the potentially many recombinant strains generated, only those that have spread widely via MSM transmission have come to be recognized as CRFs. Indeed, several studies have begun to detect various recombinants and CRF candidates among MSM in different regions of China 24-29. In our study, besides CRF55_01B, we found other potential CRF candidates among undefined recombinants (Table 1). We expect to identify additional new recombinant strains and CRFs among the Chinese MSM population.
In summary, we found that the novel recombinant CRF55_01B has disseminated widely among MSM in China. We also detected diverse forms of potential recombinant strains affecting China's MSM population, a result of the high-risk behavior exhibited by MSM. These findings highlight the urgent need to implement effective measures to reduce HIV-1 transmission in this population.