CRF07_BC is associated with slow HIV disease progression in Chinese patients

HIV subtypes convey important epidemiological information and possibly influence the rate of disease progression. In this study, HIV disease progression in patients infected with CRF01_AE, CRF07_BC, and subtype B was compared in the largest HIV molecular epidemiology study ever done in China. A national data set of HIV pol sequences was assembled by pooling sequences from public databases and the Beijing HIV laboratory network. Logistic regression was used to assess factors associated with the risk of AIDS at diagnosis (AIDSAD, defined as a CD4 count < 200 cells/µL) in patients with HIV subtype B, CRF01_AE, and CRF07_BC. Of the 20,663 sequences, 9,156 (44.3%) were CRF01_AE. CRF07_BC was responsible for 28.3% of infections, followed by B (13.9%). In multivariable analysis, the risk of AIDSAD differed significantly according to HIV subtype (OR for CRF07_BC vs. B: 0.46, 95% CI 0.39-0.53), age (OR for ≥ 65 years vs. < 18 years: 4.3, 95% CI 1.81-11.8), and transmission risk group (OR for men who have sex with men vs. heterosexuals: 0.67, 95% CI 0.6-0.75). These findings suggest that HIV diversity in China is constantly evolving and gaining in complexity. CRF07_BC is less pathogenic than subtype B, while CRF01_AE is as pathogenic as B.

The prevalence of subtypes varied significantly according to sex, age, ethnicity, transmission risk group, and region. Table 2 shows the subtype diversity within the demographic subgroups. Women had a greater prevalence of infection with subtype B and a lower prevalence of infection with CRF01_AE and URF. Older individuals (≥ 65 years) tended to have a lower prevalence of infection with CRF01_AE. Genotypic diversity was greatest among heterosexuals, in whom 35 HIV genotype categories were detected, the most prevalent being CRF01_AE. Among MSM, 21 subtypes were detected, of which CRF01_AE was the most prevalent. IDU were most likely to be infected with CRF07_BC. Individuals of Uyghur and Yi ethnicity were predominantly, though not exclusively, infected with CRF07_BC. Figure 2 illustrates the regional distributions of the seven common HIV subtypes. CRF07_BC was more prevalent in the southwest and northwest regions. In the other four regions, the subtype with the highest prevalence was CRF01_AE. Of note, a significantly high prevalence of URF was detected in the southwest region (9.6%). The prevalence of comorbidities for individuals with HIV subtype B, CRF01_AE, and CRF07_BC was 14.0% (11.7-16.5%), 5.5% (2.6-6.4%), and 2.7% (1.9-3.5%), respectively.
HIV subtype temporal trends. Table 1 presents the temporal trends for the seven common subtypes.  [15][16][17][18][19][20][21][22][23][24] . Both methods rely on an accurate determination of the date of HIV acquisition. As the date of HIV acquisition was not available for the majority of our study participants, we could not determine the rates of HIV disease progression with precision. However, the unavailability of the date of HIV acquisition did not prevent a direct comparison of disease progression between the three major subtypes at the population level. Although the time between HIV infection and diagnosis inevitably varies substantially among individuals, we believe that the interval was well matched between the three major subtypes; in other words, that the median interval was the same for each subtype. We aimed to compare the natural rate of CD4 decline between the three major subtypes within this interval. The interval between infection and diagnosis is also part of the natural disease history. Our analysis was based on the hypothesis that if the three major subtypes progress equally, there should be no difference in their median CD4 counts at the time of diagnosis at the population level. We defined the origin time as the estimated date of HIV acquisition and the end time as the date of HIV diagnosis. We defined disease progression as the decline in CD4 count from the time of HIV acquisition to the time of HIV diagnosis. Therefore, our analysis was limited to individuals with available CD4 counts. CD4 counts of the seven common subtypes were compared overall and by sampling phase. CD4 counts in patients with CRF07_BC were consistently and significantly higher than those in patients with subtype B or CRF01_AE (P < 0.01) (Fig. 3). Therefore, we empirically hypothesized that disease progression in these subtypes could be different.
Table 3 reports the association between patient characteristics and laboratory-defined acquired immunodeficiency syndrome (AIDS, defined as a CD4 count < 200 cells/µL) at diagnosis (AIDSAD). In univariable logistic analyses, the risk of AIDSAD was significantly associated with sex, age, transmission risk group, and HIV subtype. After adjustment for these factors in the multivariable analysis, patients infected with CRF07_BC had less than half the risk of AIDSAD of those infected with subtype B (as odds ratios [OR]  patients (OR 0.67, 95% CI 0.6-0.75). Yi ethnicity was associated with a lower risk of AIDSAD (OR 0.43, 95% CI 0.17-0.9); however, the sample size was very small. Three sensitivity analyses, excluding heterosexuals, MSM, and IDU in turn, were performed, and the outcomes were consistent with those obtained from the whole population (Supplementary Tables S7-9). We also analyzed the decline in CD4 count between the time of HIV infection and diagnosis using multivariable linear regression (MLR). The results were consistent with those obtained by logistic analysis (Supplementary Tables S10-12).

Discussion
To our knowledge, this is the largest study to date reporting on the national distribution and trends of HIV subtypes in China, with a sample size of over 20,000, and spanning 1994-2020 4 . These data showed that the HIV epidemic in China exhibited some of the greatest global genetic diversity, consisting of 38 HIV subtypes. The only other country to match China is the United States, which has approximately 15 subtypes 6,7 . This high and sharply increasing HIV subtype diversity in China is consistent with evidence from most regions of the world 3-9 . Although there were variations in the prevalence of the three major subtypes, the combined prevalence of these subtypes was stable throughout the study period, suggesting they might be an indicator of equally stable HIV transmission in China. The data revealed that the previously described subtype compartmentalization 4 no longer existed among transmission risk groups and was of diminished impact across geographic regions, but persisted in people of Uyghur and Yi ethnicity, in whom it was as strong as ever. Global travel and acquisition of infections abroad, the floating population, and domestic transmission all likely contribute to increasing HIV viral diversity 4-9 .
The comparison of disease progression between subtype B and other subtypes has been hindered by the fact that there are few populations with multiple circulating subtypes, including subtype B 15-24 . The epidemic in China, characterized by co-circulating CRF01_AE, CRF07_BC, and subtype B, provides a unique opportunity for such a direct comparison. The data revealed that CRF07_BC progresses more slowly than subtype B, while CRF01_AE progresses as fast as subtype B. Consistent with the results of the concerted action of seroconversion to AIDS and death in Europe (CASCADE) 16,17 , disease progression did not differ significantly by sex. The middle (45-64 years) and older (≥ 65 years) age groups had faster disease progression than the young (< 18 years). However, slower disease progression was observed in MSM compared to heterosexuals. We hypothesize that this difference could be attributed to a shorter interval between seroconversion and diagnosis in MSM than in heterosexuals, because most targeted HIV testing campaigns in China have focused on the MSM population 2 .
The CRF07_BC strain is a relatively young HIV strain that originated among IDU in China and is mainly confined to China 26 . During the past two decades, the number of individuals infected with CRF07_BC has increased significantly in China, accounting for 38% of all infections in the 2018-2020 phase. Although it descends from the two most prevalent strains in the world (subtypes B and C), CRF07_BC displays many unique characteristics that differ from those of its parent strains. Li et al. have also observed that individuals infected with CRF07_BC have significantly higher baseline CD4 counts than those infected with CRF01_AE 14 . However, they did not treat the higher CD4 counts as a proxy for slower disease progression, nor did they conclude from their finding that CRF07_BC progresses more slowly than CRF01_AE. We   30 have demonstrated that CRF07_BC is associated with better immune recovery in Chinese patients undergoing antiretroviral treatment (ART) compared to that of patients infected with CRF01_AE. Taken together, these results support the hypothesis that CRF07_BC is less pathogenic than subtype B. Before 2014, it was widely accepted in China that people infected with HIV have approximately ten AIDS-free years before they enter the AIDS phase, as reported by the CASCADE study 16,17 . In 2014, Li et al. showed that infection with CRF01_AE is associated with faster disease progression in Chinese patients infected through sexual transmission compared to that of patients infected with non-CRF01_AE strains (mostly CRF07_BC and subtype B) 10 . The time interval between seroconversion and AIDS was only 4.8 years for CRF01_AE. The absence of a difference in disease progression between CRF01_AE and subtype B in the present study suggests that the time from seroconversion to AIDS for subtype B is far shorter than previously believed. Two explanations are suggested.
First, since ethnicity has been proven to be a major determinant of disease progression 18  The current study has significant implications for clinical practice and policy-making. First, since approximately 60% of patients with new infections in China (subtype B plus CRF01_AE accounted for 58.2%) will progress to AIDS within 4.8 years, these findings justify early treatment. Second, the results of this study call for subtype-specific monitoring and treatment guidelines. Patients with CRF07_BC may have a better treatment prognosis. Third, in evaluating the AIDS disease burden, the prevalence of CRF07_BC should be taken into account.
This study has several limitations. First, although it is the largest study of this kind, it represents only approximately two percent of all individuals living with HIV in China. Thus, these findings might not be fully representative. Second, viral load (VL) information was not included in the study, which did not permit the evaluation of the association between VL and subtype. However, this is a goal of a future study. Third, the biological mechanisms underlying these observations were not elucidated. On this point, Huang et al. 12 have shown that patients infected with CRF07_BC have significantly lower VL than patients infected with subtype B, which may be due to the deletion of seven amino acids that overlap with the apoptosis-linked gene 2-interacting protein (Alix) binding domain of the p6 gag. Fourth, the infection time for most of the participants was unavailable, so the rate of CD4 count decline per year could not be assessed. Indeed, as China implemented the World Health Organization (WHO)'s 'treat-all', 'treat-early', and 'treatment as prevention' policy in 2016 32,33 , approximately 90% of individuals with HIV were treated with ART within the first year after diagnosis, making an evaluation of natural disease progression not only impractical but unethical. This study provides a novel method to directly compare the rate of natural disease progression between subtypes, that is, to use the duration between infection and diagnosis as the follow-up time and to treat the follow-up time as a matching variable in multivariable logistic analysis. Fifth, the MSM population was most likely over-represented in the study sample. However, the original data, from which stratified and weighted results may be easily calculated, have been provided.
In summary, these results highlight a Chinese HIV epidemic characterized by a high prevalence of CRF01_AE, CRF07_BC, and subtype B infections, with an overall increase in subtype diversity over the past 26 years, providing a unique opportunity to directly compare disease progression among the three subtypes. Disease progression was slower with CRF07_BC infection than with subtype B infection. Moreover, for the first time, it was shown that infections with CRF01_AE progressed as fast as those with subtype B. Future studies focusing on the effect of subtype on the outcome of ART, which include more confounding variables, such as VL, will help improve clinical practice and policymaking.

Methods
Study population and design. The study population consisted of two separate populations of HIV-infected individuals. The first group comprised all patients who underwent HIV TDR genotyping between 2001 and 2020 at the Beijing HIV laboratory network (BHLN). BHLN is a national collaboration engaged in surveillance of HIV TDR in China 27,28 . These methods have been previously described. Briefly, approximately 40% of the samples from all individuals newly diagnosed with HIV infection by BHLN between 2001 and 2020 were randomly selected 27,28 . The BHLN takes part in maintaining the national HIV epidemiology database, which tracks everyone who receives a diagnosis of HIV infection in China and records the baseline CD4 count of all individuals with newly diagnosed HIV infection. The baseline CD4 count was the measurement taken closest to, and within one year of, the date on which HIV infection was confirmed by western blot. Baseline demographic data on sex, age, ethnicity, Hukou province, and transmission risk group were retrieved from this database.
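The baseline CD4 rule described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the record layout, the function name `baseline_cd4`, and the reading of "within one year" as a window on either side of the confirmation date are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical record layout: (measurement_date, cd4_cells_per_uL).
Measurement = Tuple[date, float]


def baseline_cd4(confirmed_on: date,
                 measurements: Sequence[Measurement],
                 window_days: int = 365) -> Optional[float]:
    """Return the CD4 count measured closest to the confirmation date.

    Assumption: the one-year window applies on either side of the
    western blot confirmation date. Returns None when no measurement
    falls inside the window.
    """
    window = timedelta(days=window_days)
    eligible = [(abs(measured_on - confirmed_on), cd4)
                for measured_on, cd4 in measurements
                if abs(measured_on - confirmed_on) <= window]
    if not eligible:
        return None
    # Smallest gap wins; ties keep the first measurement in input order.
    return min(eligible, key=lambda pair: pair[0])[1]


# Made-up example: the measurement two days after confirmation is chosen.
example = [(date(2015, 5, 20), 410.0),
           (date(2015, 6, 3), 385.0),
           (date(2017, 1, 1), 300.0)]
print(baseline_cd4(date(2015, 6, 1), example))
```

Returning `None` rather than a default value keeps individuals without an eligible measurement visibly missing, which matters when missing data are later handled by listwise deletion.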
The second group included publicly available sequences from the LANL 34 . All the pol sequences sampled in China with known sampling provinces, sampling years, and transmission risk groups available in the database were downloaded (data available as of December 1, 2019).
Phylogenetic analysis. Sequences were aligned using BioEdit and the alignment was manually corrected according to the encoded reading frame. Duplicate sequences were discarded. If several sequences from the same patient were available in the database, only the oldest was retained. The genotypes of long-branch sequences were re-confirmed, and those that were miscatalogued were eliminated from the study. A maximum likelihood phylogenetic tree was reconstructed with the merged dataset using the GTR + CAT nucleotide substitution model in FastTree 2.1 35 . The HIV subtype was inferred by automated subtyping using context-based modeling for expeditious typing (COMET) 36 , followed by phylogenetic analysis. Each sequence was assigned to one of eight subtypes, one of 102 circulating recombinant forms (CRF), or "unassigned." An "unassigned" sequence was deemed a possible unique recombinant form (URF) 6 .
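As a rough illustration of the two filtering rules above (discard duplicates, keep only the oldest sequence per patient), a minimal Python sketch follows. The `PolRecord` fields and the definition of a duplicate as an identical nucleotide string after removing gap characters and ignoring case are assumptions made for the example; the study does not state how duplicates were identified.

```python
from typing import Dict, Iterable, List, NamedTuple


class PolRecord(NamedTuple):
    # Hypothetical fields; real databases use their own identifiers.
    patient_id: str
    sampling_year: int
    sequence: str


def filter_sequences(records: Iterable[PolRecord]) -> List[PolRecord]:
    """Drop duplicate sequences, then keep the oldest record per patient."""
    seen_sequences = set()
    oldest: Dict[str, PolRecord] = {}
    for rec in records:
        # Assumption: duplicates are identical strings once alignment
        # gaps are removed and case is ignored.
        normalized = rec.sequence.upper().replace("-", "")
        if normalized in seen_sequences:
            continue  # duplicate of an earlier record: discard
        seen_sequences.add(normalized)
        current = oldest.get(rec.patient_id)
        if current is None or rec.sampling_year < current.sampling_year:
            oldest[rec.patient_id] = rec
    return list(oldest.values())


# Made-up records: P2's first sequence duplicates P1's and is dropped.
records = [PolRecord("P1", 2010, "ACGT"),
           PolRecord("P1", 2008, "ACGA"),
           PolRecord("P2", 2012, "acgt"),
           PolRecord("P2", 2013, "TTGA")]
kept = filter_sequences(records)
```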
Cohort of natural disease progression. The BHLN may also be used as a cohort to study the natural disease progression of HIV in China. The starting point of the study was set as the onset of the infection and the outcome was AIDSAD. The follow-up time was the duration between the starting point and the outcome. As the seroconversion time for most of the participants was unavailable, the follow-up time was unmeasurable. To solve this problem, the follow-up time was treated as a matching variable in the cohort analysis, as we hypothesized that the distribution of the follow-up time was well matched within the same transmission risk group and roughly matched across the study population as a whole. Three sensitivity analyses were performed by excluding heterosexuals, MSM, and IDU in turn for the comparison of subtypes.
Statistical analysis. For geographic location, participants were grouped into 31 provinces according to their Hukou. Hukou is a basic household registration system in China; this system officially identifies a person as a resident of an area and includes identifying information such as name, parents, spouse, and date of birth. These provinces were further divided into six regions according to their proximity and socio-economic status, in line with guidelines from the National Bureau of Statistics of China: north, northeast, east, central-south, southwest, and northwest. Six sampling phases were established: 1994-2005, 2006-2008, 2009-2011, 2012-2014, 2015-2017, and 2018-2020. The earliest (1994-2005) phase encompassed more years to account for the relatively sparse data available for those years. The prevalence of each subtype by sex, age, ethnicity, transmission risk group, Hukou province, and region was calculated and the subtype distribution trends over the six sampling phases were examined. Categorical data were compared using the chi-squared test and continuous data were compared using one-way analysis of variance, wherever appropriate. Potential risk factors for AIDSAD were analyzed using logistic regression. Biologically plausible interactions were assessed in the multivariable model. Variables included sex, age (< 18, 18-24, 25-44, 45-64, and ≥ 65 years), ethnicity, region, subtype, transmission risk group, and sampling phase. The outcome was a binary response indicating whether each patient had AIDSAD. Each variable was first analyzed separately, and those associated with the outcome (P < 0.1) were entered into the multivariable model. The logistic results are expressed as OR with 95% confidence intervals (CI) and two-sided P values, where P < 0.05 was considered significant.
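To make the reported quantities concrete, the sketch below derives the binary AIDSAD outcome from a CD4 count and computes an unadjusted odds ratio with a Wald 95% CI from a 2x2 table. For a single binary predictor this matches what a univariable logistic regression returns; it does not reproduce the adjusted, multivariable estimates, which require fitting the full model in statistical software. The function names and the counts are made up for illustration and are not study data.

```python
import math
from typing import Tuple

AIDS_CD4_THRESHOLD = 200  # cells/µL, the AIDSAD definition used in the text


def has_aidsad(cd4_at_diagnosis: float) -> bool:
    """Binary outcome: AIDS at diagnosis, i.e. CD4 count below 200 cells/µL."""
    return cd4_at_diagnosis < AIDS_CD4_THRESHOLD


def odds_ratio_wald(group_cases: int, group_noncases: int,
                    ref_cases: int, ref_noncases: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio versus a reference group, with a Wald CI.

    All four counts must be positive. z = 1.96 gives an approximate
    95% interval.
    """
    odds_ratio = (group_cases * ref_noncases) / (group_noncases * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / group_cases + 1 / group_noncases
                          + 1 / ref_cases + 1 / ref_noncases)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 patients with AIDSAD in the comparison
# group versus 40 of 100 in the reference group.
or_, ci_low, ci_high = odds_ratio_wald(20, 80, 40, 60)
print(f"OR {or_:.3f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An OR below 1 with an upper confidence limit below 1, as in this toy table, is the pattern reported for CRF07_BC relative to subtype B.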
The decline in CD4 count between the time of HIV infection and diagnosis was analyzed using MLR. In the regression, the dependent variable was the difference in CD4 count between HIV infection and diagnosis, and the independent variables were all the variables selected in the logistic regression. The MLR results are presented as coefficients and P values. Since pre-infection CD4 counts were not measured, the reference median CD4 count in healthy Chinese adults was used 37 . All analyses were performed using R software (version 4.1.1; R Foundation, Vienna, Austria) and listwise deletion was used to handle missing data.
Ethical issues. All analyses were performed on de-identified datasets to protect participants' anonymity.