Introduction

China has a slowly increasing HIV epidemic, with 64,170, 71,204, and 63,154 new cases in 2018, 2019, and 2020, respectively, and 818,360 individuals living with HIV at the end of 20201. During the first two decades of the epidemic (1985–2005), most HIV cases were concentrated among injection drug users (IDU [44.2%]) and former blood donors (29.6%), but since 2006, there has been a clear expansion in the number of HIV cases among heterosexuals and men who have sex with men (MSM). In 2019, heterosexuals and MSM accounted for 73.8% and 23.3% of new diagnoses, respectively, with IDU accounting for 3.4%2. Understanding the increase in HIV diversity within China is not only of epidemiological interest but also has far-reaching clinical implications3,4,5,6,7,8,9.

A widely held view concerning HIV subtypes in China is that CRF01_AE progresses faster than CRF07_BC10,11,12,13,14. However, the studies supporting this view were limited by small sample sizes and failed to adjust for important confounding factors. Worldwide, findings consistently indicate that the rates of disease progression among different HIV subtypes are, in descending order, subtype C > D > CRF01_AE > G > A15,16,17,18,19,20,21,22,23,24. Although subtype B is the most studied because of its predominance in North America and Europe, it is absent from this comparison chain.

When comparing subtype B with non-B strains using non-B as the comparator, it is assumed that all subtypes except for B progress equally, which is obviously not the case. To date, no studies have been sufficiently large to directly compare subtype B with other single subtypes15,16,17,18,19,20,21,22,23,24. The latest national HIV epidemiology study in China was conducted in 2006 and was published in 20124. Fourteen years have passed, and China's epidemic has changed. In this study, HIV disease progression was compared between patients infected with subtype B, CRF01_AE, and CRF07_BC in the largest HIV molecular epidemiology study ever conducted in China.

Results

Study population

HIV pol sequences generated from 13,230 patient specimens submitted by the Beijing HIV laboratory network (BHLN) for HIV transmitted drug resistance (TDR) genotyping between 2001 and 2020 were analyzed. A further 7433 pol sequences sampled in China, for which the province of origin, transmission risk group, and sampling year were available, were retrieved from the Los Alamos HIV sequence database (LANL). In all, 20,663 aligned HIV pol sequences were used in this analysis, each representing a distinct HIV-positive individual (Fig. 1). These data were collected between 1994 and 2020 from 31 provinces of China. Most participants were men (94.2%) and of Han ethnicity (93.1%). The median age was 32 years (interquartile range [IQR] 26–42). Where available, the overall median baseline CD4 count was 338 cells/µL (IQR 208–475). The transmission risk group was predominantly MSM (66.6%), followed by heterosexual (23.3%) and IDU (7.7%) (Table 1).

Figure 1 Study profile.

Table 1 Baseline characteristics by sampling phase.

HIV subtype distribution

A total of 38 HIV subtypes and circulating recombinant forms (CRF) were identified in the study. CRF01_AE, CRF07_BC, subtype B, unique recombinant forms (URF), CRF55_01B, CRF08_BC, and subtype C (hereafter, the seven common subtypes) were the predominant HIV subtypes circulating in China, accounting for 44.3%, 28.3%, 13.9%, 5.9%, 2.2%, 1.7%, and 1.3% of all infections, respectively (Table 2).

Table 2 Subtype assignment by selected characteristics.

Additional clades, including subtypes A1, D, F1, G, H, CRF02_AG, CRF03_AB, CRF06_cpx, CRF15_01B, CRF18_cpx, CRF24_BG, CRF33_01B, CRF57_BC, CRF58_01B, CRF59_01B, CRF61_BC, CRF62_BC, CRF63_02A1, CRF64_BC, CRF65_cpx, CRF67_01B, CRF68_01B, CRF78_cpx, CRF79_0107, CRF82_cpx, CRF83_cpx, CRF85_BC, CRF86_BC, CRF87_cpx, CRF88_BC, and CRF96_cpx (minor subtypes), were each present in less than 1.0% of individuals. The combined prevalence of CRF01_AE, CRF07_BC, and subtype B (three major subtypes) remained steady at just over 85%, while the foreign subtypes (subtypes that originated and circulated mainly in foreign countries, including subtypes A1, D, F1, G, H, CRF02_AG, CRF03_AB, CRF06_cpx, CRF15_01B, CRF18_cpx, CRF24_BG, CRF33_01B, CRF58_01B, CRF63_02A1, CRF82_cpx, and CRF83_cpx) constituted only 3% of all infections.

The prevalence of subtypes varied significantly according to sex, age, ethnicity, transmission risk group, and region. Table 2 shows the subtype diversity within the demographic subgroups. Women had a greater prevalence of infection with subtype B and a lower prevalence of infection with CRF01_AE and URF. Older individuals (≥ 65 years) tended to have a lower prevalence of infection with CRF01_AE. Genotypic diversity was greatest among heterosexuals, in whom 35 HIV genotype categories were detected, the most prevalent being CRF01_AE. Among MSM, 21 subtypes were detected, of which CRF01_AE was the most prevalent. IDU were most commonly infected with CRF07_BC. Individuals of Uyghur and Yi ethnicity were predominantly, though not exclusively, infected with CRF07_BC. Figure 2 illustrates the regional distributions of the seven common HIV subtypes. CRF07_BC was more prevalent in the southwest and northwest regions. In the other four regions, the subtype with the highest prevalence was CRF01_AE. Of note, a significantly high prevalence of URF was detected in the southwest region (9.6%). The prevalence of comorbidities among individuals with HIV subtype B, CRF01_AE, and CRF07_BC was 14.0% (11.7–16.5%), 5.5% (2.6–6.4%), and 2.7% (1.9–3.5%), respectively.

Figure 2 Geographical distribution of HIV subtypes. The seven common subtypes are CRF01_AE, CRF07_BC, subtype B, URF, CRF55_01B, CRF08_BC, and subtype C. Samples were from 31 provinces of China. North: Beijing, Tianjin, Hebei, Shanxi, and Inner Mongolia; Northeast: Liaoning, Jilin, and Heilongjiang; East: Shanghai, Jiangsu, Zhejiang, Anhui, Fujian, Jiangxi, and Shandong; Central-south: Henan, Hubei, Hunan, Guangdong, Guangxi, and Hainan; Southwest: Chongqing, Sichuan, Guizhou, Yunnan, and Tibet; Northwest: Shaanxi, Gansu, Qinghai, Ningxia, and Xinjiang.

HIV subtype temporal trends

Table 1 presents the temporal trends for the seven common subtypes. The prevalence of CRF01_AE increased from 19.7% in the 1994–2005 phase to 49.2% in the 2012–2014 phase and remained high thereafter. A similar trend was observed for CRF07_BC. In contrast, the prevalence of subtype B decreased from 47.7% in the 1994–2005 phase to 6.6% in the 2018–2020 phase. Time trends were also examined by sex, age, ethnicity, transmission risk group, and region (Supplementary Tables S1–S5).

Phylogenetic analysis

Phylogenetic analysis revealed that the sequences from both sources were intermixed, suggesting that both sampling frames were drawn from the same overall population (Supplementary Figs. S1–S3). Three, seven, and four distinct clusters were identified within subtype B, CRF01_AE, and CRF07_BC, respectively, which included 14,578 individuals (81.6% of all patients infected with the three major subtypes). The clusters were named according to a previous numbering system25, with new clusters identified in the current study added to it. The cluster size ranged from 175 to 2964 individuals. Most clusters were MSM dominated (10 of 14). Supplementary Table S6 presents the detailed characteristics of these clusters.

CRF07_BC progressed slower than subtype B

Untreated HIV infection is characterized by a progressive decline in the number of CD4 cells, and CD4 cell decline is therefore recognized as one of the major markers of the rate of HIV disease progression. Most previous studies used the time from infection to the diagnosis of AIDS or the rate of CD4 loss per year to evaluate the rate of disease progression15,16,17,18,19,20,21,22,23,24. Both methods rely on an accurate determination of the date of HIV acquisition. As the date of HIV acquisition was not available for the majority of our study participants, we could not determine the rates of HIV disease progression with precision. However, the unavailability of the date of HIV acquisition did not prevent the direct comparison of disease progression between the three major subtypes on a population level. Although the time between HIV infection and diagnosis inevitably varies substantially among individuals, we assumed that this interval was well matched between the three major subtypes; in other words, that the subtypes had the same median time from infection to diagnosis. We aimed to compare the natural rate of CD4 decline between the three major subtypes within this interval, which is itself part of the natural history of the disease. Our analysis was based on the hypothesis that if the three major subtypes progress equally, there should be no difference in their median CD4 counts at the time of diagnosis on a population level. We defined the origin time as the estimated date of HIV acquisition and the end time as the date of HIV diagnosis. We defined disease progression as the decline in CD4 count from the time of HIV acquisition to the time of HIV diagnosis. Therefore, our analysis was limited to individuals with available CD4 counts. CD4 counts of the seven common subtypes were compared overall and by sampling phase. The CD4 count of CRF07_BC was always significantly higher than those of subtype B and CRF01_AE (P < 0.01) (Fig. 3). Therefore, we empirically hypothesized that disease progression could differ between these subtypes. Table 3 reports the association between patient characteristics and laboratory-defined acquired immunodeficiency syndrome (AIDS; defined as a CD4 count < 200 cells/µL) at diagnosis (AIDSAD).

Figure 3 The CD4 count of the seven common subtypes.

Table 3 Risk factors for slower progression to AIDSAD in Chinese patients.

In univariable logistic analyses, the risk of AIDSAD was significantly associated with sex, age, transmission risk group, and HIV subtype. After adjustment for these factors in the multivariable analysis, patients infected with CRF07_BC had less than half the risk of AIDSAD of those infected with subtype B (odds ratio [OR] 0.46, 95% confidence interval [CI] 0.39–0.53). Patients aged 45 years or older had a higher risk of AIDSAD than did patients younger than 18 years (OR for individuals aged 45–64 years vs. < 18 years: 3.36, 95% CI 1.47–8.87; OR for ≥ 65 vs. < 18 years: 4.32, 95% CI 1.81–11.75). The risk of AIDSAD was lower in MSM than in heterosexual patients (OR 0.67, 95% CI 0.6–0.75). Yi ethnicity was associated with a lower risk of AIDSAD (OR 0.43, 95% CI 0.17–0.9); however, the sample size was very small. Three sensitivity analyses, excluding heterosexuals, MSM, and IDU in turn, were performed, and the outcomes were consistent with those obtained from the whole population (Supplementary Tables S7–S9). We also analyzed the decline in CD4 count between the time of HIV infection and diagnosis using multivariable linear regression (MLR). The results were consistent with those obtained by logistic analysis (Supplementary Tables S10–S12).
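For readers less familiar with how an odds ratio and its confidence interval are read, the following minimal sketch computes an unadjusted OR with a Wald 95% CI from a 2×2 table. It is not the multivariable model used in this study: the function name `odds_ratio_ci` and all counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: comparison group with the outcome (e.g. AIDSAD)
    b: comparison group without the outcome
    c: reference group with the outcome
    d: reference group without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study data.
or_value, ci_low, ci_high = odds_ratio_ci(150, 850, 300, 700)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that lies entirely below 1, as in this illustrative case, indicates lower odds of the outcome in the comparison group than in the reference group.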

Discussion

To our knowledge, this is the largest study to date reporting on the national distribution and trends of HIV subtypes in China, with a sample size of over 20,000, and spanning 1994–20204. These data showed that the HIV epidemic in China exhibited some of the greatest genetic diversity worldwide, comprising 38 HIV subtypes. The only other country with comparable diversity is the United States, which has approximately 15 subtypes6,7. This high and sharply increasing HIV subtype diversity in China is consistent with evidence from most regions of the world3,4,5,6,7,8,9. Although there were variations in the prevalence of the three major subtypes, the combined prevalence of these subtypes was stable throughout the study period, suggesting that it might be an indicator of equally stable HIV transmission in China. The data revealed that the previously described subtype compartmentalization4 no longer existed across transmission risk groups and was of diminished impact across geographic regions, but persisted in people of Uyghur and Yi ethnicity, in whom it was as strong as ever. Global travel and acquisition of infections abroad, internal migration (the floating population), and domestic transmission all likely contribute to increasing HIV viral diversity4,5,6,7,8,9.

The comparison of disease progression between subtype B and other subtypes has been hindered by the fact that there are few populations with multiple circulating subtypes, including subtype B15,16,17,18,19,20,21,22,23,24. The epidemic in China, characterized by the co-circulation of CRF01_AE, CRF07_BC, and subtype B, provides a unique opportunity for such a direct comparison. The data revealed that CRF07_BC progresses more slowly than subtype B, while CRF01_AE progresses as fast as subtype B. Consistent with the results of the Concerted Action on SeroConversion to AIDS and Death in Europe (CASCADE) collaboration16,17, disease progression did not differ significantly by sex. The middle (45–64 years) and older (≥ 65 years) age groups had faster disease progression than the young (< 18 years). However, slower disease progression was observed in MSM compared with heterosexuals. We hypothesize that this difference could be attributed to a shorter interval between seroconversion and diagnosis in MSM than in heterosexuals, because most targeted HIV testing campaigns in China have focused on the MSM population2.

The CRF07_BC strain is a relatively young HIV strain that originated in IDU in China and is mainly confined to China26. During the past two decades, the number of individuals infected with CRF07_BC has increased significantly in China, accounting for 38% of all infections in the 2018–2020 phase. Although it descends from the two most prevalent strains in the world (subtypes B and C), CRF07_BC displays many unique characteristics that differ from those of its parent strains. Li et al. have also observed that individuals infected with CRF07_BC have significantly higher baseline CD4 counts than those infected with CRF01_AE14. However, they did not interpret the higher CD4 counts as a proxy for slower disease progression, nor did they conclude from their finding that CRF07_BC progresses more slowly than CRF01_AE. We have previously shown that CRF07_BC has a lower TDR prevalence than subtype B and CRF01_AE (1.5% vs. 4.8% and 5.6%, respectively)27,28. Ge et al.29 and Cao et al.30 have demonstrated that CRF07_BC is associated with better immune recovery than CRF01_AE in Chinese patients undergoing antiretroviral treatment (ART). Taken together, these results support the hypothesis that CRF07_BC is less pathogenic than subtype B.

Before 2014, the prevailing view in China was that people infected with HIV would have approximately ten years of AIDS-free time before entering the AIDS phase, as reported by the CASCADE study16,17. In 2014, Li et al. showed that infection with CRF01_AE is associated with faster disease progression in Chinese patients infected through sexual transmission than infection with non-CRF01_AE strains (mostly CRF07_BC and subtype B)10. The time interval between seroconversion and AIDS was only 4.8 years for CRF01_AE. The absence of a difference in disease progression between CRF01_AE and subtype B in the present findings suggests that the time from seroconversion to AIDS for subtype B is far shorter than previously believed. Two explanations are suggested. First, since ethnicity has been proven to be a major determinant of disease progression18, HIV subtype B may progress faster in Han individuals in China than in Western individuals. Second, as Wertheim et al. have suggested, HIV subtype B is experiencing natural selection to become more virulent31.

The current study has significant implications for clinical practice and policy-making. First, since approximately 60% of patients with new infections in China (subtype B plus CRF01_AE accounted for 58.2%) will progress to AIDS within 4.8 years, these findings justify early treatment. Second, the results of this study call for subtype-specific monitoring and treatment guidelines. Patients with CRF07_BC may have a better prognosis and treatment outcome. Third, in evaluating the AIDS disease burden, the prevalence of CRF07_BC should be taken into account.

This study has several limitations. First, although it is the largest study of this kind, this study represents only approximately two percent of all individuals living with HIV in China. Thus, these findings might not be fully representative. Second, viral load (VL) information was not included in the study, which did not permit the evaluation of the association between VL and subtype. However, this is a goal of a future study. Third, the biological mechanisms underlying these observations were not elucidated. On this point, Huang et al.12 have shown that patients infected with CRF07_BC have significantly lower VL than patients infected with subtype B, which may be due to the deletion of seven amino acids that overlap with the apoptosis-linked gene 2-interacting protein X (Alix) binding domain of p6gag. Fourth, the infection time for most of the participants was unavailable, so the rate of CD4 count decline per year could not be assessed. Indeed, as China implemented the World Health Organization (WHO)’s ‘treat-all’, ‘treat-early’, and ‘treatment as prevention’ policy in 201632,33, approximately 90% of individuals with HIV were treated with ART within the first year after diagnosis, making an evaluation of natural disease progression not only impractical but also unethical. This study provides a novel method to directly compare the rate of natural disease progression between subtypes, that is, to take the duration between infection and diagnosis as the follow-up time and to treat the follow-up time as a matching variable in multivariable logistic analysis. Fifth, the MSM population was most likely over-represented in the study sample. However, the original data, from which stratified and weighted results may be easily calculated, have been provided.
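The reweighting mentioned in the fifth limitation can be sketched as a simple post-stratification by transmission risk group. This is only an illustration under stated assumptions: the sample shares are those reported in the Study population section, the target shares are the 2019 national new-diagnosis shares quoted in the Introduction (whether these are the appropriate reference population is an assumption), and the stratum-specific prevalences are hypothetical placeholders, not study results.

```python
def normalise(shares):
    """Rescale shares so that they sum to 1."""
    total = sum(shares.values())
    return {group: value / total for group, value in shares.items()}


def weighted_mean(stratum_values, shares):
    """Average stratum-specific values using the given stratum shares."""
    weights = normalise(shares)
    return sum(weights[g] * stratum_values[g] for g in weights)


# Shares (%) of each risk group in the study sample (from the Results).
sample_shares = {"MSM": 66.6, "heterosexual": 23.3, "IDU": 7.7}
# Shares (%) of 2019 new diagnoses (from the Introduction); assumed target.
target_shares = {"MSM": 23.3, "heterosexual": 73.8, "IDU": 3.4}

# HYPOTHETICAL prevalence of one subtype within each risk group.
stratum_prevalence = {"MSM": 0.30, "heterosexual": 0.25, "IDU": 0.60}

crude = weighted_mean(stratum_prevalence, sample_shares)
adjusted = weighted_mean(stratum_prevalence, target_shares)

# Post-stratification weight per individual in each risk group.
sample_norm = normalise(sample_shares)
target_norm = normalise(target_shares)
weights = {g: target_norm[g] / sample_norm[g] for g in sample_shares}

print(f"crude {crude:.3f}, post-stratified {adjusted:.3f}")
```

A weight below 1 for MSM and above 1 for heterosexuals reflects the over-representation of MSM in the sample relative to the assumed target population.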

In summary, these results highlight an HIV epidemic in China characterized by a high prevalence of CRF01_AE, CRF07_BC, and subtype B infections, with an overall increase in subtype diversity over the past 26 years, providing a unique opportunity to directly compare disease progression among the three subtypes. Disease progression was slower with CRF07_BC infection than with subtype B infection. Moreover, for the first time, it was shown that infections with CRF01_AE progressed as fast as those with subtype B. Future studies focusing on the effect of subtype on the outcome of ART, which include more confounding variables, such as VL, will help improve clinical practice and policymaking.

Methods

Study population and design

The study population consisted of two separate populations of HIV-infected individuals. The first group comprised all patients who underwent HIV TDR genotyping between 2001 and 2020 at the BHLN. BHLN is a national collaboration engaged in surveillance of HIV TDR in China27,28. These methods have been previously described. Briefly, approximately 40% of the samples from all individuals newly diagnosed with HIV infection by BHLN between 2001 and 2020 were randomly selected27,28. The BHLN takes part in maintaining the national HIV epidemiology database, which tracks everyone who receives a diagnosis of HIV infection in China and records the baseline CD4 count of all individuals with newly diagnosed HIV infection. The baseline CD4 count was defined as the CD4 measurement closest to, and within one year of, the date on which HIV infection was confirmed by western blot. Baseline demographic data on sex, age, ethnicity, Hukou province, and the transmission risk group were retrieved from this database.

The second group included publicly available sequences from the LANL34. All the pol sequences sampled in China with known sampling provinces, sampling years, and transmission risk groups available in the database were downloaded (data available as of December 1, 2019).

Phylogenetic analysis

Sequences were aligned using the BioEdit tool and the alignment was manually corrected according to the encoded reading frame. Duplicate sequences were discarded. If several sequences from the same patient were available in the database, only the oldest was retained. Long branch sequences were re-confirmed for their genotype, and those that were miscatalogued were eliminated from the study. A maximum likelihood phylogenetic tree was reconstructed with the merged dataset using the GTR + CAT nucleotide substitution model in FastTree 2.135. The HIV subtype was inferred by automated subtyping using context-based modeling for expeditious typing (COMET)36, followed by phylogenetic analysis. Each sequence was assigned to one of eight subtypes, one of 102 circulating recombinant forms (CRF), or “unassigned.” An “unassigned” sequence was deemed a possible unique recombinant form (URF)6.
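The two filtering rules described above (discard duplicates; keep only the oldest sequence per patient) can be sketched as follows. This is an illustration rather than the pipeline actually used: the record fields, the patient identifiers, and the interpretation of "duplicate" as an identical sequence string are all assumptions, and the toy sequences are hypothetical.

```python
from typing import NamedTuple


class SeqRecord(NamedTuple):
    patient_id: str      # hypothetical identifier
    sampling_year: int
    sequence: str        # aligned pol sequence


def deduplicate(records):
    """Drop identical sequences, then keep the oldest record per patient."""
    # Sort by year first so the earliest copy of a duplicate is the one kept.
    ordered = sorted(records, key=lambda rec: rec.sampling_year)

    seen_sequences = set()
    unique = []
    for rec in ordered:
        if rec.sequence not in seen_sequences:
            seen_sequences.add(rec.sequence)
            unique.append(rec)

    oldest = {}
    for rec in unique:
        kept = oldest.get(rec.patient_id)
        if kept is None or rec.sampling_year < kept.sampling_year:
            oldest[rec.patient_id] = rec
    return list(oldest.values())


# Toy records, NOT real data.
records = [
    SeqRecord("P1", 2010, "ACGTAA"),
    SeqRecord("P1", 2005, "ACGTAC"),
    SeqRecord("P2", 2012, "TTGACA"),
    SeqRecord("P3", 2013, "TTGACA"),  # same string as P2: treated as duplicate
]
kept = deduplicate(records)
kept_years = {rec.patient_id: rec.sampling_year for rec in kept}
print(kept_years)
```

Treating identical strings as duplicates is the simplest rule; a real pipeline would probably also need to check whether two identical sequences come from the same specimen deposited in both sources.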

Cohort of natural disease progression

The BHLN may also be used as a cohort to study natural disease progression of HIV in China. The starting point of the study was set as the onset of the infection and the outcome was AIDSAD. The follow-up time was the duration between the starting point and the outcome. As the seroconversion time for most of the participants was unavailable, the follow-up time was unmeasurable. To solve this problem, the follow-up time was treated as a matching variable in cohort analysis, as we hypothesized that the distribution of the follow-up time was well matched within the same transmission risk group and roughly matched across the study population as a whole. Three sensitivity analyses were performed by excluding heterosexuals, MSM, and IDU in turn for the comparison of subtypes.
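The structure of the three sensitivity analyses can be sketched as a loop that removes one transmission risk group at a time and repeats the subtype comparison on the remainder. In this sketch a crude AIDSAD proportion per subtype stands in for the multivariable model, which is an assumed simplification; the field names and patient records are hypothetical.

```python
from collections import defaultdict

RISK_GROUPS = ("heterosexual", "MSM", "IDU")


def aidsad_proportion_by_subtype(patients):
    """Crude proportion of patients with CD4 < 200 cells/µL per subtype."""
    counts = defaultdict(lambda: [0, 0])  # subtype -> [n_aidsad, n_total]
    for p in patients:
        counts[p["subtype"]][0] += p["cd4"] < 200
        counts[p["subtype"]][1] += 1
    return {s: n_aidsad / n_total for s, (n_aidsad, n_total) in counts.items()}


def sensitivity_analyses(patients):
    """Repeat the comparison after excluding each risk group in turn."""
    results = {}
    for excluded in RISK_GROUPS:
        subset = [p for p in patients if p["risk_group"] != excluded]
        results[f"excluding {excluded}"] = aidsad_proportion_by_subtype(subset)
    return results


# Toy patients, NOT real data.
patients = [
    {"subtype": "B", "risk_group": "MSM", "cd4": 150},
    {"subtype": "B", "risk_group": "heterosexual", "cd4": 420},
    {"subtype": "CRF07_BC", "risk_group": "MSM", "cd4": 510},
    {"subtype": "CRF07_BC", "risk_group": "IDU", "cd4": 180},
    {"subtype": "CRF01_AE", "risk_group": "heterosexual", "cd4": 190},
    {"subtype": "CRF01_AE", "risk_group": "MSM", "cd4": 350},
]
results = sensitivity_analyses(patients)
for label, proportions in results.items():
    print(label, proportions)
```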

Statistical analysis

For geographic location, participants were grouped into 31 provinces according to the Hukou. Hukou is a basic household registration system in China; this system officially identifies a person as a resident of an area and includes identifying information such as name, parents, spouse, and date of birth. These provinces were further divided into six regions according to their proximity and socio-economic status, in line with guidelines from the National Bureau of Statistics of China: north, northeast, east, central-south, southwest, and northwest. Six sampling phases were established: 1994–2005, 2006–2008, 2009–2011, 2012–2014, 2015–2017, and 2018–2020. The earliest (1994–2005) phase encompassed more years to account for the relatively few data available in these years. The prevalence of subtype by sex, age, ethnicity, transmission risk group, Hukou province, and region was calculated and the subtype distribution trends over the six sampling phases were examined. Categorical data were compared using the chi-squared test and continuous data were compared using one-way analysis of variance, wherever appropriate.
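The grouping rules above can be written down as two small lookups. The province-to-region assignment follows the Figure 2 legend and the phase boundaries follow the six sampling phases listed above; the function names are illustrative, and standard Pinyin spellings of the province names are assumed.

```python
REGIONS = {
    "north": ["Beijing", "Tianjin", "Hebei", "Shanxi", "Inner Mongolia"],
    "northeast": ["Liaoning", "Jilin", "Heilongjiang"],
    "east": ["Shanghai", "Jiangsu", "Zhejiang", "Anhui", "Fujian",
             "Jiangxi", "Shandong"],
    "central-south": ["Henan", "Hubei", "Hunan", "Guangdong", "Guangxi",
                      "Hainan"],
    "southwest": ["Chongqing", "Sichuan", "Guizhou", "Yunnan", "Tibet"],
    "northwest": ["Shaanxi", "Gansu", "Qinghai", "Ningxia", "Xinjiang"],
}

# Invert the mapping so each province points to its region.
PROVINCE_TO_REGION = {
    province: region
    for region, provinces in REGIONS.items()
    for province in provinces
}

PHASES = [(1994, 2005), (2006, 2008), (2009, 2011),
          (2012, 2014), (2015, 2017), (2018, 2020)]


def region_of(province):
    """Return the region for a Hukou province."""
    return PROVINCE_TO_REGION[province]


def sampling_phase(year):
    """Return the sampling phase label (e.g. '2012-2014') for a year."""
    for start, end in PHASES:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"year {year} is outside the study period")


print(region_of("Yunnan"), sampling_phase(2013))
```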

Potential risk factors for AIDSAD were analyzed using logistic regression. Candidate variables included sex, age (< 18, 18–24, 25–44, 45–64, and ≥ 65 years), ethnicity, region, subtype, transmission risk group, and sampling phase. The outcome was a binary response indicating whether each patient had AIDSAD. All variables were first analyzed separately, and those associated with the outcome (P < 0.1) were entered into the multivariable model. Biologically plausible interactions were assessed in the multivariable model. The logistic results are expressed as OR with 95% confidence intervals (CI) and two-sided P values, where P < 0.05 was considered significant.
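The variable coding and the screening rule described above can be sketched as follows. The outcome definition (CD4 count < 200 cells/µL), the age bands, and the P < 0.1 entry threshold come from the text; the function names and the univariable P values are hypothetical and serve only to show the mechanics.

```python
def is_aidsad(cd4_count):
    """Binary outcome: AIDS at diagnosis, i.e. baseline CD4 < 200 cells/µL."""
    return cd4_count < 200


def age_group(age):
    """Assign an age in years to one of the five bands used in the model."""
    if age < 18:
        return "<18"
    if age < 25:
        return "18-24"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return ">=65"


def screen_variables(univariable_p, threshold=0.1):
    """Keep variables whose univariable P value is below the threshold."""
    return [name for name, p in univariable_p.items() if p < threshold]


# HYPOTHETICAL univariable P values, not results from the study.
univariable_p = {"sex": 0.03, "age": 0.001, "region": 0.40, "subtype": 0.0001}
selected = screen_variables(univariable_p)
print(selected)
```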

The decline in CD4 count between the time of HIV infection and diagnosis was analyzed using MLR. In the regression, the dependent variable was the difference in CD4 count between HIV infection and diagnosis, and the independent variables were all the variables selected in the logistic regression. The MLR results are presented as coefficients and P values. Since pre-infection CD4 counts were not measured, the reference median CD4 count in healthy Chinese adults was used37. All analyses were performed using R software (version 4.1.1; R Foundation, Vienna, Austria), and listwise deletion was used to handle missing data.
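The construction of the dependent variable and the handling of missing data can be sketched as below. The reference CD4 value used here is a hypothetical placeholder (the study takes its value from reference 37, which is not reproduced in this text), and the records are toy data; the sketch summarizes the decline by subtype with medians rather than fitting the regression itself.

```python
from statistics import median

# HYPOTHETICAL placeholder for the reference median CD4 count (cells/µL).
REFERENCE_CD4 = 700


def listwise_delete(records, fields):
    """Keep only records in which every listed field is present."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def median_decline_by_subtype(records, reference=REFERENCE_CD4):
    """Median of (reference CD4 - baseline CD4) per subtype."""
    complete = listwise_delete(records, ("subtype", "cd4"))
    declines = {}
    for r in complete:
        declines.setdefault(r["subtype"], []).append(reference - r["cd4"])
    return {subtype: median(values) for subtype, values in declines.items()}


# Toy records, NOT real data; one record has a missing CD4 count.
records = [
    {"subtype": "B", "cd4": 300},
    {"subtype": "B", "cd4": 340},
    {"subtype": "CRF07_BC", "cd4": 420},
    {"subtype": "CRF07_BC", "cd4": None},
    {"subtype": "CRF07_BC", "cd4": 400},
]
decline = median_decline_by_subtype(records)
print(decline)
```

Because the same reference value is subtracted for every patient, differences in decline between subtypes mirror differences in baseline CD4 count, whatever reference value is chosen.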

Ethical issues

All analyses were performed on de-identified datasets to protect participants’ anonymity. The research ethics committee at the Beijing Center for Disease Prevention and Control approved this study, and all the methods in this study were performed in accordance with the approved guidelines. By law, consent was not required as these data were collected and analyzed in the course of routine public health surveillance.