Identification of adolescent girls and young women for targeted HIV prevention: a new risk scoring tool in KwaZulu Natal, South Africa

The ongoing spread of human immunodeficiency virus (HIV) has driven novel interventions, such as antiretrovirals, for pre-exposure prophylaxis. Interventions have overlooked a high-risk Sub-Saharan African population: adolescent girls and young women (AGYW), particularly those under 18. We apply the Balkus risk tool among rural South African AGYW (n = 971) in a hyper-endemic setting, identify limitations, and assess deficiencies with modern statistical techniques. We apply the “Ayton” tool, the first risk tool applicable to sub-Saharan African AGYW, and compare performance of Balkus and Ayton tools under varying conditions. The Ayton tool more effectively predicted HIV acquisition. In low and high-risk AGYW, the Ayton tool out-performed the Balkus tool, which did not distinguish between risk classes. The Ayton tool better captured HIV acquisition risk and risk heterogeneities due to its AGYW-focused design. Findings support use of the Ayton tool for AGYW and underscore the need for diverse prognostic tools considering epidemic severity, age, sex and transmission. Clinical Trial Number ClinicalTrials.gov (NCT01187979) and the South African National Clinical Trials Registry (SANCTR) (DOH-27-0812-3345).

Prognostic tools aid the identification of high-risk candidates for preventative interventions by calculating risk of health events for a given patient [1][2][3] . While risk calculators have traditionally targeted chronic disease, advancements in statistical modeling and ongoing outbreaks have warranted the development of risk prediction tools for infectious diseases 2 . Human immunodeficiency virus (HIV) and acquired immunodeficiency syndrome (AIDS) risk calculators prioritize populations at the highest risk of infection, namely sub-Saharan African women and men who have sex with men (MSM) [4][5][6] , as well as injecting drug users 6,7 . Identifying individuals at high-risk of HIV could enable health care providers to efficiently prioritize provision of pre-exposure prophylaxis (PrEP), a medical regimen that prevents HIV infection [8][9][10][11][12][13] .
In high-prevalence hyper-endemic settings, such as eastern and southern Africa, adolescent girls and young women (AGYW), aged 14-25 years, are particularly vulnerable to HIV acquisition due to power imbalance in sexual relationships [14][15][16][17][18][19] , particularly those in which HIV is acquired from men over 25 years old 15,[18][19][20][21] , and biological factors (i.e., increased genital inflammation) 18 . Despite their elevated risk of infection, AGYW, particularly those under 18 years, underutilize health services for fear of stigmatization 18,19 . Due to the disproportionate burden of HIV/AIDS among African AGYW, efficient identification of females with high HIV risk and an understanding of AGYW risk heterogeneities is critical for effective targeting of prevention interventions 7,15,[20][21][22][23][24][25][26] . Specific HIV risk tools have been developed for adult African women 4,5 and one tool has been developed for South African AGYW 27 . However, the widely used tools developed for women exclude AGYW under 18 years, leaving the population vulnerable and underserved.
In this manuscript, we focus on the Balkus HIV risk prediction tool 4 and the Ayton AGYW risk prediction tool 27 . While preliminary validation of the Balkus tool has indicated high predictive sensitivity among African women over 18 years of age 4 , it is unknown whether it will perform as well in AGYW in the same setting. AGYW may acquire HIV prior to adulthood, and AGYW-specific risk behaviors influence adult HIV risk 7 . The exclusion of AGYW from validation of risk prediction tools, such as the Balkus tool, overlooks their role in HIV transmission and ignores their particular transmission vulnerabilities. First, we applied the Balkus tool to a sample of South African AGYW and evaluated the tool's ability to predict AGYW HIV serostatus at 1 year using raw scores and imputed scores via advanced statistical simulation techniques. Second, the Ayton AGYW tool classified risk in the sample and its ability to predict HIV serostatus at 1 year was also evaluated. Risk predictive power was compared between the Balkus tool and the Ayton tool, and we assessed the Balkus tool's ability to distinguish between Ayton risk classes.
Scientific Reports | (2020) 10:13017 | https://doi.org/10.1038/s41598-020-69842-x | www.nature.com/scientificreports/

Results
The CAPRISA 007 (CAP007) study captured 1069 AGYW participants who were followed up at 1 year (Table 1). Among these participants, 20 were HIV seropositive and 111 were herpes simplex virus (HSV-2) seropositive at baseline (2011); by 2012, there were 18 additional AGYW who became HIV seropositive. Most HIV seronegative AGYW participants did not report use of contraception, drugs, or alcohol in the past year and did not report past-year pregnancy. AGYW participants in our sample demonstrated high knowledge about HIV, had been previously tested for HIV, and only missed school due to illness; these participants further self-identified as being at lower risk of infection and were under 18 years old. Raw Balkus scores were computed using age, HSV-2, alcohol use, and financial support variables. The CAP007 study did not capture three risk predictors among AGYW, which are not applicable to the population and therefore were not measurable: $X_2$ (marital and cohabitation status), $X_5$ (primary sex partners with other partners), and $X_6$ (curable STI). These variables were excluded in calculation of raw Balkus scores, and were simulated to enable full Balkus score calculation using generic and reality-based simulations of variable prevalence. 971 AGYW had complete observations across all items evaluated in the raw Balkus tool.
Results from the development and validation of the Balkus tool indicated higher sensitivity and NPV and lower specificity and PPV in the VOICE cohort (sensitivity: 0.98, specificity: 0.15, PPV: 0.06, NPV: 0.99) and in the HPTN 035 cohort (sensitivity: 0.84, specificity: 0.46, PPV: 0.05, NPV: 0.99), where results were presented for a cutoff value of 3, compared with the raw Balkus score evaluation for the same cutoff value of 3. Evaluation at the higher cutoff value of 5 reported for the VOICE cohort (sensitivity: 0.91, specificity: 0.38, PPV: 0.08, NPV: 0.99) and for the HPTN 035 cohort (sensitivity: 0.58, specificity: 0.71, PPV: 0.07, NPV: 0.98), indicated higher sensitivity and NPV, and lower specificity and PPV, than those observed in the raw Balkus score evaluation for the same cutoff value of 5.
Balkus tool: evaluations of generic and reality-based simulated scores. Evaluation of the Balkus tool, using 100,000 generic scenario simulations (prevalence 0.0-1.0) and 100,000 reality-based scenario simulations (prevalence 0.0-0.4) of excluded variables, resulted in a wider distribution of scores, ranging from 0 to 11 (Supplementary Table S1, Fig. 1). In generic simulations, most HIV seroconversions had a score between 4 and 7, while highest HIV prevalence was observed in those with a score of 10 (0.08, 95%CI: 0.00-0.72). In reality-based simulations, most HIV seroconversions were observed with scores between 2 and 5, while highest HIV prevalence was observed in those with a score of 10 (0.07, 95%CI: 0.00-0.99).
Compared with the reported results from the VOICE cohort with a cutoff value of 3, generic scenario simulations and reality-based simulations yielded lower sensitivity, specificity, and NPV. The same was true of the reality-based simulations compared with the HPTN 035 cohort with a cutoff value of 3, while generic simulations had higher sensitivity as well as PPV. With a cutoff of 5, generic simulations resulted in a higher mean sensitivity than that reported for the HPTN 035 cohort and a lower mean sensitivity than that reported for the VOICE cohort; mean specificity in generic simulations was lower than that observed in both VOICE and HPTN 035 cohorts. Reality-based simulations resulted in a higher mean specificity and lower mean sensitivity than those reported in both HPTN 035 and VOICE cohorts, with a cutoff value of 5. Both generic and reality-based scenario simulations resulted in higher mean PPVs and lower NPVs compared with both VOICE and HPTN 035 cohorts, with the score cutoff value of 5.

Comparisons of Balkus tool scores (raw and simulated) versus Ayton tool scores. Raw Balkus scores were computed for each Ayton AGYW tool risk class and tested against risk classification, revealing that most AGYW, across all risk classes, scored values of 2 or 3 (Fig. 2). Computed sensitivity and specificity for the Balkus tool scores did not vary dramatically across Ayton risk classes (Table 3, Fig. 2), and thus the tool failed to distinguish between the three classes regardless of the cutoff point used. As the Balkus score cutoff value increased, in all three AGYW risk classes, sensitivity and PPV estimates increased, while specificity and NPV estimates decreased (Table 3, Fig. 2). Based on sensitivity, specificity, PPV, and NPV, the value of 3 was the best cutoff for the raw Balkus scores in all three Ayton risk classes, suggesting that the Balkus tool score does not distinguish well between classes of AGYW risk identified in the Ayton tool.
The Balkus and Ayton tools were compared, across all Balkus evaluations and Ayton classes, by calculating the distance to the point of optimal performance, that is, the distance from 100% sensitivity and 100% specificity in predicting HIV acquisition (Fig. 3). This comparison revealed the best performance in predicting 1-year HIV acquisition in the Ayton low risk class (sensitivity: 0.60, 95%CI: 0.32-0.84; specificity: 0.58, 95%CI: 0.55-0.61), and slightly better performance in the Ayton high risk class (sensitivity: 0.33, 95%CI: 0.07-0.70; specificity: 0.84, 95%CI: 0.81-0.87) compared with all Balkus tool evaluations (Supplementary Table S2 and Table 2, Fig. 3). Overall, as can be seen in Fig. 3, the Ayton tool performance in terms of sensitivity and specificity is the closest to the optimal performance point (1,1).
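The distance criterion can be illustrated with a short Python sketch. The text does not state which distance metric was used, so the Euclidean distance here is an assumption, and the function name `distance_to_optimum` is hypothetical. The inputs are the Ayton point estimates quoted in the paragraph above.

```python
from math import hypot


def distance_to_optimum(sensitivity, specificity):
    """Distance from (sensitivity, specificity) to the ideal point (1, 1).

    Assumption: Euclidean distance; the source only says "distance".
    """
    return hypot(1.0 - sensitivity, 1.0 - specificity)


# Point estimates reported in the text for the two Ayton risk classes.
ayton_low = distance_to_optimum(0.60, 0.58)
ayton_high = distance_to_optimum(0.33, 0.84)

# A smaller distance means performance closer to the optimal point.
print(f"Ayton low risk class:  {ayton_low:.3f}")
print(f"Ayton high risk class: {ayton_high:.3f}")
```

Under this assumed metric, a tool with perfect sensitivity and specificity has distance 0, and the ranking of tools follows from sorting the distances in ascending order.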

Discussion
AGYW are an underserved and high-risk population in the HIV epidemic in eastern and southern Africa [12][13][14][15][16][17][18] . Vulnerability to HIV acquisition is driven, in part, by power imbalance in relationships with older male partners 19,21,22 and may be prevented by the administration of PrEP [23][24][25][26] . In ongoing efforts to combat the transmission of HIV among AGYW, there is a distinct need for risk assessment tools that capture heterogeneities in AGYW HIV acquisition risk, in addition to the efficient identification of those at high risk of infection 7,15,20,22,25,26 . Risk assessment tools have successfully predicted HIV acquisition among adult African women 4,5 and the development of such prognostic tools for the AGYW population 27 may prove effective in breaking the HIV transmission cycle in South Africa, by directing the administration of PrEP to prevent HIV infection among high-risk AGYW.
This is the first study to evaluate the use of the Balkus tool in South African AGYW and to evaluate the Ayton tool, the first risk predictive tool to be developed specifically for the AGYW population. We applied the Balkus risk assessment tool in South African AGYW, identified its deficiencies therein, and adjusted for these insufficiencies with advanced statistical simulation techniques. We further assessed the Ayton tool's predictive ability in the same cohort, as well as assessed the ability of the Balkus tool to distinguish between Ayton risk classes. Finally, we compared the performance of the Balkus and Ayton tools in AGYW. The Ayton tool consistently outperformed the Balkus tool (in simulated and un-simulated evaluations) in predicting 1-year HIV seroconversions and showed that the Balkus tool was unable to distinguish between AGYW risk classes identified with the Ayton tool.
The Balkus score, when applied to AGYW, misidentified high-risk individuals more often than when the tool was applied among adults, except for generic simulations compared with the HPTN 035 cohort (cutoff of 5). The chances of correct identification of those at high risk of seroconversion, based on score cutoff designation, were consistently lower for AGYW compared with adults in unsimulated evaluations and in reality-based simulations, where risk factor prevalence was constrained based on published estimates 20 . All evaluation results indicate that the Balkus tool misidentified low-risk individuals more often among AGYW than among older women. While generic simulations indicated relatively robust tool performance, when we control for the likely prevalence of simulated data (reality-based simulations), or assess the raw Balkus scores, we find lower sensitivity and higher specificity among South African AGYW. The Ayton tool showed more robust performance than the Balkus tool on all evaluation measures. In the low risk class of AGYW, the Ayton tool was superior to the Balkus tool based on sensitivity and specificity; the Ayton tool also outperformed all Balkus evaluations in the high-risk class of AGYW. The Balkus scores did not distinguish between almost no and low risk classes, and showed only a weak ability to distinguish the high risk class, suggesting AGYW-specific risk characteristics that were incorporated into the development of the Ayton tool are not captured by the Balkus tool. This indicates the Ayton tool's superior ability to identify AGYW at high risk of infection as well as capture heterogeneities of AGYW risk, which is crucial in the effective implementation of ARV and PrEP interventions.
While the Balkus tool has proven useful in assessing HIV risk in adult women, it requires variables that are typically not applicable to South African AGYW. However, without those variables, the tool is less powerful and established cutoffs limit findings. A high mean sensitivity was observed in generic simulations of the Balkus tool, indicating that in AGYW populations with very high prevalence of marriage-cohabitation, non-monogamy, and STI, the tool may have better performance. However, this effect was significantly diminished in the analysis of raw Balkus scores and reality-based simulations. Overall, these findings provide further support for the use of the Ayton tool as a risk-prediction instrument for South African AGYW, and highlight the shortcomings of the Balkus tool when applied to AGYW.
As our data originate from a randomized controlled trial conducted in schools, we expect some degree of error in our findings due to cluster effects and the small number of seroconversions observed in the data. While randomly simulated variables were used in generic evaluations of the Balkus tool, which could introduce noise, relationships between these variables and other exposures have not been sufficiently studied in AGYW to inform data generation. To address this, raw Balkus scores were evaluated as were reality-based simulations, which used published estimates to guide prevalence estimations in simulations. As discrepancies in tool performance among adult women and AGYW may result from intrinsic age disparities, future work examining HIV risk among AGYW must account for adolescent population characteristics, such as education and family life, which likely vary between populations in prevalence and impact on HIV risk. Further, it is possible that pregnant and HIV infected students are more likely to drop out of school; dynamics influencing school enrollment may have contributed to the low prevalence of HIV in the dataset, which may not fully represent the AGYW population. Notwithstanding these limitations, since most South African AGYW are enrolled in school for at least the first two years of high school, our analysis provides a unique opportunity to understand the AGYW HIV risk trajectory. Future research should focus on validating the Ayton tool to assess its performance in other AGYW populations using predictive analysis, which can assign class memberships in new data. Such advances in prognostic tools enable preventive interventions to determine efficient and optimal treatment administration.
The strength of this research lies in its integration of biological and survey measures obtained from a large prospective cohort in a randomized controlled trial setting and the use of advanced statistical simulation techniques. Prognostic tools provide valuable information for the development of future preventions and the allocation of preventative treatment resources to high-risk populations. The efficacy of such interventions relies on the ability to capture and address heterogeneities in risk. The Ayton tool serves as an improved instrument not only for risk identification, but for understanding risk heterogeneities as well. These results support the use of specialized tools, which consider epidemic severity, age, sex and mode of transmission and can better capture risk heterogeneities. Given heightened vulnerability in southern and eastern Africa and the intergenerational HIV transmission cycle, there is a distinct need for HIV risk assessment tools, such as the Ayton tool, which are designed specifically for AGYW.

Methods
Study design and sample. The sample used in our study includes South African AGYW participants, aged 14 to 25 years, who were enrolled in the CAP007 trial in 2010 and were followed up in 2011 and 2012 (n = 1069) 28 . The randomized controlled trial (open-label, matched pair design) tested the effectiveness of a conditional cash incentives (CCIs) program on HIV incidence among students in rural South African High Schools from 2010 to 2012. Informed consent was obtained from all participants prior to enrollment in the study. Students 18 years and older provided consent following a literacy and comprehension assessment, while students under 18 years provided assent, and consent was obtained from the parent or guardian. If the parent or guardian was unavailable, proxy parental consent was obtained from a member of the School Research Support Group (SRSG). Ethical approval was obtained from the University of KwaZulu-Natal Biomedical ethics committee (BF105/010 and Be523/14). All experiments were performed in accordance with relevant guidelines and regulations. From this cohort, we included AGYW CAP007 participants who were HIV-seronegative in 2011 and followed up in 2012 (n = 1049). Baseline characteristics were obtained from the 2011 data, and 1-year serostatus was identified based on 2012 data. Further details about the trial have been published elsewhere 27-29 .
The Balkus tool. Balkus et al. 4 developed an HIV risk-scoring tool from clinical predictive factors among African women aged 18-45 years. Researchers derived the following risk scoring formula:
$$\text{Risk Score} = 2X_1 + 2X_2 + X_3 + X_4 + 2X_5 + X_6 + 2X_7,$$
where $X_1$ is younger than 25 years, $X_2$ is not married or living with primary partner, $X_3$ is alcohol use in the past 3 months, $X_4$ is no partner provision of financial support, $X_5$ is primary sex partners have other sex partners (and unknown sex partners), $X_6$ is presence of curable STI, and $X_7$ is HSV-2 seropositivity.
HIV risk scores ranged from 0 to 11. Raw Balkus scores were computed using age, HSV-2, alcohol use, and financial support variables, derived from data collected in 2011. Alcohol use was operationalized as use in the last year, due to reduced prevalence among AGYW compared to adults. In place of "partner provision of financial or material support", we approximated financial independence (protective) with sources of spending money among AGYW. This was dichotomized as having or not having spending money.
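As a minimal sketch of the scoring arithmetic, the Python snippet below applies the item weights from the Balkus formula to a dictionary of 0/1 indicators. It is not the authors' code (the analyses were performed in R). The names `BALKUS_WEIGHTS`, `RAW_ITEMS`, and `balkus_score` are hypothetical, and the example participant is invented for illustration.

```python
# Item weights taken from the Balkus risk scoring formula (X1..X7).
BALKUS_WEIGHTS = {"x1": 2, "x2": 2, "x3": 1, "x4": 1, "x5": 2, "x6": 1, "x7": 2}

# Items available in CAP007: age, alcohol use, financial support, HSV-2.
RAW_ITEMS = ("x1", "x3", "x4", "x7")


def balkus_score(indicators, items=tuple(BALKUS_WEIGHTS)):
    """Sum the weights of the listed items whose 0/1 indicator equals 1.

    Items missing from `indicators` are treated as 0 (absent).
    """
    return sum(BALKUS_WEIGHTS[name] * int(indicators.get(name, 0)) for name in items)


# Invented participant: under 25, no alcohol use, risk item X4 present, HSV-2 negative.
participant = {"x1": 1, "x3": 0, "x4": 1, "x7": 0}
raw_score = balkus_score(participant, RAW_ITEMS)

# Upper bounds implied by the weights.
max_full = balkus_score({name: 1 for name in BALKUS_WEIGHTS})
max_raw = balkus_score({name: 1 for name in RAW_ITEMS}, RAW_ITEMS)
print(raw_score, max_raw, max_full)
```

Because the raw score omits $X_2$, $X_5$, and $X_6$ (combined weight 5), it cannot exceed 6, whereas the full score reaches 11.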

Simulations. We applied the Balkus tool to AGYW, identified items that were not applicable to the population and thus not measurable, and used advanced statistical simulations to test the tool's ability to identify AGYW at high risk of developing HIV. To apply the full Balkus tool, responses to unmeasurable items were randomly and independently simulated (simulated values for different variables did not depend on each other) either via generic binary response data of prevalence varying from 0.0 to 1.0 ('generic simulations'), or using reality-based responses with prevalence varying between 0.0 and 0.4 ('reality-based simulations') 30 . Each type of simulation was repeated 100,000 times, allowing prevalence of simulated variables to independently vary in each replication. Since various risk score cutoff values (for binarization of HIV acquisition risk) were employed in the development and validation of the Balkus tool 4 , the obtained risk scores were binarized and evaluated at all possible scores between 1 and 11.
The Ayton tool. The Ayton AGYW tool, the first HIV risk scoring tool developed for South African AGYW, was built with this cohort of CAP007 participants 27 . Through the use of advanced statistical methods, the tool was able to capture socioeconomic and sexual behavioral dimensions of HIV risk and classify participants into three groups of varying risk.
Using the Ayton tool, developed in Ayton et al. 27 , we classified our sample of AGYW into three risk classes based on data obtained in 2011: those at almost no risk, low risk, and high risk. To assess the Ayton tool's classification of AGYW based on 2011 data, we compared Ayton tool classification results with data on 1-year HIV serostatus (2012) for each risk class.
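The simulation procedure described under Simulations can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R implementation: the prevalence of each unmeasured item is assumed to be drawn uniformly from the allowed range in every replication, a score at or above the cutoff is assumed to mean "high risk", all function names are hypothetical, the six participants are toy data, and only 1,000 replications are run instead of 100,000.

```python
import random

# Weights from the Balkus formula; X2, X5, and X6 were not measured in CAP007.
WEIGHTS = {"x1": 2, "x2": 2, "x3": 1, "x4": 1, "x5": 2, "x6": 1, "x7": 2}
UNMEASURED = ("x2", "x5", "x6")


def simulate_unmeasured(n, max_prevalence, rng):
    """Draw one prevalence per unmeasured item, then n independent 0/1 responses."""
    simulated = {}
    for item in UNMEASURED:
        # Assumption: prevalence is uniform on [0, max_prevalence].
        prevalence = rng.uniform(0.0, max_prevalence)
        simulated[item] = [int(rng.random() < prevalence) for _ in range(n)]
    return simulated


def full_scores(raw_scores, simulated):
    """Add the weighted simulated items to each participant's raw score."""
    return [
        raw + sum(WEIGHTS[item] * simulated[item][i] for item in UNMEASURED)
        for i, raw in enumerate(raw_scores)
    ]


def sens_spec(scores, seroconverted, cutoff):
    """Sensitivity and specificity when score >= cutoff is classed as high risk."""
    pairs = list(zip(scores, seroconverted))
    tp = sum(s >= cutoff and y == 1 for s, y in pairs)
    fn = sum(s < cutoff and y == 1 for s, y in pairs)
    tn = sum(s < cutoff and y == 0 for s, y in pairs)
    fp = sum(s >= cutoff and y == 0 for s, y in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented): raw scores and 1-year serostatus for six participants.
raw_scores = [2, 3, 5, 2, 6, 3]
seroconverted = [0, 0, 1, 0, 1, 0]

rng = random.Random(2020)
replications = 1000  # the study used 100,000 per scenario
results = []
for _ in range(replications):
    simulated = simulate_unmeasured(len(raw_scores), 0.4, rng)  # reality-based range
    results.append(sens_spec(full_scores(raw_scores, simulated), seroconverted, cutoff=5))

mean_sens = sum(r[0] for r in results) / replications
mean_spec = sum(r[1] for r in results) / replications
print(f"mean sensitivity {mean_sens:.2f}, mean specificity {mean_spec:.2f}")
```

Setting `max_prevalence` to 1.0 gives the generic scenario, and looping `cutoff` over 1 to 11 reproduces the evaluation at every possible threshold.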
Statistical analysis. The Balkus tool risk scores based on 2011 data were compared with 1-year HIV serostatus (obtained from 2012 data). The same was done for the Ayton tool. Sensitivity and specificity were computed along with their respective 95% confidence intervals for every possible score cutoff (Balkus tool) and within each risk class (Ayton tool) and compared. To compute 95% confidence intervals for the sensitivity and specificity, the exact, conservative Clopper-Pearson method was used 31 . Using the estimate of HIV prevalence for South African AGYW derived from a population-based nationally representative survey conducted in 2012, positive predictive values (PPV) and negative predictive values (NPV) were also computed. The asymptotic standard logit intervals 32 were used to compute intervals for the predictive values; where appropriate, the adjusted logit intervals 32 were used instead.
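The two point-and-interval calculations named above can be sketched in plain Python. This is an illustration only, not the authors' R code: the Clopper-Pearson bounds are found here by bisection on the binomial distribution function rather than through a statistics library, the function names are hypothetical, the counts (9 of 15) are invented, and the prevalence of 0.10 is a placeholder because the survey estimate is not quoted in the text. The logit intervals for the predictive values are not covered.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Bisection on [0, 1] for a function f that decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Invented counts: 9 of 15 seroconverters correctly flagged (sensitivity 0.60).
ci_low, ci_high = clopper_pearson(9, 15)
# Placeholder prevalence of 0.10; not the survey estimate used in the study.
ppv, npv = predictive_values(0.60, 0.58, 0.10)
print(f"95% CI: {ci_low:.2f}-{ci_high:.2f}; PPV {ppv:.3f}, NPV {npv:.3f}")
```

With few seroconversions, the exact interval is wide, which is why the sensitivity estimates reported in the Results carry much broader intervals than the specificity estimates.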
All analyses were performed in R version 3.4.1.
Received: 1 April 2020; Accepted: 17 July 2020
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.