Use of propensity score matching to create counterfactual group to assess potential HIV prevention interventions

The design of HIV prevention trials in the context of effective HIV preventive methods is a challenge. Alternate designs, including the use of non-randomised ‘observational control arms’, have been proposed. We used HIV simulated vaccine efficacy trials (SiVETs) to show pitfalls that may arise from using such observational controls and to suggest how to conduct the analysis in the face of these pitfalls. Two SiVETs were nested within previously established observational cohorts of fisherfolk (FF) and female sex workers (FSW) in Uganda. SiVET participants received a licensed Hepatitis B vaccine in a schedule (0, 1 and 6 months) similar to that for a possible HIV vaccine efficacy trial. All participants received HIV counselling and testing every quarter for one year to assess the HIV incidence rate ratio (IRR) between SiVET and non-SiVET (observational data). Propensity scores, conditional on baseline characteristics, were calculated for SiVET participation and matched between SiVET and non-SiVET in the period before and during the SiVET study. We compared IRR before and after propensity score matching (PSM). In total, 3989 participants were enrolled into observational cohorts prior to SiVET (1575 FF prior to Jul 2012 and 2414 FSW prior to Aug 2014). SiVET enrolled 572 participants (Jul 2012 to Apr 2014 in FF and Aug 2014 to Apr 2017 in FSW), with 953 non-SiVET participants observed in the SiVET concurrent period and 2928 from the pre-SiVET period (before Jul 2012 in FF or before Apr 2014 in FSW). Imbalances in baseline characteristics were observed between SiVET and non-SiVET participants in both periods before PSM. Similarly, HIV incidence was lower in SiVET than in non-SiVET: in the SiVET concurrent period, IRR = 0.59, 95% CI 0.31–0.68, p = 0.033, and in the pre-SiVET period, IRR = 0.77, 95% CI 0.43–1.29, p = 0.161. After PSM, participants' baseline characteristics were comparable and there were minimal differences in HIV incidence between SiVET and non-SiVET participants.
The process of screening for eligibility for an efficacy trial selects participants with baseline characteristics different from the source population, confounding any observed differences in HIV incidence. Propensity score matching can be a useful tool to adjust for the imbalance in participants’ measured baseline characteristics, creating a counterfactual group to estimate the effect of interventions on HIV incidence.


Scientific Reports | (2021) 11:7017 | https://doi.org/10.1038/s41598-021-86539-x | www.nature.com/scientificreports/

participants who join trials and those that do not. This could be due to selection bias, the quality of care, or higher study completion rates [8][9][10]. Furthermore, the HIV prevention field is quickly evolving, with many healthcare innovations making data from earlier trials less relevant. Comparing HIV incidence from clinical trials to that from observational data, or to data from earlier trials, could lead to an overestimate of efficacy. Other investigators have proposed using the Averted Infections Ratio (AIR) concept (i.e., the rate difference between hypothetical placebo and experimental arms divided by the rate difference between hypothetical placebo and active control arms) 11. The challenge with this approach is in the estimation of the HIV incidence in the hypothetical placebo arm. With the AIR approach, investigators might propose use of incidence from, e.g., a run-in period in a registration cohort, epidemiological surveillance systems or sexually transmitted infection incidence from another trial as a surrogate for hypothetical placebo arm HIV incidence 11. These sources of incidence data introduce significant uncertainty and are likely to provide a biased estimate due to population and study differences. The propensity score (PS), a statistical technique that attempts to estimate the effect of treatment 12,13, could be a useful strategy to reduce the uncertainty in estimating hypothetical placebo HIV incidence. The propensity score is the probability of treatment assignment conditional on measured baseline characteristics. This allows us to design and analyze an observational study that mimics some of the attributes of a randomized controlled trial. Propensity score matching (PSM) gives a distribution of measured baseline covariates that is similar between treated and untreated subjects.
By doing this, we could create a non-randomised, but comparable counterfactual group, which provides a less biased HIV incidence for a hypothetical placebo arm. A similar approach has been used previously to balance baseline characteristics between trials and observational data or other studies to estimate treatment effects [14][15][16][17] .
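For reference, the two quantities described above can be written compactly. The symbols are introduced here only for illustration and are not notation taken from the cited sources: \lambda_P, \lambda_E and \lambda_C are assumed to denote the HIV incidence rates in the hypothetical placebo, experimental and active control arms, Z is an indicator of treatment (trial) assignment, and X is the vector of measured baseline characteristics.

```latex
\[
\mathrm{AIR} \;=\; \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C},
\qquad
e(x) \;=\; \Pr(Z = 1 \mid X = x)
\]
```

Written this way, it is apparent that \lambda_P enters both the numerator and the denominator of the AIR, which is why an uncertain or biased estimate of the hypothetical placebo incidence carries through to the whole ratio.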
The simulated vaccine efficacy trial (SiVET) concept has been suggested to provide trial context data, through a "simulated" trial using a commercially licensed vaccine 18,19. This concept additionally helps to inform the design and sample size estimation for clinical trials 20. Between July 2012 and August 2017, two SiVETs were nested within observational cohorts of female sex worker and fisherfolk sub-populations in Uganda to (1) provide an HIV vaccine efficacy trial platform and (2) estimate HIV incidence that would be used to plan a future HIV vaccine efficacy trial in these distinct key populations.
In this paper we use data from three observational cohorts (two in the fishing communities and one among female sex workers, all in Uganda) and their respective, nested HIV simulated vaccine efficacy trials (SiVETs) to (a) create a counterfactual group (i.e., a non-randomised comparison arm) from observational data that is comparable to SiVET and (b) estimate and compare HIV incidence between observational cohorts and SiVETs before and after propensity score matching in two distinct key populations.

Methods
Design. SiVETs were nested within longitudinal observational cohorts of fisherfolk (FF) and female sex workers (FSW) in Uganda.
Setting. The fisherfolk observational cohorts (OBC) were recruited from fishing communities on the shoreline of Lake Victoria in Entebbe and Masaka, about 40 km South and 100 km West of Kampala, Uganda's capital, respectively; the SiVET in this population was nested in the observational cohort in Masaka. The main economic activity is fishing, but other occupations such as fish processing, small-scale businesses and entertainment support the fishing activity. This population is characterised by very high HIV prevalence (20-30%) 21 and annual incidence (3-11%) 22, with more than 50% reporting frequent high risk sexual behaviour 23.
The FSW observational cohort was located within Kampala city, on Mengo hill near the city center. Women in sex work operate from HIV hotspots, defined as nightclubs, entertainment facilities, restaurants/hotels, lodges and bars conducive for meeting male clients. Similarly, the prevalence and annual incidence of HIV are reported to be very high, 37% 24 and 3% 8 respectively, and > 90% of these women report frequent high risk sexual behaviour 23.
Description of observational cohorts. Data from three observational cohorts, two FF and one FSW, conducted respectively from February 2009 to April 2015 and from April 2008 to April 2017, were used in this analysis (Fig. 1). In the first fisherfolk cohort (February 2009 to December 2011), study staff provided HIV counselling and testing (HCT) to potential participants, and those found to be HIV negative and aged 18-49 years were enrolled into an observational cohort at a clinic established in each of five participating fishing communities. Repeat HCT was performed every 6 months for 18 months. The primary aims of this observational cohort were to determine the feasibility of enrolling and following fisherfolk in an observational cohort and to determine HIV incidence. The second fisherfolk cohort (January 2012 to April 2015) was similar to the first fisherfolk cohort with the following exceptions: (1) participants had to travel from the landing sites to the research clinic in Masaka Town, a distance of approximately 40 km, to attend study visits; (2) repeat HCT was conducted quarterly; (3) there was an extra aim of maintaining a pool of participants for future HIV prevention trials.
The FSW cohort initially recruited women from one administrative division (Makindye) of Kampala city until 2014, when the protocol was amended to include all five of the city's divisions. Trained study fieldworkers visited HIV hotspots, provided study information to prospective participants, and invited them to the study clinic for screening and possible enrolment. At the clinic, women received HCT and those found to be HIV negative were enrolled. The aims of this cohort and the participant follow up schedules were similar to those of the second fisherfolk cohort above. Details of the FF and FSW cohorts have been published previously 8.
SiVET participants received a licensed Hepatitis B vaccine injection at 0, 1 and 6 months, mimicking a possible schedule for an actual HIV vaccine efficacy trial. They also underwent HCT every quarter for one year. Details of the SiVETs have also been previously published [25][26][27].
Data stratification. We divided the observational cohort data into two periods: (a) the pre-SiVET observational cohort (non-SiVET data), made up of enrollment and follow up data collected before rollout of the SiVET protocol in both the FF and FSW communities, and (b) the observational cohort in the SiVET concurrent period, also codenamed non-SiVET data for the purpose of the comparisons in this paper. The latter comprises all the data collected in the 12 months of the observational cohort in the SiVET period. The two periods are mutually exclusive (Fig. 1).

Key evaluations.
i. We compared baseline characteristics of the participants in the SiVET to those in observational cohort (non-SiVET cohort); (a) in the pre-SiVET period, (b) in the SiVET period, all before propensity score matching.
ii. Repeated evaluation (i) above after propensity score matching.
iii. We compared HIV incidence in the SiVET to that in the non-SiVET; (c) in the pre-SiVET period, (d) in the SiVET period, all before propensity score matching.
iv. Repeated evaluation (iii) above after propensity score matching.
Role of SiVET data. In these analyses, the SiVET data were used to mimic a placebo arm of an actual HIV vaccine efficacy trial since the hepatitis B vaccine used in the SiVETs had no effect on HIV susceptibility, and to facilitate the creation of a similar counterfactual trial arm from non-SiVET observational data.
Variable categorizations. We categorized source population as fisherfolk (FF) or female sex worker (FSW), religion as Christian (including Catholic, Anglicans, Pentecostals, Seventh-day Adventists) or Muslim, and marital status as single never married (if one had never lived with a partner in any sexual relationship) or currently/previously married (including married polygamous, monogamous, widowed or separated). In all the cohorts and SiVETs, alcohol use was defined as "Yes" if a participant reported using alcohol in the three months preceding the interview or "No" if a participant reported not using alcohol in the same period.
Calculating the propensity score. Logit models, in which SiVET assignment status was regressed on measured baseline characteristics, were fitted to determine the propensity scores (probability of selection into SiVET conditional on measured baseline characteristics), stratified by period (pre-SiVET or SiVET concurrent). We matched on the following variables: source population, sex, age group, ethnicity, education level, marital status, duration of stay in the community, number of sexual partners in the last three months and alcohol use.
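As an illustration of this step, the following minimal sketch fits a logit model by plain gradient ascent and returns each participant's estimated propensity score for a single period. It is not the software or code used in the study. The two binary covariates, the cell counts and the function names (`fit_logit`, `propensity_scores`) are hypothetical and chosen only to keep the example small.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, y, lr=1.0, n_iter=1000):
    """Fit a logit model by batch gradient ascent on the mean log-likelihood.

    X: list of covariate rows (0/1 indicators in this sketch).
    y: 1 = enrolled in the trial (SiVET), 0 = observational cohort only.
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            err = yi - p
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of trial participation for each covariate row."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            for row in X]

# Hypothetical counts per covariate pattern (male, aged >= 25):
# (number enrolled in the trial, number in the observational cohort only)
cells = {(0, 0): (10, 40), (1, 0): (20, 30), (0, 1): (18, 32), (1, 1): (30, 20)}
X, y = [], []
for pattern, (n_trial, n_cohort) in cells.items():
    X += [list(pattern)] * (n_trial + n_cohort)
    y += [1] * n_trial + [0] * n_cohort

beta = fit_logit(X, y)
ps = propensity_scores(X, beta)
print("coefficients:", [round(b, 2) for b in beta])
```

In practice a statistical package would be used to fit the model, and the fit would be repeated separately for the pre-SiVET and SiVET concurrent periods, as described above.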
Propensity score matching. We performed 1:1 propensity score matching without replacement within a caliper width of 0.2 between SiVET and non-SiVET in the pre-SiVET and in the SiVET concurrent periods to ensure a balance in baseline characteristics. Matching using a caliper width of 0.2 of the pooled standard deviation of the logit of the propensity score is considered to afford superior performance in the estimation of treatment effects 28. We considered a difference of less than 20% in covariates after matching as indicative of good matching 29. Participants in SiVET for whom there was no match in the non-SiVET were excluded from the propensity score matched analysis. We used Chi-square tests to compare the baseline characteristics of participants in SiVET to those in observational cohorts before and after propensity score matching, stratified by period. We estimated the standardized differences before and after propensity score matching, comparing covariate values for participants in SiVET to those in non-SiVET in either period, and illustrated these graphically. We compared HIV incidence between SiVET and non-SiVET in each period before and after propensity score matching.
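The matching rule described above can be sketched as follows. The text specifies 1:1 matching without replacement within a caliper of 0.2 pooled standard deviations of the logit of the propensity score. The greedy nearest-neighbour search, the processing order and the propensity score values below are assumptions made for illustration, and `caliper_match` is a hypothetical helper, not the routine used in the study.

```python
import math
import statistics

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def caliper_match(ps_treated, ps_control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Matching is done on the logit of the propensity score. A pair is kept
    only if the distance is within caliper_sd pooled standard deviations.
    Returns a list of (treated_index, control_index) pairs; treated units
    with no control inside the caliper are left unmatched (excluded).
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    # Pooled SD: square root of the average of the two group variances.
    pooled_sd = math.sqrt((statistics.variance(lt) + statistics.variance(lc)) / 2.0)
    caliper = caliper_sd * pooled_sd
    available = set(range(len(lc)))
    pairs = []
    for i, t in enumerate(lt):
        if not available:
            break
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda c: (abs(lc[c] - t), c))
        if abs(lc[j] - t) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
    return pairs

# Hypothetical propensity scores for trial and observational participants.
ps_trial = [0.30, 0.50, 0.70, 0.95]
ps_cohort = [0.10, 0.31, 0.52, 0.68, 0.20]
pairs = caliper_match(ps_trial, ps_cohort)
print(pairs)  # the participant with a score of 0.95 has no match within the caliper
```

Greedy matching depends on the order in which trial participants are processed. Optimal matching algorithms avoid this but are longer to write down.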

Study methods confirmation.
We confirm that all methods in this manuscript were performed in accordance with the relevant guidelines and regulations.

Results
Screening and enrolment. Pre-SiVET. We screened 5902 volunteers and enrolled 3989 (67.6%) participants into the three observational cohorts before any screening was done for the SiVETs. The primary reasons for observational cohort screen failure were being HIV positive (n = 739), being at low risk for HIV infection (n = 681) and not being engaged in sex work (FSW observational cohort only, n = 430), Fig. 2. A total of 3622 (90.8%) of those enrolled returned for at least one follow-up visit, contributing data to the estimation of HIV incidence pre-SiVET.
SiVET concurrent. Of the participants that returned for at least one follow-up visit in the observational cohort pre-SiVET, 1525 (42.1%) were eligible for screening for SiVET when the SiVET protocol began. The primary reason for ineligibility was being in the observational cohort pre-SiVET for > 18 months, Fig. 2. In total, 672 participants were consecutively screened until 572 (85.1%) were enrolled into SiVET, a screening enrolment ratio of 5:4. The primary reasons for screen failure included previous exposure to Hepatitis B virus (n = 52) and unwillingness to use reliable contraception (n = 9). Therefore, 953 participants remained in follow up in the non-SiVET cohort in the SiVET concurrent period, Fig. 2.

Participants' baseline characteristics
The standardized difference in the covariates between SiVET and non-SiVET ranged between 0.8% in the religion covariate and 51.1% in the sex covariate, Table 2.
After propensity score matching. After propensity score matching, comparing SiVET to non-SiVET cohorts, all the covariates matched on in the two cohorts were comparable (all p-values > 0.05), Table 2 and the standardized difference in the covariates between SiVET and non-SiVET became minimal i.e. ranging between 0.0% in the education level covariate and 10.4% in the source population covariate.
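For readers unfamiliar with standardized differences, the sketch below computes one for a binary covariate using a commonly used formula (difference in proportions divided by the pooled standard deviation, expressed as a percentage). The text does not state which formula was used, so this choice is an assumption, and the proportions are hypothetical rather than values from Table 2.

```python
import math

def std_diff_binary(p_treated, p_control):
    """Standardized difference (in %) for a binary covariate.

    p_treated, p_control: proportion with the characteristic in each group.
    """
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2.0
    )
    if pooled_sd == 0:
        return 0.0  # both groups are all-0 or all-1 on this covariate
    return 100.0 * (p_treated - p_control) / pooled_sd

# Hypothetical proportions of men in the trial and in the observational cohort.
before = std_diff_binary(0.60, 0.35)  # unmatched data
after = std_diff_binary(0.60, 0.58)   # matched data
print(round(before, 1), round(after, 1))

# The 20% threshold is the one stated in the Methods for good matching.
balanced = abs(after) < 20.0
```

Unlike a p-value from a Chi-square test, the standardized difference does not depend on sample size, which makes it convenient for comparing balance before and after matching, when the matched sample is smaller.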
Standardized bias across covariates. Figure 3 shows the standardized bias across covariates resulting from the selection differences between SiVET and non-SiVET cohorts, stratified by period (SiVET concurrent and pre-SiVET). For all covariates and periods, the standardized percent bias varied across a wide range in the unmatched data (shown by a "•" symbol), while it was closer to zero in the matched data (shown by an "x" symbol).

, p = 0.033. Before propensity score matching, the results suggest that participation in SiVET was associated with an HIV incidence approximately 23% and 41% lower than that observed in the non-SiVET in the pre-SiVET and SiVET concurrent periods respectively. After propensity score matching, point estimates for HIV incidence were closer together in the SiVET and non-SiVET observational cohort in either period, Table 3.
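The relation between an incidence rate ratio and the percentage reductions quoted above can be illustrated with a short sketch. The event counts and person-years are hypothetical, and the Wald confidence interval on the log scale is a standard textbook approach assumed here. The text does not state how the intervals in this paper were computed.

```python
import math

def incidence_rate_ratio(events_a, py_a, events_b, py_b, z=1.96):
    """Incidence rate ratio (group A vs group B) with a Wald CI on the log scale.

    events_*: number of incident HIV infections; py_*: person-years at risk.
    Returns (irr, lower, upper).
    """
    irr = (events_a / py_a) / (events_b / py_b)
    se_log = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = irr * math.exp(-z * se_log)
    upper = irr * math.exp(z * se_log)
    return irr, lower, upper

# Hypothetical data: 12 infections over 500 person-years in the trial arm,
# 30 infections over 750 person-years in the comparison arm.
irr, lower, upper = incidence_rate_ratio(12, 500.0, 30, 750.0)
percent_lower = 100.0 * (1.0 - irr)  # an IRR of 0.60 means 40% lower incidence
print(round(irr, 2), round(lower, 2), round(upper, 2), round(percent_lower))
```

By the same arithmetic, the IRRs of 0.77 and 0.59 reported in the abstract correspond to incidence that is 23% and 41% lower in SiVET.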

Discussion
In this analysis, we used propensity score matching to create a counterfactual group from observational data with baseline covariates comparable to those of participants in a SiVET. The observational cohort data were stratified into two periods: pre-SiVET and SiVET concurrent. We found an imbalance in baseline characteristics between non-SiVET and SiVET cohorts in the pre-SiVET and SiVET concurrent periods. In both periods, SiVET participants were mainly men (FF), ≥ 25 years, long-term residents, more educated and reported fewer sexual partners in the last three months (SiVET period). These characteristics have been associated with low HIV incidence in these [30][31][32][33][34] and other [35][36][37][38] key populations. Consequently, the HIV incidence was lower in the SiVET cohorts compared to non-SiVETs in both periods, more so in the SiVET concurrent period. Studies [14][15][16][17] have shown that propensity score analysis can create a balance in participants' characteristics between the treated and untreated groups, providing a unique opportunity to compare unbiased outcomes between these groups. Using propensity score matching, we created a non-SiVET counterfactual arm with participants' baseline characteristics comparable to SiVET in both periods. Although the HIV incidence was still lower in the SiVET cohorts, the difference in HIV incidence between non-SiVET and SiVET cohorts narrowed from 23% to 15% in the pre-SiVET period and from 41% to 11% in the SiVET concurrent period. Studies 10,39 have previously indicated that trial volunteers are more likely to positively respond to HIV risk reduction measures such as condom use, reduction in the number of sexual partners and starting new sexual relationships among others. In addition, trials provide packages including treatment for sexually transmitted infections and active tracing of participants to keep them in follow up.
These interventions have been shown to be associated with diminished HIV incidence even in the absence of an efficacious investigational product or of an imbalance in the participant baseline characteristics between the treated and untreated arms 39,40. As previously reported 23, SiVET participants received more HIV risk reduction measures than their non-SiVET counterparts and consequently achieved a higher reduction in HIV risk behavior. These interventions or chance could be responsible for the 10% to 15% observed reductions in HIV incidence in SiVET vs non-SiVET in both periods after removing the imbalance in participants' baseline characteristics. The results of this analysis suggest that propensity score matching can help create a counterfactual trial arm from observational data, especially in the concurrent period where participants in the trial have similar follow up conditions (aligned to the same duration of follow up) as those in the source population. Furthermore, in a PrEP and ART demonstration project, investigators found HIV incidence that was lower than that in the counterfactual group derived from prior prospective studies in similar key populations 41. This further confirms that counterfactual groups, if well constructed, can be used to assess efficacy and/or effectiveness in clinical trials and routine settings. Taking the results of this analysis and previous studies together, counterfactual group HIV incidence can be a useful tool for assessing efficacy in trials where new HIV prevention products are tested against active comparators, as in HPTN 083 42 and HPTN 084 43, or in trials providing interventions to all participants, as in HPTN 082 44.
In future HIV prevention trials, where combination prevention is a key requirement in the conduct of clinical trials, propensity score matching will be useful for creating a counterfactual arm to estimate treatment effects using data from observational cohorts gathered in the concurrent period or, in the absence of concurrent observational cohort data, in a previous period.
The strengths of our analysis include a large sample size of observational data in the pre-SiVET and SiVET concurrent periods to provide propensity score matches to SiVET participants, and two distinct source population cohorts located in different geographical places. Having a SiVET concurrent period provided us a unique opportunity to compare trial-targeted outcomes aligned to the same duration of follow up. Additionally, the same study staff in the different source populations implemented the non-SiVET and SiVET protocols, avoiding observer variation. These studies also provided us with a rare trial environment similar to a trial placebo arm in an era of widespread use of active trial control arms.
Our studies are not without limitations. Although SiVETs had somewhat different protocols from the non-SiVET cohorts, the same study staff implemented them. This could have introduced some unmeasured bias arising from differentials in the completion of study procedures. However, at the time of the conduct of SiVETs and non-SiVETs, the primary objective was not to compare outcomes between the two and therefore differences in the completion of the study procedures could have been minimal. By design, SiVET participants received more HIV risk reduction counseling because of frequent visits to the clinic. This could have caused differentials in the participant response to HIV risk reduction measures. However, in an actual HIV vaccine efficacy trial, it is expected that participants will have more frequent clinic visits for safety assessments and HIV risk reduction counselling than in the routine observational data. Furthermore, we informed the participants that the Hepatitis B vaccine provided would prevent hepatitis B infection and not HIV acquisition; this could have encouraged more of those with good health seeking behavior to keep coming to the study clinic for follow up.

Conclusion
In these key populations, the process of screening for eligibility for an HIV vaccine efficacy trial selects participants with baseline characteristics different from those excluded or not screened. This could result in an HIV incidence different from that of the source population even in the absence of an effective investigational product. Propensity score matching can be a useful tool to minimise the imbalance in baseline characteristics between participants joining the trial and those not, making the two groups comparable. In light of HIV prevention trials having active controls, investigators could consider using propensity score matching to construct a counterfactual (non-randomised) but comparable arm from the source population with which to compare HIV incidence and estimate treatment effects. However, this will require concurrent measurement of HIV infection in the source population to remove the impact of development and/or time on HIV incidence. Where such data are unavailable, pretrial registration cohorts or other existing cohorts in the same or similar populations could provide some insights into the source population HIV incidence.

Data availability
The MRC/UVRI and LSHTM Uganda Research Unit encourages data sharing and has a published (https://www.mrcuganda.org/publications/data-sharing-policy) data sharing policy. This policy summarizes the conditions under which data collected by the Unit can be made available to other bona fide researchers, the way in which such researchers can apply to have access to the data and how data will be made available if an application for data sharing is approved. Should any of the other researchers need to have access to the data from which this manuscript was generated, the processes to access the data are well laid out in the policy. The corresponding and other co-author emails have been provided and could be contacted anytime for any clarifications and/or support to access the data.