Reduced efficacy of HIV-1 integrase inhibitors in patients with drug resistance mutations in reverse transcriptase

Little is known about the impact of pretreatment drug resistance (PDR) on the efficacy of second-generation integrase inhibitors. We sequenced pretreatment plasma specimens from the ADVANCE trial (NCT03122262). Our primary outcome was 96-week virologic success, defined as a sustained viral load <1000 copies/mL from 12 weeks onwards, <200 copies/mL from 24 weeks onwards, and <50 copies/mL after 48 weeks. Here we report how this outcome was impacted by PDR, defined by the World Health Organization (WHO) mutation list. Of 1053 trial participants, 874 (83%) have successful sequencing, including 289 (33%) randomized to EFV-based therapy and 585 (67%) randomized to DTG-based therapy. Fourteen percent (122/874) have ≥1 WHO-defined mutation, of whom 98% (120/122) have NNRTI mutations. Rates of virologic suppression are lower in the total cohort among those with PDR (65% [73/112]) compared to those without PDR (85% [605/713], P < 0.001), for those on EFV-based treatment (60% [12/20] vs 86% [214/248], P = 0.002), and for those on DTG-based treatment (66% [61/92] vs 84% [391/465], P < 0.001; P for interaction by regimen = 0.49). Results are similar in multivariable models adjusted for clinical characteristics and adherence. NNRTI resistance prior to treatment is associated with long-term failure of integrase inhibitor-containing first-line regimens, and portends high rates of first-line failure in sub-Saharan Africa.

2. Virological outcome. The authors used 3 separate virological endpoints. None of these seems suitable for an analysis looking at virological potency and its association with drug resistance. Under the primary outcome, someone who still has a viral load of, say, 900 copies/mL 3 months after starting ART is considered a success if viral load declines to <200 copies/mL by 6 months and <50 copies/mL by one year, regardless of treatment switch. Most people achieve a VL <50 by 3 months on modern ART, so it is unclear why these early failures are ignored. The secondary outcome is unusual and it is unclear what it is trying to achieve. People with a single viral load >200 copies/mL followed by a treatment change should be clearly defined as failures, not censored. By snapshot analysis, treatment discontinuations are all considered failures regardless of the reason for stopping. These include toxicity and simplifications, which are not relevant to a question about virological potency. The more standard analyses in this setting are i) change in viral load at month 3 modelled as a continuous variable, censoring people who stopped/switched drugs because of toxicity or simplification before month 3, or ii) a time to virological failure analysis (either pure virological failure using two consecutive values >200 copies/mL or a single value followed by change in treatment). I'd like to see the results after these are used.
4. Confounding #2. Authors are clearly aware of the possibility of unmeasured confounding and I was impressed by the calculation of the E-value. However, INSTI resistance (both low level and >20% level) is clearly an unmeasured confounder that could explain the high rate of failure in the DTG arm. For example, I have anecdotal evidence that G140ACRS in the INSTI region, even if detected at the 2-20% level, can increase the risk of failing INSTI-based therapy by 60%. Furthermore, there is confounding by NRTI resistance that has not been accounted for in the analysis. Figure 2 clearly shows that both M184V and K65R were more prevalent in people starting DTG. M184I should also be counted, as it confers the same reduction in susceptibility to FTC as M184V. Again, M184VI at 2-20% can be shown to be associated with a 3-fold increase in risk of virological failure of triple combinations including 3TC/FTC plus an anchor drug including INSTI, a value which exceeds the calculated E-value, making residual confounding a possible explanation for the findings.

6. Lines 266-268. Current adherence is a mediator, not a confounder. Although there is suspicion that some may not be ART-naïve patients, clearly the temporality goes against believing that these measures could be confounders.
7. Line 276. Typo. DTG based ART has, not 'in has'
8. Line 307. …between 5-20% of viral quasispecies. In the Results section it says 2-20%, which one is correct? The choice would impact on the prevalence of exposure and therefore on the results. Indeed, it is possible that statistical power is lower for the 5-20% threshold vs. 2-20% threshold.
9. Lines 314-315. …because study arm was determined by computer randomisation. I do not understand this. Exchangeability created by randomisation was broken down by the fact that plasma samples were not available in all participants.
10. Table 2. I would remove the p-values within the strata as these are subset analyses so difficult to interpret.
Reviewer #3 (Remarks to the Author):
The authors assess the efficacy of HIV-1 integrase inhibitors in patients with drug resistance mutations in reverse transcriptase using samples from the ADVANCE trial. Some comments regarding the statistical analyses:
1. The definition of virologic success is a little confusing. Is someone who achieved a viral load (VL) of <1000 but >=200 from 12 weeks onward considered a success for the entire time period? I understand that the authors would like to simplify the outcome so there is only one outcome per person, but if this participant then had a VL >=200 at 24 weeks were they still considered a success?
2. With the longitudinal nature of the data one definition of viral suppression could be used and a time to event analysis done to look at time to viral suppression. Did the authors consider this?
3. With the outcome (viral suppression) being very common, log binomial regression should probably be used instead of logistic regression to give risk ratios instead of odds ratios.
4. The authors assessed if there were differences in clinical or demographic characteristics between those who successfully underwent sequencing and those who did not. The same should be done to compare the 48 and 91 individuals who did not remain in the study for 12 or 24 weeks. Table 1 compares characteristics by regimen for those with successful sequencing. Given the primary analysis does not include the 48 individuals who did not remain in the study for 12 weeks, it seems that these individuals should be excluded from this table.

5.
6. Table 3: There are a lot of variables in the multivariable model and some of these are likely to be collinear (e.g., low self-reported adherence and pill count adherence). Please assess collinearity of all variables and adjust the analyses accordingly.
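
Reviewer #3's comment 3 (risk ratios versus odds ratios for a common outcome) can be illustrated with the crude counts reported in the abstract. The following is a minimal sketch, not the authors' analysis: it computes unadjusted ratios from a 2x2 table rather than fitting a log-binomial or logistic regression, and the function name is hypothetical.

```python
def risk_ratio_and_odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Return the crude (unadjusted) risk ratio and odds ratio for a 2x2 table."""
    p1 = events_exposed / n_exposed        # risk of the outcome among the exposed
    p0 = events_unexposed / n_unexposed    # risk of the outcome among the unexposed
    risk_ratio = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return risk_ratio, odds_ratio


# Counts from the abstract: virologic success in 73/112 with PDR vs 605/713 without PDR.
rr, odds_ratio = risk_ratio_and_odds_ratio(73, 112, 605, 713)
print(f"crude risk ratio = {rr:.2f}, crude odds ratio = {odds_ratio:.2f}")
```

Because success is common in both groups, the odds ratio lies much further from 1 than the risk ratio, which is the concern behind the request for log-binomial regression.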

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author):
Thank you for the opportunity to review this manuscript highlighting novel findings on the impact of NNRTI drug resistance mutations on integrase-based inhibitors. Although the exact mechanism is not elucidated, these findings are critical as they indicate that DTG-based ART may not fully address the challenge of rising NNRTI resistance, especially in low- and middle-income countries. As such, there is a dire need to adopt the suggestions highlighted by the authors as well as to determine the exact mechanism by which NNRTI DRMs impact the efficacy of DTG-based ART.

Response
We thank the reviewer for their thorough review and agree with the importance of defining the contributions of drug resistance on treatment outcomes in low and middle-income countries.

Major comments
1. While efforts have been taken to address any potential confounders, it is still possible as the authors have indicated that factors such as prior ARV use and unmeasured adherence could still have impacted the outcome of the study. While this is also addressed in the discussion, it would still be good to note that the highlighted effect size of an unmeasured confounder (aOR 2.8) is still possible. The study by Hamers et al observed that prior ARV use was an independent predictor of virological failure even after controlling for both PDR and non-adherence with an effect size of 3.10. In a further analysis of the same study it was noted that the effect of prior ARV use that could be addressed by eliminating the effect of PDR was only 48% suggesting a residual effect that was mediated possibly through non-adherence or an unknown mechanism. However, like in this study imperfect measures of adherence were used. Perhaps these findings may suggest the need to use more objective measures of adherence and alternative more objective tools to assess prior ARV use.

Response
This is an excellent point. Although our models are adjusted for two different measures of adherence (self-report and pharmacy pill count), we agree that these are imperfect measures and the possibility of residual confounding persists. We thank you for directing us to the work detailing the large effect of prior treatment on outcomes and have added these citations and themes to the discussion section:

Our estimates could be susceptible to unmeasured or residual confounding, particularly due to effects of prior ART exposure and/or imperfect adherence not captured by self-report or pharmacy pill counts. Notably, our estimates of the effect of PDR on virologic outcomes remained large and significant after adjustment for confounders, including adherence, meaning an unmeasured confounder would have to have a strong association (OR of 2.8 or greater) with both PDR and virologic success to reduce the effect of pre-treatment drug resistance to null. 9 Yet, prior studies have shown such an effect size for prior ART exposure. 5,10
2. In addition to above, it is indeed likely that these findings may be due to a behavioral mechanism. Although the authors cite the DAWNING study to support their findings, it is worth noting that the efficacy observed in DAWNING is similar to that observed in the participants without NNRTI DR in the ADVANCE trial. Moreover, a sub-analysis in DAWNING showed that participants with multiple NRTI resistance i.e. M184V+ TAMs had even much better outcomes (noting that they were also likely to have NNRTI resistance). Although this does not rule out the benefit of hypersensitization of the viral mutants to the alternative NRTIs, it may still likely indicate that these patients are likely to be more adherent than those without multiple NRTIs as has been previously postulated. In addition, and as the authors have noted, none of the patients with viral non-suppression in NAMSAL had baseline DR although the overall PDR prevalence was low. Overall it may be good for the authors to discuss the findings in light of both DAWNING and NAMSAL but there is still a need for more studies including in-vitro studies to assess for this phenomenon

Response
We agree with the reviewer that, in combination, these studies suggest that the presence of drug resistance mutations represents distinct phenomena depending on when the mutations occur. We have added text to the discussion section to better elucidate this important distinction:

Our data, in combination with the DAWNING study and others, highlight that the presence of drug resistance mutations might portend very different outcomes depending on when they occur. In DAWNING, drug resistance mutations detected at the time of first-line failure were a surrogate measure of past adherence and predicted success on second-line therapy. By contrast, the ADVANCE study appears to suggest that the presence of drug resistance mutations at the time of presentation for first-line treatment (or re-initiation after a default) might signal the opposite: an increased risk of virologic failure, perhaps mediated by poor adherence or a virologic fitness deficit. Making this distinction for both clinical and public health purposes could be crucial.
what would be the best strategy to use, if DTG resistance is still rare among patients failing treatment, including those who had NNRTI PDR as it was the case in the ADVANCE trial.

Response
We appreciate this suggestion and, although our sample size for on-treatment sequencing is relatively small, we have added an exploratory analysis to compare resistance patterns prior to treatment versus at the time of persistent failure. A total of 38 individuals have both pre-treatment and on-treatment sequencing available in the dataset. Of these, 17 were in the EFV arm and 21 were in a DTG arm. We have added this information to the manuscript in the results section and as supplemental Table 6. Figure 6). We also agree with the importance of deriving guidelines with these results in consideration, but are mindful of the limited data available on optimal management of virologic failure on DTG-based ART. We highlight the limitations of the current data and, in response to the reviewer's suggestion, propose a strategy in the absence of such data:

Finally, these results signal the importance of future work to determine optimal treatment recommendations for individuals failing DTG-based ART, for which minimal data is currently available. Without such data, treatment programs should be advised to maintain virologic monitoring programs, adherence monitoring and support programs for those with failure, and be mindful of the importance of regimen change guidance for individuals with intolerance or persistent virologic failure, even when documented drug resistance is absent. More novel strategies, such as real-time adherence and resistance monitoring, and long-acting injectable formulations of ART for those with adherence challenges, should also be explored as these options become more widely available. 11-13
Minor comments
1. Typo in line 276 "in has multiple public health implications". Suggest to remove "in".

Response
We have removed the word "in" as suggested.

Reviewer #2 (Remarks to the Author):
This is an interesting analysis suggesting that NNRTI-associated resistance detected at >20% of the virus population is associated with 96-week virological failure of DTG-based first-line regimens in the ADVANCE trial. If true, this could have serious implications for treatment management of HIV-infected people in resource-limited settings and elsewhere. Although the paper is fairly well written, the analysis and the arguments in favour of the set hypothesis are fairly unconvincing. I summarise my main points below.
Main Points
1. Exposure. Lines 131-134. Sequence read coverage depth is likely to vary markedly across the sequenced amplicons (higher in protease than in RT). Which minimum read depth was chosen and spanning across which codons in each region? This might have been described in the parental trial paper but I think that it is crucial to repeat here. Do results vary by using different coverage depths?

Response
Thank you for this comment. We included only sequences with ≥100X depth of coverage and spanning PR codons 1-99 and RT codons 1-254. We did not observe any variation in results with different coverage depths at ≥100X and there was no preferential amplification of either gene target (i.e. PR and RT). We have updated the manuscript with this information:

We limited our analyses to sequences with ≥100X depth of coverage and spanning PR codons 1-99 and RT codons 1-254.
We added distribution of read depth coverage among samples included in final analysis in Supplemental Figure 3.

Supplemental Figure 3. Distribution of read depth coverage among samples included in the analysis
To assess for the possibility that low coverage depth might have affected our results, we conducted an additional sensitivity analysis, excluding sequences with a coverage depth <1000x, and found no difference in the effect of PDR on any of our outcomes (Supplemental Table 8).

The more standard analyses in this setting are i) change in viral load at month 3 modelled as a continuous variable censoring people who stopped/switched drugs because of toxicity or simplification before month 3 or ii) a time to virological failure analysis (either pure virological failure using two consecutive values >200 copies/mL or a single value followed by change in treatment). I'd like to see the results after these are used.

Response
We thank the reviewer for these comments and suggestions. We agree that there are multiple important considerations in defining treatment failure. We agree that antiviral potency, as the reviewer suggests, is a critical outcome and one that we did not thoroughly address in our initial submission. We also feel strongly that a public health approach should include a focus on long-term maintenance of suppression. Our intention is to explore each of these perspectives to enable consideration of these varying outcomes and their implications. In response, we have taken the following steps to address the weaknesses identified by the reviewer:

a. We have added a new outcome as suggested by the reviewer, defined as the change in log10 viral load from baseline to 3 months (12 weeks), censoring those who discontinued prior to week 12 (allowing for a maximum 4-week window for those who did not present exactly at 12 weeks). Given the continuous nature of this outcome and its normal distribution, we used Student's t-tests and fit linear regression models to assess this outcome, with the following results: The viral load reduction between baseline and week 12 was reduced for participants with PDR in the efavirenz arm (-0.72, 95%CI 0.37, 1.08, P<0.001) but not for those in the dolutegravir arm (-0.08, 95%CI -0.27, 0.11, P=0.43) and the effect of PDR on virologic response was significantly different by arm (

b. We have modified our secondary outcome of persistent virologic failure to match the definition of virologic failure suggested by the reviewer, defined as two consecutive viral loads >200 copies/mL or a change to a second-line regimen among those with a high viral load. Only two participants changed to a second-line regimen due to virologic failure (one in each arm) and as such these alterations had no substantial effect on our persistent virologic failure (secondary) outcome.

c. We have added a time to virologic suppression survival analysis.
Similar to the virologic potency analysis, this time to suppression analysis did show an effect of PDR on time to initial suppression for the EFV arm (P=0.04 by log-rank testing) but not for the DTG arms (P=0.54 by log-rank testing).

These results suggest that the presence of PDR does not appear to impact early virologic response to DTG, but does appear to predict risk of longer-term treatment failure among those who initially suppress. We have updated our discussion to discuss the implications of these additional findings:

Results: 0.58, 95%CI 0.35, 0. (AHR 1.01, 95%CI 0.80, 1.27).

In contrast to the effect seen with long-term outcomes, we did not find that PDR had an impact on time to initial suppression or change in quantified viral load from enrolment to 12 weeks, suggesting that NNRTI PDR affects longer-term maintenance of suppression on DTG-based ART or acts via a non-virally mediated behavioral mechanism.
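
The revised virologic failure definition described in step b of the response above (two consecutive viral loads >200 copies/mL, or a high viral load followed by a change to a second-line regimen) can be sketched as a small classifier. This is an illustrative reading of that definition, not the authors' code: the function name, the visit representation, and the handling of an isolated final high value are all assumptions.

```python
from typing import Sequence, Tuple


def confirmed_virologic_failure(
    visits: Sequence[Tuple[float, bool]], threshold: float = 200.0
) -> bool:
    """Return True if a participant meets the failure definition.

    `visits` is a chronologically ordered sequence of
    (viral_load_copies_per_mL, switched_to_second_line_after_this_visit) pairs.
    Failure is either two consecutive values above `threshold`, or a single
    value above `threshold` followed by a switch to second-line therapy.
    """
    for i, (viral_load, switched) in enumerate(visits):
        if viral_load <= threshold:
            continue
        if switched:
            return True  # high viral load followed by a regimen change
        if i + 1 < len(visits) and visits[i + 1][0] > threshold:
            return True  # two consecutive values above the threshold
    return False  # isolated high values (blips) are not counted as failure


suppressed = confirmed_virologic_failure([(5000, False), (150, False), (40, False)])
two_in_a_row = confirmed_virologic_failure([(300, False), (450, False)])
switched = confirmed_virologic_failure([(900, True)])
blips = confirmed_virologic_failure([(300, False), (40, False), (300, False)])
print(suppressed, two_in_a_row, switched, blips)
```

Under these assumptions, a single unconfirmed value above 200 copies/mL without a regimen change is treated as a blip rather than a failure.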
3. Confounding #1. A factor to be a confounder had to be a predictor of the outcome but also somewhat at least weakly causing pre-treated drug resistance. Most of the variables included in the logistic regression model do not satisfy this definition. In fact, the only important confounder is baseline viral load. Although including predictors of outcome should increase efficiency, it might be beneficial to have a more parsimonious model to avoid overfitting as confounding is unlikely to be introduced by the selected factors. Indeed, Supplemental Table 5 shows that some of these factors (i.e. sex age, education etc.) are quite weakly associated with the outcome. In addition, the model is adjusted for current measures of adherence such as pill count-based adherence (calculated as the number of pills taken since the prior visit and self-reported adherence (dichotomized as perfect adherence in the past 4 days). These are clearly mediators, not confounding factors as pre-existing resistant viruses become dominant via poor adherence after baseline. This is particularly true for EFV which is known to have a lower barrier to resistance than DTG so controlling for adherence would have removed an important part of the total effect. Personally I would only control for baseline viral load and NRTI resistance (both Sanger and 2-20% threshold in separate models) and use these models to test for the interaction.

Response
We agree with the reviewer that a simpler model might enable a clearer picture of the impact of PDR on outcomes by treatment arm, and be less susceptible to potential over-fitting and/or mediating effects of adherence. As such, we have updated our primary regression models and Table 3

In a third model intended to focus more directly on the virologic factors that determine treatment outcome, we restricted the model to pre-treatment viral load, presence or absence of PDR, and study treatment allocation.
We found similar effect sizes of PDR on treatment outcomes in all three models, suggesting minimal confounding effects:

Response
Unfortunately, our pre-treatment deep sequencing did not include the INSTI gene. We think it unlikely that major INSTI resistance is a confounder of the relationship between PDR and treatment outcomes in this study. As the reviewer knows, there was essentially 0% INSTI use in this population at the time of the study, and numerous studies have investigated the presence of INSTI resistance in South Africa and generally noted mutations that confer resistance to dolutegravir in <1% of the population. 14-16 That said, we agree that low-level resistance and/or as yet undetermined viral properties related to NNRTI resistance that affect INSTI efficacy remain a strong possibility:

Finally, our sequencing did not include the integrase region of the pol gene. Although resistance mutations that confer resistance to dolutegravir remain rare in South Africa, we cannot exclude the possibility of low-level resistance as a possible contributor to our findings. 17,18
Furthermore, there is confounding by NRTI resistance that has not been accounted for in the analysis. Figure 2 clearly shows that both M184V and K65R were more prevalent in people starting DTG.

Response
This was our mistake: we included both M184V and M184I in our characterization of M184 mutations and have updated the figure to reflect this:

5.
Interaction. The analysis was clearly not powered to test this interaction so a p-value >0.05 for the test does not mean that interaction can be ruled out. Looking at some of the analyses shown in Table 2, the risk differences by PDR in EFV are much larger (almost a 3-fold difference) than those in DTG. For example, for the secondary outcome (93%-68% = 25%) for EFV vs. (94%-85% = 9%) for DTG, and similarly for the 48-week snapshot (81%-46% = 35%) for EFV vs. (86%-74% = 12%) for DTG. On the basis of these differences and all the caveats described above I do not think that the data support that the association between PDR and risk of failure does not vary by treatment arm.

Response
We agree with this point and have toned down the language about differences by arm throughout the manuscript.
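
The reviewer's point about limited power for the interaction can be illustrated with the primary-outcome counts given in the abstract (the Table 2 figures the reviewer quotes are for other outcomes). The sketch below is not the authors' interaction test: it computes crude risk differences by arm and a Wald 95% interval for their difference on the additive scale, and the helper name is hypothetical.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b):
    """Return the risk difference (group a minus group b) and its Wald variance."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    variance = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return p_a - p_b, variance


# Virologic success with PDR vs without PDR, by arm (counts from the abstract).
rd_efv, var_efv = risk_difference(12, 20, 214, 248)
rd_dtg, var_dtg = risk_difference(61, 92, 391, 465)

# Additive interaction contrast: how much larger the PDR effect is on EFV than on DTG.
contrast = rd_efv - rd_dtg
se = math.sqrt(var_efv + var_dtg)
ci_low, ci_high = contrast - 1.96 * se, contrast + 1.96 * se

print(f"RD EFV = {rd_efv:.3f}, RD DTG = {rd_dtg:.3f}")
print(f"difference of RDs = {contrast:.3f} (95% CI {ci_low:.3f}, {ci_high:.3f})")
```

With only 20 participants with PDR in the EFV arm, the interval around the difference in risk differences is wide and spans zero, so these crude numbers can neither confirm nor exclude a difference by arm.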

Response
While we agree with the reviewer that multiple studies have demonstrated relationships between resistance at minority (<20%) frequencies and treatment outcomes, there have also been multiple studies that have shown the opposite. For example, a recent systematic review of 25 studies that have investigated the impact of low-abundance frequency mutations and virologic outcomes found that 56% of studies did not demonstrate a significant relationship between the two. 19 Notably, all studies in that review from South Africa in particular failed to demonstrate relationships between minority frequency resistance variants and virologic suppression. Thus, while there is some evidence in support of this relationship, we do not agree that available data have fundamentally proven this relationship exists. To more specifically explore this relationship in our cohort in response to the reviewer's concerns, we have modified our analysis to focus on minority frequency (2-20%) specifically impacting NNRTI regimens, as that has been the focus of prior studies which have done so:

A total of 48 and 91 individuals were excluded from the primary and secondary analyses, respectively, for not remaining in the study to 12 or 24 weeks. These two sentences seem to contradict each other?

Response
Thank you for noting this error. We have corrected the text and Figure 1 with the correct figures after updating definitions of outcomes and inclusion criteria based on reviewer feedback:

Of participants included in PDR analyses, 289 (33%) were randomized to an EFV-based regimen and 585 (67%) were randomized to a DTG-based regimen. At the time of data extraction, all had completed observation up to 96 weeks. A total of 48 and 82 individuals were excluded from our primary and secondary analyses, respectively, for not remaining in the study to 12 or 24 weeks.
2. Line 207. Typo. ….individual had a nucleoside, not 'an'

Response
Thank you; we have corrected this.
3. Lines 221 and 320. Was the E-value equal to 2.9 or 2.8 ? Which exact analysis was used to calculate this (primary endpoint?)

Response
Thank you for finding that inconsistency. We used our multivariable logistic regression model adjusted for known confounders to derive this (displayed in Table 3). After updating our definitions based on reviewer responses, the odds ratio was 0.38 (95%CI 0.23, 0.62), corresponding to an updated E-value of 2.7. We have updated the manuscript throughout with this.
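
For readers unfamiliar with the E-value, the sketch below applies the standard formula E = RR + sqrt(RR × (RR − 1)), taking the reciprocal when the estimate is below 1 and, because the outcome is common, approximating the risk ratio by the square root of the odds ratio. The response does not state which approximation the authors used, so the common-outcome conversion and the function name are assumptions.

```python
import math


def e_value_from_odds_ratio(odds_ratio: float, common_outcome: bool = True) -> float:
    """E-value for a point estimate reported as an odds ratio.

    For a common outcome the risk ratio is approximated by sqrt(OR);
    for a rare outcome the OR is used directly as the risk ratio.
    """
    rr = math.sqrt(odds_ratio) if common_outcome else odds_ratio
    if rr < 1:
        rr = 1 / rr  # the formula is defined for estimates above 1
    return rr + math.sqrt(rr * (rr - 1))


e_value = e_value_from_odds_ratio(0.38)
print(f"E-value for OR 0.38 (common-outcome approximation): {e_value:.2f}")
```

Under these assumptions the result is about 2.6, close to the reported 2.7; the small gap may reflect rounding of the odds ratio or a different approximation.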
4. Line 248. We report a strong and pervasive association between NNRTI resistance…Although the majority of participants harboured NNRTI resistance the exposure was defined as having ≥1 mutations in the WHO list.

Response
As mentioned in the prior response, the vast majority of individuals had only NNRTI resistance (<3% with others), as such we focus on NNRTI resistance, which differentiates this cohort meaningfully from others with multi-class resistance at the time of first-line failure. To more accurately describe this cohort, we have updated that sentence as follows:

We report a strong association between drug resistance before treatment initiation, primarily to the NNRTI class, and virologic failure for people initiating first-line ART in the ADVANCE clinical trial.
5. Lines 256-258. The authors use the paper by Clutter DS et al in support of their hypothesis that NNRTI resistance might impair response to INSTI-based regimens. The argument seems very speculative especially because the paper actually supports the opposite (INSTI-based regimens are an effective option with at least equal efficacy compared with bPIs for patients with isolated NNRTI TDR).

Response:
We agree this is a hypothesis-generating finding only. We have updated the reference with a more appropriate paper in support of this (Hu et al) and have toned down that statement to be more cautionary:

The observed effect we identified may be consistent with preliminary data suggesting higher replication of NNRTI mutant viruses in the context of drug pressure from integrase inhibitors, 28
6. Lines 266-268. Current adherence is a mediator, not a confounder. Although there is suspicion that some may not be ART-naïve patients, clearly the temporality goes against believing that these measures could be confounders.

Response
While we agree with the reviewer that poor adherence could be an effect modifier, because it would differentially affect suppression in those with and without PDR, in this case we suspect that adherence behaviors might also serve as a confounder of the relationship between PDR and treatment failure. If PDR is a surrogate marker for prior treatment default and poor adherence then those with PDR might appear to have worse outcomes due to poor adherence rather than the presence of PDR itself. Importantly, the effect size of PDR on our outcomes did not change meaningfully after addition of adherence measures to our model. To better demonstrate that adherence and PDR appeared to have additive effects (but not confounding or interactive effects), we have added a figure to demonstrate how both PDR and treatment adherence impacted virologic suppression in both arms (Figure 4):

Figure 4. 96-week treatment outcomes among participants in the ADVANCE Trial divided by treatment arm, presence or absence of WHO-defined pre-treatment drug resistance, and achievement of greater than versus less than 95% adherence based on pharmacy pill count.
Thank you. We have corrected this.
8. Line 307. …between 5-20% of viral quasispecies. In the Results section it says 2-20%, which one is correct? The choice would impact on the prevalence of exposure and therefore on the results. Indeed, it is possible that statistical power is lower for the 5-20% threshold vs. 2-20% threshold.

Response
Thank you for noting this error. We have updated the manuscript and analysis to focus on the 2-20% range of minority variants.
9. Lines 314-315. …because study arm was determined by computer randomisation. I do not understand this. Exchangeability created by randomisation was broken down by the fact that plasma samples were not available in all participants.

Response
The rates of inclusion (or not) in this analysis were balanced evenly across study arms. Availability of plasma samples, which were collected at baseline on the day of randomization, or failure to sequence them should have been similar between randomized study arms. Thus, any imbalance in resistance by treatment arm could only have happened by chance. We feel that this is an important limitation that remains pertinent to the manuscript.
10. Table 2. I would remove the p-values within the strata as these are subset analyses so difficult to interpret.

Response
In comment 3 above, the reviewer requests a simplified model to test for interaction, which we have done for Table 2 and Supplemental Tables 3 and 4. That said, we would be happy to remove these P-values from the tables if advised by the reviewer or editor. Notably, the addition of the virologic change from baseline to 12 weeks, as also suggested by the reviewer, does demonstrate an interactive effect by treatment group.

Our primary outcome of interest was 96-week virologic success, which we defined as a sustained viral load <1000 copies/mL from 12 weeks through 96 weeks, <200 copies/mL from 24 weeks through 96 weeks, and <50 copies/mL from 48 weeks through 96 weeks.
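The stepped definition above can be sketched as a small classifier over a participant's viral load measurements. This is an illustration only: the function names are hypothetical, the treatment of window boundaries as inclusive is an assumption, and the handling of missing visits is not specified in the definition, so the sketch simply requires at least one measurement between weeks 12 and 96.

```python
from typing import Mapping, Optional


def threshold_for_week(week: float) -> Optional[float]:
    """Return the viral load limit (copies/mL) that applies at a given week.

    Assumption: each window starts at its stated week (inclusive) and the
    strictest applicable limit is used from that week through week 96.
    """
    if week < 12 or week > 96:
        return None  # outside the assessed window
    if week < 24:
        return 1000.0
    if week < 48:
        return 200.0
    return 50.0


def is_virologic_success(viral_loads: Mapping[float, float]) -> bool:
    """Classify a trajectory given a mapping of study week -> copies/mL."""
    checked = 0
    for week, copies_per_ml in viral_loads.items():
        limit = threshold_for_week(week)
        if limit is None:
            continue  # measurements before week 12 or after week 96 are ignored
        checked += 1
        if copies_per_ml >= limit:
            return False  # any measurement at or above its limit is a failure
    # Assumption: no assessable measurement is treated as not successful.
    return checked > 0


if __name__ == "__main__":
    print(is_virologic_success({0: 50000, 12: 800, 24: 150, 48: 40, 96: 20}))
    print(is_virologic_success({0: 50000, 12: 800, 24: 250, 48: 40, 96: 20}))
```

The first trajectory meets every limit and is classified as a success; the second exceeds the 200 copies/mL limit at week 24 and is not.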
2. With the longitudinal nature of the data one definition of viral suppression could be used and a time to event analysis done to look at time to viral suppression. Did the authors consider this?

Response
We appreciate this recommendation and have added a survival analysis to our methods and results. This is now discussed in the methods, results and discussion sections and we have added Supplemental Figure 2 to demonstrate the K-M curves generated from this analysis:
3. With the outcome (viral suppression) being very common, log binomial regression should probably be used instead of logistic regression to give risk ratios instead of odds ratios.

Response
In response to the reviewer comments we explored use of log-binomial regression, but we had difficulty getting our multivariable models to converge (a common occurrence for this type of model).
That said, we agree that odds ratios can be misinterpreted. Table 2, and Supplemental Tables 3 and 4. If the reviewer or editors feel strongly about a risk ratio model, we would be happy to explore Poisson regression models with robust standard errors or converting our multivariable logistic regression models to proportional rates using post-regression marginal techniques.
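To illustrate why the choice between odds ratios and risk ratios matters when the outcome is common, the sketch below computes both from a 2x2 table, using the unadjusted counts reported in the abstract (73/112 suppressed with PDR versus 605/713 without). The function name is hypothetical, and the calculation is crude (unadjusted), so it does not reproduce the multivariable estimates.

```python
from typing import Tuple


def risk_and_odds_ratio(
    events_exposed: int,
    n_exposed: int,
    events_unexposed: int,
    n_unexposed: int,
) -> Tuple[float, float]:
    """Return (risk ratio, odds ratio) for exposed versus unexposed groups."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    risk_ratio = risk_exposed / risk_unexposed

    # Odds are events divided by non-events within each group.
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    odds_ratio = odds_exposed / odds_unexposed
    return risk_ratio, odds_ratio


if __name__ == "__main__":
    # Virologic suppression with PDR (73/112) versus without PDR (605/713),
    # as reported in the abstract.
    rr, odds_ratio = risk_and_odds_ratio(73, 112, 605, 713)
    print(f"risk ratio = {rr:.2f}, odds ratio = {odds_ratio:.2f}")
```

Here the odds ratio (about 0.33) sits much further from 1 than the risk ratio (about 0.77), which is the kind of misreading the reviewer's comment is concerned with.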
4. The authors assessed if there were differences in clinical or demographic characteristics between those who successfully underwent sequencing and those who did not. The same should be done to compare the 48 and 91 individuals who did not remain in the study for 12 or 24 weeks.

Response:
We have updated Supplemental Table 1 to include three categories of individuals: 1) those in the primary analysis; 2) those in the 96-week snapshot analysis but not in the primary analysis; 3) those not in any analysis. Perhaps not surprisingly, we found that those who dropped out of the study before 96 weeks were more likely than the other two groups to be unemployed. Otherwise the characteristics were largely similar between groups.
Table 3: There are a lot of variables in the multivariable model and some of these are likely to be collinear (e.g., low self reported adherence and pill count adherence). Please assess collinearity of all variables and adjust the analyses accordingly.

Response:
We thank the reviewer for this recommendation. We have assessed collinearity for our primary regression model and found little evidence of such. We have added these results in Supplemental

REVIEWERS' COMMENTS
Reviewer #1 (Remarks to the Author): I do thank the authors for addressing the comments, I have no additional comments.
Reviewer #2 (Remarks to the Author): The authors did a great job at revising the manuscript and partially lowering the tone regarding the effect of NNRTI resistance for people receiving DTG-based regimens. Indeed, the new analyses, especially, when focussing on virological potency carry little evidence for such an association.
I have only a couple of additional minor points.
First, regarding the review cited as Ref #50. Although the paper is interesting it is not a formal meta-analysis and little effort has been made in matching intervention for the actual regimen used. Thus, for example, almost all studies showing no association between low level NNRTI variants and virological response are new studies involving Rilpivirine and not Efavirenz and we know that the genetic barrier of Rilpivirine is much higher. I think that this should be noted when discussing the results.
Second, regarding my original last minor point I was only suggesting to remove the p-values of the analysis within the strata and keep only the interaction p-values.

REVIEWERS' COMMENTS SECOND REVIEW
Reviewer #1 (Remarks to the Author): I do thank the authors for addressing the comments, I have no additional comments.

Response
We thank the reviewer for their time and thorough repeated review of this work.
Reviewer #2 (Remarks to the Author): The authors did a great job at revising the manuscript and partially lowering the tone regarding the effect of NNRTI resistance for people receiving DTG-based regimens. Indeed, the new analyses, especially, when focussing on virological potency carry little evidence for such an association. I have only a couple of additional minor points.

Response
We thank the reviewer for their time and thorough repeated review of this work.
First, regarding the review cited as Ref #50. Although the paper is interesting it is not a formal meta-analysis and little effort has been made in matching intervention for the actual regimen used. Thus, for example, almost all studies showing no association between low level NNRTI variants and virological response are new studies involving Rilpivirine and not Efavirenz and we know that the genetic barrier of Rilpivirine is much higher. I think that this should be noted when discussing the results.

Response
We agree that studies of minority resistance have shown conflicting results, and we have taken pains to ensure that we avoid conclusive statements on one side or the other. To respond to the reviewer, we have modified the phrase to include that the studies of newer regimens are less likely to have this relationship:

However, many studies, and particularly those considering newer ART regimens, have failed to demonstrate a role for low-level mutant viruses in determining clinical outcomes. 53
Second, regarding my original last minor point (Table 2. I would remove the p-values within the strata as these are subset analyses so difficult to interpret) I was only suggesting to remove the p-values of the analysis within the strata and keep only the interaction p-values.

Response
We agree that stratified analyses can be fraught, particularly with type II error issues. However, in this case the stratified analyses by treatment arm in Table 2 are highly statistically significant, suggesting no power issue, with the notable exception of the virologic outcome for dolutegravir that the reviewer thoughtfully requested we add, which shows that the virologic outcome was not affected by NNRTI resistance in that arm. Thus, we feel these do add context to our results and would prefer to maintain these unless the reviewer or editors feel otherwise.