Prediction of plasma ctDNA fraction and prognostic implications of liquid biopsy in advanced prostate cancer

No consensus strategies exist for prognosticating metastatic castration-resistant prostate cancer (mCRPC). Circulating tumor DNA fraction (ctDNA%) is increasingly reported by commercial and laboratory tests but its utility for risk stratification is unclear. Here, we intersect ctDNA%, treatment outcomes, and clinical characteristics across 738 plasma samples from 491 male mCRPC patients from two randomized multicentre phase II trials and a prospective province-wide blood biobanking program. ctDNA% correlates with serum and radiographic metrics of disease burden and is highest in patients with liver metastases. ctDNA% strongly predicts overall survival, progression-free survival, and treatment response independent of therapeutic context and outperforms established prognostic clinical factors. Recognizing that ctDNA-based biomarker genotyping is limited by low ctDNA% in some patients, we leverage the relationship between clinical prognostic factors and ctDNA% to develop a clinically-interpretable machine-learning tool that predicts whether a patient has sufficient ctDNA% for informative ctDNA genotyping (available online: https://www.ctDNA.org). Our results affirm ctDNA% as an actionable tool for patient risk stratification and provide a practical framework for optimized biomarker testing.

Editorial Note: This manuscript has been previously reviewed at another journal that is not operating a transparent peer review scheme. The manuscript was considered suitable for publication without further review at Nature Communications.

REVIEWER COMMENTS
Reviewer #4 (Remarks to the Author): In this study the authors explore the prognostic role of ctDNA in metastatic prostate cancer by analyzing a large cohort of patient samples and, more interestingly, develop a tool to predict ctDNA fraction based on several clinical parameters.
The work follows an elegant approach, with a uniform and harmonized analysis pipeline, which is well described in the Methods section and Supplementary Figures. The large cohort analyzed gives strength to the conclusions; however, it is important to recall that most of the samples used here had already been analyzed in previous publications from the same group (Annala et  The strength of the study is that, by performing a meta-analysis with a large cohort of patients (previously reported cohorts and new patients) and associating ctDNA with multiple clinical features, the authors delivered a powerful prediction tool to calculate ctDNA fraction, which might be used for biomarker studies and patient stratification in the clinic. The manuscript needs to extend its focus beyond the prognostic role of ctDNA, as this aspect lacks a significant conceptual advancement. Instead, it should center its narrative around the remarkable novelty that their AI tool potentially introduces to the clinical application of ctDNA testing. I have several comments/concerns.

Major comments:
-The prognostic association found for ctDNA is an important result due to the large cohort size presented here; however, the majority of the samples had already been used for this purpose in previous studies. I do not see the novelty of this part of the study beyond the metadata analysis and data analysis harmonization between multiple cohorts. In this sense, centering the title and the narrative of the work on the prognostic role of ctDNA in metastatic prostate cancer diminishes the importance of other novel aspects of the work, such as the predictive model developed here.
-The authors acknowledge in the Discussion that the effect of other therapies, such as taxane-based chemotherapy, could not be investigated in detail in this metacohort. However, from the data presented here it seems that patients exposed to taxanes experienced an increase in both cfDNA concentration and ctDNA fraction (in Figure 1L, mCSPC patients with taxane intensification have higher ctDNA; in Figure 1F-H, patients in later lines of treatment tend to have more cfDNA and ctDNA), which makes sense since: 1) taxane-based chemotherapy induces apoptosis, which is the major source of cfDNA release; and 2) patients with more advanced disease, in later lines of treatment, are more frequently treated with chemotherapy. Therefore, it would be worthwhile to explore and add to the manuscript some figures (such as the ones included in the NMed rebuttal letter) reflecting the effect that ARPI or taxanes have on cfDNA concentration, ctDNA concentration, and ctDNA fraction. How could this affect the prediction model? As one would expect, cfDNA concentration and ctDNA fraction tend to be higher in 2L and 3L samples (Figure 1F-H); however, the authors show that the prediction model also works efficiently in this context. How does the model account for ctDNA variance due to different therapies or previous lines of treatment if these variables were not incorporated in the model?
-The authors demonstrate in several figures the tight association between cfDNA concentration and ctDNA fraction. In fact, they claim that cfDNA concentration seems to be the strongest predictor of ctDNA fraction, and that cfDNA concentration is inter-correlated with other blood biomarkers such as LDH or ALP. However, they do not explore the association between cfDNA concentration and other clinical parameters (metastases, time to progression, etc.) or outcome. What would be the prognostic role of cfDNA concentration? How would it compare to ctDNA?
-Previous studies showed that the amount of input blood (ml of plasma) used for cfDNA isolation could affect cfDNA concentration yield (Alborelli et al., 2019, Cell Death Dis.). Could this also affect ctDNA fraction? If so, how could this affect the predictions of the model?

Minor comments:
-In Figure 1B, the number of patients with bone mets does not match the percentages shown in Figure 1D.
-In Figure 4G, the y-axis legend is a bit confusing. Lower ctDNA fraction is associated with a better PSA response (more patients with more than 50% PSA reduction); however, this is not completely clear from the representation shown here.

Reviewer #5 (Remarks to the Author):
In this well-written manuscript, the authors provide a detailed analysis of the clinical determinants of circulating tumor DNA fraction (ctDNA%) and its utility for prognostication using a large metacohort of 491 mCRPC patients. Using this data, they then develop a machine learning-based tool that predicts the likelihood that a patient will have a sufficiently high ctDNA% for clinically informative ctDNA genotyping. The study highlights ctDNA% as a validated tool for patient risk stratification, and provides a potentially clinically useful web-based tool for estimating ctDNA% using clinical parameters, prior to actual ctDNA testing.
The authors have done an excellent job responding to the Reviewer comments. In particular, they are to be commended for performing validation studies of their ctDNA% prediction model with two additional external cohorts. The revised text also nicely clarifies the intended clinical application of their machine learning-based ctDNA% prediction tool.
Line 279; Fig. 4A and 4D - The labeling of the X-axis is not clear and is not consistent with the text in the Figure Legend. Should this be "Time from 1L therapy initiation to death" rather than "mCRPC diagnosis to death"?
Lines 426-429; Table 1 - Please add race as a clinical characteristic and provide a % breakdown of the racial distribution of patients within the metacohort and validation cohorts. The authors have already described the predominance of European ancestry patients as a limitation of their study, but it would still be useful to have the actual percentages presented in the Tables.
Reviewer #6 (Remarks to the Author): The manuscript by Fonseca et al. describes the development of a novel tool for decision-making on whether cfDNA-based genotyping is viable based on the probability of ctDNA detection. In addition, they demonstrate the prognostic value of ctDNA levels in patients with advanced prostate cancer. The latter, while not entirely novel, underscores previous work using a large meta-cohort, in which multiple confounders could be evaluated to demonstrate ctDNA% as an independent predictive biomarker.
Overall, the manuscript is very clear, and the study and analyses are well described.
A major comment relates to Figure 6, which summarises the potential clinical pipeline using the developed tool. However, from the figure, it is not apparent that the ctDNA prediction tool is the 2nd step in the decision-making process. The fact that the arrow goes from patients to the tool implies that it is the 1st step.
I just have some minor suggestions and queries:
1. While the changes to 'ctDNA biomarker genotyping' provide some clarity, they are not always appropriate, such as in this sentence on page 3: "Excitingly, ctDNA% is increasingly reported on commercially-available ctDNA biomarker genotyping tests 23, meaning that ctDNA%-prognostication is poised to rapidly influence patient management pending its clinical validation."
2. Page 7: 'credential' is not a verb.
3. Methods: There are no details on the supplier companies, country, etc., which are expected as per most journal guidelines.
4. Methods, page 17: The way blood processing is described, it sounds like only Streck-derived plasma samples were re-spun and used for buffy coat collection.
5. Methods, page 18: when referring to "ctDNA% = 2/(1/VAF + 1)" and also "ctDNA% = 2 − VAF⁻¹", I believe you are implying an adjustment factor rather than the ctDNA% itself. Please describe this clearly. Otherwise, irrespective of VAF, all ctDNA values will fall below 2.

Response to Reviewers
Reviewer #4 (Remarks to the Author): In this study the authors explore the prognostic role of ctDNA in metastatic prostate cancer by analyzing a large cohort of patient samples and, more interestingly, develop a tool to predict ctDNA fraction based on several clinical parameters.
The work follows an elegant approach, with a uniform and harmonized analysis pipeline, which is well described in the Methods section and Supplementary Figures. The large cohort analyzed gives strength to the conclusions; however, it is important to recall that most of the samples used here had already been analyzed in previous publications from the same group ( The strength of the study is that, by performing a meta-analysis with a large cohort of patients (previously reported cohorts and new patients) and associating ctDNA with multiple clinical features, the authors delivered a powerful prediction tool to calculate ctDNA fraction, which might be used for biomarker studies and patient stratification in the clinic. The manuscript needs to extend its focus beyond the prognostic role of ctDNA, as this aspect lacks a significant conceptual advancement. Instead, it should center its narrative around the remarkable novelty that their AI tool potentially introduces to the clinical application of ctDNA testing. I have several comments/concerns.

Major comments:
-The prognostic association found for ctDNA is an important result due to the large cohort size presented here; however, the majority of the samples had already been used for this purpose in previous studies. I do not see the novelty of this part of the study beyond the metadata analysis and data analysis harmonization between multiple cohorts. In this sense, centering the title and the narrative of the work on the prognostic role of ctDNA in metastatic prostate cancer diminishes the importance of other novel aspects of the work, such as the predictive model developed here.

Thank you for your thorough appraisal of our manuscript and helpful suggestions.
We recognize that 59% of patients in our study have been analyzed in prior clinical trial publications that touched in part on the prognostic implications of ctDNA%. However, it is important to emphasize that we have provided updated clinical outcomes for consenting trial patients (e.g. median follow-up of 20.3 months (range: 0.4-81.6) in our metacohort versus only 12.9 months (range: 0-32.1) in Annala et al. 2018), enhancing the statistical maturity of our ctDNA% outcome analyses relative to these original trial publications. Our large metacohort has also enabled us to perform new analyses previously not possible due to smaller cohort sizes, such as searching for interaction effects between sets of risk-stratification variables, exploring non-linear relationships (e.g. between ctDNA% and risk of death), and looking across different lines of treatment.
To emphasize the novelty of our ctDNA%-prediction tool, we featured this result in our original Abstract and devoted a substantial portion of the Introduction to providing context for this advancement. In our revised submission, we have adjusted our manuscript title to more directly acknowledge the novelty of our ctDNA% prediction tool:
• ORIGINAL: "Enhanced prognostication of advanced prostate cancer using ctDNA fraction"
• NEW: "Prediction of plasma ctDNA fraction and prognostic implications of liquid biopsy in advanced prostate cancer"
-The authors acknowledge in the Discussion that the effect of other therapies, such as taxane-based chemotherapy, could not be investigated in detail in this metacohort. However, from the data presented here it seems that patients exposed to taxanes experienced an increase in both cfDNA concentration and ctDNA fraction (in Figure 1L, mCSPC patients with taxane intensification have higher ctDNA; in Figure 1F-H, patients in later lines of treatment tend to have more cfDNA and ctDNA), which makes sense since: 1) taxane-based chemotherapy induces apoptosis, which is the major source of cfDNA release; and 2) patients with more advanced disease, in later lines of treatment, are more frequently treated with chemotherapy. Therefore, it would be worthwhile to explore and add to the manuscript some figures (such as the ones included in the NMed rebuttal letter) reflecting the effect that ARPI or taxanes have on cfDNA concentration, ctDNA concentration, and ctDNA fraction. How could this affect the prediction model?
While the Reviewer raises a very intriguing and relevant question about the effect of prior treatment on ctDNA%, we feel that the way taxane chemotherapy is applied in clinical populations unfortunately precludes a robust analysis of potential interaction effects. Below are two examples to illustrate these limitations:
1. In our cohort, which accrued over a period when treatment intensification was not uniformly administered, mCSPC treatment intensification with docetaxel was typically offered only to patients with highly clinically aggressive disease (as evaluated by their treating physician), whereas patients with more indolent disease were offered ADT monotherapy. This selection bias makes it challenging to determine whether differences in prior treatment exposure affect tumor-intrinsic or -extrinsic determinants of ctDNA% at subsequent timepoints (and consequently the value of ctDNA% for prognosticating outcomes on future lines of therapy). Overall, the apparent correlation between prior mCSPC taxane intensification and baseline mCRPC ctDNA% is likely entirely explained by clinical circumstance rather than a genuine biological interaction between treatment and ctDNA% dynamics.
2. Most patients in our metacohort who were treated with first line taxane in the mCRPC setting were enrolled on the OZM-054 trial (NCT02254785), whose enrollment criteria enriched for poor prognosis features (in contrast with the other two cohorts that collectively comprise our study population).Therefore, differences in ctDNA% (measured at baseline or progression) among mCRPC patients exposed to 1L taxane chemotherapy are largely attributable to the confounder of patient/trial selection.
Importantly, these challenges of patient selection bias generalize to all analyses investigating interactions between ctDNA% and treatment exposure, independent of timepoint and treatment context. Testing whether specific prior treatment exposure affects subsequent ctDNA% levels ultimately requires analysis of a prospectively enrolled and clinically standardized patient population that is randomly allocated to different treatments. Our metacohort lacks these necessary conditions. By contrast, treatment exposure within our metacohort is entirely influenced by physician choice, clinical circumstance, and/or enrollment in trials testing different SOC regimens (i.e. non-randomized treatment allocation, exemplified in #1 above). In addition, our metacohort includes three distinct patient populations selected using disparate clinical inclusion criteria (i.e. clinically non-standardized, exemplified in #2 above). It is not possible to completely control for these significant sources of bias retroactively.
We have expanded on these limitations in our revised Discussion (new text underlined):

"Fourth, our study contained relatively few patients receiving first- or second-line taxane chemotherapy, with most chemotherapy-treated patients sourced from a single clinical trial enriched for poor prognosis features 1. Small numbers and risk of selection bias precluded examination of potential interactions between treatment class (e.g. ARPI versus taxane) and ctDNA% as a prognostic biomarker. It is plausible that differences in prior treatment exposure may modulate tumor-intrinsic or -extrinsic determinants of ctDNA% at future timepoints, as well as the effect size of ctDNA% for prognosticating subsequent lines of therapy. Furthermore, ctDNA% may have subtly varying prognostic significance for different classes of subsequent treatment (i.e. it may be a predictive biomarker). Analysis of large clinically-standardized randomized cohorts will be required to uncover potential interactions between drug class (and/or mechanism of action) and ctDNA%-based prognostication. Importantly, the prognostic or predictive implications of ctDNA% remain largely undefined in the context of recent additions to the mCRPC therapeutic armamentarium (e.g. PARP inhibitors and Lutetium-177-PSMA-617 radioligand therapy)."
With respect to our machine-learning tool: for the reasons outlined above, we decided not to include prior treatment exposure as an input feature in our ctDNA% prediction model.
As one would expect, cfDNA concentration and ctDNA fraction tend to be higher in 2L and 3L samples (Figure 1F-H); however, the authors show that the prediction model also works efficiently in this context. How does the model account for ctDNA variance due to different therapies or previous lines of treatment if these variables were not incorporated in the model?
As described in our previous response, we omitted prior treatment as a model input feature to mitigate the possibility of XGBoost incorporating incidental and non-generalizable training cohort characteristics into its predictions. This has the additional important benefit of future-proofing our tool against changes in SOC for metastatic prostate cancer.
In our metacohort, differences in average ctDNA% and cfDNA concentration between mCRPC treatment lines were extremely modest (effect size for ctDNA%: η² = 0.006 [anything <0.01 is considered very small]; p=0.03, Kruskal-Wallis one-way analysis of variance). This is compatible with our observation that per-patient ctDNA% remains relatively stable across successive lines of therapy. Nevertheless, we believe that the differences in ctDNA% per treatment line are attributable to the fact that later-stage mCRPC tends to be more clinically aggressive and/or have a higher volume of metastatic disease. This can be appreciated in a new Supplementary Figure (shown below) describing the per-line distributions of additional clinical prognostic markers PSA, LDH, ALP, albumin, and hemoglobin, which roughly mirror those of ctDNA% (i.e. higher ctDNA% correlating with an enrichment for poor prognostic factors).
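For readers who want to see how an η² of this kind relates to the Kruskal-Wallis statistic, a minimal pure-Python sketch follows. It assumes the commonly used H-based estimator η² = (H − k + 1)/(n − k), with k groups and n observations; the response does not state which η² estimator was used, and all function names and toy values here are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def eta_squared_h(h, n_groups, n_total):
    """Assumed H-based effect size: eta^2 = (H - k + 1) / (n - k)."""
    return (h - n_groups + 1) / (n_total - n_groups)


# Hypothetical toy ctDNA fractions for two treatment lines (not study data).
groups = [[0.01, 0.05, 0.12], [0.20, 0.33, 0.41]]
h = kruskal_wallis_h(groups)
print(round(h, 3), round(eta_squared_h(h, len(groups), 6), 3))  # 3.857 0.714
```

Because H grows with sample size for a fixed effect size, a very small η² can still accompany a nominally significant p-value in a large cohort, which is the distinction the response draws.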

New Results text linked to the Figure above: "cfDNA concentration was similarly correlated with most aforementioned clinical factors, although the effect size was weaker relative to ctDNA% (Supplementary Fig 4; Fig 2c)."
-Previous studies showed that the amount of input blood (ml of plasma) used for cfDNA isolation could affect cfDNA concentration yield (Alborelli et al., 2019, Cell Death Dis.). Could this also affect ctDNA fraction? If so, how could this affect the predictions of the model?
The study the Reviewer is referring to (Alborelli et al., 2019, Cell Death Dis.; Figure 1b-c
Minor comments:
-In Figure 1B, the number of patients with bone mets does not match the percentages shown in Figure 1D.
Thank you for pointing this out. Most metacohort patients were evaluated for presence/absence of bone lesions, including patients providing cfDNA samples in the second- and third-line context (Figure 1D). However, we only reviewed imaging data to enumerate bone lesions for patients at first line (shown in Figure 1B), explaining the apparent discrepancy between these two Figures.

If helpful to the Reviewer, in our original submission we included a Supplementary Table detailing the extent of missing clinical annotation for all clinical fields per line of treatment, including variables of 'bone metastases (presence/absence)' and 'Number of bone metastases'.
-In Figure 4G, the y-axis legend is a bit confusing. Lower ctDNA fraction is associated with a better PSA response (more patients with more than 50% PSA reduction); however, this is not completely clear from the representation shown here.
We have adjusted our Methods to clarify how best PSA response was calculated (new text underlined): "PSA response was defined as ≥50% PSA decline from the baseline pretreatment measurement, calculated using the on-treatment PSA nadir (standard PCWG2 criteria)".
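The response definition quoted above can be sketched as follows, assuming each patient has one baseline PSA value and a list of on-treatment PSA values. The function names and example values are hypothetical, and the sketch implements only the nadir-based ≥50% comparison stated in the Methods text; it does not model any other timing or confirmation rules.

```python
def best_psa_change(baseline_psa, on_treatment_psa):
    """Fractional change from baseline to the on-treatment PSA nadir."""
    if baseline_psa <= 0:
        raise ValueError("baseline PSA must be positive")
    if not on_treatment_psa:
        raise ValueError("need at least one on-treatment PSA value")
    nadir = min(on_treatment_psa)
    return (nadir - baseline_psa) / baseline_psa


def is_psa50_response(baseline_psa, on_treatment_psa):
    """PSA response: a decline of at least 50% from baseline at the nadir."""
    return best_psa_change(baseline_psa, on_treatment_psa) <= -0.5


# Hypothetical values in ng/mL (not study data).
print(is_psa50_response(100.0, [80.0, 45.0, 60.0]))  # True: nadir is 55% below baseline
print(is_psa50_response(100.0, [120.0, 70.0]))       # False: best decline is 30%
```

Multiplying `best_psa_change` by 100 gives the percent change from baseline, the quantity usually plotted when best PSA response is shown per patient.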
The Figure 4g
Reviewer #5 (Remarks to the Author): In this well-written manuscript, the authors provide a detailed analysis of the clinical determinants of circulating tumor DNA fraction (ctDNA%) and its utility for prognostication using a large metacohort of 491 mCRPC patients. Using this data, they then develop a machine learning-based tool that predicts the likelihood that a patient will have a sufficiently high ctDNA% for clinically informative ctDNA genotyping. The study highlights ctDNA% as a validated tool for patient risk stratification, and provides a potentially clinically useful web-based tool for estimating ctDNA% using clinical parameters, prior to actual ctDNA testing.
The authors have done an excellent job responding to the Reviewer comments. In particular, they are to be commended for performing validation studies of their ctDNA% prediction model with two additional external cohorts. The revised text also nicely clarifies the intended clinical application of their machine learning-based ctDNA% prediction tool.
Thank you for the supportive comments and the helpful feedback.

I have the following additional minor comments:
Line 209; Fig. 3F - Can the authors speculate on the potential nature of the additional patient- or tumor-specific determinants of ctDNA% that are not included in their models?
ProBio largely focuses on the first- and second-line mCRPC setting, and cohort clinical characteristics will be described in further detail in an upcoming publication.
Line 266 -Please delete the extra period.
We have corrected this typo.
Line 279; Fig. 4A and 4D -The labeling of the X-axis is not clear and is not consistent with text in the Figure Legend.Should this be "Time from 1L therapy initiation to death" rather than "mCRPC diagnosis to death"?
We have now corrected the Figure 4A and 4D
We have now modified our limitation sentence to point out that we did not collect race as a characteristic: "Finally, although we did not collect self-reported race or other measures of patient genetic background, based on the demographics of the jurisdictions contributing to our metacohort and validation cohorts we can assume that patients were primarily of European ancestry."
Reviewer #6 (Remarks to the Author): The manuscript by Fonseca et al. describes the development of a novel tool for decision-making on whether cfDNA-based genotyping is viable based on the probability of ctDNA detection. In addition, they demonstrate the prognostic value of ctDNA levels in patients with advanced prostate cancer. The latter, while not entirely novel, underscores previous work using a large meta-cohort, in which multiple confounders could be evaluated to demonstrate ctDNA% as an independent predictive biomarker.
Overall, the manuscript is very clear, and the study and analyses are well described.
A major comment relates to Figure 6, which summarises the potential clinical pipeline using the developed tool. However, from the figure, it is not apparent that the ctDNA prediction tool is the 2nd step in the decision-making process. The fact that the arrow goes from patients to the tool implies that it is the 1st step.
Thank you for the time you have taken to perform this peer-review task and for the constructive feedback.
Our rationale for placing the ctDNA%-prediction tool first in the workflow was that, in the event that routine ctDNA testing is available, ctDNA.org can help users determine whether to pursue simultaneous tissue genotyping in case predicted ctDNA% is low. ctDNA testing provides important prognostic information (in the form of ctDNA%) regardless of adequacy for genotyping, and therefore should ideally be performed for all patients if feasible.
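The decision logic described above can be summarized in a few lines. This is a hypothetical sketch, not the ctDNA.org implementation: it assumes ctDNA testing is available, that the tool's output can be treated as a probability that ctDNA% is sufficient for informative genotyping, and it uses an arbitrary illustrative cutoff, since the text does not specify one.

```python
from dataclasses import dataclass


@dataclass
class TestingPlan:
    ctdna_test: bool
    tissue_genotyping: bool


def plan_biomarker_testing(prob_sufficient_ctdna, low_prob_cutoff=0.5):
    """Hypothetical workflow sketch, assuming routine ctDNA testing is available.

    prob_sufficient_ctdna: predicted probability that ctDNA% is high enough
        for informative genotyping (assumed form of the tool's output).
    low_prob_cutoff: illustrative threshold only; not taken from the text.
    """
    if not 0.0 <= prob_sufficient_ctdna <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return TestingPlan(
        # ctDNA% carries prognostic information regardless of genotyping adequacy.
        ctdna_test=True,
        # Low predicted ctDNA%: also pursue tissue genotyping in parallel.
        tissue_genotyping=prob_sufficient_ctdna < low_prob_cutoff,
    )


print(plan_biomarker_testing(0.2))  # TestingPlan(ctdna_test=True, tissue_genotyping=True)
print(plan_biomarker_testing(0.9))  # TestingPlan(ctdna_test=True, tissue_genotyping=False)
```

The prediction step does not gate the ctDNA test itself; it only decides whether tissue genotyping should run alongside it.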
However, we recognize that this intended decision tree was not sufficiently clear in the original Figure 5. We have now adjusted Figure 5:
I just have some minor suggestions and queries: 1. While the changes to 'ctDNA biomarker genotyping' provide some clarity, they are not always appropriate, such as in this sentence on page 3: "Excitingly, ctDNA% is increasingly reported on commercially-available ctDNA biomarker genotyping tests 23, meaning that ctDNA%-prognostication is poised to rapidly influence patient management pending its clinical validation."
We've updated the above sentence to "Excitingly, ctDNA% is increasingly reported on commercially-available tests that genotype ctDNA to determine treatment-predictive biomarker status, meaning that ctDNA%-prognostication is poised to rapidly influence patient management pending its clinical validation". We also checked the manuscript and made one further text clarification related to ctDNA genotyping.

Page 7: 'credential' is not a verb
We have modified this sentence to the following: "Our data, together with prior smaller studies, authenticate ctDNA%..."
3. Methods: There are no details on the supplier companies, country, etc., which are expected as per most journal guidelines.
We have added these details to the methods.

Methods, page 17:
The way blood processing is described, it sounds like only Streck-derived plasma samples were re-spun and used for buffy coat collection.
We have updated these sentences, thank you.
In general, ctDNA fraction (i.e. the proportion of total cfDNA that is tumor-derived) is calculated from the population prevalence of one or more somatic features detected in cfDNA. The formulae on page 18 are not adjustment factors; rather, they are the mathematical relationships between ctDNA fraction and the VAF of alterations directly measured in cfDNA (which are exploited to infer ctDNA fraction). These formulae represent standard approaches for calculating ctDNA fraction from targeted sequencing data and have been utilized in established bioinformatic software 7,8 and prior papers 9-13.
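To make the direction of these formulae concrete, a minimal sketch follows. The genomic context attached to each formula is our assumption, obtained by inverting simple copy-number models for ctDNA fraction f (VAF = f/(2 − f) for a somatic mutation present in one copy per tumor cell, and VAF = 1/(2 − f) for the retained allele of a heterozygous germline SNP in a region of single-copy tumor loss); the text itself does not state these contexts, and the function names are hypothetical. For VAFs in the valid ranges, both results stay between 0 and 1.

```python
def ctdna_from_mutation_vaf(vaf):
    """ctDNA fraction from a somatic mutation VAF: 2 / (1/VAF + 1).

    Assumed context (not stated in the text): the mutant allele is present
    in one copy per tumor cell, against a diploid normal background, so
    VAF = f / (2 - f) for ctDNA fraction f.
    """
    if not 0.0 < vaf <= 1.0:
        raise ValueError("VAF must lie in (0, 1]")
    return 2.0 / (1.0 / vaf + 1.0)


def ctdna_from_retained_snp_vaf(vaf):
    """ctDNA fraction from a heterozygous germline SNP: 2 - 1/VAF.

    Assumed context: VAF of the retained allele in a region of single-copy
    loss in the tumor, so VAF = 1 / (2 - f).
    """
    if not 0.5 <= vaf <= 1.0:
        raise ValueError("retained-allele VAF must lie in [0.5, 1]")
    return 2.0 - 1.0 / vaf


# Both results are fractions in [0, 1]; multiply by 100 for a percentage.
print(round(ctdna_from_mutation_vaf(0.25), 3))       # 0.4
print(round(ctdna_from_retained_snp_vaf(0.625), 3))  # 0.4
```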
We recognize that one possible source of confusion in these formulae is our use of the abbreviation 'ctDNA%' to refer to ctDNA fraction rather than ctDNA percentage (as the abbreviation erroneously suggests). In other words, ctDNA% in these formulae refers to a quantity between 0 and 1 (rather than between 0% and 100%). We adopted this nomenclature for brevity in describing ctDNA fraction throughout the manuscript, and have modified our Methods to clarify (new text underlined):


Since prior treatment exposure is tied to irrelevant factors related to metacohort composition, a gradient-boosting algorithm could memorize this information and its biases, limiting the model's generalizability to other patient populations. Fortunately, if prior choice of therapy was influenced by perceived patient prognosis and disease aggression, this information would already be captured by the model's input features and incorporated into the prediction (since our model leverages direct measurements of prognosis, e.g. LDH, ALP, ECOG, PSA). Similarly, if prior treatment exposure improves overall prognosis for subsequent lines of treatment, we would also expect this to be captured by the prognostic markers already utilized by our model. The only scenario in which it would be necessary to include prior therapy as a model input feature would be if prior treatment decouples the correlation between established prognostic markers and ctDNA%, which is not currently known and can only be discovered using a randomized design. Incidental to the discussion above, and for the Reviewer's interest, it is worth noting that treatment-induced tumor cell apoptosis likely does not influence ctDNA% at the timepoints measured in our study. While it is plausible that effective anticancer therapy may cause a transient spike in ctDNA% in the minutes to hours after treatment initiation (N.B. this has not been demonstrated), ctDNA% rapidly declines within days (refs. 2-5),

There are numerous additional variables that could hypothetically affect patient ctDNA% that are not accounted for by our XGBoost prediction model. ctDNA% is thought to mostly reflect total tumor burden and innate tumor-cell properties (e.g. proliferative capacity and therefore clinical aggression), but can also be modulated by a variety of tumor-extrinsic physiological factors (N.B. many of these factors are probably impossible to assess a priori). The features incorporated into our 18-variable model were mainly metrics of tumor burden (e.g. number of bone lesions, LDH, ALP) and prognosis, since these are relatively easily measured. It is possible that inclusion of features that more directly measure tumor cell proliferation may improve the accuracy of model ctDNA% prediction. For example, variables such as tumor metabolic activity (e.g. total lesion glycolysis via [18F]FDG-PET/CT) or the percentage of tumor nuclei positive for Ki-67 could serve as surrogates for tumor cell proliferation. Additionally, it may be relevant to explore whether the local tumor microenvironment constrains ctDNA release. For example, ctDNA release from metastases with a high degree of macrophage infiltration may be comparatively limited (i.e. due to immune cell-mediated phagocytosis preventing release of post-apoptotic tumor cell detritus into circulation). Finally, it is likely that certain genomic alterations (as indicators of tumor aggression) may also impact ctDNA%. Although our models already incorporated metrics of total tumor burden and anatomic involvement, we would posit that more granular evaluation of these variables would further improve ctDNA% prediction accuracy. New next-generation imaging tools (e.g. [68Ga]PSMA-PET/CT and [18F]FDG-PET/CT in prostate cancer) can provide highly quantitative estimates of total tumor volume, outperforming the conventional imaging analysis utilized in this study.
We have added a new line to our Discussion to comment on other possible modulators of ctDNA%: "New studies investigating additional determinants of ctDNA% should utilize next-generation targeted imaging (e.g. [68Ga]PSMA-PET/CT in prostate cancer) for more precise quantification of disease burden and location, as well as investigate the potential relevance of tumor cell proliferation indicators (e.g., Ki-67-positive tumor nuclei or total lesion glycolysis) and microenvironmental factors (e.g., tumor vascularization, macrophage infiltration) to ctDNA%."
Lines 239-241 -Please provide Tables showing the summary clinical characteristics of the two external prospective mCRPC datasets used for validation.

characteristics for the OPT/ILU validation cohort (representing a pooled analysis of patients from the OPTIMUM (NCT02426333) and ILUMINATE (NCT02471469) prospective trials) have been published previously; see Table 1 from Tolmeijer et al., Clinical Cancer Research 2023; PMID: 36996325 (ref. 2). To avoid duplication of previously published data, we have amended our Results text to explicitly refer readers to this publication:
• Results (new text underlined): "We validated the performance of our parsimonious 8-feature model in two external prospective mCRPC datasets collectively including 391 patients with first-line mCRPC, achieving similar AUCs for predicting ctDNA≥2% of 0.76-0.78 (Methods; Fig 3g-h, Supplementary Fig 4, Supplementary Table 6). Patient clinical characteristics for one of the two validation cohorts (n=81 patients) have been published previously (Tolmeijer et al., Clinical Cancer Research 2023)."
For the second validation cohort (ProBio trial: NCT03903835), select patient demographic details can be gleaned from the trial clinical inclusion/exclusion criteria, available at https://clinicaltrials.gov/study/NCT03903835, and from two recent publications dissecting the trial design (Crippa et al., 2020, PMID: 32586393; De Laere et al., 2022, PMID: 35317973).
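The AUCs quoted above measure how well predicted probabilities separate patients with observed ctDNA ≥2% from those below it. A minimal sketch of that calculation follows, using the standard equivalence between ROC AUC and the Mann-Whitney probability that a randomly chosen positive outranks a randomly chosen negative. It assumes observed ctDNA% is stored as a fraction (so 2% is 0.02); function names and toy values are hypothetical and do not reproduce the authors' pipeline.

```python
CTDNA_CUTOFF = 0.02  # "ctDNA >= 2%" expressed as a fraction (assumption)


def auc_mann_whitney(scores, labels):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_for_ctdna_cutoff(predicted_prob, observed_ctdna, cutoff=CTDNA_CUTOFF):
    """AUC of predicted probabilities against the observed label ctDNA >= cutoff."""
    labels = [f >= cutoff for f in observed_ctdna]
    return auc_mann_whitney(predicted_prob, labels)


# Hypothetical toy values, not data from either validation cohort.
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0.01, 0.00, 0.15, 0.40]
print(auc_for_ctdna_cutoff(predicted, observed))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is fine for illustration; a rank-based formulation gives the same value more efficiently.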

Table 1 - Please add race as a clinical characteristic and provide a % breakdown of the racial distribution of patients within the metacohort and validation cohorts. The authors have already described the predominance of European ancestry patients as a limitation of their study, but it would still be useful to have the actual percentages presented in the Tables.