Abstract
Cancer-associated venous thromboembolism (VTE) is a major source of oncologic cost, morbidity and mortality. Identifying high-risk patients for prophylactic anticoagulation is challenging and adds to clinician burden. Circulating tumor DNA (ctDNA) sequencing assays (‘liquid biopsies’) are widely implemented, but their utility for VTE prognostication is unknown. Here we analyzed three plasma sequencing cohorts: a pan-cancer discovery cohort of 4,141 patients with non-small cell lung cancer (NSCLC) or breast, pancreatic and other cancers; a prospective validation cohort consisting of 1,426 patients with the same cancer types; and an international generalizability cohort of 463 patients with advanced NSCLC. ctDNA detection was associated with VTE independent of clinical and radiographic features. A machine learning model trained on liquid biopsy data outperformed previous risk scores (discovery, validation and generalizability c-indices 0.74, 0.73 and 0.67, respectively, versus 0.57, 0.61 and 0.54 for the Khorana score). In real-world data, anticoagulation was associated with lower VTE rates if ctDNA was detected (n = 2,522, adjusted hazard ratio (HR) = 0.50, 95% confidence interval (CI): 0.30–0.81); ctDNA− patients (n = 1,619) did not benefit from anticoagulation (adjusted HR = 0.89, 95% CI: 0.40–2.0). These results provide preliminary evidence that liquid biopsies may improve VTE risk stratification in addition to clinical parameters. Interventional, randomized prospective studies are needed to confirm the clinical utility of liquid biopsies for guiding anticoagulation in patients with cancer.
Main
Venous thromboembolism (VTE) is a major source of healthcare cost1, morbidity2 and mortality in patients with cancer3. Prophylactic anticoagulation lowers the risk of VTE4,5,6,7,8, although defining the patients most likely to benefit is challenging. Expert guidelines9,10 recommend offering prophylaxis based on the Khorana score (KS)11, a validated cancer VTE risk stratification measure based on hematologic and clinical parameters. However, of patients with a high-risk KS, 10% or fewer develop VTE6,7; many patients receive unnecessary anticoagulation with KS-guided prophylaxis. Furthermore, many patients who develop VTE do not have a high-risk KS12,13,14. Models adding clinical features15,16 based on large observational datasets, such as a recently published risk assessment model (RAM)15, may increase sensitivity and specificity, although these gains are modest17. Furthermore, such risk scores require provider assessment or electronic health record (EHR) integration; thus, most providers do not assess patients for VTE risk18,19. Genetic20,21, microparticle22 and proteomic23 approaches show promise in risk-stratifying patients for VTE but, to date, are not readily deployed in practice. An accurate, easily integrated VTE risk stratification system would be helpful for identifying which patients would benefit from prophylactic anticoagulation or its de-escalation.
Circulating tumor DNA (ctDNA) sequencing assays (‘liquid biopsies’ (LBs)) are increasingly deployed in clinic, with multiple US Food and Drug Administration (FDA) approvals for matching to molecularly targeted therapy24. In patients already receiving ctDNA sequencing, an LB-based VTE risk score, if prognostically valid, could, thus, be provided without additional overhead to the patient or clinician. Preliminary data suggest that cell-free DNA (cfDNA), which may consist of tumor or wild-type DNA, is thrombogenic, at least in part due to its association with neutrophil extracellular traps (NETs)25,26. ctDNA detection is associated with worse survival likely due to more aggressive tumor physiology, although whether it is associated with VTE is unknown27,28,29,30,31. We performed an observational study in non-overlapping cohorts of patients with cancer undergoing ctDNA sequencing with two main goals: (1) to determine whether ctDNA is associated with VTE and (2) to develop and test whether DNA LB-based machine learning models can predict VTE.
Results
ctDNA and VTE
We studied three cohorts: a discovery cohort (n = 4,141) and a prospective validation cohort (n = 1,426) of patients with ctDNA sequencing at Memorial Sloan Kettering (MSK) with any cancer type and sequencing using a New York State-approved assay (MSK-ACCESS32) and a generalizability cohort (n = 463) of patients with advanced non-small cell lung cancer (NSCLC) at MSK and GenesisCare, a community oncology setting in Sydney, Australia, sequenced using an FDA-approved commercial assay (ctDx Lung; Methods: ‘Cohort selection’ and Extended Data Fig. 1). Both assays are indicated for matching patients to molecularly targeted therapy, either in the treatment-naive setting or after progression of disease on previous treatment27.
Patient characteristics are presented in Table 1. A total of 464 (11%) patients in the discovery cohort, 118 (8%) patients in the validation cohort and 98 (21%) patients in the generalizability cohort developed VTE after plasma draw. Patients with 53 different cancer types were included (Supplementary Table 1). As expected33, NSCLC and pancreatic and hepatobiliary cancers were associated with higher VTE rates, whereas melanoma, breast and colorectal cancers were associated with lower VTE rates (Extended Data Fig. 2).
We performed time-to-VTE analyses with death as a competing risk, excluding patients with VTE before ctDNA sequencing. In the discovery cohort, ctDNA detection was associated with higher rates of VTE (hazard ratio (HR) = 2.49, 95% confidence interval (CI): 1.99–3.11, P < 0.001; Fig. 1a). Patients with higher variant allele fraction (VAF), that is, the proportion of plasma DNA attributable to tumor, had higher rates of VTE, suggesting a dose-dependent relationship between ctDNA and VTE risk (Fig. 1b). The association between ctDNA and VTE rates held in subgroup analyses for NSCLC and for melanoma, pancreatic and less represented cancers but not in bladder, hepatobiliary and colorectal cancers (Fig. 1c). In contrast, there was some evidence that the plasma cfDNA yield, which can consist of either tumor or wild-type DNA, was associated with higher VTE rates in all cancer subtypes (Extended Data Fig. 3).
Certain tumor genomic alterations may predispose to VTE34. To assess whether the association between ctDNA alterations and VTE risk was gene specific, we performed subgroup analyses in which we compared VTE risk between patients with specific pathogenic gene-level alterations observed in ctDNA and all other patients without the specific gene-level alteration in question detected in plasma. To account for the fact that cancer type is correlated with both tumor genotypes and VTE risk, cancer type was included as a variable in these analyses. Alterations in nearly all genes had some evidence for association with VTE, although known thrombogenic alterations, such as KRAS, STK11 and KEAP1, had more evidence for association with VTE rates (Fig. 1d). To further test whether the prognostic value of ctDNA may be attributable to tumor genomic content, we performed a sensitivity analysis among patients with matched tumor sequencing with an FDA-authorized targeted sequencing panel (MSK-IMPACT; n = 2,873). In this cohort, pathogenic gene-level alterations as confirmed on tissue sequencing, cancer type, disease stage and ctDNA detection in matched plasma sequencing were included as variables in a multivariate model to predict VTE risk. In this analysis, ctDNA detection was associated with higher VTE risk, whereas most gene-level alterations, including those in KRAS, STK11 and KEAP1, were not associated with VTE risk (Supplementary Table 2). Thus, the association between ctDNA and VTE risk appears largely independent of tumor genomics.
The validation and generalizability cohorts confirmed the association of ctDNA with VTE (Fig. 1e). To control for variability in time of plasma draw since diagnosis, we performed a sensitivity analysis measuring time to VTE from time of diagnosis, left truncating at time of plasma draw. Here, ctDNA detection was also associated with higher rates of VTE (Extended Data Fig. 4a). To assess the chronicity by which ctDNA associates with VTE, we performed a sensitivity analysis limited to patients still at risk 6 months after plasma draw. ctDNA was associated with higher rates of VTE 6 months after plasma draw (Extended Data Fig. 4b). The relationship between ctDNA and VTE was also observed when death was treated as a censoring event rather than a competing risk (Extended Data Fig. 4c).
Some patients (n = 537) had multiple plasma draws. Most patients who were ctDNA+ remained ctDNA+ on subsequent draws (odds ratio (OR) = 7.9, 95% CI: 5.3–11.7); however, a minority of patients did switch from ctDNA+ to ctDNA− and vice versa. Stratification by ctDNA detection in the first two plasma draws (median, 308 d apart) suggests that, in serial samples, later ctDNA measurements had greater association with VTE than earlier measurements (Extended Data Fig. 4d). Thus, ctDNA is a prognostic marker for VTE in multiple sensitivity analyses and dynamically reflects VTE risk.
ctDNA levels may associate with both future and prior risk of VTE. To test the latter association, we compared ctDNA levels among patients with versus without prior VTE in the discovery cohort; in those with prior VTE, ctDNA VAF was higher (Extended Data Fig. 5).
ctDNA VAF was correlated with known VTE-associated factors—that is, KS, cfDNA concentration and cytotoxic chemotherapy receipt (Extended Data Fig. 6). It is also known that ctDNA levels are associated with overall tumor burden35; as expected, ctDNA VAF was also correlated with number of disease organ sites (Extended Data Fig. 6). We assessed the independent contribution of these variables to VTE prediction using a multivariate model. ctDNA, cfDNA concentration, KS, chemotherapy receipt and number of disease sites were all independently associated with higher VTE rates (Fig. 2a). The trends in ctDNA and cfDNA as independent predictors of VTE held when the cohort was stratified by cytotoxic chemotherapy receipt, a known risk factor for VTE36 (Extended Data Fig. 7). To further analyze whether ctDNA’s association with VTE was independent of disease stage, we repeated our multivariate analysis in the subgroup of patients with only stage IV or stage I–III disease at diagnosis. In these analyses, ctDNA, cfDNA and chemotherapy receipt also remained independently associated with higher VTE rates stratified by stage at diagnosis (Extended Data Fig. 7).
It is possible that radiomic features, such as metabolic tumor volume (MTV)37,38,39,40, may better capture disease burden than the number of organs involved with cancer. We tested whether ctDNA remained an independent predictor of VTE in the presence of MTV and the aforementioned variables in a cohort of patients with stage IV, treatment-naive lung adenocarcinoma27. In this analysis, ctDNA remained an independent predictor of VTE (Extended Data Fig. 7). Thus, ctDNA appears to be associated with VTE independent of tumor burden and other features.
To further benchmark the utility of ctDNA VAF as a quantitative biomarker for VTE, we computed the time-dependent area under the receiver operating characteristic (AUROC) curve for VTE within 6 months as well as performance metrics at an optimal threshold (based on Youden’s index41). We compared these results to similar metrics for cfDNA concentration as well as KS and RAM. In this analysis, ctDNA VAF as a single variable had an AUROC of 0.66, greater than that of cfDNA concentration (0.52), KS (0.58) and RAM (0.61). At optimal thresholds, ctDNA VAF had a sensitivity of 0.83, a specificity of 0.44, a positive predictive value (PPV) of 0.60 and a negative predictive value (NPV) of 0.72 (metrics on all tested variables are available in Supplementary Table 3). In summary, ctDNA is a quantitative biomarker for VTE risk with unimodal predictive power greater even than other multifactorial scores, such as KS and RAM.
The independent relationship between cfDNA concentration and ctDNA VAF and VTE and death was also observed in canonical correlation analysis (Supplementary Discussion: ‘Canonical correlation analysis’ and Supplementary Table 4).
LB-based models for VTE
We hypothesized that machine learning models incorporating LB variables (that is, ctDNA VAF, genomic content and cfDNA concentration) would more accurately predict VTE risk than those without such parameters. A random survival forest (RSF; Methods: ‘Machine learning models’) trained on the discovery cohort including LB variables achieved a five-fold cross-validation c-index of 0.73 (95% CI: 0.71–0.75). Addition of cancer type and cytotoxic chemotherapy receipt (‘LB+’ model) achieved a five-fold cross-validation c-index of 0.74 (95% CI: 0.71–0.77); addition of demographic information, sites of disease, time since diagnosis and KS-related variables (‘All’ model; see Supplementary Information for details) achieved a c-index of 0.75 (95% CI: 0.72–0.78; Fig. 2b). By contrast, KS and RAM achieved a c-index of 0.57 (95% CI: 0.55–0.59) and 0.62 (95% CI: 0.58–0.66), respectively. Nonlinear machine learning models trained on KS and RAM components achieved greater performance than those baseline models but inferior performance to LB-based models (Fig. 2b). These trends in model performance held in patients not treated with chemotherapy or those starting a new chemotherapy regimen and across multiple cancer types (Fig. 2b and Supplementary Table 5). Thus, LB-based models outperformed KS and RAM in multiple settings.
Testing the LB+ model on the validation and generalizability cohorts resulted in a c-index of 0.73 and 0.67, respectively. The difference in performance in the generalizability dataset is likely attributable to the cancer type homogeneity; when only patients with NSCLC were included in discovery cohort cross-validation, the c-index for the ‘LB+’ model was 0.70 (95% CI: 0.64–0.74) (Fig. 2b and Supplementary Table 5). The c-index of KS was 0.54 in the generalizability cohort and 0.48 (95% CI: 0.45–0.51) in the NSCLC discovery cohort. Together, these results suggest the superiority of an LB-based model over KS and also highlight the sensitivity of the c-index and AUROCs more generally to cohort homogeneity42.
The time-dependent AUROC, precision and recall for 6-month VTE prediction between the ‘All’ and ‘LB+’ models did not differ (Fig. 2c). In contrast, KS had a lower AUC than the ‘LB+’ model (Fig. 2c). cfDNA concentration was the most important feature in predicting VTE in a model with access to all variables (Extended Data Fig. 8a). Risk scores exhibited a wide range within cancer types (Extended Data Fig. 8b) but effectively stratified VTE risk in all cohorts, with the highest-risk patients having a cumulative VTE incidence of over 25% (Extended Data Fig. 8c).
Differences in DNA extraction methods across assays may result in variable cfDNA yields. ctDx Lung samples had correlated cfDNA concentrations with matched MSK-ACCESS samples in our cohort (Extended Data Fig. 9). Adjusting RSF inputs based on a best-fit linear approximation between MSK-ACCESS and ctDx Lung cfDNA concentrations did not yield better model performance (Supplementary Table 5), suggesting that differences in cfDNA extraction methods did not significantly impact model results.
In summary, LB-based models outperformed KS and other clinical models for predicting VTE. A model including LB parameters and minimal clinical data performed similarly to a model including LB parameters and more extensive clinical variables.
Impact of cardiovascular medications by ctDNA strata
Patients may be prescribed anticoagulants for non-VTE-related reasons. In exploratory analysis using our discovery cohort, we sought to test whether ctDNA presence might help stratify patients most likely to benefit from anticoagulation using non-randomized, real-world evidence. Adjusting for age, cancer type and time since diagnosis, all of which may be associated with use of specific cardiovascular medications43, ctDNA+ patients prescribed anticoagulants had lower rates of VTE than those not prescribed anticoagulants (adjusted HR = 0.50, 95% CI: 0.30–0.81; Fig. 3a). In contrast, ctDNA− patients prescribed anticoagulants had no difference in VTE rates from those not prescribed anticoagulants (adjusted HR = 0.89, 95% CI: 0.40–2.0; Fig. 3b).
Statin use is associated with lower VTE rates in patients with cancer44,45. In the discovery cohort, statin prescription was associated with reduction in VTE rates in patients who were ctDNA+ but not in those who were ctDNA− (Extended Data Fig. 10a,b). In contrast, aspirin prescription, which has equivocal evidence as a VTE-reducing medication46, was not associated with lower VTE rates in either ctDNA+ or ctDNA− groups (Extended Data Fig. 10c,d).
Discussion
The association between cancer and thrombosis was recognized over 150 years ago47. The association between elevated plasma nucleic acid levels and cancer is 75 years old48. More recent studies have linked cfDNA to thrombotic risk25,26,49,50, but no large-scale study has confirmed the clinical validity of ctDNA-based VTE risk stratification.
In the present study, we leveraged the rapid advances of LBs in clinical practice35,51,52 to investigate the relationship among cfDNA, ctDNA and cancer-associated VTE. ctDNA was independently associated with VTE. The association between ctDNA and VTE is largely uncharacterized, although our findings align with those of a small study in patients with prostate cancer with plasma sequenced with a custom panel53. Predictive models leveraging machine learning54,55 based on LBs outperformed KS and other models, including RAM, trained only on clinical, radiographic and laboratory values. High-risk cohorts identified by our model had a cumulative VTE risk approaching 30%, three times higher than in patients with a high-risk KS, for whom current guidelines9,10 recommend anticoagulation. In ctDNA+ patients but not in ctDNA− patients, anticoagulation appeared to lower the risk of VTE, supporting prospective, randomized evaluation of ctDNA-guided prophylaxis or de-escalation.
cfDNA is a thrombogenic component of NETs25,26,49,50, and most cfDNA in LBs is attributable to neutrophils56. In our study, cfDNA was associated with higher VTE rates across cancer types, whereas ctDNA was associated with VTE in many cancer types but not in bladder, hepatobiliary or colorectal cancer. Our finding that ctDNA was not associated with VTE in colorectal cancer is, interestingly, corroborated by a preliminary study of 111 patients with locally advanced rectal cancer, in which no association between ctDNA detection and VTE was observed57.
Together, our findings suggest that ctDNA and non-tumor cfDNA may modulate the hypercoagulable state of malignancy by different means. Although cfDNA may contribute to NET-based coagulation, ctDNA shedding may reflect more aggressive tumor genetic and epigenetic states27,28,31,58 as well as the presence of micrometastatic disease59,60, which may also play a role in clotting61. Extracellular nucleosomes accompany ctDNA62 and, along with other intracellular tumor proteins, may contribute to coagulation63. Further studies including a variety of cancer types are required to determine which of these elements might play a role in the genesis of cancer-associated VTE. Why ctDNA may portend VTE in some cancer types and not in others is complex and deserves further investigation into both tumor biology and host immune response to said biology in shedding versus non-shedding tumors28.
Fewer than 5% of oncologists implement VTE risk assessment tools18. Although uptake can be improved with EHR integration18, such methods do not scale across hospital systems. Similarly, although machine learning models may increase prediction accuracy, particularly when nonlinear relationships exist between variables, such models can be more difficult to implement and interpret. An LB-based model, such as that presented here, may shift the onus of risk stratification from individual providers and hospital systems to assay distributors, for whom genomic sequencing reports and treatment recommendations are already standard deliverables. Because expert guidelines recommend LBs as therapy selection tools64, many patients could, thus, receive effective VTE risk stratification with an LB report with no additional overhead required of the patient or clinician.
VTE may also be a presenting symptom of cancer33,65. Trials to assess whether LBs may aid in the diagnosis of malignancy in patients with idiopathic VTE are ongoing66. Our finding that ctDNA is more frequently detected in patients with prior VTE than in those without further motivates a ctDNA-based approach to cancer screening in this population, although the optimal assay for this purpose is yet to be determined67.
Our study has limitations. Although LBs are increasingly deployed in clinic and are used as biomarkers for a growing number of clinical trials68, they are mostly used at present for targeted therapy matching in certain cancer types, limiting the immediate universality of our VTE prediction approach. DNA LBs are also not universally implemented, and, despite growing adoption across academic and community centers, whether they can and will be used on a global scale remains to be seen. The sequencing assay used in our discovery and validation cohorts used matched white blood cell sequencing to filter germline and clonal hematopoietic mutations. Empirically, we previously found that 11% of ctDx Lung mutations may be attributable to clonal hematopoiesis27; in broader panels without matched white blood cell sequencing, the rate of false-positive mutations may be higher, and this may falsely elevate VTE risk prediction. Our real-world analysis of previous medication administration and VTE rates has many potential confounders, including comorbidities leading to anticoagulation, statin or aspirin use. Randomized studies taking into account the evolving anticoagulation landscape6,7,69,70 are necessary before prophylactic anticoagulation based on LBs can be applied in clinic. Our population had sparse minority representation and a plurality of patients with advanced disease27; generalizability of results should be further studied. Other clinical, genetic21 and blood-based biomarkers22,23 may add orthogonal information and, thus, result in a superior risk model. Future studies integrating these modalities are necessary to determine how to best risk-stratify patients with cancer for VTE.
Overall, our findings suggest that, in patients with cancer, ctDNA is independently associated with higher rates of VTE in a quantitative manner. DNA LB-based models have the potential to predict VTE risk on a spectrum and to be deployed with minimal clinician burden. The use of LBs to guide anticoagulation in patients with cancer merits further validation.
Methods
Cohort selection
We studied three cohorts. (1) A discovery cohort and (2) a prospective validation cohort, both including patients with any cancer type and plasma MSK-ACCESS sequencing who were evaluated and treated at MSK, an academic cancer center. Patients in the discovery cohort had sequencing between 10 June 2019 and 15 September 2022 with follow-up until 31 October 2022, whereas those in the validation cohort had sequencing between 16 September 2022 and 30 September 2023 and follow-up until 31 October 2023. (3) A generalizability cohort was enrolled including patients with plasma sequenced by a different assay, ctDx Lung, and with diagnoses of stage IV or recurrent NSCLC treated at MSK or GenesisCare (Sydney, Australia), a community-based oncology practice, between 21 October 2016 and 1 November 2020. This cohort was followed until 31 August 2022 (ref. 27). ctDNA testing in all cohorts was administered at the provider’s discretion. Sample size was determined by the number of patients available at date of data cutoff, and no power calculation was used to determine sample size.
This study was independently approved by the institutional review boards (IRBs) of MSK and GenesisCare. MSK patients in all three cohorts were enrolled as part of a prospective observational biospecimen collection and sequencing protocol (NCT01775072). Patients from Sydney in the generalizability cohort were enrolled as part of a prospective observational biospecimen collection and sequencing protocol (‘Genomic profiling in cancer patients’, approved by the GenesisCare/Northern Cancer Institute IRB in 2017). Patients provided written informed consent and were enrolled in a continuous, non-random fashion. Patients were not compensated financially for participation in the study. Patients with prior cancer-associated VTE were excluded from VTE risk analyses (Extended Data Fig. 1). Events were defined as any new pulmonary embolism or lower extremity deep vein thrombosis event, whether incidental or symptomatic.
ctDNA sequencing
Details of the MSK-ACCESS32 and ctDx Lung (Resolution Bioscience, Exact Sciences)71 protocols were published previously. In short, both Clinical Laboratory Improvement Amendments (CLIA)-certified and New York State-approved methods use error-corrected, hybrid capture-based next-generation sequencing to detect mutant DNA in plasma with a validated VAF detection limit of 0.1–0.5% and the ability to call lower VAF mutations in cases of sufficient coverage. MSK-ACCESS and ctDx Lung probes cover 129 and 21 genes, respectively. MSK-ACCESS includes matched white blood cell sequencing to filter germline and clonal hematopoietic variants. ctDx Lung filtering was performed using germline databases as previously described27.
Statistical analysis
We studied time-to-VTE from time of first ctDNA plasma draw, right-censored at time of last follow-up using the Aalen–Johansen estimator with all-cause mortality as a competing risk. Subgroups were compared using cause-specific Fine–Gray regression72. Patients with detectable ctDNA (ctDNA+)—that is, any reported ctDNA mutation or copy number alteration—were compared to those without detectable ctDNA (ctDNA−) for rates of VTE. In exploratory analysis to assess the dose dependence of ctDNA on VTE risk, we created Aalen–Johansen survival curves with ctDNA+ patients grouped into one of four quartiles based on the ctDNA VAF.
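As an illustration of the estimator named above, the following is a minimal, standard-library sketch of an Aalen–Johansen cumulative incidence curve for one event type in the presence of a competing risk. It is not the study code (the authors report using the lifelines and cmprsk packages); the event coding (0 = censored, 1 = VTE, 2 = death without VTE), the function name and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen cumulative incidence for `cause` with competing risks.

    times  : follow-up time for each patient
    events : 0 = censored, 1 = VTE, 2 = death without VTE (assumed coding)
    Returns a list of (time, cumulative incidence) steps at each event time.
    """
    # Count how many patients had each outcome at each distinct time.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    n_at_risk = len(times)
    event_free = 1.0  # probability of no event of any cause before time t
    cif = 0.0
    steps = []
    for t in sorted(by_time):
        counts = by_time[t]
        d_cause = counts[cause]
        d_any = sum(c for e, c in counts.items() if e != 0)
        if d_any:
            # Increment = P(event-free just before t) * cause-specific hazard.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            steps.append((t, cif))
        # Everyone observed at t (event or censored) leaves the risk set.
        n_at_risk -= sum(counts.values())
    return steps


# Toy data: six hypothetical patients.
toy_times = [2, 3, 3, 5, 7, 8]
toy_events = [1, 2, 0, 1, 0, 2]
vte_curve = cumulative_incidence(toy_times, toy_events, cause=1)
```

Treating death as a competing event rather than as censoring keeps the VTE curve from being inflated by patients who died before they could develop VTE; in this toy example the VTE and death curves sum to 1 at the last event time because no patient remains at risk.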
We repeated Fine–Gray regression to test for the association of ctDNA with VTE in subgroups by cancer type and, separately, by pathogenically altered gene (as annotated by an FDA-recognized molecular knowledge database73) detected in ctDNA adjusted for cancer type given known associations with specific gene alterations and cancer type74. Using multivariate regression, we assessed the independent association with VTE of ctDNA detection, log10(cfDNA concentration), KS, receipt of cytotoxic chemotherapy within 30 d and number of sites of disease as annotated from radiology reports75,76.
To assess the association of anticoagulation with rates of future VTE in an exploratory manner, we performed Fine–Gray regression77 analyses adjusted for age, time since diagnosis and cancer type (variables thought a priori to be associated with anticoagulation use43), comparing those prescribed anticoagulants (apixaban, rivaroxaban, edoxaban, enoxaparin, warfarin, dalteparin, fondaparinux or dabigatran) at cohort entry to those not prescribed anticoagulants stratified by ctDNA detection.
Machine learning models
We created machine learning models (RSFs78,79) trained on the discovery cohort to predict risk of VTE from time of blood draw using the aforementioned variables as well as demographics, detected alterations at the gene level and specific organ sites of disease as inputs (details below). Models using subsets of the available variables were trained to assess the relative performance of models with certain data. Performance was assessed using Harrell’s c-index and discrimination of VTE events at 6 months using time-dependent80 AUROC, precision and recall. Models were assessed using five-fold cross-validation81. Final models trained on the entire discovery cohort were also tested in the entire held-out validation and generalizability cohorts.
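To make the discrimination metric concrete, the sketch below computes Harrell's c-index from scratch using only the standard library. It is an illustrative simplification, not the authors' implementation: the function name and toy data are hypothetical, patients without the event of interest are simply treated as censored, and pairs with tied follow-up times are skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had the event and patient j
    was still being followed after that time. The pair is concordant when
    the patient with the earlier event has the higher predicted risk; tied
    risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four hypothetical patients, one censored at time 3.
c = harrell_c_index(
    times=[1, 2, 3, 4],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the c-indices reported for KS, RAM and the LB-based models should be read.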
RSFs were trained using pre-assigned hyperparameters (n trees = 1,000; minimum n splits = 10; minimum n samples per leaf = 15). In exploratory secondary analyses, a random hyperparameter grid search to find ‘optimal’ hyperparameters using a 20% holdout for evaluation was conducted (n tree range, 200–2,000; minimum n splits range, 5–20; minimum n samples per leaf range, 5–30; n search iterations = 100; three-fold internal cross-validation for hyperparameter selection); a model trained on optimal hyperparameters did not yield better results (c-index ‘improvement’ of −0.01 using optimal versus pre-assigned hyperparameters). The following variables were considered (grouped by variable type):
KS components
Closest white blood cell count, hemoglobin, platelet count and body mass index (BMI) before plasma draw (continuous), receipt of chemotherapy within 30 d of plasma draw (binary) as well as the cancer types in Fig. 1c as one-hot encoded variables. For patients in whom no laboratory or BMI data were available, the cohort median was imputed (for Sydney patients in whom KS data were not available, this model was not tested).
Tumor sites
The following were encoded as one-hot variables based on tumor presence in a preceding radiology report: Brain, Bone, Liver, Lung, Pleura, Abdomen, Lymph, Adrenal and Other (for Sydney patients in whom data were not available, this model was not tested).
Demographics
White, Black or Asian race, sex (male or female), time since cancer diagnosis (continuous) in days and cancer type as above.
LB
log10 cfDNA concentration (continuous), log10 ctDNA VAF (continuous) and pathogenic alterations, as annotated by OncoKB, in any gene in the MSK-ACCESS panel altered in >5% of the entire cohort (one-hot encoded).
LB+
All variables in LB as well as cancer type and chemotherapy receipt.
All
All variables above.
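The variable groups above could be assembled into model inputs roughly as follows. This is a hedged, standard-library sketch rather than the study pipeline: the dictionary keys, function names and toy values are hypothetical, and the floor applied to ctDNA VAF before taking log10 (needed because ctDNA− samples have a VAF of zero) is an assumption, since the handling of zero values is not described in the text.

```python
import math
from statistics import median


def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def build_lb_features(patients, min_gene_frequency=0.05, vaf_floor=1e-4):
    """Build the 'LB' feature matrix from per-patient records.

    patients : list of dicts with hypothetical keys 'cfdna_conc',
               'ctdna_vaf' and 'altered_genes' (a set of gene symbols)
    vaf_floor: assumed floor so that log10 is defined when VAF is 0
    Returns (feature_names, rows).
    """
    n = len(patients)
    gene_counts = {}
    for p in patients:
        for gene in p["altered_genes"]:
            gene_counts[gene] = gene_counts.get(gene, 0) + 1
    # Keep only genes altered in more than the given fraction of the cohort.
    kept = sorted(g for g, c in gene_counts.items() if c / n > min_gene_frequency)

    names = ["log10_cfdna", "log10_vaf"] + kept
    rows = []
    for p in patients:
        row = [
            math.log10(p["cfdna_conc"]),
            math.log10(max(p["ctdna_vaf"], vaf_floor)),
        ]
        # One binary indicator column per retained gene.
        row += [1 if g in p["altered_genes"] else 0 for g in kept]
        rows.append(row)
    return names, rows


toy_patients = [
    {"cfdna_conc": 10.0, "ctdna_vaf": 0.01, "altered_genes": {"KRAS", "TP53"}},
    {"cfdna_conc": 100.0, "ctdna_vaf": 0.0, "altered_genes": set()},
    {"cfdna_conc": 1.0, "ctdna_vaf": 0.1, "altered_genes": {"TP53"}},
    {"cfdna_conc": 1000.0, "ctdna_vaf": 0.001, "altered_genes": {"STK11", "TP53"}},
]
# A 30% cut-off is used here only so that the tiny toy cohort shows filtering.
feature_names, feature_rows = build_lb_features(toy_patients, min_gene_frequency=0.3)
```

The `impute_median` helper mirrors the cohort-median imputation described for missing laboratory or BMI values in the KS components; it is shown separately because the LB variables themselves are not described as being imputed.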
Models were evaluated in the MSK-ACCESS cohort by five-fold cross-validation. Models trained on the entire MSK-ACCESS cohort were evaluated in the ctDx Lung MSK cohort. For LB and LB+ models (in which all data were available), the Sydney cohort was used as an additional validation cohort. In the ‘adjusted cfDNA’ model, ctDx Lung cfDNA concentrations were transformed according to the linear equations described in Extended Data Fig. 9 before being input to the model.
In validating the model’s utility to predict VTE within 6 months of plasma draw, precision and recall were reported at an ‘optimal’ point minimizing false-positive rate and maximizing true-positive rate.
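One common way to pick such an operating point is Youden's index, which the Results cite for threshold selection. The sketch below is a simplified illustration using only the standard library: it treats the 6-month outcome as a plain binary label and ignores censoring, whereas the study used time-dependent metrics. The function name and toy data are hypothetical, and both outcome classes are assumed to be present.

```python
def youden_optimal_threshold(scores, labels):
    """Pick the cut-off maximizing Youden's J = sensitivity + specificity - 1.

    scores : risk score per patient (higher = higher predicted risk)
    labels : 1 if the patient had VTE within the horizon, else 0
    Returns (threshold, metrics) where a patient is called high risk
    when score >= threshold.
    """
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < thr and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and not y)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            metrics = {
                "sensitivity": sensitivity,  # also called recall
                "specificity": specificity,
                "ppv": tp / (tp + fp) if tp + fp else None,  # precision
                "npv": tn / (tn + fn) if tn + fn else None,
            }
            best = (j, thr, metrics)
    return best[1], best[2]


toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
threshold, metrics = youden_optimal_threshold(toy_scores, toy_labels)
```

Maximizing J weights sensitivity and specificity equally; a clinical deployment might instead favor one over the other depending on the relative costs of missed VTE and unnecessary anticoagulation.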
Analyses were performed in Python 3.10.11 using the lifelines 0.26.0 and sksurv 0.20.0 packages or in R 3.6.1 using the cmprsk 2.2-11 and survivalROC 1.0.3.1 packages.
Data capture
VTE events in the discovery and validation cohorts were annotated using the CEDARS (https://cedars.io) and PINES (https://pines.ai) natural language processing (NLP) packages to identify patient records with candidate thromboembolic events and were manually confirmed by chart review of clinician notes and diagnostic scans82,83. An in-depth discussion of the methodological approach and a link to the code base repositories for the most recent versions of the two packages can be found on their respective web pages.
Clinical notes and radiology reports from 1 year before cohort entry up to the date of censoring were included for all patients. Those documents were first processed through the CEDARS pipeline. Individual sentences were retained along with their corresponding documents if they matched the following query:
‘dvt OR pe OR vte OR thrombos* OR thrombus OR thrombi OR thrombotic OR clot OR *embol* OR *phlebitis’
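For readers unfamiliar with this kind of keyword query, the sketch below shows one way an OR-separated query with `*` wildcards could be turned into a sentence filter using Python's `re` module. It illustrates the idea only and is not the CEDARS implementation: the matching semantics assumed here (case-insensitive, whole-word matches, `*` standing for any run of word characters) and the function name are assumptions.

```python
import re

QUERY = (
    "dvt OR pe OR vte OR thrombos* OR thrombus OR thrombi OR thrombotic "
    "OR clot OR *embol* OR *phlebitis"
)


def compile_query(query):
    """Compile an OR-separated keyword query into a single regex."""
    patterns = []
    for term in query.split(" OR "):
        # Escape the literal text, then let * match any run of word characters.
        patterns.append(re.escape(term.strip()).replace(r"\*", r"\w*"))
    # \b anchors make each term match whole words only (assumed semantics).
    return re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)


matcher = compile_query(QUERY)

sentences = [
    "New PE in the right lower lobe.",
    "History of thromboembolic disease.",
    "Pelvic pain, no acute findings.",
]
# Keep only sentences that mention a candidate thromboembolic term.
retained = [s for s in sentences if matcher.search(s)]
```

Under these assumptions the short term `pe` matches the standalone abbreviation "PE" but not words such as "pelvic", which is the kind of behavior a high-sensitivity first-pass filter needs before manual review.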
The documents were then labeled with an estimated probability of being dated at or after a VTE event using a Longformer model (https://huggingface.co/allenai/longformer-base-4096) previously fine-tuned using the PINES methodology. The document-specific probability threshold used for this work corresponded to a sensitivity of 95% for detecting cancer-associated thrombosis events in model validation. Only sentences associated with the selected documents were retained and presented to a physician reviewer for assessment with the CEDARS graphical user interface.
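A threshold tied to a target sensitivity can be derived from labeled validation documents as sketched below. This is a generic illustration, not the PINES implementation; the function name and inputs are hypothetical.

```python
import math

def threshold_for_sensitivity(labels, probs, target=0.95):
    """Highest probability cutoff whose sensitivity is at least `target`.

    Documents with probability >= cutoff would be kept for human review.
    """
    positives = sorted((p for y, p in zip(labels, probs) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive validation document")
    # Number of true-positive documents that must be retained. Rounding
    # first guards against floating-point noise such as 19.000000000000004.
    needed = math.ceil(round(target * len(positives), 9))
    return positives[needed - 1]

def sensitivity_at(labels, probs, cutoff):
    """Fraction of positive documents scoring at or above the cutoff."""
    pos = [p for y, p in zip(labels, probs) if y]
    return sum(1 for p in pos if p >= cutoff) / len(pos)
```

Choosing the highest cutoff that still meets the target keeps the number of documents sent to the reviewer as small as the sensitivity requirement allows.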
The interface presents one sentence at a time with its associated clinical note or radiology report in the same screen view. The human reviewer enters an event date as indicated, after which the application automatically moves on to the next patient. The EHR can be consulted separately if additional information is needed. A board-certified hematologist (S.M.) oversaw the discovery and generalizability cohort annotation process; validation cohort events were primarily annotated by a board-certified oncologist (J.J.). Once the entire cohort has been reviewed, CEDARS generates a table including patient identifiers, event dates, included document dates and text for selected sentences. This dataset can be audited and used as is for time-to-event analyses.
The current CEDARS methodology was used uniformly for the discovery and validation cohorts. An earlier version of CEDARS was used for the generalizability cohort for the years 2017–2019. Events for the generalizability cohort were annotated as previously described for patients accrued in 2016 (ref. 34). Events for the Sydney ctDx cohort were manually physician annotated by A.L. and L.G. Curators were blinded to ctDNA status during curation.
NLP data audits
Two hundred patients were randomly selected from each of three MSK cohorts:
1. MSK IMPACT cohort (2014–2019, n = 35,391, ref. 34), including 361 patients from the generalizability cohort
2. Discovery cohort
3. Validation cohort
Clinical notes and radiology reports were assessed manually for each known VTE case from the original datasets to confirm events detected with the CEDARS + PINES NLP platform. Hematology and anticoagulation clinic notes were reviewed for all patients retained in the audit to look for VTE events potentially missed by NLP. All patients in the audit were assessed for International Classification of Diseases 9 (ICD-9) and International Classification of Diseases 10 (ICD-10) codes potentially revealing a qualifying VTE event during the observation period. Codes used are shown in Supplementary Table 6.
All patients without a VTE event found with NLP but with a VTE ICD code detected were reviewed manually, looking at individual notes and radiology reports. Recall (also known as sensitivity) and precision (also known as positive predictive value) were calculated. Results of the audit for those three cohorts are shown in Supplementary Table 7. ICD code review did not reveal any qualifying VTE event missed with NLP. The manual review of hematology/anticoagulation clinic notes revealed one missed case due to data entry oversight (human error). Two false-positive events were uncovered. Combining the three audit cohorts, the overall recall for NLP was 99%, and the overall precision was 98%.
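The audit logic above can be sketched in two small helpers: one flags patients with a VTE ICD code but no NLP-detected event for manual review, and one pools per-cohort counts into overall recall and precision. The identifiers and count keys are hypothetical, and no counts from Supplementary Table 7 are reproduced here.

```python
def needs_manual_review(nlp_positive_ids, icd_positive_ids):
    """Patients with a VTE ICD code but no NLP-detected VTE event."""
    return sorted(set(icd_positive_ids) - set(nlp_positive_ids))

def pooled_recall_precision(cohort_counts):
    """Pool TP, FN and FP across cohorts, then compute recall and precision.

    Each element is a dict with hypothetical keys "tp", "fn" and "fp".
    Pooling counts first weights each cohort by its number of events.
    """
    tp = sum(c["tp"] for c in cohort_counts)
    fn = sum(c["fn"] for c in cohort_counts)
    fp = sum(c["fp"] for c in cohort_counts)
    recall = tp / (tp + fn)      # sensitivity
    precision = tp / (tp + fp)   # positive predictive value
    return recall, precision
```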
Organ sites with tumor involvement were automatically extracted from the EHR using previously validated NLP methods75. A Bidirectional Encoder Representations from Transformers (BERT) model was trained and validated on a manual 31,455-report corpus in an 80:20 train:test split. Annotation algorithms used for the analysis presented here had an average AUC of 0.981 and micro-average precision/recall of 87.5/89.6. This approach was shown to have higher recall than structured data approaches, such as those based on billing codes alone.
Structured data were obtained from our EHR as follows: demographics were self-reported; medications, including antineoplastics, were derived from electronic prescription records; cancer stage and time of diagnosis were derived from cancer registry data; and laboratory data were obtained from an institutional laboratory medicine database.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Genomic data for the discovery and validation cohorts are available at https://www.cbioportal.org/study/summary?id=msk_ctdna_vte_2024. Genomic data for the generalizability cohort are available at https://www.cbioportal.org/ (study ID: nsclc_ctdx_msk_2022). Clinical data for all cohorts are available at https://github.com/clinical-data-mining/ctDNA_CAT. Raw genomic data and protected health information are not available owing to privacy laws. Researchers at MSK may submit a retrospective research protocol to the MSK institutional review board for permission and, upon approval, may request the data from the Center for Molecular Oncology (skicmopm@mskcc.org), with an estimated 1–2-week timeframe for response.
Code availability
Code necessary to reproduce the analyses in this study is available at https://github.com/clinical-data-mining/ctDNA_CAT. The CEDARS and PINES packages are available at https://github.com/CEDARS-NLP.
References
Lyman, G. H., Eckert, L., Wang, Y., Wang, H. & Cohen, A. Venous thromboembolism risk in patients with cancer receiving chemotherapy: a real-world analysis. Oncologist 18, 1321–1329 (2013).
Lyman, G. H., Culakova, E., Poniewierski, M. S. & Kuderer, N. M. Morbidity, mortality and costs associated with venous thromboembolism in hospitalized patients with cancer. Thromb. Res. 164, S112–S118 (2018).
Khorana, A. A., Francis, C. W., Culakova, E., Kuderer, N. M. & Lyman, G. H. Thromboembolism is a leading cause of death in cancer patients receiving outpatient chemotherapy. J. Thromb. Haemost. 5, 632–634 (2007).
Agnelli, G. et al. Semuloparin for thromboprophylaxis in patients receiving chemotherapy for cancer. N. Engl. J. Med. 366, 601–609 (2012).
Agnelli, G. et al. Nadroparin for the prevention of thromboembolic events in ambulatory patients with metastatic or locally advanced solid cancer receiving chemotherapy: a randomised, placebo-controlled, double-blind study. Lancet Oncol. 10, 943–949 (2009).
Carrier, M. et al. Apixaban to prevent venous thromboembolism in patients with cancer. N. Engl. J. Med. 380, 711–719 (2018).
Khorana, A. A. et al. Rivaroxaban for thromboprophylaxis in high-risk ambulatory patients with cancer. N. Engl. J. Med. 380, 720–728 (2019).
Li, A. et al. Cost-effectiveness analysis of low-dose direct oral anticoagulant (DOAC) for the prevention of cancer-associated thrombosis in the United States. Cancer 126, 1736–1748 (2020).
Lyman, G. H. et al. American Society of Hematology 2021 guidelines for management of venous thromboembolism: prevention and treatment in patients with cancer. Blood Adv. 5, 927–974 (2021).
Key, N. S. et al. Venous thromboembolism prophylaxis and treatment in patients with cancer: ASCO clinical practice guideline update. J. Clin. Oncol. 38, 496–520 (2019).
Khorana, A. A., Kuderer, N. M., Culakova, E., Lyman, G. H. & Francis, C. W. Development and validation of a predictive model for chemotherapy-associated thrombosis. Blood 111, 4902–4907 (2008).
Kuderer, N. M. et al. Predictors of venous thromboembolism and early mortality in lung cancer: results from a global prospective study (CANTARISK). Oncologist 23, 247–255 (2018).
Mansfield, A. S. et al. Predictors of active cancer thromboembolic outcomes: validation of the Khorana score among patients with lung cancer. J. Thromb. Haemost. 14, 1773–1778 (2016).
Mulder, F. I. et al. The Khorana score for prediction of venous thromboembolism in cancer patients: a systematic review and meta-analysis. Haematologica 104, 1277 (2019).
Li, A. et al. Derivation and validation of a clinical risk assessment model for cancer-associated thrombosis in two unique US health care systems. J. Clin. Oncol. 41, 2926–2938 (2023).
Pabinger, I. et al. A clinical prediction model for cancer-associated venous thromboembolism: a development and validation study in two independent prospective cohorts. Lancet Haematol. 5, e289–e298 (2018).
Connors, J. M. Fine tuning venous thromboembolism risk prediction in patients with cancer. J. Clin. Oncol. 41, 2881–2883 (2023).
Holmes, C. E. et al. Successful model for guideline implementation to prevent cancer-associated thrombosis: venous thromboembolism prevention in the ambulatory cancer clinic. JCO Oncol. Pract. 16, e868–e874 (2020).
Khorana, A. A. Simplicity versus complexity: an existential dilemma as risk tools evolve. Lancet Haematol. 5, e273–e274 (2018).
Blom, J. W., Doggen, C. J. M., Osanto, S. & Rosendaal, F. R. Malignancies, prothrombotic mutations, and the risk of venous thrombosis. JAMA 293, 715–722 (2005).
Muñoz, A. et al. A clinical-genetic risk score for predicting cancer-associated venous thromboembolism: a development and validation study involving two independent prospective cohorts. J. Clin. Oncol. 41, 2911–2925 (2023).
Zwicker, J. I. et al. Tumor-derived tissue factor-bearing microparticles are associated with venous thromboembolic events in malignancy. Clin. Cancer Res. 15, 6830–6840 (2009).
Khorana, A. A. et al. A proteomics-based approach to identifying mechanisms of cancer-associated thrombosis: potential role for immunoglobulins. Blood 122, 1127 (2013).
FDA approves liquid biopsy NGS companion diagnostic test for multiple cancers and biomarkers. US FDA www.fda.gov/drugs/resources-information-approved-drugs/fda-approves-liquid-biopsy-ngs-companion-diagnostic-test-multiple-cancers-and-biomarkers (2020).
Fuchs, T. A. et al. Extracellular DNA traps promote thrombosis. Proc. Natl Acad. Sci. USA 107, 15880–15885 (2010).
Mauracher, L.-M. et al. Citrullinated histone H3, a biomarker of neutrophil extracellular trap formation, predicts the risk of venous thromboembolism in cancer patients. J. Thromb. Haemost. 16, 508–518 (2018).
Jee, J. et al. Overall survival with circulating tumor DNA-guided therapy in advanced non-small-cell lung cancer. Nat. Med. 28, 2353–2363 (2022).
Vivancos, A. & Tabernero, J. Circulating tumor DNA as a novel prognostic indicator. Nat. Med. 28, 2255–2256 (2022).
Abbosh, C. et al. Tracking early lung cancer metastatic dissemination in TRACERx using ctDNA. Nature 616, 553–562 (2023).
Syeda, M. M. et al. Circulating tumour DNA in patients with advanced melanoma treated with dabrafenib or dabrafenib plus trametinib: a clinical validation study. Lancet Oncol. 22, 370–380 (2021).
Reichert, Z. R. et al. Prognostic value of plasma circulating tumor DNA fraction across four common cancer types: a real-world outcomes study. Ann. Oncol. 34, 111–120 (2023).
Rose Brannon, A. et al. Enhanced specificity of clinical high-sensitivity tumor mutation profiling in cell-free DNA via paired normal sequencing using MSK-ACCESS. Nat. Commun. 12, 3770 (2021).
Timp, J. F., Braekkan, S. K., Versteeg, H. H. & Cannegieter, S. C. Epidemiology of cancer-associated venous thrombosis. Blood 122, 1712–1723 (2013).
Dunbar, A. et al. Genomic profiling identifies somatic mutations predicting thromboembolic risk in patients with solid tumors. Blood 137, 2103–2113 (2021).
Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).
Khorana, A. A., Francis, C. W., Culakova, E. & Lyman, G. H. Risk factors for chemotherapy-associated venous thromboembolism in a prospective observational study. Cancer 104, 2822–2829 (2005).
Kanoun, S. et al. Influence of software tool and methodological aspects of total metabolic tumor volume calculation on baseline [18F]FDG PET to predict survival in Hodgkin lymphoma. PLoS ONE 10, e0140830 (2015).
Morbelli, S. et al. Circulating tumor DNA reflects tumor metabolism rather than tumor burden in chemotherapy-naive patients with advanced non-small cell lung cancer (NSCLC): 18F-FDG PET/CT study. J. Nucl. Med. 58, 1764–1769 (2017).
Chabon, J. J. et al. Integrating genomic features for non-invasive early lung cancer detection. Nature 580, 245–251 (2020).
Ottestad, A. L. et al. Associations between detectable circulating tumor DNA and tumor glucose uptake measured by 18F-FDG PET/CT in early-stage non-small cell lung cancer. BMC Cancer 23, 646 (2023).
Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32–35 (1950).
Lobo, J. M., Jiménez-Valverde, A. & Real, R. AUC: a misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. 17, 145–151 (2008).
Cohen, A. et al. Effectiveness and safety of apixaban, low-molecular-weight heparin, and warfarin among venous thromboembolism patients with active cancer: a U.S. claims data analysis. Thromb. Haemost. 121, 383–395 (2020).
Khemasuwan, D., Divietro, M. L., Tangdhanakanond, K., Pomerantz, S. C. & Eiger, G. Statins decrease the occurrence of venous thromboembolism in patients with cancer. Am. J. Med. 123, 60–65 (2010).
El-Refai, S. M., Black, E. P., Adams, V. R., Talbert, J. C. & Brown, J. D. Statin use and venous thromboembolism in cancer: a large, active comparator, propensity score matched cohort study. Thromb. Res. 158, 49–58 (2017).
Li, P. et al. Aspirin is associated with reduced rates of venous thromboembolism in older patients with cancer. J. Cardiovasc. Pharmacol. Ther. 25, 456–465 (2020).
Trousseau, A. Phlegmasia alba dolens. In Clinique medicale l’Hôtel-Dieu Paris 2nd edn, Vol. 3. 654–712 (J.-B. Baillière et fils, 1865).
Mandel, P. & Metais, P. [Nuclear acids in human blood plasma]. C. R. Seances Soc. Biol. Fil. 142, 241–243 (1948).
Demers, M. et al. Cancers predispose neutrophils to release extracellular DNA traps that contribute to cancer-associated thrombosis. Proc. Natl Acad. Sci. USA 109, 13076–13081 (2012).
Swystun, L. L., Mukherjee, S. & Liaw, P. C. Breast cancer chemotherapy induces the release of cell-free DNA, a novel procoagulant stimulus. J. Thromb. Haemost. 9, 2313–2321 (2011).
Alix-Panabières, C. & Pantel, K. Liquid biopsy: from discovery to clinical application. Cancer Discov. 11, 858–873 (2021).
Ignatiadis, M., Sledge, G. W. & Jeffrey, S. S. Liquid biopsy enters the clinic—implementation issues and future challenges. Nat. Rev. Clin. Oncol. 18, 297–312 (2021).
Conteduca, V. et al. Plasma tumor DNA is associated with increased risk of venous thromboembolism in metastatic castration-resistant cancer patients. Int. J. Cancer 150, 1166–1173 (2022).
Mantha, S. et al. Application of machine learning to the prediction of cancer-associated venous thromboembolism. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-2870367/v1 (2023).
Ferroni, P. et al. Validation of a machine learning approach for venous thromboembolism risk prediction in oncology. Dis. Markers 2017, 8781379 (2017).
Mattox, A. K. et al. The origin of highly elevated cell-free DNA in healthy individuals and patients with pancreatic, colorectal, lung, or ovarian cancer. Cancer Discov. 13, 2166–2179 (2023).
Gervaso, L. et al. Circulating tumor DNA and risk of venous thromboembolism in locally advanced rectal cancer. Blood 142, 4013 (2023).
Zhang, Q. et al. Prognostic and predictive impact of circulating tumor DNA in patients with advanced cancers treated with immune checkpoint blockade. Cancer Discov. 10, 1842–1853 (2020).
Wan, J. C. M. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223–238 (2017).
Tie, J. et al. Circulating tumor DNA analysis detects minimal residual disease and predicts recurrence in patients with stage II colon cancer. Sci. Transl. Med. 8, 346ra92 (2016).
Gi, T. et al. Histopathological features of cancer-associated venous thromboembolism: presence of intrathrombus cancer cells and prothrombotic factors. Arterioscler. Thromb. Vasc. Biol. 43, 146–159 (2023).
Snyder, M. W., Kircher, M., Hill, A. J., Daza, R. M. & Shendure, J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164, 57–68 (2016).
Pfeiler, S., Stark, K., Massberg, S. & Engelmann, B. Propagation of thrombosis by neutrophils and extracellular nucleosome networks. Haematologica 102, 206–213 (2017).
Rolfo, C. et al. Liquid biopsy for advanced NSCLC: a consensus statement from the International Association for the Study of Lung Cancer. J. Thorac. Oncol. 16, 1647–1662 (2021).
Prandoni, P. et al. Deep-vein thrombosis and the incidence of subsequent symptomatic cancer. N. Engl. J. Med. 327, 1128–1133 (1992).
Kraaijpoel, N. et al. Novel biomarkers to detect occult cancer in patients with unprovoked venous thromboembolism: rationale and design of the PLATO-VTE study. Thromb. Update 2, 100030 (2021).
Kaiser, J. ‘The complexities are staggering.’ U.S. plans huge trial of blood tests for multiple cancers. Science https://www.science.org/content/article/complexities-are-staggering-u-s-plans-huge-trial-blood-tests-multiple-cancers (2022).
Xie, W., Suryaprakash, S., Wu, C., Rodriguez, A. & Fraterman, S. Trends in the uses of liquid biopsy in oncology. Nat. Rev. Drug Discov. 22, 612–613 (2023).
Verhamme, P. et al. Abelacimab for prevention of venous thromboembolism. N. Engl. J. Med. 385, 609–617 (2021).
Schrag, D. et al. Direct oral anticoagulants vs low-molecular-weight heparin and recurrent VTE in patients with cancer: a randomized clinical trial. JAMA 329, 1924–1933 (2023).
Paweletz, C. P. et al. Bias-corrected targeted next-generation sequencing for rapid, multiplexed detection of actionable alterations in cell-free DNA from advanced lung cancer patients. Clin. Cancer Res. 22, 915–922 (2016).
Gray, R. J. A class of K-sample tests for comparing the cumulative incidence of a competing risk. Ann. Stat. 16, 1141–1154 (1988).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 2017, PO.17.00011 (2017).
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
Luthra, A. et al. A.I.-assisted clinical data curation to determine genomic biomarkers of cancer metastasis. Cancer Res. 82, 1158 (2022).
Do, R. K. G. et al. Patterns of metastatic disease in patients with cancer derived from natural language processing of structured CT radiology reports over a 10-year period. Radiology 301, 115–122 (2021).
Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008).
Jee, J. Machine learning-based markers for CAD. Lancet 402, 183 (2023).
Heagerty, P. J., Lumley, T. & Pepe, M. S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344 (2000).
Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proc. of the 14th International Joint Conference on Artificial Intelligence 1137–1143 (Association for Computing Machinery, 1995).
Fine, J. P. & Gray, R. J. A proportional hazards model for the subdistribution of a competing risk. J. Am. Stat. Assoc. 94, 496–509 (1999).
Mantha, S. CEDARS. Github https://github.com/CEDARS-NLP/CEDARS (2024).
Mantha, S. & Singh, R. PINES. Github https://github.com/CEDARS-NLP/PINES (2024).
Acknowledgements
We acknowledge S. Ng and C. Hutcheon for their help with data collection and annotation. This work was supported by Memorial Sloan Kettering Cancer Center Support Grant/Core Grant P30 CA008748 (all MSK authors); the Molecular Diagnostics Service in the Department of Pathology (all MSK authors); the Marie-Josée and Henry R. Kravis Center for Molecular Oncology (all MSK authors); the Gregory John Poche and Kay Van Norton Poche Initiative for Clinical Trials Access in Sydney (J.J., A. Lee and B.T.L.); the Investigational Cancer Therapeutics Training Program (T32-CA009207, J.J.); and the Paul Calabresi Career Development Award for Clinical Oncology (K12 CA184746, J.J.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.
Author information
Contributions
Conception: J.J. and S.M. Genomic data collection and analysis: J.J., A.R.B., A. Lee, L.G., M.D., M.H., J.G., J.H., K.G., M.E.A., N.P., S.C., P.R., J.S.R.-F., M.L., N.S., M.F.B. and B.T.L. Clinical data collection and analysis: R.S., C.F., A. Luthra, L.G., K.P., A. Lee, N.P., S.C., P.R., N.S., M.F.B., B.T.L. and S.M. Administration: M.E.A., N.P., S.C., S.P.S., J.S.R.-F., M.L., N.S., J.Z., M.F.B., B.T.L. and S.M. Statistical plan: J.J., A.D. and S.M. Writing: J.J., C.W., J.Z. and S.M. All authors reviewed and approved the final version of the paper.
Ethics declarations
Competing interests
J.J. has a patent licensed by MDSeq, Inc. J.G. is a former employee of Agilent Technologies and a current employee of NeoGenomics. J.H. and K.G. are current employees of Agilent Technologies. M.E.A. has consulted for Janssen Global Services, Bristol Myers Squibb, AstraZeneca, Roche and Biocartis and has participated in speaker’s bureau activities for Biocartis, Invivoscribe, Physiciansʼ Education Resource, PeerView Institute for Medical Education, Clinical Care Options and RMEI Medical Education. N.P. has received honoraria from Boehringer Ingelheim, Merck Sharp & Dohme, Merck, Bristol Myers Squibb, AstraZeneca, Takeda, Pfizer, Roche, Novartis, Ipsen and Bayer and received research funding from Bayer, Pfizer and Roche. S.P.S. holds equity in Canesia Health, Inc. P.R. has received research funding from GRAIL, Illumina, Novartis, Epic Sciences and ArcherDx and served as a consultant for Novartis, Foundation Medicine, AstraZeneca, Epic Sciences, Inivata, Natera and Tempus. J.S.R.-F. is a current employee of AstraZeneca; has served as a consultant for Goldman Sachs, Paige.AI and REPARE Therapeutics; and has served as an advisor for Roche, Genentech, Roche Tissue Diagnostics, Ventana, Novartis, InVicro, GRAIL, Goldman Sachs, Paige.AI and Volition RX. M.L. has received honoraria from Merck, AstraZeneca, Bristol Myers Squibb, Blueprint Medicines, Janssen Pharmaceuticals, Takeda Pharmaceuticals, Lilly Oncology, LOXO Oncology, Bayer, ADC Therapeutics, Riken Genesis and Paige.AI and research funding from LOXO Oncology, Merus and Helsinn Therapeutics. J.Z. has served as a consultant for Calyx, Sanofi, CSL Behring, Janssen, Sanofi, CSL and Parexel and received research funding from Incyte Corporation and Quercegen and honoraria from Pfizer/Bristol Myers Squibb, Portola and Daiichi. M.F.B. has consulted for PetDx and Eli Lilly and received research funding from GRAIL. B.T.L. has received research funding from Amgen, Genentech, AstraZeneca, Daiichi Sankyo, Eli Lilly, Illumina, GRAIL, Guardant Health, Hengrui Therapeutics, MORE Health and Bolt Biotherapeutics. S.M. has served as a consultant for Janssen Pharmaceuticals; is developing a licensing agreement with Superbio.ai, Inc. for NLP software featured in this paper; is the principal owner of Daboia Consulting, LLC; and has a US patent application for PINES. J.J., B.T.L. and S.M. have applied for a US patent related to the research in this paper. The other authors declare no competing interests.
Peer review
Peer review information
Nature Medicine thanks Hanny Al-Samkari and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Anna Maria Ranzoni, in collaboration with the Nature Medicine team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Consort diagram.
Patient inclusion for the Discovery, Validation, and Generalizability cohorts.
Extended Data Fig. 2 Venous thromboembolism (VTE) rates by cancer type in the Discovery cohort.
Univariate hazard ratios for VTE events for patients with vs. all others without a given cancer type. The number in each subgroup is given in Supplementary Table 1.
Extended Data Fig. 3 Venous thromboembolism (VTE) rates and cell-free DNA (cfDNA) concentration.
Univariate hazard ratios (±95% CI) for VTE with log10(cfDNA concentration in ng/mL plasma) as a variable stratified by cancer type in the discovery cohort.
Extended Data Fig. 4 Sensitivity Analyses.
Aalen-Johansen time-to-event analysis to time of venous thromboembolism (VTE) with death as a competing risk, (A) from time of diagnosis left truncated at time of plasma draw (* at-risk numbers adjusted for left truncation; that is, patients entering the risk set after the start date), (B) from time of plasma draw, showing only patients without VTE, death, or right censorship at 6 months for both cohorts stratified by the presence (ctDNA+) or absence (ctDNA−) of detectable circulating tumor DNA (ctDNA), or (D) from time of second plasma draw in patients with two draws (Discovery cohort only), stratified by ctDNA status in each draw. Confidence interval (CI); hazard ratio (HR). (C) Kaplan-Meier analysis with Cox Proportional Hazards reported for VTE in the Discovery cohort.
Extended Data Fig. 5 Circulating tumor DNA (ctDNA) levels in patients with vs. without prior venous thromboembolism (VTE).
ctDNA levels are quantified by the variant allele frequency (VAF) detected from individual patients. Boxplots show the median and the 25th and 75th percentiles, with whiskers corresponding to the 5th and 95th percentiles, of the log10(max VAF of all mutations) among patients with (true) vs. without (false) prior VTE. Patients without detectable ctDNA had max VAF set to −5 on the log axis. Groups are different (two-sided p = 3.6 × 10^−9 by Mann-Whitney U test).
Extended Data Fig. 6 Correlates of circulating tumor DNA (ctDNA) levels.
Boxplots show the median and the 25th and 75th percentiles, with whiskers corresponding to the 5th and 95th percentiles, for log10(ctDNA variant allele frequency [VAF]) vs (A) number of organ sites (two-sided p = 5.5 × 10^−190), (B) Khorana score (two-sided p = 2.1 × 10^−16), and (D) chemotherapy receipt within 30 days of plasma draw from individual patients (two-sided p = 1.6 × 10^−40). (C) Scatterplot of max VAF vs cell-free DNA (cfDNA) concentration in ng/μL (two-sided p = 7.3 × 10^−215). In A, B and D, samples without ctDNA are represented by a log10(max VAF) of −5.
Extended Data Fig. 7 Multivariate Fine Gray models.
Cell-free DNA (cfDNA); circulating tumor DNA (ctDNA); metabolic tumor volume (MTV); hazard ratio (HR).
Extended Data Fig. 8 More on random survival forest performance.
a. Mean permutation variable importances (for all variables with >0.001 importance) in the ‘All’ RSF in Fig. 2b. Points represent individual experiments from 10x cross-validation. b. Risk scores from LB+ in 5-fold cross-validation by cancer type. Boxes represent median ± IQR, with whiskers representing 5th–95th percentiles and dots representing outliers. c. Aalen-Johansen survival curves for VTE from time of plasma draw with death as a competing risk stratified by the risk quantile from the ‘All’ RSF in Fig. 2b for the Discovery (left) and Generalizability (right) cohorts (Cox PH two-sided p = 3.8 × 10^−80). Area under the curve (AUC); cell-free DNA (cfDNA); hepatocellular carcinoma (HCC); non-small cell lung cancer (NSCLC); variant allele frequency (VAF); white blood cell count (WBC).
Extended Data Fig. 9 DNA extraction methods comparison.
Scatterplot of cell-free DNA (cfDNA) concentrations in ng/μL from same patient-matched MSK-ACCESS (extracted by MagMAX protocol) and ctDx Lung (extracted by Qiagen and Kingfisher kits as indicated) samples. Least squares linear regression with slope and intercept are reported for the Qiagen and Kingfisher methods to approximate the MagMAX concentrations and shown as dotted lines.
Extended Data Fig. 10 Associations between previous statin and aspirin receipt and venous thromboembolism.
Aalen-Johansen curves from time of plasma draw to time of VTE with death as a competing risk. (Top) Patients with vs. without prior statin administration. (Bottom) Patients with vs. without prior aspirin administration. The number of patients at risk at each time point are shown below the graphs.
Supplementary information
Supplementary Information
Supplementary Discussion and Tables 1–7.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jee, J., Brannon, A.R., Singh, R. et al. DNA liquid biopsy-based prediction of cancer-associated venous thromboembolism. Nat Med 30, 2499–2507 (2024). https://doi.org/10.1038/s41591-024-03195-0
DOI: https://doi.org/10.1038/s41591-024-03195-0