
# Combining radiomic phenotypes of non-small cell lung cancer with liquid biopsy data may improve prediction of response to EGFR inhibitors

## Abstract

Among non-small cell lung cancer (NSCLC) patients with therapeutically targetable tumor mutations in epidermal growth factor receptor (EGFR), not all patients respond to targeted therapy. Combining circulating-tumor DNA (ctDNA), clinical variables, and radiomic phenotypes may improve prediction of EGFR-targeted therapy outcomes for NSCLC. This single-center retrospective study included 40 EGFR-mutant advanced NSCLC patients treated with EGFR-targeted therapy. ctDNA data included number of mutations and detection of EGFR T790M. Clinical data included age, smoking status, and ECOG performance status. Baseline chest CT scans were analyzed to extract 429 radiomic features from each primary tumor. Unsupervised hierarchical clustering was used to group tumors into phenotypes. Kaplan–Meier (K–M) curves and Cox proportional hazards regression were modeled for progression-free survival (PFS) and overall survival (OS). The likelihood ratio test (LRT) was used to compare fit between models. Among 40 patients (73% women, median age 62 years), consensus clustering identified two radiomic phenotypes. For PFS, the model combining radiomic phenotypes with ctDNA and clinical variables had a c-statistic of 0.77 and a better fit (LRT p = 0.01) than the model with clinical and ctDNA variables alone, which had a c-statistic of 0.73. For OS, adding radiomic phenotypes resulted in a c-statistic of 0.83 versus 0.80 when using clinical and ctDNA variables (LRT p = 0.08). Both models showed separation of K–M curves dichotomized by median prognostic score (p < 0.005). Combining radiomic phenotypes, ctDNA, and clinical variables may enhance precision oncology approaches to managing advanced non-small cell lung cancer with EGFR mutations.

## Introduction

The discovery of activating mutations and the development of targeted therapies have improved survival in patients with non-small cell lung cancer (NSCLC)1. Mutation detection by tissue and circulating tumor DNA (ctDNA) next-generation sequencing (NGS) guides therapy selection both at initial diagnosis and at disease progression. Epidermal growth factor receptor (EGFR) mutations are the most common therapeutically targetable variants in NSCLC, and treatment with an EGFR tyrosine kinase inhibitor (TKI) has shown superior efficacy compared to standard chemotherapy in mutation-positive patients2. However, primary resistance occurs in 20–30% of patients3. Ultimately, all patients develop acquired resistance to EGFR-directed therapies, and an active area of research is the use of novel combination therapies, including antibodies against c-met, poly-adenosine diphosphate ribose polymerase inhibitors, and antiangiogenic therapies along with EGFR-TKIs, to improve long-term efficacy4,5.

Tumor heterogeneity is thought to play a role in TKI response and is associated with poor outcome6,7,8,9, as EGFR mutations may be suboptimal targets when they co-occur with genetic alterations or are subclonally expressed8,9. Small tissue biopsies may not fully reflect tumor heterogeneity and can often be difficult to obtain10,11, with tissue NGS only able to be completed for as few as 50% of patients12. Thus, developing non-invasive tests to assess the likelihood of response to an EGFR-TKI is critical for therapy selection. Studies have shown that ctDNA analysis represents a non-invasive biomarker that can improve targetable mutation detection, and that ctDNA molecular heterogeneity predicts clinical outcome13,14,15. Although useful clinically, ctDNA sensitivity remains less than ideal13.

An emerging non-invasive approach to characterize tumor heterogeneity is to analyze tumor imaging phenotypes16,17. Radiomics analysis enables the detection of tumor imaging features and patterns of intra-tumor heterogeneity not appreciable by the human eye, increasing the wealth of information from radiological imaging. Studies specifically suggest that radiomic analysis may provide novel prognostic markers related to gene-expression patterns and responder signatures for NSCLC patients receiving targeted therapy18,19,20,21,22,23,24,25,26,27,28,29,30,31. Most studies to date have focused on using radiomic analysis on computed tomography (CT) and/or positron emission tomography (PET)/CT data to predict EGFR mutation status, using statistical modeling or machine learning approaches for reducing the high dimensionality of radiomic features19,21,22,23,24,25,26,27,28,29,32. More recently, deep learning approaches have also been used to predict outcomes after TKI therapy for NSCLC31,33. While this field is rapidly developing, a question still remains as to what extent radiomic analysis can complement established prognostic markers for TKIs, as most studies have either evaluated radiomic features in the absence of established prognostic biomarkers or have only examined surrogate endpoints, such as EGFR mutation status, rather than actual patient outcomes. In addition, and to the best of our knowledge, no studies have evaluated radiomic analysis in the context of complementing liquid biopsy-based assessment, which is another promising non-invasive tool for characterizing tumor heterogeneity when predicting EGFR-TKI response.

The purpose of our study was to determine the feasibility of integrating radiomics features with ctDNA next-generation sequencing data to predict TKI outcomes in EGFR mutant NSCLC. Our approach combines unsupervised hierarchical clustering and principal component analysis (PCA) of radiomic features extracted from clinically acquired CT scans, to arrive at two distinct radiomic phenotypes. Our hypothesis is that integrating these radiomic phenotypes with ctDNA and clinical variables can improve assessment of tumor heterogeneity and outcome prediction to EGFR-targeted therapy for metastatic NSCLC.

## Materials and methods

### Study sample and data

This single-center, retrospective, observational study was conducted at the University of Pennsylvania from October 2016 to February 2019 and was approved by the Institutional Review Board with Health Insurance Portability and Accountability Act waiver of informed consent. All methods in this study were in accordance with the Declaration of Helsinki and informed consent was obtained from all the participants. Patients with metastatic NSCLC that had an actionable EGFR mutation detected by ctDNA next-generation sequencing and also had CT imaging data available for radiomic analysis were included. Based on these criteria, a total of 40 EGFR-mutant advanced NSCLC patients were included in the study. All patients were treated with the EGFR-TKI indicated by the clinical ctDNA next-generation sequencing result either at the time of diagnosis (n = 23) or suspected progression on a front-line EGFR-TKI (n = 17). The patients starting an EGFR-TKI at the time of diagnosis received afatinib (n = 8), erlotinib (n = 5), gefitinib (n = 1), or osimertinib (n = 9). All patients who had experienced progression on a front-line EGFR-TKI received osimertinib (n = 17). Baseline demographics, clinical data, including ctDNA targeted next-generation sequencing results (Guardant360 73 gene panel), and baseline CT scans were collected from the electronic medical record. ctDNA features measured included: allele fraction of the therapeutically targetable driver mutation, total number of co-existing mutations detected, and whether the EGFR T790M mutation was detected. Chest CT data included a total of 7 contrast-enhanced and 33 non-contrast enhanced scans, of which 24 were acquired with Siemens and 16 with a General Electric scanner (Supplementary Table S1). A board-certified, fellowship-trained thoracic radiologist (S.I.K.) with 18 years of clinical experience manually segmented the tumor area using the semi-automated ITK-SNAP software (version 3.6.0) (Fig. 1a)34.

A total of 429 radiomic features were extracted from each tumor’s entire volume using the PyRadiomics library35, representing nine types of descriptors: (1) First-order statistics, capturing the voxel gray-level intensities within a neighborhood. (2) Shape-based descriptors of the three-dimensional size and shape of the tumor measured on the whole tumor volume. (3) Gray level co-occurrence matrix features, calculated based on second-order joint probability functions of voxel intensities in a particular spatial relation, for all intensities and many spatial relations. (4) Gray level size zone matrix features, similar to gray level co-occurrence matrix features but rotation-independent. (5) Gray level run length matrix features, based on quantifying gray level runs as the lengths of consecutive pixels. (6) Gray level dependence matrix features, calculated as the number of connected voxels within a specified distance. (7) Neighboring gray tone difference matrix features, rotation-independent features based on gray-level relationships between neighboring voxels (for a certain distance between voxels). (8) Laplacian of Gaussian features, capturing information about edge detection in a smoothed image. (9) Wavelet features, giving information on the location, direction, and frequency of gray-level changes. All features were z-scored prior to further analysis.

We used the extracted features as input to a two-level hierarchical clustering algorithm: first, features were clustered and principal component analysis was used to reduce dimensionality and construct a feature-vector signature reflecting each tumor’s imaging phenotype (i.e., feature-level clustering); then the derived feature-vector signatures were clustered (i.e., tumor-level clustering) to identify intrinsic tumor phenotypes (Fig. 1b). Specifically, for Pearson’s correlation $$r$$ between any two features, we defined $$1-{r}^{2}$$ as a metric for the distance between the z-scored radiomic features, with strongly covarying features being closer. Using this metric, we performed unsupervised hierarchical clustering, applying the maximum distance linkage on the extracted features36. To determine the optimal number of feature clusters we used consensus clustering37 with a 10% cutoff for minimum change in the cumulative density function. We then performed PCA on each identified feature cluster and retained the first principal component (PC) from each cluster for all subsequent statistical modeling. As the features in each cluster covary strongly, the first PC should capture the dominant information in each feature cluster. Where $$k$$ is the number of feature clusters, dimensionality is thus reduced from 429 total radiomic features measured to $$k$$, with $$k$$ substantially lower than 429. Using the same unsupervised hierarchical approach as described above36,37, we used these derived PC feature signatures to cluster our sample into distinct radiomic tumor phenotypes, where the optimal number of phenotype clusters was determined by consensus clustering37.
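The two-level procedure can be illustrated with a minimal NumPy sketch. It is not the study's code: the data are synthetic, the number of clusters is fixed by hand rather than chosen by consensus clustering, the helper names are hypothetical, and the Euclidean distance used for the tumor-level step is an assumption, since the text does not state which tumor-level metric was used.

```python
import numpy as np

def zscore(features):
    """Standardize each column (feature) to zero mean and unit variance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def correlation_distance(z):
    """Pairwise feature distance 1 - r^2, with r the Pearson correlation."""
    r = np.corrcoef(z, rowvar=False)
    return 1.0 - r ** 2

def complete_linkage_labels(dist, n_clusters):
    """Naive agglomerative clustering with maximum-distance (complete) linkage."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest member distance.
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = np.empty(dist.shape[0], dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

def first_pc_signature(z, labels):
    """Scores of the first principal component within each feature cluster."""
    columns = []
    for label in np.unique(labels):
        block = z[:, labels == label]
        block = block - block.mean(axis=0)
        u, s, _ = np.linalg.svd(block, full_matrices=False)
        columns.append(u[:, 0] * s[0])
    return np.column_stack(columns)

# Synthetic stand-in for the tumor-by-feature matrix: 40 tumors, 12 features
# driven by 3 latent factors. Real radiomic features would replace this.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 3))
features = np.repeat(latent, 4, axis=1) + 0.1 * rng.normal(size=(40, 12))

z = zscore(features)
# Feature-level clustering (k = 3 is fixed here, not chosen by consensus clustering).
feature_labels = complete_linkage_labels(correlation_distance(z), n_clusters=3)
signature = first_pc_signature(z, feature_labels)

# Tumor-level clustering on the signatures (Euclidean distance is an assumption).
diff = signature[:, None, :] - signature[None, :, :]
tumor_dist = np.sqrt((diff ** 2).sum(axis=2))
phenotype = complete_linkage_labels(tumor_dist, n_clusters=2)
```

The naive pairwise search is cubic in the number of items, which is acceptable for a few hundred features; a library routine would be preferable for larger problems.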

### Statistical analysis

We used Kaplan–Meier (K–M) curves and the log-rank test to assess the univariable association between radiomic phenotype and each of progression-free survival (PFS) and overall survival (OS). We also used K–M curves to assess the association between these outcomes and each of the established prognostic clinical covariates of age, smoking status, and Eastern Cooperative Oncology Group (ECOG) performance score; patient line of therapy (first versus second or third); and the ctDNA-derived number of mutations. Further, Cox proportional-hazards regression models provided hazard ratios (HRs) and p values for the effect of each of these covariates. Retaining number of mutations and all other covariates that gave p ≤ 0.2 for association in a univariable model, we examined multivariable models both with and without radiomic phenotype. We evaluated Cox models using the likelihood-ratio test (LRT) both versus the null model and, for the multivariable model, versus the nested model without radiomic phenotype. Finally, model discrimination capacity was assessed via the concordance statistic (c-statistic), as modified by Uno et al.38, with the time horizon τ for each event type set to the longest time-to-event for that event type. As a subsidiary analysis, we also examined the K–M curves for PFS and OS by line of therapy received (first versus second or third) and by radiomic phenotype within strata of line of therapy.
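For readers less familiar with these tools, the following is a minimal pure-Python sketch of the Kaplan–Meier estimator and the two-group log-rank test. It is an illustration under simple assumptions (events coded 1, censoring coded 0; function names are hypothetical), not the analysis code used in the study, which relied on R.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same time (events and censorings).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value) on 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                 # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)    # at risk, group A
        d = sum(e for u, e, _ in pooled if u == t)                 # events at t
        d_a = sum(e for u, e, g in pooled if u == t and g == 0)    # events at t, group A
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Toy data (days, event indicator); not patient data.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
```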

To evaluate possible confounding, variations in CT acquisition including contrast-enhanced versus non-contrast-enhanced imaging, helical pitch, X-ray voltage, and tube current (Supplementary Table S1) were also tested for association both with radiomic phenotype via Fisher’s exact and Mann–Whitney–Wilcoxon tests and with outcome via K–M curves.

Statistical significance was tested throughout all analyses versus α = 0.05. We performed all data manipulation, statistical analysis, and plotting using Python (Ver. 3.7, Anaconda) and the R programming language (Ver. 3.5.1)39,40,41.

## Results

### Study sample

The median age in our study sample was 62 years, with 29 (72.5%) women, 21 former smokers (52.5%) and 19 never smokers (47.5%). All patients had a therapeutically targetable EGFR mutation detected by clinical ctDNA testing, including: EGFR exon 19 deletion, EGFR L858R, EGFR G719C/S768I, EGFR Exon 20 insertion, and EGFR T790M. Patients were followed for a median time of 328 days, range 29–835. All patients received the EGFR inhibitor indicated by their ctDNA testing, with 23 (57.5%) receiving the drug in the front-line setting and 17 (42.5%) in the later line setting (Table 1). Of the 40 patients, 11 died and 29 were censored (maximum time to death 676 days, median 339); 20 showed disease progression and 20 were censored (maximum time to progression 511 days, median 231). There was no statistically significant difference for any of the clinical covariates between phenotypes except for first versus later (second or third) line of TKI therapy (p = 0.01) (Table 1). The majority (15 of 23) of patients receiving front-line therapy were classified into phenotype 2, and the majority (13 of 17) of patients receiving a later line therapy into phenotype 1.
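The association between line of therapy and phenotype can be checked from the counts given above. The sketch below assumes a two-sided Fisher's exact test on the 2×2 table implied by the text (8 and 15 front-line patients, and 13 and 4 later-line patients, in phenotypes 1 and 2 respectively); the text does not state which test produced the reported p value, so this is an illustration rather than a reproduction.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Rows: front-line, later-line; columns: phenotype 1, phenotype 2.
p_value = fisher_exact_two_sided(8, 15, 13, 4)
```

Under these assumptions the p value comes out at roughly 0.012, in line with the reported p = 0.01.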

From the 429 initially extracted radiomic features assessed, feature-level clustering with PCA gave k = 27 derived features (Fig. 2), when the relative change in area under the cumulative distribution function (CDF) fell below 10%. Subsequent tumor-level clustering identified two distinct radiomic phenotypes, with 21 tumors in phenotype 1 and 19 in phenotype 2 (Fig. 3) (p < 0.001 for SigClust test of two clusters versus one). No significant associations were found between CT acquisition parameters (including contrast-enhanced versus non-contrast-enhanced imaging) and phenotype or outcome (Supplementary Tables S2, S3, Supplementary Fig. S1).

### Radiomic phenotype association with outcomes

Median PFS was 17 months for patients with radiomic phenotype 1 versus 10.4 months for those with phenotype 2 (median OS was not reached for either phenotype). The split between K–M curves for PFS resulted in a log-rank p = 0.03; in a univariable Cox model, the HR was 2.7 (95% confidence interval (CI) 1.1, 6.6) (p = 0.04) for tumors with radiomic phenotype 2 versus 1 (Fig. 4, Table 2). For OS, K–M curves dichotomized by phenotype resulted in a log-rank p = 0.11; in the corresponding univariable Cox model, the HR was 2.7 (95% CI 0.8, 9.2) (p = 0.12) for tumors with phenotype 2 versus 1 (Fig. 4, Table 3). When PFS and OS were analyzed by line of therapy, radiomic phenotype showed statistically significant separation of the K–M curves for both outcomes in patients who received second or third line of therapy (p < 0.005), whereas there was no appreciable separation for patients who received front-line EGFR-targeted therapy (p = 0.36 and p = 0.66 for PFS and OS, respectively) (Fig. 5). The ECOG performance score also showed association with PFS and OS (PFS: HR 3.56 [95% CI (1.64, 7.73)], p < 0.005; OS: HR 2.91 [95% CI (1.17, 7.24)], p = 0.02). Smoking status showed p < 0.2 in univariable modeling and so, along with ECOG performance, was retained in the multivariable model (Tables 2, 3).
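To make the univariable Cox estimates concrete, the sketch below fits a one-covariate Cox model by Newton–Raphson on the partial likelihood and compares it with the null model using a likelihood-ratio test. The Breslow treatment of ties, the function names, and the toy data are assumptions for illustration; this is not the study's R-based analysis.

```python
import math

def cox_univariable(times, events, x, n_iter=25):
    """Fit a one-covariate Cox model by Newton-Raphson (Breslow handling of ties).

    Returns (beta, log partial likelihood at beta, log partial likelihood at 0).
    """
    subjects = list(zip(times, events, x))

    def loglik_terms(beta):
        loglik = grad = info = 0.0
        for t_i, e_i, x_i in subjects:
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            risk = [x_j for t_j, _, x_j in subjects if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            loglik += beta * x_i - math.log(s0)
            grad += x_i - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        return loglik, grad, info

    beta = 0.0
    for _ in range(n_iter):
        _, grad, info = loglik_terms(beta)
        if info == 0:
            break
        step = grad / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, loglik_terms(beta)[0], loglik_terms(0.0)[0]

def lrt_p_value(loglik_full, loglik_reduced):
    """Likelihood-ratio test p value for one added parameter (chi-square, 1 df)."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data: x = 1 marks a hypothetical higher-risk group; not patient data.
beta, ll_full, ll_null = cox_univariable([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0])
hazard_ratio = math.exp(beta)
p_vs_null = lrt_p_value(ll_full, ll_null)
```

The same `lrt_p_value` helper applies to any pair of nested models that differ by a single parameter, such as a multivariable model with and without a binary phenotype indicator.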

### Radiomic phenotype association with outcomes when combined with clinical and liquid biopsy data

Age, smoking status, and ECOG performance status are established prognostic factors for metastatic NSCLC42,43 that are considered clinically in selecting a patient’s therapy. While ctDNA NGS is often used to detect therapeutically targetable mutations, the association of other ctDNA measures, such as the number of mutations detected, which may be a surrogate of tumor heterogeneity, has not been previously assessed. To determine the added value of radiomic phenotypes over ctDNA data and the established clinical prognostic covariates retained from univariable modeling, we next fitted multivariable Cox regression models that incorporated number of ctDNA-detected mutations, smoking status, and ECOG performance score, both with and without radiomic phenotype.

The PFS model without phenotype yielded a c-statistic of 0.73 (95% CI 0.59–0.86); a model using radiomic phenotype alone gave a c-statistic of 0.63 (95% CI 0.49–0.77); and including radiomic phenotype in the multivariable model increased the c-statistic to 0.77 (95% CI 0.64–0.89) with an LRT p < 0.005, suggesting that this model had a better fit than the model without phenotype (Table 4). The pattern was similar for OS. The OS multivariable model without radiomic phenotype yielded a c-statistic of 0.80 (95% CI 0.61–0.98); the model using phenotype alone had a c-statistic of 0.62 (95% CI 0.39–0.85); and adding radiomic phenotype to the multivariable model increased the c-statistic to 0.83 (95% CI 0.67–1.00) with an LRT p = 0.08 (Table 4).
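As a reminder of what the c-statistic measures, the sketch below computes Harrell's concordance index, which is a simpler relative of the Uno et al. estimator used in this study (Uno's version additionally weights pairs by the inverse probability of censoring). The function name and toy data are illustrative assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance: fraction of usable pairs ranked correctly by risk."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier failure has the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    return concordant / usable

# Toy data: higher score should mean earlier failure; not patient data.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
c_perfect = harrell_c_index(times, events, [4, 3, 2, 1])
c_reversed = harrell_c_index(times, events, [1, 2, 3, 4])
c_uninformative = harrell_c_index(times, events, [1, 1, 1, 1])
```

A value of 0.5 corresponds to an uninformative score and 1.0 to perfect ranking, which is the scale on which the reported c-statistics should be read.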

The full multivariable model of PFS, incorporating number of mutations, smoking status, ECOG performance score, and radiomic phenotype, yielded p < 0.005 for separation of K–M curves for patients above versus below the median prognostic score (Fig. 6). Of the covariates in this model, only ECOG performance status (HR 5.1 (95% CI 2.0–13.3) for each increment in grade, p < 0.005) and phenotype (HR 3.8 (95% CI 1.3–10.7) for tumors in radiomic phenotype 2 versus 1, p = 0.01) had statistically significant association for HR ≠ 1 (Table 2). The full multivariable model of OS also had p < 0.005 for separation of the K–M curves for patients above versus below the median prognostic score (Fig. 6). Of the covariates included, only ECOG performance status (HR 4.4 (95% CI 1.2, 16.6) for each increment in grade, p = 0.03) had statistically significant association for HR ≠ 1 (Table 3).

## Discussion

We have used computerized tomography (CT) images to identify patient subpopulations with radiomic phenotypes that show differing responses to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs). In particular, we combined several non-invasively gathered prognostic factors: clinical data from electronic medical records, circulating-tumor DNA (ctDNA) next-generation sequencing (NGS) ordered as standard of care, and radiomic features extracted from clinically acquired chest CT scans. A model including radiomic phenotype, number of mutations, smoking status, and ECOG performance score had better performance in predicting PFS than a model without radiomic phenotype, increasing the c-statistic from 0.73 to 0.77 (LRT p = 0.01). Similarly, for predicting OS, adding radiomic phenotype raised the c-statistic from 0.80 to 0.83 (LRT p = 0.08). Both augmented models showed statistically significant separation of K-M curves when split at their median prognostic score (p < 0.005 for both).

Although TKIs have dramatically changed the management of metastatic non-small cell lung cancer (NSCLC)5,8,44, the detection of a driver EGFR mutation in tumor tissue or ctDNA is necessary but insufficient for predicting response6,12. More than half of patients will experience initial response, but a substantial proportion will exhibit de novo or acquired resistance4. In addition, tumor tissue sampling can be difficult or impossible to access, especially for metastatic disease12. Therefore, there is an urgent need for non-invasive measures to more effectively stratify patients on to targeted therapy. Although studies suggest promising roles for both ctDNA and radiomics in complementing tissue biopsy, both have limitations when used in isolation: ctDNA sensitivity is less than ideal13, and radiomics are difficult to interpret in the absence of biologic correlates20. Finding useful radiomic signatures is also a substantial challenge, as the number of radiomic features continues to grow. In this study, we used correlation-based hierarchical clustering and principal component analysis to first mitigate feature dimensionality and then define distinct radiomic phenotypes of tumors based on the derived feature signatures.

While most previously published studies have focused on determining associations between radiomic features and EGFR mutation status19,21,22,23,24,25,26,27,28,29,32, which is a surrogate marker of TKI response, to the best of our knowledge our study is one of the first to evaluate the feasibility of combining radiomic features and mutation status data acquired from liquid biopsy to directly predict patient outcomes after EGFR-TKI therapy. In addition, while most prior studies have examined associations between individual radiomic features and EGFR mutations, our study sought to identify phenotypic signatures that represent intrinsic patterns in radiomic data. Our analysis showed a trend for association for radiomic phenotype with EGFR T790M mutation (p = 0.07), which is in line with prior studies20,32, although not specific to EGFR T790M. If further validated, radiomic analysis could provide an inexpensive, fast, and clinically feasible tool to identify patients at high risk of developing resistance mutations.

Our study also found a statistically significant association between phenotypes and first versus later lines of TKI therapy. Interestingly, phenotype 1, which had better PFS and OS outcomes, had a higher proportion of second and third line therapy patients (62%), whereas phenotype 2, which had worse outcomes, had a higher proportion of first-line patients (79%). One explanation is that radiomic phenotypes may be a surrogate of tumor heterogeneity. Such heterogeneity has been associated with inferior response and outcomes in patients receiving EGFR TKIs3. When visually examining the detected phenotypes, we observed that most cancers in phenotype 1 appear to be relatively smaller, with elongated shape, convex borders, and adjacent linear opacities, while cancers in phenotype 2 appear to be generally larger and have more ground-glass, irregular, and indistinct border characteristics (Fig. 7, Supplementary Fig. S2), suggestive of potential inflammatory changes that may be related to their worse outcomes. At the same time, the characteristics of the cancers clustered in phenotype 1 may also reflect the effect of prior therapy for the 13 of 17 patients receiving later line therapy.

Our study suggests that radiomic features may augment liquid biopsy and clinical prognostic factors to enhance precision oncology approaches for the management of advanced non-small cell lung cancer (NSCLC) patients. If validated, these radiomic phenotypes could be used to identify the subgroup of patients with less favorable outcomes to tyrosine kinase inhibitor (TKI) therapy who might benefit from combination therapy. Recently, for epidermal growth factor receptor (EGFR)-mutated NSCLC, the EGFR-TKI osimertinib has transitioned to the front-line treatment of choice based on the FLAURA trial48,49, and studies evaluating our radiomic phenotypes in this setting are ongoing. Future work will include an extension of this approach to other recently approved targeted therapies, such as the use of osimertinib as a front-line EGFR inhibitor, and TKIs targeting other mutations such as EML4-ALK and ROS1 translocations. Ultimately, our work could pave the way for application in broader settings for patients suffering from advanced NSCLC as well as other solid tumors for which targeted therapies are approved.

## Abbreviations

NSCLC:

Non-small cell lung cancer

EGFR:

Epidermal growth factor receptor

ctDNA:

Circulating-tumor DNA

K-M:

Kaplan–Meier

PFS:

Progression-free survival

OS:

Overall survival

LRT:

Likelihood ratio test

TKI:

Tyrosine kinase inhibitor

HRs:

Hazard ratios

ECOG:

Eastern Cooperative Oncology Group

## References

1. Yang, J. C. et al. Afatinib versus cisplatin-based chemotherapy for EGFR mutation-positive lung adenocarcinoma (LUX-Lung 3 and LUX-Lung 6): Analysis of overall survival data from two randomised, phase 3 trials. Lancet Oncol. 16(2), 141–151. https://doi.org/10.1016/S1470-2045(14)71173-8 (2015).

2. Ettinger, D. S. et al. Non-small cell lung cancer, Version 5.2017, NCCN Clinical Practice Guidelines in Oncology. J. Natl. Compr. Cancer Netw. JNCCN 15(4), 504–535. https://doi.org/10.6004/jnccn.2017.0050 (2017).

3. Hong, S. et al. Concomitant genetic alterations with response to treatment and epidermal growth factor receptor tyrosine kinase inhibitors in patients with EGFR-mutant advanced non-small cell lung cancer. JAMA Oncol. 4(5), 739–742. https://doi.org/10.1001/jamaoncol.2018.0049 (2018).

4. Marcar, L. et al. Acquired resistance of EGFR-mutated lung cancer to tyrosine kinase inhibitor treatment promotes PARP inhibitor sensitivity. Cell Rep. 27(12), 3422–3432. https://doi.org/10.1016/j.celrep.2019.05.058 (2019).

5. Saito, H. et al. Erlotinib plus bevacizumab versus erlotinib alone in patients with EGFR-positive advanced non-squamous non-small-cell lung cancer (NEJ026): Interim analysis of an open-label, randomised, multicentre, phase 3 trial. Lancet Oncol. 20(5), 625–635. https://doi.org/10.1016/S1470-2045(19)30035-X (2019).

6. Aggarwal, C. et al. Influence of TP53 mutation on survival in patients with advanced EGFR-Mutant non-small-cell lung cancer. JCO Precis. Oncol. https://doi.org/10.1200/PO.18.00107 (2018).

7. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376(22), 2109–2121. https://doi.org/10.1056/NEJMoa1616288 (2017).

8. Blakely, C. M. et al. Evolution and clinical impact of co-occurring genetic alterations in advanced-stage EGFR-mutant lung cancers. Nat. Genet. 49(12), 1693–1704. https://doi.org/10.1038/ng.3990 (2017).

9. McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7(283), 283ra54. https://doi.org/10.1126/scitranslmed.aaa1408 (2015).

10. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366(10), 883–892. https://doi.org/10.1056/NEJMoa1113205 (2012).

11. Sholl, L. M. et al. Multi-institutional oncogenic driver mutation analysis in lung adenocarcinoma: The lung cancer mutation consortium experience. J. Thorac. Oncol. 10(5), 768–777. https://doi.org/10.1097/JTO.0000000000000516 (2015).

12. Thompson, J. C. et al. Detection of therapeutically targetable driver and resistance mutations in lung cancer patients by next-generation sequencing of cell-free circulating tumor DNA. Clin. Cancer Res. 22(23), 5772–5782. https://doi.org/10.1158/1078-0432.CCR-16-1231 (2016).

13. Aggarwal, C. et al. Clinical implications of plasma-based genotyping with the delivery of personalized therapy in metastatic non-small cell lung cancer. JAMA Oncol. 5(2), 173–180. https://doi.org/10.1001/jamaoncol.2018.4305 (2019).

14. Khagi, Y. et al. Hypermutated circulating tumor DNA: Correlation with response to checkpoint inhibitor-based immunotherapy. Clin. Cancer Res. 23(19), 5729–5736. https://doi.org/10.1158/1078-0432.CCR-17-1439 (2017).

15. Oxnard, G. R. et al. Noninvasive detection of response and resistance in EGFR-mutant lung cancer using quantitative next-generation genotyping of cell-free plasma DNA. Clin. Cancer Res. 20(6), 1698–1705. https://doi.org/10.1158/1078-0432.CCR-13-2482 (2014).

16. Aerts, H. J. The potential of radiomic-based phenotyping in precision medicine: A review. JAMA Oncol. 2(12), 1636–1642. https://doi.org/10.1001/jamaoncol.2016.2631 (2016).

17. Napel, S., Mu, W., Jardim-Perassi, B. V., Aerts, H. & Gillies, R. J. Quantitative imaging of cancer in the postgenomic era: Radio(geno)mics, deep learning, and habitats. Cancer 124(24), 4633–4649. https://doi.org/10.1002/cncr.31630 (2018).

18. Aerts, H. J. et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5, 4006. https://doi.org/10.1038/ncomms5006 (2014).

19. Li, X. et al. Predictive power of a radiomic signature based on (18)F-FDG PET/CT images for EGFR mutational status in NSCLC. Front. Oncol. 9, 1062. https://doi.org/10.3389/fonc.2019.01062 (2019).

20. Rios Velazquez, E. et al. Somatic mutations drive distinct imaging phenotypes in lung cancer. Can. Res. 77(14), 3922–3930. https://doi.org/10.1158/0008-5472.CAN-17-0122 (2017).

21. Zhang, M. et al. Performance of (18)F-FDG PET/CT radiomics for predicting EGFR mutation status in patients with non-small cell lung cancer. Front. Oncol. 10, 568857. https://doi.org/10.3389/fonc.2020.568857 (2020).

22. Zhang, J. et al. Value of pre-therapy (18)F-FDG PET/CT radiomics in predicting EGFR mutation status in patients with non-small cell lung cancer. Eur. J. Nucl. Med. Mol. Imaging 47(5), 1137–1146. https://doi.org/10.1007/s00259-019-04592-1 (2020).

23. Wu, S., Shen, G., Mao, J. & Gao, B. CT radiomics in predicting EGFR mutation in non-small cell lung cancer: A single institutional study. Front. Oncol. 10, 542957. https://doi.org/10.3389/fonc.2020.542957 (2020).

24. Lu, L. et al. Radiomics prediction of EGFR status in lung cancer-our experience in using multiple feature extractors and the cancer imaging archive data. Tomography 6(2), 223–230. https://doi.org/10.18383/j.tom.2020.00017 (2020).

25. Liu, G. et al. 3D radiomics predicts EGFR mutation, exon-19 deletion and exon-21 L858R mutation in lung adenocarcinoma. Transl. Lung Cancer Res. 9(4), 1212–1224. https://doi.org/10.21037/tlcr-20-122 (2020).

26. Hong, D., Xu, K., Zhang, L., Wan, X. & Guo, Y. Radiomics Signature as a predictive factor for EGFR mutations in advanced lung adenocarcinoma. Front. Oncol. 10, 28. https://doi.org/10.3389/fonc.2020.00028 (2020).

27. Tu, W. et al. Radiomics signature: A potential and incremental predictor for EGFR mutation status in NSCLC patients, comparison with CT morphology. Lung Cancer 132, 28–35. https://doi.org/10.1016/j.lungcan.2019.03.025 (2019).

28. Jia, T. Y. et al. Identifying EGFR mutations in lung adenocarcinoma by noninvasive imaging using radiomics features and random forest modeling. Eur. Radiol. 29(9), 4742–4750. https://doi.org/10.1007/s00330-019-06024-y (2019).

29. Li, X. Y. et al. Detection of epithelial growth factor receptor (EGFR) mutations on CT images of patients with lung adenocarcinoma using radiomics and/or multi-level residual convolutionary neural networks. J. Thorac. Dis. 10(12), 6624–6635. https://doi.org/10.21037/jtd.2018.11.03 (2018).

30. Li, H. et al. CT-based radiomic signature as a prognostic factor in stage IV ALK-positive non-small-cell lung cancer treated with TKI crizotinib: A proof-of-concept study. Front. Oncol. 10, 57. https://doi.org/10.3389/fonc.2020.00057 (2020).

31. Mu, W. et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat. Commun. 11(1), 5228. https://doi.org/10.1038/s41467-020-19116-x (2020).

32. Li, S., Ding, C., Zhang, H., Song, J. & Wu, L. Radiomics for the prediction of EGFR mutation subtypes in non-small cell lung cancer. Med. Phys. 46(10), 4545–4552. https://doi.org/10.1002/mp.13747 (2019).

33. Song, Z. et al. The deep learning model combining CT image and clinicopathological information for predicting ALK fusion status and response to ALK-TKI therapy in non-small cell lung cancer patients. Eur. J. Nucl. Med. Mol. Imaging https://doi.org/10.1007/s00259-020-04986-6 (2020).

34. Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage 31(3), 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015 (2006).

35. van Griethuysen, J. J. M. et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 77(21), e104–e107. https://doi.org/10.1158/0008-5472.CAN-17-0339 (2017).

36. Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–240. https://doi.org/10.2307/2282967 (1963).

37. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52(1–2), 91–118. https://doi.org/10.1023/A:1023949509487 (2003).

38. Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30(10), 1105–1117. https://doi.org/10.1002/sim.4154 (2011).

39. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Published July 2018. Updated August 29, 2018. Accessed 12 Nov 2020.

40. Wickham, H. Tidyverse: Easily install and load the ‘tidyverse’. R package version. https://CRAN.R-project.org/package=tidyverse. Published December 21, 2018. Accessed 12 Nov 2020.

41. Therneau, T. M. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival. Published June 13, 2020. Accessed 12 Nov 2020.

42. Ferketich, A. K. et al. Smoking status and survival in the national comprehensive cancer network non-small cell lung cancer cohort. Cancer 119(4), 847–853. https://doi.org/10.1002/cncr.27824 (2013).

43. West, H. J. & Jin, J. O. JAMA Oncology Patient Page. Performance status in patients with cancer. JAMA Oncol. 1(7), 998. https://doi.org/10.1001/jamaoncol.2015.3113 (2015).

44. Maemondo, M. et al. Gefitinib or chemotherapy for non-small-cell lung cancer with mutated EGFR. N. Engl. J. Med. 362(25), 2380–2388. https://doi.org/10.1056/NEJMoa0909530 (2010).

45. Pavic, M. et al. Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncol. 57(8), 1070–1074. https://doi.org/10.1080/0284186X.2018.1445283 (2018).

46. Haarburger, C. et al. Radiomics feature reproducibility under inter-rater variability in segmentations of CT images. Sci. Rep. 10(1), 12688. https://doi.org/10.1038/s41598-020-69534-6 (2020).

47. Hershman, M. L. et al. Impact of interobserver variability in manual segmentation of non-small cell lung cancer (NSCLC) on computed tomography. Radiological Society of North America (RSNA, 2019).

48. Ramalingam, S. S. et al. Overall survival with osimertinib in untreated, EGFR-mutated advanced NSCLC. N. Engl. J. Med. 382(1), 41–50. https://doi.org/10.1056/NEJMoa1913662 (2020).

49. Soria, J. C. et al. Osimertinib in untreated EGFR-mutated advanced non-small-cell lung cancer. N. Engl. J. Med. 378(2), 113–125. https://doi.org/10.1056/NEJMoa1713137 (2018).

## Funding

Funding was provided by the National Cancer Institute at the National Institutes of Health (CA234225-02), the LUNGevity Foundation, the University of Pennsylvania Center of Precision Medicine (PCPM) and Emerging Cancer Informatics Center of Excellence (eCICE).

## Author information

### Contributions

Guarantors of integrity of entire study, J.C.T., E.L.C., D.K.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, M.L., B.Y.; clinical studies, M.L., S.I.K., J.C.T., E.L.C.; statistical analysis, E.A.C., W.T.H.; and manuscript editing, all authors.

### Corresponding author

Correspondence to Despina Kontos.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

### Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Yousefi, B., LaRiviere, M.J., Cohen, E.A. et al. Combining radiomic phenotypes of non-small cell lung cancer with liquid biopsy data may improve prediction of response to EGFR inhibitors. Sci Rep 11, 9984 (2021). https://doi.org/10.1038/s41598-021-88239-y

• DOI: https://doi.org/10.1038/s41598-021-88239-y