

Combining radiomic phenotypes of non-small cell lung cancer with liquid biopsy data may improve prediction of response to EGFR inhibitors

Abstract

Among non-small cell lung cancer (NSCLC) patients with therapeutically targetable tumor mutations in epidermal growth factor receptor (EGFR), not all patients respond to targeted therapy. Combining circulating-tumor DNA (ctDNA), clinical variables, and radiomic phenotypes may improve prediction of EGFR-targeted therapy outcomes for NSCLC. This single-center retrospective study included 40 EGFR-mutant advanced NSCLC patients treated with EGFR-targeted therapy. ctDNA data included the number of mutations and detection of EGFR T790M. Clinical data included age, smoking status, and ECOG performance status. Baseline chest CT scans were analyzed to extract 429 radiomic features from each primary tumor. Unsupervised hierarchical clustering was used to group tumors into phenotypes. Kaplan–Meier (K–M) curves and Cox proportional hazards regression were modeled for progression-free survival (PFS) and overall survival (OS). The likelihood ratio test (LRT) was used to compare fit between models. Among 40 patients (73% women, median age 62 years), consensus clustering identified two radiomic phenotypes. For PFS, the model combining radiomic phenotypes with ctDNA and clinical variables had a c-statistic of 0.77 and a better fit (LRT p = 0.01) than the model with clinical and ctDNA variables alone, which had a c-statistic of 0.73. For OS, adding radiomic phenotypes resulted in a c-statistic of 0.83 versus 0.80 when using clinical and ctDNA variables alone (LRT p = 0.08). Both models showed separation of K–M curves dichotomized by median prognostic score (p < 0.005). Combining radiomic phenotypes, ctDNA, and clinical variables may enhance precision oncology approaches to managing advanced non-small cell lung cancer with EGFR mutations.

Introduction

The discovery of activating mutations and the development of targeted therapies have improved survival in patients with non-small cell lung cancer (NSCLC)1. Mutation detection by tissue and circulating tumor DNA (ctDNA) next-generation sequencing (NGS) guides therapy selection both at initial diagnosis and at disease progression. Epidermal growth factor receptor (EGFR) mutations are the most common therapeutically targetable variants in NSCLC, and treatment with an EGFR tyrosine kinase inhibitor (TKI) has shown superior efficacy compared to standard chemotherapy in mutation-positive patients2. However, primary resistance occurs in 20–30% of patients3. Ultimately, all patients develop acquired resistance to EGFR-directed therapies, and an active area of research is the use of novel combination therapies, including antibodies against c-met, poly-adenosine diphosphate ribose polymerase inhibitors, and antiangiogenic therapies, along with EGFR-TKIs to improve long-term efficacy4,5.

Tumor heterogeneity is thought to play a role in TKI response and is associated with poor outcome6,7,8,9, as EGFR mutations may be suboptimal targets when they co-occur with other genetic alterations or are subclonally expressed8,9. Small tissue biopsies may not fully reflect tumor heterogeneity and can often be difficult to obtain10,11, with tissue NGS able to be completed for as few as 50% of patients12. Thus, developing non-invasive tests to assess the likelihood of response to an EGFR-TKI is critical for therapy selection. Studies have shown that ctDNA analysis represents a non-invasive biomarker that can improve targetable mutation detection, and that ctDNA molecular heterogeneity predicts clinical outcome13,14,15. Although ctDNA is useful clinically, its sensitivity remains less than ideal13.

An emerging non-invasive approach to characterize tumor heterogeneity is to analyze tumor imaging phenotypes16,17. Radiomics analysis enables the detection of tumor imaging features and patterns of intra-tumor heterogeneity not appreciable by the human eye, increasing the wealth of information from radiological imaging. Studies specifically suggest that radiomic analysis may provide novel prognostic markers related to gene-expression patterns and responder signatures for NSCLC patients receiving targeted therapy18,19,20,21,22,23,24,25,26,27,28,29,30,31. Most studies to date have focused on using radiomic analysis on computed tomography (CT) and/or positron emission tomography (PET)/CT data to predict EGFR mutation status, using statistical modeling or machine learning approaches for reducing the high dimensionality of radiomic features19,21,22,23,24,25,26,27,28,29,32. More recently, deep learning approaches have also been used to predict outcomes after TKI therapy for NSCLC31,33. While this field is rapidly developing, a question remains as to what extent radiomic analysis can complement established prognostic markers for TKIs, as most studies have either evaluated radiomic features in the absence of established prognostic biomarkers or have only examined surrogate endpoints, such as EGFR mutation status, rather than actual patient outcomes. In addition, to the best of our knowledge, no studies have evaluated radiomic analysis as a complement to liquid biopsy-based assessment, which is another promising non-invasive tool for characterizing tumor heterogeneity when predicting EGFR-TKI response.

The purpose of our study was to determine the feasibility of integrating radiomic features with ctDNA next-generation sequencing data to predict TKI outcomes in EGFR-mutant NSCLC. Our approach combines unsupervised hierarchical clustering and principal component analysis (PCA) of radiomic features extracted from clinically acquired CT scans to arrive at two distinct radiomic phenotypes. Our hypothesis is that integrating these radiomic phenotypes with ctDNA and clinical variables can improve assessment of tumor heterogeneity and prediction of outcomes of EGFR-targeted therapy for metastatic NSCLC.

Materials and methods

Study sample and data

This single-center, retrospective, observational study was conducted at the University of Pennsylvania from October 2016 to February 2019 and was approved by the Institutional Review Board with Health Insurance Portability and Accountability Act waiver of informed consent. All methods in this study were in accordance with the Declaration of Helsinki and informed consent was obtained from all the participants. Patients with metastatic NSCLC that had an actionable EGFR mutation detected by ctDNA next-generation sequencing and also had CT imaging data available for radiomic analysis were included. Based on these criteria, a total of 40 EGFR-mutant advanced NSCLC patients were included in the study. All patients were treated with the EGFR-TKI indicated by the clinical ctDNA next-generation sequencing result either at the time of diagnosis (n = 23) or suspected progression on a front-line EGFR-TKI (n = 17). The patients starting an EGFR-TKI at the time of diagnosis received afatinib (n = 8), erlotinib (n = 5), gefitinib (n = 1), or osimertinib (n = 9). All patients who had experienced progression on a front-line EGFR-TKI received osimertinib (n = 17). Baseline demographics, clinical data, including ctDNA targeted next-generation sequencing results (Guardant360 73 gene panel), and baseline CT scans were collected from the electronic medical record. ctDNA features measured included: allele fraction of the therapeutically targetable driver mutation, total number of co-existing mutations detected, and whether the EGFR T790M mutation was detected. Chest CT data included a total of 7 contrast-enhanced and 33 non-contrast enhanced scans, of which 24 were acquired with Siemens and 16 with a General Electric scanner (Supplementary Table S1). A board-certified, fellowship-trained thoracic radiologist (S.I.K.) with 18 years of clinical experience manually segmented the tumor area using the semi-automated ITK-SNAP software (version 3.6.0) (Fig. 1a)34.

Figure 1

Tumor segmentation and radiomic analysis. (a) Example of segmentation of a tumor expressing the epidermal growth factor receptor (EGFR) T790M mutation. (b) Workflow of radiomics analysis where the tumor is segmented in 3D, followed by radiomic feature extraction, and two-level hierarchical clustering to first reduce feature dimensionality and then cluster the derived radiomic signatures into distinct tumor phenotypes.

Radiomic feature extraction

A total of 429 radiomic features were extracted from each tumor’s entire volume using the PyRadiomics library35, representing nine types of descriptors: (1) First-order statistics, capturing the voxel gray-level intensities within a neighborhood. (2) Shape-based descriptors of the three-dimensional size and shape of the tumor measured on the whole tumor volume. (3) Gray level co-occurrence matrix features, calculated based on second-order joint probability functions of voxel intensities in a particular spatial relation, for all intensities and many spatial relations. (4) Gray level size zone matrix features, similar to gray level co-occurrence matrix features but rotation-independent. (5) Gray level run length matrix features, based on quantifying gray level runs as the lengths of consecutive pixels. (6) Gray level dependence matrix features, calculated as the number of connected voxels within a specified distance. (7) Neighboring gray tone difference matrix features, rotation-independent features based on gray-level relationships between neighboring voxels (for a certain distance between voxels). (8) Laplacian of Gaussian features, capturing information about edge detection in a smoothed image. (9) Wavelet features, giving information on the location, direction, and frequency of gray-level changes. All features were z-scored prior to further analysis.

Radiomic phenotype identification

We used the extracted features as input to a two-level hierarchical clustering algorithm: first, features were clustered and principal component analysis was used to reduce dimensionality and construct a feature-vector signature reflecting each tumor’s imaging phenotype (i.e., feature-level clustering); then the derived feature-vector signatures were clustered (i.e., tumor-level clustering) to identify intrinsic tumor phenotypes (Fig. 1b). Specifically, for Pearson’s correlation \(r\) between any two features, we defined \(1-{r}^{2}\) as the distance between the z-scored radiomic features, so that strongly covarying features are closer. Using this metric, we performed unsupervised hierarchical clustering of the extracted features with maximum-distance (complete) linkage36. To determine the optimal number of feature clusters, we used consensus clustering37 with a 10% cutoff for the minimum relative change in the area under the cumulative distribution function (CDF). We then performed PCA on each identified feature cluster and retained the first principal component (PC) from each cluster for all subsequent statistical modeling. As the features in each cluster covary strongly, the first PC should capture the dominant information in each feature cluster. With \(k\) denoting the number of feature clusters, dimensionality is thus reduced from the 429 measured radiomic features to \(k\), with \(k\) substantially lower than 429. Using the same unsupervised hierarchical approach as described above36,37, we used these derived PC feature signatures to cluster our sample into distinct radiomic tumor phenotypes, where the optimal number of phenotype clusters was again determined by consensus clustering37.
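The feature-level clustering step can be sketched as follows. This is a minimal illustration in plain Python, not the study's code: the helper names (`zscore`, `corr_distance`, `complete_linkage`) and the toy feature values are hypothetical, the number of clusters is passed in directly rather than chosen by consensus clustering, and the per-cluster PCA step is omitted.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize one feature across tumors (population SD; fails on constant features)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def corr_distance(zx, zy):
    """Distance 1 - r^2 between two z-scored features, r being Pearson's correlation."""
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
    return 1.0 - r * r

def complete_linkage(features, n_clusters):
    """Agglomerative clustering of features with maximum-distance (complete) linkage."""
    z = [zscore(f) for f in features]
    clusters = [[i] for i in range(len(z))]

    def dist(a, b):
        # complete linkage: distance between clusters is the largest pairwise distance
        return max(corr_distance(z[i], z[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [(dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)          # merge the two closest clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy data: 5 features measured on 5 tumors.
toy_features = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],     # strongly correlated with feature 0
    [5, 4, 3, 2, 1],        # perfectly anti-correlated with feature 0
    [1, -1, 1, -1, 1],
    [3, -3, 3, -3, 2.5],    # strongly correlated with feature 3
]
clusters = complete_linkage(toy_features, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Because the distance uses the square of r, a strongly anti-correlated feature (feature 2 in the toy data) lands in the same cluster as the feature it mirrors, which is the intended behavior when the goal is to group redundant features.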

Statistical analysis

We used Kaplan–Meier (K–M) curves and the log-rank test to assess the univariable association between radiomic phenotype and each of progression-free survival (PFS) and overall survival (OS). We also used K–M curves to assess the association between these outcomes and each of the established prognostic clinical covariates of age, smoking status, and Eastern Cooperative Oncology Group (ECOG) performance score; patient line of therapy (first versus second or third); and the ctDNA-derived number of mutations. Further, Cox proportional-hazards regression models provided hazard ratios (HRs) and p values for the effect of each of these covariates. Retaining number of mutations and all other covariates that gave p ≤ 0.2 for association in a univariable model, we examined multivariable models both with and without radiomic phenotype. We evaluated Cox models using the likelihood-ratio test (LRT) both versus the null model and, for the multivariable model, versus the nested model without radiomic phenotype. Finally, model discrimination capacity was assessed via the concordance statistic (c-statistic), as modified by Uno et al.38, with the time horizon τ for each event type set to the longest time-to-event for that event type. As a subsidiary analysis, we also examined the K–M curves for PFS and OS by the line of therapy a patient received (first versus second or third) and by radiomic phenotype within strata of line of therapy.
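As a rough illustration of the nested-model comparison, the sketch below turns two Cox partial log-likelihoods into an LRT p value. It assumes that adding the binary radiomic phenotype adds exactly one parameter, so the test statistic is referred to a chi-square distribution with 1 degree of freedom; the function name and the log-likelihood values are hypothetical and are not taken from the study.

```python
import math

def lrt_pvalue_1df(loglik_nested, loglik_full):
    """Likelihood-ratio test for one added parameter.

    Returns (statistic, p value), where the statistic is
    2 * (loglik_full - loglik_nested) and the p value is the upper tail
    of a chi-square distribution with 1 degree of freedom.
    """
    stat = 2.0 * (loglik_full - loglik_nested)
    if stat < 0:
        raise ValueError("the full model cannot fit worse than the nested model")
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical partial log-likelihoods, chosen only to illustrate the call.
stat, p = lrt_pvalue_1df(loglik_nested=-50.0, loglik_full=-48.0)
print(round(stat, 2), round(p, 3))
```

The closed form via `math.erfc` only holds for a single added parameter; comparing models that differ by more covariates would need a general chi-square tail function.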

To evaluate possible confounding, variations in CT acquisition including contrast-enhanced versus non-contrast-enhanced imaging, helical pitch, X-ray voltage, and tube current (Supplementary Table S1) were also tested for association both with radiomic phenotype via Fisher’s exact and Mann–Whitney–Wilcoxon tests and with outcome via K–M curves.

Statistical significance was tested throughout all analyses versus α = 0.05. We performed all data manipulation, statistical analysis, and plotting using Python (Ver. 3.7, Anaconda) and the R programming language (Ver. 3.5.1)39,40,41.

Results

Study sample

The median age in our study sample was 62 years, with 29 (72.5%) women, 21 former smokers (52.5%) and 19 never smokers (47.5%). All patients had a therapeutically targetable EGFR mutation detected by clinical ctDNA testing, including: EGFR exon 19 deletion, EGFR L858R, EGFR G719C/S768I, EGFR Exon 20 insertion, and EGFR T790M. Patients were followed for a median time of 328 days, range 29–835. All patients received the EGFR inhibitor indicated by their ctDNA testing, with 23 (57.5%) receiving the drug in the front-line setting and 17 (42.5%) in the later line setting (Table 1). Of the 40 patients, 11 died and 29 were censored (maximum time to death 676 days, median 339); 20 showed disease progression and 20 were censored (maximum time to progression 511 days, median 231). There was no statistically significant difference for any of the clinical covariates between phenotypes except for first versus later (second or third) line of TKI therapy (p = 0.01) (Table 1). The majority (15 of 23) of patients receiving front-line therapy were classified into phenotype 2, and the majority (13 of 17) of patients receiving a later line therapy into phenotype 1.
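The counts in this paragraph define a 2 × 2 table of line of therapy by radiomic phenotype: 8 front-line patients in phenotype 1 and 15 in phenotype 2, versus 13 later-line patients in phenotype 1 and 4 in phenotype 2. The text does not state which test produced the reported p = 0.01, so the following is only a sketch that assumes a two-sided Fisher's exact test; the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(k):
        # unnormalized probability of k in the top-left cell (exact integers)
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / comb(n, col1)

# Rows: front-line, later-line. Columns: phenotype 1, phenotype 2.
p_line_vs_phenotype = fisher_exact_two_sided(8, 15, 13, 4)
print(round(p_line_vs_phenotype, 4))
```

Under that assumption the result is about 0.012, in line with the reported p = 0.01 after rounding.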

Table 1 Patient characteristics.

Radiomic phenotype identification

From the 429 radiomic features initially extracted, feature-level clustering with PCA gave k = 27 derived features (Fig. 2), the point at which the relative change in area under the cumulative distribution function (CDF) fell below 10%. Subsequent tumor-level clustering identified two distinct radiomic phenotypes, with 21 tumors in phenotype 1 and 19 in phenotype 2 (Fig. 3) (p < 0.001 for SigClust test of two clusters versus one). No significant associations were found between CT acquisition parameters (including contrast-enhanced versus non-contrast-enhanced imaging) and phenotype or outcome (Supplementary Tables S2, S3, Supplementary Fig. S1).

Figure 2

Selection of derived radiomic features. Cumulative distribution function (CDF) and consensus clustering are used to determine the optimum number of clusters of radiomic features. The red arrow in (a) represents the point (k = 27) where the relative change in CDF drops below 10%; (b) shows the clustered dendrogram corresponding to the 27 derived features.

Figure 3

Heatmap of radiomic derived features. Unsupervised hierarchical clustering identifies two distinct, and statistically significant (p < 0.05), tumor radiomic phenotypes. Association of these phenotypes with study covariates is shown by the top colorbars. Driver AF is the percent allele fraction for the detected epidermal growth factor receptor (EGFR) driver mutation. EGFR T790M refers to those patients for whom the mutation was detected in circulating-tumor DNA.

Radiomic phenotype association with outcomes

Median PFS was 17 months for patients with radiomic phenotype 1 versus 10.4 months for those with phenotype 2 (median OS was not reached for either phenotype). The K–M curves for PFS separated with log-rank p = 0.03; in a univariable Cox model, the HR was 2.7 (95% confidence interval (CI) 1.1, 6.6; p = 0.04) for tumors with radiomic phenotype 2 versus 1 (Fig. 4, Table 2). For OS, K–M curves dichotomized by phenotype gave log-rank p = 0.11; in the corresponding univariable Cox model, the HR was 2.7 (95% CI 0.8, 9.2; p = 0.12) for tumors with phenotype 2 versus 1 (Fig. 4, Table 3). When PFS and OS were analyzed by line of therapy, radiomic phenotype showed statistically significant separation of the K–M curves for both outcomes in patients who received second or third line of therapy (p < 0.005), whereas there was no appreciable separation for patients who received front-line EGFR-targeted therapy (p = 0.36 and p = 0.66 for PFS and OS, respectively) (Fig. 5). The ECOG performance score also showed association with PFS and OS (PFS: HR 3.56 [95% CI (1.64, 7.73)], p < 0.005; OS: HR 2.91 [95% CI (1.17, 7.24)], p = 0.02). Smoking status showed p < 0.2 in univariable modeling and so, along with ECOG performance, was retained in the multivariable model (Tables 2, 3).
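For readers unfamiliar with how the K–M curves and median survival times are obtained, a minimal product-limit estimator is sketched below in plain Python. The function names and the toy follow-up data are hypothetical; the study itself reports using Python and R, and this is not its code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time for each patient
    events : 1 if the event (progression or death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        observed = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)   # events plus censored at t
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None   # median not reached within follow-up

# Hypothetical toy cohort: times in days, 1 = event, 0 = censored.
toy_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
toy_median = median_survival(toy_curve)
print(toy_curve, toy_median)
```

A return value of `None` corresponds to the "median not reached" situation reported for OS, where the estimated survival never falls to 50% during follow-up.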

Figure 4

Survival analysis by radiomic phenotype. Progression-free survival (top) and overall survival (bottom) analysis for radiomic phenotypes.

Table 2 Progression-free survival Cox regression hazard ratios.
Table 3 Overall survival Cox regression hazard ratios.
Figure 5

Survival analysis by line of therapy. Kaplan–Meier curves for (top row) progression-free survival and (bottom row) overall survival in first-line patients (left) and second- and third-line patients (right), showing that the radiomic tumor phenotypes can further sub-stratify patients in the second or third line of treatment.

Radiomic phenotype association with outcomes when combined with clinical and liquid biopsy data

Age, smoking status, and ECOG performance status are established prognostic factors for metastatic NSCLC42,43 that are considered clinically in selecting a patient’s therapy. While ctDNA NGS is often used to detect therapeutically targetable mutations, the association of outcomes with other ctDNA measures, such as the number of mutations detected, which may be a surrogate of tumor heterogeneity, has not been previously assessed. To determine the added value of radiomic phenotypes to ctDNA data and the established clinical prognostic covariates retained from univariable modeling, we next calculated multivariable Cox regression models that incorporated number of ctDNA-detected mutations, smoking status, and ECOG performance score, both with and without radiomic phenotype.

The PFS model without phenotype yielded a c-statistic of 0.73 (95% CI 0.59–0.86); a model using radiomic phenotype alone gave a c-statistic of 0.63 (95% CI 0.49–0.77); and including radiomic phenotype in the multivariable model increased the c-statistic to 0.77 (95% CI 0.64–0.89) with an LRT p < 0.005, suggesting that this model had a better fit than the model without phenotype (Table 4). The pattern was similar for OS. The OS multivariable model without radiomic phenotype yielded a c-statistic of 0.80 (95% CI 0.61–0.98); the model using phenotype alone had a c-statistic of 0.62 (95% CI 0.39–0.85); and adding radiomic phenotype to the multivariable model increased the c-statistic to 0.83 (95% CI 0.67–1.00) with an LRT p = 0.08 (Table 4).
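To make the c-statistic concrete, the sketch below computes Harrell's concordance index, a simpler relative of the Uno et al. version used in this study (Uno's estimator additionally reweights pairs to account for censoring and truncates at a time horizon τ). The function name and toy data are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when patient i has an observed event strictly
    before patient j's follow-up time. The pair is concordant when the
    patient with the earlier event also has the higher risk score; tied
    scores count as half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / usable

# Hypothetical toy data: higher score means higher predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
print(harrell_c(toy_times, toy_events, [4, 3, 2, 1]))   # risk ordering matches outcomes
print(harrell_c(toy_times, toy_events, [1, 1, 1, 1]))   # uninformative score
```

A value of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect ranking, which is the scale on which the 0.73 versus 0.77 and 0.80 versus 0.83 comparisons above should be read.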

Table 4 Predictive ability of Cox regression models for progression-free and overall survival.

The full multivariable model of PFS, incorporating number of mutations, smoking status, ECOG performance score, and radiomic phenotype, yielded p < 0.005 for separation of K–M curves for patients above versus below the median prognostic score (Fig. 6). Of the covariates in this model, only ECOG performance status (HR 5.1 (95% CI 2.0–13.3) for each increment in grade, p < 0.005) and phenotype (HR 3.8 (95% CI 1.3–10.7) for tumors in radiomic phenotype 2 versus 1, p = 0.01) had a statistically significant association (HR ≠ 1) (Table 2). The full multivariable model of OS also had p < 0.005 for separation of the K–M curves for patients above versus below the median prognostic score (Fig. 6). Of the covariates included, only ECOG performance status (HR 4.4 (95% CI 1.2, 16.6) for each increment in grade, p = 0.03) had a statistically significant association (HR ≠ 1) (Table 3).
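The median split on prognostic score can be illustrated as follows, assuming the prognostic score is the Cox linear predictor (the sum of each covariate multiplied by its log hazard ratio). The ECOG and phenotype hazard ratios are the PFS values quoted above; the coefficients for number of mutations and smoking status are placeholders, since their values appear only in Table 2, and the four patients are hypothetical.

```python
import math
from statistics import median

# Log hazard ratios. ECOG (5.1) and phenotype (3.8) come from the text;
# the other two values are placeholders, not results from the study.
coefs = {
    "ecog": math.log(5.1),
    "phenotype2": math.log(3.8),
    "n_mutations": 0.1,   # placeholder
    "smoker": 0.2,        # placeholder
}

def prognostic_scores(patients, coefs):
    """Cox linear predictor for each patient: sum of coefficient * covariate."""
    return [sum(coefs[name] * p[name] for name in coefs) for p in patients]

def split_at_median(scores):
    """Label each patient as above ('high') or at/below ('low') the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical patients, for illustration only.
toy_patients = [
    {"ecog": 0, "phenotype2": 0, "n_mutations": 1, "smoker": 0},
    {"ecog": 1, "phenotype2": 0, "n_mutations": 2, "smoker": 1},
    {"ecog": 0, "phenotype2": 1, "n_mutations": 3, "smoker": 0},
    {"ecog": 2, "phenotype2": 1, "n_mutations": 1, "smoker": 1},
]
toy_scores = prognostic_scores(toy_patients, coefs)
toy_groups = split_at_median(toy_scores)
print(toy_groups)
```

The two resulting groups are what a K–M plot and log-rank test would then compare; patients exactly at the median are assigned to the low-risk group here, which is a design choice rather than something specified in the text.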

Figure 6

Survival analysis using multivariable model. Progression-free survival (top) and overall survival (bottom) analysis for the full multivariable model, including number of mutations, smoking status, Eastern Cooperative Oncology Group (ECOG) performance score, and radiomic phenotype.

Discussion

We have used computed tomography (CT) images to identify patient subpopulations with radiomic phenotypes that show differing responses to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs). In particular, we combined several non-invasively gathered prognostic factors: clinical data from electronic medical records, circulating-tumor DNA (ctDNA) next-generation sequencing (NGS) ordered as standard of care, and radiomic features extracted from clinically acquired chest CT scans. A model including radiomic phenotype, number of mutations, smoking status, and ECOG performance score had better performance in predicting PFS than a model without radiomic phenotype, increasing the c-statistic from 0.73 to 0.77 (LRT p = 0.01). Similarly, for predicting OS, adding radiomic phenotype raised the c-statistic from 0.80 to 0.83 (LRT p = 0.08). Both augmented models showed statistically significant separation of K–M curves when split at their median prognostic score (p < 0.005 for both).

Although TKIs have dramatically changed the management of metastatic non-small cell lung cancer (NSCLC)5,8,44, the detection of a driver EGFR mutation in tumor tissue or ctDNA is necessary but insufficient for predicting response6,12. More than half of patients will experience an initial response, but a substantial proportion will exhibit de novo or acquired resistance4. In addition, tumor tissue can be difficult or impossible to sample, especially in metastatic disease12. Therefore, there is an urgent need for non-invasive measures to more effectively stratify patients for targeted therapy. Although studies suggest promising roles for both ctDNA and radiomics in complementing tissue biopsy, both have limitations when used in isolation: ctDNA sensitivity is less than ideal13, and radiomics are difficult to interpret in the absence of biologic correlates20. Finding useful radiomic signatures is also a substantial challenge, as the number of radiomic features continues to grow. In this study, we used correlation-based hierarchical clustering and principal component analysis to first reduce feature dimensionality and then define distinct radiomic phenotypes of tumors based on the derived feature signatures.

While most previously published studies have focused on determining associations between radiomic features and EGFR mutation status19,21,22,23,24,25,26,27,28,29,32, which is a surrogate marker of TKI response, to the best of our knowledge our study is one of the first to evaluate the feasibility of combining radiomic features and mutation status data acquired from liquid biopsy to directly predict patient outcomes after EGFR-TKI therapy. In addition, while most prior studies have examined associations between individual radiomic features and EGFR mutations, our study sought to identify phenotypic signatures that represent intrinsic patterns in radiomic data. Our analysis showed a trend toward an association between radiomic phenotype and the EGFR T790M mutation (p = 0.07), which is in line with prior studies20,32, although those were not specific to EGFR T790M. If further validated, radiomic analysis could provide an inexpensive, fast, and clinically feasible tool to identify patients at high risk of developing resistance mutations.

Our study also found a statistically significant association between phenotypes and first versus later lines of TKI therapy. Interestingly, phenotype 1, which had better PFS and OS outcomes, included a higher proportion of second- and third-line therapy patients (62%), whereas phenotype 2, which had worse outcomes, included a higher proportion of first-line patients (79%). One explanation is that radiomic phenotypes may be a surrogate of tumor heterogeneity. Such heterogeneity has been associated with inferior response and outcomes in patients receiving EGFR TKIs3. When visually examining the detected phenotypes, we observed that most cancers in phenotype 1 appear to be relatively smaller, with elongated shape, convex borders, and adjacent linear opacities, while cancers in phenotype 2 appear to be generally larger and have more ground-glass, irregular, and indistinct border characteristics (Fig. 7, Supplementary Fig. S2), suggestive of potential inflammatory changes that may be related to their worse outcomes. At the same time, the characteristics of the cancers clustered in phenotype 1 may also reflect the effect of prior therapy for the 13 of 17 patients receiving later-line therapy.

Figure 7

Representative tumors from the two phenotypes. Examples demonstrate the relatively smaller, elongated shape, convex borders and adjacent linear opacities for phenotype 1 versus the larger size, ground-glass, irregular, and indistinct border characteristics for phenotype 2 suggestive of potential inflammatory changes that may be related to their observed worse PFS and OS outcomes.

Limitations of our study must also be noted. Our study sample is relatively small. As this is a proof of concept, our findings must be validated in larger future studies with independent cohorts. In addition, we used manual segmentation of tumors by only one human expert. While studies have shown that in general tumor segmentation and radiomic feature extraction could be affected by inter-rater variability45, recent studies suggest that such variability may not necessarily affect the robustness of all radiomic features46. In a preliminary evaluation, we also recently showed that despite inter-reader variation, radiomic features extracted from segmentations obtained by different human raters tend to be highly correlated and have similar predictive value47. Our future larger studies should seek to further evaluate the effect of reader segmentation on radiomic features, and ideally utilize fully automated algorithms. Our study also combines radiomic features from both contrast-enhanced and non-contrast-enhanced CT scans as well as from different scanners and acquisition protocols. While acknowledging that such acquisition factors may have an effect on the extracted radiomic features, our analysis showed that the use of contrast agent, helical pitch, X-ray tube voltage, and tube current did not appear to confound the detected phenotypes. Nevertheless, our relatively small sample size did not confer sufficient statistical power to perform rigorous stratified analysis across all possible acquisition factors and fully evaluate image acquisition effects. We are encouraged that, despite the potential noise introduced by such effects, we were able to detect radiomic phenotypes with statistically significant associations with outcomes, and we plan to further explore the effect of CT acquisition on radiomic phenotypes in our future larger studies. Finally, our study sample included a mix of patients who had received either first or later line TKI, with our models being more strongly predictive of survival for the latter group. Nevertheless, despite this heterogeneity of patients, our fully combined multivariable model can more accurately predict survival than any one set of covariates alone.

Conclusion

Our study suggests that radiomic features may augment liquid biopsy and clinical prognostic factors to enhance precision oncology approaches for the management of advanced non-small cell lung cancer (NSCLC) patients. If validated, these radiomic phenotypes could be used to identify the subgroup of patients with less favorable outcomes to tyrosine kinase inhibitor (TKI) therapy who might benefit from combination therapy. Recently, for epidermal growth factor receptor (EGFR)-mutated NSCLC, the EGFR-TKI osimertinib has become the front-line treatment of choice based on the FLAURA trial48,49, and studies evaluating our radiomic phenotypes in this setting are ongoing. Future work will include an extension of this approach to other recently approved targeted therapies, such as the use of osimertinib as a front-line EGFR inhibitor, and to TKIs targeting other alterations such as EML4-ALK and ROS1 translocations. Ultimately, our work could pave the way for application in broader settings for patients suffering from advanced NSCLC as well as other solid tumors for which targeted therapies are approved.



Abbreviations

NSCLC: Non-small cell lung cancer
EGFR: Epidermal growth factor receptor
ctDNA: Circulating-tumor DNA
K–M: Kaplan–Meier
PFS: Progression-free survival
OS: Overall survival
LRT: Likelihood ratio test
TKI: Tyrosine kinase inhibitor
HR: Hazard ratio
ECOG: Eastern Cooperative Oncology Group


  1. Yang, J. C. et al. Afatinib versus cisplatin-based chemotherapy for EGFR mutation-positive lung adenocarcinoma (LUX-Lung 3 and LUX-Lung 6): Analysis of overall survival data from two randomised, phase 3 trials. Lancet Oncol. 16(2), 141–151. (2015).

    CAS  Article  PubMed  Google Scholar 

  2. Ettinger, D. S. et al. Non-small cell lung cancer, Version 5.2017, NCCN Clinical Practice Guidelines in Oncology. J. Natl. Compr. Cancer Netw. JNCCN 15(4), 504–535. (2017).

    Article  Google Scholar 

  3. Hong, S. et al. Concomitant genetic alterations with response to treatment and epidermal growth factor receptor tyrosine kinase inhibitors in patients with EGFR-mutant advanced non-small cell lung cancer. JAMA Oncol. 4(5), 739–742. (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  4. Marcar, L. et al. Acquired resistance of EGFR-mutated lung cancer to tyrosine kinase inhibitor treatment promotes PARP inhibitor sensitivity. Cell Rep. 27(12), 3422–3432. (2019).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  5. Saito, H. et al. Erlotinib plus bevacizumab versus erlotinib alone in patients with EGFR-positive advanced non-squamous non-small-cell lung cancer (NEJ026): Interim analysis of an open-label, randomised, multicentre, phase 3 trial. Lancet Oncol. 20(5), 625–635. (2019).

    CAS  Article  PubMed  Google Scholar 

  6. Aggarwal, C. et al. Influence of TP53 mutation on survival in patients with advanced EGFR-Mutant non-small-cell lung cancer. JCO Precis. Oncol. (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  7. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376(22), 2109–2121. (2017).

    CAS  Article  Google Scholar 

  8. Blakely, C. M. et al. Evolution and clinical impact of co-occurring genetic alterations in advanced-stage EGFR-mutant lung cancers. Nat. Genet. 49(12), 1693–1704. (2017).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  9. McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7(283), 283–254. (2015).

  10. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366(10), 883–892. (2012).

  11. Sholl, L. M. et al. Multi-institutional oncogenic driver mutation analysis in lung adenocarcinoma: The lung cancer mutation consortium experience. J. Thorac. Oncol. 10(5), 768–777. (2015).

  12. Thompson, J. C. et al. Detection of therapeutically targetable driver and resistance mutations in lung cancer patients by next-generation sequencing of cell-free circulating tumor DNA. Clin. Cancer Res. 22(23), 5772–5782. (2016).

  13. Aggarwal, C. et al. Clinical implications of plasma-based genotyping with the delivery of personalized therapy in metastatic non-small cell lung cancer. JAMA Oncol. 5(2), 173–180. (2019).

  14. Khagi, Y. et al. Hypermutated circulating tumor DNA: Correlation with response to checkpoint inhibitor-based immunotherapy. Clin. Cancer Res. 23(19), 5729–5736. (2017).

  15. Oxnard, G. R. et al. Noninvasive detection of response and resistance in EGFR-mutant lung cancer using quantitative next-generation genotyping of cell-free plasma DNA. Clin. Cancer Res. 20(6), 1698–1705. (2014).

  16. Aerts, H. J. The potential of radiomic-based phenotyping in precision medicine: A review. JAMA Oncol. 2(12), 1636–1642. (2016).

  17. Napel, S., Mu, W., Jardim-Perassi, B. V., Aerts, H. & Gillies, R. J. Quantitative imaging of cancer in the postgenomic era: Radio(geno)mics, deep learning, and habitats. Cancer 124(24), 4633–4649. (2018).

  18. Aerts, H. J. et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 5, 4006. (2014).

  19. Li, X. et al. Predictive power of a radiomic signature based on (18)F-FDG PET/CT images for EGFR mutational status in NSCLC. Front. Oncol. 9, 1062. (2019).

  20. Rios Velazquez, E. et al. Somatic mutations drive distinct imaging phenotypes in lung cancer. Can. Res. 77(14), 3922–3930. (2017).

  21. Zhang, M. et al. Performance of (18)F-FDG PET/CT radiomics for predicting EGFR mutation status in patients with non-small cell lung cancer. Front. Oncol. 10, 568857. (2020).

  22. Zhang, J. et al. Value of pre-therapy (18)F-FDG PET/CT radiomics in predicting EGFR mutation status in patients with non-small cell lung cancer. Eur. J. Nucl. Med. Mol. Imaging 47(5), 1137–1146. (2020).

  23. Wu, S., Shen, G., Mao, J. & Gao, B. CT radiomics in predicting EGFR mutation in non-small cell lung cancer: A single institutional study. Front. Oncol. 10, 542957. (2020).

  24. Lu, L. et al. Radiomics prediction of EGFR status in lung cancer-our experience in using multiple feature extractors and the cancer imaging archive data. Tomography 6(2), 223–230. (2020).

  25. Liu, G. et al. 3D radiomics predicts EGFR mutation, exon-19 deletion and exon-21 L858R mutation in lung adenocarcinoma. Transl. Lung Cancer Res. 9(4), 1212–1224. (2020).

  26. Hong, D., Xu, K., Zhang, L., Wan, X. & Guo, Y. Radiomics Signature as a predictive factor for EGFR mutations in advanced lung adenocarcinoma. Front. Oncol. 10, 28. (2020).

  27. Tu, W. et al. Radiomics signature: A potential and incremental predictor for EGFR mutation status in NSCLC patients, comparison with CT morphology. Lung Cancer 132, 28–35. (2019).

  28. Jia, T. Y. et al. Identifying EGFR mutations in lung adenocarcinoma by noninvasive imaging using radiomics features and random forest modeling. Eur. Radiol. 29(9), 4742–4750. (2019).

  29. Li, X. Y. et al. Detection of epithelial growth factor receptor (EGFR) mutations on CT images of patients with lung adenocarcinoma using radiomics and/or multi-level residual convolutionary neural networks. J. Thorac. Dis. 10(12), 6624–6635. (2018).

  30. Li, H. et al. CT-based radiomic signature as a prognostic factor in stage IV ALK-positive non-small-cell lung cancer treated with TKI crizotinib: A proof-of-concept study. Front. Oncol. 10, 57. (2020).

  31. Mu, W. et al. Non-invasive decision support for NSCLC treatment using PET/CT radiomics. Nat. Commun. 11(1), 5228. (2020).

  32. Li, S., Ding, C., Zhang, H., Song, J. & Wu, L. Radiomics for the prediction of EGFR mutation subtypes in non-small cell lung cancer. Med. Phys. 46(10), 4545–4552. (2019).

  33. Song, Z. et al. The deep learning model combining CT image and clinicopathological information for predicting ALK fusion status and response to ALK-TKI therapy in non-small cell lung cancer patients. Eur. J. Nucl. Med. Mol. Imaging (2020).

  34. Yushkevich, P. A. et al. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage 31(3), 1116–1128. (2006).

  35. van Griethuysen, J. J. M. et al. Computational radiomics system to decode the radiographic phenotype. Can. Res. 77(21), e104–e107. (2017).

  36. Ward, J. H. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–240. (1963).

  37. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52(1–2), 91–118. (2003).

  38. Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30(10), 1105–1117. (2011).

  39. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Published July 2018. Updated August 29, 2018. Accessed 12 Nov 2020.

  40. Wickham H. Tidyverse: Easily install and load the ‘tidyverse’. R package version. Published December 21, 2018. Accessed 12 Nov 2020.

  41. Therneau TM. A Package for Survival Analysis in R. Published June 13, 2020. Accessed 12 Nov 2020.

  42. Ferketich, A. K. et al. Smoking status and survival in the national comprehensive cancer network non-small cell lung cancer cohort. Cancer 119(4), 847–853. (2013).

  43. West, H. J. & Jin, J. O. JAMA Oncology Patient Page. Performance status in patients with cancer. JAMA Oncol. 1(7), 998. (2015).

  44. Maemondo, M. et al. Gefitinib or chemotherapy for non-small-cell lung cancer with mutated EGFR. N. Engl. J. Med. 362(25), 2380–2388. (2010).

  45. Pavic, M. et al. Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncol. 57(8), 1070–1074. (2018).

  46. Haarburger, C. et al. Radiomics feature reproducibility under inter-rater variability in segmentations of CT images. Sci. Rep. 10(1), 12688. (2020).

  47. Hershman, M. L. et al. Impact of interobserver variability in manual segmentation of non-small cell lung cancer (NSCLC) on computed tomography. Radiological Society of North America (RSNA, 2019).

  48. Ramalingam, S. S. et al. Overall survival with osimertinib in untreated, EGFR-mutated advanced NSCLC. N. Engl. J. Med. 382(1), 41–50. (2020).

  49. Soria, J. C. et al. Osimertinib in untreated EGFR-mutated advanced non-small-cell lung cancer. N. Engl. J. Med. 378(2), 113–125. (2018).

Funding was provided by the National Cancer Institute at the National Institutes of Health (CA234225-02), the LUNGevity Foundation, the University of Pennsylvania Center of Precision Medicine (PCPM) and Emerging Cancer Informatics Center of Excellence (eCICE).

Author information

Guarantors of integrity of entire study, J.C.T., E.L.C., D.K.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, M.L., B.Y.; clinical studies, M.L., S.I.K., J.C.T., E.L.C.; statistical analysis, E.A.C., W.T.H.; and manuscript editing, all authors.

Corresponding author

Correspondence to Despina Kontos.

Ethics declarations

Competing interests

The authors declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Yousefi, B., LaRiviere, M.J., Cohen, E.A. et al. Combining radiomic phenotypes of non-small cell lung cancer with liquid biopsy data may improve prediction of response to EGFR inhibitors. Sci Rep 11, 9984 (2021).
