## Introduction

The discovery of activating mutations and the development of targeted therapies has improved survival in patients with non-small cell lung cancer (NSCLC)1. Mutation detection by tissue and circulating tumor DNA (ctDNA) next-generation sequencing (NGS) guides therapy selection both at initial diagnosis and disease progression. Epidermal growth factor receptor (EGFR) mutations are the most common therapeutically targetable variants in NSCLC, and treatment with an EGFR tyrosine kinase inhibitor (TKI) has shown superior efficacy compared to standard chemotherapy in mutation-positive patients2. However, primary resistance occurs in 20–30% of patients3. Ultimately, all patients develop acquired resistance to EGFR-directed therapies and an active area of research is the use of novel combination therapies, including antibodies against c-met, poly-adenosine diphosphate ribose polymerase inhibitors and antiangiogenic therapies along with EGFR-TKIs to improve long-term efficacy4,5.

Tumor heterogeneity is thought to play a role in TKI response and is associated with poor outcome6,7,8,9, as EGFR mutations may be suboptimal targets when they co-occur with genetic alternations or are subclonally expressed8,9. Small tissue biopsies may not fully reflect tumor heterogeneity and can often be difficult to obtain10,11, with tissue NGS only able to be completed for as few as 50% of patients12. Thus, developing non-invasive tests to assess the likelihood of response to an EGFR-TKI is critical for therapy selection. Studies have shown that ctDNA analysis represents a non-invasive biomarker that can improve targetable mutation detection, and that ctDNA molecular heterogeneity predicts clinical outcome13,14,15. Although useful clinically, however, ctDNA sensitivity remains less than ideal13.

An emerging non-invasive approach to characterize tumor heterogeneity is to analyze tumor imaging phenotypes16,17. Radiomics analysis enables the detection of tumor imaging features and patterns of intra-tumor heterogeneity not appreciable by the human eye, increasing the wealth of information from radiological imaging. Studies specifically suggest that radiomic analysis may provide novel prognostic markers related to gene-expression patterns and responder signatures for NSCLC patients receiving targeted therapy18,19,20,21,22,23,24,25,26,27,28,29,30,31. Most studies to date have focused on using radiomic analysis on computed tomography (CT) and/or positron emission tomography (PET)/CT data to predict EGFR mutation status, using statistical modeling or machine learning approaches for reducing the high dimensionality of radiomic features19,21,22,23,24,25,26,27,28,29,32. More recently deep learning approaches have also been used to predict outcomes after TKI therapy for NSCLC31,33. While this field is rapidly developing, a question still remains as to which extent radiomic analysis can complement established prognostic markers for TKIs, as most studies have either evaluated radiomic features in the absence of established prognostic biomarkers or have only examined surrogate endpoints, such as EGFR mutation status, rather than actual patient outcomes. In addition, and to the best of our knowledge, no studies have evaluated radiomic analysis in the context of complementing liquid biopsy-based assessment, which is another promising non-invasive tool for characterizing tumor heterogeneity when predicting EGFR-TKIs response.

The purpose of our study was to determine the feasibility of integrating radiomics features with ctDNA next-generation sequencing data to predict TKI outcomes in EGFR mutant NSCLC. Our approach combines unsupervised hierarchical clustering and principal component analysis (PCA) of radiomic features extracted from clinically acquired CT scans, to arrive at two distinct radiomic phenotypes. Our hypothesis is that integrating these radiomic phenotypes with ctDNA and clinical variables can improve assessment of tumor heterogeneity and outcome prediction to EGFR-targeted therapy for metastatic NSCLC.

## Materials and methods

### Study sample and data

This single-center, retrospective, observational study was conducted at the University of Pennsylvania from October 2016 to February 2019 and was approved by the Institutional Review Board with Health Insurance Portability and Accountability Act waiver of informed consent. All methods in this study were in accordance with the Declaration of Helsinki and informed consent was obtained from all the participants. Patients with metastatic NSCLC that had an actionable EGFR mutation detected by ctDNA next-generation sequencing and also had CT imaging data available for radiomic analysis were included. Based on these criteria, a total of 40 EGFR-mutant advanced NSCLC patients were included in the study. All patients were treated with the EGFR-TKI indicated by the clinical ctDNA next-generation sequencing result either at the time of diagnosis (n = 23) or suspected progression on a front-line EGFR-TKI (n = 17). The patients starting an EGFR-TKI at the time of diagnosis received afatinib (n = 8), erlotinib (n = 5), gefitinib (n = 1), or osimertinib (n = 9). All patients who had experienced progression on a front-line EGFR-TKI received osimertinib (n = 17). Baseline demographics, clinical data, including ctDNA targeted next-generation sequencing results (Guardant360 73 gene panel), and baseline CT scans were collected from the electronic medical record. ctDNA features measured included: allele fraction of the therapeutically targetable driver mutation, total number of co-existing mutations detected, and whether the EGFR T790M mutation was detected. Chest CT data included a total of 7 contrast-enhanced and 33 non-contrast enhanced scans, of which 24 were acquired with Siemens and 16 with a General Electric scanner (Supplementary Table S1). A board-certified, fellowship-trained thoracic radiologist (S.I.K.) with 18 years of clinical experience manually segmented the tumor area using the semi-automated ITK-SNAP software (version 3.6.0) (Fig. 1a)34.

### Radiomic feature extraction

A total of 429 radiomic features were extracted from each tumor’s entire volume using the PyRadiomics library35, representing nine type of descriptors: (1) First-order statistics, capturing the voxel grey-level intensities within a neighborhood. (2) Shape-based descriptors of the three-dimensional size and shape of the tumor measured on the whole tumor volume. (3) Gray level co-occurrence matrix features, calculated based on second-order joint probability functions of voxel intensities in a particular spatial relation, for all intensities and many spatial relations. (4) Gray level size zone matrix features, similar to gray level co-occurrence matrix features but rotation-independent. (5) Gray level run length matrix features, based on quantifying gray level runs as the lengths of consecutive pixels. (6) Gray level dependence matrix features, calculated as the number of connected voxels within a specified distance. (7) Neighboring gray tone difference matrix features, rotation-independent features based on gray-level relationships between neighboring voxels (for a certain distance between voxels). (8) Laplacian of Gaussian features, capturing information about edge detection in a smoothed image. (9) Wavelet features, giving information on the location, direction, and frequency of gray-level changes. All features were z-scored prior to further analysis.

### Radiomic phenotype identification

We used the extracted features as input to a two-level hierarchical clustering algorithm: first, features were clustered and principal component analysis was used to reduce dimensionality and construct a feature-vector signature reflecting each tumor’s imaging phenotype (i.e., feature-level clustering); then the derived feature vector signatures were clustered (i.e., tumor-level clustering) to identify intrinsic tumor phenotypes (Fig. 1b). Specifically, for Pearson’s correlation $$r$$ between any two features, we defined $$1-{r}^{2}$$ as a metric for the distance between the z-scored radiomic features, with strongly covarying features being closer. Using this metric, we performed unsupervised hierarchical clustering, applying the maximum distance linkage on the extracted features36. To determine the optimal number of feature clusters we used consensus clustering37 with a 10% cutoff for minimum change in the cumulative density function. We then performed PCA on each identified feature cluster and retained the first principal component (PC) from each cluster for all subsequent statistical modeling. As the features in each cluster covary strongly, the first PC should capture the dominant information in each feature cluster. Where $$k$$ is the number of feature clusters, dimensionality is thus reduced from 429 total radiomic features measured to $$k$$, with $$k$$ substantially lower than 429. Using the same unsupervised hierarchical approach as described above36,37 we used these derived PC feature signatures to cluster our sample into distinct radiomic tumor phenotypes, where the optimal number of phenotype clusters was deemed by consensus clustering37.

### Statistical analysis

We used Kaplan–Meier (K–M) curves and log-rank test to assess the univariable association between radiomic phenotype and each of progression-free survival (PFS) and overall survival (OS). We also used K–M curves to assess the association between these outcomes and each of the established prognostic clinical covariates of age, smoking status, and Eastern Cooperative Oncology Group (ECOG) performance score; patient line of therapy (first versus second or third); and the ctDNA-derived number of mutations. Further, Cox proportional-hazards regression models provided hazard ratios (HRs) and p values for the effect of each of these covariates. Retaining number of mutations and all other covariates that gave p ≤ 0.2 for association in a univariable model, we examined multivariable models both with and without radiomic phenotype. We evaluated Cox models using the likelihood-ratio test (LRT) both versus the null model, and, for the multivariable model, versus the nested model without radiomic phenotype. Finally, model discrimination capacity was assessed via the concordance statistic (c-statistic), as modified by Uno et al.38, with a time horizon for each event type of τ = the longest time-to-event for that event type. As a subsidiary analysis, we also examined the K-M curves for PFS and OS versus what line of therapy a patient received—first versus second or third—and for radiomic phenotype within strata of line of therapy.

To evaluate possible confounding, variations in CT acquisition including contrast-enhanced versus non-contrast-enhanced imaging, helical pitch, X-ray voltage, and tube current (Supplementary Table S1) were also tested for association both with radiomic phenotype via Fisher’s exact and Mann–Whitney–Wilcoxon tests and with outcome via K–M curves.

Statistical significance was tested throughout all analyses versus α = 0.05. We performed all data manipulation, statistical analysis, and plotting using Python (Ver. 3.7, Anaconda) and the R programming language (Ver. 3.5.1)39,40,41.

## Results

### Study sample

The median age in our study sample was 62 years, with 29 (72.5%) women, 21 former smokers (52.5%) and 19 never smokers (47.5%). All patients had a therapeutically targetable EGFR mutation detected by clinical ctDNA testing, including: EGFR exon 19 deletion, EGFR L858R, EGFR G719C/S768I, EGFR Exon 20 insertion, and EGFR T790M. Patients were followed for a median time of 328 days, range 29–835. All patients received the EGFR inhibitor indicated by their ctDNA testing, with 23 (57.5%) receiving the drug in the front-line setting and 17 (42.5%) in the later line setting (Table 1). Of the 40 patients, 11 died and 29 were censored (maximum time to death 676 days, median 339); 20 showed disease progression and 20 were censored (maximum time to progression 511 days, median 231). There was no statistically significant difference for any of the clinical covariates between phenotypes except for first versus later (second or third) line of TKI therapy (p = 0.01) (Table 1). The majority (15 of 23) of patients receiving front-line therapy were classified into phenotype 2, and the majority (13 of 17) of patients receiving a later line therapy into phenotype 1.

### Radiomic phenotype identification

From the 429 initially extracted radiomic features assessed, feature-level clustering with PCA gave k = 27 derived features (Fig. 2), when the relative change in area under the cumulative distribution function (CDF) fell below 10%. Subsequent tumor-level clustering identified two distinct radiomic phenotypes, with 21 tumors in phenotype 1 and 19 in phenotype 2 (Fig. 3) (p < 0.001 for SigClust test of two clusters versus one). No significant associations were found between CT acquisition parameters (including contrast-enhanced versus non-contrast-enhanced imaging) and phenotype or outcome (Supplementary Tables S2, S3, Supplementary Fig. S1).

### Radiomic phenotype association with outcomes

Median PFS was 17 months for patients with radiomic phenotype 1 versus 10.4 months for those with phenotype 2 (median OS was not reached for either phenotype). The split between K–M curves for PFS resulted into log-rank p = 0.03; in a univariable Cox model, the HR 2.7 (95% confidence interval (CI) 1.1, 6.6) (p = 0.04) for tumors with radiomic phenotype 2 versus 1 (Fig. 4, Table 2). In OS, K–M curves dichotomized by phenotype resulted in a log-rank p = 0.11; in the corresponding univariable Cox model, the HR 2.7 (95% CI 0.8, 9.2) (p = 0.12) for tumors with phenotype 2 versus 1 (Fig. 4, Table 3). When PFS and OS were analyzed by line of therapy, radiomic phenotype showed statistically significant separation of the K–M curves for both outcomes in patients who received second or third line of therapy (p < 0.005), whereas there was no appreciable separation for patients who received front-line EGFR-targeted therapy (p = 0.36 and p = 0.66 for PFS and OS, respectively) (Fig. 5). The ECOG performance score also showed association with PFS and OS (PFS: HR 3.56 [95% CI (1.64, 7.73)], p < 0.005; OS: HR 2.91 [95% CI (1.17, 7.24)], p = 0.02). Smoking status showed p < 0.2 in univariable modeling and so, along with ECOG performance, was retained in the multivariable model (Tables 2, 3).

### Radiomic phenotype association with outcomes when combined with clinical and liquid biopsy data

Age, smoking status, and ECOG performance status are established prognostic factors for metastatic NSCLC42,43 that are considered clinically in selecting a patient’s therapy. While ctDNA NGS is often used to detect therapeutically targetable mutations, the association of other ctDNA measures, such as the number of mutations detected which may be a surrogate of tumor heterogeneity, have not been previously assessed. To determine the added value of radiomic phenotypes, to ctDNA data and established clinical prognostic covariates retained from univariable modeling, we next calculated multivariable Cox regression models that incorporated number of ctDNA-detected mutations, smoking status, and ECOG performance score, both with and without radiomic phenotype.

The PFS model without phenotype yielded a c-statistic of 0.73 (95% CI 0.59–0.86); a model using radiomic phenotype alone gave a c-statistic of 0.63 (95% CI 0.49–0.77); and including radiomic phenotype in the multivariable model increased the c-statistic to 0.77 (95% CI 0.64–0.89) with an LRT p < 0.005, suggesting that this model had a better fit than the model without phenotype (Table 4). The pattern was similar for OS. The OS multivariable model without radiomic phenotype yielded a c-statistic of 0.8 (95% CI 0.61–0.98); the model using phenotype alone had a c-statistic of 0.62 (95% CI 0.39–0.85); and adding radiomic phenotype to the multivariable model increased the c-statistic to 0.83 (95% CI 0.67–1) with an LRT p = 0.08 (Table 4).

The full multivariable model of PFS, incorporating number of mutations, smoking status, ECOG performance score, and radiomic phenotype, yielded p < 0.005 for separation of K–M curves for patients above versus below the median prognostic score (Fig. 6). Of the covariates in this model, only ECOG performance status (HR 5.1 (95% CI 2.0–13.3) for each increment in grade, p < 0.005) and phenotype (HR 3.8 (95% CI 1.3–10.7) for tumors in radiomic phenotype 2 versus 1, p = 0.01) had statistically significant association for HR ≠ 1 (Table 2). The full multivariable model of OS also had p < 0.005 for separation of the K–M curves for patients above versus below the median prognostic score (Fig. 6). Of the covariates included, only ECOG performance status (HR 4.4 (95% CI 1.2, 16.6) for each increment in grade, p = 0.03), had statistically significant association for HR ≠ 1. (Table 3).

## Discussion

We have used computerized tomography (CT) images to identify patient subpopulations with radiomic phenotypes that show differing responses to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs). In particular, we combined several non-invasively gathered prognostic factors: clinical data from electronic medical records, circulating-tumor DNA (ctDNA) next-generation sequencing (NGS) ordered as standard of care, and radiomic features extracted from clinically acquired chest CT scans. A model including radiomic phenotype, number of mutations, smoking status, and ECOG performance score had better performance in predicting PFS than a model without radiomic phenotype, increasing the c-statistic from 0.73 to 0.77 (LRT p = 0.01). Similarly, for predicting OS, adding radiomic phenotype raised the c-statistic from 0.80 to 0.83 (LRT p = 0.08). Both augmented models showed statistically significant separation of K-M curves when split at their median prognostic score (p < 0.005 for both).

Although TKIs have dramatically changed the management of metastatic non-small cell lung cancer (NSCLC)5,8,44, the detection of a driver EGFR mutation in tumor tissue or ctDNA is necessary but insufficient for predicting response6,12. More than half of patients will experience initial response, but a substantial proportion will exhibit de novo or acquired resistance4. In addition, tumor tissue sampling can be difficult or impossible to access, especially for metastatic disease12. Therefore, there is an urgent need for non-invasive measures to more effectively stratify patients on to targeted therapy. Although studies suggest promising roles for both ctDNA and radiomics in complementing tissue biopsy, both have limitations when used in isolation: ctDNA sensitivity is less than ideal13, and radiomics are difficult to interpret in the absence of biologic correlates20. Finding useful radiomic signatures is also a substantial challenge, as the number of radiomic features continues to grow. In this study, we used correlation-based hierarchical clustering and principal component analysis to first mitigate feature dimensionality and then define distinct radiomic phenotypes of tumors based on the derived feature signatures.

While most previously published studies have focused on determining associations between radiomic features and EGFR mutation status19,21,22,23,24,25,26,27,28,29,32, which is a surrogate marker of TKI response, to the best of our knowledge our study is one of the first to evaluate the feasibility of combining radiomic features and mutation status data acquired from liquid biopsy to directly predict patient outcomes after EGFR-TKI therapy. In addition, while most prior studies have examined associations between individual radiomic features and EGFR mutations, our study sought to identify phenotypic signatures that represent intrinsic patterns in radiomic data. Our analysis showed a trend for association for radiomic phenotype with EGFR T790M mutation (p = 0.07), which is in line with prior studies20,32, although not specific to EGFR T790M. If further validated, radiomic analysis could provide an inexpensive, fast, and clinically feasible tool to identify patients at high risk of developing resistance mutations.

Our study also found a statistically significant association between phenotypes and first versus later lines of TKI therapy. Interestingly, phenotype 1 which had better PFS and OS outcomes had a higher number of second and third line therapy patients (62%), whereas phenotype 2 which had worse outcomes had a higher number of first-line patients (79%). One explanation may be that radiomic phenotypes may be a surrogate of tumor heterogeneity. Such heterogeneity has been associated with inferior response and outcomes in patients receiving EGFR TKIs3. When visually examining the detected phenotypes, we observed that most cancers in phenotype 1 appear to be relatively smaller, with elongated shape, convex borders and adjacent linear opacities, while cancers in phenotype 2 appear to be generally larger, and have more ground-glass, irregular, and indistinct border characteristics (Fig. 7, Supplementary Fig. S2), suggestive of potential inflammatory changes that may be related to their worse outcomes. At the same time, the characteristics of the cancers clustered in phenotype 1 may potentially also reflect the effect of prior therapy for the 13 of 17 patients receiving later line therapy.

Limitations of our study must also be noted. Our study sample is relatively small. As a proof of concept, it is important that our findings must be validated in larger future studies with independent cohorts. In addition, we used manual segmentation of tumors by only one human expert. While studies have shown that in general tumor segmentation and radiomic feature extraction could be affected by inter-rater variability45, recent studies suggest that such variability may not necessarily affect the robustness of all radiomic features46. In a preliminary evaluation, we also recently showed that despite inter-reader variation, radiomic features extracted from segmentations obtained by different human raters tend to be highly correlated and have similar predictive value47. Our future larger studies should seek to further evaluate the effect of reader segmentation on radiomic features, and ideally utilize fully-automated algorithms. Our study also combines radiomic features from both contrast-enhanced and non-contrast enhanced CT scans as well as from different scanners and acquisition protocols. While acknowledging that such acquisition factors may have an effect on the extracted radiomic features, our analysis showed that the use of contrast agent, spiral pitch, X-ray tube voltage and current did not appear to confound the detected phenotypes. Nevertheless, our relatively small sample size did not confer statistical power to rigorously perform stratified analysis across all possible acquisition factors to fully evaluate image acquisition effects. We are encouraged that despite the potential noise introduced by such effects we were able to detect radiomic phenotypes with statistically significant associations with outcomes and plan to further explore the effect of CT acquisition on radiomic phenotypes in our future larger studies. Finally, our study sample included a mix of patients who had received either first or later line TKI, with our models being more strongly predictive of survival for the latter group. Nevertheless, despite this heterogeneity of patients, our fully-combined multivariable model can more accurately predict survival than any one set of covariates alone.

Our study suggests that radiomic features may augment liquid biopsy and clinical prognostic factors to enhance precision oncology approaches for the management of advanced non-small cell lung cancer (NSCLC) patients. If validated, these radiomic phenotypes could be used to identify the subgroup of patients with less favorable outcomes to tyrosine kinase inhibitor (TKI) therapy who might benefit from combination therapy. Recently, for epidermal growth factor receptor (EGFR)-mutated NSCLC, the EGFR-TKI, osimertinib, has transitioned to the front-line treatment of choice based on the FLAURA trial48,49 and studies evaluating our radiomic phenotypes in this setting are ongoing. Future work, will include an extension of this approach to other recently approved targeted therapies, such as the use of osimertinib as a front-line EGFR inhibitor, and TKIs targeting other mutations such as EML4-ALK and ROS1 translocations. Ultimately, our work could pave the way for application in broader settings for patients suffering from advanced NSCLC as well as other solid tumors for which targeted therapies are approved.