Value of CT features for predicting EGFR mutations and ALK positivity in patients with lung adenocarcinoma

The aim of this study was to identify the relationships of epidermal growth factor receptor (EGFR) mutations and anaplastic large-cell lymphoma kinase (ALK) status with CT characteristics in lung adenocarcinoma using the largest patient cohort to date. Chest CT findings obtained preoperatively and prior to any treatment were retrospectively evaluated in 827 surgically resected lung adenocarcinomas. All patients were tested for EGFR mutations and ALK status. EGFR mutations were found in 489 (59.1%) patients, and ALK positivity was found in 57 (7.0%). By logistic regression, the most significant independent predictive factors of EGFR effective mutations were female sex, nonsmoker status, GGO, air bronchograms and pleural retraction. For EGFR mutation prediction, receiver operating characteristic (ROC) curves yielded areas under the curve (AUCs) of 0.682 for clinical variables alone and 0.758 for clinical variables combined with CT features, with a significant difference (p < 0.001). Furthermore, the exon 21 mutation rate in GGO was significantly higher than the exon 19 mutation rate (p = 0.029). The most significant independent predictive factors of ALK positivity were age, solid-predominant-subtype tumours, mucinous lung adenocarcinoma, solid tumours and no air bronchograms on CT. ROC curve analysis showed that for predicting ALK positivity, the use of clinical variables combined with CT features (AUC = 0.739) was superior to the use of clinical variables alone (AUC = 0.657), with a significant difference (p = 0.0082). The use of CT features may help characterize tumours and more accurately identify patient populations who will benefit from targeted therapies.


Results
Clinical characteristics. A total of 827 eligible patients (average age, 59 ± 9 years; 418 males) were included in the study, and their clinical and pathological characteristics are summarized in Table 2. EGFR mutations were found in 489 (59.1%) patients, and the mutation rates for exons 21, 19, 20 and 18 were 49.5%, 44.0%, 3.1% and 3.4%, respectively. ALK positivity was found in 57 (7.0%) patients. Six patients had concomitant EGFR mutations and ALK positivity (0.7%). EGFR mutations were more common in females than in males (p < 0.001) and in those who had never smoked (p < 0.001). No significant association was found between age or TNM stage and EGFR mutation (p = 0.320 and p = 0.831, respectively). Pathologically, EGFR mutation was associated with a high frequency of the lepidic predominant subtypes (p < 0.001) and a low frequency of lymph node metastasis proven by surgery (p = 0.006). Pleural invasion did not differ between patients with wild-type and mutant disease (p = 0.268). ALK positivity was found more frequently in younger patients (p < 0.001) (Fig. 1a), and the optimal cut-off value for age was 56 years (Fig. 1c). A high frequency of solid growth or mucinous patterns was observed in ALK-positive tumours in the present study. There was no significant difference in sex, smoking history, TNM stage, lymph node metastasis or pleural invasion between the ALK-positive and ALK-negative groups (Table 2).
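The age cut-off reported above was obtained from ROC curve analysis. The text does not state which criterion defined the optimal point; a common choice is the Youden index (sensitivity + specificity - 1), and the minimal Python sketch below assumes that criterion. The function name `youden_cutoff` and the toy data are hypothetical and are not the study data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    Assumption: a case is called positive when value <= cutoff,
    which matches a marker such as younger age.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        # True positives: positive cases at or below the candidate cut-off.
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v <= cutoff)
        # True negatives: negative cases above the candidate cut-off.
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v > cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical toy data (NOT the study data): ages and ALK status (1 = positive).
ages = [45, 50, 52, 56, 58, 60, 63, 65, 70, 72]
alk_positive = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

cutoff, sens, spec = youden_cutoff(ages, alk_positive)
print(cutoff, sens, spec)
```

Scanning every observed value as a candidate threshold keeps the sketch simple; statistical packages may use a different criterion or tie-breaking rule, so a reproduced cut-off should be checked against the software actually used.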
Interobserver agreement of CT interpretations. The intraclass correlation coefficient for maximum tumour diameter was good at 0.963 (95% CI: 0.898, 0.978). The concordance between the two observers was also good, with κ coefficients ranging between 0.613 and 0.984 (Table 3).
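As an illustration of how a κ coefficient for one categorical CT sign could be computed, the sketch below assumes that the coefficient is Cohen's kappa for two readers, which the text does not state explicitly. The function name `cohen_kappa` and the toy ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (NOT the study data): 1 = sign present, 0 = absent.
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reader_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 3))
```

In this toy example the readers agree on 8 of 10 lesions and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.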

Differences in CT features between EGFR exon 21 and -19 mutations.
The exon 21 mutation rate in GGOs was significantly higher than the exon 19 mutation rate (34% vs 24%, p = 0.029). However, no differences in sex, smoking history, predominant subtype, type, tumour maximum diameter, air bronchogram, margin, or lymphadenopathy were found between patients with EGFR exon 19 and exon 21 mutations (Table 5).
(Table 7). ROC curve analysis showed that the use of clinical variables combined with CT features (AUC = 0.739) was superior to the use of clinical variables alone (AUC = 0.657) for the prediction of ALK positivity, and a significant difference was found between them (p = 0.0082) (Fig. 4b).
Correlation analysis of histopathology subtype with lesion texture on CT. Tumours displaying GGOs on CT correlated positively with the lepidic-predominant subtype (β = 0.325, p < 0.001). A positive correlation was also found between lesions with a solid appearance on CT and the solid-predominant subtype or mucinous adenocarcinoma (β = 0.363, p < 0.001). www.nature.com/scientificreports/

Discussion
The EGFR mutation rate has been reported to be 27-56% in Asian patients 13,21,22. The EGFR mutation rate in our study was similar (59.1%), and most mutations were in exons 19 and 21 (93.5%). Previous studies have reported that females and nonsmokers have an increased risk of EGFR mutations 17,23, which was confirmed in our series. A recent study found that EGFR mutations occurred more commonly in early-stage NSCLC patients than in advanced-stage patients 24. The present study showed that the EGFR mutation rate in lung adenocarcinoma of TNM stage I-II (55.6%) was similar to that in stage III-IV (56.5%). This discrepancy may be due to differences in the samples included in the two studies. Moreover, in our study, EGFR mutations were associated with a small tumour size and a low frequency of lymph node metastasis proven by surgery, suggesting a low TNM stage in those cases. In addition, our study demonstrated that EGFR mutations were significantly more common in lepidic predominant adenocarcinomas 13,14. This result is supported by gene expression profiling microarray studies in which high EGFR mutation frequencies were observed in terminal respiratory unit adenocarcinoma 13,25. Several studies have explored the correlation between EGFR gene mutations and GGOs on CT 11-15. For example, Glynn et al. and Sugano et al. 11,12 found no significant association between GGOs and EGFR mutations, whereas Liu et al. 13 reported that a GGO appearance and 15 other CT features were significantly associated with EGFR mutations. In the present study, three main results regarding correlations between GGO and EGFR mutations were identified. First, EGFR mutations were more frequently associated with GGOs on CT, consistent with most previous studies 10,13,17. This finding may be due to the inverse relationship between the replication of EGFR and the percentage of GGOs on CT 26,27.
Moreover, this result was supported by the pathological-imaging correlation in this study: tumours displaying GGOs on CT correlated positively with the lepidic-predominant subtype. Second, our series showed that mixed GGO (mGGO) lesions are more likely to harbour EGFR mutations than pure GGO (pGGO) or solid lesions. Hsu et al. 15 stated that EGFR mutations are more common in invasive solid patterns and significantly less common in pGGO patterns in stage I lung adenocarcinoma. A possible explanation for the above findings is that EGFR mutations promote the conversion of pGGO to mGGO and indicate more aggressive behaviour. Third, GGOs were significantly more strongly associated with exon 21 mutations than with exon 19 mutations, in line with Lee et al. 14. The results indicated that GGO is not only a predictive factor for EGFR mutation but also a predictive factor for EGFR exon 21 mutation.
Table 2. Association between clinical characteristics and EGFR and ALK status in adenocarcinoma. * P values < 0.05 were based on comparisons between the two groups; & According to the IASLC 8th TNM Lung Cancer Staging System; $ Lepidic predominant includes: adenocarcinoma in situ, minimally invasive adenocarcinoma, and lepidic predominant invasive adenocarcinoma; @ Other subtypes include: acinar, papillary, micropapillary, and solid predominant adenocarcinoma, as well as variants of invasive mucinous adenocarcinoma; % Other subtypes include: lepidic predominant acinar, papillary, micropapillary, as well as variants of invasive adenocarcinoma; EGFR, epidermal growth factor receptor; ALK, anaplastic large-cell lymphoma kinase; EGFR+, EGFR mutation; EGFR-, EGFR wild type; ALK+, ALK positive; ALK-, ALK negative.
(7) 83 (25) 119 (14) 17 (30) 102 (13) 119 (14)
Mucinous predominant 7 (1) 23 (7) 30 (4) 10 (18) 20 (3) 30 (4)
Micropapillary 8 (2) 7 (2) 15 (2) 3 (5) 12 (2) 15 (2)
Sieve predominant
Lymph node metastasis 134 (27)
Because TKIs show different targeted effects in cases of EGFR exon 21 and exon 19 mutations 20, analysis of clinical or imaging variables between these cases can provide a comprehensive baseline for selecting targeted treatment. In the present study, patients with EGFR mutations frequently presented with smaller tumours (tumour maximum diameter ≤ 33 mm) and peripheral lesions, whereas Liu et al. 13 found that EGFR mutations are more common in peripheral lung adenocarcinomas with diameters < 3 cm. This difference may be attributed to the different methods used: the cut-off was self-defined in the previous study 13, whereas in our study ROC curve analysis was used to determine the cut-off. In addition, our study found associations between EGFR mutations, air bronchogram and pleural retraction, which agrees with Zhou et al. 17 and Stefania et al. 16 but contradicts Mizue et al. 28. These conflicting results may reflect differences in ethnicity, grouping methods and sample size. Air bronchograms reflect tumour invasion or expansion without destruction of the intratumoural bronchus, suggesting reduced aggressiveness. Pleural retraction is a common sign of visceral pleural invasion, which is one of the most important prognostic factors after the surgical resection of NSCLC 29. Nevertheless, in the present study, pleural invasion (proven by surgery) did not differ between wild-type and mutant EGFR. The reason for this difference may be that pleural retraction does not always indicate pathological pleural invasion. In addition, our study showed that EGFR mutation was associated with less lymphadenopathy on CT and less pathological lymph node metastasis, supporting a previous study 13 and suggesting a lower invasiveness of tumours with EGFR mutations.
ALK rearrangement has been identified in 0.4 to 13.5% of unselected NSCLC patients 30,31, consistent with the present study (7.0%). Previous studies have reported that patients with ALK positivity tend to be younger and are more often never smokers than patients with ALK-negative tumours 32. We found that a younger age (≤ 56 years old) was associated with ALK positivity. However, no significant difference in smoking status was found. A recent report by Li et al. 33 conducted on a relatively large sample demonstrated that ALK rearrangements are more commonly observed in the solid predominant subtype of adenocarcinoma. A high frequency of solid growth or mucinous patterns in ALK-positive tumours was observed in the present study, consistent with the above report. Previous studies have demonstrated that ALK positivity is more common in stage IV disease (ranging from 9.7% to 28.0%). In contrast, ALK positivity showed no significant difference between the stage I-II and III-IV groups in our study. However, we did find that ALK positivity was associated with metastasis detected by CT, suggesting a higher TNM stage. A limited number of reports have focused on the association between ALK positivity and CT features 16-19 because of the low rate of ALK positivity in lung adenocarcinoma. According to Zhou et al. and Chang et al. 17,19, a solid pattern is the main characteristic of ALK-positive tumours. Zhou et al. 17 also found that a pure GGO appearance was significantly less common in ALK-positive cases than in EGFR mutation cases. Similarly, in our series, ALK positivity was associated with solid nodules or masses without GGOs on CT.
This imaging finding was consistent with the correlation between solid lesions on CT and solid-predominant subtypes or mucinous adenocarcinoma in the present study, and ALK positivity commonly occurred in tumours of solid-predominant subtypes or mucinous patterns. Therefore, the correlation between CT signs and gene mutations may be due to different histological growth characteristics.
Previous studies have reported that ALK-positive lung adenocarcinoma frequently occurs with extensive lymph node metastasis 18,19, pleural retraction 17, pleural effusion 16, and distant metastasis 34. Of these, only distant metastasis in ALK-positive patients was supported by our study; we did not find significant correlations between ALK positivity and the other CT features, which may be due to the small sample size in the current study. In addition, the present study found that ALK-positive tumours lack air bronchograms. As mentioned above, an air bronchogram suggests reduced tumour invasiveness. Taken together, these findings suggest that ALK-positive lung adenocarcinoma is associated with high invasiveness. The lack of GGO manifestations in our study also supports this biological behaviour.
Our study has some limitations. First, this study was conducted at a single institution, and the patients in our study were all Chinese and thus had a genetic alteration pattern distinct from that of other races, which may impede the application of our results to other ethnicities. To improve the generalization ability and optimization of the model, multidisciplinary and prospective research is needed. Second, due to the limitations of the retrospective analysis method, our study is only a preliminary investigation of CT features for predicting gene mutations. However, this report serves as a basis for comprehensive and prospective investigations analysing these patient populations. Third, although we strictly used double-blind methods to record CT signs and EGFR and ALK status, selection bias was inevitable. Fourth, the present study analysed only adenocarcinoma and did not include other histologic subtypes, which could have influenced the results. However, this is understandable, as the majority of EGFR mutations and ALK positivity are found in adenocarcinomas, with an extremely low mutation rate in squamous cell carcinoma (< 5%) 35. Finally, CT findings of distant metastases were not pathologically confirmed. Thus, we did not include distant metastases in the multivariate logistic analysis to predict ALK positivity.
In conclusion, combining clinical variables and CT features was more effective in predicting EGFR mutations and ALK positivity than using clinical variables alone. In addition, GGO is not only a predictive factor for EGFR mutation but also a predictive factor for EGFR exon 21 mutation. Therefore, the use of CT features may help characterize tumours and more accurately identify patient populations who will benefit from EGFR-TKIs or crizotinib treatment for ALK-positive disease.

Materials and methods
Patients and inclusion criteria. A total of 1,459 patients evaluated by the multidisciplinary thoracic oncology group between January 2010 and February 2017 at the Union Hospital of Tongji Medical College were retrospectively screened. Initially, 1,186 patients were included according to the inclusion criteria: (1) lung adenocarcinoma confirmed by surgical resection; (2) available pathology reports (including predominant pathological subtypes, lymph node metastasis and pleural invasion, etc.); (3) available results for both EGFR mutations and ALK status; and (4) available clinical data. In total, 359 patients were excluded because of the following three exclusion criteria: (1) thin-section CT was not available (n = 251); (2) heavy CT image artefacts (n = 65); and (3) chemotherapy or radiation therapy received before surgery (n = 43). Ultimately, 827 patients were included. The patients' clinical characteristics, including age, sex, smoking history, histopathology, nodal involvement, and tumour stage, among others, were recorded. In accordance with Lv et al. 34, nonsmoking was defined as lifetime exposure to fewer than 100 cigarettes, and the remaining patients were categorized as ever-smokers. TNM staging was based on the IASLC 8th TNM Lung Cancer Staging System 36. This retrospective study was approved by the Institutional Review Board of Union Hospital of Tongji Medical College. All subjects enrolled signed a written consent form after being informed of the details of the research. This study was conducted in compliance with the Declaration of Helsinki.
EGFR mutation analysis. EGFR mutations were analysed according to the principle of the amplification refractory mutation system (ARMS). Primary tumours or lymph nodes were excised, aspirated, or biopsied, followed by 10% neutral buffered formalin fixation and paraffin embedding.
DNA was extracted from formalin-fixed paraffin-embedded (FFPE) tissue sections using the Qiagen FFPE Tissue Kit (Netherlands Roots NV) according to the manufacturer's instructions. PCR was carried out using
CT image acquisition. CT was performed at our institution using a multislice spiral CT system (SOMATOM Definition AS+, Siemens Healthineers, Germany) 38. The scan ranged from the level of the chest inlet to the inferior level of the costophrenic angle. The CT parameters were as follows: detector collimation width, 64 × 0.6 mm and 128 × 0.6 mm; tube voltage, 120 kV. The tube current was regulated by an automatic exposure control system (CARE Dose 4D). Images were reconstructed with a slice thickness of 1.5 mm and an interval of 1.5 mm. The reconstructed images were transmitted to the workstation and picture archiving and communication systems (PACS) for multiplanar reconstruction (MPR) postprocessing. A nonionic iodinated contrast agent (60-80 mL iohexol, 350 mg/mL; Beilu Pharmaceutical Co., Ltd., Beijing, China) was injected intravenously at a rate of 3 mL/s in 360 patients. Two radiologists with different degrees of experience in interpreting chest CT images independently performed all qualitative image analyses 38. One was a senior radiologist with 10 years of experience in chest imaging (H.S.); the other was a fellow with 4 years of experience in CT image interpretation (J.G.). Both analysed the Digital Imaging and Communications in Medicine (DICOM) images from the CT studies without access to clinical and histologic findings but were aware of the presence and sites of tumours. They assessed CT features using both axial CT images and MPR images. After separate evaluations were performed, differences were resolved by consensus. For each CT scan, the data shown in Table 1 were recorded.
Statistical analysis. The analyses were performed using SPSS Statistics (SPSS, version 21, IBM, Chicago, IL, USA) and MedCalc 16.2.0 (MedCalc Software, Mariakerke, Belgium) 38. Distribution normality was assessed using the Kolmogorov-Smirnov test. Normally distributed data, nonnormally distributed data and categorical variables are expressed as the mean ± standard deviation, median (interquartile range) and frequency (percentage), respectively. The independent-sample Student's t test was applied to compare two groups of normally distributed variables, and one-way ANOVA and the chi-square test were used to compare categorical variables. Multivariate binary logistic regression analyses were performed to identify independent factors predictive of EGFR or ALK mutation status. The final model was selected by using the enter elimination method, with a cut-off P value of 0.05. A P value < 0.05 (two-tailed) was considered to be statistically significant. Receiver operating characteristic (ROC) curves were constructed to assess the ability of the combined independent factors to predict EGFR mutations or ALK positivity. Comparison of the ROC curves for clinical characteristics alone and clinical characteristics combined with CT signs was performed by the nonparametric approach of DeLong et al. ROC curve analysis was also used to examine the diagnostic performance of patient age and tumour maximum diameter for ALK positivity and EGFR mutation. The sensitivity, specificity and optimal cut-off value were calculated. The repeatability of the maximum tumour diameter measurement was evaluated by intraclass correlation coefficient (ICC) analysis with the 95% CI. For other CT signs, interobserver agreement was assessed by the κ coefficient. Pearson's correlation was used to analyse the relationship between histopathology subtype and lesion texture.
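To make the ROC comparison concrete, the following is a minimal pure-Python sketch of the nonparametric DeLong approach for two correlated ROC curves evaluated on the same patients. It illustrates the general technique and is not the MedCalc implementation used in the study; the function names and the toy scores are hypothetical, and the p value relies on a large-sample normal approximation.

```python
import math


def _placements(pos, neg):
    """Placement values for each positive (v10) and each negative (v01) score."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, z, two_sided_p) for two models on the same cases."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    parts = []
    for s in (scores_a, scores_b):
        pos = [s[i] for i in pos_idx]
        neg = [s[i] for i in neg_idx]
        v10, v01 = _placements(pos, neg)
        parts.append((sum(v10) / m, v10, v01))  # the AUC is the mean placement
    (auc_a, v10_a, v01_a), (auc_b, v10_b, v01_b) = parts
    # Variance of the AUC difference, including the covariance between models.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var == 0:
        raise ValueError("zero variance: the AUC difference cannot be tested")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p


# Hypothetical toy data (NOT the study data): predicted scores for 8 cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
combined = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]   # clinical + CT features
clinical = [0.8, 0.4, 0.6, 0.3, 0.7, 0.5, 0.2, 0.1]   # clinical variables only

auc_a, auc_b, z, p = delong_test(combined, clinical, labels)
print(auc_a, auc_b, round(z, 3), round(p, 3))
```

Because both models are scored on the same cases, the covariance terms matter: ignoring them would treat the two AUCs as independent and usually overstate the variance of their difference.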

Ethics declarations
This study was approved by the ethics committee of Tongji Medical College of Huazhong University of Science and Technology. All subjects provided written informed consent.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.