Abstract
Histological stratification of metastatic non-small cell lung cancer (NSCLC) is essential to guide therapy. Morphological evaluation remains the basis for subtyping and is supplemented by immunohistochemistry (IHC) to confirm the diagnosis, which delays molecular analysis and consumes precious tissue. We therefore tested the capacity of convolutional neural networks (CNNs) to classify NSCLC from haematoxylin-eosin-saffron (HES)-stained diagnostic biopsies. The model was trained on a learning cohort of 132 NSCLC patients and validated on an external cohort of 65 NSCLC patients. Using image patches (tiles), a CNN based on the InceptionV3 architecture was trained and optimised to classify NSCLC as squamous or non-squamous. At the tile level, accuracies of 0.99, 0.87 and 0.85 were reached in the training, validation and test sets, respectively, and 0.85 in the external validation cohort. At the patient level, the CNN predicted tumour histology with an accuracy of 0.73 and 0.78 in the learning and external validation cohorts, respectively. Restricting the analysis to the tumour area with a virtual tissue micro-array (TMA) improved prediction, with an accuracy of 0.82 in the external validation cohort. This study shows that a CNN can predict NSCLC subtype with good accuracy and can be applied to small pathological samples without annotation.
Introduction
The standard of care in the first-line treatment of patients with non-small cell lung cancer (NSCLC) is based on chemoimmunotherapy or tyrosine kinase inhibitors1,2. Treatment is assigned on the basis of specific histological and genomic characteristics of the patient’s tumour3. In a first step, NSCLC must be classified into a histological type: non-squamous NSCLC versus squamous cell carcinoma. This classification determines the further molecular examination of the tissue sample needed to orient patients towards the optimal treatment4. In non-squamous NSCLC, a panel of molecular biomarkers is mandatory, such as EGFR or BRAF V600E mutations and ALK or ROS1 rearrangements5. In addition, many emerging biomarkers require histological material, for adenocarcinoma (MET, NRG1, NTRK) but also for non-squamous NSCLC (PIK3CA, HRAS)6. However, while the amount of material needed for molecular and histological assessment is increasing, the amount of tumour tissue available is often small. Strategies that reduce the material consumed by histological assessment would therefore be helpful.
The development of artificial intelligence algorithms that automatically classify histological slides opens new perspectives in virtual and digital pathology. In lung cancer, for example, automatic analysis of whole-slide images (WSI) of resected tumours has recently been used to predict survival outcomes7, histological type and mutational status8. However, such approaches are of limited relevance to routine practice because, in most cases, pathologists only have a small tumour biopsy or a fine-needle aspiration cytology sample.
To limit the volume of material required for histological diagnosis, we propose a deep convolutional neural network that predicts the histological classification of NSCLC as non-squamous versus squamous cell carcinoma. The analysis was performed on diagnostic tumour biopsies, using either the whole tissue section or a virtual tissue micro-array (TMA) centred on the pathologist’s annotation of the tumour zone.
Results
Population description
For the learning set, we included 132 HES slides from Dijon (66 non-squamous and 66 squamous samples). All samples were obtained from the primary lung tumour. The median tissue area was 11.734 mm² [0.158–111.227] and the median tumour tissue area was 0.177 mm² [0.002–1.088]. For the external validation set, we included 65 HES slides from Caen (45 non-squamous and 20 squamous samples).
In the training and validation sets, no cytological specimen was included; the sets consisted of brushings and small transbronchial biopsies. In the test cohort, only one cytological specimen was included (from pericardial fluid); the other specimens were transbronchial biopsies or brushings comparable to those included in the training and validation sets.
We also randomly selected 60 H&E slides from the TCGA LUAD and LUSC cohorts: 30 from non-squamous and 30 from squamous patients.
A deep learning model for NSCLC subtype prediction using WSI classification
Our objective was to train a deep learning model to classify lung carcinoma subtypes from whole HES slides of tumour biopsies, regardless of the percentage of tumour cells in the biopsy. As described in the Methods, the learning cohort was split into internal training, validation and test sets (Table 1).
Using the InceptionV3 deep learning architecture, our CNN model was optimised with two different approaches. First, we applied a threshold to the predictions to retain only tiles predicted with high confidence; WSI are expected to include a large number of tiles without tumour cells, which would add noise to the predictions. Second, we used a kernel filter to take into account the spatial neighbourhood of each tile. At the tile level, thresholding gave the best accuracies, with values of 0.99, 0.87 and 0.85 in the training, validation and test sets, respectively. In the external validation cohort, the model reached an accuracy of 0.85 (Table 2) and an AUC of 0.81 (Fig. 1a), suggesting that the model generalises well and is not strongly overfitted. Supplementary Fig. S1 shows the accuracy and loss across epochs during model training. In the TCGA dataset, the model reached an accuracy of 0.75 and an AUC of 0.78.
To classify each slide, we aggregated tile-level predictions using either max pooling or majority voting. Majority voting gave the best accuracy, reaching 0.73, 0.78 and 0.64 in the learning, external validation and TCGA cohorts, respectively (Table 3). In the external validation cohort, the model had an AUC of 0.79 using majority voting (Fig. 1b) and 0.89 using max pooling (Fig. 1c). In the TCGA cohort, the AUC was 0.67 using majority voting and 0.76 using max pooling.
Prediction using virtual TMA analysis of WSI
To improve classification and reduce computational time, we created a virtual TMA: a circle with a radius of 500 µm centred on the centroid of the tumour annotation drawn by the pathologist. Predictions on the TMA were computed 18 times faster than on the entire slide. This strategy could also translate into clinical routine, since the pathologist only needs to click on the tumour core instead of contouring the whole tumour. When the single model trained on WSI tiles was applied to TMA tiles, thresholding again gave the best results, with tile-level accuracies of 0.99, 0.83 and 0.88 in the training, validation and test sets, respectively. In the external validation cohort, the model reached an accuracy of 0.92 (Table 2) and an AUC of 0.94 (Fig. 1a). In the TCGA cohort, it reached an accuracy of 0.83 and an AUC of 0.77.
We then used the TMA strategy to predict slide-level classification. Using majority voting, the model had an accuracy of 0.68, 0.82 and 0.73 in the learning, external validation and TCGA cohorts, respectively (Table 3). In the external validation cohort, the AUC was 0.88 with both max pooling and majority voting (Fig. 1b,c).
In the TCGA dataset, the AUC was equal to 0.63 using majority voting and 0.79 using max pooling.
Figures 2 and 3 show two tumour biopsy sections containing squamous and non-squamous tumour, respectively, with the correct diagnosis and the predicted slide-level diagnosis obtained from TMA or WSI analysis at each prediction step.
Discussion
The diagnosis of NSCLC is based on morphological evaluation of tissue specimens. This analysis is the first step before samples are referred for molecular testing and therapy stratification4. One issue in the management of metastatic lung cancer is that, in most cases, the available samples are cytological specimens or small biopsies, and preserving them for further molecular testing is important. Consequently, although applying artificial intelligence to such a routine examination may seem unnecessary for an experienced pathologist trained in interpreting IHC stains such as TTF1 and p40, it can help spare the biopsy specimen. Our study, like previous reports, shows that combining digital pathology and machine learning has the potential to support this decision process objectively9. In previous work, deep learning applied to lung histological specimens yielded promising results10,11,12. However, most of these reports focused only on surgical samples.
In this study, we analysed whether a CNN (InceptionV3) could differentiate squamous from non-squamous NSCLC on the initial tumour biopsy, regardless of the tissue type of the biopsy and of whether the sample was cytological or histological. We addressed several technical points and showed that the whole slide can be used to predict the histological subtype with good accuracy, without prior selection of tumour tissue by the pathologist. Surprisingly, adding spatial information with a kernel filter did not improve the classification. In contrast, adding a quality check, namely a threshold that retains only predictions with a high level of confidence, improved classification accuracy. This finding is expected, since WSI include many non-tumour areas.
To further improve prediction, we also used a virtual TMA strategy. Based on the pathologist’s hand-drawn tumour annotations, TMA were created by tracing a circle with a radius of 500 µm centred on the centroid of each annotation. This strategy could easily be reproduced in practice: a pathologist would click on the virtual slide to localise the tumour and obtain a prediction for the whole slide using only the information within the TMA.
We chose to train our model on whole-slide images because a tumour comprises not only tumour cells but also stroma, immune cells and connective tissue. However, predicting squamous versus non-squamous subtype from regions containing only connective tissue makes no sense, which may explain why the virtual TMA strategy improved results. Khosravi et al.13 also observed an improvement when using a TMA strategy.
The limitations of our study include the small sample size and, in some cases, the small number of extracted image patches, which may limit the accuracy of the model. Moreover, epithelial lung tumours can be morphologically very diverse. In particular, the current World Health Organization classification is more complex and separates adenocarcinoma into several subtypes, such as lepidic, solid, acinar and papillary. Because of the small learning set, we did not include this information in the model; a larger learning set with further labelling of non-squamous subtypes would likely improve the capacity of the CNN to predict histological types. Further studies are warranted on this point. While the learning set consisted of lung biopsies, the model was validated on both cytological and histological samples, from either lung biopsies or metastatic sites. This heterogeneity may induce some bias and may limit the accuracy of the model; however, we chose it deliberately to better reflect the clinical reality of lung cancer diagnosis.
We compared our results with those obtained in lung cancer in other recent works8,13,14,15,16. In these works, models were trained on H&E-stained TCGA slides, i.e. on hundreds of whole-slide images, and then evaluated on other public H&E slides. Coudray et al.8 obtained the best AUC, 0.97, whereas Yu et al.14, using machine learning models, obtained an AUC of 0.75; other authors reported AUCs between these two values. The predictive performance of our model is in the same range, even though it was trained on HES slides. Our objective was to propose a model that can be used by pathologists in French networks, where diagnostic slides are stained with haematoxylin-eosin-saffron (HES).
In summary, we trained and optimized an Inception V3 CNN model to classify the two common NSCLC subtypes using routine biopsy or cytological samples. Moreover, we established a virtual TMA strategy to improve predictions. Our results highlight the potential and limitations of CNN image classification models for morphology-based tumour classification.
Methods
Study population
The learning cohort comprised 132 NSCLC tumour biopsies (66 non-squamous and 66 squamous samples) collected between 2015 and 2018 in the Department of Pathology of the Georges François Leclerc Cancer Center in Dijon, France.
The external validation cohort comprised 65 biopsy samples (45 non-squamous and 20 squamous samples) from the University Hospital of Caen, France, collected between 2017 and 2019.
Whole slide histopathology images from 30 non-squamous and 30 squamous patients were taken from the LUAD and LUSC cohorts of the Cancer Genome Atlas (TCGA). Data were obtained from the National Cancer Institute Genomic Data Commons17.
Only patients from whom informed consent was obtained were included in this retrospective study. The present study was approved by the CNIL (French national commission for data privacy) and the Georges François Leclerc Cancer Center (Dijon, France) local ethics committee, and was performed in accordance with the Helsinki Declaration and European legislation.
Pathological diagnosis
The pathological diagnosis (adenocarcinoma versus squamous cell carcinoma) was validated for all samples by a pathologist (ALLP), based on morphology on HES-stained slides and on TTF1 and p40 immunohistochemistry.
Image processing
Formalin-fixed paraffin-embedded HES-stained slides were digitised with a Nanozoomer HT2.0 (Hamamatsu) at 20× magnification to generate whole-slide image (WSI) files in ndpi format. We partitioned each WSI into non-overlapping 220 × 220 pixel tiles at a resolution of 0.5 µm/pixel (equivalent to 20× magnification) using QuPath (v0.2.3)18.
In addition, tumour regions on each slide were manually annotated by a pathologist (ALLP), and the centroid of each annotation was calculated. A virtual TMA was then defined as a circle with a radius of 500 µm centred on this centroid, keeping the same tiling as described above.
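The tiling step can be illustrated with a minimal pure-Python sketch. It assumes that tiles are laid on a regular grid starting at the top-left corner and that partial tiles at the right and bottom borders are discarded; QuPath's own border handling may differ. The helper names `tile_grid` and `tile_edge_um` are hypothetical. At 0.5 µm/pixel, a 220-pixel tile covers 110 µm × 110 µm, i.e. about 0.0121 mm².

```python
TILE_PX = 220          # tile edge in pixels, as stated in the Methods
MICRONS_PER_PX = 0.5   # scan resolution (20x), as stated in the Methods


def tile_grid(width_px, height_px, tile_px=TILE_PX):
    """Top-left (x, y) pixel corners of non-overlapping tiles lying fully inside the slide.

    Assumption: partial tiles at the right/bottom borders are dropped.
    """
    return [
        (x, y)
        for y in range(0, height_px - tile_px + 1, tile_px)
        for x in range(0, width_px - tile_px + 1, tile_px)
    ]


def tile_edge_um(tile_px=TILE_PX, um_per_px=MICRONS_PER_PX):
    """Physical edge length of one tile in micrometres."""
    return tile_px * um_per_px


# Example: a 1000 x 500 pixel region fits 4 x 2 = 8 whole tiles.
corners = tile_grid(1000, 500)
print(len(corners), tile_edge_um())  # 8 110.0
```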
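The virtual TMA selection can be sketched as follows, assuming the annotation is available as a polygon in micrometre coordinates and that a tile belongs to the TMA when its centre falls inside the 500 µm circle (the inclusion rule is not specified in the text). `polygon_centroid` uses the standard shoelace centroid formula; both function names are hypothetical.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) points in drawing order."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def tiles_in_virtual_tma(tile_corners_um, tile_edge_um, centre, radius_um=500.0):
    """Keep the tiles whose centre lies within radius_um of the annotation centroid."""
    cx, cy = centre
    half = tile_edge_um / 2.0
    return [
        (x, y)
        for x, y in tile_corners_um
        if math.hypot(x + half - cx, y + half - cy) <= radius_um
    ]


# Square annotation of 2 x 2 mm and a 20 x 20 grid of 110 um tiles covering it.
annotation = [(0, 0), (2000, 0), (2000, 2000), (0, 2000)]
grid = [(110 * i, 110 * j) for i in range(20) for j in range(20)]
centre = polygon_centroid(annotation)
print(centre, len(tiles_in_virtual_tma(grid, 110.0, centre)))
```

In this example the circle retains 64 tiles, consistent with the disc area divided by the tile area (π × 500² / 110² ≈ 65). At 0.5 µm/pixel, the 500 µm radius corresponds to 1000 pixels.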
Tile pre-processing
Tiles containing more than two-thirds white background were removed. The color channel values were normalized by Reinhard normalization to neutralize color differences between slides19; this method applies a linear transformation that matches the mean and standard deviation of each channel between slides. The color channel values were then scaled to a floating-point range of [0, 1].
The training, validation and test sets contained 60%, 20% and 20% of the tiles, respectively. All tiles from a given slide were assigned to the same set, so that no slide contributed to more than one set.
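A simplified sketch of these two pre-processing steps follows. Two points are assumptions: a pixel counts as white background when all three RGB channels are at least 220 (the text does not define "white"), and the mean/standard-deviation matching is done directly on RGB channels, whereas Reinhard et al. perform it in the decorrelated lαβ colour space. Target statistics would come from a reference slide; the values in the example are made up, and all function names are hypothetical.

```python
import statistics

WHITE_LEVEL = 220  # hypothetical cut-off: a pixel is "white" if every RGB channel >= this


def is_background_tile(pixels, max_white_fraction=2 / 3):
    """True if more than max_white_fraction of the tile's RGB pixels are white."""
    white = sum(1 for r, g, b in pixels if min(r, g, b) >= WHITE_LEVEL)
    return white / len(pixels) > max_white_fraction


def channel_stats(pixels):
    """Per-channel (mean, population std) for a list of RGB tuples."""
    return [(statistics.mean(ch), statistics.pstdev(ch)) for ch in zip(*pixels)]


def match_colour_stats(pixels, target_stats):
    """Linearly map each channel to the target mean/std, clip to [0, 255], rescale to [0, 1]."""
    source_stats = channel_stats(pixels)
    out = []
    for px in pixels:
        new_px = []
        for v, (m_s, s_s), (m_t, s_t) in zip(px, source_stats, target_stats):
            mapped = (v - m_s) / s_s * s_t + m_t if s_s > 0 else m_t
            new_px.append(min(max(mapped, 0.0), 255.0) / 255.0)
        out.append(tuple(new_px))
    return out


tile = [(100, 50, 150), (200, 150, 250)]
reference = [(120.0, 10.0), (80.0, 10.0), (160.0, 10.0)]  # made-up target (mean, std) per channel
print(match_colour_stats(tile, reference)[0])
```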
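The constraint that all tiles of a slide stay in the same set can be sketched as a slide-level split that approximately meets the 60/20/20 tile proportions. The greedy assignment below, in which each slide goes to the set furthest below its target share, is one possible scheme and not necessarily the one used by the authors; `split_slides` and its arguments are hypothetical.

```python
import random


def split_slides(tile_counts, shares=(0.6, 0.2, 0.2), seed=0):
    """Assign whole slides to train/val/test, approximating the requested tile shares.

    tile_counts: dict mapping slide id -> number of retained tiles.
    Every slide, and therefore every tile of that slide, lands in exactly one set.
    """
    names = ("train", "val", "test")
    target = dict(zip(names, shares))
    slides = list(tile_counts)
    random.Random(seed).shuffle(slides)  # random but reproducible slide order
    total = sum(tile_counts.values())
    assigned = {name: [] for name in names}
    filled = {name: 0 for name in names}
    for slide in slides:
        # Choose the set whose current tile share is furthest below its target.
        best = min(names, key=lambda n: filled[n] / total - target[n])
        assigned[best].append(slide)
        filled[best] += tile_counts[slide]
    return assigned


counts = {f"slide_{i}": 10 for i in range(100)}
split = split_slides(counts)
print({name: len(ids) for name, ids in split.items()})
```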
Deep learning model
We trained a model based on the InceptionV3 architecture20. The Inception architecture uses a series of convolutional blocks designed both to reduce the number of parameters in the network and to improve its performance; the main components of these blocks are convolutional and pooling layers. To make the algorithm more robust to image variations and to add a regularisation effect, we applied data augmentation, namely random left–right and up–down flips combined with random rotations.
The model was fully trained for 100 epochs on the augmented training set, with an initial learning rate of 0.001 decayed by a factor of 0.9 every five epochs, using the Adam optimisation algorithm21 with a momentum (β1) of 0.9 and an epsilon of 1e−7. The batch size was 100 tiles.
Because the number of extracted tiles differed between classes (unbalanced dataset), we used a weighted loss function that directly penalises false predictions during training. False negatives and false positives were equally penalised, with a factor of 1.5.
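The augmentation can be sketched on a tile stored as a 2-D list of pixels. The rotation angles are not given in the text; this sketch assumes multiples of 90°, which keep a square tile square without interpolation. In TensorFlow, comparable operations exist as `tf.image.random_flip_left_right`, `tf.image.random_flip_up_down` and `tf.image.rot90`; the pure-Python helpers here are hypothetical and illustrative only.

```python
import random


def flip_lr(img):
    """Mirror each row (left-right flip)."""
    return [list(row[::-1]) for row in img]


def flip_ud(img):
    """Reverse the row order (up-down flip)."""
    return [list(row) for row in img[::-1]]


def rot90(img):
    """Rotate 90 degrees counter-clockwise (same convention as numpy.rot90)."""
    return [list(col) for col in zip(*img)][::-1]


def augment(img, rng):
    """Random left-right flip, random up-down flip, then 0 to 3 quarter turns."""
    if rng.random() < 0.5:
        img = flip_lr(img)
    if rng.random() < 0.5:
        img = flip_ud(img)
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    return img


tile = [[1, 2], [3, 4]]
print(rot90(tile))  # [[2, 4], [1, 3]]
print(augment(tile, random.Random(0)))
```

Because flips and quarter turns only permute pixels, every augmented tile keeps exactly the same colour content as the original, which suits histology where orientation carries no diagnostic meaning.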
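The step decay of the learning rate amounts to lr(e) = 0.001 × 0.9^⌊e/5⌋ for epoch e, assuming epochs are counted from 0. The function below is a hypothetical helper; in Keras it could, for instance, be passed to `tf.keras.callbacks.LearningRateScheduler` alongside `tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, epsilon=1e-7)`, but the authors' exact training code is not available.

```python
def learning_rate(epoch, initial=1e-3, decay=0.9, every=5):
    """Step decay: multiply the initial rate by `decay` once per `every` epochs (epoch is 0-based)."""
    return initial * decay ** (epoch // every)


# Print the rate at a few epochs over the 100-epoch run.
for epoch in range(0, 100, 25):
    print(epoch, f"{learning_rate(epoch):.2e}")
```

With these settings the rate falls to about 1.35e-4 (0.001 × 0.9^19) in the final epoch.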
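A class-weighted binary cross-entropy is one common way to implement such a weighted loss. The text does not state exactly which term carries the 1.5 factor, so this sketch keeps separate, hypothetical `pos_weight` and `neg_weight` parameters and codes squamous as class 1 (also an assumption). `balanced_class_weights` shows a frequently used alternative that derives weights from class frequencies; it is not claimed to be the authors' choice.

```python
import math
from collections import Counter


def weighted_bce(y_true, p_pred, pos_weight=1.0, neg_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for class-1 and class-0 tiles.

    y_true: 0/1 labels; p_pred: predicted probability of class 1.
    Errors on class-1 tiles (false negatives) are scaled by pos_weight,
    errors on class-0 tiles (false positives) by neg_weight.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            total -= pos_weight * math.log(p)
        else:
            total -= neg_weight * math.log(1.0 - p)
    return total / len(y_true)


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency: n / (k * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


print(weighted_bce([1, 0], [0.8, 0.3], pos_weight=1.5, neg_weight=1.5))
print(balanced_class_weights([0, 0, 0, 1]))
```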
Patient inference
We then classified each tile and filtered out low-confidence predictions by thresholding. Thresholds were determined by a grid search for each class, maximising the correct classification rate22.
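The per-class threshold search can be sketched as follows. Maximising accuracy on the retained tiles alone would favour keeping only a handful of tiles, so this sketch adds a hypothetical `min_kept` constraint on the fraction of tiles that must survive; the grid (0.50 to 0.95 in steps of 0.05) and the function names are also assumptions.

```python
def keep_confident(probs, t_pos, t_neg):
    """Indices of tiles whose prediction passes its class-specific threshold.

    probs are predicted P(class 1); a tile predicted class 1 is kept if p >= t_pos,
    a tile predicted class 0 is kept if 1 - p >= t_neg.
    """
    return [
        i for i, p in enumerate(probs)
        if (p >= 0.5 and p >= t_pos) or (p < 0.5 and 1.0 - p >= t_neg)
    ]


def grid_search_thresholds(probs, labels, min_kept=0.2):
    """Return (t_pos, t_neg, accuracy) maximising accuracy on the retained tiles."""
    grid = [round(0.5 + 0.05 * k, 2) for k in range(10)]  # 0.50, 0.55, ..., 0.95
    best = (0.5, 0.5, -1.0)
    for t_pos in grid:
        for t_neg in grid:
            kept = keep_confident(probs, t_pos, t_neg)
            if not kept or len(kept) < min_kept * len(probs):
                continue  # too few tiles survive this pair of thresholds
            correct = sum((probs[i] >= 0.5) == (labels[i] == 1) for i in kept)
            accuracy = correct / len(kept)
            if accuracy > best[2]:
                best = (t_pos, t_neg, accuracy)
    return best


probs = [0.95, 0.9, 0.6, 0.55, 0.1, 0.2, 0.45, 0.4]
labels = [1, 1, 0, 0, 0, 0, 1, 1]  # the four uncertain tiles are misclassified
print(grid_search_thresholds(probs, labels))
```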
The CNN can be used directly as a classifier, but it predicts each tile independently and ignores spatial correlations. To take advantage of the neighbourhood of each tile, filter kernel algorithms aimed at extracting spatial information were used; the filter kernel takes advantage of the label distribution of neighbouring patches to re-estimate the output of CNNs. A logistic regression algorithm was used as the strategy for parameter estimation of the filter kernel23.
If the label of a tile is the same as the label of the neighbouring tiles, its probability will be increased. Conversely, it will have a lower probability when its label differs from that of its neighbours.
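The neighbourhood re-estimation can be sketched as a logistic model on two features per tile: its own CNN probability and the fraction of its eight neighbours predicted as class 1. The coefficients below are made-up placeholders; in the approach described, they would be estimated by logistic regression on training tiles. Tiles are keyed by hypothetical (row, column) grid indices, and this is a simplified illustration rather than the exact filter kernel of ref. 23.

```python
import math

NEIGHBOURS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]


def neighbour_fraction(probs, i, j):
    """Fraction of existing 8-neighbours of tile (i, j) predicted as class 1 (p >= 0.5)."""
    votes = [probs[(i + di, j + dj)] >= 0.5
             for di, dj in NEIGHBOURS if (i + di, j + dj) in probs]
    return sum(votes) / len(votes) if votes else 0.5  # isolated tile: neutral


def refine(probs, coef=(-4.0, 4.0, 4.0)):
    """Re-estimate P(class 1) from the tile's own probability and its neighbours' labels.

    coef = (intercept, weight on own probability, weight on neighbour fraction);
    placeholder values standing in for fitted logistic-regression coefficients.
    Neighbour features are always computed from the original, unrefined probabilities.
    """
    b0, b1, b2 = coef
    return {
        pos: 1.0 / (1.0 + math.exp(-(b0 + b1 * p + b2 * neighbour_fraction(probs, *pos))))
        for pos, p in probs.items()
    }


# 3 x 3 patch: the centre tile is weakly class 0 but all its neighbours are class 1.
patch = {(i, j): 0.8 for i in range(3) for j in range(3)}
patch[(1, 1)] = 0.45
print(round(refine(patch)[(1, 1)], 3))
```

With a positive neighbour weight, the centre tile's probability is pulled above 0.5 (to about 0.86) by its class-1 neighbours, mirroring the behaviour described above: agreement with neighbours raises a tile's probability and disagreement lowers it.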
To classify the whole slide, we used two methods24. The first, “majority vote”, assigned the most frequent tile class to the slide. The second, “max pooling”, assigned to the slide the class of the tile prediction with the highest probability.
These strategies were applied both to tiles from the whole slide and to tiles from the TMA only, in order to focus the results on tumour regions. Training was identical for both strategies: a single model was trained on tiles from the whole slides of the training set. This model was then evaluated on the test and external validation sets, using either tiles from whole slides or tiles restricted to the TMA regions.
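The two slide-level aggregation rules can be sketched as below, with tile probabilities expressed as P(class 1). The max-pooling rule is interpreted here as taking the class of the single most confident tile, which is one reading of the description; ties in the majority vote are broken towards class 1 arbitrarily. Function names are hypothetical.

```python
def majority_vote(tile_probs, threshold=0.5):
    """Slide label = most frequent tile label (ties broken towards class 1)."""
    n_pos = sum(p >= threshold for p in tile_probs)
    return 1 if 2 * n_pos >= len(tile_probs) else 0


def max_pooling(tile_probs):
    """Slide label = class of the single most confident tile prediction."""
    most_confident = max(tile_probs, key=lambda p: max(p, 1.0 - p))
    return 1 if most_confident >= 0.5 else 0


# Three weakly class-1 tiles and one very confident class-0 tile.
tiles = [0.6, 0.6, 0.6, 0.02]
print(majority_vote(tiles), max_pooling(tiles))  # 1 0
```

The example shows how the two rules can disagree: many weakly confident tiles dominate the vote, whereas a single highly confident tile decides max pooling.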
Receiver operating characteristic (ROC) curve and area under the curve (AUC) analysis were performed to evaluate the abilities of the different strategies to predict the class of tiles from whole slide and from TMA at slide and patient levels in the external validation cohort.
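The AUC can be computed without drawing the ROC curve, as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as one half). The authors used R for their analyses; this Python sketch is only an illustration, and `roc_auc` is a hypothetical name. Its pairwise loop is adequate for a few hundred slides, but a rank-based computation would be preferable for very large numbers of tiles.

```python
def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```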
Software
The deep learning model was implemented and trained using TensorFlow 2.1.0 and Python 3.5. Calculations were performed using HPC resources from DNUM CCUB (Centre de Calcul de l’Université de Bourgogne).
Data analysis was performed using R statistical software (http://www.R-project.org/).
Data availability
Images from training and validation cohorts, as well as the code used for statistical analysis are available from the corresponding author on reasonable request.
References
Hanna, N. H. et al. Therapy for stage IV non-small-cell lung cancer without driver alterations: ASCO and OH (CCO) Joint Guideline Update. J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 38, 1608–1632 (2020).
Hanna, N. H. et al. Therapy for stage IV non-small-cell lung cancer with driver alterations: ASCO and OH (CCO) Joint Guideline Update. J. Clin. Oncol. Off. J. Am. Soc. Clin. Oncol. 39, 1040–1091 (2021).
Bernicker, E. H., Miller, R. A. & Cagle, P. T. Biomarkers for selection of therapy for adenocarcinoma of the lung. J. Oncol. Pract. 13, 221–227 (2017).
Travis, W. D. et al. The 2015 World Health Organization classification of lung tumors: Impact of genetic, clinical and radiologic advances since the 2004 classification. J. Thorac. Oncol. Off. Publ. Int. Assoc. Study Lung Cancer 10, 1243–1260 (2015).
Vanderlaan, P. A. et al. Success and failure rates of tumor genotyping techniques in routine pathological samples with non-small-cell lung cancer. Lung Cancer Amst. Neth. 84, 39–44 (2014).
Halliday, P. R., Blakely, C. M. & Bivona, T. G. Emerging targeted therapies for the treatment of non-small cell lung cancer. Curr. Oncol. Rep. 21, 21 (2019).
Luo, X. et al. Comprehensive computational pathological image analysis predicts lung cancer prognosis. J. Thorac. Oncol. Off. Publ. Int. Assoc. Study Lung Cancer 12, 501–509 (2017).
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Cui, M. & Zhang, D. Y. Artificial intelligence and computational pathology. Lab. Investig. J. Tech. Methods Pathol. 101, 412–422 (2021).
Kriegsmann, M. et al. Deep learning for the classification of small-cell and non-small-cell lung cancer. Cancers 12, 1604 (2020).
Gertych, A. et al. Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Sci. Rep. 9, 1483 (2019).
Chen, M. et al. Classification and mutation prediction based on histopathology H&E images in liver cancer using deep learning. Npj Precis. Oncol. 4, 1–7 (2020).
Khosravi, P., Kazemi, E., Imielinski, M., Elemento, O. & Hajirasouliha, I. Deep convolutional neural networks enable discrimination of heterogeneous digital pathology images. EBioMedicine 27, 317–328 (2018).
Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
Yu, K.-H. et al. Classifying non-small cell lung cancer types and transcriptomic subtypes using convolutional neural networks. J. Am. Med. Inform. Assoc. JAMIA 27, 757–769 (2020).
Noorbakhsh, J. et al. Pan-cancer classifications of tumor histological images using deep learning. bioRxiv https://doi.org/10.1101/715656 (2019).
Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016).
Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, 16878 (2017).
Reinhard, E., Ashikhmin, M., Gooch, B. & Shirley, P. Color transfer between images. IEEE Comput. Graph. Appl. 21, 34–41 (2001).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016). https://doi.org/10.1109/CVPR.2016.308.
Kingma, D. & Ba, J. Adam: A method for stochastic optimization. Int. Conf. Learn. Represent. (2014).
Wei, J. W. et al. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci. Rep. 9, 3358 (2019).
Ye, J., Luo, Y., Zhu, C., Liu, F. & Zhang, Y. Breast cancer image classification on WSI with spatial correlations. in ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1219–1223 (2019). https://doi.org/10.1109/ICASSP.2019.8682560.
Hou, L. et al. Patch-based convolutional neural network for whole slide tissue image classification. in Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016 (2016).
Acknowledgements
We wish to thank Fiona Ecarnot, PhD (EA3920, University of Franche-Comté, Besançon, France) for English correction and helpful comments. Calculations were performed using HPC resources from DNUM CCUB (Centre de Calcul de l’Université de Bourgogne).
Author information
Contributions
A.-L.L.P., V.D., F.B., F.G., E.B. and C.T. contributed to the design. A.I. and D.R. generated the data. E.B. and C.T. analysed the data. All authors contributed to the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Le Page, A.L., Ballot, E., Truntzer, C. et al. Using a convolutional neural network for classification of squamous and non-squamous non-small cell lung cancer based on diagnostic histopathology HES images. Sci Rep 11, 23912 (2021). https://doi.org/10.1038/s41598-021-03206-x
DOI: https://doi.org/10.1038/s41598-021-03206-x