Artificial intelligence-assisted fast screening cervical high grade squamous intraepithelial lesion and squamous cell carcinoma diagnosis and treatment planning

Every year cervical cancer affects more than 300,000 people, and on average one woman is diagnosed with cervical cancer every minute. Early diagnosis and classification of cervical lesions greatly improve the chance of successful treatment, and automated diagnosis and classification of cervical lesions from Papanicolaou (Pap) smear images are in high demand. To the authors’ best knowledge, this is the first study of fully automated cervical lesion analysis on whole slide images (WSIs) of conventional Pap smear samples. The presented deep learning-based cervical lesion diagnosis system is demonstrated to be able not only to detect high grade squamous intraepithelial lesions (HSILs) or higher (squamous cell carcinoma; SQCC), which usually indicate that patients must be referred immediately to colposcopy, but also to rapidly process WSIs in seconds for practical clinical usage. We evaluate this framework on a dataset of 143 whole slide images, and the proposed method achieves a precision of 0.93, recall of 0.90, F-measure of 0.88, and Jaccard index of 0.84, showing that the proposed system is capable of segmenting HSILs or higher (SQCC) with high precision and reaches sensitivity comparable to the reference standard produced by pathologists. Based on Fisher’s Least Significant Difference (LSD) test (P < 0.0001), the proposed method performs significantly better than the two state-of-the-art benchmark methods (U-Net and SegNet) in precision, F-measure, and Jaccard index. In the run time analysis, the proposed method takes only 210 seconds to process a WSI and is 20 times faster than U-Net and 19 times faster than SegNet. In summary, the proposed method is demonstrated to be able both to detect HSILs or higher (SQCC), which indicate patients for further treatment, including colposcopy and surgery to remove the lesion, and to rapidly process WSIs in seconds for practical clinical usage.


Data and results
Material. De-identified, digitized whole-slide images of conventional Pap smear samples were obtained from the tissue bank of the Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan (n = 143 patients). Research ethics approval was obtained from the research ethics committee of the Tri-Service General Hospital (TSGHIRB No.1-107-05-171 and No.B202005070), and informed consent was formally waived by the approving committee. The data were de-identified and used for a retrospective study without impacting patient care. All methods were carried out in accordance with relevant guidelines and regulations. Cervical scrapings were collected for cytological diagnosis by gynecologists. The slides were prepared and stained by the Pap method according to the usual laboratory protocol. The cytology slides were first screened by the pool of cytotechnologists, and a pathologist always confirmed abnormal results. Cytology was reported according to The Bethesda System (TBS) 2014. The dataset comprises negative (n = 8), ASC-US (n = 8), LSIL (n = 8), ASC-H (n = 29), HSIL (n = 74), and higher (SQCC, n = 16) cases, and the number of slides per category is shown in Fig. 2a. All patients were treated and followed by the standard clinical protocol. The patients with ASC-US underwent a repeat Pap smear within 1 year, while the patients with ASC-H, HSIL or SQCC underwent colposcopy-directed cervical biopsy and subsequent therapy when indicated. WSIs are glass slides digitized by scanning devices. All stained slides were scanned using a Leica AT Turbo (Leica, Germany) at 20× objective magnification. The network was trained and tested on non-overlapping tiles (512 × 512 pixels) obtained from the WSIs rather than on whole slides. The distribution of the number of tiles per WSI is shown in Fig. 2b. In computational pathology, the massive size of WSIs is one of the challenges.
Scientific Reports | (2021) 11:16244 | https://doi.org/10.1038/s41598-021-95545-y www.nature.com/scientificreports/
For automatic analysis, the average size of the WSIs in our dataset is 91,257 × 41,546 pixels (45.93 × 20.91 mm), and the size distribution of the WSIs is shown in Fig. 2c. The proposed network structure and the associated outputs of the selected layers are shown in Fig. 2d. A WSI generally contains billions of pixels, while the regions of interest can be as small as a few thousand pixels (see Fig. 2e). We collected slide-level reviews and region-level annotations from pathologists. Slide-level reviews identify the slides graded higher than LSIL, i.e. ASC-H, HSIL and SQCC. Region-level annotations mark specific HSILs or higher (SQCC) within a slide. The information for the dataset is shown in Table 1.
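To make the scale concrete, the following sketch works through the tiling arithmetic for an average-sized slide. It assumes the 512 × 512 pixel tile size used for training and testing, and it assumes that partial tiles at the right and bottom edges are padded rather than dropped (the text does not say which). The function name `tile_grid` is hypothetical.

```python
import math

def tile_grid(width_px, height_px, tile=512):
    """Number of non-overlapping tiles needed to cover a slide.

    Assumption: edge tiles are padded, so both dimensions are rounded up.
    """
    cols = math.ceil(width_px / tile)
    rows = math.ceil(height_px / tile)
    return cols, rows, cols * rows

# Average WSI size reported for the dataset.
WIDTH_PX, HEIGHT_PX = 91_257, 41_546
WIDTH_MM = 45.93

cols, rows, n_tiles = tile_grid(WIDTH_PX, HEIGHT_PX)
total_pixels = WIDTH_PX * HEIGHT_PX
microns_per_pixel = WIDTH_MM * 1000 / WIDTH_PX

print(cols, rows, n_tiles)
print(f"{total_pixels / 1e9:.2f} gigapixels")
print(f"{microns_per_pixel:.3f} um per pixel")
```

Under these assumptions an average slide yields a 179 × 82 grid, i.e. 14,678 tiles covering about 3.79 gigapixels at roughly 0.5 µm per pixel. The measured per-slide tile counts are those in Fig. 2b and may differ from this estimate.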
Experimental set-up and implementation details. In evaluation, the whole slide images were randomly split into two sets: 68% for training and 32% for testing. As shown in Table 1, the training set consists of 25 ASC-H, 57 HSIL, and 15 SQCC cases, and the tiles annotated and sampled for building AI models account for 0.006% of the training WSIs and for 0.004% of the whole data set, respectively. Moreover, the proposed framework is initialized using the VGG16 model, and stochastic gradient descent (SGD) optimization and the cross entropy loss function are utilized. In addition, the network training parameters of the proposed method, including [...] (U-Net 19 and SegNet 20) are implemented using the Keras implementation of image segmentation models by Gupta et al. 21. For training, the benchmark methods (U-Net 19 and SegNet 20) are initialized using a pre-trained VGG16 model, the networks are optimized using Adadelta optimization, and the cross entropy function is used as the loss function. In addition, the network training parameters of U-Net and SegNet, including the learning rate, dropout ratio, and weight decay, are set to 0.0001, 0.2, and 0.0002, respectively. The [...] Fig. 3 presents qualitative segmentation results by the proposed method and the two benchmark approaches for detection of HSILs or higher (SQCC). The results demonstrate the high precision, efficiency and reliability of the proposed model. Table 3 presents detailed quantitative evaluation results in segmentation of HSILs or higher (SQCC) for all samples, and separately for samples with high-grade lesions and for samples with low-grade lesions or negative findings. The box plots of the quantitative evaluation results for all samples are provided in Fig. 4, showing that the [...] (U-Net 19 and SegNet 20) perform poorly in detecting HSILs or higher (SQCC), obtaining precision, recall, F-measure, and Jaccard index of < 26%, < 88%, < 27%, and < 23% on average, respectively.
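For reference, the four scores used in this evaluation can be computed from pixel counts as in the sketch below. It uses the standard definitions of precision, recall, F-measure and Jaccard index. The function name and the toy counts are illustrative, and reporting 0 when a denominator is empty is an assumption that mirrors the footnote of Table 3 for samples without positives.

```python
def segmentation_scores(tp, fp, fn):
    """Precision, recall, F-measure and Jaccard index from pixel counts.

    tp, fp, fn: true-positive, false-positive and false-negative pixels.
    Assumption: a score with an empty denominator is reported as 0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f_measure = ratio(2 * precision * recall, precision + recall)
    jaccard = ratio(tp, tp + fp + fn)
    return precision, recall, f_measure, jaccard

# Toy counts (not taken from the paper).
p, r, f, j = segmentation_scores(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F={f:.3f} Jaccard={j:.3f}")
```

Because the reported values are averages over slides, the averaged F-measure need not equal the harmonic mean of the averaged precision and recall.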
Furthermore, for statistical analysis, the quantitative scores were analyzed in SPSS software 27 with Fisher's Least Significant Difference (LSD) test to compare multiple methods (see Table 4). In precision, the presented method achieves 92.94% on average and significantly outperforms both benchmark methods based on LSD tests (P < 0.0001). In recall, the presented method achieves 89.85% on average and significantly outperforms the U-Net method based on LSD tests (P < 0.0001). In F-measure, the presented method achieves 88.21% on average and significantly outperforms both benchmark methods based on LSD tests (P < 0.0001). In Jaccard index, the presented method achieves 83.57% on average and significantly outperforms both benchmark methods based on LSD tests (P < 0.0001). Figure 1 presents more segmentation results of randomly selected examples by the proposed method.
Table 3. Quantitative evaluation of the proposed method and two benchmark methods (U-Net and SegNet) in segmentation of HSILs or higher (SQCC). The proposed method is significantly better than the benchmark approaches (P < 0.0001). a As there are no positives in these samples, the value is computed as 0.
[...] Table 5a). In the run time analysis, the proposed method takes only 210 seconds to process a WSI and is 20 times faster than U-Net and 19 times faster than SegNet (see Table 5b). In comparison with the method of Araújo et al. 17, the proposed method processes 923,520 more pixels per second, and the method of Araújo et al. requires about twice the hardware memory (251 GB) of the proposed method (128 GB). Overall, the proposed method is demonstrated to be able both to detect HSILs, which indicate patients for further treatment, including colposcopy and surgery to remove the lesion, and to rapidly process WSIs in seconds for practical clinical usage.
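The run-time figures can be turned into rough per-slide numbers, as sketched below. The calculation assumes that "N times faster" means the benchmark takes N times as long, and it uses the average WSI size from the Material section. Both are interpretive assumptions; the measured timings are those in Table 5b.

```python
PROPOSED_SECONDS = 210                    # reported time per WSI
SPEEDUP = {"U-Net": 20, "SegNet": 19}     # reported speed-up factors
AVG_WSI_PIXELS = 91_257 * 41_546          # average WSI size in the dataset

def implied_seconds(speedup, base=PROPOSED_SECONDS):
    """Benchmark time implied by a speed-up factor (assumed ratio of run times)."""
    return base * speedup

for name, factor in SPEEDUP.items():
    secs = implied_seconds(factor)
    print(f"{name}: ~{secs} s per WSI (~{secs / 60:.1f} min)")

# Average pixels handled per second by the proposed method on an average slide.
throughput = AVG_WSI_PIXELS / PROPOSED_SECONDS
print(f"proposed: ~{throughput / 1e6:.1f} megapixels per second")
```

Under these assumptions the benchmarks would need roughly 70 and 66.5 minutes per slide, against 3.5 minutes for the proposed method.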

Discussion
To the authors' best knowledge, this is the first work on automated analysis of cervical HSILs or higher (SQCC) in WSIs of conventional Pap smear samples for practical usage. Our study demonstrates that the proposed cervical Pap smear diagnosis system could be used to assist in automatic detection and quantification of cervical HSILs or higher (SQCC) from WSIs. The proposed method achieved a precision of 0.93, recall of 0.90, F-measure of 0.88, and Jaccard index of 0.84, showing that the proposed system is robust, is capable of segmenting HSILs or higher (SQCC) with high precision, and reaches sensitivity comparable to the reference standard produced by pathologists. Moreover, the proposed method significantly outperforms two state-of-the-art deep learning approaches (P < 0.001). Cervical cancer develops through persistent infection with high-risk human papillomavirus (HR-HPV) and is a leading cause of death among women worldwide 28. Regular screening strategies using HR-HPV testing, Pap smear and colposcopy, alone or in combination, can prevent the onset and development of cervical cancer 28. Cervical cancer incidence can be reduced by as much as 90% where screening quality and coverage are high 29. In 2018, the United States Preventive Services Task Force (USPSTF) updated its screening guidelines: it continues to recommend triennial cytology (Pap test) for women between 21 and 29 years old, followed between 30 and 65 years old by either continued triennial cytology or HR-HPV testing every 5 years 30. The major contribution of our proposed method to a cervical Pap smear screening workflow, compared with manual cytology reading, is that it reduces the time required by the cytotechnician to screen many Pap smears by eliminating the obviously normal ones, so that more time can be spent on the suspicious slides.
In recent decades, the conventional Pap smear method has been the mainstay of screening procedures. However, this technique is not without limitations, because its sensitivity and specificity are relatively low. Liquid-based cytology (LBC) was introduced in the 1990s and was initially considered a better tool for processing cervical lesions. It has since been found that LBC is superior to conventional smears only in producing fewer unsatisfactory smears; there is no significant difference in the detection of epithelial cell abnormalities between the two methods 31. LBC is widely used in the United States, European countries, and many other developed nations. Although these approaches offer better clarity, uniform spread of smears, less time for screening and better handling of hemorrhagic and inflammatory samples 32, they are expensive and rely heavily on technology 33. Considering cost effectiveness and health insurance policy, the conventional Pap method is more feasible in our country. Although HPV testing is more sensitive than cytology and detects cervical precancerous lesions and cancers earlier, costs, infrastructure considerations and specificity issues currently limit its use in low- and middle-income countries 34. The high frequency of transient HPV infection among women younger than 30 years can lead to unnecessary follow-up diagnostic and treatment interventions with potential for harm 35. For HR-HPV screening, the Food and Drug Administration (FDA) approved the cobas HPV test. This test detects HPV types 16, 18, and 26 and additional HR-HPV types 36. Despite the reported high sensitivity (86%) and negative predictive value (82%) of HR-HPV testing 37, some HSILs can still be missed 38-40. The low specificity (31%) and positive predictive value (37%) make the situation even worse because they lead to more patients undergoing unnecessary referrals 37.
Recent studies have shown a correlation between epigenetics and the development and progression of cervical cancer 41. Increased methylation of host genes has been observed in women with cervical precancer and cancer. Several of these genes have been evaluated as candidates for triage of HPV-positive women. However, more longitudinal studies are needed to prove the long-term safety of a negative methylation result 42. Vaccination against HPV is a possible long-term solution for eradicating cervical cancer in developing countries where a prophylactic HPV vaccine has already been approved. However, knowledge and awareness about cervical cancer, HPV, and the efficacy of the HPV vaccine in the prevention of cervical cancer are very low worldwide. The low level of knowledge about HPV is considered to be the major hurdle for the implementation of HPV vaccination programs 30. Automated Pap smear screening systems, such as AutoPap 300 43 46, have been available for more than 25 years. A fast automated deep learning system enables high-throughput analysis across a wide cohort of patients, and also helps to obtain the large amount of data needed to analyze the enormous gigapixel dimensions of WSIs. There is limited research on automated analysis of cervical lesions on conventional Pap smear WSIs. Araújo et al. 17 applied CNNs to segment LSIL or ASC-US using small cervical cell images (1392 × 1040 pixels) acquired from manually identified regions of interest under microscopy, and Lin et al. 18 applied CNNs to classify abnormal cells using single cervical cell images with an average size of 110 × 110 pixels, which are carefully prepared by manual localization and extraction from microscopic images. Both methods require manual intervention to locate and acquire single-cell images or images of regions of interest.
In comparison, we developed a fast and fully automatic deep learning screening system, which is capable of detection and quantification of HSILs or higher (SQCC) on WSIs in seconds for cervical lesion diagnosis and treatment suggestion. Our data demonstrated that AI-assisted cytology could distinguish most CIN2+ (CIN2 or higher) cytology, with a precision of 0.93, recall of 0.90, F-measure of 0.88, and Jaccard index of 0.84. Compared with manual cytology reading, it is close to effective use in clinical practice because it labels CIN2+ cells completely in a short time, which helps cytologists or cytotechnologists screen and label cervical high grade dysplastic cells more easily and quickly. There are still some weaknesses in AI-based Pap smear screening. When atypical cervical cells are gathered in different planes, traditional microscopes can overcome the focus problem by turning the adjustment wheel, but it is not easy for AI to classify such cells correctly, for example HSILs present in three-dimensional groups that closely mimic shed endometrial cells, or HSIL patterns resembling reparative change. Specimens with rare, small HSIL cells with a high nuclear-to-cytoplasmic ratio may be problematic for AI with regard to identifying single HSIL cells. Pap smear images sometimes contain overlapping hyperchromatic crowded groups, which can interfere with the AI cytological diagnosis. Although our proposed method can correctly find CIN2+ cells, it still needs cytologists to confirm the diagnosis and divide CIN2+ cells into moderate dysplasia, severe dysplasia, squamous cell carcinoma in situ, nonkeratinizing SQCC or keratinizing SQCC. Furthermore, the proposed system is demonstrated to be superior to two state-of-the-art deep learning methods, i.e. U-Net 19 and SegNet 20, in precision, recall, F-measure, Jaccard index and computing efficiency based on the experimental results using the LSD test (P < 0.001).
The precision, recall, F-measure and Jaccard index were calculated from a hospital-based, retrospective study using a research platform, so they may not be directly applicable to the clinical setting or to wider populations. The application of artificial intelligence may provide a new screening method for cervical Pap smears and warrants further validation in a larger population-based study in future work.
Our results show that the proposed AI-assisted method, with high sensitivity (0.9) and specificity (1.0), outperforms conventional Pap smear examination and HPV testing, overcoming the limitations of low sensitivity in light microscopic examination of conventional Pap smear slides and low specificity in HPV testing. Artificial intelligence-assisted rapid screening has great potential to provide much faster and cheaper service in the future. The processing time, material cost and labor cost could be greatly reduced using artificial intelligence-assisted rapid screening. The proposed fast screening deep learning-based system could not only avoid misdiagnosis caused by human negligence but also shorten the lengthy screening process. The proposed system is applicable for practical clinical usage worldwide for comprehensive screening and could ultimately have an impact on areas with a high incidence of cervical cancer.

Methods
In this paper, we developed an efficient system to identify HSILs on WSIs with a cascaded multi-layer deep learning framework to improve accuracy and reduce computing time. The proposed cascaded multi-layer framework utilizes a coarse-to-fine strategy to rapidly locate the tissues of interest and perform semantic segmentation to identify HSILs: at the coarse level, fast localization of the tissues of interest is conducted, and at the fine level, HSILs are identified based on the fast screening results for the tissues of interest. The framework of the proposed method is shown in Fig. 5.
Cascaded multi-layer deep learning framework. For dealing with gigapixel data efficiently, each WSI is formatted into a tile-based pyramid data structure, which is denoted by $T = \{t_{l,i,j}\}_{l=1,\dots,N}$, where $l$ is the current [...] The output of $M_l$ is the probabilities of HSILs $P_{l,i,j}$ at level $l$, which are used to produce an attention map $a_{l,i,j}$. If any pixel of $t_{l,i,j}$ has a probability greater than or equal to $\alpha$, the attention map $a_{l,i,j}$ of that tile is set to 1, as shown in Eq. (6). In practice, $\alpha$ is set to 0.5.

$$P_{1,i,j}(x, y) = M_1\big(t_{1,i,j}(x, y)\big) \tag{5}$$

$$a_{l,i,j} = \begin{cases} 1, & \max\big(P_{l,i,j}(x, y)\big) \ge \alpha \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

To render tiles of interest $t'_{l,i,j}$ in every next level, the attention map $a_{l-1}$ of the previous level $l-1$ is used by a tile selector model at level $l-1$ with a mapping function as shown in Eq. (7). An attention map unit $a_{l-1,i,j}$ at level $l-1$ is associated with $2^{2z}$ units at level $l$; conversely, $a_{l,i,j}$ is associated with the attention map unit at the previous level. Thus, a tile $t_{l,i,j}$ is selected for further inspection as $t'_{l,i,j}$ when the corresponding attention map unit at the previous level, $a_{l-1,\lfloor i \times 2^{-z} \rfloor,\lfloor j \times 2^{-z} \rfloor}$, equals 1.

Figure 5. System framework. Each WSI is formatted into a tile-based pyramid data structure, and the probability of HSILs at every level is generated by the multi-layer deep learning framework. A multi-layer attention map is computed to select tiles of interest at every next level by a tile selector model, and the segmentation result of HSILs is generated by the last-layer deep learning model $M_N(t'_{N,i,j}(x, y))$.
In our implementation, each tile is associated with an attention tile, which contains 16 attention units ($2^2 \times 2^2$ squares); if the segmentation model at that level identifies any pixel of the tile region associated with an attention unit as the target, that attention unit is activated to select tiles for the next level.
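The attention map of Eq. (6) and the tile selection rule of Eq. (7) can be sketched as follows. This is a minimal illustration at tile granularity, not the authors' implementation: the function names, the dictionary layout keyed by tile index (i, j), and the toy probabilities are hypothetical, and z = 2 is inferred from the 16 = 2^(2z) attention units mentioned above. Plain Python lists are used to keep the sketch dependency-free.

```python
def attention_map(prob_tiles, alpha=0.5):
    """Eq. (6): a tile's attention unit is 1 if any pixel probability >= alpha.

    prob_tiles: dict mapping (i, j) -> 2D list of per-pixel probabilities.
    """
    return {
        (i, j): int(max(max(row) for row in tile) >= alpha)
        for (i, j), tile in prob_tiles.items()
    }

def select_tiles(tiles, attention_prev, z=2):
    """Eq. (7): keep tile (i, j) only if its unit at the previous level is 1.

    The previous-level index is (floor(i * 2**-z), floor(j * 2**-z)),
    written here as (i >> z, j >> z). Tiles that are not selected are
    omitted, which stands in for the empty set in Eq. (7).
    """
    return {
        (i, j): tile
        for (i, j), tile in tiles.items()
        if attention_prev.get((i >> z, j >> z), 0) == 1
    }

# Toy example: two coarse tiles, only the first contains a confident pixel.
coarse_probs = {
    (0, 0): [[0.1, 0.7], [0.2, 0.3]],
    (0, 1): [[0.1, 0.2], [0.0, 0.4]],
}
att = attention_map(coarse_probs)

# With z = 2, each coarse unit covers 2**(2*z) = 16 tiles at the next level.
fine_tiles = {(i, j): f"tile-{i}-{j}" for i in range(4) for j in range(8)}
selected = select_tiles(fine_tiles, att)
print(att, len(selected))
```

In this toy case only the 16 fine-level tiles under the activated coarse unit are passed on, which is how the coarse level saves work at the fine level.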
For the subsequent levels ($l \in [2, N]$), $t'_{l,i,j}$ is processed by $M_l$ to generate the probabilities of HSILs $P_{l,i,j}$, as shown in Eq. (8). $P_{l,i,j}$ is then used to produce an attention map $a_{l,i,j}$ for identification of tiles of interest in the next level by the tile selector model at level $l$, which generates $t'_{l+1,i,j}$ as formulated in Eqs. (6)-(7).
At level $N$, the selected tiles $t'_{N,i,j}$ produce probabilities $P_{N,i,j}$ using Eq. (8). The segmentation result of HSILs $S_{N,i,j}(x, y)$ is generated by $M_N$ using $t'_{N,i,j}(x, y)$, as shown in Eq. (9).
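The final thresholding step of Eq. (9) can be illustrated with the short sketch below. The function name is hypothetical, and representing the empty value of Eq. (9) by 0 is an assumption made only for this example.

```python
def segment_tile(tile, probs, alpha=0.5, empty=0):
    """Eq. (9): keep a pixel where the last-level probability >= alpha.

    tile, probs: 2D lists of equal shape (pixel values and probabilities).
    empty: stand-in for the empty value in Eq. (9) (assumed to be 0 here).
    """
    return [
        [pix if p >= alpha else empty for pix, p in zip(tile_row, prob_row)]
        for tile_row, prob_row in zip(tile, probs)
    ]

# Toy 2 x 2 tile: the top row passes the threshold, the bottom row does not.
tile = [[200, 180], [90, 60]]
probs = [[0.9, 0.5], [0.49, 0.1]]
result = segment_tile(tile, probs)
print(result)
```

Note that a probability exactly equal to the threshold is kept, consistent with the "greater than or equal to" condition used for the attention map.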

Modified FCN for segmentation of HSILs or higher (SQCC). The Fully Convolutional Network (FCN) has been demonstrated to be effective in pathology, for example in segmentation of nuclei in images 47, cell counting in different kinds of microscopy images 48 and neuropathology 49. The FCN 50 is mainly composed of 18 convolution layers (each followed by a ReLU layer), five pooling layers for downsampling, a SoftMax layer and three upsampling layers, namely the FCN-8s, FCN-16s, and FCN-32s models, forming a three-stream net in which the outputs from each stream are aggregated to form the final output. The architecture of the proposed FCN is shown in Table 6. In our preliminary test using a lung dataset provided by the Automatic Cancer Detection and Classification in Whole Slide Lung Histopathology challenge, held with the IEEE International Symposium on Biomedical Imaging (ISBI) in 2019 51, we found that the single-stream FCN-32s could avoid overly fragmented segmentation results in comparison to the original three-stream net, as shown in Fig. 6. In addition, training cost and inference time are reduced dramatically. In this study, we developed an improved FCN as the base deep learning model using the single-stream FCN-32s as the upsampling [...]

$$t'_{l,i,j} = \begin{cases} t_{l,i,j}, & a_{l-1,\lfloor i \times 2^{-z} \rfloor,\lfloor j \times 2^{-z} \rfloor} = 1 \\ \varphi, & \text{otherwise} \end{cases} \tag{7}$$

$$P_{l,i,j}(x, y) = M_l\big(t'_{l,i,j}(x, y)\big) \tag{8}$$

$$S_{N,i,j}(x, y) = \begin{cases} t'_{N,i,j}(x, y), & M_N\big(t'_{N,i,j}(x, y)\big) \ge \alpha \\ \varphi, & \text{otherwise} \end{cases} \tag{9}$$
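Returning to the FCN architecture described in this subsection, the sketch below shows why a single-stream FCN-32s needs only one upsampling step of factor 32. It assumes that each of the five pooling layers halves the spatial size (as in VGG16, which is used for initialization) and that the input is a 512 × 512 tile. The function name is hypothetical, and Table 6 remains the authoritative description of the layers.

```python
def feature_map_size(input_size=512, n_pool=5, pool_stride=2):
    """Spatial size after n_pool pooling layers (assumed stride 2, as in VGG16)."""
    size = input_size
    for _ in range(n_pool):
        size //= pool_stride
    return size

TILE = 512
coarse = feature_map_size(TILE)      # size after all five pooling layers
upsample_factor = TILE // coarse     # factor needed to restore the tile size
print(coarse, upsample_factor)
```

Under these assumptions a 512 × 512 tile is reduced to a 16 × 16 score map, which the FCN-32s stream upsamples by a factor of 32 in one step, whereas the FCN-16s and FCN-8s streams additionally draw on the shallower 32 × 32 and 64 × 64 maps.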