The study of automatic machine learning base on radiomics of non-focus area in the first chest CT of different clinical types of COVID-19 pneumonia

To explore the possibility of predicting the clinical types of Corona-Virus-Disease-2019 (COVID-19) pneumonia by analyzing the non-focus area of the lung in the first chest CT image of patients with COVID-19 by using automatic machine learning (Auto-ML). 136 moderate and 83 severe patients were selected from the patients with COVID-19 pneumonia. The clinical and laboratory data were collected for statistical analysis. The texture features of the Non-focus area of the first chest CT of patients with COVID-19 pneumonia were extracted, and then the classification model of the first chest CT of COVID-19 pneumonia was constructed by using these texture features based on the Auto-ML method of radiomics, The area under curve(AUC), true positive rate(TPR), true negative rate (TNR), positive predictive value(PPV) and negative predictive value (NPV) of the operating characteristic curve (ROC) were used to evaluate the accuracy of the first chest CT image classification model in patients with COVID-19 pneumonia. The TPR, TNR, PPV, NPV and AUC of the training cohort and test cohort of the moderate group and the control group, the severe group and the control group, the moderate group and the severe group were all greater than 95% and 0.95 respectively. The non-focus area of the first CT image of COVID-19 pneumonia has obvious difference in different clinical types. The AUTO-ML classification model of Radiomics based on this difference can be used to predict the clinical types of COVID-19 pneumonia.

often not typical, some of which cannot clearly be used to diagnose and classify the pneumonia of COVID-19, thus limited the value for clinical design of treatment plan.
The inflammatory reaction of interstitial and alveolar edema in Non-focus lung tissue during the early lung injury of COVID-19 pneumonia, which is difficult to be distinguished by eyes on CT images 4,9,10 . As an extension of computer-aided diagnosis, Lambin proposed the Radiomics method in 2012 11 . It will extract and analyze image texture features and combine them with other available patient data to enhance the ability of decision model. The method of Radiomics analysis can make the inflammatory reaction of alveolar interstitium and alveolar edema in the Non-focus area which is difficult to be distinguished by eyes in the early chest CT image of COVID-19 pneumonia become the image information that can be excavate and utilized.
Therefore, our aim is to establish and validate a prediction model of Non-focus area in the early stage of COVID-19 pneumonia by excavate the texture features of the first chest CT image with the method of Auto-ML, and to evaluate the value of the model in the degree of Non-focus area damage and clinical classification in the early stage of COVID-19 pneumonia.

Methods
The study is based on the principles of the Helsinki declaration. The Ethics Committee of the PLA Central Theater General Hospital approved this study because it is a retrospective study, giving up the need for written informed consent (Decision/Protocol number: [2020]030-1).

Patients selection.
Collected 2680 patients with COVID-19 pneumonia diagnosed according to the COVID-19 diagnostic and therapeutic regimen (trial 7th edition) in China (www.nhc.Gov.cn/yzygj /s7652 m/20200 3/a3119 1442e 29474 b98bf ed557 9d5af 95.shtml ), From January 2020 to February 2020. They were included in the study according to the following conditions: 1. Hospital patients. 2. The clinical information and laboratory examination were complete, and at least two lung CT examinations (including the first CT examination) were performed within one week after hospitalization. 3. Positive results of severe acute respiratory syndrome coronavirus 2 (SARS-COV-2) in nasopharynx swab by RT-PCR. 4. Cases with a history of lung surgery, lung tumors, or any other cause of pneumonia were excluded. Finally, 219 patients were included in the study (Fig. 1). In order to prevent asymptomatic cases infected with COVID-19 virus from being added to the control group, we randomly selected 100 cases from the physical examination population who had chest CT examination and no lung lesions between January and February 2019 as the control group (Fig. 2).
Clinical characteristics, including age, gender, temperature, cough, sputum, nausea and vomiting and other clinical symptoms; white blood cells (WBC), lymphocytes, alanine aminotransferase (ALT), aspartate aminotransferase 12 , C-reactive protein (CRP), fibrinogen, urea (URE), creatinine (CRE) were obtained from the medical records, see Table 1. The clinical symptoms were the symptoms at the time of admission, and blood samples were taken for examination within 3 days after admission.
According to the scheme of "COVID-19 diagnostic and therapeutic regimen (trial 7th edition) in China", the moderate degree cases are defined as the patients with fever, respiratory symptoms and other clinical symptoms, and the chest image can show pneumonia. The severe cases were defined as adults who met any of the following criteria: respiratory rate ≥ 30 times/min; oxygen saturation ≤ 93% at rest; arterial oxygen partial pressure (PaO 2 )/ oxygen concentration (FiO2) < 300 mmHg. In the lung CT examination, the patients whose focus increased more than 50% within 24-48 h should be considered as severe. All the 219 patients were of moderate degree at the time of admission, 83 of them developed to serious degree in 7-13 days after admission, and the other 136 cases were stable in the moderate degree (Figs. 2 and 3).   13 . Firstly, take regional growth to draw the volume of interest (VOI) of the non-focus part of the lung, then two radiologists with more than 10 years of experience manually modified and shrunk the VOI edge to 3 mm from the focus edge. Data Supplement presents the VOI drawing methods and modification criteria (Fig. 4).

Radiomics features extraction.
In this study, we used Python language (version 3.7.4) program to call the Pyrometric package (version 2.2.0) 14,15 . In the process of program running, seven filters are used to process the original VOI. Seven categories (each category including: 18 features of first-order statistics (FOS), 24 features of gray level co-occurrence matrix (GLCM), 16 features of gray level run matrix (GLRLM), 16 features of gray level size region matrix (GLSZM) and 5 of neighbouring gray tone difference matrix (NGTDM) are extracted from each filtered image There are 1674 texture features. Together with 14 shape features of the original image, 1688 features were extracted for this study. For more information on the methods and parameters of feature extraction in radiomics 16 , see Table 2.

Auto-ML.
In the texture feature data, since the shape related parameters of the control group and the study group are significantly different, they are removed from the data matrix during the analysis. Tree-based pipeline optimization tool (TPOT) (epistasislab. github. io/tpot) is a python Auto-ML tool based on genetic algorithm to optimize Auto-ML pipeline [17][18][19] . In the process of Auto-ML, each group's original data is imported into TPOT, and TPOT randomly divides the original data into training set and test set according to the proportion of 8:2. In the Auto-ML process of training set, TPOT repeatedly carries out data cleaning, feature selection, feature preprocessing, feature construction, model selection and parameter optimization through intelligent exploration of thousands of possible pipeline, automatically realizes feature analysis of shadow parts, and carries out in training set verification. After the exploration and verification, the available Python code containing classifier information and corresponding parameter settings is generated (Fig. 5).

Classification model testing.
According to the results of TOPT analysis, classifier was selected and classifier parameters were set (generations = 5, population size = 20, verbosity = 2). Three models of Moderate and Severe group, Moderate and control group, Severe and control group, were established respectively. The test set data of each group was used to test with the corresponding classifier and optimization parameters (Fig. 5).  Radiomics's auto-ML model performance and its classifier verification. Figure 5 summarizes the manifestations of the radiomics Auto-ML model in the first CT images of the Non-focus area of COVID-19 pneumonia. The samples of the moderate group, severe group and control group in this study were randomly  Table 3. ROC curves are shown in (Fig. 6). changes of the first chest CT and the second chest CT image after 7 days of treatment in patients with "severe" and "moderate", respectively. A 43-year-old male patient classified as "Severe", showed the typical lesion level image of the first CT examination, and the subpleural areas of the two lungs are scattered with mass like ground glass shadow (a), showed the CT image of reexamination after 7 days of hospitalization treatment. The original lesion range is expanded, and the strip like consolidation area appears in the lesion (b); a 65-year-old man classified as "Moderate" was admitted to the hospital due to fatigue, headache and muscle ache. showed the typical focus level image of the first CT examination, and the focus similar to patient 1 can be seen, but the central area of the focus is partially solid (c), showed the CT image of reexamination 7 days after hospitalization, the original focus range of the two lungs is narrowed, the exudation part of the focus is absorbed, and the edge of the focus is clear (d).

Discussion
At present, the CT studies of COVID-19 pneumonia are all focused on the focus of pneumonia, there is no study on the non-focus area. As we all know, viral pneumonia is a widespread interstitial inflammation in the lung 20 . In the early stage of pulmonary interstitial inflammation, CT images can hardly to reflect the pathological changes of the lung. Therefore, this study uses the Auto-ML method of radiomics based on CT to study the Non-focus area of COVID-19 pneumonia, in order to find the changes of Non-focus area that CT images cannot find.
The study of non-focus tissue in the lung will help clinicians to broader perspective on recognizing COVID-19 pneumonia. Meanwhile it also beneficial to optimize the treatment plan, block the progression of disease, reduce symptoms and improve the cure rate of severe patients. According to the existing data, this is the first time to use CT image-based radiomics to study the non-focus area of COVID-19 pneumonia 2,4,5,9 .
Studies have shown, that the early pathological manifestations of lung injury caused by COVID-19 virus included edema of alveolar epithelial cells and alveolar septum in different degrees, uneven surface of alveoli, and more cytoplasmic vesicles in type I alveolar epithelial cells 21 . These vesicles gradually burst and release fluid, causing morphological changes of alveolar cells, such as cell swelling, deformation, DNA breakage, etc. With the necrosis of the alveolar cells, the pulmonary capillaries further ruptured, resulting in alveolar hemorrhage, pulmonary infection and pulmonary fibrosis. This may be the root cause of severe pneumonia in COVID-19 22 . Radiomics medicine can extract a lot of texture feature information from the image to reflect the heterogeneity of damage. For example, GLCM mainly reflects the characteristics of the internal structure of the image through the change of density 14,20,23 . Therefore, even if no lesions are found on the CT images, we can also analyze different types of texture features extracted to determine whether the lung tissue is damaged. In this study, through In the pipeline, The training set was repeated for data cleaning, feature selection, feature construction, feature processing, model selection and parameter optimization, until a pipeline (classifier) optimized for the parameters of a specific model was selected. In the last stage of the pipeline, the data of the training set and the test set were mixed, then the data was put into the optimal pipeline to verify whether the pipeline and parameters were optimal. The specific operators selected in the best pipeline include the built-in TPOT operator (OneHotenCoder, FeatureSetSelector) and the functions in the scikitlearn library (ExtraTreesClassifier and RandomForstClassifier). www.nature.com/scientificreports/ the analysis of AUTO-ML classification model, there are significant differences in the texture characteristics of non-focus area in the first CT image between the moderate and severe groups, and there are also significant differences between the moderate and severe groups and the control group, which is similar to the results of Yanling's study of different types of pneumonia with radiomics 15 . Different from other radiomics studies, the classification technology of Auto-ML used in this study avoids the limitations of manual selection of machine learning classifiers. Feature selection, feature preprocessing, feature construction, model selection and super parameter optimization 17,18 are the advantages of TOPT module. Its main code modules are Sklearn and XGBboost, which are commonly used by Auto-ML researchers. According to the results of auto-ML classification of Radiomics, the classifiers used in establishing the classification model of moderate group & severe group are different from those of moderate group& control group and severe group/ control group. Although the classification models of moderate group& control group and severe group& control group used the same classifier, the classifiers aim at different models in the calculation process the parameters were optimized. This indicated that TPOT has customized the best model for each data matrix.
In this study, we collected demographic factors, clinical symptoms on admission, and laboratory tests that may be relevant to identification. However, there was no difference between the moderate and severe focus in the early stage of the disease. When the experimental data showed differences, the patient's condition had been Figure 6. (a-c) ROC diagrams. The AUC of "moderate" & "severe" training set and test set were 0.98/0.95, respectively (a); The AUC of the training set and the test set in the "moderate" & "control" group were 0.97/0.98, respectively (b); The AUC of the training set and the test set in the "severe" & "control" group were 0.99/0.95, respectively (c).

Scientific Reports
| (2020) 10:18926 | https://doi.org/10.1038/s41598-020-76141-y www.nature.com/scientificreports/ aggravated. Therefore, it is an effective way to reduce the rate of severe conversion by effectively predicting the Non-focus area before the patient's condition turns to severe. In this study, a simple, stable and efficient semi-automatic region growing method, human-computer interaction segmentation method, is selected. Combined with manual modification, the accuracy and repeatability of VOI description are improved. This is of great significance to the accurate segmentation of Non-focus area for feature extraction and model construction. In addition, we chose Non-focus area as VOI. Avoiding the damage of COVID-19 pneumonia, including GGO, consolidation, thickening of broncho-vascular bundle, cystic change and pulmonary vessels and trachea in Non-focus area, which not only avoids the influence of subjective factors, but also can fully measure the severity and degree of lung injury.
However, limitations still existed. Firstly, 219 cases included in the study, thus samples number was relatively insufficient, while there was a risk of over fitting in machine learning and deep learning. Secondly, the data of this study came from the same institution. Although it is a good radiology model for this institution, it is necessary for more research institutions to carry out data sharing, verification, cooperation thus to establish a more general COVID-19 pulmonary inflammation model. Thirdly, there is no completed biological explanation of radiomics features in this study which showing further exploration is needed in the future.
In conclusion, the authors believe that the Radiomics Auto-ML classification model based on the analysis of Non-focus area in the first chest CT image of COVID-19 pneumonia can effectively classify the clinical types of COVID-19 pneumonia.