A novel CT scoring method predicts the prognosis of interstitial lung disease associated with anti-MDA5 positive dermatomyositis

Anti-melanoma differentiation-associated gene 5-positive dermatomyositis-associated interstitial lung disease (MDA5+ DM-ILD) is a life-threatening disease. This study aimed to develop a novel pulmonary CT visual scoring method for assessing its prognosis, with an artificial intelligence (AI) algorithm-based analysis and an idiopathic pulmonary fibrosis (IPF)-based scoring method as comparators. A retrospective cohort of hospitalized patients with MDA5+ DM-ILD was analyzed. Since most fatalities occur within the first six months of the disease course, the primary outcome was six-month all-cause mortality from the time of admission. A ground glass opacity (GGO) and consolidation-weighted CT visual scoring model for MDA5+ DM-ILD, named the ‘MDA5 score’, was developed, with C-index values of 0.80 (95%CI 0.75–0.86) in the derivation dataset (n = 116) and 0.84 (95%CI 0.71–0.97) in the validation dataset (n = 57). The AI algorithm-based analysis, named the ‘AI score’, yielded C-index values of 0.78 (95%CI 0.72–0.84) in the derivation dataset and 0.77 (95%CI 0.64–0.90) in the validation dataset. These findings suggest that the newly derived ‘MDA5 score’ may serve as an applicable prognostic predictor for MDA5+ DM-ILD and facilitate further clinical trial design. The AI-based quantitative CT analysis provides a promising solution for ILD evaluation.


Results
The baseline clinical features, treatments and outcomes of the derivation and validation datasets were comparable and are listed in Supplementary table S1. In these datasets, 47 (40.5%) and 21 (36.8%) patients, respectively, died within six months of admission (p = 0.764).
'MDA5 score': a novel CT visual semi-quantitative analysis. The pulmonary HRCT findings from the visual semi-quantitative analysis of survivors and non-survivors in both datasets are presented in Table 1. As expected, the ILD patterns were distributed bilaterally. Notably, only the GGO and consolidation patterns were significantly associated with outcome in univariable analysis; neither fibrosis nor the presence of pneumomediastinum or pneumothorax (PNM) at baseline was. The GGO and consolidation scores were therefore included in the multivariable COX regression analysis. Both the total GGO score (β coefficient = 0.13, p < 0.001) and the total consolidation score (β coefficient = 0.22, p < 0.001) were significantly associated with all-cause mortality (Table 2). For simplicity, a linear equation, named the 'MDA5 score', was generated by combining these prognostic factors weighted by their β coefficients, with the weights simplified to 1 and 2: total GGO score + 2*total consolidation score. ROC curve analysis indicated that the optimal cutoff value for the 'MDA5 score' was 18, which predicted six-month all-cause mortality in the derivation dataset (sensitivity 70.2%; specificity 82.6%) and the validation dataset (sensitivity 85.7%; specificity 63.9%). The AUC of the 'MDA5 score' was 0.85 (95%CI 0.78-0.91) for the derivation dataset and 0.87 (95%CI 0.78-0.96) for the validation dataset, higher than that of the 'IPF score', which was 0.81 (95%CI 0.73-0.89) for the derivation dataset and 0.79 (95%CI 0.68-0.91) for the validation dataset. Additionally, the Kaplan-Meier survival plots of patients in both datasets showed a significant difference between the high-risk ('MDA5 score' > 18) and low-risk ('MDA5 score' ≤ 18) groups (Figs. 1, 2).
The mortality of high-risk patients was 73.3% in the derivation dataset and 58.1% in the validation dataset, whereas the mortality of low-risk patients was 19.7% in the derivation dataset and 11.5% in the validation dataset.
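The scoring rule and cutoff described above can be illustrated with a minimal sketch. It assumes that each component is supplied as five per-lobe grades (0-5); the function names and the example patient are hypothetical and are not taken from the study.

```python
from typing import Sequence

# Cutoff reported in the text: 'MDA5 score' > 18 defines the high-risk group.
MDA5_CUTOFF = 18


def total_score(lobe_grades: Sequence[int]) -> int:
    """Sum five per-lobe grades (each 0-5) into a total score of 0-25."""
    if len(lobe_grades) != 5:
        raise ValueError("expected one grade for each of the five lobes")
    if any(g < 0 or g > 5 for g in lobe_grades):
        raise ValueError("each lobe grade must be between 0 and 5")
    return sum(lobe_grades)


def mda5_score(ggo_grades: Sequence[int], consolidation_grades: Sequence[int]) -> int:
    """'MDA5 score' = total GGO score + 2 * total consolidation score."""
    return total_score(ggo_grades) + 2 * total_score(consolidation_grades)


def risk_group(score: int) -> str:
    """Return 'high' for a score above the cutoff, otherwise 'low'."""
    return "high" if score > MDA5_CUTOFF else "low"


# Hypothetical patient: total GGO = 12, total consolidation = 4.
example = mda5_score([2, 2, 3, 2, 3], [0, 1, 1, 0, 2])
print(example, risk_group(example))
```

For this hypothetical patient the score is 12 + 2 × 4 = 20, which falls in the high-risk group; a score of exactly 18 would be classified as low risk.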
'AI score': an AI algorithm-based quantitative analysis. Because the baseline fibrosis component was redundant for outcome prediction, a pneumonia-trained AI algorithm was a plausible tool for quantitative CT analysis of our MDA5+ DM-ILD patients (Fig. 3A). The percentage of consolidation was the only significant predictor of overall survival in the final multivariable COX model (p < 0.001) (Table 2). Thus, the percentage of consolidation was defined as the 'AI score'. Interestingly, the radar charts in Fig. 3B showed that the GGO and consolidation patterns were symmetrically distributed, with an evident 'gravity gradient' towards the lower areas of the lungs, especially for consolidation.
Comparisons of clinical performance between 'IPF score', 'MDA5 score' and 'AI score' model. The inter-observer consistency of the two visual scoring models was assessed, with an ICC of 0.69 (95% CI 0.57-0.78) for the 'IPF score' and 0.93 (95% CI 0.89-0.96) for the 'MDA5 score'. The 'MDA5 score' therefore had better inter-observer reproducibility. As a comparator, the detailed data of the six domains used to calculate the 'IPF score' are presented in Supplementary table S2. The comparisons of model discrimination between the 'IPF score', 'MDA5 score' and 'AI score' are shown in Table 3. The C-index of the 'MDA5 score' was 0.80 (95%CI 0.75-0.86) for the derivation dataset and 0.84 (95%CI 0.71-0.97) for the validation dataset, and that of the 'AI score' was 0.78 (95%CI 0.72-0.84) for the derivation dataset and 0.77 (95%CI 0.64-0.90) for the validation dataset. Finally, decision curve analysis (DCA) further demonstrated that the 'MDA5 score' had a higher overall net benefit than the other two models in terms of clinical applicability (Fig. 4).
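For readers unfamiliar with decision curve analysis, the net benefit at a single risk threshold can be sketched as follows. This uses the standard textbook definition (true positives per patient minus false positives per patient weighted by the odds of the threshold); it is an assumption that this matches the exact implementation used in the study, and the outcome and risk values below are invented purely for illustration.

```python
from typing import Sequence


def net_benefit(outcomes: Sequence[int], risks: Sequence[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk is above `threshold`.

    outcomes: 1 if the patient died within follow-up, 0 otherwise.
    risks: predicted probabilities of death from some risk model.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    true_pos = sum(1 for y, r in zip(outcomes, risks) if r > threshold and y == 1)
    false_pos = sum(1 for y, r in zip(outcomes, risks) if r > threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes: Sequence[int], threshold: float) -> float:
    """Net benefit of recommending the intervention to every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Invented data for ten hypothetical patients (not from the study).
outcomes = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.7, 0.6, 0.2, 0.3, 0.1, 0.1, 0.2, 0.4, 0.05]
print(net_benefit(outcomes, risks, 0.5))
print(net_benefit_treat_all(outcomes, 0.5))
```

The "treat none" policy has a net benefit of zero by definition, so a model is useful at a given threshold only if its net benefit exceeds both zero and the treat-all value. A decision curve repeats this calculation across a range of thresholds.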

Discussion
As a rapidly progressive disease, MDA5+ DM-ILD remains a major challenge despite recent treatment advances 4,14. Several prognostic indicators of the disease have been reported, involving respiratory physiology parameters, laboratory biomarkers, and radiological features 1,6,15,16. The current study focused on patients' baseline pulmonary HRCT and attempted to quantitatively assess the disease with regard to predicting six-month mortality.
Our study takes a step forward from previous visual scoring methods and extensively evaluates the distribution and extent of three basic imaging components of MDA5+ DM-ILD, i.e., GGO, consolidation and fibrosis. In line with prior reports, our data confirmed that the presence of fibrosis or TBE in the context of GGO or consolidation has no predictive value for prognosis in MDA5+ DM-ILD 5,10. A probable explanation is that these fibrotic features are less common in this rapidly progressive disease and, if they occur, are likely to appear at a later stage rather than at baseline. The same apparently holds true for PNM, which is a known severity indicator rather than a baseline predictor 17. The combination of the extent of GGO and consolidation was found to have the best yield in terms of outcome prediction, with the area of consolidation contributing more than that of GGO. The image 'snapshot' might reflect the dynamic transformation from GGO to consolidation as the disease progresses, similar to the imaging changes observed in patients with severe COVID-19 11,12,18. Whether the two diseases share an underlying mechanism of acute lung injury is an intriguing question that deserves further investigation. After all, it has been postulated that the highly activated type I interferon pathway in MDA5+ DM-ILD suggests a possible virus-triggered response 19,20.
Applying AI algorithm-based quantitative imaging analysis to MDA5+ DM-ILD is a preliminary yet novel attempt. The algorithm was originally intended for pneumonia or, more specifically, COVID-19. Of interest, our data suggested that this AI algorithm performs fairly well in MDA5+ DM-ILD patients. Its performance might be further enhanced if more MDA5+ DM-ILD imaging data could be fed into its machine-learning processes.
Table 3. Comparison of the prediction performance of each model. * 'MDA5 score' model performed significantly better than 'IPF score' model (p = 0.02). C-index, concordance index; 95%CI, 95% confidence interval; IPF, idiopathic pulmonary fibrosis; MDA5, anti-melanoma differentiation-associated gene 5; GGO, ground-glass opacity; AI, artificial intelligence.
Figure 4. Decision curve analysis for 'IPF score', 'MDA5 score' and 'AI score' model. The concept of population net benefit (NB), plotted on the y-axis, is fundamental to decision curves and reflects the classification accuracy of a model. Suppose high risk is defined as risk above some risk threshold R (x-axis); such high-risk patients are recommended an intervention. The NB of using the risk model is calculated from the true-positive rate, i.e. the proportion of cases with risk above the risk threshold R, and the false-positive rate, i.e. the proportion of controls with risk above the risk threshold R. The horizontal dotted line at NB = 0 represents a simple policy of no intervention for any patient (treat none); the gray curve depicts the NB of another simple policy: recommend the intervention to everyone regardless of risk. In our results, the 'MDA5 score' model (red line) had the highest net benefit of the three models across almost the full range of threshold probabilities.
The major limitation of our study was the single-center design. Although we presented a relatively large cohort for this rare disease and performed internal validation, large-scale multi-center external validation is mandatory before the CT scoring models are used in a clinical setting. Such validation would allow biases arising from different scanner conditions, patient selection and treatment protocols to be taken into account and better controlled and adjusted for. In addition, longitudinal analysis of the changes in ILD patterns over time was not addressed in the current study and deserves further exploration.
In conclusion, we have shown that a GGO and consolidation-weighted CT scoring model, along with an AI algorithm, might serve as prognostic predictors of six-month mortality in MDA5+ DM-ILD. This might facilitate future clinical trial design and precision management of this challenging disease.

Patients.
A retrospective cohort of hospitalized patients with MDA5+ DM-ILD has been established at our center since April 2014. All patients initially fulfilled Bohan and Peter's criteria for DM or Sontheimer's criteria for clinically amyopathic dermatomyositis on admission 21,22, and were re-evaluated and considered eligible if they also met the recent 239th ENMC classification criteria for DM 23. All patients had imaging-confirmed ILD and were positive for anti-MDA5 antibody. The ILD course was defined as the time from the first abnormal pulmonary CT revealing ILD changes to admission. Patients with an ILD course > 3 months, coexisting malignancy (within 3 years) or pre-existing chronic obstructive pulmonary disease were excluded. The primary outcome was six-month all-cause mortality from the time of admission.
A total of 173 eligible patients were enrolled and divided into two datasets: patients admitted between April 2014 and December 2018 (n = 116) formed the derivation dataset, and those admitted between January 2019 and January 2020 (n = 57) formed the validation dataset (Fig. 1).
Clinical data including age, gender, physical findings, respiratory function, treatment history and outcomes were obtained from medical records. The study was approved by the Shanghai Jiao Tong University School of Medicine, Renji Hospital Ethics Committee. The need to obtain informed consent was waived by the same committee. All methods performed in the study involving human participants were in accordance with the ethical standards of the Helsinki Declaration and its later amendments or comparable ethical standards.
Measurement of autoimmune antibodies. The semi-quantitative detection of anti-MDA5 and other myositis specific antibodies (MSAs) was performed with EUROLINE Autoimmune Inflammatory Myopathies 16 Ag (IgG) (Euroimmun, Germany).
Confirmatory quantification of anti-MDA5 antibody was conducted by enzyme-linked immunosorbent assay (ELISA). First, purified recombinant MDA5 antigen (rMDA5) (Freezone Biotechnology co., LTD, Shanghai, China), diluted to 5 μg/mL in phosphate-buffered saline (PBS), was coated onto 96-well microtiter plates (Maxisorp; Nunc, Rochester, NY, USA) overnight at 4 °C. The plates were washed twice with PBS and blocked with PBS containing 1% bovine serum albumin (BSA) and 5% sucrose overnight at 4 °C. Second, the serum samples were diluted 1:101 in PBS containing 0.5% sodium chloride, 0.15% Tween 20 and 0.2% BSA, and incubated for 30 min at room temperature. The plates were then washed four times with PBS containing 0.05% Tween 20 and incubated with Goat-conjugated anti-human IgG (PROMEGA, USA) diluted 1:60,000 in Conjugate Stabilizer (Thermo, USA). Finally, after incubation for 30 min at room temperature, the plates were washed four times and the bound antibodies were detected with the peroxidase substrate 3,3',5,5'-tetramethylbenzidine. After incubation for 10 min at room temperature, the reaction was stopped by the addition of 0.5 N sulfuric acid. Absorbance at 450 nm (A) was measured, and unit values (IU/mL) were calculated from the following formula: 100 × (sample OD − blank OD) / (anti-MDA5-positive reference OD − blank OD). The cut-off level was set at 35 IU/mL.
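The unit calculation above can be written out as a short sketch. The function names and the example optical densities are hypothetical, and treating a value exactly at the cut-off as positive is an assumption, since the text does not state how a value equal to 35 IU/mL is classified.

```python
# Cut-off reported in the text, in IU/mL.
CUTOFF_IU_PER_ML = 35.0


def anti_mda5_units(sample_od: float, blank_od: float, reference_od: float) -> float:
    """Units (IU/mL) = 100 * (sample OD - blank OD) / (reference OD - blank OD)."""
    denominator = reference_od - blank_od
    if denominator <= 0:
        raise ValueError("reference OD must exceed blank OD")
    return 100.0 * (sample_od - blank_od) / denominator


def is_positive(units: float) -> bool:
    # Assumption: a value at or above the cut-off counts as positive.
    return units >= CUTOFF_IU_PER_ML


# Illustrative optical densities only (not measured values).
units = anti_mda5_units(sample_od=1.0, blank_od=0.25, reference_od=1.75)
print(units, is_positive(units))
```

With these illustrative readings, the sample reaches half of the blank-corrected reference signal and is therefore assigned about 50 IU/mL, above the cut-off.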

HRCT images acquisition and visual scoring.
Patients underwent non-contrast pulmonary HRCT around the day of admission (median, 2 days; range, 1-6 days) using multidetector CT scanners (United Imaging, Shanghai, China; Siemens Healthineers, Forchheim, Germany). CT slice thickness was 1.0-1.5 mm at 10 mm intervals throughout the lungs.
All CT images were reviewed by two observers (YZ, with 10 years of experience, and CZ, with 5 years of experience in chest HRCT imaging evaluation) who were blinded to patients' outcomes. Inter-observer variability was evaluated by the intraclass correlation coefficient (ICC). Final results were reached by consensus between the two observers.
For the previously reported IPF-based visual scoring method ('IPF score'), HRCT findings were graded on a scale of 1-6 based on the classification system: 1, normal attenuation; 2, GGO without TBE; 3, consolidation without TBE; 4, GGO associated with TBE; 5, consolidation associated with TBE; and 6, honeycombing ( Fig. 1) 8 . The overall 'IPF score' was calculated by summing the average score of six zones (upper, middle, and lower on both sides) as described; and was used as a comparator for the following analysis.
Three components, i.e. GGO, consolidation and fibrosis, were separately rated and recorded according to the area of pulmonary involvement in each of the five lobes (right upper, right middle, right lower, left upper and left lower lobes of the lung). A 0-5 score for GGO or consolidation in each lobe was adopted (0, no involvement; 1, ≤ 5% involvement; 2, 5 to < 25% involvement; 3, 25-49% involvement; 4, 50-75% involvement; 5, > 75% involvement). Similarly, the fibrotic change in each lobe was graded from 0 to 5 (0, no fibrosis; 1, interlobular septal thickening without honeycombing; 2, honeycombing < 25%; 3, 25-49%; 4, 50-75%; 5, > 75% of the lobe) as the fibrosis score 9,10. The total score of each component (GGO, consolidation and fibrosis) was the sum of the lobe scores and ranged from 0 (no involvement) to 25 (maximum involvement).
AI algorithm-based CT quantitative analysis. The Digital Imaging and Communications in Medicine (DICOM) files of the CT images were loaded into and processed by a software package named "CT Pneumonia Analysis" (syngo.via Frontier 1.0, Siemens Healthineers, Forchheim, Germany). The algorithm had first been trained on a large cohort of patients with various diseases, then fine-tuned on a cohort with abnormal patterns including GGO, consolidation, effusions, and masses, to improve the robustness of lung segmentation over the involved areas. Based on 3D segmentations of lesions, lungs, and lobes, the AI algorithm automatically detected and quantified abnormal tomographic patterns commonly present in pneumonia, such as GGO and consolidation, both globally and lobe-wise. The percentage of total opacity (total lesions) and the percentage of consolidation (with a cutoff of CT value ≥ -200 Hounsfield units) were directly calculated for the whole lung. The percentage of GGO was then obtained by subtracting the consolidation percentage from the total lesion percentage for further analysis.
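The grading bands and the GGO subtraction step can be made concrete with a small sketch. In the study the visual grades were assigned by readers, not computed from measured percentages, so the mapping function below is only an illustration of the bands. The function names are hypothetical, and the handling of values between 49% and 50% is an assumption, because the stated bands leave that interval unspecified.

```python
def involvement_grade(percent: float) -> int:
    """Map a lobe's involvement percentage to the 0-5 grade bands in the text.

    Assumption: values between 49% and 50% are assigned grade 3.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return 0
    if percent <= 5:
        return 1
    if percent < 25:
        return 2
    if percent < 50:
        return 3
    if percent <= 75:
        return 4
    return 5


def ggo_percent(total_opacity_percent: float, consolidation_percent: float) -> float:
    """GGO percentage = total opacity percentage - consolidation percentage."""
    if consolidation_percent > total_opacity_percent:
        raise ValueError("consolidation cannot exceed total opacity")
    return total_opacity_percent - consolidation_percent


# Illustrative values only.
print([involvement_grade(p) for p in (0, 3, 10, 30, 60, 90)])
print(ggo_percent(30.0, 10.0))
```

Summing five such lobe grades gives the component total of 0-25 described above.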
Statistical analysis. Clinical data were described and compared between the derivation and validation datasets by univariable analysis. The Mann-Whitney U test, Chi-square test and Fisher's exact test were conducted, as appropriate. Clinical features with > 5% missing data were excluded from analysis. Due to the retrospective observational design, no sample size calculation was performed in the current analysis.
Among the three visual scoring components, i.e. GGO, consolidation and fibrosis, variables significantly associated with outcome in the univariable analysis were subsequently included in the multivariable COX proportional hazards model. The derived β regression coefficients were used to construct a linear weighted scoring model, defined as the 'MDA5 score'. Likewise, the percentages of GGO and consolidation from the AI algorithm-based quantitative analysis were used to construct another weighted scoring model, defined as the 'AI score'.
The optimal cutoff value of the CT score was identified by receiver operating characteristic curve analysis. The association between the CT score and six-month survival was assessed by Kaplan-Meier survival plots and the log-rank test.
Model discrimination of the 'IPF score', 'MDA5 score' and 'AI score' models was quantified and compared by the Harrell concordance index (C-index) with 95% confidence interval (CI). A decision curve analysis (DCA) was performed to determine and compare the clinical usefulness of each model 24. Significance was defined as p < 0.05.
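As an illustration of what the C-index measures, a simplified version of Harrell's concordance index is sketched below. It assumes that a higher score means higher risk, ignores pairs with tied follow-up times, and does not compute confidence intervals; it is not the software used in the study, and the four-patient example is invented.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ranked correctly by the score.

    times: follow-up time for each patient.
    events: 1 if the patient died at `times[i]`, 0 if censored.
    scores: risk score, where higher means higher predicted risk.
    A pair is comparable when the patient with the shorter time had an event.
    Tied scores count as half concordant; tied times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented example: two deaths (days 30 and 60) and two patients censored at day 180.
print(harrell_c_index([30, 60, 180, 180], [1, 1, 0, 0], [25, 12, 15, 5]))
```

In this example four of the five comparable pairs are ordered correctly, giving a C-index of 0.8. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.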

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.