Introduction

Late cardiac toxicity is a major adverse event after left-sided breast radiotherapy (RT)1,2,3,4,5,6. Darby et al. showed the relationship between the mean heart dose (MHD) and the frequency of major coronary events5. Deep inspiration breath-hold (DIBH) effectively reduces MHD compared with free-breathing (FB) RT7,8,9,10,11,12. Rochet et al. reported that the reduction in MHD was > 0.9 Gy in 75% of patients and < 0.9 Gy in 25%13. Previous studies have attempted to predict MHD using parameters acquired from the simulation CT14,15,16,17,18,19,20,21,22,23,24,25,26,27. Most studies used such CT-based parameters, but some used non-CT parameters (e.g., BMI, pulmonary function tests)14,28,29,30,31,32,33,34. Although non-CT parameters may have advantages over CT parameters in terms of earlier availability and reduced patient radiation exposure, no study has reported high prediction accuracy using non-CT parameters. We previously investigated non-radiological parameters for preoperative prediction of MHD. Vital capacity was a significant predictor of MHD in DIBH (MHDDIBH), but it was not an accurate predictor34.

Machine learning (ML) techniques have been widely used in the medical field35,36. Many studies have applied ML to radiological images, and chest X-rays have recently been actively studied as an ML-based diagnostic tool for COVID-1937,38. Chest X-rays are the most frequently acquired and most readily available radiological images. We therefore hypothesized that, if an ML model based on chest X-rays could predict the cardiac dose of breast RT, patients who would benefit significantly from DIBH could be identified more easily and earlier.

The purpose of this study is to predict MHD in FB (MHDFB) and MHD reduction between DIBH and FB (∆MHD) using a machine learning method with preoperative chest X-rays.

Methods

Patient selection

This prediction model development study was approved by our institutional review board. All participants provided written informed consent, and all methods were performed in accordance with the relevant guidelines and regulations. Eligible patients had a histologically proven diagnosis of invasive ductal carcinoma or carcinoma in situ of the left breast and underwent DIBH-RT after breast-conserving surgery between June 2018 and October 2021. Patients who did not undergo preoperative chest X-rays were excluded. All data were retrospectively collected and randomly split into two cohorts (training cohort: n = 78, test cohort: n = 25).

Planning CT simulation

The DIBH-RT method in this study followed the technique of Bartlett et al.10. As described in our previous studies, we trained patients to inhale, exhale, and hold deep breaths. The breath-hold training time was initially 5–10 s and was increased to 20 s26,34. The simulation and training took about 20–30 min per patient. After confirming the respiratory motion, all patients underwent two planning CT simulations (FB and DIBH) in the supine position on a wing board with the arms stretched overhead. We used the Aquilion LB CT system (Canon Medical Systems, Tochigi, Japan) with a slice thickness of 3 mm.

Treatment planning

We performed contouring and planning on the FB- and DIBH-CT using RayStation version 9 (RaySearch Laboratories AB, Stockholm, Sweden). The dose calculation algorithm was Collapsed Cone version 5.1. The planning target volume (PTV), defined as the clinical target volume (CTV) with a 5-mm margin, was prescribed 42.56 Gy in 16 fractions delivered with the Varian TrueBeam system (Varian Medical Systems, Palo Alto, USA)26,34. The CTV and the heart were delineated following the consensus guideline and atlas validation study39,40. The CTV was cropped within 5 mm of the skin contour. Treatment plans consisted of three-dimensional conformal radiotherapy using two opposing tangential beams and a field-in-field technique.

Development of the chest X-ray model

Figure 1 shows a pipeline outlining the modeling procedure and evaluation.

Figure 1

A pipeline of modeling procedure and model evaluation. T training, V validation, CNN convolutional neural network, MHD mean heart dose.

In line with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline, the data were split into the following groups: a model development group (training: N = 59, validation: N = 19) and a test group (N = 25)41. Although the optimal ratio for the number of patients in each group has not been established, 60/20/20 and 70/15/15 are frequently used empirically; the ratio in this study was determined based on several previous studies27,42. A regression model was trained on the training group, and the predicted MHD was validated against the validation group. Candidate input values and input sizes were drawn from the parameters used in previous studies, and the final choice was the one that achieved the best prediction results in the validation group26,27,30,34,42. Table 1 shows the convolutional neural network (CNN) architecture with the determined parameters.
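The three-way split can be illustrated with a minimal sketch. It assumes patients are identified by integer indices and that a fixed random seed is used; the function name `split_cohorts` and the seed value are hypothetical and are not taken from the study.

```python
import random


def split_cohorts(patient_ids, n_train=59, n_val=19, n_test=25, seed=0):
    """Randomly partition patient IDs into training, validation, and test groups."""
    ids = list(patient_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("group sizes must sum to the number of patients")
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    rng.shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# 103 patients, indexed 0..102 purely for illustration
train_ids, val_ids, test_ids = split_cohorts(range(103))
```

With 103 patients, these group sizes correspond to roughly 57%, 18%, and 24% of the data, close to the 60/20/20 convention mentioned above.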

Table 1 The detailed structure of CNN used in this study.

The architecture has three inputs: an anteroposterior chest X-ray image (1, 64, 64) as input 1, a lateral chest X-ray image (1, 64, 64) as input 2, and the patient's age (y), height (cm), and weight (kg) as input 3. First, the tensors of inputs 1 and 2 are multiplied element-wise (i.e., pixel by pixel). Convolution is then performed twice on the multiplied data (1, 64, 64), followed by Rectified Linear Units (ReLU) and batch normalization. The resulting tensor is passed through a fully connected layer and concatenated with input 3. After another fully connected layer, the predicted MHD is obtained as the absolute value of the final output. Finally, the model is trained for 100 epochs using the mean squared error as the loss function.
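The data flow through the network can be traced with a dependency-free sketch. This is not the trained model: the convolution, ReLU, batch normalization, and first fully connected layer are replaced by a simple average-pooling stand-in (an assumption made to keep the example short), and all function names, weights, and patient values are hypothetical. Only the element-wise fusion, the concatenation with input 3, the absolute-value output, and the mean squared error follow the description above.

```python
import random

SIZE = 64  # each radiograph is a single-channel 64 x 64 image


def fuse_views(ap_image, lateral_image):
    """Element-wise (pixel-by-pixel) product of the two views."""
    return [[a * b for a, b in zip(row_ap, row_lat)]
            for row_ap, row_lat in zip(ap_image, lateral_image)]


def pooled_features(image, block=16):
    """Stand-in for the learned convolutional and dense layers (assumption).

    Averages each block x block tile, giving (SIZE // block) ** 2 features.
    """
    feats = []
    for r in range(0, SIZE, block):
        for c in range(0, SIZE, block):
            tile = [image[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            feats.append(sum(tile) / len(tile))
    return feats


def predict_mhd(ap_image, lateral_image, age, height_cm, weight_kg, weights, bias):
    fused = fuse_views(ap_image, lateral_image)
    # Concatenate image features with input 3 (age, height, weight)
    features = pooled_features(fused) + [age, height_cm, weight_kg]
    raw = sum(w * x for w, x in zip(weights, features)) + bias  # final dense layer
    return abs(raw)  # absolute value keeps the predicted dose non-negative


def mse(predicted, observed):
    """Mean squared error, the loss function used for training."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical inputs: random pixel values, random weights, an invented patient
rng = random.Random(0)
ap = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
lat = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
weights = [rng.uniform(-0.01, 0.01) for _ in range(16 + 3)]
prediction = predict_mhd(ap, lat, 55, 158.0, 52.0, weights, 0.0)
loss = mse([prediction], [1.2])
```

Taking the absolute value at the output is a simple way to rule out negative dose predictions without changing the loss function.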

Model evaluation and statistical analyses

The primary prediction outcome is ∆MHD. The model is trained to achieve a high prediction accuracy of ∆MHD in the training cohort. The prediction performance of the developed model is evaluated in an independent test cohort. As in our previous study26, we use the model as a binary classifier to determine whether a patient would potentially receive ∆MHD > 1 Gy. The model performance is also evaluated as a regression model by calculating the median and interquartile range of absolute residuals, the coefficient of determination (R2), root mean squared error (RMSE), and mean absolute error (MAE). The secondary outcome is defined as the prediction accuracy of MHDFB. The prediction performance is evaluated in the same way as the primary outcome, but the classification cutoff is set at MHDFB > 2 Gy following previous reports6,43,44.
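The regression metrics listed above can be computed as in the following sketch, which uses only the Python standard library. The analysis in this study was performed in R, so this is an illustration of the definitions rather than the code that was run; the function name and the example doses are hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """R2, RMSE, MAE, and the median and IQR of absolute residuals."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    abs_res = [abs(r) for r in residuals]
    mean_obs = statistics.fmean(observed)
    ss_res = sum(r ** 2 for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    q1, median, q3 = statistics.quantiles(abs_res, n=4, method="inclusive")
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(residuals)),
        "mae": statistics.fmean(abs_res),
        "median_abs_residual": median,
        "iqr_abs_residual": (q1, q3),
    }


# Hypothetical observed and predicted doses in Gy
example = regression_metrics([0.5, 1.0, 1.5, 2.0], [0.7, 0.9, 1.4, 2.4])
```

Because R2 is computed here as one minus the ratio of residual to total sum of squares, it can be negative when the predictions are worse than simply using the mean of the observed values.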

Statistical analysis was performed using R version 3.6.1 (The R Foundation for Statistical Computing, Vienna, Austria). The required sample size of the test data was based on ∆MHD, with 1 Gy set as the classification cutoff. According to our training data, 50% of patients had ∆MHD > 1 Gy. We estimated that at least ten events (i.e., 20 patients) were required. P < 0.05 (two-sided) was considered statistically significant.

Ethics approval and consent to participate

The Institutional Review Board (IRB) of Aichi Cancer Center Hospital approved our study (approval number: 2019-1-211).

Results

Dataset

One hundred and three patients were included in this study. Table 2 shows the patient characteristics of the training and test cohorts. No characteristic differed significantly between the cohorts. In the test cohort, median ∆MHD and MHDFB were 1.24 (range 0.080–2.71) Gy and 1.97 (range 0.52–3.80) Gy, respectively. Fourteen patients (56%) had ∆MHD ≥ 1 Gy.

Table 2 Patient characteristics.

Model performance: MHD prediction results

As a binary classifier of ∆MHD > 1 Gy, the model showed a high classification performance: a sensitivity of 85.7%, a specificity of 90.9%, a positive predictive value of 92.3%, a negative predictive value of 83.3%, and a diagnostic accuracy of 88.0%. Figure 2 shows the ROC curve, and the AUC value is 0.864 (95% CI 0.701–1.00). The best classification point, at which the sum of the sensitivity and specificity was maximized, was 1.02 Gy.
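The reported percentages are consistent with a single confusion matrix for the 25 test patients, which the sketch below checks. The counts (12 true positives, 2 false negatives, 1 false positive, 10 true negatives) are inferred from the percentages and from the 14 patients in the high-reduction group; they are an assumption, not values stated in the text. The second function illustrates cutoff selection by maximizing sensitivity plus specificity, which is equivalent to maximizing Youden's index; its input data are hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts inferred from the reported percentages (assumption, not reported data)
metrics = classification_metrics(tp=12, fn=2, fp=1, tn=10)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified positive when the score is >= the cutoff.
    Requires at least one positive and one negative label.
    """
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y)
        fn = sum(1 for s, y in zip(scores, labels) if s < cut and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and not y)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and not y)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best


# Hypothetical predicted dose reductions (Gy) and observed class labels
best_cut, best_j = youden_cutoff(
    [0.4, 0.8, 1.1, 1.3, 2.0],
    [False, False, True, False, True],
)
```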

Figure 2

The Receiver Operating Characteristic (ROC) curve of the developed model: the area under curve (AUC) value was 0.864. The sensitivity and specificity of the best classification point (= 1.02 Gy) were 0.857 and 0.909, respectively.

The median predicted ∆MHD was 1.02 (range 0.06–2.43, IQR 0.63–2.11) Gy. Compared with the observed ∆MHD, the median absolute prediction difference was 0.39 (range 0.004–1.55, IQR 0.22–0.72) Gy. The Pearson correlation coefficient between observed and predicted ∆MHD was 0.55 (P = 0.028). R2, RMSE, and MAE were 0.30, 0.73 Gy, and 0.56 Gy, respectively.

Although the accuracy was lower than for ΔMHD, MHDFB could also be predicted by the model: the median absolute error was 0.72 Gy (range 0.058–2.73 Gy, IQR 0.43–1.42 Gy), the correlation coefficient was 0.46 (P = 0.02), and the sensitivity and specificity were 0.58 and 0.77, respectively.

Discussion

Recent studies have attempted to predict MHD to select patients with potential cardiac toxicity risks and reduce MHD by performing DIBH14,15,16,17,18,19,20,21,22,23,24,25,26. In most cases, prediction models used the maximum heart distance or cardiac contact distance in the CT simulations as predictors14,15,16,17,18,19,20,24. The coronary artery calcium scores (CAC) in CT improved the Framingham risk score prediction for coronary artery disease (CAD)45,46. According to Mast et al., DIBH increases LAD CAC less than FB, potentially preventing radiation-induced coronary artery disease47. Our previous study demonstrated that a synthetic DIBH-CT model with a deep learning approach achieved more accurate ΔMHD prediction than other models26. However, such models in past studies have a significant limitation: the prediction is only performed after simulation CT.

We subsequently investigated non-radiological parameters for preoperative prediction of MHD34. The results showed that vital capacity was the only significant predictor of MHDDIBH, but, like the other parameters, it did not predict ΔMHD or MHDFB. To the best of our knowledge, no other studies have found non-CT parameters to be promising predictors of ΔMHD or MHDFB. Therefore, this study attempted to predict ΔMHD and MHDFB using a deep learning technique based on preoperative chest X-rays. The model showed a high performance as a binary classifier at the cutoff of ΔMHD > 1 Gy. The same method also worked for MHDFB prediction. The strengths of this model are the early timing of the prediction and the fact that the only radiological images required are chest X-rays, which can be acquired more easily and earlier than simulation CT in many patients. Ninety-two percent of our patients underwent preoperative chest X-rays, a median of 90 days before radiotherapy.

In the present study, MHDFB and ΔMHD were used as prediction outcomes, following previous studies14,26,28,29,30,31,32,33,34. The primary outcome was defined as ΔMHD, which has been used in multiple studies14,26,30,31,32,33. We set the classification cutoff at ΔMHD > 1 Gy based on the report by Darby et al. of increased cardiotoxicity per 1 Gy: the frequency of major coronary events increases linearly with MHD at a rate of 7.4% per Gy, although no significant difference was found for MHD < 2 Gy5. Alternatively, because the Early Breast Cancer Trialists’ Collaborative Group report and the UK consensus statements for postoperative breast radiotherapy recommend MHD < 2 Gy, a classification criterion based on MHDFB could also be used as the primary prediction outcome6,43,44.
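To make the rationale for the 1 Gy cutoff concrete, the linear relationship can be written as a one-line calculation. This is illustrative arithmetic only, assuming that the 7.4% per Gy figure applies as a relative increase across the dose range considered; the function and constant names are hypothetical, and the outputs are not results of this study.

```python
RELATIVE_INCREASE_PER_GY = 0.074  # 7.4% per Gy of mean heart dose (Darby et al.)


def relative_risk_increase(mean_heart_dose_gy):
    """Relative increase in the rate of major coronary events under a linear model."""
    return RELATIVE_INCREASE_PER_GY * mean_heart_dose_gy


# A 1 Gy reduction corresponds to a 7.4% relative change
at_cutoff = relative_risk_increase(1.0)

# The median dose reduction in the test cohort (1.24 Gy) corresponds to about 9.2%
at_median_reduction = relative_risk_increase(1.24)
```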

This study has several limitations. First, our study used a single-institution dataset, consisting mainly of patients who underwent breast-conserving surgery followed by DIBH-RT. Therefore, whether the results can be extrapolated to patients undergoing chest wall or lymph node irradiation is uncertain. Second, our approach focused on chest X-ray parameters and may omit the clinical aspects of DIBH training during simulation: even if the prediction recommends cardiac-sparing RT, our model does not predict whether the patient can tolerate DIBH. Finally, the CNN architecture used in this study requires both anteroposterior and lateral chest X-ray images. Future studies are needed to build a model using only anteroposterior images and to perform multicenter external validation to establish the generalizability of the model.

Conclusion

In conclusion, our deep learning chest X-ray model can predict MHD and may play an essential role in identifying patients for whom DIBH is potentially desirable. However, further study is needed to validate our prediction model externally.