Assessment of deep learning segmentation for real-time free-breathing cardiac magnetic resonance imaging at rest and under exercise stress

In recent years, a variety of deep learning networks for cardiac MRI (CMR) segmentation have been developed and analyzed. However, nearly all of them are focused on cine CMR under breathold. In this work, accuracy of deep learning methods is assessed for volumetric analysis (via segmentation) of the left ventricle in real-time free-breathing CMR at rest and under exercise stress. Data from healthy volunteers (n = 15) for cine and real-time free-breathing CMR at rest and under exercise stress were analyzed retrospectively. Exercise stress was performed using an ergometer in the supine position. Segmentations of two deep learning methods, a commercially available technique (comDL) and an openly available network (nnU-Net), were compared to a reference model created via the manual correction of segmentations obtained with comDL. Segmentations of left ventricular endocardium (LV), left ventricular myocardium (MYO), and right ventricle (RV) are compared for both end-systolic and end-diastolic phases and analyzed with Dice’s coefficient. The volumetric analysis includes the cardiac function parameters LV end-diastolic volume (EDV), LV end-systolic volume (ESV), and LV ejection fraction (EF), evaluated with respect to both absolute and relative differences. For cine CMR, nnU-Net and comDL achieve Dice’s coefficients above 0.95 for LV and 0.9 for MYO, and RV. For real-time CMR, the accuracy of nnU-Net exceeds that of comDL overall. For real-time CMR at rest, nnU-Net achieves Dice’s coefficients of 0.94 for LV, 0.89 for MYO, and 0.90 for RV and the mean absolute differences between nnU-Net and the reference are 2.9 mL for EDV, 3.5 mL for ESV, and 2.6% for EF. For real-time CMR under exercise stress, nnU-Net achieves Dice’s coefficients of 0.92 for LV, 0.85 for MYO, and 0.83 for RV and the mean absolute differences between nnU-Net and reference are 11.4 mL for EDV, 2.9 mL for ESV, and 3.6% for EF. Deep learning methods designed or trained for cine CMR segmentation can perform well on real-time CMR. For real-time free-breathing CMR at rest, the performance of deep learning methods is comparable to inter-observer variability in cine CMR and is usable for fully automatic segmentation. For real-time CMR under exercise stress, the performance of nnU-Net could promise a higher degree of automation in the future.


Introduction
The fast and reliable evaluation of cardiac function is an essential part of cardiac MRI (CMR), significant for patient diagnostics, disease analysis, therapy evaluation, follow-up, and risk estimation [1,2].The main quantitative parameters of cardiac function are the left ventricular blood volume (LV), the volume of left ventricular myocardium (MYO), and the right ventricular blood volume (RV).These parameters are usually calculated by acquiring and segmenting a stack of cross-sectional images in short-axis view.
Advances in image reconstruction have enabled the continuous acquisition of high-quality images in real time, i.e. during free-breathing and independent of ECG-synchronization [3,4,5].In CMR, real-time MRI has emerged as a viable alternative for patients with arrhythmia [6], for measurements using exercise stress [7,8,9], as well as for real-time guidance in cardiac catheter interventions [10,11,12,13].
The delineation of the LV boundary is an important step for the determination of enddiastolic volume (EDV), end-systolic volume (ESV), and ejection fraction (EF).However, manual segmentation of images is tedious and affected by inter-and intra-observer variability [14,15,16].In recent years, deep learning methods have been introduced into clinical practice for the generation of base contours, which are manually corrected as required.In research, a variety of deep learning methods have been developed for segmentation in CMR [17].However, nearly all of these methods are focused on conventional breathold cine imaging [18].In real-time MRI, automatic segmentation becomes more important, because a series of heart beats is acquired instead of a single cine loop.Real-time exercise stress studies pose an additional challenge for automatic segmentation due to the tendency for inferior image quality [8,19] depending on an increased heart rate and breathing motion.Recent works covering real-time MRI [20,21] have used custom neural networks trained specifically for the application on real-time CMR, though have faced the problem of limited training data availability.
This study aims to investigate the feasibility of using deep learning methods trained on cine CMR data for the automatic segmentation of real-time free-breathing CMR images.It will evaluate the performance of the methods under the conditions of both rest and during exercise stress.

Overview
We analyzed cine and real-time measurements of healthy volunteers (n=15) acquired at rest and under exercise stress using a highly undersampled radial bSSFP sequence with NLINV reconstruction [22,23] with a temporal resolution of 33 ms resolution.
We evaluated segmentations obtained with two deep learning methods, (1) the automatic contour detection designed for cine CMR (comDL) that is included in the commercially available software Medis (version 4.0.56.4,QMass® 8.1, Medical Imaging Systems, Leiden, Netherlands) and (2) nnU-Net [24], which was pre-trained on the cine CMR dataset of the Automated Cardiac Diagnosis Challenge (ACDC).Segmentations were compared to a reference model and standard clinical routine procedure of manually correcting the results obtained with comDL.
The accuracy of segmentation was evaluated for images in end-diastolic and end-systolic phases.The cardiac function parameters EDV, ESV, and EF were derived from neural network segmentation and compared to reference values derived from the manually corrected contours.
A comparison of cine and real-time CMR measurements and representative manually corrected contours are presented in Figure 1.

Dataset
The dataset consists of cine and real-time images from 15 healthy volunteers (7 male; 8 female; aged 55±8 (s.d.)) from a consecutive series of exams performed at the Institute for Diagnostic and Interventional Radiology of the University Medical Center Göttingen.It is part of a larger dataset acquired in a previous research study on real-time MRI with the approval of the local ethics committee.Consent for publication was obtained from all participants in the study.
All volunteers were measured with the same protocol.The healthy volunteers underwent CMR in supine position using a 32-channel cardiac surface receiver coil at 3 T (Skyra, Siemens Healthineers, Germany).Conventional imaging at rest included a balanced steady-state free precession (bSSFP) ECG-gated cine sequence to create a short-axis stack covering the entire heart, including both ventricles and atria.Real-time CMR data acquisition was performed during free-breathing and without ECG-synchronization at rest and under two different levels of exercise stress.A detailed summary of acquisition parameters of cine and real-time CMR can be found in Supplementary Table S1.
Real-time acquisiton was based on a bSSFP sequence using a highly undersampled radial encoding scheme and iterative image reconstruction [3,23].Exercise stress was introduced using a CMR-compatible ergometer in the supine position (Lode, Leiden, Netherlands), as previously described in [7].
After performing real-time free-breathing measurements at rest (RT rest), exercise stress was increased until a target heart rate of 110 beats per minute (bpm) was reached.Measurements at this target heart rate are referred to as real-time stress (RT stress).Finally, exercise stress was increased to the subjective, maximal exercise stress of each person and a measurement was performed (RT max stress).All persons gave written informed consent before each CMR examination.

Reference
To create a reference segmentation, manually corrected segmentation was performed using Medis (version 4.0.56.4,QMass® 8.1, Medical Imaging Systems, Leiden, Netherlands).Contours for the left ventricular endocardium, the left ventricular epicardium, and the right ventricle were created on short axis slices in end-diastole and end-systole.The segmentation of the left ventricular myocardium is formed by the area between the left ventricular epicardium and endocardium.Contours automatically created by the software were then manually corrected to create the reference in accordance with the standard procedure used in clinical routine.Contour creation followed the contour protocol used for the Automated Cardiac Diagnosis Challenge (ACDC) [14]: Papillary muscles are included in the cavity and LV endocardial contours follow the limit defined by the aortic valve at the base of the LV.
For real-time measurements at rest and under physical stress, images in ED and ES phase in the whole time series (120-150 images) were segmented.At rest, the time series spans 3-4 heartbeats.Under physical stress, the time series spans 6-9 heartbeats.For real-time measurements at maximal physical stress, only images in ED and ES phase among the first 50 images of the time series (2-4 heartbeats) were segmented.For the measurements of three volunteers at maximal physical stress, image quality was too poor to create reasonable reference contours.

Neural networks for automatic segmentation
This work evaluates two different neural networks for automatic segmentation.The first is the automatic contour creation from the commercial software Medis DL ACD (Medis deep learning automatic contour detection) in QMass 8.1.We refer to this method as comDL.
It should be stressed that comDL is intended to be used for cine CMR as a basis for the contours, to be manually corrected and checked as described above.Since manual correction is generally not feasible for the high number of images acquired using real-time MRI, the scope of this work is to evaluate the performance of deep-learning methods for automatic segmentation.
The second neural network evaluated in this work is nnU-Net.It is freely available and offers trained weights for various image segmentation challenges in the medical field.The model has already shown great versatility and was successfully used for a variety of medical segmentation tasks, e.g.achieving first place for all segmentation classes in the cardiac segmentation challenge "Automated Cardiac Segmentation Challenge" (ACDC) [24].
Preliminary analysis (results not shown) showed that nnU-Net performs better if applied on individual 2D images rather than on a stack of 2D images, e.g. a time series of real-time MRI in a single slice or a stack of cine images within the same cardiac phase.The benefit of independent normalization of each image increased for more challenging segmentation tasks (see Supplementary Table S2).The application of 2D nnU-Net was identified as the best configuration for this task when compared with 3D nnU-Net and the ensemble of both models (see Supplementary Table S3).
Consequently, the 2D version of nnU-Net with weights pre-trained on the ACDC dataset was applied on single images of the dataset for all cine and real-time measurements.
In March 2023, a new version of nnU-Net was published (nnU-Net V2).As of 2024-01-19, no pre-trained weights have been published for the second version and segmentation performance reportedly remains the same [25].As such, this work evaluates the first version of nnU-Net.

Evaluation criterion
It evaluates the overlap of a predicted segmentation of a neural network P k with a reference segmentation R k for each individual segmentation class k and is defined as DC is a value between 0 and 1, with 0 denoting no overlap between prediction and reference and 1 denoting perfect agreement.

Calculation of heart rates
We also categorized all real-time images based on heart rates, as these differed for RT max stress.
For the calculation of heart rates, we used the three central slices of all slices between the base and apex.In the central slices, the left ventricle is present during the entire time series despite respiratory motion, which can cause a displacement of the heart in and out of the imaging plane in basal and apical slices.Heartbeats per minute (bpm) are calculated based on the duration between two end-diastolic phases.

Cardiac function parameters
End-diastolic volume (EDV), end-systolic volume (ESV), and ejection fraction (EF) were computed.To minimize the influence of respiratory motion on EDV and ESV, images in the ED and ES phase of the cardiac cycle during end-expiration were manually selected for each slice.
For comDL and nnU-Net, cardiac function parameters were calculated based on segmentation of the same selected images.Ventricular volumes were calculated with Simpson's rule [26,27].
For intra-observer variability, manually corrected contours for the derivation of the clinical measures of all volunteers were created three to six months after the initial segmentation.For inter-observer variability, manually corrected contours for the derivation of the clinical parameters were created for the first five volunteers by a second reader with experience in cardiac segmentation.Single images in the ED and ES phase during end-expiration were once again chosen from each slice and EDV, ESV, and EF were derived from newly created, manually corrected contours.
Previously reported [15] inter-observer variability was chosen as an additional reference for comparison because it represents the accuracy of the evaluation in the clinical workflow of cine CMR.Measurements are usually evaluated by a single person and the accuracy of the method is determined by the variance between different human observers.

Statistics
We evaluated cardiac function parameters through Bland-Altman plots and paired two-sample t-tests.The cardiac function parameters EDV, ESV, and EF that were derived from comDL and nnU-Net segmentations were compared to the respective values derived from manually corrected contours (see Supplementary Fig. S1 and S2).The comparison includes cine CMR and real-time CMR measurements at rest and under stress of all volunteers.Additionally, cardiac function parameters of cine CMR and real-time CMR at rest were compared with each other (see Supplementary Fig. S3).T-tests were performed under the null hypothesis with a significance level of α = 0.05.

Segmentation accuracy
The data for all 15 volunteers could be successfully analyzed for cine CMR and real-time freebreathing CMR at rest and under exercise stress.For the measurements of three volunteers at maximal physical stress, image quality was too poor to create reasonable reference contours and thus only data of 12 volunteers were analyzed.
For cine CMR, the DC values for both neural network segmentations are comparable (Table 1) and show a high correlation with conventional segmentation.Figure 2 shows representative DC cases.For cine CMR, the nnU-Net performance (Table 1) is comparable to the results achieved on the ACDC test dataset [24].The Dice's coefficients reported here are slightly higher, as might be expected for the application on healthy subjects as compared to different pathology groups in the ACDC test dataset.
Real-time free-breathing measurements can be categorized based on the form of acquisition, e.g. at rest (RT rest), under a level of stress selected according to a targeted heart rate of 110 bpm (RT stress) and maximal exercise stress (RT max stress), for which the stress level and heart rate varies by volunteer.Mean heart rate and standard deviation for all volunteers can be found as Supplementary Table S4.The different real-time free-breathing CMR measurements fall into different heart rates spans, RT -55 to 77 bpm, RT stress -107 to 120 bpm, RT max stress -121 to 164 bpm.
The accuracies of nnU-Net and comDL segmentation based on the heart rate categorization are presented in Table 1.The nnU-Net model shows good generalizability with high accuracies of segmentation for both real-time CMR (DC: LV 0.94, MYO 0.89, RV 0.90) and real-time stress (DC: LV 0.92, MYO 0.85, RV 0.83).The accuracies for real-time CMR at rest are in the order of previously reported [15] inter-observer variability for cine CMR.With the exception of the RV segmentation for real-time CMR at rest, the accuracy of nnU-Net predictions exceeds the accuracy of comDL in real-time CMR.
Example failure cases for the nnU-Net are shown in Fig. 3. Images in the basal and apical region of the heart were especially prone to segmentation failures.
To better visualize the relationship between heart rate and the neural network segmentation accuracy, Fig. 4 shows the nnU-Net and comDL Dice's coefficients by calculated heart rate for real-time CMR.Two outliers are visible for the RV segmentation of the nnU-Net.Both data points correspond to the RT and RT stress measurements of the same volunteer.While the RV segmentation performs well for cine and RT it nearly completely fails for RT stress and RT maxstress, most likely because of smooth transitions between RV and neighboring tissue.A representative comparison can be found as Supplementary Fig. S4.

Cardiac function parameters
For cine and real-time CMR, the EDV, ESV, and EF were derived from the manually corrected contours for manually selected images in the ED and ES phases.The absolute values for each volunteer can be found in Supplementary Table S5.The comparison between these values and those derived from the nnU-Net and comDL segmentation is presented in Table 2 in form of absolute and relative differences.
For cine CMR, manual corrections of comDL led to only slight differences in EDV and ESV.For real-time CMR, the difference between the nnU-Net segmentation and manually corrected contours compares well to intra-and inter-observer variability for real-time CMR as well as to the inter-observer [15] variability previously reported for cine CMR (Table 2).For nnU-Net, mean differences of 2.4 % for EDV, 6.6 % for ESV, and 4.6 % for EF were obtained for real-time CMR at rest.For nnU-Net, the absolute and relative differences of EDV, ESV, and EF are less than the inter-observer variability, although the direct comparability is limited because images were newly selected for intra-and inter-observer variability, while the nnU-Net segmentation has been compared to the manually corrected contours of the same images.The different selection process of the images may influence the variability a lot more than a different delineation of LV within the same image.Relative intra-observer variability increases from cine CMR to real-time CMR at rest, and again to real-time CMR at stress.Relative intra-and inter-observer variability is overall higher for ESV than for EDV, which is in agreement with previously reported interobserver variability.Inter-observer variability is slightly higher for ESV in cine CMR than what was previously reported but only increases somewhat for RT and RT stress.
Clinical measures derived with nnU-Net and comDL are compared to references in form of Bland-Altman plots (Supplementary Fig. S1 and S2).According to paired two-sample t-tests, differences between nnU-Net and manually corrected contours are significant for ESV and EF for cine CMR (P < 0.001) and not significant for EDV (P = 0.23) for cine CMR and for all cardiac function parameters for RT and RT stress (P > 0.5).Differences between comDL and manually corrected contours are not significant for EDV for cine (P = 0.16) and real-time CMR (P = 0.61) and significant for all other cardiac function parameters of cine, RT and RT stress (P < 0.05).Individual P values are shown in the Bland-Altman plots.Although a comparison of cardiac function parameters between cine and real-time CMR was not the focus of this work, we note that the data show a higher mean EDV (mean difference 8.9 mL) and a lower mean ESV (mean difference -4.1 mL) for cine CMR (see Supplementary Fig. S3).These deviations are in the order of previously reported [4] differences between cine and real-time CMR.According to paired two-sample t-tests, differences between cine and realtime CMR are significant (P < 0.05) for all cardiac function parameters for manually corrected contours.

Discussion
In this study of cine CMR and real-time free-breathing CMR at rest and under exercise stress of 15 healthy volunteers, we found that the accuracy of comDL and nnU-Net is in the order of interobserver variability for cine CMR and real-time CMR at rest.Consequently, both methods are viable for a prospective, automated evaluation with little or no manual correction.For real-time CMR under exercise stress, nnU-Net is usable as a basis for manually corrected contours.
In our study, comDL contours are the basis for manually corrected contours, which act as reference.These automatically created contours require little manual correction, confirming the good performance of comDL for cine CMR.However, our study showed comDL to be less accurate than nnU-Net for real-time CMR, especially for RT stress and RT max stress.These results might be expected from the perspective that comDL was designed and most likely optimized for cine CMR, for which it performed very well.
nnU-Net shows better generalizability for real-time CMR than comDL, having its segmentation accuracy decrease less between real-time and exercise stress measurements.For measure-Table 2: Cardiac function parameters derived from comDL and nnU-Net segmentation compared to reference values obtained with manually corrected segmentation.The table features the differences of EDV, ESV, and EF of cine and real-time CMR of all volunteers.The mean and standard deviation (in parenthesis) of the difference are reported for (a) the absolute and (b) the relative difference.Intra-and inter-observer variability are given for values which were derived from manually corrected contours of newly selected images.Previously reported values for inter-observer variability from three different human observers for cine CMR are presented for comparison.ments under exercise stress, nnU-Net may have reached the limit of its applicability for fully automatic segmentation.Although its performance remains quite good, as demonstrated by high Dice's coefficients and a low mean absolute difference in ESV, the mean difference in EDV is significantly larger than intra-observer variability.While the accuracy of nnU-Net might not yet be sufficient for fully automatic segmentation, it shows promising results for an increased degree of automation in the future.In this work, we observed the highest degree and frequency of deviations between reference and neural network segmentation in the basal and apical regions of the heart.These regions were also identified as the most problematic factor for neural network segmentation in the evaluation of ACDC [14].Due to the ambiguity of slice positions, which can still include or exclude certain segmentation classes, comDL and nnU-Net showed some difficulty in correctly segmenting these areas.One approach to this issue is the usage of multiple neural networks individually trained on specific heart regions [28].This however requires manually labeling input data prior to segmentation.
One previous study concerning the segmentation of real-time free-breathing CMR with deep learning neural networks is the work by Yang et al. [20], which created a custom neural network based on the U-Net [29] architecture and also trained on the ACDC dataset.They evaluated their network on end-systolic and end-diastolic phases in the end-expiration state of healthy volunteers.Our results for real-time free-breathing CMR show a higher segmentation accuracy for comDL and nnU-Net compared to results of [20] (DC: LV 0.919, MYO 0.806, RV 0.818), showing the progress of segmentation networks for real-time CMR.
New methods have been developed for cardiac segmentation in recent years, e.g. the usage of transformers within neural networks for segmentation is a more recent idea than the use of convolutional neural networks and might be an essential element for future research [30].A combination of nnU-Net with transformers [31] shows promising results, especially in an ensemble with the unmodified nnU-Net.However, the highest performance of deep learning neural networks for segmentation in the field of CMR is still achieved by convolutional neural networks, often based on the U-Net or the nnU-Net [30].Therefore, it seems reasonable to have nnU-Net as the state-of-the-art method, in particular because of its accessibility through pre-trained weights.
The development and application of deep learning methods is an active research topic in radiology [32,33,34].Standards for the reporting of artificial intelligence methods were established for the medical field as a whole in [35] and the topic was specifically discussed for CMR in [36,37].
For cardiac segmentation, deep learning methods have already been successfully integrated into clinical routines and have substantially reduced the number of manual corrections needed.With recent progress, a fully automatic workflow seems feasible even for real-time CMR, but essential steps are still missing.If the evaluation of real-time free-breathing measurements should work analogously to cine CMR, images within the same respiratory motion state must be evaluated across slices.The identification of the respiratory state would need to be automated, e.g. with the help of an external device like a respiratory belt, or by automatic analysis of the images.Based on the respiratory motion state, images in the correct cardiac phase can then be automatically selected based on LV area.To keep manual corrections to a minimum, additional priors in the form of confidence maps, such as uncertainty maps [38], could be used to quantify the need of manual correction.For patients with arrhythmia, arrhythmic heartbeats would need to be distinguished automatically from regular (sinus) rhythm.
Some limitations of our study must be noted.Firstly, extrasystolic heartbeats within the selected images were not excluded, as the limited duration of the time series did not always allow the monitoring of prior and subsequent heartbeats to fully exclude an extrasystole.However, this only affects the validity of the resulting cardiac function parameters, not the comparison between the manually corrected contours and the deep learning methods, as the same images have been evaluated for all methods.Secondly, only a relatively small number of healthy volunteers were included in this study and the validity for clinical application on patients still needs to be demonstrated.Thirdly, no detailed methodological information can be provided for comDL, as it is part of a commercial software.It still serves as a reference to the clinical standard.Finally, only two deep-learning methods have been compared, which limits the generalizability of the results to machine learning methods in general.We hope to address this with the publication of image data and code to enable the testing and comparison of other deep-learning methods on the same data.

Conclusion
In this study, we assessed the feasibility of automatic cardiac segmentation on real-time CMR using deep learning methods.Two deep learning methods originally designed or trained for segmentation of cine CMR were evaluated for cine and real-time MRI in comparison to a manually corrected reference segmentation.The segmentation accuracy is superior in cine CMR compared to real-time CMR at rest and diminishes further for real-time CMR under exercise stress.The accuracy for real-time CMR at rest is in the range of reported inter-observer variability of cine CMR for both networks.In this work, comDL shows very good performance for segmentation on cine CMR but less applicability for real-time CMR compared to nnU-Net.For real-time CMR at rest, cardiac function parameters obtained with nnU-Net segmentation are in the range of intra-observer variability.For real-time CMR under exercise stress, the performance of the deep learning methods -while still not sufficient for a fully automatic segmentation -is promising.

Figure 1 :
Figure 1: (Top) Mid-ventricular slices in ED phase of the same volunteer for cine and realtime free-breathing at different heart rates and (bottom) corresponding manually corrected segmentation in a short axis view are shown.Image quality decreases and reconstruction artifacts increase with an increasing heart rate.The left ventricular endocard (red), the left ventricular myocardium (green), and the right ventricle (blue) are segmented.

Figure 2 :
Figure 2: Representative segmentations of manually corrected contours and deep learning methods.(First row) Mid-ventricular slices in ES phase of a volunteer for cine and real-time free-breathing at different heart rates with (second row) corresponding manually corrected, (third row) comDL, and (fourth row) nnU-Net segmentation.Accuracy of segmentation is measured with Dice's coefficient (DC).DC for left ventricular endocard (LV), left ventricular myocardium (MYO), and right ventricle (RV) are given for each segmentation.

Figure 3 :
Figure 3: Example segmentation failures of nnU-Net for real-time CMR under exercise stress.Incomplete segmentation of the right ventricle in the apical region (first and second column).Anatomically incoherent segmentation of the myocardium and right ventricle in the basal region (third and fourth column).

Figure 4 :
Figure 4: Dice's coefficient of nnU-Net and comDL segmentation for real-time CMR measurements plotted against heart rate.DC values of LV, MYO, and RV are calculated for (a) nnU-Net and (b) comDL segmentation in respect to manually corrected contours.Each data point presents the average DC of a segmentation class for a single real-time measurement of a volunteer.Real-time CMR at rest, under exercise stress and maximal exercise stress are presented by their average calculated heart rate.

Table 1 :
Dice's coefficients of nnU-Net and comDL segmentation for cine and real-time CMR.The table features the mean and standard deviation (in parenthesis) of the Dice's coefficients for LV, MYO, and RV for all volunteers.For cine CMR, images were separated into ED and ES phase.For RT max stress, only data of 12 volunteers were analyzed, because in three cases the image quality was too poor to create reasonable reference contours.Mean and standard deviation (in parenthesis) of previously reported values for inter-observer variability from three different human observers for cine CMR are shown for comparison.