MedalCare-XL: 16,900 healthy and pathological synthetic 12 lead ECGs from electrophysiological simulations

Gillette, Karli; Gsell, Matthias A. F.; Nagel, Claudia; Bender, Jule; Winkler, Benjamin; Williams, Steven E.; Bär, Markus; Schäffter, Tobias; Dössel, Olaf; Plank, Gernot; Loewe, Axel

doi:10.1038/s41597-023-02416-4


Data Descriptor
Open access
Published: 08 August 2023


Scientific Data volume 10, Article number: 531 (2023)


Abstract

Mechanistic cardiac electrophysiology models allow for personalized simulations of the electrical activity in the heart and the ensuing electrocardiogram (ECG) on the body surface. As such, synthetic signals possess known ground truth labels of the underlying disease and can be employed, in addition to clinical signals, for the validation of machine learning ECG analysis tools. Recently, synthetic ECGs were used to enrich sparse clinical data or even to replace them completely during training, leading to improved performance on real-world clinical test data. We thus generated a novel synthetic database comprising a total of 16,900 12 lead ECGs based on electrophysiological simulations, equally distributed into healthy control and 7 pathology classes. The pathological case of myocardial infarction had 6 sub-classes. A comparison of extracted features between the virtual cohort and a publicly available clinical ECG database demonstrated that the synthetic signals represent clinical ECGs for healthy and pathological subpopulations with high fidelity. The ECG database is split into training, validation, and test folds for the development and objective assessment of novel machine learning algorithms.


Background & Summary

The 12 lead ECG is a standard non-invasive clinical tool for the diagnosis and long-term monitoring of cardiovascular disease. To support cardiac disease classification and the interpretation of 12 lead ECGs in clinical practice, algorithms based on machine learning are increasingly utilized. Training these algorithms requires large databases of 12 lead ECGs that are labeled with high accuracy according to the desired disease classifications and that represent the target population. The most extensive publicly available database for this purpose to date is PTB-XL¹.

Clinical 12 lead ECG databases like PTB-XL, however, have several limitations that reduce the efficacy of machine learning algorithms². As such databases are typically obtained from multiple medical centers, different filtering levels may have been applied to reduce noise. Labeling uncertainties may arise from differences in expertise or judgment between clinicians. Patient enrollment can also lead to both gender bias³ and uneven representation of certain cardiac diseases⁴. Furthermore, such databases provide limited insight into the underlying mechanisms of cardiovascular disease. Databases of synthetic ECGs have the potential to complement and enrich⁵,⁶, or in the long run even replace⁷, clinical datasets to overcome such limitations. Currently, no sizeable and open synthetic ECG databases are available due to the high computational cost and the limitations of modeling complete four-chamber cardiac electrophysiology in silico at scale.

We thus aimed to assemble the first public database of labeled synthetic 12 lead ECGs by joining two independent multi-scale models of atrial and ventricular electrophysiology, used to compute P waves and QRST complexes, respectively. This approach provides a complete chain of traceability from the anatomical and electrophysiological input parameters of the model to the final 12 lead ECGs. In addition to normal healthy control, common diseases were modeled mechanistically within the synthetic database. Within the ventricular-torso model, myocardial infarction (MI) and complete left and right bundle branch block (LBBB and RBBB) were modeled. The MI class comprised 6 sub-classes pertaining to the three predominant arteries, right-anterior descending (RAD), left anterior descending (LAD), and left circumflex (LCX)⁸, each with two different transmural extents. Fibrotic atrial cardiomyopathy (FAM), complete interatrial conduction block (IAB), and left atrial enlargement (LAE) were modeled within the atria. In addition, 1st degree AV block (AVB) was modeled as an atrio-ventricular (AV) conduction disease. In this way, the chosen pathologies cover a wide range of both atrial and ventricular diseases, representing conduction disturbances as well as structural remodeling, for which established modeling approaches from previous work were available. A total of 16,900 synthetic ECGs equally distributed into the 8 groups (healthy control and 7 cardiac pathologies) were made publicly available in the MedalCare-XL database under the Creative Commons Attribution 4.0 International license⁹. Thus, we provide a large and balanced ECG dataset with precisely known ground truth labels of the underlying pathology as derived from the mechanistic multi-scale simulations.

Validation of the synthetic ECG database was performed using two approaches to analyze to what extent it can represent clinical ECG databases. First, we compared the statistical distributions of crucial ECG features extracted from MedalCare-XL with the same features taken from the clinical PTB-XL¹ database, for normal healthy ECGs and for different pathology classes. The comparison showed excellent qualitative agreement, while still exhibiting quantitative differences that provide a starting point for future improvement of the underlying models as well as of the quality of future simulation databases. Second, two clinical Turing tests were conducted to evaluate how well the synthetic ECG signals represent clinical signals when undergoing ECG diagnostics by cardiologists. The first test required trained cardiologists to determine the origin (measured or simulated) of 12 lead ECGs of normal healthy control cases. The second test additionally involved pathology classification. Both tests were performed on a subset of 50 synthetic ECG signals extracted from the database and mixed with 50 clinical signals taken from PTB-XL¹. Altogether, the MedalCare-XL database provides the first example of a large-scale dataset of physiologically realistic simulated ECGs.

Methods

We separate the genesis of the 12 lead ECG into the P wave and the QRST complex, modeled by two separate atrial and ventricular-torso models. The generation of the anatomical model cohorts and the simulation of electrophysiology to mimic a large patient population are described for both the atrial and the ventricular models. Since single-beat simulations of P waves and QRST complexes were run separately in the two independent models, both signal parts had to be merged in a post-processing step to obtain an ECG of a full heart cycle comprising one P wave, one QRS complex, and one T wave. Subsequently, the single heartbeat was repeated with varying RR intervals to account for heart rate variability (HRV), yielding a time series signal of 10 s length. An overview of the pipeline for generating the synthetic 12 lead ECG database is shown in Fig. 1. The entire ECG dataset described in the manuscript is available online under the Creative Commons Licence CC-BY 4.0⁹. The anatomical model cohort of the atria is publicly available under the Creative Commons Licence CC-BY 4.0¹⁰. Subject data acquired at the Medical University of Graz, which were used to construct the cohort of ventricular-torso models, can only be shared with additional IRB approval and subject consent. Requests should be directed to the IRB of the Medical University of Graz with reference to their vote EKNr 24–126 ex 11/12. The data from the participants were used to generate this work but are not part of the published dataset.

Anatomical model populations

Ventricles

A cohort of anatomically specific ventricular-torso models was generated for 13 healthy subjects (8 M, 5 F) ranging from 30 to 65 years of age. All subjects were part of a clinical study approved by the ethical review board at the Medical University of Graz (EKNr: 24–126 ex 11/12). Written informed consent was obtained from each subject at the time of the study. Two separate MRI scans of the full torso and the whole heart were sequentially acquired using standardized protocols at 3 T (Magnetom Skyra, Siemens Healthcare, Erlangen, Germany). The torso MRI (1.3 × 1.3 × 3.0 mm³) was acquired in four overlapping stacks using a non-ECG-gated 3D T1-weighted gradient-echo sequence. The whole heart MRI (0.7 × 0.7 × 0.7 mm³) was acquired using an ECG-gated, fat-saturated, T2-prepared, isotropic 3D gradient-echo sequence. Respiratory navigators were employed to gate the MR acquisition to end-expiration under free breathing. MRI-compatible electrodes for recording the 12 lead ECG of each subject were left in place during image acquisition. Intensity thresholding techniques implemented in Seg3D¹¹ were used to segment each torso MRI into heart, lungs, and general torso tissue. Segmentation of the cardiac MRI was performed automatically using a two-kernel convolutional neural network, which was adapted to MRI from an original network developed for computed tomography images¹². Segmented structures included blood pools, ventricles, and general atrial tissue. To automatically register the four-chamber heart segmentation into the torso, an iterative closest point algorithm was utilized in Seg3D¹¹,¹³. Anatomical meshes were generated automatically from the joint segmentations using the Tarantula software meshing package¹⁴. Target mesh resolutions of 1.2 mm and 4.0 mm were prescribed for the cardiac and torso surfaces, respectively.
All models within the cohort were equipped with universal ventricular coordinates (UVCs) to allow for automated manipulation of all geometry-based entities¹⁵,¹⁶. The entire framework for the generation of the ventricular-torso model cohort is described in detail in Gillette et al.¹⁵. The ventricular-torso model cohort comprising geometries ${\Gamma }_{V,i},i\in [1,13]$ is shown in Fig. 2.

Atria

An overview of the anatomical model cohort generated for the atrial simulations is shown in Fig. 3. A total of 125 anatomical models ${\Gamma }_{A,h,i},i\in [1,80]$ and ${\Gamma }_{A,LAE,i},i\in [1,45]$ of the atrial endocardium were derived from a bi-atrial statistical shape model¹⁰,¹⁷. The endocardial surfaces were augmented with a homogeneous wall thickness of 3 mm, rule-based myocardial fiber orientation, tags for anatomical structures, and interatrial connections as described by Azzolin et al.¹⁸,¹⁹. Of these 125 geometries, 80 models exhibited left and right atrial volumes in the physiological ranges reported for healthy subjects²⁰. In these geometries, 10 different fractions from 0 to 45% of the atrial myocardial tissue volume were additionally replaced by fibrotic patches as described previously²¹ to model atrial cardiomyopathy. The remaining 45 anatomical models were generated by constraining the coefficients of the statistical shape model such that left atrial volumes were increased to value ranges typically observed in patients with left atrial enlargement²⁰. Additionally, 25 torso geometries ${\Gamma }_{T,i},i\in [1,25]$ were obtained by modifying the coefficients of the two leading eigenmodes of the human body statistical shape model constructed by Pishchulin et al.²². In this way, height, weight, and gender differences were represented in the anatomical torso model cohort. By applying random rotation angles ${\alpha }_{x},{\alpha }_{y},{\alpha }_{z}$ and translation parameters ${t}_{x},{t}_{y},{t}_{z}$ within the ranges summarized in Table 4 to the atrial geometry, variability in heart location and orientation was additionally accounted for in the virtual population.
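The random placement of each atrial geometry in the torso amounts to a rigid transform. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the extrinsic x-y-z rotation order, rotating about the geometry's centroid, and the sampling ranges in the usage lines are all hypothetical (the actual ranges are listed in Table 4).

```python
import numpy as np
from scipy.spatial.transform import Rotation


def place_atria(vertices, angles_deg, translation):
    """Rigidly rotate and translate atrial mesh nodes.

    vertices:    (N, 3) node coordinates.
    angles_deg:  (alpha_x, alpha_y, alpha_z) in degrees; the extrinsic x-y-z
                 order is an assumption of this sketch.
    translation: (t_x, t_y, t_z) in the same length unit as `vertices`.
    """
    vertices = np.asarray(vertices, dtype=float)
    rot = Rotation.from_euler("xyz", angles_deg, degrees=True)
    # Rotate about the centroid so that orientation and position vary independently.
    centroid = vertices.mean(axis=0)
    return rot.apply(vertices - centroid) + centroid + np.asarray(translation, dtype=float)


# Usage with hypothetical ranges (illustration only; see Table 4 for the real ones).
rng = np.random.default_rng(0)
nodes = rng.normal(size=(100, 3))
moved = place_atria(nodes, rng.uniform(-20.0, 20.0, 3), rng.uniform(-10.0, 10.0, 3))
```

Because the transform is rigid, inter-node distances (and hence atrial volumes) are unchanged; only the position and orientation relative to the torso electrodes vary.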

Simulation protocol and parameters

Ventricles

Under normal healthy control, activation of the ventricles was assumed to be Durrer-based²³, with the His-Purkinje system modeled by 5 fascicular sites of earliest breakthrough on a fast-conducting endocardium. Three fascicular sites were placed in the left ventricle (LV): on the anterior endocardium ${\overrightarrow{x}}_{lv,ant}$, the posterior endocardium ${\overrightarrow{x}}_{lv,post}$, and the septum ${\overrightarrow{x}}_{lv,sept}$. Activation of the right ventricle (RV) was controlled using a site corresponding to the moderator band ${\overrightarrow{x}}_{rv,mod}$. An additional site ${\overrightarrow{x}}_{rv,sept}$ was placed on the right-ventricular septum. All fascicular sites were defined in UVCs. The RV moderator band site was placed in the middle of the RV free wall. The transmural depth of the remaining fascicular sites was assumed to be constant at 20% of the ventricular free wall. The fascicles were assumed to be disc-shaped, with a transmural thickness of 0.5% of the ventricular wall and a radius, which determines their endocardial extent, controlled by the additional parameter $\overrightarrow{r}$. Activation was assumed to be simultaneous, apart from a prescribed delay ${\overrightarrow{t}}_{mod}$ in the activation of the RV moderator band site.
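To make the disc-shaped fascicle definition concrete, the helper below selects the mesh nodes belonging to one site. It is a hypothetical sketch: it assumes per-node UVCs stored as (apico-basal, rotational, transmural) columns, uses SciPy's `cKDTree` in place of the meshtool kD-trees mentioned in the text, and ignores the periodicity of the rotational coordinate.

```python
import numpy as np
from scipy.spatial import cKDTree


def fascicle_nodes(xyz, uvc, site_uvc, radius, thickness=0.005):
    """Indices of mesh nodes forming one disc-shaped fascicular site (hypothetical helper).

    xyz:       (N, 3) Cartesian node coordinates.
    uvc:       (N, 3) per-node coordinates, columns assumed to be
               (apico-basal, rotational, transmural); periodicity is ignored.
    site_uvc:  (3,) site location in the same coordinates, e.g. transmural depth 0.2.
    radius:    disc radius in the Cartesian length unit (the parameter r).
    thickness: transmural thickness as a fraction of the wall (0.5 % in the text).
    """
    xyz = np.asarray(xyz, dtype=float)
    uvc = np.asarray(uvc, dtype=float)
    site_uvc = np.asarray(site_uvc, dtype=float)
    # 1) nearest mesh node to the site in coordinate space
    _, center = cKDTree(uvc).query(site_uvc)
    # 2) all nodes within `radius` of that node in Cartesian space
    near = np.asarray(cKDTree(xyz).query_ball_point(xyz[center], r=radius), dtype=int)
    # 3) keep only the thin transmural band around the site depth
    in_band = np.abs(uvc[near, 2] - site_uvc[2]) <= thickness / 2.0
    return near[in_band]
```

Simultaneous activation of all sites (apart from the moderator band delay) would then correspond to assigning the same stimulus time to every node returned for each site.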

To model the fast spread of activation along the endocardial surface of the ventricles mediated by the His-Purkinje system, a fast-conducting endocardial layer was also included, spanning from 10% to 90% of the ventricular mesh along the apico-basal direction. Details of the His-Purkinje representation are available in Gillette et al.¹⁵. An isotropic conduction velocity of 2.0 m/s was prescribed within the fast-conducting endocardium²⁴.

Myocardial fiber directions were assigned using a rule-based method²⁵ that assumes the principal fiber direction rotates transmurally from 60.0° on the endocardium to −60.0° on the epicardium²⁶. Corresponding sheet directions of −65.0° and 25.0° were applied, respectively²⁶. A conduction velocity of 0.6 m/s along the principal fiber direction was applied, with an off-axis conduction velocity ratio of 4:2:1²⁷. Conductivities within the myocardium were set according to Roberts et al.²⁸. All remaining conductivities within the volume conductor, comprising lungs, blood pools, atria, and general torso tissue, were set according to Keller et al.²⁹.
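The fiber rule and the 4:2:1 velocity ratio reduce to simple arithmetic. The sketch below assumes a linear transmural interpolation of the fiber angle (common in rule-based methods, but not stated in the text) and reads the ratio as fiber : sheet : sheet-normal.

```python
import numpy as np


def helix_angle(transmural, endo_deg=60.0, epi_deg=-60.0):
    """Fiber angle (degrees) at transmural position 0 (endocardium) to 1 (epicardium).

    Linear interpolation between the endo- and epicardial angles is an assumption.
    """
    t = np.clip(np.asarray(transmural, dtype=float), 0.0, 1.0)
    return endo_deg + (epi_deg - endo_deg) * t


def anisotropic_cv(cv_fiber=0.6, ratio=(4, 2, 1)):
    """Split the fiber-direction CV (m/s) into fiber, sheet and sheet-normal CVs."""
    f, s, n = ratio
    return cv_fiber, cv_fiber * s / f, cv_fiber * n / f
```

With the values from the text, `anisotropic_cv()` gives 0.6, 0.3 and 0.15 m/s, and the fiber angle passes through 0° at mid-wall.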

Ventricular myocyte electrophysiology was modeled using the Mitchell-Schaeffer ionic model ${\overrightarrow{i}}_{sinus}$³⁰. A resting membrane voltage of −86.2 mV and a peak action potential voltage of 40 mV were assumed. Gradients in action potential duration (APD) within the myocardium, which are needed to establish physiological T waves, were generated by exploiting a known relationship between the ${\tau }_{close}$ parameter and the APD. First, a linear combination of the UVCs weighted with given weights ${\overrightarrow{q}}_{w}$ was computed at each node of the mesh. The weighted UVC combination was then mapped into a range between $AP{D}_{min}$ and $AP{D}_{max}$ to generate an APD map for the entire ventricles. Values for the gradients and the APD were derived from the literature³¹,³²,³³. In total, variation in electrophysiology during normal healthy control was controlled through 20 variable parameters, summarized in the parameter vector ${\overrightarrow{\omega }}_{qrs}$ for the QRS complex:

$${\overrightarrow{\omega }}_{qrs}=\{{\overrightarrow{x}}_{lv,ant},{\overrightarrow{x}}_{lv,post},{\overrightarrow{x}}_{lv,sept},{\overrightarrow{x}}_{rv,mod},{\overrightarrow{x}}_{rv,sept},{\overrightarrow{t}}_{mod}\}$$

(1)

and ${\overrightarrow{\omega }}_{t}$ for the T wave:

$${\overrightarrow{\omega }}_{t}=\{{\overrightarrow{i}}_{sinus},AP{D}_{min},AP{D}_{max},{\overrightarrow{q}}_{w}\}.$$

(2)
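The APD map construction can be sketched as follows. Two parts are assumptions: the min-max normalization of the weighted UVC combination, and the use of the approximation $APD \approx \tau_{close}\ln(\tau_{out}/(4\tau_{in}))$ from Mitchell and Schaeffer's analysis of their model as the "known relationship" between $\tau_{close}$ and APD. The values $\tau_{in}=0.3$ ms and $\tau_{out}=6$ ms are commonly quoted model defaults, used here only as placeholders.

```python
import numpy as np


def apd_map(uvc, weights, apd_min, apd_max):
    """Map a weighted linear combination of UVCs linearly onto [apd_min, apd_max] (ms).

    uvc: (N, k) per-node coordinates; weights: (k,) hypothetical weight vector q_w.
    """
    score = np.asarray(uvc, dtype=float) @ np.asarray(weights, dtype=float)
    span = score.max() - score.min()
    norm = (score - score.min()) / span if span > 0 else np.zeros_like(score)
    return apd_min + norm * (apd_max - apd_min)


def tau_close_from_apd(apd, tau_in=0.3, tau_out=6.0):
    """Invert APD ~ tau_close * ln(tau_out / (4 * tau_in)) for tau_close (ms)."""
    return np.asarray(apd, dtype=float) / np.log(tau_out / (4.0 * tau_in))
```

Applying `tau_close_from_apd` node-wise to the output of `apd_map` would give a per-node ${\tau }_{close}$ field that produces the desired APD gradient.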

All geometry-based parameters could be mapped onto the mesh using kD-trees implemented in meshtool³⁴. Parameters relating to both the QRS complex and the T wave under normal healthy control were varied within the physiological ranges reported in Tables 1 and 2, respectively, to generate variation in the QRST complex. The parameter ranges were sampled using Latin hypercube sampling.
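Latin hypercube sampling is available in SciPy's `scipy.stats.qmc` module. The sketch below is illustrative only: the three parameters and their bounds are hypothetical placeholders, not the values from Tables 1 and 2.

```python
import numpy as np
from scipy.stats import qmc


def lhs_samples(n, lower, upper):
    """Latin hypercube design: n parameter vectors within [lower, upper] per dimension.

    Each dimension is split into n equal strata and every stratum receives
    exactly one sample, so the ranges are covered evenly even for small n.
    """
    unit = qmc.LatinHypercube(d=len(lower)).random(n=n)  # points in [0, 1)^d
    return unit, qmc.scale(unit, lower, upper)           # rescale each column


# Hypothetical bounds (illustration only): APD_min (ms), APD_max (ms), t_mod (ms)
unit, params = lhs_samples(50, [180.0, 280.0, 0.0], [220.0, 320.0, 10.0])
```

Compared with independent uniform draws, this stratification avoids clusters and gaps in the parameter space, which matters when only a limited number of simulations can be afforded per anatomy.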

Table 1 Model parameters for the electrophysiology within the ventricular simulations generating QRS complexes.


Table 2 Model parameters for the electrophysiology within the ventricular simulations generating T waves.


The two pathologies of BBB and MI were modeled in the ventricles alongside normal healthy control. LBBB and RBBB were included in the ventricular-torso model by omitting all fascicular root sites within either the LV (LBBB) or the RV (RBBB), thereby inhibiting activation from these sites and causing a complete bundle branch block. All other relevant electrophysiology parameters were allowed to vary in the same ranges as reported for normal healthy control above.

An MI stemming from occlusion of one of the three primary arteries RAD, LAD, and LCX was inserted into the ventricles. For each of the arteries $\nu \in \{RAD,LAD,LCX\}$, a core center ${\overrightarrow{x}}_{\nu ,mi}$ was defined using apico-basal and rotational UVC values that were bounded according to the regions affected on the clinical 17-segment model recommended by the American Heart Association (AHA)⁸. Namely, the LAD was restricted to the anterior-anteroseptal region spanning the entire apico-basal extent. Both the RAD and the LCX extended less apically and were confined to the lateral wall and the inferior-inferoseptal regions, respectively. For each artery, the infarct either spanned the entire ventricular wall or had a transmural extent of 30% from the endocardium, giving rise to a transmural extent value ${\rho }_{n,mi}$ with $n\in \{0.3,1.0\}$. The outer 5% of the infarct area was allocated to the border zone (BZ), and the remaining area was defined as the infarct core. All scars were assumed to be located in the LV.

From each infarct center, an Eikonal activation map was computed within the ventricular geometry, assuming the same conduction velocity and off-axis ratios as in the general myocardium during normal healthy control. The infarct geometry was obtained by thresholding the activation map at the activation time corresponding to a distance ${d}_{co}$ from the center. The infarct core was assumed to be electrically inert, while the conduction velocity in the BZ was set to 0.15 m/s with an off-axis ratio of 1.0³⁵. The conductivity within the BZ was set to the same values as in healthy myocardium. Parameters of the Mitchell-Schaeffer ionic model within the BZ, ${\overrightarrow{i}}_{BZ}$, were manually adjusted using the single-cell tool bench to reproduce characteristic action potential changes during MI³⁶. In total, the MI class comprised 6 sub-classes. The parameters varied to induce various degrees and positions of MI, ${\overrightarrow{\omega }}_{\nu ,mi}$, included:

$${\overrightarrow{\omega }}_{\nu ,mi}=\{{\overrightarrow{x}}_{\nu ,mi},{\rho }_{n,mi},{d}_{co}\}\,:\,\nu \in \{RAD,LAD,LCX\},\,n\in \{0.3,1.0\}$$

(3)

Parameters were likewise varied using Latin hypercube sampling within ranges based on clinical observations of characteristic occlusion sites and action potential changes (Table 3).
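The infarct segmentation can be illustrated on a mesh graph. This sketch swaps in a plain graph shortest-path distance (Dijkstra over mesh edges) as a coarse, isotropic stand-in for the anisotropic Eikonal map used in the text; with a constant conduction velocity, thresholding time at $d_{co}/CV$ is equivalent to thresholding distance at $d_{co}$. Reading "outer 5%" as the outer 5% of the radius is an assumption.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra


def infarct_labels(xyz, edges, center, d_co, transmural, extent, bz_frac=0.05):
    """Label nodes 0 = healthy, 1 = border zone, 2 = infarct core.

    xyz:        (N, 3) node coordinates; edges: (M, 2) unique node index pairs.
    center:     index of the infarct center node; d_co: infarct radius (xyz units).
    transmural: (N,) coordinate, 0 at the endocardium, 1 at the epicardium.
    extent:     transmural extent rho (0.3 or 1.0 in the text).
    """
    xyz = np.asarray(xyz, dtype=float)
    i, j = np.asarray(edges).T
    w = np.linalg.norm(xyz[i] - xyz[j], axis=1)
    n = len(xyz)
    graph = coo_matrix((w, (i, j)), shape=(n, n)).tocsr()
    # Shortest path along mesh edges: an isotropic approximation of the Eikonal distance.
    dist = dijkstra(graph, directed=False, indices=center)
    inside = (dist <= d_co) & (np.asarray(transmural) <= extent)
    labels = np.zeros(n, dtype=int)
    labels[inside] = 2
    # Outermost fraction of the radius becomes border zone (assumed interpretation).
    labels[inside & (dist > (1.0 - bz_frac) * d_co)] = 1
    return labels
```

The core nodes would then be made electrically inert and the border zone nodes assigned the slowed conduction velocity and modified ionic parameters.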

Table 3 Additional parameters defining infarct zones within the ventricular-torso model.


Transmembrane voltages were simulated using the efficient reaction-Eikonal method in the monodomain formulation without diffusion³⁷. Electrical potentials at each electrode on the torso surface were recovered from the transmembrane voltages using lead fields precomputed once for every model³⁸. A ventricular 12 lead ECG (QRST complex) was generated by simulating a ventricular beat of 450 ms. All simulations were run using the CARPentry cardiac solver³⁹ and the openCARP simulation framework⁴⁰,⁴¹ on a desktop machine with 24 cores, parallelized into 3 threads.
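Because the forward problem is linear in the transmembrane voltage, a precomputed lead field turns each time step into a matrix product, after which the standard 12 leads follow from the electrode potentials. In this sketch the lead-field matrix is assumed to already include the conductivity weighting, and the electrode naming is hypothetical; the lead definitions (Einthoven, Goldberger, and Wilson central terminal) are the standard ones.

```python
import numpy as np


def ecg_from_lead_field(lead_field, vm):
    """Electrode potentials (E, T) from a lead-field matrix (E, N) and voltages (N, T)."""
    return np.asarray(lead_field) @ np.asarray(vm)


def twelve_leads(phi):
    """Standard 12 leads from 9 electrode potentials.

    phi: dict of 1-D arrays keyed 'RA', 'LA', 'LL', 'V1'..'V6' (hypothetical naming).
    Returns a (12, T) array ordered I, II, III, aVR, aVL, aVF, V1-V6.
    """
    ra, la, ll = phi["RA"], phi["LA"], phi["LL"]
    wct = (ra + la + ll) / 3.0  # Wilson central terminal
    limb = [la - ra, ll - ra, ll - la,                   # Einthoven leads I, II, III
            ra - (la + ll) / 2.0, la - (ra + ll) / 2.0,  # Goldberger aVR, aVL
            ll - (ra + la) / 2.0]                        # aVF
    chest = [phi[f"V{k}"] - wct for k in range(1, 7)]   # precordial leads
    return np.vstack(limb + chest)
```

Precomputing the lead field once per anatomy is what makes thousands of forward calculations affordable: each new activation sequence only costs one matrix product.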

Atria

Local activation times in the atria were obtained by solving the Eikonal equation with the Fast Iterative Method⁴² and the Fast Marching Method⁴³. Excitation was initiated at the sinoatrial node with an exit site located at the junction of the crista terminalis and the superior vena cava. Locally heterogeneous conduction velocities ${{\rm{CV}}}_{{\rm{[Region]}}}$ and anisotropy ratios ${{\rm{AR}}}_{{\rm{[Region]}}}$ for [Region] ∈ {bulk tissue, interatrial connections, crista terminalis, pectinate muscles, inferior isthmus} were modeled as summarized in Table 4. The spatio-temporal distribution of transmembrane voltages ${\rm{TMV}}(t,x)$ was subsequently derived from the computed activation times by shifting precomputed Courtemanche et al. action potential templates ${\rm{TMV}}(t)$ in time. Remodeling of cellular electrophysiology was applied in fibrotic regions as described below. For all simulations except those of fibrotic atrial cardiomyopathy, the baseline parameters of the Courtemanche et al. model remained unchanged in all atrial regions. The atria were placed inside a torso geometry and were rotated (${\alpha }_{x},{\alpha }_{y},{\alpha }_{z}$) around and translated (${t}_{x},{t}_{y},{t}_{z}$) along all three coordinate axes to account for additional anatomical variability in the cohort. The forward problem of electrocardiography was solved with the infinite volume conductor method (for normal healthy control and fibrotic atrial cardiomyopathy) or the boundary element method (for interatrial conduction block and left atrial enlargement). Single-beat 12 lead ECGs of the P wave lasting 150–200 ms were subsequently extracted at standard electrode positions. In total, variation during healthy sinus rhythm simulations was controlled through the parameters summarized in the following vector

$${\omega }_{P}=\{{\overline{{\rm{CV}}}}_{[Region]},{\alpha }_{x},{\alpha }_{y},{\alpha }_{z},{t}_{x},{t}_{y},{t}_{z},{\overrightarrow{\lambda }}_{T,i},{\overrightarrow{\lambda }}_{A,i}\}.$$

(4)
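Deriving ${\rm{TMV}}(t,x)$ by shifting one action potential template by each node's activation time can be sketched with linear interpolation. The template used in the usage test is a toy waveform standing in for the Courtemanche et al. template, and holding the first and last template values outside its time range is an assumption of this sketch.

```python
import numpy as np


def tmv_from_lat(lat_ms, template_t_ms, template_v_mv, t_ms):
    """TMV(t, x): shift one action potential template by each node's activation time.

    lat_ms:        (N,) local activation times from the Eikonal solution.
    template_t_ms: template time axis with the upstroke at t = 0.
    template_v_mv: template voltages; before activation, nodes hold the first
                   (resting) value, after the template ends they hold the last one.
    t_ms:          (T,) output time axis. Returns an (N, T) array.
    """
    lat = np.asarray(lat_ms, dtype=float)[:, None]
    local_t = np.asarray(t_ms, dtype=float)[None, :] - lat  # time since local activation
    return np.interp(local_t, template_t_ms, template_v_mv)
```

This avoids solving the ionic model at every node: the cost is dominated by the Eikonal solve, while the cellular dynamics enter only through the precomputed template.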

For simulations of fibrotic atrial cardiomyopathy, nine different fractions from 5% to 45% of the healthy atrial myocardial volume were replaced by fibrotic tissue, as described in detail by Nagel et al.²¹, in the same 80 atrial anatomical models that were employed for the healthy control simulations. In fibrotic patches, 50% of the cells were modeled as passive conduction barriers by removing the affected elements from the volumetric meshes. In the remaining 50% of the fibrotic cells, conduction velocity was scaled by factors of 0.2 and 0.5 relative to the healthy baseline values in Table 4 in the transversal and longitudinal fiber directions, respectively. In this way, anisotropy ratios were increased by a factor of 2.5 (0.5/0.2), which typically facilitates functional reentry in patients with atrial fibrillation. To account for paracrine cytokine remodeling effects in fibrotic regions, maximum ionic conductances of the Courtemanche et al. cell model were rescaled (0.6×g_Na, 0.5×g_K1, 0.5×g_CaL).
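The fibrotic remodeling rules can be written down directly. The 50/50 split, the CV scaling factors, and the conductance factors follow the text; choosing the removed elements uniformly at random is an assumption, and the baseline CVs in the usage test are placeholders rather than the Table 4 values.

```python
import numpy as np

# Conductance scaling in fibrotic regions, as listed in the text.
FIBROTIC_CONDUCTANCE_SCALING = {"g_Na": 0.6, "g_K1": 0.5, "g_CaL": 0.5}


def remodel_fibrosis(fibrotic_ids, cv_long, cv_trans, rng,
                     removed_frac=0.5, long_scale=0.5, trans_scale=0.2):
    """Split fibrotic elements into removed and slowed ones and scale their CVs.

    Returns (removed_ids, slowed_ids, cv_long_fib, cv_trans_fib). The uniformly
    random choice of removed elements is an assumption of this sketch.
    """
    ids = rng.permutation(np.asarray(fibrotic_ids))
    n_removed = int(round(removed_frac * len(ids)))
    return ids[:n_removed], ids[n_removed:], cv_long * long_scale, cv_trans * trans_scale


def anisotropy_gain(long_scale=0.5, trans_scale=0.2):
    """Factor by which the longitudinal/transversal CV ratio grows (0.5 / 0.2 = 2.5)."""
    return long_scale / trans_scale
```

Because the transversal velocity is slowed more strongly than the longitudinal one, the anisotropy ratio grows by the same factor of 2.5 regardless of the baseline CVs.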

For left atrial enlargement simulations, 45 additional atrial geometries were derived from the bi-atrial statistical shape model. Constraints were applied to the coefficients of the leading eigenmodes to generate anatomical atrial models with systematically increasing left atrial volumes⁶. Different rotation angle combinations and conduction velocity variations were applied for the simulations as reported in Table 4.

Complete interatrial conduction block was modeled by inhibiting conduction propagation through the elements in Bachmann’s bundle at the junction between the left and the right atrium in the same 80 bi-atrial geometries that were used for the control simulations. Different combinations of rotation angles and spatial translations of the atria within the torso were applied for the ECG calculations.

Synthesis of complete ECGs

Signal components were synthesized into a full ECG using a heart rate variability (HRV) model to obtain 10 s recordings in accordance with the standard clinical 12 lead ECG. As atrial and ventricular ECGs were computed using different forward calculation methods, the amplitudes of the QRST complexes were scaled relative to the P waves prior to concatenation to ensure that the amplitudes of the individual waveforms were consistent within one heartbeat. To this end, maximum P wave and R peak amplitudes were extracted in lead II of all clinical recordings from healthy subjects in PTB-XL¹ using ECGdeli⁴⁴. Based on these values, a multivariate normal distribution was set up representing the relation between P wave and R peak amplitudes in clinical ECGs. In this way, the simulated QRST complex could be scaled with a factor sampled from this multivariate probability distribution to match the corresponding amplitude of the simulated P wave. A PQ interval compatible with the simulated P wave duration was selected likewise by drawing from a multivariate normal distribution generated from clinical P wave duration and PQ interval values. Finally, the P waves and the scaled QRST complexes were concatenated using a sigmoid-shaped segment whose length was given by the difference between the PQ interval and the P wave duration. When synthesizing ECG segments for the 1st degree AV block class, the PQ interval was sampled from the range > 200 ms.
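Two steps of the concatenation can be sketched: drawing an R peak amplitude conditioned on the simulated P wave amplitude from a bivariate normal fitted to clinical data, and bridging P wave and QRS onset with a sigmoid segment. Conditioning via the standard bivariate-normal formulas is our reading of "sampled from this multivariate probability distribution"; the fitted mean and covariance, the sigmoid steepness, and the example interval values are all hypothetical.

```python
import numpy as np


def sample_r_given_p(p_amp, mean, cov, rng):
    """Draw an R peak amplitude (mV) conditioned on a P wave amplitude (mV).

    mean = (mu_P, mu_R); cov = [[var_P, cov_PR], [cov_PR, var_R]] (fitted to clinical data).
    """
    mu_p, mu_r = mean
    var_p, c_pr, var_r = cov[0][0], cov[0][1], cov[1][1]
    cond_mean = mu_r + c_pr / var_p * (p_amp - mu_p)
    cond_var = var_r - c_pr ** 2 / var_p
    return rng.normal(cond_mean, np.sqrt(max(cond_var, 0.0)))


def sigmoid_bridge(v_start, v_end, n_samples, steepness=10.0):
    """Sigmoid-shaped segment of n_samples (>= 2) running from v_start to v_end."""
    x = np.linspace(-0.5, 0.5, n_samples)
    s = 1.0 / (1.0 + np.exp(-steepness * x))
    s = (s - s[0]) / (s[-1] - s[0])  # normalize so both end points are hit exactly
    return v_start + (v_end - v_start) * s


# Example: a 160 ms PQ interval after a 110 ms P wave at 500 Hz needs a 25-sample bridge.
n_bridge = int(round((160.0 - 110.0) * 500.0 / 1000.0))
```

The QRST scaling factor would then be the sampled R amplitude divided by the simulated R amplitude in lead II.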

To account for heart rate variability in the simulated 10 s ECGs, we refrained from simply repeating the concatenated single heartbeat multiple times. Instead, the heart rate variability model developed by Kantelhardt et al.⁴⁵ was used to generate a series of RR intervals for an average heart rate within physiological ranges (50–90 bpm), determined from the QT interval of the respective simulation run using the multivariate normal distribution. For each heartbeat with a different RR interval, the signal was shrunk or stretched in the $[{\rm{QRS}}_{\rm{off}},{\rm{T}}_{\rm{off}}]$ interval, again by sampling values from a multivariate normal distribution derived from clinical QRS duration, QT interval, and RR interval values. After adding a sigmoid-shaped TP segment to connect subsequent heartbeats within the defined RR interval, we obtained the final 10 s 12 lead ECG. The raw ECG signal was superimposed with realistic ECG noise that mimics the effects of electrode movement, baseline wander, and motion artefacts, as reported by Petrėnas et al.⁴⁶. The amplitudes of the noise vectors were scaled to yield a chosen signal-to-noise ratio between 15 and 20 dB.
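Scaling a noise record to a target signal-to-noise ratio follows from $\mathrm{SNR_{dB}} = 10\log_{10}(P_{signal}/P_{noise})$. In this sketch, power is taken as the mean squared value over all leads, which is an assumption (per-lead scaling would be an equally plausible reading), and the clean signal and Gaussian noise are placeholders for the simulated ECG and the realistic noise records.

```python
import numpy as np


def add_noise_at_snr(ecg, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals `snr_db`, then add it.

    ecg, noise: arrays of equal shape (leads x samples); power = mean squared value.
    """
    p_sig = np.mean(np.square(ecg))
    p_noise = np.mean(np.square(noise))
    gain = np.sqrt(p_sig / (p_noise * 10.0 ** (snr_db / 10.0)))
    return ecg + gain * noise


# Usage with placeholder signals; the SNR is drawn from the 15-20 dB range named in the text.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 20.0, 5000))[None, :].repeat(12, axis=0)
noisy = add_noise_at_snr(clean, rng.normal(size=clean.shape), rng.uniform(15.0, 20.0))
```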

Data Records

The MedalCare-XL dataset is publicly available on Zenodo⁹ under the Creative Commons Attribution 4.0 International license. Approximately 1,300 ECGs of 10 s length for each disease class are stored in csv format. Rows 1–12 contain the 12 leads of each ECG in the order I, II, III, aVR, aVL, aVF, V1–V6. All signals are sampled at 500 Hz, and amplitudes are given in mV. Each signal is available in three versions: ‘*_raw.csv’ contains the noise-free synthesized ECG, ‘*_noise.csv’ contains the synthesized ECG with superimposed realistic ECG noise⁴⁶, and ‘*_filtered.csv’ contains the bandpass-filtered version (Butterworth filters of order 3 with cutoff frequencies of 0.5 Hz (highpass) and 150 Hz (lowpass)) of the synthesized ECG with superimposed noise. To support meaningful machine learning studies, the signals are split into suggested subsets for training, validation, and testing according to the atrial and ventricular anatomical models underlying each simulation run, so that each anatomical model is contained in only one of the subsets. A detailed description of the structure of the MedalCare-XL dataset is given in Table 5. Example ECGs of lead II for each disease are shown in Fig. 4(A). Fig. 4(B) shows exemplary ECGs for each MI sub-class, corresponding to different occlusion sites and degrees of transmurality.
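A record can be loaded and, if desired, re-filtered from its ‘*_noise.csv’ version as follows. This is a sketch, not the authors' processing code: it assumes a comma-separated file without a header row, and it applies the two order-3 Butterworth filters forward and backward (zero phase), which the text does not specify. The file name in the usage comment is a placeholder.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500.0  # Hz, sampling rate of all MedalCare-XL signals
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def load_ecg(source):
    """Load one record as a (12, n_samples) array in mV, one row per lead.

    `source` is a path (or anything np.loadtxt accepts); no header row is assumed.
    """
    data = np.loadtxt(source, delimiter=",", ndmin=2)
    if data.shape[0] != len(LEADS):
        raise ValueError(f"expected {len(LEADS)} rows (leads), got {data.shape[0]}")
    return data


def bandpass(ecg, fs=FS, order=3, f_hp=0.5, f_lp=150.0):
    """Order-3 Butterworth high-pass (0.5 Hz) followed by low-pass (150 Hz).

    Forward-backward (zero-phase) filtering is an assumption of this sketch.
    """
    hp = butter(order, f_hp, btype="highpass", fs=fs, output="sos")
    lp = butter(order, f_lp, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(lp, sosfiltfilt(hp, ecg, axis=-1), axis=-1)


# Hypothetical usage (placeholder file name):
# ecg = bandpass(load_ecg("some_record_noise.csv"))
```

Second-order sections are used because a 0.5 Hz high-pass at 500 Hz has a very low normalized cutoff, where the transfer-function form can become numerically fragile.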

Table 4 Model parameters for atrial simulations.


Technical Validation

We employed two different approaches for the technical validation of the MedalCare-XL dataset of simulated, synthetic 12 lead ECGs, as described in the following. To validate the complete dataset, the statistical distributions of ECG features extracted separately for each class (healthy control and specific pathologies) from the records in the MedalCare-XL database⁹ were compared to the distributions of the corresponding features extracted from the clinical PTB-XL database, which were recently summarized in the PTB-XL+ dataset⁴⁷. In addition, we performed several so-called clinical Turing tests, in which the ability of expert cardiologists to distinguish the simulated ECGs from clinical ECGs was evaluated, again with representative samples from the MedalCare-XL and PTB-XL databases, as described in detail below.

Feature distribution

To validate the simulated data against the statistical properties of clinically recorded ECGs, interval and amplitude features were extracted from the synthetic dataset and from PTB-XL using ECGdeli⁴⁴ and compared with one another. Figure 5 shows the probability density functions of 6 timing and 5 amplitude features extracted from lead II of all ECGs in the healthy clinical and virtual cohorts. Except for the T wave amplitudes, the feature values of the synthetic signals lie within the clinical and physiological ranges. However, the feature distributions of the clinical and the virtual data coincide only for the QRS duration. All other simulated timing and amplitude features cover only a subset of the clinically observed ranges. In Figs. 5, 6, a comparison of feature distributions for healthy and pathological ECGs in the virtual cohort (top panel) and the clinical cohort (bottom panel) is visualized for the timing or amplitude features that are clinically considered for the diagnosis of the respective disease.
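Density curves such as those compared in Fig. 5 can be estimated with a Gaussian kernel density estimate per cohort, evaluated on a shared grid. The overlap coefficient below is an illustrative summary we add for the sketch, not a metric reported in the text; the 50% grid padding is likewise an arbitrary choice so that the density tails are captured.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import gaussian_kde


def feature_densities(synthetic, clinical, n_grid=200):
    """KDEs of one ECG feature (e.g. QRS duration in ms) for two cohorts on a shared grid."""
    synthetic = np.asarray(synthetic, dtype=float)
    clinical = np.asarray(clinical, dtype=float)
    lo = min(synthetic.min(), clinical.min())
    hi = max(synthetic.max(), clinical.max())
    pad = 0.5 * (hi - lo)  # extend the grid so the density tails are included
    grid = np.linspace(lo - pad, hi + pad, n_grid)
    return grid, gaussian_kde(synthetic)(grid), gaussian_kde(clinical)(grid)


def overlap_coefficient(grid, p, q):
    """Area shared by two densities: 1 for identical, close to 0 for disjoint ones."""
    return trapezoid(np.minimum(p, q), grid)
```

A low overlap with the synthetic density lying inside the clinical one would correspond to the situation described above, where simulated features cover only a subset of the clinical range.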

Clinical Turing tests

We aimed to ensure that the synthetic ECG signals correspond to clinically measured signals with respect to ECG features that are characteristic of healthy cases. If cardiologists are not able to distinguish between measured and simulated ECG signals, confidence in the in silico model as a surrogate for real clinical data increases. Such a test can therefore be considered a clinical Turing test. To this end, cardiologists were asked to perform an online Turing test to evaluate and provide feedback on both healthy and pathological ECGs. A first clinical Turing test was conducted to determine the ability of the synthetic 12 lead ECGs within the database to pass as real clinical signals. In a second test, cardiologists were asked to determine the pathology of the signals, as routinely done in ECG diagnostics. In all clinical Turing tests, the PTB-XL¹ database served as the source of the measured signals, and the simulated database described above was used for the synthetically generated signals.

Development of an online platform for clinical Turing tests

To conduct the clinical Turing tests, an online solution provided by the Know-Center (https://www.know-center.at), a research center for data science and artificial intelligence located in Graz, was used. The Know-Center extended its TimeFuse (https://ecgviewer.timefuse.io/public/login/turing) online signal data platform with a survey feature and a plotter to visualize 12 lead ECG signals. The ECG plotter was designed specifically to present 12 lead ECGs in the typical layout seen by cardiologists in the clinic on chart paper. Namely, horizontal lines on the pink background correspond to 0.4 and vertical lines correspond to 0.1. The platform was also designed to host multiple clinical Turing tests, so that tests of either healthy or pathological signals could be organized and conducted separately.

Conducting tests

In a first iteration, Turing tests were performed with normal healthy control ECGs to better understand the ability of the signals to pass as clinical signals under normal healthy conditions. For this purpose, five groups of 20 signals each were created, resulting in a total of 100 signals. For the measured ECGs, 50 signals were randomly selected from a subset of the PTB-XL database containing only signals annotated as 100% healthy. For the generated ECGs, 50 signals in healthy sinus rhythm were randomly taken from the synthetic database described above. After pre-processing and filtering of the 100 signals, the five groups were uploaded to the online platform and assigned to the survey participants. Within the test, expert cardiologists were required to evaluate whether each of the 100 ECG test cases was measured or generated. Clinicians were allowed to refrain from answering, but a missing answer was counted as a false classification. All clinicians were also asked to explain the reasoning behind each classification. A total of 6 clinicians performed the test.
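The group construction described above can be sketched as follows. This is a minimal illustration, not the procedure used for the test: the text states only that 50 measured and 50 generated signals were divided into five groups of 20, so the assumption that every group receives 10 signals from each source, the function name, and the ID strings are all hypothetical.

```python
import random


def build_turing_groups(measured_ids, generated_ids, n_groups=5, seed=0):
    """Split measured and generated ECG IDs into shuffled groups of equal composition.

    Assumes each group receives the same number of measured and generated signals;
    the source only states five groups of 20 built from 50 + 50 signals.
    """
    if len(measured_ids) % n_groups or len(generated_ids) % n_groups:
        raise ValueError("each signal source must split evenly across the groups")
    rng = random.Random(seed)
    measured = rng.sample(list(measured_ids), len(measured_ids))
    generated = rng.sample(list(generated_ids), len(generated_ids))
    per_m = len(measured) // n_groups
    per_g = len(generated) // n_groups
    groups = []
    for g in range(n_groups):
        group = [(sid, "measured") for sid in measured[g * per_m:(g + 1) * per_m]]
        group += [(sid, "generated") for sid in generated[g * per_g:(g + 1) * per_g]]
        rng.shuffle(group)  # the source of an ECG must not be inferable from its position
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Hypothetical IDs; participants would only see the signals, never the source label
    groups = build_turing_groups([f"ptbxl_{i}" for i in range(50)],
                                 [f"sim_{i}" for i in range(50)])
    print([len(g) for g in groups])
```

Fixing the seed keeps the assignment reproducible, which is useful if the same groups are to be shown to additional clinicians later.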

A similar test was performed with pathological ECGs to demonstrate that the synthetic ECGs of the various modeled pathologies would be classified by expert clinicians with the same accuracy as real clinical signals and could not be distinguished from clinically measured ECGs taken from the PTB-XL database. The cases included myocardial infarction (MI), left bundle branch block (LBBB), right bundle branch block (RBBB), first degree AV block (1AVB), and left atrial overload/enlargement (LAO/LAE). Fibrotic atrial cardiomyopathy (FAM) and complete interatrial conduction block (IAB) were excluded because these diseases are not represented in PTB-XL. Examples of the diseases are provided in Fig. 4(A,B).

As in the healthy Turing test, 50 generated ECG signals were taken from the synthetic database such that each of the five pathological classes was represented by 10 ECGs. The 50 measured ECGs were randomly selected from five subsets of the PTB-XL database, 10 cases per subset, where each subset only contained signals labeled as 100% pathological for one of the five classes. Clinicians were asked to make at least one annotation for each of the 100 pathological 12 lead ECG signals, choosing from the following list of 11 labels:

1AVB
atrial fibrillation (AFIB)
FAM
IAB
LAO
LBBB
MI
normal healthy control (NORM)
right atrial overload/enlargement (RAO/RAE)
RBBB
Wolff-Parkinson-White syndrome (WPW)

Two cardiologists responded.

Within the normal healthy control clinical Turing test, the six clinicians correctly classified 464 of the 600 cases, which corresponds to an accuracy of 77.33%. Conversely, 136 signals (22.67%) were not correctly classified, including 62 (10.33%) synthetic and 74 (12.33%) measured ECGs, see Fig. 7(B). A detailed summary is given in Fig. 7(A,C). The primary ECG features that led to a classification as simulated included fractionation or improper R wave progression in the QRS complex, a spiked or biphasic T wave, and a lack of physiological noise in the signals.
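The scoring rule of the healthy test, where an abstention counts as a false classification, can be written down in a few lines. The sketch below assumes that answers are encoded as the strings "measured" or "generated" and that an abstention is `None`; the encoding and the function name are illustrative choices.

```python
from collections import Counter


def score_turing_test(truth, answers):
    """Return (accuracy, errors per true source) for measured/generated calls.

    An abstention is encoded as None and, as in the test described above, counts as wrong.
    """
    if len(truth) != len(answers):
        raise ValueError("truth and answers must have the same length")
    # Any answer that differs from the truth, including None, is an error
    errors = Counter(t for t, a in zip(truth, answers) if a != t)
    accuracy = 1.0 - sum(errors.values()) / len(truth)
    return accuracy, dict(errors)


if __name__ == "__main__":
    acc, errs = score_turing_test(
        ["measured", "generated", "measured", "generated"],
        ["measured", None, "generated", "generated"],
    )
    print(acc, errs)
```

Applied to the healthy test, 136 errors out of 600 answers give 1 − 136/600 ≈ 0.7733, i.e. the 77.33% reported above; the per-source error counts correspond to the split into 62 synthetic and 74 measured misclassifications.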

Within the clinical Turing test on pathological ECGs, the two clinicians correctly classified the signals as either measured or simulated in 166 of the 200 cases, which corresponds to an overall accuracy of 83%. Conversely, the type of 34 signals (17%) was not correctly identified, including 10 (5%) synthetic and 24 (12%) measured ECGs, see Fig. 7(E). A detailed summary is given in Fig. 7(D,F). Regarding the pathology itself, only 101 of the 200 (50.5%) diagnoses across both simulated and clinical signals, pooled over both clinicians, were correct. Namely, 38 measured ECGs were assigned the wrong pathology, resulting in an accuracy of 62%. Simulated pathologies, in contrast, were correctly classified in only 39% of cases, with 61 signals classified incorrectly. A detailed summary is given in Fig. 8(A,B). The actual pathology and the diagnoses given by each clinician within the pathological clinical Turing test are provided in Fig. 8(C).

Usage Notes

Separate models of atrial and ventricular electrophysiology, each individually more detailed and steerable, were joined to capture the P wave and the QRST complex of the 12 lead ECG, respectively. Cohorts of four-chamber models of cardiac electrophysiology⁴⁸ could also be used for this purpose and offer distinct advantages for modeling certain pathologies with atrio-ventricular dependencies. Such four-chamber cohorts, however, are not yet well suited for generating large ECG databases due to their limited anatomical variation. While statistical shape models of the four-chamber heart have been generated to encode such anatomical variation, these models still lack the controllable electrophysiology needed to generate realistic signals. For example, repolarization in the ventricles has volumetric gradients (both transmural and apico-basal) that are needed for realistic T waves. Using a cohort of volumetric ventricular models for the QRST complex in combination with a statistical shape model of the atria overcomes such limitations. Moreover, even if shape models were used for both the atria and the ventricles, the two would not necessarily be linked, which could lead to unphysiological configurations.

The feature analysis showed that the synthetic signals exhibit interval and amplitude features that are mostly in line with the feature ranges reported in PTB-XL for the healthy and the pathological cohorts. Fig. 6 shows that the change in feature values between healthy and diseased ECGs is consistent between the simulated and the clinical data, even though absolute feature ranges sometimes deviate. However, the simulated features neither cover the full range of values that occur in clinical practice nor have distributions that accurately coincide with the clinical ones. This could be attributed to the fact that the atrial model population was parameterized using ECG biomarker ranges for P wave amplitudes and durations reported for extensive clinical cohorts, partially comprising > 200,000 subjects⁴⁹,⁵⁰, which might lead to slightly different feature distributions compared to those extractable from PTB-XL. The QRST complexes were parameterized according to experimental or clinical data obtained from smaller cohorts that may not be representative of the entire population, especially in terms of age (covered range: 30–65 years) and comorbidities (healthy subjects only). Some parameters also had to be estimated because no direct clinical or experimental data are available for them. One such example concerns the heightened T wave amplitudes, which stem from repolarization gradients in the ventricles that generate large cardiac sources. While the existence of repolarization gradients is known³¹,³², their exact nature is not well understood and is thus hard to parameterize for a patient population. Therefore, the synthetic signals are not fully representative of an entire population such as the one in PTB-XL.

The feature distributions in the synthetic cohort are, however, internally consistent, i.e., unrealistic combinations of different features are unlikely to occur. For example, the RR intervals in the simulated healthy cohort do not exceed 1000 ms, and correspondingly the QT intervals cover only the lower part of the clinical QT interval range (compare Fig. 5 and Table 6). This is because multivariate normal distributions were used during the synthesis procedure, ensuring that clinically reported correlations between ECG biomarkers (such as P wave duration and PQ interval, or QT duration and RR interval) are taken into account. This is also advantageous because it makes it possible to account for physiological responses that alter the QT duration or PQ interval. In the case of exercise, for example, an increase in heart rate beyond the reported physiological range of 67–100 bpm can be accounted for by shortening the QT interval. Furthermore, detailed mechanistic electrophysiological models of the heart were employed, and simulation parameters were chosen within reasonable ranges reported in the literature, leading to realistic single beat P waves and QRST complexes in most cases. It must be noted that PTB-XL lacks clinical data for fibrotic atrial cardiomyopathy and interatrial conduction block. Thus, the fidelity of ECG features within these two classes could not be assessed by comparison with the same clinical ECG resources. However, we have shown in previous work that the simulated P waves reproduce characteristic changes in key diagnostic ECG markers²¹,⁵¹. These include a prolongation of the P wave duration compared to the control simulations due to delayed depolarization in fibrotic patches, as well as a retrograde activation of the left atrium through interatrial conduction pathways on the posterior wall. Moreover, as shown in Fig. 6, the P wave morphology, and therefore the P wave amplitude, in lead aVL is markedly changed in virtual interatrial conduction block patients compared to the healthy cohort. In virtual patients with fibrotic atrial cardiomyopathy, the most pronounced decrease in P wave amplitude occurs in the lateral leads, since scar tissue does not contribute to the overall source distribution in the atria (compare Fig. 6).
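Correlated sampling of this kind can be sketched with a Cholesky factor of the covariance matrix: independent standard normal draws are mixed so that, for example, long RR intervals tend to come with long QT intervals. The means, standard deviations, correlation (0.7), and bounds below are illustrative placeholders, not the parameters used for MedalCare-XL, and the rejection step for bounds is our assumption; the released synthesis code⁵⁸ documents the actual procedure.

```python
import math
import random


def cholesky(cov):
    """Return lower-triangular L with L * L^T == cov (cov symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_mvn(mean, cov, rng, bounds=None, max_tries=1000):
    """Draw one vector from N(mean, cov), rejecting draws outside optional per-feature bounds."""
    L = cholesky(cov)
    n = len(mean)
    for _ in range(max_tries):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # x = mean + L z has covariance L L^T = cov
        x = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if bounds is None or all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds)):
            return x
    raise RuntimeError("no draw fell within the bounds")


# Illustrative values only: RR ~ N(850, 80^2) ms, QT ~ N(390, 25^2) ms, correlation 0.7
MEAN = [850.0, 390.0]
SD_RR, SD_QT, RHO = 80.0, 25.0, 0.7
COV = [[SD_RR ** 2, RHO * SD_RR * SD_QT],
       [RHO * SD_RR * SD_QT, SD_QT ** 2]]
BOUNDS = [(600.0, 1000.0), (300.0, 460.0)]  # e.g. cap RR at 1000 ms as in the healthy cohort

if __name__ == "__main__":
    rng = random.Random(1)
    print([round(v) for v in sample_mvn(MEAN, COV, rng, bounds=BOUNDS)])
```

Because RR and QT are drawn jointly rather than independently, a short RR interval is unlikely to be paired with a long QT interval, which is the internal consistency described above.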

Table 5 In the MedalCare-XL dataset two classes are available: (i) the WP2_largeDataset_Noise class, which contains the simulated ECG signals, and (ii) the WP2_largeDataset_ParameterFiles class, which contains all the parameter files used to run the simulations.


Table 6 Mean values μ and standard deviation σ for all features and all 12 leads for healthy simulated (“sim”) and healthy clinical (“clin”) ECG signals.


The clinical Turing tests aimed to investigate whether the 12 lead ECG signals exhibit morphological features in accordance with clinical diagnostic criteria, as routinely assessed by clinicians, under both normal healthy control and pathological conditions. In the clinical Turing test on normal healthy control signals, clinicians identified whether a signal was simulated or clinical with an accuracy of 77%. The primary ECG features leading to identification as a synthetic signal included fractionation and R wave progression of the QRS complex under certain disease conditions. Before the QRS complex was scaled relative to the P wave, synthetic signals were commonly identified by improperly matched amplitudes under normal heart rhythm. Spiked T waves with high amplitudes or biphasic T waves could also be observed. Real ECG signals also tended to exhibit certain types of noise not accounted for in the simulations, including electrical disturbances and large baseline wander, which must either be modeled in the simulated data or removed for the clinical Turing test. In the clinical Turing test on pathological ECGs, the accuracy of type classification (measured vs. simulated) increased to 83%, indicating that type classification was easier for synthetic pathological data. Misdiagnosis was common across both signal types, as the two expert cardiologists diagnosed the pathologies correctly in only 51% of cases. More clinicians should perform the clinical Turing test on pathology classification to give a better indication of the true accuracy of ECG diagnosis on both simulated and clinical signals. Furthermore, the clinical Turing test must be conducted on a larger number of signals beyond the 100 analyzed, ideally on the entire synthetic ECG database.

Regardless, clinicians performed differently on clinical 12 lead ECG signals than on those taken from the synthetic ECG database. This is highlighted by the confusion matrices constructed for all pathological cases from the results for both measured and simulated signals (Fig. 8(D)). Clinical signals were assigned the correct pathology with an accuracy of 62%. Among the clinical signals, cases of LAO, 1AVB, and MI were commonly mistaken for a 12 lead ECG in normal sinus rhythm by both clinicians. Simulated signals, on the other hand, were classified correctly for the underlying pathology in only 39% of cases. None of the modeled pathologies could be diagnosed with 100% accuracy by either clinician using standard guidelines for ECG diagnostics across both simulated and clinical signals.
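A confusion matrix of this kind, with rows for the true pathology and columns for the diagnosed label, can be tallied as sketched below. The sketch assumes one diagnosis per signal and clinician; since clinicians could give more than one annotation, how multiple labels were scored is not covered here. Class names follow the label list above, and the example labels are made up.

```python
def confusion_matrix(true_labels, diagnosed_labels, classes):
    """Count signals per (true pathology, diagnosed label); rows are true, columns diagnosed."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, d in zip(true_labels, diagnosed_labels):
        matrix[index[t]][index[d]] += 1
    return matrix


def per_class_accuracy(matrix, classes):
    """Correct diagnoses divided by the row total, for every class that occurs as ground truth."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes) if sum(matrix[i])}


def overall_accuracy(matrix):
    """Trace of the matrix divided by the total number of diagnoses."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


if __name__ == "__main__":
    classes = ["1AVB", "LAO", "LBBB", "MI", "NORM", "RBBB"]
    truth = ["LBBB", "LBBB", "RBBB", "MI", "1AVB"]
    diagnosed = ["LBBB", "MI", "MI", "MI", "NORM"]
    m = confusion_matrix(truth, diagnosed, classes)
    print(per_class_accuracy(m, classes), overall_accuracy(m))
```

Off-diagonal entries such as (LBBB, MI) or (1AVB, NORM) show the kinds of confusion discussed in this section; building one matrix for measured and one for simulated signals makes the two directly comparable.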

The largest differences in diagnostic outcomes between the simulated and clinical datasets are observed for LBBB and RBBB. Within the simulated ECGs, LBBB and RBBB were commonly mistaken for MI. This stems from the fact that some morphological features in these signals are characteristic of infarction or aneurysm within the heart. In some RBBB signals, for example, V1 is predominantly negative, which in combination with large Q waves in I and aVL could indicate an anterior infarction. LAO experienced the highest level of misdiagnosis in both the clinical and the simulated data, with similar performance for the two. This could be attributed to the fact that LAO manifests only in the P wave, where morphological deviations are harder to detect because its amplitude is substantially lower than that of the QRS complex. The high misdiagnosis rates for LBBB and RBBB in the simulated dataset, and the resulting differences from the clinical signals, may stem from the inability of the synthetic ECG database to reproduce the full complexity of the underlying diseases. For example, remodeling within the ventricles under such conditions may lead to slower conduction and altered wave morphology. Furthermore, only complete LBBB or RBBB was modeled, whereas in clinical practice there are varying degrees of conduction block. The lower diagnostic accuracy for MI and 1AVB on the clinical signals compared to the simulated ECGs could likewise stem from a lack of complexity in the simulated setup that eases diagnosis.

Some results from the Turing test of pathological cases indicate that standard protocols for ECG classification by clinicians are not sufficient. Machine learning algorithms may offer a means to aid ECG diagnosis and improve the reliability of clinical decisions. It is therefore important to provide reference data to test such algorithms. An earlier benchmark study demonstrated this with the large dataset of clinical ECGs in PTB-XL⁵². In that work, deep learning algorithms were found, for example, to achieve diagnostic success rates in the range of 80–95 percent depending on the metric used. The clinical PTB-XL dataset was also instrumental in demonstrating the clear improvement of algorithms based on self-supervised learning⁵³. Nevertheless, clinical databases strongly depend on the quality and the terminology used to label the ECG data. In addition, large publicly available clinical datasets are still rare. This is where benchmarking ML algorithms with validated simulated datasets can become an important tool in the development of new algorithms for ECG classification. Machine learning algorithms could then also be trained and tested on real and synthetic data in different combinations. Databases of simulated ECGs such as the MedalCare-XL set presented in this paper also provide an important link between the growing knowledge of the cardiac modeling community and the practical development of algorithms for data analysis.

To reduce the mismatch in performance between clinical and synthetic signals, further parameter tuning is needed. Iterative clinical Turing tests would be beneficial for updating parameter ranges to mitigate the prevalence of undesirable ECG features within the entire database. Refinement could also be guided by sensitivity analysis that provides more information on the relationship between model parameters and the morphological traits of simulated signals as assessed by clinicians. However, this requires a large investment due to the variety of clinical pathological classes and the limited knowledge of the electrophysiology under such conditions. Certain important ECG features may also be detected by machine learning analysis⁵² to provide insight into the refined sub-classification of pathological cases beyond current routine diagnoses.

When using the synthetic ECGs as input data for machine learning applications, samples generated from the same anatomical model should belong to only one of the training, validation, or test sets. Since the variation in P wave and QRST complex morphology stems predominantly from anatomical differences in the model cohort⁵⁴, splitting the data in this fashion helps to prevent overfitting to similar or almost identical samples that were already seen during training⁵⁵.
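A grouped split of this kind assigns whole anatomical models, rather than individual samples, to the folds. In the sketch below, `anatomy_of` is a hypothetical mapping from sample ID to anatomical model; deriving it from the released parameter files is not shown, and the 70/15/15 fractions are an arbitrary choice. Note that MedalCare-XL already provides predefined training, validation, and test folds, so this only matters for custom splits.

```python
import random


def split_by_anatomy(sample_ids, anatomy_of, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole anatomical models to train/val/test so that no anatomy spans two folds."""
    anatomies = sorted({anatomy_of[s] for s in sample_ids})
    rng = random.Random(seed)
    rng.shuffle(anatomies)
    n = len(anatomies)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    fold_of_anatomy = {}
    for i, anatomy in enumerate(anatomies):
        if i < n_train:
            fold_of_anatomy[anatomy] = "train"
        elif i < n_train + n_val:
            fold_of_anatomy[anatomy] = "val"
        else:
            fold_of_anatomy[anatomy] = "test"  # remaining anatomies go to the test fold
    folds = {"train": [], "val": [], "test": []}
    for s in sample_ids:
        folds[fold_of_anatomy[anatomy_of[s]]].append(s)
    return folds


if __name__ == "__main__":
    # Hypothetical: 100 samples generated from 20 anatomical models
    ids = [f"s{i}" for i in range(100)]
    anatomy = {s: f"a{i % 20}" for i, s in enumerate(ids)}
    print({k: len(v) for k, v in split_by_anatomy(ids, anatomy).items()})
```

A random per-sample split would, by contrast, place near-duplicates from one anatomy in both training and test sets and inflate the test scores.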

When using the simulated data to extend or replace small or imbalanced clinical datasets, users are advised to use the signals with superimposed realistic ECG noise instead of the raw signal traces. In this way, the simulated signals exhibit noise characteristics that are also observable in clinical ECGs. Thus, possible domain gaps can be reduced, eventually leading to improved classification performance on actual clinical data.
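The released dataset already contains noise-superimposed signals, so the following is only an illustration of the underlying idea: a recorded noise trace is scaled to a chosen signal-to-noise ratio and added to a clean simulated lead. The power-based SNR definition in dB and the function name are assumptions, not the authors' noise model.

```python
import math


def add_noise_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so that 10*log10(P_signal / P_noise) equals snr_db.

    Power is the mean squared amplitude; the noise trace must be at least as long as the signal.
    """
    if len(noise) < len(signal):
        raise ValueError("noise trace is shorter than the signal")
    noise = noise[: len(signal)]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    # After scaling, the noise power is p_signal / 10**(snr_db / 10)
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * i / 250) for i in range(1000)]        # toy "ECG"
    noise = [math.sin(0.37 * i) + 0.5 * math.cos(1.3 * i) for i in range(1000)]  # toy noise
    print(add_noise_at_snr(clean, noise, 20.0)[:3])
```

Sweeping `snr_db` over a range of values is one way to expose a model to noise levels similar to those seen in clinical recordings.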

Code availability

Code for solving the Eikonal equation and the forward problem of electrocardiography using the boundary element method, as used for the atrial simulations, is openly available (Stenroos et al.⁵⁶, Schuler et al.⁵⁷). The electrophysiology of the ventricular-torso model was simulated using the proprietary CARPentry-Pro software (NumeriCor, Graz, Austria). Similar simulations can also be carried out with the publicly available openCARP simulation framework⁴⁰,⁴¹. Python code for synthesizing single beat P waves and QRST complexes into a 10 s time series using multivariate normal distributions for amplitude scaling and interval selection is publicly available⁵⁸.
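The step from single beats to a 10 s recording can be sketched as follows: the QRST complex is placed at a PQ offset after the P wave onset, and the resulting beat is repeated at successive RR intervals. This is not the published ECG-Synthesization code⁵⁸; the function names, the 500 Hz default sampling rate, and the omission of amplitude scaling are assumptions made for illustration.

```python
def assemble_beat(p_wave, qrst, pq_samples):
    """Place the QRST so that its onset lies pq_samples after the P wave onset."""
    length = max(len(p_wave), pq_samples + len(qrst))
    beat = [0.0] * length
    for i, v in enumerate(p_wave):
        beat[i] += v
    for i, v in enumerate(qrst):
        beat[pq_samples + i] += v
    return beat


def tile_beats(beat, rr_intervals_ms, fs=500, duration_s=10.0):
    """Repeat `beat` at successive RR intervals and crop the result to `duration_s` seconds."""
    n_total = int(round(duration_s * fs))
    signal = [0.0] * n_total
    onset = 0
    rr_iter = iter(rr_intervals_ms)
    while onset < n_total:
        for i, v in enumerate(beat):
            if onset + i >= n_total:
                break  # crop the last beat at the end of the recording
            signal[onset + i] += v
        try:
            rr = next(rr_iter)
        except StopIteration:
            break  # no more RR intervals supplied
        onset += int(round(rr * fs / 1000))
    return signal


if __name__ == "__main__":
    beat = assemble_beat([0.1, 0.2, 0.1], [0.0, 1.0, -0.3, 0.2], pq_samples=80)
    ecg = tile_beats(beat, [850.0] * 12)  # hypothetical constant RR of 850 ms
    print(len(ecg))
```

In a fuller version, the PQ offset, the RR intervals, and per-beat amplitude scaling factors would be drawn jointly, for instance from a multivariate normal distribution, so that the interval combinations remain physiologically consistent.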

References

Wagner, P. et al. PTB-XL, a large publicly available electrocardiography dataset. Scientific Data 7, 154, https://doi.org/10.1038/s41597-020-0495-6 (2020).
Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nature Machine Intelligence 3, 199–217, https://doi.org/10.1038/s42256-021-00307-0 (2021).
Puyol-Antón, E. et al. Fairness in cardiac MR image analysis: An investigation of bias due to data imbalance in deep learning based segmentation. In de Bruijne, M. et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, 413–423, https://doi.org/10.1007/978-3-030-87199-4_39 (Springer International Publishing, Cham, 2021).
Pilia, N. et al. Quantification and classification of potassium and calcium disorders with the electrocardiogram: What do clinical studies, modeling, and reconstruction tell us? APL Bioeng 4, 041501, https://doi.org/10.1063/5.0018504 (2020).
Luongo, G. et al. Hybrid machine learning to localize atrial flutter substrates using the surface 12-lead electrocardiogram. EP Europace https://doi.org/10.1093/europace/euab322 (2022).
Nagel, C., et al. (eds.) Statistical Atlases and Computational Models of the Heart. Multi-Disease, Multi-View, and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge, 38–47, https://doi.org/10.1007/978-3-030-93722-5_5 (2022).
Luongo, G. et al. Machine learning enables noninvasive prediction of atrial fibrillation driver location and acute pulmonary vein ablation success using the 12-lead ECG. Cardiovascular Digital Health Journal 2, 126–136, https://doi.org/10.1016/j.cvdhj.2021.03.002 (2021).
American Heart Association Writing Group on Myocardial Segmentation and Registration for Cardiac Imaging. et al. Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart: statement for healthcare professionals from the cardiac imaging committee of the council on clinical cardiology of the american heart association. Circulation 105, 539–542, https://doi.org/10.1161/hc0402.102975 (2002).
Gillette, K. et al. MedalCare-XL. Zenodo https://doi.org/10.5281/zenodo.8068944 (2023).
Nagel, C., Schuler, S., Dössel, O. & Loewe, A. A bi-atrial statistical shape model and 100 volumetric anatomical models of the atria. Zenodo https://doi.org/10.5281/zenodo.4309957 (2020).
CIBC. Seg3D: Volumetric image segmentation and visualization. Scientific Computing and Imaging (2016).
Payer, C., Štern, D., Bischof, H. & Urschler, M. Multi-label whole heart segmentation using cnns and anatomical label configurations. In Statistical Atlases and Computational Models of the Heart. ACDC and MMWHS Challenges: 8th International Workshop, STACOM 2017, Held in Conjunction with MICCAI 2017, Quebec City, Canada, September 10-14, 2017, Revised Selected Papers, 190–198, https://doi.org/10.1007/978-3-319-75541-0_20 (Springer, 2018).
Chetverikov, D., Svirko, D., Stepanov, D. & Krsek, P. The trimmed iterative closest point algorithm. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, vol. 3, 545–548, https://doi.org/10.1109/ICPR.2002.1047997 (IEEE, 2002).
Prassl, A. J. et al. Automatically generated, anatomically accurate meshes for cardiac electrophysiology problems. IEEE Transactions on Biomedical Engineering 56, 1318–1330, https://doi.org/10.1109/TBME.2009.2014243 (2009).
Gillette, K. et al. A framework for the generation of digital twins of cardiac electrophysiology from clinical 12-leads ecgs. Medical Image Analysis 71, 102080, https://doi.org/10.1016/j.media.2021.102080 (2021).
Bayer, J. et al. Universal ventricular coordinates: A generic framework for describing position within the heart and transferring data. Medical Image Analysis 45, 83–93, https://doi.org/10.1016/j.media.2018.01.005 (2018).
Nagel, C., Schuler, S., Dössel, O. & Loewe, A. A bi-atrial statistical shape model for large-scale in silico studies of human atria: Model development and application to ECG simulations. Medical Image Analysis 74, 102210, https://doi.org/10.1016/j.media.2021.102210 (2021).
Azzolin, L. et al. AugmentA: Patient-specific augmented atrial model generation tool. Computerized Medical Imaging and Graphics 102265, https://doi.org/10.1016/j.compmedimag.2023.102265 (2023).
Zheng, T., Azzolin, L., Sánchez, J., Dössel, O. & Loewe, A. An automate pipeline for generating fiber orientation and region annotation in patient specific atrial models. Current Directions in Biomedical Engineering 7, 136–139, https://doi.org/10.1515/cdbme-2021-2035 (2021).
Lang, R. M. et al. Recommendations for cardiac chamber quantification by echocardiography in adults: An update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. Eur Heart J Cardiovasc Imaging 16, 233–70, https://doi.org/10.1093/ehjci/jev014 (2015).
Nagel, C. et al. Non-invasive and quantitative estimation of left atrial fibrosis based on P waves of the 12-lead ECG - a large-scale computational study covering anatomical variability. J Clin Med 10, https://doi.org/10.3390/jcm10081797 (2021).
Pishchulin, L., Wuhrer, S., Helten, T., Theobalt, C. & Schiele, B. Building statistical shape spaces for 3D human modeling. Pattern Recognition 67, 276–286, https://doi.org/10.1016/j.patcog.2017.02.018 (2017).
Durrer, D. et al. Total excitation of the isolated human heart. Circulation 41, 899–912, https://doi.org/10.1161/01.CIR.41.6.899 (1970).
Kassebaum, D. G. & Van Dyke, A. R. Electrophysiological effects of isoproterenol on purkinje fibers of the heart. Circulation Research 19, 940–946, https://doi.org/10.1161/01.RES.19.5.940 (1966).
Bayer, J. D., Blake, R. C., Plank, G. & Trayanova, N. A. A novel rule-based algorithm for assigning myocardial fiber orientation to computational heart models. Annals of biomedical engineering 40, 2243–2254, https://doi.org/10.1007/s10439-012-0593-5 (2012).
Streeter, D. D. Jr, Spotnitz, H. M., Patel, D. P., Ross, J. Jr & Sonnenblick, E. H. Fiber orientation in the canine left ventricle during diastole and systole. Circulation Research 24, 339–347, https://doi.org/10.1161/01.RES.24.3.339 (1969).
Taggart, P. et al. Inhomogeneous transmural conduction during early ischaemia in patients with coronary artery disease. Journal of Molecular and Cellular Cardiology 32, 621–630, https://doi.org/10.1006/jmcc.2000.1105 (2000).
Roberts, D. E. & Scher, A. M. Effect of tissue anisotropy on extracellular potential fields in canine myocardium in situ. Circulation Research 50, 342–351, https://doi.org/10.1161/01.RES.50.3.342 (1982).
Keller, D. U., Weber, F. M., Seemann, G. & Dossel, O. Ranking the influence of tissue conductivities on forward-calculated ecgs. IEEE Transactions on Biomedical Engineering 57, 1568–1576, https://doi.org/10.1109/TBME.2010.2046485 (2010).
Mitchell, C. C. & Schaeffer, D. G. A two-current model for the dynamics of cardiac membrane. Bulletin of Mathematical Biology 65, 767–793, https://doi.org/10.1016/S0092-8240(03)00041-7 (2003).
Opthof, T. et al. Cardiac activation–repolarization patterns and ion channel expression mapping in intact isolated normal human hearts. Heart Rhythm 14, 265–272, https://doi.org/10.1016/j.hrthm.2016.10.010 (2017).
Opthof, T. et al. Dispersion in ventricular repolarization in the human, canine and porcine heart. Progress in Biophysics and Molecular Biology 120, 222–235, https://doi.org/10.1016/j.pbiomolbio.2016.01.007 (2016).
Keller, D. U., Weiss, D. L., Dossel, O. & Seemann, G. Influence of I_Ks heterogeneities on the genesis of the t-wave: A computational evaluation. IEEE Transactions on Biomedical Engineering 59, 311–322, https://doi.org/10.1109/tbme.2011.2168397 (2011).
Neic, A., Gsell, M. A. F., Karabelas, E., Prassl, A. J. & Plank, G. Automating image-based mesh generation and manipulation tasks in cardiac modeling workflows using Meshtool. SoftwareX 11, 100454, https://doi.org/10.1016/j.softx.2020.100454 (2020).
Mendonca Costa, C., Plank, G., Rinaldi, C. A., Niederer, S. A. & Bishop, M. J. Modeling the electrophysiological properties of the infarct border zone. Frontiers in Physiology 9, 356, https://doi.org/10.3389/fphys.2018.00356 (2018).
Loewe, A., Wülfers, E. M. & Seemann, G. Cardiac ischemia-insights from computational models. Herzschrittmacher & Elektrophysiologie 29, 48–56, https://doi.org/10.1007/s00399-017-0539-6 (2018).
Neic, A. et al. Efficient computation of electrograms and ECGs in human whole heart simulations using a reaction-eikonal model. Journal of Computational Physics 346, 191–211, https://doi.org/10.1016/j.jcp.2017.06.020 (2017).
Potse, M. Scalable and accurate ecg simulation for reaction-diffusion models of the human heart. Frontiers in physiology 9, 370, https://doi.org/10.3389/fphys.2018.00370 (2018).
Vigmond, E., Dos Santos, R. W., Prassl, A., Deo, M. & Plank, G. Solvers for the cardiac bidomain equations. Progress in Biophysics and Molecular Biology 96, 3–18, https://doi.org/10.1016/j.pbiomolbio.2007.07.012 (2008).
Plank, G. et al. The openCARP simulation environment for cardiac electrophysiology. Computer Methods and Programs in Biomedicine 208, 106223, https://doi.org/10.1016/j.cmpb.2021.106223 (2021).
openCARP Consortium et al. openCARP v11.0. RADAR4KIT https://doi.org/10.35097/703 (2022).
Fu, Z., Kirby, R. M. & Whitaker, R. T. A fast iterative method for solving the eikonal equation on tetrahedral domains. SIAM J Sci Comput 35, c473–c494, https://doi.org/10.1137/120881956 (2013).
Loewe, A. et al. Patient-specific identification of atrial flutter vulnerability–a computational approach to reveal latent reentry pathways. Frontiers in Physiology 9, https://doi.org/10.3389/fphys.2018.01910 (2019).
Pilia, N. et al. ECGdeli - An open source ECG delineation toolbox for MATLAB. SoftwareX 13, 100639, https://doi.org/10.1016/j.softx.2020.100639 (2021).
Kantelhardt, J. W., Havlin, S. & Ivanov, P. C. Modeling transient correlations in heartbeat dynamics during sleep. Europhysics Letters (EPL) 62, 147–153, https://doi.org/10.1209/epl/i2003-00332-7 (2003).
Petrenas, A. et al. Electrocardiogram modeling during paroxysmal atrial fibrillation: application to the detection of brief episodes. Physiol Meas 38, 2058–2080, https://doi.org/10.1088/1361-6579/aa9153 (2017).
Strodthoff, N. et al. PTB-XL+, a comprehensive electrocardiographic feature dataset. Scientific Data 10, 1–11, https://doi.org/10.1038/s41597-023-02153-8 (2023).
Strocchi, M. et al. A publicly available virtual cohort of four-chamber heart meshes for cardiac electro-mechanics simulations. PloS one 15, e0235145, https://doi.org/10.1371/journal.pone.0235145 (2020).
Nielsen, J. B. et al. P-wave duration and the risk of atrial fibrillation: Results from the Copenhagen ECG study. Heart Rhythm 12, 1887–1895, https://doi.org/10.1016/j.hrthm.2015.04.026 (2015).
Nagel, C., Pilia, N., Loewe, A. & Dössel, O. Quantification of interpatient 12-lead ECG variabilities within a healthy cohort. Current Directions in Biomedical Engineering 6, 493–496, https://doi.org/10.1515/cdbme-2020-3127 (2020).
Bender, J. et al. A Large-scale Virtual Patient Cohort to Study ECG Features of Interatrial Conduction Block. Current Directions in Biomedical Engineering 8, 97–100, https://doi.org/10.1515/cdbme-2022-1026 (2022).
Strodthoff, N., Wagner, P., Schaeffter, T. & Samek, W. Deep learning for ECG analysis: Benchmarks and insights from PTB-XL. IEEE Journal of Biomedical and Health Informatics 25, 1519–1528, https://doi.org/10.1109/JBHI.2020.3022989 (2020).
Mehari, T. & Strodthoff, N. Self-supervised representation learning from 12-lead ECG data. Computers in Biology and Medicine 141, 105114, https://doi.org/10.1016/j.compbiomed.2021.105114 (2022).
Dössel, O., Luongo, G., Nagel, C. & Loewe, A. Computer modeling of the heart for ECG interpretation—a review. Hearts 2, 350–368, https://doi.org/10.3390/hearts2030028 (2021).
Luongo, G. et al. Automatic ECG-based discrimination of 20 atrial flutter mechanisms: Influence of atrial and torso geometries. In Computing in Cardiology, vol. 47, 1–4, https://doi.org/10.22489/CinC.2020.066 (IEEE, 2020).
Stenroos, M., Mäntynen, V. & Nenonen, J. A Matlab library for solving quasi-static volume conduction problems using the boundary element method. Computer Methods and Programs in Biomedicine 88, 256–263, https://doi.org/10.1016/j.cmpb.2007.09.004 (2007).
Schuler, S. & Loewe, A. FIM_Eikonal: v1.0. Zenodo https://doi.org/10.5281/zenodo.7217554 (2022).
Nagel, C., Eichhorn, N. & Loewe, A. ECG-Synthesization: v1.0. Zenodo https://doi.org/10.5281/zenodo.7293625 (2022).
Gillette, K. et al. Automated framework for the inclusion of a His–Purkinje system in cardiac digital twins of ventricular electrophysiology. Annals of Biomedical Engineering 49, 3143–3153, https://doi.org/10.1007/s10439-021-02825-9 (2021).
Odille, F., Liu, S., van Dam, P. & Felblinger, J. Statistical variations of heart orientation in healthy adults. In Computing in Cardiology Conference (CinC), vol. 44, https://doi.org/10.22489/CinC.2017.225-058 (2017).
Loewe, A. et al. Left and right atrial contribution to the P-wave in realistic computational models. In van Assen, H., Bovendeerd, P. & Delhaas, T. (eds.) Functional Imaging and Modeling of the Heart, vol. 9126 of Lecture Notes in Computer Science, 439–447, https://doi.org/10.1007/978-3-319-20309-6 (2015).


Acknowledgements

This work was supported by the EMPIR programme co-financed by the participating states and from the European Union’s Horizon 2020 research and innovation programme under grant MedalCare 18HLT07. The authors also acknowledge the support of the British Heart Foundation Centre for Research Excellence Award III (RE/18/5/34216). SEW is supported by the British Heart Foundation (FS/20/26/34952). The authors declare that there are no relevant financial or non-financial competing interests to report. We thank the cardiologists Dr. Anna-Sophie Eberl, Dr. Ewald Kolesnik, Dr. Martin Manninger-Wünscher, Dr. Stefan Kurath-Koller, Dr. Susanne Prassl, and Dr. Ursula Rohrer for their involvement in the clinical Turing tests and for their feedback on the online platform and the ECG signal morphology. We also thank Thomas Ebner and his colleagues from the Know-Center for the great collaboration and for rapidly implementing our requirements in their online platform TimeFuse.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: Karli Gillette, Matthias A. F. Gsell, Claudia Nagel, Olaf Dössel, Gernot Plank, Axel Loewe.

Authors and Affiliations

Gottfried Schatz Research Center: Division of Medical Physics and Biophysics, Medical University of Graz, Graz, Austria
Karli Gillette, Matthias A. F. Gsell & Gernot Plank
BioTechMed-Graz, Graz, Austria
Karli Gillette & Gernot Plank
Institute of Biomedical Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Claudia Nagel, Jule Bender, Olaf Dössel & Axel Loewe
Physikalisch-Technische Bundesanstalt, National Metrology Institute, Berlin, Germany
Benjamin Winkler, Markus Bär & Tobias Schäffter
King’s College London, London, United Kingdom
Steven E. Williams & Tobias Schäffter
University of Edinburgh, Edinburgh, United Kingdom
Steven E. Williams
Biomedical Engineering, Technische Universität Berlin, Einstein Centre Digital Future, Berlin, Germany
Tobias Schäffter

Contributions

All authors were involved in the writing and revision of the manuscript. K.G. built the ventricular-torso model cohort, parameterized and performed the simulations of the QRST complexes under both sinus rhythm and disease conditions, conducted the analysis of the clinical Turing tests, and organized the revision of the manuscript. M.G. managed the development of the testing platform for the clinical Turing test and developed tools to evaluate the test results. He also assisted in all aspects of simulation and model building. C.N. built the atrial model cohort, parameterized and performed the P wave simulations under both sinus rhythm and disease conditions, designed and implemented the synthesization model, and led the technical validation of simulated and clinical ECG biomarkers. J.B. performed and validated P wave simulations for interatrial conduction block. B.W. ran simulations on the ventricular-torso model cohort and extracted features from the clinical ECGs for technical validation. S.W. provided clinical insight and feedback on the clinical Turing tests. He also assisted with the parameterization of both models. M.B. was involved in funding acquisition, provided guidance on relevant data processing and metrology aspects, and reviewed and edited the final manuscript. T.S. provided the motivation behind the study and gave clinical insight and guidance on relevant disease pathologies. O.D. was involved in funding acquisition and supervised the atrial model cohort simulations. G.P. was involved in funding acquisition and supervised the ventricular-torso model cohort simulations. A.L. was involved in funding acquisition and supervised the atrial model cohort simulations.

Corresponding authors

Correspondence to Gernot Plank or Axel Loewe.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Gillette, K., Gsell, M.A.F., Nagel, C. et al. MedalCare-XL: 16,900 healthy and pathological synthetic 12 lead ECGs from electrophysiological simulations. Sci Data 10, 531 (2023). https://doi.org/10.1038/s41597-023-02416-4


Received: 29 March 2023
Accepted: 25 July 2023
Published: 08 August 2023
DOI: https://doi.org/10.1038/s41597-023-02416-4

This article is cited by

Comparison of discrimination and calibration performance of ECG-based machine learning models for prediction of new-onset atrial fibrillation
- Giovanni Baj
- Ilaria Gandin
- Giulia Barbati
BMC Medical Research Methodology (2023)
PTB-XL+, a comprehensive electrocardiographic feature dataset
- Nils Strodthoff
- Temesgen Mehari
- Tobias Schaeffter
Scientific Data (2023)
