Artificial intelligence-enabled fully automated detection of cardiac amyloidosis using electrocardiograms and echocardiograms

Patients with rare conditions such as cardiac amyloidosis (CA) are difficult to identify, given the similarity of disease manifestations to more prevalent disorders. The deployment of approved therapies for CA has been limited by delayed diagnosis of this disease. Artificial intelligence (AI) could enable detection of rare diseases. Here we present a pipeline for CA detection using AI models with electrocardiograms (ECG) or echocardiograms as inputs. These models, trained and validated on 3 and 5 academic medical centers (AMCs), respectively, detect CA with C-statistics of 0.85–0.91 for ECG and 0.89–1.00 for echocardiography. Simulating deployment on 2 AMCs indicated a positive predictive value (PPV) for the ECG model of 3–4% at 52–71% recall. Pre-screening with ECG enhances the echocardiography model's performance at 67% recall, raising the PPV from 33% to 74–77%. In conclusion, we developed an automated strategy to augment CA detection, which should be generalizable to other rare cardiac diseases.

Cardiac amyloidosis arises from deposition of misfolded proteins in the heart muscle, which results in a restrictive-type cardiomyopathy and commonly progresses to heart failure, conduction system disease, and cardiac death. Cardiac amyloidosis is subclassified based on the specific protein involved, with the major subtypes being transthyretin amyloidosis (ATTR cardiac amyloidosis), caused by misfolding of the transthyretin protein, and light chain amyloidosis (AL cardiac amyloidosis), caused by accumulation of immunoglobulin light chains 1 . Cardiac amyloidosis was previously believed to be rare, but recent reports have suggested that it is largely underdiagnosed [2][3][4][5][6] . The imperative of identifying patients has dramatically increased with the advent of therapies for specific forms of cardiac amyloidosis [7][8][9][10][11] .
The clinical manifestations of cardiac amyloidosis-including conduction system disease, vitreous opacity, carpal tunnel syndrome, orthostatic hypotension, polyneuropathy, spinal stenosis, kidney dysfunction, atrial fibrillation, and heart failure-are also commonplace in aging, thus making detection challenging. These signs and symptoms are distributed across multiple organs and tissues (and therefore medical disciplines), and the probabilistic weighting of so many different features is forbidding, even in the unlikely event that all of the relevant exam findings, medical history details and diagnostic test results were available to a given practitioner. Furthermore, definitive diagnostic tests for cardiac amyloidosis-which include tissue biopsy and some forms of radionuclide scintigraphy-are costly and have associated risk, and thus are not plausible as screening approaches 12 .
Cardiac amyloidosis nonetheless has predictive features captured by less expensive and more widely available diagnostic modalities such as electrocardiography [13][14][15][16] (ECG) and echocardiography 17,18 , but the features themselves are not highly specific and thus often missed. Also, some of the recently highlighted echocardiographic features require providers to master specialized software packages 19 , which are time-consuming to use and therefore tend to be employed in practice only after the disease is suspected. A truly generalizable detection strategy should require no specialized acquisition or processing and should rely on only widely available input data. However, the low existing prevalence of the disease places high demands on model performance to reduce the rate of costly false positives, something that has not been achieved to date.
Here, we show a human-interpretation-free machine learning pipeline that accurately detects cardiac amyloidosis using a combination of ECG and echocardiography across multiple institutions.

Results
An ECG model detects cardiac amyloidosis effectively across multiple institutions. Electrocardiography is the most widely available cardiac diagnostic test and is frequently performed in primary care settings at low cost. Since many of the initial manifestations of cardiac amyloidosis are likely to result in a presentation to a primary care physician, we sought first to develop a model based solely on ECG. We constructed ECG-derivation, ECG-validation and ECG-test groups from Brigham and Women's Hospital (BWH) consisting of 5495, 2247 and 3191 ECG studies, respectively (Supplementary Fig. 1, Methods). We tested the model's performance using data from a held-out partition of the BWH data, as well as distinct cohorts from Massachusetts General Hospital (MGH) and the University of California San Francisco (UCSF), which consisted of 842 and 1,103 studies, respectively (Table 1, Table 2). The proportion of AL amyloidosis varied from 34.4% to 58.5% within these groups. There were no patients diagnosed solely on the basis of transthoracic echocardiography (Supplementary Table 1, Fig. 4). To determine if our model could detect amyloidosis before a clinical diagnosis was made, we performed a sensitivity analysis limiting cases to time windows before the diagnosis date (e.g., all studies obtained 365 or more days before diagnosis); the model retained its ability to detect amyloidosis in these pre-diagnosis windows.

A video-based echocardiography model for cardiac amyloidosis has very good performance for patients from five AMCs across two countries. Although the ECG-based models were encouraging, we anticipated that they did not have the requisite performance to be used alone, and we therefore developed an echocardiography-based model (Table 3). The study dataset included echocardiograms acquired both before and after diagnosis (Supplementary Fig. 7 and Supplementary Table 4).
To test if our model was able to discriminate cardiac amyloidosis from other diseases that cause cardiac hypertrophy, we further performed an analysis of discrimination against patients with hypertrophic cardiomyopathy.

The cardiac amyloidosis echocardiography model outperforms interpretation by expert cardiologists. Two issues make detection of cardiac amyloidosis on echocardiograms particularly challenging for human readers: a lack of sufficiently specific features within the videos and the need to remember to look for these features in every study. Although the latter is difficult to address within existing clinical workflows (though completely solved by an automated system), we sought to evaluate the former by head-to-head comparison. We thus had two expert readers (KM, SG) attempt to diagnose cardiac amyloidosis using the test sets from 3 institutions: MGH, UCSF, and Keio (Fig. 4). In all cases, the model AUC outperformed the human readers (Fig. 4), though for KM on the UCSF data, the result was within the 95% confidence interval. Overall, the model's superior performance was more apparent for ATTR than for AL amyloidosis.
A stepwise approach using ECG and echocardiography models detects cardiac amyloidosis from a surveillance population. Within the MGH and UCSF cohorts, there were 11,541 and 6,792 patients, respectively, with ECG-echocardiogram pairs (within 180 days of one another, with the ECG preceding the echocardiogram) (Table 5). Based on the output of the echocardiography model, we estimated the prevalence of cardiac amyloidosis in this group at 0.60% and 0.62%, in keeping with our estimates of cardiac amyloidosis prevalence within this population (see Methods). Applying the echocardiography model only to studies first flagged by the ECG model yielded a recall of 67% at a PPV of nearly 75% (Fig. 5c). In comparison, at a PPV of 75%, the recall for the echocardiography model alone would be 12.3% for MGH and 12.3% for UCSF.
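The stepwise deployment can be sketched with synthetic scores. This is a toy illustration, not the paper's models or data: the cohort, score distributions and model outputs below are made up, while the 0.7 ECG and 0.8 echocardiography cutpoints follow the Methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cohort (~0.6% prevalence) with made-up model scores: cases tend
# to score high, controls low. Purely illustrative, not real patient data.
n = 10_000
labels = rng.random(n) < 0.006
ecg_score = np.where(labels, rng.beta(8, 2, n), rng.beta(2, 8, n))
echo_score = np.where(labels, rng.beta(9, 2, n), rng.beta(1, 9, n))

def ppv_recall(pred, truth):
    tp = np.sum(pred & truth)
    return tp / max(pred.sum(), 1), tp / max(truth.sum(), 1)

gate = ecg_score >= 0.7               # stage 1: ECG model gates who proceeds
flagged = gate & (echo_score >= 0.8)  # stage 2: echo model on gated studies only

ppv1, rec1 = ppv_recall(gate, labels)
ppv2, rec2 = ppv_recall(flagged, labels)
print(f"ECG stage:      PPV={ppv1:.2f} recall={rec1:.2f}")
print(f"ECG+echo stage: PPV={ppv2:.2f} recall={rec2:.2f}")
```

Because the second stage can only remove studies, its recall can never exceed the first stage's, while its PPV rises as controls are filtered out.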

Discussion
Cardiac amyloidosis is one member of a group of cardiovascular diseases, including hypertrophic cardiomyopathy and pulmonary arterial hypertension, that are potentially treatable but rare and therefore difficult to detect 20 . The imperative to recognize patients with these and other rare diseases largely depends on the availability of specific therapeutic options, but once these appear, it can be difficult to rapidly adapt prior workflows to ensure that patients are treated appropriately. Moreover, since patients are likely to present to non-experts with their initial symptoms, an operational challenge becomes how best to construct systems that facilitate detection even in such settings 21 .
Although the impact of cardiac amyloidosis on ECG and echocardiography has been known for many decades, the features themselves in isolation have not been sufficiently specific or sensitive to be used as heuristics 15,16,22,23 . For example, in one study of 400 cardiac amyloidosis patients, the characteristic low-voltage ECG pattern was seen in only 33% of patients 13 . One could in principle combine these with other non-cardiac features, but this places an increasing burden on the provider to seek such information, which often only occurs when a suspicion of the disease exists in the first place.
In contrast, the approach we have developed here deliberately limits the need for any recognition by the provider and uses inputs that can be acquired in primary care settings-whether by ECG or handheld echocardiography. To further enable effective deployment in such settings, these detection approaches should ideally be coupled with further facilitation of confirmatory diagnostic processes. In fact, our approach benefits from the fact that there is a second gate of confirmatory diagnostic testing: namely measurement of free light chains, scintigraphy scanning, and possibly tissue biopsy 24 . The ECG and echocardiography models thus represent a tunable detection tool, with cutpoints that can be selected based on population prevalence and the costs and benefits (diagnostic, therapeutic, financial and otherwise) of downstream true and false positives (and negatives). The data collected through deployment can itself enable refinement of cutpoints, and potentially spur retraining of models to better match local conditions. Critically, in such a system involving a confirmatory step downstream of the AI detection output, model explainability is less of an issue, and one can focus on maximizing model performance.
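The cutpoint-tuning idea can be made concrete with a small expected-cost calculation. The operating points and costs below are hypothetical values for illustration, not the paper's measured ones:

```python
# Hypothetical (threshold, sensitivity, specificity) operating points for a
# detection model -- illustrative numbers only.
ops = [(0.5, 0.90, 0.95), (0.7, 0.80, 0.99), (0.9, 0.60, 0.999)]

prevalence = 0.006   # assumed cardiac amyloidosis prevalence (~0.6%)
cost_fp = 1.0        # assumed cost of one false positive (a confirmatory work-up)
cost_fn = 50.0       # assumed cost of one missed case (much higher)

def expected_cost(sens, spec):
    fn_rate = prevalence * (1 - sens)        # missed cases per screened patient
    fp_rate = (1 - prevalence) * (1 - spec)  # false alarms per screened patient
    return cost_fn * fn_rate + cost_fp * fp_rate

best = min(ops, key=lambda t: expected_cost(t[1], t[2]))
print(best[0])  # → 0.7 under these assumed costs
```

Lowering `cost_fn` relative to `cost_fp` (i.e., when false alarms are relatively more burdensome) shifts the chosen cutpoint toward stricter thresholds, which is exactly the trade-off the paragraph describes.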
There are several limitations to this study. First, since cardiac amyloidosis is an underdiagnosed disease, there may have been undiagnosed cases in the control group. This would produce false labels and may have affected the model performance, as well as the ability to estimate it accurately. For example, false labels in the test sets would worsen the apparent specificity. Second, although our echocardiography model outperformed experts, the experts had access to only the echocardiography videos and no other clinical information. Thus, this analysis compared the ability to detect amyloidosis from echocardiogram videos alone, rather than a holistic judgement based on multiple information sources, which are sometimes available in clinical settings.

Medicine has historically reserved screening for widely prevalent diseases such as breast and colon cancer, in part because of the larger number of individuals who may benefit, and also because of the anticipated higher PPV of any diagnostic algorithms. However, given the collective scope of rare diseases 25 , the possibility of developing highly specific models to recognize them (whether by genetics or imaging), and the increasing number of therapies being developed to target them, it will be informative to establish whether a similar paradigm can be developed for other underdiagnosed conditions.

Methods
Patient selection procedure for ECG and echocardiography models. For all institutions, prospective cardiac amyloidosis patients were first identified based on diagnostic codes and/or echocardiography reports and then manually confirmed by chart review. Specifically, patients with ATTR cardiac amyloidosis were required to have confirmation of amyloid disease by tissue biopsy, nuclear medicine scan, cardiac magnetic resonance imaging, or genetic testing (transthyretin variant). For AL amyloidosis, biopsy confirmation was required, as well as some evidence of cardiac involvement, whether by cardiac magnetic resonance or echocardiography. The method and date of diagnosis were also identified by chart review. A positive result on myocardial biopsy, cardiac MRI or PYP scan was considered diagnostic, and the date of whichever study came first was defined as the diagnosis date. For cases where providers noted a strong suspicion of amyloidosis on TTE before subsequent confirmation by another modality, the date of the TTE was recorded as the diagnosis date. When notes indicated that the inclusion criteria were met (e.g., a statement of "biopsy proven cardiac amyloidosis") but more details were not available, the method and date of diagnosis were set to "unknown". For both models, cases were initially matched based on age and sex to patients who underwent ECG or echocardiography at the same institution but did not have cardiac amyloidosis. For the ECG models, we excluded ECGs with pacing spikes.
The ECG model was trained with data from Brigham and Women's Hospital (BWH) and was externally validated with data from two institutions in the US: Massachusetts General Hospital (MGH) and the University of California San Francisco (UCSF). The patients from BWH were randomly split into three groups (ECG-derivation, ECG-validation and ECG-test cohorts) in a 5:2:3 ratio for model training (Supplementary Fig. 1). Patients who had ECGs at both BWH and MGH were identified and were allocated to the ECG-test cohort to avoid overoptimistic estimation of model performance.
The echocardiography model was trained with data from Brigham and Women's Hospital (BWH) and was externally validated with data from four institutions in the US and Japan: Massachusetts General Hospital (MGH), University of California San Francisco (UCSF), Northwestern University (NW) and Keio University Hospital (Keio) (Supplementary Figs. 6, 11, 12, 13 and 14). The UCSF cases overlapped with those from our previous report 17 . To make the model robust to intracardiac leads and wall thickness, an additional 253 patients with a pacemaker or implantable cardiac defibrillator and without cardiac amyloidosis and 383 patients with HCM were identified and added to the control group for the BWH dataset. The patients from BWH were randomly split into three groups (echocardiography-derivation, echocardiography-validation and echocardiography-test cohorts) in a 5:2:3 ratio for model training.
Patients who had an echocardiography study at both BWH and MGH were identified and were allocated to the echocardiography-test cohort to avoid overoptimistic estimation of model performance on the MGH test set.
To test the ability of the echocardiography model to discriminate cardiac amyloidosis from other diseases with cardiac hypertrophy, we identified HCM patients in MGH, UCSF, and Keio (Supplementary Figs. 15, 16, 17), HTN patients in BWH, MGH, UCSF, and Keio (Supplementary Figs. 18, 19, 20 and 21), and ESRD patients in BWH and MGH (Supplementary Figs. 22 and 23). HCM patients for BWH, MGH, and Keio were identified by a combination of search by encounter diagnosis and chart review. UCSF HCM patients were taken from those reported previously 17 . HTN for BWH and MGH was defined as a median systolic blood pressure greater than 160 mmHg for blood pressure measurements within two years prior to the echocardiogram study date. For UCSF and Keio, blood pressures were only available within the DICOM header, at the time of the study. ESRD status was defined as patients with an encounter diagnosis ICD-10 code of Z99.2 (dependence on renal dialysis).
ECG model architecture and training. The ECG model was constructed as a 2D-CNN-based model. It consisted of a 2D-CNN layer followed by 18 layers of a multi-2D-CNN module, each constructed from 3 parallel multilayer CNNs concatenated at the end of the module (schematic shown in Supplementary Fig. 24; code is included as ECGModel.py). We placed a 50% dropout layer before the final fully connected layer to improve generalization. The model had 49,823,214 parameters in total, of which 49,744,020 were trainable. The model was trained using data from the ECG-derivation cohort from BWH. ECGs were labeled as case=1 or control=0, and the model was trained to minimize the binary cross-entropy between model prediction and the label using the RMSprop optimizer with an initial learning rate of 0.0001. The model was trained for 150 epochs. At the end of each epoch, C-statistics on the ECG-validation cohort were calculated. The final model was chosen as the model with the highest C-statistic on the validation cohort across all 150 epochs.
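The parallel-branch module can be illustrated outside any deep-learning framework. The plain-NumPy sketch below is an assumption-laden stand-in, not the code in ECGModel.py: 1x1 convolutions replace the deeper branch CNNs, and the branch widths are made up; only the branch-then-concatenate structure mirrors the module described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # A 1x1 convolution is per-position channel mixing: (H, W, Cin) @ (Cin, Cout),
    # followed here by a ReLU activation.
    return np.maximum(x @ w, 0.0)

def multi_cnn_module(x, widths=(8, 16, 8)):
    """Three parallel branches applied to the same input, concatenated along
    the channel axis (the real branches are multilayer 2D CNNs)."""
    cin = x.shape[-1]
    branches = [conv1x1(x, rng.standard_normal((cin, cout)) * 0.1)
                for cout in widths]
    return np.concatenate(branches, axis=-1)

# A toy "ECG" feature map: 12 leads x 250 samples x 4 channels.
x = rng.standard_normal((12, 250, 4))
y = multi_cnn_module(x)
print(y.shape)  # channel count = 8 + 16 + 8 = 32
```

Stacking such modules lets branches with different receptive fields contribute side by side, which is the usual motivation for this inception-style layout.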
Echocardiography model architecture and training. Given that echocardiograms are videos, which are time series of multiple frames, we constructed a 3D-CNN-based model treating the temporal axis as the 3rd axis rather than taking a frame-by-frame approach as done previously 17 , to maximize the ability of the model to use dynamic features in disease detection. This approach should, in principle, also enable detection of diseases whose important features are only visible in a subset of frames. The model consisted of 3 layers of 3D-CNN followed by 12 layers of a multi-3D-CNN module, each constructed from 3 parallel multilayer 3D-CNNs and a max pooling operation concatenated at the end of the module (schematic shown in Supplementary Fig. 25; code is included as EchoModel.py). We placed a 40% dropout layer before the final fully connected layer to improve generalization. The scale of the video (in cm/pixel) was input into the fully connected layer. The model had 28,341,385 parameters in total, of which 28,298,105 were trainable. The model was trained using data from the echocardiography-derivation cohort from BWH. The echocardiography videos were labeled as case=1 or control=0 at the study level, and the model was trained to minimize the binary cross-entropy between model prediction and the label using the RMSprop optimizer with an initial learning rate of 0.0001. The model was trained for 50 epochs. At the end of each epoch, C-statistics on the echocardiography-validation cohort were calculated. The final model was chosen as the model with the highest C-statistic on the validation cohort across all 50 epochs.
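Checkpoint selection by validation C-statistic, common to both models, can be sketched as follows. The C-statistic function is the standard Mann-Whitney pairwise-ranking formulation; the per-epoch values are hypothetical numbers for illustration.

```python
import numpy as np

def c_statistic(scores, labels):
    """C-statistic (AUROC) via the Mann-Whitney pairwise-ranking formulation."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Fraction of (case, control) pairs where the case scores higher; ties count 0.5.
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

# Hypothetical per-epoch validation C-statistics (illustrative numbers only);
# the checkpoint from the best epoch becomes the final model.
per_epoch = [0.71, 0.78, 0.84, 0.88, 0.86, 0.87, 0.83]
best_epoch = int(np.argmax(per_epoch))
print(best_epoch)  # → 3
```

Selecting the checkpoint on a separate validation cohort, rather than on training loss, is what keeps the chosen epoch from simply reflecting overfitting to the derivation data.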

Echocardiography model comparison with expert cardiologist interpretation.
The performance of the echocardiography model in detecting cardiac amyloidosis was compared with that of two expert cardiologists (SG: general cardiologist and MK: National Board-certified expert in Adult Comprehensive Echocardiography). The comparison was performed at the study level rather than the individual video level. While the CNN model diagnostic output was based only on apical 4-chamber views, the experts had access to all the videos in each echocardiogram study to diagnose cardiac amyloidosis. The experts were blinded to model output. The experts labeled each study as cardiac amyloidosis positive or negative for the 3 external validation datasets from MGH, UCSF and Keio. Sensitivity and specificity were calculated and compared with the ROC curve of the model. A subtype analysis on ATTR and AL amyloidosis was also performed.
Estimating positive predictive value of ECG, echocardiography, and combined ECG-echocardiography models. We estimated the prevalence of cardiac amyloidosis within the population of patients with echocardiograms as follows. From our internal data across two large AMCs, we have found that over the past 4 years, 20-25% of the ~16,000-18,000 unique patients who obtain an echocardiogram have at least one encounter diagnosis for heart failure. Of those, we anticipate 50% to have heart failure with preserved ejection fraction (HFpEF), or 10-12.5% of patients. The percentage of cardiac amyloidosis within HFpEF is unknown, but recent studies suggest proportions of 13-20% in selected subsets [2][3][4][5][6] . Given that these represented enriched populations, we assumed a lower value of 5-7%, which corresponds to 0.5-0.9% of our total population. This value is in keeping with a prevalence analysis using 916 successive echocardiograms from Keio University, which included 7 patients with known cardiac amyloidosis (0.76%).
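The prevalence arithmetic in this paragraph can be reproduced directly (all inputs are the fractions stated in the text):

```python
# Reproducing the back-of-envelope prevalence estimate from the text.
hf_fraction = (0.20, 0.25)   # echo patients with a heart-failure encounter diagnosis
hfpef_share = 0.50           # assumed HFpEF share of heart failure
ca_in_hfpef = (0.05, 0.07)   # conservative amyloidosis share of HFpEF

lo = hf_fraction[0] * hfpef_share * ca_in_hfpef[0]
hi = hf_fraction[1] * hfpef_share * ca_in_hfpef[1]
print(f"estimated prevalence: {lo:.2%} to {hi:.2%}")  # ~0.5% to ~0.9%
```

The resulting 0.5-0.9% range brackets both the Keio spot check (0.76%) and the model-derived estimates of 0.60% and 0.62%.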
To estimate PPV for our ECG model, we identified 11,541 and 6,792 patients within our respective MGH and UCSF cohorts with an ECG followed by an echocardiogram within 180 days (Supplementary Figs. 26 and 27). A single ECG-echocardiography study pair was selected for each patient, namely the pair with the shortest time between ECG and echocardiography studies. We deployed the ECG and echocardiography cardiac amyloidosis models on each study and defined the gold standard as individuals with an echocardiography model score of at least 0.8, a threshold that resulted in prevalence values of 0.60% and 0.62% for MGH and UCSF, respectively. We assessed the ability of the ECG model to detect cardiac amyloidosis using precision-recall curves.
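The per-patient pair selection can be sketched as follows, with toy records (hypothetical patient IDs and dates, not study data): keep only pairs where the ECG precedes the echocardiogram by at most 180 days, then retain the closest pair per patient.

```python
from datetime import date

# Toy records: (patient_id, ecg_date, echo_date) -- hypothetical values.
pairs = [
    ("p1", date(2020, 1, 1), date(2020, 3, 1)),
    ("p1", date(2020, 2, 1), date(2020, 3, 1)),
    ("p2", date(2020, 1, 1), date(2021, 1, 1)),   # gap > 180 days: excluded
    ("p3", date(2020, 5, 1), date(2020, 5, 10)),
]

best = {}
for pid, ecg, echo in pairs:
    gap = (echo - ecg).days
    # ECG must precede the echo (gap >= 0) and fall within the 180-day window;
    # among eligible pairs, keep the smallest gap per patient.
    if 0 <= gap <= 180 and (pid not in best or gap < best[pid][0]):
        best[pid] = (gap, ecg, echo)

print(sorted(best))  # → ['p1', 'p3']
```

Patient p2 drops out entirely, and p1 keeps the 29-day pair over the 60-day one.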
To assess the PPV for the echocardiography model, we estimated a likelihood ratio from the receiver operating characteristic curve 26 across the combined test sets for BWH, MGH, UCSF, and Keio. At a threshold of 0.8, the likelihood ratio of the echocardiography model was 83.5. Assuming the above cardiac amyloidosis prevalence of 0.60% and 0.62% for MGH and UCSF, respectively, we were able to estimate an institution-level PPV for the echocardiography model. For the successive deployment of ECG and echocardiography models, we updated the PPV based on the prevalence expected from using only studies that exceeded a cutpoint of 0.7 on the output of the ECG model.
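The PPV update from a likelihood ratio is a one-line application of Bayes' rule on the odds scale. The sketch below uses the reported LR of 83.5 and the 0.60% MGH prevalence; the enriched prevalence in the last line is a hypothetical value chosen only to illustrate the effect of ECG pre-screening.

```python
def ppv_from_lr(prevalence, likelihood_ratio):
    """Posterior odds = prior odds x LR; convert odds back to probability (PPV)."""
    prior_odds = prevalence / (1 - prevalence)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Echocardiography model alone at the 0.8 cutpoint (LR = 83.5), 0.60% prevalence:
print(f"{ppv_from_lr(0.0060, 83.5):.1%}")  # → 33.5%, i.e., the ~33% PPV reported
# If ECG pre-screening enriched the population to a hypothetical 3.5% prevalence:
print(f"{ppv_from_lr(0.0350, 83.5):.1%}")  # roughly 75%
```

Working on the odds scale is what makes the update tunable: any change in upstream prevalence (for example, from the ECG gate) propagates multiplicatively into the posterior odds.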
Statistical analysis. Data were collected and stored using the Numpy package version 1.19.2 with Python 3.7.3. All models were trained with Keras 2.3.0 on a Tensorflow 1.14.0 backend 27 . The ROC curves were plotted using the ggplot2 28 package (R 3.6.1), and the C-statistic, sensitivity, specificity, and 95% confidence intervals (using 2000 bootstrap samples) were calculated using the pROC 29 package (1.16.2). The precision-recall plots were made using the plotnine package (0.6.0) in Python 3.7.3. Continuous values are presented as mean ± standard deviation (SD) and categorical values as numbers and percentages unless otherwise specified.
Ethics statement. This study complies with all ethical regulations and guidelines.
The study protocol was approved by the local institutional review boards (IRB) of Mass General Brigham (2019P002651), UCSF (10-03386), Northwestern University (STU00207540) and Keio University (20200030). This study had minimal patient risk: it collected data retrospectively, there was no direct contact with patients, and data were collected after medical care was completed. Thus, and to recruit an unbiased and representative cohort of patients, data were collected under a waiver of informed consent, which was approved by the IRB. The only minimal risk was breach of confidentiality during data abstraction from the electronic health record system. As such, any identifiable health information and the study identifier linkage list were securely kept within the original institutions. The model training was done within Mass General Brigham by the authors at that institution (S.G. and R.C.D.). The model validation was run within each institution without sharing identifiable data. All authors had access only to de-identified data during the analysis phase.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The data that support the findings of this study are available on request from the corresponding author R.C.D. upon approval of the data sharing committees of the respective institutions. The data are not publicly available due to the presence of information that could compromise research participant privacy. Source data are provided with this paper.