The heart is a specialized muscle that contracts rhythmically around its closed chambers to propel blood. However, this pumping function fluctuates throughout the day as the circulating blood flow adapts to the body’s ever-changing metabolic demands1. Understanding the variations in cardiac pump activity with each heartbeat might have relevance for explaining the intricacies of heart function in health and disease. However, the tools for scrutinizing such changes remain imprecise. Writing in Nature, Ouyang et al.2 report the development of a computational platform that uses an artificial-intelligence (AI) approach to assess cardiac ultrasound video and to provide continuous, beat-by-beat measurement of cardiac pump function.
Clinicians commonly assess cardiac function using a value termed the ejection fraction, which is the percentage of the blood volume in the left heart chamber (the left ventricle) that is pumped out when the heart contracts. In a normal heart, just over half of the blood is ejected; thus, the calculated ejection fraction is more than 50%. Highly trained physicians can ‘eyeball’ ultrasound video loops of a beating heart and make a reasonably accurate estimate of the ejection fraction3. However, if two isolated frames from the video were presented, showing only the beginning and the end of the ejection, even a trained physician would struggle to estimate the ejection fraction. Given that training and expertise vary from person to person, eyeballing is not relied on; instead, the ejection fraction is calculated by tracing the boundaries of the left ventricle on a digital image to estimate the blood volume at the beginning and end of ejection. It is recommended4 that clinicians estimate the ejection fraction by tracking it over three to five heartbeats; however, in typical clinical practice, often just one beat is assessed.
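The ejection-fraction calculation described above, including the recommended averaging over several heartbeats, can be sketched in a few lines. The volumes below are hypothetical illustrative values, not patient data:

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction (%) from the end-diastolic volume (blood in the
    left ventricle before ejection) and end-systolic volume (after ejection)."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Guidelines recommend averaging over three to five beats.
# Hypothetical (end-diastolic, end-systolic) volumes in millilitres:
beats = [(120.0, 55.0), (118.0, 52.0), (125.0, 58.0)]
mean_ef = sum(ejection_fraction(edv, esv) for edv, esv in beats) / len(beats)
```

A value of `mean_ef` above 50% would be consistent with the normal range described in the text; in arrhythmia, the beat-to-beat volumes (and hence the per-beat ejection fractions) vary, which is why averaging matters.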
If the accuracy of estimates of ejection fraction could be improved by having an easy way to routinely determine its precise value by tracking and averaging several heartbeats, this would be of immense benefit, particularly for people whose hearts are beating out of rhythm (a condition termed arrhythmia). If arrhythmia occurs, the changing duration of heartbeats alters the volume of blood that fills, and is then ejected from, the left ventricle, thereby resulting in variations in the ejection fraction (Fig. 1). This variability makes the ejection fraction challenging to estimate for a type of arrhythmia known as atrial fibrillation. It is predicted5 that this condition will affect between 6 million and 12 million people in the United States by 2050, and 17.9 million in Europe by 2060. Moreover, ejection fraction needs to be assessed frequently in people who have atrial fibrillation, because heart failure (a state characterized by a poor ability of the heart to pump blood) occurs in more than one-third of such individuals6. And more than half of people with heart failure have atrial fibrillation6.
To develop an AI-based method for assessing ejection fraction, Ouyang et al. used 10,030 cardiac ultrasound videos. These videos were stored along with images containing human-generated tracings that marked the inner border of the left ventricle at the beginning and end of the ejection cycle. The authors used a type of AI architecture called a convolutional neural network (CNN), first to perform a semi-automatic detection of a pattern of pixel-based information (segmentation) to recognize the left ventricle in the video frames; and second, to track the borders of the ventricle during the heartbeat cycle. Using CNN architecture to find the left-ventricle border in ultrasound images is not new7,8, but the innovation here is that Ouyang and colleagues evaluated new forms of three-dimensional CNN. This enabled them to integrate recognition of the left-ventricular border (the spatial information) from the 2D display in single video frames with the changes over time (the temporal information), to determine the information needed regarding the moving heart border. Forms of 3D CNN have been used previously in realms as diverse as general video analysis9,10, assessment of human physical activity11 and medical imaging12. However, Ouyang and colleagues’ work is, to our knowledge, the first attempt to take this approach in analysing cardiac ultrasound information over such a strikingly large number of videos.
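As a rough illustration of what distinguishes a 3D CNN from a frame-by-frame 2D approach, the sketch below implements the core operation: a single filter convolved over time as well as over the two spatial image dimensions, so that its output depends on motion between frames. This is an illustrative toy (a hand-written convolution on random data), not the authors' architecture:

```python
import numpy as np

def conv3d_valid(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid-mode 3D convolution over (time, height, width) — the operation
    that lets a 3D CNN mix spatial and temporal information in one filter."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i + t, j:j + h, k:k + w] * kernel)
    return out

# A toy "video": 8 frames of 16 x 16 pixels, plus a 3 x 3 x 3 filter that
# responds to frame-to-frame change at one pixel (a crude motion detector).
rng = np.random.default_rng(0)
video = rng.random((8, 16, 16))
kernel = np.zeros((3, 3, 3))
kernel[0, 1, 1], kernel[2, 1, 1] = -1.0, 1.0  # temporal difference
features = conv3d_valid(video, kernel)  # shape (6, 14, 14)
```

A purely 2D CNN applies its filters within one frame at a time, so a kernel like the one above, spanning three frames, has no 2D equivalent; stacking many such spatiotemporal filters is what allows the network to track the moving ventricular border rather than merely outline it in static images.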
After Ouyang and colleagues had ‘trained’ the 3D CNN using the video data, they compared the AI-generated estimates for the ejection fraction with human-measured ejection fractions. Their 3D CNN method estimated the ejection fraction with a mean absolute error of 4.1% and 6%, respectively, for two different sets of data used for validation. In other words, on average, the AI-generated estimates differed from the corresponding ejection-fraction measurements reported by clinicians by about four and six percentage points, respectively. These reported AI errors are substantially lower than those reported in previous attempts to use CNN to estimate the ejection fraction7, and are well within the inter-observer variability in ejection-fraction measurements between experienced clinicians3.
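The accuracy figures quoted above are mean absolute errors over paired estimates. A minimal sketch of that metric, using hypothetical ejection-fraction values rather than the study's data:

```python
def mean_absolute_error(predicted, reference):
    """Mean absolute difference between paired ejection-fraction
    estimates, expressed in percentage points."""
    pairs = list(zip(predicted, reference))
    return sum(abs(p - r) for p, r in pairs) / len(pairs)

# Hypothetical AI-estimated vs clinician-measured ejection fractions (%):
ai_ef = [55.2, 48.9, 62.1, 35.4]
clinician_ef = [57.0, 45.0, 60.5, 38.0]
mae = mean_absolute_error(ai_ef, clinician_ef)
```

By this measure, a mean absolute error of 4.1 means the AI and clinician values differ by about four percentage points of ejection fraction on average, regardless of whether the underlying values are high or low.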
Ouyang et al. then tested their method on a further 55 patients, for each of whom two ultrasound specialists separately assessed the heartbeat videos. The authors found that, when the variability in the human- and AI-generated ejection-fraction estimates for each patient was compared, the 3D CNN method produced the smallest difference between the two recorded measurements. Furthermore, results obtained with the 3D CNN were highly consistent across different ultrasound machines, and for measurements taken on different occasions. These results also indicate the importance of assessing the kinetics of cardiac-wall motion in developing a system for gauging cardiac function.
Several avenues of possible future work building on this research should be explored. Efforts to reduce the overall computational burden would be welcome, so that the technique could be performed inexpensively and instantaneously during an ultrasound examination. Ouyang and colleagues’ approach required 0.05 seconds per video frame, which they reported as being faster than the estimation speed of human experts. However, this is not yet real time, which would require less than 0.02 seconds per video frame (a rate of 64 frames in 1.28 seconds, or 50 frames per second). A careful look at the different stages in the overall architecture of the 3D-CNN deep-learning approach will be needed to determine the best architecture for use in existing cardiac-ultrasound technologies, such as 3D echocardiography and ultrafast cardiac ultrasound. Moreover, the choice of computational approach for handling videos that contain suboptimal images, or those in which the image quality has been improved by the injection of image-enhancing agents, will need to be considered.
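The real-time budget quoted above follows from simple arithmetic on the figures in the text:

```python
# Figures from the text: the ultrasound stream delivers 64 frames in 1.28 s.
acquisition_rate_fps = 64 / 1.28           # 50 frames per second
frame_budget_s = 1 / acquisition_rate_fps  # 0.02 s available per frame

# Reported processing time of the 3D CNN approach per video frame:
processing_s = 0.05

# Real-time operation requires processing to keep pace with acquisition;
# at 0.05 s per frame the method is roughly 2.5 times too slow for that.
is_real_time = processing_s <= frame_budget_s
```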
This tool for the continuous assessment of cardiac pumping has the potential to affect other areas of cardiology. For example, such an approach might be adapted to monitor ultrasound changes in ejection fraction in people undergoing complex medical procedures, such as catheter-based cardiac interventions or surgery, or while they receive medication or mechanical circulatory support for a condition termed acutely decompensated heart failure.
Furthermore, the use of 3D CNN to track other parameters that are more sensitive than ejection fraction for determining early changes in cardiac function (such as physical measures of heart-muscle deformation or changes in cardiac shape or geometry) that develop before a person shows disease symptoms might lead to new ways of measuring or identifying cardiac biomarkers (hallmarks of disease)13–17. Such automated approaches might be particularly relevant for the burgeoning ‘multi-omics’ approaches for data integration that incorporate different layers of biological information to define different stages of cardiac dysfunction18.
In this regard, we applaud the authors for making available to the research community a large data set of annotated ultrasound videos (presented stripped of information that could identify the individuals). This resource will be extremely useful, and will probably spur yet more innovations in automated analysis that will boost our understanding of cardiac function. Moreover, such steps will be needed to achieve greater consistency in results obtained using different imaging systems for assessing cardiac function (such as cardiac ultrasound, computed tomography and magnetic resonance imaging).
The ongoing efforts to improve the accuracy of automated measurements and disease prediction will, undoubtedly, ultimately free up extra time for physicians, enabling them to provide higher-quality clinical care and have better interactions with patients. Given the high health-care burden of cardiovascular disease worldwide, Ouyang and colleagues’ work is timely, and hints at an ensuing technological revolution that could have a profound effect on risk prediction of cardiovascular disease and on routine clinical decision-making.
Nature 580, 192-194 (2020)
Competing Financial Interests
P.P.S. is an adviser for HeartSciences, Ultromics and Kencor Health, and has received research grants and support from HeartSciences, Hitachi Aloka, EchoSense and TomTec.