Abstract
High-quality standard views in two-dimensional echocardiography are essential for accurate cardiovascular disease diagnosis and treatment decisions. However, the quality of echocardiographic images is highly dependent on the practitioner’s experience, and ensuring timely quality control of echocardiographic images in the clinical setting remains a significant challenge. In this study, we proposed new quality assessment criteria and developed a multi-task deep learning model for real-time classification of multiple views (six standard views and “others”) and image quality assessment. A total of 170,311 echocardiographic images collected between 2015 and 2022 were used to develop and evaluate the model. On the test set, the model achieved an overall classification accuracy of 97.8% (95%CI 97.7–98.0) and a mean absolute error of 6.54 (95%CI 6.43–6.66). A single-frame inference time of 2.8 ms was achieved, meeting real-time requirements. We also analyzed pre-stored images from three distinct groups of echocardiographers (junior, senior, and expert) to evaluate the clinical feasibility of the model. Our multi-task model can provide objective, reproducible, and clinically significant view quality assessment results for echocardiographic images, potentially optimizing the clinical image acquisition process and improving AI-assisted diagnosis accuracy.
Introduction
Two-dimensional transthoracic echocardiography is widely used as a non-invasive, radiation-free, low-cost, and real-time imaging modality to assess cardiac function and support the diagnosis of cardiovascular disease1. Cardiac images acquired at specific probe positions and angles are referred to as standard views, which together provide comprehensive information on cardiac structure and function2. High-quality standard views are the basis for reliable cardiac parameter measurements and accurate diagnosis3,4. However, compared with other imaging modalities, the acquisition process for clinical echocardiographic images is less automated, with echocardiographers manually adjusting probe positions and parameters, subjectively recognizing individual standard views, and selecting high-quality image frames5. The entire process is cumbersome, time-consuming, highly dependent on the echocardiographer's experience and maneuvers, and prone to inter- and intra-observer variability6,7,8. When the acquired views are of poor quality or key views are missing, the interpretation by human experts or artificial intelligence (AI) models is compromised. Current automated diagnostic frameworks for cardiac diseases do not yet incorporate image quality control (QC) into the analysis process, necessitating manual pre-filtering of low-quality images and thereby limiting clinical applicability9. Ideally, real-time QC should be implemented during patient image acquisition to ensure that the optimal images are captured within the same examination. Therefore, performing automatic view recognition and quality assessment during image acquisition or before downstream image analysis tasks is crucial. The former directly ensures a high-quality image base, while the latter selects the optimal images for subsequent analysis.
Recently, deep learning has been widely used in echocardiographic image analysis, enabling automated clinical workflows for view classification, cardiac structure extraction, cardiac function quantification, and cardiac disease diagnosis10,11,12,13. View classification is the first step in echocardiographic analysis. Previous studies have proposed view classification models based on convolutional neural networks (CNNs), such as VGGNet and ResNet, achieving good recognition performance14,15,16,17. Ongoing research on image quality assessment mainly targets natural images and focuses on the various distortions introduced during acquisition, compression, storage, and transmission18,19. Noise and artifacts are commonly found in ultrasound images due to the coherent interference of scattered waves20. However, unlike natural images, noisy ultrasound images are not always of low quality. The quality assessment of ultrasound images must also consider clinical practice requirements, emphasizing the visibility and integrity of specific anatomical structures21. Several studies have implemented quality assessment of echocardiographic images, broadly categorized into three forms: categorical confidence, quality level classification, and quality score regression. Huang et al.22 and Zhang et al.23 used the classification confidence of standard view recognition as the image quality score. However, the classification confidence represents the model's certainty in its prediction and does not directly reflect image quality from a clinical practice perspective. Zamzmi et al.24 proposed a MobileNetV2-s-based encoder-decoder network to recognize four standard views and classify them into two quality levels (good or poor). However, providing continuous numerical score feedback to the operator during image acquisition is more helpful than discrete quality-level feedback. Abdi et al.25 categorized the quality of end-systolic apical 4-chamber (A4C) view frames into six levels (0 to 5) based on the visibility and clarity of anatomical structures and proposed a nine-layer CNN for score regression. Some studies were conducted on echocardiography videos. Luong et al.26 set four quality score levels (0.25, 0.5, 0.75, and 1.0) based on the visibility of anatomical structures and combined DenseNet and LSTM networks to simultaneously achieve view recognition and quality assessment for nine echocardiographic videos. Labs et al.27 combined four convolutional layers with an LSTM network to assign scores from zero to ten to four quality attributes in A4C and PLAX videos. These studies provide effective standard view quality assessment methods; however, several limitations remain. First, current image quality assessment criteria are limited in scope and inadequate for clinically meaningful assessment of complex and diverse standard views. Second, most studies focus on a single task, with little research on a comprehensive multi-view classification and quality assessment pipeline. Third, traditional CNN architectures progressively abstract image information as network depth increases, potentially losing the spatial details present in lower layers28.
In this study, we proposed four image quality attributes based on clinical practice needs and established evaluation criteria for six standard view categories accordingly. We developed a multi-task model that integrates view classification and image quality assessment into a unified framework. By sharing feature representations, multi-task learning enables the simultaneous learning of multiple related tasks in a single training process, effectively facilitating the exchange of information between tasks and thereby improving the overall performance and efficiency of the model29,30. Furthermore, we introduced the Feature Pyramid Network (FPN)31 into echocardiographic image quality assessment for the first time to achieve the fusion and utilization of multi-scale features.
Methods
An overview of the study is provided in Fig. 1. A multi-task deep learning model was trained on a dataset consisting of 170,311 echocardiographic images to automatically generate view categories and quality scores for clinical quality control workflows. This study conformed to the principles outlined in the Declaration of Helsinki and was approved by the Ethics Board of our institution (No. 2023–407).
Data
This was a retrospective study. A large number of echocardiographic studies were randomly extracted from the picture archiving and communication system (PACS) of the Sichuan Provincial People's Hospital between 2015 and 2022 to establish the experimental dataset, with all subjects aged 18 and above. Images showing severe cardiac malformations that prevented recognition of anatomical structures were excluded. The dataset consists of 170,311 echocardiographic images and includes six standard views commonly used in clinical practice: the A4C view, parasternal view of the pulmonary artery (PSPA), parasternal long axis (PLAX), parasternal short axis at the mitral valve level (PSAX-MV), parasternal short axis at the papillary muscle level (PSAX-PM), and parasternal short axis at the apical level (PSAX-AP). All views other than these six were classified as “others”. For standard views with unevenly distributed quality levels, we performed undersampling to balance the data distribution. All images were acquired using ultrasound machines from different manufacturers, including Philips, GE, Siemens, and Mindray. The dataset was randomly divided into training (70%), validation (10%), and test (20%) sets through stratified sampling (Table 1). The distribution of quality scores for the three subsets can be found in Supplementary Figure S1 online.
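For illustration, a minimal sketch of a 70/10/20 stratified split is given below; it assumes the image index is held in a pandas DataFrame with a hypothetical "view" column used as the stratification key, which is not part of the original paper:

# Illustrative 70/10/20 stratified split; column name "view" is a placeholder.
from sklearn.model_selection import train_test_split

train_df, rest_df = train_test_split(
    df, test_size=0.30, stratify=df["view"], random_state=42)
val_df, test_df = train_test_split(
    rest_df, test_size=2/3, stratify=rest_df["view"], random_state=42)
# Resulting fractions: 70% train, 10% validation, 20% test.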
Quality scoring method
We established percentage scoring criteria for different standard views based on four attributes: overall contour, key anatomical structural details, standard view display (see Supplementary Fig. S2 online for an example), and image display parameter adjustments. The four attributes contributed to the total score in a ratio of 3:4:2:1. Table 2 presents the scoring criteria for the PLAX view. Two accredited echocardiographers with at least five years of experience individually annotated all images in the dataset. The average of their annotations was used as the final expert score label. A third cardiology expert, with over ten years of experience, reviewed images for which the two scores differed by more than 10 points. The “others” view was assigned a score of zero for training purposes.
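A minimal sketch of how such a label could be assembled under these criteria is shown below; the function names and the assumption that each attribute sub-score lies on a 0–100 scale are illustrative, not the authors' annotation tooling:

# Weighted combination of the four attributes (3:4:2:1) and annotator averaging;
# names and sub-score scale are hypothetical.
def attribute_weighted_score(contour, detail, display, params):
    """Each attribute sub-score is assumed to be on a 0-100 scale."""
    return 0.3 * contour + 0.4 * detail + 0.2 * display + 0.1 * params

def final_label(score_rater1, score_rater2, review_threshold=10):
    if abs(score_rater1 - score_rater2) > review_threshold:
        return None  # flag the image for adjudication by the senior expert
    return (score_rater1 + score_rater2) / 2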
Model development
The model architecture is shown in Fig. 2 and mainly consists of a backbone network, a neck network, and two branch modules for view classification and quality assessment. The backbone network learns and extracts multi-scale image features. We chose the output feature maps \(\{S_2, S_3, S_4, S_5\}\) (with output sizes of 1/4, 1/8, 1/16, and 1/32 of the original resolution, respectively) from the last four stages of the backbone network as the input for the neck network. To obtain the best backbone network, we compared six different deep CNN architectures, namely, MobileNetV332, DenseNet12133, VGG1634, EfficientNet35, ResNet5036, and ConvNeXt37, and selected VGG16.
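As a rough sketch (not the authors' released code), the four multi-scale maps can be pulled from a torchvision VGG16 backbone as below; the choice of the four max-pooling outputs as the "last four stages" is an assumption for illustration:

# Extracting S2-S5 from VGG16 with torchvision's feature extractor.
import torch
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor

vgg = models.vgg16(weights=None)  # pretrained weights could be loaded instead
return_nodes = {"features.9": "S2",    # 1/4 resolution
                "features.16": "S3",   # 1/8
                "features.23": "S4",   # 1/16
                "features.30": "S5"}   # 1/32
backbone = create_feature_extractor(vgg, return_nodes=return_nodes)

feats = backbone(torch.randn(1, 3, 224, 224))
# feats["S2"]: (1, 128, 56, 56) ... feats["S5"]: (1, 512, 7, 7)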
The neck network serves as an intermediate feature layer that further processes and fuses the features extracted from the backbone network for the two subsequent tasks. The highest-layer feature \(S_5\) is a discriminative high-level semantic feature that reflects the network's understanding of the overall image context and is therefore suitable for the classification task. Zhao et al.38 proposed that applying the self-attention mechanism to high-level features with richer semantic concepts can capture the connections between conceptual entities in an image. Therefore, to further enhance the expressiveness of the features, we feed \(S_5\) into a Vision Transformer Block (VTB)39 that combines a multi-head attention layer and a feedforward layer to facilitate intra-scale feature interaction. The resulting feature map, denoted \(S_5'\), is used for view classification. Subsequently, the FPN fuses the features of the four scales \(\{S_5', S_4, S_3, S_2\}\) layer by layer from the top down for cross-scale feature interaction. We denote the set of feature maps output by the FPN as \(\{P_5, P_4, P_3, P_2\}\), each of which carries strong semantic information. Next, we fuse the feature maps of all scales using an Adaptive Feature Fusion Block (AFFB) to better model image quality perception. As shown in Fig. 3, the AFFB module first upsamples the feature maps at different scales to the size of \(P_2\) and then concatenates them. Channel attention is then computed using a Squeeze-and-Excitation Block40 to adaptively adjust the importance of each channel. Finally, element-wise addition is performed on the features from each scale to generate the final fused feature map \(F\), which is used for the quality assessment task.
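The following is one possible interpretation of the AFFB (upsample, concatenate, SE channel attention, then split back and sum), offered as a sketch rather than the authors' implementation; it assumes all pyramid levels share the same channel count, as FPN outputs typically do:

# Sketch of an Adaptive Feature Fusion Block as described in the text.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling
        return x * w[:, :, None, None]    # excite: channel reweighting

class AFFB(nn.Module):
    def __init__(self, channels, num_scales=4):
        super().__init__()
        self.num_scales = num_scales
        self.se = SEBlock(channels * num_scales)

    def forward(self, pyramid):           # pyramid = [P2, P3, P4, P5]
        size = pyramid[0].shape[-2:]
        ups = [F.interpolate(p, size=size, mode="bilinear", align_corners=False)
               for p in pyramid]
        fused = self.se(torch.cat(ups, dim=1))   # channel attention on the concat
        # split back into per-scale chunks and add element-wise -> fused map F
        return torch.stack(fused.chunk(self.num_scales, dim=1)).sum(dim=0)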
For the view classification branch (VCB), a linear classifier generates the view classification results. Simultaneously, a projection head maps the feature representation to a specified dimension to compute the Supervised Contrastive Loss41. The goal of supervised contrastive learning is to pull features of the same class closer together in the feature space while pushing features of different classes apart. By applying the supervised contrastive loss, we aimed to mitigate the problem of small inter-class differences in echocardiographic images. For the quality assessment branch (QAB), global average pooling is applied to the fused feature map \(F\) to generate a K-dimensional feature vector, which is then fed to a multilayer perceptron (MLP) to regress the final image quality score.
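A hedged sketch of the two task heads follows; the layer widths, projection dimension, and input channel counts are placeholders rather than the paper's exact configuration:

# Illustrative task heads: view classification (logits + SupCon projection)
# and quality assessment (GAP followed by an MLP regressor).
import torch.nn as nn

class ViewClassificationBranch(nn.Module):
    def __init__(self, in_dim=512, num_views=7, proj_dim=128):
        super().__init__()
        self.classifier = nn.Linear(in_dim, num_views)       # view logits
        self.projection = nn.Sequential(                     # for supervised contrastive loss
            nn.Linear(in_dim, in_dim), nn.ReLU(inplace=True),
            nn.Linear(in_dim, proj_dim))

    def forward(self, s5_prime):
        feat = s5_prime.mean(dim=(2, 3))                     # global average pooling
        return self.classifier(feat), nn.functional.normalize(self.projection(feat), dim=1)

class QualityAssessmentBranch(nn.Module):
    def __init__(self, in_dim=128, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
                                 nn.Linear(hidden, 1))       # scalar quality score

    def forward(self, fused):
        return self.mlp(fused.mean(dim=(2, 3))).squeeze(1)   # GAP, then MLP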
Model training
We jointly trained the model using the cross-entropy loss (for the view classification task), supervised contrastive loss, and mean squared error loss (for the quality assessment task). Additionally, to address the imbalance problem in multi-task training, an auto-tuning strategy42 was applied to learn the relative loss weights for each task.
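One common realization of such an auto-tuning strategy is learnable uncertainty-based weighting; the sketch below is a generic formulation in that spirit, not necessarily the exact variant used in the paper:

# Learnable per-task weights via log-variance parameters (illustrative).
import torch
import torch.nn as nn

class AutoWeightedLoss(nn.Module):
    def __init__(self, num_tasks=3):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))  # one parameter per task

    def forward(self, losses):                                 # list of scalar task losses
        total = 0.0
        for log_var, loss in zip(self.log_vars, losses):
            precision = torch.exp(-log_var)
            total = total + precision * loss + log_var         # weighted loss + regularizer
        return total

# usage sketch: criterion = AutoWeightedLoss(3); add criterion.parameters() to the
# optimizer; total = criterion([ce_loss, supcon_loss, mse_loss])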
The model was implemented in Python v3.8.12 using PyTorch v1.12.0 and trained on two NVIDIA GeForce RTX 3090 GPUs, each with 24 GB of memory. During training, the initial learning rate was set to 1e-5 and the batch size to 128. The Adam optimizer was used with a weight decay of 1e-5. Input images were resized to 224 × 224, and pixel values were normalized to the range 0 to 1. No data augmentation was performed, to avoid altering image quality. An early stopping strategy was used to reduce overfitting. The model with the best validation performance was then evaluated on the test set.
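A configuration sketch matching these reported settings is given below; the model and dataset objects are placeholders:

# Training setup sketch: 224x224 inputs scaled to [0, 1], no augmentation,
# Adam with lr 1e-5 and weight decay 1e-5, batch size 128.
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),   # converts to a float tensor in [0, 1]
])  # intentionally no augmentation, to avoid changing perceived image quality

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-5)
loader = torch.utils.data.DataLoader(train_dataset, batch_size=128,
                                     shuffle=True, num_workers=8)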
Evaluation metrics
Five performance metrics, accuracy (ACC), precision (PRE), sensitivity (SEN), specificity (SPE), and F1 score (F1), were used to evaluate view classification performance. A confusion matrix was constructed to analyze the classification performance for each view. For quality assessment, Pearson’s linear correlation coefficient (PLCC), Spearman’s rank-order correlation coefficient (SROCC), mean absolute error (MAE), and root mean square error (RMSE) were used as evaluation indices. Indicators such as the number of model parameters and inference time were also considered to comprehensively evaluate model performance. The Kruskal-Wallis test was employed to assess differences among independent groups, with p < 0.05 considered statistically significant. For multiple comparisons, Dunn-Bonferroni post hoc tests were applied. Bootstrap resampling was used to calculate 95% confidence intervals. Statistical analyses were conducted using SPSS v27.0 or Python v3.8.12.
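For reference, the quality-assessment metrics and a bootstrap confidence interval can be computed as sketched below; the score arrays are placeholders:

# PLCC, SROCC, MAE, RMSE, and a percentile-bootstrap 95% CI (illustrative).
import numpy as np
from scipy import stats

def quality_metrics(y_true, y_pred):
    plcc, _ = stats.pearsonr(y_true, y_pred)
    srocc, _ = stats.spearmanr(y_true, y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return plcc, srocc, mae, rmse

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
        vals.append(metric(y_true[idx], y_pred[idx]))
    return np.percentile(vals, [2.5, 97.5])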
Results
Evaluation of view classification task
The overall accuracy of the view classification task on the test set was 97.8% (95%CI, 97.7–98.0), with macro-average PRE, SEN, SPE, and F1 all exceeding 94.8% (Table 3). The confusion matrix in Fig. 4 indicates that the model is most prone to confusion among the three parasternal short axis views. The Grad-CAM maps in Fig. 5 reveal the image regions on which the model focuses when making classification decisions. For the A4C view, the model focuses on the mitral valve, tricuspid valve, ventricular septum, and atrial septum; for the PLAX view, on the aortic and mitral valves; and for the PSPA view, on the pulmonary artery wall and pulmonary valve. Additionally, for the three similar parasternal short axis views, the model effectively focuses on key anatomical structural details, including the fish-mouth-like mitral valve orifice (mitral valve level), the two strongly echogenic papillary muscles (papillary muscle level), and the annular left ventricular wall structure (apical level). To further demonstrate the robustness of the model, Grad-CAM maps following data augmentation and under pathological conditions are provided in Supplementary Figures S3 and S4 online, respectively.
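A minimal Grad-CAM sketch using forward and backward hooks is shown below for illustration; it is not the visualization code used for Fig. 5, and the choice of target layer and the assumption that the model returns classification logits are placeholders:

# Grad-CAM: GAP of gradients as channel weights, weighted activation sum, ReLU.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(image)                  # assumes the model returns class logits
    model.zero_grad()
    logits[0, class_idx].backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients
    cam = F.relu((weights * acts[0]).sum(dim=1))         # weighted sum over channels
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]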
Evaluation of quality assessment task
The results of the quality assessment task on the test set are presented in Table 4. The average PLCC and SROCC values were 0.898 (95%CI, 0.893–0.902) and 0.893 (95%CI, 0.888–0.897), respectively, indicating a strong correlation between the model-predicted and expert subjective scores. The average MAE and RMSE values were 6.54 (95%CI, 6.43–6.66) and 9.42 (95%CI, 9.24–9.60), respectively, which are within an acceptable range relative to the 0–100 label range. Representative scored samples for each view are shown in Fig. 6; image quality visibly improves as the predicted score increases.
Effect of different backbones and additional modules on the proposed method
The performance of the proposed method when implemented on different backbone networks is presented in Table 5. Compared with the other CNNs, VGG16 achieved the best trade-off between accuracy, number of parameters, and inference time. To analyze the effectiveness of each module in the proposed method, ablation experiments were conducted using the VGG16-based quality assessment model (single task) as the baseline. As shown in Table 6, with the sequential addition of the neck network, the view classification task, and the supervised contrastive loss, the performance of our model improved significantly. Furthermore, we compared model performance with and without the "others" view included.
Application of the proposed method for echocardiographic image quality analysis
To verify the feasibility of our proposed method for standard view quality assessment, we compared the archived image quality among three groups of echocardiographers with different levels of experience (3 junior, 3 senior, and 3 expert). The junior group had 1–2 years of experience, the senior group 4–5 years, and the expert group over 10 years. The distribution of ultrasound machine manufacturers among the three groups was relatively balanced. We hypothesized that the expert group would produce higher-quality images than the other groups. Images collected by the nine echocardiographers between July and December 2023 were scored using the proposed model. The subjects were males aged 18–40 years without obvious cardiac structural or functional abnormalities. Based on the predictions, 6000 images were randomly selected from each echocardiographer, comprising 1000 images per view type. In total, 54,000 images from nine echocardiographers, covering the six standard views, were used for statistical analysis.
The Kruskal-Wallis test indicated a significant difference in quality scores across the three groups of echocardiographers for each view (p < 0.001). The box plots further illustrate the distribution of quality scores for the three groups across the six standard views (Fig. 7). After adjusting for multiple comparisons, the median quality score of the expert group was higher than that of both the senior and junior groups for each view (p-adj < 0.001). Except for the PSPA view, the median quality score of the senior group was higher than that of the junior group (p-adj < 0.001).
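A sketch of this group comparison is shown below; it assumes the scikit-posthocs package for Dunn's test with Bonferroni adjustment (the paper reports using SPSS or Python, so this is an illustrative substitute), and the per-group score arrays are placeholders:

# Kruskal-Wallis across the three experience groups, then Dunn-Bonferroni post hoc.
from scipy import stats
import scikit_posthocs as sp

h_stat, p_value = stats.kruskal(junior_scores, senior_scores, expert_scores)
if p_value < 0.05:
    p_adj = sp.posthoc_dunn([junior_scores, senior_scores, expert_scores],
                            p_adjust="bonferroni")   # pairwise adjusted p-values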
Discussion
In this study, we developed and validated a multi-task model that simultaneously performs view recognition and percentage-scale image quality assessment for seven view categories. The rationale for integrating these two tasks into a single model is that they are interrelated: the view type determines the focus of the quality assessment. The ablation results show that it is feasible to train a generic model to extract features from different echocardiographic views for quality assessment. Furthermore, introducing view classification as an auxiliary task provides additional support for feature learning, which improves quality assessment performance. The model performs well on both tasks. For the view classification task, misclassifications were concentrated among the three parasternal short axis views, mainly because of the high similarity of their anatomical structures. However, guided by the supervised contrastive loss to learn distinctive feature representations, the model still achieved relatively accurate recognition. For the quality assessment task, the results show that the proposed model, by incorporating quality-aware features at different scales, effectively learned the judgment criteria used by human experts in echocardiographic image quality assessment. Even for the PSAX-AP view, which has a relatively small sample size, the proposed model achieved acceptable results.
Compared with previous methods, our study has several strengths. For quality assessment, we summarized four clinically significant quality attributes that ensure image quality scores are closely aligned with diagnostic value. We applied the model to analyze archived images from echocardiographers of varying experience levels and confirmed that those with more experience produced higher-quality images. This demonstrates the model's ability to effectively analyze image quality from a clinical diagnostic perspective. Regarding model design, prior methods focused solely on high-level single-scale features and overlooked low-level details. To address this issue, we added a hierarchical neck network to perform multi-scale perception modeling at a low computational cost, simulating the human visual system's hierarchical processing of visual stimuli at different scales. The results indicate that quality assessment performance improved significantly when high-level semantic information was adaptively integrated with low-level detail information through the neck network. From a clinical application perspective, our study operates on echocardiographic images, making it more pertinent to real-world practice than video-based studies. The proposed model scores static images at each moment, avoiding the complexity and computational cost incurred by using a 2D + t model to process dynamic video data. Additionally, our dataset encompasses six standard views and classifies all other views into an "others" category, enabling the model to directly differentiate between the six target views and other views. The results show that introducing the “others” view increases data diversity and improves view classification performance, with only a minor compromise in quality assessment performance. In contrast, the model proposed by Luong et al.26 does not include an "others" category and relies solely on a confidence threshold to classify images: images with confidence below the threshold are assigned to the "others" category, while those with confidence above the threshold are assigned to one of the target views. However, since lower-quality target views may also exhibit lower confidence, setting an effective threshold to distinguish them from "others" is difficult. Particularly in AI-assisted diagnosis, misclassifying other views as the required standard views can significantly affect diagnostic accuracy.
The proposed multi-task model reduces deployment overhead by merging the two tasks, achieving a good trade-off between accuracy and inference time. Deployed on a 3090 GPU, the model required no more than 2.8 ms to process a 224 × 224 pixel frame. The model can be developed as part of a QC system applicable to several clinical scenarios. In echocardiography training, immediate feedback on view type and quality score can help novice operators master the technical essentials of echocardiography more quickly and alleviate the shortage of instructors in underdeveloped areas43. During echocardiographic examinations, the system can assist operators in standardizing imaging and monitoring the progress of view acquisition, reducing measurement variability and improving diagnostic quality44,45. Furthermore, the system can perform post hoc analysis of large-scale stored images or serve as a preprocessing step for AI-assisted diagnostic systems, selecting high-quality, interpretable cardiac ultrasound images from pre-stored data.
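Single-frame latency figures of this kind depend on hardware and implementation details; the sketch below shows one common way such a measurement could be made (warm-up plus explicit GPU synchronization), and is not the authors' benchmarking code:

# Rough single-frame GPU latency measurement (milliseconds per frame).
import time
import torch

@torch.no_grad()
def time_single_frame(model, device="cuda", n_iters=200):
    model.eval().to(device)
    x = torch.randn(1, 3, 224, 224, device=device)
    for _ in range(20):                  # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters * 1000.0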
Our study has several limitations. First, the standard views covered are limited, and commonly used apical series views such as the apical 2-chamber and apical 3-chamber need to be further incorporated. Since our method does not impose specific constraints on view selection, it can theoretically accommodate additional standard echocardiographic views. Second, our method generates an overall quality score for echocardiographic images, and the individual scoring of different quality attributes should be explored in future research. Third, although the model was developed with a diverse dataset, its robustness and reliability still necessitate further validation in real-world clinical settings.
Data availability
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.
Code availability
The code for this work is available upon request.
References
Lang, R. M. et al. Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. Eur. Heart J. Cardiovasc. Imaging 16(3), 233–271 (2015).
Mitchell, C. et al. Guidelines for performing a comprehensive transthoracic echocardiographic examination in adults: recommendations from the American Society of Echocardiography. J. Am. Soc. Echocardiogr. 32(1), 1–64 (2019).
Nagata, Y. et al. Impact of image quality on reliability of the measurements of left ventricular systolic function and global longitudinal strain in 2D echocardiography. Echo Res. Pract. 5(1), 28–39 (2018).
Foley, T. A. et al. Measuring left ventricular ejection fraction-techniques and potential pitfalls. Eur. Cardiol. 8(2), 108–114 (2012).
Zhou, J., Du, M., Chang, S. & Chen, Z. Artificial intelligence in echocardiography: detection, functional evaluation, and disease diagnosis. Cardiovasc. Ultrasound 19(1), 1–11 (2021).
Letnes, J. M. et al. Variability of echocardiographic measures of left ventricular diastolic function. The HUNT study. Echocardiography 38(6), 901–908 (2021).
Liao, Z. et al. On modelling label uncertainty in deep neural networks: automatic estimation of intra-observer variability in 2d echocardiography quality assessment. IEEE Trans. Med. Imaging 39(6), 1868–1883 (2019).
Ouyang, D. et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 580(7802), 252–256 (2020).
Liu, B. et al. A deep learning framework assisted echocardiography with diagnosis, lesion localization, phenogrouping heterogeneous disease, and anomaly detection. Sci. Rep. 13(1), 3 (2023).
Barry, T. et al. The Role of Artificial Intelligence in Echocardiography. J. Imaging 9(2), 50 (2023).
Sehly, A. et al. Artificial Intelligence in Echocardiography: The Time is Now. Rev. Cardiovasc. Med. 23(8), 256 (2022).
Kusunose, K. Steps to use artificial intelligence in echocardiography. J. Echocardiogr. 19(1), 21–27 (2021).
Wang, W. et al. An Automated Heart Shunt Recognition Pipeline Using Deep Neural Networks. J. Imaging Informatics Med. 1–16 (2024).
Madani, A., Arnaout, R., Mofrad, M. & Arnaout, R. Fast and accurate view classification of echocardiograms using deep learning. NPJ Digit. Med. 1(1), 6 (2018).
Santosh Kumar, B. P. et al. Fine-tuned convolutional neural network for different cardiac view classification. J. Supercomput. 78(16), 18318–18335 (2022).
Belciug, S. Deep learning and Gaussian mixture modelling clustering mix a new approach for fetal morphology view plane differentiation. J. Biomed. Inform. 143, 104402 (2023).
Wu, L. et al. Standard echocardiographic view recognition in diagnosis of congenital heart defects in children using deep learning based on knowledge distillation. Front. Pediatr. 9, 770182 (2022).
Yang, S. et al. Maniqa: Multi-dimension attention network for no-reference image quality assessment. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. 1191–1200 (2022).
Zhang, S. et al. CNN-based medical ultrasound image quality assessment. Complexity 2021(1), 9938367 (2021).
Zhang, F., Yoo, Y. M., Koh, L. M. & Kim, Y. Nonlinear diffusion in Laplacian pyramid domain for ultrasonic speckle reduction. IEEE Trans. Med. Imaging 26(2), 200–211 (2007).
Czajkowska, J., Juszczyk, J., Piejko, L. & Glenc-Ambroży, M. High-frequency ultrasound dataset for deep learning-based image quality assessment. Sensors 22(4), 1478 (2022).
Huang, K. C. et al. Artificial intelligence aids cardiac image quality assessment for improving precision in strain measurements. Cardiovasc. Imaging 14(2), 335–345 (2021).
Zhang, J. et al. Fully automated echocardiogram interpretation in clinical practice: feasibility and diagnostic accuracy. Circulation 138(16), 1623–1635 (2018).
Zamzmi, G., Rajaraman, S., Hsu, L. Y., Sachdev, V. & Antani, S. Real-time echocardiography image analysis and quantification of cardiac indices. Med. Image. Anal. 80, 102438 (2022).
Abdi, A. H. et al. Automatic quality assessment of echocardiograms using convolutional neural networks: feasibility on the apical four-chamber view. IEEE Trans. Med. Imaging 36(6), 1221–1230 (2017).
Luong, C. et al. Automated estimation of echocardiogram image quality in hospitalized patients. Int. J. Cardiovasc. Imaging 37, 229–239 (2021).
Labs, R. B., Vrettos, A., Loo, J. & Zolgharni, M. Automated assessment of transthoracic echocardiogram image quality using deep neural networks. Intell. Med. 3(03), 191–199 (2023).
Ding, Y. et al. AP-CNN: Weakly supervised attention pyramid convolutional neural network for fine-grained visual classification. IEEE Trans. Image Process. 30, 2826–2836 (2021).
Zhang, Y. & Yang, Q. An overview of multi-task learning. Natl. Sci. Rev. 5(1), 30–43 (2018).
Xu, Z., Zhang, Q., Li, W., Li, M. & Yip, P. S. F. Individualized prediction of depressive disorder in the elderly: a multitask deep learning approach. Int. J. Med. Inform. 132, 103973 (2019).
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B. & Belongie, S. Feature pyramid networks for object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2117–2125 (2017).
Howard, A. et al. Searching for mobilenetv3. Proc. IEEE/CVF Int. Conf. Comput. Vis. 1314–1324 (2019).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 4700–4708 (2017).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Int. Conf. Learn. Representations 1–14 (2015).
Tan, M. & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. Int. Conf. Mach. Learn. 6105–6114 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 770–778 (2016).
Liu, Z., Mao, H., Wu, C. Y., Feichtenhofer, C., Darrell, T. & Xie, S. A convnet for the 2020s. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 11976–11986 (2022).
Zhao, Y. et al. Detrs beat yolos on real-time object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 16965–16974 (2024).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. Preprint at https://doi.org/10.48550/arXiv.2010.11929 (2021).
Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 7132–7141 (2018).
Khosla, P. et al. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 33, 18661–18673 (2020).
Liebel, L. & Körner, M. Auxiliary tasks in multi-task learning. Preprint at https://doi.org/10.48550/arXiv.1805.06334 (2018).
Narang, A. et al. Utility of a deep-learning algorithm to guide novices to acquire echocardiograms for limited diagnostic use. JAMA Cardiol. 6(6), 624–632 (2021).
Ferraz, S., Coimbra, M. & Pedrosa, J. Assisted probe guidance in cardiac ultrasound: A review. Front. Cardiovasc. Med. 10, 1056055 (2023).
Zhang, Z. et al. Artificial intelligence-enhanced echocardiography for systolic function assessment. J. Clin. Med. 11(10), 2893 (2022).
Funding
This work was supported by the Sichuan Science and Technology Project (grant no. 2023YFQ0006).
Author information
Authors and Affiliations
Contributions
X.L.: Software, Formal analysis, Methodology, Writing - Original Draft. H.Z.: Conceptualization, Data curation, Writing - Review & Editing. J.Y.: Methodology, Writing - Review & Editing. L.Y., W.L., G.D. and B.P.: Data curation, Writing - Review & Editing. S.X.: Conceptualization, Methodology, Writing - Original Draft, Supervision, Funding acquisition. All authors approved the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study was approved by the Ethics Board of Sichuan Provincial People's Hospital (No. 2023–407). The Ethics Board of Sichuan Provincial People's Hospital also approved the waiver of informed consent for this study.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, X., Zhang, H., Yue, J. et al. A multi-task deep learning approach for real-time view classification and quality assessment of echocardiographic images. Sci Rep 14, 20484 (2024). https://doi.org/10.1038/s41598-024-71530-z