Main

With the ever-growing volume of available data in the biomedical domain, artificial intelligence and machine learning (AI/ML) models are showing great potential across a wide range of healthcare applications, from medical image computing and analysis1,2,3,4 to implantable health monitoring5. The performance of AI/ML models has been demonstrated to be comparable or even superior to that of human experts in various healthcare applications6,7. The adoption of AI/ML models in healthcare has substantially reduced labour costs and freed doctors from tedious manual work8. With increasing computing power and ever-growing healthcare data, AI/ML models are achieving better inference performance, but their size and demand for resources are increasing exponentially.

Unfortunately, with the exponential growth in model size, the associated increase in computational complexity and the rapid growth in the volume of health data, the development of accurate AI/ML models, which often consume extensive resources for training as well as testing, is facing critical sustainability issues in relation to energy, storage, computing power, networking and domain expertise. For healthcare applications with access to powerful computing infrastructure, the energy consumed in operating large AI/ML models may also become unsustainable as advances in hardware platforms slow down. As model sizes grow, the amount of health data required for training will also increase dramatically. For healthcare applications for which cloud servers are easily accessible through network connections, a storage sustainability issue arises in relation to upgrading and maintaining current storage infrastructures, given the high associated costs and often limited budgets. For applications that are restricted to edge computation on embedded hardware, the energy, computing power, networking and storage sustainability issues are even more severe, as these platforms already face constraints on power as well as requirements in areas such as security, privacy and latency. In addition to hardware-related sustainability issues, the limited supply of domain expertise for health data labelling has also restricted the development of AI/ML in healthcare.

Although there have been attempts to address certain types of resource constraint in AI/ML for healthcare, the proposed methods have mostly been devised to ‘passively’ deal with specific resource constraint issues. Few known methods have been systematically designed to proactively tackle resource sustainability issues for current or future developments in general. We believe that addressing bottlenecks in algorithm and system design with sustainability awareness, and promoting collaborations between academia and industry, are key to resolving emerging resource sustainability issues. In this Perspective, we first demonstrate that resource sustainability issues are commonplace in AI/ML methods for healthcare applications, then discuss various algorithmic and system approaches that can help alleviate these issues. Finally, we outline future directions for proactively and prospectively tackling them.

Resource sustainability issues

Sustainability is critical for AI/ML applications in healthcare. In previous developments of AI/ML healthcare systems, resource sustainability issues were often neglected, and it was implicitly assumed that there would always be adequate resources for future AI-based health data analysis. However, for AI/ML-based health applications where powerful servers and supercomputers are readily accessible, the current technology evolution trends may continue, but at the cost of unsustainable energy consumption. According to estimates, carbon emissions must be reduced by half over the next ten years to prevent an increase in the frequency of natural disasters9. For those applications that have to be conducted on the edge due to security, privacy and/or real-time constraints, the push for much more advanced deep learning technologies will soon hit a wall in terms of unsustainable energy budget, computing power, network bandwidth, storage capacity and so on. Furthermore, the shortage of healthcare domain expertise and expert time for diagnosing, labelling and cross-validating diagnoses will become more severe as the volume of health data increases. We will focus on these critical resource sustainability issues in AI/ML for healthcare.

Energy consumption outpaces efficiency gains

We have examined data from recent years regarding the capacity of state-of-the-art (SOTA) deep neural networks for healthcare applications, as well as the energy efficiency of hardware platforms. The results show an energy sustainability gap between the increasing model complexity required for better accuracy and the advances in the hardware architectures needed to accommodate these deep models.

Figure 1a shows that the memory energy efficiency of hardware platforms is not keeping up with the increasing sizes of networks, resulting in a growing gap between the two. Representative networks were chosen from two prevalent healthcare tasks (medical image segmentation and biomedical information processing). These two tasks are representative examples of fundamental information extraction processes in the healthcare field, from visual (for example, medical images) and textual (for example, electronic healthcare records) modalities10. The dashed lines in Fig. 1a show that the number of parameters of deep neural networks has increased exponentially over the past few years. The solid lines in Fig. 1a present the energy efficiency of static random-access memory (SRAM) and dynamic random-access memory (DRAM). Data movement dominates the energy consumption of hardware platforms, and the total amount of energy consumed by memory is directly proportional to the number of deep model parameters11. From 2011 to 2017, the energy efficiency of DRAM, including double data rate, low-power double data rate, graphics double data rate and three-dimensional (3D) DRAM, increased exponentially, following Moore’s law-based complementary metal–oxide–semiconductor scaling11,12. Since 2017, DRAM has not experienced a dramatic improvement in terms of energy efficiency. The energy efficiency trend of SRAM is typically bounded by Moore’s law, because SRAM is realized with complementary metal–oxide–semiconductor transistors11. The trend lines in Fig. 1a show that improvements in memory energy efficiency cannot keep up with the increasing sizes of deep models in healthcare. Accordingly, as memory energy demand rises, sustainability issues will arise when the limited energy budget for network inference and training is exceeded.

Fig. 1: Unsustainable energy resource issues caused by the gap between model complexity and efficiency.

a, Dashed lines show the trend of the number of parameters versus years for representative AI/ML models in medical image segmentation (from 7 million parameters with U-Net73 to 200 million parameters with Swin Transformer74) and biomedical information processing (from 2 million parameters with the autoencoder eNRBM75 to 8.9 billion parameters for the BERT-based GatorTron76), respectively. Solid lines show the memory energy efficiency of DRAM and SRAM. Memory efficiency cannot accommodate the exponentially increasing number of memory accesses under an unsustainable energy budget. b, Dashed lines show the trend of the number of floating point operations versus years for representative AI/ML models in medical image segmentation (from 4.84 GFLOPs for U-Net73 to 362.1 GFLOPs for SETR77) and biomedical information processing (from 88 GFLOPs for BioBERT78 to 18,000 GFLOPs for GatorTron76), respectively. The solid line shows the trend of the computation energy efficiency of SOTA GPUs from 2011 to 2022. Computation energy efficiency cannot accommodate the exponentially increasing number of required floating point operations. Only representative AI/ML models are annotated each year. Both y axes are in log scale. Data are taken from refs. 11,12,13,21,73,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106.

An energy sustainability issue also arises in relation to the exponentially increasing computational demands of AI/ML models compared with the relatively slow improvement in computation energy efficiency (Fig. 1b). The dashed lines in Fig. 1b show that the number of giga floating point operations (GFLOPs) of networks in medical image segmentation and biomedical information processing increased exponentially from 2015 to 2022. The corresponding advancement in hardware computation efficiency is shown by the solid line in Fig. 1b. The float32-precision efficiency of SOTA graphics processing units (GPUs) in desktops and servers is reported for each year, with GFLOPs per joule used as the measure of hardware computation energy efficiency. Figure 1b shows that the improvement in computation energy efficiency cannot keep up with the increasing computational complexity of deep models in either the medical image segmentation or the biomedical information processing task. Although the energy consumption of a single inference may not grow at the same rate as the number of model parameters, thanks to systematic optimization13, the total energy consumption of frequent and repeated inferences in healthcare tasks remains unsustainable with the limited energy budgets available.
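
To make the arithmetic behind this gap concrete, the energy of a single inference can be approximated by dividing a model's floating point operation count by the hardware's computation energy efficiency. The minimal sketch below uses the GFLOP counts quoted above; the GPU efficiency figure and daily inference count are assumed, illustrative values rather than measurements of any specific device or clinic.

```python
# Rough, illustrative estimate of per-inference compute energy:
# energy (J) ≈ workload (GFLOPs) / hardware efficiency (GFLOPs per joule).
# The efficiency and workload-frequency figures are assumed values,
# not measurements of any specific GPU or deployment.

WORKLOAD_GFLOPS = {
    "U-Net (segmentation)": 4.84,    # from the text
    "SETR (segmentation)": 362.1,    # from the text
    "BioBERT (text)": 88.0,          # from the text
    "GatorTron (text)": 18_000.0,    # from the text
}

GPU_EFFICIENCY_GFLOPS_PER_JOULE = 100.0  # assumed, order-of-magnitude value

def inference_energy_joules(gflops: float, efficiency: float) -> float:
    """Energy per inference in joules, ignoring memory and system overheads."""
    return gflops / efficiency

if __name__ == "__main__":
    daily_inferences = 10_000  # assumed clinical workload, for illustration only
    for name, gflops in WORKLOAD_GFLOPS.items():
        e = inference_energy_joules(gflops, GPU_EFFICIENCY_GFLOPS_PER_JOULE)
        daily_kwh = e * daily_inferences / 3.6e6  # 1 kWh = 3.6e6 J
        print(f"{name:28s} {e:10.2f} J/inference  {daily_kwh:8.3f} kWh/day")
```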

Furthermore, training large AI models requires massive amounts of energy, substantially more than performing inference14,15,16. For example, full training of a BERT model consumes ~103,593 kWh of electricity9. The use and deployment of AI models can also contribute to carbon emissions through the use of energy-intensive hardware, such as GPUs and tensor processing units (TPUs). To obtain a model architecture with better accuracy, running a neural architecture search over all BERT model parameters would consume even more electricity (656,347 kWh, as much as six times that of full training of a BERT model17,18) and cause more carbon emissions (626,155 pounds of CO2, as much as the lifetime emissions of five cars9,18,19).

As shown in Fig. 1, it is clear that the energy efficiency of leading hardware (that is, memory and computing capacity) cannot keep up with the deep model complexity required for better accuracy and broader usage. Simply increasing the number of hardware platforms to accommodate more complex deep models is not a sustainable solution due to the limited energy budget.

Computing power demand outpaces performance density gains

The use of AI/ML in healthcare requires large amounts of computing power, and this can have substantial sustainability implications. The rapid development of AI/ML models, with their growing computational demand, has led to a constant need for more powerful hardware, such as GPUs, to accommodate this increasing demand for computing power. As shown in Fig. 1b, the number of floating point operations performed by one inference increases from 4.84 GFLOPs for U-Net to 18,000 GFLOPs for GatorTron. The required computing power further increases in scale with the number of model inferences conducted for various tasks. Training AI/ML models is an even more computing-intensive task. For example, it requires 1.9 × 10⁵ peta FLOPs (PFLOPs) to train a BERT model20 for biomedical information processing and 3.1 × 10⁸ PFLOPs to train the GPT-3 model utilized for interactive computer-aided diagnosis21,22. A lack of computing power would be an obstacle to training an AI/ML model or conducting model inference, especially given the current pursuit of large models, further hindering technological progress.

The rapidly growing computing power demand for AI/ML model development and deployment has led to an unsustainable situation in which simply increasing the number of hardware platforms cannot address the issue of a limited budget, as shown in Fig. 2. Figure 2a depicts how the performance density (the computing capacity per unit area, in giga floating point operations per second (GFLOPS) per square millimetre) of leading GPUs has gradually improved over the past few years. From 2015 to 2022, performance density improved by around tenfold as process sizes scaled down. However, this growth rate still cannot catch up with the growth rate of AI/ML model computational complexity. The result is an exponentially increasing computation density demand (that is, the time and chip area product to complete one model inference on the leading GPU with the highest performance density), as indicated by the dashed lines in Fig. 2a. The computation density demand is approximately obtained by dividing the total number of GFLOPs of a model by the performance density of the leading GPU (GFLOPS mm⁻²) in the same year the model was developed. The same applies to performance cost (the computing capacity per unit expense, in GFLOPS per US dollar), as shown in Fig. 2b. The price of each year’s leading GPU is adjusted to match buying power in January 2023 by accounting for inflation, using the tool provided by the US Bureau of Labor Statistics23. As the performance cost of leading GPUs increases, the computation expense demand (the time and expense product to complete one model inference on the leading GPU with the best performance cost) grows exponentially in both healthcare tasks, as indicated by the dashed lines. The computation expense demand is approximately obtained by dividing the total number of GFLOPs of a model by the performance cost of the leading GPU (GFLOPS per US dollar) in the same year the model was developed.
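
The two derived quantities can be reproduced with a few lines of arithmetic, as sketched below: computation density demand divides a model's GFLOP count by the leading GPU's performance density, and computation expense demand divides the same count by the GPU's performance cost. The GPU figures in the example are placeholders, not the values plotted in Fig. 2.

```python
# Illustrative reproduction of the two derived metrics used in Fig. 2.
# The GPU figures below are placeholders; substitute the leading GPU's
# specifications for the year in which the model was published.

def computation_density_demand(model_gflops: float,
                               gpu_gflops_per_s_per_mm2: float) -> float:
    """Time x chip-area product (s * mm^2) to run one model inference."""
    return model_gflops / gpu_gflops_per_s_per_mm2

def computation_expense_demand(model_gflops: float,
                               gpu_gflops_per_s_per_dollar: float) -> float:
    """Time x expense product (s * USD) to run one model inference."""
    return model_gflops / gpu_gflops_per_s_per_dollar

if __name__ == "__main__":
    model_gflops = 18_000.0   # GatorTron inference workload (from the text)
    gpu_perf_density = 50.0   # assumed: GFLOPS per mm^2 of the leading GPU
    gpu_perf_cost = 20.0      # assumed: GFLOPS per US dollar (inflation-adjusted)
    print(computation_density_demand(model_gflops, gpu_perf_density))  # s * mm^2
    print(computation_expense_demand(model_gflops, gpu_perf_cost))     # s * USD
```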

Fig. 2: Unsustainable computing power resource issues caused by the gap between model complexity and computing capacity.

a, The solid line shows the trend of the increasing performance density of leading GPUs from 2015 to 2022. Dashed lines show the trend of the time and chip area product of SOTA AI/ML model inference conducted on the leading GPU in medical image segmentation (orange) and biomedical information processing (red), respectively. An exponential increase in computing time or chip area over time is needed for both tasks over the past five years. b, The solid line shows the trend of the increasing performance cost of the leading GPUs from 2015 to 2022. Dashed lines show the trend of the time and expense product of SOTA AI/ML model inference conducted on the two healthcare tasks. An exponential increase in computing time or expense is needed for both tasks over the past five years. Both plots show that computing power cannot keep up with the increasing computational demand (in terms of chip area, expense and time) of AI/ML models in healthcare. Only the leading GPUs on desktops or servers are selected for each year. The y axis is in log scale. Data are taken from refs. 11,12,13,21,73,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107.

Accordingly, there is a sustainability issue in that the computing power of leading hardware platforms cannot keep up with the rapidly increasing computational complexity of AI/ML models in healthcare. Simply increasing the number of GPUs, the performance density or the expense budget is not a sustainable solution. Furthermore, computing hardware platforms are generally bounded by Moore’s law12, and the computing capacity of SOTA GPUs may not continue to increase at an exponential rate when Moore’s law ends as we approach the limits of process size11.

Data storage demand outpaces infrastructure development

Along with the increasing model complexity, the biomedical data involved in AI/ML model development have also experienced sharp growth in volume and resolution. We have examined recent medical image datasets utilized in deep model training; the results show that a storage sustainability issue arises for the current storage infrastructure of research centres and medical institutes.

Figure 3 demonstrates a dramatic increase in storage requirements for medical image data. In particular, the solid line in Fig. 3 shows the number of pixels/voxels in a single medical image from representative datasets used in deep model inference. Recent advances in biomedical image acquisition technologies have led to an upscaling in image resolution. For example, the size of three-dimensional (3D) micro-computed tomography (micro-CT) images of mouse skull cartilages and bones has increased from 200 × 512² to 1,500 × 2,000² voxels, requiring around 12 GB to store a single 3D image1. As for the resolution of 2D images, in the ACHIGMU (Affiliated Cancer Hospital and Institute of Guangzhou Medical University) dataset24, a single histopathological image is scanned at ×20 magnification with an average size of 68,096 × 125,440 pixels, which requires ~16.3 GB for a single image. In addition, the data volumes utilized in AI/ML healthcare applications have also become overwhelming. The dashed line in Fig. 3 shows the volumes of representative datasets utilized in deep model training: the memory capacity required to store training datasets has increased from 1.7 GB (for the dataset X rays-Bone25) to 34,779 GB (for the dataset WSIs-Lung26). With increasing model sizes, the data volume required to properly train a deep model is also increasing. The multimodal imaging of proteomics27, cell segmentation28, super high-resolution 3D imaging1,2,3 and the enormous number of CT scan images29 are further straining the storage resources needed for cloud computing infrastructures. For example, the cancer prognostication work described in ref. 30 proposes a deep learning-based multimodal fusion algorithm that uses both whole slide images and molecular profile features (that is, mutation status, copy-number variation and RNA sequencing expression), which amount to over 7,000 GB in data volume.
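
The storage footprint of a raw image follows directly from its dimensions and bits per element, as the short sketch below illustrates for the two examples above. The per-voxel and per-pixel encodings (16-bit voxels, 8-bit RGB pixels) are assumptions for illustration; actual on-disk sizes depend on the file format and compression and so will not exactly match the dataset figures quoted in the text.

```python
# Back-of-the-envelope storage estimate for uncompressed medical images.
# Bit depths are assumed; real on-disk sizes depend on format and compression.

def raw_size_gb(dimensions, bytes_per_element: int) -> float:
    """Uncompressed size in gigabytes (1 GB = 1e9 bytes)."""
    n = 1
    for d in dimensions:
        n *= d
    return n * bytes_per_element / 1e9

# 3D micro-CT volume, assuming 16-bit voxels (2 bytes each)
print(raw_size_gb((1_500, 2_000, 2_000), bytes_per_element=2))  # ~12 GB

# 2D histopathology slide at x20 magnification, assuming 8-bit RGB pixels
print(raw_size_gb((68_096, 125_440), bytes_per_element=3))      # ~25.6 GB uncompressed
```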

Fig. 3: Unsustainable storage issue caused by the increasing data volume and resolution of medical data utilized in AI/ML model development.

The solid line shows the trend of the number of pixels/voxels in a single medical image used in model inference. The dashed line shows the trend of the volume of a medical image dataset utilized for model training. The explosively increasing sizes of a single medical image and image dataset challenge storage infrastructure. Only representative medical image datasets are annotated each year. The y axis is in log scale. Data are taken from refs. 1,24,25,26,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124.

Furthermore, existing storage infrastructures may not be able to keep up with the storage capacity required by medical image datasets. Health institutes and universities often provide two tiers of storage: locally maintained on-campus network-attached storage and storage infrastructure provided and maintained by cloud storage vendors31. The cost of on-campus network-attached storage can be up to US$3,000 per terabyte per year, and the cost of storage on high-performance computation cluster facilities can be up to US$3,600 per terabyte per year31. Worse still, universities and institutes do not generally offer more than a few terabytes of storage. With the increasing data volume and growing number of datasets for model training, the existing storage infrastructure is impractical and unsustainable in terms of both capacity and budget. The limited storage capacity will greatly obstruct AI/ML development in healthcare. When it comes to cloud storage, integrating commercial cloud storage infrastructure into the AI/ML development process is not a simple task, owing to concerns regarding the transparency, privacy and security of sensitive health data7. Although commercial cloud vendors can provide sufficient storage capacity, there is a risk of exposure and misuse of sensitive data without additional legislation and regulation for the third-party cloud storage platforms used for AI in healthcare7,32. As a result, the use of commercial cloud storage for healthcare data remains rare today.

Data transmission outpaces network infrastructure development

The integration of AI/ML into healthcare systems generates large amounts of data and AI/ML models that need to be transmitted between devices and servers for a range of purposes, such as remote diagnosis33 and collaborative learning34. The transmission efficiency of network communication infrastructure is particularly important in healthcare application scenarios where real-time data transmission is critical for patient care. Bandwidth limitations can result in delays or loss of data, which can have serious consequences for point-of-care patient outcomes.

This process thus comes with a substantial sustainability challenge related to network infrastructure. The network infrastructure in healthcare facilities is not designed to handle the massive amounts of data that deep learning requires35. The high-speed data transmission required for real-time decision-making is putting further pressure on the current networking infrastructure, resulting in issues such as excessive latency.

Accordingly, the sustainability of network communication infrastructure is a critical issue in AI for healthcare. It requires immediate attention and efforts towards developing sustainable solutions to establish efficient communication while ensuring the effective processing and analysis of healthcare data using AI/ML.

Data preparation effort outpaces expert capacity

In healthcare applications in particular, data preparation (that is, annotation and label verification) is a critical process for guaranteeing the performance of an AI/ML model and establishing the trust of users and doctors. Often under-appreciated by AI/ML developers, data annotation and verification for AI/ML model training is a labour-intensive and time-consuming task that can take up to several hours for a single image, and it results in expert load sustainability issues, as shown in Fig. 4.

Fig. 4: Unsustainable expert load on health data preparation for AI/ML model training.

a, The unsustainable expert load issue caused by the gap between the increasing number of acquired radiological images and the number of radiologists available for annotation and interpretation. The solid orange, dash-dotted blue and dashed green lines respectively show the numbers of medical images in datasets of X-ray, CT and MRI modalities utilized in AI/ML model training over the past ten years. The yellow dashed line shows the number of registered radiologists in the USA. Training enough qualified radiologists in a short period to catch up with the required annotation of medical images is not feasible. Only representative medical image datasets are annotated each year. The y axis is in log scale. Data are taken from refs. 40,108,110,113,116,119,120,121,122,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155. b, Time taken by healthcare domain experts in the diagnosis and annotation of medical images under different modalities. The histograms show the time (in minutes) spent by the domain expert for CT, MRI, X-ray (XR), ultrasound (US) and pathology (PA) images, respectively. Most diagnostic and annotation tasks require more than 10 min for each image. Data are taken from refs. 36,37,38,39,156,157,158,159.

As AI/ML models with increasing numbers of parameters are adopted in healthcare applications, more images are required for model training to achieve better accuracy. As shown in Fig. 4a, the number of X-ray, CT and magnetic resonance imaging (MRI) images utilized in AI/ML model training is increasing exponentially. Alongside the explosively increasing number and size of medical images utilized in AI/ML model training, the time spent by domain experts on the data preparation of a medical image has remained constant for years as a result of mature diagnostic procedures36,37. For a complicated case, a domain expert will take a long time to complete the diagnostic image analysis needed for interpretation and annotation verification. As shown in Fig. 4b, detailed manual contouring of COVID-19 infection regions on one chest CT scan can take 187 ± 38.5 min (ref. 38), and manually labelling neuroblastic tumours in MRI images requires a mean time of 56 min per case39. Indeed, the pool of qualified domain experts cannot keep up with the data preparation demand. The dashed line with diamond-shaped markers in Fig. 4a shows a fairly small increase in the number of registered radiologists in the USA, from 36,000 to 41,000, over the past ten years40. This slow growth is due to the long training period required to become a qualified domain expert.
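
A rough calculation illustrates why the expert workload cannot scale: multiply a dataset's image count by a per-image annotation time and divide by the total annotation hours the radiologist workforce can realistically spare. In the sketch below the dataset size and the annotation hours each radiologist can devote per year are illustrative assumptions, not values drawn from the cited studies; only the per-case labelling time and the radiologist headcount come from the text.

```python
# Illustrative estimate of how long the radiologist workforce would need
# to annotate a large training dataset. Most inputs are assumptions chosen
# for illustration; see the individual comments.

def labelling_years(num_images: int,
                    minutes_per_image: float,
                    num_experts: int,
                    annotation_hours_per_expert_per_year: float) -> float:
    total_hours = num_images * minutes_per_image / 60.0
    capacity_per_year = num_experts * annotation_hours_per_expert_per_year
    return total_hours / capacity_per_year

if __name__ == "__main__":
    years = labelling_years(
        num_images=10_000_000,                    # assumed dataset size
        minutes_per_image=56,                     # MRI tumour labelling time (ref. 39)
        num_experts=41_000,                       # registered US radiologists (ref. 40)
        annotation_hours_per_expert_per_year=50,  # assumed time spared for labelling
    )
    print(f"{years:.1f} years of labelling effort")
```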

The total load of all domain experts devoted to labelling and verification cannot keep up with the explosively increasing number of medical images needed for training. This could lead to the waste of the abundant medical data made available by advances in healthcare technologies, which could have been effectively used to boost very large-scale deep learning models. Limited domain expertise can also potentially lead to biased models if only a few people contribute to training dataset preparation.

Resource sustainability issues considered from a lifecycle perspective

It is worth noting that the costs and benefits associated with developing and deploying AI systems are not restricted to a single stage such as the training or testing phase. Rather, they span from the design and development stages all the way through to adaptation and implementation, and may continue to evolve with each subsequent use or contribution. To assess the resource impact of an AI system, it is important to consider its entire lifecycle, taking into account the different stages and their associated costs. For example, during the early stages of model development, resources such as energy, computing power, domain expertise and time may be heavily invested in failed experiments and testing with various libraries. As the model evolves into a prototype, testing, software, hardware and computational resources become increasingly focused on reliability, stability and generalizability. Once the system is deployed to users, additional resource costs are incurred to enable efficient human–machine collaboration, potential domain adaptation and/or model upgrades. These costs may further exacerbate the sustainability issues and call for a holistic approach that considers the resource sustainability of an AI system over its entire lifecycle.

Resource sustainability issues considered in edge computing

The application of edge computing is gaining attention in AI for healthcare as it allows data processing and analysis to be done closer to the source of data generation (for example, in implantable5 or wearable devices41) or at the point of care, rather than sending data to a centralized cloud server. This approach offers several benefits, including reduced latency, improved data privacy and security, and more efficient use of network bandwidth. However, resource sustainability issues become more acute due to the stringent resource constraints of these edge platforms. For example, the existing storage infrastructure of edge computing systems generally cannot accommodate the enormous volume of healthcare data for on-device learning. On the other hand, frequent communication to acquire data would increase the burden on network bandwidth and edge device power consumption, and pose potential security risks. The computing power provided by these edge devices is also inherently limited due to physical size constraints and limited energy budgets. Therefore, as the prevalent AI models in healthcare sharply increase in size and require ever more training data, training a model or even performing inference locally will become prohibitively expensive in the near future.

In summary, sustainability issues are critical for AI in healthcare in relation to various types of resource. AI, on the other hand, has the potential to have a substantial positive impact on carbon emission reduction by enabling the electronic delivery of targeted clinical expertise needed in remote areas or underdeveloped regions so that healthcare providers can reduce travel. For example, AI-CHD42 has recently demonstrated its power in providing automatic diagnoses of congenital heart disease, which previously would have required radiologists with very specialized training.

Approaches to solving resource scarcity

The current solutions for addressing resource constraints in AI/ML for healthcare span the perspectives of algorithms and system optimization.

Algorithm perspective

Domain adaptation to reduce resource consumption

In recent years, domain adaptation techniques have been applied to models pretrained on abundant health data to adapt the networks to downstream tasks. In this way, a model pretrained for one domain can be reused in another domain, with minimal energy, computing power and domain expertise costs for the model development process. As a result, an extensively pretrained network requires less training than a network trained from scratch for a new healthcare application. Hence, the amount of training and the number of labelled training samples are reduced, lowering the cost across different resources. In the medical image domain, the pretrain-finetune paradigm has been applied to electron microscopy data4 and mass spectroscopy43 by fine-tuning the pretrained model with a limited amount of task-specific data to adapt to the new domain. A method that learns to ignore the scanner-related features present in MRI images when performing domain adaptation on a multi-site MRI dataset has also been presented44. In the physiological signal domain, the authors of ref. 45 proposed a model personalization method based on meta-learning and fine-tuning for personalized arrhythmia detection and human activity recognition, and an on-device model personalization method based on a generative adversarial network for ventricular arrhythmia detection is proposed in ref. 46.
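
A minimal PyTorch-style sketch of the pretrain-finetune paradigm is shown below: a backbone pretrained on an abundant source domain is frozen, and only a lightweight task head is updated on the small labelled target dataset, which is what keeps the energy, compute and labelling costs low. The backbone choice, layer sizes and data loader are placeholders, not the architectures used in the cited works.

```python
# Minimal sketch of the pretrain-finetune paradigm (not the method of any
# specific cited work). A pretrained backbone is frozen and only a small
# task-specific head is trained on the limited target-domain data.
import torch
import torch.nn as nn
import torchvision

def build_finetune_model(num_classes: int) -> nn.Module:
    backbone = torchvision.models.resnet18(weights="DEFAULT")  # stand-in pretrained model
    for p in backbone.parameters():
        p.requires_grad = False                                 # freeze pretrained features
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new trainable head
    return backbone

def finetune(model: nn.Module, loader, epochs: int = 5) -> None:
    # Only the head's parameters are passed to the optimizer.
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:  # small labelled target-domain dataset
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
```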

Model compression and architecture search to save resources

There are algorithms designed specifically to find tiny networks that use as few parameters as possible, so as to reduce energy consumption and storage in healthcare applications. For example, for medical image segmentation tasks, a compressed CeNN framework47 has been devised to perform incremental quantization and early exit, which substantially reduces computational demands while maintaining acceptable performance on a field-programmable gate array. A fairness-aware pruning strategy, FairPrune48, has been proposed to prune the network weights while maintaining the fairness and accuracy of medical image classification, and a network quantization method has been proposed to reduce the model size to fit the constrained resources of edge devices for medical image segmentation49. There is also work on resource-aware neural architecture search to find best-fit tiny network architectures for real-time electrocardiogram reconstruction50.
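
As an illustration of how compression shrinks the resource footprint, the sketch below applies global magnitude pruning followed by dynamic int8 quantization using standard PyTorch utilities; it is a generic example rather than the CeNN, FairPrune or neural architecture search methods cited above, and the placeholder network stands in for a real medical model.

```python
# Generic compression sketch (magnitude pruning + dynamic int8 quantization),
# not the specific CeNN, FairPrune or NAS methods cited in the text.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(          # placeholder network standing in for a medical model
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 2),
)

# 1) Remove the 50% smallest-magnitude weights across all linear layers.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.5)
for module, name in to_prune:
    prune.remove(module, name)  # make the pruning permanent

# 2) Quantize the remaining weights to int8 for inference.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```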

Self-supervised learning paradigm to reduce expertise involved

There are algorithms designed for self- and semi-supervised learning that reduce the dependency on expert annotation for model training. For example, a self-supervised learning strategy based on context restoration has been proposed to better exploit unlabelled CT scans and 2D ultrasound images51. A self-supervised learning method for MRI images was devised in which the network is trained using a contrastive loss on whether two scans are from the same person (that is, longitudinal), together with a classification loss on predicting the level of vertebral bodies52. A multi-instance contrastive learning method53 that uses multiple images of the underlying pathology was proposed to improve classification accuracy on dermatology and chest X-ray images. ConVIRT was proposed in ref. 54 to learn medical visual representations from naturally occurring paired descriptive text in an unsupervised way; it required only 10% of the labelled training data needed by the supervised learning approach while achieving better performance. In the physiological signal domain, the authors of ref. 55 proposed a contrastive learning approach, CLOCS, that enables the model to learn representations across space, time and patients, achieving comparable atrial fibrillation detection accuracy with only 25% of the labels required by supervised training approaches. An intra–inter subject self-supervised learning model was presented in ref. 56 to effectively learn from intra–inter subject differences; it achieved a 10% improvement over supervised training while requiring only 1% of the labelled data.
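
The common core of these approaches is a contrastive objective that pulls together representations of two views of the same sample (or subject) and pushes apart those of different samples, so no expert labels are needed. Below is a minimal NT-Xent-style loss sketch; it is a generic formulation, not the exact loss of any of the cited works.

```python
# Minimal NT-Xent-style contrastive loss for self-supervised pretraining
# (a generic formulation, not the exact loss of any cited method).
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)             # (2B, dim)
    sim = z @ z.t() / temperature              # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))          # exclude self-pairs
    batch = z1.size(0)
    # The positive for sample i is its other view, at index (i + B) mod 2B.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# Usage: embed two augmentations of unlabelled scans with the same encoder,
# then minimize contrastive_loss(encoder(view1), encoder(view2)).
```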

System perspective

Federated learning to balance workload and resource consumption

From the system perspective, federated learning has become a prevalent way of balancing the resources required for training by offloading the computational workload from a single central server to multiple servers or edge devices. Federated learning is a paradigm that addresses the problem of data governance by training a deep neural network collaboratively without uploading data to a centralized server34. Training occurs locally at each participating site, and only model characteristics (such as parameters and gradients) are transferred. In this way, the energy consumption and computational workload of deep learning-based analysis of overwhelming volumes of healthcare data can be properly managed by distributing the tasks among all participants, which alleviates the sustainability issues on the central server. Cross-institute federated learning is an emerging technique in the healthcare industry that enables multiple hospitals or organizations to collaboratively train a global model without aggregating the raw data on a central server. This technique preserves data privacy and further balances the workload of data processing by avoiding the conventional centralized training paradigm. This learning paradigm has been adopted in the medical domain on the servers of multi-site institutions34, where a global model collaboratively trained by ten institutions reached 99% of the model quality achieved with centralized data. The work in ref. 57 shows that model training tasks can be offloaded across multiple institutions holding real-world private clinical data, with the resulting model demonstrating improved generalization. Clustered federated learning58, which offloads training tasks from servers to local edge devices, has been proposed for automatic COVID-19 diagnosis and achieves performance comparable to that of the central baseline on X-ray and ultrasound datasets. The federated learning framework FedHealth41 was introduced to include wearable devices in collaborative model training to build personalized models for Parkinson’s disease auxiliary diagnosis.
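
The essence of this paradigm is federated averaging: each participating site trains locally on its own data, and only model parameters are sent back and averaged, so raw health data never leave the institution. A minimal sketch is shown below; real deployments add secure aggregation, weighting by dataset size, communication scheduling and fault tolerance.

```python
# Minimal federated-averaging (FedAvg) sketch: raw data stay at each site,
# only model parameters are exchanged. Real systems add secure aggregation,
# client weighting and fault tolerance.
import copy
import torch
import torch.nn as nn

def local_update(model: nn.Module, loader, epochs: int = 1) -> dict:
    """Train a copy of the global model on one site's private data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    local.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict()

def federated_average(states):
    """Average parameters from all sites (equal weighting for simplicity)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        stacked = torch.stack([s[key].float() for s in states])
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg

def federated_round(global_model: nn.Module, site_loaders) -> None:
    states = [local_update(global_model, loader) for loader in site_loaders]
    global_model.load_state_dict(federated_average(states))
```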

Decentralized storage system to mitigate the sustainability issue while preserving data privacy

To tackle the storage constraint, decentralized storage systems for medical data have been developed for healthcare applications. With advances in medical image acquisition techniques, image sizes have increased dramatically. Meanwhile, the amount of medical image data is growing quickly as the use of massive numbers of medical images for clinical decision support has grown. As a result, there is a strong demand for highly scalable data management and sharing in AI/ML for healthcare. Decentralized storage systems have been applied in the field to substantially reduce the storage overhead of centralized cloud systems. For example, a scalable and flexible decentralized medical image management system based on DCMRL/XMLStore and DCMDocStore has been devised59. A healthcare data-sharing scheme has also been proposed to enable efficient and secure medical data sharing via blockchain on a decentralized storage system60.
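
A core mechanism shared by such systems is content addressing: a large image is split into chunks, each chunk is identified by its cryptographic hash, and chunks can be spread across many nodes while the hash manifest guarantees integrity on retrieval. The toy sketch below illustrates only this mechanism; it is a generic illustration, not the DCMRL/XMLStore, DCMDocStore or blockchain schemes cited above, and the in-memory "nodes" are stand-ins for real storage peers.

```python
# Toy content-addressed storage sketch: chunks are addressed by SHA-256 hash,
# placed across nodes, and verified on retrieval. Generic illustration only.
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB chunks (arbitrary choice)

def store(data: bytes, nodes) -> list:
    """Split data into chunks, address each by hash and place it on a node."""
    manifest = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        nodes[len(manifest) % len(nodes)][digest] = chunk  # round-robin placement
        manifest.append(digest)
    return manifest

def retrieve(manifest, nodes) -> bytes:
    """Fetch chunks by hash from whichever node holds them and verify integrity."""
    out = bytearray()
    for digest in manifest:
        chunk = next(n[digest] for n in nodes if digest in n)
        assert hashlib.sha256(chunk).hexdigest() == digest, "corrupted chunk"
        out.extend(chunk)
    return bytes(out)

# Usage: nodes = [{}, {}, {}]; manifest = store(image_bytes, nodes);
# assert retrieve(manifest, nodes) == image_bytes
```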

Outlook

Current attempts to tackle resource constraints in AI/ML for healthcare propose algorithm- and system-perspective optimizations. However, there is still a lack of sustainability awareness when developing systems and algorithms in AI/ML for healthcare. In this section we present an outlook for potential solutions for resource sustainability issues.

Cost-aware cross-layer co-design for AI/ML healthcare systems

It is well known that hardware and the performance of AI models, including their accuracy61, confidence62 and even security63, are entangled. To enable efficient design space exploration, it is critical to develop a cross-layer co-exploration framework that spans hardware, algorithms and models to identify the best configurations for resource-sustainable healthcare applications64,65,66,67. In addition, a resource cost model is needed that accurately estimates or predicts resource consumption for different combinations of hardware/software resources, neural network components and algorithm design options for each specific AI/ML-based healthcare application68. Moreover, the prediction model should also be able to predict the cost of maintaining sufficient resource availability in terms of energy, computing power, storage, networks, algorithms, domain expertise and data volumes for model upgrades or possible domain adaptation from a lifecycle perspective. AI/ML models can potentially help in making accurate predictions to reduce waste and carbon emissions. With the cost prediction model and co-design framework, designers can customize the optimization goals in terms of a newly updated infrastructure, a specific resource budget and other optimization criteria for the entire lifecycle.
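
One way to operationalize such a framework is a simple design-space sweep: enumerate candidate (model, hardware) configurations, predict their resource costs with the cost model, and keep the cheapest configuration that meets the accuracy and budget constraints. The sketch below is deliberately schematic; the candidate configurations, cost figures and constraints are all placeholders, whereas a real cost model would be fitted to profiling data and lifecycle estimates.

```python
# Schematic cost-aware design-space exploration: pick the lowest-energy
# (model, hardware) pair that satisfies accuracy and budget constraints.
# All numbers are placeholders, not measured values.
from itertools import product

models = [  # (name, predicted accuracy, GFLOPs per inference, parameters in millions)
    ("tiny-unet", 0.86, 2.0, 3),
    ("unet", 0.90, 4.84, 7),
    ("swin-transformer", 0.93, 120.0, 200),
]
hardware = [  # (name, GFLOPs per joule, USD per device)
    ("edge-accelerator", 200.0, 300),
    ("server-gpu", 80.0, 10_000),
]

def predicted_cost(gflops, params_m, efficiency, device_price, inferences=1e6):
    energy_j = inferences * gflops / efficiency  # compute energy over the deployment
    storage_gb = params_m * 4 / 1000             # float32 weights
    return {"energy_j": energy_j, "storage_gb": storage_gb, "price_usd": device_price}

best = None
for (m, acc, gflops, params), (hw, eff, price) in product(models, hardware):
    cost = predicted_cost(gflops, params, eff, price)
    # Example constraints: minimum accuracy, energy budget and device price cap.
    if acc >= 0.88 and cost["energy_j"] <= 5e7 and price <= 5_000:
        if best is None or cost["energy_j"] < best[2]["energy_j"]:
            best = ((m, hw), acc, cost)
print(best)
```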

Consensus-based distributed learning

A novel consensus-based distributed learning framework should be developed to fully utilize the storage resources of existing and future computing infrastructures. Existing distributed learning paradigms, such as cross-institute federated learning in the healthcare field, still rely heavily on tedious verification and authorization before the actual learning process. They also largely rely on data centre infrastructure for data storage and sharing. Consensus-based distributed learning could be a future direction for the field: by incorporating a large number (for example, millions) of Internet-of-Things devices, edge servers and cloud centres into the learning infrastructure, it would fully utilize the storage capacity and computational power of these devices based on a smart consensus strategy that protects data privacy while enabling fast data sharing and learning. Effective consensus mechanisms must be scalable, flexible and lightweight so as to fit heterogeneous system structures and requirements in distributed learning.

Stable infrastructures with AI-enhanced resource allocation

Given regulatory considerations regarding health data security and privacy, instead of pushing towards better use of general-purpose commercial cloud storage, an alternative and perhaps more viable approach is to establish dedicated healthcare AI infrastructures that maintain full compliance with government regulations, which may evolve over time7. These infrastructures could be fully funded by governments or the private sector, and would secure the preservation of data and support the further development of algorithms. Additionally, AI-based techniques can be applied to optimize resource allocation in terms of storage, computing power and energy efficiency in these infrastructures.

Interpretable self-supervised learning

The conflict between the increasing demand for domain expertise and the increasing volume of healthcare data will become more severe. Self-supervised learning can be a key approach to addressing the sustainability issue in domain expertise. Currently, there is still a substantial barrier to obtaining the trust of providers, doctors and patients due to the black-box nature of deep learning-based diagnosis. A future direction could be self-supervised learning with interpretability/explainability. Interpretable self-supervised learning algorithms can substantially alleviate the expertise sustainability issue by extracting clinically useful features, explaining analysis/diagnosis results with human-interpretable evidence and training models without massive amounts of expert-labelled data, while further gaining the trust of domain experts, providers and patients. Future steps for interpretable self-supervised learning in healthcare applications could include (1) full exploration and use of core medical features (for example, human-understandable features) by self-supervised learning algorithms to improve performance and explainability; (2) human-interpretable inference and reasoning; and (3) evolution of deep models through close interaction with and input from human experts (for example, human-in-the-loop error correction and function enhancement).

Few-shot learning on large language models for automatic labelling and annotation

Advancements in AI/ML algorithms have made it possible to automate the process of data labelling. Using few-shot learning techniques, large language models can be adapted to automatically label meta-features in medical images or biomedical information text. This is achieved by providing prompts and a series of labelled examples, which allow the AI system to learn to recognize patterns and features in the images or text and to label new instances accordingly. Automated labelling can substantially speed up the annotation of large datasets, while also increasing the accuracy and consistency of the labels. Although automation can greatly reduce the time and effort required for annotation, it is important to validate and cross-check automated methods against manual annotations to ensure the highest level of accuracy.
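
A minimal sketch of the few-shot prompting pattern described above is shown below: a handful of labelled report snippets are embedded in the prompt, followed by the unlabelled instance to be annotated. The call to a language model is abstracted behind a placeholder function, since the concrete API depends on the model and vendor chosen, and the example reports and labels are invented for illustration.

```python
# Few-shot prompt construction for automatic labelling of biomedical text.
# `query_llm` is a placeholder for whatever large-language-model API is used;
# the example reports and labels are invented for illustration.

FEW_SHOT_EXAMPLES = [
    ("Chest X-ray shows patchy bilateral opacities.", "abnormal"),
    ("No acute cardiopulmonary abnormality identified.", "normal"),
    ("Small right pleural effusion, unchanged from prior.", "abnormal"),
]

def build_prompt(report: str) -> str:
    lines = ["Label each radiology report as 'normal' or 'abnormal'.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Report: {text}\nLabel: {label}\n")
    lines.append(f"Report: {report}\nLabel:")
    return "\n".join(lines)

def query_llm(prompt: str) -> str:
    raise NotImplementedError("call the chosen LLM API here")

def auto_label(report: str) -> str:
    label = query_llm(build_prompt(report)).strip().lower()
    # Automated labels should still be spot-checked against expert annotations.
    return label
```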

The past few years have witnessed great success in large model design, exemplified by models such as ChatGPT, with its 175 billion parameters69, PaLM with its 540 billion parameters70 and Visual ChatGPT71. It is anticipated that, in the very near future, large AI models will also revolutionize healthcare, enabling improved performance in a broader spectrum of tasks beyond what we have focused on in this Perspective, such as evidence-based medicine and personal health advisors. The explosive increase in model complexity will, however, further stress the already existing resource sustainability issues. With the deep learning stack developed for predictable scalability in GPT-472, the performance of a small model with a limited number of parameters on a healthcare-specific task can be utilized to precisely predict the performance of a standard-scale model, which will substantially alleviate the resource sustainability issue. It is also important to investigate methods that can better support not only model generalization but also specialization, through weakly supervised or unsupervised on-device learning and model personalization.
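
The predictable-scaling idea can be illustrated with a simple power-law fit: measure the loss of several affordable small models on the healthcare task, fit a power law in model size, and extrapolate to the target scale before committing the training budget. The data points below are synthetic and the plain log-log fit omits the irreducible-loss offset used in practice; it is an illustration of the idea, not the method used for GPT-4.

```python
# Illustrative scaling-law extrapolation: fit a power law to the losses of
# small models and predict the loss of a much larger one before training it.
# The data points are synthetic; real use requires measured losses.
import numpy as np

# (parameter count, validation loss) for a few affordable small models (synthetic)
sizes = np.array([1e7, 5e7, 1e8, 5e8])
losses = np.array([2.10, 1.75, 1.62, 1.41])

# Fit log(loss) = slope * log(size) + intercept, i.e. loss ≈ A * size**slope.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)

target_size = 1e10  # scale under consideration but not yet trained
predicted = np.exp(intercept + slope * np.log(target_size))
print(f"predicted loss at {target_size:.0e} parameters: {predicted:.3f}")
```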