Introduction

Modern healthcare makes use of cutting-edge biomedical and nanomedicine technologies to deliver early prevention, accurate diagnoses and precise treatments. Examples include early warning for neurodegenerative disease1,2, imaging-based diagnosis of cardio-cerebrovascular diseases3,4 and tumour-targeting therapies5,6. Further progress is hindered by the uncertainty of the human body, originating from the complex relationships among organs, the unclear effects of the everyday environment on human bodies, and the heterogeneity of different individuals. For example, the mechanisms behind the development of diseases such as amyotrophic lateral sclerosis7 and infant asthma8 remain unclear because it is difficult to determine the precise effects of various external environmental factors and intrinsic factors (such as genes and microbiome) on human physiological conditions. In these case studies, as in many others, new investigational approaches are needed to decode the specific mechanisms of action leading to a disease. To tackle the uncertainty and complexity associated with the human body, a useful approach could be to establish and analyse virtual representations of human organs and functions, inspired by the concept of digital twin (DT) technology.

DTs are virtual replicas or representations of physical objects. The technology has the potential to decode uncertainties within the target system by using sensors to collect information over an extended period and making use of advanced artificial intelligence (AI) algorithms. In this context, DT technology offers an opportunity to determine and predict the status of complex dynamical systems, even in the presence of changing conditions. Enabled by the exponential growth of computational capacity, DTs have been successfully applied in diverse complex industrial scenarios. For example, in the manufacturing sector, DTs are used to enhance production lines and equipment performance, to identify bottlenecks and predict equipment failure, and to optimize various processes through real-time data monitoring and integration with relevant data sources9. Similarly, in transportation, DTs optimize delivery schedules, route planning and fuel efficiency by integrating vehicle data, traffic sensors and weather forecasts10. These advances have sparked interest in applying DTs to the human body, using smart sensors in ‘Internet of Things’ environments for systematic monitoring of human health combined with clinical analysis for disease diagnosis and treatment planning11,12,13.

By making use of accurate and individualized models of human physiology, human body DTs hold immense potential for accurately and swiftly predicting the outcomes of treatments before their real-world implementation. This reduces the risk of adverse reactions and streamlines the regulatory pathway in clinical trials. Nevertheless, the lack of a unifying approach, especially a consensus on the taxonomy and future roadmap of human body DTs, limits their development and deployment for use in healthcare.

Based on an analysis of the state of the art and of technological developments in the field, we anticipate that future human body DTs can be organized into five levels for different healthcare applications. To aid scientific and technological advancement in this promising field of engineering, we present here a five-level blueprint for modelling the human body DT (Fig. 1). The five-level roadmap will serve as a convenient and unifying framework to establish a common language and aid collaboration among researchers in different fields.

Fig. 1: Five-level roadmap for human body digital twins (DTs).
figure 1

Level 1 (cross-sectional model) aims to determine various human health indicators by using artificial intelligence (AI) classification methods on real-time data. Level 2 (deductive model) builds on the first level by incorporating not only real-time data but also past data to train predictive models for forecasting future health conditions. Level 3 (editable model) extends the previous levels by considering human interventions, such as drug treatments, organ transplants and gene editing, to analyse and predict the health outcomes post-intervention. Level 4 (evolutionary model) adds another layer by also taking into account the interactions between the human body and the external environment, such as solar exposure, diet and interpersonal connections, for more accurate health projections. Level 5 (explainable model) is based on the previous four levels and uses explainable AI methods to delve into the underlying logic of health analysis and prediction. It explores not just the output results from data input to health state but also the biological principles involved.

In this Perspective, we provide a detailed introduction to our roadmap for building the human body DT, review the prerequisite data-capturing technologies associated with it and discuss prospective applications. Furthermore, we discuss the necessary support from stakeholders (including data-sharing and test initiatives, research funding and so forth) and the open issues that need to be addressed for the deployment of human body DTs, including security, cost and ethical considerations.

Five-level roadmap and modelling methods

The scarcity of human data, especially data annotated by clinicians, poses a considerable challenge to the advancement of the digital health field12. The development of a human body DT model is no exception. However, recent advances in related technologies offer promising solutions. Specifically, innovations in nanotechnology have led to the design and fabrication of new sensors that are more sensitive, adaptable and comfortable, thereby enabling large-scale collection of human data over extended periods14,15. Moreover, the advent of advanced self-supervised learning (SSL) algorithms allows for the use of copious amounts of unlabelled human data, a previously inconceivable accomplishment (see Box 1). SSL algorithms, combined with large-scale pretraining methods, have proven to be effective in fields such as computer vision and natural language processing. AI models such as DALL·E 2 (which can complete intricate drawings based solely on descriptive sentences)16 and ChatGPT (which can answer various complex questions, including coding and finance strategies)17 are pretrained on immense amounts of unlabelled data and fine-tuned on small, labelled datasets for diverse tasks. The capacity of SSL algorithms to leverage unlabelled data aligns well with the demands of human healthcare and holds immense potential for addressing the shortage of annotated human data in the development of a robust human body DT model. Under the impetus of emerging on-body sensors and multimodal AI technologies (Fig. 2), human body DT technology is expected to progressively unveil the mysteries of the human body along the five-level modelling roadmap outlined below.
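To make the pretrain-then-fine-tune paradigm concrete, the following minimal sketch (in PyTorch) pretrains an encoder on unlabelled sensor windows by masking random samples and reconstructing them, one common form of SSL; the window length, network sizes and synthetic data are illustrative assumptions rather than a prescribed architecture.

```python
# Minimal self-supervised pretraining sketch (assumed shapes and synthetic data; PyTorch).
import torch
import torch.nn as nn

WINDOW = 256                                   # samples per unlabelled sensor window (assumption)
unlabelled = torch.randn(1024, WINDOW)         # stand-in for a pool of unlabelled wearable data

encoder = nn.Sequential(nn.Linear(WINDOW, 128), nn.ReLU(), nn.Linear(128, 64))
decoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, WINDOW))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(5):                                  # short demonstration loop
    for i in range(0, len(unlabelled), 64):
        x = unlabelled[i:i + 64]
        mask = (torch.rand_like(x) > 0.25).float()      # hide roughly 25% of each window
        recon = decoder(encoder(x * mask))              # reconstruct from the masked view
        loss = ((recon - x) ** 2 * (1 - mask)).mean()   # penalise errors only on the hidden samples
        opt.zero_grad()
        loss.backward()
        opt.step()

# The pretrained `encoder` can then be fine-tuned on a small labelled dataset
# by attaching a task-specific head, mirroring the pretrain/fine-tune recipe described above.
```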

Fig. 2: Must-have technologies to build human body digital twins.
figure 2

The sensing devices that can be used to capture human body data are represented on the left side of the figure; the algorithms to model the human body on the right. The development stages of the human digital twin (DT) pose different requirements for wearable devices. Multimodality is crucial for measuring signals of a different nature and combining their information to extract complex patterns. Top left, types of wearable sensor that can be used as foundations of the human DT, including traditional electrical, optical, mechanical and chemical sensors, as well as emerging sensors such as medical imaging devices and gas sensors. Middle left, actuators that can provide quantifiable outputs, such as targeted drug delivery or physical interventions like heat, light or electrical stimulation. Bottom left, devices that monitor long-term exposure effects, such as smart masks that monitor air pollution, textile-based ultraviolet (UV) sensors, or ingestible sensors that monitor digestive tract exposures. On the right side are the algorithms corresponding to each stage. Top right, deep learning methods for classification and prediction used to address level 1 and level 2 tasks. Middle right, transfer learning methods that adapt pretrained models for the development of level 3 models. Bottom right, incremental learning methods that help level 4 models learn from ever-changing external environments, and explainable artificial intelligence (AI) that provides the underlying rationale and explanations for diagnoses and treatments when building level 5 models. ECG, electrocardiogram.

Cross-sectional model

In level 1, cross-sectional models are created to depict a digital portrait of the human body by collecting data in a temporal cross-section to determine its real-time physical and biochemical states. The term ‘cross-sectional’ in this context draws inspiration from medical imaging, where it refers to examining the body at a single point in time, providing a static but detailed snapshot.

Pioneering work at level 1 has focused on constructing diverse cross-sectional models that capture different aspects of the human body. In 2023, for example, a graphic processing unit (GPU)-accelerated computer model was developed to simulate the entire multiphysics dynamics of the human heart, opening new avenues for cardiovascular research18; and a workflow for 3D reconstruction and spatial analysis of cells in tissue (MATRICS-A) was introduced to provide deeper insights into cellular relationships in healthy and ageing organs19. Although these contributions are undoubtedly groundbreaking, current level 1 research focuses on modelling individual organs or tissues. Recognizing that the human body functions as an interconnected system, we expect that future work will offer more comprehensive cross-sectional models that integrate multiple bodily components.

Multimodal contrastive learning, an emerging AI method, has the potential to assist in building a more comprehensive model by effectively integrating and comparing data from various sources, enhancing the accuracy and depth of the model’s understanding and predictions. Contrastive learning, one of the most efficient SSL algorithms of recent years, uses intrinsic relationships between data modalities as pseudo-labels to train models (for example, DALL·E 2 uses pair relationships between images and their captions)20. This idea can be applied to the human body, where multimodal sensor data also have diverse intrinsic relationships. For instance, when monitoring human motion, sensors deployed at different locations produce various datasets that all correspond to the same action or posture21. When detecting neurodegenerative diseases, sensors such as inertial motion units, electroencephalography (EEG) and electromyography (EMG) electrodes, and biochemical sensors for detecting disease-related biomarkers in biological fluids produce patient-specific data. Although these outputs have distinct patterns, they often contribute to mapping the same disease. These intrinsic relationships derived from the human body can be used as pseudo-labels for large-scale pretraining, aiding the training of a foundation model: a versatile AI model that is trained on vast amounts of data and can adapt to different applications with further fine-tuning (see Box 1 for details). Foundation models use encoders to extract information from different data modalities and can thus decipher the cross-sectional status of the human body without further supervised training (‘zero-shot’). More importantly, they will serve as the cornerstone of human body DT models and continue to have a role in subsequent higher-level tasks.
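As a minimal illustration of how paired sensor modalities can act as pseudo-labels, the following PyTorch sketch performs one step of two-modality contrastive (InfoNCE-style) pretraining, in the spirit of the image–caption pairing used by models such as DALL·E 2; the encoder architectures, embedding sizes and synthetic EEG/EMG-like inputs are assumptions made for illustration only.

```python
# Minimal multimodal contrastive pretraining sketch (assumed encoders and synthetic data; PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

B, D_EEG, D_EMG, D_EMB = 32, 64, 48, 128      # batch and feature sizes (assumptions)
eeg = torch.randn(B, D_EEG)                   # stand-in for EEG features
emg = torch.randn(B, D_EMG)                   # stand-in for time-aligned EMG features

eeg_enc = nn.Sequential(nn.Linear(D_EEG, D_EMB), nn.ReLU(), nn.Linear(D_EMB, D_EMB))
emg_enc = nn.Sequential(nn.Linear(D_EMG, D_EMB), nn.ReLU(), nn.Linear(D_EMB, D_EMB))

z_eeg = F.normalize(eeg_enc(eeg), dim=-1)     # unit-norm embeddings for each modality
z_emg = F.normalize(emg_enc(emg), dim=-1)

temperature = 0.07
logits = z_eeg @ z_emg.t() / temperature      # similarity of every EEG window to every EMG window
targets = torch.arange(B)                     # the i-th EEG window is paired with the i-th EMG window

# Symmetric InfoNCE loss: pull paired windows together, push unpaired ones apart.
loss = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
loss.backward()                               # gradients flow into both encoders
```

In practice, the same loss can be extended to more than two modalities by summing pairwise terms, and the resulting encoders become the building blocks of the level 1 foundation model described above.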

Deductive model

Level 2 models perform deductive reasoning about the future development of the human body based on time-continuous cross-sectional model information. Whereas cross-sectional models (level 1) describe only the present state, deductive models use current and past snapshots to predict the future. By integrating data from past cross-sections with the current cross-section, deductive models can predict evolutionary trends in human health status and potential disease risks. Several level 2 models have been reported22,23. For instance, DTs can be used to predict the onset age of brain atrophy in patients with multiple sclerosis22 or to quantify the extent of overdiagnosis in colorectal cancer screenings23. Although these level 2 models have been pivotal in medical advancements, they also have an intrinsic limitation: the deductions are based only on existing cross-sectional data, and uncertain interventions on the human body, together with changes in the external world, make accurate long-term prediction difficult. Hence, we need to develop higher-level models.

In establishing the level 2 model, large volumes of past cross-sectional data can be encoded by pretrained foundation models, through zero-shot use or few-shot fine-tuning, and fed into models for analysing temporal states, including models based on network backbones such as recurrent neural networks, long short-term memory networks and transformers24. This enables the development of models that can predict future states through inference.
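The pipeline can be sketched as follows, assuming a frozen stand-in for the pretrained level 1 encoder and a recurrent (GRU) backbone in place of the LSTM or transformer alternatives named above; all shapes and data are synthetic placeholders.

```python
# Deductive-model sketch: frozen pretrained encoder feeding a temporal predictor (PyTorch).
import torch
import torch.nn as nn

B, T, D_IN, D_EMB = 16, 30, 64, 128           # batch, days of history, raw and embedding sizes

pretrained_encoder = nn.Linear(D_IN, D_EMB)   # stand-in for a level 1 foundation-model encoder
for p in pretrained_encoder.parameters():     # reuse it zero-shot: no further training
    p.requires_grad = False

temporal = nn.GRU(D_EMB, 64, batch_first=True)     # recurrent backbone over past cross-sections
head = nn.Linear(64, 1)                            # predicts a future health indicator

history = torch.randn(B, T, D_IN)                  # stand-in for past cross-sectional data
with torch.no_grad():
    embeddings = pretrained_encoder(history)       # encode each past snapshot

_, h_last = temporal(embeddings)                   # summarise the health trajectory
prediction = head(h_last.squeeze(0))               # for example, a risk score for the next time step
```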

Editable model

In level 3, editable models, which can predict the impacts of ‘edits’ (or interventions, such as drug administration, organ transplantation or gene editing) on human bodies, are created and used. The term ‘editable’ refers to the possibility of simulating interventions on the human body by directly editing the digital model and testing the outcomes, emphasizing the capacity of the model to anticipate and understand the effect of an intervention before it actually takes place. Level 3 models combine editability with adaptability: as they are exposed to various interventions and modifications, they evolve and adapt, refining their predictions based on new data. This synthesis provides a dynamic representation, allowing the models to respond to a vast array of potential edits and continually enhance the accuracy and relevance of their predictions.

For the level 3 model, the fundamental aspect is the integration of actual human edit data as new channels with the prior levels of the model. In comparison to cross-sectional data of the human body collected through routine health monitoring, the availability of body data after edits (such as drug intake, surgeries or gene therapies) is limited. Using pretrained encoders from the previous levels and incorporating them with a small number of edit inputs is a promising approach to creating an editable model. To incorporate the edit data into the model, the parameters of the cross-sectional data encoders can be fixed, leaving only the parameters of the encoder for the edits and the decoder to be trained. This approach reduces the number of parameters to be trained, effectively addressing the challenge posed by the limited quantity of edit data.
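The parameter-freezing strategy described above can be sketched as follows (PyTorch); the layer sizes, the simple concatenation-based fusion of state and edit embeddings, and the synthetic data are illustrative assumptions.

```python
# Editable-model sketch: frozen cross-sectional encoder plus a trainable edit branch (PyTorch).
import torch
import torch.nn as nn

state_encoder = nn.Linear(64, 128)      # pretrained on abundant cross-sectional data (levels 1 and 2)
for p in state_encoder.parameters():
    p.requires_grad = False             # fix these parameters: no retraining on scarce edit data

edit_encoder = nn.Linear(8, 32)         # encodes the intervention, e.g. drug and dose descriptors
decoder = nn.Sequential(nn.Linear(128 + 32, 64), nn.ReLU(), nn.Linear(64, 16))

# Only the small edit branch and decoder are optimized, which suits the limited post-edit data.
opt = torch.optim.Adam(list(edit_encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

body_state = torch.randn(4, 64)         # pre-intervention cross-sectional features (stand-in)
edit = torch.randn(4, 8)                # intervention descriptor (stand-in)
target = torch.randn(4, 16)             # observed post-intervention state (stand-in)

pred = decoder(torch.cat([state_encoder(body_state), edit_encoder(edit)], dim=-1))
loss = nn.functional.mse_loss(pred, target)
opt.zero_grad()
loss.backward()
opt.step()
```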

Building on the previous discussion, level 2 deductive models often have limitations in long-term predictive accuracy. To address these challenges, some research efforts have advanced to level 3, focusing on personalized modifications to the condition of the human body. For example, a multitask deep learning framework called GCAP was introduced to predict the severity of adverse reactions to drugs25. Unlike existing computational methods, GCAP not only identifies whether a drug will produce an adverse reaction but also assesses the severity of clinical outcomes, providing a more nuanced understanding of drug safety. These level 3 models represent a leap forward in the application of DTs to medical research by addressing the deficiencies found in earlier levels and focusing on personalized, long-term outcomes.

Evolutionary model

Models in the previous levels focus on interpreting and predicting the state of the human body, with little consideration of the influence of external factors. The human body is by no means an isolated system. Interactions with the outside world, including solar exposure, dietary intake and interpersonal connections, can have subtle but determinant impacts on the human body (some have minimal impact in the short term but can lead to adverse effects over a longer exposure period)26,27,28. In level 4, models merge external factors into the previous tasks to evolve and enhance prediction accuracy; this level is therefore named ‘evolutionary’. Quantifying the external factors and incrementally feeding them to the learning machine to update the DT model is the focus of level 4.

Interactions between the human and the external environment are incorporated into the level 4 model. Unlike previous cross-sectional sensor data and edits, these interactions are no longer limited to a one-time input to the model. A single moment of interaction may have a minimal impact on human health, but when these interactions are accumulated over time, they can have a considerable influence on the individual’s overall health. Therefore, the core task of this level is to quantify this accumulation and feed it into the model as additional parameters. Some factors, such as respiration and light exposure, can be quantified well by ambient sensors; more complex interactions, such as human social interactions, require the assistance of embedded submodels to convert them into digitized inputs for the model.
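As one possible way to quantify such accumulation, the sketch below computes an exponentially weighted running sum of ambient UV readings, so that recent exposure weighs more than older exposure; the half-life, sampling interval and synthetic readings are assumptions, and other accumulation schemes could equally be used.

```python
# Evolutionary-model sketch: quantifying accumulated environmental exposure (NumPy).
import numpy as np

rng = np.random.default_rng(0)
uv_index = rng.uniform(0, 8, size=24 * 60)        # one day of minute-by-minute UV readings (stand-in)

def cumulative_exposure(readings, half_life_minutes=240.0):
    """Exponentially weighted running sum: recent exposure counts more, old exposure decays away."""
    decay = 0.5 ** (1.0 / half_life_minutes)
    total = 0.0
    trace = np.empty_like(readings)
    for i, r in enumerate(readings):
        total = total * decay + r
        trace[i] = total
    return trace

exposure_channel = cumulative_exposure(uv_index)
# `exposure_channel` can be appended to the cross-sectional feature vector at each time step,
# giving the level 4 model a quantified, slowly varying environmental input.
```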

Although systematic research on evolutionary models is not yet fully established, recent studies have confirmed the feasibility of real-time monitoring of important channels of interaction between the human body and external environment, such as solar exposure and breath29,30. These works have laid a solid foundation for the establishment of level 4 DTs of the human body.

Explainable model

In the first four levels, models may offer accurate predictions and estimations, but they operate in a black-box or grey-box manner, meaning that their internal working mechanisms are either entirely opaque or only partially understood. This lack of transparency makes it challenging to discern the relationships between inputs and outputs, leaving researchers ill-equipped to navigate the inherent uncertainties associated with human physiology. Moreover, in clinical settings, the use of AI models as black-box tools for diagnosing and treating patients often leads to trust issues, as patients and doctors might find it difficult to rely on the model, even if it has demonstrated nearly 100% accuracy in trials. Therefore, in real-world clinical applications, the interpretability of the model (its ability to provide underlying logic and explanations for diagnoses and treatments, akin to a doctor’s reasoning) is crucial to achieving trust and acceptance. In level 5, models will inform researchers of the logical connections between observed phenomena and their outcomes. Research projects have just started to deploy explainable models relevant to this task. For example, researchers developed translatable systems based on medical imaging to explain the information contained in computed tomography scans or magnetic resonance images31. Current work at level 5 is still in its infancy and unlikely to provide real guidance to clinicians32. However, with the continuous development of human body DTs, models in level 5 may be integrated with suitable datasets to mine deeply into actual features of the human body, as witnessed in the AI domain33,34, pushing the boundaries of future healthcare interventions.

The development of a level 5 model requires a deep understanding and explanation of the underlying mechanisms behind previous levels. To achieve this, the use of advanced model interpretability techniques, such as saliency maps, activation maps and model distillation, can provide insights into the decision-making process of the model35. The application of causal inference algorithms can further illuminate the relationships between inputs and outputs, allowing a better understanding of the internal workings of the model36. Additionally, incorporating model-agnostic interpretability methods, such as local interpretable model-agnostic explanations (LIME), Grad-CAM and SHAP, can offer a comprehensive view of the behaviour of the model, enabling a thorough understanding of not only how but also why the model arrived at its predictions35. Through these cutting-edge techniques, the level 5 model serves as a tool for advancing our knowledge of the human body and enhancing the reliability and trustworthiness of human body DT models.
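As a minimal example of one such technique, the sketch below computes a gradient-based saliency map, the simplest of the attribution methods mentioned above, for a stand-in diagnostic model; the model, its weights and the input are placeholders rather than a validated clinical system.

```python
# Explainability sketch: gradient-based saliency over input sensor channels (PyTorch).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))   # stand-in diagnostic model
model.eval()

x = torch.randn(1, 64, requires_grad=True)       # one cross-sectional feature vector (stand-in)
score = model(x)[0, 1]                           # logit of the predicted condition of interest
score.backward()                                 # gradient of the score w.r.t. every input feature

saliency = x.grad.abs().squeeze(0)               # large values = features the prediction relies on
top_features = torch.topk(saliency, k=5).indices
print("Most influential input channels:", top_features.tolist())
```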

Must-have technologies to collect physiological data

Human body DT models rely on input data from wearable sensors in various body locations (Fig. 2). For cross-sectional and deductive models in levels 1 and 2, we have identified four main categories of sensors that enable direct monitoring of physiological states to infer health trends and disease risks: electrical, such as electrocardiography (ECG), EEG and EMG; optical, such as photoplethysmography (PPG); mechanical, such as micro-electromechanical microphones or accelerometers; and chemical, such as biosensors that detect specific target biomarkers in biological fluids (for example, sweat). Several companies have already incorporated one or more of these sensors into commercial wearable products such as smart watches37,38, but further effort is required to encourage widespread adoption and continuous health monitoring. The sensors should integrate seamlessly into our daily lives by being comfortable, miniaturized, discreet and equipped with long-lasting energy sources. Besides these, various on-body and in-body sensors are required to provide holistic information for higher-level human DT models, including, for instance, continuous monitoring of hormone levels and long-term observation of specific internal organs through implantable devices. Level 3 editable models additionally require actual human edit data. Carefully quantified edits delivered by wearable actuators generate the data needed to train models to avoid side effects during long-term therapies. Furthermore, levels 4 and 5 require the incorporation of sensors that capture environmental stimuli, allowing simultaneous monitoring of the human body and the surrounding environment. These sensors need to be designed to support long-term imperceptible monitoring and user convenience, as in the case of textile-based sensors integrated into clothing.

Current on-body wearable sensors with form factors such as textiles, patches and tattoos can access diverse analytes with no or minimal invasiveness. For large-area monitoring, fibre- and textile-based sensors can be seamlessly integrated into clothing to allow spatially distributed analysis of signals including strain, UV exposure, pH, metabolites, environmental pollutants, or biomarkers in sweat across epithelial areas39. Skin-conformable patches and tattoo-like devices using biocompatible adhesives are ideal for monitoring physical parameters such as temperature, pressure and strain, as well as biomarkers in sweat or interstitial fluid40,41. Emerging bioadhesive ultrasound devices can provide ultrasound imaging of organs and anomalies beneath the skin, enabling early diagnosis by visualizing cardiac motility or vascular remodelling42. Additionally, implantable sensors can provide direct transduction of biological signals under the skin. For instance, continuous glucose monitors from Abbott, Dexcom and Medtronic are implanted subcutaneously to measure glucose levels in blood and interstitial fluid, providing large dynamic datasets inaccessible from conventional finger-prick tests43.

Because people are largely used to wearing earbuds and, unlike the arms, the ears maintain constant proximity to vital signal sources such as the brain, lungs and heart, the ear is an encouraging location for 24/7 continuous monitoring. The ears also have high vascularity, enabling the measurement of cardiorespiratory information through, for example, optical PPG signals. PPG recordings from the ears exhibit a more pronounced amplitude modulation with breathing, crucial for precise estimation of breathing rate, and a faster response to SpO2 drops (compared with gold-standard SpO2 monitoring based on PPG recorded from the fingers)44; both are essential requirements for promptly identifying potential hypoxia. Recently developed multimodal in-ear sensors45 have demonstrated the capability to measure EEG46, ECG47, PPG43, microphone and accelerometer signals48, indicating their potential for various applications of human body DT models.

Data acquisition and processing challenges

One of the biggest challenges in exploiting the potential of wearables for continuous monitoring is the presence of artefacts. These devices are prone to motion artefacts, which frequently result in the discarding of entire epochs of recordings. Such artefacts should be identified, classified and removed in real time with low-latency hardware-integrated algorithms. To remove the artefacts, one potential approach is to use the signals from mechanical sensors to capture and model the artefact. These sensors will only record a signal that is correlated with the artefact but not with the physiological signal of interest. The signals from the mechanical sensors can then be used as reference signals in an adaptive filter with an adaptive noise-cancellation configuration. Preliminary results48 have shown that accelerometers can be used to remove low-frequency artefacts generated by full-body movements (such as walking), while microphones can be used to remove higher-frequency artefacts generated by the relative movement between the sensor and the skin.
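A minimal sketch of this adaptive noise-cancellation scheme is given below, using a normalized least-mean-squares (NLMS) filter with an accelerometer-like reference to cancel a simulated motion artefact from a PPG-like signal; the signal models, filter length and step size are illustrative assumptions rather than a description of any deployed device.

```python
# Adaptive noise-cancellation sketch: NLMS filter with an accelerometer reference (NumPy).
import numpy as np

rng = np.random.default_rng(1)
n = 5000
ppg_clean = np.sin(2 * np.pi * 1.2 * np.arange(n) / 250.0)      # ~72 bpm pulse at 250 Hz (stand-in)
accel = rng.standard_normal(n)                                   # reference correlated only with motion
artefact = np.convolve(accel, [0.5, 0.3, 0.2])[:n]               # motion artefact leaking into the PPG
measured = ppg_clean + artefact                                  # what the wearable actually records

taps, mu, eps = 8, 0.1, 1e-6
w = np.zeros(taps)
cleaned = np.zeros(n)
for i in range(taps, n):
    x_ref = accel[i - taps + 1:i + 1][::-1]       # most recent reference samples, newest first
    est_artefact = w @ x_ref                      # filter's current estimate of the artefact
    e = measured[i] - est_artefact                # error = artefact-free estimate of the PPG sample
    w += mu * e * x_ref / (x_ref @ x_ref + eps)   # normalized LMS weight update
    cleaned[i] = e

print("Residual artefact power:", np.mean((cleaned[taps:] - ppg_clean[taps:]) ** 2))
```

Because the reference channel carries no physiological information, the filter converges towards the artefact alone, leaving the physiological signal in the error term.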

The influx of multidimensional sensor data generated by human body DTs poses considerable computational challenges. Cloud-based centralized computing built on conventional von Neumann architectures is energy-intensive and constrained by limited data transmission bandwidth. Neuromorphic computing offers a promising alternative pathway to efficiently process the massive datasets involved in modelling complex physiological systems49. Neuromorphic devices emulate the signal processing and learning capabilities of biological nervous systems through networks of artificial neurons and synapses. By processing data locally where they are generated, and harnessing the massive parallelism and adaptability of brain-inspired architectures, neuromorphic systems can potentially analyse streaming multimodal sensor data in real time at far lower energy costs than for traditional computing50. Integrating memristive synapses into wearable systems creates an artificial nervous network directly in the wearable platform51,52,53. Although limitations remain in switching speeds and reliability, progress in memristive materials and fabrication approaches may soon overcome these hurdles54. On-node and edge computing configurations place neuromorphic processors directly adjacent to sensors, avoiding data transmission lags. This tight sensing–computing integration allows rapid reflexive actions and real-time adaptive decision-making on the body.
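To give a flavour of event-driven, neuron-like processing, the sketch below simulates a single leaky integrate-and-fire neuron that converts a continuous sensor stream into sparse spikes, the basic operation emulated by neuromorphic hardware; the time constants, threshold and synthetic input are illustrative assumptions and do not model any specific memristive device.

```python
# Neuromorphic sketch: a leaky integrate-and-fire neuron turning a sensor stream into spikes (NumPy).
import numpy as np

dt, tau, v_thresh, v_reset = 1e-3, 20e-3, 1.0, 0.0       # 1 ms step, 20 ms leak, unit threshold
t = np.arange(0, 1.0, dt)
sensor_current = 1.2 * (np.sin(2 * np.pi * 3 * t) > 0)   # stand-in for a bursty sensor signal

v = 0.0
spikes = []
for i, current in enumerate(sensor_current):
    v += dt / tau * (-v + current)     # leaky integration of the input current
    if v >= v_thresh:                  # fire a spike and reset when the threshold is crossed
        spikes.append(t[i])
        v = v_reset

print(f"{len(spikes)} spikes emitted; downstream processing occurs only at these sparse events.")
```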

For a discussion on data security, cost and ethics, refer to Box 2. As the realm of human body DTs continues to expand, recognizing and addressing these limitations collaboratively becomes paramount to ensuring an accurate and holistic representation of human biology.

Outlook

The evolution of human body DT technology has ushered in a new era in medicine. By generating virtual counterparts that replicate human anatomy and physiology, it offers a remarkable opportunity to comprehend and anticipate physiological and pathological states in a highly individualized manner.

One of the most exciting applications is in the field of personalized medicine. Level 1 of the DT model presents the potential to achieve real-time assessments such as metabolism monitoring and disease diagnosis, offering immediate insights into the human physiological state. At level 2, the capacity expands to predicting future physiological indicators, such as disease progression, bringing about proactive healthcare interventions. Level 3 highlights the adaptability of the models, with applications such as drug intervention illustrating the ability of the model to adjust based on medical treatments. Level 4 opens the door to understanding external influences on health, where factors such as weather and climate have a role in overall wellbeing. Culminating at level 5, the model aims for comprehensive interpretation of physiological signals, such as internal control modelling, setting the stage for tailored, precise healthcare measures. By following this multitiered approach, the human body DT model stands poised to transform healthcare practices, offering interventions based on rigorous analyses within the DT before their real-world application.

As the technology advances to level 5, it will enable the creation of accurate and individualized models of human physiology and, consequently, the design of personalized therapies for various diseases, considering the unique characteristics and requirements of each patient. The proposed human body DT model uses wearable sensors, which are cost-effective and convenient compared with large medical instruments, as its hardware backbone, and large foundation models as its algorithmic basis; together, these choices can greatly increase user acceptance and reduce the cost of model development. Clinicians can predict treatment outcomes using the models before the treatments are implemented in the real world, minimizing the risk of adverse reactions and increasing the chance of success while shortening the regulatory pathway in clinical trials. The possibility of using human body DT technology to accurately simulate and help to predict the response to therapies for chronic diseases based on personalized user profiles will drive the transition of the technology from the laboratory scale to large-scale applications.

Apart from the obvious advantages, personalized medicine that uses human body DT technology could be cost-effective. By pinpointing the root cause of a disease, it can reduce dependence on trial-and-error approaches and the use of costly and potentially detrimental drugs. Furthermore, through real-time monitoring of patients, it can help to detect potential complications early, allowing for timely and efficacious intervention. Detailed examples can be seen in Fig. 3 where, from left to right, the five-level roadmap moves towards precise and personalized healthcare, based on predictive versus real-time assessments enabled by the translation of human body DT models in a clinical setting.

Fig. 3: Application trend of human body digital twins (DTs) across various model levels.
figure 3

DT models at level 1 provide real-time assessments of the human physical condition. As the model progresses to level 2, it gains the ability to predict future physiological indicators. At level 3, the model undergoes editing and modification based on medical treatments. Additionally, at level 4, it incorporates the influence of external environmental factors. This progressive advancement culminates in level 5, where the model achieves interpretation of physiological signals and internal control modelling, enabling precise and personalized healthcare interventions.

Human body DTs could transform biomedical research. However, several challenges lie ahead. One pressing concern is associated with rare diseases. Owing to their infrequency, collecting adequate data for these conditions is difficult, making it challenging to develop comprehensive machine learning models. Therefore, there is an imperative to develop advanced, user-friendly and sustainable devices for continuous collection of human body information to augment data acquisition for these rare conditions. Reinforcing data sharing among institutions and researchers could further alleviate this scarcity of data. Another layer of complexity arises when addressing data sharing between research institutions, with issues ranging from security to geopolitical considerations. Our current understanding of the underlying physics of various diseases is another hurdle. Historical attempts, like Schrödinger’s venture into quantum physics to shed light on life, emphasize the vast interdisciplinary domains we still need to explore.

This Perspective provides a timely look at the status and prospects of the rapidly evolving area of the human body DT, informing a master plan for its future development, and stimulating discussion and new experimental approaches in this promising interdisciplinary domain. We believe that the proposed framework and five-level structure capture the main aspects and elements required for the future roadmap of human body DT models. Research on the validation of both data-driven and explainable AI models is ongoing as an interdisciplinary effort, in which data scientists and clinical experts are expected to collaborate in assessing the validity of the proposed approach and its application use cases. With continued progress and refinement, human body DTs could become indispensable in diagnosing and managing a wide array of medical conditions. We also anticipate a broader impact in future healthcare and in other research areas. For example, the adoption of the human body DT in assistive tasks for older people or people with physical disabilities may help to alleviate the productivity deficit caused by global population ageing and provide resilient solutions in the context of global economic slowdown cycles.