Introduction

Primary liver cancer is the third leading cause of cancer-related death worldwide and a major public health issue1. It includes three entities with different clinical, imaging, histological and molecular features: hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (iCCA) and combined hepatocellular–cholangiocarcinoma (cHCC-CCA). HCC is the most frequent subtype, accounting for approximately 80% of all primary liver tumours2 (Fig. 1a). HCC occurs in the context of chronic liver disease associated with risk factors such as alcohol consumption, viral hepatitis, metabolic syndrome or rare genetic diseases such as hereditary haemochromatosis (Fig. 1b). Thus, HCC develops on the background of a failing vital organ, and management and, ultimately, prognosis are guided largely by the underlying liver function. Professional societies such as the European Association for the Study of the Liver, the International Liver Cancer Association, the American Association for the Study of Liver Diseases and the Asian Pacific Association for the Study of the Liver play a vital part in shaping the landscape of liver cancer research and clinical practice by developing clinical guidelines, supporting clinical research, and providing specialized training and educational resources. Unlike most other cancer types, HCC is usually diagnosed on the basis of radiological features in patients with specific clinical backgrounds, such as cirrhosis or chronic hepatitis B3 (Fig. 2). Radiological imaging procedures such as MRI, CT and contrast-enhanced ultrasonography are commonly performed. However, with the increasing use of immunotherapy and targeted therapy in HCC, guidelines now recommend considering biopsy to analyse tumour immune profiles and guide treatment decisions3,4. Histopathological slides can be scanned to produce digital whole-slide images (WSIs), enabling computational analysis of the resulting image data5. 
At the histopathological level, HCC is highly morphologically heterogeneous6 (Fig. 2). We and others have reported several morphological HCC subtypes defined by distinctive histological features and associated with clinical outcomes and molecular alteration profiles7,8,9. Molecular biomarkers for HCC are emerging10, but are yet to be used broadly in clinical routine.

Fig. 1: Overview of primary liver cancer, clinical challenges during disease progression and where AI can integrate into management.

a, Overview of the diagnostic methods, main challenges and biology of different primary liver cancer types. b, Hepatocellular carcinoma (HCC) makes up the majority of liver cancer cases, and clinical challenges in liver cancer exist across the whole complex disease spectrum. This spectrum starts when a healthy liver is exposed to risk factors such as hepatitis B and C viruses, metabolic syndrome and alcohol consumption, and develops either alcohol-associated liver disease (ALD) or metabolic dysfunction-associated steatotic liver disease (MASLD) (formerly known as non-alcoholic fatty liver disease). These pathological states can induce liver inflammation (viral hepatitis, metabolic dysfunction-associated steatohepatitis (MASH) or alcohol-associated steatohepatitis (ASH)). These diseases can progress to hepatic fibrosis, characterized by extracellular matrix deposition, which can subsequently lead to cirrhosis, a condition marked by substantial hepatic scarring. Cirrhosis, in turn, is a major risk factor for HCC. At various stages of this complex process, artificial intelligence (AI) models can be used for diverse purposes, such as screening, differential diagnosis, assessing operability, subtyping, extracting biomarkers and follow-up.

Fig. 2: Overview of different imaging modalities and subsequent steps for diagnosing primary liver cancer.

Radiological imaging, such as MRI, CT or contrast-enhanced ultrasonography (CEUS), is commonly performed, and when imaging is ambiguous, tissue is often evaluated histopathologically. Typical hepatocellular carcinoma (HCC) shows hyperenhancement during the arterial phase (CT scan, left; ‘wash-in’, blue arrow) and hypoattenuation relative to the rest of the liver in the portal venous phase (CT scan, right; ‘wash-out’, blue arrow). The intrahepatic cholangiocarcinoma (iCCA) shows peripheral enhancement in the arterial phase (MRI scan, white arrow). Histologically, HCC most often displays a microtrabecular architecture with pseudoglandular formations; however, the tumour is highly heterogeneous and other subtypes, such as the steatohepatitic subtype (HCC histology, left), are well recognized. iCCA shows neoplastic cells arranged in glands embedded in a fibrous stroma. Because combined HCC–cholangiocarcinoma (cHCC-CCA) is the most challenging type for radiologists and pathologists to diagnose, imaging and routine histopathology are frequently supplemented with immunohistochemistry. CEUS image courtesy of D. Truhn; other images courtesy of J. Calderaro.

By contrast, iCCA accounts for approximately 15% of cases of primary liver cancer worldwide (Fig. 1a), making it the second most common primary liver tumour2. It is usually not associated with conventional HCC risk factors, but can arise from specific aetiologies such as liver fluke infection, primary sclerosing cholangitis, hepatolithiasis, toxins or bile duct cysts11. Histopathological work-up of iCCA12 typically shows an adenocarcinoma with a dense fibrous stroma (Fig. 2). Unlike in HCC, precision oncology with extensive genomic profiling is the standard of care in iCCA, which often harbours one of several potentially targetable alterations in the BRAF, IDH1, FGFR2 and HER2 (ref. 13) genes or hypermutation associated with microsatellite instability (MSI)14. Finally, cHCC-CCA is a rare subtype of primary liver tumour (5%) (Fig. 1a) characterized by coexisting or mixed hepatocellular and biliary differentiation. The overall prognosis of patients with cHCC-CCA is very poor, with a 5-year survival rate below 30%, because these tumours are largely resistant to conventional therapies, and even their diagnosis and subtyping are highly variable and difficult15. Taken together, primary liver tumours form a complex spectrum, with numerous diagnostic approaches aimed at triaging patients to a multitude of available therapeutic approaches.

The incidence of primary liver cancer is increasingly driven by metabolic disease. Metabolic dysfunction-associated steatotic liver disease (MASLD) (formerly known as non-alcoholic fatty liver disease16) has become the most common chronic liver disease, affecting about 38% of the global population17. Obesity and diabetes are major components of and risk factors for MASLD, and both have been shown to be independently associated with an increased risk of cholangiocarcinoma18,19. Thus, with the growing obesity epidemic and the rising number of patients with MASLD, we expect continued increases in the incidence of both iCCA and HCC.

Since 2012, artificial intelligence (AI) has progressed rapidly. AI had been a theoretical construct for many decades, and its mathematical and statistical foundations were developed by generations of scientists. The most suitable technical approaches in the numerous subfields of AI research were intensely debated for decades20. Today, this debate has largely been settled, and deep learning (Box 1) has emerged as the clearly favoured AI technique. Deep learning refers to the use of deep artificial neural networks, which are now the state of the art in image and language processing and in automating procedures through reinforcement learning. In practice, deep learning-based workflows are embedded in software pipelines that include data collection, preprocessing, model training, validation and deployment (Fig. 3a). Historically, 2012 was a turning point because deep convolutional neural networks (CNNs) (Box 1) emerged as the clear winners in image processing competitions, the traditional benchmark in computer vision. Since 2021, CNNs have been challenged, and partly replaced, by a newer and more versatile deep learning method called transformer neural networks, or ‘transformers’ for short21 (Box 1). Transformers have further improved performance in various medical image processing tasks (for example, prediction of biomarkers from histopathology slides) and, crucially, they facilitate the integration of multimodal data types22 (Fig. 3b), which was difficult to achieve with CNNs alone (Fig. 3c). Health-care professionals should not overlook these advances and should develop a basic understanding of the principles, promises and limitations of this new technology. Like any new device or medicine, AI methods have to undergo rigorous testing in clinical trials with meaningful end points and must quantitatively demonstrate superiority to the state of the art.

Fig. 3: Components of an AI pipeline in liver cancer.

a, An artificial intelligence (AI) pipeline comprises the AI model at its core, but requires several other components for successful embedding into clinical workflows. The pipeline starts with data collection, and a few primary data types can be used for AI models: tabular data such as blood tests and other clinical variables, imaging data such as CT scans and MRI, unstructured medical reports, and genomic and histology data. Collected data might need to undergo preprocessing to make them suitable for AI models; for example, cleaning, pseudonymization or anonymization, segmentation to mark regions of interest such as tumour, standardization, and colour normalization to account for batch effects. Subsequently, an AI model can be trained to predict discrete numerical targets (for example, number of tumour lesions), categorical targets (for example, disease or response to treatment) or continuous targets (for example, survival time). The performance of the model is validated on new data (‘unseen’ by the model during training). A model that generalizes well (that is, performs well on data from different centres) can assist clinicians in decision-making. Overviews of the most widely used deep learning architectures for biomedical image analysis are shown in parts b and c. b, Transformer architecture for multimodal data. To enable processing of large whole-slide images (WSIs) or multidimensional radiology data, the images are divided into patches that are subsequently embedded into feature vectors. Clinical data pass through a simple fully connected layer (FCL) and are subsequently concatenated with encoded data from other modalities. Such an architecture enables patient-level predictions. c, Overview of convolutional neural network architecture. 
This approach cannot learn interactions between visual cues in different regions of the WSI, because each patch is processed individually by convolutional kernels and predictions are obtained patch by patch. HCC, hepatocellular carcinoma; iCCA, intrahepatic cholangiocarcinoma. Histology slides reprinted with permission from The Cancer Genome Atlas, National Cancer Institute. Radiology images provided by J. Calderaro.

Here, we review the state of AI in liver cancer care, primarily focusing on HCC, while also addressing other liver cancer types when evidence is available. Notably, academic novelty (that is, research publications) usually pre-dates clinical novelty (that is, approved devices and results of clinical trials) by many years. In this Perspective, we cover both aspects and provide an overview of all facets of liver cancer care for which AI is on track to provide a meaningful benefit. We aim to inform clinicians and researchers on what is currently possible with AI, and what is needed to implement AI more broadly in clinical routine.

Current state of AI

For differential diagnosis

The practice of histopathology is changing owing to a slow but steady increase in digitization. Although there are barriers to the real-world implementation of digital pathology, pathology laboratories are increasingly adopting digital workflows that will ultimately enable the implementation of AI-based tissue biomarkers23. Digital histological slides are ‘gigapixel’ images, meaning they contain billions of pixels; a single such image is as large as 3,000 chest X-ray images5. WSIs of tissue slides stained with haematoxylin and eosin (H&E) are relatively standardized. Deep learning is particularly suitable for analysing such large images24 and can be used for a wide range of diagnostic tasks. For example, a large study using tissue samples from 738 patients showed that deep learning can improve the diagnosis of hepatocellular lesions25. The authors included surgical and biopsy specimens of normal and non-tumoural diseased liver, along with a wide array of hepatocytic lesions (benign liver tumours, low-grade and high-grade cirrhotic nodules and well-differentiated HCC), and the model reached high performance (area under the receiver operating characteristic curve (AUC) of 0.935) on the independent external validation set. However, it remains important to determine how this approach behaves when confronted with a non-hepatocellular lesion (such as iCCA, cHCC-CCA or a liver metastasis from a distant primary). In another study, a deep learning model trained on WSIs from 70 patients and tested on an independent dataset from 80 patients was able to differentiate HCC from iCCA26. Given its complex and composite histological appearance, cHCC-CCA remains difficult to diagnose. Its existence as a completely separate entity has been questioned, as it lacks any specific molecular alteration. A study published in 2023 showed that cHCC-CCA could be reclassified as HCC or iCCA using deep learning on WSIs and, interestingly, this reclassification had clinical and molecular relevance27.
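The AUC reported above has a simple interpretation: it is the probability that the model assigns a higher score to a randomly chosen positive case than to a randomly chosen negative case. The following Python sketch computes it directly from that pairwise definition (equivalent to the Mann–Whitney U statistic normalized by the number of positive–negative pairs). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the cited study.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a positive case (e.g. HCC), 0 for a negative case
    scores: model outputs; higher means "more likely positive"
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p, n in product(pos, neg):  # every positive-negative pair
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy example: 1 = malignant lesion, 0 = benign lesion.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))
```

Here, 8 of the 9 positive–negative pairs are ranked correctly, so the AUC is 8/9 (about 0.89). The pairwise loop is quadratic in cohort size; library implementations such as scikit-learn's `roc_auc_score` compute the same quantity from sorted scores instead.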

Finally, an important challenge in the histological diagnosis of iCCA is to exclude a metastasis from a distant primary, in particular from colorectal carcinoma28. This situation is fairly frequent in routine pathology and critically influences clinical decision-making. In a 2023 study, a deep learning model distinguished iCCA from colorectal cancer liver metastasis, the most frequent metastatic adenocarcinoma in the liver, with high performance (accuracy of 98% on the external validation cohort), surpassing the performance of some pathologists29. Another study showed that deep learning can predict the primary tumour site from tissue slides of metastases in patients with cancer of unknown primary, which often manifests in the liver30. The high performance in this study (accuracy of 83% on the external validation cohort) suggests that such models could help to narrow down the list of potential differential diagnoses in patients presenting with liver metastases.

Deep learning biomarkers

AI has also been used to extract prognostic and predictive information from H&E slides (Table 1). In one such study, deep learning reached a higher predictive value on the external validation cohort (concordance index of 0.70) than a composite of the variables usually associated with HCC aggressiveness (concordance index of 0.63)31. Other research groups have since confirmed the value of AI as an HCC prognostic predictor, and outcome prediction based on deep learning seems feasible in European, North American and Asian patient cohorts alike32. However, research gaps remain for other ethnic groups and for a nuanced assessment of how sex, gender and other factors influence the performance of deep learning systems. Few AI-based prognostic studies exist for iCCA, mainly owing to its rarity. Xie and collaborators developed a model that characterized, directly from iCCA WSIs, lymphocyte density and distribution at the cell level as well as the different tumour components, and that predicted patient survival from these AI-extracted characteristics33. In another study, AI-based assessment of cytokeratin 7 staining on biopsy slides from 295 patients with primary sclerosing cholangitis predicted a compound end point that included the development of iCCA34.
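The concordance index (C-index) quoted above extends the idea of the AUC to censored survival data: 0.5 corresponds to random ranking and 1.0 to perfectly ranking patients by risk. A minimal Python sketch of Harrell's C-index follows, assuming right-censored follow-up times, an event indicator and one risk score per patient (higher means worse predicted outcome). Pairs with tied follow-up times are simply skipped, which is one of several conventions; the function name and toy cohort are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output per patient; higher means worse predicted outcome
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Pair (i, j) is comparable if i had the event before j's follow-up ended.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event, higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable patient pairs")
    return concordant / comparable


# Hypothetical toy cohort (months of follow-up), not data from any cited study.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.6, 0.7, 0.2]
print(harrell_c_index(times, events, risk_scores))
```

In the toy cohort, 4 of the 5 comparable pairs are ranked correctly, so the function returns 0.8; the only discordant pair is the patient with an event at 10 months who received a lower risk score than the patient censored at 15 months. A C-index of 0.70 versus 0.63 therefore means that a larger fraction of comparable patient pairs was ranked correctly.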

Table 1 Summary of selected key studies applying deep learning in liver cancer research

Finally, deep learning has been widely used to infer predictive molecular biomarkers in cancer directly from H&E pathology slides35. Prominent examples are EGFR mutations in lung cancer36, MSI in colorectal cancer37,38 and HER2 status in breast cancer39. These studies provide evidence that information gleaned from an H&E slide can reflect the underlying molecular landscape of a tumour and that deep learning can reconstruct part of that information directly from the image. HCC is no exception: deep learning models can predict the mutational status of several genes from H&E slides, including TP53 and the β-catenin gene CTNNB1 (refs. 40,41). In another early study, published in 2019, tumour mutational burden was predicted from pathology slides in HCC42. Deep learning can also predict gene expression signatures from H&E slides, several of which have been positively linked to immunotherapy response43. In iCCA, unlike in HCC, genomic profiling is a cornerstone of treatment decisions, and several alteration-directed therapies are commonly used44. However, because iCCA is rare, it is practically difficult to assemble patient cohorts large enough for deep learning-based prediction of biomarkers. Moreover, all approaches that predict molecular alterations from H&E are essentially a ‘surrogate of a surrogate’ biomarker. A more direct approach is to train deep learning models on clinical response data. This approach has been applied in HCC: in a study of patients with non-curable HCC, response to the standard first-line treatment (the immune checkpoint inhibitor atezolizumab combined with the anti-angiogenic antibody bevacizumab) could be predicted directly from H&E histopathology with a deep learning model trained on slides from 336 patients45. 
It will be particularly important to determine whether this can also be achieved in iCCA, as only a minority of patients respond to the durvalumab–chemotherapy combination46. Together, these studies show that even routinely available H&E pathology slides harbour clinically relevant information that can be quantitatively extracted by deep learning.
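The studies above share a weakly supervised set-up: a label (for example, a mutation status or treatment response) is available only for the whole slide, whereas the gigapixel WSI has to be processed as thousands of patches. One widely used way to bridge this gap is attention-based multiple-instance learning, sketched below in Python with NumPy; it is an assumption for illustration and not necessarily the method used in the cited studies. The patch features and weights are random placeholders (in practice, features would come from a pretrained image encoder and the weights would be learned), and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def attention_mil_forward(patch_features, V, w, c, b):
    """Attention-based multiple-instance pooling for one slide.

    patch_features: (n_patches, d) array, one feature vector per WSI tile
    V: (h, d) projection, w: (h,) attention vector, c: (d,) classifier weights, b: bias
    Returns the slide-level probability and the per-patch attention weights.
    """
    # One attention logit per patch: w^T tanh(V h_k)
    logits = np.tanh(patch_features @ V.T) @ w
    # Softmax over patches (subtracting the max for numerical stability)
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()
    # Slide embedding = attention-weighted average of the patch features
    slide_embedding = attn @ patch_features
    # Logistic output, e.g. probability of a mutation or of treatment response
    prob = 1.0 / (1.0 + np.exp(-(slide_embedding @ c + b)))
    return prob, attn


# Hypothetical sizes: 500 tiles from one slide, 768-dimensional tile features.
d, h, n_patches = 768, 128, 500
features = rng.normal(size=(n_patches, d))
V = rng.normal(scale=0.05, size=(h, d))
w = rng.normal(scale=0.05, size=h)
c = rng.normal(scale=0.05, size=d)
prob, attn = attention_mil_forward(features, V, w, c, 0.0)
print(f"slide-level probability: {prob:.3f}; most-attended tile: {attn.argmax()}")
```

The attention weights indicate which tiles contributed most to the slide-level prediction, which makes this kind of pooling convenient for inspecting what a model has learned.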

For radiology image analysis

Radiological imaging underpins every diagnosis of liver cancer, and many academic studies on AI in liver diseases analyse radiology images47. AI can be used for several tasks in liver imaging (Table 1): first, as a technical aid to reconstruct images and ensure high visual and diagnostic quality48,49, and second, to automate human tasks in image interpretation, including prognostication and prediction of treatment response50,51. Deep learning-based image reconstruction could enable lower doses of contrast agents and ionizing radiation as well as shorter image acquisition times52. Deep learning has also been used successfully to detect liver pathologies in radiological images. For example, a model trained on a cohort of 257 patients could distinguish HCC from iCCA on CT images53. Ultrasonography is often used in cancer screening, and much work has focused on using AI to automate malignancy detection in ultrasound images. For example, deep learning has been used to segment and classify liver lesions in ultrasound images, with a focus on differentiating benign from malignant lesions54. The diagnosis of malignant liver lesions is facilitated by structured reporting systems such as the Liver Imaging Reporting and Data System (LI-RADS) of the American College of Radiology, and, as reviewed in detail elsewhere55, deep learning can further enhance diagnosis when combined with such structured reporting.

Clinical AI models are considered medical devices and therefore require regulatory approval56. In the USA, medical devices are authorized by the FDA, and in the EU, medical devices require a CE mark to indicate conformity with the Medical Device Regulation (MDR) or the In Vitro Diagnostic Medical Devices Regulation (IVDR). Unlike the approval of drugs, the approval of medical devices does not always require clinical trials that demonstrate patient benefit, and many medical devices are never evaluated in large clinical trials. The FDA has authorized several deep learning methods for liver cancer detection on radiology images (for example, MICA (Arterys, currently owned by Tempus Radiology) and LiverMultiScan (Perspectum)) or for aiding therapeutic intervention (FlightPlan for Liver (GE Medical Systems SCS)), but none for histopathology image analysis. This situation contrasts with other tumour types such as melanoma, colorectal, breast and prostate cancer, for which multiple AI-based products are on the market in the USA or the EU57,58,59. Moreover, AI methods should be evaluated in clinical trials to provide evidence of non-inferiority or superiority to conventional methods. Such evidence is available for AI-based endoscopy assistance systems for polyp detection60 and for AI reading of X-ray mammograms61. However, similar studies are still lacking for liver cancer, for several possible reasons. First, primary liver cancer is overall a smaller market in the USA and EU than colorectal, breast or prostate cancer, as it accounts for a larger share of cancer-associated mortality in Asia and Africa than in the EU or USA2. Second, historically, few drugs were available to treat primary liver cancer, making it a relatively low-priority target for biomarker development compared with other tumour types. This situation is now changing, with a 75% increase in global liver cancer incidence from 1990 to 2015 (ref. 62), the approval of numerous new drugs63 that are being recommended by evolving guidelines64, and advances in precision medicine65. These developments increase the complexity of treatment decisions in primary liver cancer and highlight the need for new biomarkers and clinical decision support systems, in which AI-based systems could have a beneficial role (Fig. 4).

Fig. 4: Potential health-care benefits from AI integration in liver cancer management.

Artificial intelligence (AI) can analyse large amounts of data to predict patient survival and treatment outcomes. This capability can benefit both individual patients, through tailored treatment plans that reduce the likelihood of unsuccessful therapies and improve outcomes, and the broader health-care system, as automation enables more efficient use of resources.

Large language models

Besides image processing, deep learning can achieve human-level performance in natural language processing. Large language models (LLMs) have emerged in the past 2 years66. These AI models can understand and synthesize text and therefore have the potential to unlock text as a quantitative data resource. LLMs encode clinical knowledge67, can process diverse medical textual data68 and can be used for logical reasoning69, as well as for many tasks in clinical research, practice and education70. LLMs are a new tool that is likely to substantially affect almost every area of medicine, including liver oncology. Patient-facing LLM applications could help patients with lifestyle modifications. Anecdotally, LLMs are already useful for creating personalized nutrition or exercise plans71, potentially helping individuals at risk to manage lifestyle-related risk factors for liver disease. For health-care professionals, LLMs could help to navigate medical guidelines, which is particularly relevant in liver oncology because guidelines from multiple professional societies coexist and take somewhat different views of the available clinical evidence. LLMs can access such guidelines and other documents through a technique called retrieval-augmented generation, in which relevant passages are retrieved and supplied to the LLM as context at query time, without any retraining of the model weights. In clinical routine, LLMs could unlock the vast amount of unstructured information in medical records and make it accessible for further analysis68,72. Specifically, LLMs can extract quantifiable information from free text, which can be used for data exchange and as input to prognostic or predictive machine learning models. From a technical perspective, LLMs are based on transformer neural networks, the same architecture that is currently the state of the art for image processing. 
Although direct evidence for these beneficial applications of LLMs so far comes mainly from other medical fields73, liver oncology offers great potential for applying LLMs to improve patient care and generate scientific insights, and is likely to be one of the most exciting research fields in the next few years. LLMs are also closely linked with multimodal AI models, which can take multiple types of data as input simultaneously74.
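The retrieval step of retrieval-augmented generation can be sketched in a few lines of Python. This is a toy under stated assumptions: bag-of-words cosine similarity stands in for the learned text embeddings and vector databases typically used in practice, the passages are placeholders rather than real guideline text, and the final call to an LLM is omitted because it depends on the chosen model and provider. All function names and passage IDs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a learned text encoder."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k (id, text) passages most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(text))), pid, text)
              for pid, text in passages.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(pid, text) for _, pid, text in scored[:k]]


def build_prompt(query, retrieved):
    """Prepend retrieved passages to the question; the LLM call itself is omitted."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in retrieved)
    return ("Answer using only the context below and cite the passage IDs.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")


# Hypothetical placeholder passages; real use would index the actual guideline documents.
passages = {
    "society_A_surveillance": "placeholder text on surveillance of patients with cirrhosis",
    "society_B_systemic": "placeholder text on first-line systemic therapy for advanced HCC",
    "society_A_resection": "placeholder text on criteria for liver resection",
}
query = "Which patients with cirrhosis should undergo surveillance?"
prompt = build_prompt(query, retrieve(query, passages, k=1))
print(prompt)
```

Because the guideline text enters only through the prompt, updating the document collection changes what the model sees without retraining, which is what makes this technique attractive for rapidly evolving guidelines.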

Emergence of AI for genomics

Data generated by high-throughput sequencing technologies are massive and complex to process, and several studies published in the past few years have therefore leveraged AI to process them75. However, most studies have been performed on tumour types other than liver cancer. Chen et al. used multimodal AI models to integrate histology and molecular profile data from a wide array of cancer types, including HCC76. The combination of omics data and histology, however, did not yield substantial added value for prognostication. Chaudhary and collaborators developed a deep learning-based model that predicts survival in patients with HCC by integrating RNA sequencing, microRNA sequencing and methylation data from The Cancer Genome Atlas cohort77. The more aggressive subtype of HCC as determined by the AI model showed TP53 mutations and overexpression of several stemness markers such as KRT19 and EPCAM. The predictive value was further validated in five external datasets77.

Multimodal AI models

The recent advances of deep learning in computer vision and natural language processing are surpassed only by advances in multimodal data analysis78. In particular, transformers provide flexibility for combining information from various input modalities and are more powerful than the previously attempted approach of retrofitting CNNs designed for image processing with multimodal input capabilities79. In parallel with these technical developments, the increasing availability of data from electronic health records, medical imaging, biobanks and pathology, along with the falling cost of genome sequencing, has paved the way for the development of multimodal AI models. Such models can outperform conventional models developed on a single data type22, but given the difficulty of obtaining multiple types of data and the technical considerations involved, few multimodal AI studies have been published for primary liver cancer. Before the advent of transformers, some researchers combined WSIs and gene expression to predict survival in HCC, but many of these studies lacked external validation and are hence of uncertain robustness80. Other groups combined WSIs and genomics (14 cancer types including HCC)76, WSIs and radiology (high-grade serous ovarian cancer)81, or H&E images and immunohistochemistry stains (colorectal cancer)82 in a single model. All of these findings could be, but have not yet systematically been, translated to primary liver cancer, although there is a strong biological rationale for building such multimodal prediction models based on an assessment of the immune response to liver tumours43.
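The token-level fusion outlined in Fig. 3b can be illustrated with a deliberately small Python and NumPy sketch: image patch embeddings form a sequence of tokens, clinical variables are projected by a fully connected layer into one extra token, and a self-attention layer mixes all tokens before a patient-level prediction. The assumptions are substantial: patch embeddings are random placeholders for the output of an image encoder, there is a single untrained attention head, no positional encoding and no training loop, and all names, sizes and clinical values are hypothetical. It illustrates the principle, not the architecture of any cited study.

```python
import numpy as np

rng = np.random.default_rng(42)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights


def multimodal_forward(image_tokens, clinical, params):
    """Fuse image tokens with one clinical token and return a patient-level probability."""
    # Fully connected layer (with ReLU) maps the clinical vector to one token
    clinical_token = np.maximum(0.0, clinical @ params["W_clin"] + params["b_clin"])
    # Concatenate along the token axis so attention can mix the two modalities
    tokens = np.vstack([image_tokens, clinical_token[None, :]])
    mixed, weights = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    pooled = mixed.mean(axis=0)  # average over all tokens
    logit = pooled @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit)), weights


# Hypothetical sizes: 200 patch tokens of width 64, five standardized clinical variables.
d, n_clinical, n_tokens = 64, 5, 200
params = {
    "W_clin": rng.normal(scale=0.1, size=(n_clinical, d)),
    "b_clin": np.zeros(d),
    "Wq": rng.normal(scale=0.1, size=(d, d)),
    "Wk": rng.normal(scale=0.1, size=(d, d)),
    "Wv": rng.normal(scale=0.1, size=(d, d)),
    "w_out": rng.normal(scale=0.1, size=d),
    "b_out": 0.0,
}
image_tokens = rng.normal(size=(n_tokens, d))
clinical = rng.normal(size=n_clinical)
prob, attn_weights = multimodal_forward(image_tokens, clinical, params)
print(f"patient-level probability: {prob:.3f}")
```

Because the clinical token attends to, and is attended by, every image token, such a model can in principle learn interactions between modalities and between distant image regions, which the patch-by-patch CNN in Fig. 3c cannot.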

Multimodal models can even be extended to multipurpose tools with interactive abilities, allowing users to interact with the models in natural language and culminating in true ‘generalist’ AI models78,83. Consequently, the next generation of medical AI models might be able to solve problems in several domains at once and allow users to query them in natural language84. Even today, off-the-shelf multimodal AI models such as Generative Pre-trained Transformer 4 with Vision (GPT-4V) can process and describe medical image data, although not without shortcomings85. However, multipurpose capabilities raise the hurdles for clinical evaluation and regulatory approval of these models56. One of the main obstacles in multimodal AI is the timely integration of disparate data sources. Data privacy and security must be ensured throughout the whole process, which will require profound changes in the way data are collected and stored. Hospitals use numerous software systems dedicated to drug prescriptions, clinical records, medical imaging, pathology and genomics, and most of these systems do not yet communicate with each other.

In summary, technical advances in medical AI are fast and are quickly moving from unimodal, single-purpose towards multimodal, multipurpose applications. This progress will require doctors and hospital IT specialists to radically change the way they work. Given the inherent multidisciplinarity of liver cancer care, incorporating such tools should be a priority for those involved in the management of liver cancer.

Challenges in clinical translation

Diverse patient populations

The patient journey in liver cancer is complex, and AI could be integrated into almost every step of it (Fig. 1b). Why, then, are we not yet seeing broad application of deep learning in liver cancer care? We suggest a range of reasons that need to be overcome. First, liver cancer is clinically challenging: it is a complex and extremely heterogeneous disease. Its aetiology is heterogeneous and leads to different disease phenotypes86. Also, many patients have overlapping comorbidities, including viral hepatitis, alcohol use and/or metabolic syndrome, and these additive risk factors have a different prognostic impact than in patients with a single disease aetiology87,88. Additionally, most evidence-based guidance comes from studies performed in East Asia in patients with chronic viral hepatitis B, who often developed cancer in the absence of cirrhosis89. Whether these findings generalize to patients with metabolic disease, in the Western world and elsewhere, is still unknown. MASLD has now become the most common cause of liver disease worldwide90 and, interestingly, is the leading cause of non-cirrhotic HCC, with approximately 25–30% of patients developing HCC in this context91. Radiological diagnosis of HCC according to LI-RADS achieves a sensitivity of 89% and a specificity of 96%92. However, LI-RADS has been validated only in patients who have cirrhosis, chronic viral hepatitis B or a prior history of HCC. This limitation creates substantial diagnostic challenges given the changing landscape of HCC incidence and epidemiology. This change has challenged not only our diagnostic capabilities, as previously validated radiographic systems are not applicable in this cohort, but also our screening and surveillance practices93. In addition, capabilities for early screening and diagnosis of primary liver cancer remain inadequate. 
Many liquid biopsy tests have been evaluated in liver cancer, including circulating tumour cells, circulating tumour DNA, circulating RNA (microRNA, circular RNA, long non-coding RNA) and extracellular vesicles, yet none has been validated to date94.

Changing clinical decision trees

For staging, prognosis and management, the most widely accepted system for HCC is the Barcelona Clinic Liver Cancer (BCLC) staging system, which groups patients into prognostic subclasses that guide treatment recommendations95. Changes in treatment paradigms have challenged our linear approach to managing patients and have further complicated the generation of a standardized algorithm. For example, patients with a solitary liver tumour, a sufficient liver remnant and no evidence of portal hypertension (BCLC 0 or A) were previously considered the main candidates for liver resection96. However, advances in interventional techniques, such as preoperative portal vein embolization to increase the size of the future liver remnant, and studies supporting expanded surgical criteria that include patients with portal hypertension, multifocal disease or even subsegmental tumour thrombus, have shown a survival benefit97,98,99. Considerable controversy also surrounds the appropriate treatment of patients in the intermediate and advanced stages (BCLC B and C), because these stages encompass an extremely heterogeneous group of patients with wide variations in tumour burden, extrahepatic spread and liver function. For this reason, various subclassification systems have been proposed to better prognosticate and guide treatment, but none has been validated to date100,101. Controversy also remains regarding the best model to evaluate liver function, and the clinical judgement of hepatologists is paramount in assessing risk factors for which intervention might be possible102. The landscape of systemic palliative treatment for patients with advanced disease has evolved considerably. Since 2017 there have been major breakthroughs in systemic therapeutic options, with immunotherapy now at the forefront of treatment95.
These treatments are not only more tolerable in terms of their general adverse effect profile but also substantially superior to previous standard therapy in terms of overall and progression-free survival103,104. However, only about 30% of patients respond to immunotherapy105. Differences in the tumour microenvironment and tumour immune profiles might contain extractable information that could serve as yet-undetected biomarkers106, which deep learning could help uncover45. However, because the diagnostic and therapeutic decision tree in liver cancer is so convoluted and rapidly changing, it is difficult to evaluate deep learning systems for specific branches of it. The use of deep learning for treatment guidance is further complicated by shifting treatment paradigms, such as the introduction of new systemic treatments in the adjuvant and neoadjuvant settings107.

Generating high-quality evidence

Encouragingly, despite the challenges mentioned above, a number of research studies in the past 5 years have attempted to develop AI-based decision support tools for liver cancer care. These studies have used a range of strategies to develop AI methods (Fig. 5a). However, a major challenge for AI is now to move from academic prototypes to products ready for implementation in clinical practice and to ensure continuous evaluation (Fig. 5b). To achieve this aim, AI models need to be assessed as rigorously as any other medical device or biomarker. At the stage of an initial academic study, the development of AI methods needs to adhere to established reporting guidelines, which are collected and endorsed by the Enhancing the Quality and Transparency of Health Research (EQUATOR) network108. Traditional reporting guidelines do not fully cover potential sources of bias specific to AI approaches. The emergence of clinical trials assessing novel AI-based methods has been met with concerns about the quality of study design and reporting109. Hence, AI-specific extensions of established reporting guidelines have been proposed. CONSORT-AI110 is an international initiative that provides guidance on reporting clinical trials of AI interventions. The CONSORT-AI checklist includes 14 new extension items related to human–AI interactions in the decision process, analyses of performance errors, and how missing data were handled. The safety of AI techniques is one of the major concerns of the authors of the CONSORT-AI statement. SPIRIT-AI111, developed in parallel with CONSORT-AI, provides guidance for clinical trial protocols evaluating AI interventions and introduces 15 new extension items to achieve this goal. Additional reporting guidelines include, for example, MINIMAR112 for medical AI research and DECIDE-AI113 for studies reporting early-stage live clinical evaluation of AI-based decision support systems.
Unlike other health or medical interventions, AI models can yield errors unpredictably, in ways that human judgement cannot readily detect or understand. For example, changes in imaging techniques or slide staining reagents can be invisible to the human eye but substantially influence a model’s performance114. These guidelines also encourage investigators to report any potential bias related to different patient populations or subgroups. Publication of a high-quality academic study of an AI model can open the door to commercial development of the model as a medical device and subsequent regulatory approval. Approval should be followed by clinical trials to evaluate the effectiveness of the model in the real world. Such real-world evidence is increasingly coming into the focus of clinical research, and structured guidelines in this field are being established115. In addition to transparent reporting, it is also critical to validate AI models prospectively, as has been done with AI tools in other diseases such as breast cancer116,117, even in large randomized trials61,118. In summary, a clear path exists from an initial research idea for a new AI method all the way to its implementation and prospective, randomized evaluation in routine clinical workflows.
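As a minimal illustration of subgroup reporting, the sketch below computes sensitivity and specificity separately for each patient subgroup from a model's binary predictions, so that a performance gap in one population is visible rather than averaged away. The function name `subgroup_metrics`, the record layout (subgroup label, true label, predicted label) and the example subgroup labels are hypothetical; an actual study would report the metrics required by the checklist that matches its design.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def subgroup_metrics(records: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[str, float]]:
    """Per-subgroup sensitivity and specificity.

    Each record is a hypothetical (subgroup, y_true, y_pred) tuple with labels 0 or 1.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1:
            c["tp" if y_pred == 1 else "fn"] += 1  # positive case: hit or miss
        else:
            c["fp" if y_pred == 1 else "tn"] += 1  # negative case: false alarm or correct rejection
    report = {}
    for group, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["tn"] + c["fp"]
        report[group] = {
            "n": pos + neg,
            # NaN flags a subgroup with no positives (or no negatives) instead of hiding it
            "sensitivity": c["tp"] / pos if pos else float("nan"),
            "specificity": c["tn"] / neg if neg else float("nan"),
        }
    return report
```

For example, `subgroup_metrics([("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("B", 1, 1), ("B", 0, 1)])` reports a sensitivity of 0.5 for subgroup A and a specificity of 0.0 for subgroup B. In practice, confidence intervals should accompany such per-subgroup estimates, because small subgroups yield unstable values.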

Fig. 5: Strategies to improve AI utilization in research and clinical workflows.

a, Overview of strategies to improve the development of artificial intelligence (AI) solutions, from basic research to clinical trials. Embedding deep learning in research workflows to solve relevant clinical problems is challenging because it requires close interdisciplinary collaboration between scientists with technical expertise and clinicians. The exchange of knowledge and the mutual training of these two distinct disciplines should be facilitated to identify a suitable medical problem that could be effectively addressed with the shared resources at hand. Furthermore, the process is enriched by an ongoing feedback loop, fostering a dialogue between collaborators, which has an essential role in refining algorithm development and optimizing model selection. Following the developmental phase, the next challenge is extensive validation of the model. This might require experimental validation in laboratory settings and could even lead to prospective preclinical and clinical trials. b, Overview of strategies to improve the translation of AI-based solutions for liver cancer into clinical practice. Hurdles have to be overcome at different stages: problem identification, research collaboration, product development, regulatory approval and clinical integration. Depending on the problem, the AI model might need to be optimized for real-time inference to support clinical decision-making at the point of care. In any scenario, it is crucial for clinicians to undergo comprehensive training in the use of the AI-based medical device and in the interpretation of its outputs. The continuous insights provided by clinicians will serve as a valuable mechanism to ensure the ongoing improvement of the device. Patient engagement and involvement, when relevant, will further improve this collaborative process.

Strategies for acceleration

AI literacy and medical literacy

AI is driving a rapid societal transformation, which means that many people, including medical professionals and researchers, are not fully equipped to use new tools to their maximum potential. Although digitalization in health care is now being incorporated into medical school and postgraduate teaching curricula, specific education about AI, its concepts, limitations and potential uses, is still largely absent. Future health-care professionals and researchers need more than the basic digital literacy that enables them to use devices such as mobile phones and electronic health records efficiently. They require a specific ‘AI literacy’, which would enable effective use and interpretation of AI solutions in liver cancer care as soon as these solutions become available for lifestyle counselling, management of chronic liver diseases, precision oncology approaches, palliative care and other facets. Specific education on AI applications should be implemented now so that the health-care workforce is well prepared for the broad advent of AI in the coming years. Furthermore, medical practitioners and researchers with a deeper interest in AI can acquire skills to either develop their own AI solutions or test AI systems in clinical trials. This approach positions them as multipliers of AI technology, keeping the ultimate goal of improving patient outcomes at the forefront. We acknowledge, however, that the medical curriculum is already crowded and that familiarizing students with the foundational fields of AI (such as mathematics, statistics and programming), while continuously adapting to rapid progress in AI technologies and to the ethical considerations of their use, might seem too ambitious. We believe that universities should encourage medical students to take part in hands-on interdisciplinary AI projects that could help them develop an intuition for handling AI systems.

Biomarker development

As discussed, AI could substantially influence the field of liver cancer. Therapeutic strategies for patients with advanced HCC have evolved in the past few years, and several first-line options are now available, including atezolizumab–bevacizumab or durvalumab–tremelimumab, with many other promising combinations currently in phase III trials (such as camrelizumab plus rivoceranib)93,103,104,119. Trials testing immunomodulatory agents as monotherapy (nivolumab or pembrolizumab) failed, but they were performed in an all-comers setting, and responses were observed in a subset of patients120. In this context, it will be important to develop biomarkers that enable the best treatment to be given to each patient. Several predictive gene signatures for immunotherapy response have been proposed across tumour types121. However, they are complex to implement in clinical practice, and AI-based biomarkers that exploit information in routinely available clinical data could offer a shortcut to widespread clinical implementation. Beyond HCC, the treatment of iCCA is also evolving profoundly. The standard of care for patients with advanced disease is now chemoimmunotherapy122, and several molecular alterations can be efficiently targeted. Screening for these alterations is, however, costly and in practice can take 2–3 weeks from RNA and DNA extraction to sequencing results. As in HCC, the development of AI biomarkers could help to improve treatment allocation and stratification of patients with iCCA and cHCC-CCA.

Policy recommendations

The WHO guidance ‘Ethics and governance of artificial intelligence for health’ outlines six key principles that are crucial in navigating challenges associated with AI in medicine123. These are particularly relevant to liver oncology, as AI applications are still emerging in this field, and developers and researchers can still shape AI models to conform to these principles. First, the principle of ‘protecting autonomy’ emphasizes that AI should support, not replace, human decision-making in health care. This principle is even more relevant in liver cancer care than in other fields of medicine, because complex decisions depend not only on evaluation of the tumour but also on the patient’s liver function and performance status95. Second, the commitment to ‘promote human well-being, safety, and the public interest’ is especially relevant given the global inequities in liver cancer. Some high-income countries, such as the USA, Australia and many European countries, are experiencing increasing liver cancer incidence and mortality, driven primarily by risk factors such as hepatitis C virus infection, alcohol use and metabolic disorders124. Meanwhile, some Asian countries have shown decreasing trends that are largely attributable to effective hepatitis B vaccination programmes and reduced aflatoxin exposure, highlighting the positive effect of successful public health interventions124. Transparency, the third principle, is vital in improving the quality of liver cancer care and protecting patients. The ability to audit AI devices and access comprehensive information about their limitations and protocols is important for system evaluators and regulators to identify errors and conduct effective oversight.
The fourth principle, ‘fostering responsibility and accountability’, reinforces that AI systems need to be used in tandem with clinicians and patients, and that their performance must be clinically meaningful and robust. The fifth principle, ‘inclusiveness and equity’, advocates for AI in health care to be accessible and equitable, irrespective of age, gender or sex, income and other characteristics. As liver diseases often carry stigma125, it is important that AI systems in liver diseases are developed and deployed in a way that acknowledges and mitigates this stigma. Finally, the principle of ‘promoting responsive and sustainable AI’ entails approving AI interventions only if they can be fully implemented, monitored and sustained in the health-care system, with a minimal ecological footprint that aligns with global efforts to mitigate climate change and ecological damage126. Despite the development of impressive academic AI-based prototypes for the management of liver cancer, the transition to clinical practice faces challenges, including rigorous assessment and the need for accessible AI explainability methods and education to build trust among physicians and patients. These principles serve as a cornerstone for creating AI-supported liver cancer care.

Potential risks of AI

As already outlined, the potential benefit of applying AI models in liver cancer is tremendous (Fig. 4). However, as with the introduction of any new technology, there are associated risks. AI models tend to perform well on data that are very similar to the data they were trained on, but struggle with data from different domains127. For example, a model applied to images from a new scanner generation might fail and deliver false predictions. This problem is particularly serious for models that are already established in clinical use and that might have gained the trust of physicians. Thus, it will be necessary to implement strict measures to continuously monitor the robustness of models used in clinical practice. Also, AI models tend to perpetuate bias in medical data and might perform worse in under-served patient populations128. Physicians need to be aware of this potential shortcoming, and AI model developers need to take care to remove potential biases to avoid sociomedical inequalities. To mitigate these and similar challenges, the WHO has released a publication outlining guidelines and recommendations for safe, effective and equitable AI development129. It will be important to follow these guidelines and apply robust and continuous evaluation measures when AI models are implemented in clinical practice to foster physicians’ trust in their reliability.
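Continuous monitoring can take many forms. One simple approach, shown here only as a sketch and not as a recommended clinical standard, is to compare the distribution of a deployed model's output scores with the distribution observed during validation using the population stability index (PSI). The function names, the assumption that scores are probabilities in [0, 1], the ten equal-width bins and the alert threshold of 0.2 (a widely quoted rule of thumb, not a clinical standard) are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def _bin_proportions(scores: Sequence[float], n_bins: int, eps: float) -> List[float]:
    """Share of scores in each equal-width bin over [0, 1]."""
    if not scores:
        raise ValueError("need at least one score")
    counts = [0] * n_bins
    for s in scores:
        clipped = max(0.0, min(1.0, s))  # assumption: scores are probabilities in [0, 1]
        idx = min(int(clipped * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[idx] += 1
    # eps keeps empty bins from producing log(0) or division by zero
    return [max(c / len(scores), eps) for c in counts]


def population_stability_index(reference: Sequence[float], current: Sequence[float],
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum over bins of (p_cur - p_ref) * ln(p_cur / p_ref); 0 means identical binned distributions."""
    p_ref = _bin_proportions(reference, n_bins, eps)
    p_cur = _bin_proportions(current, n_bins, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


def drift_alert(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Hypothetical alert rule: flag the model for review when PSI exceeds the threshold."""
    return population_stability_index(reference, current) > threshold
```

In use, `reference` would hold scores from the validation cohort and `current` the scores from, for example, cases acquired after a scanner upgrade. A shift in score distribution signals that the input data may have changed; it does not by itself prove that accuracy has dropped, which still requires labelled follow-up data.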

Conclusions

The application of AI to the management of primary liver cancer is an active field of research, because unmet clinical needs are numerous and clinical decisions are complex and multifactorial. Potential applications include diagnostic automation, patient stratification, biomarker development and drug development. The landscape of systemic therapy has changed drastically for both HCC and iCCA in the past few years, and expectations are particularly high for improved prediction of response to systemic treatment. Several teams have developed impressive academic AI-based prototypes; however, many challenges remain before these can be fully implemented in clinical practice (Box 2). They will have to be assessed as rigorously as any other medical device or biomarker, and explainability and education will also be key to gaining the necessary trust of physicians and patients.