The artificial intelligence-based model ANORAK improves histopathological grading of lung adenocarcinoma

The introduction of the International Association for the Study of Lung Cancer grading system has furthered interest in histopathological grading for risk stratification in lung adenocarcinoma. Complex morphology and high intratumoral heterogeneity present challenges to pathologists, prompting the development of artificial intelligence (AI) methods. Here we developed ANORAK (pyrAmid pooliNg crOss stReam Attention networK), encoding multiresolution inputs with an attention mechanism, to delineate growth patterns from hematoxylin and eosin-stained slides. In 1,372 lung adenocarcinomas across four independent cohorts, AI-based grading was prognostic of disease-free survival, and further assisted pathologists by consistently improving prognostication in stage I tumors. Tumors with discrepant patterns between AI and pathologists had notably higher intratumoral heterogeneity. Furthermore, ANORAK facilitates the morphological and spatial assessment of the acinar pattern, capturing acinus variations with pattern transition. Collectively, our AI method enables precise quantification and morphological investigation of growth patterns, reflecting intratumoral histological transitions in lung adenocarcinoma.

Lung adenocarcinoma (LUAD), the most common type of non-small cell lung cancer, is histologically characterized by distinct growth patterns: lepidic, papillary, acinar, cribriform, micropapillary and solid 1 (Extended Data Fig. 1a). The proposed International Association for the Study of Lung Cancer (IASLC) grading system, based on a combination of the predominant growth pattern and high-grade patterns (cribriform, micropapillary and solid) within individual tumors, is highly prognostic 2. However, there is interobserver variability among pathologists due to the challenges of consistently defining, recognizing and quantifying the wide spectrum of growth patterns 3. This variability particularly affects the differentiation of lepidic, papillary and acinar patterns 2,4, as well as the estimated proportion of high-grade patterns in non-high-grade pattern-predominant tumors 2,5. Accurate quantification is challenging when there are multiple admixed growth patterns across several histological sections, as is the case in most LUADs. This challenge is compounded by the difficulty of defining the cutoff between different patterns where they represent a spectrum of histological appearances 6. This poses challenges for accurate prognostic inference and reproducibility in clinical studies.
Technical Report https://doi.org/10.1038/s43018-023-00694-w
pattern (0.7170), which was lower than DeepLabv3+ (0.7381). ANORAK also achieved overall promising performance in the patch-level and WSI-level evaluations (patch Dice: ANORAK, 0.6034; other methods, 0.3770-0.5691; WSI agreement: ANORAK, 60.00-65.31%; other methods, 16-48.98%; Extended Data Fig. 3c). Furthermore, ANORAK has 4.10 million parameters, that is, it is more lightweight than the other convolutional models (6.67-15.55 million; Extended Data Fig. 3c). Taken together, the proposed model may have advantages in performance and computing over other methods.
Computer-assisted approaches powered by artificial intelligence (AI) have been widely applied to histological image analysis 7-11. While some studies have applied deep learning models to LUAD growth pattern classification 12,13, automated IASLC grading by AI methods is yet to be explored. Moreover, previous deep learning methods were mainly based on patch-wise classification that predicts a histological subtype for each patch, overlooking the detailed morphological structure of patterns. To capture the distinct pattern morphology, we developed an AI method based on pixel-wise classification to segment growth pattern islands and automate IASLC grading for risk stratification and outcome prediction.
In this study, we developed an AI method to segment LUAD growth patterns at the pixel level using hematoxylin and eosin (H&E) whole-slide images (WSIs) (Fig. 1a and Extended Data Fig. 1b,c) and applied it to 5,540 diagnostic slides from 1,372 cases, spanning four cohorts: TRAcking non-small cell lung Cancer Evolution through therapy (Rx) (TRACERx); Leicester Archival Thoracic Tumor Investigatory Cohort-Adenocarcinoma (LATTICe-A); The Cancer Genome Atlas (TCGA) LUAD; and the Dartmouth Lung Cancer Histology Dataset (DHMC) (Fig. 1b). The growth pattern proportions, predominant pattern and IASLC grading of a tumor can be derived automatically based on growth pattern mapping (Fig. 1c). This pixel-wise segmentation method also revealed the morphological properties of growth patterns and enabled analysis of the degree of spatial heterogeneity, highlighting its advantages over patch-wise classification algorithms.

A hierarchical AI model for growth pattern quantification
To spatially map complex growth patterns in LUAD, we developed ANORAK (pyrAmid pooliNg crOss stReam Attention networK), which encodes cross-stream interactions using a multi-order attention mechanism within convolutional neural networks 14 (Fig. 1a and Extended Data Fig. 1b,c). Moreover, a pyramid pooling module (PPM) 15 distributed global contextual information of growth patterns to guide high-level feature learning. ANORAK was trained on data annotated from 49 WSIs in the TRACERx 100 cohort (Extended Data Fig. 1a) by three thoracic subspeciality pathologists (Extended Data Fig. 1b), and validated on a total of 5,540 WSIs from 1,372 LUAD tumors across four cohorts (Fig. 1b and Table 1). This model enabled precise mapping of diverse growth patterns at pixel-level resolution, thereby facilitating automated grading and analysis of morphological intratumoral heterogeneity (Fig. 1c).
Taken together, these data suggest that AI grading adds independent prognostic value for patient stratification, particularly for stage I disease in which clinical decision-making regarding adjuvant therapy following surgery can be challenging in the absence of evidence for outcome benefit.

Assisting pathologists in challenging scenarios
To evaluate the utility of our AI method in assisting pathologists with LUAD grading, we identified four specific scenarios and used the large LATTICe-A cohort, for which manual grading from three pathologists was available. We focused on stage I LUAD tumors, a group with an unmet need for predicting which patients are likely to relapse, to guide early intervention, potentially with adjuvant therapy 21.
The third scenario was the detection of aggressive, high-grade patterns. Although there was a high concordance rate for cases composed predominantly of high-grade patterns (Extended Data Fig. 4e), the proposed IASLC grading system sets a 20% cutoff for high-grade patterns to qualify as grade 3, adding challenges to identify high-grade

Growth pattern intratumor heterogeneity
These data indicated that our proposed AI method was not inferior to pathological grading and could assist pathologists in grading growth patterns in certain challenging scenarios.

Acinar morphology and spatial heterogeneity
Precise spatial delineation of growth patterns allowed us to study the spatial configuration of tumors as morphologically distinct pattern islands (Fig. 2a and Extended Data Figs. 2a,b and 3a). Acinar growth, often considered an intermediate state during the transition of morphological patterns 6,23, was also the most prevalent pattern in stage I tumors in the LATTICe-A cohort (Fig. 5a). The area of individual acinar islands was similar to that of micropapillary islands, and smaller than those of other patterns (Fig. 5b). These data led us to investigate the importance of the morphological features and spatial distribution of acinar islands that may be indicative of histological pattern transition.
We used area and shape, measured using pixel number and the solidity index (Extended Data Fig. 7a), to represent the morphological features of individual acinar islands. Acinar island area and shape were notably different in tumors (≥5% acinar) with different predominant patterns (TRACERx 421, n = 173; LATTICe-A, n = 654; Extended Data Fig. 7b). Smaller acinar islands were enriched in lepidic-predominant tumors compared to acinar-predominant and papillary-predominant tumors (TRACERx 421, P = 0.00052; LATTICe-A, P = 5.4 × 10⁻¹²; Fig. 5c and Extended Data Fig. 7c). This may reflect the acinar structures in lepidic-predominant disease frequently representing airspaces with iatrogenic collapse 24. The area of acinar islands in high-grade pattern-predominant (cribriform, micropapillary and solid) tumors was also smaller than that in acinar-predominant and papillary-predominant tumors (TRACERx 421, P = 9.8 × 10⁻¹¹; LATTICe-A, P < 2.22 × 10⁻¹⁶; Fig. 5c and Extended Data Fig. 7c). Notably, this area feature was a strong discriminator between acinar-predominant and cribriform-predominant tumors (TRACERx 421, P = 0.0007; LATTICe-A, P = 1.5 × 10⁻⁷; Fig. 5d), indicating that acini may form differently in acinar-predominant tumors compared to others. The transition from an acinar to a cribriform pattern may frequently occur in large acinar islands through gland fusion (Extended Data Fig. 7e), while smaller acinar structures may remain. Alveolar architectures in airspaces detected in acinar-predominant tumors might also be supporting large 'glands'. Acinar islands with regular shapes were enriched in high-grade-predominant tumors compared with lepidic subtypes (TRACERx 421, P = 0.0024; LATTICe-A, P = 4.1 × 10⁻⁷; Fig. 5e and Extended Data Fig. 7d), which is again consistent with morphological variance due to the compressibility of lepidic growth. Taken together, the morphological features of acinar islands vary notably in tumors predominantly enriched with different patterns (Fig. 5f).
To investigate the spatial arrangement of acinar patterns, we developed an acinar scattering score that measured the degree of acinus dispersion. A low score indicated locally clustered acinar islands, while a high score implied dispersion of acinar islands throughout the tissue (Extended Data Fig. 7f). Low acinar scattering was found more frequently in lepidic-predominant tumors compared to all others (TRACERx 421, P = 0.017; LATTICe-A, P = 0.004; Fig. 5g), indicating that clustered acinar islands may reflect the compression induced by iatrogenic collapse, and may also suggest that the transition from lepidic to acinar occurs in an organized manner 25. We next explored acinar scattering in the context of outcome prediction. Tumors with highly scattered acini were associated with reduced DFS compared to tumors with low scattering (TRACERx 421, n = 205, P = 0.003, HR = 1.89, 95% CI = 1.25-2.86; LATTICe-A, n = 837, P = 5.09 × 10⁻⁷, HR = 1.63, 95% CI = 1.35-1.98; Fig. 5h) in univariate analysis. In a multivariable model incorporating acinar scattering and AI grading, acinar scattering was independent of AI grading (TRACERx 421, P = 0.004; LATTICe-A, P = 2.61 × 10⁻⁵; Fig. 5i). These data suggest that acinar scattering may reflect histological transition events, and that high scattering may be a morphological phenotype indicating poor prognosis that can be assessed from H&E images.

Discussion
We have developed an AI method, ANORAK, for the precise classification of growth patterns in LUAD. To the best of our knowledge, this is the first AI method to dissect LUAD growth patterns at the pixel level and be tested in over 1,000 cases, setting a benchmark in automated grading of LUAD. Our method can automatically estimate growth pattern proportions and predominant patterns within a tumor, providing an unbiased and automated pipeline for determining IASLC grading in LUAD. Moreover, the precise delineation of growth patterns can provide insights into the heterogeneous landscape of LUAD, which cannot be addressed by patch-wise classification methods.
The AI method was evaluated in four cohorts, comprising a total of 1,372 tumors. The overall agreement of predominant pattern at the tumor level between AI and pathologists across the four cohorts was moderate, consistent with the inter-pathologist agreement in the LATTICe-A and DHMC cohorts 13. Similar results were found in previous studies. Boland et al. 3 reported an agreement of 51.7% between two pathologists for a large cohort of individuals with LUAD (n = 534), while Thunnissen et al. 4 showed good agreement for typical cases and fair agreement for difficult cases by comparing scores from 26 pathologists. In addition, tumors with a discrepant predominant pattern classification between AI and manual scoring were more heterogeneous compared to tumors in agreement. Previous attempts were made to determine how clonal evolution is reflected in growth pattern heterogeneity through the identification of molecular alterations that accompany the transition between growth patterns 6. This detailed analysis in a small number of tumors found that changes in expression, rather than mutations, accompanied the transition; as such, clear evidence of divergent tumor clones reflected in the growth pattern was not identified. On a larger scale, in the TRACERx study, although without a specific focus on sampling to capture divergent growth patterns, there was a tendency for tumors to evolve from low-grade or mid-grade to higher-grade growth patterns in individuals with LUAD where an ancestor-descendant relationship could be described based on clonal or subclonal loss of heterozygosity 22.
The proposed IASLC grading system was originally introduced to improve prognostication using tumor morphology 2. In our study, AI grading improved the performance of predicting DFS compared to the baseline and pathological grading for stage I tumors, and was comparable for stage I-III tumors. Moreover, the prognostic value of AI grading was independent of clinical parameters in the TRACERx 421 and LATTICe-A cohorts. In typical clinical practice, the clonal lineage of postsurgical recurrence is not definitively confirmed, although

data from the TRACERx 421 cohort showed that only two out of 49 cases of clinically classified postsurgical recurrence were of different lineage using whole-exome sequencing 26. While we acknowledge that these uncommon events limit the ability to predict recurrence from resection specimens, this applies equally to both our method and established practices. The LATTICe-A cohort, consisting of 845 tumors with scores from three pathologists, allowed a comprehensive investigation of the clinical impact of the AI method and showed its benefit as a morphological biomarker. This benefit was slightly higher than that brought by an additional manual grading for stage I tumors, and was comparable with additional manual grading for stage I-III tumors. Furthermore, analyses of manual scoring demonstrated that tumors with multiple slides and intratumoral morphological heterogeneity were particularly challenging cases. In these cases, AI grading achieved a stronger predictive ability compared to manual grading for stage I tumors. Because stage I patients frequently receive surgical resection without adjuvant therapy, the accurate prediction of recurrence, to better target individual patients for adjuvant therapies, is critical. These data illustrate the clinical utility of our AI method for stage I tumors, which could potentially be used as an alternative or independent variable to manual grading, or be applied specifically to challenging cases.
The AI method enables the spatial profiling of growth patterns at the pixel level, allowing morphological and spatial heterogeneity analyses at the growth pattern island level. This would be unattainable with alternative manual or patch-wise classification methods. We used the area and solidity index to measure acinar island morphology and found that small acinar islands were enriched in lepidic-predominant and high-grade-predominant tumors, while the shape of these small acini in lepidic-predominant tumors was more irregular than in high-grade-predominant tumors. This may reflect tumor cell biological and microenvironmental differences regarding the formation of acinar structures within the context of different predominant architectures. Because acinar morphological features were obtained by averaging thousands of acinar islands within a tumor, noise due to island segmentation was mitigated (Supplementary Figs. 1-7). We also developed a metric, termed acinar scattering, for measuring the spatial distribution of acinar islands within the tissue space. Low acinar scattering was notably associated with lepidic-predominant tumors compared to others, suggesting that acinar spatial distribution may reflect the transition of growth patterns toward more aggressive behavior. High acinar scattering was correlated with unfavorable outcomes, independent of AI grading.
This study has some limitations. The Dice coefficient of ANORAK is still limited, indicating that error modes exist. Intratumoral and tumor microenvironment heterogeneity may result in variations in growth pattern morphology, making segmentation more challenging, specifically among the lepidic, papillary and acinar patterns. Meanwhile, the patching operation during the training and testing stages may limit the field of view, thereby losing contextual information. Stain color shift may also lead to misclassification, despite the color augmentations and normalizations applied to mitigate this impact. These factors may contribute to local error modes which, when accumulated, may result in errors at the WSI level. In addition, because the model counted the number of pixels to determine the predominant pattern per tumor, and the area of micropapillary islands was smaller than the papillary structures 27, the discrepancy between AI and pathologists regarding papillary-predominant and micropapillary-predominant patterns may be considered another error mode. Furthermore, because we only collected histopathology annotations from invasive non-mucinous LUAD as training data, invasive mucinous and preinvasive tumors with distinct morphologies are outside of the scope; the model may generate inaccurate results or completely fail if applied to such samples. In addition, we selected a 'challenging case series' from the LATTICe-A cohort because the other cohorts considered in this study had fewer cases satisfying the selection criteria. However, LATTICe-A is not a screening-based cohort. It is therefore crucial to validate the potential clinical benefits of AI grading in further cohorts that include screening-detected tumors. Because there are no other studies reporting the importance of acinar spatial arrangement, further validation and study of the biological implications of acinar scattering are needed.
In summary, the AI method we developed can automate predominant growth pattern determination and IASLC grading for LUAD tumors, achieving moderate agreement with pathologists; this was validated in four cohorts consisting of 1,372 cases. In the TRACERx 421 and LATTICe-A cohorts, AI grading was an independent prognostic indicator and had a stronger prognostic ability than pathological grading alone for stage I tumors in the LATTICe-A cohort. The prognostic performance of AI grading was further underlined in challenging scenarios consisting of cases with multiple slides and greater intratumoral heterogeneity. Furthermore, specific morphological features of tumor acini have the potential to infer different underlying tumor biology, with the spatial heterogeneity of acinar islands reflecting divergent tumor behavior and prognosis.

Study cohorts
TRACERx is a multi-center, prospective study that began recruitment in April 2014 (https://clinicaltrials.gov/ct2/show/NCT01888601, approved by an independent research ethics committee, ref. no. 13/LO/1546). Formalin-fixed, paraffin-embedded and H&E-stained histopathology diagnostic slides were scanned using the NanoZoomer S210 digital slide scanner (catalog no. C13239-01) and NanoZoomer digital pathology system v.3.1.7 (Hamamatsu) at ×40 (0.228 μm per pixel resolution) 28,29. LATTICe-A is a retrospective series of all consecutively resected primary LUAD tumors at a single UK surgical center between 1998 and 2014. The work was ethically approved by a UK National Health Service research ethics committee (ref. no. 14/EM/1159) and complies with the Strengthening the Reporting of Observational Studies in Epidemiology guidelines. All archived slides containing tumor material were used to capture the full diversity of each lesion. Slides were dearchived and scanned using a Hamamatsu NanoZoomer XR at ×40 (0.226 μm per pixel resolution) 23,29. Available diagnostic slides from TCGA LUAD were downloaded from https://portal.gdc.cancer.gov/ in 2021. The DHMC dataset 13 was downloaded from https://bmirds.github.io/LungCancer/ in 2021. Further information on the research design is available in the Nature Research Reporting Summary linked to this article.
The training set of the AI method consisted of 49 WSIs from patients in the TRACERx 100 cohort 28,29 .The WSIs were sparsely annotated by three independent thoracic subspeciality pathologists, yielding 3,662 patches (768 × 768 pixels at ×20, approximately 0.45 μm per pixel) of annotations for six typical growth patterns (Extended Data Fig. 1a) and non-tumor areas, for example, normal tissue and blank areas.
The AI method was then applied and evaluated on a total of 5,540 WSIs from four cohorts, which were collected, processed and scanned independently. This included patients with invasive non-mucinous LUAD as the primary diagnosis (excluding adenocarcinoma in situ, minimally invasive adenocarcinomas and other variants) from the TRACERx 421 cohort (n = 206, 1,184 slides) 22,26, the LATTICe-A cohort (n = 845, 3,979 slides) 23, the TCGA LUAD cohort (n = 178, 234 slides) 30 and the DHMC cohort (n = 143, 143 slides) 13 (Table 1). TRACERx 100 is a subset of TRACERx 421. For the TRACERx 421 and LATTICe-A cohorts, slides were from all the diagnostic blocks containing tumor cells. For the DHMC cohort and most patients (91%) in the TCGA cohort, only one slide was available; hence, we only considered these two cohorts for agreement performance comparison. No statistical method was used to predetermine sample size, but our sample sizes are similar to those reported in previous publications 13,22,26,29,30 and subject to available diagnostic slides. Blinding and randomization were not relevant because this was an observational study. Patients were not allocated to any interventions and were followed up and assessed as per routine practice. No results from this study were reported back to patients, so there is no likelihood of people changing their behaviors based on these findings. The deep learning model was trained without knowledge of patient outcomes, which represents a form of blinding.
Manual pathological grading of growth patterns, as well as individual pattern proportion scoring, was available for the TRACERx 421, LATTICe-A and TCGA cohorts. The DHMC cohort only had predominant pattern data for each slide. In the LATTICe-A cohort, three independent consultant-level thoracic subspeciality pathologists provided growth pattern scoring for each tumor.
In the TRACERx 421 cohort, DFS was defined as the period from the date of registration to the time of radiological confirmation of recurrence of the primary tumor registered for TRACERx, or the time of death by any cause. During follow-up, three participants with LUAD (CRUK0512, CRUK0428 and CRUK0511) developed a new primary cancer and subsequent recurrence from either the first primary lung cancer or the new primary cancer diagnosed during follow-up. These cases were censored at the time of diagnosis of the new primary cancer for DFS analysis because of the uncertainty of the origin of the third tumor 22.
In the LATTICe-A cohort, recurrence data were obtained from the examination of patient records, notably paper notes and radiological databases, to identify the date of radiologically or biopsy-confirmed recurrence. Cancer-specific death was determined by the presence of lung cancer in the cause of death on the death certificate. Overall survival was measured to the date of death.

Deep learning model architecture
We developed a deep learning-based model 14, ANORAK, which leverages cross-stream interaction to recognize and segment six histological patterns (lepidic, acinar, papillary, micropapillary, cribriform and solid) on WSIs at the pixel level. The model applied ResNet50 (ref. 31) as the backbone, with customized modifications to account for the limited training data. It encoded three streams (coarse, intermediate and fine) with different scales of information to gather abundant features at different resolutions (×10 at approximately 0.9 μm per pixel, ×5 and ×2.5). The first-order attention (Extended Data Fig. 1c) introduced global contextual information at an early stage to guide low-level feature learning and enable the first round of interactions between streams. Each output in the coarse and intermediate streams was then fed into a convolution layer to align the depth dimension with the fine stream output. A PPM 15 (Extended Data Fig. 1c) was used to integrate high-level features. Afterwards, these features were forwarded to a second-order attention module, learning the relationship between streams to extract more discriminative features and driving high-level feature exchange between streams (Extended Data Fig. 1c and Fig. 1a).
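The general idea of attention-weighted fusion across multi-resolution streams can be caricatured in a few lines: each stream contributes a feature vector, streams are scored against a shared query, and a softmax-weighted sum fuses them. This is a conceptual sketch only, not the published ANORAK architecture; the function names, the dot-product scoring and the use of plain Python lists are our own simplifications.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_streams(streams, query):
    """Toy attention fusion: score each stream's feature vector against a
    query by dot product, softmax the scores, and return the weighted sum
    of the streams together with the attention weights."""
    scores = [sum(q * f for q, f in zip(query, s)) for s in streams]
    weights = softmax(scores)
    dim = len(streams[0])
    fused = [sum(w * s[k] for w, s in zip(weights, streams)) for k in range(dim)]
    return fused, weights
```

Streams whose features align with the query receive larger weights, so the fused representation is dominated by the most relevant resolution, which is the intuition behind letting streams exchange information rather than simply adding them element-wise.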

Implementation and evaluation
Before training, the annotated tiles were divided into nonoverlapping patches (except for patches at the bottom and right edges) with a size of 768 × 768 pixels at ×20. During training, four data augmentation strategies were used to mitigate overfitting: random rotation within 90 degrees; random width-shift and height-shift up to 20% of the input width and height; random zooming in or out in the range (0.8, 1.2); and random adjustment of the saturation within (0.8, 2.0) and hue within (−0.1, 0.1). Color augmentation was not applied during the cross-validation stage because the data were from the same cohort. The model was trained for 60 epochs with a batch size of eight. Cross-entropy loss was applied as the objective function, minimized by the Adam optimizer with a step-wise learning rate. The initial rate was set to 10⁻³ for the first ten epochs; it was then decreased tenfold (10⁻⁴) for the next 40 epochs, followed by another tenfold decrease (10⁻⁵) for the remaining ten epochs. The pipeline was implemented with Python v.3. The ablation experiments at the patch level included comparisons with the baseline method (single-stream), multi-stream with element-wise add combination (multi-ADD), multi-stream with first-order attention alone (multi-FO), multi-stream with second-order attention alone (multi-SO) and the proposed ANORAK model (multi-FO and multi-SO). The proposed model was compared against other widely used approaches in semantic segmentation, including attention U-Net 17, DeepLabV3+ (ref. 18), DANet 19 and MedT 20. We applied the Dice coefficient to evaluate segmentation performance at the patch level and the agreement of predominant patterns to assess prediction at the WSI level. Comparisons were conducted with fivefold cross-validation for the TRACERx 100 cohort (n = 53) and on a subset of the LATTICe-A cohort (n = 50), a dataset independent of the training dataset.
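The step-wise schedule described above (10⁻³ for epochs 1-10, 10⁻⁴ for epochs 11-50, 10⁻⁵ for epochs 51-60) can be expressed as a simple epoch-indexed function. This is an illustrative sketch with our own naming, not the released training code:

```python
def step_wise_lr(epoch: int) -> float:
    """Step-wise learning rate for a 60-epoch run (0-indexed epochs):
    1e-3 for epochs 0-9, 1e-4 for epochs 10-49, 1e-5 for epochs 50-59."""
    if epoch < 10:
        return 1e-3
    if epoch < 50:
        return 1e-4
    return 1e-5
```

In practice such a function would be passed to the optimizer's scheduler hook at the start of each epoch.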

Growth pattern and grading inference
Each WSI was divided into tiles of 2,000 × 2,000 pixels with the magnification downsampled to ×20 (approximately 0.45 μm per pixel) 29. Each tile was then normalized to a target image to align the color before being fed to the trained deep learning model, which, in turn, generated corresponding masks for all growth pattern regions detected at the pixel level. The tile masks were then stitched and further downsampled to ×1.25 (approximately 7.2 μm per pixel). Small components were empirically removed as postprocessing: lepidic patterns smaller than approximately 0.05 mm², and papillary, cribriform and solid patterns smaller than approximately 0.015 mm², were removed.
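The small-component removal step can be sketched as connected-component labeling followed by an area threshold; at roughly 7.2 μm per pixel, 0.05 mm² corresponds to about 960 pixels. The pure-Python version below is a minimal sketch under our own assumptions (binary mask per pattern, 4-connectivity, our function name), not the paper's implementation:

```python
from collections import deque

def remove_small_components(mask, min_pixels):
    """Zero out connected components (4-connectivity) smaller than
    min_pixels in a binary mask given as a list of lists of 0/1."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in mask]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                # Flood-fill one component, collecting its pixel coordinates.
                comp, queue = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                # Erase the component if it falls below the area threshold.
                if len(comp) < min_pixels:
                    for y, x in comp:
                        out[y][x] = 0
    return out
```

Running this per pattern class with the pattern-specific pixel threshold reproduces the behavior described above.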
The predominant pattern and grading were inferred from a stitched and downsampled mask (approximately 7.2 μm per pixel). The growth pattern proportion for each tumor was computed across all slides of a given tumor as

g_j = (Σ_{i=1}^{m} S_{ij}) / (Σ_{i=1}^{m} Σ_{j=1}^{n} S_{ij})

where g_j is the proportion of pattern j; j represents lepidic, acinar, papillary, cribriform, micropapillary and solid; i indexes the i-th slide; m is the number of slides per tumor; n is the number of patterns; and S_{ij} is the number of pixels identified for pattern j in the i-th slide. The predominant pattern, P, is determined as the pattern with the highest proportion. The growth pattern grading driven by AI followed the IASLC grading system 2: grade 1, lepidic-predominant tumors with less than 20% high-grade patterns (solid, micropapillary, cribriform); grade 2, acinar-predominant or papillary-predominant tumors with less than 20% high-grade patterns; and grade 3, any tumor with 20% or more high-grade patterns.
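The proportion, predominant-pattern and grading rules just described can be implemented directly from per-slide pixel counts. A minimal sketch, assuming a {slide: {pattern: pixel count}} layout of our own choosing (the rare edge case of a high-grade-predominant tumor below the 20% cutoff is resolved here as grade 2):

```python
PATTERNS = ["lepidic", "acinar", "papillary",
            "cribriform", "micropapillary", "solid"]
HIGH_GRADE = ("solid", "micropapillary", "cribriform")

def iaslc_grade(pixel_counts):
    """Derive pattern proportions, predominant pattern and IASLC grade
    from per-slide pixel counts: {slide_id: {pattern: n_pixels}}."""
    # Sum pixels per pattern over all slides of the tumor.
    totals = {p: sum(s.get(p, 0) for s in pixel_counts.values())
              for p in PATTERNS}
    denom = sum(totals.values())
    props = {p: totals[p] / denom for p in PATTERNS}
    predominant = max(props, key=props.get)
    high_grade = sum(props[p] for p in HIGH_GRADE)
    if high_grade >= 0.20:
        grade = 3
    elif predominant == "lepidic":
        grade = 1
    else:
        grade = 2
    return props, predominant, grade
```

For a tumor with 65% acinar, 20% lepidic and 15% solid across its slides, this yields an acinar-predominant grade 2 classification, matching the rules above.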

Acinar morphological features
The pixel number and solidity index, that is, the proportion of pixels in the convex hull that were also in a region of interest, were used to measure the individual acinar island area and shape generated by the AI method.A higher solidity index indicated a more regular shape.The average area and solidity index of all the individual acinar islands identified from the available slides were taken as the tumor-level features.
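The solidity index can be illustrated on a polygonal outline as region area divided by convex-hull area. The paper computes solidity on pixel masks of acinar islands; the polygon-based sketch below (monotone-chain convex hull plus shoelace area, with our own function names) is an illustrative stand-in for that pixel-based computation:

```python
def _convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def _polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    return abs(sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                   - poly[(i + 1) % len(poly)][0] * poly[i][1]
                   for i in range(len(poly)))) / 2

def solidity(polygon):
    """Solidity = region area / convex-hull area; 1.0 for convex shapes,
    lower for shapes with concavities (more irregular outlines)."""
    return _polygon_area(polygon) / _polygon_area(_convex_hull(polygon))
```

An L-shaped outline, which has a concavity, scores below 1, whereas any convex outline (for example, a square) scores exactly 1.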

Acinar scattering score
We adapted an established score, the standard distance 32, to measure the spatial distribution of acinar patterns, which we termed 'acinar scattering':

d = sqrt( (Σ_{i=1}^{n} (x_i − x_0)² + Σ_{i=1}^{n} (y_i − y_0)²) / n ) / sqrt(N)

where d is the tissue-size-normalized standard distance, n is the number of isolated acinar islands within the tissue identified by the proposed AI method, N is the area of the tissue, (x_i, y_i) is the centroid of an acinar island and (x_0, y_0) is the mean center of all the acinar islands.
A higher acinar scattering score indicated a more scattered distribution of acini across the tissue.The median value of all available slides for a given tumor was taken as the tumor-level score.The optimal cutoff (0.36) separating tumors into low-scattering and high-scattering groups was selected from the discovery cohort, LATTICe-A, which was then applied directly to the TRACERx 421 cohort.
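Under our reading of the adapted standard distance, the score can be computed from island centroids and tissue area as follows; the normalization by the square root of the tissue area (making the score dimensionless and comparable across tissues of different sizes) is our interpretation of the adaptation, and the function name is ours:

```python
import math

def acinar_scattering(centroids, tissue_area):
    """Standard distance of acinar-island centroids, normalized by the
    square root of the tissue area. centroids: list of (x, y) tuples."""
    n = len(centroids)
    x0 = sum(x for x, _ in centroids) / n
    y0 = sum(y for _, y in centroids) / n
    # Root of the mean squared deviation from the mean center, per axis.
    d = math.sqrt(sum((x - x0) ** 2 for x, _ in centroids) / n
                  + sum((y - y0) ** 2 for _, y in centroids) / n)
    return d / math.sqrt(tissue_area)
```

For the same tissue area, tightly clustered centroids score much lower than centroids spread across the tissue, matching the intended low-versus-high scattering distinction.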

Statistics and reproducibility
Correlation tests used Spearman's method and were performed using the function cor.test from the stats v.4.1.2 R package. Confusion matrices were obtained using the function confusionMatrix from the caret v.6.0-93 R package. Fleiss' kappa was computed to assess the agreement among observers using the function kappam.fleiss from the irr v.0.84.1 R package. Survival analyses were conducted using the Kaplan-Meier estimator (ggsurvplot R function from the survminer v.0.4.9 and survival v.3.2-13 R packages) as well as the Cox model (coxph R function, displayed using the ggforest R function). The differences between grade strata in Kaplan-Meier curves were determined using Wald tests. Forest plots show the HR on the x axis; each variable's HR was plotted and annotated with a 95% CI. All HRs were computed across all time points (that is, over the whole survival curve rather than at a specific time point). For statistical comparisons among groups, a two-sided, nonparametric, unpaired Wilcoxon rank-sum test was used for continuous variables, while Fisher's exact test was used for categorical variables. A Kruskal-Wallis test was used for comparisons among more than two groups, unless stated otherwise. Predictive performance was assessed using a C-index (ref. 33) within 5 years, computed with the function Inf.Cval from the survC1 v.1.0-3 R package. Multicollinearity between AI and manual grading, and between two manual gradings, was assessed using the function vif from the car v.3.0-12 R package. All statistical tests were two-sided and P < 0.05 was considered statistically significant. To adjust P values for multiple comparisons, the Benjamini-Hochberg method was used. The packages tidyverse v.2.0.0 and tidyr v.1.3.0 were used for data processing in R. Plotting was done using ggplot2 v.
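The Benjamini-Hochberg adjustment named above can be sketched in a few lines; this is an illustrative re-implementation that matches the output of R's p.adjust(p, method = "BH"), not the authors' code:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values: p_(k) * m / k, with a
    step-down cumulative minimum from the largest P value, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from largest to smallest p
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted
```

For example, bh_adjust([0.01, 0.04, 0.03, 0.005]) yields adjusted values [0.02, 0.04, 0.04, 0.02], matching p.adjust in R.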

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The training dataset, consisting of annotations on small image tiles, has been deposited in Zenodo (https://doi.org/10.5281/zenodo.10016027). Previously published image data that were reanalyzed in this study can be requested from https://bmirds.github.io/LungCancer/. The human LUAD diagnostic slide images were derived from the TCGA Research Network at https://portal.gdc.cancer.gov/. Images generated by the AI model in Fig. 2a and Extended Data Figs. 2, 3a and 7f can be accessed at figshare (https://doi.org/10.6084/m9.figshare.24599796). For the TRACERx study, all of the scanned diagnostic histological images have a study number label embedded in the file that prevents complete anonymization. Therefore, these images cannot be shared, in line with the ethical approval for the study. Requests for access to the TRACERx dataset for academic noncommercial research purposes can be submitted through the Cancer Research UK and UCL Cancer Trials Centre (ctc.tracerx@ucl.ac.uk) and are subject to review of a project proposal that will be evaluated by a TRACERx data access committee, entering into an appropriate data access agreement and any applicable ethical approvals. The time frame of response to requests is about 6 months. LATTICe-A study data and materials are currently subject to a material and data transfer agreement between the University of Leicester, the University of Cambridge and NHS Greater Glasgow and Clyde, which includes a restricted access period of 5 years, precluding any access by other third parties during this time. After the 5-year period, restricted access data can be accessed by application to the NHS Greater Glasgow and Clyde Biorepository (clare.orange@ggc.scot.nhs.uk; john.lequesne@glasgow.ac.uk) as custodians; the data access request will be reviewed and released under their research ethics committee-approved tissue bank protocols. Requests will be reviewed and approved within 6-8 weeks and will be
accompanied by a data sharing agreement detailing the conditions and restrictions of use and publication. Source data are provided with this paper.

Competing interests
S.V. is a coinventor on a patent of methods for detecting molecules in a sample (patent no. 10578620). A.H. has received fees from Abbvie, Almirall, Boehringer Ingelheim, Clovis Oncology, Ipsen, Takeda Pharmaceuticals, AstraZeneca, Daiichi Sankyo, Merck Serono, Merck/MSD, UCB, Kyowa Kirin, Servier, Sobi, Pfizer and Roche for delivering general education and training in clinical trials; has received fees for membership of independent data monitoring committees for Roche-sponsored clinical trials and academic projects on real-world evidence or tumor-agnostic therapies coordinated by Roche; has been paid honoraria for speaking at Roche-funded conferences (on real-world data); has an academic collaboration with Navio and is an unpaid member of their advisory board; is an investigator for an academic study (SUMMIT) sponsored by UCL, which is funded by GRAIL; has received one honorarium for an advisory board meeting for GRAIL; has received a consulting fee from Evidera (for one GRAIL-initiated project); and previously owned shares in Illumina and Thermo Fisher Scientific (sold in 2020); he is on the scientific advisory board for Adela Bio and has received no payments or honoraria for this, although he has share options available.

Fig. 1 | Proposed computational pipeline for precision mapping and spatial heterogeneity analyses. a, The deep learning network architecture for growth pattern segmentation integrating inputs over multiple spatial resolutions and delivering pixel-wise delineations. b, Overview of all the cohorts and available

Fig. 2 | Performance of AI in the prediction and quantification of growth patterns. a, Segmentation example generated by ANORAK. b, Correlations of growth pattern proportions at the tumor level between AI and pathologists. Growth pattern proportions were not available in the DHMC cohort; thus, plots relevant to proportions were not illustrated for the DHMC (same in d and e). P values were corrected for multiple comparisons using the Benjamini-Hochberg method. c, Performance comparison with pathologists in predicting the predominant pattern per case (the cribriform-predominant slide per tumor was not available in the DHMC cohort). d, Growth pattern intratumoral

Extended Data Fig. 1 | Precise pathological annotations for training and sub-modules of the developed deep learning model (ANORAK). a, Examples illustrating morphologically distinct growth patterns in lung adenocarcinoma. b, Distribution of annotations regarding the number of patches and pixels. c, Detailed architectures of sub-modules developed for the AI method.

Extended Data Fig. 2 | Segmentation performance. a,b, Segmentations generated by AI at low-power and high-power resolutions, deposited in 10.6084/m9.figshare.24599796.

Extended Data Fig. 3 | Segmentation performance and intra- and inter-comparisons. a, Segmentations generated by AI at low-power and high-power resolutions, deposited in 10.6084/m9.figshare.24599796. b, Comparison of segmentation and prediction performance for ablation experiments. c, Comparison of segmentation and prediction performance with other methods at the patch (Dice) and WSI (agreement) levels.

Extended Data Fig. 4 | Inter-pathologist comparison for predominant pattern and IASLC grading in LATTICe-A. a, Interobserver agreement of each pattern. b,c, Interobserver agreement of predominant pattern and IASLC grading at the tumor level. d, Growth pattern intratumoral heterogeneity substantially contributed to the discrepancy between pathologists (n = 845 each, P1 < 2.22 × 10⁻¹⁶, P2 = 4.323 × 10⁻¹³, P3 = 1.589 × 10⁻¹⁵). P values were calculated using a two-sided Wilcoxon rank-sum test and not adjusted for multiple comparisons. The median value is indicated by a thick horizontal line; the first and third quartiles are represented by box edges; whiskers indicate 1.5 times the interquartile range. e, Interobserver agreement of each grade. Path 3 is different from the Path 3 in the LATTICe-A scoring.
G.N. reports personal fees from Merck, Boehringer Ingelheim, Novartis, AstraZeneca, Bristol Myers Squibb, Roche, Abbvie, Oncologica, Uptodate, the European Society of Oncology, Takeda Pharmaceuticals, Sanofi and Liberium, as well as personal fees and grants from Pfizer. M.J-H. is a Cancer Research UK Career Establishment Awardee and has received funding from Cancer Research UK, the International Association for the Study of Lung Cancer and International Lung Cancer Foundation, the Lung Cancer Research Foundation, the Rosetrees Trust, the UK and Ireland Neuroendocrine Tumour Society, the National Institute for Health Research (NIHR) and the NIHR UCLH Biomedical Research Centre. M.J-H. has consulted for, and is a member of, the Achilles Therapeutics scientific advisory board and steering committee, has received speaker honoraria from Pfizer, Astex Pharmaceuticals and Oslo Cancer Cluster, and holds a patent (no. PCT/US2017/028013) relating to methods for lung cancer detection. C.S. acknowledges grant support from AstraZeneca, Boehringer Ingelheim, Bristol Myers Squibb, Pfizer, Roche-Ventana, Invitae (previously Archer Dx, collaboration in minimal residual disease sequencing technologies) and Ono Pharmaceutical. He is an AstraZeneca advisory board member and chief investigator for the AZ MeRmaiD 1 and 2 clinical trials; he is also co-chief investigator of the NHS Galleri trial funded by GRAIL and a paid member of GRAIL's scientific advisory board. He receives consultant fees from Achilles Therapeutics (scientific advisory board member), Bicycle Therapeutics (scientific advisory board), Genentech, Medicxi, Roche Innovation Centre-Shanghai, Metabomed (until July 2022) and the Sarah Cannon Research Institute. C.S. has received honoraria from Amgen, AstraZeneca, Pfizer, Novartis, GlaxoSmithKline, MSD, Bristol Myers Squibb, Illumina and Roche-Ventana. C.S.
had stock options in Apogen Biotechnologies and GRAIL until June 2021; he currently has stock options in Epic Bioscience and Bicycle Therapeutics; he has stock options in, and is a cofounder of, Achilles Therapeutics. C.S. holds patents relating to assay technology to detect tumor recurrence (no. PCT/GB2018/051892). Y.Y. has received speaker's bureau honoraria from Roche and consulted for Merck. D.A.M. reports speaker fees from Eli Lilly, AstraZeneca and Takeda Pharmaceuticals, consultancy fees from AstraZeneca, Thermo Fisher Scientific, Takeda Pharmaceuticals, Amgen, Janssen, MIM Software, Bristol Myers Squibb and Eli Lilly, and has received educational support from Takeda Pharmaceuticals and Amgen. All other authors declare no competing interests.