An Automated Segmentation Pipeline for Intratumoural Regions in Animal Xenografts Using Machine Learning and Saturation Transfer MRI

Lam, Wilfred W.; Oakden, Wendy; Karami, Elham; Koletar, Margaret M.; Murray, Leedan; Liu, Stanley K.; Sadeghi-Naini, Ali; Stanisz, Greg J.

doi:10.1038/s41598-020-64912-6

Article
Open access
Published: 15 May 2020

Wilfred W. Lam¹,
Wendy Oakden¹,
Elham Karami^1,2,3,
Margaret M. Koletar¹,
Leedan Murray¹,
Stanley K. Liu^2,4,5,6,
Ali Sadeghi-Naini^1,2,3,6 &
…
Greg J. Stanisz^1,2,7

Scientific Reports volume 10, Article number: 8063 (2020)


Abstract

Saturation transfer MRI can be useful in the characterization of different tumour types. It is sensitive to tumour metabolism, microstructure, and microenvironment. This study aimed to use saturation transfer to differentiate between intratumoural regions, demarcate tumour boundaries, and reduce data acquisition times by identifying the imaging scheme with the most impact on segmentation accuracy. Saturation transfer-weighted images were acquired over a wide range of saturation amplitudes and frequency offsets along with T₁ and T₂ maps for 34 tumour xenografts in mice. Independent component analysis and Gaussian mixture modelling were used to segment the images and identify intratumoural regions. Comparison between the segmented regions and histopathology indicated five distinct clusters: three corresponding to intratumoural regions (active tumour, necrosis/apoptosis, and blood/edema) and two extratumoural (muscle and a mix of muscle and connective tissue). The fraction of tumour voxels segmented as necrosis/apoptosis quantitatively matched those calculated from TUNEL histopathological assays. An optimal protocol was identified providing reasonable qualitative agreement between MRI and histopathology and consisting of T₁ and T₂ maps and 22 magnetization transfer (MT)-weighted images. A three-image subset was identified that resulted in a greater than 90% match in positive and negative predictive value of tumour voxels compared to those found using the entire 24-image dataset. The proposed algorithm can potentially be used to develop a robust intratumoural segmentation method.

Introduction

Tumours are highly heterogeneous. Not only do they vary considerably between different individuals, but a single tumour often demonstrates regional variations in cell density, cell death, vasculature, and metabolic activity, among other factors¹. These subregions can be due to genetic or local microenvironmental differences^1,2. Differentiation between active tumour and necrosis is of particular clinical interest^3,4 since heterogeneity is often predictive of survival, therapeutic response, or metastatic potential^2,5,6.

There is a diagnostic advantage to the segmentation of heterogeneous tumours prior to further analysis. Considering the tumour as a single entity and calculating whole-tumour metrics, such as perfusion parameters, can result in a loss of correlation between biomarkers⁵. Magnetic resonance imaging (MRI) is ideal for identifying intratumoural regions as it is non-invasive and does not utilize ionizing radiation. While tumour heterogeneity can be observed on conventional T₂-weighted and post-contrast agent injection T₁-weighted^7,8 MRI, quantitative techniques are generally required in order to accurately segment intratumoural regions^5,9. Manual segmentation is certainly possible^2,10. However, it is time consuming, subjective, and typically based on a single image contrast. These difficulties are exacerbated by the fact that tumour boundaries are often irregular and intratumoural regions may not be contiguous.

Automatic segmentation can be performed by fitting a model to the imaging data and thresholding the model parameters^11,12,13. It can also be done using machine learning. Specific methods include the use of convolutional neural networks^7,8, Gaussian mixture modelling^14,15,16,17, k-means clustering^9,17,18,19, non-negative matrix factorization^20,21, and other techniques^17,22. Quantitative MRI data used by these automatic segmentation routines include proton density (M₀)^9,18, transverse relaxation time (T₂)^9,14,17,18,22, effective transverse relaxation time (T₂*)^8,12,14,17, diffusion-weighted imaging (DWI) model parameters^8,9,11,14,15,17,18,21,22, and dynamic contrast enhancement (DCE) model parameter maps^8,12,14,16,20. While these methods are promising, they are not the only options for quantitative imaging.

Saturation transfer MRI is sensitive to differences in tumour metabolism, which differs between intratumoural regions¹, and can also be particularly useful in the characterization of different tumour types²³, since it offers superb tissue contrast in comparison to other methods without the need for exogenous contrast agents. The saturation transfer MRI contrast mechanism reflects the exchange rate of magnetization between hydrogen nuclei in water and other molecular pools that include macromolecules and dissolved proteins, as well as relative pool sizes and their intrinsic magnetic resonance properties such as the longitudinal and transverse relaxation times (T₁ and T₂, respectively) of each pool. In saturation transfer-prepared pulse sequences, magnetization is reduced by an RF (radiofrequency) saturation pulse across a range of frequencies corresponding to the exchanging molecules. When high saturation amplitudes and large frequency offsets relative to the water resonance are used, these sequences are typically referred to as magnetization transfer (MT) and are sensitive to the exchange of hydrogen nuclei in semisolid macromolecules (mostly lipid bilayers)^24,25,26 with those of water. At lower saturation amplitudes and smaller frequency offsets, they are mostly sensitive to the exchange in chemical groups in dissolved proteins (e.g., amide²⁷, amine²⁸, guanidinium^29,30, and hydroxyl²⁸) and the mechanism is termed Chemical Exchange Saturation Transfer (CEST) or relayed-Nuclear Overhauser Effect (NOE). Importantly, CEST, unlike DCE, does not require the injection of an exogenous contrast agent, which can be an issue for renally compromised patients³¹ and a complication for longitudinal preclinical studies. CEST MRI in oncology is an active area of research^32,33. In a previous segmentation study involving saturation transfer, Zhang et al. combined manual tumour delineation with MT and contrast to segment tumour from necrosis, which has much lower MT and relayed-NOE contrast¹⁰.

The goal of the present study was to develop an automated algorithm to segment intratumoural regions as well as the surrounding tissue in a xenograft model of prostate cancer using only saturation transfer MRI data and T₁ and T₂ maps. This would allow a secondary use of this data, which originally was intended for studying metabolism in tumours²³. The algorithm was validated using histopathology and was tested for robustness using leave-one-out cross-validation. The trade-off between the number of image contrast types and segmentation accuracy was investigated for the purpose of minimizing data acquisition and the images with the largest impact on accuracy were found via feature selection. Finally, the MT and CEST effects in active tumour vs. necrosis/apoptosis were also quantitatively compared.

Results

Description of tumours

Of the 34 DU145 human prostate adenocarcinoma tumour xenografts included in this study, 33 ranged in size from 33 to 810 mm³ (with a mean ± SD of 259 ± 210 mm³) and one (shown in Fig. 1) was ~1,500 mm³ with a particularly large region of edema. Imaging and histology were acquired 46 ± 12 days post injection of tumour cells.

Histopathology

Most tumours were heterogeneous and comprised of a complex mixture of muscle cells, tumour cells, necrotic and apoptotic cells, blood cells, and regions of inter-cellular edema. Figure 1 shows an example of a particularly complex tumour. Most of the 34 tumours were largely active, indicated by colourless terminal deoxynucleotidyl transferase dUTP nick end labelling (TUNEL) images (Supplementary Fig. S1). Eight had significant edema indicated by the large hyperintense regions in the T₂-weighted images. Edema presented a challenge for interpretation of histopathology since the voxel size and liquid signal often masked an intricate microenvironment of muscle cells, leukocytes, fibroblasts, tumour cells, and fibrotic necrosis, as illustrated in the regions of blood/edema in the haematoxylin and eosin (H&E) histology (boxes with dashed borders in Fig. 1). Twelve tumours had large areas of necrosis/apoptosis indicated by the significant brown staining in the TUNEL images (Supplementary Fig. S1).

Optimization of segmentation pipeline

The segmentation pipeline (Fig. 2) consisted of running an independent component analysis³⁴ on the input dataset, followed by fitting a Gaussian mixture model (GMM)³⁵, assigning cluster labels based on comparison with histology, and generating a segmentation mask of the original image data. The optimal number of GMM clusters was determined using the gradient of the Bayesian information criterion³⁶, calculated multiple times using unique sets of input images and spanning a range of GMM clusters (up to 10). The gradient of the BIC approached zero after five clusters and remained near zero with increasing numbers of clusters, indicating that this model can reasonably estimate five clusters (Supplementary Fig. S2).
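
The cluster-count criterion described above can be illustrated with a short sketch. This is not the authors' code: the BIC values below are made-up numbers, and the helper names (`bic`, `bic_gradient`, `pick_n_clusters`) and the flatness tolerance `tol` are assumptions introduced only for illustration.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

def bic_gradient(bic_values):
    """Forward difference of the BIC between consecutive cluster counts."""
    return [b - a for a, b in zip(bic_values, bic_values[1:])]

def pick_n_clusters(cluster_counts, bic_values, tol):
    """Smallest cluster count after which every |gradient| stays below tol."""
    grads = bic_gradient(bic_values)
    for i, k in enumerate(cluster_counts):
        if all(abs(g) < tol for g in grads[i:]):
            return k
    return cluster_counts[-1]

# Hypothetical BIC values for 1..10 clusters (illustrative only, not study data).
counts = list(range(1, 11))
bics = [9000, 7000, 5600, 4700, 4200, 4180, 4170, 4165, 4160, 4158]
chosen = pick_n_clusters(counts, bics, tol=50)
print(chosen)
```

With these toy numbers the gradient flattens from five clusters onward, which mirrors the behaviour reported in Supplementary Fig. S2; the tolerance would need to be chosen for real data.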

Segmentation masks were generated for each imaging protocol and number of independent components (ICs) ranging from two to four. For each mask, the fraction of intratumoural voxels identified as necrosis/apoptosis was calculated, and compared with that derived from histology. The comparison of various imaging protocols and different numbers of ICs with anatomical MRI images and histological sections for the complex tumour example is shown in Fig. 3A. Spatial proportions of the histological sections did not correspond exactly to the MRI due to tissue processing and mounting. The highest Pearson correlation coefficient (ρ = 0.81, p < 10⁻⁴) was found for the imaging protocol consisting of T₁ and T₂ maps, and saturation transfer-weighted images with B₁ = 3 and 6 µT and three ICs (Fig. 3B,C; see Supplementary Fig. S3 for segmented TUNEL histopathology). Therefore, only this input and number of ICs was considered in the remainder of this work and is referred to as the “optimized protocol”.
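
The comparison metric in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: the label strings, the choice of which labels count as intratumoural, and the function names are hypothetical, and the per-mouse values would come from the segmentation masks and TUNEL sections rather than from the toy lists used here.

```python
import math

def necrosis_fraction(labels, necrosis_label, tumour_labels):
    """Fraction of intratumoural voxels labelled as necrosis/apoptosis."""
    tumour = [v for v in labels if v in tumour_labels]
    if not tumour:
        raise ValueError("no intratumoural voxels in this mask")
    return sum(1 for v in tumour if v == necrosis_label) / len(tumour)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy mask (hypothetical labels): 5 intratumoural voxels, 2 of them necrotic.
mask = ["tumour", "necrosis", "tumour", "muscle", "necrosis", "edema"]
frac = necrosis_fraction(mask, "necrosis", {"tumour", "necrosis", "edema"})
print(frac)
```

One fraction per mouse from MRI and one from histology would then be passed to `pearson` to obtain a correlation of the kind reported above.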

Cluster label assignment

Figure 4 shows the labelled GMM means (stars) for the simultaneous segmentation of all 34 mice and GMM means for 34 leave-one-out segmentations (circles; 33 mice each). Of the leave-one-out segmentations, only one resulted in substantially different clusters. A graphical explanation of the label assignment algorithm can also be found in Supplementary Fig. S4.

Robustness of segmentation pipeline

The Dice similarity coefficient between whole-dataset and leave-one-out segmentation was 98 ± 3% (mean ± SD across all mice). Leave-one-out segmentation differed greatly for one of the mice, which had a coefficient of only 84%. In this case, the necrosis/apoptosis voxels from simultaneous segmentation were erroneously added to the active tumour cluster. This is likely due to the proximity of active tumour and necrosis/apoptosis voxels in IC space, which confounded the Gaussian mixture model fitting.
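
For reference, the Dice similarity coefficient for one class is 2|A∩B|/(|A| + |B|). The sketch below computes it for two flattened label maps; the label names are hypothetical, and how per-class values were aggregated into the single figure reported above is not specified in this text.

```python
def dice(labels_a, labels_b, cls):
    """Dice similarity coefficient for class `cls` between two label maps."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label maps must have the same number of voxels")
    in_a = [v == cls for v in labels_a]
    in_b = [v == cls for v in labels_b]
    overlap = sum(1 for a, b in zip(in_a, in_b) if a and b)
    total = sum(in_a) + sum(in_b)
    # If the class is absent from both maps, treat the agreement as perfect.
    return 1.0 if total == 0 else 2.0 * overlap / total

# Toy example: one necrosis voxel is relabelled as tumour in the second map.
whole_dataset = ["T", "T", "N", "N", "M", "M"]
leave_one_out = ["T", "T", "T", "N", "M", "M"]
print(dice(whole_dataset, leave_one_out, "T"))  # 2*2 / (2+3)
print(dice(whole_dataset, leave_one_out, "N"))  # 2*1 / (2+1)
```

The toy case reproduces, in miniature, the failure mode described for the outlying mouse: necrosis/apoptosis voxels absorbed into the active tumour cluster lower the Dice coefficient of both classes.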

Figure 5 compares segmentation performed with the whole-dataset and leave-one-out approaches along with anatomical images and histology for three representative cases. The tumours are (A) primarily active tumour; (B) active tumour and necrosis/apoptosis; and (C) active tumour, necrosis/apoptosis, and blood/edema. In the leave-one-out segmentation, the T₂-weighted image, histology, and segmentation masks of the omitted mouse are shown. In these cases, the morphology and extent of the brown areas staining for necrosis/apoptosis in the TUNEL sections (third column) qualitatively match with the areas segmented in orange on the whole-dataset and leave-one-out segmentation masks (fourth and fifth columns, respectively). A similar figure containing all the tumours can be found in Supplementary Fig. S1; mouse #13, which had the low Dice coefficient on the leave-one-out segmentation, can be seen in that figure.

Optimization of protocol via feature selection

The subsets of saturation transfer-weighted images and T₁ and T₂ maps which discriminated most accurately between tissue types are listed in Table 1. As expected, increasing the number of images in the analysis subset provides a better match to the full optimized protocol with 24 images in total, going from a Dice similarity coefficient of 93% with three images up to 98% with nine images. The positive predictive value (PPV) and negative predictive value (NPV) of the subsets for both active tumour and necrosis/apoptosis increase with the size of the image subset as expected. Of the three-image subset, which is the smallest allowed with three ICs, the PPV and NPV for active tumour and NPV for necrosis/apoptosis are at least 94%. Eight images are required to yield a PPV for necrosis/apoptosis of 90%, but this continues to rapidly improve, reaching 95% with nine images. The images common to most subsets were: the T₁ and T₂ map and saturation transfer-weighted images with B₁ = 6 µT at Δω ≈ 48 ppm, which is highly sensitive to MT. Representative segmentation masks are shown in Supplementary Fig. S5. Qualitatively, masks calculated using image subsets are very similar to those using the full protocol.
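
PPV and NPV here are computed against the labels from the full optimized protocol rather than against histology. A minimal sketch of that calculation for one class follows; the label names and function name are assumptions for illustration only.

```python
def ppv_npv(reference, predicted, cls):
    """PPV and NPV of `cls` in `predicted`, taking `reference` as the truth."""
    if len(reference) != len(predicted):
        raise ValueError("label maps must have the same number of voxels")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if pred == cls and ref == cls:
            tp += 1
        elif pred == cls:
            fp += 1
        elif ref == cls:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv

# Toy example: full-protocol labels vs. labels from a hypothetical image subset.
full_protocol = ["T", "T", "N", "N", "M", "M"]
image_subset = ["T", "T", "T", "N", "M", "M"]
print(ppv_npv(full_protocol, image_subset, "N"))  # PPV 1.0, NPV 0.8
print(ppv_npv(full_protocol, image_subset, "T"))  # PPV 2/3, NPV 1.0
```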

Table 1 Assessment of image subsets via feature selection. Image subsets were selected by an exhaustive search using the Dice similarity coefficient (mean ± SD across all mice) between labels generated using the subset and the optimized protocol (i.e., T₁ and T₂ maps and all 22 saturation transfer-weighted images with B₁ = 3 and 6 µT) as the metric. The positive and negative predictive value (PPV and NPV, respectively) of tumour and necrosis/apoptosis labels are also given with respect to those generated from all images from the optimized protocol.
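
The exhaustive search named in the caption can be sketched as below. The scoring function is a toy stand-in: in the study the metric was the mean Dice similarity coefficient against the optimized protocol, whereas here `toy_weight` and `toy_score` are invented purely so the example runs, and the image names are hypothetical.

```python
from itertools import combinations
from math import comb

def best_subset(image_names, subset_size, score):
    """Score every subset of the given size and return the best one."""
    best, best_score = None, float("-inf")
    for subset in combinations(image_names, subset_size):
        s = score(subset)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score

# Made-up per-image usefulness values (NOT study results).
toy_weight = {
    "T1": 0.30,
    "T2": 0.25,
    "MT_6uT_48ppm": 0.35,
    "MT_3uT_48ppm": 0.05,
    "MT_6uT_300ppm": 0.02,
}

def toy_score(subset):
    """Hypothetical stand-in for the mean Dice vs. the full protocol."""
    return sum(toy_weight[name] for name in subset)

subset, value = best_subset(sorted(toy_weight), 3, toy_score)
print(subset, round(value, 2))

# Size of the search space for a 24-image protocol.
print(comb(24, 3))  # number of three-image subsets
print(comb(24, 9))  # number of nine-image subsets
```

The subset counts show why an exhaustive search is cheap for small subsets but grows quickly with subset size, which is feasible here only because each segmentation is fast.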

Quantitative MT model fitting

The observed T₁ (T_1,obs) and estimated MT model parameters for the five clusters are listed in Table 2. The product of R and M_0,B (fourth column), termed the MT effect, is significantly different between the active tumour and necrosis/apoptosis clusters (Fig. 6E). The blood/edema cluster (third row) has particularly large uncertainties due to its mixed composition. Representative quantitative MT model fits (used to estimate the values in Table 2) and Z-spectrum differences between clusters as a function of saturation amplitude are shown in Supplementary Fig. S6. The largest differences between interpolated MT Z-spectra of individual clusters were predicted between 37 and 62 ppm for spectra obtained at 6 µT (Supplementary Fig. S6B–G). This range corresponds well to the MT-sensitive saturation parameter set (6 µT at 48 ppm; see Table 1) that provided maximal discrimination between clusters as chosen by feature selection. Furthermore, extrapolation to higher and lower saturation powers indicates that increased B₁ confers greater discrimination between some but not all tissues, at a frequency offset that increases with B₁.

Table 2 Estimated parameters (mean ± SD) of observed T₁ and the two-pool quantitative MT model for the five clusters. T_1,obs is the observed longitudinal relaxation time. T_2,A and T_2,B are the transverse relaxation times of the liquid and macromolecular pools, respectively. R is the magnetization exchange rate from the semisolid macromolecular to liquid pools. M_0,B is the macromolecular pool size relative to that of water (defined to be unity). The product of R and M_0,B, termed MT effect, is presented because these two parameters are coupled. Blood/edema is expected to have a relatively small MT pool size⁴¹, which is reflected in the large uncertainties in MT effect and T_2,B. All parameters were estimated for individual mice before averaging. Any given cluster per mouse was included only if it contained at least seven voxels.

Isolation of CEST and relayed-NOE contributions

CEST and relayed-NOE effects were calculated separately for active tumour regions, necrosis/apoptosis regions, and whole tumour (consisting of both active tumour and necrosis/apoptosis regions). The mean and standard deviation of representative high and low B₁ Z-spectra and CEST and relayed-NOE contribution spectra over all mice for both regions plus the combined regions are shown in Fig. 6A–C. Unpaired t-test comparisons of the MT-weighted image common to all the optimal image subsets (with B₁ = 6 µT at 48 ppm), MT effect, and CEST contribution at the amide frequency offset (3.5 ppm) with B₁ = 2 µT for the same clusters are shown in Fig. 6D–F. There are significant differences between the tumour and necrosis/apoptosis clusters. The values for combined tumour and necrosis/apoptosis clusters lie between those of the tumour and necrosis/apoptosis clusters as expected. This supports the requirement for separate tumour and necrosis/apoptosis clusters when analysing quantitative data.

Discussion

In this study, an automatic framework was developed for segmenting intratumoural regions using T₁ and T₂ maps and saturation transfer-weighted images. The segmentation pipeline consisted of a three-component ICA transform and a five-cluster GMM which took less than one second to calculate. The optimal imaging protocol was determined to be the set of T₁ and T₂ maps and saturation transfer weighted images with B₁ = 3 and 6 µT. Using histology, the five clusters were identified as corresponding to three intratumoural regions (active tumour, necrosis/apoptosis, and blood/edema) and two extratumoural (muscle and a mix of muscle and connective tissue). This automated segmentation qualitatively matched anatomical and histological images. Although there are no other studies using saturation transfer weighted MRI to automatically segment intratumoural regions to our knowledge, we discuss several relevant studies below. This will be followed by discussion of possible improvements to the pipeline. Finally, the novel use in this work of the apparent exchange-dependent relaxation (AREX) metric with an extrapolated MT reference (EMR) to isolate an aggregate CEST and rNOE contribution spectrum instead of the conventional multi-Lorentzian reference to yield individual contributions for each pool will be discussed.

Henning et al.¹⁸ identified two regions of viable tumour (normoxic and hypoxic), two of non-viable tumour in xenografts (n = 13) and a background region using apparent diffusion coefficient (ADC), T₂, and proton density maps input into a k-means clustering algorithm. A significant Pearson correlation coefficient between k-means and histologically derived tumour volumes of 0.94 was found.

Jardim-Perassi et al.¹⁴ also identified the same four regions in xenografts using T₂, T₂*, and ADC maps and three dynamic contrast enhanced (DCE) model parameter maps input into a GMM. Histological sections were cut while each excised tumour was placed in a tumour-specific 3D printed sectioning template, which improved co-registration between MRI and histology. The quality of fit to the histologic slices was quantified with the Jaccard index which was 82 ± 4% (n = 16). Chang et al.²⁰ identified two regions of active tumour (well-perfused and hypoxic) and one of necrotic using only DCE scans of xenografts (n = 1 prostate and n = 2 brain tumours). The clustering was performed based on the area under the contrast agent wash-in and wash-out curve. They showed a prostate xenograft histological section, which had some overlap with the segmentation mask, but also notable differences. However, these were not quantified. These two protocols required the injection of a gadolinium-based contrast agent to generate DCE images, which is contraindicated in renally impaired patients³¹, and the background appeared to be removed manually.

Katiyar et al.¹⁷ identified three regions: viable, necrotic, and peri-necrotic using T₂-weighted images, ADC maps, and pre- and post-contrast T₂ and T₂* maps of xenografts (n = 6) input into several clustering methods. The method that performed the best was spatially regularized spectral clustering, which yielded Pearson correlation coefficients of 0.98, 0.92, and 0.82 for the three regions, respectively. The contrast agent was an injected superparamagnetic iron oxide nanoparticle approved to treat iron deficiency anaemia. However, it is unsuitable for frequent use because alteration of MRI imaging studies may persist for up to three months³⁷. This group also published a method¹⁵ to identify necrotic and viable voxels using an ADC map and ¹⁸F-FDG positron emission tomography image in xenografts (n = 4) input into a 2D GMM. Pearson correlation coefficients of 0.87 and 0.88, respectively, were found between histology and clustering. In this case, the use of ionizing radiation is undesirable. The background also appeared to be removed manually.

Alignment of histopathology with MRI and segmentation masks was a significant challenge and is an area of active research. Determination of complex tumour histopathology requires interpretation by a clinical radiologist and/or pathologist, which can produce variant annotation between observers³⁸. There is disproportionate image resolution between a large MRI voxel, capturing greater tissue heterogeneity, versus the cellular composition represented by histology³⁹. Furthermore, contrast achieved in MRI may not easily align with variations of chromogen staining in tissue. Shape and orientation of the tissue depends on positioning of the body part in MRI. Similarly, excision of the tumour, tissue fixation causing dehydration, and a shift in the slicing plane through the tissue, add to the complexity aligning size and shape³⁹. The mismatched uniformity between the two techniques, along with subjective interpretation of images, facilitates error in anatomical and pathological measurements.

The choice of ICA over principal component analysis (PCA)⁴⁰, another commonly used dimensionality-reduction technique, is logically based on ICA’s assumption of independent sources. PCA, on the other hand, tries to find components that explain maximum variance drawn from across all clusters. Segmentation of this dataset using PCA resulted in more overlap between clusters, although the segmentation masks generated were largely similar (data not shown).

For the necrosis/apoptosis cluster, the positive predictive value (PPV) of segmentation masks generated from the three- to seven-image subsets was below 90%. It may be possible to increase this by choosing another metric for feature selection. The metric used in this work was the Dice similarity coefficient between image-subset and whole-dataset segmentation. However, a metric which explicitly accounts for the PPV of the necrosis/apoptosis cluster may increase this PPV for smaller image subsets.

The large degree of uncertainty in the estimated parameters of the blood/edema cluster is likely due to different sources. For the observed T₁ (T_1,obs) and T₂ of the free water pool (T_2,A), the source is probably both mixed composition and the relatively small number of samples. As an example, the regions with the two highest coefficients of variation (SD/mean) of T_1,obs, muscle/connective tissue and blood/edema (19% and 15%, respectively), are also the smallest regions, representing 7% and 2% of all voxels, respectively. The uncertainty in the MT effect (R × M_0,B) and T₂ of the MT pool (T_2,B) is probably due to the fact that blood/edema is expected to have little MT effect⁴¹.

There is further evidence that the blood/edema cluster is of mixed composition and could be sub-clustered. A scatter plot of the blood/edema voxels in the complex tumour in Fig. 2 as a function of their observed T₁ and T₂ shows the presence of a possible second cluster (Supplementary Fig. S7). The larger cluster, probably edema, is between the literature values of blood, cerebrospinal fluid, and synovial fluid. The second cluster contains far fewer voxels scattered around the observed T₁ and T₂ of blood. However, there was an insufficient number of these voxels in our dataset to train the model to detect a sixth cluster.

Although feature selection and quantitative MT modelling indicated that 48 ppm was the frequency offset giving the largest contrast differences between clusters, the inclusion of images at 5–8 ppm by feature selection could be due to sensitivity to CEST that these offsets provide, since necrotic and apoptotic cells are expected to have decreased metabolism to which CEST is sensitive. Overall, however, MT contrast appears to better inform the segmentation algorithm presented than does CEST contrast. Note that MT modelling is not necessary for the segmentation algorithm presented in this work.

To evaluate the CEST contribution, we used the EMR⁴² technique, but added, for the first time to our knowledge, the AREX metric^43,44 in order to remove the effects of T₁. The original multi-pool AREX method requires fitting a summed-Lorentzian model⁴⁵ to one low B₁ Z-spectrum. Then, one Lorentzian spectral contribution for each pool is extracted using the sum of the other modelled pools’ signals as a baseline⁴⁶ (called the reference Z-spectrum Z_ref) and an observed T₁ map. The adapted method, used here, extracts only one spectrum, with the contributions of all CEST and rNOE pools in aggregate, using the EMR Z-spectrum as Z_ref. This is potentially faster because the number of measurements required to generate an EMR Z_ref is much less than a low B₁ Z_ref (22 offsets at high B₁ and a T₁ map vs 66 offsets, respectively, in this work). Note that, using the EMR as the reference spectrum, measurements at low B₁ only need to be made at the offset(s) where one wishes to assess the CEST or rNOE contribution. The aggregate CEST and rNOE contribution spectrum contains contributions from multiple chemical groups at each offset, but requires less data acquisition and, unlike a summed-Lorentzian model, does not assume a fixed number of pools.
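
As a rough illustration of the metric, the sketch below assumes the commonly used definition AREX = (1/Z_lab − 1/Z_ref)/T₁,obs, where Z_lab is the measured (label) Z-spectrum value and Z_ref is the reference value at the same offset. The exact form used in this work is not written out in the text, so the formula, the function name, and the numbers are assumptions for illustration.

```python
def arex(z_lab, z_ref, t1_obs_s):
    """AREX in s^-1, assuming AREX = (1/Z_lab - 1/Z_ref) / T1_obs.

    z_lab    : measured normalised signal at the offset of interest
    z_ref    : reference normalised signal at the same offset
               (here it would be the EMR-extrapolated MT Z-spectrum)
    t1_obs_s : observed longitudinal relaxation time in seconds
    """
    if z_lab <= 0 or z_ref <= 0 or t1_obs_s <= 0:
        raise ValueError("Z values and T1 must be positive")
    return (1.0 / z_lab - 1.0 / z_ref) / t1_obs_s

# Illustrative numbers only (not study data).
print(arex(z_lab=0.80, z_ref=0.85, t1_obs_s=2.0))
```

Dividing by T₁,obs is what removes the T₁ scaling mentioned above, and only Z_lab needs to be measured at low B₁ for each offset of interest.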

Future work will incorporate multi-slice imaging enabling through-plane registration, an intermediate ex vivo MRI scan, a tissue sectioning template, and 3D imaging techniques for histopathology such as whole tumour slice reconstruction^14,38,39. Together these will provide quantification of whole tissue and sub-regional detail for improved alignment and translation between MRI and histopathology. Another possible modification to the segmentation pipeline would be to introduce an individual weighting for each IC before input to the GMM. This is left for future work because, when a grid search of the correlation of necrosis fraction from machine learning and histology was performed as a function of IC weight (up to 9, 16, and 16 for IC₁, IC₂, and IC₃, respectively), no trend was apparent. Although there was a local maximum in correlation (ρ = 0.90 with an IC₁:IC₂:IC₃ weighting of 1:3:2 compared to 0.81 with a weighting of 1:1:1 as shown in Fig. 3B), it was decided not to weight the ICs in this work because there was no logical rationale to do so.

Methods

Animal model

Approximately 3 × 10⁶ DU145 human prostate adenocarcinoma (ATCC, Manassas, VA) cells mixed in a 1:1 ratio by volume with growth factor reduced Matrigel matrix (BD Canada, Mississauga, ON) were injected in the right hind limbs of 34 female athymic nude mice (Charles River Canada, Saint-Constant, QC) and allowed to grow into tumours for at least 34 days post-injection. Tumours were measured using callipers every one to four days and their volume was calculated using the formula volume = length × width²/2. All experimental procedures were approved by the Animal Care Committee of the Sunnybrook Research Institute, which adheres to the Policies and Guidelines of the Canadian Council on Animal Care and meets all the requirements of the Animals for Research Act of Ontario and the Health of Animals Act of Canada.
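
The calliper formula above is simple enough to state as code. The function name and the example dimensions are illustrative assumptions.

```python
def calliper_volume_mm3(length_mm, width_mm):
    """Tumour volume estimate: length x width^2 / 2, in mm^3."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("dimensions must be non-negative")
    return length_mm * width_mm ** 2 / 2.0

# Example: a 10 mm x 8 mm tumour (made-up dimensions).
print(calliper_volume_mm3(10.0, 8.0))  # 10 * 64 / 2 = 320.0
```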

Magnetic resonance imaging

Tumours were scanned at 7T (BioSpec 70/30 USR with BGA-12SHP gradients running ParaVision 6.0.1, Bruker BioSpin, Billerica, MA) using an 86 mm inner diameter volume coil for transmit and a 20 mm diameter loop surface coil for receive. A fifteen-slice 2D axial T₂-weighted rapid acquisition with refocused echoes⁴⁷ (RARE; TR = 2500 ms; TE_eff = 55 ms; FOV = 20 mm × 20 mm; slice thickness = 0.5 mm; matrix = 128 × 128; RARE factor = 12; bandwidth = 33 kHz; averages = 4; 6 min, 40 s) was used for prescribing the slice of interest, chosen to be at the thickest point of the tumour. B₀-map-based shimming (MapShim) of second order gradients was performed on an ellipsoidal volume enclosing the tumour in the slice of interest. Flip angle scale factor maps⁴⁸ were calculated for the first four mice using a series of 3D high flip angle fast low angle shot (FLASH)⁴⁹ scans and the T₁ map for the slice of interest and the flip angle in the tumour region of interest was found to be within 6% of nominal (Supplementary Fig. S7 in our previous work²³). Thus, B₁ correction was deemed unnecessary going forward.

Saturation transfer-weighted images were acquired using a 490 ms block RF saturation pulse per k-space line and single-slice FLASH acquisition (TR = 500 ms; TE = 3 ms; flip angle = 30°; FOV = 20 mm × 20 mm; slice thickness = 1 mm; matrix = 64 × 64; bandwidth = 50 kHz; and 1 dummy scan) as in our previous work⁴⁵. The cumulative saturation time when acquiring the centre of k-space was approximately 16 s. Five datasets were acquired: two Z-spectra sensitive to the direct water saturation effect (DE), CEST, and MT with B₁ = 0.5 and 2 µT at 66 frequency offsets Δω = (ω − ω₀)/ω₀ × 10⁶ (where ω is the saturation frequency and ω₀, the water resonance frequency) between ±5 ppm; two Z-spectra mainly sensitive to DE and MT with B₁ = 3 and 6 µT at 11 logarithmically spaced Δω between 300 and 3 ppm; and one WASSR Z-spectrum⁵⁰ sensitive only to DE with B₁ = 0.1 µT at 21 Δω between ±0.5 ppm.
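
The offset definition and the logarithmic spacing of the high-B₁ Z-spectra can be sketched as follows. The function names are hypothetical, and the 300 MHz water frequency in the example is a round-number assumption for a 7T system, not a value taken from the text.

```python
def offset_ppm(saturation_hz, water_hz):
    """Frequency offset in ppm: (omega - omega_0) / omega_0 * 1e6."""
    return (saturation_hz - water_hz) / water_hz * 1e6

def log_spaced_offsets(start_ppm, stop_ppm, n):
    """n logarithmically spaced offsets from start_ppm to stop_ppm inclusive."""
    if n < 2:
        raise ValueError("need at least two offsets")
    ratio = (stop_ppm / start_ppm) ** (1.0 / (n - 1))
    return [start_ppm * ratio ** i for i in range(n)]

# 11 offsets between 300 and 3 ppm, as described for B1 = 3 and 6 uT.
offsets = log_spaced_offsets(300.0, 3.0, 11)
print([round(o, 1) for o in offsets])

# A saturation pulse 1050 Hz above an assumed 300 MHz water resonance.
print(offset_ppm(300e6 + 1050.0, 300e6))
```

Under this spacing the fifth offset is about 47.5 ppm, which appears consistent with the Δω ≈ 48 ppm image highlighted by feature selection in the Results.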

To allow for correction of system instability in post-processing, reference scans at Δω = 667 ppm were acquired before and after and also interleaved between every five Z-spectrum measurements^23,45. The scan time for the Z-spectra including reference scans with B₁ = 0.5 and 2 µT was 44 min/spectrum; 3 and 6 µT, 8.5 min/spectrum; and 0.1 µT, 15 min. To evaluate longitudinal relaxation time T₁, five inversion recovery RARE scans (TR = 10,000 ms; TE_eff = 10 ms; TI = 30, 110, 390, 1400, 5000 ms; same FOV, slice thickness, and matrix as FLASH; RARE factor = 4; bandwidth = 77 kHz; 2 min each) were also acquired for a T₁ map⁵¹. The total acquisition time including scout and shimming was 2.5 h per animal.