A multi-modal data resource for investigating topographic heterogeneity in patient-derived xenograft tumors

Patient-derived xenografts (PDXs) are an essential pre-clinical resource for investigating tumor biology. However, cellular heterogeneity within and across PDX tumors can strongly impact the interpretation of PDX studies. Here, we generated a multi-modal, large-scale dataset to investigate PDX heterogeneity in metastatic colorectal cancer (CRC) across tumor models, spatial scales and genomic, transcriptomic, proteomic and imaging assay modalities. To showcase this dataset, we present analysis to assess sources of PDX variation, including anatomical orientation within the implanted tumor, mouse contribution, and differences between replicate PDX tumors. A unique aspect of our dataset is deep characterization of intra-tumor heterogeneity via immunofluorescence imaging, which enables investigation of variation across multiple spatial scales, from subcellular to whole tumor levels. Our study provides a benchmark data resource to investigate PDX models of metastatic CRC and serves as a template for future, quantitative investigations of spatial heterogeneity within and across PDX tumor models.

. Clinico-pathologic annotation of PDX models. 5-FU: 5-fluorouracil; Ox: oxaliplatin; Ir: irinotecan; Cet: cetuximab; Bev: bevacizumab. *Genetic and proteomic profiling also performed on patient tumor biopsy sample; response to targeted therapies was recapitulated in the xenograft model 15 . † Histology of patient tumor was conserved in the corresponding xenograft (Fig. 2a). ‡ Mutations common to patient tumor and xenograft model; all cancers were microsatellite stable.
PDX extraction. Mice were euthanized by exposure to CO 2 and confirmed dead by cervical dislocation.
Tumors, located in either one or both sc pockets, were extracted so as to cut off as much surrounding murine tissue as possibly without disrupting the tumor. The tumor was cut into three sectors (dorsal, central, ventral), snap-frozen immediately and stored in a vial at −80 °C.
Embedding and Sectioning of PDX. PDX samples were embedded into optimal cutting temperature (OCT) and stored at −80 °C until ready for sectioning. A total of 40 cryosections were prepared on a cryostat for each sample; they were prepared as alternating 8 μm and 4 μm sections (5 each). The 8 μm sections were mounted on uncharged glass slides and used for the isolation of pure populations of tumor cells via Laser Capture Microdissection (LCM). The 4 μm sections were mounted on charged microscope glass slides (FisherBrand, Cat# 12-550-17) and used for immunofluorescent (IF) staining. Slides were stored at −80 °C until further processed.
Tumor content for each sample was verified by a certified pathologist before dissection, and the pathologist report is available on the NCI data website.

LCM.
Pure tumor cells were isolated via LCM for downstream DNA, RNA, and protein profiling. Serial sections were dissected for the three molecular analyses to minimize heterogeneity due to changes in the tissue composition across the tumor. Slides dedicated to the DNA/RNA analysis were stained before dissection using the HistoGene LCM Frozen Section Staining Kit (Life Technologies, Carlsbad, CA) following manufacturing www.nature.com/scientificdata www.nature.com/scientificdata/ instructions. Slides dedicated for the RPPA analysis were stained as previously described (Baldelli, 2015). Isolation of the tumor cells was performed using the Pixcell II IR system (Arcturus Bioscience, Mountain View, CA) and dissected cells were collected on CapSure Macro LCM caps (Arcturus Bioscience, Mountain View, CA). ~2,500 cells were collected on a single CapSure Macro LCM caps for DNA and RNA analyses. LCM for RPPA differed from this procedure in two ways. For every PDX sample, we captured ~4,000 tumor cells as well as residual tissue after removal of tumor regions. For a selected subset of five samples, we additionally captured pure stromal tissue. The above workflow is depicted in Fig. 1b. , the surgical tissue (top) from a metastatic implant on the vaginal wall has the typical morphologic and cytological features of a mucinous adenocarcinoma with conspicuous lakes of extracellular mucin (arrows) and abundant cells with intracellular mucin accumulation (arrowhead, inset) arranged in small epithelial clusters and sheets of more amorphous epithelial arrangements. A xenograft (bottom left) derived from this surgical tissue shows similar cellular arrangements and large extracellular spaces consistent with extracellular pooling of mucin (arrows). Higher magnification shows both extracellular and intracellular mucin accumulation with conspicuous intracellular vacuolization (arrowhead, inset). For model L2 (right column), the surgical tissue (top) features of a moderately differentiated adenocarcinoma with abundant, irregular tubules and ribbons of epithelial "glandular" structures composed of columnar tumor cells. Note the markedly enlarged and hyperchromatic nuclei with a paucity of mucin production (inset). A xenograft derived from this surgical tissue (bottom right) demonstrates very similar abundant morphology without conspicuous mucin production. Higher magnification shows columnar cells with similarly enlarged and hyperchromatic nuclei in these "glandular" epithelial arrangements. (b) Growth rate is model dependent. Tumor volume growth curves were fit to exponential distributions to extract doubling times (n = 3 for each model www.nature.com/scientificdata www.nature.com/scientificdata/ immunofluorescence (iF) staining. All solutions for IF staining were made in 1X PBS (10X PBS diluted with ddH 2 0, Lonza 17-517Q). All wash steps were performed three times using 1X PBS. Primary and directly conjugated antibodies were purchased from Cell Signaling and stored according to supplied data sheets. Slides were removed from the −80 °C freezer and left at room temperature for several minutes. Tissues were fixed in 4% paraformaldehyde (Electron Microscopy Sciences 15710, Hatfield, PA) for 15 min, washed and permeabilized with Methanol at -20 °C for 10 min and washed once. Tissue was circled with a PapPen and blocked with Universal Blocking Solution (1% BSA, 0.1% Fish skin gelatin, 0.5% TritonX-100, 0.05% Na-Azide in 1X PBS). Slides were incubated overnight at 4 °C with primary antibodies (Table 2)

H&E.
Three slides per tissue sample were stained with H&E. Slides were removed from the −80 °C freezer and left at room temperature for several minutes. They were immersed in 0.1% Mayers Hematoxylin (Sigma MHS-16) for 10 min, washed with ddH2O and then dipped into 0.5% Eosin (Sigma E6003-25G) several times. Slides were then washed with increasing EtOH concentration starting at 50% and ending at 100%. Lastly, the slides were dipped in Xylene before letting them dry, mounting with Permount ® Mounting Medium (Fisher Chemical SP15-500) and letting them harden overnight. Finally, slides were sealed with clear nail polished and stored in the dark at room temperature.
image acquisition. Immunofluorescence images were acquired using the Leica Aperio Scanscope FL. Per marker set, exposure times were kept constant for the FITC, TexasRed and Cy5 channels. The exposure time for the DAPI channel was set by the microscope, as it uses the DAPI signal to optimize Aperio's focusing algorithm. We set a focus offset of 0.002 mm. H&E images were acquired using the Aperio Scanscope CS. Tissue detection, focus points and calibration were set by the Aperio Software ScanScope Console.
Reverse phase protein array (RPPA). Samples were obtained from either: whole-tissue with no LCM (all 36 samples), ~4000 LCM captured tumor cells (all 36 samples), residual tissue after LCM (all 36 samples), and stromal tissue (5 samples). LCM caps were lysed in a 1:1 solution of 2x Tris-Glycine SDS Sample buffer (Invitrogen Life Technologies, Carlsbad, CA) and Tissue Protein Extraction Reagent (Pierce, Rockford, IL) to which 2.5% 2-βmercaptoethanol (Sigma, St. Louis, MO) was added. Samples were lysed as previously described 19 . Tissue lysates were printed in triplicates onto nitrocellulose coated glass slides (Grace Biolabs, Bend, OR) using an Aushon 2470 arrayer (Aushon BioSystems, Billerica, MA) along with reference standards for internal control. Before immunostaining, each array was treated with Reblot antibody stripping solution (Chemicon, Temecula, CA) for 15 minutes followed by 2 washes in PBS and a one hour block in I-block solution (Tropix, Bedford, MA). Immunostaining was performed at room temperature using an automated Autostainer (Dako Cytomation, Carpinteria, CA).
To reduce background signal generated by endogenous proteins, arrays were first probed with 3% hydrogen peroxide, avidin/biotin blocking system (Dako Cytomation, Carpinteria, CA), and an additional serum free protein block (Dako Cytomation, Carpinteria, CA). Each array was then probed with one primary antibody targeting the protein of interest. A total of 46 (of the 48 probed) antibodies were used for this analysis (list available on the NCI data website). Antibody specificity was tested by western blotting using a panel of cell and/or tissue lysates before being tested on the arrays 20 . Negative control slides were incubated with antibody diluent alone [goat antirabbit IgG heavy + light (1:7,500; Vector Laboratories, Inc. Burlingame, CA) and rabbit antimouse IgG (1:10; Dako Cytomation, Carpinteria, CA)]. A commercially available tyramide-based avidin/biotin amplification kit [Catalyzed Signal Amplification System (CSA; Dako Cytomation Carpinteria, CA)] couples with the IRDye 680RD Streptavidin (LI-COR Biosciences, Lincoln, NE) were employed for the amplification and detection of the signal 19 .
Protein concentration of each sample was quantified on selected arrays using a Sypro Ruby Protein Blot Stain (Molecular Probes, Eugene, OR) protocol following manufacturer's instructions. Antibody-and Sypro Ruby-stained slides were scanned on a laser PowerScanner (TECAN, Männedorf, Switzerland). Images were analyzed using the MicroVigene software version 5.1 (Vigene Tech, Carlisle, MA), a commercially available software that performs spot finding, background/secondary antibody slide(s) subtraction, and normalization to the amount of protein in each samples determined by the Sypro Ruby Protein Blot staining.
DNA and RNA extraction. DNA and RNA extraction was performed using the Arcturus PicoPure DNA Extraction Kit (Life Technologies, Carlsbad, CA) and the Arcturus PicoPure RNA Extraction Kit (Life Technologies, Carlsbad, CA) following manufacturing instructions. 8 uL of buffer were added to each cap for the extraction. Samples were incubated at the recommended temperature for 3 hours for the DNA extraction and 30 minutes for the RNA extraction. DNA profiling. The Ion AmpliSeq ™ Cancer Hotspot Panel v2 is a pool of primers used to perform multiplex PCR for preparation of amplicon libraries from genomic "hot spot" regions that are frequently mutated in human cancer genes. DNA libraries were prepared from 10 ng of laser-captured tumor or NIH3T3 DNA samples using the Ion Ampliseq Library Kit 2.0 with the 207 Ion AmpliSeq Cancer Hot Spot panel primer pairs. Individual samples were amplified for a total of 20 cycles of PCR. The amplified product was digested, bar-coded sequencing primers were ligated to the amplicons, and the reactions were cleaned per Ion Torrent procedure instructions.
www.nature.com/scientificdata www.nature.com/scientificdata/ Completed libraries were quantitated by qPCR using primers specific to the A and P1 adaptors. All 37 samples were diluted to 100 pM and combined in equal volumes. The combined libraries were then diluted to 50 pM and loaded into the Ion Chef for automated templating, enrichment and loading onto Ion Torrent PI chips for semiconductor sequencing.
RNA profiling. The Arcturus PicoPure RNA Isolation Kit was used for RNA extraction on the LCM samples.
Each sample was bound to an RNA purification column, DNase treated, washed and eluted in a final volume of 14 µl. The Qubit RNA High Sensitivity Assay was used for quantitation. For samples with a measurable concentration (i.e., more than 5 ng/μl), 1 ng of RNA was used for input into library preparation.
The Ion Ampliseq Transcriptome Kit TM was used for the library preparation and the procedure was performed accurately and according to protocol. Briefly, approximately 1 ng of total RNA was reverse transcribed using the AmpliSeq transcriptome VILO master mix. cDNA synthesis was verified by qPCR on 0.5 µl of the reverse transcription reaction using AmpliSeq specific GAPDH primers. The remaining library input cDNA was amplified for 15 cycles using the AmpliSeq Human Transcriptome Gene Expression kit primer pool with AmpliSeq PCR Master Mix. Amplicons were digested with the proprietary FuPa enzyme, and barcoded IonXpress adapters were ligated onto the target amplicons. The library amplicons were bound to magnetic beads, and residual reaction components were washed off. Libraries were eluted and individually quantitated by qPCR using Ion Torrent P1 and A sequencing primers and SYBR Green master mix. Samples were diluted to 100 pM and combined in equal volumes. The combined libraries were then diluted to 50 pM. Emulsion PCR, templating and PI chip loading was performed with an Ion Chef Instrument. Sequencing was performed on an Ion Proton TM sequencer, with HiQ sequencing chemistry. Analysis was performed using recommended Torrent Suite software with AmpliSeq transcriptome and Ion Torrent Cancer Hotspot panel specific plug-ins.
To test for the potential contribution of residual mouse RNA to AmpliSeq-determined gene expression, an additional sample was generated by adding RNA extracted from mouse NIH3T3 cells to one of the human tumor RNA samples. RNA was combined at a ratio of 10% mouse RNA plus 90% tumor RNA. The mouse spike-in AmpliSeq transcriptome library was prepared as above.

Analytical Methods
Tumor growth. Tumor growth was quantified by fitting log of measured tumor volume as a linear function of number of days since implantation (The doubling time was calculated by log(2)/slope) (Fig. 2). To test whether the tumor model explained any of the variation in tumor doubling time, we performed a one-way ANOVA on the doubling time values for our 12 (4 models × 3 replicate PDX) tumors with the 4 model numbers as a grouping variable. Quality control. We used the following quality control measures to ensure the reliability of our profiles: (1) # mapped reads: low coverage implied poor profiling of the genome; and (2) % reads mapped to the amplicon: a low fraction suggested that a samples' DNA was damaged in some way. Our heuristic criterion for sample inclusion was: # mapped reads >1E6, and % reads mapped to the amplicon >85%. 4 of the 36 samples did not meet these criteria (Fig. 3a) and were not used in further downstream analyses.
Mouse contribution. We also performed DNA sequence on pure mouse tissue, using the same pipeline as for the PDX samples, including alignment to the human reference genome. We found that several sites with variant calls made on our PDX samples were also called as "mutant" in the pure mouse. The variant allele frequencies at most of these sites varied in synchrony across the various samples (Fig. 3c). This co-variation is consistent with calls driven by residual mouse DNA, with variation across samples arising from different amounts of mouse DNA in each sample. Accordingly, for all further analysis, sites with pure mouse mutation calls were not used for downstream analysis.
RNA profiling. The AmpliSeq human transcriptome gene expression primer pool specifically targets 18,574 protein-coding mRNAs and 2,228 non-coding ncRNAs based on UCSC hg19. Ion Torrent AmpliSeq data were analyzed using the Ion Torrent Mapping Alignment Program (TMAP), included with Torrent Suite Software v5. Briefly, by mapping reads onto the known amplicon sequences, the software quantifies the expression of a gene in terms of the number of reads from each specific amplicon. RNA expression levels may be displayed as counts, or as reads per million (RPM), normalized for the number of sequence reads per sample.
Quality control. We used the following quality control measures to ensure the reliability of our profiles: (1) average read length: based on the size of our amplicons, the average length of mapped reads is expected to be ~100 base pairs. Shorter read lengths were suggestive of poor quality reads; and (2) % genes mapped: this is the fraction of the full set of 22 K genes that has some reads mapped to it. Given our total RNA levels, we expected to be able to map ~50% of the genome, with lower values indicating poor quality data. Our criterion for sample inclusion was that the average read length be greater than 100 and 45% of the genes be mapped. As with the DNA, these cutoffs were visually selected to exclude clear outliers. 5 out of 36 samples did not meet these criteria (see Fig. 3b) and were not used in further downstream analysis. Pathway scores. For each of our samples deemed to have good-quality RNA, their RNA expression profiles were used to calculate relative up or down regulation of the 50 pathways (Hallmark gene set from MsigDB 21 ) using GSVA 22 . Briefly, for each gene, GSVA first calculates the extent to which the gene's levels are higher/lower than in a specific sample relative to the entire collection of samples. Then for each gene set (i.e. Hallmark Pathway), GSVA scores a sample based on the extent to which there is an over representation of high/low expressing genes for that sample. immunofluorescence microscopy. Image correction. The IF images were observed to display two types of imaging artifacts: a) a non-periodic but slowly varying intensity offset that caused the intensity in non-tissue parts of the image to be non-zero, and b) a striping pattern that modulated image intensities periodically with an image/marker specific shading pattern but fixed periodicity of 1560 pixels along the x-axis. Thus, we assumed a model of I obs = (I true + B)S, where I obs and I true are the observed and "true" images, B is the background intensity, and S is the periodic shading pattern. We estimated these corrections from non-tissue portions of the image (where, in principle, I true = 0) on a per-channel and image basis and applied corrections to the tissue intensities. B was estimated as follows: we divided the image into large blocks of height 480 pixels and width equal to the striping pattern, calculated average intensities in blocks that contained no tissue, and used a Mumford-Shah infilling www.nature.com/scientificdata www.nature.com/scientificdata/ approach 23 to estimate B within the tissue. S was estimated as follows: the correction at a given phase of the periodically varying vertical stripes was obtained by averaging I obs /B non-tissue pixels each period. "True" image intensities were obtained from the model above as Intensity normalization. To reduce the contribution of non-biological variations on observed biomarker intensity differences between samples, we performed intensity normalization as follows. To avoid overfitting, we use linear normalization (i.e. we transformed the intensities of all pixels in a biomarker image using the same linear function). We performed two marker-specific normalizations. First, for DAPI, we sought to reduce the effect of exposure time variation on image intensity. In our imaging experiments, DAPI was the only biomarker for which the same exposure time was not used across all samples. We found that DAPI intensities varied linearly with exposure time, and we therefore simply scaled DAPI intensities such that each image received an effective exposure time of 125 ms. Second, for each biomarker, we sought to reduce the impact of technical slide-to-slide variation on observed intensities as we are primarily interested in the biological variation between samples. For each of our 36 samples, the 3 replicate sections stained with the same markers should exhibit similar intensity profiles, assuming no slide-to-slide intensity variation. Therefore, we linearly transformed the intensity on each section such that the 25th and 75th percentiles across these replicate sections were equal 24 . The numerical value of the 25th and 75th percentiles was chosen as the average of the 25th and 75th percentiles across the replicate slides.

Marker tumor intensity.
We devised an automated process to calculate the intensity of markers in tumor (i.e. non-stromal) cells using a combination of standard image analysis approaches and DNA-and stromal-specific markers (Fig. 4c). First, to identify cellular portions of an image, we applied adaptive thresholding to the DAPI channel, which essentially identifies nuclear pixels, and then we performed an image closing operation to fill in the area between nuclei. To identify stromal portions, a similar operation was applied to vimentin, a stromal-specific marker. We then defined tumor regions to be the cellular regions of the image that did not belong to the stromal areas. Finally, mean marker intensity (Fig. 4c) was calculated by averaging over all pixels in the tumor areas of an image.
image heterogeneity profiling. To profile the heterogeneity of our IF images, we made use of PhenoRipper, an un-supervised image analysis approach 25 . Given a collection of images, PhenoRipper divides each image into a collection of sub-cellular blocks and identifies a set of stereotypical "block-types" based on the biomarker intensity colocalization patterns exhibited by the pixels in a block. The heterogeneity of an image can then be described in terms of its PhenoRipper profile-the relative frequency of appearance of the different block type in that image-which may be thought of as a description of the mixture of sub-cellular phenotypes. We set the following parameters for PhenoRipper: (1) marker intensity scaling was chosen to be the same for all markers (0 and 35,000 as minimum and maximum, respectively); (2) a block-size of 20 pixels; and (3) an intensity threshold of 20% of the max intensity to distinguish between foreground and background regions. Default values www.nature.com/scientificdata www.nature.com/scientificdata/ were chosen for all other parameters, including 30 for the number of block types, which led to a 30-dimensional profile vector for each image.
In profiling tissue heterogeneity, we wanted to not just compare heterogeneity across tissues, but also within a tissue. We therefore divided each tumor tissue image into a grid of 1300 × 1300 pixels sub-images and generated PhenoRipper profiles for each of these sub-images. Given our imaging parameters, 1300 pixels roughly corresponds to the size of a 0.6 mm diameter tissue microarray core. PhenoRipper profiles produced a data cube of size n x × n y × 30, where a tumor image contained n x × n y sub-images and 30 is the length of a PhenoRipper profile for a single image. For applications where we were only interested in the overall heterogeneity of a sample (and not its distribution within the sample), we performed a weighted average of the PhenoRipper profiles across the n x × n y sub-images, with each sub-image weighted in proportion to the amount of tissue (i.e. number of foreground blocks) it contained.
Sample-To-Sample correlation. The expression profiles (genetic/rna/pathway/rppa/if) as described above were z-score normalized for each readout (e.g. gene/pathway/antibody) (Fig. 4a). Readouts with no variation across the full set of samples were not used in correlation calculations. Correlations used in Fig. 4a were calculated based on pairwise Pearson correlations between these normalized profiles.

Deconvolution of iF marker intensity variance across length scales. For any biomarker, every pixel
in an IF stained image can be thought of as belonging to a hierarchical set of levels, spanning length scales from its local sub-cellular neighborhood to the PDX model from which that tumor was derived. Specifically, within an image, we can consider the pixel belonging to growing sets of pixel neighborhoods (with order-of-magnitude length scales): sub-cellular (<10 micron) ⊂ cellular (between 10 to 100 micron) ⊂ micro-environmental (100 to 1000 micron) ⊂ regional (1000 micron to mm scales of slide). Across images, each image represents one of multiple sections from a sector, which in turn is derived from one of three tumors representing one of 4 models. We sought to break down the observed pixel intensity variation (for a biomarker) across the entire collection of pixels across all models, into contributions arising from each of these scales.
Accordingly, we started from the highest scale (whole data), and subtracted out the average intensity across all pixels at this scale (mean intensity of the biomarker). We moved on to the next scale (PDX model), and for each group (model) at this scale calculated the average of the residual intensity. These difference from the group average at this scale were then passed on to the next scale, where the procedure was carried out recursively at increasingly finer levels of grouping until, at the final cellular level, the residuals were considered to represent sub-cellular variation. For the levels above image (i.e. section images ⊂ sector ⊂ tumor ⊂ model ⊂ dataset), we performed a simple non-weighted mean. For levels within an image (image ⊃ region ⊃ microenvironment ⊃ cellular ⊃ subcellular), we performed a weighted average that takes into account the distance between pixels, in a scale-space-theory inspired approach. Specifically, we performed averaging by convolving with Gaussian filters of different widths, σ = Inf, 1000,100,10 pixels respectively (1 pixel = 0.4619 microns). We restricted ourselves to nuclear pixels identified as described above. Taken together, the intensity for every nuclear pixel is expressed as a sum of intensities across these different length scales: where I p is intensity of pixel p, and I p,scale is the contribution from each specific scale. We defined total variation as var p (I p ) and variation explained at each scale as var p (I p,scale ).
Heterogeneity sampling. We wanted to test how many samples were required to capture intra-and inter-tumor variation for a PDX model (Fig. 5c,d). We made use of a recently developed experimental design approach 26 , which requires the user to supply the following elements: 1. a description of the true heterogeneity of each PDX model: here, we used a PhenoRipper profile obtained by combining profiles from all images (3 replicate PDX's × 3 sectors × 3 sections) for each marker set; 2. a description of the heterogeneity of an arbitrary sub-sample: here, we described heterogeneity in terms of the weighted average of sub-image PhenoRipper profiles as described above; 3. a criterion to determine when a sub-sample accurately captures the heterogeneity of the model: "good" sampling of heterogeneity was defined to be when the average deviation (across the 30 block types) between the "true" and sub-sample PhenoRipper profiles was <0.01; 4. a sampling strategy (i.e. the process of selecting a set of subsamples): here we considered the following sampling strategies for a sampling run with n subimages: a. within model: n sub-images selected randomly from all sub-images within a model; b. within tumor: one of the 3 (replicate) single tumors belonging to a model was randomly selected, and n sub-images were then randomly selected from this tumor; c. within sector: for each sampling run, one of the three sectors (dorsal/ventral/central) was chosen at random, and then n sub-images were selected from this sector, but could come from different tumors; d. within sample: one of the 9 samples per model was chosen at random, and then n sub-images were selected from that sample; e. within section: first one of the 27 sections (9 samples × 3 replicates sections per sample) per model was chosen at random and then n sub-images were selected from that section. www.nature.com/scientificdata www.nature.com/scientificdata/ Given these elements, the method computed the confidence for a given sampling strategy and number of samples. Confidence was defined as the fraction of sampling runs for which the subsamples captured the heterogeneity of the model, as defined above. In our case, we sub-sample 1000 times and calculated the fraction of cases when good sampling (as defined above) was achieved. We averaged our confidence values over 40 PhenoRipper runs.

Data Records
All data including raw and processed for DNA/RNA/RPPA, H&E and IF stained image data and a description of their organization can be found at 11 : https://doi.org/10.17917/yr13-py25 Additionally, all DNA and RNA sequencing data has been deposited at DDBJ 27 as a BioProject with accession number PRJDB7951 and study accession number DRP004856 28 .

Technical Validation
Here, we include analysis intended to help guide users through these datasets. The analyses include: (a) quality control, (b) a multi-modal analysis of heterogeneity across different spatial scales, and (c) a demonstration of how image data might guide experimental design.
Quality control. We first investigated histopathological properties of our PDX collection. H&E slides were examined by a pathologist (S.V.) who found them histologically consistent with colon adenocarcinoma, displaying no evidence for transition to a different cancer type. Moreover, where we had access to H&E tissue from the parent surgical sample, we found that the characteristic histopathologic features were also captured by the corresponding PDX samples (Fig. 2a). Further, for each of the 36 PDX regions, an H&E section was scored for the percentage of tumor necrosis and mouse stroma and the degree of dysplasia. Consistent with previous findings 10, 14 , we found that properties of histology and tissue composition (% of tumor, stroma and necrosis) are largely driven by PDX model (Methods, Fig. 2). Further, we found no dependence of necrosis on tumor volume within our dataset. Taken together, these results suggested that histo-pathology is largely reproducible at intra-tumoral scales within a PDX model. www.nature.com/scientificdata www.nature.com/scientificdata/ We performed various quality controls to ensure reliability of our assays. To test whether the process of DNA/ RNA extraction from post-LCM lysates negatively impacted these assays, we used the number of reads and extent of mappability as proxies. For both assays, ~90% of samples showed similar high levels of quality; a few outlier samples failed these tests and were excluded from further analyses (Methods, Fig. 3a,b red points). Additionally, we used pure mouse samples as a reference to test the effect of mouse contamination on our profiles. For DNA, we found that despite performing LCM, several mutations called in our samples were present in both pure mouse and in the tumor samples, with sample-and model-dependent distributions of variant calls (Fig. 3c). These calls are likely driven by trace mouse DNA that is included despite the LCM process, and these genes were dropped from downstream analyses. For the RNA, an additional spike-in of 10% mouse RNA did not affect the overall transcriptional profile (Fig. 3d); however, ~100 profiled genes (<0.5% of all genes) showed a large increase in expression in the spike-in relative to the PDX sample (Fig. 3d, points with red outlines). Our analyses suggest that LCM alone is not sufficient to completely exclude mouse contribution from profiling assays, although this effect likely only impacts a tiny fraction of genes.
Quality control of the RPPA datasets was performed at the Center for Applied Proteomics and Molecular Medicine at George Mason University using antibodies validated as described in Methods 20 . Raw images of the protein array slides are included in our data set. For the IF datasets, antibodies were obtained from commercial vendors ( Table 2) and staining protocols were optimized on PDX tissue and tested on healthy mouse colon (Fig. 3e). Staining pattern and localization were examined visually for consistency with reported literature. Based on this quality control, all marker panels except MS4 (which showed a diffuse staining pattern) were used in subsequent analysis.
Multi-modal analysis of spatial heterogeneity. We next investigated how consistent our different molecular profiles were across models, replicate tumors, and sectors. The best correlations (Methods) were seen at the level of the PDX model for all assays, with genetic assays showing the highest levels of within-model concordance (Fig. 4a). Interestingly, across all assays, we did not see an appreciable increase in correlation at higher resolution within the same model ( Fig. 4a: rows 4 vs. 5 and 6) or by restricting our analysis to samples that arose from the same anatomical sector (Fig. 4a: rows 1 vs. 2, or 4 vs. 5). These results suggest that within a PDX model, the length-scale of sector had little effect on tumor biology or heterogeneity, an observation that had not been previously tested to the best of our knowledge. Specifically, in our data set, samples from the same sector of replicate tumors were no more similar to each other than samples from different sectors within the same PDX tumor. Overall, these global analyses, across different assay modalities, suggest that the PDX model, more than replicate or region, dominates global molecular characterization.
While the global molecular state is consistent for samples coming from the same PDX model, we did observe intra-model heterogeneity at the pathway-level (Fig. 4b). For example, samples from model PDX-L2 showed heterogeneity of Myc Targets (Fig. 4b, red triangle and rectangles). This intra-model heterogeneity was also observed for pS6 in IF (Fig. 4c). Interestingly, an examination of the IF images for PDX-L2 revealed that the increased pS6 levels did not represent a uniform increase in pS6 across these sectors, but rather the increased occurrence of a pS6-high subpopulation in specific parts of a tumor.
Microscopy-based assays capture phenotypic heterogeneity across length scales, spanning from sub-cellular variation all the way to global differences between PDX models. Motivated by the observed intra-tumor heterogeneity of pS6, we sought to quantify local tumor heterogeneity and examine its spatial variation across these different length scales. Accordingly, we developed a scale-space-theory-inspired method 29 (Methods, Fig. 5a) to deconvolve the overall observed variation in pixel intensity of IF biomarkers (Fig. 5b, x-axis) into components arising from differences between (Fig. 5b, y-axis): PDX models, replicate tumors within a model, sectors within a tumor, image regions within tumor sections, local microenvironmental areas within regions, cellular variation within the micro-environment and the residual sub-cellular variation.
Our approach revealed distinct spatial patterns of biomarker expression (Fig. 5b). Consistent with results above (Fig. 4c), pS6 and β-catenin showed different expression levels across PDX models, yet their patterns of variation within a model are very distinct: when β-catenin is expressed it appears to be expressed uniformly across the model, whereas there is a pS6 high sub-population that shows regional variation across different regions of a tumor. pErk also shows large variation between image regions in a sector and additionally between different  www.nature.com/scientificdata www.nature.com/scientificdata/ sectors within a tumor. On the opposite end of the spectrum is Ki67, which also shows significant cell-to-cell heterogeneity, but whose high sub-population (representing proliferating cells) seems to be more uniformly distributed across regions, and tumors. Interestingly, there is very little apparent variation between replicate tumors within a model suggesting that, at least for our data, individual tumors are representative of their parent models.
A demonstration of how image data might guide experimental design. To understand the implications of these patterns of variation on the design of experiments to capture heterogeneity, we synthetically divided each tumor slide image into ~100 sub-images, each approximately the size of a 0.6 mm diameter tissue-microarray core (size of scale bar in Fig. 4c). We then quantified the heterogeneity of each sub-image using PhenoRipper, an un-supervised image profiling approach 25 . Given a set of images, PhenoRipper automatically identifies a set of stereotypical sub-cellular states, and describes the heterogeneity of each image in terms of the frequency of occurrence of these states (Methods) 25 .
Due to spatial heterogeneity present within tumors, profiling only a few small local areas may poorly represent the overall heterogeneity of a PDX model. A rational approach is to sub-sample the PDX model in a way that balances the effort to acquire more tissue with the improved confidence in capturing heterogeneity. Here, we quantified the confidence that the phenotypic profile of a collection of sub-images matches the profile of the full model 26 . The level of confidence increases with the number of sub-samples (Fig. 5c), but our results show that the exact behavior depends on the way the sub-sampling is performed relative to the biology of the model. For example, in the case of marker set 1 and PDX-L1 (Fig. 5c, orange and red curves in left and right panels), adding sub-samples from the same tumor but from different sectors (orange curve) increases confidence more than adding multiple sub-samples for the same sector but from different tumors (red curve); however, the opposite is true for PDX-L2.
For any sub-sampling strategy, a point of diminishing returns for adding more samples can be reached. This "saturated confidence level" reflects the maximum heterogeneity than can be captured with a sampling strategy. To obtain a more general understanding of sampling, we averaged the saturated confidence levels across models and marker sets (Fig. 5d). As might be expected, restricting samples to a single tumor sector or section poorly represents overall model heterogeneity. Given the appreciable increase in confidence observed by sampling across multiple sectors within a whole tumor (Fig. 5d, orange vs light blue), our study suggests this is likely a worthwhile investment. A further increase in confidence is possible by sampling the same sector across multiple tumors, although given the modest gains, the additional experimental effort must be more carefully evaluated (Fig. 5d, orange vs red).

Code availability
All code used to process data and generate figures in this paper are available from the Altschuler-Wu Lab public github page: https://github.com/AltschulerWu-Lab/MultimodalPDXHeterogeneity.