Evaluation of molecular subtypes and clonal selection during establishment of patient-derived tumor xenografts from gastric adenocarcinoma

Patient-derived xenografts (PDX) have emerged as an important translational research tool for understanding tumor biology and enabling drug efficacy testing. They are established by transfer of patient tumor into immune compromised mice with the intent of using them as Avatars; operating under the assumption that they closely resemble patient tumors. In this study, we established 27 PDX from 100 resected gastric cancers and studied their fidelity in histological and molecular subtypes. We show that the established PDX preserved histology and molecular subtypes of parental tumors. However, in depth investigation of the entire cohort revealed that not all histological and molecular subtypes are established. Also, for the established PDX models, genetic changes are selected at early passages and rare subclones can emerge in PDX. This study highlights the importance of considering the molecular and evolutionary characteristics of PDX for a proper use of such models, particularly for Avatar trials.

In the context of drug development, there is a continuing need for preclinical models that capture the key aspects of disease biology. With the ability to propagate human tumor material in immune-compromised mice, patient-derived xenografts (PDX) have increasingly become a cornerstone of anticancer agent testing. Whereas these models were shown to mimic the response to therapeutics well 1-7 , recent studies pointed out the need for large PDX collections to capture cancer heterogeneity 8 . Previous studies focusing on gastric cancer PDX models reported that the parental tumor histology is preserved in these models and remains stable over passages. However, a low engraftment success rate with selection of histological subtypes was also often observed, suggesting bias in the resulting collections 7,9-12 .
Extensive molecular characterization of gastric cancer has revealed heterogeneity arising from diverse etiological factors and genetic mechanisms underlying its pathogenesis. The Cancer Genome Atlas (TCGA 13 ) project recently described the landscape of genomic alterations in gastric cancer and proposed classifying tumors into four molecular subtypes: tumors with microsatellite instability (MSI), Epstein-Barr virus-positive tumors (EBV), tumors with chromosomal instability (CIN), and genomically stable tumors (GS). The Asian Cancer Research Group (ACRG 14 ) established another classification based on the tumor transcriptomic profile. Gastric cancers were divided into MSI and microsatellite stable (MSS), with MSS tumors further divided into MSS/EMT, MSS/TP53 + , and MSS/TP53 − subtypes, representing epithelial-mesenchymal transition, activation of the TP53 pathway, and inactivation of the TP53 pathway, respectively.
The goals of this study were to establish a collection of Asian gastric cancer PDX from a patient gastric cancer cohort shown to be representative for key clinicopathologic features, and to determine whether the molecular subtypes and heterogeneity of the established PDX models are adequately represented in patient tumors. For this, biological materials were collected at the different steps of the PDX establishment process to conduct extensive genomic and transcriptomic analyses. We investigated whether heterogeneity, as embodied by clonality, genetic alterations, and molecular subtypes, is retained and whether any biases are introduced by the gastric cancer PDX establishment process.

Results
Representativeness of the patient tumors used for PDX establishment. We received resected tumor materials from n = 100 primary gastric cancer patients of Asian ethnicity from Seoul Hospital for PDX establishment (2008-2014). Study design is described in the "Methods" section and detailed clinical data are shown in Table 1 and Supplementary Data 1. We first checked whether our cohort covered the key histopathological and molecular subtypes of gastric cancer. We observed that the distributions of patient gender, tumor location, WHO grades, Lauren 15 subtypes, lymph node invasion, and metastasis were comparable to those recently reported by the ACRG and TCGA studies 13,14 . At the molecular level, our cohort contains similar proportions of the key ACRG 14 and TCGA 13 molecular subtypes (Fig. 1 and Supplementary Data 2). A pentaplex microsatellite assay identified 27/100 MSI-positive tumors with high MSI (a subtype common to both ACRG and TCGA). According to the ACRG classification, 13/100 tumors were classified as MSS/EMT, 32/100 as MSS/TP53 − and 21/100 as MSS/TP53 + using qRT-PCR (Supplementary Data 3). Seven MSS tumors were not classified due to a lack of RNA. Regarding the TCGA classification, 10 tumors were EBV positive with a high Epstein-Barr virus (EBV) infection burden (quantified by qPCR), and we could not ascertain the GS or CIN TCGA subtypes for the remaining 63 tumors, as this would have required an Affymetrix SNP6.0 assay or equivalent.
Finally, by investigating 48 genes with potentially targetable alterations identified in PDX models using a database of genomic biomarkers for cancer drugs and clinical targetability in solid tumors 20 , we observed that the distribution of these alterations in the collection was very similar to that observed across TCGA tumors.
MSS/TP53 + PDX harbored the lowest number of potentially targetable alterations compared to MSI PDX. Overall, TP53 and MSH3 mutations were the most frequent alterations (63% and 52%, respectively) in both MSS and MSI PDX and may allow investigation of compounds such as WEE1 and DNA-PKcs inhibitors (Fig. 4a). In the MSI PDX, alterations in genes such as ATM, MSH3, BRCA1, and BRCA2 suggest the testing of DNA-PKcs or PARP inhibitors. Several other potentially targetable alterations, such as those in KRAS and MYC or deletion of CDKN2A and CDKN2B, suggest the investigation of MAPK pathway inhibitors, BET and PIM inhibitors, as well as CDK4/CDK6 blockers. We also identified five PDX with ERBB2 gene amplification.
For comparison, we analyzed the distribution of these gene alterations in both the 27 PDX and the 295 patient tumors from the TCGA (Fig. 4b). The overall number of altered genes per sample was comparable in patient tumors and PDX models and was, as expected, dependent on the histological and molecular subtypes (bar plot). However, the PDX are frequently classified among the heavily altered samples. At the individual gene level, the percentages of gene alteration observed per subtype correlate between patient tumors and PDX models (Spearman correlation r = 0.67 and 0.55 for MSI and CIN, respectively, Supplementary Data 11, Supplementary Fig. 3a and b). However, differences were noticed. A higher proportion of alterations in the NOTCH1, MSH3, and RAD50 genes was present in MSI PDX compared to MSI patient tumors (80%, 60%, and 54% in MSI PDX, and 29%, 11%, and 13% in the MSI patients, respectively), while the percentage of alterations in BRCA2 was higher in MSI patient tumors than in PDX models. Similarly, for the CIN subtype, PDX were enriched in NOTCH1, MSH3, and ERBB2 gene alterations compared to patient tumors.
Interestingly, in MSS tumors, samples heavily affected by gene deletions/amplifications also frequently carried the highest mutation loads in the 48 genes with potentially targetable alterations (Spearman correlation r = 0. We also observed that the models with a higher number of alterations in these 48 genes had the shortest growth time at P1 (Spearman correlation r = −0.44, p = 0.028). In addition, a time between surgery and first implantation exceeding 5 days negatively impacted establishment of the MSS subtype (Mann-Whitney p = 0.0081).
Clonal selection and evolution during PDX establishment. In our study, eight PDX models (four MSI and four MSS) had sufficient DNA available from tumor (T), matched normal (N), the first three PDX passages (P1-P3, one tumor sample analyzed per passage for a given PDX model), and the established PDX (after P4, one tumor sample analyzed per PDX model) to attempt genetic analysis of the established models in the context of parental tumors and early passages. As expected, the whole exome sequencing analysis (see Supplementary Data 13 for technical details) confirmed that MSI tumors and corresponding PDX were hypermutated and had a different mutational signature compared to MSS tumors (Fig. 6a and b). We observed that MSS PDX had stable mutation loads across passages, while MSI samples showed a trend toward increased mutation burden across passages, driven by an increase in indels.
To study larger variations at the chromosome level, we investigated the allelic fractions of all single nucleotide variants (single nucleotide polymorphisms, germline, and somatic mutations). The MSI tumors were more likely to retain chromosome stability across PDX establishment, with allelic fractions close to 0, 0.5, or 1 for the majority of the chromosomes, in contrast with MSS PDX, which showed more chromosomal instability (Fig. 6c). In MSI, a notable exception was chromosome 7, for which all passages in all tumors showed a copy number increase (green arrowhead). We noticed that the MSI GXA_3037 series underwent dramatic changes, with large numbers of tumor-specific variants lost in P1 along with a distinct set of somatic variants appearing in P1, which then remained consistent in P2, P3, and in the established PDX model (examples of losses of heterozygosity indicated by orange arrowheads). In contrast to MSI, multiple aberrations hinting toward loss of heterozygosity or copy number increase were seen in MSS samples, for example losses of heterozygosity for chromosomes 5, 6, 8, 12, 13, and 14 in P1-P3 of GXA_3038 (dark blue rectangles) and copy gain in most chromosomes of GXA_3039 (light blue arrowheads). In GXA_3040 we observed a loss of heterozygosity on the distal ends of chromosomes 1 and 9 between the original tumor and P1 (brown arrowheads), and an expansion of this loss of heterozygosity between P1 and P2 that stayed stable between P2 and P3, suggestive of selection happening during passages. For GXA_3027, GXA_3039, and GXA_3040 the findings were validated by using data from the Affymetrix Genome-Wide SNP6.0 assay and the ASCAT 21 algorithm (Supplementary Fig. 4).
Figure caption (ERBB2 panels; beginning not recovered): (6) and the threshold of amplification as determined by silver in situ hybridization (2). b Representative images of detection of ERBB2 amplification and overexpression by double silver in situ hybridization and immunohistochemistry staining in five ERBB2-amplified models (GXA_3054, GXA_3038, GXA_3039, GXA_3084, and GXA_3067) and in one non-ERBB2-amplified model (GXA_3023) for comparison at ×20 magnification. Scale bars represent 50 µm. The arrow indicates the ERBB2 staining (blue) and the arrowhead indicates the cep17 staining (pink). c Representative image of the heterogeneous ERBB2 signal observed in GXA_3067 (score 2, homogeneity 70%) at ×20 (left) and ×80 magnification (middle and right).
Fig. 6 caption (panels c and d): c Allelic read frequency (AF) of variants detected by whole exome sequencing. Each point represents a genomic single-nucleotide variation (polymorphism or mutation). AF is shown per passage on the x-axis, ranging from 0 (left) to 1. Labels on the y-axis show the start of the individual chromosomes. For genomic regions with two alleles, the AF is expected to be close to 0, 0.5, or 1, while aberrations from this pattern hint toward loss of heterozygosity (no 0.5 AF) or copy number increase (more than three bands, e.g., at 0, 0.33, 0.66, and 1 for three copies). d Hierarchical clustering of somatic mutations (indels and small nucleotide variants) identified at P0, 1, 2, 3 and in the established PDX, by using whole exome sequencing with a minimum of 10 read coverage in all samples of the same model.
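The band pattern described for the allelic read frequency plot follows from simple counting: a variant in a region present in n copies can sit on 0 to n of them. The Python sketch below illustrates this reasoning only. The function names, the tolerance around 0.5, and the minimum fraction of variants are illustrative assumptions and are not taken from the analysis pipeline used in this study.

```python
def expected_af_bands(copy_number: int) -> list[float]:
    """Allelic fractions possible for a variant in a region with `copy_number` copies."""
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    # A variant can be carried by 0, 1, ..., copy_number of the copies.
    return [round(k / copy_number, 2) for k in range(copy_number + 1)]


def has_heterozygous_band(afs: list[float], tol: float = 0.1,
                          min_fraction: float = 0.05) -> bool:
    """True if enough variants lie near AF = 0.5.

    A missing 0.5 band hints toward loss of heterozygosity.
    `tol` and `min_fraction` are arbitrary illustrative thresholds.
    """
    if not afs:
        raise ValueError("afs must not be empty")
    near_half = sum(1 for af in afs if abs(af - 0.5) <= tol)
    return near_half / len(afs) >= min_fraction
```

For two copies this yields bands at 0, 0.5, and 1; for three copies the bands fall at 0, 0.33, 0.67, and 1 (the caption truncates 2/3 to 0.66).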
We studied the single nucleotide variants and small indels with a minimum of 10 read coverage across tumors and PDX to evaluate how well the PDX models represent the mutation pattern observed in the patient tumors. The mutation contents and their allelic fractions were overall more stable in the MSS series than in the MSI samples (Fig. 6d, Supplementary Data 14). The exception was the MSS model GXA_3040, with some mutations appearing and disappearing from P1 to the established PDX (see orange bars on the left of the plot). In all MSS samples, we noticed a trend toward an increase in the allelic fraction of some mutations starting at P1, which may be due to the replacement of the human stroma by mouse stroma. A higher variability of mutational profiles was seen during MSI model establishment.
Three subclasses of mutations were identified, those stable over passages having usually a high allelic fraction; those presenting an increase of their allelic fraction at P1, probably due to a clonal selection in the passages (e.g. GXA_3037, at passage 1) and/or removal of human stromal cells (e.g. GXA_3040), and those with a low allelic fraction that appeared and disappeared over passages most likely being a consequence of the clone composition of the tumors over passages.
We explored gain/loss of variants in greater detail by focusing on allelic read fractions of 48 potentially targetable cancer genes identified within the PDX collection 20 (Fig. 7a  to variants detectable at P0 for which the allelic read fractions increased to 1 (homozygous somatic mutations) at P1 and remained stable after. Only a NOTCH1 variant presented a slight allelic fraction increase over passages in GXA_3038, and one of the two TP53 variants in GXA_3040 that were not detectable at P0 appeared at P1. In addition, only the NF1 variant found in GXA_3038 disappeared after P0. In contrast, more variations in allelic read fractions were seen in MSI samples. The variations affected cancer genes such as BRCA1, TP53, KRAS, and PIK3CA. These variants were frequently not detectable at P0 and became detectable at P1, with allelic fractions centered on 0.5 (heterozygous somatic mutations) in models at P1-P3 and in the established PDX at later passages. We also noticed a decrease of the allelic fraction for some mutations that were detectable at P0 and not detectable in PDX. This argues in favor of a clonal selection occurring mainly between P0 and P1. Indeed, in GXA_3023 we noticed the somatic mutations KRAS G12D and PIK3CA E542K in the parental tumor but KRAS G13D and PIK3CA H1047R across the passages. Similarly, in GXA_3037 the somatic mutation KRAS G12D was lost and replaced by KRAS G13D over passages. These two samples showed evidence of clonal selection across passages (Fig. 7b and c).
To ascertain whether the two PDX models acquired these mutations as a de novo event or whether a rare subset of cells was selected during the first passage, we used a highly sensitive technique, droplet digital PCR. We observed (Fig. 7d) a very low percentage of cells with KRAS G13D at P0 in both GXA_3023 and GXA_3037, strongly suggesting the presence of rare cells that were selected during early passage. We also observed a similar situation for PIK3CA (Fig. 8), overall suggesting that rare cells were likely selected during PDX establishment. The selection pressure favoring expansion of cells with KRAS G13D over the KRAS G12D clone during passage establishment needs to be further investigated.

Discussion
Over the last decade, propagation of patient tumors in PDX is increasingly being used as a model system in anti-cancer drug discovery and development as well as for biomarker investigation 22,23 . It is therefore important to understand the intra-tumor and inter-tumor heterogeneity that exists in both the parental tumors and the established PDX models, so that PDX models can be optimally utilized. To this end, we developed a collection of PDX from gastric cancer tumors and investigated in detail their clinical and molecular patterns. We show that the PDX establishment success largely relies on both tumor histological and molecular subtypes. We also observed that PDX are subject to clonal selection in early passages.
Firstly, we observed that tumors of the Lauren's intestinal subtype were established more commonly than diffuse or mixed tumors, confirming previously published data 9  analysis revealed that not all gastric cancer molecular subtypes were established, with PDX predominantly developed from MSI, CIN, and MSS/TP53 − . In contrast, PDX were rarely or not developed from the molecular subtypes EBV and GS, or MSS/EMT and MSS/TP53 + , based on the TCGA and ACRG studies, respectively. The MSI tumors accumulate a high number of mutations. This characteristic may confer a certain adaptability on the tumor cells and thus an ability to grow in a new microenvironment (the immune-compromised mice). PDX were also frequently established from ERBB2-positive tumors, probably because of the capacity of these cells to proliferate without expression of the corresponding ligand (ligand-independent growth). Other subtypes, such as GS, may require additional growth factors to proliferate, which might not be available in the immune-compromised mice.
This study has important implications. Firstly, the commonly held belief that PDX reflect parental tumors needs to be adjusted in the context of these data and other recently emerging data 24 suggestive of inadequacies in PDX models. Both establishment bias and clonal selection occur during PDX establishment, making these models differ both at the level of the gastric cancer population and at the level of the parental tumors. This study demonstrates that considering gastric cancer PDX models simply on the basis of their type or histology is not sufficient. Molecular characterization, in terms of gene mutations, gene expression, and gene copy number, may guide appropriate use of these PDX for drug testing experiments.
Secondly, in the field of biomarker discovery from PDX for gastric cancer treatments, the molecular subtypes present in PDX are likely an important consideration when translating findings from preclinical to clinical settings. A careful selection of PDX models based on their molecular pattern may increase the success of drug testing experiments and/or may allow identification of molecular determinants of sensitivity. Thirdly, Avatar and co-clinical trials have been discussed and are being implemented 1,25 in the clinical trial NCT02732860: "Personalized Patient-Derived Xenograft (pPDX) Modeling to Test Drug Response in Matching Host (REFLECT)". Our results highlight clonal selection events that can occur during early PDX establishment, with the emergence of rare clones. This may have implications for the results and interpretation of data from Avatar trials.
Our study has a few known limitations. First, it is possible that the selection of tumor fragments for establishment and characterization, the degree of immunodeficiency of the mouse strain (e.g., SCID vs. NSG), and the environmental context (lack of certain growth factors) influence the establishment rate as well as clonal selection. Recent work by Eirew et al. 26 , however, argues against this: they observed a similar phenomenon when using NSG and NRG mice to establish breast cancer PDX. Secondly, our analysis was conducted on bulk tissue samples. With the advent of newer methodologies such as single cell sequencing, a more comprehensive picture of heterogeneity in tumors and PDX may emerge. Thirdly, surgical samples may not capture the heterogeneity of metastatic samples and may have differing dynamics with regard to establishment rates and clonal selection. Fourthly, our study is focused on one tumor type. However, similar findings have been observed in cell lines and PDX 17,24,26-28 , suggesting that this phenomenon exists, and is an important consideration, in other tumor types and associated tumor model systems.
In summary, we showed that this gastric cancer PDX collection does not fully cover the diversity of gastric cancers. Within the established models, molecular subtypes and possible clonal evolution are likely to be important considerations for various translational studies. We also provide a molecular investigation framework that may aid the rational use of PDX models for translational studies, not only in gastric cancer but also in other tumor types.

Methods
Study design, patient tissue specimens, and pathology. We designed this study as a patient tumor-derived xenograft study in gastric cancer with no pre-specified hypothesis. We systematically collected n = 100 surgical tumors from a single institution, i.e. Seoul National University at the time of total or sub-total gastrectomy. We aimed to establish a collection of Asian gastric cancer PDX as well as understand histological, genetic, and other clinic-pathological biases seen during PDX establishment. We clinically annotated the tumors but de-linked them from personally identifiable information. All patients provided informed consent and SNU IRB approved the study (IRB number H-0807-037-250).
The final cohort comprised n = 64 males and n = 36 females, with patient ages ranging from 37 to 90 years (median = 64 years). Of the n = 100 patients, n = 10 showed metastasis at the time of surgery, while no evidence of metastasis was found in n = 88 patients (data not available for two patients). According to the Lauren classification 15 , n = 53/100 tumors were intestinal, n = 42 diffuse, and n = 5 mixed (see details in Table 1).
We received patient material at the former Oncotest GmbH (now Charles River Discovery Research Services GmbH) from Seoul National University, College of Medicine, 24-72 h after surgery. Tumor materials were collected under sterile conditions directly after surgery for PDX model establishment and molecular profiling. We stored one piece of tumor (~1 cm³) in Aqix (Liquid Life) medium for subsequent implantation in nude mice. Additionally, we snap froze in liquid nitrogen one piece of the tumor as well as a fragment of normal peritumoral tissue (~1.5-2 cm³ for each piece) and stored them at −80°C for DNA and RNA extractions. Finally, a third piece was directly fixed with 5% formalin for 24 h for preparation of FFPE blocks for clinical investigation.
MSI analysis. MSI analysis was performed as previously described 29 . Briefly, MSI status was determined by analyzing five microsatellite loci (BAT-26, BAT-25, D5S346, D17S250, and D2S123) using a DNA auto-sequencer (ABI 3731 genetic analyzer; Applied Biosystems, Foster City, CA). According to the Bethesda guideline, tumors were classified as MSI-H when at least two of the five markers displayed additional bands compared to the corresponding normal tissue, as MSI-L when additional alleles were observed with one of the five markers, and as MSS when all microsatellite markers examined displayed identical patterns in both tumor and normal tissues. MSI-H tumors were classified as "MSI" and MSI-L or MSS samples were categorized as "MSS".
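The classification rule stated above can be written as a short decision function. The following Python sketch is only an illustration of that rule; the function names and the input format (one boolean per marker, true when the tumor shows additional bands relative to matched normal tissue) are assumptions made for the example.

```python
def bethesda_class(marker_unstable: dict[str, bool]) -> str:
    """Classify a tumor from a five-marker panel as MSI-H, MSI-L, or MSS."""
    if len(marker_unstable) != 5:
        raise ValueError("expected results for exactly five markers")
    n_unstable = sum(1 for unstable in marker_unstable.values() if unstable)
    if n_unstable >= 2:
        return "MSI-H"  # at least two of five markers show additional bands
    if n_unstable == 1:
        return "MSI-L"  # exactly one marker shows additional alleles
    return "MSS"        # identical patterns in tumor and normal tissue


def study_group(bethesda: str) -> str:
    """Collapse to the two groups used in this study: MSI-H is MSI, the rest MSS."""
    return "MSI" if bethesda == "MSI-H" else "MSS"
```

Note that MSI-L tumors are grouped with MSS, so only tumors with two or more unstable markers count as "MSI" in the downstream comparisons.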
PDX model establishment and animals. Female NMRI nude mice were obtained from Harlan (Denmark) at 4-6 weeks of age. Pieces of ~1-2 mm³ of tumor were implanted into these immune-compromised mice. This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the Society of Laboratory Animals (GV SOLAS). All animal experiments were approved by the Committee on the Ethics of Animal Experiments of the regional council (Regierungspräsidium Freiburg, Abt. Landwirtschaft, Ländlicher Raum, Veterinär- und Lebensmittelwesen-Ref. 35
DNA and RNA samples preparation. DNA and total RNA were extracted from frozen patient tumors and PDX material as previously described 30 . In brief, DNA was extracted from snap-frozen patient tumors or PDX. A piece of ~40 mg of frozen tumor was cut per sample and digested with proteinase K buffer (Qiagen, Hilden, Germany) overnight at 55°C, followed by a DNase-free RNase digestion (Qiagen, Hilden, Germany). The DNA was subsequently extracted with phenol/chloroform/isoamyl alcohol, precipitated and washed with ethanol, and resuspended in Tris-EDTA buffer (Tris 10 mM pH 8, EDTA 0.1 mM pH 8). The DNA integrity of each preparation was checked on a 1.3% agarose gel, and the purity was analyzed with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
For RNA preparation, a piece of ~40 mg of frozen tumor was cut per patient sample and used for the extraction, while four pieces of ~40 mg were pooled per PDX to limit the inter/intra-tumor variability. These pieces of frozen tissue were used as starting material for the RNA extraction using the mirVana™ miRNA isolation kit (Ambion, Carlsbad, CA, USA) according to the manufacturer's instructions. The RNA quality was controlled for purity with the NanoDrop 2000 (Thermo Scientific, Waltham, MA, USA) and for integrity with a Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA).
Whole somatic exome mutation analysis. DNA was prepared as previously described 30 and profiled by whole exome sequencing. Exons from DNA samples were targeted using Agilent SureSelect Human All Exon V1 38 MB (5), V4 51 MB (20), or V5 50 MB (2) kits. Enriched genomic DNA was sequenced with Illumina HiSeq-2000/2500 in 100 or 125 bp paired-end reads with an expected coverage of ~100×. Paired-end reads were independently mapped to the human hg19 and the mouse mm10 reference genomes with the Burrows-Wheeler Aligner (BWA 31 ) with default parameters. To remove the mouse reads originating from the tumor stroma, paired-end reads that mapped better to the mouse genome (mm10) than to the human genome (hg19) were discarded from the human mapped read dataset (based on the BWA mapping score) using PicardTools 32 . Then, this filtered human-mapping dataset was recalibrated with GATK Lite's BaseRecalibrator 33 function after duplicate removal and indel (insertion-deletion) local realignment. Reads mapped around indels were realigned using GATK Lite's IndelRealigner function before performing the variant calling step. Variants were detected independently using three different variant callers: GATK Lite's UnifiedGenotyper, the combination of Samtools mpileup 32 and bcftools caller 34 , and Freebayes 35 .
Only variants identified by all three tools, with at least three variant-supporting reads and a minimum allelic frequency of 5%, were further analyzed. Candidate mutations were identified with SnpEff 36 by selecting only small nucleotide variants and indels with a high or moderate protein impact from UCSC or Ensembl transcripts, and by filtering out known polymorphisms from annotation databases if a variant (1) has at least three allele counts from Hapmap or CGI 69 genomes or EVS+1000 genomes or (2) shows more than 5% of minor allele in at least one population from dbSNP. Raw reads were subjected to fastQC 37 to calculate read quality metrics. After alignment to the human reference genome and removal of mouse reads, the quality of BAM files was assessed by Qualimap 38 to obtain the percentage of mapped reads and the coverage of the targeted exons. Variant detection analysis was QC-evaluated by computing and validating the transition/transversion ratio from SNPs found in exons. The on-target coverage obtained ranged from 99× to 215×, with a mean of 131×. The reads obtained were aligned against the human and the mouse genomes. The percentage of reads that mapped to the human genome ranged from 78.7% to 98.2% (median = 94.7%) and the percentage of reads that mapped to the mouse genome ranged from 1.6% to 21% (median = 4.6%) (Supplementary Table 2). In the analysis, a total of 46,282 variants were identified; germline variants were filtered out by removing variants (n = 4495) found in the analysis of eight associated normal gastric samples, giving finally a total of 32,416 somatic mutations.
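The consensus rule (called by all three tools, at least three variant-supporting reads, allelic frequency of at least 5%) can be sketched as follows. This is a minimal Python illustration and not the code used in the study: the `VariantCall` record, the function name, and the choice to take read counts from the first caller's output are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical minimal record of one variant reported by one caller."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_reads: int  # reads supporting the variant allele
    depth: int      # total reads covering the position

    @property
    def key(self) -> tuple[str, int, str, str]:
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def allelic_frequency(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def consensus_variants(calls_per_caller: list[list[VariantCall]],
                       min_alt_reads: int = 3,
                       min_af: float = 0.05) -> list[VariantCall]:
    """Keep variants reported by every caller that pass the read and AF thresholds."""
    if not calls_per_caller:
        return []
    # Variants (by position and alleles) present in the output of every caller.
    shared = set.intersection(*({c.key for c in calls} for calls in calls_per_caller))
    # Assumption: read counts are taken from the first caller's records.
    return [c for c in calls_per_caller[0]
            if c.key in shared
            and c.alt_reads >= min_alt_reads
            and c.allelic_frequency >= min_af]
```

Requiring agreement between independent callers trades sensitivity for specificity, which matters when low-frequency variants are later interpreted as evidence of subclones.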
Wide chromosomal alteration analysis. The detection of chromosomal alterations was performed by using the Affymetrix Genome-Wide Human SNP Array 6.0 following the standard protocol recommended by the manufacturer. According to Affymetrix guidelines, contrast quality control and MAPD threshold were set at the values of above 0.4 and 0.5, respectively. Copy number data were calculated using Affymetrix GTC v4.1 and the PICNIC software provided by the Cancer Genome Project from the Wellcome Trust Sanger Institute 39 . Gene amplifications were defined as genes having a PICNIC ≥ 8 and homozygous deletions as genes having a PICNIC = 0. The GISTIC 2.0 method was used to identify significant focal copy number alterations as described previously 40,41 . To distinguish genomically stable from chromosomally unstable PDX, a cutoff of 15 somatic copy number aberrations (SCNA), counted as the sum of homozygous deletions and gene amplifications, was chosen. PDX having 15 or fewer SCNA on the autosomes were considered genomically stable, and those with more than 15 SCNA were categorized as CIN.
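The GS/CIN assignment described above reduces to counting amplified and homozygously deleted genes and comparing the sum with the cutoff. The Python sketch below is illustrative: the function names and the dictionary input (gene name mapped to its PICNIC copy number, already restricted to autosomal genes) are assumptions for the example.

```python
def count_scna(picnic_by_gene: dict[str, int], amp_threshold: int = 8) -> int:
    """Sum of gene amplifications (PICNIC >= 8) and homozygous deletions (PICNIC = 0).

    Assumes the input already contains autosomal genes only.
    """
    amplifications = sum(1 for cn in picnic_by_gene.values() if cn >= amp_threshold)
    homozygous_deletions = sum(1 for cn in picnic_by_gene.values() if cn == 0)
    return amplifications + homozygous_deletions


def stability_class(scna_count: int, cutoff: int = 15) -> str:
    """GS if the SCNA count is at most the cutoff, CIN otherwise."""
    return "GS" if scna_count <= cutoff else "CIN"
```

A model with exactly 15 SCNA is therefore still classified as GS, and 16 is the smallest count classified as CIN.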
Sanger-sequencing method. Sanger sequencing was used to confirm exome-sequencing results. Primers surrounding the variant were designed with the online program Primer3; the primer sequences were KRAS_F2: GGTGGAGTATTTGATAGTGTATTAACC and KRAS_R2: ACCTCTATTGTTGGATCATATTCG. PCR was carried out with Advantage® 2 Polymerase Mix (Clontech #639201) with Advantage 2 PCR buffer and cycled at 95°C for 2 min; 35 cycles of 95°C for 30 s, 58°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 10 min. PCR products were purified with the Wizard® SV Gel and PCR Clean-Up System (Promega #A9281). Sequencing PCR was carried out using the ABI BigDye Terminator v3.1 cycle sequencing kit (Life Technologies #4337457). The resulting products were run on an ABI 3730xl DNA analyzer. All sequences were visually analyzed with Sequencher (Gene Codes Corp.).
Sanger sequencing was also used to investigate the MSI mutation status in nine gastric patient tumors (GXA_3044, GXA_3045, GXA_3048, GXA_3050, GXA_3081, GXA_3082, GXA_3090, GXA_3092, and GXA_3094). Primers allowing complete amplification of the MLH1 coding sequence were designed using the PCRTiler v1.42 tool (http://pcrtiler.alaingervais.org:8080/PCRTiler/) and are listed in Supplementary Table 1. PCRs were carried out with the KapaHiFi hot start polymerase (Peqlab #07-KK2501-02) for PCR on cDNA and Phusion DNA polymerase (New England Biolabs, #M0530L) for PCR on genomic DNA, both with high fidelity buffers, following the manufacturer's instructions. Nested PCRs (30 cycles each) were done to amplify cDNA, and 30-cycle PCRs were performed when genomic DNA was used as template. PCR products were purified using the QIAquick PCR purification kit (Qiagen, #28104) and sent to the GATC laboratory (now Eurofins Genomics, Konstanz, Germany) for Sanger sequencing.
Droplet digital PCR method. All RainDrop droplet digital PCR experiments were performed at RUCDR Infinite Biologics (Piscataway, NJ). Briefly, positive controls with a 0.1% mutant allele frequency were prepared by serial dilution of DNA from a mutation-specific cell line with wild type genomic DNA (Promega); the wild type genomic DNA was also used as negative control (0% mutant). Genomic DNA from tumors and from positive and negative controls was sheared to ~3000 bp using a Covaris focused ultrasonicator. For each of the four mutation assays, 100 ng of sheared DNA was mixed with assay-specific 40X primers and probes, 2X Taqman genotyping master mix (Life Tech), 25X droplet stabilizer (RainDrop), and distilled water in a total volume of 25 µl. Primers and fluorescent probes used in this experiment are listed in Supplementary Table 3. Droplets containing sheared DNA and PCR reaction components were generated in the RainDrop source instrument and amplified in a thermal cycler with the following cycling parameters: 10 min at 95°C, then 45 cycles of 95°C for 15 s and 60°C for 1 min, followed by 98°C for 10 min. After PCR completion, droplet fluorescence was measured with the RainDrop droplet reader and processed into a two-dimensional scatter plot display. Appropriate gates were drawn for each droplet cluster and the number of droplets within each gate was counted.
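Once droplets are counted per gate, a mutant allele fraction can be estimated from the counts. The text does not state how counts were converted to percentages, so the Python sketch below is an assumption: it uses the plain ratio of mutant-positive droplets to all template-positive droplets, ignores droplets carrying both templates, and applies no Poisson correction. The function names are hypothetical.

```python
def mutant_allele_fraction(mutant_droplets: int, wild_type_droplets: int) -> float:
    """Fraction of template-positive droplets that fall in the mutant gate."""
    if mutant_droplets < 0 or wild_type_droplets < 0:
        raise ValueError("droplet counts must be non-negative")
    total = mutant_droplets + wild_type_droplets
    if total == 0:
        raise ValueError("no template-positive droplets counted")
    return mutant_droplets / total


def exceeds_negative_control(sample_fraction: float, negative_control_fraction: float) -> bool:
    """Simple check that a sample lies above the wild type (0% mutant) control."""
    return sample_fraction > negative_control_fraction
```

With these definitions, 10 mutant droplets among 10,000 template-positive droplets correspond to the 0.1% level of the positive control described above.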
Gene expression profiling. Total RNA was submitted to service providers for microarray analyses by using Affymetrix HGU133 plus 2.0 arrays. First-strand and second-strand synthesis, biotin labeling, fragmentation, and hybridization were performed according to Affymetrix protocols. Evaluation and normalization of Affymetrix GeneChip data were done in the "R" (version 2.15.3) statistical computing environment. The hybridizations were normalized by using the gc robust multichip averaging (gcRMA) method from Bioconductor to obtain summary expression values for each probe set. Gene expression levels were analyzed on a logarithmic scale.
Statistics and reproducibility. All the statistical tests were done in GraphPad Prism 5. Chi-square and Fisher's test were used to evaluate the association between the clinical data (gender, grade, metastatic status, differentiation, Lauren classification, vascularization, and stroma content), the mutation status and the PDX establishment success rate or the molecular groups of gastric PDX. Mann-Whitney test or Kruskal-Wallis test were used to compare the groups of gastric PDX to the clinical data (age, delay of engraftment), the number of somatic mutations, the ploidy, and to the total number of gene amplifications and gene deletions. Spearman correlation was performed to evaluate the correlation between the mean signature of mutational process of the PDX and the signatures published by Alexandrov et al. 19 , to compare the percentage of alteration in the 48 genes in the patient tumors and the corresponding PDX by molecular subtype, and to compare the alteration counts to the number of copy number variations in the 48 genes with potentially targetable genomic alterations in the MSS TCGA patient tumor samples according to the TCGA and the ACRG classification.
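For readers who wish to reproduce the rank correlations outside GraphPad Prism, the following Python sketch computes a Spearman coefficient using only the standard library. It is an illustration and not the software used in this study; it ranks each variable (averaging ranks for ties) and then computes the Pearson correlation of the ranks. All function names are chosen for the example.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

The sketch returns the coefficient only; p-values, as reported in the Results, would require an additional test that is not shown here.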
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The molecular data of the 27 established Asian gastric PDX can be queried on the Charles River Tumor Model Compendium at "https://compendium.criver.com". The whole exome sequencing data (raw FASTQ files) of the PDX models has been deposited in Sequence Read Archive (SRA) under the accession code SRP150675. The raw (CEL files) Affymetrix HGU133 Plus 2.0 transcriptomic data of the 27 Asian gastric PDX models that support the results presented in this paper has been deposited in Gene Expression Omnibus (GEO) under the accession code GSE115637. The raw (CEL files) Affymetrix SNP6.0 data and the PICNIC processed genomic data presented in this study for the 27 established Asian gastric PDX models, 7 normal tissues, 7 patient tumors, and 21 PDX samples at passages 1, 2, and 3 have been deposited in GEO under the accession code GSE115674. All Affymetrix data can be accessed via the GEO code GSE115755. The molecular data of the 295 patient tumors from the TCGA cohort (TCGA Nature, 2014, https://doi.org/10.1038/nature13480) and the associated clinical data are accessible from the cBioPortal (http://www.cbioportal.org/study/summary?id=stad_tcga_pub). The ACRG subtypes of the 295 gastric tumors from the TCGA dataset were presented in the paper published by Cristescu

Code availability
The statistical analyses were performed in R (version 2.15.3) and GraphPad Prism (version 5). The codes used for the genomic analysis are available upon request to the corresponding authors.