Main

Colorectal cancer harbors genetic alterations that have therapeutic impact, making molecular testing essential prior to the initiation of treatment. Mutations in KRAS, NRAS, and BRAF confer resistance to anti-EGFR therapy, and testing is recommended for patients with metastatic colorectal cancer who are candidates for this therapy.1, 2 However, response to anti-EGFR therapy in wild type KRAS and NRAS tumors has not been optimal, with response rates as low as 50%, and patients inevitably progress.3, 4, 5 This has led to the investigation of other key genes within the EGFR regulated signaling pathways, including the MEK-ERK and the PIK3CA-AKT pathways, and also of parallel pathways such as MET and ERBB2 (HER2).3, 4, 6

Deep sequencing studies have shown that there are different subpopulations of tumor cells (sub-clones) that differ in their mutational profile.7, 8, 9, 10, 11, 12 This phenomenon, referred to as intra-tumoral heterogeneity, occurs in ~10% of colorectal carcinomas and complicates clinical testing.5, 7, 8, 9, 10, 13 Intra-tumoral heterogeneity may cause false negative clinical test results, leading to unnecessary treatment with ineffective, costly agents that can cause significant side effects.9, 14 The multi-step model for colorectal carcinogenesis proposed by Fearon and Vogelstein posits that certain gene mutations (eg, APC, KRAS) are early events in the development of colorectal carcinoma, while other gene mutations (eg, TP53) are later events.15 Based on this well-established model, the intra-tumoral heterogeneity of early drivers such as KRAS mutation is expected to be significantly lower than that observed for later secondary alterations such as TP53 or PIK3CA mutation.

Although multiple studies have shown intra-tumoral heterogeneity, many have not addressed a practical approach to sampling that both maximizes mutation detection and is feasible in a clinical laboratory. Only one study demonstrated that pooling samples from different areas in a tumor could be useful; however, it examined only 7 cases and sampled all tumor-containing blocks for a case, which would be an undue burden on histology laboratories.13 In this study, we sought to evaluate whether sampling location, with single site sampling or pooling of three sample locations within an individual tumor, showed relevant heterogeneity using a clinically validated next generation sequencing test method, with the goal of determining the best clinical sample for detection of clinically relevant mutations.

Materials and methods

Surgical excisions of 102 invasive colorectal cancers between March 2007 and October 2014 were selected for testing. The selection of cases was limited to consecutive cases with primary tumors that were ≥2.5 cm in size and tumor stage of T3 or greater (to allow adequate separation of the sampling sites), had material available for review and testing, and had not received neoadjuvant chemotherapy. All slides from the cases were reviewed, and from each individual tumor (1) a peripheral, (2) a superficial, and (3) a deep section were selected. Deep and superficial tumor areas were obtained from the central portion of the tumor at least 0.5 cm from an edge of normal mucosa; superficial sections included the tumor above the muscularis propria while deep sections included tumor within and below the muscularis propria (Figure 1). Peripheral tumor areas were taken within 0.5 cm of the adjacent normal colonic mucosa (Figure 1). Tumor percentages were estimated by J.B. for all samples; cases with <20% tumor percentage after macrodissection at any site (peripheral, superficial, deep) were excluded from the study (3 cases) in accordance with the acceptance criteria for clinical testing in our institution, leaving a total of 99 cases for the study. Unstained slides were obtained from the selected blocks. A total of four samples per case (396 samples in total) were set up and underwent further testing: (1) peripheral, (2) superficial, (3) deep, and (4) ‘pooled.’ Three to seven 10 μm sections from the formalin fixed paraffin embedded tissue were macrodissected for tumor cell enrichment to obtain the peripheral, superficial, and deep samples. For the pooled sample, an equal number of slides (two each or three each) from the peripheral, superficial, and deep samples were macrodissected and combined. DNA was extracted from each of these samples using the QIAamp DNA mini FFPE tissue kit and deparaffinization solution (QIAGEN, Hilden, Germany). All samples were quantified using a Qubit 2.0 fluorometer (ThermoFisher Scientific, Waltham, MA, USA) and underwent amplicon-based target enrichment followed by sequencing on a MiSeq instrument with version 3 chemistry (Illumina, San Diego, CA, USA).

Figure 1

Deep (D) and superficial (S) tumor areas were obtained from the central portion of the tumor at least 0.5 cm from an edge of normal mucosa; superficial sections included the tumor above the muscularis propria while deep sections included tumor within and below the muscularis propria. Peripheral (P) tumor areas were taken within 0.5 cm of the adjacent normal colonic mucosa.

An amplicon-based target enrichment of portions of 20 genes followed by next generation sequencing was performed on all specimens using a CLIA-validated workflow. The focused 20 gene oncology next generation sequencing panel was developed to meet NCCN guidelines for molecular testing and/or assess emerging genomic biomarkers in the following disease states: colorectal, lung, and thyroid carcinomas, melanoma, glioma, gastrointestinal stromal tumor, myeloproliferative neoplasms, and acute myeloid leukemia. Although only a subset of this panel is reported clinically for colorectal carcinoma (KRAS, NRAS, BRAF, PIK3CA, HRAS), the entire panel was analyzed to increase the breadth of data for this research study. Amplicon enrichment was performed with one of two methodologies, depending on DNA quantity. For samples with a DNA content equal to or greater than 15 ng/μl by Qubit, amplicon enrichment was performed on the Biomark Access Array System (Fluidigm, San Francisco, CA, USA) using 80 amplicons that average 189 base pairs in length and amplify ~13 kb of genomic DNA sequence (Supplementary Table 1). A second step PCR reaction was performed to append barcodes and sequencing adapters to the enriched amplicons, which were pooled and sequenced on a MiSeq instrument. Samples with a DNA content less than 15 ng/μl, but at least 1 ng/μl, underwent both Biomark Access Array amplification (as described above) and target enrichment in an amplicon-based workflow optimized for low DNA input that is used for clinical samples. No cases had less than 1 ng/μl of DNA. In the low DNA input workflow, four multiplexed 15 μl PCR reactions covering 28 amplicons from 6 genes (EGFR, KRAS, NRAS, HRAS, PIK3CA, and BRAF; Supplementary Table 1) were performed using 30 nM primer and 3 ng of DNA per reaction. The 4 reactions were pooled together for second step barcoding and sequencing as described above. The data from the low DNA input workflow were used for analysis only if the Biomark Access Array workflow data did not pass quality control coverage metrics. These low DNA input data were used for a total of 32 samples from 12 cases.
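The routing of samples between the two enrichment workflows can be summarized in a short sketch. This is only an illustration of the decision rule described above, not the laboratory's software; the function names and the string labels `"access_array"` and `"low_input"` are hypothetical.

```python
from typing import List, Optional

HIGH_INPUT_CUTOFF = 15.0  # ng/ul by Qubit, as stated in the text
LOW_INPUT_CUTOFF = 1.0    # ng/ul, lowest concentration that was tested


def select_enrichment_workflows(dna_ng_per_ul: float) -> List[str]:
    """Return the enrichment workflow(s) run for a sample (hypothetical labels)."""
    if dna_ng_per_ul >= HIGH_INPUT_CUTOFF:
        return ["access_array"]
    if dna_ng_per_ul >= LOW_INPUT_CUTOFF:
        # Samples in this range were run on both workflows.
        return ["access_array", "low_input"]
    # Below 1 ng/ul: no such sample occurred in the study.
    return []


def data_source_for_analysis(workflows: List[str],
                             access_array_passed_qc: bool) -> Optional[str]:
    """Low-input data are used only when Access Array data fail coverage QC."""
    if "access_array" in workflows and access_array_passed_qc:
        return "access_array"
    if "low_input" in workflows:
        return "low_input"
    return None  # no usable data for this sample
```

For example, a sample at 3 ng/μl would be run on both workflows, and its low DNA input data would be analyzed only if the Access Array data failed coverage quality control.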

After sequencing and demultiplexing, FASTQ files were processed through a custom designed bioinformatics pipeline for mapping, indel realignment, and variant calling.16 Variant call files (vcf) were filtered for mutations occurring within 3800 hotspots within the tested regions, using a condensed database of non-synonymous variants of the targeted genes from the publicly available COSMIC database17 (accessed 11/12/14, http://cancer.sanger.ac.uk/cosmic).

Coverage was analyzed and reported for all samples across all amplicons for quality control analysis. If 3 or more amplicons failed to reach 500 × minimum coverage after Biomark Access Array enrichment, the entire sample failed. Failed samples were repeated once. Filtering excluded variants with less than 500 × coverage and/or a variant allele frequency (VAF) below the following thresholds: 5% for single nucleotide variants (SNVs) of defined hotspots, 10% for single nucleotide variants outside of defined hotspots, 5% for small (<4 bp) insertion/deletion variants (indels), and 1% for large (>4 bp) indels. Cases that lacked mutations or in which the same mutation was detected in the peripheral, superficial, and deep samples were considered concordant. Cases in which a mutation was detected in at least one site but not detected in at least one different site were considered discordant. A case with more than one mutation could have different concordance status for different mutations; therefore, each mutation was considered separately. Given the known low reproducibility in estimating tumor percentage, tumor percentages for all discordant samples were reviewed in a blinded fashion by two additional reviewers, A.C.N. and D.C.18
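The filtering and concordance rules above can be expressed as a minimal sketch. It is not the pipeline used in the study; the function names, the argument layout, and the variant type labels are hypothetical. The text does not say how indels of exactly 4 bp were handled, so treating them as small is an assumption made here for illustration.

```python
MIN_COVERAGE = 500  # minimum read depth per amplicon and per variant
SITES = ("peripheral", "superficial", "deep")


def min_vaf(variant_type, in_hotspot=False, indel_length=0):
    """Minimum variant allele frequency (as a fraction) for a variant to be kept."""
    if variant_type == "snv":
        return 0.05 if in_hotspot else 0.10
    if variant_type == "indel":
        # Assumption: an indel of exactly 4 bp is treated as small.
        return 0.05 if indel_length <= 4 else 0.01
    raise ValueError("unknown variant type: %r" % variant_type)


def passes_filter(coverage, vaf, variant_type, in_hotspot=False, indel_length=0):
    """Keep a variant only if both coverage and VAF reach their thresholds."""
    threshold = min_vaf(variant_type, in_hotspot, indel_length)
    return coverage >= MIN_COVERAGE and vaf >= threshold


def sample_fails_qc(amplicon_coverages):
    """A sample fails if 3 or more amplicons fall below minimum coverage."""
    low = sum(1 for c in amplicon_coverages if c < MIN_COVERAGE)
    return low >= 3


def concordance(detected_by_site):
    """Classify one mutation from a dict mapping site name to True/False."""
    hits = [bool(detected_by_site.get(site, False)) for site in SITES]
    if all(hits) or not any(hits):
        return "concordant"
    return "discordant"
```

Because `concordance` is evaluated per mutation, a case with two mutations can be concordant for one and discordant for the other, as described above.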

This retrospective study was approved by the University of Minnesota Institutional Review Board.

Results

The average patient age was 64 years with a range from 30 years to 92 years. All cases were TNM stage II or higher with 42% stage II, 37% stage III, and 20% stage IV. Testing for microsatellite instability was performed in 57 cases and 25% showed evidence for microsatellite instability (most cases were tested by immunohistochemistry). All histologic grades were represented; however, grades 2 and 3 (moderately and poorly differentiated) were most prevalent, accounting for 93% of cases.

A total of 99 cases underwent sequencing of the peripheral, superficial, deep, and pooled samples (396 total samples). Twelve cases (12%) were excluded from analysis following quality control assessment (these cases had similar clinical characteristics to the entire cohort). Eleven cases were not interpretable as two or more of the four samples failed coverage quality control. One case, which had only one sample fail coverage quality control, was failed after pathologist assessment of data quality as the remaining three samples demonstrated an elevated number of variants, 92% of which were consistent with deamination artifact. Eighty-seven cases passed quality control and were included in the analysis. In two of these 87 cases (cases 7, 93), the pooled sample failed despite repeat sequencing and attempts to combine extracted DNA from the separate peripheral, superficial, and deep samples that individually yielded successful sequencing and concordantly detected a mutation. It is unclear why these two pooled samples failed as they had similar DNA quality and quantity (as measured by Qubit) to the same specimen’s non-pooled samples. Nonetheless, these two cases are included in the results and considered to be concordant.

Twenty-five of the 87 cases (29%) were negative for mutations across the four samples and 62 cases (71%) had a mutation in at least one sample (Figure 2). A total of 86 mutations were detected across the 62 mutation positive cases. Most of the positive cases had a single mutation (68%), followed by 26% of cases with two mutations, and only 6% of cases with three mutations. KRAS mutations were the most frequent (33%; n=28), followed by BRAF and PIK3CA mutations (28%; n=24 each) (Table 1). Among the 87 cases that passed quality control, 52 had testing for microsatellite instability (MSI). The cases with microsatellite instability were more likely to have a mutation (13 of 14, 93%) than cases that were microsatellite stable (23 of 38, 60%); however, the rate of discordant mutations was similar (2 of 14, 14% for microsatellite instable versus 3 of 38, 8% for microsatellite stable).

Figure 2

A total of 99 cases underwent sequencing of the peripheral, superficial, deep, and pooled samples (396 total samples). Twelve cases (12%) were not interpretable as two or more of the four samples failed quality control. Twenty-five of the 87 cases (29%) were negative for mutations across the four samples and 62 cases (71%) had a mutation in at least one sample. Of the 62 mutation positive cases, ten (11% of the total cases and 16% of mutation positive cases) were discordant. *2 samples failed quality control in the pooled sample only; a single mutation was present in each sample and was detected in the peripheral, superficial, and deep samples. These samples are considered concordant.

Table 1 Number of mutations by gene

Of the 62 mutation positive cases, ten (11% of the total cases and 16% of mutation positive cases) were discordant (Table 2), defined by detection of a mutation in at least one sample but failure to detect the same mutation in at least one other sample (peripheral, superficial, or deep). As some cases had multiple mutations, a total of 12 mutations (14% of all mutations) in the 10 cases were discordant (Table 2). The discordant cases were all stage II (50%) or stage III (50%). Two of the discordant cases showed microsatellite instability, 3 were microsatellite stable, and the other 5 were not tested. Average tumor size was 5.8 cm (s.d. 2.45) in concordant cases compared with 6.7 cm (s.d. 3.11) in discordant cases, a difference that was not statistically significant (P=0.29, unpaired t-test), indicating no association between tumor size and detection of discordant mutations.
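The proportions in this paragraph follow directly from the counts, and the tumor size comparison can be approximated from the reported summary statistics. The sketch below assumes that the concordant group comprises the 77 remaining cases (87 minus 10) and that the unpaired t-test was a pooled-variance (Student) test; neither detail is stated in the text, so the resulting statistic is indicative only.

```python
import math


def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


discordant_cases = 10
passed_qc_cases = 87
mutation_positive_cases = 62
discordant_mutations = 12
total_mutations = 86

pct_of_all_cases = percent(discordant_cases, passed_qc_cases)               # 11
pct_of_positive_cases = percent(discordant_cases, mutation_positive_cases)  # 16
pct_of_mutations = percent(discordant_mutations, total_mutations)           # 14


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Student t statistic from summary statistics (equal variances assumed)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / std_err


# Assumed group sizes: 77 concordant cases, 10 discordant cases.
t_stat = pooled_t_statistic(5.8, 2.45, 77, 6.7, 3.11, 10)
degrees_of_freedom = 77 + 10 - 2
```

Under these assumptions the t statistic is about 1.06 with 85 degrees of freedom, which is in line with a non-significant difference in tumor size.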

Table 2 Discordant mutations

Ten of the 12 discordant mutations were in PIK3CA; the remaining two were in KRAS. The deep sample failed to detect a mutation in 4 cases, while the peripheral and superficial samples failed to detect a mutation in 5 cases each (Figure 3a–c). The pooled sample failed to detect a PIK3CA mutation in 1 case (case 52); therefore, the pooled sample detected mutations more often than the other samples (Figure 3d). In case 52, only the peripheral sample detected a mutation, which highlights the potential pitfall of a pooled sample having decreased sensitivity due to diluting out a low level heterogeneous mutation. Half of the discordant cases (6 mutations in 5 cases: 29, 55, 56, 92, 96) were associated with a sample that had borderline tumor percentage (20–30%) by at least one reviewer (Table 2; Figure 4). One of the two discordant KRAS mutations falls into this category. Additionally, this KRAS mutation is an atypical mutation outside of the functionally characterized hotspots. It is unknown whether this mutation has the same driver effect as known KRAS pathogenic mutations, and therefore it may not be of clinical significance.
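The per-sample detection of the 12 discordant mutations can be tabulated from the miss counts given above. The sketch assumes that the reported misses (4 for deep, 5 each for peripheral and superficial, 1 for pooled) are counts of mutations rather than of cases, which is consistent with the 8 of 12 and 11 of 12 figures given in the Discussion; the variable names are illustrative.

```python
DISCORDANT_MUTATIONS = 12

# Number of discordant mutations each sample type failed to detect
# (assumed to be counted per mutation).
missed = {"peripheral": 5, "superficial": 5, "deep": 4, "pooled": 1}

detected = {site: DISCORDANT_MUTATIONS - n for site, n in missed.items()}
detection_pct = {
    site: round(100 * n / DISCORDANT_MUTATIONS) for site, n in detected.items()
}

# Sample type with the highest detection among the discordant mutations.
best_site = max(detected, key=detected.get)

for site in ("peripheral", "superficial", "deep", "pooled"):
    print(f"{site:12s} {detected[site]:2d}/{DISCORDANT_MUTATIONS} "
          f"({detection_pct[site]}%)")
```

On these counts the pooled sample detects 11 of 12 discordant mutations (92%), compared with 8 of 12 (67%) for the best single site.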

Figure 3

Proportion of the 12 discordant mutations detected by sample (peripheral, superficial, deep, pooled).

Figure 4

Tumor percentage for each case by reviewer and location (deep, peripheral, and superficial). Black dashed line: 20% tumor percentage (testing cutoff). Gray dashed line: 30% tumor percentage (borderline tumor percentage cutoff).

Discussion

Our study detected 11% discordant cases (14% discrepant mutations) across all genes tested, with 2% discordant KRAS mutations. Our study suggests different reasons for discordant mutation detection, including tumor heterogeneity and borderline tumor percentage, and that pooling may detect more mutations than single site sampling. Similar to other studies, we detected higher mutation rates in microsatellite instable tumors; however, we detected no difference in discordance, suggesting that heterogeneity is not significantly higher in microsatellite instable tumors, although the small number of cases in these groups precludes definitive conclusions.19

Our discordance rate for KRAS is lower than some of the literature, although it is more in line with what would be expected based on the Fearon and Vogelstein multi-step model for colorectal carcinogenesis.15 Our lower rate of discordance is likely due to multiple factors.8, 13, 20, 21 First, only stage II or higher colorectal carcinomas were included in our study, which may limit the number of discordant cases. Losi et al demonstrated that tumor heterogeneity for KRAS was more prevalent in early colorectal carcinoma than advanced colorectal carcinoma.21 However, molecular testing for targeted therapy is clinically more important for more advanced colorectal carcinoma, as low stage disease can be surgically cured. Second, our study tested only three discrete areas of tumor compared with some other studies that tested as many as 20 different tumor areas.21 However, our study was intentionally designed this way as we were looking for a practical clinical solution to the problem of tumor heterogeneity, and sampling a few areas is clinically feasible while sampling 20 areas is not. Third, some studies used a peptide nucleic acid (PNA) clamp technique with high analytic sensitivity.9, 10 However, we intentionally designed our study around a clinical next generation sequencing assay, as many laboratories are now using next generation sequencing based assays for oncology testing. Although the limit of detection is lower for peptide nucleic acid clamp than for standard next generation sequencing approaches, a wider breadth of mutations can be detected by next generation sequencing. Specifically, our sensitivity of 5–10% variant allele frequency for single nucleotide variants and 1–5% for insertion/deletions is slightly lower than the 5% targeted single nucleotide variant detection by Richman. Lastly, unlike our study, previous studies had less stringent procedures for estimating variability in tumor percentage, which has been shown to have low reproducibility.9, 12, 13, 18

Interestingly, the majority of discrepancies in this study occurred within the PIK3CA gene. Only 5 of these 10 cases involved base substitutions potentially caused by deamination (C>T or G>A) and only 1 case demonstrated a single positive sample (case 52); thus we conclude this observation is most likely due to true biologic heterogeneity and not technical artifact. Seven of these 10 PIK3CA mutations occurred at codons which are commonly altered in colorectal carcinoma (E542, E545, Q546, H1047), providing further evidence of biologic relevance. As PIK3CA mutation status has shown clinical utility as a predictive biomarker for adjuvant aspirin therapy, further investigation of intra-tumoral PIK3CA mutation heterogeneity in colorectal cancer is warranted.22

In our study, a borderline tumor percentage appeared to play a role in some discordant cases and pooling seemed to overcome that effect. As determination of tumor percentage has been shown to have low reproducibility, we conservatively defined a borderline tumor percentage as ≤30% by at least one reviewer (our required tumor percentage of 20% plus an additional 10%).18 With this definition, half of the undetected mutations occurred in a sample with borderline tumor percentage and in 4 of these 5 cases the pooled sample overcame the problem and detected the mutation. An argument could be made to increase the tumor percentage required for testing; however, fewer cases will be tested and we did have cases with borderline tumor cellularity with concordant mutations of clinical significance (for example, KRAS mutations in cases 28 and 29) that would have been denied testing with a more stringent cutoff. In practice we prefer to test cases with borderline cellularity: if no mutations are detected, we add a disclaimer to the report about the potential for false negatives. It is likely that intra-tumoral heterogeneity is the underlying cause for at least some of the remaining discordant results. For example, case 52 had ample tumor percentage as estimated by all three reviewers, yet a mutation was detected only in the peripheral sample with a low variant fraction and was not detected in the superficial, deep, or pooled sample, which highlights that intra-tumoral heterogeneity will not always be detected by a pooled sample. However, single site sampling would have missed the mutation in 2 out of 3 samples as well.

Across our 11% discordant cases, there was no significant difference in mutation detection between the peripheral, superficial, or deep locations; however, pooling of the three different sample locations yielded the highest detection of mutations in discordant cases. Pooling three samples will add slightly more work for reviewing slides, pulling blocks, and cutting material from three blocks instead of one per case; however, the overall additional time is low, the cost for these steps is considerably less than performing molecular testing on multiple blocks, and pooling will lead to fewer false negative molecular tests. One caveat to this approach is that two of our 87 pooled samples failed quality control when the single site samples passed quality control. Although we could not prove it, an error in pooling is a possible explanation, and a laboratory protocol for pooling should be established before instituting this method.

Our data and those of others suggest that pooling from multiple different areas of a tumor is a practical approach that could be implemented routinely to increase detection of mutations.13 Pooling equal amounts of macrodissected tumor from three different areas prior to DNA extraction and downstream testing detected 11 of 12 discordant mutations (92%) compared with sampling only a single area, which detected, at best, 8 of 12 discordant mutations (67%). Although we pooled three areas, the optimal number of areas is yet to be determined. Pooling may also overcome issues of low tumor percentage; however, we still recommend including areas where adequate tumor percentage can be obtained with macrodissection. Based on the available data we recommend that, when possible, laboratories pool three tumor sites with adequate cellularity for molecular genetic testing.