Analytical Validation of Multiplex Biomarker Assay to Stratify Colorectal Cancer Samples into Molecular Subtypes

Previously, we classified colorectal cancers (CRCs) into five CRCA subtypes with different prognoses and potential treatment responses, using a 786-gene signature. We merged our subtypes and those described by five other groups into four consensus molecular subtypes (CMS) that are similar to CRCA subtypes. Here we demonstrate the analytical development and application of a custom NanoString platform-based biomarker assay to stratify CRC into subtypes. To reduce costs, we switched from the standard protocol to a custom modified protocol (NanoCRCA) with a high Pearson correlation coefficient (>0.88) between protocols. Technical replicates were highly correlated (>0.96). The assay included a reduced robust 38gene panel from the 786-gene signature that was selected using an in-laboratory developed computational pipeline of class prediction methods. We applied our NanoCRCA assay to untreated CRCs including fresh-frozen and formalin-fixed paraffin-embedded (FFPE) samples (n=81) with matched microarray or RNA-Seq profiles. We further compared the assay results with CMS classification, different platforms (microarrays/RNA-Seq) and geneset classifiers (38 and 786 genes). NanoCRCA classified fresh-frozen samples (n=39; not including those showing a mixture of subtypes) into all five CRCA subtypes with overall high concordance across platforms (89.7%) and with CMS subtypes (84.6%), independent of tumour cellularity. This analytical validation of the assay shows the association of subtypes with their known molecular, mutational and clinical characteristics. Overall, our modified NanoCRCA assay with further clinical assessment may facilitate prospective validation of CRC subtypes in clinical trials and beyond. Novelty and Impact: We previously identified five gene expression-based CRC subtypes with prognostic and potential predictive differences using a 786-gene signature and microarray platform. Subtypedriven clinical trials require a validated assay suitable for routine clinical use. This study demonstrates, for the first time, how molecular CRCA subtype can be detected using NanoString Technology-based biomarker assay (NanoCRCA) suitable for clinical validation. NanoCRCA is suitable for analysing FFPE samples, and this assay may facilitate patient stratification within clinical trials.


Introduction
Colorectal cancer (CRC) is the fourth leading cause of cancer-related deaths worldwide 1 .
The median overall survival of metastatic (m)CRC patients with unresectable disease remains around 24 months with standard chemotherapies. Targeted therapies including anti-EGFR antibodies, anti-angiogenic and immunotherapy agents may extend survival up to 30 months in selected patients 2 . However, how to identify patients who will benefit from different drug options remains challenging. Additional predictive biomarkers are required to spare patients from unnecessary toxicities, improve outcomes and increase cost-effectiveness of treatment.
In order to classify colorectal cancers into subtypes with distinct biology, helping to effectively match existing therapies and facilitate subtype-specific therapeutic development, we previously identified five distinctive gene expression subtypes using a 786-gene signature 3 . Based on the gene expression similarities with different cell types of the normal colonic mucosa, the 5 subtypes were named goblet-like, enterocyte, stem-like, inflammatory and transit-amplifying (TA). We demonstrated significantly poorer disease-free survival (DFS) in untreated patients for the stem-like subtype, intermediate DFS for inflammatory and enterocyte, and better DFS for goblet-like and TA. Then, from two different datasets that included drug response information, we observed increased responses within the stem-like subtype to irinotecan, fluorouracil and leucovorin treatment combination (FOLFIRI) and the TA subtypes to anti-EGFR monoclonal antibody (cetuximab) [3][4][5] . These treatment responses were further validated by other studies 6,7 .
Five other groups independently identified between 3 and 6 molecularly distinct CRC subtypes based on expression profiles [8][9][10][11][12] . These and our findings were aggregated by a CRC Subtyping Consortium (CRCSC) into 4 consensus molecular subtypes (CMS): CMS1 (similar to inflammatory subtype); CMS2 (enterocyte and TA); CMS3 (goblet-like); and CMS4 (stem-like), plus a "mixed" subtype representing either the existence of additional subtypes or the presence of multiple subtypes in a single sample 13 . In the current manuscript, we found "mixed" subtypes to have multiple existing subtypes. CRCA and CMS subtypes are similar except that the enterocyte and TA subtypes of CRCA were merged into CMS2 of the consensus classification 13 .
Recently, the exploratory clinical applicability of our CRCA subtypes was demonstrated when secondary analysis of a randomised clinical trial assessing patient benefit from the addition of oxaliplatin to fluorouracil-leucovorin in early-stage disease revealed that benefits were highly enriched in the enterocyte subtype compared to the other subtypes in the discovery cohort. Therefore, in this study we have applied our CRCA subtype classification for assay development and evaluation.
Translating these findings into routine clinical practice remains challenging, mainly due to the lack of a fit-for-purpose assay that can classify patient samples into subtypes within a clinically relevant turnaround time and reasonable costs using formalin-fixed paraffinembedded (FFPE) samples. All classifiers were developed from microarray or RNA-seq gene expression profiles, which are expensive, time consuming, require dedicated bioinformatics expertise, and have turnaround times incompatible with clinical applicability. They also rely on pre-amplification of RNA, with consequent impact on accuracy and reproducibility. We previously demonstrated proof-of-concept assays using immunohistochemistry and quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) methods 3 .
Nevertheless, these methods may suffer from reproducibility issues. Hence, we applied nCounter platform (NanoString Technologies) to develop a clinically-relevant biomarker assay for CRC subtype classification.
The nCounter platform has previously been exploited to develop the Food and Drug Administration (FDA)-approved Prosigna® Breast Cancer Prognostic Gene Signature Assay 15 to predict risk of recurrence in patients treated with adjuvant hormonal therapy, as well as assays to predict medulloblastoma 16 and lymphoma 17 subtypes. This platform measures gene expression in the form of discrete counts of barcoded mRNAs, and requires no amplification step, eliminating a potential source of bias. In the present study, we evaluated the suitability of this platform for a gene expression-based assay for our CRCA (a potential surrogate for CMS) subtypes using a modified protocol to subtype CRCs in 4 different cohorts (fresh-frozen and FFPE samples) with different clinical and mutational characteristics. The results were compared to the CMS subtype classification and other platforms. A summary of the classifiers utilised in this study is given in Table 1.

Analytical development and assessment of CRC subtyping using a reproducible assay
In order to evaluate the applicability of the CRC subtype classification in the clinic, we initially developed a custom nCounter assay using a 50-gene panel, including 47 genes selected from the 786-gene signature and 3 additional genes 3 (Table S1; see Methods).
Initially, we applied a standard protocol (in which biotin labels and molecular barcodes are directly attached to the mRNA probes; Figure 1a) from the manufacturer and tested the performance of the custom nCounter assay using primary fresh frozen tumour RNA obtained from 22 CRCs from two different cohorts (Montpellier and OriGene; Table S2a). The distribution of samples across principal subspace using principal component analysis (PCA; Figure S1a) showed no batch effect between the two cohorts. In addition, hierarchical clustering analysis using nCounter profiles clustered these 22 samples into different groups that potentially represent different subtypes ( Figure 1b).
Next, we evaluated if a "modified" protocol (custom unique probes are attached to biotin labels and molecular barcodes separately; approximately ~35% less expensive than standard protocol; Figure 1a) from NanoString Technologies can deliver similar classification performance compared to the standard protocol-based assay. The results from the modified protocol-based profiling (Table S2b) showed similar distribution of samples (n=22) across principal subspace using PCA ( Figure S1b) and clustered into potential subtypes ( Figure S1c) in a similar fashion to the standard protocol.
We merged both the standard and modified protocols' gene expression data after normalising (gene-wise median centring) each dataset and performed PCA ( Figure S1d). Figure 1c shows the clustering of the same samples between protocols. Measurements showed high correlation (R 2 =0.90, p<0.001; Figure 1d) between these different protocols. This demonstrates that we can successfully replicate results from standard protocol using modified protocol for a more cost-effective assay. Therefore, we adopted the modified protocol for our assay (Figure 1a).
To test if our assay results are highly reproducible between batches, we performed our assay on five of the above samples twice, in separate batches of maximum 40 weeks apart; Table   S2c-d). Figure 1e and Figure S1e shows the clustering of replicate samples together with negligible batch effect. We achieved high concordance between the assays across the two replicates, with a Pearson correlation R 2 of 0.98 (p<0.001; Figure 1f). This establishes the high reproducibility of our assay over non-negligible periods of time. Hence, in the future, we can use this assay to test the state of subtypes using matched pre-and post-treatment biopsies or surgical materials.

Analytical validation of CRC subtype assay using FFPE samples
In the clinic, FFPE tumour samples are more prevalent than those that are fresh frozen.
Hence, clinical diagnosis is mainly dependent on FFPE samples. At the same time, FFPE samples can present challenges of low abundance or highly degraded RNA. We therefore tested NanoCRCA assay using FFPE samples. Using the FFPE-preserved cohort of samples  Table S3a-c) and the reproducibility of the modified protocol ( Figure S2f and Table S3d-e) in FFPE samples, as previously with fresh frozen samples. We again achieved successful clustering of samples by sample rather than by protocol ( Figure 2a

Selection of robust gene set for subtyping
Successful clinical biomarker assays should be able to classify samples into subtypes with high concordance, and this requires a robust set of genes. Hence, we tested the robustness of our selected 47 genes (out of 50 genes from 786-gene signature) using two in-laboratory developed bioinformatics tools -idSample and intPredict ( Figure 3a). Since "mixed subtype" samples with more than one subtype were present in CRC 13 , we selected only the samples from our published training dataset (n=387) 3 that showed at least 70% probability for a single subtype (n=192, Figure 3b; Table S4a). This was done using idSample tool that employs support vector machine regression method (see Methods). Furthermore, intPredict tool, which contains a pipeline of supervised class prediction methods, was used to identify

Subtyping of fresh frozen samples
To determine if these assays could successfully stratify patient samples into CRCA subtypes, we applied our NanoCRCA assay (utilising the modified protocol and 38-gene signature) to  Table S5), indicating the presence of multiple distinct subtypes within the same tumour, as previously demonstrated 13 .

Biological characteristics of the identified subtypes
To further confirm the molecular characteristics of the subtypes, we performed analysis using NanoString Technologies' PanCancer Progression Panel (Table S6a-b; Figure S3a

Concordance and distribution of subtypes across fresh frozen samples between platforms
We sought to confirm that subtyping using the NanoCRCA assay (with the modified protocol and 38-gene signature) mirrored the results of subtyping using highly multiplexed platforms such as microarrays and RNA-Seq. Matched microarray or RNA-Seq data for the freshfrozen Montpellier, Singapore and OriGene cohorts were generated or downloaded from public repositories (see Methods) (n=51). Subtypes were determined by correlation of the gene expression profiles with both our original 786-gene signature centroids 3 and our new 38-gene panel centroids.
As a confirmation that platform and gene set differences did not bias the distribution of subtypes assigned to the samples, Figure 4d shows there is no significant difference (p>0.05; proportion tests) in the distribution of each subtype across the two CRCA classifiers from different platform and CMS classifier in the three cohorts. This again validates the similarity between CRCA and CMS subtypes.
To assess the stability of individual subtypes across the various platforms and gene sets, we

Subtyping of FFPE-preserved samples
Applying the NanoCRCA assay to FFPE samples from our own clinical cohort (see Methods; n=24) revealed that all the CRCA subtypes were present in this cohort along with mixed and undertermined samples (Figure 4f and Table S3e-f). Regardless of the generally lower quality of RNA obtained from FFPE samples, this pilot cohort demonstrates that the NanoCRCA assay can be used to determine the subtypes of FFPE-preserved samples, thereby making our assay widely practicable in a clinical setting.

Assessment of CRC subtypes in Montpellier cohort using NanoCRCA assay
In order to understand the clinical significance of stratifying CRC samples using our NanoCRCA, we further analysed our Montpellier cohort of 17 primary tumours (stage IV) patients 4, 6 (Methods; Figures 5a and S4a-b; Table S7a-b). All the CRCA subtypes were present in this cohort, and all samples were successfully classified, with none showing mixed or undetermined subtype characteristics (Figure 5b-c). We observed a non-uniform distribution of the subtypes with enterocyte contributing 41.2% of all the samples followed by stem-like (23.5%) and goblet-like (17.6%) subtypes. However, inflammatory (11.8%) and TA (5.9%) subtypes were low in numbers in this cohort of samples (Figure 5c).

Comparison of subtypes with CMS classifier and across platforms
We additionally compared our NanoCRCA assay with the microarray-based CMS subtypes in this cohort. We classified the Montpellier cohort of 17 microarray gene expression profiles into CMS subtypes using the published CMS classifier 13 . We successfully classified the samples into all of the CMS subtypes: 47.1% were CMS2 (enterocyte and TA); 11.8% each of CMS3 (goblet-like) and CMS4 (stem-like); and 5.9% of CMS1 (inflammatory). However, we found 17.6% mixed and 5.9% undetermined samples (Figure 5d). Thus, NanoCRCA showed good concordance (84.6%) with the CMS classifier excluding the mixed and undetermined samples (Figure 5e). We further performed pairwise Fisher's exact test between the CMS and NanoCRCA subtypes (Figure 5f), which confirmed borderline significant association to the CMS classification (FDR=0.07). This suggests that NanoCRCA may be applied as a surrogate to predict CMS subtypes (as CMS classification was partly derived from CRCA classification) with reasonable concordance in addition to CRCA subtypes.
We also compared the performance of the NanoCRCA assay against classification using Affymetrix Human Genome U133 Plus 2.0 (HG-U133 Plus2) microarray profiles 4 in this cohort and the new 38-gene signature (Figures 5b and S4c). We classified the samples' microarray profiles into all the five subtypes with 35.2% as enterocyte, 23.5% as stem-like, 17.6% as goblet-like, 11.8% as inflammatory and 5.9% as TA ( Figure S4d). Only one sample was defined as a mixed subtype (1/17, 5.9%) expressing both inflammatory and enterocyte genes. The expression of enterocyte genes was consistent with the classification of the sample as enterocyte on the NanoCRCA platform. The fact that the sample was classified as a mixture of subtypes by the 38-gene panel may be attributed to platform-specific effects.
Overall, the NanoCRCA assay showed perfect 100% concordance with the microarray-based 38-gene panel classification after excluding samples with mixed classification (due to challenges in comparing these mixed subtypes to others) (Figure 5e).
Similarly, we compared the 786-gene signature-based classification using microarray data to the results of NanoCRCA. The 786-gene signature-based classification yielded 23.5% enterocyte, 29.4% stem-like, 23.5% goblet-like, 11.8% inflammatory and 5.9% TA samples ( Figure S4e). There was one mixed sample (1/17, 5.9%; Figures 5b and S4e) that was different from that called mixed by 38-gene panel. Irrespective of the different number of genes profiled on the different platforms, the NanoCRCA assay showed 87.5% concordance with the 786-gene signature subtypes (Figure 5e). Again, the 12.5% discordance may be attributable to noisy genes present in the 786-gene signature. Overall, the NanoCRCA assay and 786-gene signature classification perform well with good concordance.
We statistically validated these findings by applying Fisher's exact test on these classifications, excluding mixed or undetermined samples. We found that the NanoCRCA assay was significantly (false discovery rate; FDR<0.001; Figure 5f) associated with both the 38-gene panel and 786-gene signature classification systems. This again statistically validates the high concordance between NanoCRCA classification and different gene-and platformbased CRCA classifications, further confirming the robustness of our NanoCRCA subtypes.
In sum, all the three classifications from the two different platforms identified all the five subtypes and the NanoCRCA assay predicted robust subtypes highly consistent with the microarray platform.

Asian cohort
Our profiling of the multi-stage SG cohort using NanoCRCA is the first of its kind in an Asian population, to our knowledge. We further observed high concordance in subtypes between the NanoCRCA assay (n=23) and RNAseq (for both 38-gene panel and 786-gene signature; n=17; Figures S5a-e; Table S7c-d). There was no visible trend in association between stages and subtypes ( Figure S5f; Table S7e).
Previously, we reported that the inflammatory (CMS1) subtype is associated with MSI and BRAF mutations, whereas goblet-like (CMS3) subtype is highly associated with KRAS mutations 3, 13, 18 . To further validate the NanoCRCA subtypes, we compared these with the mutational (BRAF and KRAS) and MSI status (Figure 6a and Table S7e) of cancers in the Asian cohort (n=11) that were detected within our laboratory using methods described in Supplementary Information. All (100%) the inflammatory subtype CRCs were associated with MSI. Interestingly, one of the two BRAF mutant tumours was associated with the inflammatory (CMS1) subtype and MSI status. Similarly, all three goblet-like subtype (CMS3; 100%) tumours were associated with KRAS mutation. There were three other KRAS mutant tumours associated with the enterocyte or stem-like subtypes, representing KRAS mutant tumours are also associated less frequently with other subtypes, as previously reported 13 . This analysis with our NanoCRCA assay corresponds with known associations of subtypes with mutational and MSI profiles with this small SG data set of 17 samples. However, additional large data sets and NanoCRCA assays are warranted to validate these observations.

Effect of tumour cellularity on CRC subtyping using NanoCRCA assay
Next, we sought to test if tumour cellularity affects our NanoCRCA assay using OriGene cohort of samples (n=17) spanning all stages of CRC (Figures 6b and S6a-d; Table S7f-g).
We observed high concordance between NanoCRCA assay and HTA microarray (for both 38-gene panel and 786-gene signature; Figure S6e), although CA1 gene was not present in the HTA array and there was a low correlation (R 2 <0.5) in expression of two inflammatory genes between HTA and NanoCRCA assay ( Figure S6f and Supplementary Information).
With variable tumour cellularity ranging from 10% to 85% in this cohort, we postulated that if cellularity affects our assay, the low cellularity samples should be either qualified as "undetermined" or "mixed subtype" samples. One sample had extremely low cellularity of 10% and one additional sample was from a liver metastasis. Interestingly, none of the samples were classified as having undetermined subtype, regardless of cellularity. On the other hand, the three mixed samples had varying levels of cellularity (50-75%) (Figures 6bc). These results suggest that our NanoCRCA may not be affected by cellularity due to the selection of robust gene sets, which requires further validation in the future.

Conclusion
In summary, we analytically developed and validated NanoCRCA biomarker assay based on robust 38-gene panel and classified CRC samples into molecular subtypes, along with mixed subtypes. Subtype prediction by the NanoCRCA assay is highly concordant with CMS subtypes and multiple platforms. Also, stratification using this assay reproduces the clinical, mutational and molecular characteristics of the CRC subtypes. Since multiple CRC clinical trials require low-cost, reproducible and rapid clinically implementable assays to prospectively validate CRC subtypes for subtype-specific studies, our NanoCRCA assay may potentially facilitate this process in the clinic using FFPE samples.

Patient cohorts
Four CRC cohorts were studied: three derived from fresh-frozen and one from FFPE samples. The first included 17 metastatic (stage IV) CRCs (Montpellier cohort) from patients with no prior chemotherapy from a published study 4 (microarray data at GEO accession

Subtype concordance and significance
Subtype concordance between two different platforms was calculated as the percentage of samples that showed the same subtype (not including mixed and undetermined samples) in both. Subtypes were deemed concordant between CRCA and CMS subtypes based on the following equivalence: CMS1=Inflammatory; CMS2=Enterocyte and TA; CMS3=Gobletlike; CMS4=Stem-like 13 . Fisher's exact test between different platforms or classifiers was performed to assess statistical significance of their subtype concordance, and p-values adjusted using the FDR method.

DATA ACCESS
Previously published GEO Omnibus data sets were analysed for gene set selection (GSE14333 and GSE13294) and microarray-based subtyping of the Montpellier cohort (GSE62080). nCounter data for all cohorts (GSE101479 -standard protocol and GSE101481 -modified protocol) and microarray/RNA-Seq data for OriGene (GSE101472) and Singapore (GSE101588) cohorts are deposited under the SuperSeries with accession number GSE101651.

ETHICAL APPROVAL AND INFORMED CONSENT
The protocol for the SG cohort of this study was approved by the SingHealth Institutional F.S., and D.C., designed the RETRO-C study protocol and R.B. collected and prepared the tissue samples. C.R., K.E., and A.Sa. interpreted the results and wrote the manuscript.

DISCLOSURE DECLARATION
A.Sa. has ownership interest (including patents) as a patent inventor for a patent entitled "Colorectal cancer classification with differential prognosis and personalized therapeutic responses" (patent number PCT/IB2013/060416).