Cancer is the term used for a diverse collection of diseases characterized by uncontrolled cell growth caused by genetic and/or epigenetic disorders that result in aberrant gene function and/or expression. Cancer is one of the leading causes of death in developed countries and is projected to become a major cause of death worldwide1. Biomarkers play important roles in cancer diagnosis, prognosis and therapy. Currently, several classes of tumor biomarkers are being exploited, including DNA-based, RNA-based, protein-based and epigenetic-based biomarkers. The DNA-level aberrations in cancer comprise single nucleotide variants (SNVs), small InDels (small insertions or deletions), structural variants (SVs) and copy number variants (CNVs)2. RNA-based biomarkers include overexpressed or underexpressed mRNAs, miRNAs and noncoding RNAs. Protein markers include glycoproteins such as AFP (alpha-fetoprotein, a marker for liver cancer), CA125 (for ovarian cancer), and tumor antigens such as PSA (prostate-specific antigen, for prostate cancer)3 and CEA (carcinoembryonic antigen, for colon cancer).

mRNAs, miRNAs, and other noncoding RNAs are major components of the gene expression network and cooperate to affect biological processes. These molecules play important roles in cell-cycle control, cell proliferation, apoptosis, differentiation, metabolism, contact inhibition, metastasis and post-transcriptional gene regulation during the development of cancer. In recent years, applications of DNA-microarray and next generation sequencing (NGS) techniques have enabled simultaneous analysis of the expression profiles of tens of thousands of genes in various tumor samples, including serum, plasma, urine, and formalin-fixed paraffin-embedded (FFPE) or biopsy tissues. The transcription-level changes in mRNAs, miRNAs, and other noncoding RNAs detected by high-throughput approaches provide novel insights into the identification of RNA-level cancer biomarkers. Moreover, in combination with novel computational strategies, recent genome-wide studies have unveiled the intrinsic relationships among dysregulated miRNA-mRNA interactions in cancers and explored their roles as biomarkers. In this review, we discuss the discovery and utilization of miRNA and mRNA biomarkers for cancer using genome-wide approaches, with emphasis on their roles in cancer diagnosis, prognosis and therapeutic applications. We also summarize the progress of the integrated genome-wide analysis of miRNA and mRNA in cancer.

miRNA signatures in cancer

Discovered in 1993, short (approximately 22 nucleotides) noncoding miRNAs are a class of small regulatory RNA molecules4. They trigger mRNA degradation or repression of translation through complementary pairing of the 6-8 base pair “seed region” at the 5' end of the miRNA within the miRNA-mRNA heteroduplex5. One miRNA may regulate hundreds of genes, and it has been estimated that more than 50% of mRNAs are regulated by miRNAs in mammalian genomes6. Thus, altered miRNA expression may affect a variety of cellular processes and diseases, and many studies have shown that miRNAs play important roles in the initiation, progression and invasion of human cancer7,8. miRNAs can act as either oncogenes or tumor suppressors. For example, the miR-17-92 polycistron, which has been found to be amplified in human B-cell lymphomas, was the first miRNA (cluster) characterized with potent oncogenic activity9. Many other miRNAs have also been identified as promoters or suppressors of oncogenesis by regulating specific oncogenic or tumor-suppressing pathways10. Increasing evidence has demonstrated that miRNAs exhibit spatiotemporal expression patterns and that many miRNAs are differentially expressed in cancerous tissues compared with adjacent normal tissues or at different stages of cancer progression11,12. The strong relationships between miRNA expression profiles and cancer progression are supported by many studies that profiled the miRNA patterns in various cancer types and inferred the value of miRNAs in diagnostic and prognostic applications13.
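The seed-pairing rule described above can be illustrated with a minimal Python sketch. It assumes one common convention, namely that the seed spans nucleotides 2-8 counted from the 5' end of the miRNA (the text itself only states a 6-8 base pair region), and it searches a made-up 3' UTR fragment for perfect Watson-Crick complements. The function names and both sequences are hypothetical illustrations; real target prediction tools weigh many additional features.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=2, end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    start/end are 1-based, inclusive miRNA positions (assumption: 2-8).
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)  # sequence the mRNA must carry
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Toy sequences for illustration only (not taken from the review).
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGGCUACCUCA"

print(seed_sites(toy_mirna, toy_utr))  # → [3, 14]
```

Narrowing `start` and `end` (for example to 2-7) gives the shorter 6-mer seed, which matches more sites and is therefore less specific.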

miRNAs/panels as diagnostic markers for cancer

miRNAs are promising biomarkers for cancer diagnosis. The current diagnostic miRNAs and miRNA panels are listed in Table 1. Yanaihara et al14 identified unique profiles of 43 miRNAs that could discriminate between lung cancer and noncancerous tissues by miRNA microarray analysis of 104 pairs of primary lung cancers and noncancerous lung tissues. The panel of miR-375, miR-424 and miR-92a distinguishes carcinomas from adenomas in colorectal cancer15. In a large-scale miRNA expression analysis of 540 samples of six solid tumors (lung, breast, stomach, prostate, colon, and pancreas), a group of 43 miRNAs was found to be dysregulated compared with matched normal tissues16. A study of the miRNA profiles of archived FFPE samples of papillary thyroid carcinoma (PTC) first demonstrated that FFPE samples are appropriate resources for miRNA microarray analysis and identified miR-21, miR-31, miR-221 and miR-222 as potential markers of PTC17. Circulating miRNAs are another source of cancer biomarkers. A microRNA panel (miR-122, miR-192, miR-21, miR-223, miR-26a, miR-27a and miR-801) in plasma was identified using an miRNA microarray and provides high diagnostic accuracy for hepatocellular carcinoma (HCC)18. In our previously published study, three miRNAs (miR-187*, miR-371-5p and miR-378) were significantly upregulated in sera from gastric cancer (GC) patients, and miR-378 was validated as a noninvasive biomarker for GC detection19.

Table 1 Summary of diagnostic miRNAs/panel.

In addition to microarray technology, small RNA sequencing has also been applied to biomarker identification. For example, 25 serum miRNAs were identified by Solexa sequencing as upregulated in esophageal squamous cell carcinoma (ESCC) patients compared with controls20. RT-qPCR analysis also confirmed a panel of 7 serum miRNAs (miR-10a, miR-22, miR-100, miR-148b, miR-223, miR-133a, and miR-127-3p) as ESCC biomarkers. Chen et al21 applied NGS and discovered elevated serum levels of miR-25 and miR-223, which are blood-based biomarkers of non-small cell lung carcinoma (NSCLC). Using small RNA sequencing, miRNA biomarkers have been identified from human tissue, blood and FFPE samples in a wide range of tumor types, such as gastric cancer22, hepatocarcinoma23, breast cancer24,25, and prostate cancer26.

Different tumor subtypes may exhibit distinct miRNA expression signatures that can be used as diagnostic miRNA biomarkers to classify tumor subtypes13. The cure rate of cancer patients can be increased if the cancer subtype is correctly classified and treated with appropriate drugs and therapeutic methods. miRNA expression signatures have been used to classify subtypes in acute myeloid leukemia (AML) and leukemia27,28. Downregulation of miR-34a, let-7b, miR-106a, and miR-141 and upregulation of miR-301 and miR-452 were found in prostate cancer and used to discriminate among subtypes of prostate cancer29. In HCC, a 7-miRNA signature panel (miR-16, miR-122, miR-21, miR-223, miR-25, miR-375, and let-7f) was recently reported to successfully distinguish patients with HCC, patients with hepatitis B, and healthy individuals23,30. Gilad et al reported a single assay to classify the four main types of lung cancer (carcinoid, SCLC, and squamous and nonsquamous NSCLC) using the expression of eight miRNAs (miR-7, miR-21, miR-29b, miR-106a, miR-125a-5p, miR-129-3p, miR-205, miR-375)31.

miRNAs/panels as prognostic biomarkers for cancer

miRNA expression profiles can also serve as biomarkers of prognosis. Table 2 summarizes the prognostic miRNAs from recent studies identified by genome-wide analysis. Yanaihara et al14 showed that lung adenocarcinoma patients with either high miR-155 or reduced let-7a-2 expression had poor survival. High miR-21 expression was associated with poor survival and therapeutic outcome in colon adenocarcinoma73. In HCC, the loss of miR-122 expression is linked to cancer progression and gain of metastatic properties74. Lower expression of let-7 is associated with significantly shorter survival in patients with lung cancer after resection75. The loss of miR-335 and miR-126 expression occurs in the majority of primary breast tumors in patients whose relapse is associated with poor distal-metastasis-free survival76. The study of miRNAs as cancer biomarkers may promote the application of miRNA-based tests as an alternative to mRNA/protein expression for prognosis assessment.

Table 2 Summary of prognostic miRNAs/panel.

miRNAs/panels as predictive biomarkers for cancer

miRNAs can serve as predictive biomarkers of tumor recurrence and/or metastasis. miR-126 has been shown to regulate endothelial cell recruitment to metastatic breast cancer cells through coordinated targeting of IGFBP2, PITPNC1 and MERTK, and miR-126 might serve as an efficient biomarker to forecast cancer metastasis104. miR-221 is progressively downregulated in aggressive prostate cancer, lymph node metastases, and clinical recurrence and serves as a biomarker predicting progression and recurrence in high-risk prostate cancer100.

Predictive miRNA biomarkers can also predict a patient's response to therapy and aid in treatment decisions. For example, miR-21 expression was significantly increased in platinum-based chemotherapy-resistant NSCLC patients, and increased miR-21 expression was associated with shorter disease-free survival105,106. An analysis of 41 nasopharyngeal carcinoma (NPC) patients before and after radiotherapy showed that expression of miR-BART7 and miR-BART13 was abolished after treatment, indicating that these two Epstein-Barr virus (EBV) miRNAs are useful predictive biomarkers of NPC treatment efficacy. miR-107 and miR-99a-3p were also identified as predictive markers for chemotherapy response in advanced colorectal cancer107. An miRNA panel comprising miR-1290, miR-196b, and miR-135a* has been used to predict responses to platinum-based doublet chemotherapy in patients with lung adenocarcinoma108.

mRNA profiles as cancer signatures

mRNA profiling has shown that mRNA expression is also associated with cancer progression and has identified effective cancer biomarkers. Aberrant expression of oncogenes (such as KRAS, BRAF and MYC) and tumor suppressor genes (TSGs) (such as APC and TP53) is often correlated with cancer development and might be caused by chromosomal instability, accumulation of mutations, CNVs or DNA methylation modifications. Moreover, all steps of mRNA biogenesis might be affected during cancer progression, from transcription, splicing, post-transcriptional regulation, and translation to mRNA stability control. For example, different isoforms of a gene can display opposite functions in tumor generation, such as the two transcript variants of BCL-x, which can serve as inhibitors or activators of apoptosis, respectively109. Given the complexity of cancer progression and transcriptome composition, genome-wide profiling has been widely used to obtain whole-transcriptome profiles from tumor specimens and to provide comprehensive and accurate information regarding mRNA expression that might serve as cancer signatures in diagnosis and prognosis, especially for subtype classifications.

mRNA profiles as diagnostic cancer markers

The first study using a microarray technique to explore global mRNA expression trends and identify tumor classifiers was reported by Golub et al110. They classified acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) using the Affymetrix gene microarray with 6817 human genes. A 50-gene panel was generated from analysis of 38 bone marrow samples (11 AML, 27 ALL) and verified to have high power in classifying AML and ALL. Since then, genome-wide microarray analysis has been widely utilized in identifying numerous diagnostic biomarkers in lung cancer111, gastric cancer112, prostate cancer113, and other types of cancer. Different types of cancer exhibit distinctive expression profiles; for example, epithelial ovarian cancer (EOC) shows low expression of p53, BCL-2, BAX, EGFR and HER2, and overexpression of ASAP1 (refs 114,115,116,117). The expression of IKBKβ, CREBBP, WNT10B, PRDX6, ITGAV, and IFNAR1 was found to be associated with hepatic carcinogenesis118. Other mRNA markers, including ELA2, MPO, STAT4, and FUS, were upregulated, and AXIN2, LEF1, and SFRP2 were downregulated in AML119,120,121. However, some genes are related to more than one form of cancer. The eIF4E gene is overexpressed in cancers of the breast, head and neck, bladder, colon, lung, and prostate and is related to reduced patient survival122,123,124,125,126. Cancer development has also been shown to be linked to alternative splicing127. For example, a nonsense mutation in exon 18 of the tumor-suppressor BRCA1 gene, a well-known marker of breast and ovarian cancers, affects splicing and causes exon skipping128. Genome-wide studies have identified a large number of cancer-specific alternative splicing events or fusion proteins, which are ideal diagnostic markers and therapeutic targets129,130,131.

In 2000, Perou and colleagues reported a new method to classify breast cancer by gene expression profiling132 and started to use gene expression microarrays for the diagnosis of breast and other cancers133,134. Colorectal cancer (CRC) subtyping has also been addressed using genome-wide gene expression profiling in large-scale patient samples135 (Table 3). ALL is a heterogeneous disease with more than 12 subtypes. A 62-gene classifier was created in white children's samples for subtype classification and validated on a completely independent set of 100 Chinese samples134. Breast cancer can also be classified into different subtypes using miRNA and mRNA expression data from The Cancer Genome Atlas (TCGA) database136.

Table 3 mRNA signature to classify cancer subtypes.

NGS has also been used to analyze mRNA biomarkers in cancers. For example, Sinicropi et al applied RNA-seq to FFPE samples to profile genome-wide mRNA expression in 136 breast cancer patients and identified 32 mRNAs that can serve as breast cancer biomarkers, indicating that RNA-seq can provide a practical, sensitive and precise platform for genome-wide biomarker discovery in FFPE tissue. Similarly, RNA-seq has been applied to study biomarkers in hepatocellular carcinoma137, colorectal cancer138, B-cell lymphoma139, melanoma140, and prostate cancer141. RNA-seq has also been used to analyze alternative splicing associated with lung cancer142.

mRNA profiles as prognostic cancer markers

The expression status of estrogen receptors (ER) and HER2/NEU (ERBB2) has been used in the prognosis of breast cancer. Different tumor types exhibit distinct gene expression signatures that can be used to sub-classify a certain cancer into prognostic groups (Table 3). For example, CRC subtypes can be classified using genome-wide gene expression profiling, and mRNA biomarkers can also be used for the prognosis and prediction of CRC143,144. One hundred fifteen gene signatures were identified through gene expression profiling of 68 ovarian cancer patients145. Platelet-derived growth factors (PDGFs) and their receptors (PDGFRs) are overexpressed in various cancers, and their expression levels correlate with tumor growth, invasion and prognosis. PDGF and PDGFR have been used as prognostic indicators in several carcinomas146,147. Mantle cell lymphoma (MCL) is an aggressive form of non-Hodgkin lymphoma. Kridel et al discovered recurrent mutations in NOTCH1 by performing whole transcriptome sequencing on 18 primary-tissue MCL samples and 2 cell lines148. In an independent pool of samples, they also verified that 12% of clinical samples and 20% of cell lines showed NOTCH1 mutations, which were associated with poor overall survival. Brodtkorb et al found that 6 of the NF-κB pathway genes were significantly associated with the transformation of follicular lymphoma149. Clearly, genome-wide approaches have been widely applied in cancer research and have supplied informative insights into cancer prognosis.

Clinical applications of genome-wide mRNA biomarkers

Some diagnostic and prognostic microarrays have been developed and clinically applied. Breast cancer is the most frequently diagnosed cancer in women, accounting for approximately 30% of all cancers diagnosed and approximately 16% of all cancer-caused deaths150,151. The MammaPrint prognostic microarray provides a 70-gene expression profile that can be used to guide the treatment and prognostication of breast cancer. MammaPrint was developed on the Agilent platform and provides a poor-prognosis readout, based on genes regulating cell cycle, invasion, metastasis and angiogenesis, for patients who are lymph node negative at diagnosis152,153. An 18-gene signature for vascular invasion that is associated with aggressive features and reduced survival in breast cancer has been reported recently. In a compilation of 11 open-access gene expression datasets from 2423 breast cancer patients, this 18-gene panel showed consistent associations with tumor grade, hormone receptor negativity, HER2 positivity, a basal-like phenotype, and reduced patient survival. Similarly, the 21-gene expression profile of Oncotype DX is used to guide the treatment and prognostication of breast cancer154. Other commercially available microarray-based prognostic assays for breast cancer include PAM50 (ROR-S), the Breast Cancer Index, and EndoPredict155.

Several assays have been applied in colon cancer diagnosis. The Oncotype DX colon cancer assay uses a 12-gene signature and was validated in more than 1800 patients from four adjuvant trials156. The ColoPrint (CP) assay is based on an 18-gene signature157 and has been validated as a prognostic genomic signature for patients with stage II colon cancer158. The ColonSentry mRNA expression panel is a patient-friendly test that measures 7 mRNA markers expressed in whole blood to diagnose colorectal cancer159.

Integrated analysis of miRNAs and mRNAs in cancer

Genome-wide gene expression profiling using microarray and deep sequencing technologies has significantly driven the discovery of cancer biomarkers. In addition to the expression signature of either miRNA or mRNA in a single type of cancer, many studies have applied integrated analyses of genome-wide mRNA and miRNA expression signatures in cancers.

Using Agilent miRNA microarrays and Affymetrix whole-genome expression microarrays, we have identified differentially expressed miRNAs and mRNAs in 10-hydroxycamptothecin (HCPT)-resistant and HCPT-sensitive gastric cancer cells in a study of human gastric adenocarcinoma cell line responses to HCPT treatment164. Moreover, by integrating miRNA target prediction, Gene Ontology (GO) analysis and pathway enrichment, we illustrated the genes and pathways related to carcinogenesis or chemoresponse, such as chemoresistant genes, drug metabolism-associated pathways (cytochrome P450, etc) and cell apoptosis-related pathways (S1P signaling pathway, etc). This study also reported coregulated miRNA-mRNA interactions in response to HCPT, such as the tumor suppressor gene BTG2 in gastric cancer cells, which was targeted by upregulated let-7g, miR-98, and miR-132 in HCPT-resistant cells. We found that approximately 1.3% of the predicted miRNA targets formed reciprocal “up-down” or “down-up” expression relationships with miRNAs, suggesting that modulated miRNA-mRNA interactions could be promising biomarkers for cancers.
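The reciprocal “up-down” and “down-up” filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the cited study: the log2 fold-change threshold, the function name, the placeholder identifiers `miR-X`, `GENE_A` and `GENE_B`, and all numeric values are hypothetical.

```python
def reciprocal_pairs(mirna_lfc, mrna_lfc, predicted_pairs, threshold=1.0):
    """Keep predicted miRNA-target pairs whose expression changes are opposite.

    mirna_lfc / mrna_lfc map names to log2 fold changes (e.g. resistant vs
    sensitive cells); threshold is an assumed cutoff, not from the review.
    """
    hits = []
    for mirna, gene in predicted_pairs:
        m = mirna_lfc.get(mirna)
        g = mrna_lfc.get(gene)
        if m is None or g is None:
            continue  # one of the two molecules was not measured
        if m >= threshold and g <= -threshold:
            hits.append((mirna, gene, "up-down"))
        elif m <= -threshold and g >= threshold:
            hits.append((mirna, gene, "down-up"))
    return hits


# Made-up fold changes; only the names let-7g, miR-98 and BTG2 come from the text.
mirna_lfc = {"let-7g": 1.8, "miR-98": 1.5, "miR-X": -2.0}
mrna_lfc = {"BTG2": -1.6, "GENE_A": 2.2, "GENE_B": 0.3}
predicted = [
    ("let-7g", "BTG2"),
    ("miR-98", "BTG2"),
    ("miR-X", "GENE_A"),
    ("miR-X", "GENE_B"),
    ("let-7g", "GENE_A"),
]

hits = reciprocal_pairs(mirna_lfc, mrna_lfc, predicted)
fraction = len(hits) / len(predicted)
print(hits)
print(fraction)  # → 0.6
```

The fraction of predicted pairs that survive the filter is the toy analogue of the approximately 1.3% figure reported above; in real data it depends strongly on the prediction software and on the chosen cutoff.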

Integrated expression analysis of miRNAs and mRNAs using microarray technology has revealed signatures implicated in ovarian cancer165,166, breast cancer167,168,169,170,171, metastatic osteosarcoma172, gastric cancer173,174, pancreatic cancer175, oral squamous cell carcinoma176, lung cancer177,178,179, prostate cancer180,181, follicular thyroid tumors182, retinoblastoma183, etc. The miRNA/mRNA interactome not only includes the interactions between an miRNA and its targets but also comprises the TF-miRNA network. For example, Barh et al observed upregulation of HMGA1, E2F6, IRF1, and TFDP1 and downregulation of SUV39H1, RBL1, and HNRPD, all of which are transcription factors, in blood samples of lung adenocarcinoma and squamous cell carcinoma NSCLC subtypes. The study also reported that HMGA1 and TFDP1 play vital roles in the miRNA-TF-miRNA interaction in the molecular mechanisms of tumorigenesis.

To discover potential oncogenic or tumor-suppressive miRNAs and mRNAs in carcinomas of the bladder, kidney and testis184, deep sequencing technology was used to profile aberrant miRNA and mRNA expression at the systems level. The analysis revealed common hallmarks of human cancers, including miRNAs and mRNAs that are involved in cell adhesion processes, p53 signaling, calcium signaling, ECM-receptor and cell cycle pathways, DNA repair and replication processes, as well as immune and inflammatory response processes182. Furthermore, the authors also studied the correlation between each miRNA and its targets. Their results showed that disruption of key miRNAs may result in the global aberration of one or more pathways.

Integrated miRNA and mRNA expression analysis can identify the interactions between these molecules as well as the aberrant expression of miRNAs and mRNAs. Moreover, new miRNA target genes can be found through the comprehensive study of miRNA and mRNA expression coupled with miRNA target prediction software. However, expression profiling combined with prediction software can yield a multitude of candidate interactions between an miRNA and its target mRNAs. For example, a study of inflammatory breast cancer identified 13 miRNAs from 17295 correlated miRNA-mRNA pairs185. Thus, functional validation and luciferase reporter assays were used to confirm the miRNA-mRNA interactions. In another study186, integrated miRNA and mRNA expression profiling identified 11 differentially expressed miRNAs and 35 known and novel target genes of the key molecules miR-200c, miR-205, and miR-375.

In silico analysis approaches have also been utilized to analyze genome-wide miRNA and mRNA signatures using publicly available datasets. For example, Diao et al identified differentially expressed genes between metastatic and non-metastatic osteosarcoma (OS) patients using an online dataset and then performed functional enrichment analysis using the DAVID database172. They discovered 134 upregulated genes, enriched in 14 subcategories and most significantly in cytoskeleton organization, and 189 downregulated genes. In addition, miR-202 and miR-9 were found to regulate CALD1 and STX1A, which were among the upregulated genes. This interaction might be important for OS metastasis. A similar strategy was applied to the identification of miRNA and mRNA biomarkers in prostate cancer180. To assist in the data mining of the vast available datasets, different algorithms have been developed to find significant miRNA-mRNA interactions in different cancer types. To discover disease-specific miRNA-mRNA correlations and regulatory modular networks in primary prostate cancer (PPC) and metastatic prostate cancer (MPC), Zhang et al used microarray data from prostate tumor samples, calculated miRNA-mRNA Pearson correlations, established miRNA-mRNA matrices and identified miRNA-mRNA modules specific to prostate cancer subtypes187. Kim et al proposed a data-driven, hypergraph structural model and constructed higher-order miRNA-mRNA networks from prostate cancer profiles188. The model is learned by iterating structure and parameter learning and characterizes relationships within complex miRNA-mRNA modules showing oncogenic or tumor suppression characteristics, which are known to be associated with the properties of primary and metastatic prostate cancer.
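The correlation step mentioned above (miRNA-mRNA Pearson correlations arranged in a matrix) can be sketched as follows. This is a minimal illustration, not the published algorithm: the function names, the cutoff of -0.8, the names `miR_a`, `gene_1`, `gene_2` and the expression values are all hypothetical, and the module-detection and hypergraph steps are not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors.

    Assumes both vectors vary across samples (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(mirna_expr, mrna_expr):
    """Nested dict: matrix[miRNA][gene] = Pearson r across matched samples."""
    return {
        mirna: {gene: pearson(m_vals, g_vals) for gene, g_vals in mrna_expr.items()}
        for mirna, m_vals in mirna_expr.items()
    }


def anticorrelated_pairs(matrix, cutoff=-0.8):
    """Pairs with r <= cutoff, the pattern expected for direct repression."""
    return sorted(
        (mirna, gene)
        for mirna, row in matrix.items()
        for gene, r in row.items()
        if r <= cutoff
    )


# Toy expression values across five matched samples (made up).
mirna_expr = {"miR_a": [1, 2, 3, 4, 5]}
mrna_expr = {"gene_1": [10, 8, 6, 4, 2], "gene_2": [1, 3, 2, 5, 4]}

matrix = correlation_matrix(mirna_expr, mrna_expr)
print(anticorrelated_pairs(matrix))  # → [('miR_a', 'gene_1')]
```

Computing one such matrix per subtype (for example PPC and MPC) and comparing them is one way to look for subtype-specific pairs; correlation alone does not establish a direct regulatory interaction.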

An analysis of the miRNA and mRNA expression behavior of a large cohort of breast, lung, ovarian, and prostate carcinoma patients characterized a cancer-specific mRNA signature that can distinguish diseased and healthy individuals, and these mRNAs are likely regulated not only by individual miRNAs but also by networks of miRNAs189. Such miRNA and mRNA cancer signatures can be characterized much more easily than the full interaction networks of miRNAs and mRNAs can be drawn.

Summary and future perspectives

Genome-wide approaches of microarray analysis and NGS for the profiling of tumor or cancer cell line gene expression have revealed many specific gene signatures related to different cancer types or clinical information (diagnostic, prognostic, or treatment). High-throughput methods provide a fundamental understanding of the biological dynamics of cancer and more accurate diagnosis, prognosis and prediction in therapeutic management. These gene sets could be used as diagnostic or prognostic markers to monitor the risk of tumor progression and to guide therapeutic treatments. Furthermore, the application of miRNAs from plasma, serum, urine and other body fluids as non-invasive biomarkers in different cancers is undergoing rapid development190. Because miRNA tests can be used on diverse sample types, such as body fluids and FFPE tissue, they are more widely applicable than mRNA markers.

Although miRNAs are promising detection and prognosis biomarkers, methodological and technical challenges remain. The variety of methodologies, cancer types, normalization strategies and the difficulties in detecting miRNA in body fluids have led to considerable variability among different platforms and laboratories. For example, the selection of endogenous reference miRNAs in the serum for RT-PCR or validation of miRNA biomarkers is still challenging. RNU6B, which is widely used as a reference gene in tissue and cell samples, is not stably expressed in plasma or serum from different individuals191, and miR-16 (ref. 192) is more frequently used. miR-93 (ref. 193), miR-191-5p, and U6 (refs 194,195) have also been used as reference genes for serum samples. Other protocols, such as absolute quantitation of miRNA levels196 and identification of input volumes by adding spiked-in miRNA, such as cel-miR-39 (ref. 101), are also used. Normalization of miRNA array data usually utilizes the total RNA expression signal197, while NGS uses the total reads, which is much more stable than one reference gene in RT-PCR.
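The two normalization strategies contrasted above can be made concrete with a short Python sketch. It assumes reads-per-million scaling as one simple way of normalizing NGS counts to the total reads, and the comparative Ct calculation (2 to the power of minus delta-Ct, which presumes roughly 100% amplification efficiency) as one common way of normalizing RT-PCR data to a single reference. Function names, read counts and Ct values are hypothetical.

```python
def reads_per_million(counts):
    """Scale raw miRNA read counts by the library's total mapped reads."""
    total = sum(counts.values())
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def relative_expression(ct_target, ct_reference):
    """Comparative Ct: expression of the target relative to one reference.

    Assumes about 100% PCR efficiency, so one cycle equals a twofold change.
    """
    delta_ct = ct_target - ct_reference
    return 2 ** -delta_ct


# Toy library of 1000 reads (made-up counts).
library = {"miR-21": 500, "miR-16": 300, "miR-378": 200}
rpm = reads_per_million(library)
print(rpm["miR-21"])  # → 500000.0

# Toy Ct values: target detected two cycles later than the reference.
print(relative_expression(24.0, 22.0))  # → 0.25
```

The sketch also shows why the choice of reference matters: any sample-to-sample drift in `ct_reference` shifts every relative expression value, whereas the total read count averages over all detected miRNAs.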

The gene-expression signatures or panels of miRNA and mRNA may improve diagnostic accuracy, prediction of therapeutic responses and overall survival of patients with cancer. However, miRNA and mRNA signatures cannot independently solve all problems, and combinations of known biomarkers, such as cancer-related antigens, gene mutation and CNV, together with miRNAs and mRNAs will increase the specificity and sensitivity of clinical tests19. Moreover, larger-scale clinical trials and long-term clinical follow-up are needed to validate the published biomarkers.

We are entering a new era with enormous amounts of data generated by NGS and microarray analysis. In the near future, with the advance of cancer research, more sensitive and specific miRNA/mRNA signatures may be identified and become more routinely used in cancer detection, diagnosis and prognosis in clinical practice and trials.