Introduction

Microribonucleic acids (miRNAs) are a class of small (approximately 22 nucleotides) noncoding RNAs that negatively regulate gene expression at the posttranscriptional level. They play profound and pervasive roles in regulating the expression of genes involved in cell development, proliferation, and apoptosis in various eukaryotes (Sevignani et al. 2006). Recent evidence also indicates that aberrant miRNA expression is involved in the pathogenesis of several human diseases (van Rooij et al. 2006; Takamizawa et al. 2004; Iorio et al. 2005), suggesting that miRNA genes could be potential targets for drug discovery (Sevignani et al. 2006; Liu et al. 2007; Tsuchiya et al. 2006).

In the past few years, several hundred miRNAs have been identified in animals and plants, although it is estimated that miRNAs account for 1% of predicted genes in higher eukaryotic genomes (Griffiths-Jones 2004). Only a handful of miRNAs, however, have been functionally characterized. These findings, together with the complicated expression patterns and large number of predicted targets, imply that miRNAs may regulate a broad range of physiological and developmental processes.

Identifying the targets of each miRNA is crucial for understanding the biological functions of miRNAs, as miRNA-directed regulation occurs at the posttranscriptional level via interaction with target messenger RNAs (mRNAs) to elicit degradation or translational repression of complementary mRNA targets (Behm-Ansmant et al. 2006; Chendrimada et al. 2007). Moreover, miRNAs are known to be only partially complementary to their targets and can induce significant degradation of many mRNA targets in animals (Bagga et al. 2005; Jing et al. 2005; Giraldez et al. 2006; Rehwinkel et al. 2006). This ambiguity makes it difficult to predict the targets of miRNAs [up to 30% of genes have been predicted to be regulated by miRNAs (Lewis et al. 2003)]. However, most of the predicted miRNA-target pairs remain to be biologically verified, because a simple, high-throughput method to biologically validate miRNA targets does not exist. Several recent studies revealed spatial or temporal avoidance of miRNA coexpression with target genes (Farh et al. 2005; Sood et al. 2006). Thus, genes preferentially expressed at the same time and place as an miRNA have evolved to selectively avoid sites matching the miRNA (Farh et al. 2005). These findings may support a negative correlation between the expression levels of miRNAs and their target mRNAs as a whole. Based on the idea that a negative correlation between miRNA and mRNA expression levels may reflect an miRNA-target relationship, we performed a global analysis of both miRNA and mRNA expression across 16 human cell lines. Global correlation analysis of the collected expression profiles proved to be a useful approach for identifying miRNA-target interactions.

Materials and methods

Cell lines and RNA purification

We used 16 well-studied cell lines derived from various organs in this study: human lung carcinoma A549 (Giard et al. 1973), fibrosarcoma HT1080 (Rasheed et al. 1974), cervix carcinoma Henrietta Lacks (HeLa) (Scherer et al. 1953), cervix carcinoma HeLaS3 (subclone of HeLa) (Puck et al. 1956), hepatocellular carcinoma Huh7 (Nakabayashi et al. 1985), breast adenocarcinoma MCF7 (Soule et al. 1973), breast adenocarcinoma MDAMB231 (Cailleau et al. 1974), embryonal kidney HEK293T (subclone of HEK293) (DuBridge et al. 1987), colon adenocarcinoma HT29 (Fogh et al. 1977), hepatocellular carcinoma HepG2 (Aden et al. 1979), neuroblastoma SKNMC (Biedler et al. 1973), colon adenocarcinoma Caco2 (Fogh et al. 1977), embryonal kidney HEK293 (Graham et al. 1977), and colon carcinoma HCT116 (Brattain et al. 1981). Cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM) containing 10% fetal bovine serum (FBS). Human T-cell leukemia Jurkat (Schneider et al. 1977) and chronic myeloid leukemia K562 (Lozzio and Lozzio 1973) cell lines were cultured in Roswell Park Memorial Institute (RPMI) medium containing 10% FBS. Exponentially growing cells were harvested, and total RNA was extracted by standard procedures using ISOGEN (Nippon Gene) and chloroform and precipitated with isopropanol. Integrity and purity of the RNA were verified with an Agilent Bioanalyzer.

Quantification of miRNAs using stem-loop real-time PCR

Expression of 148 miRNAs was measured by reverse transcription followed by the real-time polymerase chain reaction (RT-PCR) assay as previously described (Gaur et al. 2007). This method uses stem-loop primers for reverse transcription followed by RT-PCR (TaqMan MicroRNA Assays; Applied Biosystems). An ABI PRISM™ 7700 Sequence Detector was used to detect amplification. This stem-loop RT-PCR method specifically detects mature, but not precursor, miRNAs. RNA input was normalized by measuring U6 expression using the following TaqMan probe and primers.

TaqMan probe: 5′FAM-CCCCTGCGCAAGGA-MGB3′, forward primer: 5′-TGGAACGATACAGAGAAGATTAGCA-3′, reverse primer: 5′-AACGCTTCACGAATTTGCGT-3′. Expression data for a few miRNAs were confirmed to correlate well with those obtained by Northern analysis (data not shown).

Microarray experiments

Genome-wide mRNA expression profiles of the 16 human cell lines were obtained by microarray analysis with the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array, according to the manufacturer’s instructions. Briefly, double-stranded complementary DNA (cDNA) was synthesized from total RNA. An in vitro transcription reaction was then carried out to produce biotin-labeled complementary RNA (cRNA) from the cDNA. The cRNA was then fragmented and used for hybridization. The hybridized probe array was subsequently stained and scanned with a GeneChip Scanner 3000. We used the robust multiarray analysis (RMA) expression measure, which represents the log transform of the (background-corrected and normalized) intensities of the GeneChips (Gautier et al. 2004). RMA measures were computed using an R package from the Bioconductor project (http://www.bioconductor.org). The 30% of probe sets with the lowest maximal expression across the samples in the data set were removed.
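The probe-set filter described above can be illustrated with a minimal Python sketch. It assumes that the filter ranks probe sets by their maximum value across samples and discards the lowest 30%; the function name, the `percent` parameter, and the toy values are hypothetical and are not taken from the original R analysis.

```python
def filter_low_expression(expr, percent=30):
    """Drop the `percent`% of probe sets with the lowest maximal expression.

    expr: dict mapping probe-set ID -> list of log-scale expression values,
    one value per sample.
    """
    # Rank probe sets by their maximal expression across all samples.
    ranked = sorted(expr, key=lambda probe: max(expr[probe]))
    n_drop = len(ranked) * percent // 100  # integer count of probe sets to discard
    kept = set(ranked[n_drop:])
    # Preserve the original ordering of the retained probe sets.
    return {probe: values for probe, values in expr.items() if probe in kept}


# Toy data (invented values): probe_i has a maximum of i + 0.5.
toy_expr = {f"probe_{i}": [i - 0.5, float(i), i + 0.5] for i in range(1, 11)}
filtered = filter_low_expression(toy_expr)
print(sorted(filtered, key=lambda probe: max(filtered[probe])))
```

With ten toy probe sets, the three with the lowest maxima are removed and seven are retained.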

Data analysis

Experimentally normalized ΔCt values for the miRNA profiles, or normalized microarray data sets for mRNA profiles, were used to classify cells by agglomerative hierarchical clustering based on Euclidean distances between data sets. The ΔCt values for the miRNA profiles were used to evaluate the 16 cell lines by agglomerative hierarchical clustering using average linkage and correlation similarity and verified for significance by multiscale bootstrap resampling analysis (Suzuki and Shimodaira 2006). Classical Pearson’s correlation tests were performed to verify the relationships between expression profiles of miRNA and mRNA. The significance of each correlation was assessed by assuming that the distribution of correlations under the null hypothesis of no correlation follows a t distribution with n − 2 degrees of freedom, where n is the number of measurements in the expression profile.
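As an illustration of the significance test described above, the following Python sketch computes Pearson's r for one profile pair and converts it to a two-sided p value through the statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. It is a standard-library sketch, not the original R code; the function names are hypothetical, and the tail area is obtained by Simpson's rule rather than by a library routine.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(r, n, steps=2000):
    """Two-sided p value for r under the null hypothesis of no correlation."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation: the t statistic is unbounded
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the density between 0 and |t|.
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


# With n = 16 profiles, |r| of about 0.497 corresponds to p of about 0.05.
print(two_sided_p(-0.4973, 16))
```

For 16 cell lines the test has 14 degrees of freedom, so only fairly strong correlations reach conventional significance thresholds.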

Target prediction and GO term enrichment analysis

Potential targets for miRNAs were predicted among 3′ untranslated region (UTR) sequences of inversely correlated target transcripts using the miRanda algorithm, which is associated with the Sanger miRBase (Enright et al. 2003). Algorithm parameters were set as follows: score threshold at 50, energy threshold at −20 kcal/mol, scaling parameter at 4, gap-open penalty at −2, and gap-extend penalty at −8. The significance of enrichment of a list of target genes with genes belonging to a gene ontology (GO) group was scored using the weight algorithm (Ashburner et al. 2000; Alexa et al. 2006). The algorithm for GO group scoring was implemented in the R programming language. The results were obtained using R version 2.5.0 and the libraries provided by the Bioconductor project, version 1.14 (Gentleman et al. 2004).

Results

Micro-RNA expression profiles in 16 human cell lines

We quantitatively measured 155 mature human miRNAs in 16 human cell lines. The probe sets for miRNAs with minimal Ct values >35 in all cell lines were excluded from the following analysis. Seven miRNAs were thus excluded, and the remaining 148 different mature miRNAs were used for further analysis (expression data in a text format are provided in Table S1).
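The exclusion rule and the normalization to U6 can be sketched as follows. This is an illustration under stated assumptions: ΔCt is taken to be Ct(miRNA) − Ct(U6) within each cell line, which is the usual convention but is not spelled out in the text, and the Ct values, miRNA names, and function names are invented placeholders.

```python
CT_CUTOFF = 35.0  # miRNAs whose lowest Ct exceeds this value are treated as undetected


def is_detected(ct_by_line, cutoff=CT_CUTOFF):
    """True if the miRNA reaches Ct <= cutoff in at least one cell line."""
    return min(ct_by_line.values()) <= cutoff


def delta_ct(mirna_ct, u6_ct):
    """Assumed normalization: Ct(miRNA) - Ct(U6), computed per cell line."""
    return {line: mirna_ct[line] - u6_ct[line] for line in mirna_ct}


# Invented Ct values for three of the cell lines (for illustration only).
u6 = {"HeLa": 18.0, "HepG2": 18.5, "Caco2": 17.5}
raw_ct = {
    "mir_expressed": {"HeLa": 24.5, "HepG2": 30.0, "Caco2": 36.0},
    "mir_undetected": {"HeLa": 37.0, "HepG2": 38.5, "Caco2": 36.5},
}

# Keep detected miRNAs only, then normalize each to U6.
normalized = {
    name: delta_ct(ct, u6) for name, ct in raw_ct.items() if is_detected(ct)
}
print(normalized)
```

A lower ΔCt indicates higher expression relative to U6, so correlations computed on ΔCt values have the opposite sign to correlations computed on expression levels.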

To characterize the expression patterns of miRNAs, hierarchical clustering was performed using normalized ΔCt values (Fig. 1). The most prominent feature of the clustered data was that many miRNAs displayed similar expression patterns among the majority of samples, although some miRNAs displayed very specific expression patterns. Hierarchical clustering of the 16 cell lines resulted in a dendrogram that contained small clusters reflecting common tissue origin (e.g., HepG2 and Huh7) or subclones (e.g., HeLa and HeLaS3). We also found that some cell lines showed characteristic expression patterns independent of their tissue origins. In particular, Caco2 had a very different expression pattern from HT29 or HCT116, although all of them originated from colon carcinoma. For example, some miRNAs were specifically expressed in Caco2 (e.g., miR-372 and -373) but not in either HT29 or HCT116. Determining the differential expression of miRNAs among cell lines with similar biological backgrounds would be of value for investigating the functions of miRNAs.

Fig. 1

Hierarchical clustering of 148 microribonucleic acid (miRNA) expression profiles in 16 human cell lines. Expression profiles (ΔCt values) of 148 miRNAs measured in total RNA from cell lines were clustered

To classify expression patterns of miRNAs, we employed hierarchical clustering based on correlation similarity between each miRNA (Fig. 2). This analysis revealed the relationships between miRNAs that had similar expression patterns among cell lines. To assess the robustness of these relationships, we conducted a multiscale bootstrap resampling analysis (Suzuki and Shimodaira 2006) and verified the significance (Fig. 2). The resulting dendrogram had several statistically significant clusters of miRNAs, suggesting that their expression might be commonly regulated. Notably, many coexpressed miRNAs formed genomic clusters. Frequent coexpression between neighboring miRNAs that form genomic clusters was previously reported in an analysis of miRNA expression patterns across 24 normal human organs (Baskerville and Bartel 2005). To test coexpression between neighboring miRNAs, we examined the relationship between genomic distance and correlation coefficients between neighboring miRNAs. Among the miRNAs measured, those with several genomic origins, as indicated by the presence of multiple precursor miRNAs, were excluded from the following analysis. Micro-RNAs located on the same chromosome and oriented in the same direction were defined as “paired”. For the resulting 224 pairs (miRNA pairs are summarized in Table S2), we examined the relationship between the genomic distance of each miRNA pair and the correlation coefficient of the miRNA expression pattern (Fig. 3). Most pairs of miRNAs separated by <100 kb showed a highly positive correlation.
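The pairing rule described above (same chromosome, same orientation) can be sketched in Python as follows. The locus names and coordinates are invented placeholders rather than real annotations, the distance is measured between start positions as a simplifying assumption, and the function name is hypothetical.

```python
from itertools import combinations


def same_strand_pairs(loci):
    """Pair loci that share a chromosome and a strand.

    loci: dict mapping miRNA name -> (chromosome, strand, start position).
    Returns a list of (name_a, name_b, distance_in_bases) tuples.
    """
    pairs = []
    for (name_a, locus_a), (name_b, locus_b) in combinations(sorted(loci.items()), 2):
        chrom_a, strand_a, start_a = locus_a
        chrom_b, strand_b, start_b = locus_b
        if chrom_a == chrom_b and strand_a == strand_b:
            pairs.append((name_a, name_b, abs(start_a - start_b)))
    return pairs


# Placeholder loci with invented coordinates (not real genome annotations).
toy_loci = {
    "mir_A": ("chr1", "+", 1_000_000),
    "mir_B": ("chr1", "+", 1_050_000),
    "mir_C": ("chr1", "-", 1_020_000),
    "mir_D": ("chr2", "+", 500_000),
}
toy_pairs = same_strand_pairs(toy_loci)
close_pairs = [pair for pair in toy_pairs if pair[2] < 100_000]  # within 100 kb
print(close_pairs)
```

Each resulting pair would then be assigned the correlation coefficient of the two expression profiles and plotted against the distance, as in Fig. 3.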

Fig. 2

Hierarchical clustering with bootstrap analysis based on correlational expression profiles among 148 microribonucleic acids (miRNAs). Expression profiles of 148 miRNAs were clustered and verified for significance by multiscale bootstrap resampling analysis (1,000 iterations, sampling with replacement). Clusters were scored as statistically significant in cases in which miRNAs clustered with a 99% or better confidence interval (corresponding to p < 0.01), as determined by the bootstrap analysis

Fig. 3

Relationship between the distance separating micro-RNA (miRNA) loci and their coordinate expression in human tissues. Each miRNA was paired with each of the others lying in the same orientation on the same chromosome. For each pair, the correlation coefficient for their expression was plotted according to the distance between the two loci (white circle)

Genome-wide gene expression analysis

Next, we obtained a global gene expression profile by microarray analysis using the same RNA samples used for quantifying miRNAs (microarray data were deposited in the Gene Expression Omnibus under accession number GSE10021). The miRNAs located in the intronic regions of genes are usually coordinately expressed with their host gene mRNA (Baskerville and Bartel 2005). Among the 148 miRNAs measured in this study, 40 were derived from the intronic regions of genes and were examined for their correlation with the corresponding host genes using the standard Pearson’s correlation test (Table 1). A strong correlation between intronic miRNAs and host genes was observed, suggesting that they are derived from the same precursor transcripts, which might be under the control of the same promoter.

Table 1 Correlation of intronic microribonucleic acid (miRNA) expression with host gene expression

To determine whether the negative correlation observed between miRNA and gene expression reflected miRNA-target gene relationships, we first examined the relationship using previously identified miRNA-target gene pairs. We examined miR-124a, which has the largest number of known target genes (Lim et al. 2005), as a control. The miR-124a expression level measured by RT-PCR analysis was compared with the expression levels of either the known target genes or all genes, both obtained by microarray analysis in 16 cell lines. Each data set was then subjected to Pearson’s correlation test (Fig. 4). The distribution of correlation coefficients for miR-124a-target gene pairs was shifted toward the negative side compared with that of the miR-124a-total gene pairs. The means of the correlation coefficients of the two sets differed significantly (p < 0.01). Furthermore, Fisher’s exact test showed a significant accumulation of target genes within the inversely correlated group of genes [odds ratio (OR) 2.5, p < 0.01], indicating that the negative correlation observed between miRNA and gene expression in this study reflects miRNA-target gene relationships.
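The enrichment test on a 2 × 2 table (known target or not, inversely correlated or not) can be illustrated with the sketch below. It assumes a one-sided (enrichment) Fisher's exact test computed from the hypergeometric distribution; whether the original analysis was one-sided or two-sided is not stated. The function name is hypothetical and the counts are invented, so the output does not reproduce the reported OR of 2.5.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    a: targets, inversely correlated      b: targets, not inversely correlated
    c: non-targets, inversely correlated  d: non-targets, not inversely correlated
    Returns (sample odds ratio, p value for observing at least `a`).
    Assumes b and c are nonzero so that the odds ratio is defined.
    """
    odds_ratio = (a * d) / (b * c)
    n_targets, n_inverse, total = a + b, a + c, a + b + c + d
    # Hypergeometric upper tail: probability of drawing >= a inversely
    # correlated genes when n_targets genes are chosen at random.
    upper = min(n_targets, n_inverse)
    tail = sum(
        comb(n_inverse, k) * comb(total - n_inverse, n_targets - k)
        for k in range(a, upper + 1)
    )
    return odds_ratio, tail / comb(total, n_targets)


# Invented counts, for illustration only.
odds, p_value = fisher_enrichment(3, 1, 1, 3)
print(odds, p_value)
```

An odds ratio above 1 together with a small p value indicates that known targets are over-represented among the inversely correlated genes.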

Fig. 4

Distribution of correlation coefficients between miR-124a and known targets of miR-124a. The estimated densities of distributions for target genes (red) or total genes (black) were plotted

To extract a global relationship of miRNA-target genes, all miRNA-mRNA pairs were subjected to Pearson’s correlation test. Of the total 5.7 million different pairs examined, 184,901 miRNA-mRNA pairs (3.2%) showed a negative correlation.

Applying miRNA-target prediction algorithm

Because negative correlation might also reflect indirect regulation of gene expression by miRNAs, we further examined whether the extracted genes contained the corresponding target sites for the miRNA in their 3′ UTRs. To search for miRNA target sites, we applied the miRanda algorithm (Enright et al. 2003) to a set of 30,664 different transcripts registered in RefSeq and monitored by the microarray. This analysis yielded 44,572 miRNA-target gene pairs that fulfilled two conditions: the expression level of the gene was negatively correlated with that of the miRNA, and the gene contained a target sequence for the miRNA. Hence, approximately 300 genes per miRNA were extracted on average. Based on the extracted miRNA-target gene pairs, we next tried to predict the functions of the miRNAs by calculating the frequency of gene ontology (GO) terms (biological process category) among the target genes of each miRNA. GO terms were selected if their frequency was significantly higher than expected by chance. Table 2 lists the top 30 miRNAs that had the highest GO term frequency. Most miRNA-target gene pairs were found to be novel, and the frequently noted GO terms for each miRNA covered a wide variety of biological processes.
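The two-condition filter can be viewed as a set intersection, as in the minimal Python sketch below. The miRNA and gene identifiers are placeholders, the function name is hypothetical, and the final line only rechecks the per-miRNA average from the counts reported in the text.

```python
def candidate_pairs(negatively_correlated, predicted_sites):
    """Keep (miRNA, gene) pairs that satisfy both conditions."""
    return negatively_correlated & predicted_sites


# Placeholder identifiers, not real results.
neg_pairs = {("miR-x", "GENE1"), ("miR-x", "GENE2"), ("miR-y", "GENE3")}
site_pairs = {("miR-x", "GENE2"), ("miR-y", "GENE3"), ("miR-y", "GENE4")}
both = candidate_pairs(neg_pairs, site_pairs)
print(sorted(both))

# Average number of extracted genes per miRNA from the reported counts.
mean_per_mirna = 44572 / 148
print(round(mean_per_mirna))  # → 301
```

The intersection removes negatively correlated genes that lack a predicted site, which are more likely to reflect indirect regulation.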

Table 2 Gene ontology (GO) term enrichment analysis for predicted microribonucleic acid (miRNA)-target genes

Discussion

To investigate the biological functions of miRNAs, it is critical to identify miRNA-directed target genes. However, currently available computational methods (e.g., miRanda, PicTar, and TargetScan) predict numerous target genes for each miRNA, many of which are false positives (Mazière and Enright 2007). Also, experimental verification of an miRNA-target relationship is complicated by the potential outcome of such an interaction being either translational repression or degradation. Furthermore, miRNAs can target multiple genes, and thereby the biological function of a single miRNA can be diverse. Hence, not only to achieve a higher degree of specificity of the prediction (few false positives) but also to comprehensively understand the function of miRNAs, large-scale prediction of targets across a whole genome would be required.

In this study, we intended to investigate global miRNA-target relationships, assuming a negative correlation between miRNA and gene expression levels as an indicator of the miRNA-target relationship. The collected genome-wide expression data sets of both miRNA and mRNA enabled global correlation analysis between an miRNA and its target gene. Based on the observation that the expression of genes known as miR-124a targets tended to be negatively correlated with miR-124a expression, we reasoned that correlation analysis would be useful in investigating miRNA-target relationships. We then further extracted miRNA-target gene pairs by collecting genes that contain miRNA-directed target sites within their 3′ UTRs. The resulting list of miRNA-target gene pairs may provide useful information for investigating miRNA-target relationships.

Our miRNA-expression study provides important information on endogenous miRNA expression in 16 cell lines. One of the most serious problems in exploring miRNA function in a cell-based assay is the influence of endogenous miRNA expression. For example, the in vitro reporter gene assay, a general approach used for target gene validation, gives different results in different cell lines. This is thought to be due to the differential expression of endogenous miRNAs in the cell line used, as it strongly affects background expression of the reporter gene containing the test sequence. As we examined genetically tractable and well-studied cell lines in this study, the data obtained should be informative for studying the functions of miRNAs using those cell lines.

In collecting global miRNA and gene-expression profiles, we observed that many of the neighboring miRNA pairs located within a 100-kb region of the chromosome showed a significantly positive correlation and that the expression of intronic miRNAs was generally correlated with that of their host genes. These results were consistent with the idea that clustered miRNAs are processed from the same primary transcript (Suh et al. 2004) and that intronic miRNAs are derived from the same primary transcript as their host gene (Lau et al. 2001; Sempere et al. 2004). Unexpectedly, however, we observed that miRNAs expressed from the same miRNA precursor (e.g., miR-142-3p and -5p) were not coexpressed. This discrepancy could be explained by the fact that expression of these miRNAs depends on asymmetric selection of the mature miRNA strand following the cleavage of the pre-miRNA (Hutvagner 2005). Although it likely involves differential binding to and then differential retention of the two individual RNA strands by Dicer and its associated proteins (Zeng 2006), the detailed molecular mechanism for this asymmetry is not clear. The differential expression among the cell lines we studied may provide a clue to the mechanism of asymmetric selectivity.

Correlated expression of miRNA and corresponding genes may imply their association in function as well. GO-based annotation and functional enrichment analysis of the extracted miRNA-target gene pairs made it possible to indicate putative functions of miRNAs. The approach developed in this study should be of value for future studies into the functions of miRNAs.