Resource

Biogenic mechanisms and utilization of small RNAs derived from human protein-coding genes

Nature Structural & Molecular Biology volume 18, pages 1075–1082 (2011)

This article has been updated


Efforts to catalog eukaryotic transcripts have uncovered many small RNAs (sRNAs) derived from gene termini and splice sites. Their biogenesis pathways are largely unknown, but a mechanism based on backtracking of RNA polymerase II (RNAPII) has been suggested. By sequencing transcripts 12–100 nucleotides in length from cells depleted of major RNA degradation enzymes and RNAs associated with Argonaute (AGO1/2) effector proteins, we provide mechanistic models for sRNA production. We suggest that neither splice site–associated (SSa) nor transcription start site–associated (TSSa) RNAs arise from RNAPII backtracking. Instead, SSa RNAs are largely degradation products of splicing intermediates, whereas TSSa RNAs probably derive from nascent RNAs protected by stalled RNAPII against nucleolysis. We also reveal new AGO1/2-associated RNAs derived from 3′ ends of introns and from mRNA 3′ UTRs that appear to draw from noncanonical microRNA biogenesis pathways.


New technologies have revealed a wealth of eukaryotic noncoding RNAs (ncRNAs) and added new members to existing families1,2,3,4,5,6. Of the more established classes, microRNAs (miRNAs) are regulators of gene expression ~21–24 nucleotides (nt) in length that control many cellular processes7,8. Canonical miRNAs originate from RNAPII transcripts that form extended hairpins. These are processed by the sequential action of the RNase III family enzymes Drosha and Dicer and are incorporated into complexes containing effector proteins of the AGO family9,10,11.

Not so well established is a diverse set of sRNAs less than 200 nucleotides (nt) in length recently reported to be derived from genes in higher eukaryotes. The best documented are sRNAs from the promoter regions of protein-coding loci. In humans, these include promoter-associated small RNAs (PASRs)12,13 as well as the uncapped TSSa14 and transcription initiation RNAs (tiRNAs)15,16. The latter two species have been proposed to be byproducts of RNAPII arrest followed by backtracking and sRNA liberation by transcription factor IIS (TFIIS)-assisted cleavage of the nascent transcript3,4,15. Uncapped sRNAs whose 3′ termini map precisely to the 3′ end of exons have been suggested to derive from a similar mechanism and are termed splice-site RNAs (spliRNAs)16. Finally, sRNAs are also found at the 3′ end of genes12,17. These termini-associated RNAs (TASRs) can be both sense and antisense (aTASRs) with respect to the gene. The aTASRs carry a non–genomically encoded 5′ poly(U) tail and have been proposed to be generated by an as-yet-unidentified RNA-dependent RNA polymerase17.

To gain a better understanding of the biogenesis and potential utility of some of these sRNA species, we conducted deep sequencing analyses of RNAs of different size ranges and origins. Our study provides mechanistic models for the sources of several reported sRNAs and reveals previously unknown types of AGO1/2-associated molecules.


General mapping patterns of sRNAs from genic regions

To obtain an overview of sRNA species originating from protein-coding genes, we prepared and sequenced libraries of HeLa sRNAs carrying 5′ end monophosphates and size-selected them into pools of 18- to 30-nt sRNAs and 30- to 100-nt sRNAs (HeLa18–30 and HeLa30–100, respectively). The impact of RNA degradation machineries was investigated by depleting the hRRP40 core subunit of the 3′–5′ exo- and endonucleolytic RNA exosome (HeLa18–30(RRP40) and HeLa30–100(RRP40)) or both of the 5′–3′ exonucleases XRN1 and XRN2 (HeLa18–30(XRN1/2) and HeLa30–100(XRN1/2)). Moreover, HeLa nuclear RNAs in two different size ranges (12–20 nt and 18–30 nt) were sequenced (HeLa12–20(N) and HeLa18–30(N)). Finally, to reveal candidates related to the RNA-mediated interference (RNAi) pathway, we also prepared libraries from sRNAs immunoprecipitated by AGO1/2 proteins in either the presence (HeLa18–30(AGO1/2)) or the absence (HeLa18–30(AGO1/2/RRP40)) of hRRP40. All RNA libraries are listed in Supplementary Table 1. Appropriate factor depletion-, immunoprecipitation- or nucleocytoplasmic fractionation efficiencies were verified (Supplementary Fig. 1 and ref. 18). Sequencing reactions yielded from ~1.5 million to ~11 million single-site mappable sRNAs, of which 77% in the AGO1/2 libraries and 50–60% from the other libraries of 18- to 30-nt sRNAs originated from annotated miRNAs. However, a few highly expressed miRNAs dominated, and only 2–8% of all unique RNAs from the non-AGO1/2 18- to 30-nt libraries derived from annotated miRNAs (Supplementary Fig. 2). Thus, many other distinct sRNAs were present.

To focus on genic regions, densities of uniquely mapped sRNAs were plotted using a set of gene-associated reference points: the transcription start site, the 5′ splice site, the 3′ splice site and the 3′ end cleavage-polyadenylation site (Fig. 1a and Supplementary Fig. 2). To avoid biases due to high-read outliers, we removed RNAs overlapping repeat regions as well as known sRNAs (for example, miRNAs, tRNAs and snoRNAs). Moreover, as we were interested in general sRNA patterns, RNA counts were 'collapsed' so that a unique read was only counted once, regardless of how many times it was sequenced. We generally observed similar overall distributions using noncollapsed RNA counts, although high-read outliers (possibly unannotated miRNAs) occasionally confounded results. A schematic overview of the mapping pattern obtained is shown in Figure 1a. It shows that small RNAs are enriched in exons, upstream and downstream of TSSs, at exon-intron borders and near polyadenylation sites.
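The collapsing and density steps described above can be illustrated with a minimal Python sketch. The input format (tuples of chromosome, strand, start and end, 0-based and half-open), the function names and the choice of the number of unique reads as the per-million denominator are assumptions made for illustration; they are not taken from the Methods of this study.

```python
from collections import Counter


def five_prime_end(read):
    """Return the position of a read's 5' nucleotide.

    A read is (chrom, strand, start, end), 0-based and half-open.
    """
    chrom, strand, start, end = read
    return start if strand == "+" else end - 1


def unique_density(reads, ref_points, flank=50):
    """Unique 5' ends per million unique reads, by offset from reference points.

    Identical alignments are collapsed first, so a read sequenced many
    times contributes only once. Offsets are positive downstream, in the
    direction of transcription. The per-million denominator (number of
    unique reads) is an assumption of this sketch.
    """
    unique = set(reads)  # collapse: one entry per distinct alignment
    ends_by_strand = {}
    for read in unique:
        key = (read[0], read[1])
        ends_by_strand.setdefault(key, []).append(five_prime_end(read))

    counts = Counter()
    for chrom, strand, pos in ref_points:
        for end in ends_by_strand.get((chrom, strand), []):
            offset = end - pos if strand == "+" else pos - end
            if -flank <= offset <= flank:
                counts[offset] += 1

    scale = 1e6 / len(unique) if unique else 0.0
    return {offset: n * scale for offset, n in sorted(counts.items())}


# Toy reads, hypothetical values only.
reads = [
    ("chr1", "+", 100, 120),
    ("chr1", "+", 100, 120),  # duplicate, collapsed away
    ("chr1", "+", 105, 125),
    ("chr1", "-", 80, 100),   # opposite strand, ignored for a "+" reference point
]
density = unique_density(reads, [("chr1", "+", 100)])
```

In this toy example the duplicated read contributes a single count at offset 0, which is the behavior that prevents a few highly sequenced species from dominating the profile.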

Figure 1: The majority of gene-derived sRNAs are decay products.

(a) Schematic overview of gene-derived sRNAs identified in this study. Red arrows indicate density and directionality of sequenced sRNAs that are overrepresented at TSSs (yellow), 5′ splice sites (SS, pink), 3′ splice sites (green) and 3′ ends (blue), and to a lesser degree within exons (white). (b) sRNA density in exons versus introns for different sRNA sizes. The y axes show the number of unique sRNAs per million in the indicated library, counting all nucleotides within each tag. The x axes show 3′ and 5′ parts of internal exons as well as intronic regions normalized to the same length. Positions of exon-intron and intron-exon borders are indicated by dashed gray lines.

sRNAs as degradation intermediates of introns and exons

For all sRNA sizes interrogated, mapping densities within exons are ~10- to 25-fold higher than within introns (Fig. 1b and Supplementary Figs. 2 and 3a). Moreover, when mapping sRNAs to cDNA, tag density is largely uniform over exon-exon junctions (Fig. 2a, middle, and Supplementary Fig. 3b,c), arguing that the overall majority of exonic sRNAs originate from spliced mRNA and that the diverse sizes probably arise from different degradation intermediates. Consistently, a larger fraction of sRNAs from HeLa18–30(XRN1/2), HeLa30–100(XRN1/2), HeLa18–30(RRP40) and HeLa30–100(RRP40) libraries map to exons, compared to the HeLa18–30 and HeLa30–100 libraries, in line with a contribution of both 5′–3′ and 3′–5′ exonucleolysis to mRNA decay (Fig. 2a, middle; Supplementary Fig. 3). We speculate that the decrease in sRNA 5′ ends at the −1 and +1 positions has a technical origin and may be due to an adaptor ligation bias toward these nucleotides, as previously reported19.

Figure 2: sRNAs around splice sites.

(a) Density of unique sRNA 5′ ends around exon-intron, exon-exon and intron-exon junctions. The y axes show the number of unique sRNA 5′ ends per million tags of the indicated libraries, counting only the 5′ nucleotide of each RNA. These values are broken up by sizes of the sRNAs (indicated by color) and represented as stacked bar plots. Dashed lines indicate the locations of exon-intron (left), exon-exon (middle) and intron-exon junctions (right). Negative (upstream) and positive (downstream) x axis coordinates are given relative to these locations. Depletion of sRNA 5′ ends just upstream of exon-intron junctions (left) is due to sRNAs partially mapping to the next exon. sRNA 5′ ends positioned just a few nucleotides upstream of the 3′ splice site are due to mapping ambiguities where the 3′ end of the intron is highly similar to the 3′ end of the preceding exon. (b) Model for the production of SSa RNAs. Top of figure schematically shows sRNAs found in this study mapping around exon-intron and intron-exon junctions. Exonic 5′ SSa RNAs are suggested to derive from the decay of upstream exons failing to undergo the second step of splicing (left), whereas intronic sRNAs are suggested to constitute intermediates from the decay of spliced-out introns (right).

One specific exonic feature, which stood out from this general 'background layer' of sRNA, was the presence of 5′ splice site–associated (5′SSa) RNAs whose 3′ ends aligned with the exon-intron border (Fig. 2a, left and middle, and Supplementary Fig. 4a,b), akin to the previously reported spliRNAs16. However, exonic 5′SSa RNAs were detectable independent of RNA size even when considering the RNAs that were 12- to 20-nt and 30- to 100-nt long and thus collectively show a staggered appearance of their 5′ ends (Fig. 2a, left and middle, and Supplementary Fig. 4a,b). The distinct RNA 3′ end positions and the fact that these sRNAs have lengths ranging from 12 to >36 nt, preferentially <15 nt, are difficult to reconcile with the idea that RNAPII backtracking followed by nascent transcript cleavage is the source of their production16. This is because TFIIS-induced relief of backtracked RNAPII typically liberates RNA products up to 9 nt in length and almost never exceeding 14 nt in length20,21,22 (see also Discussion). Instead, the observed mapping pattern is more consistent with 5′–3′ exonucleolytic trimming of exons liberated from their downstream introns, possibly as the result of a failure to undergo the second step of splicing (Fig. 2b, left). Thus, 5′SSa RNAs may be signature molecules of a rate-limiting step in the degradation process, for example, protection by leftover components of the splicing machinery, which could result in the enrichment of certain sRNAs; namely, 18 nt in the HeLa18–30 library and 16–17 nt in the HeLa12–20(N) library (Fig. 2a). Unexpectedly, the 5′SSa 18-nt species is more abundant in the total RNA preparation (HeLa18–30) (Fig. 2a) than in the nuclear fraction (HeLa18–30(N)) (Supplementary Fig. 3b), possibly indicating the export of a fraction of these species into the cytoplasm.

The sRNAs whose termini align to exon-intron borders are also enriched at intronic 5′- and 3′ ends (Fig. 2a, left and right; Supplementary Figs. 3b and 4). Those sRNAs mapping to the 5′ end of introns show 3′ end staggering (Fig. 2a, left, and Supplementary Fig. 4a,b). Conversely, RNAs at intron 3′ ends have staggered 5′ ends (Fig. 2a, right, and Supplementary Fig. 4c,d). Again, these profiles are most compatible with a scenario where introns are degraded exo- and/or endonucleolytically and where the final removal of intron termini provides a rate-limiting step (Fig. 2b, right). Tag densities of these RNAs are increased at both intron ends in the HeLa18–30(XRN1/2) and HeLa30–100(XRN1/2) libraries (Fig. 2a, right and middle, and Supplementary Fig. 4). Thus, with some redundancy provided by 3′–5′ exonucleolysis, intronic SSa RNAs and their precursors appear to be primarily removed by 5′–3′ degradation.


As observed in several previous studies14,15,16, HeLa cells also accumulate TSSa sRNAs that map in both sense and antisense directions with respect to the gene (Fig. 3a). As many promoters have an array of different start sites ('broad promoters') as opposed to a single, predominant TSS ('sharp promoters')23, we subdivided promoters into these two categories according to their TSS distributions as defined by cap-selected RNA 5′ ends (CAGE) data24, and we plotted 3′ end reads for the 18- to 30-nt–sized libraries (Fig. 3a and Supplementary Fig. 5). Broad promoters show a wider distribution of sense TSSa RNAs with the peak of sRNA 3′ ends located between 30 nt and 40 nt downstream of the TSS. Moreover, these promoters are associated with antisense TSSa RNAs whose 3′ ends show an even wider distribution, mapping 150–200 nt upstream of the TSS. Conversely, sharp promoters create a fairly narrow average sense TSSa RNA 3′ end peak at +38–39 nt relative to the TSS, although some 3′ end staggering is also evident (Fig. 3a). Notably, relative levels of antisense TSSa RNA signals from the −250 to −50 region were decreased by a factor of ~1.32 for sharp compared to broad promoters, whereas they increased by a factor of ~1.36 in the sense direction from the +1 to +100 region (Fig. 3a). Both changes are statistically significant (P << 0.001, exact binomial test). Thus, assuming that these RNAs are indicative of RNAPII transcription mechanisms, it appears that sharp promoters provide more accurate directionality, as also previously suggested24.
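For readers unfamiliar with the exact binomial test mentioned above, the following Python sketch shows one way such a test can be computed. How the counts were arranged for the test is not stated in the text, so the setup used here (antisense reads at sharp promoters tested against the antisense fraction at broad promoters) and all numbers are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_p(k, n, p):
    """Two-sided exact binomial P value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Plain floats are used, so this is only suitable
    for modest n (roughly n < 1000).
    """
    observed = binom_pmf(k, n, p)
    tol = 1 + 1e-7  # tolerate float noise when comparing tied outcomes
    total = sum(
        pmf for i in range(n + 1) if (pmf := binom_pmf(i, n, p)) <= observed * tol
    )
    return min(1.0, total)


# Hypothetical counts, for illustration only (not taken from this study).
antisense_sharp, total_sharp = 30, 100
antisense_fraction_broad = 0.45
p_value = exact_binomial_p(antisense_sharp, total_sharp, antisense_fraction_broad)
```

With library-scale read counts a statistics package would be preferable, because the float arithmetic used here overflows for large n.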

Figure 3: sRNA density around TSSs is dependent on promoter type and sRNA length.

(a) sRNA density around TSSs of broad and sharp promoters. sRNAs mapping to the sense (+) and antisense (−) strand, with respect to the associated gene, are colored blue and green, respectively. The y axis shows the number of unique sRNA 3′ ends per million tags and promoter. The x axes show their positions relative to the most prominent TSS. (b) Distribution of total sRNA tag per million counts in the +10 to +50 region relative to the TSS of the indicated libraries and as a function of sRNA lengths. (c) Distribution of unique TSSa sRNA 3′ ends per million tags and promoters from the HeLa12–20(N) library. Multi- and single-mapping sRNAs are plotted separately as in a and broken down by sRNA sizes as indicated.

Like exonic 5′ SSa RNAs, TSSa sRNAs have been proposed to be byproducts of RNAPII backtracking3,4,15,16. When we plotted the size distribution of TSSa sRNAs from HeLa18–30 libraries, we found that all sizes could be detected, albeit with a demonstrated preference for RNAs that were <22-nt long and, most prominently, 20-nt long (Fig. 3b). However, when we considered HeLa12–20(N) library reads that map perfectly to either a single or to multiple locations of the genome, sense TSSa RNAs >16 nt in length were clearly enriched immediately downstream of the TSS, whereas smaller RNAs showed a low and uniform density over the region, without any strand preference (Fig. 3c). This lack of enrichment of RNAs <17 nt in length was not due to mapping ambiguities caused by the short length of the sequence tags, as we reliably detected aggregations of similarly sized RNAs around splice sites (Fig. 2a). Thus, the size of TSSa RNAs is restricted to a length of ≥17 nt. As for exonic 5′SSa RNAs, this size range conflicts with the notion that these molecules are liberated as a result of RNAPII backtracking20,21,22, and we therefore considered alternative mechanisms for their origin.

In both Drosophila melanogaster and human cells, a substantial number of genes harbor stalled RNAPII immediately downstream of their TSSs22,25,26,27,28,29. To analyze the relationship between emission of TSSa RNAs and RNAPII positioning, we focused on genes that have one or more sRNA 3′ ends positioned exactly at +38 downstream of 'sharp' CAGE-defined promoters (the peak in Fig. 3a). For all four HeLa18–30 libraries tested, a markedly tight overlap between TSSa RNA 3′ end position and the position of the center of RNAPII as determined by chromatin immunoprecipitation (ChIP)30 was obtained (Fig. 4a and Supplementary Fig. 6a). Moreover, RNAPII levels appear to increase with the number of sRNAs. Taken together, these results strongly suggest that RNAPII and TSSa RNA 3′ ends are positioned as illustrated in Figure 4b. As the RNA residing inside the RNAPII complex is ~17–20 nt in length31, much like the average size of TSSa RNAs, an appealing model is that these transcripts are remnants of the decay of RNAs partly protected by stalled RNAPII complexes failing to resume transcription elongation. If such degradation is caused by 5′–3′ exonucleolysis, it is expected that XRN1/2 depletion would result in a higher proportion of intact RNAs >30-nt long, whose 5′ ends would map to the TSS, and fewer RNAs < 30-nt long ending approximately at position +38. Indeed, this is what we observed when plotting the density of RNA 3′ ends from the HeLa18–30 and HeLa18–30(XRN1/2) libraries and RNA 5′ ends from the HeLa30–100 and HeLa30–100(XRN1/2) libraries within TSS regions of genes having at least one HeLa18–30 sRNA 3′ end between positions +36 and +40 (Fig. 4c and Supplementary Fig. 6b,c). Although not conclusive, this combined pattern suggests that 5′–3′ exonucleolysis is a mechanism for the creation of TSSa RNAs (Fig. 4b).
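The promoter selection and ChIP averaging described above can be sketched as follows. The data structures (dictionaries keyed by promoter and by offset from the TSS), the function names and the toy values are assumptions for illustration and do not reproduce the actual analysis pipeline.

```python
def promoters_with_peak(three_prime_ends, position=38, min_reads=1):
    """Select promoters with at least `min_reads` sRNA 3' ends at `position`.

    `three_prime_ends` maps promoter id -> {offset from TSS: read count}.
    """
    return [
        pid
        for pid, ends in three_prime_ends.items()
        if ends.get(position, 0) >= min_reads
    ]


def mean_chip_profile(chip, promoters, offsets):
    """Average ChIP tag count per offset over the selected promoters.

    `chip` maps promoter id -> {offset from TSS: ChIP tag count}.
    """
    if not promoters:
        return {offset: 0.0 for offset in offsets}
    return {
        offset: sum(chip[pid].get(offset, 0) for pid in promoters) / len(promoters)
        for offset in offsets
    }


# Toy data, hypothetical values only.
three_prime_ends = {"geneA": {38: 3}, "geneB": {38: 1, 25: 2}, "geneC": {60: 4}}
chip = {
    "geneA": {30: 2, 38: 8, 46: 3},
    "geneB": {30: 1, 38: 4, 46: 1},
    "geneC": {30: 5, 38: 5, 46: 5},
}
offsets = [30, 38, 46]
for threshold in (1, 2):
    selected = promoters_with_peak(three_prime_ends, min_reads=threshold)
    print(threshold, selected, mean_chip_profile(chip, selected, offsets))
```

Raising `min_reads` mirrors the stratification by number of sRNA 3′ ends at +38, so that average RNAPII profiles can be compared between promoter groups.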

Figure 4: TSSa RNAs are likely produced by protection of nascent RNA by stalled RNAPII.

(a) Relative positioning of RNAPII and TSSa RNA 3′ ends from sharp promoters harboring one or more sRNA 3′ ends at position +38 relative to the TSS (the peak in Fig. 3a). The y axes show the average number of RNAPII ChIP tags. The x axes show ChIP signal positions, with dashed lines indicating the +38 position. The colored lines indicate how many sRNA 3′ ends at position +38 were required for the promoter to be included in the analysis (see also Supplementary Fig. 6). (b) Model for the production of TSSa RNAs. Transcriptional stalling may in some cases trigger nucleolytic degradation of the nascent RNA 5′ end extruding from RNAPII (see text for details). (c) XRN1/2-dependent production of TSSa RNA. The number of sRNAs per million (counting up to 10 unique sRNAs per position; see Methods) is plotted on the y axis as a function of position relative to the TSS (x axes). Plots on the left show densities of RNA 5′ ends from the HeLa30–100 and HeLa30–100(XRN1/2) libraries, and plots on the right show densities of RNA 3′ ends from the HeLa18–30 and HeLa18–30(XRN1/2) libraries.

Identification of human tailed mirtrons

Although exonic sRNAs associated with TSSs and 5′SSs are generally not bound by AGO1/2 proteins, we observed one peak enriched in the AGO1/2 immunoprecipitate whose sRNA 5′ ends align with the 5′ ends of introns (Fig. 5a, left) and two additional peaks close to intron 3′SSs (Fig. 5a, right). All these sRNAs were enriched for molecules 20–24 nt in length, with 22 nt, the average size of miRNAs9, being the most prominent (Fig. 5b). The sharp 3′SS proximal sRNA peak was positioned such that the 3′ ends of its reads precisely coincided with the intron-exon junction (Fig. 5c and Supplementary Fig. 7). In cases where the read extended across the 3′SS, the additional nucleotides were most often nontemplated additions (Supplementary Fig. 8). The 5′ ends of the upstream, broader peak were centered ~60 nt from the 3′SSs, a distance typical of the size of a precursor miRNA (pre-miRNA).

Figure 5: Human tailed mirtrons.

(a) sRNAs at intron termini are enriched in AGO1/2 immunoprecipitates. Numbers of unique sRNA 5′ ends from the HeLa18–30(AGO1/2) and the HeLa18–30 libraries are shown relative to the 5′ splice site (left) or 3′ splice site (right) regions. 3′ ends of the latter class coincide with the intron-exon junction. Dashed lines indicate splice junctions. (b) Size distribution of 3′ SSa sRNAs. The y axes show the total number of sRNAs per million tags in the −18 to −30 and −50 to −80 regions from the indicated libraries. The x axes show sRNA lengths. (c) University of California, Santa Cruz (UCSC) genome browser view of a putative 5′ tailed mirtron in the EEF1G gene. Only sRNA reads from the HeLa18–30 and HeLa18–30(AGO1/2) libraries are shown. Unique sRNAs are colored according to how many times they were sequenced; see color legend on the left. Shown at the bottom is the exon structure using UCSC gene annotation and the degree of conservation between 28 vertebrate species (Phastcons conservation). The predicted secondary structure with the nucleotides corresponding to the most frequent reads in red boxes is shown to the right. 'BP1' and 'BP2' point to two consensus branch-point sequences. (d) Genome browser view of a putative 3′ tailed mirtron in the human KHSRP gene. Conventions are as in c. (e) Validation of expression of both arms of the EEF1G-derived mirtron shown in c by splinted ligation of HeLa cell total RNA. Bands correspond to sRNA ~22 nt in length (~36 nt ligation products). '3p' and '5p' denote the 3′ arm and 5′ arm of the EEF1G-derived mirtron, respectively. 'No RNA controls' are reactions in which RNA was omitted.

These observed patterns of putative miRNA 5′ or 3′ ends aligning with exon-intron or intron-exon junctions are reminiscent of a mirtron biogenesis pathway, best known from D. melanogaster32,33, where the pre-miRNA is generated by pre-mRNA splicing instead of processing by Drosha. We therefore searched our data for candidate human mirtron genes. To this end, we subjected 289 mirtron candidates to a set of criteria (see Methods); most importantly, AGO1/2 protein-association as defined by their presence in the HeLa18–30(AGO1/2) or HeLa18–30(AGO1/2/RRP40) libraries, the propensity of the predicted pre-miRNA to fold into an RNA hairpin, and alignment of sRNA termini with at least one of the SSs. This analysis resulted in 37 newly revealed, confidently annotated putative mirtrons (Supplementary Table 2). Unexpectedly, only one of these, located in the ZYX gene, had both hairpin ends at the intron junctions. Inspection of the remaining candidates (see Fig. 5c,d and Supplementary Fig. 7 for examples), revealed a read signature characteristic of so-called tailed mirtrons, where only one pre-miRNA end is defined by splicing, and the other is processed by removal of the flanking tail as recently reported in D. melanogaster34. However, unlike in flies, where all known tailed pre-mirtrons bear 3′ extensions, we only identified two such examples (Fig. 5d and Supplementary Table 2). The remaining ones carried tails at their 5′ ends (Fig. 5c and Supplementary Table 2). The sRNA read pattern from a 5′ tailed Mus musculus mirtron35 (mmu-miR-1982) is very similar to the patterns found here, arguing that their biogenesis is also similar. Importantly, mmu-miR-1982 sRNA reads are depleted in Dicer, but not Drosha, knockout cells, demonstrating Dicer-dependent, Drosha-independent biogenesis35. We used a splinted ligation technique36 to validate expression of a representative 5′ tailed mirtron candidate in the EEF1G gene (Fig. 5e). 
Furthermore, we reclassified seven previously annotated human miRNAs as five 5′-tailed and two 3′-tailed mirtrons. Finally, a substantial fraction of the identified 5′-tailed mirtron reads carry 3′ non-templated A and/or U additions (Supplementary Fig. 8 and data not shown), a feature that is also frequently found on canonical microRNAs37.
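The distinction between conventional, 5′-tailed and 3′-tailed mirtrons comes down to which hairpin end coincides with a splice site. A minimal Python sketch of that one criterion is given here; it deliberately ignores the other criteria applied to the candidates (AGO1/2 association and hairpin folding), and the coordinate convention, function name and `tolerance` parameter are assumptions for illustration.

```python
def classify_hairpin(intron_length, hairpin_start, hairpin_end, tolerance=0):
    """Classify a hairpin by how its ends align with the host intron.

    Coordinates are 0-based positions within the intron, in transcript
    orientation: position 0 is the first intronic nucleotide (5' splice
    site side) and intron_length - 1 is the last (3' splice site side).
    """
    at_5ss = hairpin_start <= tolerance
    at_3ss = (intron_length - 1) - hairpin_end <= tolerance
    if at_5ss and at_3ss:
        return "conventional mirtron"
    if at_3ss:
        return "5'-tailed mirtron"  # tail lies between the 5'SS and the hairpin
    if at_5ss:
        return "3'-tailed mirtron"  # tail lies between the hairpin and the 3'SS
    return "no splice-site alignment"


# Hypothetical coordinates, for illustration only.
print(classify_hairpin(80, 0, 79))      # both ends at splice sites
print(classify_hairpin(300, 235, 299))  # only the 3' end at the 3'SS
print(classify_hairpin(200, 0, 62))     # only the 5' end at the 5'SS
```

A nonzero `tolerance` would allow for small offsets between read termini and annotated splice sites, for example those caused by nontemplated 3′ additions.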

The biogenesis pathway of the mature AGO1/2-bound miRNAs from these loci is unlikely to follow the exosome-dependent route described in flies34. Perhaps reflecting the requirement that 5′ exonucleases with limited processivity remove their tails, human introns harboring 5′-tailed mirtrons are markedly shorter than the overall average (1,069 nt versus 6,150 nt; Supplementary Table 2).

Argonaute-associated sRNAs derived from mRNA termini

We also found an enrichment of AGO1/2-associated sRNAs in 3′ untranslated regions (UTRs) compared to upstream protein-coding exons (Fig. 6a). Within 3′ UTRs, AGO1/2-associated sRNAs 22–24 nt in length (Fig. 6b) particularly cluster close to the mRNA 3′ end (Fig. 6c, compare non-AGO1/2 immunoprecipitate libraries (HeLa18–30) to HeLa18–30(AGO1/2) libraries). Visual inspection of selected loci confirmed the presence of miRNA-sized sRNAs whose 3′ ends aligned with the annotated polyadenylation sites, suggesting either the involvement of the canonical pre-mRNA 3′ end cleavage machinery in their biogenesis, or 3′ nucleolytic trimming of polyadenylated transcripts. We refer to these sRNAs as transcription termination site–associated (TTSa) RNAs. The most prominent example was found at the end of the RPL5 gene (Fig. 6d). Additional examples are shown in Supplementary Figure 9. Again, splinted ligation36 was used to validate the presence of TTSa RNAs of the expected size originating from the RPL5 locus. Importantly, the signal was enriched in the AGO1/2 immunoprecipitate material (Fig. 6e). The regions surrounding the AGO1/2-associated TTSa RNAs have poor potential to form secondary structures (data not shown), and we did not find evidence for molecules corresponding to the respective passenger strand10. Therefore, these sRNAs are probably not generated by the canonical miRNA biogenesis pathway.
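The comparison of 3′ UTR and coding-exon sRNA counts (Fig. 6a) can be sketched in Python as a simple ratio, computed either on unique reads or on all mapped reads. The pre-annotated input format and the absence of any length normalization are assumptions of this sketch, not a description of how the figure was produced.

```python
from collections import Counter


def utr_to_cds_ratio(reads, unique=True):
    """Ratio of sRNAs mapping to 3' UTRs versus coding exons (CDS).

    `reads` is a list of (sequence, start, region) tuples, where region is
    "3UTR" or "CDS" (a hypothetical pre-annotated input). With
    unique=True each distinct (sequence, start, region) is counted once;
    otherwise every sequenced copy is counted.
    """
    items = set(reads) if unique else reads
    counts = Counter(region for _, _, region in items)
    if counts["CDS"] == 0:
        raise ValueError("no CDS reads; ratio is undefined")
    return counts["3UTR"] / counts["CDS"]


# Toy reads, hypothetical values only.
reads = [
    ("ACGT", 10, "3UTR"),
    ("ACGT", 10, "3UTR"),  # same read sequenced again
    ("ACGT", 10, "3UTR"),
    ("TTGA", 55, "3UTR"),
    ("GGCA", 7, "CDS"),
    ("CCAT", 90, "CDS"),
]
print(utr_to_cds_ratio(reads, unique=True))   # 2 unique 3' UTR reads / 2 CDS reads
print(utr_to_cds_ratio(reads, unique=False))  # 4 mapped 3' UTR reads / 2 CDS reads
```

Reporting both ratios separates a broad enrichment across many distinct sRNAs from one driven by a few highly sequenced species.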

Figure 6: AGO1/2-associating sRNAs located in 3′ UTRs.

(a) Ratio of unique (left) or all mapped (right) sRNAs in 3′ UTRs versus protein-coding exons (CDS) in the indicated libraries. (b) sRNA size distribution near gene termini. The y axes show the total number of sRNAs from the indicated libraries whose 5′ ends map within the −(35–18) region upstream of the 3′ UTR end, normalized by library size. The x axes show sRNA lengths. (c) Density of unique sRNAs within 3′ UTRs. The y axes show the number of unique sRNAs normalized by library size. The x axes show the location within the last 50% of annotated unique spliced 3′ UTRs, where all UTRs are normalized to the same length. (d) Genome browser view of HeLa18–30(AGO1/2) and HeLa18–30(AGO1/2/RRP40) sRNA reads mapping to the RPL5 gene. Unique RNAs are colored (legend on the left) according to how many times they were sequenced. The canonical 'A(A or U)UAAA' polyadenylation signal sequence and the G- and U-rich sequence downstream of the cleavage site (vertical red arrow) are boxed. Also shown is the track for experimentally verified (green bar) and predicted (purple bar) polyadenylation sites. (e) Detection of RPL5-derived sRNAs by splinted ligation in total HeLa- or Ago-immunoprecipitated RNA. The miR-21 band (positive control) corresponds to an sRNA ~22 nt in length (~36 nt ligation product), and the RPL5-derived band corresponds to an sRNA ~24 nt in length (~38 nt ligation product).


In recent years a multitude of previously unknown eukaryotic ncRNAs have been exposed. As most of these discoveries are not directed by genetic analyses, there is a growing need to sort these molecules by their modes of biogenesis and putative function. Here, we have focused on RNA species <100 nt in size originating from within and in close proximity to protein-coding genes. To get a complete view of the general origin of these sRNAs, we mapped library reads to both genomic and cDNA (mature mRNA) sequence information after filtering the data against highly expressed RNA species that would otherwise obscure any generic features. In all libraries investigated, sequence reads were derived more frequently from exonic than intronic regions (Fig. 1b), and read density was typically constant over exon-exon boundaries (Fig. 2a), indicating that these sRNAs derived from the degradation of mature mRNA. Moreover, we suggest that the prominent sRNA peaks not associated with AGO1/2, near the 3′ end of exons and at both intron termini, also originate from RNA decay. This is because one end of these reads generally aligns with exon-intron or intron-exon junctions, whereas the other appears staggered, creating multiple sRNA lengths of 12–100 nt that are consistent with exonucleolysis. Such a pattern most likely stems from degradation intermediates of 5′ exons that failed to complete splicing as well as the 3′–5′ and/or 5′–3′ removal of excised and debranched introns (Fig. 2b). The preferred size of exonic 5′SSa RNAs from the HeLa18–30 library is 18 nt, which was previously suggested to be a conserved feature of these molecules16. However, in the HeLa12–20(N) library, similarly positioned sRNAs of 12–17 nt in length with their 3′ ends aligned to the exon-intron border are detected at high density compared to flanking regions. 
We suggest that this represents a constraint to RNA decay, either as a result of the intrinsic properties of the responsible degradation enzyme(s) or by obstructing RNA binding proteins, possibly by splicing factors that remain associated with splicing intermediates. Similarly, we propose that the sRNA peaks positioned at the 5′ and 3′ ends of introns result from the same kind of rate-limiting steps of complete intron removal (Fig. 2b).

Because of the presence of 5′-monophosphate and 3′-hydroxyl groups, TSSa sRNAs have been proposed to arise from endocleavage of nascent RNA 3′ ends extruding from the RNAPII exit channel following backtracking away from impediments in transcription3,4,15,16. According to this model, realignment of the RNA 3′ end with the RNAPII active site would require TFIIS, which triggers an RNA cleavage activity intrinsic to RNAPII. Backtracking has been studied in vitro20,21 and in D. melanogaster S2 cells in vivo22, and in both systems the majority of TFIIS-dependent liberated RNA fragments were found to be in the size range of 4–14 nt. This appears to be incompatible with observations in this study that TSSa RNAs are predominantly ≥17 nt long (Fig. 3c). Rather, the minimal TSSa RNA length of 17 nt fits very well with the size of the nascent RNA residing inside, and presumably protected by, the RNAPII complex31 (Fig. 4b). In line with this idea, we find a strong correlation between the position of TSSa RNA 3′ ends and the center of RNAPII as defined by its ChIP sequencing (ChIP-Seq) peak (Fig. 4a), suggesting that this molecular arrangement indeed takes place in vivo. Data from D. melanogaster S2 cells have shown that the RNAPII ChIP-Seq peak corresponds to the position of RNAPII on the DNA template after backtracking22,26,38, making it further unlikely that TSSa production results from TFIIS-induced endocleavage, as TSSa RNA 3′ ends and the catalytic center of RNAPII would then have to be offset by ~20 nt relative to each other. We instead suggest that TSSa RNAs arise as a result of unsuccessful transcription elongation events, after RNAPII stalling at, for example, the +1 nucleosome. Notably, this does not rule out that backtracking-mediated TFIIS cleavage also occurs, generating sRNAs too short for our libraries to capture. Moreover, our data suggest that 5′–3′ exonucleolysis may contribute to TSSa RNA production (Fig. 4b,c). 
One intriguing possibility, therefore, is that an early transcriptional 'checkpoint' is associated with RNAPII stalling to discard transcription complexes erroneously engaged in the elongation of uncapped RNAs, a phenomenon previously reported in Saccharomyces cerevisiae39.

mRNA processing generates diverse miRNA-class small RNAs

First discovered in D. melanogaster and the nematode Caenorhabditis elegans, mirtrons are a class of short introns that are spliced and debranched to form pre-miRNA mimics, which bypass Drosha processing and directly undergo Dicer cleavage and incorporation into silencing complexes32,33. Computational methods and high-throughput sequencing later suggested the presence of mirtrons in vertebrates ranging from Gallus gallus (chicken) to humans35,40,41,42,43. In contrast to conventional mirtrons, where the ends of the hairpin coincide precisely with both splice sites, only the 5′ end of the hairpin at the D. melanogaster locus mir-1017 coincides with the 5′SS. To allow Dicer cleavage, the tail separating the pre-miRNA hairpin from the 3′SS first needs to be trimmed by the exosome34.

Here, we identify 36 tailed human mirtrons (Supplementary Table 2). Because mirtrons have more lenient secondary structure requirements compared to classical miRNAs, often tolerating an extended stem or an increased size of the terminal loop44, this number is probably underestimated. It thus appears that tailed mirtrons constitute an underappreciated subgroup of human miRNAs. Despite the fact that a few identified candidates have already been annotated in miRBase45, none of the affiliated papers classify them as tailed mirtrons. Notably, in 34 out of 36 cases, the 3′ end of the proposed pre-miRNA coincides with the 3′SS, whereas the 5′ end is separated from the 5′SS by a tail of variable length. Recent studies in murine35, avian41 and bovine42 cells have identified a total of nine tailed mirtrons, all of which are 5′ tailed, suggesting that the 5′ tail preference is conserved in vertebrates. How this 5′ tail is removed before Dicer processing remains unknown. Possible mechanisms include 5′–3′ exonucleolysis or endonucleolysis. The putative mirtrons are evolutionarily poorly conserved, even among mammalian genomes. Lack of conservation of mirtrons has also been observed between closely related Drosophila species46, confirming that mirtrons evolve more rapidly than canonical miRNAs.

Ultimately, the function of these newly discovered species of AGO1/2-associated RNAs remains enigmatic. They might operate like canonical miRNAs by regulating gene expression in trans. Alternatively, their location, in particular the overlap of the precursor with the branch point and the polypyrimidine tract—two important splicing elements—positions them to putatively influence splicing of their host introns in cis. Indeed, intronic sequences with secondary structures reminiscent of mirtrons have been shown to be involved in alternative splicing47. Conversely, splicing regulators might affect the efficiency of mirtron production. Notably, the majority of the candidate 5′ tailed mirtrons fold into hairpins, such that the splicing branch-point consensus sequences ('YUNAY')48 of the host intron fall into the loop region between the two arms (Fig. 5c and Supplementary Fig. 7), which is often targeted by regulators of miRNA biogenesis, including splicing factors such as K(H)SRP, ASF/SF2 and hnRNPA7.

We also identified another rare class of sRNAs (TTSa RNAs) that was enriched by AGO1/2 immunoprecipitation. Notably, the 3′ ends of many of these ~23-nt long species align with the polyadenylation tail addition site of annotated genes (Fig. 6d and Supplementary Fig. 9), suggesting that mRNA 3′ end processing is part of their biogenesis pathway. The absence of any hairpin potential also strongly suggests that TTSa RNAs are generated by a mechanism distinct from known microRNA maturation. It is noteworthy that overlap analysis of TTSa RNAs from our AGO1/2 libraries and the recently discovered aTASRs, which run antisense to the very 3′ end of annotated transcripts17, shows that 73% of mRNA 3′ ends with at least one TTSa RNA read overlap with an aTASR (Fig. 6d, Supplementary Fig. 9 and data not shown). Thus, a TTSa RNA biogenesis pathway involving RNA–RNA pairing between an aTASR and the mRNA 3′ end is one possibility. We also found several selected examples of 3′ UTR-derived AGO1/2-associated sRNAs that do not map to polyadenylation sites (data not shown). These may be similar to the Schizosaccharomyces pombe49 primal RNAs (priRNAs) and/or PIWI RNAs (piRNAs) found in flies and mammals50,51,52,53,54, because both of these classes are suggested to be processed from single-stranded host mRNA molecules. Our finding therefore points to these genic regions as conserved sources of sRNAs capable of interacting with a broad spectrum of AGO family proteins.

New sequencing technologies are identifying RNAs at a fast pace, creating an increasing gap between sRNA identification and characterization of their function and biogenesis. We used AGO1/2 association to suggest the function of sRNAs originating from human genic regions and found that such molecules are derived from introns and 3′ UTRs. Future analyses will reveal whether they operate as bona fide miRNAs and/or interrelate with the processing reactions from which they may derive. Although TSSa and SSa RNAs are not bound by AGO1/2 proteins, a putative function cannot be readily dismissed. However, we note for now that these sRNAs are probably signature molecules that reveal mechanistic features of eukaryotic transcription and splicing.


Cell culture, RNAi, RNA preparation and western blot analysis.

HeLa cells were grown in DMEM GlutaMAX medium (Invitrogen) supplemented with 10% (v/v) fetal bovine serum. Transfections were done with 20 nM siRNA for 3 d and repeated for another 3 d, each time using Lipofectamine 2000 as transfecting agent according to the manufacturer's instructions (Invitrogen). XRN1, XRN2, RRP40 and control (eGFP) siRNA sequences were as previously described55. Total RNA was extracted using TRIzol reagent (Invitrogen) according to the manufacturer's instructions. RNA was subjected to DNase I treatment, repurified by phenol-chloroform extraction and reprecipitated. Western blot analyses were carried out according to standard procedures and developed by enhanced chemiluminescence (ECL Plus; GE Healthcare). Polyclonal anti-XRN2 antibodies (A301-102A-1) were purchased from Bethyl Laboratories. Polyclonal anti-XRN1 and anti-RRP40 antibodies were gifts from J. Lykke-Andersen and G. J. Pruijn, respectively. Monoclonal anti-hnRNPc and anti-ADAR1 antibodies were gifts from D. L. Black and K. Nishikura, respectively.

AGO1/2 immunoprecipitations.

Monoclonal antibodies for human AGO1 (4B8)56 and AGO2 (11A9)57 were coupled to protein G–Sepharose beads overnight at 4 °C. Beads were washed once with PBS buffer and twice with lysis buffer (25 mM Tris-HCl, pH 7.4, 150 mM KCl, 0.5% (v/v) NP-40, 2 mM EDTA, 1 mM NaF). HeLa cell pellets (300 mg each) were lysed in ten volumes of lysis buffer for 20 min on ice. Lysates were cleared by centrifugation at 17,000g for 30 min before adding them to the beads. After 4 h of rotation at 4 °C, the beads were washed three times with wash buffer (300 mM NaCl, 50 mM Tris-HCl, pH 7.4, 1 mM MgCl2 and 0.1% (v/v) NP-40) and once with PBS. RNA was recovered by proteinase K digestion, followed by acidic phenol extraction and ethanol precipitation. Immunoprecipitation efficiency was assessed by carrying out western blot analysis on a fraction of the protein recovered on the beads (Supplementary Fig. 1c).

Library construction and sequencing.

Library construction and sequencing were carried out as a paid service by the Beijing Genome Institute (BGI). In brief, RNA of the desired size was isolated from polyacrylamide gels and ligated to 3′ and 5′ adaptor oligonucleotides. Ligation products were purified, reverse transcribed and PCR amplified. Sequences of the adaptors and primers are as published (Illumina). Samples were sequenced on an Illumina 2G Genome Analyzer.

Splinted ligation.

Small RNA detection by splinted ligation was carried out essentially as described36. The following oligonucleotides were used: ligation oligonucleotide: 5′-CGCTTATGACATT-3ddC-3′ (where '3ddC' denotes a 2′,3′-dideoxycytidine residue); bridging oligonucleotides: RPL5: 5′-GAATGTCATAAGCGGCTGTTCATAAGTTTATTATCTAT-3′; EEF1G 3p: 5′-GAATGTCATAAGCGGCTGGTGCAGAGGAAGGCAGGAAA-3′; EEF1G 5p: 5′-GAATGTCATAAGCGTGTTCTGCCTCTTTCCACACCCCT-3′.
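The relationship between the oligonucleotides listed above can be checked with a short script. The sketch below assumes the usual splinted-ligation design, in which the 5′ part of each bridging oligonucleotide pairs with the ligation oligonucleotide and the remainder pairs with the target sRNA; this design is an assumption for illustration, as are the function and variable names. The terminal 3ddC is written as a plain C, and inferred targets are given in DNA letters.

```python
# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Ligation oligonucleotide from the text; the terminal 3ddC is written as C.
LIGATION_OLIGO = "CGCTTATGACATTC"

# Bridging oligonucleotides from the text (5'->3').
BRIDGES = {
    "RPL5": "GAATGTCATAAGCGGCTGTTCATAAGTTTATTATCTAT",
    "EEF1G 3p": "GAATGTCATAAGCGGCTGGTGCAGAGGAAGGCAGGAAA",
    "EEF1G 5p": "GAATGTCATAAGCGTGTTCTGCCTCTTTCCACACCCCT",
}


def inferred_target(bridge, ligation_oligo=LIGATION_OLIGO):
    """Infer the sRNA (in DNA letters) that a bridge would capture.

    Assumption: bridge = revcomp(ligation oligo) + revcomp(target sRNA).
    """
    anchor = reverse_complement(ligation_oligo)
    if not bridge.startswith(anchor):
        raise ValueError("bridge does not pair with the ligation oligonucleotide")
    return reverse_complement(bridge[len(anchor):])


targets = {name: inferred_target(seq) for name, seq in BRIDGES.items()}
for name, target in targets.items():
    print(name, len(target), target)
```

Under this assumption, all three bridging oligonucleotides begin with the reverse complement of the ligation oligonucleotide and leave 24 nt available to pair with the sRNA.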

sRNA mapping.

We used Bowtie58 with default settings to map all >18-nt libraries to hg18. If a read had more than ten hits, only ten were retained and the rest were discarded. Reads were normalized for multiple alignments (number of reads per alignment), and unless otherwise specified (for example, single mapping), we used these normalized scores. For the 12-nt to 20-nt library, we required zero mismatches and used a threshold of 50 hits.
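The normalization for multiple alignments can be illustrated with a minimal sketch. It assumes that each read contributes a total weight of 1, split evenly over its retained alignments, and that hits beyond the cap are simply dropped; the function name, the tuple layout and the choice of which hits to keep are illustrative assumptions, not the pipeline used for the analysis.

```python
from collections import defaultdict


def normalize_alignments(alignments, max_hits=10):
    """Weight each alignment by 1 / (number of retained hits of its read).

    alignments: iterable of (read_id, chrom, pos, strand) tuples.
    Returns a dict mapping (chrom, pos, strand) to a normalized score.
    """
    hits_by_read = defaultdict(list)
    for read_id, chrom, pos, strand in alignments:
        hits_by_read[read_id].append((chrom, pos, strand))

    scores = defaultdict(float)
    for hits in hits_by_read.values():
        kept = hits[:max_hits]          # hits beyond the cap are discarded
        weight = 1.0 / len(kept)        # one read is shared over its hits
        for locus in kept:
            scores[locus] += weight
    return dict(scores)


example = [
    ("read1", "chr1", 100, "+"),   # single-mapping read
    ("read2", "chr1", 100, "+"),   # read with two hits
    ("read2", "chr2", 50, "-"),
]
print(normalize_alignments(example))
```

For the 12-nt to 20-nt library, the same function could be called with `max_hits=50` to mirror the higher threshold described above.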

Filtering and collapsing of reads.

Unless otherwise specified, we used the following conventions: we filtered reads overlapping repeats, RNA genes (wgRna, rnaGenes and rmsk from UCSC hg18) or the genes targeted in the knockdown. We collapsed sRNA reads so that identical reads contributed a count of 1, regardless of how many times they were sequenced.
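A minimal sketch of these two conventions is given below. It assumes half-open genomic intervals and a simple per-chromosome list of masked regions standing in for the UCSC tracks; the data layout and names are hypothetical, and a real analysis would use an interval index rather than a linear scan.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_and_collapse(reads, masked):
    """Drop reads overlapping masked regions, then collapse duplicates.

    reads:  iterable of (chrom, start, end, strand, sequence) tuples.
    masked: dict mapping chrom to a list of (start, end) intervals
            (stand-in for repeats, RNA genes and knockdown targets).
    Returns a sorted list in which each distinct read appears once.
    """
    kept = set()
    for chrom, start, end, strand, seq in reads:
        regions = masked.get(chrom, [])
        if any(overlaps(start, end, m_start, m_end) for m_start, m_end in regions):
            continue                      # filtered: overlaps a masked region
        kept.add((chrom, start, end, strand, seq))  # set membership collapses
    return sorted(kept)


reads = [
    ("chr1", 100, 122, "+", "ACGTACGTACGTACGTACGTAC"),
    ("chr1", 100, 122, "+", "ACGTACGTACGTACGTACGTAC"),  # identical read
    ("chr1", 500, 521, "-", "TTGCATTGCATTGCATTGCAT"),   # inside a repeat
]
masked = {"chr1": [(480, 600)]}
print(filter_and_collapse(reads, masked))
```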

RNA distributions around reference locations.

We used FANTOM3 human CAGE-tag clusters24 with >30 tags for defining TSSs unless otherwise specified. We selected the largest peak within each tag cluster as our reference location. We used GM-distance clustering59 to divide the tag clusters into single and broad peaks, similar to what has previously been done24. Collapsed single-mapping sRNAs were summed for each position around each TSS. For 12-nt to 20-nt long reads, we used both single- and multimapping reads.
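Summing sRNAs by position around reference locations can be sketched as follows. The example assumes that each read is represented by one end coordinate, that only reads on the same strand as the TSS are counted, and that offsets are measured in the direction of transcription; the window size and all names are illustrative assumptions.

```python
from collections import Counter


def profile_around_tss(tss_list, read_ends, window=100):
    """Sum read ends at each offset relative to a set of reference TSSs.

    tss_list:  list of (chrom, pos, strand) reference locations.
    read_ends: list of (chrom, pos, strand) read end coordinates.
    Returns a Counter mapping offset (positive = downstream) to count.
    """
    ends_by_strand = {}
    for chrom, pos, strand in read_ends:
        ends_by_strand.setdefault((chrom, strand), []).append(pos)

    profile = Counter()
    for chrom, tss, strand in tss_list:
        for pos in ends_by_strand.get((chrom, strand), []):
            # Flip the sign on the minus strand so that positive offsets
            # always point downstream of the TSS.
            offset = pos - tss if strand == "+" else tss - pos
            if -window <= offset <= window:
                profile[offset] += 1
    return profile


tss_list = [("chr1", 1000, "+"), ("chr1", 5000, "-")]
read_ends = [
    ("chr1", 1020, "+"),   # 20 nt downstream of the plus-strand TSS
    ("chr1", 4980, "-"),   # 20 nt downstream of the minus-strand TSS
    ("chr1", 1020, "-"),   # wrong strand for the first TSS, too far for the second
    ("chr1", 2000, "+"),   # outside the window
]
print(profile_around_tss(tss_list, read_ends))
```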

University of California, Santa Cruz (UCSC) gene annotations from assembly hg18 were used for TSSs, splice sites and 3′ ends. We only counted a unique location once even if it was shared by multiple isoforms. For each set, we counted the sum of unique single-mapping sRNAs as above. For RNAs 12–20 nt in length, we also used multimapping sRNAs. For assessing spliced mRNAs, all known genes in UCSC were spliced together, and we mapped sRNAs within the 3,000 nt upstream of the TSS to 1,000 nt downstream of the TTS of the spliced fragments. Only internal exons and perfect sRNA matches were used in the analysis. We normalized mapped counts by the total number of alignments. When assessing the distribution of sRNAs over introns or 3′ UTRs, we normalized individual regions by dividing by the total length of the region. For 3′ UTRs, this was done on spliced mRNAs.
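The length normalization of individual regions amounts to dividing a read count by the region length. The sketch below assumes half-open coordinates and one position per read; the region names and values are made up for illustration.

```python
def length_normalized_density(regions, read_positions):
    """Return reads per nucleotide for each region.

    regions:        dict mapping a region name to (chrom, start, end),
                    with half-open coordinates.
    read_positions: list of (chrom, pos) read coordinates.
    """
    density = {}
    for name, (chrom, start, end) in regions.items():
        count = sum(
            1 for r_chrom, pos in read_positions
            if r_chrom == chrom and start <= pos < end
        )
        density[name] = count / (end - start)   # normalize by region length
    return density


regions = {"intron_A": ("chr1", 0, 100), "intron_B": ("chr1", 200, 1200)}
read_positions = [("chr1", 10), ("chr1", 50), ("chr1", 300), ("chr1", 400)]
print(length_normalized_density(regions, read_positions))
```

Both hypothetical introns contain two reads, but the shorter one has a tenfold higher density after normalization.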

TTS overlap analysis.

We mapped collapsed reads to within 10 nt of unique TTSs and overlapped these with aTASR reads17. The P value was calculated with a Fisher's exact test using the number of TTSs with aTASRs, TTSa RNAs, none or both in the contingency table.
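The structure of this test can be illustrated with a small, dependency-free sketch. It implements a two-sided Fisher's exact test by summing hypergeometric probabilities of all tables that are no more likely than the observed one. The layout of the 2×2 table (both, TTSa RNA only, aTASR only, none) follows the description above, but the counts used here are invented for illustration and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    at most as likely as the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts of TTSs: rows = TTSa RNA present/absent,
# columns = aTASR present/absent.
both, ttsa_only, atasr_only, neither = 8, 2, 1, 5
print(fisher_exact_two_sided(both, ttsa_only, atasr_only, neither))
```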

Densities of sRNA within specific gene features.

For a given feature, we counted the number of sRNA reads (total or unique) whose 5′ or 3′ ends were located within that feature (Supplementary Fig. 2) on the same strand. The exonic category did not include the overlaps with other exon-derived categories, and it took precedence over intronic in cases of multiple annotations. Reads were filtered as described above in the 'Filtering and collapsing of reads' section, except for the miRNA category.
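The counting rule, including the precedence of exonic over intronic annotations, can be sketched as below. The sketch assumes half-open feature intervals and one end coordinate per read; the category names beyond "exonic" and "intronic", the data layout and the function names are hypothetical.

```python
from collections import Counter


def assign_feature(chrom, pos, strand, features, precedence=("exonic", "intronic")):
    """Return the highest-precedence category containing a read end.

    features: list of (category, chrom, start, end, strand) tuples with
              half-open coordinates. Only same-strand features count.
    """
    hits = {
        category
        for category, f_chrom, start, end, f_strand in features
        if f_chrom == chrom and f_strand == strand and start <= pos < end
    }
    for category in precedence:       # earlier entries win on conflicts
        if category in hits:
            return category
    return None


def count_by_feature(read_ends, features):
    """Count read ends per category, applying the precedence rule."""
    counts = Counter()
    for chrom, pos, strand in read_ends:
        category = assign_feature(chrom, pos, strand, features)
        if category is not None:
            counts[category] += 1
    return counts


features = [
    ("exonic", "chr1", 100, 200, "+"),
    ("intronic", "chr1", 150, 400, "+"),   # overlaps the exon in another isoform
]
read_ends = [("chr1", 160, "+"), ("chr1", 300, "+"), ("chr1", 160, "-")]
print(count_by_feature(read_ends, features))
```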

RNAPII ChIP sequencing.

ChIP data from ref. 30 were processed with MACS60, using standard settings.

XRN1/2-dependent production of TSSa RNA.

We analyzed CAGE tag clusters as above. To account for relative abundance, we counted up to ten identical sRNAs instead of collapsing unique reads (Supplementary Fig. 6). We used promoters with 3′ ends of HeLa18–30 sRNAs between positions +36 and +40.
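The difference between collapsing reads and counting up to ten identical sRNAs can be shown in a few lines. The cap of ten comes from the text; the function name and the toy read sequences are illustrative assumptions.

```python
from collections import Counter


def capped_counts(reads, cap=10):
    """Count identical reads, but let each contribute at most `cap` times.

    With cap=1 this reduces to collapsing reads to unique sequences.
    """
    raw = Counter(reads)
    return {read: min(count, cap) for read, count in raw.items()}


reads = ["ACGTACGTACGTACGTACGT"] * 15 + ["TTGCATTGCATTGCATTGCA"] * 3
print(capped_counts(reads))          # abundant read is capped at 10
print(capped_counts(reads, cap=1))   # equivalent to collapsing
```

Capping retains some information on relative abundance while limiting the influence of very highly sequenced reads.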

Annotation of newly identified mirtrons.

The following set of criteria was applied for confident annotation of tailed mirtrons. (i) Multiple sequence reads are detected. (ii) Sequence reads are mapped to both arms of a predicted stem loop structure. (iii) Both the hairpin and one of the sequenced arms precisely flank a splice site. (iv) There are no multiple reads covering the expected Dicer cleavage site. (v) At least one of the arms is detected by deep sequencing of RNA immunoprecipitated with anti-AGO1/2 antibodies. (vi) Lastly, there is no annotation suggesting non-miRNA biogenesis.
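Criteria (i) to (vi) can be expressed as a simple checklist over per-candidate summary values. The sketch below is only an illustration: the field names are hypothetical, "multiple" is interpreted as two or more reads, and the structural inputs (hairpin prediction, splice-site flanking) are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    reads_5p_arm: int                  # reads mapping to the 5' arm
    reads_3p_arm: int                  # reads mapping to the 3' arm
    hairpin_flanks_splice_site: bool   # hairpin end abuts a splice site
    arm_flanks_splice_site: bool       # a sequenced arm abuts that splice site
    reads_over_dicer_site: int         # reads spanning the expected Dicer cut
    ago_ip_reads_5p_arm: int           # AGO1/2 IP reads on the 5' arm
    ago_ip_reads_3p_arm: int           # AGO1/2 IP reads on the 3' arm
    non_mirna_annotation: bool         # annotation suggesting another origin


def evaluate(c):
    """Return a dict of criterion label -> pass/fail for one candidate."""
    return {
        "i_multiple_reads": c.reads_5p_arm + c.reads_3p_arm >= 2,
        "ii_both_arms": c.reads_5p_arm > 0 and c.reads_3p_arm > 0,
        "iii_flank_splice_site": c.hairpin_flanks_splice_site and c.arm_flanks_splice_site,
        "iv_dicer_site_clear": c.reads_over_dicer_site < 2,
        "v_ago_associated": c.ago_ip_reads_5p_arm > 0 or c.ago_ip_reads_3p_arm > 0,
        "vi_no_conflicting_annotation": not c.non_mirna_annotation,
    }


def is_confident_tailed_mirtron(c):
    """A candidate is accepted only if every criterion is met."""
    return all(evaluate(c).values())


good = Candidate(12, 4, True, True, 0, 3, 0, False)
bad = Candidate(12, 0, True, True, 0, 3, 0, False)   # fails criterion (ii)
print(is_confident_tailed_mirtron(good), is_confident_tailed_mirtron(bad))
```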

Accession Codes.

All RNA sequence data have been deposited in the NCBI Gene Expression Omnibus (GEO) database under accession number GSE29116.

Change history

  • 21 August 2011

    In the version of this article initially published online, in Figure 5a, the x-axis tick marks and labels were placed incorrectly; in Figure 5c, there were two extraneous tracks; and in Figure 5d, the y-axis label was missing, a stem in the RNA was incorrectly colored in gray (instead of red) and the sRNA tracks were incorrectly shifted to the left. These errors have been corrected for the print, PDF and HTML versions of this article.




  1. The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs. Nat. Rev. Genet. 10, 833–844 (2009).
  2. Whole genome transcriptome analysis. RNA Biol. 6, 107–112 (2009).
  3. Evolution, biogenesis and function of promoter-associated RNAs. Cell Cycle 8, 2332–2338 (2009).
  4. Divergent transcription: a new feature of active promoters. Cell Cycle 8, 2557–2564 (2009).
  5. RNA polymerase plays both sides: vivid and bidirectional transcription around and upstream of active promoters. Cell Cycle 8, 1106–1107 (2009).
  6. Pervasive transcription constitutes a new level of eukaryotic genome regulation. EMBO Rep. 10, 973–982 (2009).
  7. The widespread regulation of microRNA biogenesis, function and decay. Nat. Rev. Genet. 11, 597–610 (2010).
  8. Origins and mechanisms of miRNAs and siRNAs. Cell 136, 642–655 (2009).
  9. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009).
  10. Argonaute proteins: key players in RNA silencing. Nat. Rev. Mol. Cell Biol. 9, 22–32 (2008).
  11. Argonaute proteins: mediators of RNA silencing. Mol. Cell 26, 611–623 (2007).
  12. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
  13. Affymetrix/Cold Spring Harbor Laboratory ENCODE Transcriptome Project. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 457, 1028–1032 (2009).
  14. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).
  15. Tiny RNAs associated with transcription start sites in animals. Nat. Genet. 41, 572–578 (2009).
  16. Nuclear-localized tiny RNAs are associated with transcription initiation and splice sites in metazoans. Nat. Struct. Mol. Biol. 17, 1030–1034 (2010).
  17. New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism. Nature 466, 642–646 (2010).
  18. PROMoter uPstream Transcripts share characteristics with mRNAs and are produced upstream of all three major types of mammalian promoters. Nucleic Acids Res. published online, doi:10.1093/nar/gkr370 (19 May 2011).
  19. Limitations and possibilities of small RNA digital gene expression profiling. Nat. Methods 6, 474–476 (2009).
  20. Variation in the size of nascent RNA cleavage products as a function of transcript length and elongation competence. J. Biol. Chem. 270, 30441–30447 (1995).
  21. The increment of SII-facilitated transcript cleavage varies dramatically between elongation competent and incompetent RNA polymerase II ternary complexes. J. Biol. Chem. 268, 12874–12885 (1993).
  22. Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327, 335–338 (2010).
  23. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat. Rev. Genet. 8, 424–436 (2007).
  24. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635 (2006).
  25. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77–88 (2007).
  26. RNA polymerase stalling at developmental control genes in the Drosophila melanogaster embryo. Nat. Genet. 39, 1512–1516 (2007).
  27. RNA polymerase is poised for activation across the genome. Nat. Genet. 39, 1507–1511 (2007).
  28. c-Myc regulates transcriptional pause release. Cell 141, 432–445 (2010).
  29. Nucleosome organization in the Drosophila genome. Nature 453, 358–362 (2008).
  30. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27, 66–75 (2009).
  31. Structure-function studies of the RNA polymerase II elongation complex. Acta Crystallogr. D Biol. Crystallogr. 65, 112–120 (2009).
  32. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89–100 (2007).
  33. Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83–86 (2007).
  34. MicroRNA biogenesis via splicing and exosome-mediated trimming in Drosophila. Mol. Cell 38, 900–907 (2010).
  35. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 22, 2773–2785 (2008).
  36. Direct detection of small RNAs using splinted ligation. Nat. Protoc. 3, 279–287 (2008).
  37. A comprehensive survey of 3′ animal miRNA modification events and a possible role for 3′ adenylation in modulating miRNA targeting effectiveness. Genome Res. 20, 1398–1410 (2010).
  38. NELF and GAGA factor are linked to promoter-proximal pausing at many genes in Drosophila. Mol. Cell. Biol. 28, 3290–3300 (2008).
  39. The yeast 5′–3′ exonuclease Rat1p functions during transcription elongation by RNA polymerase II. Mol. Cell 37, 580–587 (2010).
  40. Mammalian mirtron genes. Mol. Cell 28, 328–336 (2007).
  41. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res. 18, 957–964 (2008).
  42. Repertoire of bovine miRNA and miRNA-like small regulatory RNAs expressed upon viral infection. PLoS ONE 4, e6349 (2009).
  43. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24, 992–1009 (2010).
  44. Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans. Genome Res. 21, 286–300 (2011).
  45. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 39, D152–D157 (2011).
  46. Evolutionary flux of canonical microRNAs and mirtrons in Drosophila. Nat. Genet. 42, 6–9, author reply 9–10 (2010).
  47. RNA secondary structure in mutually exclusive splicing. Nat. Struct. Mol. Biol. 18, 159–168 (2011).
  48. Human branch point consensus sequence is yUnAy. Nucleic Acids Res. 36, 2257–2267 (2008).
  49. Dicer-independent primal RNAs trigger RNAi and heterochromatin formation. Cell 140, 504–516 (2010).
  50. Collapse of germline piRNAs in the absence of Argonaute3 reveals somatic piRNAs in flies. Cell 137, 509–521 (2009).
  51. Specialized piRNA pathways act in germline and somatic tissues of the Drosophila ovary. Cell 137, 522–535 (2009).
  52. A regulatory circuit for piwi by the large Maf gene traffic jam in Drosophila. Nature 461, 1296–1299 (2009).
  53. An in vivo RNAi assay identifies major genetic and cellular requirements for primary piRNA biogenesis in Drosophila. EMBO J. 29, 3301–3317 (2010).
  54. A broadly conserved pathway generates 3′UTR-directed primary piRNAs. Curr. Biol. 19, 2066–2076 (2009).
  55. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).
  56. A human snoRNA with microRNA-like functions. Mol. Cell 32, 519–528 (2008).
  57. A multifunctional human Argonaute2-specific monoclonal antibody. RNA 14, 1244–1253 (2008).
  58. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
  59. Systematic clustering of transcription start site landscapes. PLoS ONE (in the press).
  60. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).



We thank A. Jacquier, A.H. Lund, K. Adelman and members of the T.H.J. and A.S. laboratories for stimulating discussions. The following colleagues are acknowledged for sharing antibodies: J. Lykke-Andersen (Division of Biology, University of California, San Diego), D.L. Black (Howard Hughes Medical Institute, Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles), G.J. Pruijn (Department of Biomolecular Chemistry, Nijmegen Center for Molecular Life Sciences, Institute for Molecules and Materials, Radboud University) and K. Nishikura (The Wistar Institute). This work was supported by the Danish National Research Foundation, the Danish Cancer Society and the Lundbeck Foundation (to T.H.J.) and the EU 7th Framework Programme (FP7/2007–2013)/ERC grant agreement 204135, the Novo Nordisk Foundation, the Danish Cancer Society and the Lundbeck Foundation (to A.S.). E.V. was supported by the Danish Council for Independent Research. P.P. was the recipient of a research grant from the Lundbeck Foundation during part of this work. Work in the laboratory of G.M. was supported by the Bayerisches Staatsministerium für Wissenschaft, Forschung und Kunst (BayGene), the European Union (ERC grant 'sRNAs') and the Deutsche Forschungsgemeinschaft (DFG, Me 2064/2-2 and FOR855). Sequencing was carried out at the Beijing Genome Institute (BGI) in Shenzhen, China.

Author information

Author notes

    • Eivind Valen, Pascal Preker & Peter Refsing Andersen

    These authors contributed equally to this work.


  1. The Bioinformatics Centre, Department of Biology and the Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark.

    • Eivind Valen, Xiaobei Zhao, Yun Chen & Albin Sandelin

  2. Centre for mRNP Biogenesis and Metabolism, Department of Molecular Biology, Aarhus University, Denmark.

    • Pascal Preker, Peter Refsing Andersen & Torben Heick Jensen

  3. Laboratory of RNA Biology, Max-Planck-Institute of Biochemistry, Martinsried, Germany.

    • Christine Ender & Gunter Meister

  4. Department of Biochemistry, University of Regensburg, Regensburg, Germany.

    • Anne Dueck & Gunter Meister




E.V., P.P., P.R.A., G.M., A.S. and T.H.J. designed the experiments. P.P., P.R.A., C.E. and A.D. conducted the experiments. E.V., X.Z., Y.C. and A.S. did the bioinformatics analyses. E.V., P.P., P.R.A., G.M., A.S. and T.H.J. evaluated the results. E.V., P.P., P.R.A., X.Z., Y.C., A.S. and T.H.J. produced the figures. E.V., P.P., P.R.A., A.S. and T.H.J. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Albin Sandelin or Torben Heick Jensen.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–9 and Supplementary Tables 1 and 2
