Formation, regulation and evolution of Caenorhabditis elegans 3′UTRs

Journal name: Nature
Volume: 469
Pages: 97–101
DOI: doi:10.1038/nature09616

Post-transcriptional gene regulation frequently occurs through elements in mRNA 3′ untranslated regions (UTRs)1, 2. Although crucial roles for 3′UTR-mediated gene regulation have been found in Caenorhabditis elegans3, 4, 5, most C. elegans genes have lacked annotated 3′UTRs6, 7. Here we describe a high-throughput method for reliable identification of polyadenylated RNA termini, and we apply this method, called poly(A)-position profiling by sequencing (3P-Seq), to determine C. elegans 3′UTRs. Compared with standard methods also recently applied to C. elegans UTRs8, 3P-Seq identified 8,580 additional UTRs while excluding thousands of shorter UTR isoforms that do not seem to be authentic. Analysis of this expanded and corrected data set suggested that the high A/U content of C. elegans 3′UTRs facilitated genome compaction, because the elements specifying cleavage and polyadenylation, which are A/U rich, can more readily emerge in A/U-rich regions. Indeed, 30% of the protein-coding genes have mRNAs with alternative, partially overlapping end regions that generate another 10,480 cleavage and polyadenylation sites that had gone largely unnoticed and represent potential evolutionary intermediates of progressive UTR shortening. Moreover, a third of the convergently transcribed genes use palindromic arrangements of bidirectional elements to specify UTRs with convergent overlap, which also contributes to genome compaction by eliminating regions between genes. Although nematode 3′UTRs have a median length only one-sixth that of mammalian 3′UTRs, they have twice the density of conserved microRNA sites, in part because additional types of seed-complementary sites are preferentially conserved. These findings reveal the influence of cleavage and polyadenylation on the evolution of genome architecture and provide resources for studying post-transcriptional gene regulation.
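The argument that A/U-rich sequence makes A/U-rich processing elements easier to come by can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not from the paper: it assumes, for illustration only, that nucleotides occur independently, that A and U are equally frequent (as are G and C), and that the canonical hexamer AAUAAA stands in for the A/U-rich elements. The function names and the A/U contents used are arbitrary example choices.

```python
def motif_probability(motif: str, au_content: float) -> float:
    """Per-position probability of `motif` in an i.i.d. random sequence.

    Assumption: A and U each make up half of the A/U content,
    and G and C each make up half of the remainder.
    """
    freqs = {
        "A": au_content / 2, "U": au_content / 2,
        "G": (1 - au_content) / 2, "C": (1 - au_content) / 2,
    }
    p = 1.0
    for nt in motif:
        p *= freqs[nt]
    return p


def expected_occurrences(motif: str, au_content: float, length: int) -> float:
    """Expected number of (possibly overlapping) matches on one strand."""
    positions = max(length - len(motif) + 1, 0)
    return positions * motif_probability(motif, au_content)


if __name__ == "__main__":
    for au in (0.50, 0.65):
        n = expected_occurrences("AAUAAA", au, 1000)
        print(f"A/U content {au:.0%}: {n:.2f} expected AAUAAA per 1,000 nt")
```

Under these simplifications, raising A/U content from 50% to 65% multiplies the expected number of AAUAAA hexamers by (0.325/0.25)^6 = 1.3^6, roughly 4.8-fold, which conveys why such elements arise more readily in A/U-rich sequence.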


Figures

    Figure 1: Identification of C. elegans 3′UTRs.

    a, Schematic of the 3P-Seq protocol. See text for description. b, Sequence composition of homopolymer runs that were found at 3′ termini of candidate 3P tags and included ≥1 untemplated nucleotide. c, Cleavage heterogeneity surrounding the most abundant cleavage site (position 0). Box plots show results for 380 cleavage sites that were both between two non-A residues (which enabled precise mapping) and within the top quintile of 3P-tag abundance. d, The lin-14 3′UTRs. 3P tags from egg were mapped relative to RNA-Seq data10, prior mRNA annotations from the indicated databases6, 11, and the proposed lin-4-binding region4. Distal and proximal cleavage sites are indicated (black and red arrowheads, respectively). A 50-nucleotide region containing the distal 3P cluster is enlarged (box). Each tag sequence with a unique genome match is depicted as a bar, coloured by tag frequency (key). e, Nucleotide sequence composition at mRNA end regions. Shown above are elements implicated in cleavage and polyadenylation (Supplementary Fig. 3c)30, with colours reflecting their nucleotide composition (A-rich, red; U-rich, blue). The sharp adenosine peak at position +1 (*) was due only partly to cleavage before an A. Also contributing to this peak (and to both depletion of A at position –1 and blurring of sequence composition at other positions) was cleavage after an A, for which the templated A was assigned to the poly(A) tail, resulting in a –1 nucleotide offset from the cleavage-site register.

    Figure 2: Alternative 3′UTRs in C. elegans.

    a, Distribution of the 24,033 3P-Seq–supported UTRs among the types of alternative isoforms. For genes with ALEs that have tandem isoforms (bottom), the ALE tally indicates the number of distal isoforms of proximal ALEs (blue) and the tandem tally indicates the proximal tandem isoforms of all ALEs (red). In all cases, the distal isoform is the 3′-most cleavage site for each gene (black arrowhead). Also depicted are proximal tandem sites and proximal ALE sites (red and blue arrowheads, respectively). Listed (in parentheses) is the number of cleavage sites associated with each isoform type for the 34,513 3P-Seq–supported cleavage sites (which exceeded the number of unique UTRs because OERs produced multiple cleavage sites for the same UTR). The nucleotide composition near proximal and distal sites is shown (right). b, Frequency of PAS motifs for the isoform types indicated. c, Schematics of canonical and alternative operons.

    Figure 3: Evolution and topology of 3′-end formation.

    a, 3′UTR length distributions for the indicated species, considering the most distal annotated isoform for each gene. b, A/U content for C. elegans 3′UTRs of the indicated lengths. c, Relationship between 3′UTR length and 3′UTR A/U content (disregarding content of the last 40 UTR nucleotides), 3′UTR length and genomic A/T content, and 5′UTR length and 5′UTR A/U content for the metazoan species in (a) (r², squared Pearson correlation coefficients). d, OERs. Distances between neighbouring cleavage sites are plotted (left). For peaks in the distribution at 15–20 and 35–40 nucleotides (shaded), nucleotide compositions of OERs are shown (middle and right, respectively), with proposed RNA-recognition elements coloured as in Fig. 1e. Arrowheads indicate cleavage sites, with shading also indicating positions of upstream cleavage. e, Convergent UTR overlap. Distances between convergent 3′ ends are plotted (left), with negative values indicating overlap. For peaks at 15–22 and (−2)–8 nucleotides of overlap (shaded), nucleotide compositions are shown (middle and right, respectively) as in (d), with shading indicating positions of minus-strand cleavage.

    Figure 4: MicroRNA targeting.

    a, Expanded repertoire of seed-matched sites preferentially conserved in nematode 3′UTRs. Sites conserved only marginally above chance are above the dashed line. Watson–Crick-matched residues, blue or black; residues independent of the miRNA sequence, red. b, Density of miRNA sites conserved above background, combining all site types at the maximally sensitive cutoff. Error bars, one standard deviation (calculated by repeating the analysis for each site type 50 times, each time using a different cohort of control sequences that matched the properties of the miRNA sequences18). c, Relative strength of miRNA site types across clades. Within each clade, two species of comparable divergence were selected. For each miRNA site type, the fraction of sites conserved above background in the two species was normalized to that of the 8mer-A1 (shown in parentheses). d, Enrichment of 8mer-A1 3′UTR sites above expectation based on dinucleotide content. Error bars, one standard deviation, derived as in (b). e, Relationship between 3′UTR length and site enrichment. Site enrichment is plotted for 3′UTRs of the indicated species sorted by length into ten equally sized bins.
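The legend of Figure 1 notes that cleavage sites could be mapped precisely only when the cleavage fell between two non-A residues, because templated A's immediately downstream of the cleaved RNA cannot be distinguished from the untemplated poly(A) tail. The Python sketch below illustrates that ambiguity; it is not the authors' pipeline. It assumes a sense-oriented tag already placed on the plus strand at a known 0-based offset (`body_start`, as some aligner would report), and the helper names `trim_poly_a` and `candidate_cleavage_sites` are hypothetical. Following the legend, at least one terminal A is required to be untemplated.

```python
def trim_poly_a(tag: str) -> tuple[str, int]:
    """Split a sense-oriented tag into its non-A body and its terminal A run."""
    body = tag.rstrip("A")
    return body, len(tag) - len(body)


def candidate_cleavage_sites(genome: str, body_start: int, body: str,
                             tail_len: int) -> list[int]:
    """0-based plus-strand positions of the last templated nucleotide that are
    consistent with the tag (cleavage occurs just 3' of each position)."""
    body_end = body_start + len(body) - 1
    if genome[body_start:body_end + 1] != body:
        raise ValueError("tag body does not match the genome at body_start")
    sites = [body_end]
    absorbed = 0  # genomic A's reassigned from the tail to the templated part
    pos = body_end + 1
    # Keep at least one A untemplated, as required of candidate 3P tags.
    while pos < len(genome) and genome[pos] == "A" and absorbed < tail_len - 1:
        absorbed += 1
        sites.append(pos)
        pos += 1
    return sites


if __name__ == "__main__":
    genome = "GGCTTCAAAGT"  # hypothetical plus-strand sequence
    body, tail_len = trim_poly_a("GGCTTCAAAAAAAA")
    print(candidate_cleavage_sites(genome, 0, body, tail_len))
```

In this toy case the candidates are positions 5 through 8. A single candidate is returned only when the genomic base after the tag body is not an A, the "precise mapping" case of Fig. 1c. Taking the first candidate corresponds to assigning ambiguous templated A's to the tail, the convention the legend describes as producing a –1 nucleotide offset.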
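Figures 2 and 3 rest on a few simple geometric quantities: the distal (3′-most) cleavage site of each gene, the spacing between neighbouring cleavage sites of the same gene (the OER analysis of Fig. 3d) and the distance between convergent 3′ ends (Fig. 3e, negative when the UTRs overlap). The Python sketch below computes them under an assumed convention: each site is the 0-based genomic coordinate of the last transcribed nucleotide. The function names, the 5-nucleotide bin width and the example coordinates are illustrative assumptions, not values from the paper.

```python
from collections import Counter


def distal_site(sites: list[int], strand: str) -> int:
    """3'-most cleavage site of a gene: largest coordinate on '+', smallest on '-'."""
    return max(sites) if strand == "+" else min(sites)


def neighbour_distances(sites: list[int]) -> list[int]:
    """Distances between consecutive cleavage sites of one gene."""
    ordered = sorted(set(sites))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def convergent_distance(plus_end: int, minus_end: int) -> int:
    """Nucleotides separating convergent 3' ends.

    plus_end:  last transcribed nucleotide of the '+' gene.
    minus_end: last transcribed nucleotide of the '-' gene (its leftmost base).
    0 means the ends abut; negative values give the length of the overlap.
    """
    return minus_end - plus_end - 1


def bin_counts(distances: list[int], bin_width: int = 5) -> Counter:
    """Count distances in half-open bins [k*w, (k+1)*w); works for negatives."""
    return Counter((d // bin_width) * bin_width for d in distances)


if __name__ == "__main__":
    gene_sites = [1000, 1018, 1037]  # hypothetical '+' strand gene
    print(distal_site(gene_sites, "+"))
    print(neighbour_distances(gene_sites))
    print(convergent_distance(500, 485))
    print(bin_counts(neighbour_distances(gene_sites)))
```

Using inclusive last-nucleotide coordinates lets one formula, `minus_end - plus_end - 1`, cover both separated and overlapping ends: with `plus_end = 500` and `minus_end = 485`, positions 485 to 500 (16 nucleotides) are shared and the function returns −16.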
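Figure 4 builds on seed-matched site types. The Python sketch below derives four canonical types (8mer-A1, 7mer-m8, 7mer-A1 and 6mer, as commonly defined in the targeting literature, for example ref. 18) from a mature miRNA, counts them in 3′UTR sequences, and computes an expected count from a first-order Markov model fitted to dinucleotide frequencies. Treating "expectation based on dinucleotide content" (Fig. 4d) as a first-order Markov model is an assumption; the paper's expanded repertoire of sites and its conservation analysis are not reproduced. The let-7 sequence is the C. elegans mature let-7 as listed in miRBase (only nucleotides 1–8 matter here); the UTR string is made up.

```python
from collections import Counter

# RNA complement table; inputs are assumed to use U, not T.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def site_sequences(mirna: str) -> dict[str, str]:
    """Canonical seed-matched sites (UTR strand, 5'->3') for a mature miRNA."""
    m2_8 = revcomp(mirna[1:8])  # pairs with miRNA nucleotides 2-8
    m2_7 = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    return {
        "8mer-A1": m2_8 + "A",  # A opposite position 1, independent of the miRNA
        "7mer-m8": m2_8,
        "7mer-A1": m2_7 + "A",
        "6mer": m2_7,
    }


def count_occurrences(seq: str, site: str) -> int:
    """Overlapping matches of `site` in `seq`. Site types are nested, so an
    8mer-A1 match also counts as a 7mer-m8, 7mer-A1 and 6mer match."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)


def markov_expected(seqs: list[str], site: str) -> float:
    """Expected matches under a first-order Markov model fitted to `seqs`."""
    mono: Counter = Counter()
    di: Counter = Counter()
    for s in seqs:
        mono.update(s)
        di.update(s[i:i + 2] for i in range(len(s) - 1))
    starts: Counter = Counter()  # dinucleotides beginning with each base
    for pair, n in di.items():
        starts[pair[0]] += n
    p = mono[site[0]] / sum(mono.values())
    for a, b in zip(site, site[1:]):
        if starts[a] == 0:
            return 0.0
        p *= di[a + b] / starts[a]  # transition probability P(b | a)
    positions = sum(max(len(s) - len(site) + 1, 0) for s in seqs)
    return positions * p


if __name__ == "__main__":
    let7 = "UGAGGUAGUAGGUUGUAUAGUU"
    utrs = ["AUUUCUACCUCAAUUUAUAAAUACCUCUUU"]  # hypothetical 3'UTR
    for name, site in site_sequences(let7).items():
        observed = sum(count_occurrences(u, site) for u in utrs)
        expected = markov_expected(utrs, site)
        print(name, site, observed, f"{expected:.4f}")
```

Enrichment would then be the observed count divided by the expected count over a large set of 3′UTRs; a single short UTR, as in the usage example, is far too small for a meaningful ratio.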

References

  1. Moore, M. J. From birth to death: the complex lives of eukaryotic mRNAs. Science 309, 1514–1518 (2005)
  2. Martin, K. C. & Ephrussi, A. mRNA localization: gene expression in the spatial dimension. Cell 136, 719–730 (2009)
  3. Ahringer, J. & Kimble, J. Control of the sperm-oocyte switch in Caenorhabditis elegans hermaphrodites by the fem-3 3′ untranslated region. Nature 349, 346–348 (1991)
  4. Wightman, B., Burglin, T. R., Gatto, J., Arasu, P. & Ruvkun, G. Negative regulatory sequences in the lin-14 3′-untranslated region are necessary to generate a temporal switch during Caenorhabditis elegans development. Genes Dev. 5, 1813–1824 (1991)
  5. Merritt, C., Rasoloson, D., Ko, D. & Seydoux, G. 3′ UTRs are the primary regulators of gene expression in the C. elegans germline. Curr. Biol. 18, 1476–1482 (2008)
  6. Rogers, A. et al. WormBase 2007. Nucleic Acids Res. 36, D612–D617 (2008)
  7. Mangone, M., Macmenamin, P., Zegar, C., Piano, F. & Gunsalus, K. C. UTRome.org: a platform for 3′UTR biology in C. elegans. Nucleic Acids Res. 36, D57–D62 (2008)
  8. Mangone, M. et al. The landscape of C. elegans 3′UTRs. Science 329, 432–435 (2010)
  9. Nam, D. K. et al. Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription. Proc. Natl Acad. Sci. USA 99, 6152–6156 (2002)
  10. Hillier, L. W. et al. Massively parallel sequencing of the polyadenylated transcriptome of C. elegans. Genome Res. 19, 657–666 (2009)
  11. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007)
  12. Nunes, N. M., Li, W., Tian, B. & Furger, A. A functional human poly(A) site requires only a potent DSE and an A-rich upstream sequence. EMBO J. 29, 1523–1536 (2010)
  13. Evans, D. et al. A complex containing CstF-64 and the SL2 snRNP connects mRNA 3′ end formation and trans-splicing in C. elegans operons. Genes Dev. 15, 2562–2571 (2001)
  14. Prescott, E. M. & Proudfoot, N. J. Transcriptional collision between convergent genes in budding yeast. Proc. Natl Acad. Sci. USA 99, 8796–8801 (2002)
  15. Batista, P. J. et al. PRG-1 and 21U-RNAs interact to form the piRNA complex required for fertility in C. elegans. Mol. Cell 31, 67–78 (2008)
  16. Chiang, H. R. et al. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24, 992–1009 (2010)
  17. Ruby, J. G., Jan, C. H. & Bartel, D. P. Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83–86 (2007)
  18. Friedman, R. C., Farh, K. K., Burge, C. B. & Bartel, D. P. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92–105 (2009)
  19. Zisoulis, D. G. et al. Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nature Struct. Mol. Biol. 17, 173–179 (2010)
  20. Clark, A. M. et al. The microRNA miR-124 controls gene expression in the sensory nervous system of Caenorhabditis elegans. Nucleic Acids Res. 38, 3780–3793 (2010)
  21. Lall, S. et al. A genome-wide map of conserved microRNA targets in C. elegans. Curr. Biol. 16, 460–471 (2006)
  22. Reinhart, B. J. et al. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901–906 (2000)
  23. Abrahante, J. E. et al. The Caenorhabditis elegans hunchback-like gene lin-57/hbl-1 controls developmental time and is regulated by microRNAs. Dev. Cell 4, 625–637 (2003)
  24. Tian, B., Hu, J., Zhang, H. & Lutz, C. S. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33, 201–212 (2005)
  25. Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008)
  26. Sandberg, R., Neilson, J. R., Sarma, A., Sharp, P. A. & Burge, C. B. Proliferating cells express mRNAs with shortened 3′ untranslated regions and fewer microRNA target sites. Science 320, 1643–1647 (2008)
  27. Ji, Z., Lee, J. Y., Pan, Z., Jiang, B. & Tian, B. Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc. Natl Acad. Sci. USA 106, 7028–7033 (2009)
  28. Mayr, C. & Bartel, D. P. Widespread shortening of 3′UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673–684 (2009)
  29. Lau, N. C., Lim, L. P., Weinstein, E. G. & Bartel, D. P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862 (2001)
  30. Mandel, C. R., Bai, Y. & Tong, L. Protein factors in pre-mRNA 3′-end processing. Cell Mol. Life Sci. 65, 1099–1122 (2008)
  31. Guo, H., Ingolia, N. T., Weissman, J. S. & Bartel, D. P. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840 (2010)
  32. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)
  33. Maglott, D., Ostell, J., Pruitt, K. D. & Tatusova, T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 35, D26–D31 (2007)
  34. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998)
  35. Blumenthal, T. Trans-splicing and operons. WormBook 25, 1–9 (2005)
  36. Karolchik, D. et al. The UCSC genome browser database. Nucleic Acids Res. 31, 51–54 (2003)


Author information

Affiliations

  1. Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA

    • Calvin H. Jan,
    • Robin C. Friedman,
    • J. Graham Ruby &
    • David P. Bartel
  2. Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Calvin H. Jan,
    • Robin C. Friedman,
    • J. Graham Ruby &
    • David P. Bartel
  3. Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Robin C. Friedman
  4. Present address: Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, California 94158, USA (J.G.R.).

    • J. Graham Ruby

Contributions

C.H.J. performed the experiments and computational analyses of 3P-Seq data. R.C.F. performed the computational analyses of miRNA targeting and motif conservation. J.G.R. performed the computational analyses of miRNAs. All authors contributed to study design and manuscript preparation.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

3P-Seq reads and 3P tags were deposited at the GEO as fastq and BED files, respectively (GSE24924). MicroRNA genes were deposited at miRBase (miR4805–miR4816).


Supplementary information

PDF files

  1. Supplementary Information (16M)

    The file contains a Supplementary Discussion, additional references, Supplementary Tables 1–5 and 7–10 (see separate file for Supplementary Table 6) and Supplementary Figures 1–15 with legends.

Text files

  1. Supplementary Dataset 1 (3.6M)

    This file contains the coordinates of processed data used in the analyses.

Zip files

  1. Supplementary Dataset 2 (1.7M)

    This file contains coordinates of UTRs defined in the study. Because a small fraction (<0.5%) of the UTRs listed in the original file were likely artefacts, a revised Dataset 2 file was added on 06 January 2011; this file was in turn replaced on 18 February 2011 after one of its columns was found to be missing.

  2. Supplementary Table 6 (214K)

    This file contains a table summarizing miRNA sequencing data. The original file was not displaying correctly and was replaced on 06 January 2011.
