Introduction

Gene expression is the composite output of multiple layers of RNA processing. Spanning from transcription to translation, the events that render protein production from DNA sequence are paramount to all aspects of cellular function. The study of mRNA transcripts, their regulation, and how they interact with RNA-binding proteins (RBPs) constitutes the systems biology of post-transcriptional gene regulation 1. While much critical knowledge in these areas has arisen from extensive research performed in yeast, advances in our understanding of co- and post-transcriptional mRNA processes are increasingly derived from large-scale studies conducted in metazoan organisms.

At the core of co- and post-transcriptional gene regulation are multifunctional RBPs that associate with mRNA transcripts and small non-coding RNAs to form messenger ribonucleoprotein (mRNP) complexes 2, 3, 4. The changing cast of RBPs interacting with an mRNP directs the events of mRNA processing, beginning with mRNA transcription in the nucleus (Figure 1). Nascent transcripts undergo capping, splicing, cleavage, polyadenylation, and surveillance prior to their export to the cytoplasm 5, 6, 7. The fate of a transcript is further governed by RBPs that mediate its cytoplasmic subcellular localization, translation, and degradation 8, 9, 10. RBPs interact with mRNAs through sequence-specific cis elements embedded in the protein-coding or in the untranslated regions (UTRs) of a transcript. In addition, RBPs confer specificity through mechanisms involving cooperative protein-protein interactions (reviewed in 3).

Figure 1

The life cycle of an mRNA is regulated by dynamic association with mRNA-binding proteins. mRNAs navigate the journey from transcription to translation and degradation as protein-bound mRNP complexes. In the nucleus, transcripts are capped, spliced, cleaved, and polyadenylated by RNA-binding proteins that interact with the nascent transcript co- and post-transcriptionally. Quality control measures ensure that only properly processed transcripts are exported. An mRNP is then subject to multiple fates in the cytoplasm, including subcellular localization, translation, and degradation, as predicated by its changing cohort of associated RNA-binding proteins.

These steps outline the basic life cycle of a transcript in metazoan cells. However, this assembly line analogy belies a more elaborate organization of mRNA processing involving coupling among and between co- and post-transcriptional steps. How selectivity is achieved in these processes and how RBPs contribute to the complexity of gene expression are among the questions addressed by systems-level approaches.

Imperative to systems biology studies is a definition of the constituents of the system. From genome-scale investigations of transcript expression or of RBP-bound targets, networks of similarly behaving transcripts may be constructed that shed light on specific mRNA processing events. Indeed, evaluation of the post-transcriptional operon hypothesis, that transcripts are organized through cis and trans acting factors to facilitate mRNA coregulation 11, has been enabled by large-scale definition and characterization of mRNP components.

In contrast to generating a collection of singular parts, however, systems biology seeks to understand how the components interact to give rise to emergent properties of the system. A second foundation of systems biology therefore involves synthesis of information from distinct experimental strategies. This aspect necessitates that data be portable such that results from differing experiments may be comparatively evaluated. While significant progress has been made in characterizing components and in establishing genome-scale interactions in metazoans, recent efforts are now beginning to integrate data from diverse experimental approaches. In this review we address studies that regard mRNA processing from a systems level through the use of genomic strategies and discuss outstanding challenges facing our understanding of mRNA processing in metazoan organisms.

Large-scale profiling defines transcript networks

The ability to profile thousands of transcripts in a cell, tissue, or in a whole organism represents a major achievement of modern biology 12, 13, 14, 15, 16. Microarray profiling of tissue expression as well as large-scale analyses of expression by in situ hybridization permits investigation of the organization among transcripts. Patterns of expression organization can be related among tissues 13 and between species 17 to identify expression networks. When performed over developmental time, expression surveys additionally establish transcript dynamics 17, 18, 19, 20, 21, 22, 23 that serve as hypothesis-generating sources regarding protein function and expression regulation.

Specific evaluation of the expression of gene regulators has uncovered higher-order expression patterns among mRNA processing factors 22, 23, 24. Investigation of RBP expression in the developing mammalian brain demonstrated that, while most RBPs are expressed throughout the nervous system, the majority show non-uniform, regional distribution 23. Few RBPs appear to be tissue-restricted, yet many RBP genes exhibit a similar pattern of neural expression 23. These data are consistent with a consensus that the expression levels of RBPs are differentially regulated, perhaps in a cell type-specific manner, and support the idea that multiple RBPs function concurrently. Whether these trends for RBPs hold in other tissues has not been examined on a large scale. Such studies, in combination with microarray and serial analysis of gene expression data, will be essential to define the zones of regulation for mRNA processing factors.

Genomic approaches define the mRNA targets of RBPs

Strategies including expression profiling, genome localization analysis, and mRNP immunoprecipitation followed by microarray analysis are utilized to assess those populations of mRNAs regulated by specific RBPs 2. Microarray profiling of cells aberrantly expressing RBPs has been employed to demarcate mRNA/RBP networks 25, 26, 27, 28, 29, 30, 31, 32, 33. While this approach has been useful in identifying potential targets of tissue-specific RBPs 29, 31, it has also revealed transcripts affected by proteins considered to be general processing factors. A case in point is the finding that exposure of murine macrophages to lipopolysaccharide specifically elevates expression of the cleavage stimulatory factor (CstF-64), but not other 3′ end processing proteins 25. The consequences of increased CstF-64 include changes in the expression and alternative polyadenylation site selection of particular genes 25, highlighting the degree of specificity that “general” RBPs may exert on gene regulation.

The associations of RBPs with genomic locations have been examined through chromatin immunoprecipitation followed by microarray hybridization of bound DNA. This approach (referred to as ChIP-chip or genome localization analysis) utilizes chemical cross-linking to covalently couple proteins with chromatin. ChIP-chip has been widely applied to assess the co-transcriptional roles of many yeast RBPs 34, 35 but has also identified the genome association of certain mammalian RBPs 36. The splicing factor polypyrimidine tract binding (PTB) protein, the mRNA export factor Aly, and the 3′ end cleavage factor CstF-64 were each found to associate with gene promoters, implicating a level of coupling of RBPs to transcript initiation 36. These RBPs were also shown to have individual enrichment profiles at 3′ ends of genes and distinct distribution throughout exonic and intronic positions, indicating discrete roles for each in splicing and 3′ end processing. Use of the ChIP-chip approach to investigate the co-transcriptional involvement of RBPs in tandem may uncover combinatorial specificity achieved by groups of RBPs for the coding and non-coding regions of the genome.

The targets of numerous metazoan RBPs have been identified through RNA immunoprecipitation followed by microarray analysis or sequencing 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48. Perhaps the most comprehensive picture of an RBP/mRNA network has been developed for the neuro-oncological ventral antigen (Nova) splicing factors. The role of Nova proteins in directing alternative splicing is emerging from knockout experiments 49, RNA-immunoprecipitation studies 38, and the use of exon-junction microarrays 37. Mice lacking Nova1 die postnatally with motor deficits associated with neuronal apoptosis 49. The splicing of transcripts encoding functionally related proteins, many of which physically interact in neuronal synapses, is altered in neurons of Nova knockout mice 37. Further synthesis of findings from both immunoprecipitation and microarray studies has enabled prediction of the alternative splicing patterns of uncharacterized Nova1 targets 50. Collectively, these data establish regulatory modules that may advance the molecular understanding of the Nova1−/− phenotype and the autoimmune disorder paraneoplastic opsoclonus-myoclonus ataxia. The strategies used in the analysis of Nova-mediated alternative splicing serve as prime examples of systems level approaches and are likely to be highly informative for other RBPs.

Splicing goes genome-wide

The advent of microarray technology to resolve exon-level gene expression has enabled large-scale profiling of mRNA splicing in metazoan organisms (reviewed in 6). Both exon-junction and exon-centric platforms (Figure 2) have been utilized to examine thousands of splicing events 37, 51, 52, 53, 54, 55, 56, 57, 58, 59. These studies permit identification of potential splicing factor targets 37, 59 and allow examination of transcript diversity and its regulation, which is essential for uncovering regulatory elements associated with specific splicing events 51, splicing factors 59, or signaling pathways 57. For example, new intronic sequence regulatory motifs have been identified through a comparative analysis of alternative splicing events in brain and muscle tissue 51. Genome-scale splicing studies also offer insight into splicing regulation and other forms of regulation of gene expression 56, 59 that have previously been confined to gene-by-gene investigations.

Figure 2

Microarray platforms permit analysis of mRNA splicing. Large-scale investigation of mRNA splicing has been achieved through the use of microarrays that target exons or exon junctions. The distribution of probes (green) to interrogate exon use enables a finer resolution of transcript diversity than is attained by traditional arrays. The two platforms present distinct advantages in their ability to measure transcript structure and identify novel splicing events. Studies that combine an exon-centric approach to first filter expression and then investigate transcript architecture through junction arrays may therefore be highly valuable in examining splicing networks.

One tenet of alternative splicing is that splice site usage is regulated by the combinatorial properties of multiple splicing factors 60. While an analysis of the splicing profiles of four Drosophila splicing regulators, ASF/SF2, SRp55, PSI, and hrp48, revealed that each protein influences a distinct set of splice isoforms, a small but significant overlap of affected junctions was identified among specific splicing factors 59. ASF/SF2 and SRp55 similarly regulate a subset of splicing events, as do PSI and hrp48 59. These findings largely point to a unique involvement for these proteins in splicing regulation, at least among the splice sites examined. However, these data also demonstrate that global splicing profiling is an effective strategy to place both proteins and splicing events into splicing networks. Interestingly, among the majority of events affected by two factors, little evidence was found for splicing antagonism 59 that has been described through single gene studies 60. Whether these findings are representative of alternative splicing regulation as a whole awaits further investigation. It is clear, however, from this and studies examining the contribution of alternative splicing to nonsense-mediated decay (NMD) 56 that such investigations are imperative to broadening our understanding of splicing's impact on downstream gene expression.
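Whether an overlap between the event sets affected by two splicing factors is larger than chance can be judged with a one-sided hypergeometric test. The sketch below is illustrative only: the function name and all numbers are hypothetical, and the cited study may have used a different statistical procedure.

```python
from math import comb

def overlap_p_value(total_events, hits_a, hits_b, shared):
    """Probability of observing at least `shared` common events by chance.

    total_events: number of splicing events assayed on the array
    hits_a, hits_b: events affected by factor A and by factor B
    shared: events affected by both factors
    Assumes factor B's events are drawn at random from all assayed events
    (hypergeometric null model).
    """
    upper = min(hits_a, hits_b)
    denom = comb(total_events, hits_b)
    tail = sum(
        comb(hits_a, k) * comb(total_events - hits_a, hits_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / denom

# Hypothetical example: 1000 assayed events, 50 and 40 affected, 8 shared.
# About 2 shared events would be expected by chance (50 * 40 / 1000).
p = overlap_p_value(total_events=1000, hits_a=50, hits_b=40, shared=8)
print(f"one-sided p-value: {p:.3g}")
```

A small p-value indicates that the two factors share more events than independent action would predict, which is one way to read "small but significant overlap".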

A present challenge in exon-level microarray studies lies in separating expression networks from transcript diversity networks, as transcription and splicing are intimately coupled. Exon-junction platforms have the advantage of distinguishing the exonic architecture of transcripts by directly targeting specific arrangements of exons, but are limited to interrogating predetermined exon-junctions. Exon-centric platforms, in contrast, provide a view of the total transcriptional output from a gene locus. The latter array format therefore has the benefit of uncovering novel forms of exon use 61. Deciphering the overall architecture of a transcript, however, is much more difficult with the exon-centric approach. A combination of the two strategies, first by examining total exon use and then by interrogating specific junctions, may be an effective method of studying alternative splicing and other forms of transcript diversity, including alternative initiation and polyadenylation site selection. A comparable approach has been used to examine regulated splicing in the Toll-like receptor signaling pathway 62 as well as the sex-specific expression of thousands of Drosophila transcript variants 63. In these studies, researchers first aligned EST and cDNA sequences to catalog transcript diversity from specific loci, and then profiled splicing through custom microarrays. Future work to investigate other stimulus-driven and developmentally regulated splicing events will benefit from this type of approach.
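One common way to separate exon use from overall gene expression in exon-centric data is to normalize each exon signal to a gene-level signal before comparing conditions. The sketch below is an assumption for illustration, not a method taken from the studies cited here: the function name, the use of the mean exon intensity as the gene-level estimate, and the intensities are all hypothetical.

```python
from math import log2
from statistics import mean

def splicing_index(exons_a, exons_b):
    """Log2 ratio of gene-normalized exon signal, condition A over condition B.

    exons_a, exons_b: dicts mapping exon name -> probe intensity for one gene.
    The gene-level signal is taken as the mean exon intensity (an assumption).
    Values near 0 mean the exon tracks the gene; large absolute values
    suggest condition-specific exon use.
    """
    gene_a = mean(exons_a.values())
    gene_b = mean(exons_b.values())
    return {
        exon: log2((exons_a[exon] / gene_a) / (exons_b[exon] / gene_b))
        for exon in exons_a
    }

# Hypothetical intensities: exon e2 is relatively depleted in condition B.
cond_a = {"e1": 200.0, "e2": 200.0, "e3": 200.0}
cond_b = {"e1": 100.0, "e2": 25.0, "e3": 100.0}
for exon, value in splicing_index(cond_a, cond_b).items():
    print(exon, round(value, 2))
```

A uniform change in expression across all exons leaves every index near zero, so only exons whose use changes relative to the gene stand out. Candidates found this way could then be followed up with junction-specific probes.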

Nuclear mRNA export

Before transcripts may be translated, they must first exit the nucleus. The nuclear envelope therefore serves as a barrier to translation and acts as an additional layer of gene regulation. mRNAs rely on export factors to ferry them across the nuclear pore. Current genome-wide analyses of metazoan mRNA export have focused largely on direct homologs of yeast export factors 64, 65. Studies to determine the mRNAs affected in the absence of the essential Drosophila export factors p15, NXF1, and UAP56 revealed overlapping roles for these proteins, reflective of a common export pathway for most transcripts 64. Interestingly, this study also uncovered small subsets of mRNAs that are unaffected by the loss of these export factors, as well as transcripts that are influenced by depletion of a specific factor 64. These data demonstrate a level of specificity of export factors for certain transcripts and point to the presence of unidentified export proteins. Future work addressing the direct cargoes of export factors as well as a systematic screen for other metazoan mRNA export proteins may help to resolve these questions.

Extensive characterization of export defects in Saccharomyces cerevisiae has uncovered numerous examples of coupling between the processes of splicing, mRNA quality control, and mRNA export (reviewed in 66). Although similar genome-scale investigations are lacking in metazoan systems, one study has identified a role for the U2 snRNP auxiliary factor, dU2AF50, in mRNA export. Profiling of Drosophila expressing a temperature-sensitive form of the essential dU2AF50 revealed an unexpected deficit in the export of intronless mRNA 28. Further investigation of transcripts by immunopurification and microarray analysis showed that the splicing factor associates with intronless mRNAs. Whether dU2AF50 subsequently recruits mRNA export factors is not known; however, these data provide genome-scale support for the model that RBPs participate in multiple levels of mRNA processing.

Multiple fates await mRNAs in the cytoplasm

mRNAs are subject to many fates in the cytoplasm. Transcripts may be localized to discrete cellular destinations, may associate with ribosomes to undergo translation, or may be degraded by cytoplasmic nucleases. The destiny of an mRNA may be directed by RBPs that associate with it in the nucleus and remain bound once in the cytoplasm. Shuttling proteins, as they provide an avenue of communication between nuclear and cytoplasmic events, may be highly informative regarding connections between splicing, export, localization, and translation. Although support for this theory remains confined to single gene studies, recent efforts have identified the cytoplasmic targets of two mammalian shuttling splicing factors 48. Immunopurification of mRNAs bound by either PTB or U2AF65 (the human homolog of dU2AF50) revealed that these factors associate with discrete populations 48. Transcripts bound by U2AF65 were enriched for transcription factors and cell cycle regulators. In contrast, those mRNAs associated with PTB were over-represented by intracellular transport, vesicle trafficking, and apoptosis-related genes 48. These data indicate that certain splicing factors have multifunctional responsibilities, and further, that there is specificity in the cytoplasmic roles of these proteins. Whether PTB and U2AF65 remain associated with nuclear targets in transit to the cytoplasm or have separate interactions with distinct mRNA populations once across the nuclear pore, however, is not known.

Messages on the move

Transcript localization is a mechanism to sequester mRNAs in the cellular region in which the encoded protein is required (reviewed in 67). Critical for phenomena that rely on asymmetric mRNA distribution, transcript localization utilizes sequence determinants, generally in the 3′ UTR of target transcripts, as well as RBPs that mediate mRNA trafficking to ensure localization to discrete cellular destinations 67. Emerging from focused studies in Xenopus and Drosophila oocytes is a finer definition of RBPs such as Staufen, Barentz, and VgRBPs 68 that are associated with specific mRNAs. These proteins associate either directly or indirectly with cis elements of the target transcript. Significant advances have also been made in the genome-scale description of dendritically localized mRNAs. Profiling of rodent and Aplysia neuronal processes has uncovered hundreds of localized transcripts 69, 70, 71, 72, some of which exhibit altered distribution upon neuronal activity 71. A common finding among these diverse studies is the enrichment of mRNAs encoding components of the translational machinery 69, 70, 71, 72. These data point to localized translation as an important mode of expression regulation in neurons and assert the significance of post-transcriptional control in neuronal function.

Even in non-polarized cells, certain mRNAs are targeted to discrete cytoplasmic organelles 73. In yeast, genome-scale analyses of nuclear transcripts translated in the vicinity of mitochondria have uncovered a role for the 3′ UTR in mRNA localization 74, 75. Sorting of specific mRNAs to the mitochondrial vicinity appears to be conserved in mammalian cells 76; however, a similar appreciation for metazoan mRNA localization to the mitochondrial outer membrane has not yet been realized. Still outstanding is the identification of RBPs responsible for the mitochondrial-targeting of nuclear-encoded transcripts. Recent reports have established the pleiotropic heterogeneous nuclear ribonucleoprotein K 77 and other RBPs 78 as localized within the mitochondria. Whether these RBPs are involved in directing transcripts to mitochondria is not known.

mRNAs are transported to sites of local protein synthesis often as components of large mRNP granules. Through immunopurification followed by either microarray analyses or mass spectrometry, the mRNA and protein constituents of distinct (but likely heterogeneous) granules have been investigated 42, 77, 79, 80, 81. In the cases of the granule-associated zipcode binding protein, IMP1, and the Fragile X mental retardation protein, sequence analyses identified cis motifs enriched among bound mRNAs 81, 82. Although many associated transcripts were present that do not harbor these motifs, other structural RNA elements may be present that have not been distinguished. In these and other studies, the connection between mRNA localization and translation is readily apparent as mRNP granules often contain ribosomal subunits and translation initiation factors 79, 81.

mRNAs in translation

The understanding that mRNAs experience differential regulation in the cytoplasm has motivated genome-scale studies to specifically investigate mRNA populations undergoing translation. Global translation rates are measured by microarray profiling of transcripts associated with multiple ribosomes (polysomes) (Figure 3). mRNAs are first separated by centrifugation through a sucrose gradient and then analyzed based on their association with fractions corresponding to individual ribosomal subunits (40S and 60S), monosomes (80S), or polysomes. Actively translated messages are typically associated with polysome fractions while translationally repressed messages may be sequestered in lighter gradient fractions 10.
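The gradient readout described above can be reduced to a simple per-transcript summary: the share of an mRNA's total signal that is recovered in polysome fractions. The following is a minimal sketch under stated assumptions. The fraction labels, function name, and signal values are hypothetical, and published studies differ in how they normalize and pool fractions.

```python
def polysome_fraction(profile, polysome_labels):
    """Share of a transcript's total gradient signal found in polysome fractions.

    profile: dict mapping gradient fraction label -> signal for one mRNA
    polysome_labels: labels of the fractions treated as polysomal
    Returns a value between 0 (fully in light fractions) and 1 (fully polysomal).
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no signal")
    bound = sum(profile[label] for label in polysome_labels)
    return bound / total

# Hypothetical signals for one mRNA across a sucrose gradient.
profile = {"mRNP": 5.0, "40S": 5.0, "60S": 5.0, "80S": 15.0,
           "light_polysome": 30.0, "heavy_polysome": 40.0}
share = polysome_fraction(profile, ["light_polysome", "heavy_polysome"])
print(f"polysome-associated share: {share:.2f}")
```

Comparing this share between conditions, alongside total mRNA levels, is one way to distinguish a change in translation from a change in transcript abundance.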

Figure 3

Strategies used to investigate mRNA populations. mRNAs may be examined for their translational or degradation status via purification procedures followed by microarray analysis. Translational profiling requires the separation of cytoplasmic mRNAs that are associated with multiple ribosomes (polysomes), often achieved through sucrose gradient fractionation. Lighter fractions contain RNAs associated with mRNPs and monosomes whereas heavier fractions contain RNAs bound to multiple ribosomes. Studies of mRNA turnover necessitate uncoupling of transcript synthesis from decay, generally achieved through use of agents that block transcription. mRNA is collected at various time points after the transcriptional block. Upon conversion to cDNA, samples are assessed by microarray analysis. Datasets may then be examined for groups of transcripts that share a common motif, either in primary or in secondary structure.

Various cellular stresses, including hypoxia, radiation, receptor-mediated cell death, and cytokine exposure, have been examined for their effects on global translation rates 83, 84, 85, 86. Research performed in higher eukaryotic organisms has also determined that conditions that have only modest effects on total mRNA levels can dramatically alter mRNA association with polysomes 83, 84, 85, 87, 88, 89. These studies point to the large potential for translation in the control of gene expression and highlight the need for a more thorough understanding of translational regulation.

One step towards meeting this goal involves investigation of RBPs for their specific translational role. Polysome profiling uncovered a restricted subset of transcripts that were affected upon the selective knockout of an individual isoform of the eukaryotic translation initiation factor 4E (eIF4E) in Caenorhabditis elegans 90. Interestingly, affected transcripts are related to egg laying or are expressed in neurons or muscle 90. In mammalian cells, over-expression of eIF4E also results in aberrant, increased translation of a subset of mRNAs 91. Common to the 5′ UTRs of many of these mRNAs is a hairpin structure sufficient to activate translation of a reporter transcript 91, indicating that eIF4E operates in part through recognition of mRNA structural elements. These data suggest that other initiation factor isoforms operate equally selectively, perhaps in a tissue-specific manner. Similar analyses of other initiation factor isoforms and RBPs for their impact on global translation may likewise aid in delineating post-transcriptional organization.

mRNA degradation

In addition to showing increased interest in translation regulation, researchers have been motivated to study the regulation of transcript abundance through decay routes 92, 93. Much like translation, mRNA turnover may be highly selective and dependent on cellular conditions and defined sequence elements. To specifically monitor mRNA turnover, transcript synthesis must be uncoupled from decay (Figure 3). Using a transcriptional blockage approach, investigators have uncovered both functional organization and shared sequence motifs, such as the adenosine and uracil rich element (ARE), among transcripts with similar turnover properties 92, 94, 95.
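Once transcription is blocked, the decline of a transcript over time can be summarized as a half-life. The sketch below assumes simple first-order (exponential) decay and fits a straight line to log-transformed abundance by ordinary least squares. The function name and the data are hypothetical, and real datasets may call for replicate handling or other decay models.

```python
from math import log

def half_life(times, abundances):
    """Estimate mRNA half-life from a time course after a transcriptional block.

    times: time points after the block (any consistent unit)
    abundances: transcript signal at each time point (must be positive)
    Assumes first-order decay, so ln(abundance) falls linearly with time.
    Returns the half-life in the same unit as `times`.
    """
    ys = [log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    decay_rate = -sxy / sxx  # negative slope of the log-linear fit
    if decay_rate <= 0:
        raise ValueError("abundance does not decrease over the time course")
    return log(2) / decay_rate

# Hypothetical time course (hours) for one transcript.
print(half_life([0, 1, 2, 4], [100.0, 71.0, 50.0, 25.0]))
```

Transcripts can then be grouped by estimated half-life and searched for shared sequence elements such as AREs.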

The importance of regulated mRNA turnover is highlighted by systems in which degradation is impaired. Mouse knockout studies of RBPs involved in decay illustrate both the consequences of transcript persistence and the networks of affected transcripts 96, 97, 98. Mice lacking the ARE-binding protein AUF1, though healthy, demonstrate fatal endotoxic shock resulting from the persistence of proinflammatory cytokine mRNAs 96. These data demonstrate the requirement for a select RBP under situation-specific circumstances. Interestingly, similar inflammation-associated phenotypes have been recognized in mice lacking Tristetraprolin or TIA-1 99. That these RBPs function by mediating the stability or translatability of mRNAs asserts the importance of post-transcriptional regulation in controlling the pathological over-expression of regulatory transcripts. While a number of target transcripts have been identified, whole-genome analyses of mRNAs affected by the loss of AUF1 or TIA-1 are needed to provide a finer understanding of transcript networks governed through ARE-binding proteins.

The interplay of mRNA stability and degradation emphasizes the role of the 3′ UTR in mRNP interactions; however, various parts of transcript anatomy (e.g. 5′ cap, poly(A)-tail) are prey for different forms of mRNA degradation 9. In addition to degradation via 5′ or 3′ exonucleases, transcripts may be cleaved internally by endonucleases. The specific effect of the endonuclease, inositol-requiring enzyme-1 (IRE1), on endoplasmic reticulum-associated mRNAs was recently determined through microarray analyses of transcripts from IRE1-depleted cells 100. This systems-level study identified potential IRE1 targets and elucidated a possible mechanism involving cotranslational translocation in IRE1-mediated mRNA decay 100.

Specific examination of the NMD pathway, a surveillance system that prevents the generation of defective proteins, has also received genome-scale attention 56, 101, 102, 103. Microarray studies of metazoan cells depleted of key NMD components revealed that 10% of transcripts are regulated by NMD 101, 102, 103, establishing this pathway as a significant contributor to gene regulation. Among affected transcripts are those that encode premature stop codons or that are incompletely processed, as well as gene products from transposons 103. Future work incorporating finer resolution subgenic microarrays may reveal a role for NMD in the degradation of non-coding transcripts. In addition, large-scale investigations of the exosome and of the 5′→3′ exonuclease, Xrn1, are necessary to assess the contribution of nucleases to global mRNA degradation in metazoan cells.

microRNAs in cytoplasmic mRNA processing

The specificity observed for many instances of transcript silencing, decay, or translation involves RBPs and cis sequence elements. Recent findings of microRNA association with polysomes and with translationally silent processing bodies (P-bodies) have additionally positioned microRNAs at the nexus of these cytoplasmic events (reviewed in 104). The finding that both RBPs and microRNAs bind elements in transcript 3′ UTRs has led to the hypothesis that these factors act antagonistically, directing the destiny of a transcript by recognition of the same or overlapping sequence determinants 105. Although this model has not been examined on a global scale, future investigations into the relationship among translation, degradation, and stabilization may benefit from incorporating information regarding microRNA target predictions in efforts to understand transcript fate.

The future of metazoan systems mRNA processing

The last decade has witnessed a vast increase both in the interest and in the technological ability to decipher post-transcriptional gene regulation. Advances in genomics and microarrays and their analyses have enabled organismal-scale characterization of expression networks and the identification of RBP mRNA targets. In addition, computational approaches to synthesize information from mRNP networks have led to the ability to predict novel mRNA targets through determination of cis elements that facilitate RBP/mRNA interactions 41, 43, 45. Currently, however, hundreds of RBPs exist for which protein function remains largely conjectural and target mRNAs are completely unknown. The discovery of dozens of “orphan” mammalian 3′ UTR regulatory elements 106 additionally indicates that many regulatory elements and binding interactions have yet to be interrogated. Clearly, much research remains outstanding.

To fully evolve a systems view of mRNA processing, results from different types of experiments must be integrated into a comprehensive understanding of the system. This will include synthesizing information obtained through knockout or knockdown phenotype studies, expression analyses, and protein-protein and protein-RNA interaction mapping. Data must be collected from comparable systems so that results from differing approaches may be directly compared. In addition, multiple RBPs must be evaluated for their bound mRNPs in the same system in order to discern the combinatorial nature of these gene regulators. Future studies that integrate different data types and information about multiple RBPs will undoubtedly broaden our understanding of metazoan mRNA processing.