Landscape and variation of RNA secondary structure across the human transcriptome

Abstract

In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program1. However, the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSSs) in a human family trio (mother, father and their child). This provides a comprehensive RSS map of human coding and non-coding RNAs. We identify unique RSS signatures that demarcate open reading frames and splicing junctions, and define authentic microRNA-binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1,900 transcribed single nucleotide variants (approximately 15% of all transcribed single nucleotide variants) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSSs. Selective depletion of ‘riboSNitches’ versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3′ untranslated regions, binding sites of microRNAs and RNA-binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.

from$8.99

All prices are NET prices.

Figure 1: PARS reveals the landscape of human RNA structure.
Figure 2: RSS signatures of post-transcriptional regulation.
Figure 3: PARS identifies riboSNitches genome-wide.
Figure 4: Genetic evidence for functional RSS elements in the transcriptome.

Accession codes

Accessions

Gene Expression Omnibus

Data deposits

Data have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE50676.

References

  1. 1

    Wan, Y., Kertesz, M., Spitale, R. C., Segal, E. & Chang, H. Y. Understanding the transcriptome through RNA structure. Nature Rev. Genet. 12, 641–655 (2011)

  2. 2

    Kertesz, M. et al. Genome-wide measurement of RNA secondary structure in yeast. Nature 467, 103–107 (2010)

  3. 3

    Li, F. et al. Global analysis of RNA secondary structure in two metazoans. Cell. Rep. 1, 69–82 (2012)

  4. 4

    Shabalina, S. A., Ogurtsov, A. Y. & Spiridonov, N. A. A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res. 34, 2428–2437 (2006)

  5. 5

    Barash, Y. et al. Deciphering the splicing code. Nature 465, 53–59 (2010)

  6. 6

    Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136, 215–233 (2009)

  7. 7

    Skalsky, R. L. et al. The viral and cellular microRNA targetome in lymphoblastoid cell lines. PLoS Pathog. 8, e1002484 (2012)

  8. 8

    Marín, R. M., Voellmy, F., von Erlach, T. & Vanicek, J. Analysis of the accessibility of CLIP bound sites reveals that nucleation of the miRNA:mRNA pairing occurs preferentially at the 3′-end of the seed match. RNA 18, 1760–1770 (2012)

  9. 9

    Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol. Cell 27, 91–105 (2007)

  10. 10

    Ritz, J., Martin, J. S. & Laederach, A. Evaluating our ability to predict the structural disruption of RNA by SNPs. BMC Genomics 13, (Suppl. 4) S6(2012)

  11. 11

    Halvorsen, M., Martin, J. S., Broadaway, S. & Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 6, e1001074 (2010)

  12. 12

    Ouyang, Z., Snyder, M. P. & Chang, H. Y. SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Res. 377–387 (2013)

  13. 13

    Salari, R., Kimchi-Sarfaty, C., Gottesman, M. M. & Przytycka, T. M. Sensitive measurement of single-nucleotide polymorphism-induced changes of RNA conformation: application to disease studies. Nucleic Acids Res. 41, 44–53 (2013)

  14. 14

    Katz, Y., Wang, E. T., Airoldi, E. M. & Burge, C. B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods 7, 1009–1015 (2010)

  15. 15

    Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012)

  16. 16

    Spitale, R. C. et al. RNA SHAPE analysis in living cells. Nature Chem. Biol. 9, 18–20 (2013)

  17. 17

    Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012)

  18. 18

    Wilkinson, K. A., Merino, E. J. & Weeks, K. M. Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nature Protocols 1, 1610–1616 (2006)

  19. 19

    Lewis, B. P., Burge, C. B. & Bartel, D. P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20 (2005)

  20. 20

    Chi, S. W., Zang, J. B., Mele, A. & Darnell, R. B. Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps. Nature 460, 479–486 (2009)

  21. 21

    König, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nature Struct. Mol. Biol. 17, 909–915 (2010)

  22. 22

    Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009)

  23. 23

    International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003)

  24. 24

    Anders, G. et al. doRiNA: a database of RNA interactions in post-transcriptional regulation. Nucleic Acids Res. 40, D180–D186 (2012)

  25. 25

    Wilbert, M. L. et al. LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Mol. Cell 48, 195–206 (2012)

  26. 26

    Macias, S. et al. DGCR8 HITS-CLIP reveals novel functions for the Microprocessor. Nature Struct. Mol. Biol. 19, 760–766 (2012)

Download references

Acknowledgements

We thank members of the Chang laboratory, S. Rouskin, and J. Weissman, A. Mele and R. Darnell for discussion. This work is supported by NIH R01-HG004361 (H.Y.C. and E.S.). H.Y.C. is an Early Career Scientist of the Howard Hughes Medical Institute.

Author information

H.Y.C. conceived the project; Y.W. and H.Y.C. developed the protocol and designed the experiments; Y.W. and R.A.F. performed experiments; Y.W., K.Q., Q.C.Z., O.M., Z.O., J.Z., R.C.S., M.P.S., E.S., and H.Y.C. planned and conducted the data analysis; Y.W., K.Q. and H.Y.C. wrote the paper with contributions from all authors.

Correspondence to Yue Wan or Howard Y. Chang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 PARS data accurately maps to known structures.

a, RNase V1 and S1 nucleases were titrated to single hit kinetics in structure probing. Gel analysis of structure probing of yeast RNA in the presence of 1 µg of total human RNA using different dilutions of RNase V1 (lanes 4, 5), and S1 nuclease (lanes 6, 7), cleaved at 37 °C for 15 min. In addition, RNase T1 ladder (lane 2), alkaline hydrolysis (lane 1), and no nuclease treatment (lane 3) are shown. Dilution of V1 nuclease by 1:500 and S1 nuclease by 1:50 results in mostly intact RNA. b, PARS signal obtained for the P9–9.2 domain of Tetrahymena ribozyme using the double-strand enzyme RNase V1 (red line) or the single-strand enzyme S1 nuclease (green line) accurately matches the signals obtained by traditional footprinting (blue lines). c, Top 10th percentile of PARS scores (double-stranded, red arrows) and bottom 10th percentile of PARS score (single-stranded, green arrows) were mapped to the secondary structure of the Tetrahymena ribozyme.

Extended Data Figure 2 PARS data are reproducible between biological replicates.

a, Scatter plot of mRNA abundance between the cell lines GM12878, GM12891 and GM12892 indicates that gene expression between the cells is highly correlated (R > 0.9). b, Cumulative frequency distribution of the Pearson correlation of PARS scores in 20 nucleotide windows, with a coverage of at least 10 reads per base, in transcripts between the cells GM12878 versus GM12891, GM12878 versus GM12892 and GM12891 versus GM12892. The black dotted lines indicate the fraction of windows that are positively correlated. c, Cumulative frequency distribution of the Pearson correlation of PARS scores in 20 nucleotide windows, with a coverage of at least 10 reads per base, between GM12878 refolded transcripts versus biological replicate 1 of GM12878 native deproteinized transcripts, GM12878 refolded transcripts versus biological replicate 2 of GM12878 native deproteinized transcripts, as well as between the two biological replicates of native deproteinized transcripts.

Extended Data Figure 3 PARS can be applied to native deproteinized RNAs.

a, Schematic of PARS on native deproteinized transcripts. b, Gel analysis of structure probing of yeast RNA using RNase V1 in RNA structure buffer (lane 3), RNase V1 in lysis buffer containing 1% NP40, 0.1% SDS and 0.25% Na deoxycholate (lanes 5 and 6), S1 nuclease in RNA structure buffer (lane 4) and S1 nuclease in lysis buffer (lanes 7 and 8). In addition, RNase T1 ladder (lane 2) and alkaline hydrolysis (lane 1) are shown. The enzymes appear to cleave similarly in lysis buffer and in structure buffer. c, Structure probing of native deproteinized snoRNA74A. Top 10th percentile of PARS scores (high, red arrows) and bottom 10th percentile of PARS score (low, green arrows) were mapped onto the secondary structure model of snoRNA74A. d, Deep sequencing and mapping of PARS reads on native deproteinized transcripts provided structural information for thousands of transcripts, including coding and non-coding RNAs. e, We compared Pearson correlations of 20 nucleotide windows with a coverage of at least 100 reads (coverage ≥ 5) between transcripts that were refolded and native deproteinized. The y axis indicates the fraction of negatively correlated windows (R < 0) over the total number of windows for each RNA class. f, PARS scores across exon–exon junctions, averaged across all native deproteinized transcripts (load ≥ 1). Percentage of nucleotide C plus G was averaged across the transcripts.

Extended Data Figure 4 Increased accessibility 5′ of miRNA-target sites influences AGO binding.

a, Bases that show significantly different PARS scores between AGO bound and non-bound sites in PAR-CLIP. Base 0 is the most 5′ position of the mRNA that directly base-pairs with the miRNA seed region. The y axis indicates log10 of the P value, calculated by the Wilcoxon rank-sum test. b, Metagene analysis of the average AGO-bound reads using iCLIP in predicted miRNA-target sites that are single-stranded (green) or double-stranded (red) from bases −3 to 1. c, d, Average PARS score is calculated for bases −3 to 1 for each Targetscan-predicted site. Change in gene expression is plotted for genes with most accessible (100) and least accessible (100) sites, upon overexpression of miRNA142 (c) and miRNA148 (d). P value is calculated using the Wilcoxon rank-sum test. Whiskers of box plots indicate extreme values.

Extended Data Figure 5 PARS identified riboSNitches in the human transcriptome.

a, Cumulative frequency plot of PARS score differences between SNVs (GM12891 versus GM12892), doped in controls and structured RNAs including ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). Dotted black line indicates the threshold beyond which we call an SNV a riboSNitch. The x axis indicates the absolute change in PARS score between GM12891 and GM12892. b, Absolute change in PARS score around heterozygous, homozygous riboSNitches and biological noise. The red line indicates the change in PARS score between sequences that are the same (noise) across individuals. The blue line indicates the change in PARS score between two sequences that have a riboSNitch. The purple line indicates the change in PARS score between homozygous riboSNitches. c, Cumulative frequency plot of the eSDC for transcripts that contain or do not contain SNVs eSDC = (1− Pearson correlation) × sqrt(transcript length). d, Transcripts are ranked according to eSDC score and classified into the top 2,000 most and least structurally disrupted transcripts. The most structurally disrupted transcripts are more likely to contain SNVs, whereas the least structurally disrupted transcripts are less likely to contain SNVs. e, Pie chart showing the distribution of structurally changing bases (P = 0.05, FDR = 0.1) in transcripts with SNVs, riboSNitches, indels and no SNVs and no indels. 78.2% of these bases reside in transcripts with either SNVs or indels, indicating that nucleotide sequence is important for RNA structure. f, Number of riboSNitches identified by PARS between each pair of individuals in the trio. Grey indicates non-structurally-changing SNVs, red indicates riboSNitches.

Extended Data Figure 6 Footprinting validation of a riboSNitch in 5′ UTRs of MRPS21 identified by PARS.

a, Gel analysis of 150mer fragments of MRPS21 RNA using S1 nuclease (lanes 5 (father), 6 (mother)), and SHAPE probing ((lanes 9 (father), 10 (mother)). In addition, sequencing lanes (lanes 1, 2), uncut (lane 3 (father), lane 4 (mother), and DMSO-treated lanes (lane 7 (father), lane 8, (mother)) are also shown. Black arrows indicate the change in structure between the father’s and mother’s alleles. b, Top, the sequence of a portion of the transcript containing the riboSNitch was shown. The riboSNitch is in red. Bottom, single-strand profile by S1 sequencing of the father’s and mother’s alleles. The y axis indicates the percentage of signal at each base over the total signal in the region. c, d, Semi-automated footprinting analysis (SAFA) quantification of manual structure probing of both MRPS21 alleles using S1 nuclease (c) and SHAPE (d). e, S1 sequencing reads are mapped uniquely to either the A or C allele in the child. The grey box indicates the bases that show structural differences by allele-specific mapping in the child. f, Gel analysis of 150-mer fragments of MRPS21 RNA using DMS footprinting (lanes 1, 2 and 3 (father), 4, 5 and 6 (mother)). Black arrows indicate the change in structure between father’s and mother’s alleles. g, Quantification of DMS footprinting of both MRPS21 alleles using SAFA.

Extended Data Figure 7 Footprinting validation of a riboSNitch in HLA-DRB1 transcript identified by PARS.

a, The sequence of a portion of the transcript containing the riboSNitch was shown. The riboSNitch is in red. Gel analysis of two fragments of HLA-DRB1 RNA A and G alleles using S1 nuclease (lanes 5 (mother), 6 (father)), and SHAPE probing ((lanes 9 (mother), 10 (father)). In addition, sequencing lanes (lanes 1, 2), uncut lanes (lane 3 (mother), lane 4 (father)), and DMSO treated lanes (lane 7 (mother), lane 8, (father)) are also shown. Black arrows indicate the change in structure between the father’s and mother’s alleles. b, S1 sequencing reads across the riboSNitch for both father and mother. c, d, SAFA quantification of the RNA footprinting of both alleles using S1 nuclease (c) and SHAPE (d). e, Gel analysis of two fragments of HLA-DRB1 RNA A and G alleles using DMS (lanes 1, 3 and 4 (mother), 2, 5 and 6 (father)). Black arrows indicate the change in structure between father’s and mother’s alleles. f, Quantification of DMS footprinting of both HLA-DRB1 alleles using SAFA. g,h, Secondary structure models of the G alelle (g) and A allele (h) of HLA-DRB1, using SeqFold guided by PARS data. The two alleles of the riboSNitch are shown in orange and blue respectively. The red and green circles indicate bases with PARS scores ≥ 1 and ≤ −1, respectively.

Extended Data Figure 8 Footprinting validation of a riboSNitch in WSB1 transcript identified by PARS.

a, The sequence of a portion of the WSB1 transcript containing the riboSNitch was shown. The riboSNitch is in red. Gel analysis of two fragments of WSB1 RNA T and C alleles using RNase V1 (lanes 5 (mother), 6 (father)), S1 nuclease (lanes 7 (mother), 8 (father)), and SHAPE probing ((lanes 9 (mother), 10 (father)). In addition, sequencing lanes (lanes 1, 2), DMSO uncut lanes (lane 3 (mother), lane 4 (father)) are also shown. Black arrow indicates the change in structure between the father’s and mother’s alleles. b, Fraction of S1 sequencing reads over total S1 sequencing reads in the region, across the riboSNitch for both father and mother. c, d, SAFA quantification of the RNA footprinting of both alleles using S1 nuclease (c) and SHAPE (d).

Extended Data Figure 9 Additional footprinting validation of riboSNitches.

a, Top, gel analysis of fragments of father’s and mother’s alleles of HLA-DQA1, hnRNP-AB, HLA-DRA, LDHA, XRCC5, FNBP1 and YWHAB using SHAPE (lanes 4 (father), 6 (mother)). In addition, DMSO controls (lanes 3 (father), 5 (mother)) and ladder lanes (lanes 1 (T ladder), 2 (G ladder)) are also shown. The black line indicates the position of the SNV. The yellow bar along the side of the gel indicates the region that is changing between the father's and mother's alleles. Bottom, difference in PARS signal between father (GM12891) and mother (GM12892), centred at the riboSNitch. Positive PARS score indicates double stranded RNA, and should correspond to lower SHAPE signal. Negative PARS score indicates unpaired RNA with correspondingly higher SHAPE signal. Six out of seven cloned RNAs are validated by SHAPE in vitro. hnRNP-AB showed multiple differences surrounding the SNV; SHAPE data confirmed the riboSNitch and showed the structural rearrangement is more complex than indicated by PARS. SHAPE data of YWHAB did not show the predicted RSS difference. b, Bar graphs showing the number of homozygous SNVs in parents that are validated (in red) and not validated (grey) in the child by allele specific mapping. Homozygous riboSNitches between the father and mother are mapped to both the renatured child RNA (in vitro; child) and the native deproteinized child RNA (native deproteinized; child). As the depth of coverage is lower in native deproteinized samples, we detect fewer (31) SNVs that were homozygously different in the parents.

Extended Data Figure 10 Properties of riboSNitches.

a, b, Average PARS-score difference around SNVs that originally reside in increasingly single-stranded (a) or increasingly double-stranded (b) region. c, Average PARS-score difference around SNVs that were flanked by both double-stranded bases, both single-stranded bases, or one single- and one double-stranded base on each side. d, Density of other SNVs centred around riboSNitches versus a control group of 2,450 non-structure-changing SNVs. P value calculated by Kolmogorov–Smirnov test. e, Distribution of top 10% most structurally disruptive riboSNitches, calculated by biggest structural difference between the two alleles, versus a control group of 1,855 SNVs that do not change structure in 5′ UTRs, CDS and 3′ UTRs. f, Different genomic features or annotations of 993 unique riboSNitches are compared to 1,009 control SNVs. For each genomic annotation, the fraction of riboSNitches that reside in the genomic region covered by the annotation (for example, histone mark) was compared to the fraction of control SNVs by Student’s t-test. A cutoff value of P = 0.05 (t-test) was used.

Supplementary information

Supplementary Tables

This file contains Supplementary Tables 1-6. (PDF 2968 kb)

PowerPoint slides

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Wan, Y., Qu, K., Zhang, Q. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014). https://doi.org/10.1038/nature12946

Download citation

Further reading

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.