Classical approaches to determining the structures of noncoding RNAs (ncRNAs) probed only one RNA at a time with enzymes and chemicals, using gel electrophoresis to identify reactive positions. To accelerate RNA structure inference, we developed fragmentation sequencing (FragSeq), a high-throughput RNA structure probing method that applies RNA sequencing to fragments generated by digestion with nuclease P1, which specifically cleaves single-stranded nucleic acids. In experiments probing the entire mouse nuclear transcriptome, we accurately and simultaneously mapped single-stranded RNA regions in multiple ncRNAs of known structure. We probed two cell types to verify reproducibility. We also identified and experimentally validated structured regions in ncRNAs for which, to our knowledge, no probing data had previously been reported.
Gene Expression Omnibus
A.V.U. was supported in part by US National Institutes of Health (NIH) bioinformatics training grant 1 T32 GM070386-01 and by a US National Science Foundation Graduate Research fellowship. S.K. was supported in part by NIH National Human Genome Research Institute grant U41 HG004568-01. C.S.O. was supported by California Institute for Regenerative Medicine training grant T3-00006. This study was funded in part by NIH R01HG004002 to D.H.M. and NIH 1R03DA026061-01 to S.R.S. We thank D. Bernick, S. Kuersten and O. Uhlenbeck for helpful discussions; Y. Ponty for adding the feature to display enzymatic/chemical modifications to VARNA, the program used to visualize our probing data; E. Farias-Hesson and N. Pourmand of the University of California Santa Cruz Genome Sequencing Center for preparing samples; workers at ABI for carrying out the sequencing; and M. Storm and F. Ng of ABI for facilitating that sequencing run.
Stockholm-format (machine-readable) multiple alignment of U15b C/D box snoRNA homologs, containing structure models that were evaluated. See file for detailed comments.
Stockholm-format (machine-readable) multiple alignment of U22 C/D box snoRNA homologs, containing structure models that were evaluated. See file for detailed comments.
Stockholm-format (machine-readable) multiple alignment of U97 C/D box snoRNA homologs, containing structure models that were evaluated. See file for detailed comments.
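For readers who want to inspect these alignments programmatically, the following is a minimal sketch of a Stockholm reader. It is not part of the supplementary software: the function name `parse_stockholm` and the toy alignment are hypothetical, and the sketch assumes that the structure models are stored as `#=GC` per-column annotation lines (for example `SS_cons`), which is the usual Stockholm convention but should be checked against the comments in each file.

```python
from typing import Dict, Iterable, Tuple


def parse_stockholm(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (aligned sequences, per-column #=GC annotations) from one alignment."""
    seqs: Dict[str, str] = {}
    gc: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines separate interleaved blocks
        if line.startswith("//"):
            break  # end-of-alignment marker
        if line.startswith("#=GC"):
            # Per-column annotation: "#=GC <tag> <text>"
            _, tag, text = line.split(None, 2)
            gc[tag] = gc.get(tag, "") + text.strip()
        elif line.startswith("#"):
            continue  # header, comments and other markup (#=GF, #=GS, #=GR)
        else:
            # Sequence line: "<name> <aligned sequence>"
            name, text = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + text.strip()
    return seqs, gc


# Toy alignment, invented for illustration only.
example = [
    "# STOCKHOLM 1.0",
    "#=GF ID toy",
    "seqA  GGGAAACCC",
    "seqB  GGCAAAGCC",
    "#=GC SS_cons  <<<...>>>",
    "//",
]
seqs, gc = parse_stockholm(example)
print(sorted(seqs), gc["SS_cons"])
```

Concatenating repeated names lets the same loop handle interleaved files, in which each sequence is split across several blocks.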
FASTA-format file of sequences used for filtering out sequencing reads prior to mapping to genome (see Methods).
Six-column BED-format file containing genomic coordinates (mm9 genome assembly) of all RNAs examined in this study. This can be uploaded to the UCSC Genome Browser as a custom track.
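As a minimal sketch of how a six-column BED file of this kind might be read outside the Genome Browser, the example below parses the standard BED6 columns (chrom, chromStart, chromEnd, name, score, strand). The function name `parse_bed6` and the example record are hypothetical and are not taken from the supplementary file. BED coordinates are 0-based and half-open, so the feature length is `end - start`.

```python
from typing import Iterable, List, NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str
    score: str
    strand: str


def parse_bed6(lines: Iterable[str]) -> List[BedRecord]:
    """Parse BED6 lines, skipping blank lines and track/browser/comment lines."""
    records: List[BedRecord] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()  # BED fields are tab- or whitespace-separated
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}: {line!r}")
        chrom, start, end, name, score, strand = fields[:6]
        records.append(BedRecord(chrom, int(start), int(end), name, score, strand))
    return records


# Invented record for illustration; real coordinates come from the supplementary file.
example = [
    "track name=example",
    "chr1\t100\t250\texampleRNA\t0\t+",
]
records = parse_bed6(example)
print(records[0].name, records[0].end - records[0].start)
```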
FragSeq algorithm implementation, configuration files and Readme. All FragSeq algorithm software, scripts and configuration files needed to reproduce the analysis in this paper are provided. The Readme file contains complete instructions on how to rerun our analysis. Read mappings are not included because of their large size; they must be downloaded from the Gene Expression Omnibus (GEO) using the accession number listed in the paper (see the Readme file). The script dpToVarna.py is also provided (Supplementary Note 3).