Comprehensive analysis of RNA-Seq data reveals extensive RNA editing in a human transcriptome

Journal name: Nature Biotechnology
Volume: 30
Pages: 253–260
DOI: doi:10.1038/nbt.2122

Abstract

RNA editing is a post-transcriptional event that recodes hereditary information. Here we describe a comprehensive profile of the RNA editome of a male Han Chinese individual based on analysis of ~767 million sequencing reads from poly(A)+, poly(A)− and small RNA samples. We developed a computational pipeline that carefully controls for false positives while calling RNA editing events from genome and whole-transcriptome data of the same individual. We identified 22,688 RNA editing events in noncoding genes and in the introns, untranslated regions and coding sequences of protein-coding genes. Most changes (~93%) converted A to I(G), consistent with known editing mechanisms based on adenosine deaminase acting on RNA (ADAR). We also found evidence of other types of nucleotide changes; however, these were validated at lower rates. We found 44 editing sites in microRNAs (miRNAs), suggesting a potential link between RNA editing and miRNA-mediated regulation. Our approach facilitates large-scale studies to profile and compare editomes across a wide range of samples.
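The abstract describes calling editing events by comparing RNA-Seq reads with the genome of the same individual. The per-position core of such a comparison can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the thresholds (MIN_DEPTH, MIN_ALT_READS, MIN_LEVEL), the Call record and the call_site function are hypothetical, the genomic base is assumed homozygous, and the real pipeline applies several additional stringent filters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds for illustration only; not the values used in the study.
MIN_DEPTH = 10      # minimum RNA read coverage at the site
MIN_ALT_READS = 3   # minimum reads supporting the non-genomic base
MIN_LEVEL = 0.1     # minimum fraction of reads carrying the non-genomic base


@dataclass
class Call:
    ref: str        # genomic base, assumed homozygous in the same individual's DNA
    alt: str        # most frequent non-genomic base among the RNA reads
    level: float    # editing level = alt-supporting reads / total reads at the site
    edit_type: str  # e.g. "A>I(G)" or "C>T"


def call_site(genomic_base: str, rna_bases: str) -> Optional[Call]:
    """Return a candidate RNA-DNA difference at one position, or None if filtered.

    rna_bases holds the base observed in each RNA read covering the position,
    already reported on the transcript strand (an assumption of this sketch).
    """
    counts = Counter(b for b in rna_bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < MIN_DEPTH:
        return None
    ref = genomic_base.upper()
    mismatches = {b: n for b, n in counts.items() if b != ref}
    if not mismatches:
        return None
    alt, alt_reads = max(mismatches.items(), key=lambda kv: kv[1])
    level = alt_reads / depth
    if alt_reads < MIN_ALT_READS or level < MIN_LEVEL:
        return None
    # The sequencer reads inosine as guanosine, so A-to-I editing appears as A>G.
    edit_type = "A>I(G)" if (ref, alt) == ("A", "G") else f"{ref}>{alt}"
    return Call(ref, alt, level, edit_type)


print(call_site("A", "AAAAAAGGGG"))
```

For ten reads at a genomic A, four of which show G, the sketch returns a call with editing level 0.4 and type A>I(G); a site covered by fewer than MIN_DEPTH reads is rejected outright.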


Figures

    Figure 1: High-throughput sequencing and bioinformatics for profiling the RNA editome of an individual.

    (a) Schematic depiction of the experimental design of the study. Total RNA was isolated from the lymphoblastoid cell line (LCL) derived from a male Han Chinese individual (YH) and processed into three different libraries for high-throughput whole-transcriptome sequencing. (b) Overview of the algorithm for calling RNA editing sites or RNA-centric SNVs. The pipeline takes raw sequencing reads as input, filters them on the basis of several stringent criteria and outputs the inferred variants for further analysis. (c) Accuracy and sensitivity of the pipeline at each filter stage, evaluated as successive filters were applied to simulated reads harboring A→G variants at known positions catalogued in DARNED (see Methods). Accuracy is measured by the false discovery rate (FDR; dotted lines). Sensitivity (SN; gray bars) is the fraction of simulated editing sites that were called. The pipeline retained high sensitivity while substantially reducing false positives. (d) Validation of inferred editing sites from RNA-Seq by Sanger sequencing. Sequencing chromatogram traces from two exemplary gene loci, CLEC2D and PLEKHA9, are shown. The editing positions (located in the intron of CLEC2D and the coding sequence of PLEKHA9) are highlighted by yellow shading. Note the clustering of editing sites in the CLEC2D transcript. Top trace, genomic DNA (gDNA); bottom trace, cDNA.

    Figure 2: Characterization of the editing sites in poly(A)+ and poly(A)− RNAs.

    (a,b) Distribution of editing sites in the poly(A)+ (left) and poly(A)− (right) RNA transcriptomes by editing type (a) or by incidence per unit length (Mbp) (b) in the indicated structure classes. 'Unknown' denotes editing sites located in regions with conflicting annotations in the database. (c–e) Distribution of RNA editing levels in poly(A)+ RNAs (c), the protein-coding (CDS) region of mRNA (d) or poly(A)− RNAs (e). (f) Two examples of noncoding RNA genes with multiple edits (Jpx/NR_024582, top; Malat1/NR_002819, bottom) that, to our knowledge, have not previously been reported to show evidence of RNA editing. RNA editing sites identified from YH RNA-Seq data are highlighted by red boxes. Green boxes denote editing sites from DARNED. (g,h) Frequency of nucleotides in the flanking sequences (10 bp both upstream and downstream) of the editing loci in poly(A)+ RNA (g) and in poly(A)− RNA (h). The editing loci are denoted as nucleotide position 0. (i) Conservation of editing sites in poly(A)+ RNA (left) and poly(A)− RNA (right). Total sites and the non-Alu component of the data set were analyzed independently, as indicated. Shown are the fractions of all, A→G or non-A→G sites that are located in the most evolutionarily conserved regions (score ≥200 in the UCSC conservation table). Statistical significance of the A→G versus non-A→G comparisons was calculated by Fisher's exact test.

    Figure 3: The overlap of RNA editing sites between different data sets.

    (a) Extent of overlap in editing sites between data sets in terms of nucleotide position ('site') and corresponding gene ('gene'). The YH data were compared with those of DARNED and the breast cancer RNA-Seq study. Proportions of sites and genes that are unique or common between data sets are shown. (b,c) Examples of genes with multiple editing sites. Distribution of sites (nucleotide positions indicated on the right) in an mRNA gene (ARPC2) (b) and a noncoding gene (SLC35E3) (c) is shown.

    Figure 4: Functional link of RNA editing to other post-transcriptional events.

    (a) Distribution of editing sites relative to miRNA target sites in the 3′-UTR and possible consequences of RNA editing. In total, 1,905 possible miRNA target sites were predicted in the 3′-UTRs. RNA editing may disrupt or switch miRNA recognition if editing occurs within the part of a target site that matches the miRNA seed region (“altered”). Editing outside predicted miRNA target sites (“no match”) may generate new miRNA targets (“new targets”). (b) Distribution of RNA edits identified in miRNAs. (c,d) Examples of miRNA species that harbor RNA edits. The most abundant perfect-match and single-mismatch reads from the hsa-mir-200b (c) and hsa-mir-548o (d) loci support A→I editing in the seed region and G→A editing, respectively.
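Figure 1c evaluates the pipeline on simulated reads in terms of FDR and sensitivity. As a minimal sketch, assuming sites are represented as hypothetical (chromosome, position) tuples, both metrics can be computed for the calls surviving each filter stage against the simulated truth set; the function name evaluate and the toy data are illustrative only.

```python
from typing import Iterable, Set, Tuple

Site = Tuple[str, int]  # hypothetical representation: (chromosome, 1-based position)


def evaluate(called: Iterable[Site], simulated: Iterable[Site]) -> Tuple[float, float]:
    """Return (FDR, sensitivity) of called sites against simulated truth sites."""
    called_set: Set[Site] = set(called)
    truth: Set[Site] = set(simulated)
    true_pos = len(called_set & truth)
    false_pos = len(called_set - truth)
    # FDR: fraction of calls that are not simulated sites.
    fdr = false_pos / len(called_set) if called_set else 0.0
    # Sensitivity: fraction of simulated sites that were recovered.
    sensitivity = true_pos / len(truth) if truth else 0.0
    return fdr, sensitivity


# Toy example: calls before and after a hypothetical filter stage.
truth = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 75)}
stages = {
    "before filtering": truth | {("chr2", 60), ("chr3", 10)},
    "after filtering": {("chr1", 100), ("chr1", 200), ("chr2", 50)},
}
for name, calls in stages.items():
    fdr, sn = evaluate(calls, truth)
    print(f"{name}: FDR={fdr:.2f}, SN={sn:.2f}")
```

The toy run prints FDR=0.33, SN=1.00 before filtering and FDR=0.00, SN=0.75 after, illustrating the trade-off the filter stages are tuned to manage: removing false positives at a modest cost in sensitivity.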
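Figure 2g,h summarizes nucleotide frequencies in the 10 bp flanking each editing locus, with the edited base at position 0. A minimal sketch of that tabulation, assuming a transcript-strand sequence and 0-based site indices as hypothetical inputs (flank_frequencies is an illustrative name, not from the study):

```python
from typing import Dict, List

BASES = "ACGT"


def flank_frequencies(sequence: str, sites: List[int], flank: int = 10) -> Dict[int, Dict[str, float]]:
    """Nucleotide frequency at each offset from -flank to +flank around edited sites.

    Offset 0 is the edited base; negative offsets are upstream (5') and
    positive offsets downstream (3'). U is treated as T.
    """
    sequence = sequence.upper().replace("U", "T")
    counts = {off: dict.fromkeys(BASES, 0) for off in range(-flank, flank + 1)}
    for site in sites:
        for off in range(-flank, flank + 1):
            i = site + off
            # Skip positions that run off either end or are not A/C/G/T.
            if 0 <= i < len(sequence) and sequence[i] in BASES:
                counts[off][sequence[i]] += 1
    freqs: Dict[int, Dict[str, float]] = {}
    for off, c in counts.items():
        total = sum(c.values())
        # Offsets with no observed bases get all-zero frequencies.
        freqs[off] = {b: c[b] / total if total else 0.0 for b in BASES}
    return freqs


print(flank_frequencies("CAGTAG", [1, 4], flank=1))
```

With edited adenosines at indices 1 and 4 of CAGTAG, offset 0 is all A, offset +1 is all G, and offset −1 splits evenly between C and T.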
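Figure 3a compares editing sites between data sets at two granularities: exact nucleotide position ('site') and gene ('gene'). A minimal sketch with plain set operations, assuming a hypothetical record layout of (chromosome, position, gene) and coordinates on the same genome build in both data sets; the names overlap_summary and partition and the toy records are illustrative.

```python
from typing import Dict, Set, Tuple

# Hypothetical record layout: (chromosome, position, gene name). Coordinates of
# both data sets are assumed to refer to the same genome build.
Record = Tuple[str, int, str]


def partition(a: set, b: set) -> Dict[str, int]:
    """Sizes of the parts of two sets: unique to a, shared, unique to b."""
    return {"unique_first": len(a - b), "common": len(a & b), "unique_second": len(b - a)}


def overlap_summary(first: Set[Record], second: Set[Record]) -> Dict[str, Dict[str, int]]:
    """Compare two editing-site collections by nucleotide position and by gene."""
    sites_1 = {(chrom, pos) for chrom, pos, _ in first}
    sites_2 = {(chrom, pos) for chrom, pos, _ in second}
    genes_1 = {gene for _, _, gene in first}
    genes_2 = {gene for _, _, gene in second}
    return {"site": partition(sites_1, sites_2), "gene": partition(genes_1, genes_2)}


yh = {("chr1", 100, "GENE_A"), ("chr1", 150, "GENE_A"), ("chr2", 10, "GENE_B")}
other = {("chr1", 100, "GENE_A"), ("chr5", 20, "GENE_C")}
print(overlap_summary(yh, other))
```

Proportions like those in the figure follow by dividing each count by the size of the union. Gene-level overlap is typically higher than site-level overlap because two studies can report different positions within the same gene.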
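Figure 4c,d relies on reads that differ from the mature miRNA at a single position, classified by whether the change falls in the seed region. A minimal sketch, assuming reads already aligned end to end to the mature sequence and a seed defined as nucleotides 2–8 (both assumptions; mismatch_edits is a hypothetical name and the example sequence is not a real miRNA):

```python
from typing import List, Tuple

# Seed region taken as nucleotides 2-8 of the mature miRNA (1-based, 5'->3').
# This is an assumed definition; some studies use nucleotides 2-7.
SEED_POSITIONS = range(2, 9)


def mismatch_edits(reference: str, read: str) -> List[Tuple[int, str, bool]]:
    """List (position, change, in_seed) for every mismatch between an aligned
    read and the mature miRNA reference of the same length.

    U and T are treated as equivalent. Because inosine is sequenced as
    guanosine, an A>G mismatch is reported as A>I.
    """
    ref = reference.upper().replace("U", "T")
    obs = read.upper().replace("U", "T")
    if len(ref) != len(obs):
        raise ValueError("read and reference must be aligned to the same length")
    edits = []
    for pos, (r, q) in enumerate(zip(ref, obs), start=1):
        if r != q:
            change = "A>I" if (r, q) == ("A", "G") else f"{r}>{q}"
            edits.append((pos, change, pos in SEED_POSITIONS))
    return edits


# Hypothetical 13-nt reference, not a real miRNA sequence.
print(mismatch_edits("UAAUACUGCCUGG", "UAGUACUGCCUGG"))
```

A single-mismatch read yields a one-element list; here the A at position 3 read as G is reported as an A>I change inside the seed.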

Accession codes

Referenced accessions

Sequence Read Archive

References

  1. Jepson, J.E. & Reenan, R.A. RNA editing in regulating gene expression in the brain. Biochim. Biophys. Acta 1779, 459–470 (2008).
  2. Nishikura, K. Functions and regulation of RNA editing by ADAR deaminases. Annu. Rev. Biochem. 79, 321–349 (2010).
  3. Osenberg, S. et al. Alu sequences in undifferentiated human embryonic stem cells display high levels of A-to-I RNA editing. PLoS ONE 5, e11173 (2010).
  4. Paz-Yaacov, N. et al. Adenosine-to-inosine RNA editing shapes transcriptome diversity in primates. Proc. Natl. Acad. Sci. USA 107, 12174–12179 (2010).
  5. Athanasiadis, A., Rich, A. & Maas, S. Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol. 2, e391 (2004).
  6. Blow, M., Futreal, P.A., Wooster, R. & Stratton, M.R. A survey of RNA editing in human brain. Genome Res. 14, 2379–2387 (2004).
  7. Kim, D.D. et al. Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res. 14, 1719–1725 (2004).
  8. Levanon, E.Y. et al. Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat. Biotechnol. 22, 1001–1005 (2004).
  9. Kiran, A. & Baranov, P.V. DARNED: a DAtabase of RNa EDiting in humans. Bioinformatics 26, 1772–1776 (2010).
  10. Hundley, H.A. & Bass, B.L. ADAR editing in double-stranded UTRs and other noncoding RNA sequences. Trends Biochem. Sci. 35, 377–383 (2010).
  11. Farajollahi, S. & Maas, S. Molecular diversity through RNA editing: a balancing act. Trends Genet. 26, 221–230 (2010).
  12. Pullirsch, D. & Jantsch, M.F. Proteome diversification by adenosine to inosine RNA editing. RNA Biol. 7, 205–212 (2010).
  13. Eisenberg, E., Li, J.B. & Levanon, E.Y. Sequence based identification of RNA editing sites. RNA Biol. 7, 248–252 (2010).
  14. Reid, J.G. et al. Mouse let-7 miRNA populations exhibit RNA editing that is constrained in the 5′-seed/cleavage/anchor regions and stabilize predicted mmu-let-7a:mRNA duplexes. Genome Res. 18, 1571–1581 (2008).
  15. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
  16. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  17. Morozova, O., Hirst, M. & Marra, M.A. Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10, 135–151 (2009).
  18. Wahlstedt, H., Daniel, C., Enstero, M. & Ohman, M. Large-scale mRNA sequencing determines global regulation of RNA editing during brain development. Genome Res. 19, 978–986 (2009).
  19. Li, J.B. et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324, 1210–1213 (2009).
  20. Li, M. et al. Widespread RNA and DNA sequence differences in the human transcriptome. Science 333, 53–58 (2011).
  21. Bahn, J.H. et al. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 142–150 (2012).
  22. Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008).
  23. Morin, R. et al. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques 45, 81–94 (2008).
  24. Cirulli, E.T. et al. Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. Genome Biol. 11, R57 (2010).
  25. Tian, D., Sun, S. & Lee, J.T. The long noncoding RNA, Jpx, is a molecular switch for X chromosome inactivation. Cell 143, 390–403 (2010).
  26. Tripathi, V. et al. The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol. Cell 39, 925–938 (2010).
  27. Bernard, D. et al. A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J. 29, 3082–3093 (2010).
  28. Shah, S.P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009).
  29. Eisenberg, E. et al. Is abundant A-to-I RNA editing primate-specific? Trends Genet. 21, 77–81 (2005).
  30. Borchert, G.M. et al. Adenosine deamination in human transcripts generates novel microRNA binding sites. Hum. Mol. Genet. 18, 4801–4807 (2009).
  31. Chiang, H.R. et al. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev. 24, 992–1009 (2010).
  32. Kawahara, Y. et al. Frequency and fate of microRNA editing in human brain. Nucleic Acids Res. 36, 5270–5280 (2008).
  33. Kawahara, Y. et al. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137–1140 (2007).
  34. Kawahara, Y., Zinshteyn, B., Chendrimada, T.P., Shiekhattar, R. & Nishikura, K. RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep. 8, 763–769 (2007).
  35. Degner, J.F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).
  36. Schrider, D.R., Gout, J.F. & Hahn, M.W. Very few RNA and DNA sequence differences in the human transcriptome. PLoS ONE 6, e25842 (2011).
  37. Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat. Methods 6, S22–S32 (2009).
  38. He, T. et al. Computational detection and functional analysis of human tissue-specific A-to-I RNA editing. PLoS ONE 6, e18129 (2011).
  39. Maas, S. et al. Genome-wide evaluation and discovery of vertebrate A-to-I RNA editing sites. Biochem. Biophys. Res. Commun. 412, 407–412 (2011).
  40. Tian, Z. et al. Transcriptome from a lymphoblastoid cell line taken from the YH Han Chinese individual. Giga Sci. http://dx.doi.org/10.5524/100013. (2011).
  41. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).
  42. Huang, S. et al. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Front. Genet. 2 (2011).
  43. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
  44. Li, R. et al. SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124–1132 (2009).
  45. Abyzov, A., Urban, A.E., Snyder, M. & Gerstein, M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
  46. Nakamura, K. et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 39, e90 (2011).
  47. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
  48. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
  49. Wheeler, D.A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008).
  50. Kim, J.I. et al. A highly annotated whole-genome sequence of a Korean individual. Nature 460, 1011–1015 (2009).


Author information

  1. These authors contributed equally to this work.

    • Zhiyu Peng,
    • Yanbing Cheng &
    • Bertrand Chin-Ming Tan

Affiliations

  1. BGI-Shenzhen, Shenzhen, China.

    • Zhiyu Peng,
    • Yanbing Cheng,
    • Lin Kang,
    • Zhijian Tian,
    • Yuankun Zhu,
    • Wenwei Zhang,
    • Yu Liang,
    • Xueda Hu,
    • Xuemei Tan,
    • Jing Guo,
    • Zirui Dong,
    • Yan Liang,
    • Li Bao &
    • Jun Wang
  2. Department of Biomedical Sciences and Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Tao-Yuan, Taiwan.

    • Bertrand Chin-Ming Tan
  3. The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark.

    • Jun Wang
  4. Department of Biology, University of Copenhagen, Copenhagen, Denmark.

    • Jun Wang

Contributions

Z.P., B.C.-M.T. and J.W. conceived and designed the experiment; Z.P., Y.C., B.C.-M.T., L.K. and Y.Z. performed data analysis and informatics; Z.T., Yu L., X.H., Yan L. and L.B. carried out sample preparation and sequencing experiments; Y.C., Z.T., W.Z., X.T., J.G. and Z.D. designed and executed experimental validation; Z.P., B.C.-M.T. and J.W. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:


Supplementary information

PDF files

  1. Supplementary Text and Figures (11M)

    Supplementary Tables 1, 4–5, 7, 10–11, 14, 16, Supplementary Discussion, Supplementary Methods and Supplementary Figures 1–8

Excel files

  1. Supplementary Tables 3, 6, 8–9, 12–13, 15 and 17 (3M)

Zip files

  1. Supplementary Data (135K)
