Accurate identification of human Alu and non-Alu RNA editing sites

Journal name:
Nature Methods

We developed a computational framework to robustly identify RNA editing sites using transcriptome and genome deep-sequencing data from the same individual. As compared with previous methods, our approach identified a large number of Alu and non-Alu RNA editing sites with high specificity. We also found that editing of non-Alu sites appears to be dependent on nearby edited Alu sites, possibly through the locally formed double-stranded RNA structure.

At a glance


  1. Figure 1: A computational framework to identify RNA editing sites in Alu and non-Alu regions.

    (a) Pipeline for the identification of RNA editing sites. RNA-seq reads (short lines) were mapped to the human reference genome (blue lines) and regions spanning all known splicing junctions (yellow lines separated by dashes). Boxes denote exons, and striped parts of two adjacent exons are joined together as the splicing junction sequence. (b) Relationship between the percentage of A-to-G mismatches and the minimum number of reads with altered nucleotides in Alu, repetitive non-Alu and nonrepetitive regions in GM12878. For all non-Alu sites, a minimum frequency of 10% for the RNA variant was required, whereas no minimum variant frequency was used for Alu positions. In non-Alu regions at least three variant nucleotides are required to achieve high specificity in RNA editing detection. (c) Percentage of all 12 mismatch types in GM12878 (here '>' indicates 'to').

  2. Figure 2: Editing of many non-Alu sites appears to be dependent on nearby edited Alu sites.

    (a) Venn diagram showing the overlap between genes that contain A-to-G editing sites in Alu (yellow), repetitive non-Alu (pink) and nonrepetitive regions (blue). A significant number of genes contained both Alu and repetitive non-Alu A-to-G editing sites (P = 1.9 × 10^−83) and Alu and nonrepetitive sites (P = 3.7 × 10^−71). (b) Example of a gene (CD22) that contains all three types of editing. Editing in Alu, repetitive non-Alu and nonrepetitive regions occurs in close proximity to each other. (c) Distribution of distances from the nearest Alu A-to-G site to nonrepetitive A-to-G sites, repetitive non-Alu A-to-G sites and random adenosines in genes with Alu editing (nonrepetitive sites versus random adenosines: P = 1.1 × 10^−96; repetitive non-Alu sites versus random adenosines: P = 7.9 × 10^−160). (d,e) Number of edited Alu repeats per gene for genes with Alu editing only as compared to genes with Alu and nonrepetitive editing (P = 2.0 × 10^−40) (d) and genes with Alu and repetitive non-Alu editing (P = 2.8 × 10^−20) (e).
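
The read-support thresholds in the Figure 1b legend, the mismatch-type percentages in Figure 1c and the nearest-Alu distances in Figure 2c can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the region labels and the `min_alu_reads` parameter are hypothetical and chosen only for illustration. The legend states no minimum variant frequency for Alu positions and gives no Alu read minimum, so the default of one read is an assumption.

```python
from bisect import bisect_left
from collections import Counter

MIN_VARIANT_READS = 3     # non-Alu: at least three variant-supporting reads (Fig. 1b)
MIN_NON_ALU_FREQ = 0.10   # non-Alu: minimum 10% variant frequency (Fig. 1b)


def passes_thresholds(region, variant_reads, total_reads, min_alu_reads=1):
    """Apply the read-support thresholds described in the Figure 1b legend.

    region is "alu" or any other label (treated as non-Alu).
    min_alu_reads is an assumption; the legend gives no Alu read minimum.
    """
    if total_reads <= 0 or variant_reads <= 0:
        return False
    if region == "alu":
        # No minimum variant frequency is applied at Alu positions.
        return variant_reads >= min_alu_reads
    frequency = variant_reads / total_reads
    return variant_reads >= MIN_VARIANT_READS and frequency >= MIN_NON_ALU_FREQ


def mismatch_percentages(mismatches):
    """Percentage of each mismatch type (for example "A>G") among all sites."""
    counts = Counter(mismatches)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


def nearest_alu_distance(position, sorted_alu_positions):
    """Distance from a site to the nearest edited Alu site.

    Assumes all positions lie on the same chromosome and that
    sorted_alu_positions is sorted in ascending order.
    Returns None when no edited Alu site is available.
    """
    if not sorted_alu_positions:
        return None
    i = bisect_left(sorted_alu_positions, position)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_alu_positions[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)
```

For example, under these assumptions `passes_thresholds("nonrepetitive", 2, 10)` rejects a site with only two variant reads even though its variant frequency is 20%, whereas `passes_thresholds("alu", 1, 100)` accepts an Alu position at 1% frequency.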



Author information

  1. These authors contributed equally to this work.

    • Gokul Ramaswami,
    • Wei Lin &
    • Robert Piskol


  1. Department of Genetics, Stanford University, Stanford, California, USA.

    • Gokul Ramaswami,
    • Robert Piskol,
    • Meng How Tan &
    • Jin Billy Li
  2. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.

    • Wei Lin &
    • Carrie Davis


G.R., W.L. and R.P. performed the computational analyses with help from M.H.T. and J.B.L. M.H.T. and G.R. carried out the validation experiments. C.D. generated the GM12878 RNA-seq data. R.P. and J.B.L. wrote the paper with input from the other authors.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2.5M)

    Supplementary Figures 1–9, Supplementary Tables 1–5, and Supplementary Notes 1–4

Excel files

  1. Supplementary Data 1 (7.9M)

    All RNA editing sites identified in GM12878 using our method.

  2. Supplementary Data 2 (46.7M)

    All RNA editing sites identified in YH using our method.
