Abundant long noncoding RNAs (lncRNAs) in mammals can bind to DNA sequences and recruit histone- and DNA-modifying enzymes to the binding sites, thereby epigenetically regulating target genes. However, the DNA-binding motifs and target sites of most lncRNAs are unknown. The large numbers of lncRNAs and potential target sites in the genome make it infeasible to examine lncRNA/DNA binding purely experimentally. Here, we report a protocol for lncRNA/DNA-binding analysis built on a database containing the GENCODE-annotated human and mouse lncRNAs, the orthologs of these lncRNAs in 17 mammals, and the genome sequences of these 17 mammals. Cross-species and genome-wide lncRNA/DNA-binding analysis begins with, and is driven by, a database search. The predicted DNA-binding motifs and binding sites address the general question of which lncRNAs may epigenetically regulate which genes, and they can be used to identify potential sites for genome and epigenome editing. Users of the protocol need basic knowledge of the base-pairing rules that guide the binding of noncoding RNAs to DNA to form triplexes, as well as the skills required to use the UCSC Genome Browser. A genome-wide prediction takes 2–10 d, and the results are sent to users automatically by e-mail. The platform is updated continuously, making it possible to study more lncRNAs and larger genomic regions in less computational time.
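As a minimal illustration of the triplex base-pairing rules mentioned above, the sketch below encodes only the canonical rules (parallel U·A-T and C+·G-C; antiparallel G·G-C, A·A-T and U·A-T) and scores ungapped matches between an RNA segment and the purine strand of a DNA duplex. The function names are hypothetical, and this is not the protocol's own prediction algorithm, whose rule sets and scoring are described in the key references below.

```python
"""Simplified check of canonical triplex base-triplet rules (illustrative only)."""

# Third-strand (RNA) base -> duplex purine it recognizes, canonical rules only.
# Parallel (Hoogsteen, pyrimidine motif): U·A-T and C+·G-C.
PARALLEL = {"U": "A", "C": "G"}
# Antiparallel (reverse Hoogsteen, purine/GT motifs): G·G-C, A·A-T, U·A-T.
ANTIPARALLEL = {"G": "G", "A": "A", "U": "A"}


def triplex_score(rna: str, purine_strand: str, orientation: str) -> float:
    """Fraction of positions where `rna` obeys the chosen rule set.

    Both sequences are written 5'->3'. In the antiparallel case the RNA runs
    opposite to the purine strand, so the purine strand is reversed first.
    """
    if len(rna) != len(purine_strand):
        raise ValueError("sequences must have equal length")
    if orientation == "parallel":
        table, target = PARALLEL, purine_strand
    elif orientation == "antiparallel":
        table, target = ANTIPARALLEL, purine_strand[::-1]
    else:
        raise ValueError("orientation must be 'parallel' or 'antiparallel'")
    hits = sum(1 for r, d in zip(rna.upper(), target.upper()) if table.get(r) == d)
    return hits / len(rna)


def best_window(rna: str, dna: str, orientation: str) -> tuple[int, float]:
    """Slide `rna` along one DNA strand and return (start, score) of the best window."""
    n = len(rna)
    if n == 0 or n > len(dna):
        raise ValueError("rna must be non-empty and no longer than dna")
    scores = [(i, triplex_score(rna, dna[i:i + n], orientation))
              for i in range(len(dna) - n + 1)]
    return max(scores, key=lambda s: s[1])


if __name__ == "__main__":
    print(triplex_score("UUCUC", "AAGAG", "parallel"))    # 1.0
    print(triplex_score("GAGG", "GGAG", "antiparallel"))  # 1.0
    print(best_window("UUCUC", "CCAAGAGTT", "parallel"))  # (2, 1.0)
```

A real scan would also consider the opposite DNA strand, allow mismatches and gaps through alignment, and include noncanonical triplets; the sketch omits all of these to keep the rules themselves visible.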
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
He, S., Zhang, H., Liu, H. & Zhu, H. Bioinformatics 31, 178–186 (2015): https://doi.org/10.1093/bioinformatics/btu643
Liu, H., Shang, X. & Zhu, H. Bioinformatics 33, 1431–1436 (2017): https://doi.org/10.1093/bioinformatics/btw818
Wang, S. et al. Cell Death Dis. 9, 805 (2018): https://doi.org/10.1038/s41419-018-0869-2
This work received financial support (to H. Zhu) from the NSFC (31571348 and 31771456), the Special Program for Applied Research on SuperComputation of the NSFC-Guangdong Joint Fund, and the Guangzhou Science and Technology Innovation Committee (201607010067).
Integrated supplementary information
To open this window, which shows the triplex target sites (TTSs) of TFO1, click the blue TFO1 button on the webpage shown in Fig. 9.
Supplementary Figure 2 The initial records in the Excel file that reports the TTS distribution of H19 at all transcripts in the human genome hg38.
“bs” means binding site. bs_chr, bs_start, and bs_end indicate the chromosome number, start coordinate, and end coordinate of a TTS. TTS_area is the area of the peak of a TTS (as shown in custom tracks of TTS distributions) and indicates the strength of the TTS.
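The columns described above can be handled programmatically. The sketch below assumes the Excel sheet has been saved as tab-separated text that keeps the column names bs_chr, bs_start, bs_end and TTS_area (any other columns are ignored); the class and function names and the example rows are hypothetical. It ranks TTSs by TTS_area, which the caption describes as indicating TTS strength.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TTS:
    chrom: str
    start: int
    end: int
    area: float


def read_tts(handle) -> list[TTS]:
    """Read TTS records from tab-separated text with a header row.

    Assumes the columns bs_chr, bs_start, bs_end and TTS_area are present.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [TTS(row["bs_chr"], int(row["bs_start"]), int(row["bs_end"]),
                float(row["TTS_area"])) for row in reader]


def strongest(records: list[TTS], k: int) -> list[TTS]:
    """Return the k TTSs with the largest TTS_area."""
    return sorted(records, key=lambda r: r.area, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical rows for illustration, not real H19 predictions.
    demo = io.StringIO(
        "bs_chr\tbs_start\tbs_end\tTTS_area\n"
        "chr11\t1000\t1040\t12.5\n"
        "chr11\t5000\t5030\t48.0\n"
        "chr7\t200\t260\t30.2\n"
    )
    for r in strongest(read_tts(demo), 2):
        print(r.chrom, r.start, r.end, r.area)
```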
The picture was obtained from the GTEx Gene track in the UCSC Genome Browser by selecting the GTEx Transcript track in the Expression section, which displays GTEx genes graphically.
From top to bottom are 14 custom tracks of the TTS distributions of 14 lncRNAs, the Ensembl Genes track, the custom track of CDKN2B-AS1_Marmoset, and the RepeatMasker track. Three TTSs at RepeatMasker elements of the Simple or Low Complexity class are marked by three blue vertical lines. The results indicate that, as in humans, many lncRNAs may bind to the promoters of CDKN2A/2B.
Some TTSs at promoters and CpG islands (in green) are marked in yellow, and some TTSs at transposable elements (in the RepeatMasker track) and repetitive elements (in the SimpleRepeats track) are marked in blue. Some lncRNAs have TTSs only at transposable and/or repetitive elements.
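Identifying lncRNAs whose TTSs fall only at transposable or repetitive elements reduces to an interval-overlap test. The sketch below assumes TTS and repeat coordinates are available as half-open (chrom, start, end) tuples, for example repeat intervals exported from the RepeatMasker or Simple Repeats track; the function names and intervals are hypothetical.

```python
Interval = tuple[str, int, int]  # (chrom, start, end); assumed half-open, as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def repeat_only_lncrnas(tts_by_lncrna: dict[str, list[Interval]],
                        repeats: list[Interval]) -> list[str]:
    """Names of lncRNAs that have TTSs and whose every TTS overlaps a repeat.

    A simple quadratic scan, adequate for one browser-sized region; genome-wide
    data would call for sorted intervals or an interval tree instead.
    """
    result = []
    for name, sites in tts_by_lncrna.items():
        if sites and all(any(overlaps(s, r) for r in repeats) for s in sites):
            result.append(name)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical intervals for illustration only.
    repeats = [("chr9", 100, 200), ("chr9", 500, 550)]
    tts = {
        "lncA": [("chr9", 120, 140), ("chr9", 510, 530)],  # both in repeats
        "lncB": [("chr9", 120, 140), ("chr9", 300, 320)],  # one outside repeats
        "lncC": [],                                        # no TTSs
    }
    print(repeat_only_lncrnas(tts, repeats))  # ['lncA']
```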
Supplementary Figure 6 The TTS distributions of H19 and other human lncRNAs in the human IGF2 region.
From top to bottom are custom tracks of the TTS distributions of 16 lncRNAs, UCSC Genes, CpG Islands, ENCODE DNA Methylation (the colored lines indicate DNA methylation signals), and ENCODE Histone Modification (the colored areas indicate histone modification signals). This figure indicates that many lncRNAs may bind to the IGF2 region at the same site as H19.
This webpage shows the coordinates of all exons of the orthologue of human CDKN2B-AS1 in marmoset.
The gene name is CDKN2B-AS1_Marmoset, as shown in Supplementary Figure 4. All custom gene track files should follow the same format but may use any file name.
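The exact custom gene track format required by the platform is not shown in this caption. As an assumption, the sketch below writes the exon coordinates of an ortholog as a UCSC BED12 line, a format the Genome Browser accepts for multi-exon custom tracks; the function name and the coordinates are hypothetical and are not the real marmoset CDKN2B-AS1 ortholog.

```python
def exons_to_bed12(chrom: str, name: str, strand: str,
                   exons: list[tuple[int, int]]) -> str:
    """Build one BED12 line from 0-based, half-open (start, end) exon coordinates."""
    if not exons:
        raise ValueError("at least one exon is required")
    exons = sorted(exons)
    start = exons[0][0]
    end = max(e for _, e in exons)
    sizes = ",".join(str(e - s) for s, e in exons)
    offsets = ",".join(str(s - start) for s, _ in exons)  # relative to chromStart
    fields = [
        chrom, start, end, name,
        0,             # score
        strand,
        start, start,  # thickStart == thickEnd: no coding region for a lncRNA
        "0",           # itemRgb
        len(exons), sizes, offsets,
    ]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    print("track name=CDKN2B-AS1_Marmoset")
    print(exons_to_bed12("chr1", "CDKN2B-AS1_Marmoset", "-",
                         [(1500, 1650), (1000, 1100), (2000, 2300)]))
```

Sorting the exons first means the coordinates can be pasted in any order; BED requires blockStarts to be ascending and relative to chromStart.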
The figure shows some of the records in the Excel file that reports TTSs of lncRNA transcripts at the genomic regions of protein-coding transcripts. Here peak_area and TTS_area are defined as in Supplementary Figure 2.
The two ENST ID lists are the inputs of the M:N case of genome-wide prediction shown in Supplementary Figure 9. These Ensembl ENST IDs identify differentially expressed lncRNA transcripts and differentially expressed protein-coding transcripts from an RNA-seq analysis we performed that compared gene expression in 12 human colorectal cancer tissues with that in 3 normal colorectal tissues (unpublished observations [Sha He, Yujian Wen, Hao Zhu]). These transcripts were assembled from RNA-seq reads with the StringTie program (Nat. Biotechnol. 33, 290–295; 2015), and differential expression was determined with the edgeR program (Genome Biol. 17, 75; 2016).
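Preparing the two ID lists for the M:N case amounts to filtering a differential-expression table and splitting it by transcript type. The sketch below assumes a tab-separated table whose logFC and FDR columns mirror those edgeR typically reports, plus hypothetical transcript_id and biotype annotation columns; the thresholds (FDR < 0.05, |logFC| ≥ 1) are illustrative, not those used in the study, and the IDs are placeholders.

```python
import csv
import io


def de_id_lists(handle, fdr_max: float = 0.05, min_abs_logfc: float = 1.0):
    """Return (lncRNA IDs, protein-coding IDs) passing the DE thresholds.

    Assumes tab-separated input with columns transcript_id, biotype, logFC and FDR.
    """
    lnc, coding = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep only transcripts that are significant and change enough in magnitude.
        if float(row["FDR"]) >= fdr_max or abs(float(row["logFC"])) < min_abs_logfc:
            continue
        if row["biotype"] == "lncRNA":
            lnc.append(row["transcript_id"])
        elif row["biotype"] == "protein_coding":
            coding.append(row["transcript_id"])
    return lnc, coding


if __name__ == "__main__":
    # Placeholder IDs and values, not data from the colorectal cancer study.
    table = io.StringIO(
        "transcript_id\tbiotype\tlogFC\tFDR\n"
        "lnc_1\tlncRNA\t2.3\t0.001\n"
        "lnc_2\tlncRNA\t0.4\t0.010\n"
        "pc_1\tprotein_coding\t-1.8\t0.020\n"
        "pc_2\tprotein_coding\t1.5\t0.300\n"
        "pc_3\tprotein_coding\t-3.0\t0.0001\n"
    )
    lnc_ids, pc_ids = de_id_lists(table)
    print(lnc_ids, pc_ids)                      # ['lnc_1'] ['pc_1', 'pc_3']
    print(len(lnc_ids) * len(pc_ids), "pairs")  # 2 pairs
```

With M lncRNA IDs and N protein-coding IDs, an M:N run would involve up to M × N lncRNA/transcript combinations, which is one reason genome-wide predictions can take days.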