Although many long noncoding RNAs (lncRNAs) have been identified in human and other mammalian genomes, there has been limited systematic functional characterization of these elements. In particular, the contribution of lncRNAs to organ development remains largely unexplored. Here we analyse the expression patterns of lncRNAs across developmental time points in seven major organs, from early organogenesis to adulthood, in seven species (human, rhesus macaque, mouse, rat, rabbit, opossum and chicken). Our analyses identified approximately 15,000 to 35,000 candidate lncRNAs in each species, most of which show species specificity. We characterized the expression patterns of lncRNAs across developmental stages, and found many with dynamic expression patterns across time that show signatures of enrichment for functionality. During development, there is a transition from broadly expressed and conserved lncRNAs towards an increasing number of lineage- and organ-specific lncRNAs. Our study provides a resource of candidate lncRNAs and their patterns of expression and evolutionary conservation across mammalian organ development.
Data are available from the corresponding authors upon reasonable request.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011).
Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47, 199–208 (2015).
Hon, C. C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017).
Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
Necsulea, A. et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640 (2014).
Washietl, S., Kellis, M. & Garber, M. Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res. 24, 616–628 (2014).
Hezroni, H. et al. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 11, 1110–1122 (2015).
Kopp, F. & Mendell, J. T. Functional classification and experimental dissection of long noncoding RNAs. Cell 172, 393–407 (2018).
Ponting, C. P., Oliver, P. L. & Reik, W. Evolution and functions of long noncoding RNAs. Cell 136, 629–641 (2009).
Ulitsky, I. Evolution to the rescue: using comparative genomics to understand long non-coding RNAs. Nat. Rev. Genet. 17, 601–614 (2016).
Necsulea, A. & Kaessmann, H. Evolutionary dynamics of coding and non-coding transcriptomes. Nat. Rev. Genet. 15, 734–748 (2014).
Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
Ulitsky, I., Shkumatava, A., Jan, C. H., Sive, H. & Bartel, D. P. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 147, 1537–1550 (2011).
Sauvageau, M. et al. Multiple knockout mouse models reveal lincRNAs are required for life and brain development. eLife 2, e01749 (2013).
Grote, P. & Herrmann, B. G. Long noncoding RNAs in organogenesis: making the difference. Trends Genet. 31, 329–335 (2015).
Goff, L. A. et al. Spatiotemporal expression and transcriptional perturbations by long noncoding RNAs in the mouse brain. Proc. Natl Acad. Sci. USA 112, 6855–6862 (2015).
Cardoso-Moreira, M. et al. Gene expression across mammalian organ development. Nature https://doi.org/10.1038/s41586-019-1338-5 (2019).
Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018).
Conesa, A., Nueda, M. J., Ferrer, A. & Talón, M. maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments. Bioinformatics 22, 1096–1102 (2006).
Liu, S. J. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science 355, eaah7111 (2017).
Mukherjee, N. et al. Integrative classification of human coding and noncoding genes through RNA metabolism profiles. Nat. Struct. Mol. Biol. 24, 86–96 (2017).
Soumillon, M. et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 3, 2179–2190 (2013).
Guttman, M. & Rinn, J. L. Modular regulatory principles of large non-coding RNAs. Nature 482, 339–346 (2012).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Kutter, C. et al. Rapid turnover of long noncoding RNAs and the evolution of gene expression. PLoS Genet. 8, e1002841 (2012).
Quek, X. C. et al. lncRNAdbv2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 43, D168–D173 (2015).
Melé, M. et al. Chromatin environment, transcriptional regulation, and splicing distinguish lincRNAs and mRNAs. Genome Res. 27, 27–37 (2017).
Yevshin, I., Sharipov, R., Valeev, T., Kel, A. & Kolpakov, F. GTRD: a database of transcription factor binding sites identified by ChIP-seq experiments. Nucleic Acids Res. 45, D61–D67 (2017).
Olson, E. N. Gene regulatory networks in the evolution and development of the heart. Science 313, 1922–1927 (2006).
Ruf, S. et al. Large-scale analysis of the regulatory architecture of the mouse genome with a transposon-associated sensor. Nat. Genet. 43, 379–386 (2011).
Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016).
Amaral, P. P. et al. Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci. Genome Biol. 19, 32 (2018).
Luo, S. et al. Divergent lncRNAs regulate gene expression and lineage differentiation in pluripotent cells. Cell Stem Cell 18, 637–652 (2016).
Bester, A. C. et al. An integrated genome-wide CRISPRa approach to functionalize lncRNAs in drug resistance. Cell 173, 649–664 (2018).
Jiang, W., Liu, Y., Liu, R., Zhang, K. & Zhang, Y. The lncRNA DEANR1 facilitates human endoderm differentiation by activating FOXA2 expression. Cell Rep. 11, 137–148 (2015).
Jian, X. & Felsenfeld, G. Insulin promoter in human pancreatic β cells contacts diabetes susceptibility loci and regulates genes affecting insulin metabolism. Proc. Natl Acad. Sci. USA 115, E4633–E4641 (2018).
Spigoni, G., Gedressi, C. & Mallamaci, A. Regulation of Emx2 expression by antisense transcripts in murine cortico-cerebral precursors. PLoS ONE 5, e8658 (2010).
Ramos, A. D. et al. Integration of genome-wide approaches identifies lncRNAs of adult neural stem cells and their progeny in vivo. Cell Stem Cell 12, 616–628 (2013).
Li, W., Notani, D. & Rosenfeld, M. G. Enhancers as non-coding RNA transcription units: recent insights and future perspectives. Nat. Rev. Genet. 17, 207–223 (2016).
Liu, S. J. et al. Single-cell analysis of long non-coding RNAs in the developing human neocortex. Genome Biol. 17, 67 (2016).
Lagarde, J. et al. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing. Nat. Genet. 49, 1731–1740 (2017).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocols 7, 562–578 (2012).
Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013).
Washietl, S. et al. RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA 17, 578–594 (2011).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005).
Li, L., Stoeckert, C. J. J. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
Duret, L., Chureau, C., Samain, S., Weissenbach, J. & Avner, P. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science 312, 1653–1655 (2006).
Hezroni, H. et al. A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes. Genome Biol. 18, 162 (2017).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. v.4.0.6 http://www.repeatmasker.org (2013–2015).
Chen, J. et al. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol. 17, 19 (2016).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).
Wucher, V. et al. FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 45, e57 (2017).
Kolde, R. pheatmap: Pretty Heatmaps. v.1.0.10 (2015).
Hensman, J., Rattray, M. & Lawrence, N. D. Fast variational inference in the conjugate exponential family. In Advances in Neural Information Processing Systems 25 1–9 (2012).
Hensman, J., Lawrence, N. D. & Rattray, M. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters. BMC Bioinformatics 14, 252 (2013).
Hensman, J., Rattray, M. & Lawrence, N. D. Fast nonparametric clustering of structured time-series. IEEE Trans. Pattern Anal. Mach. Intell. 37, 383–393 (2015).
Wang, J., Vasaikar, S., Shi, Z., Greer, M. & Zhang, B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 45 (W1), W130–W137 (2017).
Kaessmann, H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 20, 1313–1326 (2010).
Revelle, W. psych: Procedures for Personality and Psychological Research. R package v.1.8.4 https://cran.r-project.org/web/packages/psych/index.html (2017).
Ebisuya, M., Yamamoto, T., Nakajima, M. & Nishida, E. Ripples from neighbouring transcription. Nat. Cell Biol. 10, 1106–1113 (2008).
R Core Team. R: A Language and Environment for Statistical Computing https://www.r-project.org/ (R Foundation for Statistical Computing, 2008).
Wickham, H., Romain, F., Henry, L. & Müller, K. dplyr: A Grammar of Data Manipulation. v.0.7.6 https://cran.r-project.org/web/packages/dplyr/index.html (2017).
Wickham, H. tidyr: Easily Tidy Data with ‘spread()’ and ‘gather()’ Functions. v.0.8.1 https://tidyr.tidyverse.org/ (2018).
Wickham, H. stringr: Simple, Consistent Wrappers for Common String Operations. v.1.3.1 https://stringr.tidyverse.org/ (2018).
Dowle, M. & Srinivasan, A. data.table: Extension of ‘data.frame’. v.1.11.4 https://cran.r-project.org/web/packages/data.table/index.html (2017).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis 2nd edn (Springer, 2016).
Auguie, B. gridExtra: Miscellaneous Functions for ‘Grid’ Graphics. v.2.3 https://rdrr.io/cran/gridExtra/ (2017).
Wickham, H. Reshaping data with the reshape package. J. Stat. Softw. 21, 1–20 (2007).
Wickham, H. The split-apply-combine strategy for data analysis. J. Stat. Softw. 40, 1–29 (2011).
Lê, S., Josse, J. & Husson, F. FactoMineR: an R Package for multivariate analysis. J. Stat. Softw. 25, 1–18 (2008).
We thank S. Anders, M. Sepp, E. Leushkin and members of the Kaessmann group for discussions, M. Sanchez-Delgado and N. Trost for assistance in figure design, and I. Moreira for help in the development of the interactive tool. We acknowledge support by the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant INST 35/1134-1 FUGG. This research was supported by grants from the European Research Council (615253, OntoTransEvol) and Swiss National Science Foundation (146474) to H.K., by the Marie Curie FP7-PEOPLE-2012-IIF to M.C.-M. (329902) and by a scholarship for MSc studies by the Alexander S. Onassis Public Benefit Foundation (F ZL 084-1/2015-2016) to I.S.
Peer review information
Nature thanks Camille Berthelot, Igor Ulitsky and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Schematic representation of the lncRNA annotation pipeline. b, Schematic representation of the pipeline for the detection of 1:1 lncRNA families.
a, Distribution of lncRNAs among genomic classes in each species. b, Comparison of genomic classes (left), evolutionary age (middle) and organ of maximum expression (right) for known (Ensembl; ref. 19) and newly annotated (novel) human lncRNAs. c, Number of species with a detected lncRNA member for human families of various evolutionary ages. d, Comparison of the fraction of species with a detected lncRNA member for human families conserved across mammals (180 Ma) and amniotes (300 Ma) with a previous study (ref. 8). e, Fraction of lncRNAs and protein-coding gene orthologues found in conserved synteny with at least one protein-coding gene neighbour for increasing evolutionary distances. f, Organ of maximum expression for expressed lncRNAs (≥1 RPKM) in each species. g, Number of lncRNAs expressed (≥1 RPKM) in each species during the development of each organ (in logarithmic scale).
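The ≥1 RPKM threshold and the "organ of maximum expression" used in panels f and g can be illustrated with a minimal Python sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the function names, the dictionary layout and the example numbers are hypothetical and are not taken from the study's pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def organ_of_max_expression(rpkm_by_organ, min_rpkm=1.0):
    """Return the organ with the highest RPKM for one gene.

    Returns None when the gene never reaches `min_rpkm`, i.e. when it
    would not count as expressed under a >= 1 RPKM threshold.
    """
    organ, value = max(rpkm_by_organ.items(), key=lambda kv: kv[1])
    return organ if value >= min_rpkm else None


# Hypothetical gene: 500 reads, 2 kb long, 25 million mapped reads.
example_rpkm = rpkm(500, 2000, 25_000_000)

# Hypothetical per-organ maxima for one lncRNA.
example_organ = organ_of_max_expression(
    {"brain": 0.4, "testis": 3.2, "liver": 1.1}
)
```

How replicates and developmental stages are summarized before taking the maximum is not specified in the legend, so the per-organ values here are simply assumed to be given.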
a, Representative examples of human developmentally dynamic (n = 5,887) and non-dynamic (n = 25,791) lncRNA expression profiles (mean expression; vertical bars represent the minimum and maximum values across replicates) for varying levels of maximum expression, replicate reproducibility and expression windows. The vertical dashed line represents birth; the horizontal dashed line marks 1 RPKM. b, Summary statistics for the lncRNAs and protein-coding genes in this study. c, Number of organs with developmentally dynamic expression for dynamic lncRNAs and protein-coding genes in each species. d, e, Tissue-specificity (d) and median time-specificity (e) of non-dynamic and dynamic lncRNAs, and protein-coding genes, across species. Tissue- and time-specificity indexes range from 0 (broad expression) to 1 (specific expression). All comparisons between non-dynamic and dynamic lncRNAs, and protein-coding genes are significant (P = 2.2 × 10^−16, two-sided Mann–Whitney U-test). f, Maximum expression levels (log10(RPKM)) for developmentally dynamic and non-dynamic lncRNAs across species (excluding samples from the sexually mature testis). Developmentally dynamic lncRNAs are more highly expressed in all species (P = 2.2 × 10^−16, two-sided Mann–Whitney U-test). Box plots are as in Fig. 2.
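The legend does not state which specificity index is used. Because the reference list includes Yanai et al. (2005), the sketch below assumes the tau index from that work, which ranges from 0 (equal expression everywhere) to 1 (expression in a single organ or stage). The function name and the input values are hypothetical, and any preprocessing such as log transformation is left out.

```python
def tau_index(expression):
    """Tau specificity index over a list of expression values.

    `expression` holds one value per organ (tissue-specificity) or per
    developmental stage (time-specificity). Returns 0 for perfectly
    broad expression and 1 for expression in exactly one condition.
    """
    values = [max(float(v), 0.0) for v in expression]
    peak = max(values) if values else 0.0
    if len(values) < 2 or peak == 0.0:
        raise ValueError("need at least two conditions and non-zero expression")
    # Average distance of each condition from the peak, scaled to [0, 1].
    return sum(1.0 - v / peak for v in values) / (len(values) - 1)


broad = tau_index([5, 5, 5, 5])        # same level in every organ
specific = tau_index([0, 0, 8, 0])     # expressed in one organ only
intermediate = tau_index([10, 5, 0])
```

Applying the same formula across developmental stages within one organ gives a time-specificity value; taking the median of those values across organs is one possible reading of "median time-specificity", and is an assumption here.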
a, Fraction of developmentally dynamic human lncRNAs (n = 5,887) for different genomic classes. Overrepresented classes were determined by comparing the fraction of dynamic lncRNAs in each class against all other classes. b, Normalized density distribution of the distance to the nearest protein-coding gene for dynamic (n = 5,887) and non-dynamic (n = 25,791) human lncRNAs. c, Generation of expression-matched dynamic (n = 2,906) and non-dynamic (n = 3,098) lncRNAs and their distribution among genomic classes. d, Fraction of developmentally dynamic human lncRNAs among isoforms with an increasing number of exons. The number of exons is significantly higher for developmentally dynamic lncRNAs (P = 2.2 × 10^−16, two-sided Mann–Whitney U-test). e, Fraction of human lncRNAs that are intergenic, developmentally dynamic and that do not overlap enhancers (ref. 25) (n = 16,481) among different age groups. f, Fraction of developmentally dynamic genes across expression-matched (n = 6,004) human lncRNAs of different age groups (top) and functionally characterized lncRNAs (ref. 27; bottom). g, Generation of expression-matched, lowly expressed (0.25–0.75 RPKM) dynamic (n = 798) and non-dynamic (n = 717) human lncRNAs and their distribution across different age groups. h, Fraction of developmentally dynamic human lncRNAs (n = 5,887) with or without a mouse (dynamic or not) orthologue (P = 2.2 × 10^−16, Fisher’s exact test). i, Similarity of spatiotemporal expression (Spearman correlation coefficient between human and mouse organs/developmental stages) for 1:1 orthologues. j, Expression similarity across matched organs and developmental stages for mouse and rat 1:1 orthologous lncRNAs that are dynamic in both species, for different evolutionary ages. k, Fraction of lncRNAs present in the CRISPRi screen library (ref. 21) resulting in a significant growth phenotype (hits) in at least one cell line for lncRNAs present (n = 2,364) or absent (n = 14,037) in our annotation and dynamic (n = 1,093) or non-dynamic (n = 1,277). l, Fraction of lncRNAs present in the CRISPRi screen library (ref. 21) resulting in a significant growth phenotype (hits) in expression-matched dynamic (n = 2,906) and non-dynamic lncRNAs (n = 3,098). Box plots are as in Fig. 2. In a–l, statistical tests are two-sided.
a, Fraction of promoters of protein-coding genes, dynamic and non-dynamic lncRNAs, and size-matched random intergenic regions that overlap with binding sites for TFs. Each data point corresponds to a TF (n = 355). Box plots are as in Fig. 2. b, Selection of the 50 TFs with the highest binding variability across promoters of lncRNAs that were dynamic in different organs (in blue). TFs with maximum binding frequency ≤ 0.05 (red line) were not considered, as their high variability is probably associated with a low binding frequency. c, Spatiotemporal expression patterns of the 50 most variable TFs in mouse. The heat map is clustered by rows and shows expression levels in counts (after variance-stabilizing transformation).
a, Number of differentially expressed protein-coding genes and dynamic lncRNAs between adjacent stages of organ development in human, rat, rabbit, opossum and chicken. b, Number of differentially expressed ‘isolated intergenic’ (more than 100 kb from the closest protein-coding gene) dynamic lncRNAs between adjacent stages during mouse development.
Clusters of developmentally dynamic lncRNAs and protein-coding genes across mouse organs (brain = 14,629 genes; cerebellum = 13,166; heart = 12,382; kidney = 14,634; liver = 13,888; ovary = 12,694; testis = 13,749). Grey lines represent individual gene trajectories and solid lines posterior mean trajectories for each cluster. Clusters are arranged by decreasing fraction of lncRNAs. Enriched representative biological processes (Benjamini–Hochberg adjusted P < 0.05, hypergeometric test) are shown for each cluster.
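The enrichment criterion named in this legend (hypergeometric test followed by Benjamini–Hochberg adjustment) can be sketched in plain Python as follows. This is an illustrative sketch only: the reference list cites WebGestalt for enrichment analysis, so the study presumably relied on that toolkit rather than on code like this, and the function names and example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a cluster of n genes drawn from N background genes,
    of which K are annotated with the biological process of interest."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 2 of 3 cluster genes carry an annotation that
# 4 of 10 background genes have.
p_example = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
adjusted_example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in adjusted_example]
```

In this scheme a process would be reported for a cluster when its adjusted P value falls below 0.05; the choice of background gene set strongly affects the result and is not specified in the legend.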
Extended Data Fig. 8 Characteristics of dynamic lncRNAs expressed in different developmental stages.
a, Expression similarity between human and mouse 1:1 orthologous protein-coding genes (n = 16,078), developmentally dynamic (n = 281) and non-dynamic (n = 1,386) lncRNAs across organs/developmental stages. Each point corresponds to the Spearman correlation coefficient of expression between human and mouse orthologues for matching samples. Lines and the 95% confidence interval (shaded regions) correspond to linear model predictions. Spearman correlation coefficients between expression similarity and developmental stage are given for each comparison. b, Expression similarity between dynamic human and mouse orthologous lncRNAs from a, summarized by organ. c, Fraction of conserved (≥80 Ma) dynamic lncRNAs expressed in each mouse organ during development. The colour signifies the focal organ for each comparison. d, Tissue-specificity for mouse lncRNAs with different developmental trajectories. e, Fraction of human lncRNAs with different developmental trajectories among functionally characterized lncRNAs (ref. 27; n = 59). f, CRISPRi growth screen hits (ref. 21; n = 98). g, Fraction of late-expressed dynamic (n = 2,956) and non-dynamic (n = 25,791) lncRNAs for different age groups and functionally characterized (ref. 27) human lncRNAs. Box plots are as in Fig. 2. *P < 0.05, **P < 0.01, ***P < 0.001, two-sided Mann–Whitney U-test (b–d) or Fisher’s exact test (e–g).
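Each point in panel a is a Spearman correlation between the expression of human genes and that of their mouse orthologues in one matched organ and developmental stage. A minimal sketch of that calculation, written in plain Python with average ranks for ties, is given below; the function names and the toy expression vectors are hypothetical, and the matching of samples between species is assumed to have been done already.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression of four 1:1 orthologues in one matched sample.
human_expr = [1.0, 2.0, 3.0, 4.0]
mouse_expr = [1.0, 3.0, 2.0, 4.0]
similarity = spearman(human_expr, mouse_expr)
```

Repeating this for every matched organ and stage yields one similarity value per sample, which can then be related to developmental stage as described in the legend.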
a, Normalized density distribution of Pearson correlation coefficients (r) of spatiotemporal gene expression between adjacent paralogous (human = 267; mouse = 263) and non-paralogous (human = 3,359; mouse = 3,382) mRNA–mRNA pairs. b, Number of paralogous (human = 267; mouse = 263) and non-paralogous (human = 3,359; mouse = 3,382) adjacent mRNA–mRNA pairs detected as co-expressed above a range of Pearson’s r cut-offs. c, Relationship between distance and Pearson correlation of expression for lncRNA–mRNA (human = 4,881; mouse = 4,722) and mRNA–mRNA (human = 3,359; mouse = 3,382) pairs. Lines were estimated through LOESS regression and the 95% confidence interval is shown in grey. d, Distribution of Pearson’s r for lncRNA–mRNA and mRNA–mRNA pairs across different distance intervals. Box plots are as in Fig. 2. e, Density distributions of Pearson’s r between a protein-coding gene and its nearest dynamic lncRNA (human = 2,440; mouse = 2,549) and protein-coding gene (human = 1,606; mouse = 1,777) after excluding antisense and divergently transcribed lncRNAs. f, Enriched biological processes among human protein-coding genes with significantly higher expression correlations with their adjacent dynamic lncRNA than with the control protein-coding gene (n = 358; Benjamini–Hochberg adjusted P < 0.01, hypergeometric test; data for mouse are shown in Fig. 4b). In a–e, statistical tests are two-sided.
This file contains legends for Supplementary Tables 1-16.
Supplementary Tables 1-16.
This file contains the lncRNA annotations used in this study in gtf format. Coordinates correspond to the following genome assemblies (human: hg19; rhesus macaque: rheMac3; mouse: mm10; rat: Rnor_5.0; rabbit: OryCun2.0; opossum: monDom5; chicken: Galgal4).
This file contains expression tables (in RPKM) for lncRNAs, putative new coding genes (denoted with the suffix “.coding”) and Ensembl-annotated transcribed regions that do not overlap our lncRNAs on the same strand (v75 for human, v77 for all other species).