Locus-specific expression of transposable elements in single cells with CELLO-seq

Abstract

Transposable elements (TEs) regulate diverse biological processes, from early development to cancer. Expression of young TEs is difficult to measure with next-generation, single-cell sequencing technologies because their highly repetitive nature means that short complementary DNA reads cannot be unambiguously mapped to a specific locus. Single CELl LOng-read RNA-sequencing (CELLO-seq) combines long-read single cell RNA-sequencing with computational analyses to measure TE expression at unique loci. We used CELLO-seq to assess the widespread expression of TEs in two-cell mouse blastomeres as well as in human induced pluripotent stem cells. Across both species, old and young TEs showed evidence of locus-specific expression with simulations demonstrating that only a small number of very young elements in the mouse could not be mapped back to the reference with high confidence. Exploring the relationship between the expression of individual elements and putative regulators revealed large heterogeneity, with TEs within a class showing different patterns of correlation and suggesting distinct regulatory mechanisms.

Fig. 1: CELLO-seq overview and its ability to aid in the study of allelic and isoform expression.
Fig. 2: CELLO-seq enables TE-derived isoform and TE expression analysis in single cells at single loci.
Fig. 3: Simulations characterizing the mapping of young L1 in the mouse and human genome.
Fig. 4: CELLO-seq enables the study of young TEs at unique loci.

Data availability

The datasets generated during the current study are available under ArrayExpress accession E-MTAB-9577. We analyzed two-cell RNA-seq data from GSE97778, GSE66390, GSE76687 and GSE71434; ATAC–seq from GSE76642 and GSE66390; H3K9me3 data from GSE97778; H3K4me3 from GSE73952, GSE76687 and GSE71434; H3K27me3 from GSE73952 and GSE76687; and whole-genome bisulfite data from GSE97778 and E-MTAB-9090. We analyzed hiPSC RNA-seq data from GSE47626 and GSE56568; H3K4me3, H3K9me3, H3K27me3 and whole-genome bisulfite data from GSE16265; and H3K4me3 from GSE16256.

Code availability

The code for data analysis is available in the following GitHub repositories: https://github.com/MarioniLab/CELLOseq, https://github.com/MarioniLab/sarlacc and https://github.com/MarioniLab/long_read_simulations.

Acknowledgements

We thank the Sanger and WIMM cytometry core facility for sorting of hiPSCs. We thank the Sanger, CRUK and WIMM sequencing facility for sequencing NGS data. We thank the WIMM single-cell facility for generating the 10X data. We thank V. Sundaram for fruitful discussions. We thank P. Gould, W. Reik and D. O’Carroll for helpful comments on the manuscript. This research was supported by a Sir Henry Wellcome Fellowship to R.V.B. (no. 213612), an EBPOD Fellowship to A.Y., an HFSP Long Term Fellowship to C.E.L., support from Cancer Research UK (CRUK) (C9545/A29580) and Core support from EMBL to J.C.M. and a Wellcome Trust Fellowship to J.S.B.

Author information

Contributions

R.V.B. conceived, designed, executed and analyzed all experiments. A.Y. performed simulations and analyzed data. G.L. performed embryo collection. A.T.L.L. wrote the computational method sarlacc with the help of C.-T.L. and F.B. J.S.B. and N.B. performed and advised on ScNaUMI-seq experiments. M.I. and D.J.G. provided hiPSCs. C.E.L. helped conceive this study, and helped to design and optimize the CELLO-seq protocol. J.C.M. conceived and supervised this study. J.M. and R.V.B. wrote the final version of the manuscript. All authors commented on the final manuscript.

Corresponding authors

Correspondence to Rebecca V. Berrens or John C. Marioni.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks Geoffrey Faulkner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 CELLO-seq library properties.

Boxplots of counts across the 50 highest expressed young L1 unique loci in mouse ES cells cultured in 2i medium measured by (a) ScNaUmi-seq or (b) CELLO-seq. (c) Histogram of number of UMIs (y-axis) by UMI group size (x-axis) for two-cell blastomere CELLO-seq data sequenced on a MinION flow cell or Illumina platform. (d) Light field microscopy image of two-cell embryo blastomere isolation. Two-cell embryo (i) with zona pellucida; (ii) without zona pellucida; (iii) as single blastomeres. This experiment was repeated more than 20 times. Scale bar, 25 μm. (e) Bar graph of read numbers for the mouse two-cell embryo dataset (right) and human iPSCs (left). (f) Density plot of number of molecules (y-axis) by length of mapped molecules (x-axis) for hiPSCs and mouse blastomeres. (g) Scatter plot of number of reads (y-axis) versus number of genes (x-axis) in hiPSCs from Smart-seq2 libraries sequenced by Illumina. (h) Schematic of the sarlacc workflow. We demultiplexed samples by grouping barcodes with a Levenshtein distance below the grouping threshold. We performed pregrouping by mapping the reads to the relevant transcriptome. We then grouped the reads by UMI sequence and either error corrected the reads within each true UMI group or, in deduplication mode, picked a random read from the UMI group. For this study we used error-corrected reads. (i) Bar plot of fraction of reads (y-axis) and their relative position on a transcript (x-axis) from the start or the end of the molecule depending on the gene length. (j) Scatter plot of short-read (y-axis) versus long-read (x-axis) gene expression depending on the length of the gene. (k) Scatter plot of ERCC concentration (y-axis) and ERCC molecules (x-axis) of mouse blastomere CELLO-seq data. (l) Scatter plot of ERCC concentration (y-axis) and ERCC molecules (x-axis) from mouse blastomere CELLO-seq libraries with Illumina sequencing. For panels j to l, the Pearson correlation coefficient (R) and two-sided P value are shown.
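The UMI grouping step described in panel (h) can be illustrated with a minimal Python sketch. It is not the sarlacc implementation (sarlacc is an R package). The function names, the greedy seed-based grouping, the `<=` comparison against the threshold and the per-position majority vote used as a stand-in for error correction are all assumptions made for illustration.

```python
import random
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def group_by_umi(reads, threshold):
    """Greedy grouping (an assumption): a read joins the first group whose
    seed UMI is within `threshold` edits, otherwise it seeds a new group.
    `reads` is an iterable of (umi, sequence) pairs."""
    groups = []  # list of (seed_umi, [(umi, sequence), ...])
    for umi, seq in reads:
        for seed, members in groups:
            if levenshtein(umi, seed) <= threshold:
                members.append((umi, seq))
                break
        else:
            groups.append((umi, [(umi, seq)]))
    return groups


def deduplicate(groups, rng):
    """Deduplication mode: keep one randomly chosen read per UMI group."""
    return [rng.choice(members)[1] for _, members in groups]


def majority_consensus(groups):
    """Toy error correction: per-position majority vote, truncated to the
    shortest read in the group. Real long reads would need alignment first."""
    corrected = []
    for _, members in groups:
        seqs = [seq for _, seq in members]
        length = min(len(s) for s in seqs)
        corrected.append("".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        ))
    return corrected


if __name__ == "__main__":
    # Hypothetical toy reads: three copies of one molecule, one of another.
    reads = [
        ("ACGTACGT", "AAAA"),
        ("ACGTACGA", "AAAT"),  # one UMI error and one read error
        ("ACGTACGT", "AAAA"),
        ("TTTTGGGG", "CCCC"),
    ]
    groups = group_by_umi(reads, threshold=2)
    print(majority_consensus(groups))
    print(deduplicate(groups, random.Random(0)))
```

Pregrouping by transcript, as described in the legend, would amount to running `group_by_umi` separately on the reads assigned to each transcript, which reduces the number of UMIs that compete within a group.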

Extended Data Fig. 2 Isoform analysis.

(a) Expression of TE-derived isoforms in human iPSCs and mouse two-cell data stratified by whether a repeat acts as a transcript end site (TES) or as a transcript start site (TSS). Mouse: TES (n = 353), TSS (n = 76). Human: TES (n = 537), TSS (n = 73). (b) Number of TE-derived isoforms in hiPSCs and mouse blastomeres with repeat as TES or TSS. (c) Bar plot of repeat family underlying repeat-derived isoforms in hiPSCs. (d) Bar plot of repeat family underlying repeat-derived isoforms in mouse blastomeres. (e) Frequency plot of number of repeats (x-axis) by age of TE (mya) (y-axis). TEs that mapped to hiPSCs (top), all repeats from human UCSC RepeatMasker annotation (bottom). TEs are grouped by class and color-coded by family. (f) Boxplots of age (mya) of TEs (y-axis) of young human L1s either detected or not detected in the hiPSC dataset. Detected: L1HS (n = 15), L1PA2 (n = 34), L1PA3 (n = 34), L1PA4 (n = 20), L1PA5 (n = 18), L1PA6 (n = 17); not detected: L1HS (n = 1692), L1PA2 (n = 5148), L1PA3 (n = 11194), L1PA4 (n = 12471), L1PA5 (n = 11735), L1PA6 (n = 6195). (g) Boxplots of age (mya) of TEs (y-axis) of young mouse L1s either detected or not detected in mouse blastomeres. Detected: L1MdA (n = 27), L1MdT (n = 44), L1MdF3 (n = 8), L1MdF2 (n = 43), L1MdGf (n = 1), L1MdF (n = 5); not detected: L1MdA (n = 16817), L1MdT (n = 23644), L1MdF3 (n = 16138), L1MdF2 (n = 64855), L1MdGf (n = 1079), L1MdF (n = 4011). The boxplots in a, f and g show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data point within 1.5 times the box length (the interquartile range) of the box.
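The boxplot convention stated at the end of this legend can be written out as a short Python sketch. The function name `boxplot_stats` and the choice of the "inclusive" quartile method are assumptions; plotting packages differ slightly in how they interpolate quartiles, so the values may not match the published figures exactly.

```python
import statistics


def boxplot_stats(values):
    """Return the median, quartiles, whisker ends and outliers for a boxplot.

    Whiskers extend to the most extreme data points that lie within
    1.5 box lengths (interquartile ranges) of the box edges.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Hypothetical values; 100 falls outside the whiskers.
    print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```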

Extended Data Fig. 3 Simulation of correct mapping of young L1 in mouse and human genome.

(a) Bar graph showing the number of reads (y-axis) by the simulation type with either 1x or 10x coverage (x-axis), color-coded by alignment type with mapped = read at correct location after mapping with minimap2 to the genome, mismapped = read maps at wrong location, unmapped = read not mapped, unresolved = group has more than one molecule present and group cannot be resolved to a unique read. Mouse (left), human (right). (b) Bar graph of proportion of read group sizes (y-axis) by alignment type (x-axis), left showing 1x read coverage, right showing 10x read coverage. Color-coded by group size. Mouse (top), human (bottom). (c) Stacked bar graph showing proportion of L1 elements (y-axis) by simulation type using 10x read coverage (x-axis), colored by specificity score. Mouse (left), human (right). (d) Jitter plot of TE subfamily (y-axis) by TE age (million years ago) grouped by simulation type and colored by % of mapped reads, with yellow being 0% mapped and dark blue being 100% mapped. Mouse L1 top panel and human L1 bottom panel. Simulation type: perfect = perfect read identity, ONT = ONT read identity, ONT 5x = ONT read identity with 5x coverage, sarlacc corrected 5x = ONT read identity score, 5x coverage with sarlacc error correction, sarlacc corrected 10x = ONT read identity score, 10x coverage with sarlacc error correction, sarlacc deduplicated 5x = ONT read identity score, 5x coverage with sarlacc deduplication by randomly choosing 1 read. PG = perfect grouping.
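The alignment categories defined in panel (a) and the per-element percentage of mapped reads shown in panel (d) can be sketched in Python as follows. The `SimulatedRead` record, its field names and the order in which the categories are checked are hypothetical; the sketch only assumes that each simulated read carries the locus it was generated from and the locus, if any, that the aligner reported.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulatedRead:
    true_locus: str                # locus the read was simulated from
    aligned_locus: Optional[str]   # locus reported by the aligner, None if no alignment
    group_resolved: bool = True    # False if the UMI group mixes molecules and cannot be resolved


def classify(read: SimulatedRead) -> str:
    """Assign one of the four alignment types from the legend.
    Checking 'unresolved' first is an assumption of this sketch."""
    if not read.group_resolved:
        return "unresolved"
    if read.aligned_locus is None:
        return "unmapped"
    if read.aligned_locus == read.true_locus:
        return "mapped"
    return "mismapped"


def percent_mapped(reads):
    """Percentage of reads from each true locus that mapped back correctly."""
    totals, hits = Counter(), Counter()
    for read in reads:
        totals[read.true_locus] += 1
        hits[read.true_locus] += classify(read) == "mapped"
    return {locus: 100.0 * hits[locus] / totals[locus] for locus in totals}


if __name__ == "__main__":
    # Hypothetical loci and outcomes, for illustration only.
    reads = [
        SimulatedRead("L1_a", "L1_a"),
        SimulatedRead("L1_a", "L1_b"),
        SimulatedRead("L1_a", None),
        SimulatedRead("L1_b", "L1_b", group_resolved=False),
    ]
    print(Counter(classify(r) for r in reads))
    print(percent_mapped(reads))
```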

Extended Data Fig. 4 UMI simulations.

(a) Distribution of Levenshtein distance between randomly simulated UMIs (x-axis) based on UMI length with RYN pattern (left) or NNN pattern (right). Light grey bar shows distance threshold for grouping of reads by UMIs used for most short-read UMIs or CELLO-seq. (b) Line graph of fraction of pure groups (y-axis) by Levenshtein distance (x-axis) by UMI group, either with perfect read identity or ONT read identity. On the left is the line graph of UMI simulations without any pregrouping by mapping. On the right is the line graph of UMI simulations where pregrouping was performed by random assignment of true UMI sequences into groups of 100 unique UMIs. (c) Distribution plot of UMI group sizes (x-axis) by Levenshtein distance threshold (y-axis) based on UMI length, with perfect ONT read identity and no pregrouping (left) or pregrouping (right).
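The kind of simulation summarized in panel (a) can be sketched in Python: random UMIs are drawn from a repeated RYN or NNN pattern and all pairwise Levenshtein distances are tabulated. The IUPAC codes (R = A or G, Y = C or T, N = any base) are standard; the function names, the number of UMIs and the pattern repeat count are illustrative assumptions and are not taken from the study.

```python
import random
from collections import Counter
from itertools import combinations

# Standard IUPAC degenerate base codes used by the two UMI patterns.
IUPAC = {"R": "AG", "Y": "CT", "N": "ACGT"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def random_umi(pattern: str, repeats: int, rng: random.Random) -> str:
    """Draw one UMI by sampling a base for every code in the repeated pattern."""
    return "".join(rng.choice(IUPAC[code]) for code in pattern * repeats)


def distance_histogram(umis):
    """Count how often each pairwise Levenshtein distance occurs."""
    return Counter(levenshtein(a, b) for a, b in combinations(umis, 2))


if __name__ == "__main__":
    rng = random.Random(1)
    for pattern in ("RYN", "NNN"):
        # 200 UMIs of length 3 * 8 = 24; both numbers are arbitrary choices.
        umis = [random_umi(pattern, 8, rng) for _ in range(200)]
        hist = distance_histogram(umis)
        print(pattern, sorted(hist.items()))
```

Comparing the two histograms against a candidate grouping threshold shows how often two unrelated UMIs would fall close enough to be merged by chance.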

Extended Data Fig. 5 CELLO-seq to study locus specific TE expression.

(a) Heatmap of expression of all SINE elements in mouse blastomeres, with rows clustered by SINE family and color-coded by TE subfamily. (b) Heatmap of expression of full-length (>5000 nt) elements in mouse blastomeres, with rows clustered by TE family and color-coded by TE subfamily. (c) Heatmap of logcounts of highest expressed (mean expression > 1) elements in hiPSCs with rows clustered by TE subfamily. (d) Boxplot of percentage of reads mapped to TEs or TE families in CELLO-seq mouse two-cell blastomeres. P value: L1Md to SINE B1/B2 = 0.004998, L1Md to MERVL = 0.004998, two-sided Wilcoxon rank sum test. n = 6 cells. (e) Boxplot of percentage of TEs expressed by number of TEs in the genome in CELLO-seq mouse blastomeres. P value: repeats to L1Md = 0.0022, repeats to SINE B1/B2 = 0.0022, repeats to MERVL = 0.0022, two-sided Wilcoxon rank sum test. n = 6 cells. (f) Boxplot of number of MERVL elements expressed in each cell of CELLO-seq two-cell blastomeres compared to published short-read data. CELLO-seq (n = 6 cells), bulk (n = 7 independent experiments). (g) Boxplot of number of HERVH-int elements expressed in each cell of CELLO-seq compared to published short-read data. CELLO-seq (n = 96 cells), bulk (n = 10 independent experiments). (h) Expression, methylation, ATAC-seq and ChIP-seq of MERVL elements with read counts in CELLO-seq libraries compared to MERVL elements with no counts in CELLO-seq libraries. Expressed (n = 355 MERVLs), not expressed (n = 41 MERVLs), datasets: ATAC-seq (n = 1), DNA methylation (n = 2), H3K27me3 (n = 1), H3K4me3 (n = 1), H3K9me3 (n = 1), RNAseq (n = 3). (i) Expression, methylation, ATAC-seq and ChIP-seq data of HERVH-int elements with read counts in CELLO-seq libraries compared to HERVH-int elements with no counts in CELLO-seq libraries. Expressed (n = 14 HERVH-ints), not expressed (n = 110 HERVH-ints), each dataset (n = 1), RNAseq (n = 3). The boxplots in d to i show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data point within 1.5 times the box length (the interquartile range) of the box.

Extended Data Fig. 6 CELLO-seq to study locus specific young L1 expression.

(a) DNA methylation of L1Md elements expressed in CELLO-seq mouse blastomeres. Methylation level of L1Mds across preimplantation development and in spermatogonia. Bold: L1s with full-length ORF by ORFfinder. (b) Methylation level of L1Mds across early development in human iPS cells as well as in tumour and normal tissue. Bold: L1s with full-length ORF by ORFfinder; underlined: L1s known to be mobile according to previous publications. (c,d) Genome browser view of CELLO-seq reads overlapping young L1s in (c) mouse or (d) human. Arrows show direction of transcription of each L1 element.

Supplementary information

Reporting Summary

Supplementary Table 1

Median accuracy and read number per coverage of error-corrected and deduplicated reads with CELLO-seq.

Supplementary Table 2

Specificity scores of the L1 elements used in this study. We used only L1 elements with a specificity score >80%.

Supplementary Table 3

Reads overlapping young L1 elements for each cell.

Supplementary Table 4

Information on the mobility of young L1 elements that are transcribed according to CELLO-seq.

Supplementary Table 5

Oligonucleotide sequences used in this manuscript.

Cite this article

Berrens, R.V., Yang, A., Laumer, C.E. et al. Locus-specific expression of transposable elements in single cells with CELLO-seq. Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-01093-1
