Locus-specific expression of transposable elements in single cells with CELLO-seq

Berrens, Rebecca V.; Yang, Andrian; Laumer, Christopher E.; Lun, Aaron T. L.; Bieberich, Florian; Law, Cheuk-Ting; Lan, Guocheng; Imaz, Maria; Bowness, Joseph S.; Brockdorff, Neil; Gaffney, Daniel J.; Marioni, John C.

doi:10.1038/s41587-021-01093-1

Article
Published: 15 November 2021

Nature Biotechnology volume 40, pages 546–554 (2022)

Abstract

Transposable elements (TEs) regulate diverse biological processes, from early development to cancer. Expression of young TEs is difficult to measure with next-generation, single-cell sequencing technologies because their highly repetitive nature means that short complementary DNA reads cannot be unambiguously mapped to a specific locus. Single CELl LOng-read RNA-sequencing (CELLO-seq) combines long-read single cell RNA-sequencing with computational analyses to measure TE expression at unique loci. We used CELLO-seq to assess the widespread expression of TEs in two-cell mouse blastomeres as well as in human induced pluripotent stem cells. Across both species, old and young TEs showed evidence of locus-specific expression with simulations demonstrating that only a small number of very young elements in the mouse could not be mapped back to the reference with high confidence. Exploring the relationship between the expression of individual elements and putative regulators revealed large heterogeneity, with TEs within a class showing different patterns of correlation and suggesting distinct regulatory mechanisms.

**Fig. 1: CELLO-seq overview and its ability to aid in the study of allelic and isoform expression.**

**Fig. 2: CELLO-seq enables TE-derived isoform and TE expression analysis in single cells at single loci.**

**Fig. 3: Simulations characterizing the mapping of young L1 in the mouse and human genome.**

**Fig. 4: CELLO-seq enables the study of young TEs at unique loci.**

Data availability

The datasets generated during the current study are available under ArrayExpress accession E-MTAB-9577. We analyzed two-cell RNA-seq data from GSE97778, GSE66390, GSE76687 and GSE71434; ATAC–seq from GSE76642 and GSE66390; H3K9me3 data from GSE97778; H3K4me3 from GSE73952, GSE76687 and GSE71434; H3K27me3 from GSE73952 and GSE76687; and whole-genome bisulfite data from GSE97778 and E-MTAB-9090. We analyzed hiPSC RNA-seq data from GSE47626 and GSE56568; H3K4me3, H3K9me3, H3K27me3 and whole-genome bisulfite data from GSE16265; and H3K4me3 from GSE16256.

Code availability

For data analysis the code is available in the following GitHub repositories: https://github.com/MarioniLab/CELLOseq, https://github.com/MarioniLab/sarlacc and https://github.com/MarioniLab/long_read_simulations.

Acknowledgements

We thank the Sanger and WIMM cytometry core facility for sorting of hiPSCs. We thank the Sanger, CRUK and WIMM sequencing facility for sequencing NGS data. We thank the WIMM single-cell facility for generating the 10X data. We thank V. Sundaram for fruitful discussions. We thank P. Gould, W. Reik and D. O’Carroll for helpful comments on the manuscript. This research was supported by a Sir Henry Wellcome Fellowship to R.V.B. (no. 213612), an EBPOD Fellowship to A.Y., an HFSP Long Term Fellowship to C.E.L., support from Cancer Research UK (CRUK) (C9545/A29580) and Core support from EMBL to J.C.M. and a Wellcome Trust Fellowship to J.S.B.

Author information

Aaron T. L. Lun
Present address: Genentech, South San Francisco, CA, USA
Florian Bieberich
Present address: ETH Zürich, Basel, Switzerland
Cheuk-Ting Law
Present address: Department of Pathology, Li Ka Shing Faculty of Medicine, University of Hong Kong, Pok Fu Lam, Hong Kong
Guocheng Lan
Present address: School of Biomedical Sciences, Stem Cell and Regenerative Consortium, Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, University of Hong Kong, Pok Fu Lam, Hong Kong
Maria Imaz
Present address: Division of Cardiovascular Medicine, University of Cambridge, Cambridge, UK
Daniel J. Gaffney
Present address: Genomics Plc, Oxford, UK
These authors contributed equally: Andrian Yang, Christopher E. Laumer.

Authors and Affiliations

Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Rebecca V. Berrens, Aaron T. L. Lun, Florian Bieberich, Cheuk-Ting Law, Guocheng Lan & John C. Marioni
Developmental Epigenetics, Department of Biochemistry, University of Oxford, Oxford, UK
Rebecca V. Berrens, Joseph S. Bowness & Neil Brockdorff
European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
Andrian Yang, Christopher E. Laumer & John C. Marioni
Wellcome Sanger Institute, Cambridge, UK
Maria Imaz, Daniel J. Gaffney & John C. Marioni

Contributions

R.V.B. conceived, designed, executed and analyzed all experiments. A.Y. performed simulations and analyzed data. G.L. performed embryo collection. A.T.L.L. wrote the computational method sarlacc with the help of C.-T.L. and F.B. J.S.B. and N.B. performed and advised on ScNaUmi-seq experiments. M.I. and D.J.G. provided hiPSCs. C.E.L. helped conceive this study, and helped to design and optimize the CELLO-seq protocol. J.C.M. conceived and supervised this study. J.M. and R.V.B. wrote the final version of the manuscript. All authors commented on the final manuscript.

Corresponding authors

Correspondence to Rebecca V. Berrens or John C. Marioni.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Biotechnology thanks Geoffrey Faulkner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 CELLO-seq library properties.

Boxplots of counts across the 50 highest expressed young L1 unique loci in mouse ES cells cultured in 2i medium, measured by (a) ScNaUmi-seq or (b) CELLO-seq. (c) Histogram of number of UMIs (y-axis) by UMI group size (x-axis) for two-cell blastomere CELLO-seq data sequenced on a MinION flow cell or an Illumina platform. (d) Light field microscopy image of two-cell embryo blastomere isolation: two-cell embryo (i) with zona pellucida, (ii) without zona pellucida and (iii) as single blastomeres. This experiment was repeated more than 20 times. Scale bar, 25 μm. (e) Bar graph of read numbers for the mouse two-cell embryo dataset (right) and human iPSCs (left). (f) Density plot of number of molecules (y-axis) by length of mapped molecules (x-axis) for hiPSCs and mouse blastomeres. (g) Scatter plot of number of reads (y-axis) versus number of genes in hiPSCs from Smart-seq2 libraries sequenced by Illumina. (h) Schematic of the sarlacc workflow. We demultiplexed samples by grouping barcodes with a Levenshtein distance below the grouping threshold. We performed pregrouping by mapping the reads to the relevant transcriptome. We then grouped the reads by UMI sequence and either error corrected the reads in the true UMI group or, in deduplication mode, picked a random read from the UMI group. For this study we used error-corrected reads. (i) Bar plot of fraction of reads (y-axis) and their relative position on a transcript (x-axis) from the start or the end of the molecule, depending on the gene length. (j) Scatter plot of short-read (y-axis) versus long-read (x-axis) gene expression, depending on the length of the gene. (k) Scatter plot of ERCC concentration (y-axis) and ERCC molecules (x-axis) for mouse blastomere CELLO-seq data. (l) Scatter plot of ERCC concentration (y-axis) and ERCC molecules (x-axis) for mouse blastomere CELLO-seq libraries with Illumina sequencing. For j-l, the Pearson correlation coefficient (R) and two-sided P value are shown.
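The grouping steps described for panel (h) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the sarlacc implementation (sarlacc is a separate package with its own interface): the function names, the read dictionaries, the greedy seed-based grouping and the choice to treat the threshold as inclusive are all hypothetical. Only the deduplication mode (one random read per UMI group) is shown; consensus error correction is not modeled.

```python
import random
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def group_by_umi(reads, max_dist):
    """Pregroup reads by the transcript they map to, then greedily group by UMI.

    A read joins the first group whose seed UMI is within `max_dist` edits
    (inclusive threshold, an assumption); otherwise it seeds a new group.
    """
    pregroups = defaultdict(list)
    for read in reads:
        pregroups[read["transcript"]].append(read)

    groups = []
    for members in pregroups.values():
        local = []  # list of (seed_umi, reads_in_group)
        for read in members:
            for seed, grp in local:
                if levenshtein(seed, read["umi"]) <= max_dist:
                    grp.append(read)
                    break
            else:
                local.append((read["umi"], [read]))
        groups.extend(grp for _, grp in local)
    return groups


def deduplicate(groups, seed=0):
    """Deduplication mode: keep one randomly chosen read per UMI group."""
    rng = random.Random(seed)
    return [rng.choice(grp) for grp in groups]


# Hypothetical toy reads: r2 differs from r1 by a single substitution.
reads = [
    {"id": "r1", "transcript": "tx1", "umi": "ACGTACGTAC"},
    {"id": "r2", "transcript": "tx1", "umi": "ACGTACGTAA"},
    {"id": "r3", "transcript": "tx1", "umi": "TTTTGGGGCC"},
    {"id": "r4", "transcript": "tx2", "umi": "ACGTACGTAC"},
]
groups = group_by_umi(reads, max_dist=2)
representatives = deduplicate(groups)
```

In this toy input, r1 and r2 fall into one group, while r4 stays separate despite sharing r1's UMI because pregrouping places it under a different transcript.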

Extended Data Fig. 2 Isoform analysis.

(a) Expression of TE-derived isoforms in human iPSCs and mouse two-cell data, stratified by whether a repeat acts as a transcript end site (TES) or as a transcript start site (TSS). Mouse: TES (n = 353), TSS (n = 76). Human: TES (n = 537), TSS (n = 73). (b) Number of TE-derived isoforms in hiPSCs and mouse blastomeres with a repeat as TES or TSS. (c) Bar plot of repeat family underlying repeat-derived isoforms in hiPSCs. (d) Bar plot of repeat family underlying repeat-derived isoforms in mouse blastomeres. (e) Frequency plot of number of repeats (x-axis) by age of TE (mya) (y-axis). TEs that mapped to hiPSCs (top), all repeats from the human UCSC RepeatMasker annotation (bottom). TEs are grouped by class and color-coded by family. (f) Boxplots of age (mya) of TEs (y-axis) of young human L1s either detected or not detected in the hiPSC dataset. Detected: L1HS (n = 15), L1PA2 (n = 34), L1PA3 (n = 34), L1PA4 (n = 20), L1PA5 (n = 18), L1PA6 (n = 17); not detected: L1HS (n = 1692), L1PA2 (n = 5148), L1PA3 (n = 11194), L1PA4 (n = 12471), L1PA5 (n = 11735), L1PA6 (n = 6195). (g) Boxplots of age (mya) of TEs (y-axis) of young mouse L1s either detected or not detected in mouse blastomeres. Detected: L1MdA (n = 27), L1MdT (n = 44), L1MdF3 (n = 8), L1MdF2 (n = 43), L1MdGf (n = 1), L1MdF (n = 5); not detected: L1MdA (n = 16817), L1MdT (n = 23644), L1MdF3 (n = 16138), L1MdF2 (n = 64855), L1MdGf (n = 1079), L1MdF (n = 4011). The boxplots in a, f and g show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data point within 1.5 lengths of the box.

Extended Data Fig. 3 Simulation of correct mapping of young L1 in mouse and human genome.

(a) Bar graph showing the number of reads (y-axis) by simulation type with either 1x or 10x coverage (x-axis), color-coded by alignment type: mapped = read at the correct location after mapping with minimap2 to the genome; mismapped = read maps at the wrong location; unmapped = read not mapped; unresolved = group has more than one molecule present and cannot be resolved to a unique read. Mouse (left), human (right). (b) Bar graph of proportion of read group sizes (y-axis) by alignment type (x-axis), with 1x read coverage (left) and 10x read coverage (right), color-coded by group size. Mouse (top), human (bottom). (c) Stacked bar graph showing proportion of L1 elements (y-axis) by simulation type using 10x read coverage (x-axis), colored by specificity score. Mouse (left), human (right). (d) Jitter plot of TE subfamily (y-axis) by TE age (million years ago), grouped by simulation type and colored by percentage of mapped reads, from yellow (0% mapped) to dark blue (100% mapped). Mouse L1 (top), human L1 (bottom). Simulation type: perfect = perfect read identity; ONT = ONT read identity; ONT 5x = ONT read identity with 5x coverage; sarlacc corrected 5x = ONT read identity score, 5x coverage with sarlacc error correction; sarlacc corrected 10x = ONT read identity score, 10x coverage with sarlacc error correction; sarlacc deduplicated 5x = ONT read identity score, 5x coverage with sarlacc deduplication by randomly choosing 1 read. PG = perfect grouping.
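The alignment categories in panel (a) amount to comparing where a simulated read was aligned with where it was simulated from. The following Python sketch illustrates that comparison under assumptions that are not taken from the paper: the `Alignment` tuple, the `classify_read` function and the 50% overlap criterion for "correct location" are all hypothetical, and the group-level "unresolved" category is not modeled.

```python
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    """Genomic interval, half-open coordinates (hypothetical structure)."""
    chrom: str
    start: int
    end: int


def classify_read(
    truth: Alignment,
    aligned: Optional[Alignment],
    min_overlap_frac: float = 0.5,  # assumed criterion for "correct location"
) -> str:
    """Label one simulated read as 'mapped', 'mismapped' or 'unmapped'."""
    if aligned is None:
        return "unmapped"
    if aligned.chrom != truth.chrom:
        return "mismapped"
    # Overlap is negative when the intervals are disjoint.
    overlap = min(truth.end, aligned.end) - max(truth.start, aligned.start)
    truth_len = truth.end - truth.start
    if truth_len > 0 and overlap / truth_len >= min_overlap_frac:
        return "mapped"
    return "mismapped"


# Hypothetical source locus of a simulated read.
truth = Alignment("chr1", 1000, 7000)
labels = [
    classify_read(truth, Alignment("chr1", 1200, 7100)),
    classify_read(truth, Alignment("chr5", 1000, 7000)),
    classify_read(truth, None),
]
```

Tallying these labels per simulation type would give the kind of counts plotted in panel (a).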

Extended Data Fig. 4 UMI simulations.

(a) Distribution of Levenshtein distance between randomly simulated UMIs (x-axis) based on UMI length with RYN pattern (left) or NNN pattern (right). The light gray bar shows the distance threshold for grouping of reads by UMIs used for most short-read UMIs or for CELLO-seq. (b) Line graph of fraction of pure groups (y-axis) by Levenshtein distance (x-axis) by UMI group, with either perfect read identity or ONT read identity. Left: UMI simulations without any pregrouping by mapping. Right: UMI simulations in which pregrouping was performed by random assignment of true UMI sequences into groups of 100 unique UMIs. (c) Distribution plot of UMI group sizes (x-axis) by Levenshtein distance threshold (y-axis) based on UMI length, with perfect ONT read identity and no pregrouping (left) or pregrouping (right).
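A simulation of the kind summarized in panel (a) can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, UMI length, number of UMIs and random seed are hypothetical. The only fixed ingredients are the standard IUPAC codes (R = A or G, Y = C or T, N = any base) and the Levenshtein distance.

```python
import random
from collections import Counter
from itertools import combinations

# Standard IUPAC nucleotide codes used by the two UMI patterns.
IUPAC = {"R": "AG", "Y": "CT", "N": "ACGT"}


def simulate_umi(pattern: str, rng: random.Random) -> str:
    """Draw one random UMI that satisfies an IUPAC pattern such as 'RYNRYN'."""
    return "".join(rng.choice(IUPAC[code]) for code in pattern)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def distance_histogram(unit: str, n_repeats: int, n_umis: int, seed: int = 0) -> Counter:
    """Histogram of pairwise Levenshtein distances between simulated UMIs."""
    rng = random.Random(seed)
    pattern = unit * n_repeats
    umis = [simulate_umi(pattern, rng) for _ in range(n_umis)]
    return Counter(levenshtein(a, b) for a, b in combinations(umis, 2))


if __name__ == "__main__":
    # Hypothetical settings: 24-nt UMIs, 50 UMIs per pattern.
    for unit in ("RYN", "NNN"):
        hist = distance_histogram(unit, n_repeats=8, n_umis=50)
        print(unit, sorted(hist.items()))
```

Comparing the two histograms against a candidate grouping threshold shows how often unrelated UMIs would fall within that threshold by chance for each pattern.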

Extended Data Fig. 5 CELLO-seq to study locus specific TE expression.

(a) Heatmap of expression of all SINE elements in mouse blastomeres, with rows clustered by SINE family and color-coded by TE subfamily. (b) Heatmap of expression of full-length (>5000 nt) elements in mouse blastomeres, with rows clustered by TE family and color-coded by TE subfamily. (c) Heatmap of logcounts of highest expressed (mean expression > 1) elements in hiPSCs, with rows clustered by TE subfamily. (d) Boxplot of percentage of reads mapped to TEs or TE families in CELLO-seq mouse two-cell blastomeres. P value: L1Md to SINE B1/B2 = 0.004998, L1Md to MERVL = 0.004998, two-sided Wilcoxon rank sum test. n = 6 cells. (e) Boxplot of percentage of TEs expressed by number of TEs in the genome in CELLO-seq mouse blastomeres. P value: repeats to L1Md = 0.0022, repeats to SINE B1/B2 = 0.0022, repeats to MERVL = 0.0022, two-sided Wilcoxon rank sum test. n = 6 cells. (f) Boxplot of number of MERVL elements expressed in each cell of CELLO-seq two-cell blastomeres compared to published short-read data. CELLO-seq (n = 6 cells), bulk (n = 7 independent experiments). (g) Boxplot of number of HERVH-int elements expressed in each cell of CELLO-seq compared to published short-read data. CELLO-seq (n = 96 cells), bulk (n = 10 independent experiments). (h) Expression, methylation, ATAC-seq and ChIP-seq of MERVL elements with read counts in CELLO-seq libraries compared to MERVL elements with no counts in CELLO-seq libraries. Expressed (n = 355 MERVLs), not expressed (n = 41 MERVLs); datasets: ATAC-seq (n = 1), DNA methylation (n = 2), H3K27me3 (n = 1), H3K4me3 (n = 1), H3K9me3 (n = 1), RNA-seq (n = 3). (i) Expression, methylation, ATAC-seq and ChIP-seq data of HERVH-int elements with read counts in CELLO-seq libraries compared to HERVH-int elements with no counts in CELLO-seq libraries. Expressed (n = 14 HERVH-ints), not expressed (n = 110 HERVH-ints); each dataset (n = 1), RNA-seq (n = 3). The boxplots in d-i show the median, first and third quartiles as a box, and the whiskers indicate the most extreme data point within 1.5 lengths of the box.
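The boxplot convention stated in this legend (box from first to third quartile, whiskers at the most extreme data point within 1.5 box lengths) can be made concrete with a short Python sketch. The function name and example values are hypothetical, and the quartile method used here (`statistics.quantiles` with `method="inclusive"`) is an assumption: plotting software may compute quartiles slightly differently, so box edges can differ marginally from the published figures.

```python
import statistics


def boxplot_stats(values):
    """Median, quartiles and whisker ends following the 1.5 x box-length rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    low_fence = q1 - 1.5 * box_length
    high_fence = q3 + 1.5 * box_length
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observation inside the fences.
        "whisker_low": min(v for v in data if v >= low_fence),
        "whisker_high": max(v for v in data if v <= high_fence),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical example values with one extreme observation.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

For this example the value 100 lies beyond the upper fence, so the upper whisker ends at 9 and 100 is reported as an outlier.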

Extended Data Fig. 6 CELLO-seq to study locus specific young L1 expression.

(a) DNA methylation of L1Md elements expressed in CELLO-seq mouse blastomeres. Methylation level of L1Mds across preimplantation development and in spermatogonia. Bold: L1s with full-length ORF by ORFfinder. (b) Methylation level of L1Mds across early development in human iPS cells as well as in tumour and normal tissue. Bold: L1s with full-length ORF by ORFfinder; underlined: L1s known to be mobile according to previous publications. (c,d) Genome browser views of CELLO-seq reads overlapping young L1s in (c) mouse or (d) human. Arrows show the direction of transcription of each L1 element.

Supplementary information

Reporting Summary

Supplementary Table 1

Median accuracy and read number per coverage of error-corrected and deduplicated reads with CELLO-seq.

Supplementary Table 2

Specificity score of L1 elements used in this study. We used only L1 elements with specificity score >80%.

Supplementary Table 3

Reads overlapping young L1 elements for each cell.

Supplementary Table 4

Information on the mobility of young L1 elements transcribed according to CELLO-seq.

Supplementary Table 5

Oligonucleotide sequences used in this manuscript.

About this article

Cite this article

Berrens, R.V., Yang, A., Laumer, C.E. et al. Locus-specific expression of transposable elements in single cells with CELLO-seq. Nat Biotechnol 40, 546–554 (2022). https://doi.org/10.1038/s41587-021-01093-1

Received: 18 September 2020
Accepted: 13 September 2021
Published: 15 November 2021
Issue Date: April 2022
DOI: https://doi.org/10.1038/s41587-021-01093-1
