Haplotype-aware pantranscriptome analyses using spliced pangenome graphs

Sibbesen, Jonas A.; Eizenga, Jordan M.; Novak, Adam M.; Sirén, Jouni; Chang, Xian; Garrison, Erik; Paten, Benedict

doi:10.1038/s41592-022-01731-9

Article
Published: 16 January 2023

Nature Methods volume 20, pages 239–247 (2023)


Abstract

Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our toolchain, which consists of additions to the VG toolkit and a standalone tool, RPVG, can construct spliced pangenome graphs, map RNA sequencing data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. We show that this workflow improves accuracy over state-of-the-art RNA sequencing mapping methods, and that it can efficiently quantify haplotype-specific transcript expression without needing to characterize the haplotypes of a sample beforehand.

**Fig. 1: Diagram of haplotype-aware transcriptome analysis pipeline.**

**Fig. 2: Mapping benchmark using RNA-seq data from NA12878.**

**Fig. 3: HST quantification benchmark using RNA-seq data from NA12878.**

**Fig. 4: HLA typing and allele concordance evaluation using RNA-seq data from trios and different tissues.**

**Fig. 5: Exploratory demonstration of analyzing genomic imprinting using data from NA12878 lymphoblastoid cell line.**

Data availability

All data used in this study are available at https://github.com/jonassibbesen/vgrna-project-paper. Data that are available from public repositories are provided as web links only. Accession numbers are included when relevant, and accession numbers for sequencing data are also listed in Supplementary Table 4. The repository also includes all spliced pangenome graphs and pantranscriptome haplotype-specific transcript sets, which may be freely used in other projects. Mapping benchmark tables and haplotype-specific expression estimates are archived in Zenodo (https://doi.org/10.5281/zenodo.7234454).

Code availability

The source code for VG and RPVG is publicly available at https://github.com/vgteam/vg and https://github.com/jonassibbesen/rpvg, respectively. Both tools are licensed under the MIT License. A full list of the versions of all computational tools used is available in Supplementary Table 6. All bash scripts with exact command-lines used to generate the results are available at https://github.com/jonassibbesen/vgrna-project-paper. This repository also includes the custom C++, Python, and R scripts used for analysis and plotting, together with references to Docker containers and log files from the analyses.

Acknowledgements

Research reported in this publication was supported by the National Human Genome Research Institute of the National Institutes of Health under award numbers U01HG010961, R01HG010485, U41HG010972, U24HG011853, and OT2OD026682 to B.P. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work of J.A.S. was supported by the Carlsberg Foundation. We thank the ENCODE Consortium, the Thomas Gingeras Laboratory (Cold Spring Harbor Laboratory), the Ali Mortazavi Laboratory (University of California Irvine) and the Joe Ecker Laboratory (Salk Institute for Biological Studies) for generating and sharing the ENCODE data used in this study. We would also like to thank M. Dennis (University of California Davis) for generating and providing access to the CHM13 RNA-seq data on behalf of the T2T consortium. Finally, we would like to thank J. Monlong and G. Hickey for feedback on the manuscript, and everybody else in the VG Team.

Author information

These authors contributed equally: Jonas A. Sibbesen and Jordan M. Eizenga.

Authors and Affiliations

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Jonas A. Sibbesen, Jordan M. Eizenga, Adam M. Novak, Jouni Sirén, Xian Chang & Benedict Paten
University of Tennessee Health Science Center, Memphis, TN, USA
Erik Garrison

Contributions

J.A.S. and J.M.E. developed software, designed and carried out experiments, analyzed data, and wrote the paper. A.M.N., J.S., X.C., and E.G. contributed to developing the software. B.P. contributed to project conceptualization, supervised the research, and edited the paper.

Corresponding author

Correspondence to Benedict Paten.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Michael Love, Harold Pimentel, and the other, anonymous, reviewer for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Diagram of a multipath alignment.

A diagrammatic comparison between the multipath alignment output of VG MPMAP and the single-path alignment output of other graph aligners (such as VG MAP). (a) A read and (b) a sequence graph, which have been colored to indicate which parts of the read could plausibly align to which parts of the graph. (c) A single-path alignment. The read sequence is aligned to one path from the graph. (d) A multipath alignment. The alignment can split and rejoin to express the alignment uncertainty to different paths in the graph.

Extended Data Fig. 2 Mapping benchmark for primary alignments using RNA-seq data from NA12878.

Mapping error and recall for VG MPMAP and three other methods using simulated Illumina data. Colored numbers indicate different mapping quality thresholds. Reads are considered correctly mapped if their primary alignments cover 90% of the true reference sequence alignment.
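The 90% coverage criterion used in this caption can be illustrated with a minimal Python sketch. It assumes that both the true and the observed alignment are given as lists of half-open reference intervals on the same contig; the function names and the interval representation are hypothetical and are not taken from the authors' evaluation scripts.

```python
def covered_fraction(true_blocks, aligned_blocks):
    """Fraction of the true alignment's reference bases that the read alignment also covers.

    Each argument is a list of half-open (start, end) reference intervals,
    one per aligned block (for example, one per exon for a spliced read).
    Assumption: both alignments are on the same contig.
    """
    true_bases = {pos for start, end in true_blocks for pos in range(start, end)}
    aligned_bases = {pos for start, end in aligned_blocks for pos in range(start, end)}
    if not true_bases:
        return 0.0
    return len(true_bases & aligned_bases) / len(true_bases)


def is_correctly_mapped(true_blocks, aligned_blocks, min_fraction=0.9):
    """Apply the coverage threshold from the caption (90% by default)."""
    return covered_fraction(true_blocks, aligned_blocks) >= min_fraction


# Hypothetical spliced read whose true alignment spans two 50-base exons.
truth = [(100, 150), (300, 350)]
print(is_correctly_mapped(truth, [(105, 150), (300, 350)]))  # 95 of 100 true bases covered
print(is_correctly_mapped(truth, [(100, 150)]))              # only the first exon covered
```

Enumerating individual bases keeps the sketch short; an interval-arithmetic version would be preferable for long alignments.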

Extended Data Fig. 3 Mapping benchmark stratified by edit distance using RNA-seq data from NA12878.

Mapping recall (a) and error (b) for VG MPMAP and three other methods using simulated Illumina data as a function of edit distance. Unique alignments are primary alignments with a mapping quality of at least 30. Reads are considered correctly mapped if their alignments cover 90% of the true reference sequence alignment.

Extended Data Fig. 4 Mapping benchmark stratified by non-reference variants using RNA-seq data from NA12878.

Mapping error and recall for VG MPMAP and three other methods using simulated Illumina data. Colored numbers indicate different mapping quality thresholds. Reads are considered correctly mapped if one of their multi-alignments covers 90% of the true reference sequence alignment. Reads are stratified into those that (a) contain no variants, (b) contain no insertions or deletions (indels) and one single nucleotide variant (SNV), (c) contain no indels and two SNVs, (d) contain no indels and three SNVs, (e) contain no indels and more than three SNVs, and (f) contain any indels.

Extended Data Fig. 5 Allelic bias benchmark using RNA-seq data from NA12878.

Allelic mapping bias for VG MPMAP and four other methods using simulated Illumina RNA-seq reads, which were simulated without allelic bias. STAR was used as the aligner for the WASP pipeline. The WASP (STAR) pipeline was provided the 1000GP NA12878 haplotypes as input. The number of variant sites with coverage of at least 20 is plotted against the observed rate of false positive hypothesis tests of allelic skew (two-sided binomial test, α = 0.01). Coverage was calculated from primary alignments with a mapping quality value of at least 30. The bottom row shows a zoomed view without WASP (STAR).
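As an illustration of the test named in this caption, the following Python sketch computes a two-sided binomial p-value under a null allele fraction of 0.5 and the resulting false positive rate over sites with sufficient coverage. The function names and the `(ref_count, alt_count)` input format are assumptions made for the example; the authors' own scripts may compute the test differently (for instance with a statistics library).

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Two-sided p-value for k successes in n trials under p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the smaller tail, capped at 1.
    """
    if n == 0:
        return 1.0
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def false_positive_rate(site_counts, min_coverage=20, alpha=0.01):
    """Return (number of eligible sites, fraction called as skewed).

    site_counts: iterable of (ref_count, alt_count) per variant site.
    Because the reads are simulated without allelic bias, every
    significant site counts as a false positive.
    """
    eligible = [(r, a) for r, a in site_counts if r + a >= min_coverage]
    if not eligible:
        return 0, 0.0
    n_significant = sum(two_sided_binomial_p(r, r + a) < alpha for r, a in eligible)
    return len(eligible), n_significant / len(eligible)


# Hypothetical allele counts; the last site is below the coverage cutoff.
sites = [(10, 10), (3, 17), (5, 15), (2, 5)]
print(false_positive_rate(sites))
```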

Extended Data Fig. 6 Haplotype-specific transcript uniqueness in the 1000 Genomes Project.

The fraction of HSTs that are unique to each of the 2504 samples in the 1000 Genomes Project (1000GP) when compared to different subsets of samples in the 1000GP. Left box plots show the fraction unique when comparing to all other samples, middle box plots show the fraction unique when comparing to all other samples excluding the samples’ population, and right box plots show the fraction unique when comparing to all other samples excluding the samples’ super population. AFR: African (n = 661), AMR: Admixed American (n = 347), EAS: East Asian (n = 504), EUR: European (n = 503), SAS: South Asian (n = 489). The horizontal line in the boxes corresponds to the median, and the box bounds (inter-quartile range) to the 25th and 75th percentile. The whiskers extend to the minimum and maximum value, but no further than 1.5 times the inter-quartile range from the box bounds. Values outside the whiskers are displayed as points.
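The box plot conventions described above (median line, quartile box, whiskers limited to 1.5 times the inter-quartile range, remaining values drawn as points) can be sketched as follows. The quartile interpolation rule is an assumption: plotting libraries differ on it, and the caption does not say which one was used. The function name is hypothetical.

```python
from statistics import quantiles


def box_plot_stats(values, whisker_factor=1.5):
    """Summary statistics for one box, following the caption's conventions.

    Assumption: quartiles use linear interpolation with the sample minimum
    and maximum as the 0th and 100th percentiles (method="inclusive").
    Requires at least two values.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Everything beyond the whiskers is drawn as an individual point.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical values with one clear outlier.
print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```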

Extended Data Fig. 7 Allele-specific expression benchmark using RNA-seq data from NA12878.

Allele-specific expression (ASE) results comparing the MPMAP-RPVG pipeline against WASP (with STAR as the aligner) using simulated data. The plot shows the true positive rate and false positive rate of ASE significance for different thresholds of variant read count in the simulated data. Variants were defined as showing significant ASE using a two-sided binomial test of the allele-specific read counts, with p-values adjusted using the Benjamini-Hochberg procedure and a False Discovery Rate (FDR) α = 0.1. All heterozygous NA12878 variants from the 1000 Genomes Project (1000GP) with at least one read in the simulated data were used for the benchmark. For the MPMAP-RPVG pipeline, we used the personal transcriptome generated from the 1000GP NA12878 haplotypes (Supplementary Table 3). WASP was provided the 1000GP NA12878 haplotypes as input. Note that we used WASP only for bias correction and allele-specific read counting, and not its downstream inference method.
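The Benjamini-Hochberg adjustment and the true/false positive rates mentioned in this caption can be sketched in Python as below. The sketch takes already computed p-values as input; the function names, the example p-values and the truth labels are hypothetical and serve only to show the arithmetic.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def tpr_fpr(is_significant, has_true_ase):
    """True positive rate and false positive rate of the significance calls."""
    tp = sum(s and t for s, t in zip(is_significant, has_true_ase))
    fp = sum(s and not t for s, t in zip(is_significant, has_true_ase))
    positives = sum(has_true_ase)
    negatives = len(has_true_ase) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Hypothetical per-variant p-values and simulated ground truth.
p_values = [0.01, 0.04, 0.03, 0.2]
truth = [True, True, False, False]
calls = [p <= 0.1 for p in benjamini_hochberg(p_values)]  # FDR alpha = 0.1
print(calls, tpr_fpr(calls, truth))
```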

Extended Data Fig. 8 Proportion of marginal expression attributed to ≤2 HSTs of a transcript.

For an African American individual (left) and a European American individual (right), the proportion of transcripts for which the marginal expression has at least X proportion assigned to ≤2 HSTs is shown for various values of X. Colors correspond to different thresholds on the proportion of marginal expression. A pantranscriptome generated from all 1000 Genomes Project haplotypes was used for the evaluation (“Whole” in Supplementary Table 3). Transcripts with fewer than 1 inferred read are omitted.
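A minimal sketch of the quantity plotted here, assuming each transcript is represented by a list of per-HST inferred read counts and that the same counts are used both for the "fewer than 1 inferred read" filter and for the expression proportions (a simplification). The function names and example values are hypothetical.

```python
def top2_fraction(hst_reads):
    """Share of a transcript's marginal expression carried by its two largest HSTs."""
    total = sum(hst_reads)
    return sum(sorted(hst_reads, reverse=True)[:2]) / total


def proportion_explained_by_two(transcripts, min_fraction, min_reads=1.0):
    """Proportion of transcripts with at least min_fraction assigned to <=2 HSTs.

    transcripts: dict mapping transcript id -> list of per-HST read counts.
    Transcripts with fewer than min_reads inferred reads in total are omitted.
    """
    kept = [reads for reads in transcripts.values() if sum(reads) >= min_reads]
    if not kept:
        return 0.0
    n_pass = sum(top2_fraction(reads) >= min_fraction for reads in kept)
    return n_pass / len(kept)


# Hypothetical estimates; T3 falls below the one-read cutoff and is dropped.
example = {"T1": [10, 5, 0], "T2": [4, 3, 3], "T3": [0.2, 0.1], "T4": [8]}
for x in (0.5, 0.9):
    print(x, proportion_explained_by_two(example, x))
```

For a diploid sample, at most two HSTs per transcript are truly present, so values close to 1 at high X indicate that little expression is spread across haplotypes the sample does not carry.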

Extended Data Fig. 9 Multipath alignment benchmark using RNA-seq data from NA12878.

Haplotype-specific transcript (HST) quantification results comparing RPVG with single-path and multipath alignments from VG MPMAP and VG MAP as input using simulated and real Illumina data. For details on the pantranscriptomes used see Supplementary Table 3. The VG MPMAP single-path alignments were created by finding the best scoring path in each multipath alignment. (a) Recall and precision of whether a transcript is correctly assigned nonzero expression for different expression value thresholds (colored numbers for “Whole (excl. CEU)” pantranscriptome) using simulated data. Expression is measured in transcripts per million (TPM). (b) Mean absolute relative expression difference (MARD) between simulated and estimated expression (in TPM) for different pantranscriptomes using simulated data. MARD was calculated using either all HSTs in the pantranscriptome (solid bars) or using only the NA12878 HSTs (shaded bars). (c) Number of expressed transcripts from NA12878 haplotypes shown against the number from non-NA12878 haplotypes for different expression value thresholds (colored numbers) using real data. (d) Fraction of transcript expression (in TPM) assigned to NA12878 haplotypes for different pantranscriptomes using simulated (left) and real (right) data.
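The MARD statistic can be sketched as follows. The caption does not give the formula, so this example assumes one common symmetric convention, |simulated − estimated| / (simulated + estimated) per HST with a value of 0 when both are zero; the definition in the paper's Methods may differ. The function name, the `subset` argument and the example values are hypothetical.

```python
def mard(simulated, estimated, subset=None):
    """Mean absolute relative difference between simulated and estimated expression.

    simulated, estimated: dicts mapping HST id -> expression (e.g. TPM).
    subset: optional collection of HST ids to restrict the mean to
            (for example, only the HSTs of the sample's own haplotypes).
    Assumed per-HST term: |sim - est| / (sim + est), or 0 if both are zero.
    """
    ids = set(simulated) | set(estimated)
    if subset is not None:
        ids &= set(subset)
    if not ids:
        return 0.0
    total = 0.0
    for hst in ids:
        sim = simulated.get(hst, 0.0)
        est = estimated.get(hst, 0.0)
        if sim + est > 0:
            total += abs(sim - est) / (sim + est)
    return total / len(ids)


# Hypothetical values: "a" is exact, "b" is correctly absent, "c" is overestimated.
sim = {"a": 10.0, "b": 0.0, "c": 5.0}
est = {"a": 10.0, "b": 0.0, "c": 15.0}
print(mard(sim, est))                # mean over all HSTs
print(mard(sim, est, subset={"c"}))  # mean over a chosen subset only
```

Restricting the mean to a subset mirrors the distinction between the solid and shaded bars: including the many unexpressed HSTs that are correctly estimated as zero lowers the average.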

Extended Data Fig. 10 Examples of allele expression concordance across tissues.

A set of examples showing allele concordance across tissues using two different variant expression thresholds. Only three tissues are used in the example for simplicity. Blue and orange bars correspond to reference and alternative allele expression, respectively. Variant expression is calculated as the sum of the two alleles. An allele is defined as concordant if it is either consistently expressed or consistently not expressed across all tissues for which the corresponding variant is expressed. Using this definition all alternative alleles except for the allele in variant 2 are defined as concordant when the minimum variant expression threshold is set to 0. If the variant expression threshold is increased to 3, the alternative allele in variant 2 becomes concordant since tissue 2 will be filtered for this variant. Moreover, variant 4 will be excluded due to tissue 3 being filtered since at least two expressed tissues are needed to compute concordance.
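The concordance definition above can be written as a short Python sketch. Two details are assumptions, since the caption does not state them: a tissue counts as expressing a variant when the summed allele expression is strictly greater than the threshold, and an allele counts as expressed when its expression is greater than zero. The function name and the example values are hypothetical and are not read from the figure.

```python
def allele_concordance(tissue_expr, min_variant_expr=0.0):
    """Concordance of the reference and alternative allele of one variant.

    tissue_expr: list of (ref_expr, alt_expr) pairs, one per tissue.
    Returns None when fewer than two tissues express the variant,
    otherwise a dict with a concordance flag per allele.
    """
    # Variant expression is the sum of the two alleles (assumed strict cutoff).
    kept = [(ref, alt) for ref, alt in tissue_expr if ref + alt > min_variant_expr]
    if len(kept) < 2:
        return None

    def consistent(flags):
        # Concordant: expressed in every kept tissue, or in none of them.
        return all(flags) or not any(flags)

    return {
        "ref": consistent([ref > 0 for ref, _ in kept]),
        "alt": consistent([alt > 0 for _, alt in kept]),
    }


# Hypothetical variant whose alternative allele is missing in a weakly expressed tissue.
variant_a = [(5, 4), (2, 0), (6, 3)]
print(allele_concordance(variant_a, 0))  # all three tissues are used
print(allele_concordance(variant_a, 3))  # the second tissue is filtered out

# Hypothetical variant expressed in only two tissues, one of them weakly.
variant_b = [(4, 4), (0, 0), (1, 1)]
print(allele_concordance(variant_b, 3))  # too few expressed tissues remain
```

Raising the threshold trades coverage for robustness: weakly expressed tissues no longer cause apparent discordance, but variants with too few remaining tissues drop out of the evaluation.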

Supplementary information

Supplementary Information

Supplementary Figs 1–18, Supplementary Tables 1–6, Supplementary Note, Supplementary Algorithms 1–8.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sibbesen, J.A., Eizenga, J.M., Novak, A.M. et al. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nat Methods 20, 239–247 (2023). https://doi.org/10.1038/s41592-022-01731-9

Received: 18 June 2021
Accepted: 28 November 2022
Published: 16 January 2023
Issue Date: February 2023
DOI: https://doi.org/10.1038/s41592-022-01731-9
