TACO produces robust multisample transcriptome assemblies from RNA-seq

Niknafs, Yashar S; Pandian, Balaji; Iyer, Hariharan K; Chinnaiyan, Arul M; Iyer, Matthew K

doi:10.1038/nmeth.4078

Brief Communication
Published: 21 November 2016

Yashar S Niknafs^1,2,
Balaji Pandian^1,
Hariharan K Iyer^3,
Arul M Chinnaiyan^1,2,4,5,6,7 &
Matthew K Iyer^1

Nature Methods volume 14, pages 68–70 (2017)

Abstract

Accurate transcript structure and abundance inference from RNA sequencing (RNA-seq) data is foundational for molecular discovery. Here we present TACO, a computational method to reconstruct a consensus transcriptome from multiple RNA-seq data sets. TACO employs novel change-point detection to demarcate transcript start and end sites, leading to improved reconstruction accuracy compared with other tools in its class. The tool is available at http://tacorna.github.io and can be readily incorporated into RNA-seq analysis workflows.

**Figure 2: Assessment of TACO performance.**

**Figure 3: Examples of TACO performance.**

Acknowledgements

This work was supported in part by the NIH Prostate Specialized Program of Research Excellence grant P50CA186786 (A.M.C.), F30 CA 200328 (Y.S.N.), U01CA214170 (A.M.C.), and U24 CA210967 (A.M.C.). A.M.C. is supported by the Prostate Cancer Foundation and the Howard Hughes Medical Institute. A.M.C. is an American Cancer Society Research Professor and a Taubman Scholar of the University of Michigan.

Author information

Authors and Affiliations

Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan, USA
Yashar S Niknafs, Balaji Pandian, Arul M Chinnaiyan & Matthew K Iyer
Department of Cellular and Molecular Biology, University of Michigan, Ann Arbor, Michigan, USA
Yashar S Niknafs & Arul M Chinnaiyan
Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
Hariharan K Iyer
Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
Arul M Chinnaiyan
Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan, USA
Arul M Chinnaiyan
Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan, USA
Arul M Chinnaiyan
Department of Urology, University of Michigan, Ann Arbor, Michigan, USA
Arul M Chinnaiyan

Contributions

M.K.I. designed the core of the TACO method with assistance from Y.S.N., B.P., and H.K.I. The change-point detection method was developed by Y.S.N. and M.K.I. Code optimization was performed by M.K.I., B.P., and Y.S.N. Performance benchmarking was performed by Y.S.N. Y.S.N., A.M.C., and M.K.I. wrote the manuscript. A.M.C. supervised development of the tool and guided the project to completion. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Arul M Chinnaiyan.

Ethics declarations

Competing interests

Oncomine is supported by ThermoFisher, Inc. (previously Life Technologies and Compendia Biosciences). A.M.C. was a cofounder of Compendia Biosciences and served on the scientific advisory board of Life Technologies before it was acquired.

Integrated supplementary information

Supplementary Figure 1 Schematic example of TACO splice and path graph generation.

a. Depiction of splice graph generation from four input transfrags for a given gene. The expression of each input transfrag is listed to its right. Individual nodes are indicated by numerical identifiers. b. Path graphs for k=2 and k=3 from the splice graph in (a). At k=2, traversing the path graph generates four isoforms, one of which is not represented in the input transfrags; in addition, the resulting expression distribution does not accurately reflect the input. At k=3, the path graph generates four transcripts that accurately represent the input data, illustrating the advantage of path graphs over splice graphs.
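The effect of k described in this caption can be illustrated with a minimal sketch. It assumes a toy representation that is not taken from the paper: each transfrag is a tuple of splice-graph node identifiers, a path-graph node is a run of k consecutive splice-graph nodes, and expression weighting is ignored entirely. The function names and the two-transfrag example are hypothetical and do not reproduce the figure.

```python
from collections import defaultdict

def kmers(path, k):
    """All runs of k consecutive splice-graph nodes in one transfrag."""
    return [tuple(path[i:i + k]) for i in range(len(path) - k + 1)]

def build_path_graph(transfrags, k):
    """Path graph of order k: nodes are k-mers, edges join k-mers that
    are adjacent within at least one transfrag. Transfrags with fewer
    than k nodes contribute nothing (a simplification of this sketch)."""
    nodes, edges = set(), defaultdict(set)
    for path in transfrags:
        ks = kmers(path, k)
        nodes.update(ks)
        for a, b in zip(ks, ks[1:]):
            edges[a].add(b)
    return nodes, edges

def enumerate_isoforms(transfrags, k):
    """Enumerate every source-to-sink walk and spell it out as a
    sequence of splice-graph nodes. Assumes the graph is acyclic."""
    nodes, edges = build_path_graph(transfrags, k)
    has_pred = {b for succs in edges.values() for b in succs}
    stack = [(n, list(n)) for n in nodes if n not in has_pred]
    isoforms = set()
    while stack:
        node, seq = stack.pop()
        succs = edges.get(node)
        if not succs:                      # sink: walk is complete
            isoforms.add(tuple(seq))
            continue
        for nxt in succs:                  # extend by one splice-graph node
            stack.append((nxt, seq + [nxt[-1]]))
    return isoforms

# Hypothetical toy input: two transfrags sharing the node pair (3, 4).
transfrags = [(1, 3, 4, 5), (2, 3, 4, 6)]
for k in (2, 3):
    print(k, sorted(enumerate_isoforms(transfrags, k)))
```

In this toy case the k=2 traversal yields four isoforms, two of which are absent from the input, whereas k=3 yields only the two input transfrags, because the longer k-mers retain which upstream node was paired with which downstream node.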

Supplementary Figure 2 Schematic depicting the change point detection method

Multiple transfrags with different expression levels are used as input. The expression profile is then determined for each base in each node of the splice graph. Changes in the expression profile are then identified as change points.
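As a rough illustration of the per-base expression profile, the sketch below sums the expression of every transfrag covering each base of a single node and then lists the positions where the profile changes value. The interval representation (half-open genomic coordinates), the function names, and the example numbers are assumptions made for illustration. Flagging every change in value is a deliberately naive stand-in and is not the statistical procedure used by TACO.

```python
def expression_profile(transfrags, node_start, node_end):
    """Per-base summed expression over the node [node_start, node_end).

    transfrags: iterable of (start, end, expression) with half-open
    coordinates (an assumed representation)."""
    profile = [0.0] * (node_end - node_start)
    for start, end, expr in transfrags:
        lo = max(start, node_start)
        hi = min(end, node_end)
        for pos in range(lo, hi):
            profile[pos - node_start] += expr
    return profile

def candidate_change_points(profile, node_start):
    """Genomic positions at which the profile differs from the previous base."""
    return [node_start + i
            for i in range(1, len(profile))
            if profile[i] != profile[i - 1]]

# Hypothetical node spanning bases 100-109 covered by three transfrags.
transfrags = [(100, 110, 5.0), (104, 110, 20.0), (100, 108, 1.0)]
profile = expression_profile(transfrags, 100, 110)
print(profile)
print(candidate_change_points(profile, 100))
```

Here the profile steps up at position 104, where the highly expressed transfrag begins, and steps down slightly at position 108, where the weakly expressed one ends. A real implementation would need to decide which of these steps are large and consistent enough to count as transcript start or end sites.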

Supplementary Figure 3 Assessment of TACO performance at different isoform fraction cutoffs

Performance metrics (recall, precision, and F-measure) for the merging tools on the 55 CCLE breast cell lines at multiple isoform fraction cutoffs (n=50, ranging from 0.001 to 0.999) for Cuffmerge (blue), StringTie (orange), and TACO (green), using transcriptome assemblies from Cufflinks (squares) and StringTie (triangles) as input. Performance was measured for (a) splicing patterns, (b) splice junctions, and (c) bases.
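For readers unfamiliar with these metrics, the following sketch computes precision, recall, and F-measure for one feature type, using splice junctions as the example. It assumes that features are compared by exact match and represented as hashable tuples; the junction coordinates are made up, and the benchmarking code used in the paper may differ in how matches are defined.

```python
def precision_recall_f(predicted, reference):
    """Precision, recall, and F-measure for exact-match feature sets."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical junctions: (chromosome, donor, acceptor, strand).
reference = {
    ("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
    ("chr1", 500, 600, "+"), ("chr1", 700, 800, "+"),
    ("chr1", 900, 1000, "+"),
}
predicted = {
    ("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
    ("chr1", 500, 600, "+"), ("chr1", 650, 800, "+"),
}
precision, recall, f_measure = precision_recall_f(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

Three of the four predicted junctions match the reference, giving a precision of 0.75, and three of the five reference junctions are recovered, giving a recall of 0.60. Repeating the calculation on the assembly obtained at each isoform fraction cutoff would trace out a curve like those in the figure.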

Supplementary Figure 4 Assessment of TACO performance using long-read sequencing reference

a. Precision-recall plots showing the performance of the three merging tools at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Statistics describe the merger of the 51 GTEx RNA-seq samples from 17 different tissue types. PacBio long-read sequencing data from a multi-tissue panel was used as the reference for comparison statistics. b. Precision-recall plots showing the performance of the three merging tools at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Statistics describe the merger of 50 randomly selected GTEx benign brain tissue RNA-seq samples. Long-read sequencing data from multiple brain samples was used as the reference for comparison statistics.

Supplementary Figure 5 Assessment of TACO performance for the highest expressed transcripts

Performance metrics (recall, precision, and F-measure) for the merging tools on the 55 CCLE breast cell lines at multiple numbers of the highest expressed transcripts in each meta-assembly generated by Cuffmerge (blue), StringTie (orange), and TACO (green). Performance was measured for (a) splicing patterns, (b) splice junctions, and (c) bases.

Supplementary Figure 6 Assessment of read-through transcription

a. Scatterplot depicting the fraction of read-through genes produced in the meta-assembly by each of the three merge tools. A gene is designated a read-through if it contains at least one transcript isoform bearing exonic overlap with two separate protein-coding genes on the same strand. The fraction of read-throughs was determined for assemblies at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Meta-assembly was performed on the 55 CCLE breast cancer cell lines. b. Barplot depicting the fraction of read-through genes produced by meta-assembly of the CCLE breast cohort, the GTEx tissue cohort, and the GTEx brain cohort by all three merging tools. The fraction of read-through genes in the long-read sequencing data is also shown. Statistics for meta-assemblers are shown at the default isoform fraction for each tool.
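The read-through definition in this caption can be expressed as a short check. The sketch below assumes a simple dictionary representation of transcripts and reference genes with half-open exon intervals; the field names, gene names, and coordinates are hypothetical, and the sketch does not handle reference genes that themselves overlap one another.

```python
def intervals_overlap(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def overlapped_genes(transcript, coding_genes):
    """Names of protein-coding genes sharing exonic sequence with the
    transcript on the same chromosome and strand."""
    hits = set()
    for gene in coding_genes:
        if (gene["chrom"], gene["strand"]) != (transcript["chrom"], transcript["strand"]):
            continue
        if any(intervals_overlap(t_exon, g_exon)
               for t_exon in transcript["exons"]
               for g_exon in gene["exons"]):
            hits.add(gene["name"])
    return hits

def is_read_through(gene_transcripts, coding_genes):
    """A gene is a read-through if any of its isoforms overlaps exons of
    at least two separate protein-coding genes on the same strand."""
    return any(len(overlapped_genes(t, coding_genes)) >= 2
               for t in gene_transcripts)

# Hypothetical reference genes and assembled isoforms.
coding_genes = [
    {"name": "GENE_A", "chrom": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]},
    {"name": "GENE_B", "chrom": "chr1", "strand": "+", "exons": [(900, 1000)]},
]
normal = {"chrom": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]}
fused = {"chrom": "chr1", "strand": "+", "exons": [(150, 200), (300, 400), (900, 950)]}
print(is_read_through([normal], coding_genes))
print(is_read_through([normal, fused], coding_genes))
```

The fraction reported in the figure would then be the number of assembled genes for which this check is true divided by the total number of assembled genes.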

Supplementary Figure 7 Assessment of expression filtering on meta-assembly performance

Precision-recall plots showing the performance of the three merging tools at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Statistics describe the merger of the 51 GTEx RNA-seq samples from 17 different tissue types. PacBio long-read sequencing data from a multi-tissue panel was used as the reference for comparison statistics. TACO and StringTie-merge were run with and without a 1 FPKM expression filter, indicated by the shape of the point.

Supplementary Figure 8 Change point detection parameters

Precision, recall, and F-measure statistics for the performance of TACO on merging the 55 CCLE breast samples. Different p-value cutoffs for the Mann-Whitney U test performed during change point detection were tested. Additionally, different expression fold-change cutoffs were tested for the two groups produced by change point detection. Statistics are reported for (a) splicing patterns, (b) splice junctions, and (c) bases.
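The interaction of the two parameters in this caption can be sketched as follows. The code splits a per-base expression profile at each candidate position, compares the two sides with a two-sided Mann-Whitney U test, and accepts the split only if both the p-value and the fold change between the group means pass their cutoffs. Several parts are assumptions rather than TACO's actual procedure: the exhaustive scan over split positions, the normal approximation without continuity correction, the default cutoffs, and all function and parameter names.

```python
import math

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation with tie
    correction, no continuity correction)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0.0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2.0   # average rank of the tie group
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                             # all values identical
        return 1.0
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))

def best_change_point(profile, p_cutoff=0.05, min_fold_change=2.0, min_side=3):
    """Return (index, p_value, fold_change) for the most significant split
    passing both cutoffs, or None. Cutoff defaults are illustrative only."""
    best = None
    for i in range(min_side, len(profile) - min_side + 1):
        left, right = profile[:i], profile[i:]
        p = mann_whitney_p(left, right)
        lo, hi = sorted((sum(left) / len(left), sum(right) / len(right)))
        fold = hi / lo if lo > 0 else float("inf")
        if p <= p_cutoff and fold >= min_fold_change:
            if best is None or p < best[1]:
                best = (i, p, fold)
    return best

# Hypothetical profile: ten bases at expression 5, then ten bases at 50.
profile = [5.0] * 10 + [50.0] * 10
print(best_change_point(profile))
print(best_change_point([7.0] * 20))
```

In this toy profile the split at index 10 is selected with a tenfold change, and a flat profile yields no change point. Tightening the p-value cutoff or raising the fold-change cutoff makes the detector more conservative, which is the trade-off the figure explores.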

Supplementary Figure 9 Example of meta-assembly for coding and non-coding genes.

Depiction of the assemblies produced by all three tools from the merger of 100 samples for (a) the HNRNPK gene and (b) the GAS5 lncRNA. Poly-exonic transcripts on the negative strand are depicted. The assembly produced by Cuffmerge is shown in blue, StringTie in orange, and TACO in green. The RefSeq reference annotation is shown above in red.

Supplementary Figure 10 Detail for example of meta-assembly of the GAS5 locus

a. Depiction of the 1q25 locus. Meta-assemblies produced by Cuffmerge, StringTie, and TACO are shown. Poly-exonic transcripts on the positive (red) and negative (blue) strands are shown. The RefSeq reference annotation is shown, depicting the locations of the DARS2, GAS5, GAS5-AS1, and ZBTB37 genes. b. Meta-assembly of the GAS5 locus from merging of the GTEx tissue panel. The PacBio reference is shown in purple.

About this article

Cite this article

Niknafs, Y., Pandian, B., Iyer, H. et al. TACO produces robust multisample transcriptome assemblies from RNA-seq. Nat Methods 14, 68–70 (2017). https://doi.org/10.1038/nmeth.4078

Received: 17 May 2016
Accepted: 17 October 2016
Published: 21 November 2016
Issue Date: January 2017
DOI: https://doi.org/10.1038/nmeth.4078
