  • Brief Communication

TACO produces robust multisample transcriptome assemblies from RNA-seq

Abstract

Accurate transcript structure and abundance inference from RNA sequencing (RNA-seq) data is foundational for molecular discovery. Here we present TACO, a computational method to reconstruct a consensus transcriptome from multiple RNA-seq data sets. TACO employs novel change-point detection to demarcate transcript start and end sites, leading to improved reconstruction accuracy compared with other tools in its class. The tool is available at http://tacorna.github.io and can be readily incorporated into RNA-seq analysis workflows.


Figure 1
Figure 2: Assessment of TACO performance.
Figure 3: Examples of TACO performance.



Acknowledgements

This work was supported in part by the NIH Prostate Specialized Program of Research Excellence grant P50CA186786 (A.M.C.), F30 CA 200328 (Y.S.N.), U01CA214170 (A.M.C.), and U24 CA210967 (A.M.C.). A.M.C. is supported by the Prostate Cancer Foundation and the Howard Hughes Medical Institute. A.M.C. is an American Cancer Society Research Professor and a Taubman Scholar of the University of Michigan.

Author information


Contributions

M.K.I. designed the core of the TACO method with assistance from Y.S.N., B.P., and H.K.I. The change-point detection method was developed by Y.S.N. and M.K.I. Code optimization was performed by M.K.I., B.P., and Y.S.N. Performance benchmarking was performed by Y.S.N. Y.S.N., A.M.C., and M.K.I. wrote the manuscript. A.M.C. supervised development of the tool and guided the project to completion. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Arul M Chinnaiyan.

Ethics declarations

Competing interests

Oncomine is supported by ThermoFisher, Inc. (previously Life Technologies and Compendia Biosciences). A.M.C. was a cofounder of Compendia Biosciences and served on the scientific advisory board of Life Technologies before it was acquired.

Integrated supplementary information

Supplementary Figure 1 Schematic example of TACO splice and path graph generation.

a. Depiction of splice graph generation from 4 input transfrags for a given gene. The expression of each input transfrag is listed to its right, and individual nodes are labeled with numerical identifiers. b. Path graphs for k = 2 and k = 3 built from the splice graph in (a). At k = 2, traversing the path graph generates 4 isoforms, one of which is not represented in the input transfrags; in addition, the resulting expression distribution does not accurately reflect the input. At k = 3, the path graph generates 4 transcripts that accurately represent the input data, illustrating the advantage of path graphs over splice graphs.
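The path-graph idea in panel b can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TACO's implementation: each transfrag is assumed to be an ordered list of splice-graph node IDs (increasing along the genome) with one expression value, path-graph nodes are k-node windows whose expression is summed over the transfrags containing them, and edges join windows that overlap by k - 1 nodes, as in a de Bruijn graph. The function names are hypothetical, and transfrags shorter than k are simply ignored here.

```python
from collections import defaultdict


def build_path_graph(transfrags, k):
    """Build an order-k path graph from (node_path, expression) pairs.

    Nodes are k-node windows of splice-graph node IDs; each accumulates the
    expression of every transfrag containing it. Edges join windows that
    overlap by k - 1 nodes (de Bruijn-style). Assumes k >= 2.
    """
    node_expr = defaultdict(float)
    for path, expr in transfrags:
        # Transfrags shorter than k contribute no window in this sketch.
        for i in range(len(path) - k + 1):
            node_expr[tuple(path[i:i + k])] += expr
    # Index windows by their first k - 1 nodes to find overlaps quickly.
    by_prefix = defaultdict(list)
    for node in node_expr:
        by_prefix[node[:-1]].append(node)
    edges = {(a, b) for a in node_expr for b in by_prefix.get(a[1:], [])}
    return dict(node_expr), edges


def enumerate_paths(node_expr, edges):
    """List every source-to-sink path as a list of splice-graph node IDs.

    Assumes node IDs increase along the genome, so the graph is acyclic.
    """
    succ = defaultdict(list)
    has_pred = set()
    for a, b in edges:
        succ[a].append(b)
        has_pred.add(b)
    paths = []

    def walk(node, acc):
        if not succ[node]:
            paths.append(acc)
            return
        for nxt in sorted(succ[node]):
            walk(nxt, acc + [nxt[-1]])  # each step extends the path by one node

    for source in sorted(n for n in node_expr if n not in has_pred):
        walk(source, list(source))
    return paths


transfrags = [([1, 2, 4, 5], 10.0), ([1, 3, 4, 6], 5.0)]
for k in (2, 3):
    print(k, enumerate_paths(*build_path_graph(transfrags, k)))
```

With these two toy transfrags (not the ones in the figure), k = 2 yields all four splice-graph paths, including the spurious combinations [1, 2, 4, 6] and [1, 3, 4, 5], whereas k = 3 recovers only the two input paths.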

Supplementary Figure 2 Schematic depicting the change point detection method

Multiple transfrags with different expression levels are used as input. The expression profile is then computed for each base in each node of the splice graph, and changes in this profile are identified as change points.
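The per-base expression profile described above can be computed with a difference array. In this sketch we assume each transfrag is a list of half-open genomic exon intervals (start, end) plus a single expression value, and that a splice-graph node covers the half-open interval [node_start, node_end); the function name is hypothetical and TACO's own data structures may differ.

```python
def expression_profile(node_start, node_end, transfrags):
    """Per-base expression across [node_start, node_end), summed over transfrags.

    transfrags: iterable of (exons, expression), where exons are half-open
    (start, end) intervals. Each overlapping exon adds its expression to a
    difference array in O(1); one prefix-sum pass then yields the profile.
    """
    length = node_end - node_start
    diff = [0.0] * (length + 1)
    for exons, expr in transfrags:
        for start, end in exons:
            lo, hi = max(start, node_start), min(end, node_end)
            if lo < hi:  # exon overlaps this node
                diff[lo - node_start] += expr
                diff[hi - node_start] -= expr
    profile, running = [], 0.0
    for i in range(length):
        running += diff[i]
        profile.append(running)
    return profile


# Toy node spanning bases 100-109; one transfrag starts mid-node at base 104,
# and the third transfrag lies outside the node entirely.
toy = [([(100, 110)], 5.0), ([(104, 120)], 3.0), ([(50, 60)], 2.0)]
print(expression_profile(100, 110, toy))
```

A step such as the rise in coverage at base 104 is the kind of shift the change-point step is meant to flag, for example a transcript start site lying inside a node.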

Supplementary Figure 3 Assessment of TACO performance at different isoform fraction cutoffs

Performance metrics (recall, precision, and F-measure) describe the performance of each merging tool on the 55 CCLE breast cell lines at multiple isoform fraction cutoffs (n = 50, ranging from 0.001 to 0.999) for Cuffmerge (blue), StringTie (orange), and TACO (green), using transcriptome assemblies generated by Cufflinks (squares) and StringTie (triangles) as input. Performance was measured for (a) splicing patterns, (b) splice junctions, and (c) bases.
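The isoform fraction cutoff can be read as a relative-abundance filter. The sketch below assumes, by analogy with the minimum-isoform-fraction options of Cufflinks and StringTie, that a transcript's isoform fraction is its expression divided by that of the most abundant transcript in the same gene; the exact definition TACO uses and the spacing of the 50 cutoffs are not given in this caption, and the function name is hypothetical.

```python
from collections import defaultdict


def filter_by_isoform_fraction(transcripts, cutoff):
    """Keep transcripts whose expression is at least `cutoff` times that of
    the most abundant transcript in the same gene (assumed definition).

    transcripts: list of (transcript_id, gene_id, expression) tuples.
    Returns kept transcript IDs in input order.
    """
    gene_max = defaultdict(float)
    for _, gene, expr in transcripts:
        gene_max[gene] = max(gene_max[gene], expr)
    return [
        tid
        for tid, gene, expr in transcripts
        if gene_max[gene] > 0 and expr / gene_max[gene] >= cutoff
    ]


transcripts = [("t1", "g1", 10.0), ("t2", "g1", 0.5), ("t3", "g1", 2.0), ("t4", "g2", 1.0)]
for cutoff in (0.001, 0.1, 0.999):
    print(cutoff, filter_by_isoform_fraction(transcripts, cutoff))
```

Re-scoring the filtered assembly at each cutoff is one way to obtain recall, precision, and F-measure as a function of the cutoff; at 0.999 only the dominant isoform of each gene survives.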

Supplementary Figure 4 Assessment of TACO performance using long-read sequencing reference

a. Precision-recall plots highlighting the performance of the three merging tools at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Statistics describe the merger of 51 GTEx RNA-seq samples from 17 different tissue types. PacBio long-read sequencing data from a multi-tissue panel was used as the reference for comparison statistics. b. Precision-recall plots highlighting the performance of the three merging tools at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Statistics describe the merger of 50 randomly selected GTEx benign brain tissue RNA-seq samples. Long-read sequencing data from multiple brain samples was used as the reference for comparison statistics.
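Each point in a precision-recall plot compares one merged assembly with a reference. A minimal set-based sketch follows, assuming each compared feature (an intron chain for splicing patterns, a junction, or a base position) has already been reduced to a hashable key such as a tuple of coordinates. Real benchmarks usually add details not shown here, such as strand handling and tolerance at transcript ends, and the function name is hypothetical.

```python
def precision_recall_f(predicted, reference):
    """Set-based precision, recall, and F-measure (harmonic mean of the two)."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Splicing patterns as intron chains: tuples of (intron_start, intron_end).
assembly = [((100, 200), (300, 400)), ((100, 200),), ((500, 600),), ((700, 900),)]
long_read_reference = [((100, 200), (300, 400)), ((500, 600),), ((100, 250),), ((800, 900),)]
print(precision_recall_f(assembly, long_read_reference))
```

Two of the four assembled chains match the reference and two of the four reference chains are recovered, so precision, recall, and F-measure are all 0.5 in this toy case.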

Supplementary Figure 5 Assessment of TACO performance for the highest expressed transcripts

Performance metrics (recall, precision, and F-measure) describe the performance of each merging tool on the 55 CCLE breast cell lines when restricted to increasing numbers of the most highly expressed transcripts in each meta-assembly generated by Cuffmerge (blue), StringTie (orange), and TACO (green). Performance was measured for (a) splicing patterns, (b) splice junctions, and (c) bases.
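Restricting evaluation to the most highly expressed transcripts amounts to ranking each meta-assembly by expression and scoring only its top N entries. The sketch below computes precision for several values of N, assuming transcripts are (splicing_pattern, expression) pairs with unique patterns and the reference is a set of patterns; the function name is hypothetical.

```python
def top_n_precision(transcripts, reference, ns):
    """Precision of the N most highly expressed transcripts for each N in ns.

    transcripts: list of (pattern, expression); reference: set of patterns.
    """
    # sorted() is stable even with reverse=True, so ties keep input order.
    ranked = [p for p, _ in sorted(transcripts, key=lambda t: t[1], reverse=True)]
    result = {}
    for n in ns:
        top = ranked[:n]
        hits = sum(1 for p in top if p in reference)
        result[n] = hits / len(top) if top else 0.0
    return result


meta_assembly = [("b", 8.0), ("a", 10.0), ("d", 1.0), ("c", 5.0)]
print(top_n_precision(meta_assembly, {"a", "c", "x"}, [1, 2, 4]))
```

Recall and F-measure at each N follow the same pattern, with the hit count divided by the reference size instead of N.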

Supplementary Figure 6 Assessment of read-through transcription

a. Scatterplot depicting the fraction of read-through genes in the meta-assemblies produced by the three merging tools. A gene was designated as a read-through if it contained at least one transcript isoform with exonic overlap to two separate protein-coding genes on the same strand. The fraction of read-throughs was determined for assemblies at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Meta-assembly was performed on the 55 CCLE breast cancer cell lines. b. Bar plot depicting the fraction of read-through genes produced by meta-assembly of the CCLE breast cohort, the GTEx tissue cohort, and the GTEx brain cohort by all three merging tools. The fraction of read-through genes in the long-read sequencing data is also shown. Statistics for the meta-assemblers are shown at each tool's default isoform fraction.
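The read-through rule above translates directly into an interval-overlap check. This sketch assumes half-open exon intervals, isoforms given as (chromosome, strand, exons) tuples, and a mapping from protein-coding gene names to the same structure; taking the fraction over all assembled genes is our assumption about the denominator, and the gene names and helpers are hypothetical. A production version would use an interval tree rather than scanning every coding gene.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def is_read_through(isoforms, coding_genes):
    """True if any isoform overlaps exons of >= 2 distinct coding genes on its strand.

    isoforms: list of (chrom, strand, exons); coding_genes: name -> (chrom, strand, exons).
    """
    for chrom, strand, exons in isoforms:
        hit = set()
        for name, (g_chrom, g_strand, g_exons) in coding_genes.items():
            if (g_chrom, g_strand) != (chrom, strand):
                continue  # only same-chromosome, same-strand genes count
            if any(overlaps(e, g) for e in exons for g in g_exons):
                hit.add(name)
        if len(hit) >= 2:
            return True
    return False


def read_through_fraction(assembled_genes, coding_genes):
    """Fraction of assembled genes (gene_id -> isoforms) flagged as read-through."""
    flags = [is_read_through(isos, coding_genes) for isos in assembled_genes.values()]
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical protein-coding annotation and assembled genes.
coding = {
    "GENE_A": ("chr1", "+", [(100, 200), (300, 400)]),
    "GENE_B": ("chr1", "+", [(500, 600)]),
    "GENE_C": ("chr1", "-", [(500, 600)]),
}
assembled = {
    "g1": [("chr1", "+", [(150, 250), (550, 650)])],  # spans GENE_A and GENE_B
    "g2": [("chr1", "+", [(320, 380)])],              # GENE_A only
    "g3": [("chr1", "-", [(350, 380), (550, 580)])],  # GENE_C only (GENE_A is on +)
}
print(read_through_fraction(assembled, coding))
```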

Supplementary Figure 7 Assessment of expression filtering on meta-assembly performance

Precision-recall plots highlighting the performance of the three merging tools at 50 different isoform fraction cutoffs ranging from 0.001 to 0.999. Statistics describe the merger of 51 GTEx RNA-seq samples from 17 different tissue types. PacBio long-read sequencing data from a multi-tissue panel was used as the reference for comparison statistics. TACO and StringTie-merge were run with and without a 1 FPKM expression filter, as indicated by point shape.
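For reference, FPKM (fragments per kilobase of transcript per million mapped fragments) equals 10^9 × C / (L × N), where C is the number of fragments assigned to a transcript, L its length in bases, and N the total number of mapped fragments. The sketch below computes it from raw counts and applies a 1 FPKM threshold; the assemblers estimate abundances with their own models, the caption does not say at which stage the filter is applied, and the helper names are hypothetical.

```python
def fpkm(fragments, length_bp, total_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragments * 1e9 / (length_bp * total_fragments)


def filter_min_fpkm(transcripts, min_fpkm=1.0):
    """Keep IDs of (transcript_id, fpkm) pairs at or above the threshold."""
    return [tid for tid, value in transcripts if value >= min_fpkm]


expr = [
    ("t1", fpkm(100, 2_000, 50_000_000)),  # 100 fragments on a 2 kb transcript
    ("t2", fpkm(10, 1_000, 20_000_000)),   # 10 fragments on a 1 kb transcript
]
print(expr, filter_min_fpkm(expr))
```

Here t1 sits exactly at 1 FPKM and is kept, while t2 (0.5 FPKM) is removed.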

Supplementary Figure 8 Change point detection parameters

Precision, recall, and F-measure statistics for the performance of TACO on merging the 55 CCLE breast samples. Different p-value thresholds for the Mann-Whitney U test performed during change point detection were tested, as were different expression fold-change cutoffs between the two groups produced by each change point. Statistics are reported for (a) splicing patterns, (b) splice junctions, and (c) bases.
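The two parameters varied here can be illustrated with a small change-point sketch. It scans a per-base expression profile for the split that maximizes the difference in mean expression (a simple stand-in; TACO's own split criterion may differ), then accepts the split only if a two-sided Mann-Whitney U test (normal approximation with tie correction, no continuity correction) gives p below `p_cutoff` and the fold change between the two segment means reaches `fc_cutoff`. The defaults 0.05 and 2.0 and the `min_size` guard are illustrative assumptions, not TACO's settings.

```python
import math


def _average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    u1 = sum(_average_ranks(combined)[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0  # all values tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def _mean(values):
    return sum(values) / len(values)


def find_change_point(profile, p_cutoff=0.05, fc_cutoff=2.0, min_size=3):
    """Return the index splitting `profile` into two groups, or None if rejected."""
    best, best_diff = None, -1.0
    for i in range(min_size, len(profile) - min_size + 1):
        diff = abs(_mean(profile[:i]) - _mean(profile[i:]))
        if diff > best_diff:
            best, best_diff = i, diff
    if best is None:
        return None  # profile too short for two segments of min_size
    left, right = profile[:best], profile[best:]
    lo, hi = sorted([_mean(left), _mean(right)])
    fold = math.inf if lo == 0 else hi / lo
    if mann_whitney_p(left, right) < p_cutoff and fold >= fc_cutoff:
        return best
    return None


print(find_change_point([5.0] * 10 + [20.0] * 10))  # step from 5 to 20 at index 10
print(find_change_point([7.0] * 20))                # flat profile: no change point
```

Tightening `p_cutoff` or raising `fc_cutoff` makes the detector more conservative, so fewer transcript boundaries are introduced; this is the trade-off the parameter sweep explores.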

Supplementary Figure 9 Example of meta-assembly for coding and non-coding genes.

Depiction of the assemblies produced by all three tools from the merger of 100 samples for (a) the HNRNPK gene and (b) the GAS5 lncRNA. Poly-exonic transcripts on the negative strand are shown. The assembly produced by Cuffmerge is shown in blue, StringTie in orange, and TACO in green. The RefSeq reference annotation is shown above in red.

Supplementary Figure 10 Detailed example of meta-assembly at the GAS5 locus

a. Depiction of the 1q25 locus. Meta-assemblies produced by Cuffmerge, StringTie, and TACO are shown. Poly-exonic transcripts on the positive (red) and negative (blue) strands are shown. The RefSeq reference annotation is shown, depicting the locations of the DARS2, GAS5, GAS5-AS1, and ZBTB37 genes. b. Meta-assembly of the GAS5 locus from merging the GTEx tissue panel. The PacBio reference is shown in purple.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Note. (PDF 2308 kb)

Supplementary Table 1

Samples used. (XLSX 59 kb)

Supplementary Table 2

Batches used. (XLSX 119 kb)

Supplementary Table 3

Batch size statistics. (XLSX 56 kb)

Supplementary Table 4

Isoform fraction statistics. (XLSX 52 kb)

Supplementary Table 5

GTEx samples used. (XLSX 10 kb)

Supplementary Table 6

High expression statistics. (XLSX 14 kb)

Supplementary Table 7

Changepoint parameter statistics. (XLSX 23 kb)


About this article


Cite this article

Niknafs, Y., Pandian, B., Iyer, H. et al. TACO produces robust multisample transcriptome assemblies from RNA-seq. Nat Methods 14, 68–70 (2017). https://doi.org/10.1038/nmeth.4078


  • DOI: https://doi.org/10.1038/nmeth.4078
