

Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

Abstract

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. Its main strength is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols cover T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source and freely available from http://www.tcoffee.org/.
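As a rough illustration of how the modes named above map onto the command line, the Python sketch below only assembles argument lists for the `t_coffee` executable; it does not run anything. It assumes the `-mode mcoffee`, `-mode psicoffee`, `-mode expresso`, `-mode rcoffee` and `-mode procoffee` spellings and the `-other_pg seq_reformat` utility found in recent T-Coffee releases, so check them against the documentation of your installed version. The helper names `tcoffee_command` and `reformat_command`, and the file names used, are hypothetical.

```python
import shlex
from typing import Dict, List, Optional

# Mode names passed to `t_coffee -mode`. The default T-Coffee mode needs no flag.
# Assumption: these spellings match the installed T-Coffee version.
MODES: Dict[str, Optional[str]] = {
    "tcoffee": None,            # default consistency-based mode
    "mcoffee": "mcoffee",       # meta-aligner combining third-party aligners
    "psicoffee": "psicoffee",   # homology-extended mode for remote homologs
    "expresso": "expresso",     # structure-based mode using 3D templates
    "rcoffee": "rcoffee",       # RNA mode using predicted secondary structures
    "procoffee": "procoffee",   # DNA mode for promoter regions
}


def tcoffee_command(fasta: str, mode: str = "tcoffee") -> List[str]:
    """Build (but do not run) a t_coffee argument list for one alignment mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["t_coffee", fasta]
    flag = MODES[mode]
    if flag is not None:
        argv += ["-mode", flag]
    return argv


def reformat_command(aln: str, output: str = "fasta_aln") -> List[str]:
    """Build a seq_reformat call that converts an alignment to another format."""
    return ["t_coffee", "-other_pg", "seq_reformat", "-in", aln, "-output", output]


if __name__ == "__main__":
    # Print the shell-quoted command line for every mode on a hypothetical input.
    for name in MODES:
        print(shlex.join(tcoffee_command("sh3.fasta", name)))
    print(shlex.join(reformat_command("sh3.aln")))
```

On a machine where T-Coffee is installed, passing a returned list to `subprocess.run(argv, check=True)` would execute it. Modes such as Expresso and PSI-Coffee may additionally need network or database access (for example BLAST and the PDB), which this sketch does not configure.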


Figure 1: Multiple sequence alignments of eight SH3 protein domains.
Figure 2: Structural classification of the cysteine-rich domain (CRD) family.
Figure 3: Comparison of T-Coffee and R-Coffee alignments.
Figure 4: Pro-Coffee: alignment of promoter regions using Pro-Coffee.


Acknowledgements

We thank J. Ramón González-Vallinas and E. Eyras for providing ChIP-seq analysis used in Figure 4. This project is supported by the Plan Nacional BFU2008-00419, the LEISHDRUG (no. 223414) and the Quantomics (KBBE-2A-222664) projects of the 7th Framework Programme of the European Commission and by a 'la Caixa' International PhD Program fellowship. Computational resources are provided by the Center for Genomic Regulation (CRG) of Barcelona.

Author contributions

J.-F.T., C.M., J.-M.C. and C.N. conceived and executed the experiments about protein sequences and structures. G.B., C.K. and C.N. conceived and executed the experiments about RNA sequences. I.E. and C.N. conceived and executed the experiments about DNA sequences. P.D.T. and C.N. conceived and developed the installation procedure. J.-F.T., C.M., G.B., J.-M.C., P.D.T., I.E., J.E.-C., C.K. and C.N. wrote the manuscript.

Author information

Corresponding author

Correspondence to Cedric Notredame.

Ethics declarations

Competing interests

The authors declare no competing financial interests.


About this article

Cite this article

Taly, JF., Magis, C., Bussotti, G. et al. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures. Nat Protoc 6, 1669–1682 (2011). https://doi.org/10.1038/nprot.2011.393


  • DOI: https://doi.org/10.1038/nprot.2011.393
