Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric

doi:10.1038/nprot.2011.393

Protocol
Published: 06 October 2011

Nature Protocols volume 6, pages 1669–1682 (2011)

Abstract

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to build multiple alignments of protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols cover T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source freeware available from http://www.tcoffee.org/.
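As a rough illustration of the mode selection described in the abstract, the sketch below assembles a T-Coffee command line for a chosen mode. It is not taken from the protocol itself: the helper names (`build_command`, `run_alignment`) and the file name are hypothetical, and the `-mode` values are assumptions recalled from the T-Coffee documentation that should be checked against the installed version before use.

```python
import shutil
import subprocess

# Mapping from the modes named in the abstract to the value passed to
# `-mode`. These values are assumptions to verify against your T-Coffee
# installation; None means "no -mode flag" (the default T-Coffee mode).
MODES = {
    "default": None,           # T-Coffee, the default mode
    "mcoffee": "mcoffee",      # M-Coffee: combines third-party aligners
    "psicoffee": "psicoffee",  # PSI-Coffee: homology-extended mode
    "expresso": "expresso",    # Expresso: structure-based alignment
    "rcoffee": "rcoffee",      # R-Coffee: RNA with predicted structures
    "procoffee": "procoffee",  # Pro-Coffee: promoter regions
}


def build_command(fasta_path, mode="default"):
    """Return the argument list for one alignment run (nothing is executed)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    command = ["t_coffee", fasta_path]
    if MODES[mode] is not None:
        command += ["-mode", MODES[mode]]
    return command


def run_alignment(fasta_path, mode="default"):
    """Run the alignment, failing early if t_coffee is not installed."""
    if shutil.which("t_coffee") is None:
        raise RuntimeError("t_coffee executable not found on PATH")
    return subprocess.run(build_command(fasta_path, mode), check=True)


if __name__ == "__main__":
    # Hypothetical input file; prints the command without running it.
    print(" ".join(build_command("sh3_domains.fasta", "mcoffee")))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before committing to a long run. Some modes may need extra tools or database access beyond the base installation.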

**Figure 1: Multiple sequence alignments of eight SH3 protein domains.**

**Figure 2: Structural classification of the cysteine-rich domain (CRD) family.**

**Figure 3: Comparison of T-Coffee and R-Coffee alignments.**

**Figure 4: Pro-Coffee: alignment of promoter regions using Pro-Coffee.**

Acknowledgements

We thank J. Ramón González-Vallinas and E. Eyras for providing the ChIP-seq analysis used in Figure 4. This project is supported by the Plan Nacional BFU2008-00419, the LEISHDRUG (no. 223414) and the Quantomics (KBBE-2A-222664) projects of the 7th Framework Programme of the European Commission and by a 'la Caixa' International PhD Program fellowship. Computational resources are provided by the Center for Genomic Regulation (CRG) of Barcelona.

Author contributions

J.-F.T., C.M., J.-M.C. and C.N. conceived and executed the experiments about protein sequences and structures. G.B., C.K. and C.N. conceived and executed the experiments about RNA sequences. I.E. and C.N. conceived and executed the experiments about DNA sequences. P.D.T. and C.N. conceived and developed the installation procedure. J.-F.T., C.M., G.B., J.-M.C., P.D.T., I.E., J.E.-C., C.K. and C.N. wrote the manuscript.

Author information

Jean-Francois Taly and Cedrik Magis: These authors contributed equally to this work.

Authors and Affiliations

Comparative Bioinformatics Group, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), Universitat Pompeu Fabra (UPF), Barcelona, Spain
Jean-Francois Taly, Cedrik Magis, Giovanni Bussotti, Jia-Ming Chang, Paolo Di Tommaso, Ionas Erb, Jose Espinosa-Carrasco, Carsten Kemena & Cedric Notredame

Corresponding author

Correspondence to Cedric Notredame.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Taly, JF., Magis, C., Bussotti, G. et al. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures. Nat Protoc 6, 1669–1682 (2011). https://doi.org/10.1038/nprot.2011.393

Issue Date: November 2011
DOI: https://doi.org/10.1038/nprot.2011.393
