Visualization of multiple alignments, phylogenies and gene family evolution

Procter, James B; Thompson, Julie; Letunic, Ivica; Creevey, Chris; Jossinet, Fabrice; Barton, Geoffrey J

doi:10.1038/nmeth.1434

Review Article
Published: 01 March 2010

Nature Methods volume 7, pages S16–S25 (2010)

Abstract

Software tools for visualizing sequence alignments and trees are essential for life scientists. In this review, we describe the major features and capabilities of a selection of stand-alone and web-based applications that are useful when investigating the function and evolution of a gene family. These range from simple viewers to systems that provide sophisticated editing and analysis functions. We conclude with a discussion of the challenges that these tools now face because of the flood of next-generation sequence data and the increasingly complex network of bioinformatics information sources.

**Figure 2: Multiple alignment visualization.**

**Figure 3: Examples of automatically generated summary annotation for an alignment generated by MSA visualization tools.**

Acknowledgements

J.B.P. acknowledges the support of the ENFIN European Network of Excellence (contract LSHG-CT-2005-518254) awarded to G.J.B. Several tools were made available as prereleases to the authors for evaluation purposes, and we thank the individuals and companies who obliged our requests.

Author information

Authors and Affiliations

School of Life Sciences Research, College of Life Sciences, University of Dundee, Dundee, UK
James B Procter & Geoffrey J Barton
Institute of Genetics and Molecular and Cellular Biology (IGBMC), Strasbourg, France
Julie Thompson
European Molecular Biology Laboratory, Heidelberg, Germany
Ivica Letunic
Animal Bioscience Centre, Teagasc, Ireland
Chris Creevey
Architecture et réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du Centre National de la Recherche Scientifique (CNRS), Strasbourg, France
Fabrice Jossinet

Corresponding author

Correspondence to James B Procter.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 (PDF 2274 kb)

About this article

Cite this article

Procter, J., Thompson, J., Letunic, I. et al. Visualization of multiple alignments, phylogenies and gene family evolution. Nat Methods 7 (Suppl 3), S16–S25 (2010). https://doi.org/10.1038/nmeth.1434

Issue Date: March 2010
