Review

Heredity (2009) 102, 330–341; doi:10.1038/hdy.2008.130; published online 21 January 2009

Genetic markers in the playground of multivariate analysis

T Jombart¹, D Pontier¹ and A-B Dufour¹

¹Université de Lyon, F-69000, Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France

Correspondence: Dr T Jombart, UMR CNRS 5558–LBBE, ‘Biométrie et Biologie Évolutive’, UCB Lyon 1—Bât. Grégor Mendel, 43 bd du 11 novembre 1918, 69622 VILLEURBANNE cedex, France. E-mail: jombart@biomserv.univ-lyon1.fr

Received 25 July 2008; Revised 24 November 2008; Accepted 8 December 2008; Published online 21 January 2009.

Abstract

Multivariate analyses such as principal component analysis were among the first statistical methods employed to extract information from genetic markers. From their early applications to current innovations, these approaches have proven efficient for the analysis of genetic variability in contexts such as human genetics, conservation and adaptation studies. However, because multivariate analysis is a wide and diversified area of statistics, choosing a method appropriate both to the data and to the question being asked can be difficult. Moreover, some particularities of genetic markers need to be taken into account when using multivariate methods. As a consequence, multivariate analyses are often used as black boxes, which results in frequent mistakes in the literature. In this review, we provide a critical analysis of the application of multivariate methods to genetic markers, using a general framework that unifies all these methods for the sake of clarity. First, we focus on some common mistakes in these applications and on ways to avoid them. We then detail the most critical particularities of allele frequencies that demand adaptations of multivariate methods, and we propose solutions to the resulting problems. Finally, we tackle several questions of interest in which multivariate analysis has a major role to play, such as the study of the typological coherence of different genetic markers, or the investigation of spatial genetic patterns.

Keywords:

multivariate analysis, ordination, genetic markers, principal component analysis, statistics, methods
