Modern computational approaches for analysing molecular genetic variation data

Marjoram, Paul; Tavaré, Simon

doi:10.1038/nrg1961

Review Article
Published: 01 October 2006

Paul Marjoram¹ & Simon Tavaré¹,²,³

Nature Reviews Genetics volume 7, pages 759–770 (2006)

Key Points

An explosive growth is occurring in both the quantity of molecular data that are being collected and the efficiency of the computational machinery that is commonly used to analyse those data.
One of the traditional analytical paradigms has been based on models that are designed to capture the key features of the evolutionary processes.
A variety of approaches exist, and the choice of the most appropriate method, and model, depends on the features of the problem of interest.
The rapid growth in the size of data leads to an increasing computational burden for existing methods. In many cases this burden becomes overwhelming.
This has motivated a move away from exact methods (often because exact answers cannot be calculated) and towards more approximate methods. The principle is that it is better to obtain a rough answer than to seek an exact answer that cannot be computed in a reasonable time.
There will be a continuing trend to move away from exact methods and towards approximate methods as the quantity and complexity of data continue to grow.
Unfortunately, there is no 'one-size-fits-all' computational analysis method. We discuss a range of methods, but the performance of each will vary from problem to problem.
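
As one illustration of trading exactness for tractability, the following minimal Python sketch uses rejection-based approximate Bayesian computation (ABC) to approximate the posterior distribution of the scaled mutation rate θ from a single summary statistic, the number of segregating sites. It is a sketch under stated assumptions, not a method prescribed by the points above: the function names, the uniform prior, the tolerance and the example values are all hypothetical, and the simulator assumes the standard neutral coalescent with infinitely-many-sites mutation and no recombination.

```python
import random


def simulate_segregating_sites(n, theta, rng):
    """Simulate the number of segregating sites S for a sample of size n.

    Assumes the standard neutral coalescent with infinite-sites mutation.
    While k lineages remain, the next event is a mutation with probability
    theta / (theta + k - 1) and a coalescence otherwise.
    """
    s = 0
    k = n
    while k > 1:
        if rng.random() < theta / (theta + k - 1):
            s += 1   # a mutation creates a new segregating site
        else:
            k -= 1   # two lineages coalesce
    return s


def abc_rejection(s_obs, n, prior_max, tolerance, n_draws, rng):
    """Return prior draws of theta whose simulated S is close to s_obs.

    The prior is Uniform(0, prior_max), an illustrative assumption.
    Accepted values approximate the posterior of theta given S.
    """
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)          # draw from the prior
        s_sim = simulate_segregating_sites(n, theta, rng)
        if abs(s_sim - s_obs) <= tolerance:          # keep near matches only
            accepted.append(theta)
    return accepted


if __name__ == "__main__":
    rng = random.Random(2006)
    # Hypothetical data: 20 sequences showing 10 segregating sites.
    sample = abc_rejection(s_obs=10, n=20, prior_max=10.0,
                           tolerance=0, n_draws=20000, rng=rng)
    print("accepted draws:", len(sample))
    print("approximate posterior mean of theta:", sum(sample) / len(sample))
```

No likelihood is evaluated at any point; only simulation from the model is required. Widening the tolerance accepts more draws, so the answer arrives faster, but the accepted sample is then a cruder approximation to the posterior.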

Abstract

An explosive growth is occurring in the quantity, quality and complexity of molecular variation data that are being collected. Historically, such data have been analysed by using model-based methods. Models are useful for sharpening intuition, for explanation and for prediction: they add to our understanding of how the data were formed, and they can provide quantitative answers to questions of interest. We outline some of these model-based approaches, including the coalescent, and discuss the applicability of the computational methods that are necessary given the highly complex nature of current and future data sets.

Acknowledgements

The authors were supported in part by two grants from the US National Institutes of Health. S.T. is a Royal Society-Wolfson Research Merit Award holder. We thank the reviewers for helpful comments on an earlier version of the manuscript.

Author information

Authors and Affiliations

University of Southern California, Keck School of Medicine, Preventive Medicine, 1540 Alcazar Street, CHP-220, Los Angeles, 90089-99011, California, USA
Paul Marjoram & Simon Tavaré
Program in Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, 90089-2910, California, USA
Simon Tavaré
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WA, UK
Simon Tavaré

Authors

Paul Marjoram
Simon Tavaré

Corresponding author

Correspondence to Simon Tavaré.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Restriction fragment length polymorphisms: Variations between individuals in the lengths of DNA regions that are cut by a particular endonuclease.
Microsatellite marker loci: Polymorphic loci at which short DNA sequences are repeated a varying number of times.
Stochastic model: A model that is used to describe the behaviour of a random process.
Coalescent: A popular probabilistic model for the evolution of 'individuals'. Individuals might be single nucleotides, mitochondrial DNA, chromosomes and so on, depending on the context.
Selective sweep: The increase in the frequency of an allele (and closely linked chromosomal segments) that is caused by selection for the allele. Sweeps initially reduce variation and subsequently lead to increased homozygosity.
Likelihood: The probability of the data under a particular model, viewed as a function of the parameters of that model (note that data discussed in this paper are discrete).
Mitochondrial Eve: The most recent maternal common ancestor of the entire human mitochondrial population.
Gene conversion: A non-reciprocal recombination process that results in the alteration of the sequence of a gene to that of its homologue during meiosis.
Admixture: Gene flow between differentiated populations.
Maximum likelihood: A statistical analysis in which one aims to find the parameter value that maximizes the likelihood of the data.
Test statistic: A numerical summary of the data that is used to measure support for a null hypothesis. Either the test statistic has a known probability distribution (such as χ²) under the null hypothesis, or its null distribution is approximated computationally.
Tajima's D: A statistic that compares the observed nucleotide diversity to what is expected under a neutral model with constant population size.
Prior distribution: The distribution of likely parameter values before any data are examined.
Posterior distribution: The distribution that is proportional to the product of the likelihood and prior distribution.
Coverage: The range of values for which the probability is non-zero.
Summary statistics: Statistics that try to capture a complicated data set in a simpler way. An example is the use of the number of segregating sites as a surrogate for a set of DNA fragments.
Markov process: One in which the probability of the next state depends solely on the previous state, and not on the sequence of states before it.
Stationarity: The state in which a process has become independent of its starting position and has settled into its long-term behaviour. In an MCMC context, the process is typically assumed to be stationary at the end of a 'burn-in' period.
Local maxima: A local region in which a distribution takes a value that is higher than those taken at other nearby points, but which is lower than at least one value taken in some other, more distant region.
Sufficiency: The statistic S is sufficient for the parameter η if the probability of the data, given S and η, does not depend on η.
Haplotype: The sequence of bases along a single copy of (typically, part of) a chromosome.
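
The glossary entries for summary statistics and Tajima's D can be made concrete with a short Python sketch. It computes the number of segregating sites S, the nucleotide diversity π (mean number of pairwise differences), Watterson's estimator S/a₁ and Tajima's D, using the constants of Tajima's 1989 definition. The input format (a list of equal-length 0/1 haplotypes, where 1 marks one of two alleles at a site) and the function name are assumptions made for illustration.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Return (S, pi, theta_w, D) for a list of equal-length 0/1 haplotypes.

    Assumes biallelic sites coded 0/1 and at least four haplotypes
    (the variance term in D is zero for smaller samples).
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must all have the same length")

    # Number of copies of allele 1 at each site; keep polymorphic sites only.
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    segregating = [c for c in counts if 0 < c < n]
    s = len(segregating)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: average number of differences over all n(n-1)/2 pairs.
    pi = sum(2.0 * c * (n - c) for c in segregating) / (n * (n - 1))

    # Watterson's estimator of theta from the number of segregating sites.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1

    # Variance constants from Tajima's 1989 definition.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, theta_w, d


if __name__ == "__main__":
    # Hypothetical sample of four haplotypes typed at four sites.
    sample = [
        [1, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
    ]
    print(tajimas_d(sample))
```

Both π and S/a₁ estimate the same scaled mutation rate under neutrality and constant population size, so D is expected to be close to zero in that case; an excess of rare variants pushes D negative and an excess of intermediate-frequency variants pushes it positive.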

About this article

Cite this article

Marjoram, P., Tavaré, S. Modern computational approaches for analysing molecular genetic variation data. Nat Rev Genet 7, 759–770 (2006). https://doi.org/10.1038/nrg1961

Issue Date: 01 October 2006
DOI: https://doi.org/10.1038/nrg1961
