Sequencing studies in human genetics: design and interpretation

Goldstein, David B.; Allen, Andrew; Keebler, Jonathan; Margulies, Elliott H.; Petrou, Steven; Petrovski, Slavé; Sunyaev, Shamil

doi:10.1038/nrg3455

Review Article
Published: 11 June 2013

Nature Reviews Genetics volume 14, pages 460–470 (2013)

Subjects

Disease genetics

Key Points

The interpretation of next-generation sequencing data is technically and conceptually much more challenging than the interpretation of the data used in genome-wide association studies.
Minimizing false-positive signals in sequencing studies depends on careful management of the overall workflow and, in particular, on the use of appropriate statistical criteria to support claims of significant association.
One key feature in the interpretation of sequence data is that most researchers currently distinguish among variants in their prior probabilities of influencing disease, either implicitly or explicitly. Considerable development of appropriate ways to do this, however, is still required.
Population genetic, phylogenetic and other data sources can help to establish frameworks for distinguishing among the prior probabilities of variants influencing disease.
Although establishing appropriate statistical criteria for interpreting sequence data remains a work in progress, good study designs mandate careful consideration of, and appropriate correction for, the real number of tests that are inherent in any given study design.
Interpretation of sequence data should always take into account the narrative potential that is inherent in any human genome, in that all genomes carry many functional and probably deleterious (in an evolutionary sense) rare variants that could be used to argue that the mutations influence traits of interest.
Whereas functional characterization of pathogenic mutations is essential in order to derive translational benefits from genetic discoveries, functional characterization should not be used to buttress weak statistical arguments for pathogenicity. In general, with only narrowly defined exceptions, evidence of pathogenicity should come from the genetics alone.
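
The point about correcting for the real number of tests can be made concrete with a small sketch. The example below uses a Bonferroni correction, which is one simple and conservative choice rather than a method prescribed by this Review; the gene count, the number of analysis models and the function names are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the study-wide
    false-positive rate at or below alpha (Bonferroni correction)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes_correction(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a single p-value survives correction for n_tests tests."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Hypothetical study design (assumed numbers, not taken from the Review):
# every gene is tested under several genetic models, and each
# gene-by-model combination counts as a separate test.
N_GENES = 20_000
N_MODELS = 4
n_tests = N_GENES * N_MODELS

threshold = bonferroni_threshold(0.05, n_tests)
print(f"{n_tests} tests -> per-test threshold {threshold:.3g}")
```

The design choice worth noting is that `n_tests` counts every analysis actually run (here, genes multiplied by models), not only the analysis that is eventually reported.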

Abstract

Next-generation sequencing is becoming the primary discovery tool in human genetics. There have been many clear successes in identifying genes that are responsible for Mendelian diseases, and sequencing approaches are now poised to identify the mutations that cause undiagnosed childhood genetic diseases and those that predispose individuals to more common complex diseases. There are, however, growing concerns that the complexity and magnitude of complete sequence data could lead to an explosion of weakly justified claims of association between genetic variants and disease. Here, we provide an overview of the basic workflow in next-generation sequencing studies and emphasize, where possible, measures and considerations that facilitate accurate inferences from human sequencing studies.

Acknowledgements

The authors thank the reviewers for their helpful comments. D.B.G. thanks L. Biesecker (NHGRI) for helpful discussions that contributed to the development of this Review. S.P. is a National Health and Medical Research Council (NHMRC) CJ Martin Fellow.

Author information

Authors and Affiliations

Center for Human Genome Variation, Duke University School of Medicine, 308 Research Drive, Box 91009, LSRC B Wing, Room 330, Durham, 27708, North Carolina, USA
David B. Goldstein, Andrew Allen, Jonathan Keebler & Slavé Petrovski
Department of Biostatistics and Bioinformatics, Duke University Medical Center, 2424 Erwin Road, Suite 1102, Hock Plaza, Box 2721, Durham, 27710, North Carolina, USA
Andrew Allen
Illumina Cambridge, Chesterford Research Park, Little Chesterford, Saffron Walden, CB10 1XL, UK
Elliott H. Margulies
Florey Institute of Neuroscience and Mental Health, Melbourne Brain Centre, 30 Royal Parade, University of Melbourne, Parkville, Victoria, 3010, Australia
Steven Petrou
Centre for Neural Engineering, Old Engineering Building, University of Melbourne, Parkville, Victoria, 3010, Australia
Steven Petrou
Departments of Medicine, Austin Health and Royal Melbourne Hospital, University of Melbourne, Austin Hospital, 145 Studley Road, Heidelberg, Victoria, 3084, Australia
Slavé Petrovski
Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 75 Francis Street, Boston, 02115, Massachusetts, USA
Shamil Sunyaev

Corresponding author

Correspondence to David B. Goldstein.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Priors: Used to reflect assumptions about the involvement of different classes of mutations before the evidence available from a given study is considered.
Cluster density: The density of clonal double-stranded DNA fragment clusters bound to an Illumina flow cell, typically expressed as clusters per mm². It is used as a quality-control metric early during the sequencing reaction: low cluster densities will result in a lower sequencing yield in the resulting fastq library, whereas very high cluster densities will result in poor sequence quality.
Locus heterogeneity: Refers to the number of different genes in the genome that can carry mutations that influence the risk of a given disease.
Allelic heterogeneity: Refers to the number of different mutations at a single gene that can influence risk of disease.
Structural variation: Occurs in DNA regions generally greater than 1 kb in size, and includes genomic imbalances (namely, insertions and deletions (also known as copy number variants)), inversions and translocations.
De novo mutations: Non-inherited novel mutations in an individual that result from a germline mutation.
Indel: An alternative form of genetic variation to single-nucleotide variants that represents small insertion and deletion mutations.
Insert size: The length of the fragmented sequence between ligated adaptors. In paired-end sequencing, the insert size generally ranges from 200 to 500 bp.
Batch effects: Differences observed for samples that are experimentally handled in different ways that are unrelated to the biological or scientific variables being studied. If batch effects are not properly accounted for in sequence studies, they can generate false signals of association between genetic variation and the traits under study.
Library: The collection of processed genome fragments that are prepared for sequencing. In a bioinformatics context, the term may also generally refer to the set of sequences found in a single fastq file.
Variant call format files: (VCF files). A flexible text file format developed within the 1000 Genomes Project that contains data specific to one or more genomic sites, including site coordinates, reference allele, observed alternative allele (or alleles) and base-call quality metrics (see Further information).
Polymorphism-to-divergence ratios: Comparing sequence divergence across species with population polymorphism data (for example, McDonald–Kreitman test) facilitates identifying where selective forces are acting on the genomic sequence.
Site frequency spectra: Reflect the distribution of allele frequencies. They are defined by the number of sites that have each of the possible allele frequencies. Different forms of selection perturb the site frequency spectrum in known ways.
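
Two of the glossary terms above describe simple calculations, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and input layout are hypothetical, the site frequency spectrum is computed from per-site counts of the alternative allele, and the polymorphism-to-divergence comparison is summarized with the neutrality index, one common summary of a McDonald–Kreitman table.

```python
def site_frequency_spectrum(alt_counts, n_chromosomes):
    """Count how many segregating sites carry the alternative allele
    on exactly k chromosomes, for k = 1 .. n_chromosomes - 1.

    alt_counts: one integer per site (assumed input layout).
    Returns a list whose element k - 1 is the number of sites with count k.
    Sites that are fixed (count 0 or n_chromosomes) are not segregating
    and are skipped.
    """
    sfs = [0] * (n_chromosomes - 1)
    for count in alt_counts:
        if 0 < count < n_chromosomes:
            sfs[count - 1] += 1
    return sfs


def neutrality_index(pn, ps, dn, ds):
    """Ratio of polymorphism (Pn/Ps) to divergence (Dn/Ds) for
    nonsynonymous (n) and synonymous (s) sites.

    Values above 1 indicate an excess of nonsynonymous polymorphism,
    as expected under purifying selection; values below 1 indicate an
    excess of nonsynonymous divergence.
    """
    if ps <= 0 or dn <= 0 or ds <= 0:
        raise ValueError("ps, dn and ds must be positive")
    return (pn / ps) / (dn / ds)


# Hypothetical data: six sites observed on six chromosomes.
print(site_frequency_spectrum([1, 1, 2, 5, 0, 6], n_chromosomes=6))

# Hypothetical counts for a single gene.
print(neutrality_index(pn=10, ps=20, dn=5, ds=20))
```

A spectrum with many sites in the lowest-count classes relative to a reference expectation is the kind of perturbation that the glossary entry refers to; interpreting it in practice requires a demographic model, which this sketch does not attempt.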

About this article

Cite this article

Goldstein, D., Allen, A., Keebler, J. et al. Sequencing studies in human genetics: design and interpretation. Nat Rev Genet 14, 460–470 (2013). https://doi.org/10.1038/nrg3455

Issue Date: July 2013