Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing

Abstract

We report the analysis of a Japanese male using high-throughput sequencing to ×40 coverage. More than 99% of the sequence reads were mapped to the reference human genome. Using a Bayesian decision method, we identified 3,132,608 single nucleotide variations (SNVs). Comparison with six previously reported genomes revealed an excess of singleton nonsense and nonsynonymous SNVs, as well as singleton SNVs in conserved non-coding regions. We also identified 5,319 deletions smaller than 10 kb with high accuracy, in addition to copy number variations and rearrangements. De novo assembly of the unmapped sequence reads generated around 3 Mb of novel sequence, which showed high similarity to non-reference human genomes and the human herpesvirus 4 genome. Our analysis suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation.
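The abstract does not spell out the Bayesian decision method used to call SNVs. As a rough illustration of the general idea only, the sketch below computes posterior probabilities for three diploid genotypes at a single site from reference and alternate read counts. The binomial read model, the per-base error rate, the prior values and the function names are all assumptions made for this example and are not taken from the paper.

```python
import math

GENOTYPES = ("ref/ref", "ref/alt", "alt/alt")


def genotype_posteriors(ref_count, alt_count, error_rate=0.01,
                        priors=(0.998, 0.001, 0.001)):
    """Posterior probability of each diploid genotype at one site.

    Assumed model (illustrative, not the paper's): each read independently
    shows the alternate allele with probability error_rate, 0.5 or
    1 - error_rate under ref/ref, ref/alt and alt/alt respectively.
    """
    p_alt_by_genotype = (error_rate, 0.5, 1.0 - error_rate)
    log_posts = []
    for p_alt, prior in zip(p_alt_by_genotype, priors):
        # The binomial coefficient is identical for all genotypes, so it is omitted.
        log_lik = alt_count * math.log(p_alt) + ref_count * math.log(1.0 - p_alt)
        log_posts.append(log_lik + math.log(prior))
    # Normalise in log space to avoid underflow at high coverage.
    top = max(log_posts)
    weights = [math.exp(v - top) for v in log_posts]
    total = sum(weights)
    return {g: w / total for g, w in zip(GENOTYPES, weights)}


def call_genotype(ref_count, alt_count, **kwargs):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(ref_count, alt_count, **kwargs)
    return max(posteriors, key=posteriors.get)


if __name__ == "__main__":
    # At roughly 40 reads per site, an even split of alleles favours a heterozygote.
    print(call_genotype(20, 20))
    print(genotype_posteriors(39, 1))
```

Under these assumptions a single alternate read among 40 is explained by sequencing error rather than by a variant, which is why deep coverage helps separate true SNVs from noise.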

Figure 1: Allelic frequency spectrum of seven genomes.
Figure 2: Distribution of the number of SNVs within 1-Mb windows of seven individuals.
Figure 3: Identification of deletions.
Figure 4: De novo assembly of unmapped reads.

Acknowledgements

We thank K. Misawa for comments on the manuscript. This study was supported in part by the National Project on 'Next-generation Integrated Living Matter Simulation' of the Ministry of Education, Culture, Sports, Science and Technology (MEXT). The super-computing resource was provided by the Human Genome Center, University of Tokyo, Japan (http://sc.hgc.jp/shirokane.html).

Author information

Contributions

H.N., Y.N., A.F. and T.T. designed the study. A.F., T.A., M.N., K.A.B. and R.Y. performed computational analyses. A.F., T.T., H.N. and K.A.B. wrote the manuscript. K.N. performed sequencing. N.H., A.F. and K.N. performed validation experiments. H.N. and T.T. obtained funding for the study. T.T., H.N., T.S., S.M. and M.K. advised on data analysis.

Corresponding authors

Correspondence to Hidewaki Nakagawa or Tatsuhiko Tsunoda.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–24 and Supplementary Tables 1 and 6–8 (PDF 11217 kb)

Supplementary Table 2

List of non-3n deletions in coding regions (XLSX 59 kb)

Supplementary Table 3

List of deletions detected by distance between paired reads and read depth (XLSX 413 kb)

Supplementary Table 4

List of copy number gains (XLSX 64 kb)

Supplementary Table 5

List of copy number losses (XLSX 54 kb)

About this article

Cite this article

Fujimoto, A., Nakagawa, H., Hosono, N. et al. Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing. Nat Genet 42, 931–936 (2010). https://doi.org/10.1038/ng.691

  • DOI: https://doi.org/10.1038/ng.691
