Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals

Abstract

Massively parallel sequencing technologies have identified a broad spectrum of human genome diversity. Here we deep sequenced and correlated 18 genomes and 17 transcriptomes of unrelated Korean individuals. This has allowed us to construct a genome-wide map of common and rare variants and also identify variants formed during DNA-RNA transcription. We identified 9.56 million genomic variants, 23.2% of which appear to be previously unidentified. From transcriptome sequencing, we discovered 4,414 transcripts not previously annotated. Finally, we revealed 1,809 sites of transcriptional base modification, where the transcriptional landscape is different from the corresponding genomic sequences, and 580 sites of allele-specific expression. Our findings suggest that a considerable number of unexplored genomic variants still remain to be identified in the human genome, and that the integrated analysis of genome and transcriptome sequencing is powerful for understanding the diversity and functional aspects of human genomic variants.
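
As a rough check on the scale these figures imply, the sketch below converts the reported percentage into an approximate count. The variable names are illustrative, and both inputs are the rounded values quoted in the abstract, so the result is an estimate and not a number reported by the authors.

```python
# Figures quoted in the abstract (both are rounded values).
total_variants = 9.56e6   # genomic variants identified
novel_fraction = 0.232    # share that appears to be previously unidentified

# Approximate count of previously unidentified variants (illustrative only).
approx_novel = round(total_variants * novel_fraction)
approx_known = round(total_variants) - approx_novel

print(f"approx. previously unidentified variants: {approx_novel:,}")
print(f"approx. previously known variants: {approx_known:,}")
```

Under these assumptions, roughly 2.2 million of the 9.56 million variants would be previously unidentified.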

Figure 1: New and rare SNPs in individual genomes.
Figure 2: Linkage disequilibrium between new non-synonymous SNPs and known SNPs.
Figure 3: Detection of large deletions.
Figure 4: Transcriptome analyses.
Figure 5: Comparison of genome and transcriptome sequences.

Acknowledgements

We acknowledge the anonymous Korean individuals who participated in this study. We thank Y.J. Yoo and T. Bleazard at Seoul National University for their personal comments regarding this manuscript. We are indebted to the scientists who contributed to this work but were not included in the author list. This work has been supported in part by Green Cross Therapeutics (grant # 0411-20080023 to J.-S.S.), the Korean Ministry of Knowledge Economy (grant # 10037410 to J.-I.K.), the Korean Ministry of Education, Science and Technology (grant # M10305030000 to J.-S.S.; grant # 2010-0013662 to J.-I.K.) and the National Human Genome Research Institute of the US National Institutes of Health (grant # P41HG4221 to C.L.).

Author information

Contributions

J.-S.S. and C.L. conceived of the project. J.-S.S. planned and managed the project. Y.S.J., J.-I.K., Sheehyun Kim, D.H., W.-C.L., Sujung Kim and S.-B.Y. analyzed sequencing data. D.H. and S.-S.P. developed the genome browser. J.-Y.S., S.-H.S., J.-Y.Y., H.C., K.-S.Y. and H.K. constructed libraries and executed sequencing. J.H.J. analyzed genotyping microarray experiments. H.P., S.L., H.-J.K., H.P.K. and O.G. assisted in the data analysis. Y.S.J., S.L., D.-S.L. and M.Y. performed validation analyses. J.-S.S., C.L., Y.S.J., J.-I.K., Sheehyun Kim, D.H., O.G. and D.R.G. wrote the manuscript.

Corresponding author

Correspondence to Jeong-Sun Seo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Figures 1–9 and Supplementary Tables 1, 2, 4, 5, 11, 13–15 and 18. (PDF 1862 kb)

Supplementary Table 3

Primers for validations (XLS 37 kb)

Supplementary Table 4

Indel list of 10 individuals extracted by whole genome sequencing (TXT 70434 kb)

Supplementary Table 6

Non-synonymous SNP list detected from 18 individuals (XLS 7569 kb)

Supplementary Table 7

Functional assessment of nsSNPs of 18 individuals (XLS 692 kb)

Supplementary Table 8

Super nsSNP gene list (XLS 44 kb)

Supplementary Table 9

List of Korean common novel nsSNP LD (XLS 212 kb)

Supplementary Table 10

List of 5,496 large deletions in 8 individuals (XLS 1152 kb)

Supplementary Table 12

Breakpoints list of NA10851 (XLS 83 kb)

Supplementary Table 16

Expression map represented in RPKM value on all RefSeq genes (XLS 12011 kb)

Supplementary Table 17

List of Korean common novel transcripts (XLS 776 kb)

Supplementary Table 19

1,809 TBM sites (XLS 1606 kb)

Supplementary Table 20

580 Allele Specific Expression sites (XLS 448 kb)

Supplementary Table 21

Contig list generated by de novo assembly (XLS 4580 kb)

Supplementary Table 22

Alignment results of de novo assembled contigs (XLS 67 kb)

About this article

Cite this article

Ju, Y., Kim, JI., Kim, S. et al. Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals. Nat Genet 43, 745–752 (2011). https://doi.org/10.1038/ng.872

  • DOI: https://doi.org/10.1038/ng.872
