Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals

Ju, Young Seok; Kim, Jong-Il; Kim, Sheehyun; Hong, Dongwan; Park, Hansoo; Shin, Jong-Yeon; Lee, Seungbok; Lee, Won-Chul; Kim, Sujung; Yu, Saet-Byeol; Park, Sung-Soo; Seo, Seung-Hyun; Yun, Ji-Young; Kim, Hyun-Jin; Lee, Dong-Sung; Yavartanoo, Maryam; Kang, Hyunseok Peter; Gokcumen, Omer; Govindaraju, Diddahally R; Jung, Jung Hee; Chong, Hyonyong; Yang, Kap-Seok; Kim, Hyungtae; Lee, Charles; Seo, Jeong-Sun

doi:10.1038/ng.872

Article
Published: 03 July 2011


Young Seok Ju^1,2^,
Jong-Il Kim^1,3,4,5^,
Sheehyun Kim^1,2^,
Dongwan Hong^1^,
Hansoo Park^1,6^,
Jong-Yeon Shin^1,5^,
Seungbok Lee^1,4^,
Won-Chul Lee^1,4^,
Sujung Kim^5^,
Saet-Byeol Yu^5^,
Sung-Soo Park^5^,
Seung-Hyun Seo^5^,
Ji-Young Yun^5^,
Hyun-Jin Kim^1,4^,
Dong-Sung Lee^1,4^,
Maryam Yavartanoo^1,4^,
Hyunseok Peter Kang^1^,
Omer Gokcumen^6^,
Diddahally R Govindaraju^6^,
Jung Hee Jung^2^,
Hyonyong Chong^2,7^,
Kap-Seok Yang^2^,
Hyungtae Kim^2^,
Charles Lee^6^ &
Jeong-Sun Seo^1,2,3,4,5,7^

Nature Genetics volume 43, pages 745–752 (2011)


Abstract

Massively parallel sequencing technologies have identified a broad spectrum of human genome diversity. Here we deep sequenced 18 genomes and 17 transcriptomes of unrelated Korean individuals and analyzed them jointly. This allowed us to construct a genome-wide map of common and rare variants and to identify variants that arise during transcription from DNA to RNA. We identified 9.56 million genomic variants, 23.2% of which appear to be previously unidentified. From transcriptome sequencing, we discovered 4,414 previously unannotated transcripts. Finally, we identified 1,809 sites of transcriptional base modification, where the transcript sequence differs from the corresponding genomic sequence, and 580 sites of allele-specific expression. Our findings suggest that a considerable number of genomic variants remain to be identified in the human genome, and that integrated analysis of genome and transcriptome sequencing is a powerful approach for understanding the diversity and functional aspects of human genomic variants.
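The abstract distinguishes two kinds of DNA-RNA comparison: sites where transcripts carry a base not present at the corresponding genomic position (transcriptional base modification) and heterozygous sites whose two alleles are expressed unequally (allele-specific expression). The Python sketch below shows one generic way to flag each kind at a single position, given a DNA genotype call and RNA-seq allele counts. It is not the authors' pipeline: the function names, the coverage and allele-fraction thresholds, and the use of an exact two-sided binomial test against a 50:50 expectation are all illustrative assumptions.

```python
from math import exp, lgamma, log


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space to avoid overflow."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1 - p))


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with k (up to rounding) count.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def classify_site(dna_genotype: str, ref_reads: int, alt_reads: int,
                  min_depth: int = 10, min_alt_reads: int = 3,
                  min_alt_fraction: float = 0.1, alpha: float = 0.01) -> str:
    """Hypothetical classifier for one genomic position.

    dna_genotype: 'hom_ref' or 'het', taken from genome sequencing.
    ref_reads, alt_reads: RNA-seq reads supporting each base at the site.
    All thresholds are illustrative defaults, not published values.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "insufficient_coverage"
    if dna_genotype == "hom_ref":
        # The genome shows only the reference base, so RNA reads with another
        # base point to a candidate DNA-RNA difference (a TBM-like site).
        if alt_reads >= min_alt_reads and alt_reads / depth >= min_alt_fraction:
            return "candidate_rna_dna_difference"
        return "concordant"
    if dna_genotype == "het":
        # With balanced expression, each allele should appear in ~50% of reads.
        if binom_two_sided_p(alt_reads, depth) < alpha:
            return "candidate_allele_specific"
        return "balanced"
    raise ValueError(f"unsupported genotype: {dna_genotype!r}")


print(classify_site("het", 45, 5))       # -> candidate_allele_specific
print(classify_site("hom_ref", 30, 10))  # -> candidate_rna_dna_difference
```

Computing the binomial probabilities through `lgamma` keeps the test usable at high read depths, where exact binomial coefficients would overflow a float. Practical pipelines typically add further filters (mapping quality, strand balance, paralogous regions) before accepting either kind of site.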


**Figure 1: New and rare SNPs in individual genomes.**

**Figure 2: Linkage disequilibrium between new non-synonymous SNPs and known SNPs.**

**Figure 3: Detection of large deletions.**

**Figure 5: Comparison of genome and transcriptome sequences.**
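Figure 2 relates new non-synonymous SNPs to known SNPs through linkage disequilibrium (LD). As a reminder of how pairwise LD is commonly quantified, the sketch below computes r² from phased two-site haplotypes, using D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). Which LD statistic and phasing approach the authors used is not stated in this excerpt; the 0/1 allele coding and the function name `ld_r2` are assumptions.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Pairwise LD r^2 from phased haplotypes.

    Each haplotype is (allele at site 1, allele at site 2), coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # coefficient of LD
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom


print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # -> 1.0 (perfect LD)
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # -> 0.0 (no LD)
```

An r² near 1 means one SNP is a good proxy for the other, which is why LD with known SNPs can indicate whether a newly found variant was already indirectly captured by earlier genotyping.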



Acknowledgements

We acknowledge the anonymous Korean individuals who participated in this study. We thank Y.J. Yoo and T. Bleazard at Seoul National University for their comments on this manuscript. We are indebted to the scientists who contributed to this work but were not included in the author list. This work was supported in part by Green Cross Therapeutics (grant # 0411-20080023 to J.-S.S.), the Korean Ministry of Knowledge Economy (grant # 10037410 to J.-I.K.), the Korean Ministry of Education, Science and Technology (grant # M10305030000 to J.-S.S.; grant # 2010-0013662 to J.-I.K.) and the National Human Genome Research Institute of the US National Institutes of Health (grant # P41HG4221 to C.L.).

Author information

Dongwan Hong: Present address: Division of Convergence Technology, Functional Genomics Branch, National Cancer Center, Goyang-si, Korea.
Young Seok Ju, Jong-Il Kim and Sheehyun Kim: These authors contributed equally to this work.

Authors and Affiliations

Genomic Medicine Institute (GMI), Medical Research Center, Seoul National University, Seoul, Korea
Young Seok Ju, Jong-Il Kim, Sheehyun Kim, Dongwan Hong, Hansoo Park, Jong-Yeon Shin, Seungbok Lee, Won-Chul Lee, Hyun-Jin Kim, Dong-Sung Lee, Maryam Yavartanoo, Hyunseok Peter Kang & Jeong-Sun Seo
Macrogen Inc., Seoul, Korea
Young Seok Ju, Sheehyun Kim, Jung Hee Jung, Hyonyong Chong, Kap-Seok Yang, Hyungtae Kim & Jeong-Sun Seo
Department of Biochemistry, Seoul National University College of Medicine, Seoul, Korea
Jong-Il Kim & Jeong-Sun Seo
Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, Korea
Jong-Il Kim, Seungbok Lee, Won-Chul Lee, Hyun-Jin Kim, Dong-Sung Lee, Maryam Yavartanoo & Jeong-Sun Seo
Psoma Therapeutics Inc., Seoul, Korea
Jong-Il Kim, Jong-Yeon Shin, Sujung Kim, Saet-Byeol Yu, Sung-Soo Park, Seung-Hyun Seo, Ji-Young Yun & Jeong-Sun Seo
Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Hansoo Park, Omer Gokcumen, Diddahally R Govindaraju & Charles Lee
Axeq Technologies, Rockville, Maryland, USA
Hyonyong Chong & Jeong-Sun Seo


Contributions

J.-S.S. and C.L. conceived the project. J.-S.S. planned and managed the project. Y.S.J., J.-I.K., Sheehyun Kim, D.H., W.-C.L., Sujung Kim and S.-B.Y. analyzed sequencing data. D.H. and S.-S.P. developed the genome browser. J.-Y.S., S.-H.S., J.-Y.Y., H.C., K.-S.Y. and H.K. constructed libraries and executed sequencing. J.H.J. analyzed genotyping microarray experiments. H.P., S.L., H.-J.K., H.P.K. and O.G. assisted in the data analysis. Y.S.J., S.L., D.-S.L. and M.Y. performed validation analyses. J.-S.S., C.L., Y.S.J., J.-I.K., Sheehyun Kim, D.H., O.G. and D.R.G. wrote the manuscript.

Corresponding author

Correspondence to Jeong-Sun Seo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Figures 1–9 and Supplementary Tables 1, 2, 4, 5, 11, 13–15 and 18. (PDF 1862 kb)

Supplementary Table 3

Primers for validations (XLS 37 kb)

Supplementary Table 4

List of indels in 10 individuals detected by whole-genome sequencing (TXT 70434 kb)

Supplementary Table 6

List of non-synonymous SNPs detected in 18 individuals (XLS 7569 kb)

Supplementary Table 7

Functional assessment of nsSNPs in 18 individuals (XLS 692 kb)

Supplementary Table 8

Super nsSNP gene list (XLS 44 kb)

Supplementary Table 9

Linkage disequilibrium of common novel nsSNPs in Koreans (XLS 212 kb)

Supplementary Table 10

List of 5,496 large deletions in 8 individuals (XLS 1152 kb)

Supplementary Table 12

List of breakpoints in NA10851 (XLS 83 kb)

Supplementary Table 16

Expression map of all RefSeq genes, in RPKM values (XLS 12011 kb)

Supplementary Table 17

List of Korean common novel transcripts (XLS 776 kb)

Supplementary Table 19

1,809 transcriptional base modification (TBM) sites (XLS 1606 kb)

Supplementary Table 20

580 allele-specific expression sites (XLS 448 kb)

Supplementary Table 21

Contig list generated by de novo assembly (XLS 4580 kb)

Supplementary Table 22

Alignment results of de novo assembled contigs (XLS 67 kb)
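Supplementary Table 16 reports gene expression as RPKM (reads per kilobase of transcript per million mapped reads). The quantity is RPKM = 10⁹ × C / (L × N), where C is the number of reads assigned to a gene, L is the gene length in base pairs and N is the total number of mapped reads. The minimal Python sketch below applies that formula; the function name and input checks are illustrative, and how reads were assigned to genes or how gene length was defined in this study is not described here.

```python
def rpkm(gene_reads: int, total_mapped_reads: int, gene_length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = gene_reads / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if total_mapped_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("total reads and gene length must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# 500 reads on a 2,500-bp gene in a library of 20 million mapped reads
print(rpkm(500, 20_000_000, 2_500))  # -> 10.0
```

Dividing by both gene length and library size makes values comparable across genes of different lengths and across individuals sequenced to different depths.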


About this article

Cite this article

Ju, Y., Kim, JI., Kim, S. et al. Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals. Nat Genet 43, 745–752 (2011). https://doi.org/10.1038/ng.872


Received: 17 December 2010
Accepted: 03 June 2011
Published: 03 July 2011
Issue Date: August 2011
DOI: https://doi.org/10.1038/ng.872
