Abstract
Massively parallel sequencing technologies have identified a broad spectrum of human genome diversity. Here we deep sequenced and correlated 18 genomes and 17 transcriptomes of unrelated Korean individuals. This has allowed us to construct a genome-wide map of common and rare variants and also identify variants formed during DNA-RNA transcription. We identified 9.56 million genomic variants, 23.2% of which appear to be previously unidentified. From transcriptome sequencing, we discovered 4,414 transcripts not previously annotated. Finally, we revealed 1,809 sites of transcriptional base modification, where the transcriptional landscape is different from the corresponding genomic sequences, and 580 sites of allele-specific expression. Our findings suggest that a considerable number of unexplored genomic variants still remain to be identified in the human genome, and that the integrated analysis of genome and transcriptome sequencing is powerful for understanding the diversity and functional aspects of human genomic variants.
Acknowledgements
We acknowledge the anonymous Korean individuals who participated in this study. We thank Y.J. Yoo and T. Bleazard at Seoul National University for their personal comments regarding this manuscript. We are indebted to the scientists who contributed to this work but were not included in the author list. This work has been supported in part by Green Cross Therapeutics (grant # 0411-20080023 to J.-S.S.), the Korean Ministry of Knowledge Economy (grant # 10037410 to J.-I.K.), the Korean Ministry of Education, Science and Technology (grant # M10305030000 to J.-S.S.; grant # 2010-0013662 to J.-I.K.) and the National Human Genome Research Institute of the US National Institutes of Health (grant # P41HG4221 to C.L.).
Author information
Contributions
J.-S.S. and C.L. conceived of the project. J.-S.S. planned and managed the project. Y.S.J., J.-I.K., Sheehyun Kim, D.H., W.-C.L., Sujung Kim and S.-B.Y. analyzed sequencing data. D.H. and S.-S.P. developed the genome browser. J.-Y.S., S.-H.S., J.-Y.Y., H.C., K.-S.Y. and H.K. constructed libraries and executed sequencing. J.H.J. analyzed genotyping microarray experiments. H.P., S.L., H.-J.K., H.P.K. and O.G. assisted in the data analysis. Y.S.J., S.L., D.-S.L. and M.Y. performed validation analyses. J.-S.S., C.L., Y.S.J., J.-I.K., Sheehyun Kim, D.H., O.G. and D.R.G. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Figures 1–9 and Supplementary Tables 1, 2, 4, 5, 11, 13–15 and 18. (PDF 1862 kb)
Supplementary Table 3
Primers for validations (XLS 37 kb)
Supplementary Table 4
Indel list of 10 individuals extracted by whole genome sequencing (TXT 70434 kb)
Supplementary Table 6
Non-synonymous SNP list detected from 18 individuals (XLS 7569 kb)
Supplementary Table 7
Functional assessment of nsSNPs of 18 individuals (XLS 692 kb)
Supplementary Table 8
Super nsSNP gene list (XLS 44 kb)
Supplementary Table 9
List of Korean common novel nsSNP LD (XLS 212 kb)
Supplementary Table 10
Total 5,496 large deletion list of 8 individuals (XLS 1152 kb)
Supplementary Table 12
Breakpoints list of NA10851 (XLS 83 kb)
Supplementary Table 16
Expression map represented in RPKM value on all RefSeq genes (XLS 12011 kb)
Supplementary Table 17
List of Korean common novel transcripts (XLS 776 kb)
Supplementary Table 19
1,809 TBM sites (XLS 1606 kb)
Supplementary Table 20
580 Allele Specific Expression sites (XLS 448 kb)
Supplementary Table 21
Contig list generated by de novo assembly (XLS 4580 kb)
Supplementary Table 22
Alignment results of de novo assembled contigs (XLS 67 kb)
About this article
Cite this article
Ju, Y., Kim, JI., Kim, S. et al. Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals. Nat Genet 43, 745–752 (2011). https://doi.org/10.1038/ng.872
DOI: https://doi.org/10.1038/ng.872