Abstract
Here we integrate the de novo assemblies of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified ∼5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual- or population-specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present in patterns consistent with known human migration paths. Cross-species conservation analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain ∼19–40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly.
Accession codes
Primary accessions
GenBank/EMBL/DDBJ
Acknowledgements
This project is supported by the Chinese Academy of Science (GJHZ0701-6), the National Natural Science Foundation of China (30725008; 30890032), the Shenzhen local government, the Danish Platform for Integrative Biology and the Ole Rømer grant from the Danish Natural Science Research Council. L. Goodman edited the manuscript. J. Sun, M. Zhao, Y. Liu, Y. Zheng and H. Wang helped design the primers. W. Jin helped with experimental validation. San A, J. Wang, Y. Huang, M. Jian, M. Chen, Y. Huang, Xiaoli Ren, H. Liang, H. Zheng and S. Lin helped with data production.
Author information
Contributions
Ruiq. L., Y.L., Ha. Z. and Ruib. L. contributed equally to this work. H.Y., Ju. W. and Ji. W. managed the project. Ju. W., Ruiq. L., L.B. and Y.L. designed the analyses. Ju. W., Ruiq. L., Y.L., Ha. Z., Ruib. L., Ho. Z., Q.L., W.Q., G.Z., H.W., J.Q., X.J., D.L., Hon. C., S.L. and K.K. performed the data analyses. H.B. and How. C. contributed the DNA samples. Y.R., X.H. and Xu. Z. performed PCR validation. G.T., J. L. and Xi. Z. performed sequencing. Ju. W., Ruiq. L., Y.L. and Ruib. L. wrote the paper.
Supplementary information
Supplementary Figs. 1–7, Supplementary Tables 1 and 2, and Supplementary Discussion (PDF 498 kb)
About this article
Cite this article
Li, R., Li, Y., Zheng, H. et al. Building the sequence map of the human pan-genome. Nat Biotechnol 28, 57–63 (2010). https://doi.org/10.1038/nbt.1596
DOI: https://doi.org/10.1038/nbt.1596