Abstract

Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to all available human DNA sequence and by PCR validation using the human genome diversity cell line panel. We found novel sequences present in patterns consistent with known human migration paths. Cross-species conservation analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain 19–40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly.

Acknowledgements

This project is supported by the Chinese Academy of Science (GJHZ0701-6), the National Natural Science Foundation of China (30725008; 30890032), Shenzhen local government, the Danish Platform for Integrative Biology and the Ole Rømer grant from the Danish Natural Science Research Council. L. Goodman edited the manuscript. J. Sun, M. Zhao, Y. Liu, Y. Zheng and H. Wang helped design the primers. W. Jin helped with experimental validation. San A, J. Wang, Y. Huang, M. Jian, M. Chen, Y. Huang, Xiaoli Ren, H. Liang, H. Zheng and S. Lin helped with data production.

Author information

Author notes

    Ruiqiang Li, Yingrui Li, Hancheng Zheng & Ruibang Luo

    These authors contributed equally to this work.

Affiliations

  1. BGI-Shenzhen, Shenzhen 518083, China.

    Ruiqiang Li, Yingrui Li, Hancheng Zheng, Ruibang Luo, Hongmei Zhu, Qibin Li, Wubin Qian, Yuanyuan Ren, Geng Tian, Jinxiang Li, Guangyu Zhou, Xuan Zhu, Honglong Wu, Junjie Qin, Xin Jin, Dongfang Li, Hongzhi Cao, Xueda Hu, Xiuqing Zhang, Songgang Li, Lars Bolund, Karsten Kristiansen, Huanming Yang, Jun Wang & Jian Wang

  2. Department of Biology, University of Copenhagen, Copenhagen, Denmark.

    Ruiqiang Li, Karsten Kristiansen & Jun Wang

  3. School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China.

    Hancheng Zheng, Ruibang Luo & Xin Jin

  4. Fondation Jean Dausset, Centre d'Étude du Polymorphisme Humain (CEPH), Paris, France.

    Hélène Blanche & Howard Cann

  5. Institute of Human Genetics, University of Aarhus, Aarhus, Denmark.

    Lars Bolund

  6. Genome Research Institute, Shenzhen University Medical School, Shenzhen, China.

    Honglong Wu, Dongfang Li & Hongzhi Cao

Contributions

Ruiq. L., Y.L., Ha. Z. and Ruib. L. contributed equally to this work. H.Y., Ju. W. and Ji. W. managed the project. Ju. W., Ruiq. L., L.B. and Y.L. designed the analyses. Ju. W., Ruiq. L., Y.L., Ha. Z., Ruib. L., Ho. Z., Q.L., W.Q., G.Z., H.W., J.Q., X.J., D.L., Hon. C., S.L. and K.K. performed the data analyses. H.B. and How. C. contributed the DNA samples. Y.R., X.H. and Xu. Z. performed PCR validation. G.T., J.L. and Xi. Z. performed sequencing. Ju. W., Ruiq. L., Y.L. and Ruib. L. wrote the paper.

Corresponding authors

Correspondence to Jun Wang or Jian Wang.

Supplementary information

PDF files

  1. Supplementary Information

    Supplementary Figs. 1–7, Supplementary Tables 1 and 2, and Supplementary Discussion

Zip files

  1. Supplementary Tables 3–9

  2. Supplementary Data Set 3

Text files

  1. Supplementary Data Set 1

  2. Supplementary Data Set 2

About this article

DOI

https://doi.org/10.1038/nbt.1596