Using population admixture to help complete maps of the human genome

Abstract

Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces using the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed at the RNA level and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.
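The core idea in the abstract can be illustrated with a toy sketch. In an admixed population, a variant whose allele frequency differs between the ancestral populations tends to track local ancestry most strongly at its own genomic position, so a variant on an unplaced scaffold can in principle be localized by scanning the genome for the window whose local ancestry best correlates with its genotypes. The Python below is an illustration under simplifying assumptions, not the authors' pipeline: the function names (`pearson`, `best_window`, `simulate`), the two-ancestry model, the fixed switch probability between windows and the 0.9 versus 0.1 allele frequencies are all hypothetical choices made for this example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no signal
    return sxy / sqrt(sxx * syy)


def best_window(local_ancestry, dosage):
    """Return (index, scores) of the window whose ancestry best tracks the variant.

    local_ancestry[w][i]: copies (0, 1 or 2) of ancestry A carried by
    individual i in genomic window w.
    dosage[i]: allele count (0, 1 or 2) of the unplaced variant in individual i.
    """
    # Absolute value: the allele may be enriched in either ancestry.
    scores = [abs(pearson(window, dosage)) for window in local_ancestry]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def simulate(n_people=400, n_windows=20, true_window=7, switch=0.1, seed=0):
    """Toy admixed sample (hypothetical parameters, for illustration only)."""
    rng = random.Random(seed)
    local_ancestry = [[0] * n_people for _ in range(n_windows)]
    dosage = [0] * n_people
    for i in range(n_people):
        for _ in range(2):  # two haplotypes per person
            state = rng.random() < 0.5  # True means ancestry A
            for w in range(n_windows):
                if w > 0 and rng.random() < switch:
                    state = not state  # ancestry switch between windows
                local_ancestry[w][i] += int(state)
                if w == true_window:
                    # Allele frequency depends on ancestry at the true location.
                    freq = 0.9 if state else 0.1
                    dosage[i] += int(rng.random() < freq)
    return local_ancestry, dosage


if __name__ == "__main__":
    ancestry, dosage = simulate()
    window, scores = best_window(ancestry, dosage)
    print("best-scoring window:", window)
```

A real analysis would need inferred rather than known local ancestry, many variants per scaffold and a proper significance threshold; the sketch only shows why ancestry correlation peaks near the true location.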

Figure 1: Admixture mapping of the human genome's missing pieces.
Figure 2: Approximate locations of previously unplaced genome sequence scaffolds that were mapped by our approach.
Figure 3: Cryptic paralogs of the PRIM2 gene.
Figure 4: FISH analysis confirmed the presence of cryptic segmental duplications.
Figure 5: Expression of cryptic gene paralogs from pericentromeric regions of the human genome.

Acknowledgements

This study was supported by grants RC1 GM091332-01 (S.A.M. and J.G.W.), R01 HG006855 (S.A.M.) and R01DK54931 (G.G. and M.R.P.) from the US National Institutes of Health and by a Smith Family Foundation Award for Excellence in Biomedical Research (S.A.M.).

The Jackson Heart Study is supported and conducted in collaboration with Jackson State University (N01-HC-95170), University of Mississippi Medical Center (N01-HC-95171) and Tougaloo College (N01-HC-95172) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute for Minority Health and Health Disparities (NIMHD), with additional support from the National Institute on Biomedical Imaging and Bioengineering (NIBIB).

The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by NHLBI contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C and HHSN268201100012C).

The Coronary Artery Risk Development in Young Adults Study (CARDIA) is conducted and supported by the NHLBI in collaboration with the University of Alabama at Birmingham (N01-HC95095 and N01-HC48047), the University of Minnesota (N01-HC48048), Northwestern University (N01-HC48049) and the Kaiser Foundation Research Institute (N01-HC48050).

MESA, MESA Family and the MESA SHARe project are conducted and supported by the NHLBI in collaboration with the MESA investigators. Support for MESA is provided by contracts N01-HC-95159 through N01-HC-95169 and RR-024156. Funding for MESA Family is provided by grants R01-HL-071051, R01-HL-071205, R01-HL-071250, R01-HL-071251, R01-HL-071252, R01-HL-071258 and R01-HL-071259. MESA Air is funded by the US Environmental Protection Agency (EPA)–Science to Achieve Results (STAR) Program Grant RD831697. Funding for genotyping was provided by NHLBI contracts N02-HL-6-4278 and N01-HC-65226.

This manuscript does not necessarily reflect the opinions or views of ARIC, CARDIA, JHS, MESA or the NHLBI.

Author information

Contributions

G.G. and S.A.M. conceived the project, designed the analyses and wrote the manuscript. G.G. performed the analysis of the CARe, ICDB, JHS and BodyMap 2.0 data sets. R.E.H. performed the sequence read depth analysis of selected regions. H.L. performed the alignments of HuRef scaffolds and GenBank clones. N.A. contributed the analysis of the HuRef unplaced scaffolds. A.M.L. performed the FISH experiments. K.C. organized and contributed to the design of the Sequenom experiment. B.P., A.L.P. and D.R. provided advice for the local ancestry inference. C.C.M. participated in the interpretation of the FISH experiments. M.R.P. participated in planning discussions for the linkage analysis. J.G.W. participated in planning discussions, coordinated interactions with JHS and edited the manuscript.

Corresponding authors

Correspondence to Giulio Genovese or Steven A McCarroll.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Tables 1–13 and Supplementary Figures 1–24 (PDF 1973 kb)

About this article

Cite this article

Genovese, G., Handsaker, R., Li, H. et al. Using population admixture to help complete maps of the human genome. Nat Genet 45, 406–414 (2013). https://doi.org/10.1038/ng.2565

  • DOI: https://doi.org/10.1038/ng.2565
