Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions

This article has been updated

Abstract

Over 900 genes have been annotated within duplicated regions of the human genome, yet their functions and potential roles in disease remain largely unknown. One major obstacle has been the inability to accurately and comprehensively assay genetic variation for these genes in a high-throughput manner. We developed a sequencing-based method for rapid and high-throughput genotyping of duplicated genes using molecular inversion probes designed to target unique paralogous sequence variants. We applied this method to genotype all members of two gene families, SRGAP2 and RH, among a diversity panel of 1,056 humans. The approach could accurately distinguish copy number in paralogs having up to 99.6% sequence identity, identify small gene-disruptive deletions, detect single-nucleotide variants, define breakpoints of unequal crossover and discover regions of interlocus gene conversion. The ability to rapidly and accurately genotype multiple gene families in thousands of individuals at low cost enables the development of genome-wide gene conversion maps and 'unlocks' many previously inaccessible duplicated genes for association with human traits.
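The idea of distinguishing paralog copy number from reads that carry paralog-specific variants can be illustrated with a small sketch. This is not the authors' published software. It assumes that each probe captures all paralogs at a site, that every read can be assigned to one paralog by its variant base, and that one paralog (the hypothetical `anchor`) is fixed at two copies so that read ratios translate into absolute copy numbers. The function name, the `error_rate` smoothing term and the paralog labels `A`, `B`, `C` are all illustrative assumptions.

```python
from itertools import product
from math import log


def genotype_copy_number(site_counts, paralogs, anchor, anchor_copies=2,
                         max_copies=4, error_rate=0.01):
    """Pick the integer copy-number state that best explains paralog-specific read counts.

    site_counts: list of dicts, one per variant site, mapping paralog name -> read count.
    paralogs:    list of paralog names (must include `anchor`).
    anchor:      paralog assumed (hypothetically) to be fixed at `anchor_copies`.
    """
    others = [p for p in paralogs if p != anchor]
    best_state, best_loglik = None, float("-inf")

    # Enumerate every candidate copy-number combination for the non-anchor paralogs.
    for combo in product(range(max_copies + 1), repeat=len(others)):
        state = {anchor: anchor_copies, **dict(zip(others, combo))}
        total = sum(state.values())

        # Expected read fraction per paralog is proportional to its copy number.
        # The error term keeps the probability above zero for absent paralogs,
        # so a few misassigned reads do not rule out a true deletion.
        expected = {
            p: (1 - error_rate) * state[p] / total + error_rate / len(paralogs)
            for p in paralogs
        }

        # Multinomial log-likelihood summed over sites (constant terms dropped).
        loglik = sum(
            site.get(p, 0) * log(expected[p])
            for site in site_counts
            for p in paralogs
        )
        if loglik > best_loglik:
            best_state, best_loglik = state, loglik

    return best_state


if __name__ == "__main__":
    counts = [{"A": 200, "B": 100, "C": 1}, {"A": 190, "B": 105, "C": 0}]
    print(genotype_copy_number(counts, ["A", "B", "C"], anchor="A"))
```

Exhaustive enumeration is practical here only because a gene family has few paralogs and a small plausible copy-number range. Without the anchor assumption, read fractions alone would identify copy-number ratios but not absolute values.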

Figure 1: MIP copy-number genotyping assay for duplicated genes.
Figure 2: Accuracy of paralog-specific copy-number genotyping.
Figure 3: Resolution of complex structural variation in SRGAP2.
Figure 4: Detection of gene conversion at the RH locus.
Figure 5: Resolution of nonallelic homologous recombination (NAHR)-associated RHD deletion and duplication breakpoints.
Figure 6: Extensive interlocus gene conversion between SRGAP2 paralogs.

Accession codes

Primary accessions

Sequence Read Archive

Change history

  • 13 August 2013

    In the version of this article initially published online, the affiliation for Corrado Romano was incorrect. The correct affiliation is: Pediatrics and Medical Genetics, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Associazione Oasi Maria Santissima, Troina, Italy. The error has been corrected for the print, PDF and HTML versions of this article.

Acknowledgements

We thank J. Kitzman for early ideas and enthusiasm for the project; P. Sudmant, E. Karakoc, F. Hormozdiari, B. Dumont and O. Penn for thoughtful discussion; L. Vives, K. Mohajeri and C. Lee for technical assistance; and T. Brown for assistance with manuscript preparation. X.N. is supported by a US National Science Foundation Graduate Research Fellowship under grant no. DGE-1256082. This work was supported by US National Institutes of Health grants HG004120 and HG002385 to E.E.E. E.E.E. is supported by the Howard Hughes Medical Institute.

Author information

Contributions

X.N., J.S. and E.E.E. designed the study. X.N. and B.J.O. designed the MIPs. X.N. performed capture experiments, wrote analysis software and analyzed data. F.A. performed FISH experiments. J.H. contributed to the analysis software, prepared it for public access and identified SUNs from the reference genome. M.F. and C.R. contributed to sample collection. X.N. and E.E.E. wrote the paper, with input and approval from all coauthors.

Corresponding author

Correspondence to Evan E. Eichler.

Ethics declarations

Competing interests

E.E.E. is on the scientific advisory boards for Pacific Biosciences, Inc., SynapDx Corp. and DNAnexus, Inc.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Tables 1 and 7–10 (PDF 3550 kb)

Supplementary Table 2

MIP pooling scheme and sequences. (XLSX 25 kb)

Supplementary Table 3

Comparison of MIP-based paralog-specific copy number estimates with orthogonal data. (XLSX 22 kb)

Supplementary Table 4

Comparison of MIP-based genotyping between replicate experiments for 73 individuals where at least 1 technical replicate was performed. (XLSX 23 kb)

Supplementary Table 5

Frequencies of SRGAP2 and RH duplications, deletions, and interlocus gene conversions in 9 HapMap populations. (XLSX 16 kb)

Supplementary Table 6

Single nucleotide variants detected and genotyped for NA18507 and comparison with whole-genome sequence data. (XLSX 15 kb)

Supplementary Table 11

Comparison of MIP-based paralog-specific detected internal variants with orthogonal data. (XLSX 11 kb)

About this article

Cite this article

Nuttle, X., Huddleston, J., O'Roak, B. et al. Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions. Nat Methods 10, 903–909 (2013). https://doi.org/10.1038/nmeth.2572

  • DOI: https://doi.org/10.1038/nmeth.2572
