Abstract
Over 900 genes have been annotated within duplicated regions of the human genome, yet their functions and potential roles in disease remain largely unknown. One major obstacle has been the inability to accurately and comprehensively assay genetic variation for these genes in a high-throughput manner. We developed a sequencing-based method for rapid and high-throughput genotyping of duplicated genes using molecular inversion probes designed to target unique paralogous sequence variants. We applied this method to genotype all members of two gene families, SRGAP2 and RH, among a diversity panel of 1,056 humans. The approach could accurately distinguish copy number in paralogs having up to ∼99.6% sequence identity, identify small gene-disruptive deletions, detect single-nucleotide variants, define breakpoints of unequal crossover and discover regions of interlocus gene conversion. The ability to rapidly and accurately genotype multiple gene families in thousands of individuals at low cost enables the development of genome-wide gene conversion maps and 'unlocks' many previously inaccessible duplicated genes for association with human traits.
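As a rough illustration of what paralog-specific copy number genotyping involves, the sketch below converts read counts at paralog-distinguishing variant sites into copy number estimates. It is not the authors' analysis software. It assumes that one paralog has a known, fixed copy number to serve as a reference and that all paralogs are captured with equal efficiency. The function name, the paralog labels and the read counts are hypothetical.

```python
def estimate_paralog_copies(psv_read_counts, reference_paralog, reference_copies=2):
    """Estimate copy number per paralog from reads supporting its specific variants.

    psv_read_counts: dict mapping paralog name -> number of reads carrying that
        paralog's distinguishing variants (hypothetical input format).
    reference_paralog: paralog assumed to be present at a fixed copy number.
    reference_copies: assumed copy number of the reference paralog (diploid = 2).

    Returns a dict mapping paralog name -> (raw estimate, rounded integer estimate).
    """
    ref_reads = psv_read_counts[reference_paralog]
    if ref_reads <= 0:
        raise ValueError("reference paralog has no supporting reads")

    estimates = {}
    for paralog, reads in psv_read_counts.items():
        # Scale each paralog's read support by the reads-per-copy of the reference.
        raw = reference_copies * reads / ref_reads
        estimates[paralog] = (raw, round(raw))
    return estimates


# Made-up counts for one individual across three paralogs.
counts = {"paralog_A": 200, "paralog_B": 210, "paralog_C": 95}
result = estimate_paralog_copies(counts, reference_paralog="paralog_A")
for name, (raw, called) in result.items():
    print(f"{name}: raw={raw:.2f}, called copies={called}")
```

A real assay would also need to correct for probe-specific capture bias and sequencing error, and would combine evidence across many probes per paralog rather than relying on a single pooled count.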
Change history
13 August 2013
In the version of this article initially published online, the affiliation for Corrado Romano was incorrect. The correct affiliation is: Pediatrics and Medical Genetics, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Associazione Oasi Maria Santissima, Troina, Italy. The error has been corrected for the print, PDF and HTML versions of this article.
Acknowledgements
We thank J. Kitzman for early ideas and enthusiasm for the project; P. Sudmant, E. Karakoc, F. Hormozdiari, B. Dumont and O. Penn for thoughtful discussion; L. Vives, K. Mohajeri and C. Lee for technical assistance; and T. Brown for assistance with manuscript preparation. X.N. is supported by a US National Science Foundation Graduate Research Fellowship under grant no. DGE-1256082. This work was supported by US National Institutes of Health grants HG004120 and HG002385 to E.E.E. E.E.E. is supported by the Howard Hughes Medical Institute.
Author information
Contributions
X.N., J.S. and E.E.E. designed the study. X.N. and B.J.O. designed the MIPs. X.N. performed capture experiments, wrote analysis software and analyzed data. F.A. performed FISH experiments. J.H. contributed to the analysis software, prepared it for public access and identified SUNs from the reference genome. M.F. and C.R. contributed to sample collection. X.N. and E.E.E. wrote the paper, with input and approval from all coauthors.
Ethics declarations
Competing interests
E.E.E. is on the scientific advisory boards for Pacific Biosciences, Inc., SynapDx Corp. and DNAnexus, Inc.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–10 and Supplementary Tables 1 and 7–10 (PDF 3550 kb)
Supplementary Table 2
MIP pooling scheme and sequences. (XLSX 25 kb)
Supplementary Table 3
Comparison of MIP-based paralog-specific copy number estimates with orthogonal data. (XLSX 22 kb)
Supplementary Table 4
Comparison of MIP-based genotyping between replicate experiments for 73 individuals where at least 1 technical replicate was performed. (XLSX 23 kb)
Supplementary Table 5
Frequencies of SRGAP2 and RH duplications, deletions, and interlocus gene conversions in 9 HapMap populations. (XLSX 16 kb)
Supplementary Table 6
Single nucleotide variants detected and genotyped for NA18507 and comparison with whole-genome sequence data. (XLSX 15 kb)
Supplementary Table 11
Comparison of MIP-based paralog-specific detected internal variants with orthogonal data. (XLSX 11 kb)
About this article
Cite this article
Nuttle, X., Huddleston, J., O'Roak, B. et al. Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions. Nat Methods 10, 903–909 (2013). https://doi.org/10.1038/nmeth.2572
DOI: https://doi.org/10.1038/nmeth.2572