Cost-effective high-throughput single-haplotype iterative mapping and sequencing for complex genomic structures

Bellott, Daniel W; Cho, Ting-Jan; Hughes, Jennifer F; Skaletsky, Helen; Page, David C

doi:10.1038/nprot.2018.019

Protocol
Published: 22 March 2018

Daniel W Bellott^1,na1, Ting-Jan Cho^1,na1, Jennifer F Hughes^1, Helen Skaletsky^1,2 & David C Page^1,2,3 (ORCID: orcid.org/0000-0001-5489-6453)

Nature Protocols volume 13, pages 787–809 (2018)

Abstract

The reference sequences of structurally complex regions can be obtained only through highly accurate clone-based approaches. We and others have successfully used single-haplotype iterative mapping and sequencing (SHIMS) 1.0 to assemble structurally complex regions across the sex chromosomes of several vertebrate species and to allow for targeted improvements to the reference sequences of human autosomes. However, SHIMS 1.0 is expensive and time consuming, requiring resources that only a genome center can provide. Here we introduce SHIMS 2.0, an improved SHIMS protocol that allows even a small laboratory to generate high-quality reference sequence from complex genomic regions. Using a streamlined and parallelized library-preparation protocol, and taking advantage of inexpensive high-throughput short-read-sequencing technologies, a small laboratory with both molecular biology and bioinformatics experience can sequence and assemble 192 large-insert bacterial artificial chromosome (BAC) or fosmid clones in 1 week. In SHIMS 2.0, in contrast to other pooling strategies, each clone is sequenced with a unique barcode, thus enabling clones containing nearly identical sequences to be multiplexed in a single sequencing run and assembled separately. Relative to SHIMS 1.0, SHIMS 2.0 decreases the required cost and time by two orders of magnitude while preserving high sequencing accuracy.
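
The per-clone barcoding idea described above can be illustrated with a minimal sketch. It assumes each read has already been paired with the barcode it was sequenced with; the barcode strings, clone names, and the `demultiplex` function are hypothetical and are not taken from the protocol's own scripts. The point it shows is that two clones carrying identical sequence stay separate because reads are grouped by barcode before any assembly step.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_clone):
    """Group (barcode, sequence) pairs by the clone that each barcode identifies.

    Reads whose barcode is not in the lookup table are returned separately
    instead of being guessed at.
    """
    per_clone = defaultdict(list)
    unassigned = []
    for barcode, sequence in reads:
        clone = barcode_to_clone.get(barcode)
        if clone is None:
            unassigned.append(sequence)
        else:
            per_clone[clone].append(sequence)
    return dict(per_clone), unassigned


# Hypothetical barcodes and clone names, for illustration only.
barcode_to_clone = {"ACGTACGT": "clone_A", "TGCATGCA": "clone_B"}

reads = [
    ("ACGTACGT", "GATTACAGATTACA"),
    ("TGCATGCA", "GATTACAGATTACA"),  # same sequence, but from a different clone
    ("ACGTACGT", "CCCGGGTTTAAA"),
    ("NNNNNNNN", "GATTACA"),         # unreadable barcode
]

per_clone, unassigned = demultiplex(reads, barcode_to_clone)
```

Each list in `per_clone` would then be handed to an assembler on its own, so nearly identical sequences from different clones are never mixed in one assembly.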

**Figure 1: Overview of the SHIMS 2.0 protocol.**

**Figure 2: Example QC agarose gel with >1-kb fragments.**

**Figure 3: Example Agilent Bioanalyzer electropherogram plot.**

Acknowledgements

This work was supported by the National Institutes of Health and the Howard Hughes Medical Institute.

Author information

Daniel W Bellott and Ting-Jan Cho: These authors contributed equally to this work.

Authors and Affiliations

Whitehead Institute, Cambridge, Massachusetts, USA
Daniel W Bellott, Ting-Jan Cho, Jennifer F Hughes, Helen Skaletsky & David C Page
Howard Hughes Medical Institute, Whitehead Institute, Cambridge, Massachusetts, USA
Helen Skaletsky & David C Page
Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
David C Page

Contributions

D.W.B., H.S., J.F.H., and D.C.P. designed the study. D.W.B. and T.-J.C. developed the experimental methods. D.W.B. wrote the scripts for computational analysis. D.W.B., T.-J.C., and D.C.P. wrote the manuscript.

Corresponding author

Correspondence to Daniel W Bellott.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Table 1 (PDF 351 kb)

About this article

Cite this article

Bellott, D., Cho, TJ., Hughes, J. et al. Cost-effective high-throughput single-haplotype iterative mapping and sequencing for complex genomic structures. Nat Protoc 13, 787–809 (2018). https://doi.org/10.1038/nprot.2018.019

Issue Date: April 2018
