Long-read sequence and assembly of segmental duplications

Vollger, Mitchell R.; Dishuck, Philip C.; Sorensen, Melanie; Welch, AnneMarie E.; Dang, Vy; Dougherty, Max L.; Graves-Lindsay, Tina A.; Wilson, Richard K.; Chaisson, Mark J. P.; Eichler, Evan E.

doi:10.1038/s41592-018-0236-3

Article
Published: 17 December 2018


Mitchell R. Vollger¹ (ORCID: orcid.org/0000-0002-8651-1615),
Philip C. Dishuck¹ (ORCID: orcid.org/0000-0003-2223-9787),
Melanie Sorensen¹,
AnneMarie E. Welch¹,
Vy Dang¹,
Max L. Dougherty¹,
Tina A. Graves-Lindsay²,
Richard K. Wilson³,⁴,
Mark J. P. Chaisson⁵ &
Evan E. Eichler¹,⁶ (ORCID: orcid.org/0000-0002-8246-4014)

Nature Methods volume 16, pages 88–94 (2019)


Subjects

Genome assembly algorithms

Abstract

We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA; https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33–79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
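The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the SDA implementation: the read representation, the `min_support` cutoff, and the deterministic pivot rule are assumptions made for illustration, and the clustering step is a simplified pivot-style correlation clustering in the spirit of Ailon et al. (cited in the references).

```python
from collections import Counter
from itertools import combinations


def build_psv_graph(reads, min_support=2):
    """Return signed edges between PSVs: +1 (attraction) or -1 (repulsion).

    reads: list of (spanned, carried) pairs of sets of PSV ids, where
    `spanned` holds the PSV sites a read covers and `carried` holds the
    sites at which it shows the paralog-specific base.
    min_support is a hypothetical cutoff, not a value from the paper.
    """
    together, apart = Counter(), Counter()
    for spanned, carried in reads:
        for a, b in combinations(sorted(spanned), 2):
            has_a, has_b = a in carried, b in carried
            if has_a and has_b:
                together[(a, b)] += 1   # read supports a and b on one paralog
            elif has_a != has_b:
                apart[(a, b)] += 1      # read separates a from b
    edges = {}
    for pair in set(together) | set(apart):
        score = together[pair] - apart[pair]
        if abs(score) >= min_support:
            edges[pair] = 1 if score > 0 else -1
    return edges


def pivot_clustering(nodes, edges):
    """Group each pivot with the unclustered nodes it attracts."""
    def sign(a, b):
        return edges.get((min(a, b), max(a, b)), 0)

    remaining = sorted(nodes)
    clusters = []
    while remaining:
        pivot = remaining[0]  # deterministic pivot (assumption; often random)
        cluster = [pivot] + [n for n in remaining[1:] if sign(pivot, n) > 0]
        clusters.append(cluster)
        remaining = [n for n in remaining if n not in cluster]
    return clusters


# Toy data: three reads from one paralog (p1-p3), three from another (p4-p5).
ALL = {"p1", "p2", "p3", "p4", "p5"}
reads = [(ALL, {"p1", "p2", "p3"})] * 3 + [(ALL, {"p4", "p5"})] * 3
edges = build_psv_graph(reads)
clusters = pivot_clustering(ALL, edges)
print(clusters)
```

In this toy case the PSVs separate into two groups, one per paralog. Reads could then be partitioned by the cluster whose PSVs they carry and assembled separately, which is the step the abstract refers to as partition and assembly.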


**Fig. 1: Flowchart of the SDA method.**

**Fig. 2: SDA results of the CHM1 human genome assembly.**

**Fig. 3: Sequence and assembly of *SRGAP2* loci in the CHM13 human genome.**

**Fig. 4: Correspondence between SDA sequence-diverged contigs and BACs.**


Data availability

SMRT WGS data for CHM1, CHM13, and NA19240 from this study are available at the NCBI Sequence Read Archive (SRA) under accession numbers SRP044331 for CHM1; SRX818607, SRX825542, and SRX825575–SRX825579 for CHM13; and SRX1093000, SRX1093555, SRX1093654, SRX1094289, SRX1094374, SRX1094388, and SRX1096798 for NA19240. ONT WGS data are available at https://github.com/nanopore-wgs-consortium/NA12878/blob/master/Genome.md. De novo assemblies of CHM1, CHM13, NA19240, and NA12878 from this study are available at the NCBI Assembly database under accession numbers GCA_001297185.1, GCA_000983455.2, GCA_001524155.4, and GCA_900232925.1, respectively. Assembled CHORI-17 BACs are available at the NCBI Clone DB (https://www.ncbi.nlm.nih.gov/clone/) under the accession numbers listed in Supplementary Table 4. The length, PSVs, and GRCh38 mapping location of every SDA contig generated are listed in Supplementary Table 8. Additional data that support the findings of this study are available from the corresponding author upon request.

References

Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
Alkan, C., Sajjadian, S. & Eichler, E. E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
Seo, J. S. et al. De novo assembly and phasing of a Korean human genome. Nature 538, 243–247 (2016).
Shi, L. et al. Long-read sequencing and de novo assembly of a Chinese genome. Nat. Commun. 7, 12065 (2016).
Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat. Genet. 49, 643–650 (2017).
Gordon, D. et al. Long-read sequence assembly of the gorilla genome. Science 352, aae0344 (2016).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
Huddleston, J. et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res. 27, 677–685 (2017).
Kronenberg, Z. N. et al. High-resolution comparative analysis of great ape genomes. Science 360, eaar6343 (2018).
Kelley, D. R. & Salzberg, S. L. Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biol. 11, R28 (2010).
Pop, M. Shotgun sequence assembly. Adv. Comput. 60, 193–248 (2004).
Pevzner, P. A., Tang, H. & Waterman, M. S. An Eulerian path approach to DNA fragment assembly. Proc. Natl Acad. Sci. USA 98, 9748–9753 (2001).
Pevzner, P. A., Tang, H. & Tesler, G. De novo repeat classification and fragment assembly. Genome Res. 14, 1786–1796 (2004).
Myers, E. W. The fragment assembly string graph. Bioinformatics 21, ii79–ii85 (2005).
Stankiewicz, P. & Lupski, J. R. Genome architecture, rearrangements and genomic disorders. Trends Genet. 18, 74–82 (2002).
Sharp, A. J. et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38, 1038–1042 (2006).
Sudmant, P. H. et al. Global diversity, population stratification, and selection of human copy-number variation. Science 349, aab3761 (2015).
Chen, J. et al. Bovine NK-lysin: copy number variation and functional diversification. Proc. Natl Acad. Sci. USA 112, E7223–E7229 (2015).
Dennis, M. Y. & Eichler, E. E. Human adaptation and evolution by segmental duplication. Curr. Opin. Genet. Dev. 41, 44–52 (2016).
Abegglen, L. M. et al. Potential mechanisms for cancer resistance in elephants and comparative cellular response to DNA damage in humans. J. Am. Med. Assoc. 314, 1850–1860 (2015).
Church, D. M. et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 7, e1000112 (2009).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Emanuel, B. S. & Shaikh, T. H. Segmental duplications: an ‘expanding’ role in genomic instability and disease. Nat. Rev. Genet. 2, 791–800 (2001).
Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
Chaisson, M. J., Mukherjee, S., Kannan, S. & Eichler, E. E. Resolving multicopy duplications de novo using polyploid phasing. RECOMB 10229, 117–133 (2017).
Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002).
Ailon, N., Charikar, M. & Newman, A. Aggregating inconsistent information. J. Assoc. Comput. Mach. 55, 1–27 (2008).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Fiddes, I. T. et al. Human-specific NOTCH2NL genes affect notch signaling and cortical neurogenesis. Cell 173, 1356–1369 (2018).
Florio, M. et al. Evolution and cell-type specificity of human-specific genes preferentially expressed in progenitors of fetal neocortex. eLife 7, e32332 (2018).
Dennis, M. Y. et al. Evolution of human-specific neural SRGAP2 genes by incomplete segmental duplication. Cell 149, 912–922 (2012).
Nuttle, X. et al. Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions. Nat. Methods 10, 903–909 (2013).
Dennis, M. Y. et al. The evolution and population diversity of human-specific segmental duplications. Nat. Ecol. Evol. 1, 0069 (2017).
Steinberg, K. M. et al. High-quality assembly of an individual of Yoruban descent. bioRxiv Preprint at https://www.biorxiv.org/content/early/2016/08/02/067447 (2016).
Chaisson, M. J. P. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015).
BACPAC Resources. The CHORI-17 BAC library from a hydatidiform (haploid) mole. CloneDB https://www.ncbi.nlm.nih.gov/clone/library/genomic/76/ (2018).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Nuttle, X. et al. Emergence of a Homo sapiens–specific gene family and chromosome 16p11.2 CNV susceptibility. Nature 536, 205–209 (2016).
Dougherty, M. L. et al. Transcriptional fates of human-specific segmental duplications in brain. Genome Res. 28, 1566–1576 (2018).
Das, S. & Vikalo, H. SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming. BMC Genomics 16, 260 (2015).
Aguiar, D. & Istrail, S. Haplotype assembly in polyploid genomes and identical by descent shared tracts. Bioinformatics 29, i352–i360 (2013).
Berger, E., Yorukoglu, D., Peng, J. & Berger, B. in Research in Computational Molecular Biology: RECOMB 2014 (ed Sharan, R.) 18–19 (Springer, 2014).
Puljiz, Z. & Vikalo, H. Decoding genetic variations: communications-inspired haplotype assembly. IEEE/ACM Trans. Comput. Biol. Bioinform. 13, 518–530 (2016).
Bonizzoni, P. et al. On the minimum error correction problem for haplotype assembly in diploid and polyploid genomes. J. Comput. Biol. 23, 718–736 (2016).
Artyomenko, A. et al. Long single-molecule reads can resolve the complexity of the influenza virus composed of rare, closely related mutant variants. J. Comput. Biol. 24, 558–570 (2017).
Parsons, J. D. Miropeats: graphical DNA sequence comparisons. Comput. Appl. Biosci. 11, 615–619 (1995).
Jain, C., Koren, S., Dilthey, A., Phillippy, A. M. & Aluru, S. A fast adaptive algorithm for computing whole-genome homology maps. Bioinformatics 34, i748–i756 (2018).
Chaisson, M. J. & Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13, 238 (2012).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Patterson, M. et al. WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 22, 498–509 (2015).
Steinberg, K. M. et al. Structural diversity and African origin of the 17q21.31 inversion polymorphism. Nat. Genet. 44, 872–880 (2012).
Sudmant, P. H. et al. Diversity of human copy number variation and multicopy genes. Science 330, 641–646 (2010).


Acknowledgements

The authors thank S. Cantsilieris and D. Gordon for technical assistance, J. Underwood for recommendations regarding the analysis of HSDs and Iso-Seq data, and T. Brown for help in editing this manuscript. This work was supported, in part, by grants from the US National Institutes of Health (NIH) (HG002385 to E.E.E., HG007635 to R.K.W. and E.E.E., and HG003079 to R.K.W.). M.R.V. was supported by a National Library of Medicine (NLM) Big Data Training Grant for Genomics and Neuroscience (5T32LM012419-04). P.C.D. was supported by a National Human Genome Research Institute (NHGRI) training grant (5T32HG000035-23). E.E.E. is an investigator of the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Mitchell R. Vollger, Philip C. Dishuck, Melanie Sorensen, AnneMarie E. Welch, Vy Dang, Max L. Dougherty & Evan E. Eichler
The McDonnell Genome Institute at Washington University, Washington University School of Medicine, St. Louis, MO, USA
Tina A. Graves-Lindsay
Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, USA
Richard K. Wilson
Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
Richard K. Wilson
University of Southern California, Los Angeles, CA, USA
Mark J. P. Chaisson
Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
Evan E. Eichler


Contributions

M.R.V., M.J.P.C., and E.E.E. developed the SDA method; R.K.W. and T.A.G.-L. generated the PacBio genome sequence; M.S., A.E.W., M.R.V., and V.D. sequenced and analyzed the BAC clone insert; P.C.D., M.R.V., and M.L.D. carried out Iso-Seq analysis; M.R.V. organized the supplementary material; M.R.V., E.E.E., and M.J.P.C. wrote the manuscript; M.R.V. and P.C.D. produced the display items.

Corresponding authors

Correspondence to Mark J. P. Chaisson or Evan E. Eichler.

Ethics declarations

Competing interests

E.E.E. is on the scientific advisory board (SAB) of DNAnexus, Inc.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Proportion of resolved SDs in different PacBio (PB)/ONT genome assemblies.

The figure shows the percent of SD bases that are resolved in human genome assemblies plotted as a function of the length of minimum extension of the alignment past the duplication. The number of resolved SD base pairs is relatively constant irrespective of the requirement of flanking unique base pairs. The dashed red line indicates the threshold chosen for our analysis used to generate the first panel in Supplementary Fig. 2 and the fraction of resolved SDs in Supplementary Table 1.

Supplementary Figure 2 Resolution of SDs in SMRT genome assemblies.

a) SDs (as a function of percent identity and length) in GRCh38 are marked as resolved (black) if present in the CHM1 assembly, or unresolved (red) if they appear only in the reference. The stacked marginal histograms show the relative number of resolved and unresolved SDs within each bin. Resolved duplications are defined as those mapping with high sequence identity, being completely contained, and extending at least 50 kb into unique sequence on either side of the duplication block (Methods). See Supplementary Fig. 1 and Supplementary Table 1 for the fraction of unresolved duplications across different genomes, assemblers, and technologies. Note that resolved and unresolved SDs are offset from one another along the y-axis to avoid overlapping. b) This plot shows the number of genes that exist within unresolved SD blocks in the CHM1 assembly versus the maximum percent identity SD within that block.
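The definition of a resolved duplication given in this caption can be written as a simple predicate. The sketch below is illustrative only: the `SDAlignment` record and its field names are hypothetical, and the identity cutoff `MIN_IDENTITY` is a placeholder, because the caption says only "high sequence identity" and defers the exact value to the Methods. The 50 kb flank is taken from the caption.

```python
from dataclasses import dataclass

MIN_FLANK_BP = 50_000   # from the caption: at least 50 kb on either side
MIN_IDENTITY = 0.99     # placeholder assumption; the caption gives no number


@dataclass
class SDAlignment:
    """Hypothetical record: one contig alignment over one reference SD block."""
    sd_start: int     # SD block start on the reference
    sd_end: int       # SD block end on the reference
    aln_start: int    # start of the contig alignment on the reference
    aln_end: int      # end of the contig alignment on the reference
    identity: float   # alignment identity as a fraction


def is_resolved(aln, min_flank=MIN_FLANK_BP, min_identity=MIN_IDENTITY):
    """True if the alignment contains the SD and extends past it on both sides."""
    left_flank = aln.sd_start - aln.aln_start
    right_flank = aln.aln_end - aln.sd_end
    # A flank of at least min_flank on both sides also implies full containment.
    return (aln.identity >= min_identity
            and left_flank >= min_flank
            and right_flank >= min_flank)


spanning = SDAlignment(1_000_000, 1_200_000, 900_000, 1_300_000, 0.999)
short_left = SDAlignment(1_000_000, 1_200_000, 980_000, 1_300_000, 0.999)
print(is_resolved(spanning), is_resolved(short_left))
```

Varying `min_flank` in such a predicate corresponds to the x-axis of Supplementary Fig. 1, which plots the resolved fraction against the minimum extension past the duplication.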

Supplementary Figure 3 Length of collapsed SDs and SDA assemblies.

Correlation of collapse length and SDA assembly length in a) CHM1 (n = 590), b) CHM13 (n = 1,440), and c) NA19240 (n = 1,772) genome assemblies. In all three assemblies there is a strong correlation (Pearson’s correlation) between the length of a collapsed SD and the length of the resulting SDA assembly. SDA is not restricted to assembling duplications less than the maximum read length (like other assemblers), but rather it is restricted by the size of the collapsed duplication.

Supplementary Figure 4 Sequence and assembly of NOTCH2 loci in the CHM1 human genome.

a) A collapsed representation of a portion of the NOTCH2 loci is shown. Plotted is the read-depth profile over a collapsed representation of NOTCH2. Each black dot represents the coverage of the most frequent base pair at that position, while each red dot is the second most frequent. Secondary bases at low frequency represent sequencing error; however, those at high frequency represent PSV candidates. b) NOTCH2 PSV graph resolves the collapse into five potential loci. c) The alignment of each SDA contig back to the loci for NOTCH2 (./NLA/NLB/NLC/NLD) using Miropeats. Our assembled sequence is 99.88% identical over all five loci and >99.995% identical if only mismatched bases are counted as errors.
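The read-depth profile in panel a can be approximated by counting, at each position of the collapse, how many reads support the most frequent and second most frequent base. The following sketch assumes each pileup column is available as a string of observed bases. The `min_second` cutoff is a hypothetical stand-in for whatever depth criterion separates sequencing error from PSV candidates in the actual method.

```python
from collections import Counter


def top_two_counts(column):
    """Return read counts of the most and second most frequent base."""
    ranked = Counter(column).most_common(2)
    first = ranked[0][1] if ranked else 0
    second = ranked[1][1] if len(ranked) > 1 else 0
    return first, second


def psv_candidates(columns, min_second=10):
    """Positions whose second most frequent base is seen in >= min_second reads.

    min_second is an illustrative threshold, not a value from the paper.
    """
    return [pos for pos, col in enumerate(columns)
            if top_two_counts(col)[1] >= min_second]


# Toy pileup: position 0 shows scattered error, position 1 a candidate PSV,
# position 2 is invariant across all reads.
columns = ["A" * 58 + "C" * 2, "G" * 35 + "T" * 25, "T" * 60]
profile = [top_two_counts(col) for col in columns]
candidates = psv_candidates(columns)
print(profile, candidates)
```

Plotting the first element of each pair in black and the second in red would give a profile of the kind the caption describes.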

Supplementary Figure 5 SDA results for the CHM13 assembly.

a) SDA analysis of the CHM13 FALCON assembly generates 1,848 PSV clusters. b) Cumulative distribution of the assemblies and their percent identity to their best match in the reference. There are 40.4 Mb of diverged assembly (gray) and 43.0 Mb that map to the reference at high identity (black). c) A density plot of SDs plotted by length and percent identity. d) Copy number difference (CND) between CHM13 and the reference genome (CHM13 copy number – reference genome copy number) comparing n = 186 SD regions that match (>99.8%) versus n = 374 diverged SD regions (<99.8% identity). The mean CND of the matched sequence is 1.61 and the mean CND of the diverged sequence is 5.98, indicating that the diverged sequences are much more likely to represent additional duplicate copies that are unrepresented in the reference genome (GRCh38) (two-sided Mann-Whitney test; P = 2.77 × 10^–5). The boxes indicate the range between the first and third quartiles, with the bold line specifying the median. The whiskers show the minimum and maximum within 1.5 times the interquartile range extending from the first and third quartiles. (See Fig. 2 for more details.)
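The CND comparison in panel d amounts to subtracting the reference copy number from the sample copy number for each region and summarizing the result separately for matched and diverged regions. A minimal sketch follows, using made-up toy values rather than data from the study. Treating a region at exactly 99.8% identity as diverged is an assumption, since the caption does not say how ties are handled, and the significance test is not reproduced here.

```python
from statistics import mean

IDENTITY_THRESHOLD = 99.8  # percent identity cutoff stated in the caption


def copy_number_difference(sample_cn, reference_cn):
    """CND = sample copy number minus reference copy number."""
    return sample_cn - reference_cn


def mean_cnd_by_class(regions):
    """regions: iterable of (percent_identity, sample_cn, reference_cn)."""
    groups = {"matched": [], "diverged": []}
    for identity, sample_cn, reference_cn in regions:
        # Exactly 99.8% is counted as diverged here (assumption).
        label = "matched" if identity > IDENTITY_THRESHOLD else "diverged"
        groups[label].append(copy_number_difference(sample_cn, reference_cn))
    return {label: mean(values) for label, values in groups.items() if values}


# Toy regions (hypothetical values, not taken from the paper).
regions = [(99.95, 4, 3), (99.90, 2, 2), (99.10, 9, 3), (98.70, 6, 2)]
summary = mean_cnd_by_class(regions)
print(summary)
```

A larger mean CND in the diverged group, as reported in the caption, is what one would expect if those contigs represent paralogs missing from the reference.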

Supplementary Figure 6 SDA results for the NA19240 (African Yoruban) assembly.

a) SDA analysis of the NA19240 FALCON assembly generates 2,136 PSV clusters. b) Cumulative distribution of the assemblies and their percent identity to their best match in the reference. There are 46.1 Mb of diverged assembly (gray) and 41.0 Mb that map to the reference at high identity (black). c) A density plot of SDs plotted by length and percent identity. d) CND between NA19240 and the reference genome (NA19240 copy number – reference genome copy number) comparing n = 177 SD regions that match (>99.8%) versus n = 384 diverged SD regions (<99.8% identity). The mean CND of the matched sequence is 4.11 and the mean CND of the diverged sequence is 10.87, indicating that the diverged sequences are much more likely to represent additional duplicate copies that are unrepresented in the reference genome (GRCh38) (two-sided Mann-Whitney test; P = 1.88 × 10^–4). The boxes indicate the range between the first and third quartiles, with the bold line specifying the median. The whiskers show the minimum and maximum within 1.5 times the interquartile range extending from the first and third quartiles. (See Fig. 2 for more details.)
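The box and whisker convention described in these captions (quartile box, whiskers at the most extreme values within 1.5 times the interquartile range) can be computed as follows. This is a sketch of the convention only. The quartile interpolation method is an assumption, since plotting packages differ on it and the captions do not name one.

```python
from statistics import quantiles


def box_summary(values):
    """Quartiles plus whisker ends at the extremes inside the 1.5 * IQR fences."""
    # "inclusive" interpolation is one common choice (assumption).
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Toy CND values with one outlier (hypothetical, not from the paper).
values = [1, 2, 3, 4, 5, 6, 7, 8, 30]
box = box_summary(values)
print(box)
```

In the toy data the value 30 falls outside the upper fence, so the upper whisker stops at 8 and 30 would be drawn as an outlier.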

Supplementary Figure 7 Comparison of SDA on ONT versus SMRT data.

The left half of the figure shows the results of SDA applied to the ONT assembly of NA12878; on the right is the PacBio assembly of NA19240. a) SDA analysis of the NA12878 assembly generated 38 assemblies that mapped with >99.8% identity (matched) to GRCh38 and 792 mapped with <99.8% sequence identity (diverged). Failed clusters (n = 1,052) did not result in an assembly, while multiple assemblies were PSV clusters with more than one contig produced by the Canu assembly. b) Cumulative distribution of the assemblies and their percent identity to their best match in the reference. The number of assembly Mb is calculated independently of a mapping to the reference. c) Length distribution of the matched and diverged assemblies (NA12878: matched n = 38, diverged n = 792; NA19240: matched n = 789, diverged n = 983). The lines on the violin plots indicate the first and third quartiles as well as the median. d) Sequencing read-depth distribution of the second most common SNV across all collapsed regions of SDs.

Supplementary Figure 8 Sequence and assembly of a missing 16p12.1 duplication.

The Miropeats alignments compare a BAC-based tiling path assembly of CHM1 (top line), the human reference genome (GRCh38) (middle line), and a de novo assembly of CHM1 where SDA was applied (bottom line). The A/C duplication (red/blue) proposed by Sudmant et al. that is present in most humans was correctly assembled using SDA and matches at high sequence identity (99.9%) to the BAC-based assembly structure.

Supplementary Figure 9 Mapping differential of transcripts between SDA and de novo CHM13.

The percent identity differential of the mapping of full-length Iso-Seq transcripts (n = 14,562) from human-specific segmental duplications (HSDs) to both the de novo assembly of CHM13 and the SDA results on CHM13 is shown. In total, 11 gene families showed significantly (P < 0.001, two-sided Wilcoxon signed-rank test) improved mapping to the SDA-resolved contigs. The boxes indicate the range between the first and third quartiles, with the bold line specifying the median. The whiskers show the minimum and maximum within 1.5 times the interquartile range extending from the first and third quartiles.

Supplementary Figure 10 Multiple sequence alignment (MSA) between GRCh38 GPRIN2 and SDA GPRIN2A/B.

Shown is the amino acid MSA between the copies of GPRIN2 resolved by SDA and the copy of GPRIN2 in GRCh38. Of the 15 differences in the MSA, 12 are annotated in dbSNP as variants in GPRIN2 when they are in fact differences between GPRIN2A and GPRIN2B. At p.Ser104Gly, p.Arg242Gly, and p.Val375Ala, the reference has the minor allele. Supplementary Table 7 shows the allele frequencies for all variants seen in this alignment.

Supplementary Figure 11 CHM1 SDA contigs that overlap with unique sequence.

This ideogram shows where SDA contigs could extend the FALCON assembly. The bottom panel of each chromosome shows the FALCON assembly (contigs > 1 Mb (dark blue), contigs < 1 Mb (light blue)). The top panel shows where SDA contigs with unique overlaps map along the reference (contigs with > 10 kb of overlap (green), contig with < 10 kb (red)).

Supplementary Figure 12 PSV graph without attraction edges.

Reproduced above is the PSV graph shown in Fig. 3 for SRGAP2. The left-hand side shows the attraction edges used in correlation clustering (CC). On the right-hand side, the edges are removed so that the transparency of the nodes is visible. The opacity of each node scales from 0.25 to 1, with 0.25 reflecting the start position on the contig and 1 representing the final position on the contig.
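The opacity scale described here can be expressed as a mapping from a PSV's position on the contig to a value between 0.25 and 1. The sketch below assumes the scaling is linear in position, which the caption does not state explicitly. The function name and arguments are hypothetical.

```python
def node_opacity(position, contig_length, low=0.25, high=1.0):
    """Map a 0-based contig position to an opacity in [low, high].

    Linear interpolation is an assumption; the caption gives only the
    endpoints (0.25 at the start of the contig, 1 at the end).
    """
    if contig_length <= 1:
        return high
    fraction = position / (contig_length - 1)
    return low + (high - low) * fraction


start = node_opacity(0, 101)
middle = node_opacity(50, 101)
end = node_opacity(100, 101)
print(start, middle, end)
```

With this mapping, nodes from the same paralog should show a smooth gradient of opacity along the contig, which is what the right-hand panel is meant to make visible.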


About this article

Cite this article

Vollger, M.R., Dishuck, P.C., Sorensen, M. et al. Long-read sequence and assembly of segmental duplications. Nat Methods 16, 88–94 (2019). https://doi.org/10.1038/s41592-018-0236-3


Received: 01 June 2018
Accepted: 30 October 2018
Published: 17 December 2018
Issue Date: January 2019
DOI: https://doi.org/10.1038/s41592-018-0236-3
