Insights into variation in meiosis from 31,228 human sperm genomes

Abstract

Meiosis, although essential for reproduction, is also variable and error-prone: rates of chromosome crossover vary among gametes, between the sexes, and among humans of the same sex, and chromosome missegregation leads to abnormal chromosome numbers (aneuploidy)1,2,3,4,5,6,7,8. To study diverse meiotic outcomes and how they covary across chromosomes, gametes and humans, we developed Sperm-seq, a way of simultaneously analysing the genomes of thousands of individual sperm. Here we analyse the genomes of 31,228 human gametes from 20 sperm donors, identifying 813,122 crossovers and 787 aneuploid chromosomes. Sperm donors had aneuploidy rates ranging from 0.01 to 0.05 aneuploidies per gamete; crossovers partially protected chromosomes from nondisjunction at the meiosis I cell division. Some chromosomes and donors underwent more-frequent nondisjunction during meiosis I, and others showed more meiosis II segregation failures. Sperm genomes also manifested many genomic anomalies that could not be explained by simple nondisjunction. Diverse recombination phenotypes—from crossover rates to crossover location and separation, a measure of crossover interference—covaried strongly across individuals and cells. Our results can be incorporated with earlier observations into a unified model in which a core mechanism, the variable physical compaction of meiotic chromosomes, generates interindividual and cell-to-cell variation in diverse meiotic phenotypes.

Fig. 1: Overview of Sperm-seq.
Fig. 2: Variation in crossover positioning and crossover separation (interference).
Fig. 3: Aneuploidy in sperm from 20 sperm donors.

Data availability

Crossover and aneuploidy data (individual events and counts per donor and/or per cell), including the source data underlying Figs. 2, 3b–e and Extended Data Figs. 5–9, are available via Zenodo at https://doi.org/10.5281/zenodo.2581570. Raw sequence data are available in the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra) via the Database of Genotypes and Phenotypes (dbGaP) (https://www.ncbi.nlm.nih.gov/gap/) for general research use upon application and approval (study accession number phs001887.v1.p1).

Code availability

Analysis scripts and documentation are available via Zenodo at https://doi.org/10.5281/zenodo.2581595.

References

1. Broman, K. W. & Weber, J. L. Characterization of human crossover interference. Am. J. Hum. Genet. 66, 1911–1926 (2000).
2. Coop, G., Wen, X., Ober, C., Pritchard, J. K. & Przeworski, M. High-resolution mapping of crossovers reveals extensive variation in fine-scale recombination patterns among humans. Science 319, 1395–1398 (2008).
3. Halldorsson, B. V. et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043 (2019).
4. Kong, A. et al. A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247 (2002).
5. Kong, A. et al. Common and low-frequency variants associated with genome-wide recombination rate. Nat. Genet. 46, 11–16 (2014).
6. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099–1103 (2010).
7. Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321–324 (2005).
8. Nagaoka, S. I., Hassold, T. J. & Hunt, P. A. Human aneuploidy: mechanisms and new insights into an age-old problem. Nat. Rev. Genet. 13, 493–504 (2012).
9. Broman, K. W., Murray, J. C., Sheffield, V. C., White, R. L. & Weber, J. L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998).
10. Cheung, V. G., Burdick, J. T., Hirschmann, D. & Morley, M. Polymorphic variation in human meiotic recombination. Am. J. Hum. Genet. 80, 526–530 (2007).
11. Chowdhury, R., Bois, P. R., Feingold, E., Sherman, S. L. & Cheung, V. G. Genetic analysis of variation in human meiotic recombination. PLoS Genet. 5, e1000648 (2009).
12. Fledel-Alon, A. et al. Variation in human recombination rates and its genetic determinants. PLoS ONE 6, e20321 (2011).
13. Brown, P. W. et al. Meiotic synapsis proceeds from a limited number of subtelomeric sites in the human male. Am. J. Hum. Genet. 77, 556–566 (2005).
14. Gruhn, J. R. et al. Correlations between synaptic initiation and meiotic recombination: a study of humans and mice. Am. J. Hum. Genet. 98, 102–115 (2016).
15. Gruhn, J. R., Rubio, C., Broman, K. W., Hunt, P. A. & Hassold, T. Cytological studies of human meiosis: sex-specific differences in recombination originate at, or prior to, establishment of double-strand breaks. PLoS ONE 8, e85075 (2013).
16. Baudat, F. & de Massy, B. Regulating double-stranded DNA break repair towards crossover or non-crossover during mammalian meiosis. Chromosome Res. 15, 565–577 (2007).
17. Plug, A. W., Xu, J., Reddy, G., Golub, E. I. & Ashley, T. Presynaptic association of Rad51 protein with selected sites in meiotic chromatin. Proc. Natl Acad. Sci. USA 93, 5920–5924 (1996).
18. Ioannou, D., Fortun, J. & Tempest, H. G. Meiotic nondisjunction and sperm aneuploidy in humans. Reproduction 157, R15–R31 (2018).
19. Templado, C., Uroz, L. & Estop, A. New insights on the origin and relevance of aneuploidy in human spermatozoa. Mol. Hum. Reprod. 19, 634–643 (2013).
20. Lynn, A. et al. Covariation of synaptonemal complex length and mammalian meiotic exchange rates. Science 296, 2222–2225 (2002).
21. Wang, S. et al. Per-nucleus crossover covariation and implications for evolution. Cell 177, 326–338 (2019).
22. Hou, Y. et al. Genome analyses of single human oocytes. Cell 155, 1492–1506 (2013).
23. Kirkness, E. F. et al. Sequencing of isolated sperm cells for direct haplotyping of a human genome. Genome Res. 23, 826–832 (2013).
24. Lu, S. et al. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science 338, 1627–1630 (2012).
25. Ottolini, C. S. et al. Genome-wide maps of recombination and chromosome segregation in human oocytes and embryos show selection for maternal recombination rates. Nat. Genet. 47, 727–735 (2015).
26. Wang, J., Fan, H. C., Behr, B. & Quake, S. R. Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150, 402–412 (2012).
27. Miller, D., Brinkworth, M. & Iles, D. Paternal DNA packaging in spermatozoa: more than the sum of its parts? DNA, histones, protamines and epigenetics. Reproduction 139, 287–301 (2010).
28. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
29. Zheng, G. X. et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 34, 303–311 (2016).
30. Campbell, C. L., Furlotte, N. A., Eriksson, N., Hinds, D. & Auton, A. Escape from crossover interference increases with maternal age. Nat. Commun. 6, 6260 (2015).
31. Berg, I. L. et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat. Genet. 42, 859–863 (2010).
32. Myers, S., Freeman, C., Auton, A., Donnelly, P. & McVean, G. A common sequence motif associated with recombination hot spots and genome instability in humans. Nat. Genet. 40, 1124–1129 (2008).
33. Hinch, A. G. et al. Factors influencing meiotic recombination revealed by whole-genome sequencing of single sperm. Science 363, eaau8861 (2019).
34. Housworth, E. A. & Stahl, F. W. Crossover interference in humans. Am. J. Hum. Genet. 73, 188–197 (2003).
35. Sun, F. et al. Human male recombination maps for individual chromosomes. Am. J. Hum. Genet. 74, 521–531 (2004).
36. Oliver, T. R. et al. Investigation of factors associated with paternal nondisjunction of chromosome 21. Am. J. Med. Genet. A. 149A, 1685–1690 (2009).
37. Page, S. L. & Hawley, R. S. Chromosome choreography: the meiotic ballet. Science 301, 785–789 (2003).
38. Sun, F. et al. The relationship between meiotic recombination in human spermatocytes and aneuploidy in sperm. Hum. Reprod. 23, 1691–1697 (2008).
39. Ferguson, K. A., Wong, E. C., Chow, V., Nigro, M. & Ma, S. Abnormal meiotic recombination in infertile men and its association with sperm aneuploidy. Hum. Mol. Genet. 16, 2870–2879 (2007).
40. Ma, S., Ferguson, K. A., Arsovska, S., Moens, P. & Chow, V. Reduced recombination associated with the production of aneuploid sperm in an infertile man: a case report. Hum. Reprod. 21, 980–985 (2006).
41. Savage, A. R. et al. Elucidating the mechanisms of paternal non-disjunction of chromosome 21 in humans. Hum. Mol. Genet. 7, 1221–1227 (1998).
42. Tease, C. & Hultén, M. A. Inter-sex variation in synaptonemal complex lengths largely determine the different recombination rates in male and female germ cells. Cytogenet. Genome Res. 107, 208–215 (2004).
43. Wang, S., Zickler, D., Kleckner, N. & Zhang, L. Meiotic crossover patterns: obligatory crossover, interference and homeostasis in a single process. Cell Cycle 14, 305–314 (2015).
44. Zhang, L., Liang, Z., Hutchinson, J. & Kleckner, N. Crossover patterning by the beam-film model: analysis and implications. PLoS Genet. 10, e1004042 (2014).
45. Zhang, L. et al. Topoisomerase II mediates meiotic crossover interference. Nature 511, 551–556 (2014).
46. Billings, T. et al. Patterns of recombination activity on mouse chromosome 11 revealed by high resolution mapping. PLoS ONE 5, e15340 (2010).
47. Petkov, P. M., Broman, K. W., Szatkiewicz, J. P. & Paigen, K. Crossover interference underlies sex differences in recombination rates. Trends Genet. 23, 539–542 (2007).
48. Bell, A. D., Mello, C. J. & McCarroll, S. A. Sperm-seq wet lab protocol: sperm preparation and droplet-based sequencing library generation. Protoc. Exch. https://doi.org/10.21203/rs.3.pex-823/v1 (2020).
49. Bell, A. D. et al. Analysis scripts for: Insights about variation in meiosis from 31,228 human sperm genomes. Zenodo, https://doi.org/10.5281/zenodo.2581595 (2019).
50. Bell, A. D. et al. Recombination and aneuploidy data for: Insights about variation in meiosis from 31,228 human sperm genomes. Zenodo, https://doi.org/10.5281/zenodo.2581570 (2019).
51. Bell, A. D., Usher, C. L. & McCarroll, S. A. Analyzing copy number variation with droplet digital PCR. Methods Mol. Biol. 1768, 143–160 (2018).
52. Regan, J. F. et al. A rapid molecular approach for chromosomal phasing. PLoS ONE 10, e0118270 (2015).
53. Montag, M., Tok, V., Liow, S. L., Bongso, A. & Ng, S. C. In vitro decondensation of mammalian sperm and subsequent formation of pronuclei-like structures for micromanipulation. Mol. Reprod. Dev. 33, 338–346 (1992).
54. Samocha-Bone, D. et al. In-vitro human spermatozoa nuclear decondensation assessed by flow cytometry. Mol. Hum. Reprod. 4, 133–137 (1998).
55. Taylor, A. C. Titration of heparinase for removal of the PCR-inhibitory effect of heparin in DNA samples. Mol. Ecol. 6, 383–385 (1997).
56. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
57. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
58. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
59. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
60. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
61. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
62. Tyner, C. et al. The UCSC Genome Browser database: 2017 update. Nucleic Acids Res. 45 (D1), D626–D634 (2017).
63. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
64. Bansal, V. & Bafna, V. HapCUT: an efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics 24, i153–i159 (2008).
65. Selvaraj, S., Dixon, R. J., Bansal, V. & Ren, B. Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing. Nat. Biotechnol. 31, 1111–1118 (2013).
66. Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
67. Loh, P. R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).
68. Handsaker, R. E., Korn, J. M., Nemesh, J. & McCarroll, S. A. Discovery and genotyping of genome structural polymorphism by sequencing on a population scale. Nat. Genet. 43, 269–276 (2011).
69. Handsaker, R. E. et al. Large multiallelic copy number variations in humans. Nat. Genet. 47, 296–303 (2015).
70. Hyndman, R. J. & Khandakar, Y. Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 26, 1–22 (2008).
71. Hyndman, R. et al. forecast: forecasting functions for time series and linear models. R package version 8.4, http://pkg.robjhyndman.com/forecast (2018).
72. Kauppi, L. et al. Distinct properties of the XY pseudoautosomal region crucial for male meiosis. Science 331, 916–920 (2011).
73. Kleckner, N., Storlazzi, A. & Zickler, D. Coordinate variation in meiotic pachytene SC length and total crossover/chiasma frequency under conditions of constant DNA length. Trends Genet. 19, 623–628 (2003).
74. Revenkova, E. et al. Cohesin SMC1 beta is required for meiotic chromosome dynamics, sister chromatid cohesion and DNA recombination. Nat. Cell Biol. 6, 555–562 (2004).
75. Zickler, D. & Kleckner, N. Meiotic chromosomes: integrating structure and function. Annu. Rev. Genet. 33, 603–754 (1999).
76. Wang, S. et al. Inefficient crossover maturation underlies elevated aneuploidy in human female meiosis. Cell 168, 977–989 (2017).
77. Blat, Y., Protacio, R. U., Hunter, N. & Kleckner, N. Physical and functional interactions among basic chromosome organizational features govern early steps of meiotic chiasma formation. Cell 111, 791–802 (2002).

Acknowledgements

We thank G. Genovese for suggestions on analyses; E. Macosko for advice on technology development; other members of the McCarroll laboratory, including C. Whelan, S. Burger and B. Handsaker, for advice; M. Daly, J. Hirschhorn, S. Elledge and S. Schilit for their insights; 10× Genomics for discussions about reagents; C. L. Usher and C. K. Patil for contributions to the text and figures; and those who commented on the preprint version of this article for their input. This work was supported by grant R01 HG006855 to S.A.M.; a Broad Institute NextGen award to S.A.M.; and a Harvard Medical School Program in Genetics and Genomics NIH Ruth L. Kirschstein training grant to A.D.B.

Author information

Contributions

A.D.B. and S.A.M. conceived and led the studies. A.D.B., S.A.M. and C.J.M. developed the experimental methods. A.D.B. and C.J.M. performed all experiments, generating all data. A.D.B. and S.A.M. designed the strategies for crossover and aneuploidy analysis, and A.D.B. performed the crossover and aneuploidy analyses. A.D.B., J.N. and A.W. wrote the sequence and variant processing software, pipelines and analytical methods. A.D.B. wrote the software for crossover calling and analysis. A.D.B. and S.A.B. wrote the software for aneuploidy calling. A.D.B. and S.A.M. wrote the manuscript with contributions from all authors.

Corresponding authors

Correspondence to Avery Davis Bell or Steven A. McCarroll.

Ethics declarations

Competing interests

A.D.B. and S.A.M. are inventors on a United States Provisional Patent application (PCT/US2019/029427; applicant: President and Fellows of Harvard College), currently in the PCT stage, relating to droplet-based genomic DNA capture, amplification and sequencing that is capable of obtaining high-throughput single-cell sequence from individual mammalian cells, including sperm cells. A.D.B. was an occasional consultant for Ohana Biosciences between October 2019 and March 2020. The other authors declare no competing interests.

Additional information

Peer review information Nature thanks Donald Conrad, Beth Dumont, Augustine Kong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Characterization of egg-mimic sperm preparation and optimization of bead-based single-sperm sequencing.

a–c, Two-channel fluorescence plots showing the results of ddPCR with the input template noted above each panel, demonstrating that two loci (from different chromosomes) are detectable in the same droplet far more often when sperm DNA florets (rather than purified DNA) are used as input. Each point represents one droplet. Grey points (bottom left) represent droplets in which neither template molecule was detected; blue points (top left) represent droplets in which the assay detected a template molecule for the locus on chromosome 7; green points (bottom right) represent droplets in which the assay detected a template molecule for the locus on chromosome 10; and brown points (top right) represent droplets in which both loci were detected. With a high concentration of purified DNA as input (a), comparatively fewer droplets contain both loci than when untreated (b) or treated (c) sperm were used as input. Sperm ‘florets’ treated with the egg-mimicking decondensation protocol had a much higher fraction of droplets containing both loci than did purified DNA (compare a with c, left) and had more-sensitive ascertainment and cleaner results (quadrant separation) than untreated sperm (compare b with c, right). The pink lines in b delineate the boundaries between droplets categorized as negative or positive for each assay. d, Optimization of sperm preparation: characterization of the effect of different lengths of 37 °C incubation of sperm cells treated with egg-mimicking decondensation reagents on how often the loci on chromosomes 7 and 10 were detected in the same ddPCR droplet. The y-axis shows the percentage of molecules that are calculated to be linked to each other (that is, physically linked in input) for assays targeting chromosomes 7 and 10. Extracted DNA (‘DNA’, a negative control) gives the expected result of random assortment of the two template molecules into droplets. The 45-min heat treatment was used for all subsequent experiments in this study.
e, f, Distribution of sequence reads across cell barcodes from droplet-based single-sperm sequencing. Each panel shows the cumulative fraction of all reads from a sequencing run coming from each read-number-ranked cell barcode; a sharp inflection point delineates the barcodes with many reads from those with few reads. Points to the left of the inflection point are the cell barcodes that are associated with many reads (that is, beads that are coencapsulated with cells); the height of the inflection point reflects the proportion of the sequence reads that come from these barcodes. Only reads that mapped to the human genome (hg38) and were not PCR duplicates are included. e, Data from an initial adaptation of 10× Genomics’ GemCode linked reads system29, where a small proportion of the reads come from cell barcodes associated with putative cells. f, Data from the final, implemented adaptation of 10× Genomics’ GemCode linked reads system29 for the same number of input sperm nuclei as in e. The x-axis in f includes five times fewer barcodes than in e.
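
The cumulative-read curves described for panels e and f can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `cumulative_read_fraction` and the toy read counts are hypothetical, and locating the inflection point is left out because the legend does not say how it was determined.

```python
from itertools import accumulate

def cumulative_read_fraction(reads_per_barcode):
    """Rank barcodes by read count (highest first) and return the
    cumulative fraction of all reads reached at each rank."""
    ranked = sorted(reads_per_barcode, reverse=True)
    total = sum(ranked)
    if total == 0:
        return []
    return [running / total for running in accumulate(ranked)]

# Toy data (hypothetical): three barcodes with many reads, five with few.
counts = [900, 1000, 800, 10, 20, 30, 25, 15]
curve = cumulative_read_fraction(counts)
# curve[2] is the fraction of reads carried by the three top-ranked barcodes.
```

In a curve of this kind, a high plateau reached after few barcodes corresponds to most reads coming from barcodes associated with cells.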

Extended Data Fig. 2 Evaluation of chromosomal phasing and identification of cell doublets.

a, Phasing strategy. Green and purple denote the chromosomal phase of each allele (unknown before analysis). Each sperm cell carries one parental haplotype (green or purple), except where a recombination event separates consecutively observed SNPs (red X in bottom sperm). Because alleles from the same haplotype will tend to be observed in the same sperm cells, the haplotype arrangement of the alleles can be assembled at whole-chromosome scale (resulting in the phased donor genome). b, Evaluation of our phasing method using 1,000 simulated single-sperm genomes (generated from two a priori known parental haplotypes and sampled at various levels of coverage). Because cell doublets (which combine two haploid genomes and potentially two haplotypes at any region) can in principle undermine phasing inference, we included these doublets in the simulation (in proportions shown on the x-axis, which bracket the observed doublet rates). Each point shows the proportion of SNPs phased concordantly with the correct (a priori known) haplotypes (y-axis) for one simulation (five simulations were performed for each unique combination of proportion of cell doublets and percentage of sites observed). c, Relationship of phasing capability to number of cells analysed. Data are as in b, but for different numbers of simulated cells. All simulations had an among-cell mean of 1% of heterozygous sites observed. d, A cell doublet: when two cells (here, sperm DNA florets) are encapsulated together in the same droplet, their genomic sequences will be tagged with the same barcode; such events must be recognized computationally and excluded from downstream analyses. e, Four example chromosomes from a cell barcode associated with two sperm cells (a cell doublet). Black lines show haplotypes; blue circles are observations of alleles, shown on the haplotype from which they derive. Both parental haplotypes are present across regions of chromosomes for which the cells inherited different haplotypes. 
f, Computational recognition of cell doublets in Sperm-seq data (from an individual sperm donor, NC11). We used the proportion of consecutively observed SNP alleles derived from different parental haplotypes to identify cell doublets; this proportion is generally small (arising from sparse crossovers, PCR/sequencing errors, and/or ambient DNA) but is much higher when the analysed sequence comes from a mixture of two distinct haploid genomes. We use 21 of the 22 autosomes to calculate this proportion, excluding the autosome with the highest such proportion (given the possibility that a chromosome is aneuploid). The dashed grey line marks the inflection point beyond which sperm genomes are flagged as potential doublets and excluded from downstream analysis. Red points indicate barcodes with coverage of both X and Y chromosomes (potentially X + Y cell doublets or XY aneuploid cells); black points indicate barcodes with one sex chromosome detected (X or Y). The red (XY) cells below the doublet threshold are XY aneuploid but appear to have just one copy of each autosome.
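
The doublet statistic described in panel f can be sketched as below. This is an illustration under stated assumptions rather than the published implementation: the function name `haplotype_switch_proportion` is hypothetical, haplotypes are coded as 0 or 1, and the per-chromosome counts are pooled across the retained autosomes (the legend does not say whether proportions were pooled or averaged).

```python
def haplotype_switch_proportion(haplotypes_by_chrom):
    """haplotypes_by_chrom maps an autosome name to the parental haplotype
    (0 or 1) of each observed heterozygous SNP allele, in genomic order.
    Returns the pooled proportion of consecutive observations that come
    from different haplotypes, after dropping the autosome with the
    highest such proportion (it could be aneuploid)."""
    per_chrom = {}
    for chrom, haps in haplotypes_by_chrom.items():
        pairs = len(haps) - 1
        if pairs < 1:
            continue  # need at least two observations to count a switch
        switches = sum(a != b for a, b in zip(haps, haps[1:]))
        per_chrom[chrom] = (switches, pairs)
    if len(per_chrom) < 2:
        raise ValueError("need at least two informative autosomes")
    # Exclude the single autosome with the highest switch proportion.
    worst = max(per_chrom, key=lambda c: per_chrom[c][0] / per_chrom[c][1])
    kept = [v for c, v in per_chrom.items() if c != worst]
    return sum(s for s, _ in kept) / sum(p for _, p in kept)

# Toy barcodes (hypothetical): a single cell with one crossover and one
# noisy chromosome, and a mixture of two haploid genomes.
singlet = {"chr1": [0, 0, 0, 0, 1, 1, 1, 1],
           "chr2": [1, 1, 1, 1, 1, 1],
           "chr3": [0, 1, 0, 1, 0, 1]}
doublet = {"chr1": [0, 1, 0, 1, 1, 0],
           "chr2": [0, 1, 1, 0, 1, 0],
           "chr3": [0, 0, 0, 0, 0, 0]}
singlet_score = haplotype_switch_proportion(singlet)
doublet_score = haplotype_switch_proportion(doublet)
```

Barcodes would then be flagged when this proportion exceeds a donor-specific cutoff; no cutoff value is given here because the legend defines it only as the inflection point of the observed distribution.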

Extended Data Fig. 3 Identification and use of ‘bead doublets’.

a, SNP alleles were inferred genome-wide (for each sperm genome) by imputation from the subset of alleles detected in each cell and by Sperm-seq-inferred parental haplotypes. For each pair of sperm genomes (cell barcodes), we estimated the proportion of all SNPs at which they shared the same imputed allele. A small but surprising number of such pairwise comparisons (19 of 984,906 from the donor shown, NC14) indicates essentially identical genomes (ascertained through different SNPs). b, We hypothesize that this arises from a heretofore undescribed scenario that we call ‘bead doublets’, in which two barcoded beads have coencapsulated with the same gamete and whose barcodes therefore tagged the same haploid genome. c, Random pairs of cell barcodes (here 100 pairs selected from donor NC10) tend to investigate few of the same SNPs (left), and also tend to detect the same parental haplotype on average at the expected 50% of the genome (right). d, ‘Bead doublet’ barcode pairs (here 20 pairs from donor NC10, who had the median number of bead doublets, left) also investigate few of the same SNPs, yet detect identical haplotypes throughout the genome (right). Results were consistent across donors. e, Use of ‘bead doublets’ to characterize the concordance of crossover inferences between distinct samplings of the same haploid genome by different barcodes. The bead doublets (barcode pairs) were compared to 100 random barcode pairs per donor. Crossover inferences were classified as ‘concordant’ (overlapping, detected in both barcodes), as ‘one SNP apart’ (separated by just one SNP, detected in both barcodes), as ‘near end of coverage’ (within 15 heterozygous SNPs of the end of SNP coverage at a telomere, where the power to infer crossovers is partial), or as discordant. 
Error bars (with small magnitude) show binomial 95% confidence intervals for the number of crossovers per category divided by number of crossovers total in both barcodes (32,714 crossovers total in 1,201 bead doublet pairs; 67,862 crossovers total in 2,000 random barcode pairs; some barcodes are in multiple bead doublets or random barcode pairs).
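
The pairwise comparison in panel a can be illustrated with a short sketch. It assumes each barcode's imputed alleles are stored as a list coded 0 or 1 by parental haplotype, with `None` where no allele could be imputed; the function name `shared_allele_proportion` and this encoding are hypothetical and are not taken from the authors' scripts.

```python
def shared_allele_proportion(alleles_a, alleles_b):
    """Proportion of SNPs at which two barcodes carry the same imputed
    allele. None marks a SNP that could not be imputed in a barcode."""
    if len(alleles_a) != len(alleles_b):
        raise ValueError("allele vectors must cover the same SNPs")
    compared = [(a, b) for a, b in zip(alleles_a, alleles_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no SNPs imputed in both barcodes")
    return sum(a == b for a, b in compared) / len(compared)

# Toy vectors (hypothetical): two barcodes tagging the same gamete,
# and a third barcode from a different gamete.
barcode_1 = [0, 0, 0, 1, 1, 1, None, 0]
barcode_2 = [0, 0, 0, 1, 1, 1, 1, 0]
barcode_3 = [1, 1, 0, 1, 0, 0, 1, 1]
same_gamete = shared_allele_proportion(barcode_1, barcode_2)
different_gamete = shared_allele_proportion(barcode_1, barcode_3)
```

Random pairs of sperm from one donor are expected to share about half of their genome, so a proportion near 1 is the signature of a bead doublet.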

Extended Data Fig. 4 Numbers and locations of crossovers called from downsampled data (equal number of SNPs in each cell, randomly chosen).

To eliminate any potential effect of unequal sequence coverage across donors and cells, we used downsampling to create datasets with equal coverage (numbers) of heterozygous SNP observations in each cell. Crossovers were called from these random equally sized sets of SNPs from all cells. a, b, Crossover number per cell globally (a) and per chromosome (b) (785,476 total autosomal crossovers called from downsampled SNPs included, 30,778 cells included, aneuploid chromosomes excluded). c, Density plots of crossover location with crossover midpoints plotted and area scaled to be equal to the per-chromosome crossover rate. Grey rectangles mark centromeric regions; coordinates are in hg38. d, Similar numbers of crossovers were called from full data and equally downsampled SNP data: we performed correlation tests across cells for each donor and chromosome to compare the number of crossovers called from all data to the number of crossovers called from equal numbers of randomly downsampled SNPs. The histogram shows Pearson’s r values for all 460 (20 donors × 23 chromosomes (total number plus number for 22 autosomes)) tests (n per test = 974–2,274 cells per donor as in Extended Data Table 1; all chromosome comparisons Pearson’s r > 0.83; all two-sided P < 10^−300). e, Crossovers called from equally downsampled SNP data were in similar locations to those called from all data: we performed correlation tests comparing crossover rates in 500-kb bins (centimorgans (cM) per 500 kb) from all data versus equally downsampled SNP data for each donor and chromosome. The histogram shows Pearson’s r values for all 460 (20 donors × 23 chromosomes (genome-wide rate plus rate for 22 autosomes)) tests (n per test = number of 500-kb bins per chromosome (genome-wide: 5,739; chromosomes 1–22: 497, 484, 396, 380, 363, 341, 318, 290, 276, 267, 270, 266, 228, 214, 203, 180, 166, 160, 117, 128, 93, 101); all chromosome comparisons Pearson’s r > 0.87, all two-sided P < 10^−300).
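
The equal-coverage downsampling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `downsample_snps`, the target `n_snps`, the fixed seed and the choice to drop cells with fewer than `n_snps` observations are all assumptions made for the example.

```python
import random

def downsample_snps(snps_by_cell, n_snps, seed=0):
    """Randomly keep exactly n_snps heterozygous SNP observations per cell.
    Cells with fewer than n_snps observations are dropped (an assumption
    of this sketch). Returns a dict of cell -> sorted retained SNPs."""
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    equalized = {}
    for cell, snps in snps_by_cell.items():
        if len(snps) < n_snps:
            continue
        equalized[cell] = sorted(rng.sample(snps, n_snps))
    return equalized

# Toy input (hypothetical): SNP positions observed in three cell barcodes.
cells = {"cell_a": [10, 20, 30, 40, 50],
         "cell_b": [5, 15, 25],
         "cell_c": [1, 2]}
equal = downsample_snps(cells, n_snps=3, seed=1)
```

Crossovers would then be called again on `equal` and compared with the calls from the full data, for example by Pearson correlation of per-cell crossover counts.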

Extended Data Fig. 5 Interindividual and intercell recombination rate from single-sperm sequencing.

a, Density plot showing the per-cell number of autosomal crossovers for all 31,228 cells (813,122 total autosomal crossovers) from 20 sperm donors (per-donor cell and crossover numbers as in Extended Data Table 1; aneuploid chromosomes were excluded from crossover analysis). Colours represent a donor’s mean crossover rate (crossovers per cell) from low (blue) to high (red). This same colour scheme, derived from the mean recombination rate, is used for donors in all figures. The recombination rate differs among donors (n = 20; Kruskal–Wallis chi-squared = 3,665; df = 19; P < 10^−300). b, Per-chromosome crossover number in each of the 20 sperm donors (data as in a but shown for individual chromosomes). c, Per-chromosome genetic map lengths for: each of the 20 sperm donors, as inferred from Sperm-seq data (colours from blue to red reflect donors’ individual crossover rates as in a); a male average, as estimated from pedigrees by deCODE6 (yellow triangles); and a population average (including female meioses, which have more crossovers), as estimated from HapMap data7 (yellow circles). The deCODE genetic maps stop 2.5 Mb from the ends of SNP coverage. d, Physical versus genetic distances (for individualized sperm donor genetic maps and deCODE’s paternal genetic map) plotted at 500-kb intervals (in hg38 coordinates). Grey boxes denote centromeric regions (or centromeres and acrocentric arms). Sperm-seq maps are broadly concordant with deCODE maps (see the correlation test results in Supplementary Notes), except at subtelomeric regions that are not included in deCODE’s map.
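
Genetic map lengths and binned rates of the kind shown in panels c and d follow from crossover counts per gamete, because one centimorgan corresponds to an average of 0.01 crossovers per gamete. The sketch below illustrates that arithmetic; the function names and toy values are hypothetical and the code is not drawn from the authors' scripts.

```python
def genetic_map_length_cm(crossovers_per_gamete):
    """Genetic length of a chromosome in centimorgans: 100 times the mean
    number of crossovers observed per gamete on that chromosome."""
    if not crossovers_per_gamete:
        raise ValueError("no gametes")
    return 100.0 * sum(crossovers_per_gamete) / len(crossovers_per_gamete)

def binned_rate_cm(crossover_midpoints, n_gametes, chrom_length,
                   bin_size=500_000):
    """Crossover rate in cM per bin: 100 times the number of crossover
    midpoints falling in each bin, divided by the number of gametes."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in crossover_midpoints:
        counts[min(int(pos) // bin_size, n_bins - 1)] += 1
    return [100.0 * c / n_gametes for c in counts]

# Toy chromosome (hypothetical): 4 gametes carrying 1, 0, 2 and 1 crossovers.
length_cm = genetic_map_length_cm([1, 0, 2, 1])
rates = binned_rate_cm([100_000, 600_000, 650_000, 1_400_000],
                       n_gametes=4, chrom_length=1_500_000)
```

Summing the binned rates recovers the whole-chromosome map length, which is a quick consistency check on any binning.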

Extended Data Fig. 6 Distributions of crossover locations along chromosomes (in ‘crossover zones’).

a, Each donor’s crossover locations are plotted as a coloured line; the colour indicates the donor’s overall crossover rate (blue, low; red, high); grey boxes show the locations of centromeres (or, for acrocentric chromosomes, of centromeres and p arms). We used the midpoint between the SNPs bounding each inferred crossover as the position for each crossover in all analyses. To combine data across chromosomes, we show crossover locations (density plot) on ‘meta-chromosomes’ in which crossover locations are normalized to the length of the chromosome or arm on which they occurred. For acrocentric chromosomes, only the q arm was considered; for nonacrocentric chromosomes, the p and q arms were afforded space on the basis of the proportion of the nonacrocentric genome (in base pairs) they comprise, with the centromere placed at the summed p arms’ proportion of base pairs of these chromosomes. Crossover locations were first converted to the proportion of the arm at which they fall, and then these positions were normalized to the genome-wide p or q arm proportion. b, Identification of chromosomal zones of recombination use (‘crossover zones’) from all donors’ crossovers for 22 autosomes. Density plots are shown of crossover location for all sperm donors’ total 813,122 crossovers (aneuploid chromosomes excluded; the crossover location is the midpoint between SNPs bounding crossovers) along autosomes (hg38). Crossover zones (bounded by local minima of crossover density) are shown with alternating shades of grey. Diagonally hatched rectangles indicate centromeres (or centromeres and acrocentric arms).
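The midpoint convention and the meta-chromosome normalization described above can be sketched as below. This is an illustrative reading of the legend, not the authors' code: the function names, the arm-boundary arguments and the choice to measure arm proportions in reference-coordinate order are assumptions.

```python
def crossover_midpoint(left_snp_bp, right_snp_bp):
    """Position of a crossover: midpoint between the two SNPs that bound it."""
    return (left_snp_bp + right_snp_bp) / 2.0


def meta_chromosome_position(pos_bp, p_end_bp, q_start_bp, chrom_len_bp,
                             p_fraction=None):
    """Map a crossover position onto a 0-1 'meta-chromosome' axis.

    p_end_bp, q_start_bp: assumed end of the p arm and start of the q arm.
    p_fraction: genome-wide share of nonacrocentric base pairs on p arms;
    pass None for an acrocentric chromosome (only the q arm is used).
    """
    if pos_bp >= q_start_bp:
        # Proportion of the q arm at which the crossover falls.
        arm_prop = (pos_bp - q_start_bp) / (chrom_len_bp - q_start_bp)
        if p_fraction is None:
            return arm_prop
        # q arm occupies the axis from the centromere (p_fraction) to 1.
        return p_fraction + arm_prop * (1.0 - p_fraction)
    if p_fraction is None or pos_bp > p_end_bp:
        raise ValueError("position lies outside the analysed arms")
    # p arm occupies the axis from 0 to the centromere (p_fraction).
    return (pos_bp / p_end_bp) * p_fraction
```

For instance, a crossover halfway along the p arm of a nonacrocentric chromosome maps to half of `p_fraction` on the shared axis, whatever the chromosome's size.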

Extended Data Fig. 7 Crossover placement in end zones, and crossover separation, varies in ways that correlate with crossover rate, among sperm donors and among individual gametes.

Analyses are shown by donor (a–h; n = 20 sperm donors) or by individual gamete (i, j; n = 31,228 gametes). In a–h, the left panels show the phenotype distributions for individual donors, and the right panels show the relationship to the donors’ crossover rates. To control for the effect of the number of crossovers, the analyses in c, d and g–j use ‘two-crossover chromosomes’, that is, chromosomes on which exactly two crossovers occurred. For scatter plots (a–h, right), all x-axes show the mean crossover rate and all error bars are 95% confidence intervals (y-axes are described per panel). a, b, Left, both the proportion of crossovers that falls in the most distal chromosome crossover zones (a) and crossover separation (b; a readout of crossover interference, the distance between consecutive crossovers in Mb) vary among 20 sperm donors (proportion of crossovers in end per-cell distributions among-donor Kruskal–Wallis chi-squared = 2,334, df = 19, P < 10−300; all distances between consecutive crossovers among-donor Kruskal–Wallis chi-squared = 3,309, df = 19, P < 10−300). The right panels show both properties (y-axes, total proportion of crossovers in distal zones and median crossover separation, respectively) versus the donor’s crossover rate (correlation results for 20 sperm donors: proportion of all crossovers across cells in distal zones Pearson’s r = −0.95, two-sided P = 2 × 10−10; median crossover separation Pearson’s r = −0.96, two-sided P = 1 × 10−11). c, Results obtained from an alternative method for calculating the proportion of crossovers in the distal regions of chromosomes. The proportion of crossovers in the distal 50% of chromosome arms varies across donors (left, among-donor Kruskal–Wallis chi-squared = 2,209, df = 19, P < 10−300) and negatively correlates with recombination rate (right, Pearson’s r = −0.92, two-sided P = 2 × 10−8; the y-axis shows the actual proportion of crossovers in the distal 50%).
d, As in c, but with the proportion of crossovers from two-crossover chromosomes occurring in the distal 50% of chromosome arms. Left, among-donor Kruskal–Wallis chi-squared = 1,058, df = 19, P = 2 × 10−212; right, correlation with recombination rate Pearson’s r = −0.93, two-sided P = 4 × 10−9. e, As in b but for consecutive crossovers on the q arm of the chromosome. Left, among-donor Kruskal–Wallis chi-squared = 346, df = 19, P = 7 × 10−62; right, correlation with recombination rate Pearson’s r = −0.90, two-sided P = 5 × 10−8. f, As in b but for consecutive crossovers on opposite chromosome arms (that is, crossovers that span the centromere). Left, among-donor Kruskal–Wallis chi-squared = 1,554, df = 19, P < 10−300; right, correlation with recombination rate Pearson’s r = −0.96, two-sided P = 3 × 10−11. g, As in e but for distances between consecutive crossovers on two-crossover chromosomes. Left, among-donor Kruskal–Wallis chi-squared = 181, df = 19, P = 2 × 10−28; right, correlation with recombination rate Pearson’s r = −0.88, two-sided P = 3 × 10−7. h, As in f but for distances between consecutive crossovers on two-crossover chromosomes. Left, among-donor Kruskal–Wallis chi-squared = 930, df = 19, P = 5 × 10−185; right, correlation with recombination rate Pearson’s r = −0.92, two-sided P = 1 × 10−8. i, j, Boxplots show medians and interquartile ranges with whiskers extending to 1.5 times the interquartile range from the box. Each point represents a cell. i, Within-donor percentiles showing the proportion of crossovers from two-crossover chromosomes that fall in distal zones, plotted against the crossover-rate decile. Groups are deciles of crossover rates normalized by converting each cell’s crossover count to a percentile within-donor (all cells from all donors shown together; n cells in deciles = 3,152, 3,122, 3,276, 3,067, 3,080, 3,073, 3,135, 3,132, 3,090, 3,101, respectively (31,228 in total)).
Because the initial data are proportions with small denominators, an integer effect is evident as pileups at certain values. j, Crossover interference from two-crossover chromosomes (showing the median consecutive crossover separation per cell). Each point represents the median of all percentile-expressed distances between crossovers from all two-crossover chromosomes in one cell (percentile taken within-chromosome); groupings and n values as in i.
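The within-donor normalization used for panels i and j (each cell's crossover count converted to a percentile among cells from the same donor, then grouped into deciles) might look like the sketch below. The mid-rank handling of ties and the function names are assumptions; the legend does not say how ties were resolved.

```python
import numpy as np


def within_donor_percentile(counts):
    """Percentile of each cell's crossover count among one donor's cells.

    Assumption: tied counts share a mid-rank percentile.
    """
    counts = np.asarray(counts, dtype=float)
    sorted_counts = np.sort(counts)
    less = np.searchsorted(sorted_counts, counts, side="left")
    less_equal = np.searchsorted(sorted_counts, counts, side="right")
    return 100.0 * (less + less_equal) / (2.0 * len(counts))


def percentile_to_decile(percentiles):
    """Assign decile groups 1-10 from percentiles in [0, 100]."""
    deciles = (np.asarray(percentiles) // 10).astype(int) + 1
    return np.minimum(deciles, 10)
```

Applying `within_donor_percentile` separately to each donor before pooling removes between-donor differences in mean crossover rate, so the deciles compare cells only with other cells from the same person.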

Extended Data Fig. 8 Crossover interference in individual sperm donors and on chromosomes.

a, Solid lines show density plots (scaled by donor’s crossover rate) of the observed distance (separation) between consecutive crossovers as measured in the proportion of the chromosome separating them (left) and in genomic distance (right), with one line per donor (n = 20). Dashed lines show the distance between consecutive crossovers when crossover locations are permuted randomly across cells to remove the effect of crossover interference. b, The median of observed distances between consecutive crossovers for one donor (NC18, who had the tenth lowest recombination rate of 20 donors; blue dashed line) is shown along with a histogram of the medians of n = 10,000 among-cell crossover permutations (in both cases, the permutation one-sided P-value is less than 0.0001). The units are the proportion of the chromosome (left) and genomic distance (in Mb, right). c, Crossover separation on example chromosomes; plots and n values are as in b. Permutation one-sided P < 0.0001 for all chromosomes in all sperm donors except occasionally for chromosome 21, where especially few double crossovers occur. d, Median distances between donor NC18’s consecutive crossovers for each autosome for all inter-crossover distances (left two panels) and inter-crossover distances only from chromosomes with two crossovers (right two panels). Units are proportion of the chromosome or genomic distance. e, Diagram describing the analysis of crossover interference in individualized genetic distance (one 20-cM window is shown), using a donor’s own recombination map. f, When parameterized using each donor’s own genetic map, sperm donors’ crossover interference profiles across multiple genetic distance windows (as shown in e) do not differ (n = 20 sperm donors; Kruskal–Wallis chi-squared = 0.22; df = 19; P = 1, using 20 estimates (cM distances) for each of 20 donors). Error bars show binomial 95% confidence intervals on the proportion of cells with a second crossover in the window given.
This suggests that interindividual variation in crossover interference, although substantial when measured in base pairs, is negligible when measured in donor-specific genetic distance, pointing to a shared influence upon crossover interference and crossover rate.
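The among-cell permutation used in panels a–c (crossover locations shuffled across cells to remove interference, then compared with the observed median separation) can be sketched for one donor and one chromosome as follows. Several details are assumptions rather than the authors' stated procedure: each cell keeps its own number of crossovers after shuffling, and the one-sided P-value uses the conservative (k + 1)/(n + 1) convention.

```python
import numpy as np


def median_separation(cells):
    """Median distance between consecutive crossovers, pooled over cells."""
    gaps = [np.diff(np.sort(c)) for c in cells if len(c) >= 2]
    return float(np.median(np.concatenate(gaps)))


def permute_across_cells(cells, rng):
    """Shuffle pooled crossover positions, keeping each cell's crossover count."""
    pooled = rng.permutation(np.concatenate(cells))
    splits = np.cumsum([len(c) for c in cells])[:-1]
    return np.split(pooled, splits)


def interference_permutation_test(cells, n_perm=10000, seed=0):
    """cells: list of per-cell crossover positions on one chromosome.

    Requires at least one cell with two or more crossovers.
    Returns (observed median, null medians, one-sided P that the null
    median is at least as large as the observed one).
    """
    rng = np.random.default_rng(seed)
    cells = [np.asarray(c, dtype=float) for c in cells]
    observed = median_separation(cells)
    null = np.array([median_separation(permute_across_cells(cells, rng))
                     for _ in range(n_perm)])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, float(p)
```

Under interference, observed separations are larger than those obtained after shuffling, so the observed median falls in the upper tail of the permuted medians.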

Extended Data Fig. 9 Relationships of aneuploidy frequency to chromosome size and recombination.

a, The across-donor per-cell frequency of chromosome losses (left) and gains (centre), plotted against the length of the chromosome (from reference genome hg38; for losses across n = 22 chromosomes, Pearson’s r = −0.29, two-sided P = 0.19; and for gains across n = 22 chromosomes, Pearson’s r = −0.23, two-sided P = 0.30). Right, the per-chromosome rate of losses exceeding gains (number of losses minus number of gains divided by number of cells) is plotted against the length of the chromosomes (across n = 22 chromosomes; Pearson’s r = −0.29, two-sided P = 0.19). Red labels, acrocentric chromosomes. Error bars show 95% binomial confidence intervals on the per-cell frequency (number of events/number of cells, all 31,228 cells included). b–d, Relationship between aneuploidy frequency and recombination. Only autosomal whole-chromosome aneuploidies are included. b, Left, total number of crossovers on meiosis I nondisjoined chromosomes (blue line; chromosomes analysed, called as transitions between the presence of one haplotype and both haplotypes on the gained chromosome) compared with n = 10,000 donor- and chromosome-matched sets (35 × 2 chromosomes per set) of properly segregated chromosomes (grey histogram; permutation). Fifty-four total crossovers on meiosis I gains versus 84.2 mean total crossovers on sets of matched chromosomes; one-sided permutation P < 0.0001, for the hypothesis that gained chromosomes have fewer crossovers. Right, as left but for gains occurring during meiosis II (71 meiosis-II-derived gained chromosomes of one whole copy from all individuals with fewer than five crossovers called on the gained chromosome).
One-sided permutation P = 0.98 for meiosis II from n = 10,000 permutations, for the hypothesis that gained chromosomes have fewer crossovers; sister chromatids nondisjoined in meiosis II capture all crossovers whereas matched chromosomes do not: matched simulations and homologues nondisjoined in meiosis I capture only a random half of crossovers occurring on that chromosome in the parent spermatocyte. c, Crossovers per nonaneuploid megabase from each cell from each donor, split by aneuploidy status (n cells = 498, 50, 92, 30,609, left to right; ‘euploid’ excludes cells with any autosomal whole- or partial-chromosomal loss or gain; ‘gains’ includes gains of one or more than one chromosome copy; Mann–Whitney test W = 7,264,117, 722,191, 1,370,376; two-sided P = 0.07, 0.49, 0.66 for all autosomal aneuploidies, meiosis I gains and meiosis II gains, respectively, all compared against euploid). Each cell is represented by one point; boxplots show medians and interquartile ranges with whiskers extending to 1.5 times the interquartile range from the box. d, Per-cell crossover rates versus per-cell rates of aneuploidy (left, loss and gain; middle and right, gain only, as only chromosome gain meiotic division can be determined); n = 20 donors (coloured by crossover rate). P-values shown are for two-sided Pearson’s correlation tests. Error bars represent 95% confidence intervals on the mean crossover rate (x-axis) and on the observed aneuploidy frequency (y-axis).
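The matched-set comparison in panel b (total crossovers on gained chromosomes versus donor- and chromosome-matched sets of properly segregated chromosomes) could be sketched as below. The data layout, the function name and sampling with replacement are assumptions made for illustration; `copies=2` reflects the two matched chromosomes drawn per gain ("35 × 2 chromosomes per set").

```python
import numpy as np


def matched_set_permutation(observed_total, gained_keys, pool,
                            copies=2, n_sets=10000, seed=0):
    """One-sided test that gained chromosomes carry fewer crossovers.

    gained_keys: one (donor, chromosome) key per gained chromosome.
    pool: dict of (donor, chromosome) -> crossover counts observed on
          properly segregated copies (hypothetical layout).
    Returns (matched-set totals, fraction of sets with total <= observed).
    """
    rng = np.random.default_rng(seed)
    totals = np.zeros(n_sets)
    for key in gained_keys:
        counts = np.asarray(pool[key])
        # Draw `copies` matched chromosomes for this gain in every set.
        draws = rng.choice(counts, size=(n_sets, copies), replace=True)
        totals += draws.sum(axis=1)
    p = float(np.mean(totals <= observed_total))
    return totals, p
```

With this plain-fraction P-value, a result reported as P < 0.0001 from 10,000 sets corresponds to no matched set having a total as low as the observed one.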

Extended Data Fig. 10 Additional examples of noncanonical aneuploidy events detected with Sperm-seq.

This figure includes those shown in Fig. 3f. Copy number, SNPs, haplotypes and centromeres are plotted as in Fig. 3a. Donor and cell identities are noted above each panel. Coordinates are in the reference genome hg38. a, b, Chromosomes 2, 20, 21 (a) and 15 (b) are sometimes present in three copies in an otherwise haploid sperm cell. c, A distinct triplication of much of chromosome 15, from around 33 Mb onwards but not including the proximal part of the q arm, also recurs in cells from three donors. d, Chromosome-arm-level losses (top three panels) and gains (including in more than one copy, bottom three panels, and a compound gain of the p arm and loss of the q arm, top panel).

Extended Data Fig. 11 Single-cell and person-to-person variation in diverse meiotic phenotypes may be governed by variation in the physical compaction of chromosomes during meiosis.

Previous work showed that the physical length of the same chromosome varies among spermatocytes at the pachytene stage of meiosis, probably by differential looping of DNA along the meiotic chromosome axis (for example, the left column shows smaller loops, resulting in more loops in total and in a greater total axis length compared with the right column, with larger loops)15,72,73,74,75. This physical chromosome length is correlated across chromosomes among cells from the same individual21,76, and correlates with crossover number15,20,21,42,73,76. This length—measured as the length of the chromosome axis or of the synaptonemal complex (the connector of homologous chromosomes)—can vary by two or more fold among a human’s spermatocytes21. We propose that the same process differs on average across individuals and may substantially explain interindividual variation in recombination rate. On average, individual 1 (left) would have meiotic chromosomes that are physically longer (less compacted) in an average cell than individual 2 (right); one example chromosome is shown in the figure. After the first crossover on a chromosome (probably in a distal region of a chromosome, where synapsis typically begins in male human meiosis before spreading across the whole chromosome13,14,15), crossover interference prevents nearby double-strand breaks (DSBs) from becoming crossovers; however, DSBs that are far away can become crossovers (which themselves also cause interference). More DSBs are probably created on physically longer chromosomes, and crossover interference occurs among noncrossover as well as crossover DSBs77. Crossover interference occurs over relatively fixed physical (micrometre) distances43,44,45,76; these distances encompass different genomic (Mb) lengths of DNA in different cells or on average in different people owing to variable compaction. 
Thus, crossover interference tends to lead to a different total number of crossovers as a function of the degree of compaction, resulting in the observed negative correlation (Fig. 2c, e) of crossover rate with crossover spacing (as measured in base pairs). Given that the first crossover probably occurs in a distal region of the chromosome, this model can also explain the negative correlation (Fig. 2b, d) between crossover rate and the proportion of crossovers at chromosome ends. This figure shows the total number of crossovers, crossover interference extent, and crossover locations for both sister chromatids of each homologue combined; in reality, these crossovers are distributed among the sister chromatids, making these relationships harder to detect in daughter sperm cells and requiring large numbers of observations to make relationships among these phenotypes clear.
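The central arithmetic of this model, a fixed physical interference distance covering different genomic lengths depending on compaction, can be illustrated with a toy calculation. The function name and all numbers below are hypothetical and chosen only for illustration; the sketch also assumes that DNA is distributed uniformly along the chromosome axis.

```python
def interference_span_mb(interference_um, axis_length_um, chrom_length_mb):
    """Genomic length (Mb) covered by a fixed physical interference distance.

    Assumes uniform compaction, so Mb per micrometre of axis is constant.
    """
    mb_per_um = chrom_length_mb / axis_length_um
    return interference_um * mb_per_um


# Hypothetical values: the same 100-Mb chromosome with a longer (less
# compacted) or shorter (more compacted) axis, and the same physical
# interference distance in both cases.
span_long_axis = interference_span_mb(10.0, 100.0, 100.0)
span_short_axis = interference_span_mb(10.0, 50.0, 100.0)
```

In this toy case the longer axis gives a smaller interference span in Mb, leaving room for more crossovers that sit closer together in base pairs, which is the direction of the negative correlation between crossover rate and crossover spacing described above.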

Extended Data Table 1 Sperm donor and single-sperm sequencing characteristics and results

Supplementary information

Supplementary Information

This file contains the Supplementary Notes, Supplementary Discussion, and Supplementary Methods.

Reporting Summary

Cite this article

Bell, A.D., Mello, C.J., Nemesh, J. et al. Insights into variation in meiosis from 31,228 human sperm genomes. Nature 583, 259–264 (2020). https://doi.org/10.1038/s41586-020-2347-0
