Main

Sex chromosomes evolve from homologous autosomes that acquire sex-determining genes and lose their ability to recombine16,17,18,19,20,21. As such, sex chromosome evolution and recombination suppression are closely associated16,17,18,19,20,21. However, because direct evidence of such homologous autosomes primed for evolution into sex chromosomes is difficult to capture, little is known about the molecular signatures associated with the evolution of recombination suppression.

The freshwater planarian Schmidtea mediterranea, an important model organism for studies of regeneration22,23, exists as asexual and sexual reproductive strains. The sexual strain is distributed mostly in Tunisia and on the islands of Sardinia, Corsica and Sicily1. The sexual strain is a simultaneous hermaphrodite that develops both male and female reproductive systems in the same adult individual and obligately outcrosses to fertilize other individuals24,25. Individuals in the asexual strain do not develop sexual reproductive systems. We considered that studying chromosome evolution in a simultaneous hermaphrodite might provide insight into the early evolution of a primitive sex chromosome.

S. mediterranea has four pairs of chromosomes, which are stably diploid. The genome is reported to comprise 774 Mb assembled as 481 non-contiguous series of genomic sequences, or scaffolds26,27,28. A previous study found that approximately 300 Mb of the genome remained heterozygous even after extensive inbreeding of laboratory strains, and that this phenomenon also occurs naturally in wild populations in Sardinia24. The two sets of heterozygous alleles were collectively named J and V haplotypes. To define the chromosomal locations of these alleles and investigate the reasons underpinning the persistence of heterozygosity in S. mediterranea, a detailed assembly of all four chromosomes is needed.

Chromosome-scale genome assembly

To transform the 481 scaffolds26 into a chromosome-scale genome reference, we carried out chromosome sequencing (ChrSeq)29,30 of a laboratory strain, as well as chromatin proximity sequencing by Hi-C3 (Fig. 1a). To do so, we dissected individual chromosomes from mitotic cells using laser capture and amplified and sequenced each chromosome individually (Fig. 1b). We examined the sequencing depth of each scaffold across multiple samples of the same and different chromosomes to ensure reproducibility and specificity, respectively (Extended Data Fig. 1). Overall, we successfully amplified and confidently assigned 740 Mb of the 774-Mb genome to one of four chromosomes (Supplementary Table 1).
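The depth-based assignment described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `assign_scaffold`, the input layout and the 0.8 cut-off are assumptions made for the example and are not the thresholds used in this study.

```python
from statistics import mean

def assign_scaffold(depth_by_chrom, min_fraction=0.8):
    """Assign one scaffold to a chromosome from ChrSeq depth.

    depth_by_chrom maps a chromosome label to the scaffold's normalized
    sequencing depth in each replicate sample of that chromosome.
    min_fraction is a hypothetical specificity cut-off.
    """
    # Average over replicate samples of the same chromosome (reproducibility).
    means = {chrom: mean(depths) for chrom, depths in depth_by_chrom.items()}
    total = sum(means.values())
    if total == 0:
        return None  # scaffold was not amplified in any sample
    best = max(means, key=means.get)
    # Require that most of the signal comes from one chromosome (specificity).
    return best if means[best] / total >= min_fraction else None

example = {"chr1": [10, 12], "chr2": [0.5, 0.3], "chr3": [0.2, 0.1], "chr4": [0, 0.1]}
print(assign_scaffold(example))
```

A scaffold with comparable depth in samples of two chromosomes returns `None` here, which corresponds to the ambiguous or chimeric cases that the text resolves with Hi-C data.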

Fig. 1: Chromosome-scale genome assembly.

a, Schematic of the use of Hi-C, ChrSeq and a linkage map to transform 481 scaffolds26 into a chromosome-scale genome assembly. b, Chromosomes from mitotic cells spread onto membrane slides for laser capture and sequencing. Numbers denote the identity of chromosomes present in the representative samples. c, Contact heat map of chromatin interactions indicated by Juicebox4; black boxes denote the four chromosomes in the final assembly. d, Linkage map generated through LinkageMapView68, showing a lack of recombination on chromosome 1. Tick marks and labels indicate genetic markers.

We used ChrSeq information and chromatin interaction data generated by Hi-C to correct and connect the individual scaffolds within a chromosome into a chromosome-scale genome, hereafter referred to as Smed_chr_ref_v1. The Hi-C data analysed using the SALSA scaffolding algorithm3 resolved the 481 scaffolds into 57 super-scaffolds and 104 singletons. The ChrSeq data indicated that 3 of the 57 super-scaffolds were disjointed inter-chromosomal fragments, consistent with the Hi-C contact heat map (Fig. 1c). We split the disjointed super-scaffolds, merged the scaffolds into chromosomes, and ordered and oriented all scaffolds within the chromosomes using Juicebox visualization software4 (Methods). Chromosome assignments by ChrSeq alone (Supplementary Table 1) and by Hi-C alone were inconsistent for only 3 of 384 (0.8%) scaffolds (Supplementary Table 2). We manually assigned these three scaffolds to chromosomes according to the Hi-C data. Hi-C also detected 26 inter-chromosomal or intra-chromosomal assembly errors (Supplementary Table 2) in the previous assembly26, 5 of which were confirmed by ChrSeq to be inter-chromosomal disjointed scaffolds (Supplementary Table 2). The final genome assembly, Smed_chr_ref_v1 (Fig. 1c), had four chromosomes with a total size of 764 Mb, which is 98.4% of that reported previously26. Of the 1.6% of the previous assembly not contained in Smed_chr_ref_v1 by Hi-C, about half (52.6%) of the scaffolds could not be assigned to a chromosome or were assigned to two chromosomes by ChrSeq, indicating lower assembly quality (Supplementary Table 3). Moreover, 33.0% and 12.4% of the scaffolds had chromatin interaction signals with structurally complex regions of chromosomes 1 and 2, respectively (Extended Data Fig. 2), suggesting that they might be alternative assemblies or repetitive sequences. In our new assembly, Smed_chr_ref_v1, the two ends of chromosome 4 were capped by >1,000 copies of the telomere repeat TTAGGG, indicating high assembly quality.

To validate the linearity of the chromosomes in Smed_chr_ref_v1, we generated a linkage map. We crossed two divergent laboratory strains of S. mediterranea, S2F8b and D5, to produce an F2 population, and we genotyped individual worms with RADseq31. Eighty markers that were evenly distributed and genotyped in at least 98% of the F2 segregants (91 of 93) were used to establish four linkage groups (Fig. 1d and Supplementary Table 4) representing the four chromosomes. The ordering of the 80 markers in the linkage map is consistent with Smed_chr_ref_v1, independently supporting the quality of our chromosome-scale genome assembly. This highly contiguous and complete genome assembly and linkage map together facilitate further genetic and epigenetic investigations of the functions of the genome in this model planarian.
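One simple way to check that a linkage group and an assembled chromosome agree in marker order is to compare every pair of markers. The sketch below assumes each marker is given as a (genetic position in cM, physical position in bp) tuple; the function name `order_is_consistent` is hypothetical, and this is not necessarily how the comparison was carried out in this study.

```python
from itertools import combinations

def order_is_consistent(markers):
    """Return True if genetic and physical marker orders agree.

    markers is a list of (genetic_cM, physical_bp) tuples for one linkage
    group. Either orientation of the linkage group is accepted.
    """
    directions = set()
    for (g1, p1), (g2, p2) in combinations(markers, 2):
        if g1 == g2 or p1 == p2:
            continue  # tied markers carry no order information
        # True if the pair is in the same order on both maps.
        directions.add((g1 < g2) == (p1 < p2))
    # Consistent if all informative pairs share one direction.
    return len(directions) <= 1

print(order_is_consistent([(0, 100), (5, 200), (5, 150), (10, 300)]))
```

Ignoring tied genetic positions matters for a chromosome with almost no recombination, where most markers map to the same position in cM.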

Chromosome 1 recombination suppression

We next re-examined the heterozygous alleles that define the J and V haplotypes in the newly assembled genome. We found that 87.7% of the genetic markers that remained heterozygous were located on chromosome 1, spanning 333 Mb at a density of 30,148 variants per 10 Mb (Fig. 2a). The remaining 12.3% of the heterozygous markers were located on the other three chromosomes at a density of 3,274 variants per 10 Mb and probably correspond to differences between highly similar copies of repetitive elements rather than true polymorphisms (Supplementary Table 5). All F2 worms (n = 93) in this study were heterozygous for chromosome 1 but not for chromosomes 2–4 (Supplementary Table 6), which is consistent with the previous study24. Hence, we concluded that the J/V haplotypes were on chromosome 1.
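The reported densities and the reported share of markers on chromosome 1 can be cross-checked with simple arithmetic. The sketch below assumes that chromosomes 2–4 together span the remainder of the 764-Mb assembly (764 − 333 Mb), which is an assumption of the example rather than a figure stated in the text; the function name is hypothetical.

```python
def chr1_variant_share(chr1_mb=333, total_mb=764,
                       chr1_per_10mb=30148, other_per_10mb=3274):
    """Fraction of heterozygous variants on chromosome 1 implied by the densities."""
    chr1_variants = chr1_per_10mb * chr1_mb / 10
    # Assumption: chromosomes 2-4 cover the rest of the assembly.
    other_variants = other_per_10mb * (total_mb - chr1_mb) / 10
    return chr1_variants / (chr1_variants + other_variants)

print(round(100 * chr1_variant_share(), 1))  # 87.7
```

Under this assumption the densities imply that roughly 87.7% of the variants lie on chromosome 1, in line with the percentage given above.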

Fig. 2: Recombination suppression and inversions on chromosome 1.

a, Distribution of heterozygous variants maintained in the S2 inbreeding pedigree along the four chromosomes. The y axis shows variant counts per 10 Mb. Green, pink and blue boxes represent inversions 1, 2 and 3, respectively. b, Number of meiotic recombination events on chromosomes 1 and 3 in oocytes (red) and sperm cells (cyan). Dots represent individual gametes, summarized with violin plots. c, Chromosome 1 in late prophase I in oocytes from J/V worms (left and right panels) shows fewer crossovers than in oocytes from J/J worms (middle panel). This experiment was repeated more than 10 times independently with similar results. The right panel shows FISH42 results with probes for the telomeric repeat TTAGGG (magenta) and a repeat located near the centromere (yellow). This experiment was repeated more than three times independently with similar results. d, Chromatin contact heat map for chromosome 1. Dashed-line rectangles indicate the locations of potential inversions. Arrows indicate chromatin contact signals with inversion regions from unassigned scaffolds.

Our linkage map revealed an extremely low rate of recombination on chromosome 1 of only 0.5 cM for the entire chromosome (Fig. 1d). This is particularly notable because, at 333 Mb, chromosome 1 was the largest of the four chromosomes, containing more than 40% of the genome.

To directly examine whether chromosome 1 can recombine, we sequenced 45 single sperm cells and 28 single oocytes from J/V line S2 (Fig. 2b). Gamete sequencing is preferred because recombination events in hatchlings can be selected for by differential fertilization or embryonic lethality. We identified 3,197 single-nucleotide variants (SNVs) on chromosome 1 and 3,312 SNVs on chromosome 3, covering 99% of the length of each chromosome (Supplementary Table 7). The SNVs were distributed at a similar density across 20-Mb windows, with coefficients of variation of 0.38 and 0.31 for chromosomes 1 and 3, respectively (Supplementary Table 7). We observed that 98% of the sperm cells (44 of 45) and 93% of the oocytes (26 of 28) had no crossovers on chromosome 1. By contrast, most sperm cells and oocytes had crossovers on chromosomes 2, 3 and 4 (Fig. 2b, Extended Data Fig. 3a, b and Supplementary Table 7). We thus concluded that recombination on chromosome 1 was strongly suppressed.
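The evenness of marker coverage can be summarized as the coefficient of variation (standard deviation divided by mean) of SNV counts per 20-Mb window. The following sketch shows the calculation on made-up positions; the function names are hypothetical, and the use of the sample standard deviation is an assumption, because the text does not say which estimator was used.

```python
from statistics import mean, stdev

def window_counts(positions, chrom_length, window=20_000_000):
    """Count SNVs in consecutive fixed-size windows along one chromosome."""
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        # Clamp so that a position equal to chrom_length falls in the last window.
        counts[min(pos // window, n_windows - 1)] += 1
    return counts

def coefficient_of_variation(counts):
    """Sample standard deviation divided by the mean."""
    return stdev(counts) / mean(counts)

# Invented positions on a hypothetical 60-Mb chromosome.
snvs = [1_000_000, 5_000_000, 21_000_000, 25_000_000, 30_000_000, 39_000_000,
        41_000_000, 45_000_000, 50_000_000, 55_000_000, 58_000_000, 59_000_000]
counts = window_counts(snvs, 60_000_000)
print(counts, coefficient_of_variation(counts))
```

On a real chromosome the last window is usually shorter than 20 Mb, which inflates the coefficient slightly unless that window is dropped or rescaled.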

Consistent with this conclusion, we found that during prophase I, when other chromosomes had numerous crossovers, chromosome 1 formed a ring structure in the oocyte in J/V worms but not in J/J worms (Fig. 2c and Extended Data Fig. 4). Fluorescence in situ hybridization (FISH) with telomere probes suggested that crossovers between the two homologues of chromosome 1 occurred only in regions close to the telomeres, leading to the observed ring conformation rather than the side-by-side pairing observed for chromosomes 2, 3 and 4. Furthermore, Hi-C analysis showed that chromosome 1 had three putative inversions, each >20 Mb in size (Fig. 2d); such inversions can cause crossover suppression32,33,34. The rest of the genome had only one large inversion of approximately 10 Mb on chromosome 2 (Extended Data Fig. 2b).

Island-specific evolution of chromosome 1

To investigate the genetic diversity of S. mediterranea in its natural environment and determine whether chromosome 1 J/V heterozygosity occurs throughout the entire species, we used RADseq31 to sample the genomes of 70 sexual individuals from Sardinia, Corsica, Sicily and Tunisia and 2 asexual individuals from Menorca1 (Fig. 3a). To look for genetic relationships between the individuals, we determined clustering of relatedness measured by identity-by-state pairwise distances35 and identified two superclusters. Animals from Sicily were closely related to those from Tunisia, and animals from Sardinia were closely related to those from Corsica (Fig. 3b). Phylogenetic clustering (Extended Data Fig. 5), fixation index (FST) values (Extended Data Fig. 6) and analyses with Structure (Extended Data Fig. 7) further supported this observation.
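For readers unfamiliar with identity-by-state (IBS) distances, the sketch below shows one common definition: the mean absolute difference in alternate-allele counts between two individuals, scaled to the range 0 to 1. The genotype coding (0, 1, 2, with `None` for missing calls) and the function name are assumptions of the example; the software used in this study may differ in detail.

```python
def ibs_distance(genotypes_a, genotypes_b):
    """IBS distance between two individuals over shared, non-missing sites.

    Genotypes are alternate-allele counts (0, 1 or 2); None marks a missing call.
    Returns 0.0 for identical genotypes and 1.0 for opposite homozygotes throughout.
    """
    pairs = [(a, b) for a, b in zip(genotypes_a, genotypes_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no sites genotyped in both individuals")
    # Each site contributes 0, 1 or 2 allele differences out of a maximum of 2.
    return sum(abs(a - b) for a, b in pairs) / (2 * len(pairs))

print(ibs_distance([0, 1, None, 2], [1, 1, 0, 0]))  # 0.5
```

A matrix of such pairwise distances is the usual input for the hierarchical clustering mentioned above.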

Fig. 3: Island-specific evolution of chromosome 1 heterozygosity.

a, Collection sites of 72 wild isolates from the Mediterranean islands and Tunisia. b, Maximum-likelihood tree of 2 asexual isolates from Menorca (MEN_GOR) and 70 sexual isolates from Sardinia (SAR), Corsica (COR), Sicily (SIC) and Tunisia (TUN). Dots denote individual worms, with colours corresponding to their collection sites. c, Observed versus expected heterozygosity for each chromosome (denoted by colour) in each population (denoted by shape). The dashed line corresponds to equality between observed values and those expected under Hardy–Weinberg equilibrium; the deviation of the points corresponding to chromosome 1 in the COR and SAR samples from the expectation is notable. d, e, Number of sites with heterozygosity in >80% of the population per 10-kb window along chromosome 1 (left) and chromosome 3 (right) in Sardinia (d) and Sicily (e).

The relatedness of the populations from Sardinia and Corsica suggests that they may share genome characteristics that differ from those of the populations from Sicily and Tunisia. Indeed, animals in the populations from Sardinia and Corsica had greater heterozygosity on chromosome 1 than expected under Hardy–Weinberg equilibrium (Fig. 3c), whereas the heterozygosity on the other three chromosomes in these populations and on all four chromosomes in the other populations closely followed expectations. By analysing the J/V haplotype markers in wild populations, we found that animals from Sardinia and Corsica were heterozygous J/V, whereas those from Sicily and Tunisia were homozygous J/J. Moreover, the animals from Sardinia (n = 28) had many sites that were heterozygous in more than 80% of the individuals and were distributed along the length of chromosome 1 except near the ends; conversely, the animals from Sicily (n = 27) had very few such heterozygous sites (Fig. 3d, e and Supplementary Table 8). Few such heterozygous sites were observed on chromosomes 2, 3 and 4 in either population (Fig. 3d, e and Extended Data Fig. 3c, d). These analyses suggest that chromosome 1 specifically evolved and diverged on the islands of Sardinia and Corsica.
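The comparison of observed and expected heterozygosity can be made concrete for a single biallelic site. In the sketch below, the function name and the genotype counts are illustrative; the expected value is the standard Hardy–Weinberg expectation 2pq computed from the observed allele frequencies.

```python
def observed_expected_het(n_hom_ref, n_het, n_hom_alt):
    """Return (observed, expected) heterozygosity for one biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    observed = n_het / n
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    expected = 2 * p * (1 - p)             # Hardy-Weinberg expectation
    return observed, expected

# A site at which all 28 sampled animals are heterozygous.
print(observed_expected_het(0, 28, 0))   # (1.0, 0.5)
# A site in Hardy-Weinberg proportions for comparison.
print(observed_expected_het(25, 50, 25)) # (0.5, 0.5)
```

A site that is heterozygous in every individual has an observed heterozygosity of 1 but an expected value of only 0.5, which is the kind of excess seen for chromosome 1 in the populations from Sardinia and Corsica.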

A sex-primed chromosome

To gain insight into the island-specific suppression of recombination on chromosome 1, we examined the genes located on this chromosome, which contained 39% of all annotated genes. Five of the seven known master regulators of the reproductive system were found on chromosome 1, including nanos5, nhr-1 (refs. 8,9), npy-8 (ref. 6), npyr-1 (ref. 11) and CPEB-2 (ref. 12) (ophis11 and boule-2 (ref. 10) were the two exceptions) (Fig. 4a). Knockdown of any one of these master regulators leads to depletion of both male and female reproductive tissues5,6,8,9,10,11,12. The presence of these genes with crucial roles in sexual development on chromosome 1 suggests that chromosome 1 integrity is important for the maintenance of sexual reproduction. Indeed, the asexual lineage of S. mediterranea, which is devoid of any reproductive organs23, had a translocation from chromosome 1 to chromosome 3 and probably evolved through loss of function of one or more of these genes.

Fig. 4: Acquisition and haplotype-specific expression of sex-related genes.

a, Schematic diagram of the inferred structure of the J and V haplotypes on chromosome 1. Arrows denote putative inversions. Names, positions and sex specificity are shown for the eight genes on chromosome 1 with known key roles in the development of the reproductive system. b, Comparison of read counts for the J and V alleles of key genes in the transcriptomes of whole sexually mature worms and penis papillae (PP). Bonferroni-corrected P values from a two-sided binomial test of equal expression are shown. c, Dorsal view of control (top) and Smed-glp knockdown (bottom) sexually mature planarians. Red dashed lines and arrows indicate the posterior and anterior ends of the pharynx and the bursa canal, respectively. Scale bar, 400 µm. Yellow dashed lines outline the bursa canal. d, Quantification of the length of the bursa canal relative to that of the whole worm. The y axis shows the percentage of whole-worm length spanned by the bursa canal as a box plot, with individual data points shown (n = 4 each for control and knockdown worms). Statistical significance was assessed with a two-sided, two-sample Student’s t-test (***P = 0.00029). RNAi (glp): minimum, 10.28; maximum, 11.4; median, 10.845; first quartile, 10.375; third quartile, 11.31. Control: minimum, 7.95; maximum, 8.85; median, 8.26; first quartile, 8.06; third quartile, 8.6. e, Model of negative regulation of bursa canal development by Smed-glp alleles.

Chromosome 1 also contained three of the four known master regulators of male or female reproductive tissues (CPEB-1 (ref. 12), onecut13 and Bic-C5; dmd-1 (ref. 36) was the exception) (Fig. 4a). The CPEB-1 gene is specifically required for the development of oocytes and yolk glands12. Loss of onecut and Bic-C expression leads to a ‘no-testes’ phenotype without affecting the ovary5,13. The presence of a female-determining gene on a chromosome that does not recombine provides an ideal foundation for the evolution of a sex chromosome. A loss-of-function mutation in the gene on one of the two homologous chromosomes would turn the chromosome with the mutation into a male-determining chromosome. Similar logic applies to a male-determining gene.

Haplotype-specific expression of sex-specific genes is a predicted signature of a sex-primed autosome. To test this prediction, we examined the expression of 20 genes in sexually mature adult worms and in the male copulatory organ, the penis papilla7, in a J/V line. The 20 genes were chosen because they were characterized as having sex-specific function and/or expression in the literature and include the 8 master regulators mentioned above5,12 (Supplementary Table 9). Of these 20 genes, 14 contained at least one heterozygous variant in the coding sequence, which allowed us to determine whether these genes showed biased expression from the J or V haplotype (Extended Data Fig. 8). We found that a gelsolin-like protein12, Smed-glp, was expressed predominantly from the J haplotype in both whole worms and the penis papilla, with more haplotype-specific expression in the male copulatory organ. Four other genes showed biased expression from the J haplotype in whole worms (KY847529.1 and KY847536.1) or in the penis papilla (KY847539.1 and CPEB-2), and two additional genes showed biased expression from the V haplotype in the male copulatory organ (KY847532.1) or in whole worms (Smed-MSY4) (Fig. 4b and Supplementary Table 9). All seven of these genes were localized within the inversions or close to inversion breakpoints (Supplementary Table 5).

We used FISH with hybridization chain reaction to confirm that the expression of Smed-glp is highly enriched in both the penis papilla and the bursa canal7,37 (Extended Data Fig. 9), which is a female organ used to receive sperm from mating partners. To examine the function of Smed-glp, we used RNA interference (RNAi)14,15 to knock down its expression in young hatchlings. After 8 weeks of feeding, both the knockdown and control hatchlings reached sexual maturity. Notably, the bursa canal was much more pronounced in the knockdown animals (Fig. 4c, d). No obvious morphological differences were observed in the male copulatory organ. These results suggest that the J allele of Smed-glp may have a greater role in preventing over-development of the female bursa canal (Fig. 4e), probably by controlling the number of organ-specific muscle fibres38. This is consistent with a previous observation that J/J individuals show higher egg production, suggesting a larger contribution to female reproduction (Fig. 4b in ref. 24; Supplementary Information).

We have thus identified a chromosome in the hermaphrodite planarian S. mediterranea that does not recombine, is enriched in master regulators of reproductive systems, shows allele-specific expression of sex-related genes, and contains genes orthologous to those on the sex chromosome of Schistosoma mansoni (Supplementary Tables 9 and 10, and Supplementary Information), which is the only known Platyhelminthes species with differentiated sex chromosomes39. These observations collectively led us to propose that chromosome 1 of S. mediterranea is primed for evolution into a sex chromosome.

Chromosome 1 of S. mediterranea marks an intriguing system for studying sex chromosome evolution40. Planarians from the islands of Sardinia and Corsica are heterozygous for the J and V haplotypes, whereas those in Sicily and Tunisia exist as J/J homozygotes. In laboratory crosses involving J/V lines, all hatchlings were J/V heterozygotes24. No V/V planarians have been identified either in nature or in laboratory crosses. We genotyped single zygotes from crosses between J/V individuals and determined that homozygous zygotes (J/J or V/V) exist (Extended Data Fig. 10a and Supplementary Information). We propose that early embryonic lethality leads to the loss of J/J and V/V adults in both nature and laboratory crosses. Such lethality may arise from degeneration of coding sequences on the J and V haplotypes as a consequence of crossover suppression. Indeed, chromosome 1 has an elevated rate of transposable elements and mutations introducing stop codons relative to the rest of the genome (Supplementary Table 11) as well as higher nonsynonymous substitution rates (Supplementary Table 13 and Extended Data Fig. 10b). As a consequence, the J and V haplotypes each carry unique sets of functional genes that are silenced or truncated on the other haplotype (Supplementary Table 11). 

The presence of three nested inversions on chromosome 1 suggests the possibility that recombination suppression on this chromosome may have evolved in a stepwise manner similar to that reported for the human X and Y chromosomes17. In support of this possibility, we observed evidence for three evolutionary strata corresponding to the inverted regions. We used PacBio genome sequencing data26 to identify long reads that bridged the inversion breakpoints, and we validated the three inversions identified by Hi-C (Supplementary Table 12 and Supplementary Information). We observed that the rates of heterozygous sites and synonymous substitutions were elevated within the three inversions in a pattern consistent with the evolutionary strata (Fig. 2a, Supplementary Tables 5 and 13, and Extended Data Fig. 10b, c). We used deep whole-genome sequencing data of individuals from laboratory crosses24 to estimate the de novo mutation rate in sexual planarians to be approximately 1.0 × 10⁻⁸ mutations per nucleotide per generation. On the basis of this estimate, it is likely that inversions 3, 1 and 2 evolved approximately 450,000, 320,000 and 260,000 generations ago, respectively.
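To illustrate how a mutation rate translates divergence into age, the sketch below uses the standard neutral molecular-clock relationship, in which two haplotypes separated T generations ago differ at a fraction d ≈ 2µT of neutral sites. This is an assumption made for illustration: the exact dating procedure is described in the Supplementary Information and may differ, and the divergence value in the example is invented rather than taken from the data.

```python
def generations_since_divergence(divergence, mutation_rate=1.0e-8):
    """Estimate generations since two non-recombining haplotypes split.

    divergence: per-site neutral divergence between the haplotypes
                (hypothetical input, for example a synonymous rate).
    mutation_rate: mutations per nucleotide per generation.
    Both lineages accumulate mutations, hence the factor of 2.
    """
    return divergence / (2 * mutation_rate)

# Invented example: 1% neutral divergence.
print(round(generations_since_divergence(0.01)))  # 500000
```

Under this relationship, older strata carry proportionally more neutral differences between the J and V haplotypes, which is why elevated synonymous divergence within an inversion points to an earlier origin.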

Although it is not possible to know how chromosome 1 will evolve, our findings provide a snapshot of an incipient stage that supports the hypothesis that sex chromosomes evolve from homologous autosomes that acquire sex-specific roles and cease to recombine16,17,18,19. The locked J/V heterozygous system may facilitate this process by maintaining sex-specific alleles in the planarian population before the evolution of dioecy41. We propose that the planarian chromosome 1 haplotypes provide a unique opportunity to directly examine the molecular characteristics of a sex-primed autosome.

Methods

Planarian husbandry, RNAi and phenotyping

Sexual planarians were fed organic beef liver once a week. All animals used for experiments were selected randomly. For RNAi, hatchlings around 2 weeks old from strain S2F10 were used. RNAi food was prepared by mixing 1 µg of double-stranded RNA with 10 µl of liver paste15. To examine the RNAi phenotypes, photographs were taken with a stereomicroscope (Zeiss) when the animals were freely swimming. The lengths of the worms or bursa canal were measured with ImageJ software by multiple researchers in a double-blind experimental design. The ratio of the bursa canal to the whole worm was used to mitigate potential variations in worm size and their degree of relaxation caused by their soft bodies.

Chromosome sequencing

Chromosomes were collected from multiple animals of one clonal line, S2, maintained in the laboratory by amputation and regeneration. Chromosome spreads were prepared on nuclease-free membrane slides (Zeiss) according to a previously developed protocol except that, at the last step, the tissues were dissociated into single nuclei and placed onto the slides without squashing with a coverslip42. Single chromosomes were identified under a ×40 lens and were collected into the caps of single PCR tubes by PALM MicroBeam laser microdissection (Zeiss). The collected chromosomes were spun down with a tabletop centrifuge in 4 µl PBS, and the DNA in the pellets was amplified with a REPLI-g Single Cell Kit (Qiagen) for sequencing on a MiSeq or HiSeq 3000 sequencing system (Illumina).

Chromosome-scale genome assembly

A Hi-C sequencing library was prepared from multiple animals of the S2 strain using the enzyme DpnII. Sequenced reads were aligned to dd_Smes_g4.fasta26 with bwa mem (version 0.7.17)43. An assembly file was prepared from the SALSA3 output FINAL.fasta with juicebox_scripts (Phase Genomics). The .hic file was prepared by run-assembly-visualizer.sh from three-dimensional de novo assembly44. The two files were loaded into Juicebox4 for scaffold manipulation using split, merge, order and orient commands and for chromosome assembly. The modified assembly file (chromosome-scale) was converted to fasta (Smed_chr_ref_v1) with juicebox_assembly_converter.py.

Hi-C library construction

A Hi-C library was generated using Phase Genomics Proximo Animal Kit version 3.0. Approximately four worms were finely chopped and were then cross-linked for 20 min at room temperature with end-over-end mixing in 1 ml of Proximo cross-linking solution. The cross-linking reaction was terminated with quenching solution for 15 min at room temperature, again with end-over-end mixing. The quenched tissue was rinsed once with 1× chromatin rinse buffer (CRB), transferred to a liquid nitrogen-cooled mortar and ground to a fine powder. The powder was resuspended in 700 µl Proximo lysis buffer 1 and lysed with glass beads for 20 min at room temperature on a vortex mixer. A low-speed spin was used to clear the large debris, and the chromatin-containing supernatant was transferred to a new tube. After a second spin at higher speed, the supernatant was removed, and the pellet containing the nuclear fraction of the lysate was washed with 1× CRB. After removal of the 1× CRB wash, the pellet was resuspended in 100 µl Proximo lysis buffer 2 and incubated at 65 °C for 15 min. Chromatin was bound to the recovery beads for 10 min at room temperature. The beads were placed on a magnetic stand and washed with 200 µl of 1× CRB.

The chromatin bound on the beads was resuspended in 150 µl Proximo fragmentation buffer, and 2.5 µl of Proximo fragmentation enzyme was added. The reaction was incubated for 1 h at 37 °C, cooled to 12 °C and then incubated with 2.5 µl of finishing enzyme for 30 min. After the addition of 6 µl of Stop Solution, the beads were washed with 1× CRB and were resuspended in 100 µl of Proximo ligation buffer supplemented with 5 µl of proximity ligation enzyme. The proximity ligation reaction was incubated at room temperature for 4 h with end-over-end mixing. To this, 5 µl of reverse cross-linking enzyme was added, and the reaction was incubated at 65 °C for 1 h.

After reversing the cross-links, the free DNA was purified with recovery beads, and the Hi-C junctions were bound to streptavidin beads and washed to remove unbound DNA. The washed beads were used to prepare paired-end deep sequencing libraries using Proximo library preparation reagents.

Oocyte and sperm cell sequencing

Sperm cells were released from sexually mature S2 strain animals into calcium- and magnesium-free buffer (1% BSA). The cell dissociation solution was placed onto a slide and examined under a phase-contrast microscope to identify single sperm cells. Oocytes were released from egg capsules. Single sperm cells or oocytes were transferred into single PCR tubes for amplification with a REPLI-g Single Cell Kit (Qiagen). RADseq libraries and whole-genome libraries were prepared and sequenced on a HiSeq 3000 or NovaSeq S2 sequencing system (Illumina). RADseq sequencing data were analysed as described in the linkage map section. Whole-genome sequencing data were analysed as described in the recombination section.

Synteny analysis

For candidate genes, the protein sequences of genes of interest were obtained from Schmidtea specimens. Such genes were aligned to S. mansoni genomes39 using the Protein to Nucleotide BLAST (tblastn) tool. To systematically examine the synteny of the whole genome, we used SonicParanoid (version 1.3.8)45 to identify one-to-one protein orthologues between S. mediterranea and S. mansoni. The S. mansoni genome assembly (V9) and its protein annotations are available at https://zenodo.org/record/5149023#.Ybk9jn3MK3I.

Linkage map

A J/J line, D5, was crossed to a J/V line, S2F8b, to build an F2 population of 93 animals. Genomic DNA was extracted from a fragment of each animal using an Easy-DNA gDNA Purification Kit (K180001, ThermoFisher). Sequencing libraries for RADseq were prepared according to the procedures of Adapterama III31 with a few modifications. Genetic variants were identified using Stacks (version 2.41)46,47. All variants were filtered with VCFtools (version 0.1.14)48 to remove insertions and deletions and to select biallelic SNVs. Clusters of markers located within 200 bp were removed because they were likely to correspond to repetitive elements. Markers disobeying Mendelian segregation were also removed. Only markers that were homozygous in both parents were used. The linkage map was built with R/QTL49.
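The cluster filter described above can be sketched as follows. The function name is hypothetical, and the choice to discard every marker that has any neighbour closer than 200 bp (rather than keeping one representative per cluster) is an assumption of this example.

```python
def remove_clustered_markers(positions, min_gap=200):
    """Drop markers on one chromosome that lie within min_gap bp of a neighbour."""
    pos = sorted(positions)
    kept = []
    for i, p in enumerate(pos):
        near_previous = i > 0 and p - pos[i - 1] < min_gap
        near_next = i + 1 < len(pos) and pos[i + 1] - p < min_gap
        if not (near_previous or near_next):
            kept.append(p)
    return kept

print(remove_clustered_markers([100, 250, 1000, 5000, 5100]))  # [1000]
```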

Quantifying recombination

Sequencing reads from S2 and its oocytes and sperm cells were aligned to Smed_chr_ref_v1 with bwa mem (version 0.7.17). Genetic variants were jointly called by using the Genome Analysis Toolkit (GATK, version 4.1.4.1) with GenomicsDB and GenotypeGVCFs50. Biallelic heterozygous markers in S2 were further filtered by removing abnormal markers, including those with no segregation of two alleles in the gametes, clusters of markers in close proximity (<200 bp apart) and markers of heterozygosity in sperm cells. The J and V haplotypes were manually phased for oocytes without recombination and were phased by MPR.genotyping51 for all gametes. The MPR.genotyping package was also used to impute or correct missing or erroneous genotypes. The final genotype bins were used to identify and visualize recombination with customized R code. Quantification of recombination was focused on crossovers between long tracks of haplotypes along a chromosome. Putative gene conversion events such as short tracks of haplotype switches encompassing <1% of the chromosome length were not included.
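The crossover-counting rule described above (count switches between long haplotype tracks and ignore short tracks of less than 1% of the chromosome length) can be sketched as below. This is not the customized R code used in the study; the function name, the input layout and the way track length is measured (first to last marker of a run) are assumptions of the example.

```python
def count_crossovers(calls, chrom_length, min_fraction=0.01):
    """Count crossovers in one gamete along one chromosome.

    calls: list of (position_bp, haplotype) tuples sorted by position,
           with haplotype given as 'J' or 'V'.
    """
    # Collapse consecutive identical calls into tracks [haplotype, start, end].
    tracks = []
    for pos, hap in calls:
        if tracks and tracks[-1][0] == hap:
            tracks[-1][2] = pos
        else:
            tracks.append([hap, pos, pos])
    # Discard short tracks (putative gene conversions or genotyping errors).
    threshold = min_fraction * chrom_length
    long_tracks = [t for t in tracks if t[2] - t[1] >= threshold]
    # A crossover is a haplotype switch between adjacent long tracks.
    return sum(1 for a, b in zip(long_tracks, long_tracks[1:]) if a[0] != b[0])

gamete = [(0, "J"), (10_000_000, "J"), (30_000_000, "J"), (30_100_000, "V"),
          (31_000_000, "J"), (50_000_000, "J"), (60_000_000, "V"), (90_000_000, "V")]
print(count_crossovers(gamete, 100_000_000))  # 1
```

In the example, the single V call at 30.1 Mb is treated as a short track and ignored, so only the J-to-V switch between 50 Mb and 60 Mb is counted.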

Gene expression analysis

To examine the gene content on chromosome 1, transcriptome data were downloaded from the NCBI Sequence Read Archive (SRA). To examine genes related to sexual development, transcriptomes from sexual adults9,12,52, sexual juveniles9 and sexual adults with nhr-1 RNAi9 were used. To examine stem cell-enriched genes, transcriptomes from sorted X1 cells and CIW4 were used53,54.

All sequencing data were aligned to dd_Smed_v6 (ref. 27) with bwa mem (version 0.7.17). Differential gene expression was analysed with DESeq2 (version 1.26.0)55. Expression was quantified at the transcript level with kallisto (version 0.44.0)56 and was imported and summarized to gene-level count matrices by tximport57.

Haplotype-specific expression

To examine the haplotype-specific expression of critical regulators of the reproductive system, mRNA was extracted from six sexually mature animals of the S2 J/V line and was analysed as three biological replicates, with two animals pooled into each replicate. Nine penis papillae were dissected from nine sexual adult animals of the same line and were analysed as three biological replicates, with penis papillae from three animals pooled into each replicate. All mRNA samples were extracted on the same day and were processed at the same time for library preparation and sequencing to minimize technical variation. Libraries for RNA-seq were prepared with a Clontech SMARTer Stranded Total RNA-seq (Pico) Kit. The workflow consisted of converting total RNA to cDNA and then adding adaptors for Illumina sequencing through PCR. The PCR products were purified, and ribosomal DNA was depleted. The cDNA fragments were further amplified with primers universal to all libraries. Lastly, the PCR products were purified again to yield the final cDNA library. Different adaptors were used to multiplex samples in one lane. Sequencing was performed on an Illumina NovaSeq 6000 with a 150-bp paired-end-read run. Data quality checks were conducted using the Illumina Sequencing Analysis Viewer. Demultiplexing was performed using Illumina Bcl2fastq2 version 2.17.

All sequencing data were aligned to dd_Smed_v6 (ref. 27) with bwa mem (version 0.7.17). To ensure accuracy, haplotype-specific expression of the 14 genes of interest was manually examined using the Integrative Genomics Viewer. J or V allele counts were identified for each biallelic variant in the exons. For a particular gene, the allele counts were aggregated from all variants on all exons for the J and V haplotypes. The allele counts were then subjected to binomial testing and Bonferroni correction58 to determine whether the observed allele bias was statistically significant. P > 1 after Bonferroni correction was set to 1.
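
The aggregation and testing steps can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: it assumes a two-sided exact binomial test against a null of equal J and V expression (probability 0.5), neither of which is stated explicitly above. Function names, gene names and counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial P value under an assumed null of p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one; integer arithmetic avoids floating-point underflow.
    """
    observed = comb(n, k)
    tail = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return tail / 2**n


def bonferroni(p_values, n_tests=None):
    """Multiply each P value by the number of tests; values above 1 are set to 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


def allele_bias(variant_counts_by_gene, n_tests=None):
    """Aggregate (J, V) counts over all exonic variants of each gene, then test."""
    genes = list(variant_counts_by_gene)
    raw = []
    for gene in genes:
        j_total = sum(j for j, _ in variant_counts_by_gene[gene])
        v_total = sum(v for _, v in variant_counts_by_gene[gene])
        raw.append(binomial_two_sided_p(j_total, j_total + v_total))
    return dict(zip(genes, bonferroni(raw, n_tests)))


# Hypothetical per-variant (J, V) read counts, for illustration only.
example = {"geneA": [(0, 6), (0, 4)], "geneB": [(3, 2), (2, 3)]}
adjusted = allele_bias(example)
```

Passing `n_tests=14` would correct for the 14 genes of interest, assuming all of them are tested together. The exact test enumerates every outcome, so it is practical for read counts in the thousands but slow for much deeper coverage.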

De novo map, parameter optimization and phylogenetic inference

All datasets were run through the de novo pipeline as implemented in Stacks version 2.52 (refs. 46,47). First, paired-end reads were demultiplexed and filtered for quality using PROCESS_RADTAGS and were truncated to a length of 135 bp. Individual reads with Phred scores below 30 or uncalled bases were discarded (96.9% of reads passed the quality filters). Optimal parameters were identified following the guidelines of ref. 59 by running multiple iterations of the de novo pipeline on a subset of 13 samples from the same population (Sic_mar), as recommended60, and varying just one parameter with each new iteration. We varied the minimum stack depth (-m) between 1 and 6 (m1–m6), the number of mismatches allowed between stacks (-M) between 1 and 10 (M1–M10), and the number of mismatches allowed to merge catalogue loci (-n) while keeping all other parameters constant (m3, M2 and n0). For each iteration, we then compared the number of polymorphic loci assembled across samples, using a sample representation cut-off of 80% (r80), and the gain or loss of polymorphic loci relative to the previous iteration. Once -m and -M were optimized, we assessed -n by evaluating the change in the number of polymorphic loci for n = M − 1, n = M and n = M + 1. RAD loci were then assembled using the denovo_map.pl wrapper in Stacks with the following parameters: m = 3, M = 2 and n = 3.
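
The one-parameter-at-a-time design can be written out as a short Python sketch. It only enumerates the parameter sets implied by the ranges and baseline given above; it does not run Stacks, and the function names are hypothetical.

```python
def one_at_a_time_grid(baseline, ranges):
    """Return parameter sets that differ from the baseline in at most one parameter."""
    runs = []
    for name, values in ranges.items():
        for value in values:
            params = dict(baseline)
            params[name] = value
            if params not in runs:  # the baseline itself is listed only once
                runs.append(params)
    return runs


def n_candidates(best_M):
    """Values of -n assessed once -M is fixed: M - 1, M and M + 1."""
    return [best_M - 1, best_M, best_M + 1]


# Baseline and ranges taken from the text: m3, M2, n0; m1-m6; M1-M10.
baseline = {"m": 3, "M": 2, "n": 0}
ranges = {"m": range(1, 7), "M": range(1, 11)}
runs = one_at_a_time_grid(baseline, ranges)
```

With M = 2, `n_candidates(2)` gives 1, 2 and 3, and the final setting of n = 3 corresponds to n = M + 1.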

Using POPULATIONS, we kept assembled loci that were present in at least 75% of all individuals. A minor allele frequency (MAF) filter of 0.04 (--min-maf) was applied to remove singleton SNPs that could mask population structure, and a maximum observed heterozygosity filter of 0.99 (--max-obs-het) was applied to remove potentially paralogous loci61. To build the phylogenetic tree, we concatenated all RADseq loci that remained after filtering (--phylip-var-all). Phylogenetic trees were built by maximum likelihood using RAxML-NG version 0.9.0 (ref. 62), starting from a random seed and applying a GTR+G substitution model and 1,000 bootstrap replicates. The sample from Menorca (Sme7-5_men) was used as an outgroup to root the tree.
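
The three filters can be illustrated with a minimal Python sketch applied to a single biallelic SNP. It is a simplification, not the POPULATIONS implementation: the representation threshold is applied here per SNP rather than per locus, and the genotype encoding (0, 1 or 2 copies of the alternative allele, `None` for missing) and the function name are assumptions made for illustration.

```python
def snp_passes(genotypes, min_call_rate=0.75, min_maf=0.04, max_obs_het=0.99):
    """Apply representation, MAF and observed-heterozygosity filters to one SNP."""
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    obs_het = sum(1 for g in called if g == 1) / len(called)
    return maf >= min_maf and obs_het <= max_obs_het


# Hypothetical genotype vectors, for illustration only.
well_sampled = [0, 0, 1, 2, None, 0, 1, 0]  # kept
singleton = [1] + [0] * 19                  # MAF = 1/40 = 0.025, removed
all_heterozygous = [1] * 10                 # observed heterozygosity 1.0, removed
poorly_sampled = [0, 1, None, None]         # genotyped in 50% of samples, removed
```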

Population structure analysis

For this analysis, we excluded the outgroup sequence and ran POPULATIONS to retain loci present in all populations (-p 10) and in 75% of the individuals in each population (-r 0.75). From the loci that passed these filtering criteria, a random whitelist of 1,000 loci was generated and run through POPULATIONS again with the same criteria but retaining only the first SNP at each locus (--write-single-snp). The output was exported in Structure format, and Structure version 2.3.4 (ref. 63) was used to infer population structure with a burn-in of 10,000 steps and 100,000 MCMC steps, with 20 iterations for each value of K from 1 to 11. The resulting files were run through Structure Harvester64, and the optimal K was determined65.
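
Drawing the random whitelist can be sketched in Python as below. The function names and the seed argument are hypothetical, and the output layout (one locus ID per line) is an assumption about the whitelist file format that should be checked against the Stacks manual.

```python
import random


def make_whitelist(locus_ids, size=1000, seed=None):
    """Draw a random subset of locus IDs without replacement."""
    rng = random.Random(seed)
    unique = sorted(set(locus_ids))
    if len(unique) <= size:
        return unique  # fewer loci than requested: keep them all
    return sorted(rng.sample(unique, size))


def format_whitelist(locus_ids):
    """Render the whitelist as text, one locus ID per line (assumed layout)."""
    return "\n".join(str(locus_id) for locus_id in locus_ids) + "\n"


# Hypothetical locus IDs 1..5000 standing in for the loci that passed filtering.
whitelist = make_whitelist(range(1, 5001), size=1000, seed=1)
whitelist_text = format_whitelist(whitelist)
```

Fixing the seed makes the draw reproducible; the seed used in the study, if any, is not reported here.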

DNA FISH

For telomere FISH on oocyte chromosomes, ovaries were dissected as reported previously66,67. Hybridization was carried out as reported with chromosome spreads42 except that the ovaries were kept in suspension in washing buffer or hybridization buffer. Before hybridization, the dissected ovaries were treated with a digestion buffer (0.1% SDS and 10 µg ml−1 proteinase K (Qiagen) in 0.3% Triton X-100 in PBS) for 10 min at room temperature. The repeat located near centromeres in Fig. 2c had the following sequence: TCTGGACGGAAATTTTTTAATCTTTATAGGCTTGTATCTCTGTCAATTTTTATTTGTTTTCATAATCTTTGATATATTTCTCGATAACTTTTGATTCTCTACATGATAGCATTTTAAAAATTGCAAAAATCATAACGGGCTCGTCAAACACAAGTCAT.

Hybridization chain reaction and RNA FISH

To examine tissue expression of the genes glp and smedwi-1, we used probes and buffers for third-generation in situ hybridization chain reaction purchased from Molecular Instruments. Sexually mature planarians were treated with 7.5% N-acetyl-l-cysteine (Sigma-Aldrich) for 10 min and were then fixed in 4% paraformaldehyde (Electron Microscopy Sciences; 16% solution diluted 1:4 in PBS) for 20 min. The copulatory apparatus was dissected into a 1.5-ml RNase-free tube. The remaining steps followed the Molecular Instruments hybridization chain reaction RNA FISH protocol for whole-mount mouse embryos.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.