Introduction

Forward genetic screening is one of the most common and effective methods for identifying phenotypic mutants, which can then be genetically dissected to pinpoint the causal mutations. Unfortunately, genetic screening based on spontaneous mutations is infeasible for studying genotype-phenotype relationships in eukaryotes because mutation rates are low, usually on the order of 10⁻⁸ to 10⁻¹⁰ per nucleotide site (Baer et al. 2007; Lynch et al. 2008; Krasovec et al. 2017). Approximately 1/(mutation rate per gene) individuals need to be screened to obtain a mutation in a particular gene (Kutscher and Shaham 2014). Although this number can be reached with bacteria because of their fast reproduction and ease of maintenance, it is impractical to reach such numbers with multicellular eukaryotes, which have much lower reproduction rates and longer generation times and are subject to purifying selection.

Since H.J. Muller's discovery of X-ray-induced mutations (Muller 1927), mutagens have been used to establish mutagenized screening populations of manageable size while keeping lethality and sterility to a minimum (Kutscher and Shaham 2014). To date, a variety of mutagens (e.g., N-ethyl-N-nitrosourea, ENU; trimethylpsoralen with ultraviolet light, UV/TMP) are available to mutagenize both prokaryotes and eukaryotes, allowing researchers to implement forward genetic screens efficiently.

Among these mutagens, ethyl methanesulfonate (EMS) has been commonly used for genetic screens in many different biological systems (Sega 1984). As an alkylating agent, EMS induces chemical modifications of nucleotides. Brookes and Lawley (1961) first demonstrated that EMS primarily alkylates guanine, forming O⁶-ethylguanine, which causes mutations by mispairing with thymine during DNA replication and repair. Consequently, EMS mutagenesis is heavily biased towards G:C to A:T transitions (Coulondre and Miller 1977). In addition to single-base mutations, EMS also causes indels (insertions and/or deletions) and chromosomal breaks, although to a much lesser extent (Sega 1984; Greene et al. 2003). Because it induces mutations randomly across the genome (Greene et al. 2003), EMS can be used to generate loss- or gain-of-function mutants as well as weak nonlethal alleles (Lee et al. 2003).

EMS mutagenesis experiments were first performed in bacteriophage T2 by Loveless and Haddow (1959) and were later extended to Drosophila melanogaster (Lewis and Bacher 1968) and Caenorhabditis elegans (Brenner 1974). Although EMS mutagenesis has since been applied to a growing number of organisms, including Arabidopsis thaliana (McCallum et al. 2000; Greene et al. 2003; Martín et al. 2009) and Saccharomyces cerevisiae (Prakash and Higgins 1982; Mobini-Dehkordi et al. 2008), the list of species with an established EMS mutagenesis screening protocol remains limited.

In this work, we aim to develop an effective genetic screening strategy based on EMS mutagenesis for the freshwater microcrustacean Daphnia. Distributed worldwide in nearly all kinds of freshwater habitats, Daphnia has been studied for more than 200 years (Ebert 2005) and has been a model system in ecology, toxicology, and evolution (Altshuler et al. 2011). As the first crustacean to have its whole genome sequenced (Colbourne et al. 2011), and with the development of new genomic tools, Daphnia has gained tremendous momentum as a system that allows researchers to address many consequential biological questions with genomic insights.

Daphnia represents an important pan-crustacean lineage in metazoan evolution. However, the functions of about a third of Daphnia genes remain poorly understood because these genes are lineage-specific and lack orthologues in other eukaryotic genomes (Colbourne et al. 2011; Ye et al. 2017). Understanding the functions of these lineage-specific genes is critical for gaining insights into invertebrate evolution, the genomic adaptation to a freshwater lifestyle, and the genetic basis of novel phenotypes in Daphnia. We therefore envision that a forward genetic screening approach would be a valuable tool to aid such efforts.

Daphnia typically reproduces by cyclical parthenogenesis, switching between clonal (asexual) and sexual reproduction depending on environmental conditions. Under favorable conditions, females reproduce asexually, producing chromosomally unreduced, diploid embryos that directly develop into genetically identical daughters. Under stressful conditions (e.g., crowding, lack of food), these directly developing embryos can instead develop into males. Environmental stress can also induce females to produce haploid eggs through meiosis, which become diapausing embryos upon fertilization by sperm. These diapausing embryos, deposited in a protective case (i.e., ephippium), hatch under suitable environmental conditions and can remain viable for many years, in some cases for hundreds of years (Frisch et al. 2014). Interestingly, some Daphnia lineages have transitioned to obligate parthenogenesis (Lynch et al. 2008; Xu et al. 2015). These lineages forgo sex and produce diapausing embryos parthenogenetically under stress, while still asexually producing directly developing embryos under favorable conditions.

Cyclically parthenogenetic reproduction and several other life-history characteristics make Daphnia well suited to large-scale forward genetic screening using EMS mutagenesis. Multiple clonal females of the same genotype (F0 individuals) can be exposed to EMS, during which mutations can be introduced into the genomes of their oocytes (Fig. 1). EMS-exposed females can then asexually produce mutant female offspring (F1s), each carrying the EMS-induced germline mutations of the oocyte it developed from in the heterozygous state. Each F1 can be used to propagate genetically identical female and male progeny, and these siblings can be crossed (equivalent to selfing) to produce sexual progeny (F2s), each of which is expected to carry 25% of the F1's EMS-induced mutations in the homozygous state (Fig. 1). These F2s can then be screened for phenotypes of interest, followed by further genetic analyses to pinpoint the underlying genotypes. Furthermore, the short generation time (7–10 days), large number of broods per female, and easy maintenance under laboratory conditions together make it feasible to screen thousands of Daphnia mutant lines.
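The 25% expectation follows from Mendelian segregation: the female and male propagated from one F1 are clonal siblings, so both are heterozygous at every EMS-induced site, and crossing them is equivalent to selfing. The short Python sketch below enumerates the Punnett square for a single site. It assumes that sites segregate independently (ignoring linkage), and the mutation count `k` is a hypothetical value chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def f2_genotype_probs(parent1=("+", "m"), parent2=("+", "m")):
    """Probability of each F2 genotype at one site from a cross of two parents.

    "+" denotes the wild-type allele and "m" the EMS-induced mutant allele.
    Each parent transmits either of its two alleles with probability 1/2, so
    each of the four allele combinations has probability 1/4.
    """
    probs = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # treat "+m" and "m+" as the same genotype
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


probs = f2_genotype_probs()  # F1-derived siblings are clonal, so both parents are "+m"
k = 300  # hypothetical number of heterozygous EMS-induced mutations in one F1 line
print(probs)
print("expected homozygous mutant sites per F2:", probs["mm"] * k)
```

Under these assumptions, each F2 is expected to be homozygous for about k/4 of the F1's mutations and heterozygous for about half of them.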

Fig. 1: Forward genetic approach for obtaining mutant lines in Daphnia.

This study used F1 mutants to determine the mutation rate and spectrum of EMS-induced heritable mutations.

Implementing this EMS screening strategy requires an understanding of the genome-wide rate of EMS-induced heritable mutations in Daphnia. Accurate estimates of this rate allow us to gauge the number of mutagenized individuals needed to reach saturation, the point at which nearly every gene in the genome has been mutated a few times. However, no previous study has examined EMS-induced mutations in Daphnia. We therefore performed a series of experiments to investigate the genome-wide rate and spectrum of EMS-induced heritable mutations in Daphnia.

Our experiments test three main hypotheses that have major implications for screening design. First, we hypothesize that a higher, non-lethal concentration of EMS causes a higher germline base-substitution rate than a lower concentration, whereas the mutation spectrum remains similar across concentrations because of the mutagenic properties of EMS. Understanding the effect of EMS concentration on base-substitution rates can help determine how to most efficiently introduce the desired number of mutations into the screening population. In this study, we test the mutational effects of 10 vs. 25 mM EMS solutions.

Second, we hypothesize that different broods of the same female are affected by EMS-induced mutations independently (e.g., in the locations of mutations and in the mutation rate). In Daphnia females, all primary oocyte nuclei are deposited in the germarium at the posterior end of the ovary (Kato et al. 2012). Upon exposure to an EMS solution, each primary oocyte nucleus could therefore be mutagenized independently. If this hypothesis holds, the different broods of EMS-exposed females can all be used to establish mutant lines. To test it, we specifically examine the heritable mutations in the first, second, and third broods of EMS-treated Daphnia females.

Third, we hypothesize that the EMS-induced heritable base-substitution rate is highly similar among Daphnia species. Although spontaneous mutation rates in different Daphnia species/populations may evolve to different levels, largely because of differences in their population-genetic environments (Ho et al. 2020), the EMS-induced mutation rate is likely to be highly similar among species because EMS concentration and mode of exposure are expected to be the main determinants of the induced rate. To test this hypothesis, we examined EMS-induced mutations in three species (cyclically parthenogenetic D. pulex, D. pulicaria, and obligately parthenogenetic D. pulex), using multiple genotypes from different populations of each species.

Lastly, based on our estimates of the EMS-induced mutation rate in Daphnia, we performed a power analysis of experimental design (e.g., the numbers of F1s and F2s required) to guide future genetic screens in Daphnia.

Materials and methods

Experimental animals

A total of three cyclically parthenogenetic (CP) Daphnia pulex isolates (Tex21, SW4, and Povi4), three obligately parthenogenetic (OP) Daphnia pulex isolates (DB4-1, DB4-2, and DB4-4), and three CP Daphnia pulicaria isolates (AroMoose, RLSD26, and Warner5) were used in this study. These isolates were previously collected from various pond and lake populations across the US and Canada (Supplementary Table S1). They have been kept in the lab as clonally reproducing lines in artificial lake water (Kilham et al. 1998) under a 16:8 h (light:dark) cycle at 18 °C. We fed them with the green algae Scenedesmus obliquus twice a week.

Determining tolerable EMS concentrations

Survival experiments were performed at four EMS concentrations (10, 25, 50, and 100 mM) to determine a tolerable range for Daphnia females. Because no prior data on EMS tolerance in Daphnia were available, these four concentrations were chosen by reference to standard EMS mutagenesis protocols in other model organisms such as C. elegans and D. melanogaster. For C. elegans, the standard protocol entails exposure to 50 mM EMS for 4 h (Brenner 1974), yielding a mutation rate of 2.5 × 10⁻³ per gene per generation (Gengyo-Ando and Mitani 2000), whereas D. melanogaster is fed 25 mM EMS (Lewis and Bacher 1968), yielding a mutation rate of 1 × 10⁻³ per gene per generation (Greenspan 1997).

We tested these concentrations on mature females from the three OP D. pulex isolates (DB4-1, DB4-2, and DB4-4). For each isolate and each concentration (10, 25, 50, and 100 mM), three replicate groups of ten females were placed together in 1 mL of EMS solution. The exposure lasted four hours. The treated animals were then transferred to artificial lake water, and survival was recorded after 24 h.

No animals survived the 4-hour exposure to 50 or 100 mM EMS, whereas 100% and 60% of females survived the 10 and 25 mM treatments, respectively (see Results and Supplementary Table S2). We therefore used these two lower concentrations in our subsequent mutagenesis experiments.

Establishing EMS mutant lines

To examine the rate and spectrum of heritable mutations induced by 10 and 25 mM EMS, sexually mature Daphnia females from each isolate were exposed to each of these two concentrations for 4 h. For each isolate, the exposed animals (F0 individuals) were kept individually under benign laboratory conditions. The first brood of asexually produced offspring (F1s) from each F0 was collected, and each F1 was isolated individually because different F1s derive from oocytes whose DNA may be differentially mutagenized by EMS. For each natural Daphnia isolate at each concentration, we established two replicate mutant lines by propagating two different F1s clonally, each into a mass asexual culture (Fig. 2A). The asexual progeny of each line were whole-genome sequenced to detect EMS-induced heritable mutations that had occurred in the germline of the F0 individuals.

Fig. 2: Experimental procedure for establishing EMS mutant lines.

A Establishing replicate mutant lines of a Daphnia isolate. B Establishing brood-specific mutant lines of a Daphnia isolate.

Furthermore, to determine whether the EMS-induced mutation rate and spectrum differed between consecutive broods of the same F0 females, EMS mutant lines were established with the same procedure as above, using one F1 from each of the first (BR1), second (BR2), and third (BR3) broods at both 10 and 25 mM EMS (Fig. 2B). We examined brood effects in three isolates: AroMoose (D. pulicaria), Tex21 (CP D. pulex), and DB4-4 (OP D. pulex).

Whole-genome sequencing of EMS mutant lines

We collected 40–50 clonal offspring from each EMS mutant line for DNA extraction using the CTAB (cetyl trimethyl ammonium bromide) method (Doyle and Doyle 1987). DNA concentrations were measured with a Qubit 4.0 Fluorometer (Thermo Fisher), and DNA quality was checked by electrophoresis on a 2% agarose gel. Sequencing libraries were prepared by Novogene following the standard Illumina library preparation protocol. Each library was sequenced on an Illumina NovaSeq 6000 platform with 150-bp paired-end reads, with a target coverage of 30× per mutant line.

Computational pipeline for identifying mutations

Our computational pipeline for identifying mutations combines the strengths of the mutation-calling procedures used in previous Daphnia mutation accumulation studies (Keith et al. 2016; Flynn et al. 2017; Bull et al. 2019). We used BWA-MEM (Burrows-Wheeler Alignment Tool) version 0.7.17 (Li and Durbin 2010) with default parameters to align the raw reads of each mutant line to either the Daphnia pulex (Ye et al. 2017) or the D. pulicaria (Jackson et al. 2021) reference genome. SAMtools (Li et al. 2009) was used to remove reads mapped to multiple locations in the genome and retain only uniquely mapped reads for downstream analyses, which helps reduce false-positive mutation calls. The MarkDuplicates function of Picard tools (http://broadinstitute.github.io/picard/) was used to locate and tag PCR duplicates.

We used the mpileup and call functions of BCFtools (Li 2011) with default parameters to generate genotype likelihoods and genotype calls in a VCF file containing all EMS mutant lines derived from each natural Daphnia isolate. In addition, we added the following FORMAT and INFO tags to the VCF file: AD (allelic depth), DP (number of high-quality bases), ADF (allelic depth on the forward strand), and ADR (allelic depth on the reverse strand). We then used the filter function of BCFtools to retain only biallelic single nucleotide polymorphism (SNP) sites with a quality score (QUAL) ≥ 20, a sequencing depth (DP) ≥ 10, and a distance ≥ 50 bp from an indel in each mutant line. We did not examine indels because previous work has shown very low rates of EMS-induced indels, with < 2.8 deletions and < 0.6 insertions per mutant line (Flibotte et al. 2010; Shiwa et al. 2012; Henry et al. 2014).
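The site-level filters above were applied with BCFtools. Purely to make the criteria explicit, the Python sketch below restates them for a simplified record type. `Site` is a hypothetical, illustrative structure rather than a VCF parser, and indels are represented only by their start positions, ignoring indel length; neither choice reflects BCFtools internals.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical, simplified stand-in for one VCF record in one mutant line."""
    chrom: str
    pos: int      # 1-based position
    ref: str      # reference allele
    alts: tuple   # alternate allele(s)
    qual: float   # QUAL column
    depth: int    # DP (read depth)


def is_biallelic_snp(site):
    """True if the site has exactly one single-base alternate allele."""
    return len(site.ref) == 1 and len(site.alts) == 1 and len(site.alts[0]) == 1


def distance_to_nearest_indel(site, indel_positions):
    """Distance (bp) to the closest indel start on the same scaffold.

    indel_positions maps each scaffold name to a sorted list of indel start
    positions; indel length is ignored in this sketch.
    """
    positions = indel_positions.get(site.chrom, [])
    if not positions:
        return float("inf")
    i = bisect.bisect_left(positions, site.pos)
    distances = []
    if i < len(positions):
        distances.append(positions[i] - site.pos)  # nearest indel at or downstream
    if i > 0:
        distances.append(site.pos - positions[i - 1])  # nearest indel upstream
    return min(distances)


def passes_site_filters(site, indel_positions, min_qual=20, min_depth=10, min_indel_dist=50):
    """Biallelic SNP with QUAL >= 20, DP >= 10, and >= 50 bp from any indel."""
    return (
        is_biallelic_snp(site)
        and site.qual >= min_qual
        and site.depth >= min_depth
        and distance_to_nearest_indel(site, indel_positions) >= min_indel_dist
    )


indels = {"scaffold_1": [1000, 5000]}  # hypothetical indel positions
print(passes_site_filters(Site("scaffold_1", 2000, "G", ("A",), 35.0, 22), indels))
print(passes_site_filters(Site("scaffold_1", 1040, "G", ("A",), 35.0, 22), indels))
```

In this example, the first site passes, and the second fails because it lies 40 bp from an indel.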

A custom Python script (all scripts in this study are available at https://github.com/Marelize007/EMS_mutagenesis_daphnia) was used to identify mutations with a consensus method. For each natural Daphnia isolate, we generated one VCF file containing the genotype data of all EMS mutant lines derived from it. For each SNP site, we established the consensus genotype call (i.e., the genotype of the natural isolate) using a majority rule: with N samples in a VCF file, the consensus genotype at a site must be supported by at least N − 1 samples. If an EMS line showed a genotype different from the consensus genotype, a tentative mutation was identified.
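A minimal sketch of the N − 1 majority rule is shown below. It is not the authors' script (which is available in the repository linked above). The function names, the genotype strings (e.g., "0/0", "0/1"), the assumption that every line has a genotype call, and the choice to return no consensus when fewer than three lines are present are all illustrative assumptions.

```python
from collections import Counter


def consensus_genotype(genotypes):
    """Genotype shared by at least N - 1 of the N lines, or None if there is none.

    Assumption: with fewer than three lines the N - 1 rule cannot separate a
    consensus from a deviant line, so no consensus is returned.
    """
    n = len(genotypes)
    if n < 3:
        return None
    genotype, count = Counter(genotypes).most_common(1)[0]
    return genotype if count >= n - 1 else None


def tentative_mutations(genotypes_by_line):
    """Return (consensus, lines that deviate from it) for one SNP site.

    genotypes_by_line maps each mutant line's name to its genotype call,
    e.g. {"line_A": "0/0", "line_B": "0/1", ...}.
    """
    consensus = consensus_genotype(list(genotypes_by_line.values()))
    if consensus is None:
        return None, []
    deviants = [line for line, gt in genotypes_by_line.items() if gt != consensus]
    return consensus, deviants


site = {"line_A": "0/0", "line_B": "0/0", "line_C": "0/1", "line_D": "0/0"}
print(tentative_mutations(site))
```

Here line_C is flagged as carrying a tentative mutation because the other three lines agree on "0/0".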

This approach detects only mutations that are unique to one EMS line and not shared among multiple lines derived from the same isolate. The rationale is that EMS induces mutations at random locations in the genome; with no more than 10 mutant lines per natural isolate and a Daphnia genome size of ~200 Mb, it is highly unlikely that two lines would carry mutations at the same site.
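This rationale can be made quantitative with a back-of-the-envelope calculation. If each of L lines carries k mutations placed at random among G sites, the expected number of sites shared by one pair of lines is about k²/G, so the expected number of coincidences across all pairs is C(L, 2) × k²/G. The sketch below uses G = 2 × 10⁸ (the ~200-Mb genome), L = 10, and a hypothetical k = 500, which is of the same order as the roughly 200–400 mutations per line implied by the average per-site rates of ~1–2 × 10⁻⁶ reported in the Results.

```python
from math import comb


def expected_shared_sites(mutations_per_line, genome_size, n_lines):
    """Approximate expected number of sites mutated in more than one line.

    For one pair of lines with k random mutations each among G sites, the
    expected overlap is k * k / G; summing over all pairs gives the total.
    Three-way coincidences are negligible at these scales and are ignored.
    """
    pairs = comb(n_lines, 2)
    return pairs * mutations_per_line ** 2 / genome_size


# k = 500 is a hypothetical per-line mutation count; G ~ 200 Mb; 10 lines.
print(expected_shared_sites(500, 200_000_000, 10))
```

Even at this value, fewer than 0.06 coincident sites are expected per isolate, so the loss of true mutations through the consensus rule should be negligible.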

We further examined these tentative mutations to establish the final pool of mutations using two criteria. First, a mutant allele must be supported by at least two forward and two reverse reads, to avoid false positives due to sequencing errors. Second, a mutant genotype is accepted only when it is heterozygous and derived from a homozygous consensus (i.e., wild-type) genotype. This criterion avoids false positives caused by allele dropout at heterozygous sites, which can result from insufficient sequencing coverage or library-construction artifacts. We note that this criterion excludes less than 2% of genomic sites from our analyses, because heterozygosity in natural Daphnia isolates is about 1–2% (Lynch et al. 2017).
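The two criteria can be expressed as a single predicate. In this hypothetical sketch, genotypes are unphased, biallelic VCF-style strings, and `mutant_fwd`/`mutant_rev` stand for the forward- and reverse-strand read counts (from the ADF and ADR tags) supporting the mutant allele. These names are illustrative assumptions, not part of the authors' code.

```python
HOMOZYGOUS = {"0/0", "1/1"}
HETEROZYGOUS = {"0/1"}  # unphased biallelic calls (assumption)


def passes_final_criteria(consensus_gt, line_gt, mutant_fwd, mutant_rev, min_fwd=2, min_rev=2):
    """Apply the two criteria used to build the final mutation pool.

    mutant_fwd / mutant_rev: forward- and reverse-strand reads supporting the
    mutant allele (from ADF / ADR). If the consensus is "1/1" and the line is
    "0/1", the mutant allele is the reference allele.
    """
    # Criterion 1: at least two forward and two reverse reads support the mutant allele.
    strand_support = mutant_fwd >= min_fwd and mutant_rev >= min_rev
    # Criterion 2: a heterozygous call arising from a homozygous consensus.
    het_from_hom = consensus_gt in HOMOZYGOUS and line_gt in HETEROZYGOUS
    return strand_support and het_from_hom


print(passes_final_criteria("0/0", "0/1", 3, 2))  # accepted
print(passes_final_criteria("0/1", "0/0", 5, 5))  # rejected: possible allele dropout
```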

Mutation validation with Sanger sequencing

To evaluate the robustness of our mutation-calling pipeline, we used Sanger sequencing to verify 20 randomly selected mutations from the final pool. Primers were designed with Primer3 (Rozen and Skaletsky 1999) to amplify a 300–400-bp region of DNA centered on each mutation. We performed PCR on the genomic DNA of the mutant line in which each mutation had been identified. Sequencing reactions were performed on the PCR amplicons with BigDye Terminator v3.1 (Thermo Fisher), and the reaction products were run on a 3130xL Genetic Analyzer (Applied Biosystems) at the Life Science Core Facility, University of Texas at Arlington. We examined the electropherograms in SnapGene® Viewer (GSL Biotech) to determine whether the Sanger genotype at each mutation site matched the genotype from our whole-genome sequencing data.

Mutation rate calculation

The per-site, per-generation mutation rate was calculated for each mutant line as µ = m/(n × 1), where m is the total number of mutations detected in the line, n is the total number of genomic sites in the line with ≥ 10× coverage and at least 50 bp away from an indel, and 1 is the number of generations. This equation likely underestimates the mutation rate because the denominator (the total number of sites) is not subject to as much filtering as the mutations themselves, which must also pass the consensus and read-support criteria described above. The per-gene, per-generation mutation rate was calculated as µ_g = m_g/(n_g × 1), where m_g is the total number of mutations within genic regions (including UTRs, introns, and exons) in the line, n_g is the total number of genes analyzed in the line, and 1 is the number of generations. The non-synonymous mutation rate per gene per generation was calculated with the same formula, with m_g representing the number of non-synonymous mutations.
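The two formulas translate directly into code. The counts in the example below are hypothetical and were chosen only to show the arithmetic; the ~18,000-gene figure for D. pulex is taken from the Results.

```python
def per_site_rate(m, n, generations=1):
    """mu = m / (n x generations): mutations per site per generation."""
    return m / (n * generations)


def per_gene_rate(m_g, n_g, generations=1):
    """mu_g = m_g / (n_g x generations); m_g may count all genic mutations
    or only non-synonymous ones."""
    return m_g / (n_g * generations)


# Hypothetical counts for one mutant line, for illustration only.
print(per_site_rate(234, 200_000_000))  # 234 mutations over 2 x 10^8 callable sites
print(per_gene_rate(48, 18_000))        # 48 genic mutations over ~18,000 genes
```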

Annotating effect of EMS-induced mutations

We used SnpEff version 4.0 (Cingolani et al. 2012) in cancer mode (-cancer) with default parameters to functionally annotate mutations and predict their effects. This mode allowed us to compare the mutant genotypes directly against the wild-type genotypes and infer the genomic effects of the mutations.

Sequence motifs of EMS-induced mutations

To examine whether any sequence motifs are over- or under-represented around the mutated sites, we performed a sequence motif enrichment analysis. We extracted the 3-bp sequences centered on the mutated sites (5ʹ-3ʹ orientation) from the 10 and 25 mM datasets, divided them into four groups based on the mutated base (NAN, NTN, NCN, and NGN), and calculated the frequency of each of the 16 motifs in each group in the D. pulex and D. pulicaria reference genomes using Compseq (http://emboss.open-bio.org/rel/rel6/apps/compseq.html). The expected number of EMS-induced mutations at each motif under a random-distribution hypothesis was calculated as the product of the total number of EMS-induced mutations from all mutant lines (10 and 25 mM) and the genomic trinucleotide frequency. We then performed a chi-square test for each motif to test whether its observed number of mutations deviated significantly from this expectation, using Bonferroni-corrected p-values.
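A minimal sketch of the enrichment test is given below, with two substitutions made as assumptions. First, overlapping 3-mers are counted in pure Python rather than with Compseq. Second, each per-motif chi-square test is set up as a two-category goodness-of-fit test (the motif vs. all other motifs in its group, 1 degree of freedom). Motif frequencies must lie strictly between 0 and 1, and reverse complements are not merged.

```python
import math
from collections import Counter


def trinucleotide_frequencies(seq, motifs):
    """Relative frequency of each motif among overlapping 3-mers in seq.

    Frequencies are normalized over the supplied motif set (for example the
    16 NCN motifs), giving a per-group frequency.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[m] for m in motifs)
    return {m: counts[m] / total for m in motifs}


def motif_chi_square(observed, frequencies):
    """Per-motif chi-square test with Bonferroni correction.

    observed: mutation counts per motif; frequencies: genomic motif
    frequencies (each strictly between 0 and 1). Returns, for each motif,
    (expected count, chi-square statistic, Bonferroni-corrected p-value).
    """
    n_total = sum(observed.values())
    n_tests = len(frequencies)
    results = {}
    for motif, freq in frequencies.items():
        obs = observed.get(motif, 0)
        exp = n_total * freq
        # Two categories (this motif vs. all others) give 1 degree of freedom.
        chi2 = (obs - exp) ** 2 / exp + (obs - exp) ** 2 / (n_total - exp)
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
        results[motif] = (exp, chi2, min(1.0, p * n_tests))
    return results


freqs = trinucleotide_frequencies("ACGTCGACGTCG", ["ACG", "TCG"])
print(motif_chi_square({"ACG": 30, "TCG": 10}, freqs))
```

In this toy example, each motif is expected 20 times; observing 30 and 10 gives a chi-square of 10 for each, which remains significant after Bonferroni correction.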

Mutagenesis power analysis

Using our estimated EMS-induced heritable per-gene mutation rate, we calculated the probability of finding at least one F1 animal heterozygous for a mutation in a gene of interest as 1 − (1 − r)ⁿ, where r is the per-gene mutation rate and n is the number of F1s (Shaham 2007). The term (1 − r)ⁿ is the probability that none of the sampled F1s carries a mutation in the gene of interest.
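The power calculation and its inverse can be sketched as follows. The inverse gives the smallest n that reaches a target probability P, n = ⌈ln(1 − P)/ln(1 − r)⌉. The function names are illustrative.

```python
import math


def prob_at_least_one(r, n):
    """P(at least one of n F1s carries a mutation in a given gene) = 1 - (1 - r)^n."""
    return 1 - (1 - r) ** n


def f1s_needed(r, target_prob=0.95):
    """Smallest n with 1 - (1 - r)^n >= target_prob."""
    return math.ceil(math.log(1 - target_prob) / math.log(1 - r))


r = 4e-3  # per-gene rate at 25 mM, from the Results
n = f1s_needed(r)
print(n, prob_at_least_one(r, n))
```

With r = 4 × 10⁻³ and P = 0.95, `f1s_needed` returns 748, in line with the ~750 F1s reported in the Results.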

Results

Daphnia survival rate after EMS treatment

EMS exposure had a major effect on Daphnia survival. We measured survival rates for three OP Daphnia pulex isolates (DB4-1, DB4-2, and DB4-4) exposed to 10, 25, 50, and 100 mM EMS for four hours. All Daphnia exposed to 50 and 100 mM EMS died during or after treatment (within 24 h). At the lower concentrations, 100% of animals treated with 10 mM EMS survived, whereas the average survival rate was 60.0% (SD = 8.8%) for animals treated with 25 mM EMS (Supplementary Table S2).

Whole-genome sequencing data

A total of 43 Daphnia mutant lines derived from 10 or 25 mM EMS treatment were whole-genome sequenced with 150-bp Illumina paired-end reads (Supplementary Tables S3 and S4). Approximately 6 Gb of raw sequence data were obtained for each mutant line. After removal of PCR duplicates and reads mapped to multiple locations, each line had on average ~35 million mapped reads, yielding an average coverage of 26× (SD = 3) per site.
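As a consistency check on these figures, mean coverage is approximately the number of sequenced bases divided by genome size. The sketch below makes two assumptions of its own: a ~200-Mb genome (as stated in Methods), and that the ~35 million mapped reads are counted as individual 150-bp reads rather than as read pairs.

```python
def coverage_from_bases(total_bases, genome_size_bp):
    """Mean depth = sequenced bases / genome size."""
    return total_bases / genome_size_bp


def coverage_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Mean depth from a read count (each read counted individually)."""
    return n_reads * read_length_bp / genome_size_bp


GENOME_SIZE = 200_000_000  # ~200 Mb, assumption taken from Methods
print(coverage_from_bases(6e9, GENOME_SIZE))              # raw data, ~6 Gb
print(coverage_from_reads(35_000_000, 150, GENOME_SIZE))  # mapped, deduplicated reads
```

Under these assumptions, the raw data correspond to the 30× target, and the mapped reads to about 26×, matching the reported average.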

Mutation validation using Sanger sequencing

Among the EMS-induced germline base substitutions identified by our computational pipeline (see below), 20 were selected for Sanger sequencing verification. All selected mutations had concordant genotype calls between the Sanger sequencing and the Illumina whole-genome data. This suggests that our computational pipeline for identifying EMS-induced germline mutations is robust and that the false-positive rate in our dataset is low, most likely below 0.05 (i.e., fewer than 1 false positive in 20 mutations; Supplementary Table S11).

EMS-induced heritable base-substitution rate

Across CP D. pulex, CP D. pulicaria, and OP D. pulex, we whole-genome sequenced 12 mutant lines treated with 10 mM EMS and 14 mutant lines treated with 25 mM EMS to detect germline mutations (Supplementary Table S3). Consistent with our expectation that EMS-induced germline mutations occur at an elevated rate relative to spontaneous mutations, base-substitution rates for lines from the 10 mM treatment ranged from 9.40 × 10⁻⁷ to 1.32 × 10⁻⁶ per site per generation (mean = 1.17 × 10⁻⁶, SEM = 1.84 × 10⁻⁷; Table 1 and Fig. 3). These rates are a few hundred times higher than the spontaneous mutation rate, which ranges from 2.30 × 10⁻⁹ to 7.17 × 10⁻⁹ per site per generation (Keith et al. 2016; Flynn et al. 2017; Bull et al. 2019). Although our dataset may contain some false positives, their rate is likely low, based on the Sanger sequencing verification in which none of the 20 tested mutations was a false positive. We found no significant difference in the mean base-substitution rate or per-gene mutation rate among the three Daphnia species at 10 mM (ANOVA p > 0.1). Across the three species, the average per-gene mutation rate and per-gene non-synonymous mutation rate of the 10 mM treatment lines are 2.65 × 10⁻³ (SEM = 3.32 × 10⁻⁴) and 1.19 × 10⁻³ (SEM = 1.71 × 10⁻⁴), respectively (Table 1).

Table 1 Summary of mutations induced by 10 and 25 mM EMS.
Fig. 3: Base-substitution mutation rates of three Daphnia species at 10 and 25 mM EMS treatment.

The bar plot summarizes the species-specific rates based on multiple isolates of each species, whereas the scatter plot represents brood-specific mutation rates in each species.

Notably, mutant lines from the 25 mM EMS treatment showed a higher average base-substitution rate than those from the 10 mM treatment, strongly supporting our first hypothesis. Base-substitution rates at 25 mM ranged from 1.58 × 10⁻⁶ to 1.98 × 10⁻⁶ across the three species, and an ANOVA again indicated no significant difference among species at this concentration (p > 0.1). The average base-substitution rate across the three species (1.75 × 10⁻⁶ per site per generation, SEM = 6.82 × 10⁻⁷; Table 1, Fig. 3) was significantly higher than that at 10 mM (mean = 1.17 × 10⁻⁶, SEM = 1.84 × 10⁻⁷; t test p = 0.0052), an increase of approximately 50% (i.e., about 1.5 times as high). The average per-gene mutation rate (4.09 × 10⁻³ per gene per generation, SEM = 4.31 × 10⁻⁴; Table 1) and average non-synonymous mutation rate (1.91 × 10⁻³ per gene per generation, SEM = 1.81 × 10⁻⁴; Table 1) across the three species at 25 mM were likewise approximately 50% and 60% higher, respectively, than those at 10 mM.

EMS-induced heritable base-substitution rate in consecutive broods

We hypothesized that consecutive broods produced by F0 females carry independent EMS-induced germline mutations, because the progenitor cells of their oocytes are differentially affected by EMS in the F0s. To test this, we sequenced a total of 17 first-brood (BR1), second-brood (BR2), and third-brood (BR3) mutant lines treated with 10 or 25 mM EMS from three Daphnia isolates: Tex21 (CP D. pulex), AroMoose (CP D. pulicaria), and DB4-4 (OP D. pulex) (Supplementary Table S4).

The brood-specific mutation rates and spectra in these three Daphnia isolates clearly supported our hypothesis. Because our ANOVA tests indicated no significant variation in base-substitution rate among species/isolates, we do not distinguish among species/isolates below. Consistent with the rates observed in the replicate lines at 10 mM, the average base-substitution rates for BR1, BR2, and BR3 lines at 10 mM are 6.58 × 10⁻⁷ (SEM = 2.07 × 10⁻⁸), 5.70 × 10⁻⁷ (SEM = 9.20 × 10⁻⁸), and 9.90 × 10⁻⁷ (SEM = 2.90 × 10⁻⁷), respectively (Fig. 3). Similarly, the average base-substitution rates for BR1, BR2, and BR3 lines at 25 mM are 1.86 × 10⁻⁶ (SEM = 6.95 × 10⁻⁷), 3.75 × 10⁻⁶ (SEM = 2.02 × 10⁻⁶), and 4.49 × 10⁻⁶ (SEM = 1.77 × 10⁻⁶), respectively, significantly higher than those at 10 mM (ANOVA p = 0.039; Fig. 3).

Base-substitution rates did not differ significantly among the BR1, BR2, and BR3 lines at the same concentration (ANOVA p = 0.34), indicating that the EMS-induced base-substitution rate remained similar for at least the first three broods of the exposed F0 mothers. The mean per-gene mutation rate and non-synonymous mutation rate for the first three broods were also higher at 25 mM than at 10 mM (Supplementary Table S4). Importantly, the base substitutions identified in the first three consecutive broods of the same Daphnia isolate all occurred at unique sites in the genome, supporting the conclusion that EMS induced heritable mutations independently in each brood.

Spectrum and genomic effects of EMS-induced germline base substitutions

As expected, and as previously observed in other model organisms such as C. elegans (Flibotte et al. 2010) and D. melanogaster (Pastink et al. 1991), EMS primarily produced G:C to A:T transitions in all sequenced Daphnia mutant lines. On average, 87% (SD = 8%) of the base substitutions in the 10 mM treatment lines are G:C to A:T transitions, resulting in an elevated transition/transversion ratio greater than 4.1 for all lines (Fig. 4A, Supplementary Tables S3 and S5). Mutant lines from the 25 mM treatment were also highly biased towards G:C to A:T transitions (mean = 86%, SD = 7%), yielding a transition/transversion ratio greater than 4.8 for all lines (Fig. 4A, Supplementary Table S3). These transition/transversion ratios are much higher than those from spontaneous mutation accumulation experiments in Daphnia (e.g., Keith et al. 2016). G:C to A:T transitions also dominated the mutation spectra of the BR1, BR2, and BR3 mutant lines for both 10 and 25 mM EMS treatments, further supporting the conclusion that EMS successfully induced heritable mutations in consecutive broods (Fig. 4B, Supplementary Tables S4 and S6).

Fig. 4: Average proportions of different types of base substitutions in each species at 10 and 25 mM EMS treatment.

A Composition of base substitutions in each species. B Composition of base substitutions in different broods.

Consistent with the notion that EMS induces mutations randomly across the genome, the genomic distributions of EMS-induced mutations in the 10 and 25 mM mutant lines were highly similar (ANOVA p = 1) and showed no enrichment in specific genomic regions (chi-squared test p = 0.40). In mutant lines treated with 10 mM EMS, on average 34 (12%) of the induced mutations reside in exons, 14 (5%) in introns, 4 (1.3%) in 3ʹ UTRs, 3 (1.1%) in 5ʹ UTRs, and 59 (20.8%) in intergenic regions, whereas in lines treated with 25 mM EMS, on average 65 (12.3%) mutations reside in exons, 27 (5.2%) in introns, 9 (1.7%) in 3ʹ UTRs, 6 (1.1%) in 5ʹ UTRs, and 114 (21.6%) in intergenic regions (Fig. 5A and Supplementary Table S7).

Fig. 5: Summary of distribution of base substitutions and amino acid changing effects.

Average proportions of base substitutions in different genomic regions (A and B) and amino acid changing effects at 10 and 25 mM EMS concentration (C and D).

Among exonic mutations in the 10 mM treatment lines, on average 23 (65.5%) were missense, 1 (3.2%) nonsense (stop-gained), and 11 (31.3%) silent. The 25 mM treatment produced very similar results, with 44 (67.8%) missense, 3 (5.2%) nonsense (stop-gained), and 18 (27%) silent mutations (Fig. 5C and Supplementary Table S7). The genomic distribution of mutations and exonic effects in the BR1, BR2, and BR3 lines also remained similar across broods and treatments, mirroring the results summarized above (Fig. 5B, D and Supplementary Table S8). The observed ratios of non-synonymous to synonymous changes do not deviate significantly from the ~3:1 ratio expected when all possible base substitutions at all codon sites are considered (Graur and Li 2000).

Motif analysis of mutated sites

For the NAN and NTN trinucleotide motifs (Fig. 6), all trinucleotides were significantly under-represented (chi-squared test p < 0.05). Among the NCN trinucleotides (Fig. 6), TCG, ACG, CCG, TCC, CCT, TCT, GCC, ACC, CCC, and GCG (5ʹ-3ʹ orientation) were significantly over-represented (chi-squared test p < 0.05). Among the NGN trinucleotides, GGT, AGG, CGG, CGA, GGA, GGC, CGC, and GGG were significantly over-represented (chi-squared test p < 0.05; Fig. 6 and Supplementary Table S9).

Fig. 6: Bars represent the proportion of EMS-induced mutations at each trinucleotide motif centered on the mutated site (5ʹ-3ʹ orientation), whereas lines represent the trinucleotide frequencies in the Daphnia reference assemblies.

NAN and NTN trinucleotides are significantly underrepresented, whereas many of the NGN and NCN motifs are overrepresented (indicated by asterisks).

Number of F1s for reaching mutation saturation

Based on the average base-substitution rate per gene at 25 mM EMS across the three Daphnia species (~4 × 10⁻³ per gene per generation), ~750 F1s are needed to find, with 95% probability, at least one F1 animal heterozygous for a mutation in a gene of interest. Across ~750 F1s, about 54,000 gene mutations would be induced in total, or roughly 3 mutations per gene given the ~18,000 genes in the D. pulex genome.
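The F1 sample size follows from requiring $1 - (1 - r)^n \ge 0.95$, where $r$ is the per-gene mutation rate and $n$ the number of F1s. The sketch below reproduces that arithmetic under the simplifying assumption that every gene is hit independently with the same probability $r$ (it ignores differences in gene length and base composition); `f1s_needed` is a hypothetical helper name.

```python
import math

def f1s_needed(rate_per_gene, prob=0.95):
    """Smallest n with 1 - (1 - rate)^n >= prob (at least one F1 mutant in a given gene)."""
    return math.ceil(math.log(1 - prob) / math.log(1 - rate_per_gene))

rate = 4e-3        # approximate per-gene, per-generation rate at 25 mM EMS (from the text)
n_genes = 18_000   # approximate number of genes in the D. pulex genome (from the text)

n = f1s_needed(rate)
print(n)  # 748
hits = n * rate * n_genes  # expected number of gene mutations across all F1s
print(round(hits), round(hits / n_genes, 2))
```

This gives 748 F1s, which the text rounds to ~750; at 750 F1s, each carrying about 4 × 10⁻³ × 18,000 = 72 mutated genes, the total is 54,000 gene mutations, or about 3 per gene. A Poisson view gives the same picture: with a mean of 3 hits per gene, the chance that a gene is hit at least once is $1 - e^{-3} \approx 0.95$.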

Discussion

This study examines genome-wide EMS-induced heritable mutations in three microcrustacean Daphnia species at different EMS concentrations. We demonstrate that exposure to 10 or 25 mM EMS solution for 4 h readily induces mutations in the oocytes carried by Daphnia females, at rates hundreds of times higher than the spontaneous mutation rate (Keith et al. 2016; Flynn et al. 2017; Bull et al. 2019), establishing a useful protocol for obtaining mutant lines for screening experiments. Because our ultimate goal is to establish a forward genetic method for Daphnia, we compare our results with those from three model organisms (i.e., C. elegans, D. melanogaster and A. thaliana) that have well-established EMS mutagenesis protocols (Page and Grossniklaus 2002; St Johnston 2002; Jorgensen and Mango 2002).

As we hypothesized, EMS concentration is a major determinant of the induced mutation rate. The base-substitution rate is significantly higher in the 25 mM lines than in the 10 mM lines (a 0.5-fold, i.e., ~50%, increase), and both rates are hundreds of times higher than the spontaneous base-substitution rate (Keith et al. 2016; Flynn et al. 2017; Bull et al. 2019). Regardless of concentration, all lines show the mutation spectrum characteristic of EMS mutagenesis, with a strong bias towards G:C to A:T transitions, which on average account for 87% and 86% of base substitutions in the 10 and 25 mM lines, respectively. This is a substantial increase over the ~66% G:C to A:T proportion previously reported in Daphnia spontaneous mutation accumulation lines (Keith et al. 2016).

With an average of 78 (SD = 13) genes affected by mutations per line treated with 25 mM EMS, the EMS-induced mutation rate per gene per generation was 4.1 × 10⁻³, 3.2 × 10⁻³ and 4.4 × 10⁻³ for OP D. pulex, CP D. pulex and D. pulicaria, respectively (Table 1). These rates are higher than those reported for C. elegans (Gengyo-Ando and Mitani 2000), D. melanogaster (1.0 × 10⁻³, Spradling 1997) and A. thaliana (Ossowski et al. 2010).

Our observed spectrum of EMS-induced base substitutions is also consistent with earlier observations in other model organisms. The proportion of G:C to A:T transitions in Daphnia is higher than that in C. elegans (66%, Sarin et al. 2010), similar to that in D. melanogaster (70–84%, Winkler et al. 2005; Cooper et al. 2008), and much lower than that in A. thaliana (>99%, Greene et al. 2003). Thus, the spectrum of EMS-induced mutations is dominated by G:C to A:T transitions across eukaryotic species, although the proportion varies greatly. The EMS concentration and the mode of exposure may contribute to this variation, because each species is treated with its own experimental procedure (e.g., EMS exposure through feeding, or soaking of seeds). Whether species-specific DNA repair mechanisms also contribute remains to be determined by future investigation.

We were also interested in whether induced germline mutations arise independently in consecutive broods produced by the same EMS-exposed female Daphnia, i.e., whether these progeny are all genetically distinct with respect to the induced mutations. Our results show that EMS mutagenesis induces germline mutations in each of the first three consecutive broods, and that the mutation spectrum remains highly similar between broods and between EMS concentrations. Because every identified mutation is unique to a single mutant line, the progenitor cells of the oocytes appear to have been independently affected by EMS. We therefore suggest that the progeny of at least the first three consecutive broods can be used to establish F1 mutant lines in screening experiments.
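Checking that no mutation is shared between lines amounts to keying each call by its coordinates and allele and looking for keys seen in more than one line. The sketch below assumes calls are represented as hypothetical `(chrom, pos, ref, alt)` tuples; the scaffold names and positions are invented for illustration, and only the BR1 to BR3 brood labels come from the text.

```python
from collections import defaultdict

def shared_mutations(calls_by_line):
    """Return mutations (chrom, pos, ref, alt) called in more than one line.

    calls_by_line maps a line ID to an iterable of mutation tuples.
    """
    lines_per_mut = defaultdict(set)
    for line_id, calls in calls_by_line.items():
        for mut in calls:
            lines_per_mut[mut].add(line_id)
    return {mut: sorted(ids) for mut, ids in lines_per_mut.items() if len(ids) > 1}

# Hypothetical calls for three lines derived from consecutive broods of one female
calls = {
    "BR1": [("scaffold_1", 10432, "G", "A"), ("scaffold_7", 88210, "C", "T")],
    "BR2": [("scaffold_2", 5531, "C", "T")],
    "BR3": [("scaffold_1", 20981, "G", "A")],
}
print(shared_mutations(calls))  # {}
```

An empty result is what independent mutagenesis of oocyte progenitor cells predicts; a mutation recovered in several broods would instead point to a shared progenitor or to a pre-existing variant in the parental female.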

A key reason that EMS mutagenesis is used for screening experiments is that EMS is expected to induce mutations at random locations throughout the genome. Our results show that the distribution of induced mutations is highly similar between the 10 and 25 mM treatments, and that no genomic region (e.g., exons or introns) is significantly enriched for mutations (Fig. 5A).
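A lack of regional enrichment can be assessed with a goodness-of-fit test that compares mutation counts per region against the expectation from each region's share of the genome. The sketch below assumes three hypothetical categories (exonic, intronic, intergenic) with invented counts and genome fractions; it is not the analysis behind Fig. 5A. The closed-form p-value used here is valid only for an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of chi-squared with an even df (closed form)."""
    assert df % 2 == 0 and df > 0
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):  # sum of (x/2)^i / i! for i < df/2
        term *= half / i
        total += term
    return math.exp(-half) * total

def region_enrichment(observed, region_fraction):
    """Chi-squared statistic of mutation counts per region vs. region size fractions."""
    n = sum(observed.values())
    stat = 0.0
    for region, obs in observed.items():
        expected = n * region_fraction[region]
        stat += (obs - expected) ** 2 / expected
    return stat, len(observed) - 1

# Hypothetical counts and genome fractions (three categories -> df = 2)
observed = {"exonic": 30, "intronic": 14, "intergenic": 56}
fraction = {"exonic": 0.25, "intronic": 0.15, "intergenic": 0.60}
stat, df = region_enrichment(observed, fraction)
print(f"chi2 = {stat:.3f}, df = {df}, p = {chi2_sf_even_df(stat, df):.3f}")
```

For an arbitrary number of categories, `scipy.stats.chi2.sf(stat, df)` gives the general upper-tail probability; the pure-Python version is used here only to keep the example dependency-free.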

The trinucleotide motif analysis showed a significant under-representation of NAN and NTN trinucleotides, consistent with the strong preference of EMS for mutating G and C nucleotides. The analysis further showed that most trinucleotides enriched for EMS-induced mutations contain at least two adjacent G/C nucleotides, a feature of EMS mutagenesis that has not been reported previously. Studies in A. thaliana reported an excess of purines in the −1 and +1 positions, with adenine favored over guanine, a deficiency of guanine in the −2 position, and an excess of guanine in the +2 position (Greene et al. 2003). In D. melanogaster, a strong purine bias, mostly of guanine, was reported at the positions flanking the mutation site (Bentley et al. 2000). These observations suggest that EMS may preferentially target certain sequence motifs, but that the target motifs differ between species, likely owing to differences in nucleotide composition.
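The claim that most enriched motifs carry at least two adjacent G/C nucleotides can be checked directly against the over-represented NCN and NGN motifs listed in the motif analysis. The sketch below simply encodes those 18 motifs and tests each for a G/C dinucleotide; `has_adjacent_gc` is a hypothetical helper name.

```python
# Over-represented motifs reported in the motif analysis (5'-3' orientation)
over_ncn = ["TCG", "ACG", "CCG", "TCC", "CCT", "TCT", "GCC", "ACC", "CCC", "GCG"]
over_ngn = ["GGT", "AGG", "CGG", "CGA", "GGA", "GGC", "CGC", "GGG"]

def has_adjacent_gc(motif):
    """True if the trinucleotide contains at least two adjacent G/C bases."""
    return any(a in "GC" and b in "GC" for a, b in zip(motif, motif[1:]))

motifs = over_ncn + over_ngn
with_pair = [m for m in motifs if has_adjacent_gc(m)]
print(f"{len(with_pair)} of {len(motifs)}")  # 17 of 18
print([m for m in motifs if not has_adjacent_gc(m)])  # ['TCT']
```

Seventeen of the 18 over-represented motifs contain a G/C dinucleotide; TCT is the only exception.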

Last, we offer a few recommendations for performing genetic screens in Daphnia. We determined the number of function-affecting mutations induced by EMS per generation to be 76, 59 and 123 for OP D. pulex, CP D. pulex and D. pulicaria, respectively. By comparison, EMS mutagenesis in C. elegans, D. melanogaster and A. thaliana produces around 49, 14 and 83 function-affecting variants per generation, respectively (Table 2).

Table 2 Number of genes, spontaneous base-substitution rate, EMS-induced mutation rate and estimated number of function-affecting variants for different model organisms.

A simple screen can readily be performed on F1s in Daphnia. As we calculated, 750 F1s would carry on average 3 mutated copies of each gene, with a 95% probability that any given gene carries at least one mutation. Because all induced mutations are heterozygous in the F1s, only mutants with dominant mutations that cause observable morphological alterations can be scored. However, high-throughput molecular assays could be used to screen F1s for molecular phenotypic changes caused by recessive mutations, provided that the cost of screening ~1000 individuals is manageable.

Furthermore, sibling crosses between the progeny of F1 individuals can produce F2s that are homozygous for the induced mutations, allowing recessive mutations to reveal their phenotypic effects (Fig. 1). In practice, if we start with 750 F1s, each F1 mutant line can be clonally expanded into a large culture, which can then be crowded to induce the production of clonal males and sexual reproduction in females. Because clonally produced males and females of the same F1 mutant line are genetically identical, sibling crossing will produce F2 offspring in which, on average, 25% of the induced mutations are homozygous (Fig. 1). Although F2s must be hatched from resting embryos that develop only under a strict set of conditions, an optimal hatching procedure for the Daphnia species used in this study has already been developed (Luu et al. 2020).

The probability of obtaining a mutant that is homozygous for a mutation in a gene of interest depends on the number of F2s collected from each F1 line. This probability is $1 - (3/4)^n$, where $n$ is the number of F2s and $(3/4)^n$ is the probability that none of the $n$ F2 individuals is homozygous for the mutation. Collecting 4 or 5 F2s from each F1 mutant line thus gives approximately a 68% or 76% chance, respectively, of obtaining a homozygous mutant, so a near-saturation F2 screen in Daphnia would require 3000–4000 F2s (750 F1 lines × 4–5 F2s each). The F1/F2 ratio therefore determines the amount of resources devoted to a screening experiment, and a different ratio can be adopted depending on the types of mutants of interest (Shaham 2007).
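The per-line probability $1 - (3/4)^n$ and the resulting F2 totals can be tabulated directly. The sketch assumes, as in the text, that each induced mutation is heterozygous in its F1 line and that each F2 is independently homozygous for it with probability 1/4; the helper names `p_homozygous` and `f2s_per_line` are hypothetical.

```python
import math

def p_homozygous(n_f2):
    """Probability that at least one of n F2s is homozygous for a given induced mutation."""
    return 1 - 0.75 ** n_f2

def f2s_per_line(target):
    """Smallest number of F2s per F1 line that reaches the target probability."""
    return math.ceil(math.log(1 - target) / math.log(0.75))

for n in range(1, 7):
    print(n, round(p_homozygous(n), 3))

n_f1 = 750  # number of F1 lines, from the F1 calculation in the text
for n in (4, 5):
    print(f"{n} F2s per line -> {n * n_f1} F2s in total")
```

With 4 and 5 F2s per line the probabilities are about 0.684 and 0.763, giving 3000 and 3750 F2s in total. Raising the per-line probability to 90% or 95% would require 9 or 11 F2s per line, which illustrates how quickly the F2 population grows with the chosen F1/F2 ratio.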

Efficiently scoring mutants in a large F2 population is another critical factor for a successful genetic screen. Depending on the phenotypic traits of interest, high-throughput phenotypic assays will need to be developed for Daphnia, as such methods are currently underdeveloped. Daphnia has a nearly transparent carapace, and many body parts (e.g., heart, appendages) are directly visible under a microscope; these characteristics are desirable for high-throughput phenotypic screening. We hope that novel phenotyping methods will emerge as forward genetic screening in Daphnia and other small crustaceans gains popularity.

In conclusion, we demonstrated that EMS mutagenesis can successfully induce heritable mutations in the genome of Daphnia. Our analyses of the mutation rates induced by different EMS concentrations and of mutation patterns in consecutive broods suggest ways to increase the efficiency of genetic screening experiments. Lastly, we provide guidance on the sample sizes required for F1 and F2 screens, in the hope that genetic screening will become a powerful tool for studying Daphnia genomics and evolution.