Crossing design shapes patterns of genetic variation in synthetic recombinant populations of Saccharomyces cerevisiae

“Synthetic recombinant” populations have emerged as a useful tool for dissecting the genetics of complex traits. They can be used to derive inbred lines for fine QTL mapping, or the populations themselves can be sampled for experimental evolution. In the latter application, investigators generally seek to maximize genetic variation in the constructed populations, because adaptation in evolution experiments initiated from such populations is primarily fueled by standing genetic variation. Despite this, little has been done to systematically evaluate how different methods of constructing synthetic populations shape initial patterns of variation. Here we address this issue by comparing outcomes in synthetic recombinant Saccharomyces cerevisiae populations created using one of two strategies: pairwise crossing of isogenic strains or simple mixing of strains in equal proportion. We also explore the impact of varying the number of parental strains. We find that more genetic variation is initially present and maintained when population construction includes a round of pairwise crossing. As perhaps expected, we also observe that increasing the number of parental strains typically increases genetic diversity. In summary, we suggest that when constructing populations for use in evolution experiments, simply mixing founder strains in equal proportion may limit adaptive potential.

Figure S1
Strains used in this study.

Figure S2
Crossing design used to create S4 (A), S8 (B), and S12 (C).

Figure S3
UpSet plots illustrating similarities in genetic variation in recombinant populations.

Figure S4
Expected site frequency spectra for initial populations created using four (A), eight (B) and twelve founder strains (C).

Figure S5
Haplotype frequency estimates for population S4 after 12 cycles of outcrossing.

Figure S6
Haplotype frequency estimates for population K4 after 12 cycles of outcrossing.

Figure S7
Haplotype frequency estimates for population S8 after 12 cycles of outcrossing.

Figure S8
Haplotype frequency estimates for population K8 after 12 cycles of outcrossing.

Figure S9
Haplotype frequency estimates for population S12 after 12 cycles of outcrossing.

Figure S10
Haplotype frequency estimates for population K12 after 12 cycles of outcrossing.

Figure S11
Initial haplotype frequency estimates for population K4.

Figure S12
Haplotype frequency estimates for population K4 after 6 cycles of outcrossing.

Figure S13
Initial haplotype frequency estimates for population S4.

Figure S14
Haplotype frequency estimates for population S4 after 6 cycles of outcrossing.

Figure S15
Initial haplotype frequency estimates for population K8.

Figure S16
Haplotype frequency estimates for population K8 after 6 cycles of outcrossing.

Figure S17
Initial haplotype frequency estimates for population S8.

Figure S18
Haplotype frequency estimates for population S8 after 6 cycles of outcrossing.

Table S1
Average genome-wide coverage at SNPs identified in each synthetic recombinant population.

Table S2
Mean genome-wide haplotype diversity for each synthetic recombinant population after 12 cycles of outcrossing.

Table S3
Sporulation efficiencies in recombinant populations.

Table S4
Comparative growth rate estimates in recombinant populations and parental strains.

Supplementary Figures
Supplementary Figure S1. Strains used in this study. The unrooted phylogeny illustrates the general evolutionary relationships between the strains used in this work, as well as their geographical (colored boxes) and contextual (colored circles) origins. This tree was constructed using 180,276 bi-allelic SNPs that we observed to vary across the 12 haploid founder strains sequenced in this study. These SNPs were used to construct a Newick file which was plotted with the R package ggtree.
Supplementary Figure S2. Crossing design used to create S4 (A), S8 (B), and S12 (C).
Supplementary Figure S3. UpSet plots illustrating similarities in genetic variation in recombinant populations. For a given plot, the first vertical bar represents the number of SNPs found in both the K-type and S-type populations across all timepoints; the second bar, the SNPs found in the S-type population across all timepoints but not in the K-type; the third bar, the SNPs found across all timepoints in the K-type population but not in the S-type; and the fourth bar, the number of possible SNPs that are not present in either the K-type or S-type population at any timepoint. (Note: "possible SNPs" refers to sites that have the potential to be polymorphic given the founders used to create a given pair of populations.)
Frequencies were estimated using sliding windows with a length of 30 kb and a 1 kb step size. Panels are ordered to match pairings in the first round of crossing (YPS128 was crossed with DBVPG6765; Y12 was crossed with DBVPG6044).
Estimates are expressed as the mean of biological replicates (N = 3 for the 4-founder populations; N = 2 for the 8- and 12-founder populations) ± standard deviation. Note that recombinant populations were assayed initially and after 12 cycles of outcrossing.
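
The four bar categories described above for the K-type versus S-type comparison plots can be illustrated with plain set operations. The following Python sketch is only an illustration under stated assumptions, not the analysis code used in this study: the function names, the SNP identifiers, and the toy data are hypothetical, and "not in the K-type" is read here as "not present in the K-type population at every timepoint". Under that reading the four categories need not account for every possible SNP, since a site seen at only some timepoints falls into none of them.

```python
def present_at_all(timepoints):
    """SNPs observed at every timepoint (intersection of per-timepoint sets)."""
    sets = [set(tp) for tp in timepoints]
    return set.intersection(*sets) if sets else set()


def present_at_any(timepoints):
    """SNPs observed at one or more timepoints (union of per-timepoint sets)."""
    return set().union(*timepoints)


def upset_bars(possible, k_timepoints, s_timepoints):
    """Count SNPs in each of the four bar categories (hypothetical helper)."""
    k_all = present_at_all(k_timepoints)
    s_all = present_at_all(s_timepoints)
    seen_any = present_at_any(k_timepoints) | present_at_any(s_timepoints)
    return {
        "both": len(k_all & s_all),            # bar 1: K and S, all timepoints
        "s_only": len(s_all - k_all),          # bar 2: S at all timepoints, not K
        "k_only": len(k_all - s_all),          # bar 3: K at all timepoints, not S
        "neither": len(set(possible) - seen_any),  # bar 4: never observed
    }


# Toy data (made up): SNP positions for two timepoints per population.
possible = {100, 200, 300, 400, 500}
k_timepoints = [{100, 200, 300}, {100, 200}]
s_timepoints = [{100, 300, 400}, {100, 300, 400}]

counts = upset_bars(possible, k_timepoints, s_timepoints)
print(counts)
```

In this toy example, site 300 is seen in the K-type population at only one timepoint, so it counts toward the S-only bar, while site 500 is never observed and counts toward the fourth bar.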