The genomic and phenotypic diversity of Schizosaccharomyces pombe

Abstract

Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10⁻³ substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (340 BCE), and ancestors of these strains reached the Americas at 1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.

Figure 1: An overview of the strain collection.
Figure 2: Recent dispersal of S. pombe.
Figure 3: Relationships between genetic diversity and genome function.
Figure 4: Phenotypes and genome-wide associations.

References

  1. Gomes, F.C.O. et al. Physiological diversity and trehalose accumulation in Schizosaccharomyces pombe strains isolated from spontaneous fermentations during the production of the artisanal Brazilian cachaça. Can. J. Microbiol. 48, 399–406 (2002).
  2. Brown, W.R.A. et al. A geographically diverse collection of Schizosaccharomyces pombe isolates shows limited phenotypic variation but extensive karyotypic diversity. G3 1, 615–626 (2011).
  3. Fawcett, J.A. et al. Population genomics of the fission yeast Schizosaccharomyces pombe. PLoS ONE 9, e104241 (2014).
  4. Osterwalder, A. Schizosaccharomyces liquefaciens n.sp., eine gegen freie schweflige Säure widerstandsfähige Gärhefe. Mitt. Geb. Lebensmittelunters. Hyg. 15, 5–28 (1924).
  5. Florenzano, G., Balloni, W. & Materassi, R. Contributo alla ecologia dei lieviti Schizosaccharomyces sulle uve. Vitis 16, 38–44 (1977).
  6. Teoh, A.L., Heard, G. & Cox, J. Yeast ecology of Kombucha fermentation. Int. J. Food Microbiol. 95, 119–126 (2004).
  7. Wood, V. et al. The genome sequence of Schizosaccharomyces pombe. Nature 415, 871–880 (2002).
  8. Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–341 (2009).
  9. Schacherer, J., Shapiro, J.A., Ruderfer, D.M. & Kruglyak, L. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458, 342–345 (2009).
  10. Avelar, A.T., Perfeito, L., Gordo, I. & Godinho Ferreira, M. Genome architecture is a selectable trait that can be maintained by antagonistic pleiotropy. Nat. Commun. 4, 2235 (2013).
  11. Seich Al Basatena, N.-K., Hoggart, C.J., Coin, L.J. & O'Reilly, P.F. The effect of genomic inversions on estimation of population genetic parameters from SNP data. Genetics 193, 243–253 (2013).
  12. Zanders, S.E. et al. Genome rearrangements and pervasive meiotic drive cause hybrid infertility in fission yeast. eLife 3, e02630 (2014).
  13. Cromie, G.A. et al. Genomic sequence diversity and population structure of Saccharomyces cerevisiae assessed by RAD-seq. G3 3, 2163–2171 (2013).
  14. Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
  15. Lawson, D.J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012).
  16. Hornsey, I.S. A History of Beer and Brewing (The Royal Society of Chemistry, 2003).
  17. Fay, J.C. & Benavides, J.A. Evidence for domesticated and wild populations of Saccharomyces cerevisiae. PLoS Genet. 1, 66–71 (2005).
  18. Zhou, T., Gu, W. & Wilke, C.O. Detecting positive and purifying selection at synonymous sites in yeast and worm. Mol. Biol. Evol. 27, 1912–1922 (2010).
  19. Bowen, N.J., Jordan, I.K., Epstein, J.A., Wood, V. & Levin, H.L. Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res. 13, 1984–1997 (2003).
  20. Mourier, T. & Willerslev, E. Large-scale transcriptome data reveals transcriptional activity of fission yeast LTR retrotransposons. BMC Genomics 11, 167 (2010).
  21. Kwon, E.-J.G. et al. Deciphering the transcriptional-regulatory network of flocculation in Schizosaccharomyces pombe. PLoS Genet. 8, e1003104 (2012).
  22. Guo, Y. & Levin, H.L. High-throughput sequencing of retrotransposon integration provides a saturated profile of target activity in Schizosaccharomyces pombe. Genome Res. 20, 239–248 (2010).
  23. Guo, Y. et al. Integration profiling of gene function with dense maps of transposon integration. Genetics 195, 599–609 (2013).
  24. Feng, G., Leem, Y.-E. & Levin, H.L. Transposon integration enhances expression of stress response genes. Nucleic Acids Res. 41, 775–789 (2013).
  25. Jeffares, D.C., Penkett, C.J. & Bähler, J. Rapidly regulated genes are intron poor. Trends Genet. 24, 375–378 (2008).
  26. Chen, D. et al. Global transcriptional responses of fission yeast to environmental stress. Mol. Biol. Cell 14, 214–229 (2003).
  27. Cromie, G.A. et al. A discrete class of intergenic DNA dictates meiotic DNA break hotspots in fission yeast. PLoS Genet. 3, e141 (2007).
  28. Fowler, K.R., Gutiérrez-Velasco, S., Martín-Castellanos, C. & Smith, G.R. Protein determinants of meiotic DNA break hot spots. Mol. Cell 49, 983–996 (2013).
  29. Maniatis, N. et al. The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis. Proc. Natl. Acad. Sci. USA 99, 2228–2233 (2002).
  30. Liti, G. & Louis, E.J. Advances in quantitative trait analysis in yeast. PLoS Genet. 8, e1002912 (2012).
  31. Mackay, T.F.C. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat. Rev. Genet. 15, 22–33 (2014).
  32. Speed, D., Hemani, G., Johnson, M.R. & Balding, D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
  33. Warringer, J. et al. Trait variation in yeast is defined by population history. PLoS Genet. 7, e1002111 (2011).
  34. Listgarten, J. et al. Improved linear mixed models for genome-wide association studies. Nat. Methods 9, 525–526 (2012).
  35. Drummond, A.J., Suchard, M.A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).
  36. Clément-Ziza, M. et al. Natural genetic variation impacts expression levels of coding, non-coding, and antisense transcripts in fission yeast. Mol. Syst. Biol. 10, 764 (2014).
  37. Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936–939 (2011).
  38. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
  39. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P. & McVean, G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44, 226–232 (2012).
  40. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
  41. Thorvaldsdóttir, H., Robinson, J.T. & Mesirov, J.P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
  42. Keane, T.M., Wong, K. & Adams, D.J. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29, 389–390 (2013).
  43. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
  44. Hutter, S., Vilella, A.J. & Rozas, J. Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinformatics 7, 409 (2006).
  45. Lau, W., Kuo, T.-Y., Tapper, W., Cox, S. & Collins, A. Exploiting large scale computing to construct high resolution linkage disequilibrium maps of the human genome. Bioinformatics 23, 517–519 (2007).
  46. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
  47. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
  48. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
  49. Lanfear, R., Calcott, B., Ho, S.Y.W. & Guindon, S. Combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701 (2012).
  50. Baele, G., Li, W.L.S., Drummond, A.J., Suchard, M.A. & Lemey, P. Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics. Mol. Biol. Evol. 30, 239–243 (2013).
  51. O'Fallon, B.D. ACG: rapid inference of population history from recombining nucleotide sequences. BMC Bioinformatics 14, 40 (2013).
  52. Tamura, K. & Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526 (1993).
  53. Simpson, J.T. & Durbin, R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549–556 (2012).
  54. Stanke, M., Schoffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006).
  55. Camacho, C., Coulouris, G. & Avagyan, V. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
  56. van Dongen, S. & Abreu-Goodger, C. Using MCL to extract clusters from networks. Methods Mol. Biol. 804, 281–295 (2012).
  57. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
  58. Dieterle, F., Ross, A., Schlotterbeck, G. & Senn, H. Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006).
  59. Kahm, M., Hasenbrink, G., Lichtenberg-Frate, H., Ludwig, J. & Kschischo, M. Grofit: fitting biological growth curves with R. J. Stat. Softw. 33, 1–21 (2010).
  60. Sazer, S. & Sherwood, S.W. Mitochondrial growth and DNA synthesis occur in the absence of nuclear DNA replication in fission yeast. J. Cell Sci. 97, 509–516 (1990).
  61. Graml, V. et al. A genomic multiprocess survey of machineries that control and link cell shape, microtubule organization, and cell-cycle progression. Dev. Cell 31, 227–239 (2014).
  62. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
  63. The R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2013).

Acknowledgements

We thank L. Clissold, H. Musk, D. Baker and R. Davey for their contributions to sequencing, H. Levin for discussions about transposons, and J. Mata and S. Marguerat for comments on the manuscript. This work was supported by a Wellcome Trust Senior Investigator Award to J.B. (grant 095598/Z/11/Z), by the Wellcome Trust to S.B., T.K., J.T.S. and R.D., by grant 260801-BIG-IDEA from the European Research Council (ERC) and grant BB/H005854/1 from the Biotechnology and Biological Sciences Research Council (BBSRC) to A.R. and F.B., by UK Medical Research Council grant G0901388 to D.S. and D.J.B., by a Cancer Research UK Postdoctoral Fellowship to T.M.K.C., by an ERC Starting Grant (SYSGRO) to R.E.C.S., a Wellcome Trust PhD studentship to J.L.D.L. and BBSRC grant BB/K006320/1 to R.E.C.S. and A.C., by a Wellcome Trust grant (RG 093735/Z/10/Z) and ERC Starting Grant 260809 to M.R. (M.R. is a Wellcome Trust Research Career Development and Wellcome-Beit Prize Fellow), by Czech Science Foundation grant P305/12/P040 and Charles University grant UNCE 204013 to M.P. and by Cancer Research UK to L.J. and J.H.

Author information

D.C.J. coordinated all analyses, isolated DNA for sequencing, analyzed and filtered SNP calls, conducted diversity analysis and GWAS and drafted the manuscript. C.R. produced phenotype data for growth on various solid media and growth rates in liquid media. A.R. conducted analysis of dating using mitochondrial data. D.S. conducted GWAS. M.P. analyzed all phenotype data. T.M. identified LTR transposon insertions and analyzed transposon insertion data. F.X.M. conducted crosses for the analysis of spore viability. Z.I. produced indel calls with Cortex. W.L. conducted analysis of recombination rate, LD decay and principal-component analysis for distance between strains. T.M.K.C. assisted with phenotype and population analysis. R.P. analyzed Cortex and GATK indel calls. M.M. conducted amino acid profiling. J.L.D.L. and A.C. produced automated measures of cell morphology. S.B. aligned reads and produced GATK SNP calls. G.H. analyzed population structure using fineSTRUCTURE. B.O'F. estimated the time to the most recent common ancestor from the nuclear genome using ACG. T.K. identified LTR transposon insertions. J.T.S. produced de novo assemblies. L.B. developed the custom Workspace workflow Spotsizer. B.T. assisted with sequence analysis. D.A.B. assisted with analysis of new genes. T.S. assisted with strain verification. S.C. produced images of wild strains and assisted with strain verification. J.E.E.U.H. assisted with SNP validation. L.v.T. and M.T. assisted with LTR validation. L.J. and J.-J.L. assisted with manual measures of cell morphology and FACS. S.A. produced gene expression data. M.F., K.M. and N.D. assisted with sequencing. W.B. initiated and assisted with strain collection. J.H. coordinated manual measures of cell morphology and FACS. R.E.C.S. coordinated automated measures of cell morphology. M.R. coordinated amino acid profiling. N.M. conducted analysis of recombination and LD and advised on aspects of diversity and GWAS. D.J.B. advised on GWAS. F.B. advised on population structure and supervised A.R. R.D. facilitated sequencing. J.B. contributed to the initiation and development of the project and financed the Bähler laboratory.

Correspondence to Daniel C Jeffares or Jürg Bähler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Clonal clusters and isolation by distance.

(a) For all strains, we calculated the number of allelic differences using all SNPs. Pairs that differed at <150 SNPs were considered nearly clonal, and these pairs were clustered using Markov clustering. Strains (spheres) are colored according to the continent where they were isolated, with gray spheres indicating unknown locations. Colors are as in Figure 1a: red, Americas; pink, Africa; green, Europe; blue, Asia; yellow, Australia. (b) The 752 unlinked SNPs used for descriptions of population structure are evenly distributed across the genome. For each 50-kb window of the genome, with a step size of 1 kb, we show the number of SNPs from the 752 unlinked set. Chromosomes 1 and 3 are in black, and chromosome 2 is in red. We note a slight bias to the right side of chromosome 1, which contains 20 of these 752 SNPs. (c) Genetic distance is correlated with geographical distance. For each pairwise comparison of the 161 strains, we calculated the proportion of shared alleles from the 752 unlinked SNPs ('drift distance') and the great circle distance (distance around the globe) between the locations from which strains were collected. A Mantel test with 10,000 resamplings showed that these two matrices were anticorrelated (r = −0.36, P = 9.9 × 10⁻⁵). This correlation is also present when we use only the 57 non-clonal strains (r = −0.28, P = 9.9 × 10⁻⁵). (d) Genetic distance is correlated with spore viability. For 43 crosses, we recorded spore survival by tetrad analysis. Spore survival was correlated with the proportion of shared alleles from the 752 unlinked SNPs (Pearson's product-moment correlation, r = 0.51, P = 6.4 × 10⁻⁴). Some strains do not produce many viable spores even when mated to themselves (low self-cross viability). The plot represents this by scaling each circle size to the lowest self-cross viability of the parents, showing that all low-viability outliers (top left of the plot) have at least one parent with low self-viability. When excluding crosses with the lowest self-cross viability (<0.3), the correlation between spore viability and genetic distance is stronger (Pearson's r = 0.76, P = 1.2 × 10⁻⁷).
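
The isolation-by-distance comparison in panel c can be illustrated with a minimal Python sketch, assuming strain coordinates are available as (latitude, longitude) pairs in degrees and that both distance matrices are symmetric nested lists. The function names, the haversine formula for great-circle distance and the two-sided permutation P value are illustrative assumptions, not a description of the software used for the published analysis.

```python
import math
import random


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabelling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel(m1, m2, permutations=10_000, seed=0):
    """Mantel test: correlate two distance matrices, permuting strain labels."""
    rng = random.Random(seed)
    identity = list(range(len(m1)))
    x = upper_triangle(m1, identity)
    r_obs = pearson(x, upper_triangle(m2, identity))
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # relabel the strains of the second matrix
        if abs(pearson(x, upper_triangle(m2, perm))) >= abs(r_obs):
            hits += 1
    # Add-one estimator so the P value is never exactly zero.
    return r_obs, (hits + 1) / (permutations + 1)
```

With 10,000 permutations, the smallest P value this estimator can return is 1/10,001, or about 1.0 × 10⁻⁴.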

Supplementary Figure 2 Population structure and relatedness between strains.

In each panel, Leupold's 972 reference strain (JB22) is indicated by a black triangle. (a) Admixture results. Each bar represents the proportion of SNPs assigned to each of the 2–5 populations, with the strain name below the bar. The geographical locations of the strains are shown in colored dots above the bar: yellow, Australia; green, Europe; red, Americas; pink, Africa; blue, Asia. (b) Principal-components plot colored by admixture clusters. Principal-component coordinates are as described for Figure 1, using 752 unlinked SNPs. Strains (filled dots) are colored according to their Admixture cluster with k = 5. As in Figure 1, the 57 non-clonal strains are indicated with thick black borders. (c) fineSTRUCTURE analysis of shared haplotypes. The heat map depicts the proportion of the genome for which each strain in the columns shares the most recent common ancestry with each other strain (i.e., relative to all other strains) in the rows, as inferred by ChromoPainter (note that values therefore add up to 1.0 in each column)³. Strains are colored along the axes by their geographical sampling location, as above. The row and column for Leupold's 972 reference strain (JB22) are indicated with gray shading. The tree at the top shows the hierarchical merging of each strain based on genetic similarity, as inferred by fineSTRUCTURE⁹. This tree was inferred by first taking the sample configuration with the highest posterior probability among 100 posterior samples taken every 10,000 iterations from a Markov chain Monte Carlo (MCMC) run following 1 million burn-in iterations, next performing an additional 100,000 hill-climbing steps to find a solution with higher posterior probability and then constructing the tree by the stepwise merging of clusters as described in Lawson et al.¹⁰. Strains connected by a horizontal row at the bottom of the tree are inferred by fineSTRUCTURE to form a genetically homogeneous cluster. (d) Majority consensus trees of the 57 non-clonal strains. A consensus tree generated from 100 trees, each estimated from a window of one centile of the genome. Branch values show the percentage of windows that support each clade, with strain names colored according to their geographical origin as for Figure 1. The two trees have identical topology; branch lengths are adjusted to give a radial presentation in the left tree, and all branch lengths are equal in the right tree. The historical recombination of these strains is illustrated by the fact that all but one of the internal clades have less than 56% support. To generate this tree, we divided the genome into 100 non-overlapping windows and produced alignments for all of the fourfold degenerate sites from each window (~10,000 sites each). We estimated the best tree for each window using the GTRGAMMA model in RAxML¹¹,¹² and calculated the consensus tree using the CONSENSE function from PHYLIP (http://evolution.genetics.washington.edu/phylip.html) with Majority rule (extended).

Supplementary Figure 3 The terminal 100 kb of all chromosomes contains excess diversity and unusual properties.

The columns of the nine panels show the three chromosomes. The rows show the expression levels of protein-coding genes (top), the number of essential genes (middle) and the diversity (π; bottom). Expression panels (top) show the range of expression levels (in reads/kb/million reads, RPKM) for genes during exponential growth (log), stationary phase (stat) and meiotic differentiation (mei) (S.A., unpublished data). For each chromosome, we show the expression levels for the left 100 kb of the chromosome in red, the right 100 kb of the chromosome in blue and all other genes in green. Box widths are proportional to the number of genes. We note that, in general, genes at chromosome ends are expressed at lower levels under all conditions tested. Essential gene panels (middle) show the number of essential genes per 10-kb window, with box fill colors as above. Essential genes are defined as those annotated with the Fission Yeast Phenotype Ontology ID FYPO:0000049 (inviable) in PomBase (http://www.pombase.org/). Diversity panels (bottom) show the distribution of average pairwise diversity (π) for the 10-kb windows in the left, middle and right regions of each chromosome. Chromosome ends have higher diversity, indicating less purifying selection. Not shown: the ends of chromosomes contain an excess of common LTR insertions (present in at least half of the 57 non-redundant strains, per 10-kb window of the genome). Windows within 100 kb of the chromosome ends had significantly more common insertions (ends mean = 0.74 transposons/window, internal regions mean = 0.15 transposons/window; Mann-Whitney test P = 4.8 × 10⁻¹¹).
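
The windowed diversity values in the bottom panels can be illustrated with a short Python sketch of nucleotide diversity (π), the average number of pairwise differences per site. It assumes haploid genotype calls, treats every site without a SNP as invariant and callable, and uses hypothetical function names and input layout; the published values were produced with the authors' own pipeline, which this sketch does not reproduce.

```python
from collections import Counter


def site_pairwise_diff(alleles):
    """Fraction of strain pairs that carry different alleles at one site."""
    n = len(alleles)
    if n < 2:
        return 0.0
    # Ordered pairs of strains sharing the same allele, over all ordered pairs.
    same = sum(c * (c - 1) for c in Counter(alleles).values())
    return 1.0 - same / (n * (n - 1))


def windowed_pi(snps, chrom_length, window=10_000):
    """Per-site pi in consecutive non-overlapping windows.

    snps maps a 1-based position to the list of alleles across strains.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    totals = [0.0] * n_windows
    for pos, alleles in snps.items():
        totals[(pos - 1) // window] += site_pairwise_diff(alleles)
    pis = []
    for i, total in enumerate(totals):
        # The final window may be shorter than the others.
        length = min(window, chrom_length - i * window)
        pis.append(total / length)
    return pis
```

For example, a site where two of four strains carry each allele contributes 4 differing pairs out of 6, so a 10-bp window containing only that SNP has π = (4/6)/10.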

Supplementary Figure 4 Differences in diversity in various genome annotations.

(a) SNP median minor allele frequency. Median minor allele frequency calculated with SNPs from 100 windows of the genome, using sites specific to one annotation. Colors are as in Figure 3b. C/RNA indicates canonical RNAs (rRNAs, tRNAs, snoRNAs and snRNAs). One-sided Mann-Whitney test P values versus the FFD site neutral proxy were: exons, 3.4 × 10⁻¹³; 3′ UTRs, 4.4 × 10⁻³; canonical noncoding RNAs, 0.97; 5′ UTRs, 0.013; lncRNAs, 1; non-annotated regions, 0.0078; introns, 0.55; LTRs (which have higher median MAF), 3.7 × 10⁻³; onefold-degenerate sites, 7.3 × 10⁻¹⁶. This supports the conclusion from θ that exons and UTRs but not lncRNAs have been subject to purifying selection. (b) Indel median minor allele frequency. Median minor allele frequency calculated with indels from 100 windows of the genome, using sites specific to one annotation. Colors are as in Figure 3b. One-sided Mann-Whitney test P values versus the neutral proxy of unannotated sites were: exons, 1.5 × 10⁻⁷; 5′ UTRs, 2.8 × 10⁻³; lncRNAs, 0.5; 3′ UTRs, 0.077; introns, 0.42; transposon LTRs, 0.66. Here exons and 5′ UTRs show evidence for constraint, but 3′ UTRs and lncRNAs do not. (c) Diversity (θ) in lncRNA expression fractions. θ, calculated using SNPs, from left to right: five expression fractions of non-canonical lncRNAs (lncRNA1 to lncRNA5, with lncRNA5 including the top 20% most highly expressed lncRNAs), exonic sites, 3′ UTRs, unannotated regions, fourfold-degenerate sites from genes with low expression (FFD0, lowest 10%) and fourfold-degenerate sites from genes with high expression (FFD9, highest 10%). In this analysis, we use unannotated regions as a neutral proxy, and the red horizontal line shows the median value for these sites. Annotations that show significantly lower diversity than the neutral proxy are shaded gray; one-sided Mann-Whitney test P values are: ncRNA5, 2.7 × 10⁻³; exons, 6.9 × 10⁻²⁹; 3′ UTRs, 1.2 × 10⁻¹⁹. (d) SNP median MAF in lncRNA expression fractions. Median minor allele frequency of SNPs, with annotation classes as above. In this analysis, we use fourfold-degenerate sites from genes with low expression as a neutral proxy, and the red horizontal line shows the median value for these sites. Annotations that show significantly lower diversity than the neutral proxy are shaded gray; one-sided Mann-Whitney test P values are: ncRNA5, 0.012; exons, 1.3 × 10⁻⁵; 3′ UTRs, 2.3 × 10⁻⁵; unannotated regions, 0.026. (e) Indel median MAF in lncRNA expression fractions. Median minor allele frequency of indels, with annotation classes as above. In this analysis, we use unannotated regions as a neutral proxy, and the red horizontal line shows the median value for these sites. Annotations that have significantly lower diversity than unannotated regions are shaded gray; one-sided Mann-Whitney test P values are: ncRNA5, 7.0 × 10⁻³; exons, 8.7 × 10⁻¹¹.

Supplementary Figure 5 A sharp peak of LTR insertions within 500-nt regions upstream of transcription start sites.

Histogram of LTR insertions in 100-bp bins around the transcription start sites (TSSs) of protein-coding genes. Positive and negative x values denote regions up- and downstream of the TSS, respectively. The number of insertions is shown for 'fixed' insertions (present in all 57 strains), 'singletons' (present in a single strain only) and 'intermediates' (all other insertions).
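
The binning behind such a histogram can be sketched in Python as follows. The sketch assumes each insertion has already been paired with a TSS and its strand (the legend does not state how insertions were assigned to genes), and the function names are hypothetical. It follows the sign convention stated above: positive offsets are upstream of the TSS and negative offsets are downstream.

```python
from collections import Counter


def signed_offset(insertion_pos, tss_pos, strand):
    """Distance from the TSS; positive = upstream, negative = downstream."""
    if strand == "+":
        # Forward-strand gene: upstream lies at smaller coordinates.
        return tss_pos - insertion_pos
    # Reverse-strand gene: upstream lies at larger coordinates.
    return insertion_pos - tss_pos


def bin_offsets(offsets, bin_width=100):
    """Count offsets per bin, labelling each bin by its lower edge."""
    # Floor division keeps negative offsets in the correct bin (-1 -> -100).
    return Counter((off // bin_width) * bin_width for off in offsets)
```

Running `bin_offsets` separately on the fixed, singleton and intermediate insertions would give one count series per class.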

Supplementary Figure 6 Recombination rate and linkage decay.

(a) The recombination rate is log-normally distributed. For each SNP, we calculated the recombination rate in linkage disequilibrium units/Mb. The plot shows the distribution of nonzero rates on a log₁₀ scale. (b) The recombination rate is correlated with diversity. Filled red and black circles indicate centromeric and telomeric regions, respectively, as in Figure 3c. Diversity (Watterson's θ), calculated as in Figure 3c (in 10-kb genomic windows), is correlated with the average recombination rate (LDU/Mb) (Spearman's rank correlation ρ = 0.43, P = 2.2 × 10⁻⁵⁷). (c) Diversity is calculated as above. The recombination rate is negatively correlated with exon density (the proportion of each 10-kb window that is annotated as an exon) (Spearman's ρ = −0.42, P = 2.2 × 10⁻⁵³). (d) Linkage disequilibrium (LD) declines to 50% of its value within 21 kb. Using SNPs with minor allele frequencies >0.05, we calculated the D′ and r² measures of linkage disequilibrium for all pairs of SNPs up to 250 kb apart (Online Methods). We show the mean D′ and r² values for all pairwise comparisons within each 1-kb window of distance.
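
For readers unfamiliar with the two LD measures in panel d, the following Python sketch computes D′ and r² for one pair of biallelic SNPs using the standard textbook definitions. It assumes haploid strains with alleles coded 0/1 and reports the absolute value of D′; the function name and input layout are illustrative, and the published values were computed with the tools described in the Online Methods.

```python
def ld_stats(a, b):
    """Return (|D'|, r^2) for two biallelic SNPs across haploid strains.

    a and b are equal-length lists of 0/1 alleles, one entry per strain.
    """
    n = len(a)
    p_a = sum(a) / n                       # frequency of allele 1 at SNP A
    p_b = sum(b) / n                       # frequency of allele 1 at SNP B
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                   # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # D' scales D by its maximum possible magnitude given the frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, r2
```

Two SNPs with identical genotype vectors give D′ = 1 and r² = 1, whereas a pair with unequal allele frequencies can reach D′ = 1 while r² stays below 1, which is why both measures are usually shown together.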

Supplementary Figure 7 Microscopy images of selected strains.

All strain descriptions (long, misshapen) are in comparison to Leupold's 972 reference strain. Left, DIC micrographs; right, calcofluor-stained cells, fluorescence microscopy (calcofluor stains the cell wall and division septum). Strains from the top are: (a) Leupold's 972 reference strain, (b) JB762, which has branched, multi-septated and pear-shaped cells, (c) JB1207, which has long cells, (d) JB1117, where cells are weakly misshapen/pear shaped and slightly curved, (e) JB939, which has misshapen cells, (f) JB914, which is near-filamentous on solid media (bright calcofluor staining between cells shows that cells that have undergone cell division remain attached at the septum), (g) JB930, which has short cells, and (h) JB1116, which contains 'banana-shaped' (curved) cells.

Supplementary Figure 8 Trait heritability and the value of repeat trait measurements.

(a) Traits collected using all methods are heritable. Here we show heritability estimates according to the method of data collection. All methods are sufficiently accurate to detect some heritable traits. Data collection types from left are: AA, amino acid concentrations determined by mass spectrometry; SOL/M, colony size on various solid media; LIQ/M1, growth parameters in liquid YES rich media and EMM2 minimal media; LIQ/M2, growth parameters in various liquid media from Brown et al. (2011); SHAPE/M, manually defined shape parameters; SHAPE/A, automated definitions of shape parameters. (b) Repeat measurements reduce non-genetic sources of variation (experimental noise/environmental variation). This plot shows the proportion of variation removed for each phenotype due to repeats, calculated as the adjusted r² from regressing the 179 individual phenotypic values on the factor clonal ID. For example, for the trait "Predicted Banana," we recorded the average phenotypic value of each clone across five samples, which removed approximately 30% of phenotypic variation. Repeated measurements for clones can substantially increase power to detect causal variants. For example, if repeated measurements remove 50% of the variation, the proportion of variance explained by each variant effectively doubles (a variant that explains X% of total variation will explain 2X% of the variance that remains).
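
The arithmetic in the last sentence generalizes to any fraction of variation removed: a variant explaining a fraction x of the total variance explains x / (1 − f) of what remains after a fraction f of non-genetic variation is removed. A minimal Python sketch, with a hypothetical function name and assuming the removed variation is entirely non-genetic:

```python
def rescaled_variance_explained(fraction_of_total, fraction_removed):
    """Variance explained by a variant, relative to the variance that remains.

    fraction_of_total: share of total trait variance explained by the variant.
    fraction_removed: share of total variance removed by repeat measurements
    (assumed here to be purely non-genetic).
    """
    if not 0 <= fraction_removed < 1:
        raise ValueError("fraction_removed must be in [0, 1)")
    return fraction_of_total / (1.0 - fraction_removed)
```

With 50% of the variation removed, a variant explaining 5% of the total explains 10% of the remainder; with the roughly 30% removed for "Predicted Banana," the same variant would explain about 7.1%.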

Supplementary Figure 9 Analysis of GWAS results.

(a) To examine whether the mixed-model GWAS controlled for population structure, we compared the degree of population stratification of each trait to the number of variants that passed the P-value threshold. To calculate the degree of population stratification, we divided the strains into five groups (defined by Admixture) and used a Kolmogorov-Smirnov test to determine whether the trait was significantly different between these five groups, using the log (P value) as a metric. This metric is not significantly correlated with the number of passing variants (Spearman rank correlation P > 0.05). Circles show the number of all variants that are significant in the GWAS, red crosses indicate the number of passing SNPs and green crosses indicate the number of passing indels. Traits that we might evaluate with caution because they are significantly stratified by population and have many passing variants are indicated with a black circle. The red vertical line shows the Bonferroni-corrected P-value threshold for the Kolmogorov-Smirnov tests; the green vertical line shows the median number of passing variants. (b) Genomic inflation factors (GIFs). The GIF is the observed median P value divided by the median expected P value. Under a null model of no associations and unlinked variants, the expectation is for the GIF to be 1. We show the distribution of GIFs from the 223 traits (top left), GIFs from permuted data (top right), density plot of observed GIFs versus 10 sets of permutations (each one per trait) (bottom left) and the distribution of adjusted GIFs (observed median P value/median P value from permuted data) (bottom right). Although the distribution of observed GIFs is slightly skewed to values larger than 1, adjusted GIFs (observed median/median from permuted data) are close to 1. (c) Associated indels tend to explain a greater proportion of trait variance. For all variants associated with a trait (left) and for the most significant variant associated with the 89 traits (right), we show the estimated variance explained by the variant. (d) Annotations of variants used for the GWAS analysis (top), all variants passing the P-value threshold (middle) and the most significant variant from each of the 89 traits (top hits) (bottom). The annotations from top are intergenic regions (unannotated as any other of the categories below), long noncoding RNAs (ncRNAs), 5′ and 3′ UTRs, synonymous sites in exons (Exon:syn) and nonsynonymous sites in exons (Exon:nonsyn). Indels that are multiples of three nucleotides are categorized as Exon:syn; all others are categorized as Exon:nonsyn. χ² tests showed no significant differences among the three groups for either SNPs or indels, including no bias towards nonsynonymous SNPs.
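
The two ratios defined in panel b can be written out as a short Python sketch. It follows the legend's wording literally (a ratio of median P values, with 0.5 as the expected median of uniformly distributed P values under the null); the function names are hypothetical. A common alternative convention computes the ratio on test statistics rather than P values, and the legend does not say which form the plotted values use, so this sketch should not be read as the authors' implementation.

```python
from statistics import median


def genomic_inflation_factor(p_values, expected_median=0.5):
    """Observed median P value divided by the expected median P value.

    Under a null of uniformly distributed P values the expected median is 0.5.
    """
    return median(p_values) / expected_median


def adjusted_gif(p_values, permuted_p_values):
    """Observed median P value divided by the median P value from permuted data."""
    return median(p_values) / median(permuted_p_values)
```

Note that under this literal ratio an excess of small P values pushes the value below 1, whereas a ratio of test statistics would move above 1 in the same situation.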

Supplementary Figure 10 The GWAS hotspot on chromosome 1.

The 10-kb region that contains the largest number of significant associations in the mixed model and also the passing variant with the lowest P value is on chromosome 1 (Fig. 4b). Here we show: (a) the passing variants in this 10-kb window (top), with the window indicated by vertical gray lines, and the local neighborhood of three genes (bottom). In both panels, protein-coding genes are shown below variants as black rectangles and noncoding RNAs are shown as gray rectangles, with forward-strand genes above reverse-strand genes. The most significant variants are three SNPs between nsk1 (SPAC3G9.01) and sod2 (SPAC1486.01). These variants are in perfect LD and are associated with growth in solid media with 0.1 M MgCl₂. nsk1 is a reverse-strand gene (transcribed from right to left) and sod2 is a forward-strand gene, so these variants are in the promoter regions of both genes. (b) The distribution of values for growth in solid media with 0.1 M MgCl₂ (left), categorizing strains by the genotype of one of these three variants (chromosome 1, position 3,185,213). The top right panel shows the trait values for the 57 non-clonal strains in 0.2 M MgCl₂. Because some strains are clear outliers, we show the trait on a log scale in the two lower plots. The box-and-whisker plots overlaid show the median and interquartile ranges of trait values. The red and black crosshairs show the trait values for the two parents used in the cross (c, below): JB931, which has the T allele (red), and JB953, which has the G allele (black). (c) PCR and ABI capillary sequencing of the parents and F1 progeny of a cross between two strains with the two genotypes at chromosome 1, position 3,185,213 (JB931 × JB953). The left panel shows the parents, and the right panels show pools of F1 segregants grown on YES rich media without MgCl₂ (top) or in YES rich media with 0.1 M MgCl₂ (below). The segregating allele is indicated with a yellow box. The T allele is enriched relative to the G allele on MgCl₂, as expected from the trait values in b. The increase in signal from the favored allele is likely due to the increased colony size of segregants with the favored T allele (expected from the association), the increased survival of segregants with the favored T allele, or both. Pools contained at least 35 colonies. (d) Spot assays of serial tenfold dilutions of sod2 and nsk1 deletion strains on control rich media (YES) and rich media with 0.1 M MgCl₂ or 0.2 M MgCl₂. Both deletion strains show less dense growth on media with 0.2 M MgCl₂, consistent with these genes affecting sensitivity to this stress. Deletion strains are from the Bioneer Version 2.0 deletion collection, and ED668 is the corresponding wild-type strain (genotype h+ ade6 M216 ura4-D18 leu1-32).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Note. (PDF 2988 kb)

Supplementary Tables 1–9

Supplementary Tables 1–9. (XLSX 961 kb)

Cite this article

Jeffares, D., Rallis, C., Rieux, A. et al. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet 47, 235–241 (2015) doi:10.1038/ng.3215
