The genomic and phenotypic diversity of Schizosaccharomyces pombe

Jeffares, Daniel C; Rallis, Charalampos; Rieux, Adrien; Speed, Doug; Převorovský, Martin; Mourier, Tobias; Marsellach, Francesc X; Iqbal, Zamin; Lau, Winston; Cheng, Tammy M K; Pracana, Rodrigo; Mülleder, Michael; Lawson, Jonathan L D; Chessel, Anatole; Bala, Sendu; Hellenthal, Garrett; O'Fallon, Brendan; Keane, Thomas; Simpson, Jared T; Bischof, Leanne; Tomiczek, Bartlomiej; Bitton, Danny A; Sideri, Theodora; Codlin, Sandra; Hellberg, Josephine E E U; van Trigt, Laurent; Jeffery, Linda; Li, Juan-Juan; Atkinson, Sophie; Thodberg, Malte; Febrer, Melanie; McLay, Kirsten; Drou, Nizar; Brown, William; Hayles, Jacqueline; Salas, Rafael E Carazo; Ralser, Markus; Maniatis, Nikolas; Balding, David J; Balloux, Francois; Durbin, Richard; Bähler, Jürg

doi:10.1038/ng.3215

Article
Published: 09 February 2015

Daniel C Jeffares ORCID: orcid.org/0000-0001-7320-0706¹,
Charalampos Rallis¹,
Adrien Rieux^1,2,
Doug Speed^1,2,
Martin Převorovský³,
Tobias Mourier⁴,
Francesc X Marsellach¹,
Zamin Iqbal⁵,
Winston Lau¹,
Tammy M K Cheng⁶,
Rodrigo Pracana¹,
Michael Mülleder⁷,
Jonathan L D Lawson^8,9,
Anatole Chessel⁷,
Sendu Bala¹⁰,
Garrett Hellenthal^1,2,
Brendan O'Fallon¹¹,
Thomas Keane¹⁰,
Jared T Simpson^10,17,
Leanne Bischof¹²,
Bartlomiej Tomiczek¹,
Danny A Bitton¹,
Theodora Sideri¹,
Sandra Codlin¹,
Josephine E E U Hellberg¹,
Laurent van Trigt¹,
Linda Jeffery⁶,
Juan-Juan Li⁶,
Sophie Atkinson¹,
Malte Thodberg⁴,
Melanie Febrer¹³,
Kirsten McLay¹³,
Nizar Drou¹³,
William Brown¹⁴,
Jacqueline Hayles⁶,
Rafael E Carazo Salas ORCID: orcid.org/0000-0002-5943-3981^8,9,
Markus Ralser ORCID: orcid.org/0000-0001-9535-7413^7,15,16,
Nikolas Maniatis¹,
David J Balding ORCID: orcid.org/0000-0002-1480-6115^1,2,17,
Francois Balloux^1,2,
Richard Durbin¹⁰ &
…
Jürg Bähler^1,2

Nature Genetics volume 47, pages 235–241 (2015)

Abstract

Natural variation within species reveals aspects of genome evolution and function. The fission yeast Schizosaccharomyces pombe is an important model for eukaryotic biology, but researchers typically use one standard laboratory strain. To extend the usefulness of this model, we surveyed the genomic and phenotypic variation in 161 natural isolates. We sequenced the genomes of all strains, finding moderate genetic diversity (π = 3 × 10⁻³ substitutions/site) and weak global population structure. We estimate that dispersal of S. pombe began during human antiquity (∼340 BCE), and ancestors of these strains reached the Americas at ∼1623 CE. We quantified 74 traits, finding substantial heritable phenotypic diversity. We conducted 223 genome-wide association studies, with 89 traits showing at least one association. The most significant variant for each trait explained 22% of the phenotypic variance on average, with indels having larger effects than SNPs. This analysis represents a rich resource to examine genotype-phenotype relationships in a tractable model.
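
As a rough illustration of the diversity statistic quoted in the abstract, π can be computed as the average number of pairwise differences per site across a set of aligned sequences. The sketch below assumes short, gap-free, haploid toy sequences; the function name `nucleotide_diversity` and the example alignment are hypothetical and are not taken from the paper's pipeline.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site (pi).

    Assumes aligned haploid sequences of equal length with no gaps.
    """
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(sequences, 2):
        # Count mismatching sites for this pair of sequences.
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_diffs / (n_pairs * length)


# Hypothetical toy alignment (three strains, ten sites).
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_alignment)
```

On real data the denominator would normally be the number of callable sites rather than the raw alignment length, which is a design choice this sketch leaves out.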

**Figure 1: An overview of the strain collection.**

**Figure 2: Recent dispersal of *S. pombe*.**

**Figure 3: Relationships between genetic diversity and genome function.**

**Figure 4: Phenotypes and genome-wide associations.**

Accession codes

Accessions

GenBank/EMBL/DDBJ

ACQJ00000000.2

References

Gomes, F.C.O. et al. Physiological diversity and trehalose accumulation in Schizosaccharomyces pombe strains isolated from spontaneous fermentations during the production of the artisanal Brazilian cachaça. Can. J. Microbiol. 48, 399–406 (2002).
Brown, W.R.A. et al. A geographically diverse collection of Schizosaccharomyces pombe isolates shows limited phenotypic variation but extensive karyotypic diversity. G3 1, 615–626 (2011).
Fawcett, J.A. et al. Population genomics of the fission yeast Schizosaccharomyces pombe. PLoS ONE 9, e104241 (2014).
Osterwalder, A. Schizosaccharomyces liquefaciens n.sp., eine gegen freie schweflige Säure widerstandsfähige Gärhefe. Mitt. Geb. Lebensmittelunters. Hyg. 15, 5–28 (1924).
Florenzano, G., Balloni, W. & Materassi, R. Contributo alla ecologia dei lieviti Schizosaccharomyces sulle uve. Vitis 16, 38–44 (1977).
Teoh, A.L., Heard, G. & Cox, J. Yeast ecology of Kombucha fermentation. Int. J. Food Microbiol. 95, 119–126 (2004).
Wood, V. et al. The genome sequence of Schizosaccharomyces pombe. Nature 415, 871–880 (2002).
Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–341 (2009).
Schacherer, J., Shapiro, J.A., Ruderfer, D.M. & Kruglyak, L. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458, 342–345 (2009).
Avelar, A.T., Perfeito, L., Gordo, I. & Godinho Ferreira, M. Genome architecture is a selectable trait that can be maintained by antagonistic pleiotropy. Nat. Commun. 4, 2235 (2013).
Seich Al Basatena, N.-K., Hoggart, C.J., Coin, L.J. & O'Reilly, P.F. The effect of genomic inversions on estimation of population genetic parameters from SNP data. Genetics 193, 243–253 (2013).
Zanders, S.E. et al. Genome rearrangements and pervasive meiotic drive cause hybrid infertility in fission yeast. eLife 3, e02630 (2014).
Cromie, G.A. et al. Genomic sequence diversity and population structure of Saccharomyces cerevisiae assessed by RAD-seq. G3 3, 2163–2171 (2013).
Alexander, D.H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Lawson, D.J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012).
Hornsey, I.S. A History of Beer and Brewing (The Royal Society of Chemistry, 2003).
Fay, J.C. & Benavides, J.A. Evidence for domesticated and wild populations of Saccharomyces cerevisiae. PLoS Genet. 1, 66–71 (2005).
Zhou, T., Gu, W. & Wilke, C.O. Detecting positive and purifying selection at synonymous sites in yeast and worm. Mol. Biol. Evol. 27, 1912–1922 (2010).
Bowen, N.J., Jordan, I.K., Epstein, J.A., Wood, V. & Levin, H.L. Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res. 13, 1984–1997 (2003).
Mourier, T. & Willerslev, E. Large-scale transcriptome data reveals transcriptional activity of fission yeast LTR retrotransposons. BMC Genomics 11, 167 (2010).
Kwon, E.-J.G. et al. Deciphering the transcriptional-regulatory network of flocculation in Schizosaccharomyces pombe. PLoS Genet. 8, e1003104 (2012).
Guo, Y. & Levin, H.L. High-throughput sequencing of retrotransposon integration provides a saturated profile of target activity in Schizosaccharomyces pombe. Genome Res. 20, 239–248 (2010).
Guo, Y. et al. Integration profiling of gene function with dense maps of transposon integration. Genetics 195, 599–609 (2013).
Feng, G., Leem, Y.-E. & Levin, H.L. Transposon integration enhances expression of stress response genes. Nucleic Acids Res. 41, 775–789 (2013).
Jeffares, D.C., Penkett, C.J. & Bähler, J. Rapidly regulated genes are intron poor. Trends Genet. 24, 375–378 (2008).
Chen, D. et al. Global transcriptional responses of fission yeast to environmental stress. Mol. Biol. Cell 14, 214–229 (2003).
Cromie, G.A. et al. A discrete class of intergenic DNA dictates meiotic DNA break hotspots in fission yeast. PLoS Genet. 3, e141 (2007).
Fowler, K.R., Gutiérrez-Velasco, S., Martín-Castellanos, C. & Smith, G.R. Protein determinants of meiotic DNA break hot spots. Mol. Cell 49, 983–996 (2013).
Maniatis, N. et al. The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis. Proc. Natl. Acad. Sci. USA 99, 2228–2233 (2002).
Liti, G. & Louis, E.J. Advances in quantitative trait analysis in yeast. PLoS Genet. 8, e1002912 (2012).
Mackay, T.F.C. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat. Rev. Genet. 15, 22–33 (2014).
Speed, D., Hemani, G., Johnson, M.R. & Balding, D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91, 1011–1021 (2012).
Warringer, J. et al. Trait variation in yeast is defined by population history. PLoS Genet. 7, e1002111 (2011).
Listgarten, J. et al. Improved linear mixed models for genome-wide association studies. Nat. Methods 9, 525–526 (2012).
Drummond, A.J., Suchard, M.A., Xie, D. & Rambaut, A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012).
Clément-Ziza, M. et al. Natural genetic variation impacts expression levels of coding, non-coding, and antisense transcripts in fission yeast. Mol. Syst. Biol. 10, 764 (2014).
Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936–939 (2011).
DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Iqbal, Z., Caccamo, M., Turner, I., Flicek, P. & McVean, G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat. Genet. 44, 226–232 (2012).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Thorvaldsdóttir, H., Robinson, J.T. & Mesirov, J.P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
Keane, T.M., Wong, K. & Adams, D.J. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29, 389–390 (2013).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Hutter, S., Vilella, A.J. & Rozas, J. Genome-wide DNA polymorphism analyses using VariScan. BMC Bioinformatics 7, 409 (2006).
Lau, W., Kuo, T.-Y., Tapper, W., Cox, S. & Collins, A. Exploiting large scale computing to construct high resolution linkage disequilibrium maps of the human genome. Bioinformatics 23, 517–519 (2007).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Lanfear, R., Calcott, B., Ho, S.Y.W. & Guindon, S. Combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701 (2012).
Baele, G., Li, W.L.S., Drummond, A.J., Suchard, M.A. & Lemey, P. Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics. Mol. Biol. Evol. 30, 239–243 (2013).
O'Fallon, B.D. ACG: rapid inference of population history from recombining nucleotide sequences. BMC Bioinformatics 14, 40 (2013).
Tamura, K. & Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526 (1993).
Simpson, J.T. & Durbin, R. Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22, 549–556 (2012).
Stanke, M., Schoffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006).
Camacho, C., Coulouris, G. & Avagyan, V. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
van Dongen, S. & Abreu-Goodger, C. Using MCL to extract clusters from networks. Methods Mol. Biol. 804, 281–295 (2012).
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
Dieterle, F., Ross, A., Schlotterbeck, G. & Senn, H. Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006).
Kahm, M., Hasenbrink, G., Lichtenberg-Frate, H., Ludwig, J. & Kschischo, M. Grofit: fitting biological growth curves with R. J. Stat. Softw. 33, 1–21 (2010).
Sazer, S. & Sherwood, S.W. Mitochondrial growth and DNA synthesis occur in the absence of nuclear DNA replication in fission yeast. J. Cell Sci. 97, 509–516 (1990).
Graml, V. et al. A genomic multiprocess survey of machineries that control and link cell shape, microtubule organization, and cell-cycle progression. Dev. Cell 31, 227–239 (2014).
Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
The R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2013).

Acknowledgements

We thank L. Clissold, H. Musk, D. Baker and R. Davey for their contributions to sequencing, H. Levin for discussions about transposons, and J. Mata and S. Marguerat for comments on the manuscript. This work was supported by a Wellcome Trust Senior Investigator Award to J.B. (grant 095598/Z/11/Z), by the Wellcome Trust to S.B., T.K., J.T.S. and R.D., by grant 260801-BIG-IDEA from the European Research Council (ERC) and grant BB/H005854/1 from the Biotechnology and Biological Sciences Research Council (BBSRC) to A.R. and F.B., by UK Medical Research Council grant G0901388 to D.S. and D.J.B., by a Cancer Research UK Postdoctoral Fellowship to T.M.K.C., by an ERC Starting Grant (SYSGRO) to R.E.C.S., a Wellcome Trust PhD studentship to J.L.D.L. and BBSRC grant BB/K006320/1 to R.E.C.S. and A.C., by a Wellcome Trust grant (RG 093735/Z/10/Z) and ERC Starting Grant 260809 to M.R. (M.R. is a Wellcome Trust Research Career Development and Wellcome-Beit Prize Fellow), by Czech Science Foundation grant P305/12/P040 and Charles University grant UNCE 204013 to M.P. and by Cancer Research UK to L.J. and J.H.

Author information

Jared T Simpson & David J Balding
Present addresses: Ontario Institute for Cancer Research, Toronto, Ontario, Canada (J.T.S.) and School of Biosciences and School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia (D.J.B.).

Authors and Affiliations

Department of Genetics, Evolution and Environment, University College London, London, UK
Daniel C Jeffares, Charalampos Rallis, Adrien Rieux, Doug Speed, Francesc X Marsellach, Winston Lau, Rodrigo Pracana, Garrett Hellenthal, Bartlomiej Tomiczek, Danny A Bitton, Theodora Sideri, Sandra Codlin, Josephine E E U Hellberg, Laurent van Trigt, Sophie Atkinson, Nikolas Maniatis, David J Balding, Francois Balloux & Jürg Bähler
University College London Genetics Institute, University College London, London, UK
Adrien Rieux, Doug Speed, Garrett Hellenthal, David J Balding, Francois Balloux & Jürg Bähler
Department of Cell Biology, Faculty of Science, Charles University in Prague, Prague, Czech Republic
Martin Převorovský
Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
Tobias Mourier & Malte Thodberg
Wellcome Trust Centre for Human Genetics, Oxford, UK
Zamin Iqbal
Cell Cycle Laboratory, Cancer Research UK London Research Institute, London, UK
Tammy M K Cheng, Linda Jeffery, Juan-Juan Li & Jacqueline Hayles
Department of Biochemistry, University of Cambridge, Cambridge, UK
Michael Mülleder, Anatole Chessel & Markus Ralser
Department of Genetics, University of Cambridge, Cambridge, UK
Jonathan L D Lawson & Rafael E Carazo Salas
Gurdon Institute, University of Cambridge, Cambridge, UK
Jonathan L D Lawson & Rafael E Carazo Salas
Wellcome Trust Sanger Institute, Cambridge, UK
Sendu Bala, Thomas Keane, Jared T Simpson & Richard Durbin
Associated Regional and University Pathologists, Inc. University of Utah, Salt Lake City, Utah, USA
Brendan O'Fallon
Commonwealth Scientific and Industrial Research Organisation (CSIRO) Mathematics, Informatics and Statistics, North Ryde, New South Wales, Australia
Leanne Bischof
Genome Analysis Centre, Norwich, UK
Melanie Febrer, Kirsten McLay & Nizar Drou
Centre for Genetics and Genomics, University of Nottingham, Nottingham, UK
William Brown
Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK
Markus Ralser
Division of Physiology and Metabolism, Medical Research Council (MRC) National Institute for Medical Research, London, UK
Markus Ralser

Contributions

D.C.J. coordinated all analyses, isolated DNA for sequencing, analyzed and filtered SNP calls, conducted diversity analysis and GWAS and drafted the manuscript. C.R. produced phenotype data for growth on various solid media and growth rates in liquid media. A.R. conducted analysis of dating using mitochondrial data. D.S. conducted GWAS. M.P. analyzed all phenotype data. T.M. identified LTR transposon insertions and analyzed transposon insertion data. F.X.M. conducted crosses for the analysis of spore viability. Z.I. produced indel calls with Cortex. W.L. conducted analysis of recombination rate, LD decay and principal-component analysis for distance between strains. T.M.K.C. assisted with phenotype and population analysis. R.P. analyzed Cortex and GATK indel calls. M.M. conducted amino acid profiling. J.L.D.L. and A.C. produced automated measures of cell morphology. S.B. aligned reads and produced GATK SNP calls. G.H. analyzed population structure using fineSTRUCTURE. B.O'F. estimated the time to the most recent common ancestor from the nuclear genome using ACG. T.K. identified LTR transposon insertions. J.T.S. produced de novo assemblies. L.B. developed the custom Workspace workflow Spotsizer. B.T. assisted with sequence analysis. D.A.B. assisted with analysis of new genes. T.S. assisted with strain verification. S.C. produced images of wild strains and assisted with strain verification. J.E.E.U.H. assisted with SNP validation. L.v.T. and M.T. assisted with LTR validation. L.J. and J.-J.L. assisted with manual measures of cell morphology and FACS. S.A. produced gene expression data. M.F., K.M. and N.D. assisted with sequencing. W.B. initiated and assisted with strain collection. J.H. coordinated manual measures of cell morphology and FACS. R.E.C.S. coordinated automated measures of cell morphology. M.R. coordinated amino acid profiling. N.M. conducted analysis of recombination and LD and advised on aspects of diversity and GWAS. D.J.B. advised on GWAS. F.B. advised on population structure and supervised A.R. R.D. facilitated sequencing. J.B. contributed to the initiation and development of the project and financed the Bähler laboratory.

Corresponding authors

Correspondence to Daniel C Jeffares or Jürg Bähler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Clonal clusters and isolation by distance.

(a) For all strains, we calculated the number of allelic differences using all SNPs. Pairs with <150 SNPs were considered nearly clonal, and these pairs were clustered using Markov clustering. Strains (spheres) are colored according to the continent where they were isolated, with gray spheres indicating unknown locations. Colors are as in Figure 1a: red, Americas; pink, Africa; green, Europe; blue, Asia; yellow, Australia. (b) The 752 unlinked SNPs used for descriptions of population structure are evenly distributed across the genome. For each 50-kb window of the genome, with a step size of 1 kb, we show the number of SNPs from the 752 unlinked set. Chromosomes 1 and 3 are in black, and chromosome 2 is in red. We note a slight bias to the right side of chromosome 1, which contains 20 of these 752 SNPs. (c) Genetic distance is correlated with geographical distance. For each pairwise comparison of the 161 strains, we calculated the proportion of shared alleles from the 752 unlinked SNPs ('drift distance') and the great circle distance (distance around the globe) between the locations from which strains were collected. A Mantel test with 10,000 resamplings showed that these 2 matrices were anticorrelated (r = -0.36, P = 9.9 × 10⁻⁵). This correlation is also present when we use only the 57 non-clonal strains (r = -0.28, P = 9.9 × 10⁻⁵). (d) Genetic distance is correlated with spore viability. For 43 crosses, we recorded spore survival by tetrad analysis. Spore survival was correlated with the proportion of shared alleles from the 752 unlinked SNPs (Pearson's product-moment correlation, r = 0.51, P = 6.4 × 10⁻⁴). Some strains do not produce many viable spores even when mated to themselves (low self-cross viability). The plot represents this by scaling each circle size to the lowest self-cross viability of the parents, showing that all low-viability outliers (top left of the plot) have at least one parent with low self-viability. When excluding crosses with the lowest self-cross viability (<0.3), the correlation between spore viability and genetic distance is stronger (Pearson's r = 0.76, P = 1.2 × 10⁻⁷).
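
The Mantel test in panel (c) compares two pairwise matrices by permuting the strain labels of one of them. A minimal sketch is given below, assuming a haversine great-circle distance on a spherical Earth and a two-sided permutation P value. The function names, the five coordinate pairs and the toy shared-allele matrix are all hypothetical, and nothing here reproduces the software or settings used in the paper.

```python
import math
import random


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(matrix_a, matrix_b, permutations=10000, seed=0):
    """Return (r, p): observed correlation and two-sided permutation P value."""
    identity = list(range(len(matrix_a)))
    a_flat = upper_triangle(matrix_a, identity)
    r_obs = pearson(a_flat, upper_triangle(matrix_b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute strain labels of the second matrix
        r_perm = pearson(a_flat, upper_triangle(matrix_b, order))
        if abs(r_perm) >= abs(r_obs):
            hits += 1
    # Adding 1 counts the observed arrangement as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical sampling locations (latitude, longitude in degrees).
coords = [(51.5, -0.1), (48.9, 2.4), (-33.9, 151.2), (-23.5, -46.6), (35.7, 139.7)]
geo = [[great_circle_km(*a, *b) for b in coords] for a in coords]
# Toy shared-allele proportions that fall linearly with distance.
shared = [[1.0 - d / 25000.0 for d in row] for row in geo]
r_obs, p_value = mantel(shared, geo, permutations=999, seed=1)
```

With this counting convention, the smallest P value attainable from 10,000 permutations is 1/10,001, or about 1 × 10⁻⁴, which is of the same order as the P values reported in the legend.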

Supplementary Figure 2 Population structure and relatedness between strains.

In each panel, Leupold's 972 reference strain (JB22) is indicated by a black triangle. (a) Admixture results. Each bar represents the proportion of SNPs assigned to each of the 2–5 populations, with the strain name below the bar. The geographical locations of the strains are shown in colored dots above the bar; yellow, Australia; green, Europe; red, Americas; pink, Africa; blue, Asia. (b) Principal-components plot colored by admixture clusters. Principal-component coordinates as described for Figure 1 using 752 unlinked SNPs. Strains (filled dots) are colored according to their Admixture cluster with k = 5. As in Figure 1, the 57 non-clonal strains are indicated with thick black borders. (c) fineSTRUCTURE analysis of shared haplotypes. The heat map depicts the proportion of the genome for which each strain in the columns shares the most recent common ancestry with each other strain (i.e., relative to all other strains) in the rows, as inferred by ChromoPainter (note that values therefore add up to 1.0 in each column)³. Strains are colored along the axes by their geographical sampling location, as above. The row and column for Leupold's 972 reference strain (JB22) are indicated with gray shading. The tree at the top shows the hierarchical merging of each strain based on genetic similarity, as inferred by fineSTRUCTURE⁹. This tree was inferred by first taking the sample configuration with the highest posterior probability among 100 posterior samples taken every 10,000 iterations from a Markov chain Monte Carlo (MCMC) run following 1 million burn-in iterations, next performing an additional 100,000 hill-climbing steps to find a solution with higher posterior probability and then constructing the tree by the stepwise merging of clusters as described in Lawson et al.¹⁰. Strains connected by a horizontal row at the bottom of the tree are inferred by fineSTRUCTURE to form a genetically homogeneous cluster. (d) Majority consensus trees of the 57 non-clonal strains. A consensus tree generated from 100 trees, each estimated from a window of one centile of the genome. Branch values show the percentage of windows that support each clade, with strain names colored according to their geographical origin as for Figure 1. The two trees have identical topology, branch lengths are adjusted to give a radial presentation in the left tree and all branch lengths are equal in the right tree. The historical recombination of these strains is illustrated by the fact that all but one of the internal clades have less than 56% support. To generate this tree, we divided the genome into 100 non-overlapping windows and produced alignments for all of the fourfold degenerate sites from each window (~10,000 sites each). We estimated the best tree for each window using the GTRGAMMA model in RAxML^11,12 and calculated the consensus tree using the CONSENSE function from PHYLIP (http://evolution.genetics.washington.edu/phylip.html) with Majority rule (extended).
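
The branch values in panel (d) are the percentage of window trees that contain a given clade. The sketch below shows only that counting step, assuming each window tree has already been reduced to a set of clades (each clade a frozenset of strain names). The strain labels and function names are hypothetical, and the strict >50% filter used here is simpler than the extended majority rule of PHYLIP CONSENSE, which can also admit compatible clades with lower support.

```python
from collections import Counter


def clade_support(window_trees):
    """Percentage of window trees that contain each clade.

    `window_trees` is a list with one entry per genomic window; each entry
    is an iterable of clades, and each clade is a frozenset of strain names.
    """
    counts = Counter()
    for clades in window_trees:
        counts.update(set(clades))  # count each clade at most once per tree
    n_trees = len(window_trees)
    return {clade: 100.0 * c / n_trees for clade, c in counts.items()}


def majority_clades(window_trees, threshold=50.0):
    """Clades found in strictly more than `threshold` percent of trees."""
    support = clade_support(window_trees)
    return {clade: pct for clade, pct in support.items() if pct > threshold}


# Hypothetical clades over four toy strains A-D, from four toy windows.
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
toy_trees = [{AB, CD}, {AB, CD}, {AB}, {AC}]

support = clade_support(toy_trees)
majority = majority_clades(toy_trees)
```

In this toy input the clade {A, B} appears in three of four windows, so it is the only clade that passes the strict majority filter.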

Supplementary Figure 3 The terminal 100 kb of all chromosomes contains excess diversity and unusual properties.

The columns of the nine panels show the three chromosomes. The rows show the expression levels of protein-coding genes (top), the number of essential genes (middle) and the diversity (π; bottom). Expression panels (top) show the range of expression levels (in reads/kb/million reads, RPKM) for genes during exponential growth (log), stationary phase (stat) and meiotic differentiation (mei) (S.A., unpublished data). For each chromosome, we show the expression levels for the left 100 kb of the chromosome in red, the right 100 kb of the chromosome in blue and all other genes in green. Box widths are proportional to the number of genes. We note that, in general, genes at chromosome ends are expressed at lower levels under all conditions tested. Essential gene panels (middle) show the number of essential genes per 10-kb window, with box fill colors as above. Essential genes are defined as those annotated with the Fission Yeast Phenotype Ontology ID FYPO:0000049 (inviable) in PomBase (http://www.pombase.org/). Diversity panels (bottom) show the distribution of average pairwise similarity (π) for the 10-kb windows in the left, middle and right regions of each chromosome. Chromosome ends have higher diversity, indicating less purifying selection. Not shown: the ends of chromosomes contain an excess of common LTR insertions (present in at least half of the 57 non-redundant strains, per 10-kb window of the genome). Windows within 100 kb of the chromosome ends had significantly more common insertions (ends mean = 0.74 transposons/window, internal regions mean = 0.15 transposons/window; Mann-Whitney test P = 4.8 × 10⁻¹¹).
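
The comparison of chromosome ends with internal regions can be sketched as two steps: label each 10-kb window as terminal or internal, then compare the two groups with a rank-based test. The code below is a minimal illustration that assumes a window is terminal when its start lies within 100 kb of either chromosome end, and it uses a plain normal approximation to the Mann-Whitney U test without tie or continuity correction. All names and the toy values are hypothetical; a statistics package would be preferable for real analyses.

```python
import math


def split_windows(windows, chrom_length, end_size=100_000):
    """Partition (start, value) windows into terminal and internal groups."""
    ends, internal = [], []
    for start, value in windows:
        if start < end_size or start >= chrom_length - end_size:
            ends.append(value)
        else:
            internal.append(value)
    return ends, internal


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(xs, ys):
    """U statistic for xs and a two-sided normal-approximation P value."""
    n1, n2 = len(xs), len(ys)
    ranks = average_ranks(list(xs) + list(ys))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 300-kb chromosome with 10-kb windows; terminal windows carry
# a higher toy value (for example, more insertions) than internal windows.
chrom_length = 300_000
toy_windows = [
    (start, 3 if start < 100_000 or start >= 200_000 else 1)
    for start in range(0, chrom_length, 10_000)
]
ends, internal = split_windows(toy_windows, chrom_length)
u_stat, p_value = mann_whitney_u(ends, internal)
```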

Supplementary Figure 4 Differences in diversity in various genome annotations.

(a) SNP median minor allele frequency. Median minor allele frequency calculated with SNPs from 100 windows of the genome, using sites specific to one annotation. Colors are as in Figure 3b. C/RNA indicates canonical RNAs (rRNAs, tRNAs, snoRNAs and snRNAs). One-sided Mann-Whitney test P values versus the FFD site neutral proxy were: exons, 3.4 × 10⁻¹³; 3′ UTRs, 4.4 × 10⁻³; canonical noncoding RNAs, 0.97; 5′ UTRs, 0.013; lncRNAs, 1; non-annotated regions, 0.0078; introns, 0.55; LTRs (which have higher median MAF), 3.7 × 10⁻³; onefold-degenerate sites, 7.3 × 10⁻¹⁶. This supports the conclusion from θ that exons and UTRs but not lncRNAs have been subject to purifying selection. (b) Indel median minor allele frequency. Median minor allele frequency calculated with indels from 100 windows of the genome, using sites specific to one annotation. Colors are as in Figure 3b. One-sided Mann-Whitney test P values versus the neutral proxy of unannotated sites were: exons, 1.5 × 10⁻⁷; 5′ UTRs, 2.8 × 10⁻³; lncRNAs, 0.5; 3′ UTRs, 0.077; introns, 0.42; transposon LTRs, 0.66. Here exons and 5′ UTRs show evidence for constraint, but 3′ UTRs and lncRNAs do not. (c) Diversity (θ) in lncRNA expression fractions. θ, calculated using SNPs, from left to right: five expression fractions of non-canonical lncRNAs (ncRNA1 to lncRNA5, with lncRNA5 including the top 20% most highly expressed lncRNAs), exonic sites, 3′ UTRs, unannotated regions, fourfold-degenerate sites from genes with low expression (FFD0, lowest 10%) and fourfold-degenerate sites from genes with high expression (FFD9, highest 10%). In this analysis, we use unannotated regions as a neutral proxy, and the red horizontal line shows the median value for these sites. Annotations that show significantly lower diversity than the neutral proxy are shaded gray; one-sided Mann-Whitney test P values are: ncRNA5, 2.7 × 10⁻³; exons, 6.9 × 10⁻²⁹; 3′ UTRs, 1.2 × 10⁻¹⁹. (d) SNP median MAF in lncRNA expression fractions. Median minor allele frequency of SNPs, with annotation classes as above. In this analysis, we use fourfold-degenerate sites from genes with low expression as a neutral proxy, and the red horizontal line shows the median value for these sites. Annotations that show significantly lower diversity than the neutral proxy are shaded gray; one-sided Mann-Whitney test P values are: ncRNA5, 0.012; exons, 1.3 × 10⁻⁵; 3′ UTRs, 2.3 × 10⁻⁵; unannotated regions, 0.026. (e) Indel median MAF in lncRNA expression fractions. Median minor allele frequency of indels, with annotation classes as above. In this analysis, we use unannotated regions as a neutral proxy, and the red horizontal line shows the median value for these sites. Annotations that have significantly lower diversity than unannotated regions are shaded gray; one-sided Mann-Whitney test P values are: ncRNA5, 7.0 × 10⁻³; exons, 8.7 × 10⁻¹¹.
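The median minor allele frequency used in these panels can be illustrated with a short sketch. It assumes haploid strains and biallelic sites, with each site given as a list of allele labels across strains. The function names and the input layout are hypothetical and are not taken from the authors' code.

```python
from statistics import median


def minor_allele_frequency(alleles):
    """Frequency of the rarer allele at one biallelic site (haploid calls)."""
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    if len(counts) != 2:
        raise ValueError("site must be biallelic and polymorphic")
    return min(counts.values()) / len(alleles)


def median_maf(sites):
    """Median MAF over the sites of one annotation class in one window."""
    return median(minor_allele_frequency(site) for site in sites)
```

Applying `median_maf` per window and per annotation would yield the distributions that the one-sided Mann-Whitney tests compare against the neutral proxy.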

Supplementary Figure 5 A sharp peak of LTR insertions within 500-nt regions upstream of transcription start sites.

Histogram of LTR insertions in 100-bp bins around the transcription start sites (TSSs) of protein-coding genes. Positive and negative x values denote regions up- and downstream of the TSS, respectively. The number of insertions is shown for 'fixed' insertions (present in all 57 strains), 'singletons' (present in a single strain only) and 'intermediates' (all other insertions).
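The binning behind this histogram can be sketched as below. The sign convention (positive values upstream of the TSS), the 100-bp bin size and the 57-strain total follow the caption; the function names and the strand encoding ("+" or "-") are assumptions made for illustration.

```python
BIN_SIZE = 100   # bp, as stated in the caption
N_STRAINS = 57   # number of strains, as stated in the caption


def tss_offset(insertion_pos, tss_pos, strand):
    """Signed distance from the TSS; positive means upstream of the gene."""
    if strand == "+":
        return tss_pos - insertion_pos
    if strand == "-":
        return insertion_pos - tss_pos
    raise ValueError("strand must be '+' or '-'")


def bin_start(offset, bin_size=BIN_SIZE):
    """Lower edge of the bin that contains `offset` (floor division)."""
    return (offset // bin_size) * bin_size


def insertion_class(n_strains_with_insertion, n_strains=N_STRAINS):
    """Classify an insertion as fixed, singleton or intermediate."""
    if n_strains_with_insertion == n_strains:
        return "fixed"
    if n_strains_with_insertion == 1:
        return "singleton"
    return "intermediate"
```

Counting insertions per `(bin_start, insertion_class)` pair gives the three series plotted in the histogram.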

Supplementary Figure 6 Recombination rate and linkage decay.

(a) The recombination rate is log-normally distributed. For each SNP, we calculated the recombination rate in linkage disequilibrium units/Mb. The plot shows the distribution of nonzero rates on a log₁₀ scale. (b) The recombination rate is correlated with diversity. Filled red and black circles indicate centromeric and telomeric regions, respectively, as in Figure 3c. Diversity (Watterson's θ), calculated as in Figure 3c (in 10-kb genomic windows), is correlated with the average recombination rate (LDU/Mb) (Spearman's rank correlation ρ = 0.43, P = 2.2 × 10⁻⁵⁷). (c) Diversity is calculated as above. The recombination rate is negatively correlated with exon density (the proportion of each 10-kb window that is annotated as an exon; Spearman's ρ = −0.42, P = 2.2 × 10⁻⁵³). (d) Linkage disequilibrium (LD) declines to 50% of its value within 21 kb. Using SNPs with minor allele frequencies >0.05, we calculated the D′ and r² measures of linkage disequilibrium for all pairs of SNPs up to 250 kb apart (Online Methods). We show the mean D′ and r² values for all pairwise comparisons within each 1-kb window of distance.
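For readers unfamiliar with the two LD measures in panel d, the following sketch computes D′ and r² for one pair of SNPs using the standard textbook definitions. It assumes haploid strains with alleles coded 0 and 1; the function name and input layout are hypothetical, and the authors' actual procedure is described in their Online Methods.

```python
def ld_measures(a, b):
    """Return (D', r^2) for two biallelic SNPs scored 0/1 across haploid strains."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("a and b must be non-empty and of equal length")
    p_a = sum(a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(b) / n                      # frequency of allele 1 at SNP B
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    # Frequency of the haplotype carrying allele 1 at both SNPs.
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                  # raw disequilibrium coefficient
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D is scaled by its maximum possible magnitude given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2
```

The two measures can disagree: for `a = [1, 1, 1, 0]` and `b = [1, 0, 0, 0]`, D′ is 1 because only three of the four possible haplotypes occur, while r² is only 1/9 because the allele frequencies differ.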

Supplementary Figure 7 Microscopy images of selected strains.

All strain descriptions (long, misshapen) are in comparison to Leupold's 972 reference strain. Left, DIC micrographs; right, calcofluor-stained cells, fluorescence microscopy (calcofluor stains the cell wall and division septum). Strains from the top are: (a) Leupold's 972 reference strain, (b) JB762, which has branched, multi-septated and pear-shaped cells, (c) JB1207, which has long cells, (d) JB1117, where cells are weakly misshapen/pear shaped and slightly curved, (e) JB939, which has misshapen cells, (f) JB914, which is near-filamentous on solid media (bright calcofluor staining between cells shows that cells that have undergone cell division remain attached at the septum), (g) JB930, which has short cells, and (h) JB1116, which contains 'banana-shaped' (curved) cells.

Supplementary Figure 8 Trait heritability and the value of repeat trait measurements.

(a) Traits collected using all methods are heritable. Here we show heritability estimates according to the method of data collection. All methods are sufficiently accurate to detect some heritable traits. Data collection types from left are: AA, amino acid concentrations determined by mass spectrometry; SOL/M, colony size on various solid media; LIQ/M1, growth parameters in liquid YES rich media and EMM2 minimal media; LIQ/M2, growth parameters in various liquid media from Brown et al. (2011); SHAPE/M, manually defined shape parameters; SHAPE/A, automated definitions of shape parameters. (b) Repeat measurements reduce non-genetic sources of variation (experimental noise/environmental variation). This plot shows the proportion of variation removed for each phenotype due to repeats, calculated as the adjusted r² from regressing the 179 individual phenotypic values on the factor clonal ID. For example, for the trait "Predicted Banana," for each clone, we recorded average phenotypic values across five samples, which removed approximately 30% of phenotypic variation. Repeated measurements for clones can substantially increase power to detect causal variants: if we can remove 50% of variation through repeated measurements, the proportion of variance explained by each variant effectively doubles (a variant that explains X% of total variation will explain 2X% of the variance that remains).
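The arithmetic behind the doubling example generalizes to any fraction of variation removed: a variant's share of the remaining variance is its share of the total divided by the fraction that remains. The short sketch below makes this explicit; the function name is hypothetical, and it assumes that the removed variation is entirely non-genetic, so the variant's absolute contribution is unchanged.

```python
def rescaled_variance_explained(share_of_total, share_removed):
    """Share of the remaining variance explained by a variant.

    share_of_total: fraction of total variance the variant explains.
    share_removed:  fraction of total variance removed by repeat
                    measurements (assumed to be purely non-genetic).
    """
    if not 0.0 <= share_removed < 1.0:
        raise ValueError("share_removed must be in [0, 1)")
    return share_of_total / (1.0 - share_removed)
```

For instance, `rescaled_variance_explained(0.05, 0.5)` gives 0.10, the doubling described in the caption, whereas removing 30% of variation (as for "Predicted Banana") scales the share by 1/0.7.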

Supplementary Figure 9 Analysis of GWAS results.

(a) To examine whether the mixed-model GWAS controlled for population structure, we compared the degree of population stratification of each trait to the number of variants that passed the P-value threshold. To calculate the degree of population stratification, we divided the strains into five groups (defined by Admixture) and used a Kolmogorov-Smirnov test to determine whether the trait was significantly different between these five groups, using the log (P value) as a metric. This metric is not significantly correlated with the number of passing variants (Spearman rank correlation P > 0.05). Circles show the number of all variants that are significant in the GWAS, red crosses indicate the number of passing SNPs and green crosses indicate the number of passing indels. Traits that we might evaluate with caution because they are significantly stratified by population and have many passing variants are indicated with a black circle. The red vertical line shows the Bonferroni-corrected P-value threshold for the Kolmogorov-Smirnov tests; the green vertical line shows the median number of passing variants. (b) Genomic inflation factors (GIFs). The GIF is the observed median P value divided by the median expected P value. Under a null model of no associations and unlinked variants, the expectation is for the GIF to be 1. We show the distribution of GIFs from the 223 traits (top left), GIFs from permuted data (top right), density plot of observed GIFs versus 10 sets of permutations (each one per trait) (bottom left) and the distribution of adjusted GIFs (observed median P value/median P value from permuted data) (bottom right). Although the distribution of observed GIFs is slightly skewed to values larger than 1, adjusted GIFs (observed median/median from permuted data) are close to 1. (c) Associated indels tend to explain a greater proportion of trait variance. For all variants associated with a trait (left) and for the most significant variant associated with the 89 traits (right) we show the estimated trait variance explained by the variant. (d) Annotations of variants used for the GWAS analysis (top), all variants passing the P-value threshold (middle) and the most significant variant from each of the 89 traits (top hits) (bottom). The annotations from top are intergenic regions (unannotated as any other of the categories below), long noncoding RNAs (ncRNAs), 5′ and 3′ UTRs, synonymous sites in exons (Exon:syn) and nonsynonymous sites in exons (Exon:nonsyn). Indels that are multiples of three nucleotides are categorized as Exon:syn; all others are categorized as Exon:nonsyn. χ² tests showed no significant difference between SNPs in any three groups, or indels in any three groups, including no bias towards nonsynonymous SNPs.
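The two ratios described in panel b can be written out as a small sketch. It follows the caption's wording literally (ratios of median P values) and assumes that the expected median under a uniform null distribution of P values is 0.5. The function names are hypothetical, and other GWAS tools may define the inflation factor on test statistics instead.

```python
from statistics import median


def genomic_inflation_factor(p_values, expected_median=0.5):
    """Observed median P value divided by the expected median P value.

    Assumption: under the null, P values are uniform on [0, 1],
    so their expected median is 0.5.
    """
    return median(p_values) / expected_median


def adjusted_gif(observed_p_values, permuted_p_values):
    """Observed median P value divided by the median from permuted data."""
    return median(observed_p_values) / median(permuted_p_values)
```

Using permuted data as the denominator calibrates the ratio against the actual variant set and linkage structure rather than against the idealized uniform expectation.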

Supplementary Figure 10 The GWAS hotspot on chromosome 1.

The 10-kb region that contains the largest number of significant associations in the mixed model and also the passing variant with the lowest P value is on chromosome 1 (Fig. 4b). Here we show: (a) the passing variants in this 10-kb window (top), with the window indicated by vertical gray lines, and the local neighborhood of three genes (bottom). In both panels, protein-coding genes are shown below variants as black rectangles and noncoding RNAs are shown as gray rectangles, with forward-strand genes above reverse-strand genes. The most significant variants are three SNPs between nsk1 (SPAC3G9.01) and sod2 (SPAC1486.01). These variants are in perfect LD and are associated with growth in solid media with 0.1 M MgCl₂. nsk1 is a reverse-strand gene (transcribed from right to left) and sod2 is a forward-strand gene, so these variants are in the promoter regions of both genes. (b) The distribution of values for growth in solid media with 0.1 M MgCl₂ (left), categorizing strains by the genotype of one of these three variants (chromosome 1, position 3,185,213). The top right panel shows the trait values for the 57 non-clonal strains in 0.2 M MgCl₂. Because some strains are clear outliers, we show the trait on a log scale in the two lower plots. The box-and-whisker plots overlaid show the median and interquartile ranges of trait values. The red and black crosshairs show the trait values for the two parents used in the cross (c, below): JB931, which has the T allele (red), and JB953, which has the G allele (black). (c) PCR and ABI capillary sequencing of the parents and F₁ progeny of a cross between two strains with the two genotypes at chromosome 1, position 3,185,213 (JB931 × JB953). The left panel shows the parents, and the right panels show pools of F₁ segregants grown on YES rich media without MgCl₂ (top) or in YES rich media with 0.1 M MgCl₂ (below). The segregating allele is indicated with a yellow box. The T allele is enriched relative to the G allele on MgCl₂, as expected from the trait values in b. The increase in signal from the favored allele likely reflects the increased colony size of segregants with the favored T allele (expected from the association) and/or the increased survival of segregants with the favored T allele. Pools contained at least 35 colonies. (d) Spot assays of serial tenfold dilutions of sod2 and nsk1 deletion strains on control rich media (YES) and rich media with 0.1 M MgCl₂ or 0.2 M MgCl₂. Both deletion strains show less dense growth on media with 0.2 M MgCl₂, consistent with these genes affecting sensitivity to this stress. Deletion strains are from the Bioneer Version 2.0 deletion collection, and ED668 is the corresponding wild-type strain (genotype h⁺ ade6-M216 ura4-D18 leu1-32).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Note. (PDF 2988 kb)

Supplementary Tables 1–9

Supplementary Tables 1–9. (XLSX 961 kb)

About this article

Cite this article

Jeffares, D., Rallis, C., Rieux, A. et al. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet 47, 235–241 (2015). https://doi.org/10.1038/ng.3215

Received: 08 July 2014
Accepted: 14 January 2015
Published: 09 February 2015
Issue Date: March 2015
DOI: https://doi.org/10.1038/ng.3215
