Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast

Large structural variations (SVs) within genomes are more challenging to identify than smaller genetic variants but may substantially contribute to phenotypic diversity and evolution. We analyse the effects of SVs on gene expression, quantitative traits and intrinsic reproductive isolation in the yeast Schizosaccharomyces pombe. We establish a high-quality curated catalogue of SVs in the genomes of a worldwide library of S. pombe strains, including duplications, deletions, inversions and translocations. We show that copy number variants (CNVs) exhibit a variety of genetic signals consistent with rapid turnover. These transient CNVs produce stoichiometric effects on gene expression both within and outside the duplicated regions. CNVs make substantial contributions to quantitative traits, most notably intracellular amino acid concentrations, growth under stress and sugar utilization in winemaking, whereas rearrangements are strongly associated with reproductive isolation. Collectively, these findings have broad implications for evolution and for our understanding of quantitative traits including complex human diseases.

The paper is overall well written, and an important contribution to the understanding of the extent and influences of structural variation. I only have clarifying major remarks, and a list of minor suggestions.
-The copy number differences between near-clonal strains are striking and interesting.
--You refer to these copy number changes as "transient" (subtitle) and "evolutionarily unstable" (page 10, line 191). To me, both of these imply lasting a short time, but this has not been shown.
--Is there any way it could be due to non-chromosomal DNA content variation?
--Did you PCR validate any sequence-inferred indel sizes?
--It was great to read about the results in the context of genome content variation ascertained in humans (1000 genomes). It would also be relevant to compare S. pombe to S. cerevisiae and S. paradoxus (Bergstrom et al., 2014, MBE; especially with respect to the claims on subtelomeric gene content).
-There are gene expression changes between strains, some within duplicated regions, some outside. The Authors ascribe these to "probably reflect indirect and compensatory effects of the ... duplication". Compensatory is also an indirect effect, making the statement tautological, but, more importantly, why do you exclude the other SNPs in the strain as potential causes of the gene expression difference?
-It is not clear what went into the kinship matrices of the mixed model for each result.
--The Methods say "composite model". P29 L576 refers to SNPs, CNVs and rearrangements included in the kinship matrix. Then, on the next page, 22,000 indels are included as well. Where did these come from, how were they handled for heritability analyses, etc.?
--How were the SVs encoded: as a SNP for 0/1 absence/presence of the allele, or taking into account the amount of DNA that is changed as a proxy for the fraction of the genome that is different?
--Was a separate kinship matrix estimated for the SVs, and a variance component inferred for them?
--Overall, it would be useful to have an explicit model in the Methods section, from whose variance parameters the presented quantities are calculated. For example, what is the exact statistic on the y-axis on P42 (Fig. S8); what model does it come from, and what is in the kinship matrix?
-In the discussion, the findings are framed causally ("SVs make substantial contributions"), but only correlations, associations, and linkages are shown throughout, so I would urge the Authors to be precise in separating what they can show is causal from what they believe is causal.
Minor comments:
- Fig. 1a -perhaps give numbers in the pie chart as well?
- Fig. 1d -any reason to have the information of a linear genome arranged in a circle?
-Page (P) 7 line (L) 122 "Deletions and duplications and strongly biased". Also, no quantification of strength of bias in the text.
-P7 L125 -"preferentially occurred" -no quantification of preference in the text.
-P9 L174 -"significantly induced" -give amount, test statistic, p-value
-P9 L176 -"levels correlated with copy numbers" -give correlation
-P10 L178 -"no changes were evident" -give amount of change
-The information on PCR results on inversions is given in Methods; I would not have expected to see novel results there.
-P12 L219 "CNVs influence quantitative traits" -only linkage is shown, no causality.
-P19 L366 "measurable rate" -all such rates are measurable :)

Reviewer #2 (Remarks to the Author): The authors present an analysis of structural variants in their recently sequenced S. pombe natural isolates. They show structural variation, including what must be segregating structural variation, contributes to changes in gene expression, substantially to quantitative trait variation, and potentially to reproductive isolation. Overall the analyses seem robust and the manuscript is well written and easy to follow.
The study contributes to the establishment of S. pombe as a model of natural variation and quantitative traits. My one criticism would be that similar things have been shown previously in both human and S. cerevisiae quantitative trait analyses. The authors need to better distinguish their findings from what has previously been shown in these other more intensively studied species.
Reviewer #3 (Remarks to the Author): In this manuscript, Jeffares et al. analyze published short read data to detect structural variants (SVs) in the genomes of a set of S. pombe strains. They associate these SVs with phenotypic variation among the strains and provide estimates of heritability explained by SVs, as well as map individual cases of SV-trait associations. Perhaps the most interesting and surprising result is that nearly clonal groups of strains that are almost identical at the SNP level nevertheless segregate for several SVs. The role of SVs in complex trait variation is interesting, and this is an interesting contribution. The manuscript is clearly written, follows a clean logic, and therefore is easy to follow. I would like to see additional information on some of the results, mainly about the SVs that segregate within clonal populations. I also have suggestions for minor additions and clarifications.
Main comments:
1. The most surprising result is that the nearly clonal sets of strains do segregate SVs. I would like to see a little more information about these events in the main text. Specifically:
a) What are the allele frequencies within the clonal populations? Is it usually the case that only a single member of a clonal set carries the SV, or are they at higher frequency?
b) Within the clonal populations, do the SV alleles reflect relatedness based on SNPs? I realize this analysis may be underpowered because of the low number of SNPs, but maybe there are obvious agreements between SV status and SNP alleles, perhaps splitting each clonal group in half? Or do the SVs just occur randomly within each cluster?
c) Is there segregation of CNV copy number within each cluster of clonal strains? I.e., do all clonal strains that carry a given CNV have the same copy number, or is there variation in how many copies they carry?
2. p. 8 l. 147 "Furthermore, we observed instances of the same SVs that were present in two or more different clonal populations that were not fixed within any clonal population." This is a really interesting observation, and I would like to hear more about it in the Results and / or the Discussion. Currently there is only this one sentence for a topic that could easily support an entire paragraph:
a) Are these SVs shared perfectly (with the same breakpoints) or do they just overlap?
b) Are there any patterns in terms of which clonal populations share a given SV? For example, are the clonal populations that share a given SV more closely related to each other than clonal populations that do not share SVs?
c) Do you see any evidence that these SVs might have been moved around between populations by outcrossing?
d) If not, what is your explanation for why these SVs occur? Are they recurrent mutations at labile sites in the genome that are more prone to forming SVs?
e) If they are recurrent mutations, can you infer or speculate about the mutational mechanism? For example, are the SV break points close to repetitive elements or close gene paralogs that might frequently create errors in recombination?
f) For shared CNVs, is the copy number of the CNV the same or different in different clonal populations?
3. Do any of the SVs that are associated with a phenotype segregate within a clonal population? If yes, how much of trait variance does the SV explain within that clonal population? Because there is essentially no other genetic variation among the clones, the SV might completely determine genetic trait variation among the clones. It would be interesting to know if such cases exist.
4. Please provide a supplementary text or spreadsheet file that lists the genotypes (presence / absence and copy number where appropriate) for each SV in each strain. This would also help address some of my questions above on allele frequencies and SV sharing. Together with the phenotypes that are available from reference 8, this would allow readers to recapitulate the heritability and association analyses. I couldn't find SNP genotypes associated with reference 8 (although I checked only briefly). These would also need to be made available to ensure that readers can reproduce the analyses presented here. If they are available somewhere, a brief mention of their location would be useful in the present paper.

5. Have you done qPCR to confirm some of the CNVs, especially those that segregate among multiple clonal populations?
6. In the visual inspections for the SV calls, what types of artifacts or features did you look for? What were typical failure modes for putative SVs that you deemed incorrect? A brief description in the Methods would be useful to the community.
Minor comments:
7. Figure S3: It would be helpful to indicate the absolute coverage of the strains as well. This would help to get a better sense of the strength of the signal. For example, a two-fold coverage difference means more with a 100X coverage baseline than a 2X baseline. If different strains had different average genome coverage, how were the relative coverages in the plots calculated? Were they anchored to the flanking sequence somehow, or are they purely "coverage strain 1 / coverage strain 2"? I'm trying to understand why some of the green strains in the figure have less coverage than the reference. The normalization scheme would probably explain this.
8. p. 14 l. 262 "Our analysis of heritability showed that SNPs are generally able to capture most of the genetic contribution of SVs" seems to contradict the result on p. 13 l. 233 that "Analysis of simulated data confirmed that the contribution of CNVs could not be explained by linkage to causal SNPs alone". Please clarify.
9. p. 17 l. 311 "we found that rearrangements explained spore viability better than CNVs [...]" this implies that you tested rearrangements and CNVs directly against each other, perhaps as you did further down for SNPs and rearrangements. Please rephrase this to "while rearrangements correlated with spore viability, there was no significant correlation between CNVs and viability".
10. Figure 4: the legend has an incorrectly rounded p-value: SNPs | rearrangements = 0.03, whereas the figure gives p = 0.038, which is p = 0.04 after rounding. The correlation estimate is also slightly different between legend and figure.
11. p. 19 l. 366 Instead of a "measurable" rate, do you mean "considerable" or simply "high"? All mutation rates can be measured.
12. Figure S8: in the top left panel, in the leftmost bar, the open circle above the bar should probably be filled? If not, why is the "estimate -1sd" higher than the estimate?
13. Abstract: "genomics regions" should be "genomic regions"

Reviewer #4 (Remarks to the Author):
This paper focuses on the effects of structural variation on phenotypic differences and reproductive isolation in Sc. pombe. Although the work in this manuscript is performed well, I had some significant criticisms:
1. Not very much was done with the ample phenotype dataset to make specific connections between genetic variants and traits.
2. It is known that Sc. pombe isolates exhibit a substantial amount of structural variation. This paper improves upon our knowledge of the details of this structural variation, but at this juncture, these details seem to represent an incremental advance.
3. A large amount of work in Sa. cerevisiae has shown that structural variation can have important phenotypic and gene expression effects, and that some of these structural variations can be transient. I thought the attempt to determine the quantitative contribution of structural variation to phenotypic variation was of value, but the insights gained also seemed incremental.
7. It was surprising that more work from Saccharomyces was not cited. This was especially true in the section on reproductive isolation, where the work mentioned above, which arguably represents the gold standard for yeast papers on the topic, was not even recognized. Ultimately, many of the questions addressed in this paper have been extensively examined in Saccharomyces. Even though this is a different yeast genus, it is still important to cite and discuss the prior work in Saccharomyces and describe how this paper builds upon it.
In summary, the science and writing in this paper were solid. However, this paper had insufficient novelty and awareness of historical context to warrant publication in Nature Communications.

REVIEWERS' COMMENTS:
Reviewer #1 (Remarks to the Author): The Authors have thoroughly addressed my previous comments, and I have no further ones to make. I defer to other Reviewers in regard to novelty of the findings in S. pombe, as this is not my area of expertise.
Reviewer #2 (Remarks to the Author): I am satisfied with the response to my comments; the text changes now better distinguish this study from previous work in the other yeast and human populations.
Reviewer #3 (Remarks to the Author): Thanks to the authors for carefully addressing all my previous comments. I have just a few minor remaining comments:
1. p. 9 l. 160 refers to Figure 3c, which does not seem right. Should this be Figure 2c or some other Figure?
2. p. 9 l. 165: "strong correlation between the total mutation in these regions and the total variation in copy number of the CNV" is awkwardly phrased. "total mutation" sounds like it includes the CNV, which seems wrong. Please reword.

Reviewer #4 (Remarks to the Author): The revised version of this manuscript represents a significant improvement over the initial submission. The authors do a much better job now of connecting their work to previous papers from other groups, including labs that work on Saccharomyces cerevisiae. It is clearer how the components of the paper collectively build into a manuscript that could be of value to a number of different groups of researchers (e.g., people working on S. pombe, quantitative genetics, folks interested in structural variation).
Aside from one comment, I am satisfied with how the authors addressed my remarks and handled input from the other reviewers. However, there was a misinterpretation of my first point, perhaps because I could have been clearer: 'Not very much was done with the ample phenotype dataset to make specific connections between genetic variants and traits'. What was meant by this point is that the authors do not discuss how any specific variants influence any specific traits. In other words, no discussion of the molecular and systems mechanisms contributing to heritable phenotypic variation in this organism is provided. For example, on p15, the authors write: 'Thus, some groups of traits have consistently larger contributions from SVs than from SNPs alone. These traits include intracellular amino acid concentrations…' Can you make any connection to the mechanisms based on SNVs? This seems especially feasible for CNVs, which are often resolved to individual genes. There are other similar opportunities in the paper. I don't think these modifications are absolutely necessary, but they would certainly help make this paper more accessible to researchers who are not statistical geneticists.
More minor comments: Yeast species: Often in a context where both Saccharomyces cerevisiae (or related species) and Schizosaccharomyces pombe are being discussed, the former and latter will be referred to as Sa. cerevisiae and Sc. pombe, respectively, to prevent confusion.
P3, l68: The word 'progress' reads oddly to me. Maybe 'advance'?
P4, l71: 'Various aspects of biology' is a vague phrase.
P21, l398-402: In these sentences the authors mention experimental studies in budding yeast, but then an example from S. pombe is provided.

Reviewer #1
In this paper, the Authors perform a follow-up analysis of the extent and associations of large structural variation (SV) in an S. pombe population genomics resource that covers 161 strains. They provide a catalog of SVs, assess their segregation patterns, correlate them to gene expression changes, and quantify linkage to cellular traits. The paper is overall well written, and an important contribution to the understanding of the extent and influences of structural variation. I only have clarifying major remarks, and a list of minor suggestions.
1.2) The copy number differences between near-clonal strains are striking and interesting.

Response:
We agree and thank the reviewer for his/her endorsement.
1.2.1) You refer to these copy number changes as "transient" (subtitle) and "evolutionarily unstable" (page 10, line 191). To me, both of these imply lasting a short time, but this has not been shown.

Response:
We have performed additional, quantitative investigations of the transience of CNVs, and report these in a new section with new figures. In brief, we constructed local, SNP-based phylogenies for the region surrounding each CNV (20 kb upstream and downstream, merged) and found that strains identical in these regions could still have different copy numbers within clusters of near-clonal strains (Figure 2). This high similarity (as well as the near-identical sequence throughout the genome for clonal clusters) effectively rules out CNV gain/loss by recombination. We also produced neighbor-joining trees from CNV alleles, and showed that CNV-allele distance and local SNP-tree distance were only weakly correlated, consistent with their arising from different processes. We also extended the discussion, relating

Response: Yes, the Bergstrom paper (ref. 1) reports some findings that are similar to ours: that CNVs and rearrangements are more abundant in subtelomeric regions, that a CNV appears to be associated with a quantitative trait (arsenic resistance in this case), and that subtelomeric regions tend to have more loss-of-function variants (we observed a similar thing, along with an increase of retrotransposons, in our Nature Genetics paper, ref. 2).
They also note that the strains that have SV similarity tend to have high spore viability, consistent with our findings (but they do not show any correlation or statistics). We have highlighted these similarities in the manuscript.

1.3)
There are gene expression changes between strains, some within duplicated regions, some outside. The Authors ascribe these to "probably reflect indirect and compensatory effects of the ... duplication". Compensatory is also an indirect effect, making the statement tautological, but, more importantly, why do you exclude the other SNPs in the strain as potential causes of the gene expression difference?

Response:
"As environmental growth conditions were tightly controlled, these changes in gene expression could be due to either compensatory effects of the initial perturbation caused by the duplication or changes that arise due to SNPs or indels that segregate between the strains".
We additionally considered the model
Having estimated the four variance components, again using REML, the relative contributions of SNPs, CNVs and rearrangements are, respectively,

Minor comments: - Fig. 1a -perhaps give numbers in the pie chart as well?

Response:
We have done this.
- Fig. 1d -any reason to have the information of a linear genome arranged in a circle?

Response: We prefer not to place such detailed results in the main text. We provide a summary in the Methods, and detailed results in Supplementary Tables 14 and 15.
-P12 L219 "CNVs influence quantitative traits" -only linkage is shown, no causality.
-P19 L366 "measurable rate" -all such rates are measurable :)

Response:
We altered this sentence to: "at a rate of approximately one CNV/10 generations".

Reviewer #2
The authors present an analysis of structural variants in their recently sequenced S. pombe natural isolates. They show structural variation, including what must be segregating structural variation, contributes to changes in gene expression, substantially to quantitative trait variation, and potentially to reproductive isolation. Overall the analyses seem robust and the manuscript is well written and easy to follow.

2.1)
The study contributes to the establishment of S. pombe as a model of natural
Figure 5).", and we provide a supplementary figure.
(Figure 2b).
Additionally, the SNP-based phylogenies for these CNVs could not resolve closely related strains within clusters that had different copy numbers (Figure 2c).
See also response to 1.

Response: Before the calling and mapping, we randomly subsampled reads for each strain such that we had an average theoretical coverage of 40x per sample.
3.8) p. 14 l. 262 "Our analysis of heritability showed that SNPs are generally able to capture most of the genetic contribution of SVs" seems to contradict the result on p. 13 l. 233 that "Analysis of simulated data confirmed that the contribution of CNVs could not be explained by linkage to causal SNPs alone". Please clarify.
to "while rearrangements correlated with spore viability, there was no significant correlation between CNVs and viability".

Response:
We have corrected this sentence.
3.10) Figure 4: the legend has an incorrectly rounded p-value: SNPs | rearrangements = 0.03, whereas the figure gives p = 0.038, which is p = 0.04 after rounding. The correlation estimate is also slightly different between legend and figure.

Response:
We have corrected this, so that the figure and the text now have the same values.
3.11) p. 19 l. 366 Instead of a "measurable" rate, do you mean "considerable" or simply "high"? All mutation rates can be measured.

Response:
We altered this sentence to: "at a rate of approximately one CNV/10 generations".

3.12) Figure S8: in the top left panel, in the leftmost bar, the open circle above the bar should probably be filled? If not, why is the "estimate -1sd" higher than the estimate?

Response: Due to our reanalysis of heritability this figure has been replaced.

Reviewer #4
This paper focuses on the effects of structural variation on phenotypic differences and reproductive isolation in Sc. pombe. Although the work in this manuscript is performed well, I had some significant criticisms:

4.7)
It was surprising that more work from Saccharomyces was not cited. This was especially true in the section on reproductive isolation, where the work mentioned above, which arguably represents the gold standard for yeast papers on the topic, was not even recognized. Ultimately, many of the questions addressed in this paper have been extensively examined in Saccharomyces. Even though this is a different yeast genus, it is still important to cite and discuss the prior work in Saccharomyces and describe how this paper builds upon it. In summary, the science and writing in this paper were solid. However, this paper had insufficient novelty and awareness of historical context to warrant publication in Nature Communications.

Response:
We, and seemingly the other reviewers, disagree with this conclusion. As