Phylogenomic profiles of whole-genome duplications in Poaceae and landscape of differential duplicate retention and losses among major Poaceae lineages

Poaceae members shared a whole-genome duplication called rho. However, little is known about the evolutionary pattern of the rho-derived duplicates among Poaceae lineages and implications in adaptive evolution. Here we present phylogenomic/phylotranscriptomic analyses of 363 grasses covering all 12 subfamilies and report nine previously unknown whole-genome duplications. Furthermore, duplications from a single whole-genome duplication were mapped to multiple nodes on the species phylogeny; a whole-genome duplication was likely shared by woody bamboos with possible gene flow from herbaceous bamboos; and recent paralogues of a tetraploid Oryza are implicated in tolerance of seawater submergence. Moreover, rho duplicates showing differential retention among subfamilies include those with functions in environmental adaptations or morphogenesis, including ACOT for aquatic environments (Oryzoideae), CK2β for cold responses (Pooideae), SPIRAL1 for rapid cell elongation (Bambusoideae), and PAI1 for drought/cold responses (Panicoideae). This study presents a Poaceae whole-genome duplication profile with evidence for multiple evolutionary mechanisms that contribute to gene retention and losses.

Minor:
1. Line 234, "WGD dating can be affected by the ingroup species that have evolved more quickly (with greater mutation rates) than outgroup species". Both the acceleration or decline of mutation rate may affect the estimation of molecular dating.
2. Line 155, a typo of "…GDs with gene duplicates from in sequenced Poaceae genomes"
3. Line 582, a typo of "…pruned the gene trees to by removing putative paralogs"
Reviewer #2: Remarks to the Author:
Dear Editor, I have revised the manuscript titled Phylogenomic Profiles of Whole-Genome Duplications in Poaceae and Landscape of Differential Duplicate Retention and Losses among Major Poaceae Lineages by Taikui Zhang, Weichen Huang, Lin Zhang, De-Zhu Li, Ji Qi, and Hong Ma. Whole-genome duplications (WGDs) are important drivers of angiosperms evolution and diversification. Previous studies reported three WGDs in the ecologically and economically important grass (Poaceae) family; however, these studies included a limited number of species, suspecting that there are more WGDs that has not been discovered yet. In the present work, authors performed phylogenomic/phylotranscriptomic analyses of 363 grasses covering all 12 Poaceae subfamilies to detected strong evidence for new WGDs, and other Gene Duplication clusters (GDs) and explore their possible roles in adaptive evolution and species divergence.
The paper is well written and I enjoyed the combination of methods to address, with confident, new events of WGDs and calibrate the timing of such events. From my perspective this is a very interested paper that significantly contribute with the knowledge of genome duplications presenting a whole picture of several duplication events in grasses. Sampling and methods are very well chosen and documented. Authors did and excellent selection of figures that support the main text. The manuscript presents a detailed literature compilation.
I am not a native speaker; however, in my opinion, some minor languages issues need to be addressed. Specially, pay attention to large sentences.
Reviewer #3: Remarks to the Author:
The manuscript presents a significant contribution to the field of evolutionary biology, specifically in the realm of phylogenomics and whole-genome duplications (WGDs) in Poaceae. The findings are of considerable value to the scientific community, shedding light on lineage-specific WGDs, gene sequence evolution, and the implications of lineage-dependent duplicate retention and losses. The use of syntenic analyses to correlate gene duplicates from the same WGD event is a commendable approach. However, there are notable areas that require attention, including the need for subheaders to help clarify the paper's outline. The manuscript holds great promise and significance but requires revisions to address the outlined concerns. The incorporation of suggested improvements will contribute to the overall quality and clarity of the research.
General comments:
1. Adding subheaders would enhance the manuscript's organization, providing better guidance for readers. Additionally, the outline of the paper is not entirely clear. A more structured presentation of the research would improve the overall readability. A proposed revision involves incorporating dedicated sections that accentuates some of the newly described WGD events which are discussed at length. These sections could encompass detailed methodologies employed to confirm their placements, coupled with an exploration of intriguing gene retention patterns observed during these events.
2. The manuscript makes excessive use of acronyms, which may hinder comprehension for readers. It is especially difficult to keep track of the HB, WB, TWB, NWB, and PWB when discussing the ABCD genomes.
3. The manuscript should explicitly address how the current work builds on the authors previous paper on phylogeny, especially concerning the section on polyploidy. Providing this connection will enhance the contextualization of the research within the existing literature.
4. I really appreciate the authors detailed explanation of the PROSOL genes, but I was wondering if this could possibly be condensed to a table with a shorter summary in the main text. Currently, the paper is over 30 pages, and this could be a way to shorten it.

Specific comments:
Line 15 - the first sentence of the abstract is hard to follow, please consider breaking into multiple sentences.
Line 43 - I don't think the authors mean molecular dating placed the WGD events, but rather some other analyses did?
Line 88 - "Additionally, rice and other Oryza species collectively have 11 reported genome types (six diploids and five allotetraploids); furthermore, domestic and wild Oryza species that have adapted to different aquatic environments." - this sentence could be broken up for clarity.
Line 97 - Can the authors please clarify what they mean "successive species phylogenetic positions"
Line 129 - this paragraph ends with recently and the next paragraph starts with the same word.
Line 155 - Can the authors please clarify what they mean by "For those WGDs that are supported by GDs with gene duplicates from in sequenced Poaceae genomes, we further estimated the number of GDs with detected duplicates in syntenic blocks (collinear genomic regions)."
Line 216 - I am not sure that Ks analyses can be referred to as molecular dating. I may be incorrect, but they are all relative, not absolute, so perhaps avoiding calling them molecular dating would be more clear.
Line 968 - Specie should be species
Line 969 - "The rate was compared to that inferred from null simulation which assume no of WGD occurred."
Line 978 - I think a word is missing: "is likely similar to the Node-Ks approach in previous studies"
Figure 1 legend: Why is the name Poaceae in green if the branches are in Red? Same with Poales? Maybe also add an explanation for δ18O (‰).
Reviewer #1 (Remarks to the Author):
Zhang and his colleagues presented a very comprehensive profiling of the Whole-Genome Duplications (WGDs) events in Poaceae. They also reported the detailed WGD analysis in woody bamboo and Oryza, and finalized their work by investigating lineage specific differential retention of rho-derived duplicates and their potential functional consequences. Overall, this is a nice work that offers many novel evolutionary insights into Poaceae. One of my major concerns was that the manuscript was too long to effectively capture the key points. For example, the authors spend six pages, three sub-titles to report the major finding of the paleo-polyploidization (called kappa) event in woody bamboo, and most of the analysis in these sections were to test the topology of GDs in different lineages that supported this kappa polyploidization. I suggest the authors to re-organize these sections and simplify their writings to make this manuscript more readable.
Response: We thank the reviewer for the positive feedback. We have greatly reduced the length of descriptions in our main text, including the description about results in supplemental figures, and moved some of those descriptions to the legends of the supplemental figures to help readers to better understand our results. In particular, we have dramatically reduced the length of the part on kappa analyses.
I also have some further comments below:
1. Line 224-229, the authors applied Ks analysis to date the time of WGDs. They used the example in Ischaemum, with the Ks of WGD paralogs, orthologs between Ischaemum species and orthologs between Ischaemum and Eulaliopsis being 0.1144, 0.0599 and 0.1184, respectively. In my opinion, these data suggested that the WGD in Ischaemum lineage happened soon after its split with Eulaliopsis, so I cannot understand the author's conclusion in Line 255 that the duplications in a burst was from a single event?
Response: We are sorry for the confusion here. Our Ks comparisons agree with the reviewer's judgment in dating the WGD in the Ischaemum lineage. More specifically, a peak in the Ks plot of paralogs suggests a cluster of gene duplications at approximately the same time in the Ischaemum lineage. These gene duplications were likely derived from a single WGD event that occurred soon after the split of Ischaemum from Eulaliopsis.
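As an illustration only (not part of the analyses in the manuscript), the ordering of the three Ks peak values quoted in the reviewer's comment can be checked with a minimal Python sketch. The function names are hypothetical, and the relative-position measure assumes that Ks accumulates at a roughly similar rate in the lineages compared, which need not hold.

```python
# Illustrative sketch only; function names are hypothetical and not taken
# from the manuscript or from any published tool.

def paralog_peak_is_bracketed(ks_paralog, ks_ingroup_orthologs, ks_outgroup_orthologs):
    """True when the paralog Ks peak is higher than the ortholog peak between
    species sharing the WGD and lower than the ortholog peak to the outgroup."""
    return ks_ingroup_orthologs < ks_paralog < ks_outgroup_orthologs


def relative_position(ks_paralog, ks_ingroup_orthologs, ks_outgroup_orthologs):
    """Position of the paralog peak between the two ortholog peaks:
    0 = at the ingroup split, 1 = at the split from the outgroup.
    Assumes comparable substitution rates across lineages (a simplification)."""
    span = ks_outgroup_orthologs - ks_ingroup_orthologs
    return (ks_paralog - ks_ingroup_orthologs) / span


# Ks peak values quoted in the reviewer's comment for Ischaemum.
KS_PARALOGS = 0.1144            # paralogs from the proposed WGD
KS_ISCHAEMUM_ORTHOLOGS = 0.0599 # orthologs between the two Ischaemum species
KS_EULALIOPSIS_ORTHOLOGS = 0.1184  # orthologs between Ischaemum and Eulaliopsis

if __name__ == "__main__":
    print(paralog_peak_is_bracketed(
        KS_PARALOGS, KS_ISCHAEMUM_ORTHOLOGS, KS_EULALIOPSIS_ORTHOLOGS))
    print(round(relative_position(
        KS_PARALOGS, KS_ISCHAEMUM_ORTHOLOGS, KS_EULALIOPSIS_ORTHOLOGS), 3))
```

A relative position close to 1 corresponds to a duplication peak near the split from the outgroup, which matches the reading that the WGD occurred soon after the divergence from Eulaliopsis.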
To clarify the rationale of the Ks analyses, we have revised the first few sentences of this paragraph as: "The Ks among paralogs has been widely used as a correlate of relative time for the divergence of paralogs; when Ks values form a peak in a distribution, the corresponding GDs are considered to be in a cluster near a specific time and used as support for WGDs 1,2. For example, the OneKP study has used detection of Ks peaks among paralogs from separate analyses of sequences of 99 single species as support for 99 WGDs in plants 2. Thus Ks was analyzed for paralogs identified here (see methods) and Ks peaks shared by multiple species were observed, providing additional support for WGDs from the Tree2GD analyses (Figs. S17-S23; Table S3)." (Lines 173-178) We have also revised the sentence for our analyses in the manuscript as follows.
"In particular, the Ks peak of paralogs from a proposed WGD in a focal species is expected to have a higher value than that of orthologs between the focal species and its closely related species that also shares the WGD, and lower than that of orthologs between the focal species and an outgroup species, which diverged before the WGD event. For example, the newly proposed WGD for Ischaemum (WGD#13; Fig. S22; Table S3) is supported by the Ks peak value of 0.1144 for paralogs mapped at the MRCA of two Ischaemum species; this Ks value is higher than the Ks peak value (0.0599) of orthologs between the two Ischaemum species, but lower than the Ks peak value (0.1184) between I. aristatum and the outgroup Eulaliopsis binata."
2. The authors used the gene sequences from both genome assemblies and transcriptome assemblies to perform this phylogenomic study. As the authors have stated (Line 661), the completeness of gene annotation may affect the examination of TDs. Similarly, the completeness of transcript assemblies due to the limited RNA-seq data (Table S2) may also create bias during the identification of GDs. It's a good opportunity to evaluate the strength and weakness of using transcript assemblies to identity GDs, or at least make some discussions. By the way, I also strongly suggest the authors to divide the Results and Discussions into two different sections, and put some discussion points that currently mixed with results into the new Discussions section.
Response: We thank the reviewer for this valuable suggestion. We provide more details of our revision below.
For discussing the GD detection using genomes and transcriptomes, we added several sentences in the manuscript (Lines 655-665) as below.
"In addition, GD detection using genomes and transcriptomes can increase the number of GDs at early branches in species-tree, because transcriptomes from the early diverging species can help to mapping gene pairs to more ancient positions. For example, among 1633 GDs mapped at Poaceae using 15 genomes, 9 transcriptomes and 2 genome-skimming datasets (Fig. S26A), 1010 and 728 GDs were shared by genes from the Anomochlooideae species Streptochaeta angustifolia (genome sequenced) and S. spicata (transcriptome sequenced), respectively. Genome-skimming sequenced datasets of the Puelioideae species (Puelia ciliata and Guaduella oblongifolia) provide genes that shared 19~36 GDs of those 1633 GDs. Genomes tend to contribute to more duplicates (423~1018 GDs) than transcriptomes (313~728 GDs) and genome-skimming datasets with incomplete sequence and annotation. Sequenced genomes also allow comparison of gene orders of paralogs on chromosomes and hence provide strong GD evidence for WGD and SSD events. Integration of genomes and transcriptomes from basal lineages to core branches can provide GD clues for understanding gene and genome evolution."
Moreover, we divided the original section of Results and Discussions into two different sections.
3. Line 744, the authors carried out the analysis of rho-derived PROSOL genes, and reported the lineage specific PROSOL genes in Panicoideae, Pooideae, Bambusoideae, and Oryzoideae.These analyses were based on the genes that clustered into Orthogroups.Since lineage specific WGDs after rho were common in all these four subfamilies, my question was how to exclude the misidentification of rho-retained duplicates, which might actually come from the reciprocal loss of duplicate genes after lineage specific WGDs.For example, the FIE1 and FIE2 genes were reported to come from the reciprocal loss of one of their paralogs after maize WGD (Swigonova, et al., Genome Res, 2004.).
Response: We sincerely thank the reviewer for these comments to help us improve our analyses and exclude the misidentification due to lineage-specific duplication.We have performed new gene tree analyses and updated our results in main text and Figures 6 and 7. We also newly added supplemental Fig. S50-S53 and Table S5.
The two FIE genes in maize were reported to be two closely linked paralogous sequences (Swigonova, et al., Genome Res, 2004).To investigate the history of the FIE1/2 genes using multiple grass genomes, we performed phylogenetic and synteny analyses of the grass FIE genes (including both maize FIE1 and FIE2) and found that all 7 subfamilies retained the same rho-derived single copy, suggesting that their ancestor had lost the other rho-derived copy.We further found that Panicoideae (including maize and sorghum) and four other (Chloridoideae, Pooideae, Bambusoideae, and Oryzoideae) subfamilies underwent subfamily-specific duplications of FIE genes (Figure S50).The FIE1/2 gene family is a good example of the lineage-specific duplicates, which should not be mis-identified as rho-derived paralogs.Our newly performed gene tree analyses allowed the identification of lineage-specific duplication in 5,666 Poaceae gene families, which have been classified as having only one rho-derived copy; the gene family analyses also verified that 2,758 other orthogroups indeed retained both rho-derived paralogs.We have revised the context relevant to it in our new manuscript as shown below.
Lines 556-579: "We investigated differential retention/loss patterns of rho-derived gene pairs for Poaceae subgroups. First, we detected orthogroups supported by syntenic genes from 24 sequenced grass genomes, with pineapple as an outgroup species in Poales (Fig. S48A, S48B). To obtain further support for rho-derived duplicates, we integrated our phylogenomic results (Fig. S3) and the synteny results by identifying the synteny blocks that contains at least one gene pair belonging to an orthogroup with a GD mapped at the MRCA of Poaceae or one of the early nodes with multiple subfamilies (C1-C3 in Fig. 2). To illustrate this analysis, an example of synteny blocks is shown in Fig. 6A (see details in Fig. S49A), the #6 genes correspond to a GD mapped at Poaceae in the gene tree (Fig. S49B), supporting the gene pairs (pink) in the syntenic block being from rho. The gene trees of the orthogroups were reconciled with species-tree to estimate the retention and loss events after rho in different subfamilies (see methods and supplemental Note) and the results revealed that 6,147 orthogroups retained a single copy at Poaceae and 2,758 orthogroups retained in pair (Figs. 6B and S48B, Table S5). Among the 6,147 orthogroups, 5,666 experienced subsequent lineage-specific duplication in at least one subfamily (Type-I; Fig. 6C, Table S5); one such orthogroup contains the fertilization independent endosperm (FIE) genes 73 with a duplication in Panicoideae (e.g., maize FIE genes [Zm00001d049608_T001(FIE1) and Zm00001d024698_T001(FIE2)]; Fig. S50). Other instances of Type-I orthogroups include the TASSELSEED2 (TS2) 74, DWARF53 (D53) 75, COLD1 76, and NAC78 77 (see details of examples in Table 1). For the remaining 481 of the 6,147 orthogroups, no more than one copy was detected in grass species (Type-II; see orthogroups in Table S5 and a specific gene tree in Fig. S51). Among 2,758 orthogroups with two rho-derived copies (Fig. 6C, Table S5), 128 (Type III) have two copies in each of the four subfamilies (Fig. S52); whereas 2,630 (Type IV) have lost one or two detected copies in at least one subfamily (see gene examples in Table 1). Among the Type IV orthogroups, we identified four patterns (IV-1 through IV-4 in Fig. 6D). Specifically, 1991 have two or more subfamilies with one detected copy (IV-1 and IV-2), including 578 that exhibit reciprocal loss of paralogs in different subfamilies (IV-2; see an example in Fig. S53). Among the 2,758 orthogroups with two copies in at least one subfamily, 565 show possible reciprocal loss of rho-derived duplicates between species within an individual subfamily (Table S5). These detected patterns should be further tested by including more high-quality genomes from different subfamilies of Poaceae and other families of Poales."
And on lines 605-609: "For convenience, we refer to a PROSOL specific to each of Panicoideae, Pooideae, Bambusoideae, and Oryzoideae, respectively, as PROSOL-Pa, PROSOL-Po, PROSOL-Ba, PROSOL-Or. Our examination of the above-mentioned 2,630 orthogroups in Type-IV uncovered 19 PROSOL-Pas, 8 PROSOL-Pos, 36 PROSOL-Bas, and 18 PROSOL-Ors, possibly representing subfamily-specific (or lineage-specific) retention (Fig. 7A; see representative genes in Table 2)."
Minor:
1. Line 234, "WGD dating can be affected by the ingroup species that have evolved more quickly (with greater mutation rates) than outgroup species". Both the acceleration or decline of mutation rate may affect the estimation of molecular dating.
Response: Thanks for the suggestion. We have revised the relevant text in our new manuscript as below. "To further estimate the difference for rho, we surveyed the evolutionary rate (estimated by branch length) between species and the Ks value of retained paralogs from each species. Our results indicate that Ks values are positively correlated with the total branch length from the Poaceae MRCA to tips (Coefficient: 0.89, p-value = 1.21e-08) (Fig. S24). Hence, WGD dating can be affected by the different evolutionary rates of species, including the accelerated (e.g., Panicoideae species) or reduced (e.g., Bambusoideae species) mutation rates. Thus a higher Ks peak value in a rapidly evolving lineage after a WGD compared to the Ks peak value an outgroup that diverged before the WGD could incorrectly place a WGD at an earlier node."
2. Line 155, a typo of "…GDs with gene duplicates from in sequenced Poaceae genomes"
Response: We thank the reviewer for pointing out the typo. We have revised this sentence as "). For the WGDs supported by gene duplicates from sequenced Poaceae genomes, we further estimated the number of GDs with detected duplicates in syntenic blocks." (Lines 146-147).
3. Line 582, a typo of "…pruned the gene trees to by removing putative paralogs"
Response: Thanks for the suggestion. We have revised this sentence as "…pruned the gene trees by removing putative paralogs from duplications before the Oryza diversification (see methods)." (Lines 447-448).
Reviewer #2 (Remarks to the Author):
Dear Editor, I have revised the manuscript titled Phylogenomic Profiles of Whole-Genome Duplications in Poaceae and Landscape of Differential Duplicate Retention and Losses among Major Poaceae Lineages by Taikui Zhang, Weichen Huang, Lin Zhang, De-Zhu Li, Ji Qi, and Hong Ma. Whole-genome duplications (WGDs) are important drivers of angiosperms evolution and diversification. Previous studies reported three WGDs in the ecologically and economically important grass (Poaceae) family; however, these studies included a limited number of species, suspecting that there are more WGDs that has not been discovered yet. In the present work, authors performed phylogenomic/phylotranscriptomic analyses of 363 grasses covering all 12 Poaceae subfamilies to detected strong evidence for new WGDs, and other Gene Duplication clusters (GDs) and explore their possible roles in adaptive evolution and species divergence.
The paper is well written and I enjoyed the combination of methods to address, with confident, new events of WGDs and calibrate the timing of such events. From my perspective this is a very interested paper that significantly contribute with the knowledge of genome duplications presenting a whole picture of several duplication events in grasses. Sampling and methods are very well chosen and documented. Authors did and excellent selection of figures that support the main text. The manuscript presents a detailed literature compilation.
I am not a native speaker; however, in my opinion, some minor languages issues need to be addressed. Specially, pay attention to large sentences.
Response: We greatly appreciate this positive comment. We have converted some long sentences to shorter ones and made other language improvements in the revised manuscript.
Line 538 - Triplett et al. 2014 is not formatted correctly.
Line 694 - I don't think ROS has been defined.
Line 826 - BOP and PACMAD clades have not been defined.
Line 963 - How does the Asterids project connect with this work?