Genomic insights into rapid speciation within the world’s largest tree genus

Supplementary Note 1. Description of the Smithsonian Institution (SI)-ForestGEO long-term forest dynamics plots. A fundamental aim of forest ecology is to study the structure and dynamics of tree communities in situ, together with the functions and services of forest ecosystems. To pursue this long-term ecological research agenda, SI-ForestGEO has established a worldwide network of permanent forest dynamics plots ranging in size from 2 ha to 120 ha (ref. 1). Trees ≥ 1 cm in diameter at breast height in the permanent plots have been tagged and identified, and the plots are re-censused every five years to document species gained or lost. In this study, we sampled as many identified and undetermined Syzygium spp. as possible from the two long-term ecological plots listed below.

The Danum Valley Conservation Area is located in the interior of the Malaysian state of Sabah, on its east coast; it covers an area of 438 km² encompassing undisturbed lowland dipterocarp forest. A total of 24 Syzygium spp. have been recorded from the 50-ha ForestGEO plot, although some taxa require further attention because the flowering and fruiting material needed for species identification is lacking.

Supplementary Note 2.
States for three morphological characters were gathered from living material, herbarium specimens, published flora accounts, and species protologues: (i) inflorescence habit (erect vs. pendent); (ii) corolla condition (fused corolla shed as a true calyptra or a pseudocalyptra vs. corolla free at anthesis); and (iii) mature fruit colour (green, white or cream, black, pink, purple, red, brown, orange, yellow, blue, or grey).
(i) Inflorescence habit in Syzygium falls into two general groups, defined by the orientation in which the inflorescences are presented on the branchlets. Taxa in the first group have erect inflorescences, presented in an upright position, possibly influenced by pollination syndromes. Taxa in the second group have pendulous inflorescences.
(ii) Perianth traits in Syzygium fall into two main categories, those of the calyx and those of the corolla. In general, calyx lobes are free (Supplementary Figure S18; B1, B2, C1 and C2), but they can also be fused in bud and eventually split into equal portions along a suture as the stamens expand (Supplementary Figure S18; D1 and D2: S. fibrosum). Calyx lobes have also been recorded as fused into a true calyptra that tears irregularly as the stamens expand (Supplementary Figure S18; A1 and A2: S. paradoxum). As for the corolla, petals can be free, spreading, and persistent, as in S. grande (Supplementary Figure S18; C1 and C2). However, the corolla can also form a pseudocalyptra, in which the petals are tightly folded above the stamens to form a cap that tears along its attachment at the base, so that the cohering petals are shed like a calyptra at anthesis, as seen in S. cumini.
(iii) Mature fruit colour in Syzygium is extremely diverse, spanning a broad spectrum from very bright hues to dark colours that maximise visual detection by specific dispersers (Figs. 4D-E). It has been shown that bird-dispersed fruits tend to fall in the red part of the spectrum, whereas mammal-dispersed fruits tend to fall in the green part.

B. Supplementary Figures
Supplementary Figure S1. Resequenced MaSuRCA assemblies, coloured by BUSCO completeness; left y-axis (histogram) is assembly size, right y-axis (black line) is contig N50.
Supplementary Figure S2. A SynMap syntenic dotplot (left) for Syzygium grande with coloration by Ks (histogram) reveals internal paralogy suggesting one ancestral WGD in Syzygium following the gamma paleohexaploidy event. The analysis can be regenerated at https://genomevolution.org/r/1gh12.
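For readers unfamiliar with the contig N50 statistic plotted in Supplementary Figure S1, the following minimal Python sketch shows how it is conventionally computed from a list of contig lengths. The function name and the example lengths are hypothetical and chosen for illustration only; they are not taken from the assemblies in this study.

```python
def contig_n50(lengths):
    """Return the N50 of an assembly.

    The N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustration only, not study data).
example_lengths = [100, 80, 60, 40, 20]
assembly_size = sum(example_lengths)
n50 = contig_n50(example_lengths)
print(f"assembly size = {assembly_size} bp, contig N50 = {n50} bp")
```

Assembly size and N50 are complementary: the first measures how much sequence was recovered, the second how contiguous that sequence is.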
Supplementary Figure S3. A SynMap syntenic dotplot (left) for Syzygium grande:Vitis vinifera with coloration by Ks (histogram) reveals a 2:1 relationship, indicating one WGD in Syzygium following its species split with Vitis, which otherwise only contains the ancient gamma hexaploidy event. The medium-blue blocks are 2:1 syntenic orthologs, and the more dispersed and fractionated cyan blocks are syntenic paralogs dating from the ancient gamma event. The orange peak represents irrational Ks values from poor CDS alignments. The analysis can be regenerated at https://genomevolution.org/r/1i4rm.
Supplementary Figure S4. A SynMap syntenic dotplot (left) of Syzygium grande against Eucalyptus grandis with coloration by Ks (histogram). The violet blocks are 1:1 syntenic orthologs, and the more dispersed and fractionated cyan blocks are syntenic paralogs. The orange peak represents irrational Ks values from poor CDS alignments. The analysis can be regenerated at https://genomevolution.org/r/1i4rm.
Supplementary Figure S5. SynMap syntenic dotplot (left) of Syzygium grande against Punica granatum with coloration by Ks. The pink blocks are 1:1 syntenic orthologs, and the more dispersed and fractionated cyan blocks are syntenic paralogs. The orange peak represents irrational Ks values from poor CDS alignments. The analysis can be regenerated at https://genomevolution.org/r/1hxo0.
Supplementary Figure S6. FractBias mapping of the Populus trichocarpa genome against Syzygium grande shows a 2:2 relationship between the species, confirming independent polyploidy events in the two lineages. This analysis can be regenerated at https://genomevolution.org/r/1ig9q.
Supplementary Figure S7 (separate file). Tanglegram comparing the BUSCO- and SNP-based phylogenetic trees. The tanglegram was drawn with the tanglegram() function of the R package dendextend (version 3.5.1; ref. 3). Branches that contribute to unique subtrees are marked with black dotted lines; incongruent relationships are shown with red lines. The BUSCO species tree and genome-wide SNP tree are both well-resolved, with robust support throughout. Five major clades, Syzygium subgenus Acmena, S. subgenus Perikion, S. subgenus Sequestratum, S. subgenus Syzygium, and a yet-to-be-named clade (S. cf. attenuatum-rugosum-SULAWESI2 clade), are consistently present in both trees. Minor discordances between the two phylogenies are present, but these do not affect the positions of the five clades recognised in this study.
Supplementary Figure S8 (separate file). Tanglegram comparing the plastome- and BUSCO-based phylogenetic trees. The tanglegram was drawn with the tanglegram() function of the R package dendextend (version 3.5.1; ref. 3). Branches that contribute to unique subtrees are marked with black dotted lines; incongruent relationships are shown with red lines. These results show that the Syzygium phylogeny inferred from plastome data is well-resolved when compared to a previous study (ref. 4), although some internal and external branches distributed throughout the tree still have low bootstrap support. Five major clades are recognised on our plastome tree, namely Syzygium subgenus Acmena, S. subgenus Perikion, S. subgenus Sequestratum, S. subgenus Syzygium, and a yet-to-be-named clade (S. cf. attenuatum-rugosum-SULAWESI2 clade). The placement of Syzygium wesa is poorly supported on the plastome tree, although it is robustly nested in the S. subgenus Acmena clade in both the BUSCO gene and genome-wide SNP trees. The most significant finding is that relationships within Syzygium subgenus Syzygium, the largest of the five clades recognised, are well-resolved. One discordant placement is that of Syzygium jambos: in the plastome tree, S. jambos is embedded in a clade otherwise composed of all Syzygium buxifolium taxa, whereas in the BUSCO gene and genome-wide SNP trees, S. jambos is sister to S. filiforme in a small clade that comprises 11 other Syzygium individuals. One possible explanation for the incongruent placements of S. jambos and S. wesa on the plastome tree relative to the nuclear trees is chloroplast capture through ancient hybridisation events, or possibly even deep incomplete lineage sorting (ILS).
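The idea behind marking branches that lead to subtrees found in only one of the two trees, as in the tanglegrams of Supplementary Figures S7 and S8, can be illustrated with a small Python sketch. This is not the dendextend implementation; it is a simplified stand-in that assumes rooted trees written as nested tuples, and the tip labels and topologies are hypothetical.

```python
def clades(tree):
    """Collect every clade of a rooted nested-tuple tree as a frozenset of tip labels."""
    found = set()

    def tips(node):
        # A string is a tip; a tuple is an internal node holding child subtrees.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def unique_clades(tree_a, tree_b):
    """Return the clades present only in tree_a and those present only in tree_b."""
    a, b = clades(tree_a), clades(tree_b)
    return a - b, b - a


# Hypothetical four-tip topologies (illustration only, not study data).
nuclear_tree = ((("A", "B"), "C"), "D")
plastome_tree = ((("A", "C"), "B"), "D")

only_nuclear, only_plastome = unique_clades(nuclear_tree, plastome_tree)
print(sorted(sorted(c) for c in only_nuclear))
print(sorted(sorted(c) for c in only_plastome))
```

Clades shared by both trees (here the three-tip and four-tip groupings) drop out of the comparison, leaving only the conflicting groupings that a tanglegram would highlight.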
Supplementary Figure S10. Time-calibrated BUSCO species tree for the 292 resequenced Myrtaceae accessions with ADMIXTURE results for K=14.
Supplementary Figure S11. ADMIXTURE cross-validation scores for dataset FRSA1 indicate that K=14 is the best-supported number of clusters.
Supplementary Figure S14 (separate file). PCA analysis of alternatively-filtered SNP datasets.
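The selection criterion behind Supplementary Figure S11, choosing the K with the lowest cross-validation error, can be sketched in Python as follows. The sketch assumes that the ADMIXTURE logs from runs with the --cv option have been concatenated into one text and contain lines of the form "CV error (K=3): 0.55". The function name and the error values are hypothetical and are not the scores obtained for dataset FRSA1.

```python
import re

# Pattern for the cross-validation summary line that ADMIXTURE prints with --cv.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9]+\.?[0-9]*)")


def best_k(log_text):
    """Return (K with the lowest CV error, dict mapping each K to its CV error)."""
    scores = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not scores:
        raise ValueError("no CV error lines found")
    return min(scores, key=scores.get), scores


# Hypothetical log excerpt (illustration only, not study data).
example_log = """\
CV error (K=2): 0.610
CV error (K=3): 0.550
CV error (K=4): 0.570
"""

k, scores = best_k(example_log)
print(f"lowest CV error at K={k}: {scores[k]}")
```

In practice the full curve of CV error against K is usually inspected as well, since neighbouring values of K can have very similar scores.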
Supplementary Figure S15 (separate file). PCA analysis of the Syzygium grande group.
Supplementary Figure S16 (separate file). Biogeographic reconstruction for the genus Syzygium based on RASP software.
Supplementary Figure S17 (separate file). Biogeographic reconstruction for the genus Syzygium based on BioGeoBEARS software.
Supplementary Figure S18. Perianth traits used for morphological character-state optimisations with Mesquite; see Supplementary Note 2 for details.