Genomic and epigenomic variation in Psidium species and their outcome on the yield and composition of essential oils

Diploid and polyploid species derived from the euploid series x = 11 occur in the genus Psidium, as do intraspecific cytotypes. Euploidy in the genus can alter gene copy number, resulting in several “omics” variations. We revisited euploidy in Psidium, reported genomic (nuclear 2C value, GC%, and copy number of secondary metabolism genes) and epigenomic (5-mC%) differences, and related them to essential oil yield and composition. Individual 2C values ranged from 0.90 pg (P. guajava) to 7.40 pg (P. gaudichaudianum). The 2C value varied intraspecifically in P. cattleyanum and P. gaudichaudianum, evidencing cytotypes that can be formed from euploid (unreduced) and/or aneuploid reproductive cells. GC% ranged from 34.33% (P. guineense) to 48.95% (P. myrtoides), and intraspecific variation occurred even in species without intraspecific 2C value variation. Essential oil yield increased with 2C value and GC%. We showed that P. guajava (diploid) possesses two and P. guineense (tetraploid) four copies of one specific TPS gene, as well as eight and sixteen copies, respectively, of the conserved regions shared by eight TPS genes. We provide a broad “omics” characterization of Psidium and show the consequences of genome and epigenome variation for secondary metabolism.

Mean nuclear 2C values confirmed interspecific variation in nuclear genome size among the Psidium species (Table 2). The lowest mean 2C value (0.93 pg, P. macahense) and the highest (4.99 pg, P. gaudichaudianum) differ by 4.06 pg of nuclear DNA. Individual nuclear 2C values show an even wider interspecific range, a difference of 6.50 pg, from 0.90 pg in one accession of P. guajava to 7.40 pg in one accession of P. gaudichaudianum.
In addition to interspecific variation, the individual values point to intraspecific variation in the nuclear 2C value, including among relatives. Accessions of P. guajava (2C difference among individuals = 0.13 pg), P. guineense (0.20 pg) and P. acidum (0.08 pg) showed the least variation. These differences are smaller than the 1Cx values of P. guajava (0.475 pg) and P. oblongatum (0.490 pg), determined from the basic chromosome number of Psidium (x = 11) and the ploidy level of these species (2n = 2x = 22 chromosomes, diploid 8 ). Because a difference smaller than one monoploid genome (1Cx) cannot reflect the gain or loss of a whole chromosome set, the intraspecific variation among accessions of P. guajava, P. guineense and P. acidum is probably a technical consequence of secondary metabolites that interfere with the intercalation of propidium iodide into DNA during staining of the nuclear suspensions for flow cytometry.
We also observed intraspecific variation in the nuclear 2C value among individuals, including relatives, of P. myrtoides, P. cattleyanum and P. gaudichaudianum. The differences in nuclear 2C value were 0.40 pg among individuals of P. myrtoides, 5.03 pg among those of P. cattleyanum and 2.76 pg among those of P. gaudichaudianum. For these species, the differences are close to or greater than the reference 1Cx value (0.475 pg for P. guajava to 0.490 pg for P. oblongatum) for the basic chromosome number of Psidium (x = 11). These values therefore indicate that individuals of these species differ in 2n chromosome number, possibly as a result of numerical chromosomal changes (euploidy and/or aneuploidy). Based on the nuclear 2C values obtained here (Table 2) and the data reported for diploid and polyploid Psidium species (Table 1, http://ccdb.tau.ac.il/search/, https://cvalues.science.kew.org/search/angiosperm), we suggest that accessions of P. acidum, P. rufum, P. friedrichsthalianum and P. gaudichaudianum are potential polyploids, with more than 2n = 2x = 22 chromosomes. However, chromosome counts are needed to confirm their ploidy level.
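The comparison above can be made explicit by expressing each intraspecific 2C difference in units of one monoploid genome (1Cx = 2C divided by the ploidy level). The sketch below is a minimal illustration of that reasoning, not the authors' analysis: it assumes the 1Cx of diploid P. guajava (2C = 0.95 pg) as the reference and uses the 2C differences reported in the text; the function names are hypothetical.

```python
def monoploid_size(two_c_pg: float, ploidy: int) -> float:
    """1Cx: DNA content (pg) of one basic chromosome set (x = 11 in Psidium)."""
    return two_c_pg / ploidy


def fraction_of_monoploid(delta_2c_pg: float, one_cx_pg: float) -> float:
    """Express an intraspecific 2C difference in units of one basic set.

    Values well below 1 cannot reflect the gain or loss of a whole chromosome
    set; values near or above 1 are compatible with numerical chromosome change.
    """
    return delta_2c_pg / one_cx_pg


# Reference 1Cx from diploid P. guajava (2n = 2x = 22, 2C = 0.95 pg).
ONE_CX_GUAJAVA = monoploid_size(0.95, 2)

# Intraspecific 2C differences (pg) as reported in the Results.
DELTA_2C = {
    "P. acidum": 0.08,
    "P. guajava": 0.13,
    "P. guineense": 0.20,
    "P. myrtoides": 0.40,
    "P. gaudichaudianum": 2.76,
    "P. cattleyanum": 5.03,
}

if __name__ == "__main__":
    for species, delta in DELTA_2C.items():
        ratio = fraction_of_monoploid(delta, ONE_CX_GUAJAVA)
        print(f"{species}: {delta:.2f} pg = {ratio:.2f} x 1Cx")
```

With this reference, P. myrtoides comes out at about 0.84 × 1Cx, which the text treats as close to one basic set, while P. cattleyanum exceeds ten basic sets.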
Six groups were identified on the basis of individual nuclear 2C values (Table 3). Group I comprises the individuals with the smallest nuclear 2C values (0.90 to 1.10 pg), which include the diploid species (Table 1), as well as the only individual from group V of P. cattleyanum. Group VI includes the three individuals with the highest 2C values (6.83 to 7.40 pg), two of P. cattleyanum and one of P. gaudichaudianum. The accessions in group VI probably have a ploidy level higher than octoploid (2n = 8x = 88 chromosomes).
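Whether group VI exceeds octoploidy can be checked by dividing a 2C value by the monoploid genome size. The sketch below assumes that the range of 0.465 to 0.640 pg per basic chromosome set reported for Psidium in the Discussion applies to these accessions; because a larger 1Cx implies fewer chromosome sets, dividing by the largest value gives a conservative lower bound. The helper name is hypothetical.

```python
# Assumed range of DNA content per basic set (x = 11) in Psidium, diploid to octoploid.
ONE_CX_RANGE_PG = (0.465, 0.640)


def ploidy_bounds(two_c_pg, one_cx_range=ONE_CX_RANGE_PG):
    """Return (lower, upper) estimates of the number of basic chromosome sets."""
    smallest_cx, largest_cx = one_cx_range
    # Dividing by the largest 1Cx gives the fewest possible chromosome sets.
    return two_c_pg / largest_cx, two_c_pg / smallest_cx


if __name__ == "__main__":
    # Lowest and highest individual 2C values of group VI.
    for two_c in (6.83, 7.40):
        low, high = ploidy_bounds(two_c)
        print(f"2C = {two_c:.2f} pg: about {low:.1f} to {high:.1f} basic sets")
```

Even the lower bound for 2C = 6.83 pg (about 10.7 basic sets) is above eight, consistent with the suggestion that group VI exceeds octoploidy; only chromosome counts can confirm it.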
Psidium cattleyanum showed the highest intraspecific variation in nuclear 2C value, with individuals in five of the six groups. In contrast, less intraspecific variation was found for the diploid P. guajava (all individuals in group I), the tetraploids P. guineense and P. acidum (group II) and the hexaploid P. myrtoides (group III, Table 3).
Seven groups were obtained from the comparative GC% analysis. Psidium cattleyanum, P. guajava, P. guineense, P. myrtoides, P. gaudichaudianum, the P. guajava × P. guineense hybrids and Psidium sp. had individuals in at least two groups, evidencing intraspecific GC% variation. Groups III to VI included both diploid and polyploid species. Only one family of P. cattleyanum was stable, with all its individuals allocated to group VII (Table 3).
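The groups in Table 3 come from the Tocher optimization method run in the Genes software (Material and methods). The sketch below is a simplified, hypothetical re-implementation for one variable at a time, not the Genes code. It assumes Euclidean (absolute) distances, a threshold θ equal to the largest nearest-neighbour distance, and the rule that a candidate joins the open group when its mean distance to the current members does not exceed θ. How Genes handles ties and leftover individuals may differ; here, once no remaining pair is within θ, the rest stay as single-individual groups.

```python
from itertools import combinations


def tocher(values):
    """Group 1-D measurements (e.g. 2C values) with a simplified Tocher method.

    Returns a list of groups, each a sorted list of indices into `values`.
    Needs at least two values.
    """
    n = len(values)
    # Euclidean distance between two scalars is their absolute difference.
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    # theta: the largest of the nearest-neighbour distances.
    theta = max(min(d[i][j] for j in range(n) if j != i) for i in range(n))

    def mean_dist(k, group):
        return sum(d[k][m] for m in group) / len(group)

    remaining = set(range(n))
    groups = []
    while len(remaining) > 1:
        # Seed a new group with the closest pair still unassigned.
        i, j = min(combinations(sorted(remaining), 2),
                   key=lambda pair: d[pair[0]][pair[1]])
        if d[i][j] > theta:
            break  # assumption: no pair is close enough, leftovers stay single
        group = [i, j]
        remaining -= {i, j}
        while remaining:
            # Candidate with the smallest mean distance to the current members.
            k = min(sorted(remaining), key=lambda c: mean_dist(c, group))
            if mean_dist(k, group) > theta:
                break
            group.append(k)
            remaining.remove(k)
        groups.append(sorted(group))
    groups.extend([k] for k in sorted(remaining))
    return groups


if __name__ == "__main__":
    # Hypothetical 2C values (pg), not data from this study.
    print(tocher([1.0, 2.0, 3.0, 10.0, 12.0]))  # [[0, 1, 2], [3, 4]]
```

Because θ is the largest nearest-neighbour distance, a single distant individual can inflate θ and merge groups; this is one reason the variables were clustered separately rather than pooled.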

5-mC%, yield and chemical composition of the essential oil. We compiled in Supplementary Table S3 the unpublished and published values of 5-mC%, essential oil yield and the percentage of each identified essential oil compound for each Psidium individual. 5-mC% ranged from 16.34% (P. guajava) to 33.30% (P. myrtoides), and essential oil yield from 0.20% (P. guajava) to 0.95% (P. cattleyanum). A total of 56 compounds were identified in the Psidium species, 55 of which are classified as hydrocarbon or oxygenated mono- and sesquiterpenes.
TPS gene copy number. The copy number of TPS genes was determined in diploid P. guajava (2n = 2x = 22 chromosomes, 2C = 0.95 pg) and tetraploid P. guineense (2n = 4x = 44 chromosomes, 2C = 1.90 pg). We expected the copy number to be exactly double in the tetraploid P. guineense. The number of hybridization signals differed between the two species according to their ploidy. For the specific region, we detected two hybridization signals in P. guajava nuclei and four in P. guineense nuclei (Fig. 1).
With the general primer pair designed on the conserved motifs, P. guajava nuclei showed four strong and four weak signals, whereas P. guineense nuclei showed eight strong and eight weak signals (Fig. 1). Weak signals were interpreted as DNA sequences with partial homology to the probe, that is, regions of common origin that have accumulated sequence differences. Weak signals may also result from a single gene copy. In contrast, strong signals correspond to tandemly repeated gene copies that form clusters and thereby amplify the fluorescence signal.
Therefore, we confirmed that the copy number of the TPS genes was directly related to the ploidy level of the species, with twice as many signals in the tetraploid P. guineense. This result also reflects the genome evolution of these Psidium species.
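The doubling argument can be written as a simple check: if the TPS loci scale with the number of basic chromosome sets, the signal count divided by the ploidy level should be the same in both species. A minimal sketch using the signal counts reported above, with weak and strong signals of the general probe summed; the data structure and function name are hypothetical.

```python
# (ploidy level, signals with the specific probe, strong + weak signals with the general probe)
FISH_SIGNALS = {
    "P. guajava": (2, 2, 4 + 4),
    "P. guineense": (4, 4, 8 + 8),
}


def signals_per_basic_set(counts):
    """Divide each signal count by the ploidy level (number of x = 11 sets)."""
    ploidy, specific, general = counts
    return specific / ploidy, general / ploidy


if __name__ == "__main__":
    for species, counts in FISH_SIGNALS.items():
        spec, gen = signals_per_basic_set(counts)
        print(f"{species}: specific {spec:g}, general {gen:g} per basic set")
```

Both species give one specific and four general signals per basic set, the proportionality expected if copy number tracks ploidy.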

Discussion
We report the nuclear 2C value and GC%, TPS gene copy number, 5-mC%, and essential oil yield and composition for a considerable number of Psidium species and individuals. Except for P. guajava 12 , the GC% values are reported here for the first time for Psidium, as are the nuclear 2C values of P. acidum, P. gaudichaudianum, P. friedrichsthalianum, P. macahense and the interspecific hybrids (P. guajava L. × P. guineense). 5-mC% values are also reported for the first time for P. cattleyanum, P. guineense, P. myrtoides, P. gaudichaudianum, P. friedrichsthalianum, P. oblongatum and one accession of Psidium sp. In addition, determining the TPS gene copy number revealed a further genomic consequence of polyploidy beyond the interspecific variation in nuclear 2C value and GC%. The results advance knowledge of the structure, organization and evolution of the genome and epigenome of Psidium species in inter- and intraspecific contexts, including analyses of related individuals. The genomic and epigenomic data were placed in the context of the yield and compound diversity of the leaf essential oils, which are rich in mono- and sesquiterpenes of ecological and economic importance for the Myrtaceae family.
Based on a previous study by our research group and the basic chromosome number of Psidium (x = 11), the 1Cx value varies from 0.465 pg (P. cauliflorum, a diploid species) to 0.640 pg (P. longipetiolatum, an octoploid species), and increasing ploidy results in increasing nuclear DNA content 8 (Table 1). Thus, variation in 2n chromosome number can also be inferred from variation in the nuclear 2C value. Furthermore, given the amplitude of their nuclear 2C value variation, we infer that euploidy occurs in the genus not only in P. cattleyanum but also in P. gaudichaudianum and P. friedrichsthalianum. The oosphere and the reproductive nucleus of the pollen grain are usually reduced (n, haploid) reproductive cells. However, unreduced and/or aneuploid reproductive cells can form through errors in meiotic anaphase I or II, such as chromosome non-disjunction 32 , and/or through failure of cytokinesis I or II. Thus, reproductive cells with different euploidy and/or aneuploidy can be generated. Individuals of P. cattleyanum and P. gaudichaudianum, including relatives, showed marked variation in nuclear 2C value (possibly reflecting 2n chromosome number) that may have resulted from the unilateral or bilateral fusion of reduced and unreduced reproductive cells. Unreduced reproductive cells may come from one (unilateral) or both (bilateral) parents in cross- or self-fertilization 33 . In addition, numerical chromosomal variation (euploidy and aneuploidy) can occur in meristematic cells, resulting in mixoploid tissues and/or individuals. Thus, the male and/or female reproductive organs of the flowers can have meiocytes with chromosome numbers different from those of the sporophyte. In this context, we highlight the occurrence of a mixoploid individual (Hib_11), not previously reported in Psidium.
Mixoploidy can compromise the stability and fertility of plants in the field, which makes mixoploid plants undesirable for breeding purposes 34 .
In general, polyploid species of Psidium have a wider geographical distribution than diploids 8 , except for P. guajava, which is widely exploited and cultivated and therefore occurs in a wide variety of regions and biomes 35 . Our research group previously reported this pattern for P. guajava, P. guineense, P. myrtoides, P. cattleyanum, P. longipetiolatum, P. oblongatum and P. cauliflorum 8 . In addition, the geographical distribution of P. cattleyanum cytotypes is influenced by ploidy level: cytotypes with higher ploidy levels were found in regions with harsher environmental conditions, namely higher temperatures, higher solar radiation and lower precipitation 9,15,36 . The polyploid condition of the species studied here may therefore favor the expansion of their geographic distribution, through both natural and human action. Hence, the exploitation and use of these natural resources are relevant for breeding programs and for family farming.
We verified inter- and intraspecific variation in GC% for both diploid and polyploid species. Despite this variation, the overall mean GC% in Psidium (38.92%) is close to the mean of 38.06% reported for 22 diploid Eucalyptus species and three Corymbia species, also members of the Myrtaceae family 37 .
Given the variation in 2C value and GC% in Psidium, including within families, and its influence on secondary metabolism, we suggest in practice that individual plants be pre-selected before they are used in an experimental project, breeding program, germplasm bank or cultivation. We recommend that the pre-selected accessions or individuals of Psidium be vegetatively propagated, generating new individuals with the same 2n chromosome number, nuclear 2C value and GC% (genomic stability). On the other hand, inter- and intraspecific genomic diversity is an important source of genetic resources for breeding.
www.nature.com/scientificreports/
We found that essential oil yield in Psidium increases with genome size, evidencing the impact of genomic changes (2n chromosome number and nuclear 2C value) on secondary metabolism, a trait of ecological and economic importance. Experimentally, tetraploid induction (2n = 4x = 72 chromosomes) in Lippia integrifolia (family Verbenaceae) increased essential oil yield compared with diploids (2n = 2x = 36 chromosomes), in addition to producing larger leaves and trichomes, structures related to essential oil yield 38 . Additionally, we showed by FISH that polyploidy increases the copy number of the TPS sequences, related to essential oil biosynthesis, targeted by our two probes in Psidium species. Therefore, polyploidy, also reflected in the nuclear 2C value, affects essential oil yield in Psidium, as shown by the diploid P. guajava (2n = 2x = 22 chromosomes) and the closest species reported to date, the tetraploid P. guineense (2n = 4x = 44 chromosomes). The impact of polyploidy on essential oil traits may be related to the diversification and size of the TPS gene family in Psidium species. Studies of TPS gene evolution in Myrtaceae genomes have reported the largest TPS gene family in plants (up to 100 genes in Eucalyptus spp.) 26,39 and lineage-specific pathways and products. Although the essential oil of Psidium species shows great chemotype diversity conditioned by environmental and genetic variation 24,27,40,41 , the evolution of TPS genes in Neotropical fleshy-fruited Myrtaceae remains unknown. Higher nuclear 2C value, GC% and 5-mC% were associated with lower (E)-nerolidol and β-bisabolol contents. Therefore, in addition to the genome effect (2n chromosome number, nuclear 2C value and GC%), cytosine methylation (5-mC) also influences the composition of the essential oils.
These results point to epigenetic control of compound biosynthesis in Psidium secondary metabolism. A higher abundance of oxygenated sesquiterpenes was associated with smaller genomes, lower GC% and lower 5-mC%, indicating genomic and epigenomic influence on this chemical class. In a previous study, oxygenated sesquiterpenes clearly increased at the expense of sesquiterpene hydrocarbons in spring in P. guajava genotypes 42 . Together, these data, reported here for the first time, show the influence of the genome and epigenome on essential oil yield and on specific compounds, and suggest epigenetic control of terpene biosynthesis in Myrtaceae.
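The associations discussed here (for example, between 2C value, GC%, 5-mC% and specific compounds) rest on Pearson's correlation, which was computed in R with the agricolae package (Material and methods). As a language-neutral illustration only, the sketch below computes Pearson's r and the usual t statistic for testing r = 0 with n - 2 degrees of freedom; the function names and toy numbers are hypothetical, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared with Student's t on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Toy values only: a hypothetical genome-size variable and a trait.
    genome = [1.0, 2.0, 3.0, 4.0, 5.0]
    trait = [0.30, 0.35, 0.50, 0.55, 0.70]
    r = pearson_r(genome, trait)
    print(f"r = {r:.3f}, t = {t_statistic(r, len(genome)):.2f}")
```

A correlation of this kind establishes association, not mechanism; the FISH copy-number data are what connect ploidy to the TPS genes directly.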

Conclusion
From genome and epigenome to secondary metabolism, we provide data on the diversity of Psidium species. We characterized the Psidium germplasm with respect to 2n chromosome number, nuclear 2C value, GC%, TPS gene copy number and 5-mC%, generating knowledge about previously studied species and others not yet evaluated. We also explored secondary metabolism, revealed phenotypic divergence between Psidium species and individuals, and confirmed our hypothesis about the influence of the genome and epigenome. This work therefore provides an important characterization of the genus Psidium, with information and evidence that can be incorporated into further studies, especially of phenotypic responses related to traits of economic interest.

Material and methods
Plant material. We collected leaf samples from ten Psidium species: Psidium acidum (DC.) Landrum, P. Table 1. The number of individuals of each species used in each analysis is presented in Supplementary Table S1. The location of occurrence, individual identification and family of each accession are presented in Supplementary Table S2.
Nuclear 2C value and GC%. Young leaves from each accession (Supplementary Table S2) were used for nuclear 2C value and GC% measurements. Solanum lycopersicum L., 1753, 'Stupické' (2C = 2.00 pg) was used as the internal standard 10 . Leaf fragments (2 cm²) from each Psidium accession and from S. lycopersicum were chopped together 43 for about 30 s in a Petri dish containing 0.5 mL of OTTO-I buffer 44 modified for species of the Myrtaceae family (0.1 M citric acid, 0.5% Tween 20, 50 µg mL⁻¹ RNase, 2 mM dithiothreitol and 7% polyethylene glycol 2000, PEG) 37 . After adding 0.5 mL of the same buffer, the resulting suspensions were incubated for 3 min, filtered through a 30 μm nylon filter (Partec) into a 2.0 mL microtube and centrifuged at 100 × g for 5 min. The supernatant was discarded, and 100 μL of the same buffer was added to the pellet, which was homogenized by vortexing and incubated for 10 min. Subsequently, 0.5 mL of modified OTTO-II staining buffer (400 mM Na₂HPO₄·H₂O, 2 mM dithiothreitol, 50 µg mL⁻¹ RNase and 75 µg mL⁻¹ propidium iodide; PI, excitation/emission wavelengths: 480-575/550-740 nm) was added 10,44 . The suspensions were filtered through a 20 µm nylon mesh (Partec) into tubes (Partec) and kept in the dark for 30 min. They were then analyzed in a flow cytometer (BD Accuri C6, Accuri Cytometers, Belgium) equipped with a 488 nm laser, with emission detected in FL2 (615-670 nm) and FL3 (> 670 nm). The fluorescence peaks of the G0/G1 nuclei of each accession and of the standard were identified in the histograms using BD Accuri™ C6 software.
G0/G1 peaks with a coefficient of variation (CV) below 5% were used to calculate the nuclear 2C value (pg) as $2C_{\mathrm{accession}} = \dfrac{\text{mean G0/G1 peak channel of the accession}}{\text{mean G0/G1 peak channel of } S.\ lycopersicum} \times 2.00\ \mathrm{pg}$.
For GC%, nuclear suspensions were prepared following the nuclear 2C value procedure with two modifications: (a) the OTTO-I and OTTO-II buffers were not supplemented with RNase, and (b) the OTTO-II buffer was supplemented with 1.5 μM 4′,6-diamidino-2-phenylindole (DAPI, excitation/emission wavelengths: 320-385/400-580 nm). The suspensions were analyzed with a Partec PAS flow cytometer (Partec GmbH, Munster, Germany) equipped with a UV mercury arc lamp (388 nm) and a GG 435-500 nm band-pass filter. AT% was calculated using the formula 45 $\%\mathrm{AT}_{\mathrm{sample}} = \%\mathrm{AT}_{\mathrm{standard}} \times (R_{\mathrm{DAPI}}/R_{\mathrm{PI}})^{1/r}$, in which %AT of S. lycopersicum = 64.50% 11,46 ; $R_{\mathrm{DAPI}}$ and $R_{\mathrm{PI}}$ are the ratios of the fluorescence intensity of the accession to that of the standard with DAPI and PI, respectively; and r = 3 for DAPI 46 . GC% was then calculated as $\%\mathrm{GC} = 100 - \%\mathrm{AT}$. The nuclear 2C values and GC% of the Psidium accessions were clustered by the Tocher optimization method based on Euclidean distance, with each variable (nuclear 2C value and GC%) analyzed separately. The analyses were conducted in the Genes software 45 .
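Both flow cytometry calculations reduce to ratios of G0/G1 peak positions against the S. lycopersicum standard. The sketch below assumes that peak channel means and CVs have already been read from the histograms and that R_PI is the same accession-to-standard ratio used for the 2C value; the function names, the CV guard and the example channel values are hypothetical.

```python
STANDARD_2C_PG = 2.00    # S. lycopersicum 'Stupické'
STANDARD_AT_PCT = 64.50  # %AT of S. lycopersicum
DAPI_R = 3               # r used for DAPI in the text
MAX_CV_PCT = 5.0         # only G0/G1 peaks with CV below 5% are accepted


def nuclear_2c(sample_channel, standard_channel, sample_cv, standard_cv):
    """2C (pg) = (sample G0/G1 channel / standard G0/G1 channel) * 2.00 pg."""
    if sample_cv >= MAX_CV_PCT or standard_cv >= MAX_CV_PCT:
        raise ValueError("G0/G1 peak CV must be below 5%")
    return sample_channel / standard_channel * STANDARD_2C_PG


def gc_percent(r_dapi, r_pi, at_standard=STANDARD_AT_PCT, r=DAPI_R):
    """GC% = 100 - %AT, with %AT = %AT_std * (R_DAPI / R_PI) ** (1 / r).

    r_dapi and r_pi are accession/standard fluorescence ratios with DAPI and PI.
    """
    at_sample = at_standard * (r_dapi / r_pi) ** (1 / r)
    return 100 - at_sample


if __name__ == "__main__":
    # Hypothetical channel means, not measurements from this study.
    two_c = nuclear_2c(47_500, 100_000, sample_cv=3.2, standard_cv=2.8)
    print(f"2C = {two_c:.3f} pg")
    # Equal DAPI and PI ratios mean the same base composition as the standard.
    print(f"GC% = {gc_percent(0.475, 0.475):.2f}")
```

A relatively brighter DAPI signal (R_DAPI > R_PI) raises the estimated AT% and therefore lowers GC%, because DAPI binds preferentially to AT-rich regions whereas PI intercalates largely independently of base composition.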
Percentage of methylated cytosines (5-mC%) in the genome. The 5-mC% data of the P. guajava accessions were revisited from our previous study 47 . For P. cattleyanum, P. guineense, P. myrtoides, P. gaudichaudianum, P. friedrichsthalianum and Psidium sp., 5-mC% was measured for the first time, following the methodology used for P. guajava 47 .
Yield and chemical composition of the essential oil. We revisited the yield and chemical composition data of the essential oil previously published by our research group for P. guajava 40,42 , P. guineense 48 and P. cattleyanum 24 . For the other accessions, the essential oil was extracted following the methodology used for P. guajava 41 . The leaf essential oil compounds were identified and semi-quantified by gas chromatography with a flame ionization detector (GC-FID QP2010SE, Shimadzu, Japan) and gas chromatography coupled to mass spectrometry (GC-MS QP2010SE, Shimadzu, Japan) under the following conditions: helium as carrier gas for both detectors, with flow rate and linear velocity of 2.80 mL min⁻¹ and 50.80 cm s⁻¹ (GC-FID) and 1.98 mL min⁻¹ and 50.90 cm s⁻¹ (GC-MS), respectively; injector temperature of 220 °C with a split ratio of 1:30; fused silica capillary column (30 m × 0.25 mm) with Rtx-5MS stationary phase (0.25 μm film thickness); oven programmed from an initial 40 °C, held for 3 min, then increased at 3 °C min⁻¹ to 180 °C and held for 10 min, for a total run time of 59.67 min; FID and MS detector temperatures of 240 and 200 °C, respectively. A 1 μL volume of a 3% solution of essential oil in 95% hexane was injected.
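The stated total run time follows from the oven program: initial hold, plus ramp time (temperature span divided by heating rate), plus final hold. A small sketch of that arithmetic; the function name is hypothetical.

```python
def oven_run_time_min(start_c, hold_start_min, rate_c_per_min, end_c, hold_end_min):
    """Total GC run time (min) for a single-ramp oven program."""
    ramp_min = (end_c - start_c) / rate_c_per_min
    return hold_start_min + ramp_min + hold_end_min


if __name__ == "__main__":
    # 40 °C held 3 min, 3 °C/min to 180 °C, held 10 min.
    total = oven_run_time_min(40, 3, 3, 180, 10)
    print(f"{total:.2f} min")  # 59.67 min
```

The ramp alone takes 140/3 ≈ 46.67 min, which accounts for most of the run.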
GC-MS analyses were performed with electron impact ionization at 70 eV, a scan speed of 1000, a scan interval of 0.50 fragments s⁻¹ and a detection range of m/z 29 to 400. For GC-FID, the flame was formed by H₂ and atmospheric air at 300 °C, with flow rates of 40 and 400 mL min⁻¹, respectively.
The essential oil compounds were identified by comparing their mass spectra with those in the mass spectral libraries (Wiley 7, NIST 05 and NIST 05s) and by their retention index (RI). For the RI calculation, a mixture of saturated C7-C40 alkanes (Supelco, USA) was run under the same chromatographic conditions as the essential oil, and the adjusted retention time of each compound was obtained by GC-FID. The calculated values for each compound were then compared with those in the literature 49-51 .
Correlation analysis. The 2C value, GC%, 5-mC%, yield and content of each essential oil compound were subjected to Pearson's correlation analysis in the R environment 39 using the package agricolae (https://CRAN.R-project.org/package=agricolae).
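For the retention index step of compound identification, a common choice with temperature-programmed runs and an n-alkane ladder is the linear (van den Dool and Kratz) index; the text does not name the formula, so this is an assumption. Because the same dead time is subtracted from every peak, adjusted and raw retention times give the same linear index. A minimal sketch with hypothetical names and ladder times:

```python
import bisect


def linear_retention_index(t_x, alkane_times):
    """Linear retention index of a peak eluting at t_x (min).

    alkane_times maps n-alkane carbon number -> retention time (min), e.g.
    from a C7-C40 ladder run under the same conditions. The peak must elute
    between two alkanes of the ladder.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    i = bisect.bisect_right(times, t_x) - 1
    if i < 0 or i >= len(times) - 1:
        raise ValueError("peak elutes outside the alkane ladder")
    n, big_n = carbons[i], carbons[i + 1]
    t_n, t_big_n = times[i], times[i + 1]
    # RI = 100 * [n + (N - n) * (t_x - t_n) / (t_N - t_n)]
    return 100 * (n + (big_n - n) * (t_x - t_n) / (t_big_n - t_n))


if __name__ == "__main__":
    # Hypothetical ladder times, not measured values.
    ladder = {10: 20.0, 11: 24.0, 12: 28.0}
    print(linear_retention_index(22.0, ladder))  # 1050.0
```

The calculated index is then matched against literature values for the same stationary phase, alongside the mass spectral library match.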
Terpene synthase (TPS) gene copy number in P. guajava and P. guineense. We assessed the influence of polyploidy on the copy number of terpene synthase (TPS) genes, which are involved in essential oil synthesis. These genes encode enzymes that act in essential oil synthesis pathways 27,39,52-54 . For this, we used the sequences of functionally characterized genes involved in terpene synthesis (TPS genes) available in the database under IDs AB266390.1 and MK873024.1. Their similarity to the TPS genes of the P. guajava genome annotation (data of our research group) was evaluated with BLAST, and alignments with a score of at least 80% were selected for primer design. Primers were designed on the conserved motifs of these TPS genes, mainly in exon regions, and were evaluated using the OligoAnalyzer tool (IDT). We defined two primer pairs: the first (F 5'-GGTGGGATGTCGATGCTAAA-3' and R 5'-CTCTTCCTCCGTAACTCTGTATTG-3') is specific to one predicted TPS gene orthologue, with a 500 bp amplicon; the second, a general primer pair (F 5'-CGATTCCGGCTACTTAGACATC-3' and R 5'-GTTCTTCCAGCGTCCCATATAC-3'), aligns to the conserved motifs of eight predicted TPS genes of the P. guajava genome, amplifying sequences of 415 to 502 bp.
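Oligo analysis tools such as the one from IDT report basic properties like length and GC content for each primer. The sketch below computes only these two properties for the four primer sequences listed above; it is not a substitute for the tool (it ignores melting temperature, hairpins and dimers), and the names are hypothetical.

```python
PRIMERS = {
    "specific F": "GGTGGGATGTCGATGCTAAA",
    "specific R": "CTCTTCCTCCGTAACTCTGTATTG",
    "general F": "CGATTCCGGCTACTTAGACATC",
    "general R": "GTTCTTCCAGCGTCCCATATAC",
}


def primer_stats(seq):
    """Return (length, GC%) of a DNA primer written 5' to 3'."""
    # Tolerate lower case and codon-style spacing in the input.
    seq = seq.upper().replace(" ", "")
    gc = sum(base in "GC" for base in seq)
    return len(seq), 100 * gc / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        length, gc = primer_stats(seq)
        print(f"{name}: {length} nt, GC = {gc:.1f}%")
```

Keeping GC content broadly similar within each pair helps the forward and reverse primers anneal at comparable temperatures, which matters here because both pairs share one annealing step (58 °C).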
The putative TPS sequences were amplified from P. guajava and P. guineense genomic DNA using these primers. Each amplification reaction contained 50 ng genomic DNA, 200 µM dNTPs, 0.5 µM of each primer, 1 U GoTaq enzyme (Promega), 1× GoTaq reaction buffer and 1.8 mM MgCl₂. Amplification conditions were initial denaturation at 95 °C for 5 min, followed by 30 cycles of 95 °C for 1 min, 58 °C for 45 s and 72 °C for 1 min, and a final extension at 72 °C for 5 min. The amplification products were checked on a 1.5% agarose gel and with a NanoDrop spectrophotometer. DNA probes were then generated for each putative gene by a second PCR under the same conditions, with labeling by tetramethylrhodamine-5-dUTP (Roche) for the specific probe or ChromaTide Alexa Fluor 488-5-dUTP (Life Technologies) for the general probe. Fluorescence in situ hybridization (FISH) was performed on slides containing isolated and preserved nuclei to count the hybridization signals corresponding to the TPS genes. The hybridization mix, consisting of 50% formamide, 2× SSC and 200 ng of probe, was applied to the slide, which was covered with a coverslip, sealed with rubber cement and kept at 37 °C for 20 h. The post-hybridization wash was in 2× SSC at 42 °C for 20 min. Slides were counterstained with 4′,6-diamidino-2-phenylindole and analyzed on an Olympus BX60 photomicroscope equipped with epifluorescence and a 100×/NA 1.4 immersion objective. At least 20 nuclei per species and per gene were captured with a 12-bit CCD digital video camera (Olympus) coupled to the photomicroscope and a computer with a digitizer board. Captured images were processed with Image-Pro Plus 6.1 (Media Cybernetics).

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.