Deciphering phenotyping, DNA barcoding, and RNA secondary structure predictions in eggplant wild relatives provide insights for their future breeding strategies

Eggplant or aubergine (Solanum melongena L.) and its wild cousins, comprising 13 clades with 1500 species, have an unprecedented demand across the globe. Cultivated eggplant has a narrow molecular diversity that hinders eggplant breeding advancements. Wild eggplants need resurgent attention to broaden eggplant breeding resources. In this study, we emphasized phenotypic and genotypic discriminations among 13 eggplant species deploying chloroplast–plastid (Kim matK) and nuclear (ITS2) short gene sequences (400–800 bp) at DNA barcode region followed by ITS2 secondary structure predictions. The identification efficiency at the Kim matK region was higher (99–100%) than in the ITS2 region (80–90%). The eggplant species showed 13 unique secondary structures with a central ring with various helical orientations. Principal component analysis (PCoA) provides the descriptor–wise phenotypic clustering, which is essential for trait–specific breeding. Groups I and IV are categorized under scarlet complexes S. aethiopicum, S. trilobatum, and S. melongena (wild and cultivated). Group II represented the gboma clade (S. macrocarpon, S. wrightii, S. sisymbriifolium, and S. aculeatissimum), and group III includes S. mammosum, and S. torvum with unique fruit shape and size. The present study would be helpful in genetic discrimination, biodiversity conservation, and the safe utilization of wild eggplants.

The phenotypic tree in the heatmap also describes the critical conventional features that predominantly discriminate the eggplant species (Fig. 2). The phenotypic descriptors were grouped into three major clusters. Group I includes plant spread, plant height, days to flower, stem girth, and the number of leaves, which showed higher variability among the tested eggplant species. Group II comprises the number of spines in leaves and stems, leaf width, and girth, whereas Group III comprises 24 conventional phenotypic characters, further grouped into three subgroups (Fig. 2).
Characters such as fruit color at maturity, fruiting pattern, fruit color at harvest, fruit shape, fruit curvature, fruit calyx spine, and flower color in Subgroup 1 of Group III showed less variation among the eggplant species and may be used for stringent selection of unique species. On the other hand, fruit glossiness, fruit apex shape, length of the peduncle, spine intensity, fruit stripes, leaf spines, and flower number in Subgroup 2 showed moderate variation. Subgroup 3 (number of branches, plant growth habit, flower size, vein color intensity, fruit length:diameter ratio, fruit length, stem pubescence, fruit calyx size, fruit diameter, and blade color intensity) showed a minimum impact on phenotypic discrimination among the eggplant species (Fig. 2).
Principal component analysis. Principal component analysis (PCoA; Fig. 3) represented the phenotypic descriptor-wise genotypic clustering, which validates the clusters obtained from the heat map. As per the PCoA result, CHB-WEP-1 (S. wrightii) and CHB-WEP-6 (S. torvum) differed from the other species in terms of plant height, stem girth, plant spread, leaf length and width, fruiting pattern, and fruit and flower color. CHB-WEP-2, CHB-WEP-3, CHB-WEP-4, and CHB-WEP-9 exhibited spiny features in the leaf and stem and were discriminated from the other species based on fruit color at maturity and the number of flowers. The two cultivated eggplant accessions (CHB-WEP-12 and CHB-WEP-13) were categorized in the same group, with larger fruit shape, size, curvature, fruit stripes, and calyx spininess. The remaining five eggplant species, plotted in the fourth quadrant, differed from each other in nine morphological descriptors (Fig. 3).
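As an illustration of how descriptor-wise clustering of this kind can be obtained, the following minimal sketch extracts the first principal component of a small descriptor matrix by power iteration in plain Python. It is not the GraphPad Prism procedure used in the study; the function name `first_principal_component` and the example values are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, explained variance ratio, scores) of the first PC.

    rows: one list of numeric descriptor values per accession.
    Uses power iteration on the sample covariance matrix. The all-ones start
    vector is an assumption; it would fail to converge to the first PC only
    if that PC were exactly orthogonal to it.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance at all in the data
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    ratio = eigenvalue / total_variance if total_variance else 0.0
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, ratio, scores

# Hypothetical two-descriptor measurements for four accessions (not study data).
example_rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, explained, pc_scores = first_principal_component(example_rows)
```

In practice the descriptors would usually be standardized first, because traits such as plant height and flower number are recorded on different scales.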

Correlation studies.
Pearson's correlation revealed phenotypic discriminations among the tested species at the P ≤ 0.001 level of significance with a threshold value of r = 94.763 (Fig. 4). The red dots in Fig. 4 indicate the lowest, and the green dots the highest, correlation among the tested eggplant species. Based on the 33 phenotypic descriptors, CHB-WEP-11 (S. aethiopicum) possesses significant similarities with five species (two wild genotypes of S. melongena, S. torvum, S. macrocarpon, and S. aethiopicum (CHB-WEP-8)). There is a need to incorporate the wild genetic base to broaden the genetic traits among the cultivated species. The combined understanding of phenotypic phylogeny, principal component analysis, and correlation studies would be helpful in selecting suitable species for augmenting trait-specific breeding strategies.
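The pairwise correlation underlying such a matrix can be sketched as follows. This is a minimal plain-Python illustration, not the software used in the study; the accession labels and descriptor scores are hypothetical. The function returns r on the conventional scale from -1 to 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator

# Hypothetical descriptor scores for three accessions (not data from the study).
scores = {
    "acc_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "acc_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "acc_C": [5.0, 4.0, 3.0, 2.0, 1.0],
}

# Full pairwise matrix, keyed by (accession, accession).
correlation = {
    (a, b): pearson_r(scores[a], scores[b]) for a in scores for b in scores
}
```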
www.nature.com/scientificreports/
Species discrimination using DNA barcoding. […] gaps indicated a distinct genetic variability among the eggplant wild relatives at the species level. In our study, the identification efficiency at the Kim matK region was higher (99-100%) than that of the ITS2 region (80-90%).
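The text does not define how identification efficiency was computed. Assuming it corresponds to a BLAST-style percent identity between a query barcode and its best database match (an interpretation, not a statement of the authors' method), the calculation can be sketched as below. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns where both sequences have a gap are ignored; a gap in only one
    sequence counts as a mismatch, as in a pairwise BLAST alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Hypothetical 10-column alignment with a single mismatch.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
```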
Molecular phylogeny using maximum likelihood tree. Figure 5A,B depicts the phylogenetic relationships among the wild eggplant species and the barcodes obtained from Kim matK and ITS2 sequences, respectively. The phylogeny was established using a maximum likelihood tree (MLT) under the Kimura two-parameter (K2P) model with 1000 bootstrap replicates. The MLTs distinctly categorized the 13 eggplant species into four major monophyletic groups. CHB-WEP-13, CHB-WEP-1, and CHB-WEP-2 were consistently categorized under Groups I, II, and III in the phylogeny at the Kim matK and ITS2 regions. CHB-WEP-10, CHB-WEP-11, and CHB-WEP-12 appeared together in one clade (cluster IV). However, the ITS2 MLT confirmed the similarities of CHB-WEP-13 with CHB-WEP-4, 5, and 7. In the MLT based on Kim matK locus data, CHB-WEP-1 showed similarities with CHB-WEP-3, 7, and 9. Groups I and IV are categorized under scarlet complexes S. aethiopicum, S. trilobatum, and S. melongena (wild and cultivated), whereas Group II represented the gboma clade (S. macrocarpon, S. wrightii, S. sisymbriifolium, and S. aculeatissimum). The intermediate Group III includes S. mammosum and S. torvum, with unique features in fruit shape and size.
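For readers unfamiliar with the K2P model, the pairwise distance it defines is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. The sketch below computes this distance in plain Python. It only illustrates the substitution model; it does not reproduce the tree-building software used in the study, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Identical sequences give a distance of zero, and the correction grows faster than the raw proportion of differences as sequences diverge.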
ITS2 secondary structure predictions. We predicted ITS2 secondary structures for the 13 eggplant wild relatives (Fig. 6). The studied species showed 13 unique secondary structures with four similar helices, which implied genetic variation among the species. Most species represented a central ring with various helical orientations regarding the loop number, position, size, and angle from the spiral. Helix I comprised three species (S. wrightii, S. macrocarpon, and S. mammosum), and helix II included six eggplant genotypes. In contrast, helix III (S. torvum) and IV (S. sisymbriifolium) showed unique structures with multi-central rings. Helix V includes wild S. melongena (CHB-WEP-5), for which a unique structure was predicted that is nevertheless similar to that of cultivated S. melongena (CHB-WEP-13), indicating their near-isogenic nature. Secondary structure prediction is important in the molecular breeding of eggplants involving wild relatives. The unique genetic sequences at the conserved nuclear region would also help to develop species-specific primers for the identification of wild eggplants at a faster pace.

Discussion
Phenotyping of eggplant wild relatives. Phenotyping using morphological descriptors is crucial for the preliminary identification and selection of genotypes for breeding and crop improvement 31. In the present study, thirteen diverse eggplant species were phenotypically discriminated using forty morphological descriptors, illustrating a close relationship among the eggplant wild relatives. Among all the significant heritable descriptors, plant height, stem features, leaf characters, and fruit characters (fruit shape, size, and color) could be considered reliable traits to distinguish eggplant diversity phenotypically 32. Leaf structural descriptors, such as the spiny features in wild eggplants, served as a potential basis for morphological discrimination between wild and cultivated eggplants. The species closer to the wild eggplants support the assumption of interspecific hybrids, revealing the general observation of allelic uniformity 13. Overlapping phenotypic features within the genus Solanum correlate with the genetic interrelationship among the wild and cultivated eggplant gene pool. Following the gene pool concept, Howard et al. 32 suggested that although the wild progenitors and the cultivars possess some morphological similarities, they might differ at the genotypic/species level, which needs to be confirmed at the molecular level.
In the eggplant improvement program, S. torvum and S. mammosum are often used as a primary gene pool for interspecific hybridization against biotic and abiotic stress 33. Similarly, introgression from Solanum incanum and Solanum lichtensteinii has been used to broaden eggplant genetic diversity 13. In our study, PCoA suggests considering S. wrightii, S. torvum, and S. macrocarpon for trait-specific breeding for plant height, plant spread, stem girth, and fruiting pattern. S. sisymbriifolium, S. mammosum, S. trilobatum, and S. aculeatissimum, which are grouped under leaf and stem spininess, may be selected for breeding for spiny characters. S. melongena and S. aethiopicum could be selected for breeding for better fruit characteristics. The morphological clustering in our study would help select suitable species for improving introgression breeding strategies to develop stress-tolerant eggplant species. The detailed analysis of morphological characters represents a powerful technique for analyzing the phenomic relationship among wild and cultivated eggplant species 34. However, DNA-based molecular tools may be considered for species identification and gene bank conservation 35.
DNA barcoding of eggplant wild relatives. Various molecular markers are often used for the genetic characterization of plant species to identify quantitative and qualitative trait-specific loci. However, the accurate identification of a species is practically complicated using taxonomic or molecular characterization. In the present study, we have efficiently used the DNA barcode markers (Kim matK and ITS) to accurately discriminate the thirteen eggplant wild relatives at the molecular level. Specific candidate barcode markers such as Kim matK (chloroplast-plastid region) and ITS (nuclear region) have often been deployed for species identification in many plants 32. Consequently, molecular barcoding approaches can provide a tool for the species-specific identification of novel eggplants. With the advancement of DNA barcoding, the ITS and Kim matK barcode loci efficiently discriminate members of the Solanaceae family at the species level 36. The genotypes with significant barcode gaps may be considered for inter- or intraspecific eggplant breeding strategies. The genetic information at a particular barcode location is suitable for enhancing eggplant breeding techniques. The genotypes with fewer DNA barcode sequence gaps could be chosen for breeding eggplants with specific traits.
ITS2 secondary structure predictions in eggplant wild relatives. The RNA secondary structure can be categorized based on three main criteria: minimum free energy, a technique based on statistical value, and evaluation of the nucleotide sequence 37. The RNA secondary structure model presumes that RNA folds into a stable structure with the lowest free energy. RNA secondary structure prediction is a novel method to elucidate RNA folding in plant cell physiology. Few studies on plant RNA structure prediction, especially in agriculturally important crops, have been attempted. However, a genome-wide RNA structure map has been inferred in vivo using A. thaliana seedlings 38. Expanding on the findings of such methods, we focused on advancing the understanding of the outline and role of RNA structure in plants. Achieving accurate predictions by comparing a considerable number of homologous RNA sequences from different plant species is difficult when discriminating variation using RNA secondary structure. Identification based on the nuclear coding region of the ribosomal subunit (28S and 5.8S coding region) using ITS primers is now a reliable tool for species-level specification 39. The barcode-based molecular analysis of RNA secondary structure using ITS sequences for species evolution interferes with their target genetic loci. For correct discrimination of all 13 genotypes of eggplant germplasm, additional information on the RNA folded model appears to be relevant in determining the divergence between closely related eggplant variants. The complementarities accounting for the regions of the folded structure were found to be identical in domain base pairing, forming a core region by correlating it with some stem features 38,39. The revealed order of predilection is maintained on the topology of RNA structure based on the inner loop, bulge variety loop, hairpin, and outer loop of all eggplant species. Hence, the studied relationship among the eggplant variants depends upon the prediction effect of the results of ITS sequence conservativeness in the preferred nuclear region. The species with closer barcode gaps represent the same clade suitable for inter- or intraspecific eggplant breeding.
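Minimum-free-energy folding requires thermodynamic parameters that are beyond a short example. As a much simpler stand-in, the sketch below uses base-pair maximization (the Nussinov algorithm), which finds a nested structure with the largest number of Watson-Crick and G-U pairs and reports it in dot-bracket notation. It is not the prediction method used in this study, and the function names and the `min_loop` default of three unpaired bases are assumptions for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def can_pair(a, b):
    """True for Watson-Crick pairs and the G-U wobble pair."""
    return (a, b) in PAIRS

def nussinov(seq, min_loop=3):
    """Return (number of pairs, dot-bracket string) maximizing nested pairs.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    DNA input is accepted; T is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: max pairs within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)

pairs, dot_bracket = nussinov("GGGAAACCC")
```

Several structures can tie for the maximum number of pairs, so the traceback returns only one of them; energy-based tools resolve such ties thermodynamically.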

Conclusions
CHB-WEP-1 (S. wrightii) and CHB-WEP-6 (S. torvum) exhibited unique plant characteristics such as plant height, stem girth, plant spread, and fruiting pattern. CHB-WEP-2 (S. mammosum), CHB-WEP-3 (S. aculeatissimum), CHB-WEP-4 (S. trilobatum), and CHB-WEP-9 (S. sisymbriifolium) exhibited spiny features, which could be considered for trait-specific approaches, whereas the cultivated Solanum melongena (CHB-WEP-12 and CHB-WEP-13) possessed better fruit shape, size, curvature, and fruit stripes. The chloroplast-plastid gene Kim matK provided better species discrimination than the nuclear ITS2. The species discrimination was more prominent at DNA barcode regions, confirming the genotypic variations among the wild eggplant species. Kim matK could be used for the identification of new species or discrimination among large genetic populations. ITS2 secondary structure predictions depict the unique genetic configuration at the conserved 5.8S nuclear region.
Most species represented a central ring with various helical orientations regarding loop number, position, size, and angle from the spiral. This study shows the potential of DNA barcoding in discriminating eggplant wild relatives. Understanding the phenology and molecular phylogeny would be helpful for the selection of crop wild relatives (CWR) for eggplant breeding strategies.

Materials and methods
Plant materials and experimental conditions. Thirteen accessions of eggplant, including eleven wild and two cultivated species maintained at the Central Horticultural Experiment Station (CHES), Indian Council of Agricultural Research-Indian Institute of Horticultural Research (ICAR-IIHR), Bhubaneswar, India, with due approval of the competent authority following institutional guidelines and legislation, were used as the source materials for the present study. The station is located at a latitude of 20° 15′ N, a longitude of 85° 52′ E, and an altitude of 35 m above mean sea level.
Seeds of eggplant and its wild relatives were sown in pot trays containing cocopeat for germination under a naturally ventilated polyhouse (14 h photoperiod, 85-90% relative humidity, and temperature of 30/25 °C day/night). Six-week-old seedlings were transplanted to polyethene pots (30 × 30 × 30 cm) containing garden soil, sand, and farmyard manure (1:1:1) in the polyhouse. The plants were maintained as per the recommended package of practices for eggplant. The experiment was designed with 13 genotypes and five replications in a completely randomized design (CRD). The leaf voucher specimens (CHB WEP 1-13; Table 1) of the eggplant and its wild relatives were deposited in the herbarium at ICAR-IIHR-CHES, Bhubaneswar, India, and were used as biological reference material (BRM) in the present study.
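A completely randomized design with 13 genotypes and five replications gives 65 experimental units, each assigned to a pot position at random. A minimal sketch of such an allocation is given below; the function name, the position numbering, and the seed are hypothetical, and the actual layout used in the polyhouse is not described in the text.

```python
import random

def crd_layout(genotypes, replications, seed=None):
    """Randomly assign every (genotype, replicate) unit to a pot position.

    Returns a dict mapping position number (starting at 1) to a
    (genotype, replicate) tuple.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    units = [(g, r) for g in genotypes for r in range(1, replications + 1)]
    rng.shuffle(units)
    return {position: unit for position, unit in enumerate(units, start=1)}

accessions = [f"CHB-WEP-{i}" for i in range(1, 14)]
layout = crd_layout(accessions, replications=5, seed=42)
```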

Morphological characterization.
Morphological descriptors such as plant phenotypic features, leaf phenology, floral morphology, and fruit characters were recorded as per the distinctness, uniformity, and stability (DUS) guidelines for eggplant as recommended by the Protection of Plant Varieties and Farmers' Rights Authority (PPV&FRA) 40, New Delhi, India. Data were analyzed using analysis of variance (ANOVA). Principal component analysis (PCoA) and the heat map were generated using GraphPad Prism 9 (GraphPad Software, San Diego, CA, USA). The 13 eggplant species were characterized using 40 morphological descriptors at the whole plant level (Supplementary Table 1).
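For a completely randomized design, the ANOVA for a single descriptor reduces to a one-way comparison among genotypes. The sketch below computes the F statistic and its degrees of freedom in plain Python; it is an illustration only, not the software output reported in the study, and the example values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA.

    groups: one list of replicate measurements per genotype.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and some replication")
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of replicates around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n - k
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical plant-height replicates for three genotypes (not study data).
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0],
                                      [2.0, 3.0, 4.0],
                                      [3.0, 4.0, 5.0]])
```

With 13 genotypes and five replications, as in this experiment, the degrees of freedom would be 12 between genotypes and 52 within.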
Genomic DNA isolation and quantification. Total genomic DNA (gDNA) was isolated from the fresh juvenile leaf tissues of the 13 wild and cultivated eggplants using the GCC-WLN plant gDNA extraction kit (GSure® Plant Mini Kit with WLN Buffer, GCC Biotech Pvt. Ltd., Kolkata, India) following the manufacturer's protocol. The isolated gDNAs were quantified using a nanodrop spectrophotometer (Eppendorf, Hamburg, Germany) and checked by 0.8% agarose gel electrophoresis (Tarson, Kolkata, India). Total gDNA adjusted to a concentration of 50 ng µL⁻¹ was used for PCR amplification with different barcode primers 24.
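Adjusting a stock to 50 ng µL⁻¹ follows the usual C1V1 = C2V2 relation. The helper below is a minimal sketch of that arithmetic; the function name, the 20 µL final volume, and the 200 ng µL⁻¹ stock in the example are hypothetical values, not figures from the study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=50.0, final_volume_ul=20.0):
    """Return (stock volume, diluent volume) in µL using C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume

# Example: a hypothetical 200 ng/µL stock diluted to 50 ng/µL in 20 µL.
stock_ul, diluent_ul = dilution_volumes(200.0)
```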
Primer selection and PCR amplification. DNA barcode primers for the chloroplast-plastid genome (Kim matK) and nuclear gene (ITS2) were synthesized at M/S Bioserve Biotechnologies India Pvt. Ltd., Hyderabad, India. The barcode primer sequences (5' to 3') are as follows: Kim matK (3F_Kim matK: CGT ACA GTA CTT TTG TGT TTA CGA G; and 1R_Kim matK: ACC CAG TCC ATC TGG AAA TCT TGG) and ITS2 (ITS-S2F: ATG CGA TAC TTG GTG TGA ATT ATA GAAT; and ITS-S3R: GAC GCT TCT CCA GAC TAC AAT). For each chloroplast and nuclear marker, PCR amplification was performed in a volume of 25 µL, containing 50 ng of gDNA (1 µL) as a template, 12.5 µL of 2 × PCR master mix (GCC Biotech Pvt. Ltd., Kolkata, India), primers (10 pM, 1 µL each of forward and reverse primers), and 9.5 µL of Milli-Q water. All PCR amplifications were performed in a thermal cycler (Eppendorf, Hamburg, Germany) with an initial denaturation of 5 min at 95 °C, followed by 40 cycles of 1 min at 95 °C, 1 min of annealing at 55 °C, and 1 min at 72 °C, and a final extension of 10 min at 72 °C. The PCR products were purified using a PCR Purification Kit (GCC Biotech Pvt. Ltd., Kolkata, India) following the manufacturer's instructions. The purified PCR fragments were visualized in 1.5% agarose TAE gels, and the gel images were taken with the E-Box gel documentation system (Vilber, Eberhardzell, Germany).
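The reaction composition and cycling programme above can be checked with simple arithmetic. The sketch below confirms that the listed components add up to the stated 25 µL and estimates the programmed hold time, ignoring ramping between temperatures. The variable and function names are hypothetical.

```python
# Component volumes in µL, taken from the reaction described above.
reaction_mix_ul = {
    "gDNA template (50 ng)": 1.0,
    "2x PCR master mix": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Milli-Q water": 9.5,
}
total_volume_ul = sum(reaction_mix_ul.values())

def cycling_minutes(initial_denaturation=5, cycles=40, denature=1,
                    anneal=1, extend=1, final_extension=10):
    """Programmed hold time in minutes, excluding temperature ramping."""
    return initial_denaturation + cycles * (denature + anneal + extend) + final_extension

run_time_min = cycling_minutes()
```

With the defaults taken from the protocol, the holds sum to 5 + 40 × 3 + 10 = 135 min.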
Sequencing and bioinformatics data analyses. The purified PCR products were sequenced using Sanger sequencing (ABI Genetic Analyzer 3730, 48 capillaries, 50 cm, ABI, Massachusetts, USA) at M/S Bioserve Biotechnologies India Pvt. Ltd., Hyderabad, and the sequences were viewed in FinchTV v1.4.0. Phylogenetic analysis of the 13 eggplants was carried out by a homology search of the obtained sequences using the NCBI Basic Local Alignment Search Tool (BLAST, http://blast.ncbi.nlm.nih.gov) to identify the highest similarity of the eggplant accessions within the GenBank database 41. Before submission of the sequences to NCBI, the analyzed forward and reverse sequences (Kim matK and ITS) were edited and trimmed, and contigs were formed using SnapGene v 5.3 (https://www.snapgene.com/). The nucleotide sequences were subjected to BLAST searches, and the species were selected based on the maximum similarity score, per cent identity (above 80%), and lowest E value after significant sequence alignment. The barcode gaps were manually edited in a pairwise alignment view using BLAST 42. To obtain their respective accession numbers, the acquired Kim matK and ITS barcode sequences of each eggplant genotype were submitted to the BankIt submission portal (https://submit.ncbi.nlm.nih.gov/subs/genbank/) and the GenBank submission portal (https://submit.ncbi.nlm.nih.gov/), respectively 43. Around ten closest matches of the sequences were aligned with the query sequences using Clustal Omega, and the resulting alignments were

Figure 2. Heat map depicting the phenotypic association among 13 eggplant wild relatives.

Figure 4. Correlation among the 13 eggplant wild relatives based on the phenotypic features. The threshold value at P ≤ 0.001 is r = 94.763.

Figure 5. Maximum likelihood tree and DNA barcodes obtained from Kim matK (A) and ITS2 (B) sequences depicting the relationship among 13 eggplant wild relatives. The bootstrap scores (1000 replicates) are shown (≥ 50%) for each branch.

Table 2. Molecular identification of eggplant and its wild relatives using Kim matK and ITS2 barcode genes.