Construction and genetic characterization of an interspecific raspberry hybrids panel aiming resistance to late leaf rust and adaptation to tropical regions

Raspberries (Rubus spp) are temperate climate fruits with profitable high returns and have the potential for diversification of fruit growing in mid to low-latitude regions. However, there are still no cultivars adapted to climatic conditions and high pressure of diseases that occurs in tropical areas. In this context, our objective was to evaluate the genetic diversity from a 116 raspberry genotypes panel obtained from interspecific crosses in a testcross scheme with four cultivars already introduced in Brazil. The panel was genotyped via genotyping-by-sequencing. 28,373 and 27,281 SNPs were obtained, using the species R. occidentalis and R. idaeus genomes as references, respectively. A third marker dataset was constructed consisting of 41,292 non-coincident markers. Overall, there were no differences in the results when using the different marker sets for the subsequent analyses. The mean heterozygosity was 0.54. The average effective population size was 174, indicating great genetic variability. The other analyses revealed that the half-sibling families were structured in three groups. It is concluded that the studied panel has great potential for breeding and further genetic studies. Moreover, only one of the three marker matrices is sufficient for diversity studies.

to leaf rust and the R. idaeus species has greater economic importance due to favorable market characteristics.However, unilateral incompatibility and different flowering periods between these species is not synchronized 8 , hindering stable hybrid development.In this sense, studies involving interspecific hybrids provide information that allows a better understanding of raspberries and enables the exploitation and maintenance of the genetic variability of these species.
Genetic variability, interspecific and intraspecific, constitutes a source of genetic resources essential for plant breeding.It represents an allelic pool responsible for generating the existing phenotypic contrasts, resulting from the evolution of organisms in the population, arising from different processes, such as natural and artificial selection.Genetic diversity studies are essential as they seek to understand the potential resource to cope with various biotic and abiotic stresses existing or yet to come 9 .It also provides the basis for further genomic exploitation, such as Genomic Wide Association Studies (GWAS) 10,11 , and Genomic Selection (GS) 12 .Variability can be accessed in different ways, the most used currently is through SNP-type genetic markers.Genotyping by this type of marker constitutes an important tool for evaluating genetic diversity and germplasm characterization.Because it is allelic in nature and enables wide genome sampling due to the high density of markers found 13 .In this context, a panel of raspberry plants was constructed from crosses in a testcross scheme between four cultivars introduced in tropical regions of Brazil.The panel was genotyped using the genotyping-by-sequencing (GBS) method 14 , and, from the data obtained, we aimed to (i) estimate the genetic diversity present in the raspberry panel, (ii) understand the genetic structure present in the panel and (iii) explore the impact of the reference genome in the SNP calling process in the genetic analyses.

Results
The diversity panel was sequenced with a mean depth of coverage of 32×.After marker filtering, 28,373 SNPs were mapped on the seven chromosomes of R. occidentalis, each chromosome with an average of 4053 markers, ranging from 3287 SNPs on chromosome one to 4753 on chromosome six.On the other hand, 27,281 markers were mapped to the R. idaeus genome and distributed across 658 scaffolds, with an average of 41 markers per scaffold.Finally, 12,919 common markers were detected in both genomes.Figure 1 shows the marker distribution across genome markers.
Linkage disequilibrium (LD) analysis was performed using only the Moc matrix since only the R. occidentalis genome is structured in chromosomes 15 .The LD decay was different among chromosomes, ranging from 250 to 310 kb, considering the R2 of 0.20 (Fig. 2).www.nature.com/scientificreports/According to Principal Component Analysis (PCA), the first two components explained 91% of the total variation: component 1, responsible for 89%, and component 2, responsible for 2%, for the unique and Mid matrices.For the Moc matrix, the first two components explained 92% of the variation, with the first component accounting for 90% and the second for 2%.The distribution and grouping of genotypes were similar among the different matrices, distinguishing two groups, the first including most JS (Jewel × Salmon) crosses and another grouping JT (Jewel × Himbo Top) and JG crosses (Jewel × Golden Bliss) (Fig. 3).

R. occidentallis
According to the dendrogram the positioning of each individual is in a likely phylogeny.Using the Mid and Moc matrices, two groups were identified, accordingly to the PCA results.However, a third group was detected using the unique matrix, corroborating the structure of half-sibling families.However, regardless of the structure, most JS progeny remained distant from the other individuals (Fig. 4).
DAPC analysis was performed using 45 principal components and the first two discriminant eigenvalues, which explained 74.3%, 75.7% and 76.4% of the total variance of the unique, Mid, and Moc matrices, respectively.The optimal number of groups obtained from the k-means analysis corresponds to three groups (Fig. 5).Regardless of the matrix, there were no significant changes in the population structure, with only some genotypes relocated from groups, and the JS progeny remained away from the other genotypes.The general population is well structured since, in most cases, each genotype was allocated exclusively to a single group (Fig. 5).
The heatmap of the additive genetic relationship (Fig. 6) developed using the different matrices showed the same pattern, forming three groups, one of them composed mostly of the individuals from the JS cross, as well as DAPC.
Using the unique matrix, the values of MAF (Minor Allele Frequency) ranged from 0.05 to 0.50, with an average of 0.32.Polymorphic Information Content (PIC) values ranged from 0.10 to 0.39, with an average of 0.38, similar results were found using the Moc and Mid matrices (Fig. 7).
The estimated effective size was 174, considering the unique and Mid matrices, and 166, considering the Moc matrix.The Genetic Diversity (GD) of the individuals that make up the panel was 0.39 when using the unique and Mid matrices and 0.38 when using the Moc matrix (Table 1).The observed heterozygosity (Ho) was higher in terms of magnitude than the genetic diversity in the three situations with the unique, Mid and Moc matrices, respectively.Furthermore, the average Ho of the resistant parent was higher than the susceptible ones and their progenies (Table 1).The endogamy coefficient (f) was 0.35 for the unique and Mid matrix markers and 0.37 for the Moc matrix.When analyzing the progenies separately, the values remained similar, however, the resistant parent had a lower inbreeding coefficient than the other individuals in the panel.The same pattern was observed regardless of the matrix used.

Discussion
The development and characterization of raspberry diversity panels are crucial for the next stages of the breeding program, searching for materials more adapted to tropical climatic conditions and the late leaf rust that affects the crop [16][17][18] .Knowledge about the mentioned issues assists in the choice of contrasting and complementary individuals as genitors in breeding programs, thus minimizing the risks of inbreeding depression and loss of genetic variability 16,19 .In addition, it allows laying the foundation for more advanced studies on genetic control via associative mapping and genomic selection 10 .
The hierarchical grouping analysis, PCA, and DAPC showed a close relationship between the progeny JG and JT (Figs. 3, 5).One of the scenarios that may explain this proximity is a recent common ancestor between the cultivars "Himbo Top" and "Golden Bliss" which, despite artificial selection processes originating materials with distinct phenotypic characteristics, still have a great genetic similarity.A second hypothesis is that the cultivars "Golden Bliss" and "Himbo Top" come from the same cultivar but were selected separately due to mutations or segregation of important loci.The analyzes indicate that the JS progeny presents greater genetic similarity with the resistant parent compared to the JT and JG progenies (Figs. 4, 5).However, as the genealogy of the "Salmon" variety is not known, we cannot say with certainty why this is happening.However, it is possible that Salmon is a variety closer to the R. occidentalis species or even that the population has large epistatic selection or large inbreeding depression.Dossett et al. 16 , working with 148 cultivars of R. occidentalis, reported that it is challenging to distinguish cultivars due to the segregation of loci.Furthermore, breeding programs are recent 20,21 and even the most recent cultivars have not advanced more than a few generations ahead of their ancestors 22 .
When evaluating the genetic diversity index, it is possible to identify that the panel presents a high diversity, with an average of 0.39.A study evaluating the genetic diversity of 148 genotypes of R. occidentalis through SSR markers, found an average GD of 0.24, a value considered high by the authors 22 .Similar values were found from the analysis of several species of the genus Rubus, including R. occidentalis and R. idaeus, when evaluating 10 SSR markers loci (GD = 0.23) 23 .On the other hand, higher genetic diversity (GD = 0.60) 24 where observed when using 12 SSR markers from 48 cultivars of R. idaeus and R. strigosus.
The observed heterozygosity was higher than expected, but a similar value was observed before 16 .The high heterozygosity detected may be due to genetic recombination from interspecific crosses and the species' selfincompatibility mechanisms 25,26 .The heterozygosity observed in the panel individuals may account for the effective population size of 174, 174, and 166, considering the unique, Mid and Moc matrix, respectively, in addition to the inbreeding values.On average, 75% of the genotypes that make up the panel have heterozygosity values greater than 0.5, which can be expected due to interspecific crossings and because they are allogamous species.Concerning the parent's heterozygosity and inbreeding, it is possible to observe that the resistant parent has greater heterozygosity than the average of the susceptible parents, and its inbreeding coefficient is lower (Table 1).A possible explanation for this result is the raspberry breeding process, which intensively selected individuals mainly based on qualitative fruit-related traits, thus, artificial selection and drift, due to the small breeding population sizes, may have decreased Ho and increased inbreeding.In general, there was no discrepancy between the results of genomic diversity and population structure generated when the markers obtained from the alignment with the reference genome of the R. idaeus, the R. occidentalis or the set of unique SNPs from the union of the two were analyzed.Probably due to the similarity between red and black raspberries 27 .
Finally, we conclude that the panel studied has great potential for improvement and further genetic studies.In this context, aiming at the next steps in the improvement program, this study will allow the best use of available genetic resources, helping in the search for materials that are best adapted to tropical soil and climate conditions and in the identification of alleles of resistance to late leaf rust in raspberry.Moreover, only one of the three markers dataset is sufficient for diversity studies.However, differences between the marker sets are expected to be observed for association genomics or genomic prediction studies.

Materials and methods
Biological material.The raspberry panel consists of 116 interspecific hybrids, four parent cultivars, and a control (see Supplementary Table.S1 online).The hybrids originated from testcross scheme crossings performed in the Department of Crop Sciences, ESALQ/USP.All raspberries cultivars used for hybridization were bought in a commercial nursery at São Bento do Sapucaí, São Paulo State, Brazil (22° 41ʹ 20ʺ S; 45° 43ʹ 51ʺ W), following all applicable agricultural and genetic resource laws regarding acquisition, transport, and management of plants.The leaves of both R. occidentalis and R. idaeus were collected from cultivars, i.e. commercially cultivated varieties and obtained from a commercial nursery.The R. occidentalis (Black raspberry) cv.Jewel was used as a tester because it possesses late leaf rust resistance alleles 17,18 and, as there is a unilateral incompatibility between the species 8 , it was also used as a female genitor.The selected cultivars from R. idaeus species ('Golden Bliss' , 'Himbo Top' and 'Salmon') were used as pollen donors.The cultivar "Heritage" was also added to the panel.However, it www.nature.com/scientificreports/Genotyping and obtaining SNPs markers.The genotyping was performed using GBS 14 in the Genetic Diversity and Improvement Laboratory of the Department of Genetics, ESALQ/USP.Leaf samples were collected, and the total genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen®), following the manufacturer's protocols.After extraction, genomic libraries were assembled following the protocol adapted from Poland et al. 30 , using two restriction enzymes: NsiI (New England BioLabs Inc.®) and MseI (New England BioLabs Inc.®).
Libraries were sequenced on a HiSeq 2500 System sequencer (Illumina, Inc), and they were aligned in Bowtie2 software 31 with two reference genomes: R. occidentalis species 15 and R. idaeus species 32 .SNPs were called using TASSEL 5.0 software 33 .A total of 275,904,265 reads were found, which resulted in 64,063 (R. occidentalis), and 64,468 (R. idaues) SNPs.Sequencing depth was calculated using VCFtools and quality control was performed to eliminate markers with the following parameters: non-biallelic markers, call rate > 90%, linkage disequilibrium r 2 > 99% and MAF > 0.05.Finally, three marker matrices were constructed for the subsequent analyses, being (i) a set of unique markers originating from the alignment with the R. occidentalis genome and with the R. idaeus genome (see Supplementary Fig. S2 online), with a total of 41.292 SNPs (Unique matrix); (ii) set originating from the alignment with the R. idaeus genome, with a total of 27,281 SNPs (Mid matrix); and (iii) originating from the alignment with the R. occidentalis genome, with 28,373 SNPs (Moc matrix).For the construction of the unique matrix, it was calculated the Pearson correlation 34 between the markers of the genome R. idaeus with the markers of the genome R. occidentalis and those with a correlation greater than 0.98 were considered coincident between the two genomes, being a copy removed from the set of R. idaeus, since the genome of R. occidentalis is more structured.The datasets generated and analyzed during the current study have been submitted to the Rosaceae database under the accession number tfGDR1071, which can be accessed at https:// www.rosac eae.org/ publi cation_ datas ets.
Analysis of genetic diversity and structure.The marker matrices were used to study the genetic relationship between the different individuals that make up the panel.To this end, the lowest frequency allele (MAF), polymorphic information content (PIC), observed heterozygosity (Ho), and genetic diversity (GD) were calculated using the snpReady package 35 .In addition, using the method of Villanueva et al. 36 , the endogamy coefficient was determined ( F ej ) using Eq. (1): where I ij k is the identity of the two alleles ( i e j ) of the individual at the SNP k with value 0 for homozygotes and 1 for heterozygotes, S the total number of SNPs and N the total number of individuals.And from the F, the effective population size ( Ne ), using Eq. ( 2): The additive genetic relationship matrix was obtained from the VanRaden 37 method, and based on it, the principal component analysis (PCA) was performed.Population structure analysis was performed using the hierarchical clustering methods by Ward's method and discriminant principal component analysis (DAPC) 38 using the adegenet package 39 .Finally, the LD ( r 2 ) among all SNPs within less than 1 Mbp on the same chromosome.Only the Moc matrix was used because the genome of the species R. occidentalis is annotated on chromosome while that of R. idaeus is on scaffolds.A graphical representation of the markers in redundancy between the two genomes was also developed using the OmicCircos package 40 .

Statement of methods.
The procedures and techniques employed in this study involving plants were rigorously executed in strict accordance with the pertinent guidelines as outlined in the Method section.These guidelines ensured the ethical and scientific integrity of the research, promoting accurate data collection and analysis while prioritizing the proper treatment of the plant specimens involved.

Figure 1 .
Figure 1.Interactions among redundant markers between R. occidentalis and R. idaeus genomes.R. occidentalis genome is annotated on chromosomes, and R. idaeus genome is annotated using scaffolds.

Figure 2 .
Figure 2. Linkage disequilibrium decay (in base pairs) of the set of markers obtained from the R. occidentalis genome.

Figure 4 .
Figure 4. Analysis by circular dendrogram of the hierarchical clustering identified in the raspberry panel, considering the matrices (a) Unique, (b) Mid, and (c) Moc, where each color represents a group.

Figure 5 .
Figure 5. Discriminant analysis of principal components (DAPC) was performed with the 120 individuals that compose the panel, considering the matrices (a) Unique, (b) Mid, and (c) Moc, the right column containing the scatterplots based on two discriminant functions with three groups identified by the k-means method and the left column containing the graphs and the cluster probability analysis of each genotype.

Figure 6 .
Figure 6.Additive relationship matrix between the genotypes that compose the panel, considering the matrices (a) Unique, (b) Mid, and (c) Moc, where the progenies are highlighted in blue, green, and pink, are JG, JS, and JT, respectively.Susceptible parents in orange and resistant in purple. https://doi.org/10.1038/s41598-023-41728-8