Construction of a breeding parent population of Populus tomentosa based on SSR genetic distance analysis

Parent selection is the core of hybrid breeding. A breeding strategy based on paternal identification of superior open-pollinated progeny from Populus tomentosa germplasm resources can significantly improve the efficiency of parental matching. However, because of factors such as pollen shedding time and pollen competitiveness, the offspring of open-pollinated families do not result from completely random mating. Although hybrid combinations reconstructed by the paternal identification method have high combining ability, this method cannot easily cover the mating combinations of all male and female individuals in the germplasm bank. In addition, the performance of the superior plants selected in open-pollinated families also affects the selection result: if their trait values are higher than the population average, the specific combining ability of the reconstructed hybrid combination may be overestimated. Solving these problems is of great significance for improving the efficiency and accuracy of selecting hybrid parents of P. tomentosa. In this study, 24 pairs of SSR (simple sequence repeat) molecular markers were used to analyze the genetic differentiation of P. tomentosa germplasm resources. The results showed that the genetic variation of the P. tomentosa population was derived mainly from individuals within provenances, indicating that high genetic diversity is preserved within provenances. Correlation analysis showed a significant positive correlation between the specific combining ability for plant height and diameter at breast height (dbh) of the 34 full-sib progeny populations and the genetic distance between the parents. Based on this correlation, the genetic distances between 18 female plants with high fertility and 68 male plants with large pollen quantity were analyzed. Fifteen female parents and 12 male parents were screened out, and 52 hybrid combinations with high specific combining ability for growth traits were predicted. Furthermore, by combining these predictions with the paternal identification of superior individual plants, we constructed a breeding parent population including 10 female parents and 5 male parents, generating 14 hybrid combinations with potentially high combining ability. The hybridization test showed that the specific combining ability for plant height and dbh of the predicted combinations was significantly higher than that of the control combinations produced by controlled pollination. Thus, genetic distance and paternal identification can be used to rapidly and efficiently construct hybrid parent combinations and breeding parent populations.


Scientific Reports | (2020) 10:18573 | https://doi.org/10.1038/s41598-020-74941-w | www.nature.com/scientificreports/
Developing a method to quickly and accurately predict hybrid parental combinations has become a problem with which breeding researchers are extremely concerned and which they are committed to solving. For the genetic improvement of forest trees, El-Kassaby et al. 1 proposed the BWB (breeding without breeding) strategy, which uses tree germplasm with a known pedigree to construct a breeding population. This method requires fewer resources than controlled pollination in traditional breeding and has been used to reconstruct the pedigree of open-pollinated families 1,2 . However, the BWB strategy has a clear sampling error deficiency: the performance of the superior individuals selected in the plantation affects the selection results. If the phenotypic value of the superior plants is higher than the population average, the specific combining ability of the reconstructed hybrid combinations will be overestimated; conversely, if it is lower than the population average, the specific combining ability will be underestimated. In addition, the generation of open-pollinated families is often affected by many factors, such as paternal pollen competitiveness, flowering synchrony, parental location in the stand, and wind direction. Mating is therefore not completely random, and certain parents with high combining ability will not have the opportunity to mate and produce offspring. Thus, the hybrid combinations with high combining ability predicted by paternal identification are not comprehensive.
With the development of molecular marker technology and of techniques for determining the genetic diversity and genetic differentiation of species provenances, new methods for indirectly predicting combining ability, and thereby improving breeding efficiency, have been developed. In crops, Wang et al. 3 demonstrated that genetic distance can be used as a basis for selecting superior rice combinations. Phumichai et al. 4 showed that genetic distance was significantly positively correlated with the specific combining ability for maize yield. Jovan et al. 5 reported a significant positive correlation between genetic distance and the specific combining ability for maize economic traits. In addition, Tian et al. 6 used molecular markers to analyze the correlation between genetic distance and Brassica napus economic traits, and the results showed that there was a significant positive correlation between the two indexes. However, there are exceptions. Perenzin et al. 7 found that genetic distance was correlated with heterosis in wheat, but the correlation coefficient was too small to predict heterosis. Dong et al. 8 analyzed the relationship between genetic distance and heterosis in Pinus massoniana and found that genetic distance can be used to predict combining ability and heterosis within a certain range of genetic distance. These differences in the relationship between genetic distance and combining ability are related to the research material and the type of marker applied. Not all molecular markers are suitable for estimating the combining ability between parents, and the correlation between genetic distance and combining ability depends closely on the species studied. Thus, does a positive correlation exist between genetic distance and combining ability in P. tomentosa? Can genetic distance be used to predict hybrid parents with high combining ability and thereby make up for the deficiencies of a parent population constructed by paternal identification of open-pollinated superior plants?
In this study, we analyzed seedling height and diameter at breast height (dbh) in 34 P. tomentosa hybrid combinations and determined the correlation between SSR-based genetic distance and the specific combining ability of 19 parents derived from 5 provenances. We discuss the possibility of using genetic distance to predict hybrid combinations with high combining ability, and we compare the results of paternal identification of open-pollinated superior individuals with the primary breeding parent population constructed from genetic distance. Constructing a more reliable breeding parent population with high combining ability provides a theoretical reference for reducing the arbitrariness of parent matching and for predicting high combining ability combinations of P. tomentosa.

Materials and methods
Materials. From 1983 to 1984, the P. tomentosa Research Institute of Beijing Forestry University (Beijing, China) organized the provincial P. tomentosa research collaboration group to carry out a preliminary resource survey. Across the P. tomentosa distribution area of 1 million km² in the Huanghe-Huaihe-Haihe river basin (30-40° N, 105-125° E), covering 100 counties, 1047 high-quality P. tomentosa trees were selected. In 1986, the germplasm resource bank of P. tomentosa was established in Guan Xian County, Shandong Province. At present, 441 genotypes of excellent diploid trees from Beijing, Hebei, Shandong, Henan, Shanxi, Shaanxi, Gansu, Anhui, and Jiangsu provinces are preserved in the germplasm bank; 18 female plants have been screened for good fertility and high combining ability 9 , and 68 male plants with large pollen quantities were also identified 10 Table S1) were constructed by cross-matching of test lines, and 980 seedlings were obtained. Furthermore, 3984 seedlings from eight full-sib hybrid progeny populations, used for hybridization verification, were planted in the nursery of Guan Xian County, Liaocheng City, Shandong Province.
DNA extraction and SSR analysis. Total DNA was extracted using a plant genomic DNA extraction kit (Tiangen Biotech Co. Ltd, Beijing, China). Following Schuelke's method 11 , three types of primers were used: forward primers with a 5′-terminal M13 sequence (5′-TGT AAA ACG ACG GCC AGT-3′), common reverse primers, and fluorescently labeled (ROX, FAM, TAMRA, or HEX) M13 primers. The specific PCR procedure is as follows:  14 for all clones of the P. tomentosa populations. These parameters can ideally reflect the genetic diversity of the population as a whole and of its sub-regions. The F-statistics ($F_{IS}$, $F_{IT}$, $F_{ST}$) 15,16 were estimated with FSTAT 2.9.3 software 17 , where the genetic differentiation coefficient ($F_{ST}$) reflects the degree of genetic differentiation among populations. The fixation indices $F_{IT}$ and $F_{IS}$ represent the degree of inbreeding at the total population level and among individuals within each subpopulation, respectively 18 . The estimated allele frequencies were entered into GenAlEx version 6.5 19 to compute the average across markers of the standardized genetic differentiation measure (Gst) proposed and recommended for SSR data by Hedrick 20 .
Polymorphism information content (PIC) was calculated using PIC-Calc version 0.6 software to evaluate genetic diversity 21 . Rousset's genetic distance model, $F_{ST}/(1 - F_{ST})$, was used to estimate the genetic distance between individuals within provenances 22 . Geographic distances (km) between provenances and between individuals within provenances were calculated from the latitude and longitude of the sampling points using Vincenty's formula (https://www.movable-type.co.uk/scripts/latlongvincenty.html). In addition, the Distance Web Service (version 3.23) was used to perform a Mantel correlation test between genetic distance and geographic distance 23,24 .
Estimation of genetic parameters. Height and diameter at breast height data for 34 full-sib families (Table 1) were obtained by measurement over three consecutive years. SPSS software was used to calculate the relevant parameters of all phenotypic traits. Genetic parameters were estimated using ASReml-R 4 software (VSN International, Hemel Hempstead, UK) 25 , and Pearson correlation analysis was performed using OriginPro version 9.0. The full-sib families were analyzed with the observed value of a single plant as the statistical unit. Tree height and diameter at breast height data were analyzed by restricted maximum likelihood with the following mixed linear model: $y_{ij} = \mu + M_i + F_j + MF_{ij} + e_{ij}$, where $y_{ij}$ is the observation for the progeny of the i-th male parent and the j-th female parent, $\mu$ is the overall mean, and $e_{ij}$ is the residual effect.
The random effects of genetic variance include:
• $F_j$, the random effect of the general combining ability of female parent j, with $E(F_j) = 0$ and $Var(F_j) = \sigma^2_f$;
• $M_i$, the random effect of the general combining ability of male parent i, with $E(M_i) = 0$ and $Var(M_i) = \sigma^2_m$;
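For reference, the mixed model can be written out with one variance component per random term. This is only a sketch under stated assumptions: the indexing follows the model equation (i for the male parent, j for the female parent), the interaction term is read in the usual way as the specific combining ability of the cross, and the symbols $\sigma^2_{mf}$ and $\sigma^2_e$ are assumed names that do not appear in the text above.

```latex
% Full-sib family model; \sigma^2_{mf} and \sigma^2_e are assumed symbols.
y_{ij} = \mu + M_i + F_j + MF_{ij} + e_{ij},
\qquad
M_i \sim (0,\ \sigma^2_m), \quad
F_j \sim (0,\ \sigma^2_f), \quad
MF_{ij} \sim (0,\ \sigma^2_{mf}), \quad
e_{ij} \sim (0,\ \sigma^2_e)
% M_i, F_j : general combining ability of male parent i and female parent j
% MF_{ij}  : male-by-female interaction (specific combining ability of cross i x j)
```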

Results
Genetic differentiation among provenances of P. tomentosa germplasm resources. Genetic differentiation was analyzed at 24 polymorphic SSR loci (Supplementary Table S2). Genetic differentiation among provenances was significant (P < 0.01). The PIC values varied from 0.232 (Jiangsu) to 0.461 (Henan), with an average of 0.332, indicating a relatively low level of allelic diversity in the P. tomentosa germplasm resource bank (Supplementary Table S3).
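To make the PIC values above easier to interpret, the following minimal sketch computes PIC for a single locus from its allele frequencies using the widely used formula of Botstein et al. (1980). Whether PIC-Calc applies exactly this formula is an assumption, and the function name `pic` and the example frequencies are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content of one locus.

    Assumed formula (Botstein et al. 1980):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two random alleles are identical.
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction for matings between identical heterozygotes.
    cross_term = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
print(round(pic([0.5, 0.5]), 3))
```

A locus with a single fixed allele gives a PIC of 0, and the value rises as more alleles occur at more even frequencies.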
In the genetic analysis of the germplasm resources from the nine provenances, the mean fixation index within provenances ($F_{IS}$) was negative in all nine provenances, ranging from −1.0 to −0.237 (Supplementary Table S3). The genetic differentiation ranged from 0.031 to 0.036. Genetic differentiation between pairs of provenances was low, varying from −0.007 to 0.046, indicating that genetic diversity is distributed mainly within provenances (Table 1). Differences among individuals within provenances were the main source of genetic variation, indicating that higher genetic diversity was preserved within provenances.
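The sign of the within-provenance fixation index can be read from its standard definition in terms of heterozygosity. The symbols $H_O$ (observed heterozygosity) and $H_S$ (expected heterozygosity within a provenance) are introduced here only for illustration and are not used elsewhere in the paper.

```latex
F_{IS} = 1 - \frac{H_O}{H_S}
% F_IS < 0 whenever H_O > H_S, i.e. there is an excess of heterozygotes.
% Limiting case: if every individual is heterozygous at a two-allele locus
% with equal allele frequencies, then H_O = 1 and H_S = 0.5,
% so F_IS = 1 - 1/0.5 = -1.
```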
Analysis of combining ability of plant height and diameter at breast height of full-sib hybrid combinations. The specific combining ability for tree height and diameter at breast height of the 34 hybrid combinations was analyzed (Table 2). The results showed that the maternal or paternal effect of different provenances had no significant effect on the height and diameter at breast height of the progenies. In contrast, the differences in height and diameter at breast height among the hybrid combinations were very significant (P < 0.01), indicating that the parental combination has a very significant effect on the specific combining ability for height and diameter at breast height.
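The distinction between general and specific combining ability can be illustrated with a simple decomposition of family means. The sketch below is not the REML analysis used in the study: it assumes a complete, balanced female × male table of family means, and the function name `combining_ability` and the example values are hypothetical.

```python
def combining_ability(cell_means):
    """Decompose a complete female x male table of family means.

    cell_means[j][i] is the mean of the family from female j and male i.
    Returns (gca_female, gca_male, sca), where
    sca[j][i] = cell mean - female mean - male mean + grand mean.
    """
    n_f = len(cell_means)
    n_m = len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_f * n_m)
    female_means = [sum(row) / n_m for row in cell_means]
    male_means = [sum(cell_means[j][i] for j in range(n_f)) / n_f
                  for i in range(n_m)]
    # General combining ability: deviation of each parent's mean from the grand mean.
    gca_female = [m - grand for m in female_means]
    gca_male = [m - grand for m in male_means]
    # Specific combining ability: what is left after removing both parental effects.
    sca = [[cell_means[j][i] - female_means[j] - male_means[i] + grand
            for i in range(n_m)]
           for j in range(n_f)]
    return gca_female, gca_male, sca


# Hypothetical height means (cm) for 2 females x 2 males.
gca_f, gca_m, sca = combining_ability([[10.0, 12.0], [14.0, 20.0]])
print(gca_f, gca_m, sca)
```

In this decomposition a cross can have a high specific combining ability even when one of its parents has a low general combining ability, which is the pattern discussed for individual combinations in this section.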
The results showed that, among the 34 hybrid combinations, the combination 3-85-1 × 5088 had the largest specific combining ability effect value for height, whereas the female parent 3-85-1 had the lowest general combining ability effect value for height (Fig. 2i). The combination 5019 × 5088 had the smallest specific combining ability effect value for height (Fig. 2c), while the general combining ability effect value for height of the female parent 5019 was relatively large (Fig. 2i); no correlation with the specific combining ability was observed (Fig. 2a). For diameter at breast height, the combination 5066 × 5088 had the largest specific combining ability, and 5066 also had the largest general combining ability effect value (Fig. 2f,i). The combination 5074 × 5088 had the smallest specific combining ability for diameter at breast height, and the general combining ability effect value of 5074 was close to the lowest (Fig. 2f,i).
Pearson correlations between the general combining ability and the specific combining ability for height and diameter at breast height of parents from different provenances were analyzed (Fig. 2a,b). For height, the general combining ability of parents from different provenances was positively correlated with the specific combining ability: the correlation coefficients for the male parents 3-24-3 and 5088 were 0.1821 and 0.235, respectively, but neither reached a very significant level. For diameter at breast height, however, the correlation trends differed between the two male parents. For the male parent 3-24-3, the general combining ability for diameter at breast height was negatively correlated with the specific combining ability (r = −0.029), and the correlation did not reach a significant level.
For the male parent 5088, the general combining ability for diameter at breast height was positively correlated with the specific combining ability (r = 0.457), but this correlation also did not reach a significant level.
Table S4; Fig. 2d,e), and the specific combining ability effects for height and diameter at breast height of the hybrid combinations and the corresponding inter-individual geographical distances were compared by the Mantel test (Fig. 2g,h). For height, the correlation with geographical distance was positive but not significant (r = 0.106, P = 0.5516). For diameter at breast height, the correlation with geographical distance was also not significant (r = −0.008, P = 0.9625). Thus, geographical distance cannot be used to predict the specific combining ability. In contrast, a correlation analysis between the specific combining ability for height and diameter at breast height and the corresponding genetic distance showed that both were positively correlated with genetic distance (P = 0.0024 and P = 0.0156, respectively). Thus, genetic distance had a significant effect on the specific combining ability for both height and diameter at breast height.
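The Mantel test referred to above compares two distance matrices by permuting the individuals of one matrix and recomputing the correlation. The following is a minimal sketch of that general idea, not a reproduction of the web service used in the study; the function names, the two-sided test, the number of permutations, and the example data are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p) for two symmetric distance matrices (two-sided test)."""
    identity = list(range(len(dist_a)))
    a = upper_triangle(dist_a, identity)
    r_obs = pearson(a, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute individuals, not single matrix cells
        r_perm = pearson(a, upper_triangle(dist_b, order))
        if abs(r_perm) >= abs(r_obs) - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical example: five individuals placed along a line.
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[2.0 * d for d in row] for row in geo]  # perfectly proportional distances
r, p = mantel(geo, gen)
print(r, p)
```

Permuting whole rows and columns together keeps the dependence among distances that share an individual, which is why a Mantel test is used instead of an ordinary correlation test on the matrix entries.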
Prediction of hybrid parental population with high specific combining ability of P. tomentosa. The relationship between the specific combining ability and the genetic distance was verified: genetic distance was positively correlated with the specific combining ability, which indicates that genetic distance can be used to predict breeding parents. Because of the serious abortion problem in P. tomentosa, its reproductive ability is weak, and many female parents are not suitable for hybridization by artificial pollination. Therefore, the parents selected should be males and females that have adequate fertility and are separated by a large genetic distance. Based on these principles and previous research, 18 females with high fertility and high general combining ability and 68 male trees with large pollen quantities were selected from the excellent tree germplasm resources of P. tomentosa to construct a primary breeding parent population with potentially high specific combining ability. The parental genetic distance of the 1224 possible hybrid combinations (18 × 68; Supplementary Table S5) ranged from 0.032 to 0.984, with an average of 0.424 (Table 3). A genetic distance greater than 0.710 (i.e., the average genetic distance + 2 standard deviations) was used as the standard for constructing the primary breeding parent population. This yielded 52 hybrid combinations with a large genetic distance, involving 15 female parents and 12 male parents (Table 3).
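The selection rule described above (keep combinations whose parental genetic distance exceeds the mean plus two standard deviations) can be sketched as follows. The function name, the dictionary layout, the parent labels, and the use of the sample standard deviation are assumptions made for illustration; the distances in the example are invented and are not the study's data.

```python
import statistics


def select_distant_pairs(distances, n_sd=2.0):
    """Select female x male pairs with a large genetic distance.

    distances: dict mapping (female_id, male_id) -> genetic distance.
    Returns (threshold, selected), where threshold = mean + n_sd * SD.
    """
    values = list(distances.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, the paper does not say
    threshold = mean + n_sd * sd
    selected = {pair: d for pair, d in distances.items() if d > threshold}
    return threshold, selected


# Hypothetical distances: twenty ordinary pairs and one distant pair.
example = {("F%d" % k, "M1"): 0.4 for k in range(20)}
example[("F0", "M2")] = 0.9
threshold, selected = select_distant_pairs(example)
print(round(threshold, 3), sorted(selected))
```

With the values reported above, a mean of 0.424 and a cutoff of 0.710 imply a standard deviation of about 0.143, since (0.710 − 0.424) / 2 = 0.143.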
Table 3. Prediction of the parent group with high specific combining ability in P. tomentosa. ♀ represents the female parent, ♂ represents the male parent, and SD represents standard deviation.
Through the construction of the full-sib hybrid progeny populations, a total of 34,020 hybrid seeds were obtained, from which 5415 full-sib hybrid seedlings emerged and 3984 seedlings survived. The emergence rate was 15.9%, and the survival rate was 73.6% (Supplementary Table S9).
Height and diameter at breast height were measured for the full-sib hybrid progeny population (Table 5).
Table 4. Statistical table of hybrid combinations with high specific combining ability based on parents' genetic distance and half-sib progeny paternal identification. ○ represents a hybrid combination with high specific combining ability predicted from the genetic distance between parents. □ represents a hybrid combination based on the identification of the male parent. ○/□ represents a hybrid combination predicted by both methods.
Table 5. Analysis of variation and specific combining ability of seedling height and diameter at breast height for different P. tomentosa hybrid combinations. Note: Different lowercase letters represent differences at the 0.05 significance level; CV = coefficient of variation; SCA = specific combining ability.
The combining ability analysis of growth traits with ASReml-R 4 software showed that the specific combining ability for height and diameter at breast height of the hybrid combinations T-F-14 × T-M-45 and T-F-14 × T-M-43 was significantly higher than that of the control combination with the same female parent. In the hybrid combinations with T-F-18 and T-F-15 as the female parent, the specific combining ability for plant height and diameter at breast height was also significantly higher than that of the control hybrid combination. There were large variations in height and diameter at breast height among the hybrid progeny within each hybrid combination: the coefficient of variation ranged from 23.1 to 38.7% for height and from 22.8 to 39.1% for diameter at breast height (Table 5) (Table 4). As displayed in Table 5, the hybrid combinations predicted jointly by the paternal identification and genetic distance methods had higher seedling height, diameter at breast height, and specific combining ability values than the other hybrid combinations with the same female parent.
However, the height, diameter at breast height, and specific combining ability of the hybrid combinations T-F-15 × T-M-41 and T-F-18 × T-M-43, which were predicted by paternal identification alone, were lower than those of the combinations predicted by both paternal identification and genetic distance. The height and diameter at breast height of the hybrid combination T-F-15 × T-M-41 were higher than those of the T-F-15 × T-M-2 hybrid combination. T-F-15 × T-M-2 was the dominant hybrid combination, and it was not predicted by the combination of paternal identification and genetic distance. The paternal identification method thus has a certain ability to predict strongly dominant hybrid combinations, although joint prediction by paternal identification and genetic distance is more effective than paternal identification alone.

Discussion
Genetic variation analysis among provenances. In this study, 24 pairs of SSR primers were used in the genetic analysis of 441 clones of P. tomentosa, excluding natural triploids. The results show an excess of heterozygosity within provenances ($F_{IS}$ = −0.540) and low genetic differentiation among P. tomentosa provenances ($F_{ST}$ = 0.037), which may be due to the long-term cultivation and introduction of P. tomentosa. The same pattern of genetic variation was also found in a close relative of P. tomentosa, Populus tremula 29-31 . To carry out the genetic improvement of P. tomentosa, individual differences within the population should be considered, and hybrid parents with large genetic distances should be selected to obtain high heterosis. A correlation analysis between individual genetic distance and geographical distance revealed that geographic distance is not important. Furthermore, the 34 full-sib hybrid families, derived from parents of the same or different provenances, were analyzed. The results show that the genetic distance between the parents is significantly positively correlated with the specific combining ability for growth traits of the full-sib families, and a larger genetic distance corresponds to stronger heterosis. Therefore, the analysis of the correlation between genetic distance and specific combining ability in P. tomentosa has important guiding significance for the selection of hybrid combinations. In the P. tomentosa population, a breeding strategy in which combinations with strong heterosis are selected based on the parents' genetic distance can significantly narrow the range of candidate parents, thus saving breeding time and a large amount of human and material resources; moreover, it lays the foundation for the rapid and efficient construction of hybrid parent groups with high combining ability.
Parent selection for breeding based on genetic analyses. The general combining ability reflects the average performance of the progenies resulting from several mating combinations in a mating group; it reflects the parental additive gene effects, which can be transmitted to the offspring in the mating population. The specific combining ability is the deviation between the performance of a particular mating combination and the performance predicted from the general combining ability of its parents. It reflects non-additive gene effects such as dominance and epistasis, which are expressed only when specific genes are combined and cannot be passed on to the offspring 32 . In this study, no significant correlation was observed between the general combining ability and the specific combining ability for height and diameter at breast height. A similar result has been reported in recent research 33,34 . The relationship between the specific combining ability and the genetic distance was verified, and the genetic distance in P. tomentosa was significantly positively correlated with the specific combining ability, indicating that breeding parents can be predicted from genetic distance. However, Tian et al. used genetic distance to predict heterosis and combining ability for rapeseed economic traits and found no significant correlation between the two indices 6 , indicating that genetic distance is not always capable of predicting heterosis and combining ability 35-40 . This finding may be related to the lack of a relationship between the genes determining the measured trait and the molecular markers used to estimate the genetic distance, to differences between the individual genomes of the offspring, or to interactions between genes and the environment 41-44 , thus illustrating the complexity of quality traits 36 .
Based on the genetic distance analysis, a primary breeding parent population with potentially high specific combining ability was constructed, and hybrid combinations with high specific combining ability were predicted; however, only 14 hybrid combinations were consistent with the results of the paternal identification of open-pollinated individuals. The reason may be related to asynchronous pollen shedding among male plants or differences in pollen competitiveness, meaning that the parent population identified from open-pollinated superior plants was not obtained under completely random mating. Moreover, 35 of the hybrid combinations reconstructed by paternal identification differed from those with high specific combining ability predicted by genetic distance, which may be due to the influence of random sampling error: the selected superior plants were the most prominent individuals in the progeny population, whereas the average value of the growth traits of the full-sib population was relatively low. This may also explain why the male plant T-M-31 can be combined with 13 female plants of high fertility and high combining ability as hybrid combinations with high specific combining ability according to genetic distance prediction but not according to the paternal identification method 45 . Here, we included T-M-31 in the breeding parent population; the final breeding parent population included 10 female parents and 5 male parents. The full-sib progeny test verified that the height and diameter at breast height of the predicted hybrid combinations with high specific combining ability were significantly higher than those of the control hybrid combinations with the same female parent. The results showed that genetic distance together with the paternal identification results of half-sib progeny can be used to quickly and effectively construct breeding parent populations and confirm breeding parents.
Forest tree breeding parent matching strategy. In forest genetic improvement, BWB strategies have effectively improved breeding efficiency 2 . However, the BWB breeding strategy has obvious shortcomings, such as sampling error and non-random mating, which may result in the selection of superior trees that do not represent the overall performance of the progeny population of elite hybrid combinations. This study combines genetic distance and the BWB breeding strategy to improve the accuracy and effectiveness of parental selection and breeding parent population construction. On this basis, we propose a breeding parent selection strategy aimed at high specific combining ability. For P. tomentosa and similar species, the germplasm resource bank is first used to observe and screen parents with good fertility on the basis of pollen and seed formation. Molecular markers are then used to analyze the genetic distance among the fertile parents. Based on a genetic distance criterion, parental combinations with potentially high specific combining ability are screened to construct the primary breeding parent population. Next, open-pollinated families are established from the female parents of the primary breeding parent population, superior offspring trees are screened from these families, and the paternity of the superior open-pollinated individuals is identified with molecular markers. Finally, the primary breeding parent population and the paternal identification results are combined to select parents that form hybrid combinations with high specific combining ability.