Application of Pedimap: a pedigree visualization tool to facilitate the decisioning of rice breeding in Sri Lanka

The development of rice cultivars with desirable traits is essential, and decision-making is a crucial step in rice breeding programs. Breeders can make efficient and pragmatic decisions if an organized pedigree visualization platform is available for the accessions and cultivars in rice breeding germplasm. In the present study, the available data for all the rice varieties released by the Rice Research and Development Institute, Sri Lanka, and for the related landraces and genotypes were arranged in Pedimap, a pedigree visualization tool. Pedimap can display pedigree relationships together with phenotypic and molecular data. The identity-by-descent probabilities were calculated using FlexQTL software and included in the Pedimap database. Parent selection based on variation in phenotypic traits, selection of marker alleles for molecular breeding, and detection of the founders of genetic effects can be conducted swiftly using Pedimap. However, the value of Pedimap for making breeding decisions depends on the availability of data for traits, markers, and genomic sequences. Thus, it is imperative to characterize the breeding germplasm using standard phenomic and genomic characterization procedures before the data are organized into Pedimap. Thereby, breeding programs worldwide can benefit from each other to produce improved varieties that meet global challenges.

The decision-making process in a breeding program is crucial for successful outcomes. The formulation of decisions before breeding is a multi-step process that consists of the identification of breeding priorities, determination of the genetics and estimated breeding values (EBV) of target traits, and employment of pre-breeding methods if required. The economic and technical feasibility, number of parents for crosses, number of selfing and outcrossing cycles, length of the breeding program/cycles, and identification of the selection methods must also be assessed 22 . In the decisioning process, initially, the market trends based on consumer and other stakeholder preferences must be recognized 23 . Subsequently, the novelty and the uniqueness of the breeding objective must be assessed before the execution of the breeding program 22 .
The selection of suitable varieties or individual plants as parents and the determination of the selection methods are the two most critical aspects in planning breeding programs 24 . The parental selection depends on the number of prioritized traits for breeding. When multiple characteristics are to be introgressed, the breeders require a prioritized order of parents for stepwise crossing and selection 25,26 . The decision-making process in breeding is entirely based on the available information on phenotypes, genotypes, pedigree, EBVs of key traits, available budget, field and greenhouse space, desired time-to-market, etc. Although data are indispensable for decision-making in breeding, haphazardly collected information is of less value to breeders. In many conventional breeding programs, most of the data are recorded in field notebooks and stored in the breeding stations, while very little information is available in computerized databases. If an organized database containing all the essential information on the released rice varieties and the parental genotypes used in breeding were available, decisions could be made easily.
The construction of a database with all the necessary information on varieties and their parents promotes data sharing, mining, visualization, and retrieval 27 . Pedimap is a pedigree visualization software package. The data needed can be imported into Pedimap from FlexQTL or, with a custom script, from any other database program. Pedimap is used by many contemporary plant genetics and breeding programs worldwide. As stated in Voorrips et al. 28 , Pedimap can be used to record and utilize breeding history. Pedimap illustrates the available phenotypic and genetic data through pedigrees. All the information, including parentage, qualitative and quantitative data, marker alleles/genotypes, and the calculated identity-by-descent (IBD) probabilities, can be presented in Pedimap. Currently, breeders prefer pedigree visualization tools like Pedimap because they give quick access to a large pool of genetic and phenotypic data and generate the pedigrees that are essential in making breeding decisions.
In Sri Lanka, the Rice Research and Development Institute (RRDI) is the sole organization conducting rice breeding programs for national needs. Therefore, in the present study, we report an attempt to organize the information on the released varieties and the parental genotypes of RRDI breeding programs as a Pedimap-based database, which is a valuable step toward making accurate breeding decisions and speeding up the release of novel varieties.

Materials and methods
Data curation. The data were collected from RRDI, Sri Lanka, and classified under three main categories, namely pedigree history, phenotypic data, and molecular data on rice varieties/landraces/genotypes (hereinafter collectively referred to as cultivars). The male and female parents and the order of crosses were recorded under pedigree history. The following were recorded under phenotypic data (Supplementary Table S1 online): the average yield of the rice plants; the maturity period in the two main rice growing seasons of Sri Lanka, Yala and Maha, which follow the two seasons of monsoonal rains (the Yala season is generally drier 29 ); plant height; basal leaf sheath color and additional color patterns; recommended type of land; level of phosphorus deficiency tolerance; amount of brown rice recovery; milling recovery; head rice recovery; amylose content; gelatinization temperature; the weight of 1,000 grains; shape of the grain; pericarp color; the weight of a kg; the color of the buff coat; and resistance/susceptibility to pests and diseases, namely brown planthopper (BPH), bacterial leaf blight, and rice blast disease. The available DNA marker alleles, marker positions in the linkage map, and allelic scores were entered under molecular data [30][31][32][33] (Supplementary Table S2 online).
Pedimap procedure. A Pedimap input data file is created in MS Excel (2019), and the data file is exported as a tab-delimited text (.txt) file (Supplementary Table S3 online). The input file contains four main subdivisions: header, pedigree, marker data, and IBD probability section (Fig. 1). The header consists of five essential elements and one additional element. The name of the population and symbols for unavailable or missing data, null homozygous alleles, and confirmed null alleles are entered to the pedigree section, as shown in Fig. 1a. The name of the cultivar must be a string with text or numerical values without spaces.
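The export step described above can be sketched in Python. This is a minimal illustration only: it writes a generic tab-delimited pedigree table and enforces the no-spaces rule for cultivar names, and it does not reproduce the actual Pedimap header keywords or section layout. The column names, the missing-data symbol `*`, and the cultivar names `ParentA`, `ParentB`, and `ProgenyC` are hypothetical placeholders.

```python
import csv
import io

# Hypothetical missing-data symbol; in a real file it is whatever the header declares.
MISSING = "*"


def validate_name(name):
    """Return the name unchanged, or raise if it is empty or contains whitespace."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"cultivar name must be non-empty and contain no spaces: {name!r}")
    return name


def to_tab_delimited(rows, columns):
    """Serialize a list of dict rows as tab-delimited text, one line per cultivar."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        validate_name(row[columns[0]])  # the first column holds the cultivar name
        writer.writerow([row.get(column, MISSING) for column in columns])
    return buffer.getvalue()


# Hypothetical column layout: name, female parent, male parent, then one trait.
COLUMNS = ["NAME", "MOTHER", "FATHER", "YIELD"]
ROWS = [
    {"NAME": "ParentA", "YIELD": 4.5},
    {"NAME": "ParentB", "YIELD": 5.2},
    {"NAME": "ProgenyC", "MOTHER": "ParentA", "FATHER": "ParentB"},
]
TABLE = to_tab_delimited(ROWS, COLUMNS)
print(TABLE)
```

Unknown parents and unrecorded traits are filled with the missing-data symbol, so every row keeps the same number of columns.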
Next to the header, the pedigree section is entered, as shown in Fig. 1b. The first column denotes the name of the variety or landrace, and the second and third columns are reserved for maternity and paternity information, respectively. The numbers and strings can be included to represent the phenotypes in the first three columns. From the fourth column onwards, any desirable quantitative or qualitative trait values can be entered. All the collected phenotypic data are introduced, as shown in Fig. 1b. The third section of the input data file is for marker information. The linkage group of the DNA marker and the marker positions in the linkage map are entered, as shown in Fig. 1b. If there is more than one linkage group, all the linkage group maps should be defined successively before entering the allelic scores. The detailed data for each DNA marker can be inserted after the map positions are defined. The number of columns used to enter allelic scores should correspond to the ploidy level. The fourth section is for IBD probability values (Fig. 1c). The IBD probabilities cannot be

Scientific Reports | (2020) 10:14255 | https://doi.org/10.1038/s41598-020-71260-y www.nature.com/scientificreports/

where $H^2$ = heritability of the trait; $P$ = trait value of the individual or cultivar; $\bar{P}$ = population mean value of the trait; $P - \bar{P}$ = phenotypic superiority.
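The symbols defined above match the usual single-record estimate of a breeding value, EBV = H^2 × (P − P̄). The equation itself is not recoverable from this text, so the form used in the sketch below is an assumption rather than the authors' stated formula. The heritability, trait values, and cultivar names are hypothetical.

```python
def estimated_breeding_value(heritability, trait_value, population_mean):
    """Assumed form: EBV = H^2 * (P - P_bar), i.e. heritability times phenotypic superiority."""
    return heritability * (trait_value - population_mean)


def ebv_table(heritability, trait_values):
    """Compute the EBV of every cultivar relative to the mean of the supplied population."""
    population_mean = sum(trait_values.values()) / len(trait_values)
    return {
        name: estimated_breeding_value(heritability, value, population_mean)
        for name, value in trait_values.items()
    }


# Hypothetical yields (t/ha) and heritability, for illustration only.
YIELDS = {"CultivarA": 6.0, "CultivarB": 4.0, "CultivarC": 5.0}
EBVS = ebv_table(0.5, YIELDS)

# Rank cultivars from the highest to the lowest EBV.
RANKED = sorted(EBVS, key=EBVS.get, reverse=True)
print(EBVS, RANKED)
```

A cultivar whose trait value equals the population mean has an EBV of zero under this form, and the ranking depends only on phenotypic superiority when a single heritability is used for the whole population.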

Results
Plant genetics and breeding programs worldwide use Pedimap as the platform for maintaining breeding databases and for pedigree visualization. In the RosBREED project 41 , parental and progeny identification, tracing of founders, and calculation of allelic representation are conducted using Pedimap. The pedigree display of Pedimap is used to plan crosses in the Rosaceae research community 42,43 and the HIDRAS project 44 , and to visualize Arabidopsis thaliana crosses 45 . Selecting parentage, sketching out crossing schemes, estimating the probability of allelic segregation, and choosing compatible molecular markers for marker-assisted breeding (MAB) can be achieved using Pedimap 28 . The use of Pedimap as a pedigree visualization tool for the decision-making process in rice breeding is described using three examples (Table 1).
Example 1: Selecting parents for higher yield, BPH tolerance, short duration and white pericarp with diverse grain shapes. The Pedimap database of rice breeding germplasm in Sri Lanka has a total of 224 input cultivars. Of these, 36 are intermediate genotypes such as F1 and F2 that were not reported, but we included them to complete the pedigree in Pedimap. Thus, the database has a total of 188 rice cultivars and accessions with known identities and records (Supplementary Table S1 online and Supplementary Fig. 1 online). In Example 1, we considered a scheme to select cultivars as parents with the parameters given in Table 1 for white pericarp, yield, BPH resistance, maturity period, and grain shape. These thresholds defined a subpopulation of 26 cultivars (Fig. 2). The variation in yield is given in Fig. 2a. Using the color shading, the breeder can select the parents required for crossing to obtain higher yield levels. However, as shown in Fig. 2b, only three cultivars show complete resistance to BPH. If the breeder plans to introgress complete BPH resistance into the novel varieties, only Bg250, At307, and At306 are available as sources of resistance. Figure 2c displays the variation in the maturity period. The breeder can choose the parents depending on the maturity period intended for the novel varieties. Example 1 was exclusively planned to breed for white pericarp. However, grain shape is also a significant quality trait for a variety to succeed in the market. Figure 2d shows the variation in grain shape on which the breeder can base the selection. If we consider all the traits and select At307 as a parent based on the pedigree visualization in Pedimap, At307 can provide the genetic basis for high yield, complete resistance to BPH, approximately three months to maturity, and intermediate-bold shaped grains.
If Bg450 was selected, the yield is still in the higher range with moderate
Example 2: Selecting parents for high/high-intermediate amylose content, higher yield, short duration, and resistance to blast disease. In Example 2, we considered a scheme to select cultivars/accessions as parents with the parameters given in Table 1 for high/high-intermediate amylose content, higher yield, short duration, and resistance to blast disease. These thresholds defined a subpopulation of 37 cultivars/accessions (Fig. 3). The breeder can select the high yielding, short-duration, and blast-resistant cultivars as parents from pedigrees visualized in Fig. 3a-c, respectively. The high, high-intermediate, and intermediate amylose contents are depicted in the pedigree given in Fig. 3d. Only Bw351, At307, Bg407H, At308, and Bg252 show the complete resistance to blast (Fig. 3c). However, At307 is the most promising parent with high yield (Fig. 3a), short duration (Fig. 3b), and high amylose content (Fig. 3d) along with complete resistance to blast (Fig. 3c). Also, Bg407H is the highest yielding (Fig. 3a), blast-resistant (Fig. 3c), and high in amylose content (Fig. 3d). However, Bg407H is a long duration variety compared to At307. Therefore, the breeder may plan to cross At307 and Bg407H to accomplish the breeding objective of Example 2.
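The threshold-based definition of a subpopulation used in Examples 1 and 2 can be sketched as a simple filter over trait records. The sketch below is illustrative only: the thresholds in Table 1 are not reproduced in this text, so the cut-off values, the trait encodings, and the cultivars `Cv1` to `Cv4` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cultivar:
    name: str
    yield_t_ha: float   # mean yield
    maturity_days: int  # maturity period
    blast: str          # hypothetical codes: "R" resistant, "MR" moderate, "S" susceptible
    amylose: str        # "high", "high-intermediate", "intermediate", ...


def select_subpopulation(cultivars, min_yield, max_maturity, blast_classes, amylose_classes):
    """Keep only cultivars that satisfy every threshold at once."""
    return [
        c for c in cultivars
        if c.yield_t_ha >= min_yield
        and c.maturity_days <= max_maturity
        and c.blast in blast_classes
        and c.amylose in amylose_classes
    ]


# Hypothetical records, not taken from the RRDI database.
GERMPLASM = [
    Cultivar("Cv1", 5.5, 100, "R", "high"),
    Cultivar("Cv2", 6.0, 120, "R", "high"),                # fails the maturity threshold
    Cultivar("Cv3", 5.1, 95, "MR", "high-intermediate"),   # fails the blast threshold
    Cultivar("Cv4", 5.0, 105, "R", "high-intermediate"),   # passes exactly on the boundaries
]

SELECTED = select_subpopulation(
    GERMPLASM,
    min_yield=5.0,
    max_maturity=105,
    blast_classes={"R"},
    amylose_classes={"high", "high-intermediate"},
)
print([c.name for c in SELECTED])
```

Pedimap then displays the pedigree of such a subpopulation with one trait shaded at a time; the filter itself only decides which cultivars enter that view.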
Example 3: Selecting parents for phosphorus deficiency tolerance, higher yield, short duration, resistance to both BPH and blast, and high/intermediate-high amylose content. We selected a set of rice cultivars from the Pedimap database based on the availability of ranked scores for phosphorus deficiency tolerance (PDT). Twenty-four cultivars have PDT ranks of high, moderate, or sensitive (Fig. 4a). The same set was illustrated using Pedimap for yield (Fig. 4b), maturity period (Fig. 4c), degree of resistance to BPH (Fig. 4d) and blast (Fig. 4e), and amylose content (Fig. 4f). If At362 is considered as a parent, it can bring resistance to phosphorus deficiency (PD) and BPH, moderate resistance to blast, high yield, average maturity period, and intermediate-high amylose content. Similarly, if Bg250 is selected, it can bring moderate resistance to PD and blast, resistance to BPH, moderate yield, the shortest maturity period, and high amylose content (Fig. 4). A sample crossing scheme is shown in Fig. 5 to produce a rice variety with high PDT, mean yield ≥ 5.0 mt/ha, maturity period ≤ 105 days, resistance to BPH and blast disease, and higher amylose content. Since there is no reported cultivar with high PDT and complete blast resistance (Fig. 4), the crossing scheme illustrated in Fig. 5 is proposed with two phases. In the first phase, the crossing of At362 and Bg250, followed by numerous rounds of selfing and selection of the most beneficial lines among the recombinant inbred lines (RILs) at advanced generations, would accomplish the breeding objective except for complete resistance to blast (i.e., a moderate level of blast resistance is possible). In the second phase, the selected RILs from phase 1 can be backcrossed to Bg252 as the donor parent to introgress the complete resistance to blast. The breeder can come up with diverse crossing schemes like the one given in Fig. 5 to make effective decisions for breeding and maximize resource utilization to release varieties in the shortest possible time. The breeder can select any number of parents needed as sources of resistance and other traits to start crossing. Also, the marker alleles and the IBD probabilities can be checked, as illustrated in Supplementary Fig. 2a,b online, respectively.
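The two-phase scheme above can be written down as a small pedigree and traversed to list its founders, which is the kind of relationship Pedimap displays graphically. The sketch below is a simplified illustration under stated assumptions: the node names `RIL_phase1` and `Cross_phase2` are hypothetical labels, At362, Bg250, and Bg252 are treated as founders of this scheme only (in the full database they have their own recorded parents), and the contributions are pedigree-based expectations for the first cross of phase 2, ignoring selection and any further backcrossing.

```python
# Each entry maps a line to its (female, male) parents; lines absent from the map are founders here.
SCHEME = {
    "RIL_phase1": ("At362", "Bg250"),         # phase 1: cross followed by selfing and selection
    "Cross_phase2": ("RIL_phase1", "Bg252"),  # phase 2: first cross to the blast-resistance donor
}


def founders(pedigree, individual):
    """Return the set of ancestors of an individual that have no recorded parents."""
    parents = pedigree.get(individual)
    if not parents:
        return {individual}
    result = set()
    for parent in parents:
        result |= founders(pedigree, parent)
    return result


def expected_contributions(pedigree, individual, weight=1.0, totals=None):
    """Expected genome share of each founder: every generation halves a parent's share."""
    if totals is None:
        totals = {}
    parents = pedigree.get(individual)
    if not parents:
        totals[individual] = totals.get(individual, 0.0) + weight
        return totals
    for parent in parents:
        expected_contributions(pedigree, parent, weight / 2, totals)
    return totals


FOUNDERS = founders(SCHEME, "Cross_phase2")
SHARES = expected_contributions(SCHEME, "Cross_phase2")
print(sorted(FOUNDERS), SHARES)
```

Selfing a line does not change these pedigree-based expectations, which is why the rounds of selfing in phase 1 are not represented as separate nodes.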
Estimated breeding values (EBV) of the rice cultivars to support the breeding decisions in Examples 1, 2 and 3. The calculations revealed that the EBV-yield of the cultivar At307 is 1.03, which ranks second and thus justifies its selection for the cross in Example 1. The cultivar Bg450 ranks 29th in terms of its EBV-yield. Thus, Bg450 also brings a higher genetic effect for yield. However, Bg450 carries genes for extended maturity (EBV-maturity period rank of 62), whereas At307 ranks 20th; hence, the progeny has a chance to receive genes for a shorter maturity period. For plant height, At307 and Bg450 rank 50th and 24th, respectively, indicating that the progeny would have a strong basis for shorter plant height, which is desirable to prevent lodging and increase fertilizer use efficiency.
In Example 2, our selection of Bg407H and At307 is firmly validated by the EBV-yield ranks of first and second received by these two cultivars, respectively. However, the EBV-maturity period of Bg407H was ranked 55th, and its EBV-plant height was ranked 67th, indicating that Bg407H would bring genes for extended maturity and taller plants. In contrast, At307 ranks 20th for the EBV-maturity period and 15th for plant height, and thus contributes decreasing genetic effects on the maturity period and the tallness of the plants.
In Example 3, our selection of At362 is validated by the EBV-yield rank. This cultivar provides the second-best possible genetic effect for yield in the breeding germplasm available at RRDI. In the proposed crossing schemes in Figs. 4 and 5, Bg250 and Bg252 were selected to provide the genetic basis for the shorter maturity period, and the EBV-maturity period values of those cultivars support this selection. Bg250 was also previously used by RRDI as a breeding parent, and its EBV accuracies for the maturity period were 0.98 and 0.99 for the Yala and Maha seasons, respectively. These near-perfect correlations between the EBV and the true breeding value (TBV) of Bg250 for the maturity period indicate that it is an ideal parent to provide the genetic basis for a shorter maturity period to the progeny.

Discussion
The decision-making process in breeding is a tedious task 22 . The breeding germplasm is complex, with large numbers of improved varieties, traditional cultivars, landraces, wild germplasm, and accessions. There can also be large mapping populations and varieties that remain unreleased for various reasons. The numerous cultivars in breeding germplasm may have extensive records on agronomic data, pest and disease resistance, quality traits, availability of samples, geographic locations, and utilization in diverse breeding programs as parents 46,47 . With the advent of DNA markers and sequencing technologies, a wealth of genomic information is also available 48 . However, one of the recurrent problems in any breeding germplasm in the world is that most of the cultivars remain uncharacterized. Thus, they cannot be used directly in breeding activities. Traditionally, breeders keep records in field books. With the development of computer technology, data tabulation is becoming a common practice. However, given the highly complex nature of the datasets in breeding germplasm, data tables have limited value to breeders. The tables created with contemporary data management software cannot graphically display complex pedigrees and variations of qualitative and quantitative traits along with DNA marker information. These database handling platforms do not offer the pedigree-based capabilities of Pedimap, such as selecting related parental varieties/accessions. In this context, Pedimap provides a considerable advantage, as it can visualize pedigree relationships, trait variations, and any other useful information required for decision-making and planning crosses in breeding programs 28 . If all the available details on breeding germplasm are arranged as a database, the breeder can define subpopulations based on diverse traits and select the parents for improving multiple traits.
However, simple spreadsheets or manually prepared note pages cannot be used to visualize the essential information and complex pedigrees. Breeding programs often suffer when the breeder retires or moves to a different position [49][50][51] . A newly hired breeder cannot practically go through the individual records of the existing breeding germplasm. Thus, there is a strong possibility that valuable breeding germplasm might be lost, wasting the time, resources, and effort of the retired breeder and the team. However, if the breeder routinely maintains and updates a Pedimap file for the developing germplasm of breeding materials, newly hired breeders can go through it, identify the value of and gaps in the available material, and plan further. The creation of a Pedimap file is simple, and a novice to informatics can curate and use Pedimap with a little training. Pedimap allows breeders to store data and to fetch and visualize genomic information at any time with less effort and complete accuracy 52 . The straightforward accessibility, direct data interpretation, ability to customize the views in multiple fashions, and editable output file formats are the significant features of Pedimap. The graphic files created can be readily imported into image editing software for further visualizations and illustrations. Pedimap is not open-source software but can be freely obtained by contacting the developers; thus, even breeders in developing countries can benefit from Pedimap 28 .
In the current study, we created a Pedimap database for the rice cultivars and accessions prominently used by breeding programs in Sri Lanka. With the available information, significant breeding decisions can be made, as we explain in three examples (Figs. 2, 3, 4 and 5). However, it is essential to characterize the cultivars for all the important traits, molecular markers, and SNP haplotypes 53 , so that breeding decisions can be effectively made 17 . The EBVs for the parental cultivars and progenies can further be consolidated with the pedigrees to increase the reliability of breeding decisions 54 . The phenotyping methods must be standardized and should follow common procedures across different locations so that the power of the Pedimap database increases substantially. Therefore, breeders should always follow the standard, globally acceptable phenomic platforms to characterize the material in breeding germplasm 44,55 . Novel agri-tech practices such as vertical farming, artificial intelligence-powered technology, and re-energizing the plant microbiome would improve conventional breeding, leading to a second green revolution 36,56 . Therefore, in addition to genotyping technologies, including whole genome sequencing and DNA marker-assisted selection techniques, high-throughput phenotyping tools/phenome platforms are also essential to develop breeding systems further.
The application of EBV in breeding is a common practice to determine the additive genetic effect that each parent can bring to the progeny 57 . There are only three quantitative parameters (yield, maturity period for Yala and Maha seasons, and plant height) available in the breeding germplasm at RRDI (Supplementary Table S1 online). We calculated EBVs for these three parameters (Supplementary Table S4 online). The 30 top-ranked cultivars for EBV, together with two other important cultivars used in Example 3, are given in Table 2.
Also, the accuracy/reliability values of the EBVs are given in Table 2 and Supplementary Table S4 online for the cultivars that were used by RRDI as breeding parents. It is evident from the trait data, EBVs, and accuracy of EBVs given in Table 2 that the breeding germplasm at RRDI contains elite cultivars that can be used as breeding parents in the future. Interestingly, all of these are newly improved and released rice varieties. Crossing schemes should therefore always be planned using these elite cultivars as parents while carefully adding other landraces or exotic types as sources of resistance to avoid linkage drag. The EBVs of yield (Fig. 6a), plant height (Fig. 6b), and maturity periods in Yala and Maha seasons (Fig. 6c,d) show continuous distributions. For yield and maturity period, RRDI germplasm has promising rice cultivars. However, for plant height, the high yielding parents tend to have increasing genetic effects. Lodging is a frequent problem in rice farming in Sri Lanka; thus, the current EBV-plant height estimates imply the necessity of broadening the breeding germplasm with parental cultivars that can provide a genetic basis for short plants.
The relative rankings of rice cultivars for the EBVs calculated for yield, maturity period, and plant height are given in Fig. 7a. The even distribution of rice cultivars in the three-dimensional space (i.e., a box based on ranks) highlights the broad genetic diversity of Sri Lankan rice breeding germplasm. However, the germplasm has to be completely characterized, and the EBVs must be calculated, to understand the complex multidimensional diversity structures and to carry out the Pedimap decision procedures efficiently when designing crosses. The EBV estimates for the maturity period imply a lack of seasonal variation (Fig. 7b-d). This is an advantage for breeding programs, as two seasons of selection are possible every year to fast-track the variety development process.
In the present study, we used only the phenotypic data available for traits to calculate the EBVs. However, for efficient genomic selection, high-throughput genomic data such as marker alleles, sequence polymorphisms, and haplotype variants are needed. Thereby, EBVs can be translated into more robust genomic EBVs (GEBVs). The GEBVs would facilitate the efficient introgression of desirable traits into new varieties through MAB with efficient background and foreground selection schemes 58,59 .

Conclusion
Pedigree visualization with variations of phenotypic and molecular data in Pedimap offers a user-friendly way to plan rice breeding programs with higher accuracy and resource optimization. The present study explains the applicability of Pedimap as a decision-making tool to streamline the rice breeding programs in Sri Lanka, and the calculated EBVs strongly support the validity of decision-making based on Pedimap. However, it is also important to note that accurate characterization of the breeding germplasm for phenotypic and molecular data is the critical prior step to harness the value of Pedimap for breeding.

Data availability
All the data of the manuscript are available in Supplemental Material.