Deterministic and stochastic modelling of impacts from genomic selection and phenomics on genetic gain for perennial ryegrass dry matter yield

Increasing the efficiency of current forage breeding programs through adoption of new technologies, such as genomic selection (GS) and phenomics (Ph), is challenging without proof of concept demonstrating cost effective genetic gain (∆G). This paper uses decision support software DeltaGen (tactical tool) and QU-GENE (strategic tool), to model and assess relative efficiency of five breeding methods. The effect on ∆G and cost ($) of integrating GS and Ph into an among half-sib (HS) family phenotypic selection breeding strategy was investigated. Deterministic and stochastic modelling were conducted using mock data sets of 200 and 1000 perennial ryegrass HS families using year-by-season-by-location dry matter (DM) yield data and in silico generated data, respectively. Results demonstrated short (deterministic)- and long-term (stochastic) impacts of breeding strategy and integration of key technologies, GS and Ph, on ∆G. These technologies offer substantial improvements in the rate of ∆G, and in some cases improved cost-efficiency. Applying 1% within HS family GS, predicted a 6.35 and 8.10% ∆G per cycle for DM yield from the 200 HS and 1000 HS, respectively. The application of GS in both among and within HS selection provided a significant boost to total annual ∆G, even at low GS accuracy rA of 0.12. Despite some reduction in ∆G, using Ph to assess seasonal DM yield clearly demonstrated its impact by reducing cost per percentage ∆G relative to standard DM cuts. Open-source software tools, DeltaGen and QuLinePlus/QU-GENE, offer ways to model the impact of breeding methodology and technology integration under a range of breeding scenarios.

Questions associated with optimisation of breeding strategies are complex, compounded by cost and uncertainty of changing a system, even if the existing system may be delivering less genetic gain, be less nimble or address fewer traits than desired. Decision support software enables simulation of breeding strategy efficacy in response to factors including selection among and within genetic families, different combinations of year, season, site and replicate with associated costs per selection cycle. Such software help breeders design and implement more efficient and effective breeding programs based on predicted genetic gain. To date this has been focused on simulation for inbred species [18][19][20] , leaving a gap in obligate outcrossing species such as forage grasses. Recent efforts to address this gap have led to the development of two decision support software packages for plant breeding: the tactical tool DeltaGen 21 based on deterministic modelling; and the strategic tool QuLinePlus 22 a component of QU-GENE 18 based on stochastic modelling. QuLinePlus is a breeding module, specifically designed to simulate full and half sibling breeding programs for outcrossing species and QU-GENE is considered as an engine which generates the simulation inputs. Concurrently, new phenomic tools for electronic automation, acceleration and standardisation of data collection for key traits such as DM yield have been developed 8 , and are being deployed in field breeding programs 23 . Genomic prediction models have recently been developed and assessed for complex traits, including DM yield 24 and nutritive value 25 in perennial ryegrass (Lolium perenne). Understanding the implications of these new technologies on genetic gain over time is pivotal to ensuring their use is optimized.
The availability of Ph and GS tools for forage plant breeding, and of decision support simulation tailored for outbred species, gives rise to the opportunity to examine the relative costs and impact of options for integrating these tools into breeding systems. This paper aims to investigate, using modelling and simulation, the relative efficiency of five half sib (HS) breeding methods based on predicted genetic gain and associated costs to improve DM yield of perennial ryegrass. Simulation was conducted applying tactical deterministic and strategic stochastic decision support modelling software (i.e. DeltaGen and QU-GENE) based on a "mock data" set of 1000 HS families, created by combining DM yield data from two sets of field trials. A key objective of this paper is also to demonstrate the potential application of the combination of DeltaGen/QU-GENE, to expand the scope of tools available to breeders in decision support for breeding program design.
The five HS family breeding methods simulated were: (a) standard phenotypic among half-sib family selection (A p ), (b) among half-sib family selection based on phenomics (A Ph ), (c) standard phenotypic among halfsib family selection and within family genomic selection (A p W gs ), (d) among half-sib family selection based on phenomics and within family genomic selection (A Ph W gs ), (e) both among and within half-sib family selection based on genomic selection (A gs W gs ).

Material and methods
Deterministic modelling for estimating genetic gain using HS family breeding strategies. The genetic gain simulation analyses were conducted using the open source software DeltaGen 21 (available via https:// delta gen. agres earch. co. nz/ app/ delta gen) for single selection cycles. The constructed mock data matrices of 200 HS and 1000 HS families used to generate starting points for the simulation of breeding methods, were compiled using perennial ryegrass DM yield and growth score measurements generated from two sources of multi-year-season-location trials. The term starting points referred to in this paper, are a common set of HS family means, estimates of additive genetic/interaction variance components and narrow sense heritabilities, applied in the different breeding equations used for deterministic modelling and simulation of the five breeding methods using DeltaGen.
Construction of the 1000 and 200 HS family "mock data" matrices of perennial ryegrass. Prediction of differences in genetic gain ∆G among breeding strategies is based on the application of quantitative genetic models 3,4 . These genetic analyses require estimates of population parameters; phenotypic variance ( σ 2 P ), additive genetic variance ( σ 2 A ), different components of genotype-by-environment (GE) interaction such as HS family interactions with; year, season and location, depending on breeding objectives, and narrow sense heritability ( h 2 n ). In our investigation using deterministic simulation, a key criterion for modelling the impact of GS and Ph technologies on the relative efficiency of HS family breeding methods, was to use mock data sets constructed from actual HS family multi location-year-season field trials rather than using in silico generated data. These mock data matrices enabled "realistic" estimates of genetic parameters to be generated. It is important to note that these estimates were only used as starting points to conduct simulation of the breeding methods using DeltaGen. Two mock DM yield data matrices, of 1000 HS and 200 HS families, were created by combining data generated from multi-location, year and season, HS family evaluation studies reported by Faville et al. 24 and Arojju et al. 26 .
The 1000 HS family DM yield mock data set was based on field trials conducted across 2 locations over 2 years and 3 seasons per year. The mock data set was constructed so that each location had a 20 row-by-50 column design with 3 replicates. The herbage DM yield data matrix consisting of 200 HS families was constructed from a random sample of 200 families taken from the 1000 HS family matrix. The year ( n=2), season ( n=3) and location ( n=2) data associated with each of the 200 randomly sampled HS families, were combined into a 3 replicate, row and column (10 rows-by-20 columns) by location, season and year design matrix. A detailed description of construction of the two mock data sets is provided in supplementary material 1 in File S1, and the distribution of the 200 and 1000 HS family DM yield data are presented in Figure S1.
It is important to note that a key assumption used for quantitative genetic analysis and modelling of the mock data, was that both the 200 HS and 1000 HS families were generated from parents randomly sampled from the same random mating population, cross pollinated under isolation. The coefficient of inbreeding ( F ) www.nature.com/scientificreports/ was assumed to be zero (F = 0) and therefore the among HS family ( σ 2 f ) variation provided an estimate of ¼ σ 2 A (additive genetic variation) 27 .
Data analysis. The two HS family mock data matrices (provided as supplementary material 2) were analysed using Eqs. 1 and 2 below, to derive estimates of additive genetic variation, associated genotype-by-environment (GE) interaction and narrow sense heritability on a family mean basis. A linear mixed model (Eq. 1) using the Residual Maximum Likelihood (REML) [28][29][30] procedure in DeltaGen 21 was used to estimate genetic parameters to be used for modelling based on breeding equations. The linear mixed model used for analysis across locations, seasons and years: Y ijklmno is the value of an attribute measured from HS family i in column o and row n of replicate m at location j in season k of year l , and i = 1,…, n f ; j = 1,…,n l , k = 1,…,n s ; l = 1,…, n y ; m = 1,…, n b ; n = 1,…, n r ; o = 1,…,n c ; where f, l , s , y , b , r and c , are families, locations, seasons, years, replicates, rows and columns, respectively; M is the overall mean; f i is the random effect of family i , N(0, σ 2 f ); l j is the fixed effect of location j ; fl ij is the effect of the interaction between family i and location j , N(0, σ 2 fl ) ; s k is the fixed effect of season k ; fs ik is the effect of the interaction between family i and season k , N(0, σ 2 fs ) ; y l is the fixed effect of year l; fy il is the effect of the interaction between family i and year l , N(0, σ 2 fy ) ; b jklm is the random effect of replicate m within location j , within season k , within year l , N(0, σ 2 b ) ; r jklmn is the random effect of row n in replicate m within location j , within season k , within year l , N(0, σ 2 r ) ; c jklmno is the random effect of column o in row n in replicate m within location j , within season k , within year l , N(0, σ 2 c ) ; ε ijklmno is the residual effect for family i in row n and column o in replicate m in location j in season k , during year l, N(0, σ 2 ε ). The estimates of variance components generated from Eq. 1, were used to generate an estimate of narrow sense heritability, h 2 n , for herbage DM yield on a family mean basis using Eq. 2.
where n , number of and f, l, s, y and r, are families, locations, seasons, years and replicates, respectively. DeltaGen generated estimates of narrow sense heritability ( h 2 n ) on a HS family mean basis across locations, seasons and years 31 .
Breeding methods and associated prediction equations. The five breeding methods assessed in this study were: (a) Standard phenotypic among HS family selection (A p ): Elite HS families are selected based on phenotypic performance within or across environments. Equal numbers of remnant seed from each selected HS family are randomly sampled. The selected individuals become the parents of the next generation . The prediction  equation for genetic gain 4 , where, k f , is among family selection intensity; c , parental control factor; σ 2 A , additive genetic variance;σ PF , the phenotypic standard deviation among families. For HS family selection c = 0.5 as selection is on female gametes only 27 . (b) Among HS family selection using LiDAR (light detection and ranging) based phenomics (A Ph ): as per (a), elite HS families are selected but based on phenotypic data collected using a LiDAR based phenomic tool 8 . The precision of genetic gain estimated using breeding Eq. (3) will depend on the accuracy of the phenomics method applied. (c) Standard phenotypic among HS family selection and within HS family genomic selection (A p W gs ): This breeding method consists of the steps: (i) select elite HS families, using a predetermined selection pressure, based on standard phenotypic measurements as per (a); (ii) equal numbers of remnant seed from each selected HS family are randomly sampled, seedlings established and individually genotyped, individual genomic-estimated breeding values (GEBV's) determined using genomic prediction, and the best seedlings within each family based on their GEBV's are selected on a predetermined selection pressure; (iii) the selected individuals become the parents of the next generation. The genomic prediction model used in step (ii) is itself derived using standard phenotypic measurements from the HS field trial and DNA marker data from the maternal parents of the HS families 24,32,33 . The equation for predicting genetic gain, where, A p W gsY is the predicted ∆G for trait Y using a combination of standard phenotypic among HS family selection and within HS family genomic selection; σ 2 AY , additive genetic variance for primary trait Y; σ AY , standard deviation of additive genetic variance for primary trait Y ; 3 4 = √ 3 2 ; σ PF , among HS family phenotypic standard deviation; k f and k w , among and within family selection intensity, respectively; c f (= 0.5) (1) Y ijklmno = M + f i + l j + fl ij + s k + fs ik + y l + fy il + b jklm + r jklmn + c jklmno + ε ijklmno www.nature.com/scientificreports/ and c W (= 0.5) among and within family parental controls, respectively; h X , square root of heritability for secondary trait X ; r A−XY , genomic prediction accuracy -Pearson's correlation between Phenotypic Estimated Breeding Values -BLUP's Y and GEBV's X . In GS, the assumption is h X =1 34,35 . Please note that the subscript gsY in Eqs. 4 and 5 is for genomic selection ( gs ) for trait Y . This is based on correlated response where, trait Y (the true breeding value of an individual) is indirectly selected based on selection for trait X (the GEBV of that individual). (d) Among HS family selection using LiDAR based phenomics (Ph) and within family genomic selection (A Ph W gs ): as per (c) except that elite HS families are selected based on data collected using a LiDAR based Ph tool. (e) Both among and within HS family selection based on genomic selection (A gs W gs ): This breeding method is implemented in cycle 2 using the HS families generated using the A p W gs method (c). In this scenario, following application of A p W gs , a second cycle of selection is completed using only genomic prediction. To achieve among family selection based on GEBV's, a predetermined number of seeds is randomly sampled from each HS family and the seedlings genotyped. The marker information is used in the alreadyestablished genomic prediction model to generate GEBV estimates and a mean GEBV for each family is determined. Based on the required selection pressure, HS families are selected on their GEBV means.
Within-family selection of the chosen HS families, using a predefined selection pressure, is applied to family members based on their individual GEBV estimates, as per (c). The selected individuals form the parents of the next generation. The equation used for predicting genetic gain 21 , where the terms in the above equation have been defined in equation 4.
Genomic selection. For the purposes of this exercise, the cost of conducting GS was defined as the cost per GEBV. The costing was based on a non-commercial research laboratory, the Forage Genetics GBS facility at AgResearch, and includes consumables and time. All steps from extracting DNA through genotyping-bysequencing (GBS) to generate a GEBV are addressed in supplementary material 1, Table S1: DNA isolation, GBS library development, GBS library sequencing, bioinformatic processing and genotype calling from raw GBS data, and genomic prediction (GEBV determination). Seedling grow-out and tissue sampling were not included as it is expected tissue samples for DNA isolation would be provided. The GBS process, including data filtering steps, is largely as described by Faville, et al. 24 except that it is conducted at a 384-plex scale (376 samples plus four blanks and four positive controls per library) instead of 96-plex (94 samples plus one blank and one positive control).
Phenomics, sampling and field trial operational costs. The Ph referred to in this paper is a LiDAR based mobile platform for non-invasive vegetative biomass and growth rate estimation in perennial ryegrass. The accuracy of 0.90 was determined from LiDAR based volumetric estimates compared against fresh weight and dry weight data across different ages of plants, seasons, stages of regrowth, sites, and row plot (two 2 m rows 15 cm apart) configurations 23 . The costs (New Zealand $) per sample; based on perennial ryegrass herbage DM derived via harvest and via Ph were $7.50 and $0.87, respectively. These sampling costs were obtained from multi-year, season and location field trials, based on row plots, which generated the original data used to build the 1000 HS family data matrix. The cost information is provided in supplementary material 1, Table S2.
Stochastic modelling of multiple selection cycles using HS family breeding strategies. Stochastic modelling of the three breeding strategies A p , A p W gs and A gs W gs were conducted using QuLinePlus 22 , available via https:// sites. google. com/ view/ qu-gene, with modifications to the software to implement GS. The objective of this modelling was to generate trends of response to selection over multiple breeding cycles, based on populations of similar size used in deterministic modelling. The breeding strategies; A p , A p W gs and A gs W gs were simulated across ten selection cycles using QuLinePlus software. The three strategies were compared for percentage of genetic gain per year (%ΔG), cumulative genetic gain (ΣΔG), allele fixation rate, genetic variance and prediction accuracy across multiple selection cycles. Each generated value is an average 250 iterations.
Generation of the initial training population for simulation in QuLinePlus. Two sets of training populations consisting of 196 and 980 HS families, hereafter referred to as Sim200 and Sim1000, were simulated. Beginning with experimentally derived information on phenomic and associated genomic data from a commercial breeding population consisting of 98 plants, QuLinePlus software was used to generate 98 HS families. These HS families were used to simulate their performance for DM yield for three years across three locations. From each of the evaluated families two random individuals were drawn to generate a training population of 196 plants (Sim200) and ten random individuals were drawn to generate a second training population of 980 plants (Sim1000). It is important to note that the 200 and 1000 HS families used in deterministic modelling were related, the former a random sample from the latter.
Defining the genetic model and trait architecture. A recombination map with seven linkage groups was constructed based on a genetic map 36 using the genomic markers from the initial population (98 individuals). In total, 1807 segregating markers were assigned to the seven linkage groups, out of which 474 were QTLs with an www.nature.com/scientificreports/ additive gene model. The allelic effect of each QTL was estimated based on genome-wide association analysis for DM yield implemented using GAPIT 37 . Narrow sense heritabilities on a family mean basis, estimated from mixed model analysis in DeltaGen using data for mean DM yield of the simulated 200 HS and 1000 HS families, were converted to single plant-based heritabilities (considering 30 plants per plot).
Genomic prediction. A genomic prediction model was generated using marker and phenotypic data from the training populations, Sim200 and Sim1000. The GEBVs for each individual plant were estimated using a standard BLUP procedure using the R package rrBLUP 38 . The heritability estimate of GEBVs was considered as 0.95, rather than 1.00, to account for genotyping error.
Breeding strategies. The breeding strategies A p , A p W gs and A gs W gs were simulated in QuLinePlus using the modelling options available for each method.
(a) Among half-sib family phenotypic selection (A p ): Using Sim200 and Sim1000 training populations, the A p strategy was performed by randomly intermating all individuals to generate 196 and 980 HS families. These HS families were evaluated for DM yield in two environments for two years using three replicates and 30 plants per plot. From these trials, among family selection pressures of 20% and 2% were applied to select the best families. To restore the initial number of parents in the training population (196 and 980) for the next selection cycle, random samples of 5 (2.77%) and 50 (27.77%) individuals from the best 20% and 2% HS families were taken and used as parents to generate progeny for the next selection cycle. Each selection cycle was completed in three years (years 1 and 2-field evaluation and selection, and year 3-random mating of selected parents and half-sib family generation). (b) Among half-sib family phenotypic selection and within family genomic selection (A p W gs ): In this breeding strategy, HS families were phenotypically evaluated as described in (a) and among family selection pressures of 20% and 2% were applied based on phenotype. However, within family selection was based on GEBVs estimated using the rrBLUP genomic prediction model. The GEBVs were ranked and the top 5 (2.77% within family selection pressure) or 50 (27.77% within family selection pressure) individuals from the best 20% and 2% families, respectively, were selected as parents for the next cycle. Each selection cycle was completed in three years (years one and two-field evaluation and selection, and year three-random mating of selected parents and half-sib family generation).

(c) Among and within half-sib family selection based on genomic selection (A gs W gs ):
The A gs W gs strategy was similar to A p W gs , with the only difference at the stage of among family selection, which was based on GEBVs rather than phenotypic measurements. Among and within family selection pressures were the same as the A p W gs strategy, with 20% and 2% among family selection pressure and 2.77% and 27.77% within family selection pressure. It is important to note that this breeding strategy was conducted in one year.
Simulation output. Percentage genetic gain (%∆G) for all three breeding strategies was based on BLUP mean differences between selection cycles. The percentage genetic gain per year was calculated by dividing total predicted %∆G by the number of years per cycle. Cumulative genetic gain (Σ∆G) was calculated as the BLUP mean in each cycle relative to that in cycle zero. Allele fixation rate was computed within the QuLinePlus software to determine the percentage of fixed alleles in each cycle for the trait under selection. Genomic prediction accuracy was computed as the Pearson correlation coefficient between the true breeding value and their GEBVs. Percentage change of genetic variance across selection cycles was computed relative to cycle zero, considered as 100%.

Results
Deterministic modelling. Variance component analysis indicated significant (P < 0.05) additive genetic variation ( σ 2 A ) for herbage dry matter (DM) yield among the HS families within each of the two mock data matrices, consisting of 200 and 1000 entries (Table 1). There were significant (P < 0.05) family-by-season ( σ 2 AS ) and family-by-location ( σ 2 AL ) interactions estimated from the 200 HS matrix but family-by-year ( σ 2 AY ) was not significant (P > 0.05) in this dataset. The estimates of family-by-season ( σ 2 AS ), family-by-location ( σ 2 AL ) and family-by-year ( σ 2 AY ) interactions were all significant (P < 0.05) for the 1000 HS family DM yield matrix. HS family narrow sense heritability ( h 2 n ) estimates for DM yield, based on family mean performance across years, seasons and locations, were moderate for both data matrices, the 1000 HS family derived estimate being higher ( Table 1). The estimates were within the range reported for perennial ryegrass 24,39,40 . The estimated genetic parameters, from REML analysis, presented in Table 1 were used as starting points in the different breeding equations to predict genetic gain.
Predicted genetic gain based on the 200 HS family mock data matrix. Applying the among HS family phenotypic selection method (A p ), at a selection pressure of 20%, to the seasonal DM yield data, predicted a 1.43% increase in DM yield above the population mean of 200 HS families (Table 2). Using phenomics with an accuracy of 0.90 for estimating DM yield generated from A Ph , a 1.29% increase in DM yield was predicted. The cost per %ΔGc using A p was higher than that when A Ph was applied. However, using A Ph reduced the cost per %ΔGc by 25%. The final number of selected parents decreased from 40 at 20% selection pressure to 4 individuals at 2% selection pressure ( Table 2).
The results of predicted ∆G presented in Tables 3 and 4 were generated from two cycles of HS family selection for increasing herbage DM yield. Cycle 1 (C1) (  (Table 4). In C1 (Table 3), the predicted %ΔG per cycle rose as r A was increased. In this cycle, the combination of increasing among HS family phenotypic selection pressure (2%) and within family genomic selection pressure (1%), at high r A (0.46), resulted in the highest ΔG of 6.35%, based on data from the A p seasonal herbage DM yield sampling method. The cost per percentage gain was a low $27,397 compared to the high $111,819 at among and within HS family selection pressures of 20% and 10%, respectively, at r A of 0.26. However, while the low selection pressures and r A resulted in the selection of 400 parents, the high selection pressure and r A lead to identifying only 4 parents ( Table 3). As expected, the predicted ΔG per cycle, under the A Ph strategy phenomics assessments of seasonal herbage DM in C1, was slightly lower than A p , but there was a substantial increase in cost efficiency per % genetic gain (Table 3).
Cycle 2 (Table 4) was based on genomic selection among and within (A gs W gs ) 100 HS families generated from selected parents from C1 at 20%/10% and 10%/5% selection pressure and at r A values of 0.26, 0.36 and 0.46. In C2, ΔG was estimated relative to the new mean of HS progeny generated from C1, within each combination of % selection pressure and r A . The predicted ΔG per cycle ranged from 2.05% at r A 0.26 to 3.59% at r A 0.46, at among and within family selection pressures of 20%/10%. Applying higher selection pressures (10%/5%) at r A values of 0.26 and 0.46, increased the range of predicted ΔG per cycle to 2.44% and 4.27%, respectively. Predicted ΔG combined across selection C1 and C2 resulted in totals ranging from 4.93% to 7.58% at r A 0.26 and 0.46, respectively, at among and within selection pressures of 20%/10% (Table 4). At the same r A values of 0.26 and 0.46 at among and within HS family selection pressures of 10%/5%, the predicted ΔG combined across cycles C1 and C2 were 5.93% and 9.06%, respectively. Comparing the costs per percentage predicted ΔG, combined across cycles C1 and C2, the total of $148,418 at r A 0.26 and 20%/10% among and within family selection pressures was Table 1. Ranges, medians, means (based on BLUP estimates), variance components and associated standard error (± SE) and family mean narrow sense heritability for herbage DM yield estimated from the 200 and 1000 HS family data; evaluated across two sites over 2 years and 3 seasons per year. The estimated variance components were used as starting points for simulation of breeding methods.    (Table 4). These results assumed that the r A values, 0.26, 0.36 and 0.46, did not change from C1 to C2. The effect of possible genomic accuracy decay on ΔG in C2 was assessed using a r A of 0.12. This resulted in diminished ΔG and increased costs ($) per % ΔG. However, even at the reduced r A of 0.12, the additional predicted ΔG in the single year of C2 provided an increase in total gain across cycles C1 and C2.
Predicted genetic gain based on the 1000 HS family mock data matrix. The trend of response to selection for DM yield based on the 1000 HS families was the same as that observed from the 200 HS family simulations. Using DM cut phenotypic data (A p ) the predicted ΔGc ranged from 1.90 at 20% among HS family phenotypic selection pressure to 3.62% at 1% selection pressure ( Table 5). The large population of HS families resulted in higher numbers of parental half-sibs being selected, compared to the 200 HS family dataset. This ranged from 200 parental HS families at 20% selection pressure to 10 families at 1%. The application of phenomics based DM assessment (A Ph ) clearly indicated its impact on reducing costs ($) per %ΔGc by 58%. Analyses involving 1000 HS families (Tables 6 and 7) generated results analogous to those from the 200 HS families (Tables 3 and 4). The application of the A p W gs breeding strategy to the 1000 HS families in C1 resulted in predicted ΔGc ranging from 3.69% to 8.10% at among/within selection pressures of 20%/10% and 2%/1% at r A values of 0.26 and 0.46, respectively. While the predicted ΔGc from applying A Ph W gs at similar combinations of selection pressures and r A values was lower, there was a considerable reduction in cost ($) per %ΔG that ranged from 10 to 40%. The number of selected parental HS families in C1 at the among and within family selection pressure combinations of 5%/1% and 2%/1%, resulted in selection of less than 100 HS families, 50 and 20, respectively (Table 6).
Genomic among and within family selection was conducted in C2, in a single year, on 100 HS families from C1 generated at 20%/10% and 10%/5% selection pressure at r A values of 0.26, 0.36 0.46. Trends in ΔGc mirrored those observed with the 200 HS family dataset, but the level of gain was consistently higher. The A gs W gs applied resulted in predicted ΔGc ranging from 2.53% at r A 0.26 to 4.41% at r A 0.46 at among and within family selection pressures of 20%/10% (Table 7). At selection pressures of 10%/5% and r A values of 0.26 and 0.46, predicted ΔGc cycle increased to 3.00% and 5.23%, respectively. Combining %ΔG across C1 and C2 resulted in total ΔG ranging from 6.22% to 9.48% at r A 0.26 and 0.46, respectively, at selection pressures of 20%/10%. At the same r A values of 0.26 and 0.46 and selection pressures of 10%/5%, the predicted %ΔG combined across selection C1 and C2 was 7.49% and 11.33%, respectively. The total combined costs per %ΔG across cycles 1 and 2, were $273,392 at r A equal to 0.26 at 20%/10% among and within family selection pressures was over twice the cost, $113,901 at r A 0.46 at 10%/5% selection pressures. The elite parents selected on GEBV's at the end of cycle 2 were 200 individuals and 50 individuals based on the selection pressures of 20%/10% and 10%/5%, respectively ( Table 7). As in the 200 HS family analysis, the effect of a possible scenario of r A decay on ΔG in C2 was assessed using a r A of 0.12, ( Table 7). The outcomes were similar to those observed in the 200 HS analysis.
Predicted annual %ΔG for seasonal DM yield resulting from deterministic modelling of the three breeding strategies; A p , A p W gs and A gs W gs , based on data from the 200 and 1000 HS families clearly indicate the lower response to selection resulting from the among HS family phenotypic selection (A p ) breeding strategy (Fig. 1A,B). There was an increase in annual %ΔG when GS was combined with the A p breeding method. The predicted annual %ΔG improved with increasing selection pressure and r A . The breeding strategy A gs W gs resulted in the highest predicted annual %ΔG at all among and within family selection pressures and r A . It is also clear that when r A decreased to 0.12 in C2 there was a reduction in predicted annual %ΔG, as would be expected. However, even at a r A value of 0.12 in C2, the predicted annual %ΔG was higher than that predicted for HS in C1 at all selection pressures (Fig. 1A,B). Stochastic modelling. Genetic gain. The percentage annual genetic gain (%ΔG) for DM yield in Sim200 and Sim1000 training populations were different for each breeding strategy (Fig. 2). The trends in genetic gain between two training populations (Sim200 and Sim1000) were similar, however, the variation in genetic gain Table 5. Simulated predictions of percentage genetic gain per cycle (3 years) of selection (%ΔGc) and cost per percentage of predicted genetic gain ($ per % gain), using among half-sib (A p ) family selection, for dry matter (DM) yield based on the 1000 HS family data (evaluated across years, seasons and locations), from DM cuts (A p ) and phenomics based phenotyping (A Ph ). Selection pressure (%) and the associated number of selected parents are indicated. $, New Zealand dollars. These results are based on the estimated variance components from REML analysis, used as starting points for simulation of breeding methods. Ph*, LiDAR based phenomics accuracy = 0.90.   www.nature.com/scientificreports/ across multiple iterations was higher in Sim200 compared to the Sim1000 training population. Among different breeding strategies, the highest annual genetic gain (%ΔG) was achieved for A gs WS gs , followed by A p W gs and A p strategies under both 20% and 2% selection pressures. The differences in %ΔG between the breeding strategies were less evident after the second selection cycle (Fig. 2). When comparing among family selection pressures, the highest genetic gain for all three strategies was observed under 2% selection pressure compared to 20% selection pressure. While the variability in genetic gain across multiple iterations was consistently higher under 2% selection pressure compared to that at 20%, this was more evident for the A gs W gs strategy. The highest ΣΔG in each selection cycle was achieved in A p W gs strategy, followed by A p and A gs W gs (Supplementary material 1, Figure S2).
Fixation rate of favourable alleles. The influence of selection pressure and training population size on allele fixation rates were observed, with the latter being more pronounced (Fig. 3). The fixation rate was highest in the Sim200 training population under 2% selection pressure and the lowest rate was observed in Sim1000 under 20% selection pressure. In both the training populations, under 20% among family selection pressure, the fixation rate steadily increased across the selection cycles. However, under 2% selection pressure the alleles were fixed more rapidly from selection cycle 3 for all three breeding strategies. Among three breeding strategies, the allele fixation rate followed similar patterns and the differences in fixation rate were more noticeable in the later selection cycles (Fig. 3).
Genetic variance and prediction accuracy. The percentage of genetic variance present within the Sim200 and Sim1000 training population decayed rapidly from cycle 0 to cycle 1 for all three breeding strategies. The patterns were similar under the two different selection pressures (Fig. 4). In Sim200 training population, among three breeding strategies, the lowest genetic variance was observed for A gs W gs followed by A p W gs and A p , similar trends were observed in Sim1000 training population. The prediction accuracy at cycle 0 was 0.3 in Sim200 and 0.25 in Sim1000 and was reduced to 0.12 (60% decrease in prediction accuracy) and 0.07 (72% decrease in prediction accuracy) in selection cycle 1 (Fig. 5). There was no difference in the prediction accuracies between the two different selection pressures. The accuracy steadily decreased from cycle 2 and at the end of cycle 10 was a low 0.03 in Sim200 and 0.002 in Sim1000 training populations (Fig. 5).

Discussion
Deterministic modelling. Choosing and optimizing an appropriate breeding strategy to suit a specific cultivar development goal, will require the comparison of different breeding methods based on predicted genetic gains 3 . While a key criterion for determining the success of a breeding method is genetic gain, cost is also a major factor. Especially in commercial breeding programs, where cost efficiency of a breeding strategy, and access to commercial opportunities, will often determine its adoption. With increasing pressure on plant breeders to develop new productive cultivars with adaptation to changing environments and meeting consumer demands, new selection and phenotyping technologies must be implemented to enhance genetic advance 41 . However, incorporation of these technologies into conventional field breeding programs are often challenged as being expensive to implement. Application of quantitative genetic deterministic 21 and stochastic modelling 22 can be used to evaluate the relative efficiency, genetic gain and associated cost, of conventional breeding strategies in combination with the integration of new selection technologies.
As mentioned in the introduction, HS and FS family breeding strategies are commonly used in forage grass cultivar development programs. In this paper, we present results from deterministic modelling using DeltaGen and stochastic modelling via QU-GENE/QuLinePlus, to predict ∆G and calculate cost per predicted %∆G for DM herbage yield of perennial ryegrass. Deterministic analysis was conducted using a common set of starting points based on estimates of quantitative genetic parameters, from analysis of the 200 HS and 1000 HS family mock data matrices generated using actual field trial DM yield (kg ha -1 ). We also evaluated the effect of using phenomics (Ph) on the cost per cycle of selection. The narrow sense heritability values 0.31 and 0.36, estimated for herbage DM yield for the 200 HS and 1000 HS families, respectively, were within the range previously reported 24,42,43 . The percentage genetic gain (%ΔG) per year for DM yield in Sim200 and Sim1000 training populations estimated across 10 selection cycles using three breeding strategies (A p , A p W gs and A gs W gs ). Selection pressures of 20% and 2% were imposed to select the best HS families. Within each family, for the A p W gs and A gs W gs strategies the top 5 or 50 individuals were selected and for the A p strategy 5 or 50 individuals were randomly selected to restore the initial number of parents for the next selection cycle. In each selection cycle %ΔG was based on 250 iterations. There was a considerable reduction in cost ($) per %ΔG that ranged from 10 to 40%. Annual increases in %∆G of over 2% per year will be required in crops such as maize (Zea mays), rice (Oryza sativa), wheat (Triticum aestivum), and soybean (Glycine max L.), to meet future global food demands 49 . For perennial ryegrass and other out crossing forage species, achieving a 2% annual gain in DM yield using A p methods, which exploit only a quarter of the total available additive genetic variation 27 , is an unrealistic objective. Accessing the ¾ additive variation within HS families will make a significant contribution to increasing annual %∆G 4,50 . Low genetic gains in forage breeding programs are due, in part, to an inability to satisfactorily exploit within family genetic variation 50,51   Percentage of favourable alleles fixed in the Sim200 and Sim1000 training populations at each selection cycle, under three breeding strategies (A p , A p W gs and A gs W gs ). A selection pressure of 20% and 2% was imposed to select the best HS families. Within each family, for the A p W gs and A gs W gs strategies the top 5 or 50 individuals were selected and for A p strategy 5 or 50 individuals were randomly selected to restore the initial number of parents for next selection cycle. In each selection cycle percentage of favourable alleles were based on 250 iterations. As alluded to above, in addition to enabling within family selection, a key advantage in using GS is reducing the length of a selection cycle and the cost %∆G 54,55 . This was further demonstrated in our deterministic simulation by applying a wholly GS second selection cycle, A gs W gs , to the HS families generated following C1 in both the 200 HS and 1000 HS family scenarios. The fact that A gs W gs enabled selection of another generation of elite parental seedlings based on GEBV's in the space of one year, following C1 (3 years), resulted in high annual %∆G in cycle 2. Annual %∆G resulting from A gs W gs was higher than both the A p and A p W gs breeding strategies Figure 4. The percentage of genetic variance at each selection cycles estimated in Sim200 and Sim1000 training populations using three breeding strategies (A p , A p W gs and A gs W gs ). A selection pressure of 20% and 2% was imposed to select best half-sib families and within each family, for the A p W gs and A gs W gs strategies top 5 and 50 individuals were selected and for A p strategy random 5 and 50 individuals were selected to restore the initial number of parents for next selection cycle. In each selection cycle percentage genetic variance was based on 250 iterations. It is important to note that the deterministic simulation results discussed assumed that the r A values 0.26, 0.36 and 0.46 in C1 continued to hold in C2. There is always the probability of the correlation between GEBV's and phenotype derived BLUP's or true breeding values, decreasing due to decay in r A , due principally to declining genetic relatedness between training and selection populations 56 , following polycrossing of C1 selected parents. The use of a r A of 0.12 was introduced into simulation in C2 to investigate this scenario. Simulation results indicated that even at the r A value of 0.12 in C2, the predicted annual %ΔG using A gs W gs , was higher than that predicted for A p in C1 at all selection pressures. Stochastic modelling. While genetic gain is a key determinant of the merit of a breeding strategy, information on fixation rate of favourable alleles and the rate of decay of genetic variance and r A are vital, especially when designing long-term recurrent selection breeding programs. For example, knowing at what stage in the recurrent selection process to recalibrate the prediction model when using GS is essential. Application of strategic decision support software tools such as QU-GENE, provide a platform for plant breeders to assess the progress, over multiple selection cycles, for ∆G and associated genetic parameters such as; additive genetic variance, fixation rates and r A . In crop species, several studies have explored stochastic modelling to understand long-term trends of implementing GS in breeding programs 19,[57][58][59][60] . In forages, Lin, et al. 55 and Esfandyari, et al. 61 used stochastic modelling to assess the impact of GS in commercial and FS breeding programs.
In the current study, stochastic modelling using QU-GENE/QuLinePlus provided graphical trends for; ∆G, allele fixation rate, genetic variance and r A , over 10 cycles of selection in response to three breeding strategies; A p , A p W gs and A gs W gs , when applied to either 200 HS or 1000 HS families of perennial ryegrass. In our simulations, the greatest %∆G observed for GS (A p W gs and A gs W gs ) breeding strategies compared to A p across two different www.nature.com/scientificreports/ selection pressures and populations sizes (Fig. 2). These results are due to precise selection of genotypes, for DM yield, within a family when implementing GS, compared to random selection of individuals from within selected families when using the A p strategy. This further supports the findings of Esfandyari, et al. 61 who in the context of full sib (FS) family breeding strategies, reported that accurate selection of single plants, based on GS, establishes a strong genetic correlation in terms of performance between single plants and plots, leading to increased ∆G. The GS breeding strategies applied in C1 outperformed the A p strategy in terms of predicted %∆G. However, in later cycles, these differences were less pronounced, due to a decline in r A in the selection cycles post C1. These trends observed in our simulations were similar for both the Sim200 and Sim1000 HS families, mainly due to similar size and number of additive genetic effects used in the recombination map for simulating the breeding strategies. Generally, with larger training populations, r A increases 6,62 or remains constant relative to smaller populations 58 . However, in our simulations there was a difference in r A following each selection cycle. In the Sim1000 HS families the observed r A values were relatively lower compared to the Sim200 HS family simulations. While the r A disparity between the populations is unclear, it should be noted that, Sim200 and Sim1000 were two independently simulated populations evaluated under different GE simulation systems, as described in "Material and Methods". Populations evaluated under different GE systems, irrespective of the size of training population, produce different r A 63,64 . In forages, r A is a result of genetic linkage and linkage disequilibrium (LD) 24,65,66 . As selection cycles progress the genetic linkage between the training and selection population declines, resulting in poor r A as observed in our simulations (Fig. 5). To maintain higher r A across multiple selection cycles, training populations need to be updated with new elite material or by including the families from previous selection cycles 58,61 . The implication of this strategy in HS breeding systems is yet to be investigated and would be considered in future studies.
Genetic variance within breeding populations provides the foundation for cultivar development 67 . In our simulations, there was a rapid decline in the percentage of genetic variance from selection C0 to C1, particularly for the A p W gs and A gs W gs methods and in the later selection cycles there was a steady decline in all three breeding strategies (Fig. 4). The rapid decline from C0 to C1 in the GS strategies is the result of utilizing both among (1/4 σ 2 A ) and within (3/4 σ 2 A ) HS family additive genetic variation, and this was reflected in the %∆G (Fig. 2). In addition, the polygenic architecture of the DM yield trait may be an underlying factor. Muleta, et al. 58 reported similar trends, notably a rapid early decline in genetic variance, for a polygenic trait through their simulation of genomic assisted recurrent selection in sorghum (Sorghum bicolor). Cycles of continuous recurrent selection increase the frequency of favourable alleles and at the same time increase the probability of fixation of deleterious alleles. Selection pressure and population size have big impact on the allele fixation rates (Fig. 3). Our simulations demonstrated that smaller populations and high selection pressure will lead to higher allelic fixation rates earlier in the selection cycles. Allelic fixation rate is the direct measure of inbreeding in a population 27,68,69 . With decreasing population size and higher selection pressures the potential for inbreeding depression increases 55,68,69 . In cross pollinating species such as perennial ryegrass, which consists of heterozygous and heterogenous populations, inbreeding depression effects can be severe 70 . Our results re-emphasize the importance of larger training populations for GS implementation in order to maintain optimum levels of inbreeding rates and the genetic variance in a recurrent selection program.
In addition to having information such as heritability of key traits, the possibility of predicting the rate of decrease of their genetic variances and associated allele fixation rates, over multiple selection cycles, as shown in our study for HS families, also simulation studies by Esfandyari, et al. 61 for FS families and Lin, et al. 55 in a commercial breeding program, will enhance decisions on choice of breeding pool to achieve specific cultivar development goals, by deploying cost effective breeding strategies.

Conclusion
Optimizing breeding program inputs for rate and cost-efficiency of genetic gain can be informed by simulation. Using mock data matrices constructed from empirically derived data, we demonstrated short-and long-term impacts of breeding strategy and integration of key technologies including genomic selection and phenomics on rate of predicted genetic gain for dry matter yield, a key economic trait, in perennial ryegrass. Our findings indicate these technologies offer substantial improvements in the rate of gain, and in some cases improved costefficiency per unit gain. The value of GS in exploiting within family additive genetic variation to increase genetic gain was demonstrated using both deterministic and stochastic simulation.
The application of GS in both among and within HS family selection in C2, provided a significant boost to total annual genetic gain across both cycles (C1 = 3 years and C2 = 1 year), even at low GS accuracy r A of 0.12. Despite some reduction in genetic gain, using phenomics (LiDAR based mobile platform) to assess seasonal DM yield clearly demonstrated its impact by reducing cost per percentage gain relative to standard DM cuts.
The open-source software tools, DeltaGen and QU-GENE, offer ways to query and model the impact of breeding methodology and technology integration under a range of breeding scenarios and inputs in out crossing species including pasture species. This software expands the scope of tools available to breeders in decision support for breeding program design. The analyses reported in this paper can also be extended to major crop species using the genetic modelling capability for self-pollinating species, developed in both DeltaGen and QU-GENE.

Data availability
The datasets generated in this study are included as supplementary information files.

Code availability
Software links to perform the deterministic and stochastic modelling were provided within this article.