Breeding signature of combining ability improvement revealed by a genomic variation map from recurrent selection population in Brassica napus

Combining ability is crucial for parent selection in crop hybrid breeding. This investigation reveals genetic factors that may underlie favorable combining ability and could further help enhance heterosis and stability. Here, we conducted a large-scale analysis of genomic variation to define genomic regions affecting combining ability in a recurrent selection population of rapeseed. A population of 175 individuals was genotyped with the Brassica60K SNP chip. A total of 525 hybrids were produced with three different testers and used to evaluate general combining ability (GCA) in three environments. By detecting changes in genomic variation, we identified 376 potentially selected genomic regions spanning 3.03% of the rapeseed genome, which provided QTL-level resolution on potentially selected variants. More than 96% of these regions were located in the C subgenome, indicating that the C subgenome had sustained stronger selection pressure than the A subgenome in the breeding program. In addition, a high level of linkage disequilibrium was detected in the rapeseed genome, suggesting that marker-assisted selection for population improvement might be readily implemented. This study outlines evidence for high GCA at the genomic level and provides insight into the molecular basis of improvement through recurrent selection in B. napus.

Crop breeding programs have generated excellent resources that can be used to improve agronomic traits and to identify favorable loci affected by artificial selection. Analyses of genetic diversity, allele frequency, and heterozygosity are used to find genomic alterations and genetic effects on traits across generations or subpopulations 11 . This has also proven to be a good approach for scanning genomic regions, and even candidate genes, that underlie selection 7 . In chicken, 82 putatively selected regions with reduced heterozygosity were identified 12 . In a cattle population, genetic changes were detected, and 13 genomic regions were found to affect milk production 13 . Moreover, several functional genes were verified in some selected regions in cattle 14 . Similar studies have been carried out in other animals 15,16 . In maize, a set of genes (2~4% of 774 genes) was found to have undergone artificial selection during domestication 3 . Scans of a few known functional genes involved in maize domestication have revealed selection signatures at the genomic level 4,17 . Furthermore, several chromosome segments and genes were revealed by comparing genetic variation between wild and cultivated soybean populations 5 . In rice, a genealogical history analysis of overlapping low-diversity regions distinguished the genomic backgrounds of indica and japonica populations, and 13 additional candidate genes were identified 18 . Another study found 200 genomic regions, spanning 7.8% of the rice genome, that had been differentially selected between two putative heterotic groups 19 . These studies successfully investigated genome-wide genetic changes during domestication and modern breeding, and their results provide useful information on the agronomic potential of breeding lines and on the genomic loci involved.
Rapeseed (Brassica napus; AACC, 2n = 38) is one of the most important oil crops worldwide. Rapeseed originated from interspecific hybridization between Brassica rapa (AA, 2n = 20) and Brassica oleracea (CC, 2n = 18), followed by chromosome doubling, along the Mediterranean coastline 10,000 years ago 20,21 . It is considered a young species because its domestication history spans only 400-500 years 22 . Among several other factors, modern breeding has substantially increased production, especially through the exploitation of heterosis. In a hybrid breeding program, combining ability is a crucial factor for parental line selection and for the development of superior hybrids. Evaluating combining ability by traditional methods is labor-intensive and time-consuming, and may create a bottleneck in hybrid breeding 23 . Therefore, dissecting and comparing the genetic basis of combining ability can be crucial for breeding. Combining ability is a complex trait in plants and has been evaluated by several techniques, including molecular markers, QTL mapping, and genome scan approaches [24][25][26] . However, few investigations have evaluated the genetic basis of combining ability in rapeseed. During rapeseed breeding history, heterosis and double-low varieties (low erucic acid and low glucosinolate) were mainly used to achieve higher yield and better quality, at the cost of genetic diversity 27,28 . Recently, new genetic resources have been used to broaden the genetic basis of rapeseed, including artificially synthesized B. napus generated from B. oleracea and B. rapa 29 and subgenome materials 30,31 . Multigenerational improvement and a recurrent selection program are required before these new materials can be utilized.
In this work, genome-wide SNP markers were used to analyze the breeding signatures of GCA revealed by genetic variation in a recurrent selection population. The objectives of our study were (1) to estimate the genetic diversity of genome-wide SNPs in different groups of the rapeseed restorer population, (2) to detect putatively selected regions and SNPs associated with breeding efforts at the genomic level, and (3) to identify known important QTLs for rapeseed agronomic traits within the selected regions. These findings may be useful for improving rapeseed breeding.

Results
Phenotype variations in yield and yield-related GCA. Plant yield of the 175 families and 525 hybrids was evaluated with two replicates in three environments, and the GCA of each parental line was estimated statistically from these phenotype data sets. Extensive phenotypic variation was observed (Table 1). The mean yields in the three environments were 13.93 g, 7.41 g, and 8.08 g per plant, respectively, and yield ranged from 4.36 to 36.87 g in Wuhan, from 1.62 to 15.92 g in Xiangyang, and from 2.24 to 37.82 g in Yichang. Plant yield had high coefficients of variation in all three environments, suggesting that rapeseed yield is a typical quantitative trait that is substantially affected by the environment. The mean value of GCA (Table 1) …
Genetic variation across the regions of specific loci. SNPs were used to detect genetic variation around three specific loci in the rapeseed genome: the erucic acid genes at the BnFAE1.1 and BnFAE1.2 loci on chromosomes A8 and C3 32 , and the Ogura CMS restorer gene Rfo on chromosome C9 33 . Genetic variation consistently decreased with proximity to the target loci (Fig. 1), indicating that our selection program had been carried out efficiently. Moreover, evaluation based on genetic diversity may be useful for breeding improvement.
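The trend in Fig. 1 (lower diversity closer to a selected locus) can be checked by averaging per-SNP diversity in distance bins around the locus. The sketch below is a hypothetical illustration rather than the authors' pipeline: it assumes per-SNP π values and physical positions (in Mb) are already available, and the function name, bin width, bin count and toy numbers are all arbitrary choices made for this example.

```python
"""Mean SNP diversity in distance bins around a target locus (hypothetical data)."""
from statistics import mean


def diversity_by_distance(positions_mb, pi_values, locus_mb, bin_mb=1.0, n_bins=5):
    """Group SNPs by absolute distance (Mb) from the locus and average pi per bin.

    Returns a list of (bin_start_mb, mean_pi or None) for bins 0..n_bins-1;
    SNPs farther than n_bins * bin_mb from the locus are ignored.
    """
    bins = [[] for _ in range(n_bins)]
    for pos, pi in zip(positions_mb, pi_values):
        idx = int(abs(pos - locus_mb) // bin_mb)
        if idx < n_bins:
            bins[idx].append(pi)
    return [(i * bin_mb, mean(b) if b else None) for i, b in enumerate(bins)]


if __name__ == "__main__":
    # Toy values only: diversity is lower near a hypothetical locus at 10.0 Mb.
    pos = [9.8, 10.3, 11.2, 8.6, 12.5, 7.4, 13.6, 14.2]
    pi = [0.05, 0.08, 0.20, 0.22, 0.31, 0.35, 0.40, 0.42]
    for start, m in diversity_by_distance(pos, pi, locus_mb=10.0):
        print(start, m)
```

A monotonic rise of the bin means with distance is the pattern the text describes; in real data the bins would be noisier and SNP density uneven.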
Linkage disequilibrium (LD) in the R population. LD was measured as r². At r² = 0.2, LD extended to approximately 0.8 Mb, 4.8 Mb, and 2.4 Mb for the A subgenome, the C subgenome, and the whole (AC) genome, respectively (Fig. 2). When r² decayed to 0.1, the LD distances increased to 3 Mb, 8 Mb, and 6.5 Mb, respectively. The C subgenome thus showed more extensive LD than the A subgenome. Among chromosomes, LD was highly consistent across the A subgenome but varied across the C subgenome: LD on chromosome C4 extended beyond 6 Mb at r² = 0.1 (Fig. S2), and chromosomes C1 and C2 showed almost no LD decay. Genetic variation of the two subgenomes was also evaluated. The A subgenome had slightly higher genetic diversity than the C subgenome (Table 3). Comparing the selected and basic populations, genetic diversity decreased more in the C subgenome (2.7%) than in the A subgenome (1.55%).
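The decay distances quoted above come from reading off where r² falls below a threshold as marker distance grows. The sketch below illustrates that idea under stated assumptions: r² is approximated as the squared Pearson correlation of allele dosages (0/1/2), which is a common shortcut for unphased data and not necessarily TASSEL's exact estimator, and the decay distance is taken as the upper edge of the first distance bin whose mean r² drops below the threshold. Function names and the 0.2 Mb bin width are hypothetical.

```python
"""Pairwise r^2 from allele dosages and a simple binned LD-decay distance (sketch)."""
import math
from statistics import mean


def r_squared(g1, g2):
    """Squared Pearson correlation of allele dosages (0/1/2) for two SNPs.

    Missing calls (None) are dropped pairwise; returns NaN if either SNP is
    monomorphic among the remaining individuals.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def ld_decay_distance(dist_r2, threshold=0.2, bin_mb=0.2):
    """Upper edge (Mb) of the first distance bin whose mean r^2 is below `threshold`.

    dist_r2: iterable of (distance_mb, r2) for SNP pairs on the same chromosome.
    Returns None if no bin falls below the threshold (no visible decay).
    """
    bins = {}
    for d, r2 in dist_r2:
        if not math.isnan(r2):
            bins.setdefault(int(d // bin_mb), []).append(r2)
    for k in sorted(bins):
        if mean(bins[k]) < threshold:
            return (k + 1) * bin_mb
    return None
```

A `None` result corresponds to the situation reported for C1 and C2, where LD shows almost no decay over the observed distances. Published analyses often fit a decay curve (for example Hill and Weir's model) instead of binning; the binned version is used here only because it is short.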

Selected regions and candidate QTLs analysis. Scanning of genomic regions indicated a reduction in genetic diversity. In total, we identified 376 selected regions covering 3.03% (21.26 Mb) of the assembled genome (Table 4; Fig. S3). More than 96% of these regions were distributed on the C subgenome (Fig. 3; Table 4). Chromosome C6 had the largest total size of selected regions (4.56 Mb), while A3 had the smallest (0.02 Mb), and chromosomes A1, A5, A7, A8, A9, A10 and C5 contained no selected regions. The mean size of selected regions per chromosome was 0.14 Mb, 2.55 Mb, and 1.52 Mb for the A subgenome, the C subgenome, and the whole (AC) genome, respectively. Many QTLs for yield and yield-related traits were located in these selected regions (Supplementary Table S1), which likely contributed to the increase in rapeseed yield and GCA. The distribution of genetic diversity within the selected regions differed among the 19 chromosomes of the rapeseed genome. In particular, … 35 . QTL hot spots containing important QTLs for rapeseed yield and yield-related traits were also detected in the region on chromosome C3 (Supplementary Table S1). These QTLs in the selected regions provide a potential resource for rapeseed breeding; selection for them may have reduced genetic diversity in these regions while increasing rapeseed yield.

Table 3. Genetic diversity of the genome in the R population and the selected population. Chr represents the chromosome; A, C and AC represent the A subgenome, the C subgenome, and the whole genome of rapeseed, respectively. a The average genetic diversity (π) of the R population. b The average genetic diversity (π) of the selected population. c The ratio by which genetic diversity decreased.
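The rule for calling a reported QTL a candidate (its linked marker or mapped interval lies in, or overlaps, a selected region) is a plain interval-overlap test. A minimal sketch, assuming positions are physical coordinates in Mb on the same reference assembly and treating a single marker as a zero-length interval; the QTL names and coordinates in the usage note are hypothetical.

```python
"""Flag reported QTLs whose interval overlaps a selected region (sketch)."""
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start_mb: float
    end_mb: float


def overlaps(a: Interval, b: Interval) -> bool:
    """Closed-interval overlap on the same chromosome (touching ends count)."""
    return a.chrom == b.chrom and a.start_mb <= b.end_mb and b.start_mb <= a.end_mb


def candidate_qtls(qtls, regions):
    """Return names of QTLs (dict name -> Interval) overlapping any selected region."""
    return [name for name, q in qtls.items() if any(overlaps(q, r) for r in regions)]
```

For example, with regions `[Interval("C3", 10.0, 10.5), Interval("C6", 30.0, 31.2)]`, a QTL at `Interval("C3", 10.4, 11.0)` is flagged and one at `Interval("A2", 5.0, 6.0)` is not. For hundreds of regions and QTLs a sorted sweep or interval tree would be faster, but the quadratic scan keeps the criterion easy to read.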
Pedigree breeding history reconstruction. We detected genomic changes between the genealogy lines by reconstructing the recombination events that gave rise to the inbred lines zhongshuang5 and zhongshuang4, both of which were derived from zhongyou821, and by tracing chromosome segments through the pedigree breeding of the two lines. In total, zhongshuang5 inherited 15.41% of its genome from the ancestral line zhongyou821, while zhongshuang4 inherited 34.06% (Table 4; Fig. S4). Zhongshuang5 inherited 24.17% of the A subgenome and 10.26% of the C subgenome from zhongyou821, whereas zhongshuang4 inherited only 14.37% of the A subgenome but 45.63% of the C subgenome. For six of the 19 chromosomes (A1, A5, C1, C6, C7 and C8), more than half of the chromosome was inherited from zhongyou821 by zhongshuang4; for C6 and C7, almost the whole chromosome was inherited. In contrast, no segments of eight chromosomes (A2, A3, A4, A6, A7, A8, A9 and C2) were inherited by zhongshuang4. Of the 45.63% inherited component, 84.39% came from the C subgenome. These findings were consistent with the analysis of the selected regions.

Table 4. Summary of the size and distribution of selected regions and IBD regions between the genealogy lines. Chr represents the chromosome; A, C and AC represent the A subgenome, the C subgenome, and the whole genome of rapeseed, respectively. Zy821, zhongyou821; zs5, zhongshuang5; zs4, zhongshuang4; SR, selected region; IBD, identity by descent. a Genome size covered by all SNPs on each chromosome. b Total size of selected regions on each chromosome. c Total size of IBD regions on each chromosome between zy821 and zs5. d Total size of IBD regions on each chromosome between zy821 and zs4. e Percentage of each chromosome covered by IBD regions shared between zy821 and zs5. f Percentage of each chromosome covered by IBD regions shared between zy821 and zs4.
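The inheritance percentages above are ratios of IBD segment length to the SNP-covered genome size, computed per chromosome, per subgenome, or genome-wide. A minimal sketch of that arithmetic, assuming IBD segments are given as (start, end) pairs in Mb per chromosome and that chromosome sizes are the SNP-covered sizes described in Table 4 (footnote a); overlapping segments are merged first so no stretch is counted twice. The function names and example numbers are hypothetical.

```python
"""Percentage of a genome (or subgenome) lying in IBD segments (sketch)."""


def merged_length(segments):
    """Total length (Mb) covered by possibly overlapping (start, end) segments."""
    total = 0.0
    cur_start = cur_end = None
    for s, e in sorted(segments):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # extend the current run
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def inherited_percent(ibd_by_chrom, chrom_size_mb, chroms=None):
    """Percent of the listed chromosomes (default: all) covered by IBD segments."""
    chroms = list(chrom_size_mb) if chroms is None else chroms
    ibd = sum(merged_length(ibd_by_chrom.get(c, [])) for c in chroms)
    size = sum(chrom_size_mb[c] for c in chroms)
    return 100.0 * ibd / size
```

Restricting `chroms` to names starting with "A" or "C" gives the subgenome-level percentages; for instance, `inherited_percent({"C6": [(0, 10)], "A1": [(0, 5)]}, {"C6": 10, "A1": 20})` returns 50.0.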
Fixed SNPs provide a reference index for population improvement. By examining the allele frequencies of genome-wide SNPs, we identified a total of 403 fixed SNPs in the genotype data sets, 214 in the A subgenome and 189 in the C subgenome (Fig. 5). The allele frequencies of these SNPs were fixed at 0 or 1 in the selected group and subsequent generations; these loci have lost the alternative allele and are monomorphic in the subsequent population.
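Identifying fixed SNPs amounts to computing allele frequencies in each group and flagging loci that are still segregating in the base population but have reached frequency 0 or 1 in the selected group. The sketch assumes genotypes coded as allele dosages (0, 1, 2, or None for missing); requiring polymorphism in the base population is an assumption added here so that loci that were never variable are not counted. Names and data are hypothetical.

```python
"""Flag SNPs that are fixed in the selected group but segregating in the base population."""


def allele_freq(genotypes):
    """Frequency of the counted allele from 0/1/2 dosages; None marks missing calls."""
    calls = [g for g in genotypes if g is not None]
    return sum(calls) / (2 * len(calls)) if calls else None


def fixed_snps(base, selected):
    """SNP IDs with 0 < p < 1 in `base` and p in {0, 1} in `selected`.

    base, selected: dict snp_id -> list of dosages for the individuals in each group.
    """
    out = []
    for snp, g_sel in selected.items():
        p_sel = allele_freq(g_sel)
        p_base = allele_freq(base[snp])
        if p_sel in (0.0, 1.0) and p_base is not None and 0.0 < p_base < 1.0:
            out.append(snp)
    return out
```

Because a frequency of exactly 0 or 1 requires every non-missing call to be homozygous for the same allele, this check is sensitive to genotyping errors; a real analysis might allow a small tolerance.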

Discussion
A yield plateau tends to occur when genetic diversity is limited 2 . Domestication and breeding programs have significantly reduced crop genetic diversity 1,2 . To enhance diversity, we used subgenome components from related Brassica species. Through recurrent selection for GCA improvement, some desirable loci were maintained in the population and other, undesirable loci were purged from it. Our analysis provides useful information on the genetic basis of GCA in rapeseed.
The LD extent in this population was larger than that previously reported for natural populations. Breeding selection for favorable alleles is expected to increase LD between loci in the genome 36 . In this study, we observed strong LD between SNPs separated by up to 2.5 Mb (r² = 0.2). This value was higher than those obtained in previous studies [37][38][39] , which reported LD extents of about 500 Kb, 700 Kb, and 2 cM, respectively. Those studies used resource populations collected from around the world, which contained greater rapeseed genetic variation. In contrast, our population was derived from several artificially synthesized B. napus lines and subgenome materials and was then improved over subsequent generations, which may have contributed to the higher LD. These findings may be useful for marker-assisted recurrent selection. Our results also showed higher LD in the C subgenome, especially on chromosomes C1, C2 and C4, consistent with previous results 40,41 . Several explanations are possible: the C subgenome had lower genetic variability than the A subgenome, and it may have been under more intense selection pressure in our breeding program. Population genetic analysis also showed a greater decrease in diversity in the C subgenome than in the A subgenome (Table 3), further supporting the higher LD of the C subgenome.
The C subgenome is a repository for a wider range of selected regions with favorable loci contributing to rapeseed agronomic traits. By detecting changes in genomic diversity, we identified 376 genomic regions covering 3.03% (21.26 Mb) of the rapeseed genome. Many important QTLs for yield and yield-related traits were located in some of these selected regions (Supplementary Table S1); in particular, some of these regions harboured QTL hotspots (for one or multiple traits) or significant QTLs reported in other studies (Fig. 4). We found that 96.05% of these selected regions were distributed on the C subgenome (Fig. 3; Table 4) and only about 3.95% on the A subgenome. This indicates that the C subgenome had sustained more pressure in the selection program, or that it contributed more to yield-related GCA than the A subgenome. Differences in genomic background between the genealogy lines further support this conclusion (Table 4; Fig. S4). In China, to improve the adaptation of European and Japanese varieties, breeders introgressed A-genome components of B. rapa into the B. napus genome 31,42 . This process enhanced the genetic diversity of the A subgenome in Chinese rapeseed, whereas the breeding potential of the C subgenome has been little developed or utilized. The genetic background of our population included European winter-type rapeseed, which has higher genetic variation in the C subgenome than in the A subgenome 37 . Furthermore, the subgenome materials (A^rA^rC^cC^c) carry C^c genome introgressions from B. carinata, and the artificially synthesized materials carry C^o genome introgressions from B. oleracea, which may also have increased the genetic variation of the C subgenome in B. napus. These new genetic components of the C subgenome could potentially improve rapeseed yield.
The results of the present investigation, together with a deeper understanding of heterosis and changes in breeding programs, indicate that the C subgenome should be more fully exploited in rapeseed hybrid breeding.
Recurrent selection is an established and very useful plant breeding method 43,44 . Through continued recombination and selection, it can break linkages with unfavorable alleles and pyramid favorable alleles. In this study, we used recurrent selection to improve the GCA level of the R population, selecting the top 20% of individuals by GCA for the next generation. Genetic analyses showed that many genomic regions were under selection, and these regions might play an important role in rapeseed breeding. We suggest that favorable alleles could be further accumulated through marker-assisted selection (MAS) combined with continued selection, which may assist in the development of improved rapeseed lines.
In summary, we conducted a comprehensive analysis of changes in genomic variation and identified a number of genomic regions and loci subjected to selection. First, the A subgenome had slightly higher genetic diversity than the C subgenome, and both subgenomes showed high LD, which may be beneficial for MAS. Second, breeding selection may decrease the genetic diversity of the population, causing some allelic variants to disappear or approach fixation. Third, most of the selected regions were distributed on the C subgenome, indicating that the C subgenome might have been under stronger selection pressure or contributed more to GCA improvement in rapeseed hybrid breeding. Finally, we identified several potential selection targets, including genomic regions and loci, which provide further insight for rapeseed research and improvement.

Materials and Methods
Plant materials, phenotype evaluation, and GCA estimation. We used two new types of rapeseed material (artificially synthesized B. napus and subgenome materials) together with winter-type rapeseed in this investigation: (1) 41 artificially synthesized B. napus lines from the University of Goettingen, Germany, were crossed with three winter-type lines from Sweden (SW0736, SW0740 and SW0784) in 1999, and the F5 families were crossed with Ogura-INRA CMS lines and the restorer line R2000 in 2004; (2) seven subgenome materials (A^rA^rC^cC^c) 45 , two Pol CMS restorers (5148R and 6178R) and the yellow-seed-coat variety No2127 were crossed with R2000. The F1 plants obtained from (1) and (2) were used to construct the recurrent selection population in 2005 at Huazhong Agricultural University, China. A recurrent selection program was used to improve the GCA level of the population. Each population was randomly pollinated in an isolated environment, and seeds harvested from the sterile plants were used to construct the next-generation population. Seed quality (oil content, glucosinolate and erucic acid) was also considered an important breeding goal. In 2012, we randomly selected 175 plants carrying the Rfo gene and crossed them with three testers (Yu7-120, Yu7-126 and Yu7-140) following the North Carolina Design II (NC II) 46 , producing a total of 525 hybrids. All hybrids and 178 parental lines were sown in three semi-winter rapeseed environments in China (Wuhan, 29°58′ N, 113°53′ E; Xiangyang, 32°04′ N, 112°05′ E; and Yichang, 30°40′ N, 111°45′ E). Field trials followed a completely randomized design with two replications at each location. The general combining ability (GCA) of each parental line was calculated as $g_i = \bar{y}_i - \bar{y}$, where $g_i$ is the GCA of parental line $P_i$, $\bar{y}_i$ is the mean of all crosses sharing parent $P_i$, and $\bar{y}$ is the mean of all crosses 47 .
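The GCA formula can be applied directly to a line-by-tester table of hybrid means. A minimal sketch, assuming each hybrid's yield has already been averaged over replicates (and environments, if a pooled GCA is wanted) and that every line was crossed with the same three testers, as in the NC II design described above; the function name and toy values are hypothetical.

```python
"""GCA of each line from a balanced line x tester table: g_i = mean_i - grand mean."""
from statistics import mean


def gca_by_line(cross_means):
    """cross_means: dict line -> list of hybrid means (one per tester).

    Returns dict line -> g_i, the line's mean over its crosses minus the mean
    of all crosses.
    """
    grand = mean(y for ys in cross_means.values() for y in ys)
    return {line: mean(ys) - grand for line, ys in cross_means.items()}
```

With `{"L1": [10, 12, 14], "L2": [6, 8, 10]}` the grand mean is 10, so L1 gets +2 and L2 gets -2. In a balanced design the line GCAs sum to zero, which is a quick sanity check; formal analyses usually estimate GCA from a linear model that also handles replicates and environment effects.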
Based on GCA, we applied a selection intensity of 20%, which was chosen to avoid a rapid decrease in the genetic diversity of the population. Accordingly, 35 lines with high GCA were selected from the population and defined as the selected population (or group).
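Truncation selection at 20% of 175 lines gives 0.20 × 175 = 35 selected lines, matching the number reported. A minimal sketch of this step, assuming GCA values are held in a dict keyed by line name; the function name and the tie-breaking rule (alphabetical by line name) are hypothetical choices.

```python
"""Truncation selection: keep the top fraction of lines ranked by GCA (sketch)."""


def select_top(gca, fraction=0.20):
    """Return the round(fraction * N) lines with the highest GCA (at least one).

    Ties in GCA are broken alphabetically by line name so the result is deterministic.
    """
    k = max(1, round(fraction * len(gca)))
    ranked = sorted(gca, key=lambda line: (-gca[line], line))
    return ranked[:k]
```

For 175 lines, `select_top(gca)` returns 35 names. A lower fraction would raise the expected gain per cycle but erode diversity faster, which is the trade-off the 20% intensity is meant to balance.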
The other three genealogy lines (zhongyou821, zhongshuang4, and zhongshuang5; Fig. S1) were used to detect genome changes during pedigree breeding. Zhongyou821 is highly regarded in rapeseed breeding in China, and many elite inbred lines, including both open-pollinated cultivars and hybrid parents, were developed from it. For example, zhongshuang4 and zhongshuang5 were both derived from zhongyou821 and bred as open-pollinated cultivars. Recently, F1 hybrids between zhongshuang4 and Pol CMS lines were found to exhibit excellent heterosis; zhongshuang4 is therefore considered a good restorer line and has been used to develop several other hybrid cultivars and restorer lines.
SNP filtering and genotype analysis. Genomic DNA was extracted from young leaves using the cetyltrimethylammonium bromide (CTAB) method 48 . The Illumina BrassicaSNP60 BeadChip, containing 52,157 SNPs, was used to genotype this panel following the manufacturer's protocol (http://www.illumina.com/technology/infinium_hd_assay.ilmn). SNP data were clustered and called automatically using the Illumina GenomeStudio genotyping software. SNPs with no polymorphism or with more than 10% missing values were excluded. The source sequences of the remaining SNPs were located by BlastN searches against the Darmor-bzh reference genome sequence 49 (http://www.genoscope.cns.fr/brassicanapus), and SNPs with an ambiguous physical position or multiple BLAST hits were also excluded from the genotype data sets.
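The three SNP exclusion rules (no polymorphism, more than 10% missing calls, and no unique reference hit) can be written as a single filter. This is a sketch under assumptions: genotype calls are strings such as "AA", "AB", "BB" with None for missing, and the number of unique BLAST hits per SNP is supplied as a precomputed dict. Both input formats and the function name are hypothetical and do not reflect GenomeStudio or BLAST output formats.

```python
"""Apply the SNP exclusion rules: polymorphic, <= 10% missing, one unique hit (sketch)."""


def filter_snps(genotypes, blast_hits, max_missing=0.10):
    """Return IDs of SNPs that pass all filters, in input order.

    genotypes: dict snp_id -> list of calls ("AA", "AB", "BB" or None if missing)
    blast_hits: dict snp_id -> number of reference hits (hypothetical precomputed input)
    """
    kept = []
    for snp, calls in genotypes.items():
        n = len(calls)
        observed = [c for c in calls if c is not None]
        missing_rate = (n - len(observed)) / n if n else 1.0
        if missing_rate > max_missing:
            continue  # too many missing calls
        if len(set(observed)) < 2:
            continue  # no polymorphism among observed calls
        if blast_hits.get(snp, 0) != 1:
            continue  # unmapped, ambiguous position, or multiple hits
        kept.append(snp)
    return kept
```

The missing-rate test uses a strict `>`, so a SNP with exactly 10% missing calls is kept, which is one reading of "missing value > 10%".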
Population genetic and linkage disequilibrium analysis. Genetic diversity (π), polymorphism information content (PIC) and allele frequencies of each SNP on the 19 chromosomes were estimated with PowerMarker 50 . LD between SNPs was calculated using all markers in TASSEL version 5.1 51 , and LD decay was evaluated from the r² values and the corresponding physical distances between SNP pairs.
Detection of selected regions, fixed SNPs and candidate QTLs. To calculate diversity changes across the genome, a sliding-window method was applied to each chromosome separately, with a window size of five SNPs and a step of two SNPs. The ratio of genetic diversity in each window between the basic and selected populations was used to identify genomic regions affected by selection: $\pi_{\mathrm{ratio}} = \pi_{\mathrm{basic}} / \pi_{\mathrm{selected}}$. The top 5% of windows were taken as candidate regions for further analysis. In addition, we analyzed many reported QTLs for rapeseed yield and yield-related traits; if the closely linked markers or the mapped interval of a QTL were located in or overlapped a selected region, we considered it a candidate selected QTL. We also calculated the allele frequency of each SNP on the 19 chromosomes and defined as fixed SNPs those at which one allele reached a frequency of 100% in the selected population.
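The per-SNP statistics and the window scan can be sketched as follows. Assumptions, stated loudly: per-SNP π is taken as expected heterozygosity 1 - (p² + q²) without sample-size correction, PIC uses the Botstein et al. formula for two alleles, a window's π is the mean of its SNPs' π values, and windows from all chromosomes are pooled before taking the top 5%. PowerMarker's exact estimators may differ, and all function names are hypothetical.

```python
"""Per-SNP diversity, PIC and a sliding-window pi_basic / pi_selected scan (sketch)."""
import math
from statistics import mean


def gene_diversity(p):
    """Expected heterozygosity 1 - (p^2 + q^2) for a biallelic SNP (no n/(n-1) correction)."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def pic(p):
    """Botstein et al. PIC for a biallelic SNP: 1 - (p^2 + q^2) - 2 p^2 q^2."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * (p * p) * (q * q)


def window_ratios(chrom, pi_basic, pi_selected, size=5, step=2):
    """Windows of `size` consecutive SNPs, advancing `step` SNPs, on one chromosome.

    Returns (chrom, first_snp_index, last_snp_index, ratio) with
    ratio = mean pi_basic / mean pi_selected (inf if the selected group is monomorphic).
    """
    out = []
    for i in range(0, len(pi_basic) - size + 1, step):
        b = mean(pi_basic[i:i + size])
        s = mean(pi_selected[i:i + size])
        out.append((chrom, i, i + size - 1, b / s if s > 0 else math.inf))
    return out


def top_windows(windows, fraction=0.05):
    """Keep the top `fraction` of windows (pooled across chromosomes) by ratio."""
    k = max(1, round(fraction * len(windows)))
    return sorted(windows, key=lambda w: w[3], reverse=True)[:k]
```

Because windows overlap (step 2, size 5), adjacent top windows would normally be merged into contiguous selected regions before sizes such as those in Table 4 are summed; that merging step is not shown.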
Genome changes during pedigree breeding. We used Beagle 4.1 52 to detect chromosome segments identical by descent (IBD) 53 between the two half-sib sister lines (zhongshuang4 and zhongshuang5) and their common ancestor (zhongyou821) using genome-wide SNP markers. The significance threshold was set at P = 1 × 10⁻⁷. Uncertain regions (not assigned to a defined IBD segment) were split equally between the two adjacent blocks. We calculated the proportion of each line's genome inherited from zhongyou821 and colored chromosome segments according to IBD type.
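The rule of splitting an uncertain region equally between its two adjacent blocks can be sketched as a midpoint assignment along an ordered list of blocks. Assumptions: blocks are contiguous (start, end, label) tuples in Mb along one chromosome, runs of consecutive uncertain blocks are treated as one region, and an uncertain region at a chromosome end goes wholly to its only neighbour (a case the text does not specify). The label strings and function name are hypothetical.

```python
"""Assign each 'uncertain' block half to each neighbouring block (sketch)."""


def resolve_uncertain(blocks, uncertain="uncertain"):
    """blocks: ordered, contiguous list of (start_mb, end_mb, label) on one chromosome.

    Returns a list of (start_mb, end_mb, label) with no uncertain blocks; adjacent
    blocks that end up with the same label are merged.
    """
    merged = []  # first collapse runs of consecutive uncertain blocks
    for s, e, lab in blocks:
        if merged and lab == uncertain and merged[-1][2] == uncertain:
            merged[-1][1] = e
        else:
            merged.append([s, e, lab])
    n = len(merged)
    for i, (s, e, lab) in enumerate(merged):
        if lab != uncertain:
            continue
        left = merged[i - 1] if i > 0 else None
        right = merged[i + 1] if i + 1 < n else None
        if left and right:
            mid = (s + e) / 2.0  # split the uncertain region at its midpoint
            left[1], right[0] = mid, mid
        elif left:
            left[1] = e  # chromosome end: give it all to the only neighbour
        elif right:
            right[0] = s
    resolved = []
    for s, e, lab in merged:
        if lab == uncertain:
            continue
        if resolved and resolved[-1][2] == lab and resolved[-1][1] == s:
            resolved[-1] = (resolved[-1][0], e, lab)
        else:
            resolved.append((s, e, lab))
    return resolved
```

For example, an IBD block at 0 to 4 Mb, an uncertain block at 4 to 6 Mb and a non-IBD block at 6 to 10 Mb resolve to IBD from 0 to 5 Mb and non-IBD from 5 to 10 Mb.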