Genome-wide indel/SSR scanning reveals significant loci associated with excellent agronomic traits of a cabbage (Brassica oleracea) elite parental line ‘01–20’

Elite parental lines are of great significance to crop breeding. To discover unique genomic loci associated with excellent economic traits in the elite cabbage inbred-line ‘01–20’, we performed comparisons of phenotypes as well as whole-genome insertion-deletion/simple sequence repeat loci between ‘01–20’ and each of its five sister lines. ‘01–20’ has a range of excellent agronomic traits, including early-maturing, and improvements in plant type and leaf colour. Eight unique loci were discovered for ‘01–20’ and ‘01-07-258’, another elite line similar to ‘01–20’ at the whole-genome level. In addition, two excellent double-haploid lines derived from a cross of ‘01–20’ also inherited these loci. Based on the quantitative trait locus association results, five of these loci were found to be associated with important agronomic traits, which could explain why the elite parent ‘01–20’ possesses greener outer leaves, a more compact and upright plant-type, rounder head, shorter core length, and better taste. Additionally, some of these loci have clustering effects for quantitative trait loci associated with different traits; therefore, important genes in these regions were analysed. The obtained results should enable marker-assisted multi-trait selection at the whole-genome level in cabbage breeding and provide insights into significant genome loci and their breeding effects.

Phenotyping. Observations and measurements of the phenotypes were performed for the main agronomic traits of '01-20', its five sister lines ('01-07-258', '01-07-251', '01-1-4', '01-88' and '01-16-5'), two DH lines ('D77' and 'D83') and the parental population. In total, 25 traits from three categories were assessed (Table 2): plant-type-related traits, including plant diameter (Pd), plant height (Ph) and plant type (Pt) measured by two methods (see Table 2); leaf-related traits, including leaf colour (Lc), leaf number, leaf surface (Ls), leaf wax powder, petiole length (Pl) and petiole width (Pw); head-related traits, including head colour, head maturity period (Hm), head weight (Hw), head vertical diameter (Hvd), head transverse diameter (Htd), core length (Cl), ratio of core length to head vertical diameter (Cl/Hvd), ratio of core width to head transverse diameter (Cw/Htd), head shape index (Hsi), head solidity (Hs), dry matter content (Dmc) and crude fibre content (Cfc); and the trait of seed size (Ss). These traits were assessed following the standards described in 'Descriptors and data standards for cabbage' 9 at the rosette or head harvesting stage (Table 2). In addition, Dmc was determined by drying, and Cfc by acid and alkali digestion, in accordance with the Association of Official Analytical Chemists (AOAC) standards 10 (Table 2). For colour-related traits, a CR-400 colour difference meter (Konica Minolta, Shanghai, China) was used to assay the leaf and head colour coordinates a* (redness to greenness), b* (yellowness to blueness) and L* (lightness to darkness) (CIE 1976 L*a*b* standards) with a standard D65 light source, 0° diffuse illumination and a 2° viewing angle (CIE 1931) against a dark background.
Average values for each trait of each line were calculated from three randomly selected plants in each plot at the rosette or harvesting stage. Adjusted means for the traits were obtained and used for further analysis.

Table 2 (excerpt; only the rows below are recoverable). Trait, abbreviation, stage and measurement method.
Dry matter content (Dmc): the head was cut open and sliced to 1-2 cm after removing the core, and 500 g was randomly sampled and dried to constant weight (M) at 105 °C. Dmc = M/500 × 100% (AOAC standards 10).
Crude fibre content (Cfc); harvesting stage: the crude fibre content was assayed by acid digestion and alkali digestion (AOAC standards 10).
Seed size (Ss); seed: the diameter of a seed.

Genotyping. Whole-genome scanning for insertion/deletion (indel) and SSR loci was performed for all cabbage materials, using 406 pairs of SSR and indel primers and the corresponding genetic maps. These markers and maps were developed in our previous study for gene mapping and the detection of quantitative trait loci (QTLs) 8,11, in which we discovered robust QTLs and QTL clusters for 24 main agronomic traits using a DH population of 196 lines derived from a cross of '01-20' × '96-100', indicating that the whole-genome indel/SSR loci and the QTLs were highly reliable. The molecular marker assay protocol was as follows. Each polymerase chain reaction (PCR) mixture had a volume of 20 μl and contained 2 μl of PCR buffer (10×, Mg2+ included), 1.6 μl of dNTP (2.5 mM each), 0.4 μl of Taq DNA polymerase (2.5 U/μl), 5 μl of DNA template (40 ng/μl), 0.6 μl of forward primer (10 μM), 0.6 μl of reverse primer (10 μM) and 9.8 μl of ddH2O. The reaction mixture was incubated in a thermal cycler at 94 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 45 s, and finally 72 °C for 7 min. The PCR products were separated on 8% polyacrylamide gels, which were run at 160 V for 1.5 h and then silver stained 12. The electrophoresis band patterns for each primer pair were scored as follows: a band pattern the same as that of '01-20' was recorded as 'a', a band pattern the same as that of '96-100' as 'b', and a third pattern as 'c'.
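The reaction recipe above can be checked and scaled with a short script. The following is a minimal sketch, not part of the original protocol: the per-reaction volumes and stock concentrations come from the text, whereas the function names, the 10% pipetting overage and the choice to add the DNA template per tube are illustrative assumptions.

```python
# Per-reaction volumes (in microlitres) as stated in the genotyping protocol.
PER_REACTION_UL = {
    "pcr_buffer_10x": 2.0,
    "dntp_2.5mM_each": 1.6,
    "taq_2.5U_per_ul": 0.4,
    "dna_template_40ng_per_ul": 5.0,
    "forward_primer_10uM": 0.6,
    "reverse_primer_10uM": 0.6,
    "ddH2O": 9.8,
}

TOTAL_UL = sum(PER_REACTION_UL.values())  # should come to 20 ul


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Final concentration in the tube, in the same unit as the stock."""
    return stock * volume_ul / total_ul


def master_mix(n_reactions, overage=0.10, per_tube=("dna_template_40ng_per_ul",)):
    """Scale the shared components for n reactions.

    `overage` (assumed 10%) compensates for pipetting loss; components
    listed in `per_tube` are left out because they differ between tubes.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name not in per_tube
    }


print(f"total volume: {TOTAL_UL:.1f} ul")
print(f"final primer concentration: {final_concentration(10.0, 0.6):.2f} uM")
print(master_mix(10))
```

The components sum to the stated 20 μl, and each primer ends up at a final concentration of 0.3 μM. When many primer pairs are run on few templates, the primers rather than the template could be passed in `per_tube`.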
Associated traits and candidate gene analysis for the distinctive loci. In a previous study, we mapped 144 QTLs for 24 agronomic traits using a DH population of 196 lines derived from a cross between '01-20' and '96-100' 13. Based on the map and QTL information, we determined the positions of the distinctive loci that had been identified in the above molecular marker assays on the map constructed for the QTL analysis, to clarify the association between the traits and these loci. QTLs were named using the following criteria: abbreviation of the trait name, followed by chromosome code and QTL code. For example, Ph 1.2 represents the second QTL on chromosome C01 associated with plant height.
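The naming rule can be made concrete with a small parser. This is an illustrative sketch only: the text gives a single example (Ph 1.2), so the exact syntax assumed here (trait abbreviation, optional space, chromosome number, a dot, QTL index) and the names `QTLName` and `parse_qtl_name` are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QTLName:
    trait: str        # trait abbreviation, e.g. "Ph"
    chromosome: str   # chromosome code, e.g. "C01"
    index: int        # ordinal of the QTL on that chromosome for that trait


# Assumed syntax: <trait><optional space><chromosome number>.<QTL index>
_QTL_PATTERN = re.compile(r"^(?P<trait>\S+?)\s*(?P<chrom>\d+)\.(?P<index>\d+)$")


def parse_qtl_name(name: str) -> QTLName:
    """Split a QTL name such as 'Ph 1.2' into trait, chromosome and index."""
    match = _QTL_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised QTL name: {name!r}")
    chromosome = f"C{int(match.group('chrom')):02d}"
    return QTLName(match.group("trait"), chromosome, int(match.group("index")))


print(parse_qtl_name("Ph 1.2"))
```

Under these assumptions, "Ph 1.2" resolves to trait Ph, chromosome C01 and QTL index 2, matching the example in the text.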
The genes located in the regions associated with the distinctive loci were analysed and compared with those in Arabidopsis, using the annotations for the Brassica oleracea reference genome acquired from BRAD (http://brassicadb.org/brad/).

Equipment and settings. Images in Figs 1, 2 and 3 were taken using a SONY DSC-HX30 camera (Sony Co. Ltd., Tokyo, Japan) and were edited using Photoshop CS6 software (Adobe Systems Inc., San Jose, CA,
Among the other sister lines, '01-07-251' had the most upright Pt, the largest plant expansion (Pd and Ph), the highest Hw, the lowest Dmc and Cfc, and the largest Ss, but the longest maturing period (58.67 d) and the longest Cl (8.58 cm); '01-1-4' had the most wrinkled leaf surface and the shortest Hm and Cl, but the smallest head (0.55 kg); '01-88' had the most patulous Pt, a higher Hw and a shorter Hm, but the highest Dmc (6.42) and a low tolerance to splitting; and '01-16-5' had a patulous plant type and greener outer leaves, but the highest Cl/Hvd (0.66) and Cfc (0.61), and the lowest tolerance to splitting. In addition, although 'Early Vikings' showed relatively low uniformity, some individuals still showed outstanding performance in terms of Hm, Hw, leaf and head colour, Hsi and Hs (Fig. 2).
Of the two DH lines, 'D77' had the most upright Pt, the greyest Lc and Hc, the largest amount of leaf wax powder (Lw), the fewest leaves and the longest maturing period. 'D83' exhibited the largest plant expansion (Pd and Ph), the highest Hw, and the highest levels of both Dmc and Cfc. Both of these lines have excellent combining ability (data not shown), and thus their use in hybrid seed production is promising.
Thus, '01-20' had a range of excellent agronomic traits and no obvious defects; it not only inherited the early-maturing trait from 'Early Vikings' but also exhibited improvements in Pt, Lc, and head- and quality-associated traits. The DH lines also inherited certain excellent traits from '01-20', such as high Hw, low Cl/Hvd and compact Pt.

Figure 3 shows some of the polymorphic markers for the six sister lines. Whole-genome scanning for indel/SSR loci in the six lines revealed that '01-20' had one unique locus, '01-07-258' had one unique locus, and '01-20' and '01-07-258' had six common loci that differed from those of the other four sister lines (Table 4). In total, 385 of the 406 loci (95%) were the same for '01-20' and '01-07-258', and the loci on C02, C04 and C07 were all the same between the two lines. This is in accordance with their being similar lines, and it also indicates that analysing their unique loci together is appropriate.
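The comparison described above can be sketched as a simple scan over scored band patterns. The example below is an illustration under stated assumptions, not the authors' pipeline: the marker and line names (`M1`, `elite_1`, `sister_1`, and so on) and all genotype calls are hypothetical toy values, and a locus is treated as distinctive when the target lines share one allele that none of the other lines carries.

```python
def distinctive_loci(calls, targets, others):
    """Markers where all target lines share an allele absent from every other line.

    `calls` maps marker name -> {line name -> allele code ('a', 'b' or 'c')}.
    """
    hits = []
    for marker, by_line in calls.items():
        target_alleles = {by_line[line] for line in targets}
        if len(target_alleles) != 1:
            continue  # the target lines disagree at this marker
        (allele,) = target_alleles
        if all(by_line[line] != allele for line in others):
            hits.append(marker)
    return hits


def identical_fraction(calls, line_a, line_b):
    """Fraction of markers at which two lines carry the same allele."""
    same = sum(1 for by_line in calls.values() if by_line[line_a] == by_line[line_b])
    return same / len(calls)


# Hypothetical toy calls (NOT data from the study).
sisters = ["sister_1", "sister_2", "sister_3", "sister_4"]
toy_calls = {
    "M1": dict(elite_1="a", elite_2="a", sister_1="b", sister_2="b", sister_3="b", sister_4="b"),
    "M2": dict(elite_1="a", elite_2="b", sister_1="b", sister_2="b", sister_3="b", sister_4="b"),
    "M3": dict(elite_1="a", elite_2="a", sister_1="a", sister_2="a", sister_3="a", sister_4="a"),
    "M4": dict(elite_1="a", elite_2="c", sister_1="a", sister_2="a", sister_3="a", sister_4="a"),
    "M5": dict(elite_1="a", elite_2="a", sister_1="a", sister_2="b", sister_3="b", sister_4="b"),
}

print(distinctive_loci(toy_calls, ["elite_1", "elite_2"], sisters))   # loci common to both elite lines
print(distinctive_loci(toy_calls, ["elite_1"], ["elite_2"] + sisters))  # loci unique to elite_1
print(identical_fraction(toy_calls, "elite_1", "elite_2"))
```

Applied to the reported counts, the same ratio is 385/406 ≈ 0.948, which is the 95% similarity quoted for '01-20' and '01-07-258'.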
The distinctive loci were located on chromosomes C02, C03, C05, C08 and C09 (Fig. 4). Some of them clustered together in the same genomic region. For example, the genetic distance between the loci Indel26 and Indel488 was shown to be 1.3 cM on C02, and the distance between scaffold29640 and Indel64 was 0.7 cM on C03. Figure 4 shows the allele types of all of the distinctive loci for the eight lines.
Of the seven distinctive loci identified for '01-20', those on C02, C03 and C08 were also found in the two excellent DH lines, 'D77' and 'D83'; the only exception was Indel139 on C05 (Fig. 4). This further indicates the significant effects of these loci.
Five of the distinctive loci are associated with important agronomic traits. The eight newly discovered loci, described above, may help explain why '01-20' and '01-07-258' became elite parental lines. In a previous study 13, we identified the genomic regions associated with 24 important agronomic traits. Thus, the distinctive loci were anchored to the map published by Lv 13. We found that five of them, located at the major QTL cluster regions (Fig. 4; orange: the most significant QTL cluster regions; blue: other major QTL cluster regions) on four chromosomes, were associated with important agronomic traits, including Lc, Hc, Pt, leaf length, Pd and Hs (Table 5). These QTLs were found to explain 6.0-26.1% of the phenotypic variance of these traits.
Several distinctive loci were found to be associated with Lc: Indel26 and Indel353 were associated with Lca*, explaining 8.0-9.5% of its phenotypic variance. An analysis of the allele variance effect showed that the '01-20' allele contributed to a lower trait value than the other alleles. Thus, it had a negative effect (−a to a: green to red), which explains why '01-20' possesses greener leaves (Table 6). Indel26 was also associated with Lcb*, explaining 12.3% of its phenotypic variance. The '01-20' allele contributed to a higher trait value than the other alleles. Thus, it had a positive effect (−b to b: blue to yellow), which means that the Lc of '01-20' tends to be yellowish rather than blue (Table 6).
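The sign convention for allele effects can be illustrated with a single-marker calculation. The sketch below is a simplification and makes several assumptions: the text does not state the QTL model at this point, so a plain comparison of marker classes in a DH population (every line homozygous 'a' or 'b') stands in for it; the function name and the a* values are hypothetical. The additive effect of the '01-20' allele ('a') is taken as half the difference between class means, and the variance explained as the between-class share of the total sum of squares.

```python
from statistics import mean


def single_marker_effect(genotypes, values):
    """Additive effect of allele 'a' and proportion of variance explained.

    Simplified single-marker analysis for a DH population; lines scored
    with any other code (for example 'c' or missing) are ignored.
    """
    a_vals = [v for g, v in zip(genotypes, values) if g == "a"]
    b_vals = [v for g, v in zip(genotypes, values) if g == "b"]
    if not a_vals or not b_vals:
        raise ValueError("both marker classes must be present")

    used = a_vals + b_vals
    grand_mean = mean(used)
    ss_total = sum((v - grand_mean) ** 2 for v in used)
    ss_between = (
        len(a_vals) * (mean(a_vals) - grand_mean) ** 2
        + len(b_vals) * (mean(b_vals) - grand_mean) ** 2
    )
    additive = (mean(a_vals) - mean(b_vals)) / 2
    explained = ss_between / ss_total if ss_total > 0 else 0.0
    return additive, explained


# Hypothetical leaf a* readings (more negative = greener); not study data.
toy_genotypes = ["a", "a", "a", "b", "b", "b"]
toy_a_star = [-12.0, -11.0, -13.0, -9.0, -10.0, -8.0]

effect, explained = single_marker_effect(toy_genotypes, toy_a_star)
print(f"additive effect of allele 'a': {effect:+.2f}")
print(f"variance explained: {explained:.1%}")
```

A negative additive effect on a* means that lines carrying the 'a' allele sit closer to the green end of the axis, which is the reading given in the text for '01-20'.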
In addition, Indel235 was also associated with Pt, explaining 11.3% of the phenotypic variance. An analysis of the allele variance effect showed that this allele in '01-20' contributed to a higher trait value than the other alleles. That is, it had a positive effect, which means that '01-20' plants tend to be upright rather than patulous (Table 6).
Indel64 was found to be associated with Pd, Ph, Ll, Lw, Pl, Pw and Hs, explaining 6.4-21.2% of the phenotypic variance of these traits. An analysis of the allele variance effect showed that the '01-20' allele contributed to lower trait values than the other alleles, except in the case of Hs. Thus, it had a negative effect, which explains why the expansion of '01-20' plants is smaller and the leaves are short and small, making the plants compact (Table 6).
In addition to Lc, Indel353, the distinctive locus for '01-07-258' on C09, was also associated with the head quality traits Dmc and Cfc, with Hsi and with the core trait Cw/Htd. It explained 10.1% and 13.9% of the phenotypic variance of Dmc and Cfc, respectively. An analysis of the allele variance effect showed that this allele in '01-07-258' (allele type: b, see Table 6) contributed to higher trait values than the other alleles. Thus, it had a positive effect, which means the taste of '01-07-258' is not as tender and crisp as that of '01-20', which has lower Dmc and Cfc levels (Table 6).

In addition, in a previous study, we identified 12 QTL clustering regions associated with different agronomic traits 13. Nine of these clusters were possessed by 75% of these lines. These regions may also be key factors in determining whether a line should be maintained for further selection. Thus, the 12 QTL clustering regions on the nine chromosomes may be the selection foundation, and these distinctive loci are the core of the foundation.
Thus, the distinctive loci in the elite parental lines are associated with important agronomic traits, which could explain, at the genomic level, why '01-20' has greener leaves, is more compact, has a rounder head and a shorter core length, and tastes better. Loci such as Indel64 on C03, and the clustering of QTLs for different significant agronomic traits, may play particularly important roles in the elite parental line '01-20'.

Discussion
Application of elite lines: a double-edged sword. Elite lines are of great significance in two ways. First, in plant breeding, they generate many varieties and contribute greatly to crop production. For example, '01-20', the elite line used in this study, has generated as many as 10 varieties, which have a spring cabbage market share of over 70%. In rice hybrid production, the elite female parent line 'Zhenshan' contributed to a number of widely used rice cultivars in China 16. Second, they are good materials for studying the genetic effects of significant genomic loci associated with excellent traits, which in turn provide the basis for breeding programs 17. Using two types of PCR-based DNA markers, Mahatma et al. estimated the genetic polymorphisms among nine elite cotton parental lines, which suggested that the genetic constituents of 'LRA-5166' are quite different from those of the other eight parental lines 18. In addition, Lai et al. detected more than 1,000,000 single nucleotide polymorphisms, 30,000 indels and 101 low-sequence-diversity chromosomal intervals, as well as hundreds of genes showing presence/absence variation in the maize genome by resequencing six elite maize inbred lines 19. The current study, for the first time, compared the elite cabbage line '01-20', five sister lines and two derived lines for their phenotypic and genetic constituents, and shed light on the key genomic loci that determined '01-20' as an elite line.
Although elite lines can make major contributions to crop production and quality, their excessive use may also create problems. During plant breeding, some important agronomic traits, such as disease resistance, high yield and excellent taste, are constantly under directional selection, resulting in significantly reduced genetic diversity 20. In a study by Hao et al., 4% of the 340 wheat base collections from the Northwest Spring Wheat Region in China were found to represent more than 70% of their entire variation 21. In addition, Ding et al. detected a severe reduction in nucleotide variation at OsAMT1;1, a high-affinity ammonium transporter in rice (Oryza sativa) that controls ammonium uptake capacity, indicating that strong selection on nitrogen uptake-related traits has occurred in rice 22. Similar reports have been published on soybean 2, barley 3 and wheat 23,24. Additionally, the use of '01-20' as one parent can greatly improve the agronomic traits of hybrids regarding Pt, Lc, Hc, head type, production and taste, among others; however, its lack of resistance to Fusarium wilt may put crop production at risk, as this disease is currently affecting cabbage-producing areas in northern China.
Mining for genes/QTLs associated with important traits using elite parental lines. Elite parental lines are constantly subjected to directional selection acting on genes that control desirable traits of agronomic importance during their domestication and improvement. Therefore, the genes or loci with a signature of selection from breeders should be identifiable by whole-genome nucleotide polymorphism scanning using DNA markers 25,26.
With the development of molecular biology and of the sequencing of numerous genomes, uncovering the genetic basis of important traits of elite parental lines has become the focus of theoretical and applied studies. One way to characterise genes responsible for phenotypic variation is based on the signature of selection in lines with particularly superior characteristics. Vigouroux et al. screened 501 maize genes with a signature of selection and identified 10 as agronomically important candidates because they showed evidence (i.e., the presence of non-neutral SSRs) of exposure to selective pressure. It was further confirmed that one of these, encoding a MADS box transcriptional regulator, experienced a selective sweep during maize domestication 27 . In another study, Yamasaki et al. identified eight of the sequenced 1,095 maize genes through a selection test in diverse maize landraces and teosintes, and showed that their functions were consistent with agronomic selection for nutritional quality, maturity and productivity. Another way of identifying genes for corresponding traits is association analysis and QTL mapping using elite lines 28 . For example, Würschum et al. detected several QTLs of important traits in rapeseed, including flowering time, Ph, protein content, oil content, glucosinolate content and grain yield 29 . In addition, using genome-wide association mapping, Wang et al. identified marker-trait associations in 94 diverse elite wheat lines: marker XwPt-7187 was associated with kernel hardness, XwPt-1250 and XwPt-4628 with test weight, and marker Xgwm512 with Ph 30 .
Scientific Reports | 7:41696 | DOI: 10.1038/srep41696

In the current study, we compared elite line '01-20' with five of its sister lines by scanning SSR/indel loci across the whole genome. We identified eight loci for which the elite lines were distinctive, which were further found to be associated with important agronomic traits, including Lc, Hc, Pt, leaf length, Pd and Hs. To discover interesting candidate genes at these loci, we performed a preliminary analysis of the genes located in the regions of the five distinctive loci associated with important agronomic traits, according to the annotations for the B. oleracea reference genome acquired from BRAD (gene alignment results, Supplementary Table 2). We used the Basic Local Alignment Search Tool (BLAST; https://www.ncbi.nlm.nih.gov/) and set the score cutoff value to 400. The annotated functions of the genes included transmembrane transporter, transcription factor, ATP binding, kinase and cytochrome. Based on the alignments with Arabidopsis, some of the genes may be associated with related traits. For example, in Region 2 on chromosome C03, which is associated with Pd, Lw, Ll, Pl, Pw and Hs, one homologous gene, the ubiquitin ligase gene EOL1, can act collectively with ETO1 and EOL2 to regulate ethylene biosynthesis in Arabidopsis by controlling type-2 ACC synthase levels 31. Another homologous gene, SPI, encodes a WD40/BEACH domain protein and shares a similar actin-regulating ARP2/3 pathway that affects plant growth in various organs 32. In Region 4 on chromosome C09, which is associated with Lcb*, Lca*, Hsi, Cw/Htd, Dmc and Cfc, the homologous chloroplast-encoded gene Ycf4 plays an essential role in the Photosystem I complex 33. However, further studies are still needed to clarify the connection between these candidate genes and the related traits.
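The score cutoff used in the alignment step can be illustrated with a small filter over tabular BLAST output. This is a sketch under assumptions: the text does not say which score the cutoff of 400 refers to or which output format was used, so the example assumes the bit score column of the standard 12-column tabular format (`-outfmt 6` in BLAST+), and all gene identifiers and hit values are hypothetical.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_score=400.0):
    """Keep hits whose bit score is at least `min_score`.

    Blank lines and comment lines (starting with '#') are skipped.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        if float(row["bitscore"]) >= min_score:
            hits.append(row)
    return hits


# Hypothetical hits (NOT results from the study).
toy_output = "\n".join([
    "\t".join(["Bo_gene_A", "At_gene_1", "88.5", "900", "100", "3", "1", "900", "1", "900", "0.0", "1012"]),
    "\t".join(["Bo_gene_B", "At_gene_2", "75.0", "300", "70", "5", "1", "300", "10", "309", "1e-40", "165"]),
])

for hit in filter_hits(toy_output):
    print(hit["qseqid"], "->", hit["sseqid"], hit["bitscore"])
```

With the toy input, only the first hit passes the threshold; the second falls below 400 and is dropped.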
The above results shed light on why '01-20' , rather than its sister lines, became an elite parental line. These loci could be useful for the development of whole-genome background markers for cabbage breeding and to promote our understanding of the genetic basis of selected traits.