Marker-assisted selection (MAS) is often employed in crop breeding programs to accelerate and enhance cultivar development, via selection during the juvenile phase and parental selection prior to crossing. Next-generation sequencing and its derivative technologies have been used for genome-wide molecular marker discovery. To bridge the gap between marker development and MAS implementation, this study developed a novel practical strategy with a semi-automated pipeline that incorporates trait-associated single nucleotide polymorphism marker discovery, low-cost genotyping through amplicon sequencing (AmpSeq) and decision making. The results document the development of a MAS package derived from genotyping-by-sequencing using three traits (flower sex, disease resistance and acylated anthocyanins) in grapevine breeding. The vast majority of sequence reads (⩾99%) were from the targeted regions. Across 380 individuals and up to 31 amplicons sequenced in each lane of MiSeq data, most amplicons (83 to 87%) had <10% missing data, and read depth had a median of 220–244×. Several strengths of the AmpSeq platform that make this approach of broad interest in diverse crop species include accuracy, flexibility, speed, high-throughput, low-cost and easily automated analysis.
Marker-assisted selection (MAS) is now commonly employed in perennial crop breeding programs to pursue the acceleration of cultivar development.1,
The development of molecular markers requires the detection of association between target traits and genotypes. Two approaches are often used to detect such associations: (a) QTL analysis with structured families, and (b) genome-wide association study, which takes advantage of linkage disequilibrium (LD) in diverse germplasm to capture the linkage between markers and causal genes.16,17 However, for highly heterozygous and diverse crops, such as grape, genome-wide association study has limitations.18,
Multiple genotyping platforms and molecular marker types have been utilized for MAS in highly heterozygous crops, including simple sequence repeats (SSR) and single-locus or multi-locus SNP assays. SSRs are particularly well-suited for MAS because of their multi-allelic nature and high transferability among distinct species or genera,23,
To fill the gaps between the above technical and economic considerations for breeding and actual implementation of MAS,35 next-generation sequencing (NGS) technology offers a potential opportunity for unbiased genotyping with high-throughput and low per-sample cost. Genotyping-by-sequencing (GBS), with its simultaneous marker discovery and genotyping approach, delivers many benefits including availability of flanking DNA sequence information, high-sample throughput and scalability (multiplexing), and high resolution.36 It has been successfully applied in marker discovery for many self-pollinating crops as well as outcrossing species.37 However, successful implementation of GBS for MAS has not been reported in any heterozygous crop breeding programs. Many reasons may have contributed to the lack of adoptions. From a technical perspective, missing data, genotyping errors and heterozygote under-calling are common in GBS results due to uneven sequencing depth across sites and high level of sample multiplexing.38,
This study presents a novel and efficient strategy for molecular marker development and practical implementation in MAS, based on amplicon sequencing (AmpSeq). The semi-automated pipeline incorporates a machine learning model for primer design and uses Illumina’s Nextera dual-barcoding and sequencing platforms for genotyping (Illumina, San Diego, CA, USA). After detecting a SNP from GBS, the strategy starts from the design of primers using the GBS sequence tags. The converted amplicon markers can then be used for genotyping through NGS. The design involves multiplexing of both samples and markers. As a case study, we document the use of AmpSeq in grapevine breeding programs for three traits including flower sex, powdery mildew (PM) resistance and acylated anthocyanins. We chose these three traits to represent a Mendelian trait and two QTL with differing effects on phenotypic variance: (a) a single gene for flower sex with three alleles42; (b) a QTL with moderately high R2 (acylation of anthocyanins, initially reported here); and (c) a QTL with relatively low R2 (Ren2 locus for PM (Erisyphe necator) resistance from Vitis cinerea (Engelm. ex A. Gray) Engelm. ex Millard accession B9 (V. cinerea B9)43,44). All three loci are located on different chromosomes, and we were able to test the AmpSeq approach for flower sex (male versus female versus hermaphrodite) across four different families where the male flower allele descends from V. cinerea, the hermaphrodite flower allele descends from V. vinifera L. and the female allele descends from an unidentified North American Vitis species. Two of the three traits chosen for analysis would take 2–4 years to analyze phenotypically due to the time it takes for a seedling to produce flowers and fruit. We also report here the development of a pipeline package with tools for AmpSeq marker design and decision support.
Materials and methods
Four families were chosen for this study, all representing interspecific hybridization of diploid (2n=38) Vitis species: V. vinifera ‘Chardonnay’×V. cinerea B9; ‘Horizon’ (complex hybrid of V. vinifera, V. labrusca L., V. rupestris Scheele and V. aestivalis Michx.)×V. cinerea B9; ‘Horizon’×Illinois 547-1 (V. rupestris B38×V. cinerea B9) and MN1246×MN1264 (both, complex hybrids with a genomic background including at least V. vinifera, V. riparia Michx., V. rupestris, V. labrusca, V. cinerea and V. aestivalis). The first three populations were grown in research vineyards operated by Cornell University, Geneva, New York. The latter population was grown in research vineyards of the University of Minnesota, St Paul, located at Excelsior, Minnesota. All four populations segregated for flower sex (male, female and hermaphrodite) a trait controlled by a single locus with three alleles (reviewed in the study by Hyma et al.42) In addition, our ongoing research indicated that two populations segregated for acylation of anthocyanins, and the three populations descending from V. cinerea B9 segregated for Ren2 PM resistance.43,44
AmpSeq marker development pipeline
The AmpSeq marker development procedure consists of four steps illustrated in Figure 1. First, GBS marker-trait associations were evaluated in TASSEL 4.3.13 (ref. 45). Genetic maps were constructed by the HetMappS strategy, and QTL were mapped in R/qtl ver. 1.37–11 as described by Hyma et al.42 Marker phase and effect were estimated by a custom R script (Supplementary File 1) using the intermediate files ‘phased’ and ‘LGmap’ from the HetMappS pipeline.42 For the second step, the range of the haploblock across the QTL region was defined by haploblock start and end markers with similar absolute value of marker effects. The anchor marker selected for each QTL was based on high association with trait within the 1.8-LOD (logarithm base 10 of odds) support interval of the QTL. A Perl script included in the package ‘calculated_LD_distribution.pl’ took in a text file ‘test_marker’ as input, which includes physical positions of the three markers: haploblock start, haploblock end and the anchor SNP. This script retrieved additional SNP markers from the GBS pipeline (pre-filtered by maximum of 5% missing data) that were within the haploblock range and verified to be in LD with the anchor marker.
After the SNP markers were defined, a Perl script ‘parse_bam.pl’ was used to identify genomic regions suitable for designing PCR primers. The template for the primer sequences was based on the actual sequences from GBS data, rather than the reference genome sequence itself. This was to ensure that there was no mismatch between the primer sequences and their targeted alleles. The ‘parse_bam.pl’ script required three inputs: (1) the reference genome FASTA file; (2) a BAM file, converted from the SAM file generated by TASSEL GBS pipeline,45 which contained GBS reads aligned to the reference genome; and (3) a cutoff for the LD P-value as calculated from the previous step. By default, the script required that at least 10 bp of the left and right flanking sequences are covered by the GBS read, and ⩾90% alignment matching rate between the two alleles of up to 20 bp of each flanking sequences. Finally the script ‘primer3.pl’ was used to call the primer designing software Primer3 (ref. 46) and captured output properties from Primer3. The default primer sequence length from the pipeline was 22 bp. With a single SNP in the middle, the targeted amplicon size is 45 bp. From the output of the primer design pipeline followed by manual filtering, including removal of overlapping amplicons and primers with extreme annealing temperatures (Tm) outside of 47–79 °C, a total of 54 amplicons across the three loci were retained for testing: 19 for flower sex, 12 for PM resistance and 23 for acylated anthocyanins. The output file of the 54 amplicons is shown in Supplementary File 2. The documentation about the usage of the pipeline and the scripts are provided in Supplementary File 3.
Genotyping (PCR, library preparation and de-multiplex)
Two experiments were conducted to test three traits: flower sex, PM resistance due to the Ren2 locus and acylated-anthocyanin concentration. In Experiment 1, 19 primers for flower sex and 12 primers for Ren2 were pooled in one Illumina MiSeq lane (Illumina, San Diego, CA, USA) testing four 96-well plates (380 individuals and 4 blanks), each containing the parents and a subsample of progeny from each grapevine breeding family. Experiment 1 consisted of four breeding families: ‘Horizon’×V. cinerea B9, ‘Horizon’×Illinois 547-1, and ‘Chardonnay’×V. cinerea B9 and MN1246×MN1264. The other 23 primers for acylated-anthocyanin concentration were pooled in Experiment 2, testing two 96-well plates of ‘Horizon’×Illinois 547-1 and two 96-well plates of V. rupestris B38×‘Horizon’ (380 individuals and 4 blanks).
For each vine, a single small leaf (<1-cm diameter) was harvested and placed in one tube of a Costar 96-well cluster tube collection plate (Corning, Corning NY, USA), and DNA was isolated as described previously.42 Briefly, each 96-well plate received up to 91 unique samples plus two sets of duplicated individuals and a blank well to serve as quality controls. Frozen samples were ground using stainless-steel beads in a Geno/Grinder 2000 (OPS Diagnostics LLC, Lebanon NJ, USA) and processed using DNeasy 96-well DNA extraction kits (Qiagen, Valencia CA, USA). The Qiagen AP1 lysis buffer was amended with PVP-40 (2% w/v) prior to heating of the buffer, to improve DNA quality and quantity. Due to historically consistent yields of 25–50 ng μl−1 within and between plates, DNA was used following twofold dilution without quantification.
AmpSeq uses two rounds of PCR: the 1st PCR to amplify a multiplex of markers and incorporate linker sequences, and the 2nd PCR to use the linker sequences to add a unique pair of indices, or barcodes, to each sample. The linker 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3′ was added to the 5′ end of each forward primer to accommodate S5xx barcode adapters, and the linker 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3′ was added to the 5′ end of each reverse primer to accommodate N7xx barcode adapters described in Supplementary File 4. For results presented here, samples were processed without previously testing the primers as a multiplex, and both rounds of PCR were conducted on 384 wells per batch, processed as four 96-well plates, which were pooled after the 2nd PCR for Illumina sequencing.
For the 1st PCR, up to 31 primer pairs (62 total, 10 pmol each) were combined with Qiagen Multiplex PCR Plus Mix (Qiagen, Valencia CA, USA), following the manufacturer’s protocols calculated for 10 μl reaction volumes including 2 μl of DNA template. Thermocycling conditions for the 1st PCR were: 95 °C for 5 min; followed by 35 cycles of 95 °C for 30 s, 62 °C for 90 s and 72 °C for 30 s; followed by a final extension of 10 min at 68 °C. After the 1st PCR, 50 μl H2O was added to each well prior to serving as a template for the 2nd PCR.
In the 2nd PCR, 384 unique combinations of 16 row and 24 column indices were incorporated in 10 μl reactions consisting of Platinum Taq (2 U per reaction, Invitrogen, Grand Island NY, USA), Platinum Taq buffer (1× final), MgCl2 (1.5 mM final), dNTPs (0.2 mM final each), primers (10 pmol each) and 2 μl of each amplicon pool from the 1st PCR reaction. Thermocycling conditions for the 2nd PCR were: 95 °C for 5 min; followed by eight cycles of 95 °C for 30 s, 53 °C for 30 s and 72 °C for 30 s; followed by a final extension of 10 min at 68 °C. Equal volumes of all 384 wells were pooled into a single tube, then purified by Ampure (Beckman Coulter, Indianapolis IN, USA), following manufacturer’s instructions.
The amplicon pool was sequenced on an Illumina MiSeq instrument with single-end reads of 51 bp with 8 bp, dual index reads according to the manufacturer’s instructions. Reads were de-multiplexed using the Illumina bcl2fastq pipeline software, allowing a single mismatch in the index reads.
Phenotyping and data analysis
SNPs were called by a custom perl script ‘run_gatk2.pl’ (Supplementary File 6) from the de-multiplexed fastq file. Read counts were calculated in 500 bp sliding windows across the 12X.2 version of the PN40024 reference genome50 using BEDTools51 v2.22.1 to check the amplification specificity and efficiency of the AmqSeq markers. VCFtools52 v0.1.12a was applied to calculate missing rate per site (--missing-site) and per individual (--missing-indv), and mean depth per site (--site-mean-depth) and per individual (--depth). VCF file generated from ‘run_gatk2.pl’ was input to TASSEL 4.3.13 for association detection. Genotyping and phenotyping data of the two parents were used for phasing.
To detect family-based marker specificity, the squared correlation coefficients between AmpSeq markers were calculated per family for each trait by VCFtools--geno-r2 function, following a dendrogram construction in R53 v3.1.2. P-values of association between genotype and phenotype were calculated by a one-way analysis of variance co-segregation analysis General Linear Model (GLM) on a per-family basis and after pooling families with the same segregation pattern. The minor allele frequencies of the AmpSeq markers were also reported on a per-family and pooled basis.
Output from the primer design pipeline was used to classify primers as efficient and non-efficient based on their amplification and genotyping performance, and 4 statistical models (logistic regression, support vector machine, decision tree and random forest) were tested using the following cross-validation method for assessing 18 numerical parameters of each primer (num_tags, pvalue, phase, reject_code, a1, a1_count, a1_lseq_pct, a1_rseq_pct, a2, a2_count, a2_lseq_pct, a2_rseq_pct, ltemp, rtemp, a1_lseq_len, a1_rseq_len, a2_lseq_len, a2_rseq_len). In the cross-validation algorithm, the efficient and non-efficient categories are randomly partitioned into two groups, the training group and the testing group, and the testing group is used for testing the performance of four statistical models trained with the training group. The prediction accuracy of the cross-validation classification model was assessed using Receiver Operating Characteristic (ROC) curve analysis.54,
Converting GBS-derived SNPs to amplicon sequencing (AmpSeq) markers
The procedure for the marker development and amplicon sequencing primer design pipeline is illustrated in Figure 1. Three traits were chosen for study: flower sex, Ren2 PM resistance43,44 and acylated anthocyanins (that is, anthocyanins esterified to hydroxycinnamic acids). The percentage of phenotypic variance explained by each QTL (R2 value) confirmed these represent a qualitative trait (93%), a quantitative trait with a moderate QTL (13%), and a quantitative trait with a major QTL (54%), respectively (Table 1). For each locus, two flanking markers indicating the haploblock boundary, and one anchor marker having high association with the phenotype from the curated genetic maps are listed in Table 1. The primer design pipeline and curation resulted in 54 AmpSeq markers physically located between the two flanking markers and genetically linked to the anchor markers, for further testing (Supplementary File 2).
In Experiment 1, 12,677,206 reads were generated for the target QTL region on chromosome 2 (chr2) associated with flower sex, and 4,537,611 reads were generated for the target QTL region on chr14 associated with Ren2 PM resistance. These 17,214,817 reads at the two target loci comprise 99.99% of the total reads, while only 2,237 reads aligned to off-target regions. Similar results were obtained in Experiment 2, with 14,927,137 reads (98.82%) aligning to the target QTL region on chr 3 associated with acylated anthocyanins.
The sequencing depth per individual and missing rate were evaluated for each amplicon (Tables 2,3, 4). For both experiments, a majority of amplicons (80% in Experiment 1 and 83% in Experiment 2) had >50-fold coverage per individual, in a relatively narrow range from 50–284×. Most amplicons (87% in Experiment 1 and 83% in Experiment 2) had <10% missing data. Thus, in Experiment 1, the 380 individuals had 220× median coverage per marker and 168× mean coverage per marker. Similarly in Experiment 2 for 23 acylated-anthocyanin AmpSeq markers, 380 individuals had genotyping data with 244× median coverage per marker and 188× mean coverage per marker. One amplicon per experiment failed because no polymorphism was identified during the SNP calling procedure. The amplification specificity, stable sequencing depth and consistent missing rate in the two experiments indicated the marker development strategy and amplicon primer design pipeline were effective for all three loci, with no obvious bias during amplification, pooling or sequencing.
Evaluation and validation of AmpSeq markers in grapevine breeding families
To evaluate the 19 AmpSeq markers for flower sex, genotypic data were obtained for four breeding families in Experiment 1. This trait is reportedly controlled by a single locus with three alleles, with male (M) dominant over hermaphrodite (H), which is dominant over female (f).42 Three families segregated 1:1 for M/H—‘Horizon’×Illinois 547-1, ‘Horizon’×V. cinerea B9 and ‘Chardonnay’×V. cinerea B9. One family segregated 1:1 for H/f—MN1246×MN1264. Parents of the four families represent three phenotypes: ‘Horizon’ (HH), ‘Chardonnay’ (HH) and MN1264 (Hf) produce hermaphrodite flowers; Illinois 547-1 (Mf) and V. cinerea B9 (Mf) produce male flowers; and MN1246 (ff) produces female flowers.42 Amplicon data were merged with flower sex field ratings for a one-way analysis of variance co-segregation analysis. For the pooled association test using families with the same segregation pattern, only 1 amplicon had a non-significant association, and 16 out of 18 amplicons had a −log10(P-value) >13, suggesting a high association between AmpSeq markers and the recorded phenotypes (Figures 2a and b and Table 2). The high R2 value of each marker was consistent with the high R2 value of the flower sex QTL (93%, Table 1). Interpretation and direct application of AmpSeq genotypes for MAS required phase information, details of which are documented in Supplementary File 7.
To shed light on marker transferability among breeding populations, genotype-phenotype association and minor allele frequency were reported by family in Table 3 and compared using a dendrogram (Figures 3a–d). AmpSeq markers with similar P-values and R2 patterns clustered together. Some AmpSeq markers in certain families have an R2 value equal to 1, suggesting the potential of 100% predictive accuracy in explaining 100% of the phenotypic variance.
To evaluate the 12 AmpSeq markers for Ren2 PM resistance, genotypic data from Experiment 1 were merged with phenotypes from two families, each having a different phenotyping approach: Set 1 used 78 progeny of ‘Horizon’×V. cinerea B9 scored using the transformed mean of total hyphal transects in vitro; Set 2 used visual ratings (on a 1–5 scale) of natural infection in the vineyard for 91 progeny of ‘Horizon’×Illinois 547-1.
For Set 1, eight markers significantly predicted resistance [−log10(P-value) >2], and the marginal P-values and small R2 values (Figure 2c, Table 3) reflected the moderate QTL (R2=13%) discovered by GBS. For Set 2, 11 of 12 markers were significant (−log10(P-value) >2) (Figure 2d) with an average of 5.3, and all 12 markers explained more variance for vineyard ratings than for in vitro hyphal transects. The results confirmed that AmpSeq markers for Ren2 PM resistance worked in a separate but related breeding family, even when evaluated by a different phenotyping method. A majority of the amplicons (9/12) can be used to track resistant alleles in phase, meaning that the minor allele is associated with PM resistance. AmpSeq markers with similar P-values and R2 patterns clustered together (Figures 3e and f).
To evaluate the 22 AmpSeq markers for acylated-anthocyanin content, association tests were executed on 182 individuals including: 82 progeny from ‘Horizon’×Illinois 547-1, a subset of the original family for marker development, and 100 progeny from V. rupestris B38×‘Horizon’ to evaluate marker transferability. In a pooled analysis, 1 amplicon had a non-significant association, and 15 of 22 amplicons had a −log10(P-value) >10 (Figure 2e, Table 4). Individually, these 15 AmpSeq markers explained an average of 33% of the phenotypic variance, comparable to the QTL contributing 54% of the phenotypic variance. As with the flower sex dendrogram, AmpSeq markers with similar P-values and R2 patterns clustered together (Figures 3g and h).
Criteria of efficient AmpSeq markers
To reduce costs for primer synthesis and testing, we tested models to determine post hoc the relative importance of AmpSeq marker parameters to predict the most effective markers. Four statistical models (logistic regression, support vector machine, decision tree and random forest) were tested using quantified parameters of each primer to classify the primer output from the primer design pipeline into efficient and non-efficient categories. In all cases, regardless of the training set size, the random forest model (with over 90% predictive accuracy) outperformed the other three models (Figure 4). Thus, we developed a decision support tool based on the random forest model (Supplementary File 8), which can be used to facilitate AmpSeq marker selection following the primer design pipeline. The most important parameters in this model included the P-value of LD with the anchor marker, the primer annealing temperatures, the rejection code and the length of the sequence flanking each side of the SNP (Supplementary File 8).
This study had two main goals: (1) to develop a cost-effective and robust genotyping platform for MAS in heterozygous crops, and (2) to generate marker sets for MAS in grapevine breeding. A semi-automated primer design pipeline was developed to convert GBS tags to AmpSeq markers. The primer design procedure was facilitated by a decision support tool to predict which primers would perform appropriately in terms of predictability based on a pre-trained random forest model. For grapevine breeding, AmpSeq markers were developed for three economically important and representative traits with high breeding value: flower sex, PM resistance and acylated-anthocyanin concentration. The strategy was effective for all three traits given the diverse background and high heterozygosity of grapevine. In the process, 54 markers were tested on a total of 760 individuals in 6 breeding families involving 7 Vitis species across 2 breeding programs. The results indicated that the majority of AmpSeq markers have potential for accurate trait predictions, and a MAS package can be implemented for interspecific hybrid families.
Technical considerations of the AmpSeq strategy
Two key decision points for the usage of the AmpSeq primer design pipeline were the selection of an anchor marker and the definition of haploblock boundaries of each QTL. The optimal parameters depended on the nature of the trait and mapping family, and here we describe what worked for these three traits. First, each anchor marker was selected from our final GBS linkage map, after the HetMappS pipeline and genetic map curation42 removed SNPs that had segregation bias, low genotyping rate, putative sequencing error, redundancy or aligned to repetitive genome regions. Thus, the anchor marker was selected from high-quality, mapped SNPs. Second, the anchor marker had a high P-value for marker-trait association. Third, the anchor marker was within the 1.8 LOD interval of the QTL to ensure tight genetic linkage. The anchor marker was used to retrieve SNPs in LD from an un-filtered VCF file within the haploblock defined by the two boundary markers, to increase marker density under the QTL region and provide more alternatives for AmpSeq marker design. The range of the haploblock can be defined by the marker effect as suggested in ‘Materials and methods’, or by other haplotyping software specific for heterozygous species, like iXora58 and SHAPEIT2 (ref. 59).
Two custom options can be set for the amplicon primer design pipeline: one is P-value threshold of the LD test, and the other one is the primer size, which was specified as 22 in this study to accommodate the 50 bp sequencing length and data analysis pipeline. To optimize the P-value of the LD test, all the parameters output from the pipeline were explored by four models (logistic regression, support vector machine, decision tree and random forest), in which the random forest model was the most useful to predict high-quality markers. In the random forest model, four primer parameters primarily contributed to the decision: the P-value of LD with the anchor marker, the rejection code and the annealing temperatures (Tm) of the forward and reverse primers. Supported by the split value of the decision tree model (Supplementary File 9) and general guidelines for PCR based on experimental experience and thermocycling conditions used here, the recommended criteria for efficient amplicon markers includes: (1) the default P-value of LD with the anchor marker should be 1e−25; (2) Tm of primers should be between 52 and 70 °C to obtain high depth and avoid missing data; (3) the rejection code should be 0, representing exactly two alleles identified in GBS tags; (4) extremely high read depths of GBS tags, which may imply presence of a repetitive genome region, should be filtered; and (5) overlapping amplicons, which would result in cross-amplification, should be excluded. The pooling strategy employed in the study (Experiment 1: 384 individuals with 31 amplicon markers for 2 traits; Experiment 2: 384 individuals with 23 amplicon markers for 1 trait) resulted in read depths greater than 50×, low missing rate and equal amplification efficiency, suggesting that multiplexing could be increased for more traits, markers or individuals.
In terms of predictive power of converted markers, the R2 value of the QTL plays a major role. For a Mendelian trait such as flower sex, in which one major locus can explain 93% of phenotypic variance, the association between genotypes and phenotypes is highly significant. Some AmpSeq markers approached 100% predictive accuracy, as they can explain 100% of phenotypic variance in a given family. In contrast, for the major QTLs contributing <50% phenotypic variation, the predictive power is lower, which is consistent with theoretical framework.60 However, the predictive power is still attractive in breeding, when phenotypic screening is expensive and challenging. The notion was supported by several studies showing selection performed combining MAS with phenotypic evaluation is more efficient than selection based on phenotyping alone, especially when the family is large and trait heritability is low.8,61,
Advantage of AmpSeq genotyping platform
The most significant advantage of AmpSeq for MAS is the ability to harness the high-resolution GBS or other NGS techniques during marker development, which provides resiliency against rapid LD decay in species with high heterozygosity and diversity,69 while nearly eliminating issues of missing data and heterozygote under-calling common to GBS. An average density of 55, 218 and 133 kb per AmpSeq marker was obtained for flower sex, PM resistance and acylated anthocyanins, respectively. In grapevine, SSR and indel markers reported to predict flower sex27 have failed in progenies resulting from the cross between complex North American hybrids (BI Reisch and JJ Luby, personal communication), which may be due to the loss of linkage between the causal gene and markers. The situation has been similar for MAS of PM resistance. The closest pairs of Ren2 SSR markers were 550-kb apart in the 12X.2 reference genome.70,
AmpSeq gains efficiency via the NGS capacity to multiplex more markers per trait, more traits and more individuals. Our approach results in the genotyping of a haplotype block in one test for MAS decision making, which showed higher accuracy in cattle breeding73,74 and potential application in heterozygous crops such as cocoa75 and cotton,76 as well as for flowering time, cluster width and berry size in grapevine.77 While biallelic SNPs were traditionally criticized by limited polymorphisms per marker,78 the pooling of multiple SNP markers provides flexibility that overcomes this limitation.79 Further, multiple traits can be genotyped simultaneously for pyramiding, maximizing the application of MAS in contrast to phenotype screening. For example, anthocyanin pigments acylated to hydroxycinnamic acids are of interest to wine and juice producers because their absorbance behavior is not highly pH dependent, as compared with non-esterified anthocyanins in grapes which exist primarily in colorless forms over a juice pH range of 3–4 (ref. 80). Our results indicate that a hermaphrodite vine with increased resistance to PM and higher concentrations of berry acylated anthocyanins could be selected in a single AmpSeq run. While our current barcoding system is for 380 individuals (accommodating 4 blank negative controls), additional barcodes could be used for increased sample throughput.39
As mentioned above, marker transferability is another big concern in a breeding program with diverse germplasm spanning more than one species,30 and SNP microarrays for grapevine and apple are both reported to suffer from this constraint.1,81,82 Loss of polymorphism of SSR markers has also been reported when used in unrelated germplasm, as is the case for the marker gwm261 for the locus Reduced height 8 (Rht8) in wheat83 or Ren4 in grapevine.84 This can result from sampling bias in the original study, since a distinct allele may not be amplified for the marker. The current work indicates that the amplicon markers are transferable in related families in grapevine. Among AmpSeq markers available in the current MAS package, dendrogram construction by genotype clustering could guide the choice of useful markers when transferring to new families without phenotypic data. In some, but not all cases, the markers most significantly associated with the trait clustered together (Figure 3), and in all cases markers not associated with the trait had poor correlations with the other markers. Still, because the amplicons are likely not genotyping the causal allele, re-training markers pools through validating, correcting and supplementing the current MAS package with markers developed in other breeding families by GBS or other genotyping platforms, following the same AmpSeq strategy, is recommended.
Finally, AmpSeq also provides advantages in terms of cost, time and ease of application. Actual cost to obtain AmpSeq data is comparable to single-locus SSRs and cheaper than most other multi-locus platforms. With the simple and straightforward PCR and library preparation protocol, the turn-around time from marker development through testing can be reduced to 1 month, which in our experience is quicker than other SNP platforms and SSRs. In addition, the data analysis pipeline can be automated to output results in a spreadsheet format, which may be attractive to some plant breeders, eliminating bioinformatic challenges typical of GBS data and removing the need for graphic interpretation typical of other genotyping platforms.
A case of MAS implementation in grape breeding
The three traits selected for testing are economically important and representative targets for North American hybrid winegrape cultivar development: flower sex, PM resistance and acylated anthocyanins. Hybridization with wild species takes place to introgress positive adaptive traits including disease resistance, pest resistance and cold hardiness,20,85,
Customization of the AmpSeq strategy for other crops
In summary, the AmpSeq platform described here should provide several considerable advantages for breeders of other crop species due to its reliability, flexibility, high-throughput, cost-effectiveness, ease-of-automation and speed. This approach is among the first practical examples showing how SNP-based markers can be applied in a high-throughput screening, which may increase the application of GBS tags and leverage the massive power of NGS. This practicality may help fill the gap between genomic discoveries and breeding applications. With some customization, we propose that the AmpSeq strategy can be widely used by other crops.
We would like to thank Michelle Schaub, Hema Kasinathan, Anna Nowogrodzki, Paige Appleton and Jackie Lillis for their technical support phenotyping PM resistance; Steve Luce and Peter Hemstad for helping to maintain the mapping families used in this study; Mani Awale for phenotyping flower sex at the South Dakota State University breeding program; Steve Bogdanowicz for helpful discussions when designing the wet lab techniques and barcoding strategy; and the Cornell University Biotechnology Resource Center for providing the necessary infrastructure and personnel to generate the raw genetic data analyzed and presented here. The US Department of Agriculture National Institute of Food and Agriculture Specialty Crop Research Initiative provided funding for this project (award no. 2011-51181-30635).