Genome-wide association study in Asia-adapted tropical maize reveals novel and explored genomic regions for sorghum downy mildew resistance

Globally, downy mildews are among the important foliar diseases of maize that cause significant yield losses. We conducted a genome-wide association study for sorghum downy mildew (SDM; Peronosclerospora sorghi) resistance in a panel of 368 inbred lines adapted to the Asian tropics. High density SNPs from Genotyping-by-sequencing were used in GWAS after controlling for population structure and kinship in the panel using a single locus mixed model. The study identified a set of 26 SNPs that were significantly associated with SDM resistance, with Bonferroni corrected P values ≤ 0.05. Among all the identified SNPs, the minor alleles were found to be favorable to SDM resistance in the mapping panel. Trend regression analysis with 16 independent genetic variants including 12 SNPs and four haplotype blocks identified SNP S2_6154311 on chromosome 2 with P value 2.61E-24 and contributing 26.7% of the phenotypic variation. Six of the SNPs/haplotypes were within the same chromosomal bins as the QTLs for SDM resistance mapped in previous studies. Apart from this, eight novel genomic regions for SDM resistance were identified in this study; they need further validation before being applied in the breeding pipeline. Ten SNPs identified in this study were co-located in reported mildew resistance genes.


Results
Phenotypic evaluation for SDM resistance. The CAAM panel with a set of 368 lines was evaluated against SDM at Mandya, India, during the years 2012 and 2015 under artificial epiphytotic conditions. The disease severity was high in both years, with the standard susceptible check CM500 showing 100 per cent mortality. The mean disease severity of the trials was also high, with 96.28 and 89.21 per cent during 2012 and 2015, respectively. A Box-Cox transformation lambda (λ) value of 0.96 was obtained and used to transform the data. There was significant genotypic variance (p ≤ 0.001) for the trait in the panel and the broad-sense heritability estimated by the combined analysis was high (0.72). Heritability estimates for 2012 and 2015 were 0.92 and 0.74, respectively (Table 1).
Population structure and linkage disequilibrium (LD) decay. Principal component analysis using 63,546 SNPs (CR 0.9, and MAF ≥0.1 with LD pruning) did not reveal a clear population structure with the first three principal components (Fig. 1). All the important tester lines from various maturity groups used in the Asian tropics and mapped within the 3D PCA plot revealed the spread of SDM resistant lines within the studied panel. Two early maturing testers developed in the Asian tropics, CML470 and CML474, clearly grouped into one cluster in the PCA plot. The Scree plot plotted (Supp. Figure 1) with all the eigen values suggested 10 principal components that should be considered in the GWAS to account for the population structure.
Genome-wide LD across 37,043 SNPs with minor allele frequency (MAF) ≥0.3 was investigated in the CAAM panel. The genome-wide LD decay was plotted as LD (r 2 ) between adjacent pairs of markers versus distance in kb which showed that LD decay was 15.53 Kb at r 2 = 0.1 and 5.39 kb at r 2 = 0.2, (Fig. 2). Chromosome-wise LD analysis showed fastest LD decay in chromosome 5 (10.35 Kb at r 2 = 0.1 and 3.59 Kb at r 2 = 0.2), followed by chromosome 2 (11.39 Kb at r 2 = 0.1 and 3.95 Kb at r 2 = 0.2). However, the slowest LD decay was observed on chromosome 8 (28.8 Kb at r 2 = 0.1 and 9.99 Kb at r 2 = 0.2), followed by chromosome 4 (22.14 Kb at r 2 = 0.1 and 7.68 Kb at r 2 = 0.2) in this panel (Supp. Table 1).

Marker-trait associations for SDM resistance.
A subset of markers with a call rate ≥0.7 and minor allele frequency ≥0.03 were used for association studies. With the total 333,039 SNPs fulfilling these criteria, a naïve association analysis with only these SNPs was carried out. The Quantile-quantile plot (QQ plot) drawn using the observed against the expected -log 10 (P values) showed significant genomic inflation (Fig. 3a). Following this, to control the genomic inflation, a general linear model analysis including the genotypes and 10 PCs (G + Q) was carried out, which also showed high genomic inflation in the QQ plot. Further, a single marker mixed linear model including the genotypes, 10 PCs and the kinship matrix (G + Q + K) was able to keep the genomic inflation within acceptable limits. Based on this analysis, 26 significant SNP hits for disease response were identified that had Bonferroni-corrected P values ≤ 0.05 on chromosomes 1, 2, 3, 5, 6, 8 and 9 ( Fig. 3b; Supp. Table 2). SNP S2_6154311 on chromosome 2 showed the strongest association with the lowest P value (1.0E-012) and Bonferroni-corrected P value (1.0E-06). The phenotypic variance explained by significant SNPs ranged from 5.50 to 13.07%; SNP S2_6154311 and S1_88004654 explained the highest and lowest proportion of variances, respectively, from among the SNPs associated with DM resistance (Supp. Table 2). Interestingly, in all 26 associations identified, the minor allele was found to contribute positively towards lowering SDM incidence (Supp. resistance in multiple studies (Table 2). Annotated genes in the reference genome of B73 RefGen_V2 (available at http://ensembl.gramene.org/Zea_mays) carrying the 26 associated SNPs showed many genes reported to be associated with broad spectrum disease resistance or resistance specifically to mildews (Table 3).
Haplotype estimation and trend regression analysis. The 26 SNPs identified as being significantly associated with SDM incidence were spread across seven chromosomes. Some of these SNPs had physical    coordinates (based on B73 RefGen_V2) that were in close proximity, and some of them were SNPs located inside the same annotated genes. Hence haplotype block estimation analysis was carried out to determine whether these SNPs form haplotype blocks using the haplotype defining algorithm to minimize historical recombinations. In this analysis, four haplotype blocks were detected on three chromosomes from among the 26 SNPs associated with SDM incidence. Based on this analysis, 16 independent variants (including both SNPs and haplotype blocks) were considered for further analysis. Trend regression analysis on these 16 variants against SDM incidence estimated the proportion of variance explained by each of these to range from 3.61 to 26.74% (Table 4). S2_6154311 contributed 26.74% of the variation, followed by Hap-1, which contributed 17.34% of the variation.

Discussion
The downy mildews of maize caused by Peronosclerospora species are a group of economically important and widespread diseases in many tropical and subtropical regions of the world 5 . The lack of adequate and genetically diverse sources of resistance to DMs has been a major constraint in tropical Asia, especially in the maize growing environments of South and Southeast Asia. Introgressing DM resistance in the breeding pipeline is one of the priorities of CIMMYT's maize breeding programs as well as National Agricultural Research System (NARS) partners in Asia. Breeding for DM resistance, especially phenotyping under artificial epiphytotic conditions with high disease pressure is cost-intensive. Therefore, identifying, validating and deploying molecular markers with DM resistance can increase the efficiency of developing DM resistant tropical and subtropical maize germplasm especially in those countries where DM resistance is mandatory for the release of maize varieties. The CAAM panel used in this study includes tropical/subtropical inbred lines from CIMMYT breeding programs in Asia, Mexico, Kenya, Zimbabwe and Colombia, that are also adapted to the Asian tropics. Derivatives of several lines developed by crossing CIMMYT populations with Suwan sources of DMR from Thailand, and the Asian mildew acid tolerant late (AMATL) populations are included in the CAAM panel. This panel is an excellent resource to study genomic regions for SDM resistance considering CIMMYT ARMP's breeding efforts aimed at improving DM resistance of Asian maize germplasm. Heritability estimates of the trials evaluated at Mandya were high (0.74-0.92); for the two years (2012 and 2015) combined, the SDM scores ranged from 7.81 to 100%, indicating there is genetic variability for SDM in the CAAM panel. Genotypic variance was also highly significant for SDM resistance in the CAAM panel. Lines showing the lowest disease percentage were CML433, (CML474/ S92145-2EV-7-3-B*5)-F2-25-1-B, CML472, CA14709-4-7-5-1, which were either developed for resistance to DM for the Asian region or derived from crosses with DMR lines.
A GBS protocol commonly used by the maize research community was applied in this study, which produced a partially imputed SNP dataset, calling about a million SNPs. For principal component and kinship analysis, high quality SNPs with filtering criteria of CR ≥0.9 and MAF ≥0.1, and pruned at an r 2 threshold of 0.5 were used. It was necessary to reduce the number of SNPs for computational efficiency, but by selecting SNPs that have higher MAF and by LD-pruning, due care was taken to avoid confounding of the calculations due to large blocks of SNPs that have strong LD with each other 33 . The PCA showed only a moderate population structure within the CAAM panel, with no clear grouping of the lines into well-defined clusters. The first three PCs explained only 39.3% of the variation in the panel. The DMR lines did not fall into a particular cluster, but were scattered among the groups. This was also observed by George et al. 14   along with CIMMYT's tropical and subtropical lines, and suggested that the tropical and subtropical lines in the Asian region possess significant genetic diversity that did not allow a clear-cut distinction into well-defined clusters. Warburton et al. 34 observed that the CIMMYT populations that served as germplasm sources for many Asian lines had a large amount of diversity within, rather than between, source populations. Due to this heterogeneous nature of the CIMMYT populations, they suggested that it would be difficult to find a well-defined population structure in the Asian maize lines. Genetic diversity is directly related to LD, which in turn affects LD based association mapping; the larger the LD blocks, the slower the LD decay, which will result in lower mapping resolution 35,36 . For estimating genome-wide LD, we used a subset of the SNP data filtered for CR ≥0.9 and MAF ≥0.3. This was also done for computational efficiency required for the analysis, with due care taken for SNP selection to represent all population structure groups by way of increasing MAF. The same set of SNPs used for PCA were not used in this analysis as the former set had already been pruned at an r 2 threshold of 0.5. In the CAAM panel, the average LD decay of the panel was 15.53 kb at r 2 = 0.1 and 5.39 kb at r 2 = 0.2, with localized differences within and between chromosomes. The slowest LD decay was observed in chromosome 8, and a similar observation was made by Suwarno et al. 35 , while studying a tropical/subtropical carotenoid association mapping panel. LD decay in tropical maize germplasm is generally found to be high as compared to temperate maize germplasm. Lu et al. 36 suggested the LD decay distance in temperate maize germplasm (10 to 100 kb) was 2 to 10 times higher than that of tropical maize germplasm (5 to 10 kb). Similarly, Romay et al. 37 studied 2,815 inbred lines and found LD decay of 1 kb for tropical maize germplasm and 10 kb for temperate germplasm. The high LD decay of tropical maize germplasm suggested that it has a broader genetic base resulting from high recombination events and might have more rare alleles than temperate lines 38 ; this provides an excellent opportunity for maize breeders to select germplasm that combines higher grain yield with disease resistance and relevant abiotic stress tolerance traits. QTL mapping for DM resistance was reported by many groups on multiple DMs in diverse germplasm and environments 5,[15][16][17][18][19]31 . A genome-wide study using high density marker information from widely adapted maize lines has not yet been reported. On Asia-adapted CIMMYT maize lines genotyped at high density, three methods of association testing were employed in this study; the single locus mixed linear model was eventually selected for reporting results, as it was successful in controlling genomic inflation by correcting for population structure and familial relatedness 39 . A set of 26 SNPs on seven chromosomes were found to be significantly associated with SDM resistance (Supp. Table 2). Since many of the SNPs identified were having strong LD in average pairwise LD analysis (Sup. Table 3), we carried out haplotype block detection among those SNPs. We estimated the haplotype frequencies and identified 16 genomic regions in linkage equilibrium, either as SNPs or haplotype blocks. The SNP alleles/haplotypes were regressed against the phenotype to ascertain the genomic regions that contributed most significantly to DM resistance. SNP/haplotype-trait associations identified were assessed by way of previously reported QTL in these genomic regions, and functional domains in the predicted gene models carrying the SNPs. This will further be followed up by validation in independent bi-parental populations, before deployment in breeding pipeline.
The most significant association was found to be on chromosome 2 (SNP S2_6154311; Bonferroni-corrected p value = 3.92E-23) contributing 26.70% to phenotypic variation. The SNP S2_6154311 was found to be located in bin 2.02 according to the Maize Genome Data Base (WWW.maizegdb.org, Maize B73 RefGen_V2). Lohithaswa et al. 19 identified a QTL for SDM resistance in the chromosomal bin 2.02 between SSR markers umc2363 and umc1165 (Table 3). Nair et al. 16 identified QTL on chromosome 3, bin 3.04/3.05 (between SSR markers umc1223 and bnlg420) in a backcross mapping population that was also phenotyped at Mandya for SDM resistance. Jampatong et al. 18 also identified a QTL on chromosome 3.04 for SDM resistance in their study on an F 2:3 mapping population. Our study also identified two highly significant (Bonferroni value ≤ 0.05) SNPs, S3_28841309 and S3_51326624, on chromosome 3 (bin 3.04) for SDM resistance, explaining 9.20 and 13.80% of the phenotypic variance, respectively. This particular chromosomal bin was found to be harbouring clusters of disease resistance genes 40 for as many as six of the 11 diseases in a meta-analysis study 41 . Phumichai et al. 30 identified significant SSR marker umc1033 linked with DM resistance on chromosome 9 (bin 9.02) in the association studies. In our study, SNP S9_16183526 (Bonferroni value ≤ 0.05) located on chromosome 9 in bin 9.02 was identified for DM resistance.
Another most important genomic region for DMR is on chromosome 6, which has been identified in many studies 5,16,18 . In our study, nine SNPs were found to be significantly associated with SDM resistance on chromosome 6, and seven of them formed two haplotype blocks. One SNP and two haplotype blocks were located on bin 6.05, where a significant QTL was identified by George et al. 5 for multiple DM resistance against five different pathogens including the SDM pathogen. Nair et al. 16 also identified a major QTL on chromosome 6.05 for SDM and RDM in their backcross population. The significant associations identified in this region in our GWAS add strength to the earlier findings, and could be taken up for further analysis and potential application in maize breeding programs. Another SNP identified on chromosome 6, SNP S6_28477908, is located in bin 6.01; two earlier studies identified SDM resistance QTL in this bin 18,19 . Chromosome 6 is also reported to harbour clusters of resistance genes for multiple biotic stresses, and bin 6.01 was reported to have an array of resistance genes, as in the case of chromosomal bin 3.04 40 . The other significant associations identified in this study seem to be novel, and were not found in the previous QTL mapping studies on SDM resistance (Table 4).
Candidate genes associated with SDM. Generally, plant disease resistance is achieved through various genes acting on mechanisms ranging from innate plant immunity to broad spectrum basal defence mechanisms to specialized mechanisms targeting specific pathogens. Many of the SNPs that were found to be associated with SDM resistance in this study were located in annotated genes (B73 RefGen_V2) with functional domains that were previously reported to influence disease resistance in various crops (Table 3). Mildews, including both downy mildews and powdery mildews, are caused by obligate biotrophic pathogens that require a live host for SCIenTIfIC RePoRTS | (2018) 8:366 | DOI:10.1038/s41598-017-18690-3 their survival. At least 10 out of the 26 SNPs identified as significant in the present study were localized within five different genes that had been previously implicated in mildew resistance in different crops. A novel haplotype identified on chromosome 1 was located within GRMZM2G040441, which has an active MLO-like domain. An MLO protein was cloned initially from barley as the resistance gene against powdery mildew 42,43 , and thereafter, many orthologs and paralogs for the MLO gene were identified and characterized in great detail in many plants 44 . MLO proteins are reported to have characteristic domain features which aid the defence signalling mechanism. Though we have not encountered reports on MLO-like proteins involved in downy mildew resistance, it would be worthwhile to investigate the role of MLO-like proteins in SDM resistance in maize.
Significantly associated SNPs on chromosome 6 were located in four genes, three of which are adjacent genes predicted on bin 6.05. Whether the association of many SNPs in close proximity is due to high LD in the region or whether there is a cluster of genes operating towards a resistance reaction needs further investigation. Three significant associations on chromosome 6 were discovered within the Cupin super family gene. Germins, which are part of this super family, have been reported to be involved in powdery mildew resistance in wheat and barley 45,46 . Similarly, three significantly associated SNPs were found be located within GRMZM2G108057, putatively coding for a cation transporting ATPase. Regulation of cation ATPase is involved in one of the earliest cellular responses during activation of plant immune responses by modulation of extracellular pH and acidification of the apoplast, which was observed in incompatible interactions of barley plants with the powdery mildew fungus Blumeria graminis 47 . Another closely located gene, GRMZM2G479608, is predicted to code for a transmembrane Bax inhibitor motif containing protein 4. In barley, over-expression of the Bax Inhibitor 1 gene was found to breakdown mlo mediated penetration resistance against powdery mildew 48 . Watanabe et al. 49 concluded that Bax Inhibitor 1 was a conserved plant cell death suppressor, which is a key molecular switch downstream from a range of stress signals. On chromosome bin 6.01, a significant association was detected with a SNP located in gene GRMZM2G124297 predicted to code for the HECT sub-class of E3 Ubiquitin ligases. This class of proteins have been reported to provide innate immunity in plants 50 . E3 Ubiquitin ligases were also found to be involved in SAR independent resistance to the downy mildew pathogen in Arabidopsis thaliana 51,52 . As many as eight novel associations were identified in this study, and five of them were within predicted genes with functional domains implicated in disease resistance. Though the correlations identified in association studies cannot be dubbed as causations, associated SNPs on genes previously reported to be trait related add much value and strength to the results to carry forward with independent validation in biparental populations and functional studies of the candidate genes.

Conclusion
The present study is the first one to report significant SNPs and haplotypes associated with SDM resistance in maize, based on GWAS. Six of these SNPs/haplotypes are located within major QTL intervals reported in previous mapping studies on biparental populations, and hence, could be of value to maize breeding programs in tropical Asia that are intensively engaged in developing and deploying DM resistant elite maize varieties. Also, since ten marker-trait associations were identified within candidate genes implicated in disease resistance, especially mildew resistance, the findings from the present study could serve as a strong base for future functional studies to dissect resistance to downy mildews in maize.

Methods
Germplasm. The CAAM panel consisted of 419 inbred lines that were developed and adapted in Asia, based on a global collection of maize germplasm available from CIMMYT. It involved inbred lines with tolerance to abiotic stresses like drought, high temperature and excess moisture, besides Quality Protein Maize (QPM) lines, and lines developed from downy mildew resistant populations in Asia. The CAAM panel includes lines that are adapted to tropical, subtropical, lowland, mid-altitude and highland environments, and were classified as early maturing, intermediate maturing and late maturing based on growing degree days (GDD).
Phenotypic evaluation. The CAAM panel, with a set of 368 lines, was phenotyped for SDM (Peronosclerospora sorghi) at the hot-spot location of Mandya, Karnataka, India (12°N; 76°E; 695 masl; 705 mm/ year average annual rainfall) during the monsoon season in 2012 and 2015. Mandya was also identified as the location with the highest DM susceptibility scores, along with Maros in Indonesia, in an earlier study that evaluated a common mapping population across five locations in Asia 5 . Artificial epiphytotic conditions were created following the infector row method by planting 'infector rows' of CM500, 30 days before planting the test entries (in the first week of July). The infector rows were planted on all sides of the experimental block, and also on one bed (2 m wide) after every two beds of test entries (each 3 m wide) to ensure uniform disease pressure on the test entries across the field. Fresh conidia were collected early in the morning (between 2 and 3 am) from 3-to 4-week-old systematically infected maize plants in the DM nursery by suspending the severely infected leaves in water. The conidia suspension was sprayed on seedlings (2-3 leaf stage) in the infector rows using hand-sprayers immediately after collection. The inoculation process was repeated continuously for 7-10 days to achieve good disease incidence on infector rows. Test entries were planted in one-row plots about 30 days after infector row planting (in the first week of August, when relative humidity was >90%), at the time when DM infection on infector row plants reached >70%. Each test entry was planted in standard 3-m-long rows with plant-to-plant spacing of 20 cm and row-to-row spacing of 75 cm in an alpha-lattice design with two replications. Similar to the infector rows, conidial suspension was collected from the DM nursery and sprayed on the test entries and disease indicator rows, starting from 7 to 8 days after seedling emergence, and during the following 7 days to avoid any chance of escape 53 . Severe SDM incidence (>95%) on all indicator rows of susceptible check entries planted along with the test entries was considered an indication of uniform and sufficient disease pressure throughout the trial. Disease scoring. Percentage disease incidence was calculated as the number of SDM infected plants at 21 and 35 days after emergence compared to the total number of plants per plot. Disease percentage at 35 days was considered as the final score because the disease score remained constant after 35 days. A DM rating was recorded using a modified rating method suggested for maize 54 , i.e. 0% = highly resistant, 1-10% = resistant, 11-25% = moderately resistant, 26-50% = moderately susceptible, 51-75% = susceptible, and 76-100% = highly susceptible.
Phenotypic data analysis. The phenotypic data on percentage disease scores were skewed towards susceptibility in the CAAM panel. The data were subjected to Box-Cox power transformations 55 , which helped the data approach normality. Best Linear Unbiased Estimators (BLUEs) obtained using the software METAR-4.1 56 from multi-location data analysis were used for GWAS analysis. Analysis of variance components were estimated using Genstat (14 th edition) 57 . Broad-sense heritability of the combined analysis across years was calculated as H 2 = σ 2 g / (σ 2 g + σ 2 ge /e + σ 2 e /er), where σ 2 g , σ 2 ge and σ 2 e are the genotypic, genotype-by-year interaction and error variance components, respectively, and e and r are the number of years and number of replicates within each year included in the corresponding analysis, respectively. DNA isolation and genotyping. DNA of all inbred lines was isolated from the leaf samples of 3-4-week-old seedlings using the standard CIMMYT laboratory protocol 58 . Genotyping was carried out on the GBS platform 59 at the Institute of Genomic Diversity, Cornell University, Ithaca, USA. Genomic DNA was digested with the restriction enzyme ApeK1. GBS libraries were constructed in 96-plex and sequenced on Illumina HiSeq. 2000. SNP calling was performed using the TASSEL-GBS pipeline with B73 as the reference genome 60 to generate a comprehensive genotype collection, the AllZeaGBSv2.7 Production Build (www.panzea.org). The TASSEL-GBS pipeline is optimized for low sequencing depth (0.5 to 3×) over a large number of markers in a large sample of individuals. This collection included genotypes of more than 60,000 maize samples. In this study, we focused on the subset of 368 lines forming the CAAM panel, and partially imputed raw GBS data were used for further genetic analyses. The original partially imputed data set consisted of 955,690 SNPs across all the chromosomes, which included partially imputed data based on an algorithm that searched for the closest neighbour in small SNP windows across the entire maize database, allowing for a 5% mismatch 37 . If the requirements were not met, the SNP was not imputed, leaving only about 10% of the data unimputed. For GWAS, filtration criteria of call rate (CR) ≥0.7 and minor allele frequency (MAF) ≥0.03 were used, yielding 333,039 SNPs. For calculating PCA and kinship matrix, high quality SNPs with filtering criteria of CR ≥0.9 and MAF ≥0.1, and pruned at an r 2 threshold of ≤0.5 were used for selecting 63,546 SNPs. Filtering criteria of CR ≥0.9 and MAF ≥0.3 generated 37,043 SNPs for the LD adjacent pair analysis used for calculating r 2 .
Principal component and kinship analysis. The PCA method described by Price et al. 61 was implemented in the SNP & Variation Suite (SVS) V_8.6.0 (SVS, Golden Helix, Inc., Bozeman, MT, www. goldenhelix. com). A three-dimensional plot of the first three principal components was drawn to visualize the possible population stratification among the samples. A scree plot was plotted to determine the number of principal components to be included in the GWAS analysis. A kinship matrix was computed from identity-by-state distances matrix as executed in SVS V_8.6.0., where IBS distance = (no. of markers IBS2) + 0.5 × (no. of markers IBS1) no. non-missing markers, where IBS1 and IBS2 are the states in which the two inbred lines share one or two alleles, respectively at a marker 62 . LD analysis and haplotype trend regression. The extent of genome-wide and chromosome-wise linkage disequilibrium (LD) was based on adjacent pairwise r 2 values (the squared correlation coefficients among alleles at two adjacent SNP markers) between adjacent SNPs among 37,043 high quality SNPs from the GBS data and physical distances between those SNPs 63 . Nonlinear models with r 2 as responses (y) and pairwise distances (x) as predictors were fitted into the genome-wide and chromosome-wise LD data using the 'nlin' function in R 64 . Average pairwise distances in which LD decayed at r 2 = 0.2 and r 2 = 0.1 were then calculated based on the model given by Hill & Weir 65 . LD analysis and haplotype frequency estimation were also done among all the SNPs found to be associated with SDM resistance using the Expectation Maximisation (EM) algorithm 66 as implemented in SVS V_8.6.0. Haplotype frequencies were estimated from among the SNPs associated with the trait using 50 EM iterations, an EM convergence tolerance of 0.0001 and a frequency threshold of 0.01. Haplotype blocks were detected based on the block-defining algorithm to minimise historical recombinations 67 . Trend regression analysis of the haplotypes and SNPs was carried out based on a stepwise regression of the SDM phenotype with the pre-estimated haplotypes with forward elimination and a P-value cut-off of 0.001.

GWAS.
GWAS was carried out on the SDM phenotype based on three methods: using only genotype data (G; Naïve model), using genotype data corrected for population structure (G + Q; General linear model (GLM)) and using genotype data corrected for both population structure and kinship (G + Q + K; single locus mixed linear model (MLM)). These analyses used genotype association tests with an additive model for the first two analyses and a mixed model using single locus (EMMAX) 68 for the third analysis as implemented in SVS V_8.6.0. The mixed association mapping model used was Y = SNP*β + PC*α + K *µ + ε, where Y = response of the dependent variable (SDM percent), SNP = SNP marker (fixed effects), PC = principal component coordinate from the PCA (fixed effects), K = kinship matrix (random effects), α is the vector of PC, β and µ are the vectors of SNP and K, respectively, and ε is the error. In the linear models, the first 10 principal components were used as covariates. Manhattan plots were plotted using the −log 10 P values of all SNPs used in analysis; Q-Q plots were plotted of the observed −log 10 P values and the expected −log 10 P values to study the genomic inflation.