Genome-wide association studies in tropical maize germplasm reveal novel and known genomic regions for resistance to Northern corn leaf blight

Northern Corn Leaf Blight (NCLB) caused by Setosphaeria turcica, is one of the most important diseases of maize world-wide, and one of the major reasons behind yield losses in maize crop in Asia. In the present investigation, a high-resolution genome wide association study (GWAS) was conducted for NCLB resistance in three association mapping panels, predominantly consisting of tropical lines adapted to different agro-ecologies. These panels were phenotyped for disease severity across three locations with high disease prevalence in India. High density SNPs from Genotyping-by-sequencing were used in GWAS, after controlling for population structure and kinship matrices, based on single locus mixed linear model (MLM). Twenty-two SNPs were identified, that revealed a significant association with NCLB in the three mapping panels. Haplotype regression analysis revealed association of 17 significant haplotypes at FDR ≤ 0.05, with two common haplotypes across three maize panels. Several of the significantly associated SNPs/haplotypes were found to be co-located in chromosomal bins previously reported for major genes like Ht2, Ht3 and Htn1 and QTL for NCLB resistance and multiple foliar disease resistance. Phenotypic variance explained by these significant SNPs/haplotypes ranged from low to moderate, suggesting a breeding strategy of combining multiple resistance alleles towards resistance for NCLB.

Summary of selected genetic mapping studies for NCLB resistance using different mapping populations in various genetic backgrounds. Molecular markers used in these studies were single nucleotide polymorphisms (SNPs), simple sequence repeats (SSR), restriction fragment length polymorphisms (RFLP) and cleaved amplified polymorphic sequences (CAPS). Phenotypic traits such as area under disease progress curve (AUDPC), plant disease index (PDI), disease leaf area (DLA), incubation period (IP), disease severity (DS) and weighted mean disease (WMD) were used for QTL analysis.
revealed that chromosome 8 possesses a cluster of QTLs and significant real (consensus) QTLs for NCLB, GLS and SLB with a confidence interval (CI) of less than 5 cM 10 . QTL mapping, though a powerful tool, has its own limitations, such as (i) the limited number of recombination events during population development, resulting in low mapping resolution, (ii) only two alleles being studied in each mapping population and (iii) the difficulty of identifying positional candidate genes or making strong inferences on linkage relationships among other QTL identified 6 . In most QTL mapping studies, the mapping populations and breeding populations are unrelated, and hence few of the QTLs identified have been translated into breeding targets. GWAS, in assembled mapping panels representing the wide diversity in breeding programs, is another powerful tool to dissect complex traits and complements linkage mapping by improving mapping resolution. GWAS has been used to identify allelic variants that confer improved tolerance to various biotic and abiotic stresses in maize. Resistance to a large number of economically important and complex diseases of maize like Fusarium ear rot 16 , GLS 17,18 , head smut 19 , NCLB 6,20,21 , SLB 22 , sugarcane mosaic virus 23 , Maize streak virus 24 , Maize lethal necrosis 25 , sorghum downy mildew 26 and tar spot 27 has been dissected using GWAS.
Several reports on GWAS for NCLB resistance in maize are available, mostly in temperate germplasm and environments. In a GWAS study conducted by Van Inghelandt et al. 21 , a large association mapping panel of 1487 inbred lines of temperate origin was used to dissect the genetic architecture of NCLB resistance, and reported association of significant SNPs on chromosomes 2, 5, 6 and 7 whereas, some of the SNPs were also identified on chromosomes 7 and 9 after correcting for flowering time variate. In a nested association mapping population of 4630 RILs, 208 SNPs associated with NCLB resistance on all 10 chromosomes of maize were identified, along with 29 QTLs, mostly with multiple loci 6 . Ding et al. 20 studied the CIMMYT tropical maize germplasm that were phenotyped at different locations of Mexico and Africa for NCLB resistance and identified 12 SNPs on chromosome 3, 4, 6, 7, 8 and 10 for AUDPC, 14 SNPs on chromosome 1, 2, 3, 4, 6, 7, 9 and 10 for mean disease rating and 19 SNPs on chromosome 3, 4, 5, 7 and 10 for NCLB rating. In general, QTL mapping and GWAS studies in maize have revealed one important aspect that, NCLB resistance is a polygenic trait, and resistance due to major effects contributed by Ht genes was rare in most of the germplasm and environments studied. These studies suggested that there could be environment-specific and germplasm-specific moderate to large effect genomic regions controlling resistance to NCLB, which could be exploited in incorporating quantitative resistance to this important disease in breeding programs. Association mapping studies for NCLB resistance have been conducted by various research groups using temperate and tropical maize germplasm in American, African and European environments. However genome wide association studies using high density markers from large set of maize lines from Asian region are seldom reported. 
Therefore, the present research was designed to conduct GWAS for NCLB resistance under Asian conditions using tropical maize germplasm represented in three maize panels. These three panels represented maize germplasm from CIMMYT and several national partners, bred in different geographies across the tropics, and hence potentially selected in the presence of varied races of S. turcica. They represent most of the genetic diversity that is available across different tropical/sub-tropical geographies where CIMMYT breeding programs operate, and hence could be ideal resources for understanding the genetics of NCLB disease in South Asian tropics. In this study, apart from single SNP-based GWAS, we also identified haplotypes for resistance to NCLB within and across association panels representing differing germplasm backgrounds.

Results
Phenotypic evaluation for resistance to NCLB. Subsets of three AM panels, CIMMYT Asia Association Mapping (CAAM), Drought Tolerant Maize for Africa (DTMA) and Improved Maize for African Soils (IMAS), consisting of 376, 224 and 324 lines, respectively, were evaluated for NCLB resistance across different locations/years in India. Disease severity was high in the CAAM panel, with a maximum score of 5.00 on a scale of 1.00-5.00 at all three locations and minimum disease scores of 2.02, 1.50 and 1.46 at Mandya, Arabhavi and Kashmir, respectively. The average disease score across locations was 3.74. Broad-sense heritability (H²) was moderate to high (0.58-0.70) at individual locations, with significant genotypic variance (P value ≤ 0.001). The DTMA panel at Mandya had a mean score of 2.85, with a minimum disease score of 1.97 and a maximum score of 4.76. The estimated broad-sense heritability was 0.53, with highly significant genotypic variance (P value ≤ 0.001). Similarly, NCLB scores in the IMAS panel ranged from 1.5 to 4.00 at Mandya in the first year and from 2.00 to 5.00 in the second year, with mean ratings of 2.55 and 3.44 in the two years, respectively. The analysis across years gave an average disease score of 3.00, with a maximum score of 4.55 and a minimum score of 1.96; the overall heritability (H²) estimate was 0.54, with 0.48 in season 1 and 0.74 in season 2. The IMAS panel also showed significant genotypic variance (P value ≤ 0.001) (Table 2). The frequency distribution of mean NCLB disease ratings followed a near-normal pattern in CAAM, DTMA and IMAS (Fig. 1). All three AM panels showed a significantly negative genotypic correlation between NCLB scores and days to anthesis (DA) (P value ≤ 0.001) (Table 3). Hence, best linear unbiased predictions (BLUPs) were estimated using DA as a covariate for the subsequent GWAS for NCLB resistance in all association panels.
Principal component analysis and linkage disequilibrium (LD) decay. Principal Component Analysis (PCA) was performed using the high-density Genotyping by Sequencing (GBS) data, filtered for a call rate > 0.9, minor allele frequency > 0.1 and LD pruning at r² = 0.5. The first three principal components of each panel are depicted in Fig. 2. The genome-wide linkage disequilibrium (LD) was plotted as LD (r²) between adjacent pairs of markers versus the distance between adjacent markers in kb (Table 1).
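The marker filters described above can be illustrated with a minimal sketch. It assumes genotypes are coded as allele dosages (0, 1, 2) with `None` for a missing call; this coding, the function names and the toy data are illustrative assumptions and not the authors' pipeline. LD pruning is not shown, and the thresholds are applied as "greater than or equal to", as stated in the Discussion.

```python
from typing import List, Optional

Genotype = Optional[int]  # allele dosage 0/1/2; None marks a missing call (assumed coding)


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of lines with a non-missing genotype call at one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency computed from dosages, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(calls: List[Genotype], min_call_rate: float, min_maf: float) -> bool:
    """True when a SNP meets both the call-rate and the MAF threshold."""
    return call_rate(calls) >= min_call_rate and minor_allele_frequency(calls) >= min_maf


# Hypothetical SNP scored on ten lines, two of them missing.
snp = [0, 2, None, None, 0, 2, 0, 2, 0, 2]
print(passes_qc(snp, min_call_rate=0.9, min_maf=0.1))   # PCA/kinship thresholds
print(passes_qc(snp, min_call_rate=0.7, min_maf=0.05))  # GWAS thresholds
```

With these toy calls the SNP has a call rate of 0.8, so it is rejected under the stricter PCA/kinship criteria but retained under the GWAS criteria.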

GWAS for NCLB resistance.
A robust subset of SNPs from the high-density imputed 955 K GBS genotypic data was used to conduct GWAS, with 293,606, 297,437 and 309,608 SNPs retained in the CAAM, DTMA and IMAS panels, respectively, after applying the filtration criteria of call rate ≥ 0.7 and minor allele frequency ≥ 0.05. The naïve (G-test) association model showed the highest genomic inflation, whereas the MLM corrected for both population structure and kinship showed the least genomic inflation, as observed in the quantile-quantile (QQ) plots (Fig. 4). Therefore, highly significant associations for NCLB resistance in the panels were ascertained based on MLM analysis. The narrow-sense heritability for NCLB resistance in the CAAM, DTMA and IMAS panels was 0.56, 0.52 and 0.53, respectively, based on the IBS kinship matrix computed from all SNPs used in GWAS. A total of five SNPs were associated with NCLB resistance in the CAAM panel, with P values ranging from 5.27 × 10⁻⁷ to 8.22 × 10⁻⁶. Three SNPs in the DTMA panel and 14 SNPs in the IMAS panel were identified in MLM analysis, with P values ranging from 3.76 × 10⁻⁶ to 1.63 × 10⁻⁵ and from 4.98 × 10⁻⁸ to 8.60 × 10⁻⁶, respectively (Table 4). SNP S7_165196774 located on chromosome 7 and three other SNPs, S8_95422954, S8_95422964 and S8_95422973, with closely placed physical co-ordinates on chromosome 8 showed the lowest P values for NCLB resistance in the CAAM panel and explained phenotypic variance ranging from 5.32 to 6.68%. Similarly, in the DTMA panel, GWAS identified three highly significant SNPs on chromosome 7 (S7_110282525, S7_110282502 and S7_131034143), explaining phenotypic variance ranging from 8.45 to 9.65%. A group of 8 SNPs located at close physical co-ordinates near 157 Mb on chromosome 8 were among the 14 SNPs showing the most significant association with NCLB in the IMAS panel (Table 4).
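The comparison of models by genomic inflation can be made concrete with the genomic inflation factor, usually written λ. The sketch below assumes the common definition (median of the 1-degree-of-freedom chi-square statistics implied by the P values, divided by the median of the chi-square distribution under the null). The source does not state how inflation was quantified beyond inspection of QQ plots, so this definition and the function name are assumptions.

```python
from statistics import NormalDist, median
from typing import Iterable

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(p_values: Iterable[float]) -> float:
    """Genomic inflation factor from two-sided P values in the interval (0, 1].

    Each P value is converted back to a 1-df chi-square statistic via the
    squared normal quantile; values near 1 indicate little inflation.
    """
    chi2_stats = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a well-calibrated test with no true associations.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(genomic_inflation(null_like))

# Squaring shifts every P value downward, mimicking an inflated (naive) model.
inflated = [p ** 2 for p in null_like]
print(genomic_inflation(inflated))
```

A model that adequately corrects for population structure and kinship would be expected to give a value close to 1, whereas an uncorrected model typically gives a larger value.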
SNP S8_157987611 was significantly associated at a P value of 4.98 × 10⁻⁸ and explained 9.10% of phenotypic variance. GWAS conducted on the different panels identified several SNPs whose physical co-ordinates co-localize with chromosomal bins where Ht genes or QTLs for NCLB resistance were previously reported. Predicted gene annotations in the B73 maize reference genome version 2 (http://ensembl.gramene.org/Zea_mays) were studied to identify the genes harbouring the SNPs associated with NCLB resistance. Several significant SNP associations in these three GWAS studies were located within genes with functional domains related to biotic and abiotic stress tolerance, immune response, metabolism, and plant development and maturity (Table 4).
Table 2). Thirty-nine haplotypes were significantly associated with NCLB disease rating in the DTMA panel, with explained phenotypic variance of 2.64-13.90% (Supplementary Table 3). In the IMAS panel, 38 haplotype blocks were associated with NCLB resistance, explaining phenotypic variance ranging from 1.71 to 11.50% (Supplementary Table 4). HTR analysis identified 17 common haplotypes having a significant effect (FDR ≤ 0.05) on the trait in at least two different AM panels, spread across seven chromosomes (1, 2, 4, 5, 8, 9 and 10) and each consisting of 2-10 SNPs (Table 5; Fig. 4). The proportion of variance explained by these common haplotype blocks ranged from 1.71 to 9.42%. No haplotype was identified to have a significant effect on the trait in all three AM panels. The CAAM and DTMA panels shared eight common haplotypes with a significant effect on the trait, whereas six and three common haplotypes significantly associated with NCLB disease were shared between the DTMA and IMAS panels and between the CAAM and IMAS panels, respectively.
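The FDR ≤ 0.05 criterion applied to the haplotype tests can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule used here is an assumption, and the P values in the example are invented for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag tests that are significant under the Benjamini-Hochberg procedure.

    Finds the largest rank k such that the k-th smallest P value is at most
    (k / m) * fdr, then declares the k smallest P values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical P values for four haplotype blocks (not taken from the study).
block_p = [0.216, 0.001, 0.060, 0.008]
print(benjamini_hochberg(block_p, fdr=0.05))
```

Because the rule is step-up, a P value can be declared significant even when it exceeds its own rank threshold, provided a larger P value further down the ranking meets its threshold.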

Discussion
NCLB is an important foliar disease of maize in almost all temperate and tropical maize growing regions of the world. Resistance to NCLB in maize can be achieved through breeding using qualitative and quantitative resistance, either separately or together. However, resistance provided by qualitative/major genes becomes ineffective in the presence of virulent strains. Tropical environments show high pathogen abundance and high genetic diversity, which lead to elevated disease severity, and hence the chances of breakdown of resistance are high. Compared to other grass crops like rice and wheat, the majority of disease resistance deployed by maize breeders is quantitative rather than qualitative in nature 28 . It was also noted that the major genes influencing NCLB resistance have high environmental dependence with regard to light and temperature 29 , and act like partial/quantitative resistance in some environments. Resistance to NCLB is considered to be a mandatory trait in breeding successful maize varieties across the tropics, and hence is an important breeding target. Therefore, identifying, validating and deploying high-value genomic regions for the trait will help in achieving enhanced genetic gains. Targeted molecular breeding for traits demands genetic mapping and molecular characterization of the functional genomic regions associated with the trait 30 . Association mapping utilizes the ancestral recombination events in a natural population to establish marker-phenotype relations 31 . It has several advantages over linkage mapping, such as (1) existing populations can be used rather than developing a new bi-parental population for mapping.
Table 4. Highly significant single nucleotide polymorphisms (SNPs) identified in GWAS analysis of the CAAM, DTMA and IMAS association panels that were evaluated for NCLB resistance.
The three association mapping panels used in this study represent most of the genetic diversity that is available across different geographies where CIMMYT breeding programs operate, and hence could be ideal resources for understanding the genetics of NCLB disease in Asia. The trial means for NCLB scores of the IMAS and DTMA panels were lower than that of the CAAM panel at Mandya, a location with high disease severity where all three panels were evaluated, indicating that higher levels of resistance are available in the African and Latin American CIMMYT germplasm than in CIMMYT-Asia germplasm. One of the reasons could be that the DTMA and IMAS panels included lines predominantly adapted to Sub-Saharan Africa (SSA), where a large number of lines were evaluated for foliar diseases like GLS, NCLB and common rust by collaborators through a regional maize disease nursery project (REGNUR) 33 . The CAAM panel lines, however, were bred and/or selected for Asian environments, and had a history of breeding for resistance to diseases like downy mildews 34,35 . The CAAM panel, evaluated at three locations, showed the highest mean disease score at Kashmir, located at higher altitude on the northern boundary of India, which may be due to a highly congenial environment for disease development owing to cool and humid weather, and the probable presence of more virulent races of the pathogen at that location. For the phenotyping trials, artificial inoculation was conducted at all locations with pathogen sources collected from the respective locations. For principal component and kinship analysis, SNPs fulfilling the criteria of CR ≥ 0.9 and MAF ≥ 0.1 and LD pruned at an r² threshold of 0.5 were used. LD pruning was done to reduce the confounding effects of large blocks of SNPs that are in strong LD with each other 36 .
There was only moderate structure observed in the three panels, with no clear differentiation of major adaptation groups, except in the IMAS panel. The CIMMYT maize germplasm was not found to have strong population structure in various earlier studies 25,26,37 . George et al. 38 observed that CIMMYT's tropical and sub-tropical lines in the Asian region possess significant genetic diversity that did not allow a clear distinction into separate clusters. Warburton et al. 39 observed that the CIMMYT pools and populations which served as the germplasm sources for derivation of many breeding lines in the tropical and sub-tropical adaptation groups had more diversity within than between source populations. This heterogeneous nature of the CIMMYT populations was suggested to be responsible for the lack of a well-defined population structure in the germplasm. A rapid LD decay was observed in all the panels (0.9 kb at r² = 0.2 for CAAM, 1.75 kb at r² = 0.2 for DTMA and 0.99 kb at r² = 0.2 for the IMAS panel). Lu et al. 40 found that the LD decay distance in temperate maize germplasm (10-100 kb) was 2 to 10 times greater than that in tropical maize germplasm (5-10 kb). Our results were more similar to the finding by Romay et al. 41 that LD decays much more rapidly in tropical germplasm, to about 1 kb at r² = 0.2. The faster LD decay in tropical germplasm suggests a more diverse genetic base resulting from historic recombination events, and tropical germplasm might carry more rare alleles than temperate germplasm 42 . The LD decay differed among the 10 chromosomes in all panels, with the slowest decay consistently observed on chromosome 8 in all the panels studied. This was also observed by Suwarno et al. 43 in a carotenoid association mapping panel comprising tropical, sub-tropical and a small proportion of temperate lines. Pace et al. 44 also observed this pattern in a subset of the AMES panel, which consists predominantly of temperate maize lines.
This is an interesting observation and will require further probing to understand the reasons behind the slower LD decay in chromosome 8 and its implications in molecular breeding applicability in terms of traits like resistance to NCLB, with genomic regions conferred by genes/QTL located on Chromosome 8.
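The LD decay distances discussed above (the distance at which r² falls to 0.2) can be illustrated with a simplified sketch. It computes r² as the squared Pearson correlation of allele dosages and reads the decay distance from binned mean r² values. LD decay is often estimated by fitting a non-linear curve instead, and the source does not say which approach was used, so the binning rule, function names and example values here are all assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between allele dosages at two SNPs."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)


def ld_decay_distance(
    pairs: List[Tuple[float, float]], threshold: float = 0.2, bin_kb: float = 0.5
) -> Optional[float]:
    """Midpoint (kb) of the first distance bin whose mean r2 is below the threshold.

    `pairs` holds (distance_kb, r2) for marker pairs; returns None when the
    mean r2 never drops below the threshold.
    """
    bins: Dict[int, List[float]] = {}
    for distance_kb, r2 in pairs:
        bins.setdefault(int(distance_kb // bin_kb), []).append(r2)
    for index in sorted(bins):
        if sum(bins[index]) / len(bins[index]) < threshold:
            return (index + 0.5) * bin_kb
    return None


# Invented (distance_kb, r2) pairs for one chromosome.
example_pairs = [(0.1, 0.6), (0.3, 0.5), (0.7, 0.3), (1.2, 0.15), (1.4, 0.1), (2.2, 0.05)]
print(ld_decay_distance(example_pairs))
```

Running the same calculation separately per chromosome would be one way to compare decay rates, for example to check whether chromosome 8 stands out.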
The single locus mixed linear model, corrected for population structure and familial relationships (kinship), was used for conducting GWAS in all the panels to reduce genomic inflation. Highly significant SNPs associated with NCLB resistance were selected based on a significance threshold corrected for multiple testing, taking the average extent of genome-wide LD into consideration 45 . A total of 22 SNPs significantly associated with NCLB resistance were identified on chromosomes 1, 6, 7, 8 and 10. The most significant association in the CAAM panel was with SNP S7_165196774 (P value 5.27 × 10⁻⁷), at 165.19 Mb, in bin 7.04 (www.maizegdb.org, Maize B73 RefGen_V2), co-located within the physical interval of markers flanking the major gene Ht3 21 . Ht3 has been introgressed from Tripsacum floridanum into maize 46 and provides resistance against S. turcica races 0, 1, 2, N, 12 and 2N 47 by inhibiting the extension of chlorotic spots and decreasing the production of spores by the pathogen 46 . S. turcica races 0, 1 and 2 are prominent in Asian countries like China and India 48 , and Ht3 could be effective against these races. Considering that the most significant SNP identified could possibly be in LD with Ht3, owing to its physical location, it provides a strong lead to follow up in future NCLB resistance mapping and deployment efforts. A glutathione S-transferase (GST) gene belonging to a plant-specific clade implicated in defence, which confers multiple disease resistance to NCLB, SLB and GLS, has also been identified in bin 7.04 49 . In the DTMA panel, two highly significant SNPs (S7_110282525, S7_110282502), closely located at 110 Mb (bin 7.02) on chromosome 7, were found to be associated with NCLB resistance. Van Inghelandt et al. 21 identified SNPs in bin 7.02 associated with NCLB resistance in a GWAS of 1487 maize inbred lines representing elite European and North American germplasm.
Similarly, another significantly associated SNP identified in the DTMA panel (S7_131034143) was located on chromosomal bin 7.03. Dingerdissen et al. 50 identified a QTL in this chromosomal bin for area under disease progression curve (AUDPC) trait in F 2:3 lines derived from Mo17 and B52 at Embu, Kitale and Muguga, Kenya against the races 0 and N prevalent in Kenya.
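SNP identifiers in this study, such as S7_165196774 at 165.19 Mb on chromosome 7, appear to encode the chromosome and the physical position in base pairs. Under that assumption, a small parser helps when comparing SNP positions with previously reported intervals; the class and function names below are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed identifier pattern: "S<chromosome>_<position in bp>".
_SNP_ID = re.compile(r"^S(\d+)_(\d+)$")


class SnpLocus(NamedTuple):
    chromosome: int
    position_bp: int

    @property
    def position_mb(self) -> float:
        """Physical position in megabases."""
        return self.position_bp / 1_000_000


def parse_snp_id(snp_id: str) -> SnpLocus:
    """Split an identifier such as 'S7_165196774' into chromosome and position."""
    match = _SNP_ID.match(snp_id)
    if match is None:
        raise ValueError(f"unrecognised SNP identifier: {snp_id!r}")
    return SnpLocus(chromosome=int(match.group(1)), position_bp=int(match.group(2)))


lead_snp = parse_snp_id("S7_165196774")
print(lead_snp.chromosome, lead_snp.position_mb)

# Span of the three closely placed chromosome 8 SNPs reported for the CAAM panel.
cluster = [parse_snp_id(s) for s in ("S8_95422954", "S8_95422964", "S8_95422973")]
positions = [locus.position_bp for locus in cluster]
print(max(positions) - min(positions))
```

Parsed this way, the three CAAM SNPs on chromosome 8 lie within 19 bp of each other, which is consistent with their description as having closely placed physical co-ordinates.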
In the IMAS panel, eight closely located SNPs at 157 Mb in bin 8.06 of chromosome 8 showed the most significant association with NCLB resistance. In the maize genome, chromosome 8 (bin 8.05-8.06) is known to harbour genes for various defence pathways, and could be considered one of the "complex, important and interesting" genomic regions in terms of maize disease resistance, and NCLB resistance in particular 3 . It is considered an important genomic region for many dQTLs and major genes like Ht2 and Htn1 for NCLB resistance 28 . Though there are apparent differences in the definition of qualitative and quantitative resistance, pure qualitative and quantitative resistance are sometimes considered to be two ends of the same continuum, with most resistance genes lying between the two extremes 51 . Chung et al. 3 fine mapped a major QTL in bin 8.06 explaining a large proportion (14-62%) of phenotypic variance in NCLB resistance to races 0 and 1, and it was described as either identical, allelic or closely linked and functionally similar to the major gene Ht2, which is partially dominant and is effective against races 0, 1, 3 and N. The Htn1 gene is also present in this genomic region; it is known to delay lesion development for up to four weeks after infection, reduce the number of lesions and delay sporulation, and was found to be effective against most NCLB races 5 . Htn1 was cloned and found to encode a wall-associated receptor-like protein that confers quantitative and partial resistance against NCLB 4 . Many other studies have also identified NCLB QTLs in these chromosomal bins. Poland et al. 6 identified a large-effect QTL at 152.2 Mb in bin 8.06, segregating in multiple NAM families. Similarly, Chen et al. 52 identified a QTL in bin 8.06 for lesion width while studying a RIL population. A major QTL was identified for AUDPC on chromosome 8 between bins 8.05 and 8.06 in F 2:3 populations studied for NCLB resistance 53 .
Recently, a study conducted on a nested near isogenic line library for resistance to NCLB also identified NILs with introgressions across the centromeric region of chromosome 8 (bin 8.05), which overlaps two major genes, ht2 and htn 54 . The fact that one of our mapping panels also identified a strongly associated set of closely located SNPs in this important chromosomal bin indicates the possible presence of a quantitatively expressed major gene or dQTL in this region in the genetic background of the maize lines of this particular panel. The S. turcica races prevalent in the locations used for phenotyping in the present study have not yet been reported, but physiological race 1 of S. turcica of maize was reported in the adjoining areas 48 . Wang et al. 55 identified two minor QTLs in bin 8.03 associated with NCLB resistance in a RIL population. Our study also identified a group of three closely linked SNPs in bin 8.03 of chromosome 8 (S8_95422954, S8_95422964, S8_95422973) associated with NCLB resistance in the CAAM panel.
Haplotype regression analysis identified 17 haplotype blocks that are common across at least two of the three panels studied, and these are hence considered candidates for further studies towards NCLB resistance in the Asian tropics. The use of haplotypes increases the phenotypic variance explained, and thus allows the identification of genomic regions responsible for controlling a large part of the variation in the trait of interest 56 . The size of a haplotype block depends on the degree of LD present in the population 57 . Haplotype information can be beneficial when identifying marker-phenotype associations and can offer advantages for the genetic dissection of loci underlying a complex trait 20 . Of the 17 common haplotypes found to be significant for NCLB resistance across different AM panels, eight were shared between the CAAM and DTMA panels, six between the DTMA and IMAS panels and three between the CAAM and IMAS panels. Haplotype Hap_1.1 was identified in chromosomal bin 1.06 in the CAAM and DTMA panels, and this bin is considered to be an important genomic region controlling resistance to multiple foliar diseases like NCLB, Stewart's wilt, GLS and SLB 58-60 . Jamann et al. 60 identified a receptor-like kinase gene, pan1, that underlies a QTL for NCLB in this region. Similarly, the physical co-ordinates of the SNPs forming the haplotype block Hap_9.1, identified in the DTMA and IMAS panels, fall within the confidence interval of qMdr 9.02 reported for multiple disease resistance to NCLB, GLS and SLB 61 . Another haplotype in chromosomal bin 9.03 (Hap_9.2), identified in the CAAM and DTMA panels, was located in close physical proximity to two closely spaced SNPs at 99.41 Mb identified in the Iodent material in a GWAS conducted by Van Inghelandt et al. 21 .
QTLs for resistance to NCLB and for multiple disease resistance on chromosomal region 4.05 have been identified in various studies 11,59,62 , and our study also identified two haplotype blocks Hap_4.2 and Hap_4.3 in this chromosomal bin. Overall, it was found that several SNPs/haplotypes identified in this study are in close proximity to previously reported major genes and QTL clusters, but many novel genomic regions were also discovered that could be environment and germplasm-specific.
Some of the SNPs identified in this study were located in annotated genes (B73 RefGen_V2) with functional domains implicated in defence mechanisms in crops like maize and rice, and in Arabidopsis. The highly significant SNP S7_165196774, identified in the CAAM panel, is located in the gene GRMZM2G116426, which has functional domains of alpha/beta-hydrolase (ABH) superfamily proteins. ABHs support a variety of unique catalytic functions for defence and hormone regulation 63 . ABH esterase regulates the response to salicylic acid in plants, which is a key hormone in plant immune responses 64 . The highly significant SNPs identified in the DTMA panel on chromosome 7 are located within the GRMZM2G334165 gene, coding for a protein kinase superfamily protein. Protein kinases play a central role in signalling during pathogen recognition and the subsequent activation of plant defence mechanisms. Microbial (pathogen) elicitors, also known as pathogen-associated molecular patterns (PAMPs), are recognized by the membrane-localized pattern recognition receptors (PRRs) of plants 65 . Transmembrane receptor kinases are one class of PRRs that contribute to plant defence. Eight significantly associated SNPs on chromosome 8 in the IMAS panel were located in the GRMZM2G319130 gene, putatively coding for a regulator of chromosome condensation (RCC1) family protein. RCC1 proteins contain the plant-specific disease resistance, zinc finger, chromosome condensation (DZC) domain 66 , and the RLM3 gene implicated in resistance to Leptosphaeria maculans in Arabidopsis was found to have an RCC1 domain. It was also found to be effective for broad-spectrum resistance against several necrotrophic fungi. Two genes, BQ081031 and BQ080005, encoding candidate regulators of RCC1 family protein were found to be down-regulated specifically in the resistant reaction following infection by Phytophthora sojae, which causes stem and root rot in soybean 67 .

Conclusion
From three GWAS panels genotyped at high density, and phenotyped for NCLB disease under artificial disease pressure in multiple environments in India, 22 significant SNP associations were identified. Seventeen haplotypes were identified which were significantly associated with the trait across two or more panels studied. Several SNPs/haplotypes identified in this study were located within or in close proximity to major genes like Ht3, Ht2 and Htn1 and many previously reported dQTLs, and multiple foliar disease resistant QTL. These regions will be candidates for further validation studies and possible utilization in the breeding programs in Asia. Considerable differences were observed among different germplasm in terms of resistance to NCLB, and hence it is suggested to bring together diverse sources of resistance alleles to improve resistance to NCLB.

Materials and methods
Plant material. Three association mapping panels, CAAM, DTMA and IMAS, assembled by the CIMMYT Global Maize Program were used to study genome-wide association for NCLB resistance. The CAAM panel included 419 tropical/sub-tropical lines from the different breeding programs of CIMMYT adapted to Asian ecologies. This diverse panel included lines derived from different source populations for drought, waterlogging, heat stress and acid soil tolerance, as well as downy mildew resistant lines. The panel has early, medium and late maturing lines with predominantly yellow kernel color. This panel has earlier been used in GWAS for traits like root traits under drought 68 and resistance to sorghum downy mildew 26 , that were replicated in each block. These trials were conducted during the rainy season, as the conditions were more congenial for disease development. All entries were planted in 2 m row plots using a spacing of 0.75 m between rows and 0.20 m between plants in each row.
Artificial inoculation. S. turcica strains were isolated from the previous year's diseased maize leaves. Infected leaves were cut into small pieces of 5-10 mm, washed with 0.6% sodium hypochlorite for 1 min and rinsed 3-4 times with sterile distilled water under aseptic conditions. Excess water was blotted dry on sterile tissue paper and the infected leaf pieces were placed on Petri plates containing Potato Dextrose Agar (PDA). The plates were incubated at 28 °C for 3-5 days; the growing hyphal tips were then transferred to PDA and allowed to grow for 8-10 days at 28 °C, and conidia were isolated using the single spore isolation method. Pure cultures of S. turcica were maintained on PDA for further use. For artificial inoculation in the field experiments, mass multiplication of the fungal culture was done on sterile sorghum grains. Approximately 200-250 g of sorghum grains were autoclaved in a 500 ml conical flask and, on reaching room temperature, the grains were inoculated with the pure culture of S. turcica grown earlier on PDA. Flasks were incubated at 28 °C for 15-20 days until the grains were uniformly covered with fungal growth. The cultured grains were dried, ground into powder and stored in paper bags until use. Trials were inoculated by placing 1 g of ground sorghum powder into the whorl of the 30-day-old maize crop, and the process was repeated at 40 days to avoid any escapes. Soon after the inoculations, plain water was sprinkled on all inoculated plants using a manual sprayer of 15 L capacity. This increased the humidity and leaf wetness necessary for disease development, and thus gave better and more reliable phenotyping data.
Disease scoring. NCLB symptoms started developing a week after artificial inoculation; however, symptoms became distinguishable only after the plants reached reproductive growth. Disease rating in trials was recorded twice: the first score was taken when the crop was 65-70 days old, and the second or final score was taken on the 75th-80th day. NCLB rating was recorded using a 1-5 scale 69 .
Phenotypic data analysis. A mixed linear model was used for analysis of phenotypic data from the alpha-lattice design, where genotypes, environments, the interaction between genotype and environment, and the interaction with replication and environment were considered as random effects.
The model was $Y_{ijko} = \mu + g_i + l_j + r_{kj} + b_{ojk} + e_{ijko}$, where $Y_{ijko}$ is the phenotypic performance of the ith genotype at the jth environment in the kth replication of the oth incomplete block, $\mu$ is an intercept term, $g_i$ is the genetic effect of the ith genotype, $l_j$ is the effect of the jth environment, $r_{kj}$ is the effect of the kth replication at the jth environment, $b_{ojk}$ is the effect of the oth incomplete block in the kth replication at the jth environment, and $e_{ijko}$ is the residual. For the CAAM and DTMA panels, best linear unbiased predictions (BLUPs) were estimated using Meta-R version 4.1 70 with anthesis date (AD) as a covariate, because NCLB scores were significantly correlated with AD. In augmented design trials, BLUPs were estimated across years in SAS using a linear model for repeated entries and a linear model for entries.
The linear model for repeated entries was $Y_{k_0 jl} = \mu + \beta_{j(l)} + \gamma_l + \tau_{k_0} + (\tau\gamma)_{k_0 l} + \varepsilon_{k_0 jl}$, with $k_0 = 1, 2, \ldots, q$ (repeated entries), $j = 1, 2, \ldots, b$ (blocks) and $l = 1, 2, \ldots, l$ (locations), where $\beta_{j(l)}$ is the effect of the jth block nested in the lth location, $\gamma_l$ is the effect of the lth location, $\tau_{k_0}$ is the effect of the $k_0$th repeated entry, and $(\tau\gamma)_{k_0 l}$ is the effect of the interaction between the $k_0$th entry and the lth location.
The linear model for entries was $Y_{ijl} = \mu + \beta_{j(l)} + \gamma_l + \tau_i + (\tau\gamma)_{il} + \varepsilon_{ijl}$, with $i = 1, 2, \ldots, v$ (entries), $j = 1, 2, \ldots, b$ (blocks) and $l = 1, 2, \ldots, l$ (locations), where $\beta_{j(l)}$ is the effect of the jth block nested in the lth location, $\gamma_l$ is the effect of the lth location, $\tau_i$ is the effect of the ith entry, and $(\tau\gamma)_{il}$ is the effect of the interaction between the ith entry and the lth location.
Broad-sense heritability ($H^2$) of multi-location trials was estimated as $H^2 = \sigma^2_g / (\sigma^2_g + \sigma^2_{ge}/e + \sigma^2_e/(er))$, where $\sigma^2_g$, $\sigma^2_{ge}$ and $\sigma^2_e$ are the genotypic, genotype-by-environment interaction and error variance components, respectively, and $e$ and $r$ are the number of environments and the number of replicates within each environment included in the analysis, respectively. Meta-R version 4.1 was also used to generate descriptive statistics and genetic correlations between the NCLB scores and anthesis date.
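The heritability formula above can be illustrated with a minimal Python sketch. The function name and the variance component values are hypothetical and chosen only for illustration; they are not estimates from this study, where the variance components came from the mixed model fitted in Meta-R.

```python
def broad_sense_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Return H^2 = var_g / (var_g + var_ge / e + var_e / (e * r)).

    var_g, var_ge, var_e: genotypic, genotype-by-environment and error
    variance components; n_env (e) and n_rep (r): number of environments
    and of replicates within each environment.
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    phenotypic_var = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    if phenotypic_var <= 0:
        raise ValueError("variance components must sum to a positive value")
    return var_g / phenotypic_var


# Hypothetical variance components, for illustration only.
h2 = broad_sense_heritability(var_g=0.30, var_ge=0.12, var_e=0.24,
                              n_env=3, n_rep=2)
print(f"H2 = {h2:.3f}")
```

Because the interaction and error variances are divided by e and er, adding environments or replicates raises the heritability of entry means even when the variance components themselves stay the same.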
DNA isolation and genotyping. DNA of all maize lines constituting the association mapping panels was isolated from leaf samples of 3-4-week-old seedlings using the standard CIMMYT procedure 71 (CIMMYT 2005). Panels were genotyped for single nucleotide polymorphisms (SNPs) at the Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA, using the genotyping-by-sequencing (GBS) method. The GBS libraries were constructed following the method of Elshire et al. 72 , and SNP calling was performed using the TASSEL GBS pipeline 73 . Physical coordinates of all SNPs were derived from the maize reference genome B73 AGPv2. The original partially imputed GBS SNP data set had 955,690 genotypic data points (SNPs) across all chromosomes for approximately 22,000 maize lines and is publicly available through the Panzea database (www.panzea.org). For GWAS, filtering criteria of call rate (CR) ≥ 0.7 and minor allele frequency (MAF) ≥ 0.05 were applied in all panels, yielding 293,606, 297,437 and 309,608 SNPs for the CAAM, DTMA and IMAS panels, respectively. For estimating the PCA and kinship matrix, high-quality SNPs were selected with filtering criteria of CR ≥ 0.9 and MAF ≥ 0.1 and pruned at an r 2 threshold of ≤ 0.5, giving 64,344 SNPs for the CAAM, 69,254 for the DTMA and 69,286 for the IMAS panel.
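The call rate and MAF filters described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline used in the study: the genotype coding (0, 1 or 2 copies of the alternate allele, None for a missing call), the function names and the toy data are all assumptions.

```python
def call_rate(calls):
    """Fraction of lines with a non-missing genotype call at one SNP."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the rarer allele, computed from non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(genotypes, min_call_rate=0.7, min_maf=0.05):
    """Return IDs of SNPs passing both thresholds.

    genotypes: dict mapping a SNP ID to a list of calls, one per line.
    The defaults mirror the GWAS criteria in the text (CR >= 0.7, MAF >= 0.05).
    """
    return [
        snp_id
        for snp_id, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    ]


# Toy data for ten hypothetical lines.
toy_genotypes = {
    "snp_a": [0, 0, 2, 2, 0, 2, 0, 0, 2, None],           # passes both filters
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, None],           # monomorphic, MAF = 0
    "snp_c": [0, 2, None, None, None, None, 2, 0, 0, 2],  # call rate 0.6
}
print(filter_snps(toy_genotypes))
```

The stricter marker set used for PCA and kinship would correspond to calling `filter_snps` with `min_call_rate=0.9` and `min_maf=0.1`, followed by LD pruning, which is not shown here.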
Principal component, kinship and genome wide linkage disequilibrium analysis. The PCA method described by Price et al. 74 was applied to all panels using SNP & Variation Suite (SVS) Version 8.6.0 (SVS, Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com). The first three principal components were used to visualise possible population stratification among the samples in a 3D plot. A kinship matrix was computed from the identity-by-state (IBS) distance matrix 75 as implemented in SVS Version 8.6.0, with $\text{IBS distance} = (\text{No. of markers IBS2} + 0.5 \times \text{No. of markers IBS1}) / \text{No. of non-missing markers}$. Genome-wide LD was estimated for adjacent high-quality SNPs, filtered at CR ≥ 0.9 and MAF ≥ 0.1, for the CAAM (126,120 SNPs), DTMA (148,013 SNPs) and IMAS (139,061 SNPs) panels, as adjacent pairwise r 2 values (the squared allele frequency correlations among alleles at two adjacent SNP markers). To estimate LD decay across the genome, r 2 values between SNPs were plotted against the physical distances between the SNPs 76 . The LD decay plot was fitted with a non-linear model in R using the 'nlin' function 77 . The average pairwise distances at which LD decayed to r 2 = 0.2 and r 2 = 0.1 were then estimated based on the model given by Hill & Weir 78 .
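Two of the calculations above can be sketched in Python: the IBS measure between a pair of lines, and the distance at which expected r 2 falls to a target value. The sketch assumes genotypes coded 0/1/2 with None for missing calls, and it uses the widely cited form of the Hill & Weir expectation of r 2 with C = ρ × distance, where ρ is the population recombination parameter per base pair. The value of ρ, the number of sampled gametes and all function names are hypothetical; in the study ρ would come from the non-linear fit, which is not reproduced here.

```python
def ibs_similarity(line_a, line_b):
    """(No. markers IBS2 + 0.5 * No. markers IBS1) / No. non-missing markers."""
    ibs2 = ibs1 = non_missing = 0
    for a, b in zip(line_a, line_b):
        if a is None or b is None:
            continue  # skip markers missing in either line
        non_missing += 1
        allele_difference = abs(a - b)
        if allele_difference == 0:
            ibs2 += 1  # both alleles shared
        elif allele_difference == 1:
            ibs1 += 1  # one allele shared
    if non_missing == 0:
        raise ValueError("no markers scored in both lines")
    return (ibs2 + 0.5 * ibs1) / non_missing


def hill_weir_expected_r2(c, n_gametes):
    """Expected r^2 for scaled recombination C = 4*Ne*c and sample size n."""
    base = (10 + c) / ((2 + c) * (11 + c))
    sample_term = ((3 + c) * (12 + 12 * c + c * c)) / (
        n_gametes * (2 + c) * (11 + c)
    )
    return base * (1 + sample_term)


def distance_at_r2(target_r2, rho_per_bp, n_gametes, max_bp=1e7):
    """Bisection for the distance (bp) at which expected r^2 hits target_r2.

    Assumes expected r^2 decreases with distance over [0, max_bp].
    """
    low, high = 0.0, max_bp
    if not (hill_weir_expected_r2(rho_per_bp * high, n_gametes)
            < target_r2
            < hill_weir_expected_r2(0.0, n_gametes)):
        raise ValueError("target r^2 is not bracketed by the search interval")
    for _ in range(100):
        mid = (low + high) / 2
        if hill_weir_expected_r2(rho_per_bp * mid, n_gametes) > target_r2:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Hypothetical inputs, for illustration only.
print(ibs_similarity([0, 2, 1, None], [0, 0, 2, 2]))
for target in (0.2, 0.1):
    d = distance_at_r2(target, rho_per_bp=1e-4, n_gametes=200)
    print(f"r2 = {target}: about {d / 1000:.1f} kb")
```

Reading the decay distance off the fitted curve rather than off the raw r 2 values makes the estimate less sensitive to the scatter of individual marker pairs.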
GWAS and haplotype regression. GWAS was carried out on AD-adjusted BLUPs for NCLB resistance using three models: uncorrected genotypic data only (G-test or naïve model); genotypic data corrected for population structure (Q) using 10 principal components (G + Q; general linear model, GLM); and genotypic data corrected for both structure and kinship (K) (G + Q + K; single-locus mixed linear model, MLM). The G-test and GLM used an association test with an additive model, and the MLM used the single-locus mixed model EMMAX 79 , as implemented in SVS Version 8.6.0. The mixed association mapping model was $Y = \mathrm{SNP}\,\beta + \mathrm{PC}\,\alpha + K\mu + \varepsilon$, where Y is the response of the dependent variable (NCLB score), SNP is the SNP marker (fixed effects), PC is the principal component coordinate from the PCA (fixed effects), K is the kinship matrix (random effects), α is the vector of PC effects, β and μ are the vectors of SNP and K effects, respectively, and ε is the error. Manhattan plots were drawn using the − log 10 P values of all SNPs used in the analysis; Q-Q plots of the observed against the expected − log 10 P values were used to assess genomic inflation. To account for genome-wide LD between SNPs, the effective number of independent markers was used to obtain the P value thresholds. The number of SNPs in linkage equilibrium with each other was estimated at an r 2 threshold of 0.1. A Bonferroni-corrected P value threshold at α = 1 (that is, 1 divided by the effective number of independent markers) was used to compute the significant P value thresholds 45 for each panel.
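The threshold calculation described above reduces to dividing α by the effective number of independent markers. The following Python sketch shows that arithmetic together with the − log 10 transformation used for Manhattan plots. The marker count, SNP names and P values are hypothetical and do not come from this study.

```python
import math


def bonferroni_threshold(alpha, n_independent_markers):
    """P value threshold = alpha / effective number of independent markers."""
    if n_independent_markers <= 0:
        raise ValueError("n_independent_markers must be positive")
    return alpha / n_independent_markers


def neg_log10(p_value):
    """-log10(P), the quantity plotted on the y-axis of a Manhattan plot."""
    return -math.log10(p_value)


def significant_snps(p_values, alpha, n_independent_markers):
    """Return IDs of SNPs whose P value is at or below the threshold."""
    threshold = bonferroni_threshold(alpha, n_independent_markers)
    return [snp for snp, p in p_values.items() if p <= threshold]


# Hypothetical example: 50,000 independent markers and alpha = 1.
n_effective = 50_000
threshold = bonferroni_threshold(1.0, n_effective)
toy_p_values = {"snp_1": 3.2e-7, "snp_2": 4.0e-5, "snp_3": 1.5e-5}

print(f"threshold P = {threshold:.1e}, -log10 = {neg_log10(threshold):.2f}")
print(significant_snps(toy_p_values, alpha=1.0,
                       n_independent_markers=n_effective))
```

Using the number of markers in approximate linkage equilibrium instead of the total SNP count gives a less conservative threshold, since correlated SNPs do not represent independent tests.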
SNPs within the bottom 0.1 percentile of the distribution in GWAS in each study panel were selected for haplotype detection and trait regression in all three panels. Haplotype frequencies were estimated using the Expectation Maximisation (EM) algorithm with 50 EM iterations 80 , an EM convergence tolerance of 0.0001 and a frequency threshold of 0.01. To minimise historical recombination, haplotype blocks were detected using the block defining algorithm 81 . Regression analysis was carried out with the detected haplotypes, based on step-wise regression of the NCLB BLUP estimates in each of the three panels separately, with forward elimination at an FDR cut-off of 0.05.
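The EM step can be illustrated with a deliberately simplified two-SNP case in Python. This sketch is not the SVS implementation, which handles multi-SNP blocks; it only shows the idea that unphased double heterozygotes are split between the two possible haplotype pairs in proportion to the current frequency estimates. The genotype coding (0/1/2 copies of the alternate allele), the function name and the toy data are assumptions; the iteration limit and tolerance reuse the settings quoted above.

```python
def em_two_locus_haplotypes(genotypes, max_iter=50, tol=1e-4):
    """Estimate haplotype frequencies [p00, p01, p10, p11] for two SNPs.

    genotypes: list of (g1, g2) pairs of alternate-allele counts.
    Haplotype index = 2 * allele_at_snp1 + allele_at_snp2.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    resolved_counts = [0.0, 0.0, 0.0, 0.0]
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase is ambiguous: 00/11 or 01/10
            continue
        # Every other genotype has a single possible haplotype pair.
        first = 2 * int(g1 >= 1) + int(g2 >= 1)
        second = 2 * int(g1 == 2) + int(g2 == 2)
        resolved_counts[first] += 1
        resolved_counts[second] += 1

    total_haplotypes = 2 * len(genotypes)
    freqs = [0.25, 0.25, 0.25, 0.25]
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00/11.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        p_cis = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: add expected counts from double heterozygotes.
        counts = list(resolved_counts)
        counts[0] += n_double_het * p_cis
        counts[3] += n_double_het * p_cis
        counts[1] += n_double_het * (1 - p_cis)
        counts[2] += n_double_het * (1 - p_cis)
        new_freqs = [c / total_haplotypes for c in counts]
        converged = max(abs(n - o) for n, o in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Toy data: two homozygous lines and one double heterozygote.
estimated = em_two_locus_haplotypes([(0, 0), (2, 2), (1, 1)])
common = [round(f, 3) for f in estimated if f >= 0.01]  # frequency threshold
print(common)
```

In this toy case the two homozygous lines pull the ambiguous individual towards the 00/11 phase, so the 01 and 10 haplotypes fall below the 0.01 frequency threshold and are dropped.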