Seed protein content and its relationships with agronomic traits in pigeonpea is controlled by both main and epistatic effects QTLs

The genetic architecture of seed protein content (SPC) and its relationships to agronomic traits in pigeonpea is poorly understood. Accordingly, five F2 populations segregating for SPC and four agronomic traits (seed weight (SW), seed yield (SY), growth habit (GH) and days to first flowering (DFF)) were phenotyped and genotyped using genotyping-by-sequencing approach. Five high-density population-specific genetic maps were constructed with an average inter-marker distance of 1.6 to 3.5 cM, and subsequently, integrated into a consensus map with average marker spacing of 1.6 cM. Based on analysis of phenotyping data and genotyping data, 192 main effect QTLs (M-QTLs) with phenotypic variation explained (PVE) of 0.7 to 91.3% were detected for the five traits across the five populations. Major effect (PVE ≥ 10%) M-QTLs included 14 M-QTLs for SPC, 16 M-QTLs for SW, 17 M-QTLs for SY, 19 M-QTLs for GH and 24 M-QTLs for DFF. Also, 573 epistatic QTLs (E-QTLs) were detected with PVE ranging from 6.3 to 99.4% across traits and populations. Colocalization of M-QTLs and E-QTLs explained the genetic basis of the significant (P < 0.05) correlations of SPC with SW, SY, DFF and GH. The nature of genetic architecture of SPC and its relationship with agronomic traits suggest that genomics-assisted breeding targeting genome-wide variations would be effective for the simultaneous improvement of SPC and other important traits.

Protein deficiency affects the health of millions of children and their mothers, but protein-rich plant foods may offer solutions particularly in areas of the world where intake of animal protein is low 1 . One such crop is pigeonpea (Cajanus cajan (L.) Millsp), which serves as an important source of dietary protein to over one billion people globally 2 . It is widely cultivated in the tropics and semi-arid tropics of Asia and Africa. Pigeonpea maintains better yields than other legume crops under environmental extremes such as heat, drought and low soil fertility conditions 3,4 . These attributes position pigeonpea as the preffered crop for the resources-poor farmers in marginal environments 5 . However, increasing seed protein content (SPC) of pigeonpea is, therefore, an important contribution towards alleviating malnutrition among the poor. Improvement of SPC requires an understanding of its genetic architecture and how it relates to traits of agronomic importance.
Few studies have been reported on genetic control of SPC in pigeonpea with results suggesting quantitative inheritance 6,7 . However, the classical quantitative genetics approaches used in the reported studies are limited in power and resolution to dissect the genetic architecture of a quantitative trait like SPC. Similarly, information is limited on the genetic basis of the often positive or negative or no relationships of SPC with agronomic traits such as seed yield, seed weight, days to flowering, and growth habit in the crop 7,8 . Determining the genetic basis of trait correlations in pigeonpea is essential in designing breeding strategies that aim at improving and stabilizing SPC while maintaining yield and other desirable agronomic attributes. The available genomics, transcriptomics and proteomics resources in pigeonpea coupled with advances in high-throughput genotyping technologies provide opportunity to dissect the genetic architecture of several quantitative traits in the crop 2,[9][10][11][12][13][14][15] . However, genetic architecture of SPC in pigeonpea and the basis of its relationships with other traits of importance has remained

Results
Phenotypic variation in SPC and agronomic traits in five mapping populations. The mean value of SPC in low protein containing parents ranged from 19.3 to 21.5% and from 22.3 to 24.6% among high protein containing parents ( Table 1). The lowest SPC difference (0.8%) was observed between crossing parents of Pop2 and highest (3.1%) was in crossing parents of Pop5. In the case of F 2 s, SPC ranged from 5.8% in Pop5 to 10 Table 1. Population size, mean, variance, skewness, kurtosis, minimum and maximum values, and w-test for seed protein content (SPC), days to first flowering (DFF), 100-seed weight (SW) and seed yield (SY) in five F 2 mapping populations of pigeonpea. NS: not significantly different from a Gaussian distribution at P = 0.05; *, ** and ***: significantly different from a Gaussian distribution at 0.05, 0.01 and 0.001 probability levels, respectively. P 1 : parent 1, P 2 : parent 2, |P 1 -P 2 |: absolute difference in trait value between two parents of a cross, S: skewness, K: kurtosis.
Phenotypic correlation among traits. Correlations among traits are presented in Table 2. Correlations were negative between SPC and DFF in all mapping populations but significant (P ≤ 0.05) in only two populations (Pop1 and Pop3). Similarly, correlations between SPC and SY were negative and significant in all populations except in Pop4. In contrast, significant positive correlations were noted between SPC and GH in three of the five populations. While correlations between SPC and SW were positive in all populations except Pop3, although only significant in two populations (Pop1 and Pop2). Correlations between agronomic traits were generally negative and significant for DFF × GH and SY × GH, positive and nonsignificant for SW × GH and DFF × SY, and negative and nonsignificant for SY × SW.
Sequence data and SNP discovery. In 12 ), respectively, which segregated in 1:2:1 F 2 genotypic ratio at a χ 2 cutoff P ≥ 10 −9 were retained for genetic mapping (Table 4). Owing to high distortion from the expected F 2 segregation ratio, SNPs segregating in a 1:2:1 ratio at P > 0.05 were used as anchor markers for initial genetic map Consensus genetic map. All markers used in the construction of the consensus map in the present study were SNPs. As a result, there was no discrepancy in marker names among the individual maps. Segregation data for 3,400 markers from five mapping populations were used to integrate the multiple genetic maps into a consensus map (Table 5). Among the markers, 2,386 were distinctive to particular mapping populations, 617 were common between two, 227 among three and 170 among four mapping populations. The common markers were used as anchor points for integration of the individual genetic maps. Most of the linkage groups for the population-specific genetic maps were integrated into the consensus genetic map. All common markers together led to the production of a consensus genetic map comprising 984 loci on 11 CcLGs covering a map distance of 1,609.5 cM with an average inter-marker distance of 1.6 cM ( Supplementary Fig. S7).
Collinearity between component and consensus genetic maps. All genetic maps were, to a large extent, collinear with the consensus map (   Pop5 (CcLG05, CcLG07and CcLG09) showed a reversal of marker order between component genetic map and consensus map as revealed by the negative correlation coefficients ("r"; Table 6). Similarly, CcLGs from Pop4 that contributed any markers to the consensus map displayed poor collinearity with the consensus map. Finally, genome-wide, there were 13 gaps larger than 10 cM (one each on CcLG02 and CcLG11, two each on CcLG05, CcLG09 and CcLG10, and three each on CcLG03 and CcLG07). Such gaps have been thought to result from recombination hotspots or regions that are identical-by-descent and thus lack of polymorphisms 24 . LG † N ML (cM) AID (cM) N "r" n "r" n "r" n "r" n "r"      Table S2; Fig. S1), indicating it contributed to the observed negative correlation between the two traits ( Table 2). Another M-QTL region region flanked by SNPs S3_14813065 and S3_14778845 also in Pop1 on CcLG03 had positive and negative additive effects on GH and DFF, respectively, and likely contributed to the observed negative correlation between the two traits.

Main effect QTLs for SPC and agronomic traits and their colocalization.
Similarly, M-QTL flanked by SNPs S3_22234078 and S3_19578263 on CcLG03 in Pop2 showed pleiotropic effect on SPC, SY, GH and DFF (Supplementary Table S2, Supplementary Fig. S2). The M-QTL region displayed positive additive effects on SPC and GH, but negative additive effects on DFF and SY, possibly explaining the high positive correlation between SPC and GH, and negative correlation of SPC with DFF and SY, respectively. In the same Pop2, SPC shared an M-QTL with SW on CcLG01 in a region fanked by SNPs S1_15372966 and S1_9033631 with positive additive effect on both traits thus possibly contributing to the positive correlation between the two traits ( Table 2). Another region in the same Pop2 on CcLG11 flanked by SNPs S11_21940736 and S11_18137395 conditioned SPC and SY having positive and negative additive effects, respectively, and likely resulted to the negative correlation between the two traits ( Table 2).
In Pop3, three M-QTL regions flanked by SNPs S3_28538775 and S3_22913898, S3_18154848 and S3_17193829, and S3_18154875 and S3_14813065 all on CcLG03 affected SPC, GH and DFF with additive effects being negative for SPC and GH, and positive in two and negative in one of the M-QTLs for DFF (Supplementary Tables S2, Supplementary Fig. S3) possibly contributing to the positive correlation between SPC and GH, and negative correlation between SPC and DFF, and between GH and DFF ( Table 2). Another M-QTL region in Pop3 on CcLG04 flanked by SNPs S4_3592410 and S4_2761907 had negative and positive additive effects on SY and GH, respectively, and likely contributed to the negative correlations between the two traits. A third M-QTL region in the same Pop3 flanked by SNPs S2_2989918 and S2_2144739 on CcLG02 conditioned both SPC and SY having positive and negative additive effects, respectively, and likely contributed to the negative though none significant correlation between the traits.
Similarly, a QTL region flanked by SNPs S1_1145802 and S1_11242012 on CcLG01 in Pop4 conditioned both SY and DFF with positive and negative additive effects, respectively (Supplementary Table S2, Fig. S4), and likely contributed to the negative though non-significant correlation between the traits (Table 2). Additionally, a QTL No. QTLs 6 (1) No. QTLs  www.nature.com/scientificreports www.nature.com/scientificreports/ on CcLG02, flanked by SNPs S2_11771536 and S2_10960200 conditioned SW and DFF with positive and negative additive effects, respectively, and likely contributed to the negative correlation between the two traits. A major M-QTL on CcLG10 flanked by SNPs S10_15140940 and S10_632618 influenced both SW and SY with positive additive effects on both traits, indicating it contributed to the positive correlation between the traits. There were two tight linkages (0.1 cM distance), one between M-QTLs for SPC and DFF, and another between SPC and SY both on CcLG11.
Neither CIM nor ICIM detected any overlap or tight linkage of M-QTLs in Pop5 between any of the measured traits (Supplementary Table S2, Fig. S5) although significant correlations were detected between SPC and SY, GH and DFF, SW and DFF, SY and DFF, and SY and GH (Table 2). Epistatic QTLs. To gain more insight into the complexity of the genetic control of SPC and its relationship with other traits, epistatic QTLs (E-QTLs) were mapped in each of the five F 2 populations using QTL Icimapping software v4.0 (http://www.isbreeding.net/software/?type=detail&id=14) ( detected in all of the five populations, however, eight SNPs were each found to flank at least one member of an E-QTL pair for GH and SW in two to three populations, while four SNPs each flanking at least one member of E-QTL pair for SY were detected in two populations. E-QTLs for SPC and DFF were highly population-specific without any commonly shared markers among populations.

E-QTLs shared among traits within and across populations. The number of E-QTL pairs shared
between SPC and the agronomic traits were variable depending on the population (Fig. 3). In Pop1, SPC shared E-QTLs with SW, SY and GH. In Pop2, SPC shared E-QTLs with SW, SY, DFF and GH, while in Pop3, SPC shared E-QTL markers with SW and GH. In Pop4, SPC shared two E-QTLs with SY, and one E-QTL with DFF. In Pop5, two E-QTLs for SPC were shared with SW, and one with SY. The five populations also had some E-QTL pairs in common but with varying effects on measured traits.

Discussion
To detect the QTLs conditioning SPC and its relationship with agronomic traits, we used parental lines with only moderate contrast in SPC (0.8 to 3.5%) between any pair parents of a cross. Wide segregation among the F 2 progenies of a cross beyond what is expected from parental values was observed in the F 2 populations, indicating transgressive segregation, a phenomenon commonly observed for SPC in other legumes such as soybean 26,27 and   www.nature.com/scientificreports www.nature.com/scientificreports/ pea 28,29 . Strong quantitative variations with transgression for days to flowering, SY and SW were also observed in the present study, consistent with reports of earlier studies in segregating populations of pigeonpea 10 .
Given that a 5-cM SNP spacing is considered sufficiently dense for optimized QTL detection power 30 , the SNP marker spacing in each of the five populations in the present study provides adequate power to detect a QTL. Marker segregation distortion was observed in all the five populations with similar proportion of markers showing deviation from expectation. Segregation distortion could have resulted from various factors such as residual heterozygosity, gametic or zygotic selections and genotyping errors 31 . It is a common phenomenon observed in both intra-and inter-specific crosses and has been reported in several crops including pigeonpea 9,12,13 and chickpea 32 . Although distorted markers have generally been discarded in earlier studies, evidence indicates that distorted markers can be potentially helpful in the detection of QTLs 33 . It has also been noted that discarding distorted markers could possibly remove substantial amounts of information and reduce genome coverage 34 . Thus, in the present study distorted markers segregating in 1:2:1 Mendelian ratio with χ 2 cutoff P ≥ 10 −9 were retained for genetic map construction. By integrating the five component genetic maps into a consensus genetic map, conserved marker orders were observed among the five genetic maps that could be attributed to use of relatively similar population size (137 to 179), same type of mapping populations (all F 2 s) and same type of marker system (GBS-derived SNPs) 9 . The constructed genetic maps were then used for QTL analysis to map genomic regions associated with SPC and four agronomic traits.
In pigeonpea, QTLs have been mapped for plant type and earliness including days to flowering and growth habit 10,12,35 and disease resistance 13 . However, such studies have lacked for SPC and it is only till recently that we developed gene-derived sequence-based markers using whole genome resequencing of pigeonpea parental lines 25 . Genomic regions associated with SPC and correlated traits offer opportunity to develop varieties with enhanced SPC and stable yield using genomics-assisted breeding approaches. In this context, analyses of QTLs for SPC and agronomic characters (SW, SY, DFF and GH) were conducted based on five populations. To ensure reliability of detected QTLs the present investigation used two methods, CIM and ICIM. ICIM also facilitated the detection of E-QTLs. Although both methods detected comparable number of M-QTLs across traits in the studied populations, CIM detected slightly more M-QTLs than ICIM, which agrees with results of an earlier study 36 . Three or more M-QTLs were detected by both programs while ICIM detected two or more E-QTLs for each trait in each of the studied populations. The involvement of several M-and E-QTLs for each of the measured traits explained observed variations in the traits and indicate quantitative inheritance. Colocalized M-QTLs as well as E-QTLs explained trait correlations in each of the populations studied.
The detection of two to three major and several modifier/minor effect M-QTLs for SPC spread on nearly all linkage groups of pigeonpea in each of the five studied populations is in agreement with results obtained in soybean 26,27 . The M-QTLs for SPC were highly population-specific although three CcLGs contained at least one major M-QTL in two to three of the five populations. The three CcLGs (CcLG02, CcLG03 and CcLG11) also contained M-QTLs with the highest PVE than M-QTLs on the other CcLGs suggesting their relative importance in harbouring genomic regions governing SPC in the pigeonpea. The localisation of the major M-QTLs on CcLG02 in three of the five populations in the present study could be supported by the detection on the same chromosome of some genes known for their functional role in seed storage protein accumulation such as NADH-GOGAT 25 , or for their location in the vicinity of QTL regions associated with variability of SPC in plants 37 . By projecting the population specific M-QTLs to the consensus genetic map, six consensus genomic regions, each comprising M-QTLs for SPC from two to four populations were generated. Such consensus regions may be targeted for further investigation in future studies. Across the five mapping populations, SPC increasing alleles were contributed by both the low and high trait parents. The majority of the trait increasing alleles from the low trait parent were minor except in Pop2 and Pop4 for which the respective low SPC parent contributed one major M-QTL each. Whereas it has been concluded that SPC in pigeonpea is conditioned by recessive oligo-genes 38 , it is apparent from our results that the trait is polygenic with a combination of gene actions conditioning its variation in the crop. This observation is in agreement with earlier conclusions that SPC is conditioned by both additive and non-additive genes in pigeonpea 39 . Predominance of non-additive types of gene action in the present study is also in agreement with earlier observations in pigeonpea 40 and other legumes [41][42][43] .
The detection of at least two genomic regions for DFF in each population is in agreement with reports of previous studies in pigeonpea 44  Most of the M-QTLs for DFF and GH tended to colocalize to the same genomic regions, flanked by the same SNP loci, which is in agreement with observations that genes conditioning flowering time have pleiotropic effect on GH 35,45,46 . The pleiotropic M-QTLs largely acted recessively in conditioning GH, consistent with observations in other crops 47 , but the same loci acted either additively, dominantly, partially dominantly or overdominantly on DFF which agrees with earlier reports on genetics of early flowering 39,[48][49][50] .
The detection of at least two and up to seven major and several minor M-QTLs to condition SW in the crop indicates quantitative inheritance for the trait, consistent with findings in other legume crops 28,[51][52][53][54] . A genomic region on CcLG01 consistently showed highest PVE in three of the five mapping populations suggesting a common major genomic loci segregating in a wide range of genetic backgrounds, similar to reports in soybean 51 .
With the exception of one major M-QTL on CcLG01, all other M-QTLs for SW showed population specificity. The diversity of QTL gene action observed for SW in this study mirrors earlier reports where recessiveness, dominance and overdominance have been reported to condition SW in plants [55][56][57] .
QTLs for SY (on plant basis) have been mapped in other crops 58,59 , but the present study is the first in pigeonpea. The detection of highly population-specific minor and major effect M-QTLs for SY on nearly all CcLGs points to a complex genetic architecture of the trait. Detection of large effect population-specific M-QTLs on CcLG10 in three of the five populations, and minor and major M-QTLs on CcLG01, CcLG02, CcLG03 and CcLG11 in three to four populations suggests the relative importance of the chromosomes in hosting genomic regions associated with the trait.
The pervasiveness of population-specific M-QTLs for SPC, SY, SW and to a lesser extent for DFF could be attributed to effects of population size or marker coverage 20 . However, this is unlikely because population-specific M-QTLs of relatively minor effects ranging from 0.7% to 8.6% across traits were mapped in the five populations. Rather, it is possible that a QTL detected in a certain cross may not be detected in another cross because the parents of the second cross carry identical alleles at the same locus 17,20 .
E-QTLs were detected that explained additional phenotypic variation for SPC and the other traits. Effects of E-QTLs have been reported in other legume crops such as soybean for SPC 26,27,60 and SW 60,61 . Similarly, E-QTLs for SW, SY, flowering time and GH have been reported in common bean 62 . The large number of E-QTLs for SPC and for the agronomic traits identified in present study indicates that QTLs with minor effects or no effect interact with each other to influence expression of the traits. Such scenarios have been reported in other crop plants 60,62 . Uniquely, Pop2 displayed the highest contribution of E-QTL effects on phenotypic variation of all measured traits in the present study. It is likely that the relatively low marker density in Pop2 contributed to the high PVE of the E-QTLs in this population. Across populations, the number of E-QTLs detected also varied by trait. The pattern of contributions of M-QTLs vs E-QTLs to phenotypic variation for the studied traits seemed to be highly genetic background-dependent as has been frequently reported in other crops 62-65. In an earlier study, we reported a number of putative candidate gene-based nsSNP for SPC, some of which significantly cosegregated with the trait in an F 2 validation population, ICP 5529 × ICP 11605 25 . Similarly, Saxena et al. 12 developed a CcFTL1 gene-based Indel marker whose cosegregation with the determinate GH locus Dt1 was validated in the same F 2 population. The same F 2 ICP 5529 × ICP 11605 population is one of the populations used in the present study. Interestingly, a number of the gene-derived markers also flanked M-QTLs and/ or E-QTLs with significant effect on the agronomic traits. For example, the detection of a major M-QTL for GH (qGH-icim-4.1; PVE = 13.1%) on CcLG04 with one of the flanking markers being a 2PGK gene-derived nsSNP likely indicates the role of the gene on GH in the crop. A further evidence for the influence of the 2PGK gene on GH is that qGH-icim-4.1 epistatically interacted with another major M-QTL for GH (qGH-icim-3.2; PVE = 61.6%) on CcLG03 resulting in an E-QTL with a much higher PVE of 73.8% than that of the individual M-QTLs. One of the flanking markers to qGH-icim-3.2 is the CcTFL1 gene-derived Indel marker 12 . The involvement of the 2PGK gene-derived nsSNP in another epistatically acting QTL to influence SW also agrees with colocalization of 2PGK gene with a QTL for SW in pea 28 . One more gene of interest from our results is Sucrose synthase (Sus6) from which a derived nsSNP flanked an epistatically acting QTL on CcLG01 to influence SW with the resultant E-QTL having a major effect (11.5%). Sucrose synthase has for long been known to play a major role in SW in several crop plants as mentioned in Turner et al. 66 . The role of Sus6 as one of the possible determinants of SW in our study is further indicated by location of the derived nsSNP in the vicinity of Consensus-SW-QTL 1 comprising a major SW M-QTL from Pop2 and a minor SW M-QTL from Pop3 on CcLG01 of the consensus genetic map. The same nsSNP was only 3.1 cM away from a major M-QTL on CcLG01 in Pop5.
In this study, two lines of evidence revealed the associations between SPC and the other plant traits, and that the nature of the associations is genetic background-dependent. First, the phenotypic correlation analysis showed that SPC associates positively with GH and SW and negatively with DFF and SY. The pattern of correlation of SPC with SW is consistent with results of earlier studies which showed that the two traits associate either positively or negatively and sometimes non-significantly depending on genetic material used 6 . In the case of SPC with DFF, negative though small and none significant relationships have been reported in pigeonpea 8,67 . The negative and relatively weak correlation between SPC and SY in the present study is consistent and within the range previously reported in pigeonpea 8,67,68 , and soybean 69,70 . No relationship between SPC and GH has been reported in pigeonpea before. However, the indeterminate and determinate GH in soybean have been reported to be associated with high and low SPC, respectively 71 . Significant correlation of SPC with morphological and growth-related traits have also been reported in pea 28 .
Second, colocalization of M-QTLs and shared E-QTLs for SPC with that of the other traits were found that possibly explains trait correlations. For instance, the colocalization of M-QTLs for SPC with M-QTLs for DFF with opposite allelic effects could explain the negative correlations between SPC and DFF in Pop1, Pop2, Pop3 and Pop4 though the correlations were non-significant in Pop2 and Pop4. Similarly, the colocalization of M-QTLs for SPC and M-QTLs for GH with allelic effects in the same direction in Pop1, Pop2 and Pop3 explains positive correlation between the two traits. Likewise, correlation of SPC with SW in Pop2 could be explained by the overlapping M-QTLs on CcLG02 with allelic effects in the same direction. While the negative correlation of SPC with SY could be attributed to opposing effect of colocalized M-QTLs for the two traits such as in Pop2.
However, not all correlations of SPC with agronomic traits could be explained by colocalization of M-QTLs, for instance, GH and SY showed relatively strong correlation with SPC in Pop4 but no M-QTL overlaps were present. Therefore, presence of E-QTLs shared between SPC and the agronomic traits were searched that could explain correlations that are not explained by the M-QTLs. The phenomenon where one E-QTL affects expression of more than one trait have been termed 'epistatic pleiotropy' 72 . In this regard, the majority of epistatic pleiotropy involving SPC and other traits in the present study are the type in which the effects of a given pleiotropic locus are dependent upon the alleles present at the other loci 73 . For example, in Pop1 a QTL on CcLG01 flanked by markers S1_4757043 and S1_1575466, affected (i) SPC when it interacted with other QTLs on CcLG07 and CcLG08, (ii) SW when it interacted with QTLs on CcLG02 and CcLG06, and (iii) SY when it interacted with a QTL on CcLG03.
Similarly, a single epistatically pleiotropic QTL (EP-QTL) on CcLG01 (S1_887236 and S1_3399209) in Pop3 influenced the expression of SPC, SW and GH when it interacted with other QTLs on CcLG02 and CcLG03 and possibly contributed to the significant covariation between SPC and SW, and SPC and GH. Such EP-QTLs involving SPC were widespread among populations, and in some cases provided the only explanation to phenotypic (2020) 10:214 | https://doi.org/10.1038/s41598-019-56903-z www.nature.com/scientificreports www.nature.com/scientificreports/ correlation between SPC and the other traits. For instance, the significant correlation between SPC and SY in Pop4 in the absence of overlaps in their M-QTLs could be explained by EP-QTL on CcLG07 flanked by markers S7_14683829 and S7_14588865. The same EP-QTL also influenced expression of SW and DFF although the two traits show weak and non-significant correlation with SPC. In Pop5, three EP-QTLs were detected, two of which influenced SPC and SY, and one influenced SPC and SW even though no significant relationships of SPC with SW and SY were found. Tuberosa et al. 74 noted that the occurrence of QTL colocalization for multiple traits that possibly share a common morpho-physiological basis, or that are reasonably associated on a cause-effect basis, should lower the chance of declaring false positives in the regions where QTLs overlap.
In conclusion, two to three major M-QTLs in the presence of several modifier/minor effect QTLs, and with additive and non-additive QTL gene action types including epistasis, control the expression of SPC in the present study. Overlaps of main effect and E-QTLs explain the correlations between SPC and agronomic traits. Projection of M-QTLs for SPC and agronomic traits onto the consensus map revealed common genomic regions governing SPC and its relationship with agronomic traits across different genetic backgrounds. Among the genomic regions, QTL Cluster 5 (CcLG03), QTL Cluster 10 (CcLG11) and QTL Cluster 9 (CcLG11), in order of increasing importance, harboured M-QTLs for two or more traits and therefore may be targeted for the simultaneous improvement of the associated characters. More trait-specific regions such as Consensus-PROT-QTL 1 and Consensus-PROT-QTL 2 (CcLG02), Consensus-PROT-QTL 4 (CcLG04), Consensus-PROT-QTL 6 (CcLG11), Consensus-SW-QTL 1 and Consensus-SW-QTL 2 (CcLG01), Consensus-DFF-QTL 1 (CcLG03) and Consensus-GH-QTL 1 (CcLG03) as well as the more population-specific SY M-QTLs with large PVEs on CcLG03 (Pop1 and Pop2), CcLG04 (Pop3), CcLG05 (Pop4), CcLG10 (Pop3 and Pop4) and CcLG11 (Pop2) could also be targeted for the improvement of the traits. The genomic regions identified in the present study would pave the path for early generation screening of large segregating populations or screening of germplasm resources and haplotype based breeding for identification of plants/genotypes carrying favourable alleles/haplotypes and minimizing the negative correlation effect of other traits on SPC. By this way high yielding lines with higher SPC could be developed with less resources and time. However, the large contribution of epistasis to the variation and correlation among the traits and the presence of a large number of population-specific M-QTLs for each of the traits, suggests that breeding approaches that target genome wide variations such as genomic selection 75 would be an alternative in achieving larger genetic gains for both SPC and yield in a shorter period. Further, the validation of the results in additional germplasm and under diverse environmental conditions may be necessary to determine the stability of the QTLs identified as well as facilitate detection of other loci.

Methods
Crossing parents and seed protein content. Six pigeonpea genotypes that included ICP 11605, ICP 8863, ICP 14209, HPL 24, ICP 5529 and ICPL 87119 were used in the present study. ICP 8863 was selected from landrace ICP 7626 (P-15-3-3) and it is widely cultivated in India. It is high yielding with 100-seed weight of ~9.5 g and matures in 150-160 days. It is resistant to fusarium wilt (FW) but susceptible to sterility mosaic (SM) virus 76 . ICP 8863 has moderate SPC of ~22.0%. ICP 11605 (ICPL 151) was selected from the cross ICP 6997 × Prabhat. It is a determinate cultivar, yielding ~1.03 t/ha with 100-seed weight of 10 g and matures in 120-130 days 77 and has a low SPC of ~20.9%. ICP 14209 is a landrace variety with moderate SPC (23.0%). ICPL 87119 was developed from the cross ICP 1-6-W3-Wl × C 11 and it is widely adapted and cultivated in India. It matures in 160-180 days, is high yielding and has resistance to FW and SM 78 . It is low in SPC (~19.3%). HPL 24 is an advanced breeding line derived from the cross of cultivar C. cajan cv Baigani × C. scarabaeoides previously reported to have ~30% SPC 6 . It is indeterminate and of medium maturity duration. ICP 5529 with pedigree P-4864-1, originated from India. It is indeterminate with medium maturity duration and with SPC indicated to be 27%.
Mapping populations, field experiments and phenotyping. In order to develop the mapping populations (F 2 ), five crosses were made: ICP 11605 × ICP 14209, ICP 8863 × ICP 11605, HPL 24 × ICP 11605, ICP 8863 × ICPL 87119 and ICP 5529 × ICP 11605. For brevity, the populations are hereafter referred to as Pop1, Pop2, Pop3, Pop4 and Pop5, respectively. One F 1 plant was selfed to generate F 2 seeds in each of the five populations. For trait evaluation, the parents and 350 to 400 F 2 seeds were sown under field conditions to ensure an adequate number of plants. Sowing was done in 4 m long rows spaced 75 cm apart and 30 cm plant to plant distance within a row. Plot sizes were two rows for each of the two parents and 25 to 28 rows in the F 2 . All cultural practices were carried out. At maturity individual pods from individual plants were carefully hand-harvested leaving out plants at the beginning and at the end of each row and those at the field borders to avoid border effects. Sun drying was done for one week before threshing and another one week after threshing to ensure uniform reduction in seed moisture content. Seed protein content was measured as described in Obala et al. 25,67 . Besides SPC, data were also recorded for SW in grams, SY in grams per plant, DFF, and GH scored as determinate or indeterminate.
DNA isolation and genotyping. Total genomic DNA (gDNA) from 188 F 2 plants and the parents from each of the five mapping populations were isolated and genotyped-by-sequencing as described in Saxena et al. 12,13 . Briefly, the sequence reads obtained from the Illumina HiSeq. 2500 platform were used for SNP identification and genotyping using GBS analysis pipeline implemented in TASSEL v4.020 (TASSEL-GBS) 79 . Firstly, the reads were sorted, separated according to the sample barcodes and trimmed to first 64 bases starting from the enzyme cut site. Reads containing 'N' within the first 64 bases and reads with >50% of low-quality base pairs (Phred <5%) were discarded. The filtered, high-quality reads from each sample were aligned to the pigeonpea draft genome sequence (C. cajan v1.0) 2 using Bowtie 2 sequence alignment software. The alignment file was processed through TASSEL-GBS pipeline for SNP calling and genotyping. The quality of SNPs called in each F 2 individual was compared with the SNPs identified in parental lines. The parental line SNPs were obtained from whole-genome resequencing (WGRS) data 80 . SNPs having confident parental calls were considered for further analysis. SNPs and (2020) 10:214 | https://doi.org/10.1038/s41598-019-56903-z www.nature.com/scientificreports www.nature.com/scientificreports/ F 2 individuals having more than 30% and 70% missing data, respectively, were filtered out. The quality SNP data was used for construction of genetic maps and QTL analysis.

Construction of population-specific genetic maps. Four of the five population-specific genetic maps
were constructed in the present study while the remaining one population-specific map was constructed under a separate project 12 . The construction of all five population-specific genetic maps followed the same procedure as described in Saxena et al. 12,13 . Construction of consensus genetic map. Genotyping data from the five F 2 genetic maps were used to develop a consensus genetic map using JoinMap v4.1 following the procedure described by Bohra et al. 9 . To assess the level of correspondence in the order of markers between consensus and component genetic maps, correlation coefficients (r) were calculated from marker positions in consensus and individual genetic maps and their significance were tested. To further visualize the extent of correlation between consensus and component maps, scatter plots were generated between each of the consensus linkage group and corresponding component linkage group from all the populations. A comparative mapping programme CMap v1.01 81 was used to align all developed genetic maps together to visually assess the congruency of marker orders. QTL mapping. Composite interval mapping (CIM) implemented in Windows QTL Cartographer v2.5 82 and inclusive composite interval mapping (ICIM) implemented in QTL Icimapping v4.0 83 were used to detect main effect QTLs (M-QTLs) while epistatic QTLs (E-QTLs) were detected using ICIM. The advantage of both CIM and ICIM is that they are regression-based and are therefore robust against non-Gaussian trait distribution 84 . For CIM, the Standard Model 6, walk speed of 1.0 cM, and forward-backward stepwise regression for setting number of marker cofactors for background control were used to identify M-QTLs. To leave out signals within 10.0 cM distance on either side of the flanking markers or QTL test site, a window size of 10 cM was used. Thresholds for declaring QTLs were determined by 1000 permutations at significance of 0.05.
In using ICIM to detect M-QTLs, marker selection was performed just once using stepwise regression and considering all marker information simultaneously 85 . Phenotypic values were then adjusted by all markers retained in the regression equation, except the two markers flanking the current mapping interval. Permutation tests were conducted using SPC in the five F 2 mapping populations to determine the criteria for model selection in the first step of ICIM. For all five F 2 populations, the probability of a marker moving into the model corresponding to the overall type I error α = 0.05 was approximately 10 −5 . The probability of a marker moving out of the model was set at twice the probability of a marker moving into the model. The LOD threshold to declare the existence of a QTL was calculated by permutation tests as well. However, because of the always conservative nature of thresholds retained from permutation tests 86 , a default LOD threshold of 2.5 was used to report QTLs and determine common (consensus) QTLs across populations. Furthermore, where M-QTL identified by CIM was also detected by ICIM, the region was considered as one QTL. Similarly, where an M-QTL for a given trait identified by either CIM or ICIM colocalize with M-QTL(s) of other traits detected by either of the two methods, the region was treated as a region of co-localisation. Type of gene action for each M-QTL was derived from the dominance coefficient (h) defined as the ratio between the observed QTL dominance effect (d) and absolute value of QTL additive effects (|a|) 87 . We used the absolute value of additive effects because the sign of a QTL effect only shows which parent contributed the favorable allele but not the true direction of the specific additive effect 87 88 .
For E-QTL mapping, all possible pairs of scanning positions were tested by ICIM, since digenic interactions may be detected regardless of whether the two interacting QTLs have significant additive effects or not 85 . The probability of a marker moving into the model was set at 10 −6 while the probability of a marker moving out of the model was set at twice the probability of a marker moving into the model 85 . The default QTL-Icimapping LOD threshold of 5.0 was used to declare the existence of E-QTLs.
Common or consensus QTLs across five F 2 populations. Due to differences in the individual genetic maps, it was difficult to directly find common QTLs across the five populations on the basis of the QTL or marker position in each genetic map. Therefore, QTLs obtained in each of the five individual populations were projected onto the consensus map by using either QTL peak-or flanking-marker positions indicated in the individual population genetic map using a procedure adopted from Schweizer and Stein 89 as follows. If only peak-marker positions from the individual map were available, the QTL region was assumed by default to extend 5 cM left and right of the peak-marker position, resulting in a confidence interval of 10 cM. If only one flanking marker could be projected onto the consensus map, a QTL interval of 10 cM extension left or right from the left or right flanking marker, respectively, was assumed by default. If neither peak nor flanking markers were included in the consensus map, nearby tightly linked markers (maximum of 5 cM from the peak or flanking markers) were searched on the consensus map. If no replacement markers could be identified within this distance, the QTL was excluded from the analysis. Based on these projections, two types of common QTLs were defined. Firstly, a 'Consensus QTL' was defined as any region of the consensus genetic map with overlapping M-QTL intervals for a particular trait from more than one population. Secondly, a region of consensus genetic map at which M-QTL interval for one trait overlaps with that of one or more of the other traits was considered a 'QTL Cluster' QTL nomenclature. For individual populations, a specific identifier was assigned to each QTL, whereby "q" stands for QTL, followed by a set of upper case letters indicating the trait, followed by linkage group (CcLG) name, then a hyphen, method of QTL detection, and lastly, the QTL number on that CcLG in ascending order. (2020) 10:214 | https://doi.org/10.1038/s41598-019-56903-z www.nature.com/scientificreports www.nature.com/scientificreports/ For example, the designation "qPROT-cim-3.1" stands for "QTL for SPC" detected using CIM on LG "CcLG03" and it is the first QTL for SPC on that CcLG. For QTLs projected onto the consensus genetic map, a prefix is added to the QTL name indicating the source population. For example "Pop1qPROT-cim-3.1" indicates a QTL for SPC from Pop1.