Article | Published:

Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits

Nature Genetics volume 49, pages 1082–1088 (2017)


Pigeonpea (Cajanus cajan), a tropical grain legume with low input requirements, is expected to continue to have an important role in supplying food and nutritional security in developing countries in Asia, Africa and the tropical Americas. From whole-genome resequencing of 292 Cajanus accessions encompassing breeding lines, landraces and wild species, we characterize genome-wide variation. On the basis of a scan for selective sweeps, we find several genomic regions that were likely targets of domestication and breeding. Using genome-wide association analysis, we identify associations between several candidate genes and agronomically important traits. Candidate genes for these traits in pigeonpea have sequence similarity to genes functionally characterized in other plants for flowering time control, seed development and pod dehiscence. Our findings will allow acceleration of genetic gains for key traits to improve yield and sustainability in pigeonpea.








The authors are thankful to the US Agency for International Development (USAID) for providing financial support to R.K.V. The authors would like to thank A. Gafoor, B. Poornima and P. Bajaj for their support in this work. This work has been undertaken as part of the CGIAR Research Program on Grain Legumes. ICRISAT is a member of the CGIAR Consortium.

Author information


  1. International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India.

    • Rajeev K Varshney
    • Rachit K Saxena
    • Hari D Upadhyaya
    • Aamir W Khan
    • Abhishek Rathore
    • Vinay Kumar
  2. School of Agriculture and Environment and Institute of Agriculture, University of Western Australia, Crawley, Western Australia, Australia.

    • Rajeev K Varshney
  3. Shenzhen Millennium Genomics, Inc., Shenzhen, China.

    • Yue Yu
    • Shaun An
    • Wei Zhang
  4. MACROGEN, Inc., Seoul, Republic of Korea.

    • Changhoon Kim
    • Dongseon Kim
    • Jihun Kim
    • Jong-So Kim
  5. Institute of Biotechnology, Professor Jayshankar Telangana State Agricultural University (PJTSAU), Hyderabad, India.

    • Ghanta Anuradha
    • Kalinati Narasimhan Yamini
  6. Agricultural Research Station (ARS)–Gulbarga, University of Agricultural Sciences (UAS), Karnataka, India.

    • Sonnappa Muniswamy
  7. Department of Plant Sciences, University of California–Davis, Davis, California, USA.

    • R Varma Penmetsa
  8. Biological Sciences and International Center for Tropical Botany, Florida International University, Miami, Florida, USA.

    • Eric von Wettberg
  9. Visva-Bharati, Shantiniketan, India.

    • Swapan K Datta




R.K.V., R.K.S., Y.Y., C.K., D.K., J.K., S.A., V.K., J.-S.K. and W.Z. contributed to generation of whole-genome resequencing data. H.D.U. and R.K.V. contributed genetic material. H.D.U., G.A., K.N.Y. and S.M. performed phenotyping. R.K.V., R.K.S., H.D.U., A.W.K., C.K., A.R., D.K., J.K., S.A., J.-S.K., R.V.P., E.v.W. and S.K.D. worked on different analyses. R.K.V. and R.K.S. together with C.K., A.R., J.-S.K., R.V.P. and E.v.W. wrote and finalized the manuscript. R.K.V. and R.K.S. directed the project, and R.K.V. conceived and designed the study.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Rajeev K Varshney.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–12, Supplementary Tables 10–12 and 17, and Supplementary Note

Excel files

  1. Supplementary Table 1

    Details on 300 Cajanus accessions (breeding lines, landraces and wild species accessions) including biological status, species, geographical region, country and state.

  2. Supplementary Table 2

    Details on raw sequencing data generated on 292 Cajanus accessions.

  3. Supplementary Table 3

    Identification and distribution of molecular variation (SNPs and indels) among 11 pseudomolecules CcLG01 to CcLG11 and unanchored genome sequence as CcLG0.

  4. Supplementary Table 4

    Nonsynonymous-to-synonymous ratio in breeding lines, landraces and wild species in 1-Mb non-overlapping windows.

  5. Supplementary Table 5

    Nonsynonymous-to-synonymous ratio in breeding lines, landraces and wild species in 10-kb non-overlapping windows.

  6. Supplementary Table 6

    19 genomic regions (R1 to R19) showing high (>2.5) nonsynonymous-to-synonymous ratio in breeding lines, landraces and wild species in 1-Mb non-overlapping windows.

  7. Supplementary Table 7

    Structural variations (CNVs and PAVs) identified in breeding lines as compared to the reference genome.

  8. Supplementary Table 8

    Structural variations (CNVs and PAVs) identified in landraces as compared to the reference genome.

  9. Supplementary Table 9

    Structural variations (CNVs and PAVs) identified in wild species accessions as compared to the reference genome.

  10. Supplementary Table 13

    ROD values calculated during domestication (wild species versus landraces) and breeding (landraces versus breeding lines) at 10-kb non-overlapping windows.

  11. Supplementary Table 14

    FST values for ROD regions with maximum values calculated in a pairwise manner for landraces versus breeding lines and wild species versus landraces.

  12. Supplementary Table 15

    Genes that played an important role in domestication of crop species and their homologs in Cajanus.

  13. Supplementary Table 16

    Trait phenotyping data used for GWAS.

  14. Supplementary Table 18

    MTAs identified for target traits with P values.

  15. Supplementary Table 19

    Number of favorable alleles identified in Cajanus accessions for detected MTAs in each trait.

  16. Supplementary Table 20

    The distribution of favorable alleles for 17 MTAs detected for 100-seed weight in Cajanus accessions.

  17. Supplementary Table 21

    MTAs identified in GWAS for different target traits and their corresponding structural variations (CNVs and PAVs) in breeding lines, landraces and wild species accessions.
