Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers

Journal name:
Nature Biotechnology


Pigeonpea is an important legume food crop grown primarily by smallholder farmers in many semi-arid tropical regions of the world. We used the Illumina next-generation sequencing platform to generate 237.2 Gb of sequence, which, together with Sanger-based bacterial artificial chromosome end sequences and a genetic map, we assembled into scaffolds representing 72.7% (605.78 Mb) of the 833.07 Mb pigeonpea genome. Genome analysis predicted 48,680 genes for pigeonpea and also showed the potential role that certain gene families, such as drought tolerance–related genes, have played throughout the domestication of pigeonpea and the evolution of its ancestors. Although we found a few segmental duplication events, we did not observe the recent genome-wide duplications seen in soybean. This reference genome sequence will facilitate the identification of the genetic basis of agronomically important traits and accelerate the development of improved pigeonpea varieties that could improve food security in many developing countries.
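The headline assembly figures can be checked with simple arithmetic. The sketch below divides the assembled scaffold length by the estimated genome size, and divides the total sequence generated by the genome size to give a nominal raw sequencing depth. The depth value is only a back-of-the-envelope ratio (it ignores read filtering and the BAC-end data) and is not a figure reported in the abstract.

```python
# Figures taken from the abstract (lengths in megabases unless noted).
ASSEMBLED_MB = 605.78      # total scaffold length
GENOME_SIZE_MB = 833.07    # estimated genome size
SEQUENCE_GB = 237.2        # Illumina sequence generated, in gigabases


def assembly_fraction(assembled_mb: float, genome_mb: float) -> float:
    """Fraction of the estimated genome represented by the assembly."""
    return assembled_mb / genome_mb


def nominal_depth(sequence_gb: float, genome_mb: float) -> float:
    """Raw bases generated divided by genome size (1 Gb = 1,000 Mb).

    Assumption: a crude ratio that ignores read filtering, so it is an
    upper bound on usable depth, not the paper's reported coverage.
    """
    return sequence_gb * 1000 / genome_mb


if __name__ == "__main__":
    print(f"assembled: {assembly_fraction(ASSEMBLED_MB, GENOME_SIZE_MB):.1%}")
    print(f"nominal depth: {nominal_depth(SEQUENCE_GB, GENOME_SIZE_MB):.1f}x")
```

Run as a script, this prints 72.7% for the assembled fraction, matching the abstract, and a nominal depth of about 284.7x.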

At a glance


    Figure 1: Extensive synteny between the pigeonpea and soybean genomes.

    Soybean pseudomolecules, labeled as Gm, are represented as green boxes. Numbers along each chromosome box indicate sequence length in megabases. Pigeonpea pseudomolecules, labeled as CcLG, are shown with each chromosome in a different color. Syntenic blocks were identified by finding reciprocal best matches between gene models and then detecting blocks with i-ADHoRe. Each line radiating from a pigeonpea pseudomolecule represents a gene match found in a syntenic block between soybean and pigeonpea.

    Figure 2: Microsynteny analysis between pigeonpea and soybean genomes.

    The south arm of soybean chromosome 01S and pigeonpea CcLG06 (indicated by a green circle in the whole-genome dot-plot in Supplementary Fig. 6) are shown here as an example of microsynteny. Transcriptome assembly contigs (TACs) from the pigeonpea transcriptome assembly (CcTA v2) were mapped onto both genomes (green lines) as a measure of conserved gene order. (a) The first part of the region shows local rearrangements. (b) The latter part shows strong collinearity between genes in the two genomes.

    Figure 3: Distribution of gene families among five eudicot genomes (M. truncatula, soybean, L. japonicus, pigeonpea and grapevine).

    Homologous genes in pigeonpea, soybean, M. truncatula, L. japonicus and grapevine were clustered into gene families. The number of gene families is shown for each species and for each species intersection.
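The Figure 1 caption names reciprocal best matches between gene models as the first step of synteny detection. A minimal sketch of that step follows. It assumes an all-versus-all similarity search has already been run in both directions and parsed into (query, subject, score) tuples; the gene identifiers are hypothetical, and the subsequent block detection with i-ADHoRe is not reproduced.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples (hypothetical parsed
    output of a similarity search). On score ties, the first hit seen wins.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical pigeonpea (cc*) and soybean (gm*) gene models.
    cc_vs_gm = [("cc1", "gm1", 300), ("cc1", "gm2", 150),
                ("cc2", "gm2", 200), ("cc3", "gm1", 250)]
    gm_vs_cc = [("gm1", "cc1", 310), ("gm1", "cc3", 240),
                ("gm2", "cc2", 190), ("gm2", "cc1", 140)]
    print(reciprocal_best_hits(cc_vs_gm, gm_vs_cc))
```

Here cc3 hits gm1 best, but gm1's own best hit is cc1, so only the pairs (cc1, gm1) and (cc2, gm2) are kept. Requiring the match in both directions is what filters out one-sided paralog hits before blocks are chained.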
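The Figure 2 caption uses transcriptome contigs mapped onto both genomes as a measure of conserved gene order. One simple way to put a number on that, offered as an illustrative metric rather than the authors' analysis, is the share of mapped contigs that fall into a single order-preserving chain, found as a longest increasing subsequence. The input format (one pair of positions per contig) and the example positions are hypothetical.

```python
from bisect import bisect_left


def lis_length(values):
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    tails = []  # tails[k] = smallest tail of an increasing run of length k + 1
    for v in values:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


def collinear_fraction(anchors):
    """Fraction of anchors in the largest order-preserving chain.

    anchors: list of (position_in_genome_a, position_in_genome_b) for each
    contig mapped to both genomes. Both orientations are tried so that an
    inverted but otherwise intact segment still scores as collinear.
    """
    if not anchors:
        return 0.0
    ordered_b = [b for _, b in sorted(anchors)]
    forward = lis_length(ordered_b)
    inverted = lis_length([-b for b in ordered_b])
    return max(forward, inverted) / len(anchors)


if __name__ == "__main__":
    print(collinear_fraction([(1, 10), (2, 20), (3, 30), (4, 40)]))
    print(collinear_fraction([(1, 10), (2, 30), (3, 20), (4, 40)]))
```

A fully collinear region such as panel (b) scores 1.0, while a local swap of two genes, as in panel (a), drops the score to 0.75 in this toy example.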
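The Figure 3 counts are tallies over gene families that have already been clustered; the clustering method itself is not described in the caption. Assuming each family is available as the list of species that contribute member genes (the species codes below are hypothetical abbreviations), the per-intersection and per-species numbers of such a diagram can be computed as follows.

```python
from collections import Counter


def families_per_combination(families):
    """Count families by the exact set of species they contain.

    families: mapping family_id -> iterable of species codes, one entry per
    member gene. Each family lands in exactly one region of the diagram.
    """
    return Counter(frozenset(species) for species in families.values())


def families_per_species(families):
    """Count, for each species, the families with at least one of its genes."""
    totals = Counter()
    for species in families.values():
        totals.update(set(species))  # set() so a species counts once per family
    return totals


if __name__ == "__main__":
    # Hypothetical codes: Cc pigeonpea, Gm soybean, Mt M. truncatula,
    # Lj L. japonicus, Vv grapevine.
    families = {
        "fam1": ["Cc", "Cc", "Gm"],
        "fam2": ["Cc", "Gm", "Mt", "Lj", "Vv"],
        "fam3": ["Cc"],
        "fam4": ["Gm", "Cc"],
    }
    print(families_per_combination(families))
    print(families_per_species(families))
```

Counting by the exact species set gives the exclusive regions of the diagram, whereas the per-species totals count a family once for every species it contains.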




Author information


  1. International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India.

    • Rajeev K Varshney,
    • Rachit K Saxena,
    • Sarwar Azam,
    • Reetu Tuteja,
    • Hari D Upadhyaya,
    • Trushar Shah &
    • K B Saxena
  2. CGIAR Generation Challenge Programme (GCP), c/o CIMMYT, Mexico DF, Mexico.

    • Rajeev K Varshney
  3. Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China.

    • Wenbin Chen,
    • Guangyi Fan,
    • Bicheng Yang,
    • Gengyun Zhang,
    • Huanming Yang,
    • Jun Wang &
    • Xun Xu
  4. University of Georgia, Athens, Georgia, USA.

    • Yupeng Li,
    • Aiko Iwata &
    • Scott A Jackson
  5. National Center for Genome Resources (NCGR), Santa Fe, New Mexico, USA.

    • Arvind K Bharti,
    • Andrew D Farmer &
    • Gregory D May
  6. University of North Carolina, Charlotte, North Carolina, USA.

    • Jessica A Schlueter,
    • Adam M Whaley &
    • Jaime Sheridan
  7. National University of Ireland Galway (NUIG), Botany and Plant Science, Galway, Ireland.

    • Mark T A Donoghue,
    • Reetu Tuteja &
    • Charles Spillane
  8. University of California, Davis, California, USA.

    • R Varma Penmetsa &
    • Douglas R Cook
  9. Monsanto Company, Creve Coeur, Missouri, USA.

    • Wei Wu,
    • Shiaw-Pyng Yang &
    • Todd Michael
  10. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA.

    • W Richard McCombie
  11. Department of Biology, University of Copenhagen, Denmark.

    • Jun Wang
  12. BGI-Americas, Cambridge, Massachusetts, USA.

    • Xun Xu


Contributions

R.K.V., W.C., R.K.S., G.F., R.V.P., H.D.U., K.B.S., W.R.McC., B.Y., G.Z., D.R.C., G.D.M. and X.X. contributed to generation of genome sequence, transcriptome sequence and genetic mapping data; W.C., G.F., R.T., W.W., S.-P.Y., T.M., W.R.McC., G.Z., H.Y., J.W. and X.X. worked on genome assembly; W.C., Y.L., A.K.B., R.K.S., S.A., A.D.F., H.Y., J.W. and X.X. contributed to genome annotation and gene function; R.K.V., W.C., Y.L., A.K.B., R.K.S., J.A.S., J.S., A.I., M.T.A.D., A.M.W., A.D.F., R.T., T.S., C.S., D.R.C., G.D.M., X.X. and S.A.J. worked on genome analysis and comparative genomics; and R.K.V., together with S.A.J., D.R.C., C.S., W.C., A.K.B., R.K.S., S.A. and J.A.S., wrote and finalized the manuscript. R.K.V. conceived and directed the project.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:


Supplementary information

PDF files

  1. Supplementary Text and Figures (2 MB)

    Supplementary Tables 1–14, 16, 18 and 19 and Supplementary Figures 1–12

Excel files

  1. Supplementary Table 15 (2 MB)

    Primer sequences for the SSR markers

  2. Supplementary Table 17 (2 MB)

    SNP information across 12 pigeonpea genotypes
