Pigeonpea is an important legume food crop grown primarily by smallholder farmers in many semi-arid tropical regions of the world. Using the Illumina next-generation sequencing platform, we generated 237.2 Gb of sequence, which we assembled, together with Sanger-based bacterial artificial chromosome (BAC) end sequences and a genetic map, into scaffolds representing 72.7% (605.78 Mb) of the 833.07 Mb pigeonpea genome. Genome analysis predicted 48,680 genes and indicated the potential role that certain gene families, such as drought tolerance–related genes, have played throughout the domestication of pigeonpea and the evolution of its ancestors. Although we found a few segmental duplication events, we did not detect the recent genome-wide duplication events reported in soybean. This reference genome sequence will facilitate identification of the genetic basis of agronomically important traits and accelerate the development of improved pigeonpea varieties that could enhance food security in many developing countries.
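The assembly percentage follows directly from the two sizes quoted above. The snippet below is a minimal back-of-envelope sketch, assuming decimal units (1 Gb = 1,000 Mb). It recomputes the assembled fraction and also derives a naive raw sequencing depth. That depth figure is not stated in the abstract: it simply divides total bases sequenced by genome size and ignores read filtering, duplicate reads and uneven coverage, so it should be read as a rough upper bound on usable coverage rather than a reported result. The function names are illustrative only.

```python
# Back-of-envelope arithmetic on the figures quoted in the abstract.
# Assumption: decimal units (1 Gb = 1,000 Mb = 1e9 bases).

GENOME_SIZE_MB = 833.07   # pigeonpea genome size stated in the abstract
ASSEMBLED_MB = 605.78     # total length of the assembled scaffolds
RAW_SEQUENCE_GB = 237.2   # Illumina sequence generated


def assembled_fraction(assembled_mb: float, genome_mb: float) -> float:
    """Fraction of the genome size represented by the scaffolds."""
    return assembled_mb / genome_mb


def raw_depth(sequence_gb: float, genome_mb: float) -> float:
    """Naive fold-coverage: total bases sequenced divided by genome size.

    Hypothetical helper; it ignores read filtering, duplicate reads and
    uneven coverage, so real usable depth would be lower.
    """
    return (sequence_gb * 1_000) / genome_mb  # convert Gb to Mb first


if __name__ == "__main__":
    frac = assembled_fraction(ASSEMBLED_MB, GENOME_SIZE_MB)
    print(f"assembled: {frac:.1%}")                                 # assembled: 72.7%
    print(f"unassembled: {GENOME_SIZE_MB - ASSEMBLED_MB:.2f} Mb")   # unassembled: 227.29 Mb
    print(f"raw depth: {raw_depth(RAW_SEQUENCE_GB, GENOME_SIZE_MB):.1f}x")  # raw depth: 284.7x
```

The 72.7% matches the abstract, which is a quick consistency check on the two sizes; the remaining roughly 227 Mb is the portion of the genome not captured in the scaffolds.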