Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments

Varshney, Rajeev K; Shi, Chengcheng; Thudi, Mahendar; Mariac, Cedric; Wallace, Jason; Qi, Peng; Zhang, He; Zhao, Yusheng; Wang, Xiyin; Rathore, Abhishek; Srivastava, Rakesh K; Chitikineni, Annapurna; Fan, Guangyi; Bajaj, Prasad; Punnuri, Somashekhar; Gupta, S K; Wang, Hao; Jiang, Yong; Couderc, Marie; Katta, Mohan A V S K; Paudel, Dev R; Mungra, K D; Chen, Wenbin; Harris-Shultz, Karen R; Garg, Vanika; Desai, Neetin; Doddamani, Dadakhalandar; Kane, Ndjido Ardo; Conner, Joann A; Ghatak, Arindam; Chaturvedi, Palak; Subramaniam, Sabarinath; Yadav, Om Parkash; Berthouly-Salazar, Cécile; Hamidou, Falalou; Wang, Jianping; Liang, Xinming; Clotault, Jérémy; Upadhyaya, Hari D; Cubry, Philippe; Rhoné, Bénédicte; Gueye, Mame Codou; Sunkar, Ramanjulu; Dupuy, Christian; Sparvoli, Francesca; Cheng, Shifeng; Mahala, R S; Singh, Bharat; Yadav, Rattan S; Lyons, Eric; Datta, Swapan K; Hash, C Tom; Devos, Katrien M; Buckler, Edward; Bennetzen, Jeffrey L; Paterson, Andrew H; Ozias-Akins, Peggy; Grando, Stefania; Wang, Jun; Mohapatra, Trilochan; Weckwerth, Wolfram; Reif, Jochen C; Liu, Xin; Vigouroux, Yves; Xu, Xun

doi:10.1038/nbt.3943

Article
Open access
Published: 18 September 2017

Rajeev K Varshney ORCID: orcid.org/0000-0002-4562-9131¹^na1,
Chengcheng Shi²^na1,
Mahendar Thudi¹,
Cedric Mariac³,
Jason Wallace⁴,
Peng Qi⁴,
He Zhang²,
Yusheng Zhao⁵,
Xiyin Wang⁴,
Abhishek Rathore ORCID: orcid.org/0000-0001-6887-4095¹,
Rakesh K Srivastava¹,
Annapurna Chitikineni¹,
Guangyi Fan²,
Prasad Bajaj¹,
Somashekhar Punnuri⁶,
S K Gupta¹,
Hao Wang⁷,
Yong Jiang ORCID: orcid.org/0000-0002-2824-677X⁵,
Marie Couderc³,
Mohan A V S K Katta¹,
Dev R Paudel ORCID: orcid.org/0000-0002-0739-4343⁸,
K D Mungra⁹,
Wenbin Chen²,
Karen R Harris-Shultz¹⁰,
Vanika Garg¹,
Neetin Desai^11,12,
Dadakhalandar Doddamani¹,
Ndjido Ardo Kane¹³,
Joann A Conner¹⁴,
Arindam Ghatak^11,15,
Palak Chaturvedi ORCID: orcid.org/0000-0002-5856-0348¹¹,
Sabarinath Subramaniam^16,17,
Om Parkash Yadav¹⁸,
Cécile Berthouly-Salazar^3,19,
Falalou Hamidou^20,21,
Jianping Wang ORCID: orcid.org/0000-0002-0259-1508⁸,
Xinming Liang²,
Jérémy Clotault^3,22,
Hari D Upadhyaya¹,
Philippe Cubry ORCID: orcid.org/0000-0003-1561-8949³,
Bénédicte Rhoné^3,23,
Mame Codou Gueye¹³,
Ramanjulu Sunkar²⁴,
Christian Dupuy²⁵,
Francesca Sparvoli ORCID: orcid.org/0000-0002-3304-7548²⁶,
Shifeng Cheng²,
R S Mahala²⁷,
Bharat Singh⁶,
Rattan S Yadav²⁸,
Eric Lyons¹⁶,
Swapan K Datta²⁹,
C Tom Hash ORCID: orcid.org/0000-0003-3138-9234²⁰,
Katrien M Devos⁴,
Edward Buckler ORCID: orcid.org/0000-0002-3100-371X^7,30,
Jeffrey L Bennetzen⁴,
Andrew H Paterson⁴,
Peggy Ozias-Akins¹⁴,
Stefania Grando¹,
Jun Wang²,
Trilochan Mohapatra³¹,
Wolfram Weckwerth^11,32,
Jochen C Reif ORCID: orcid.org/0000-0002-6742-265X⁵,
Xin Liu^2,33,
Yves Vigouroux^3,22 &
Xun Xu^2,33,34

Nature Biotechnology volume 35, pages 969–976 (2017)

An Erratum to this article was published on 05 April 2018

This article has been updated

Abstract

Pearl millet [Cenchrus americanus (L.) Morrone] is a staple food for more than 90 million farmers in arid and semi-arid regions of sub-Saharan Africa, India and South Asia. We report the ∼1.79 Gb draft whole genome sequence of reference genotype Tift 23D₂B₁-P1-P5, which contains an estimated 38,579 genes. We highlight the substantial enrichment for wax biosynthesis genes, which may contribute to heat and drought tolerance in this crop. We resequenced and analyzed 994 pearl millet lines, enabling insights into population structure, genetic diversity and domestication. We use these resequencing data to establish marker trait associations for genomic selection, to define heterotic pools, and to predict hybrid performance. We believe that these resources should empower researchers and breeders to improve this important staple crop.

Main

Global temperatures are expected to increase by 1 to 6 °C by 2100, with serious consequences for agriculture¹. This means that climate-appropriate measures to ensure food security are a priority, especially as the human population is projected to reach 9.1 billion by 2050². Crops that are adapted to the predicted environmental changes have been proposed as one solution³. Even now, availability and further improvement of crops that can withstand climate change could reduce the hunger of the 805 million undernourished people living mainly in developing countries⁴.

Pearl millet (Pennisetum glaucum (L.) R. Br., syn. Cenchrus americanus (L.) Morrone), a C4 grass, is a highly cross-pollinated diploid (2n = 2x = 14) with excellent photosynthetic efficiency and biomass production potential. It is cultivated as a staple food grain and source of straw for fodder and fuel in arid and semi-arid regions of sub-Saharan Africa, India and South Asia. Climate-smart vegetative, reproductive, and physiological features of pearl millet make this crop well-suited to growth in harsh conditions including low soil fertility, high soil pH, high soil Al³⁺ saturation, low soil moisture, high temperature, high salinity and limited rainfall. Pearl millet reliably produces grain in regions that have a mean annual precipitation as low as 250 mm. In the same drought conditions maize (Zea mays), rice (Oryza sativa), sorghum (Sorghum bicolor), bread wheat (Triticum aestivum) and durum wheat (Triticum durum) are likely to fail⁵.

Pearl millet is cultivated on ∼27 million hectares worldwide and is the staple food for more than 90 million farmers living in poverty. Millet grain is highly nutritious, with 8–19% protein, low starch, high fiber (1.2 g/100 g)⁶, and higher micronutrient concentrations (iron and zinc) than rice, wheat, maize and sorghum⁷. Importantly, the potential of this crop to tolerate air temperatures >42 °C during the reproductive phase means that it can be cultivated using irrigation in the very hot summers of northwestern India⁸.

Despite the clear importance of pearl millet in agriculture, the production and productivity of this staple crop are very low, with an average grain yield of just 900 kg/ha. This is because pearl millet is mainly grown in dryland conditions, which are marginal production environments, and with minimal use of commercial inputs such as adequate irrigation, fertilizers and pesticides. Genetic gains (the rate of increase in yield over a given time period) in pearl millet during 1996–2013 averaged around 24 kg of grain/ha/year in India, which has the highest millet productivity and production of the main pearl millet growing countries⁹. Pearl millet is vulnerable to several foliar diseases including downy mildew (caused by Sclerospora graminicola), Pyricularia leaf spot or blast (Pyricularia grisea), and rust (Puccinia substriata var. indica). Indeed, these pathogen infections can result in massive yield losses and reduced fodder quality. Until now, the limited range of genomics tools for pearl millet has impeded the ability of researchers and breeders to exploit methods for improvement.

To accelerate pearl millet crop improvement, we sequenced the whole genome of reference genotype Tift 23D₂B₁-P1-P5. We also resequenced 994 pearl millet genotypes, including 963 inbred lines and single plants from each of 31 wild accessions, in order to understand the population structure, genetic diversity and domestication of this staple crop. We carried out a genome-wide association study (GWAS) to predict yield-associated traits in both irrigated and drought conditions. We also used genomic prediction to predict hybrid performance. These applications highlight the utility of our resequencing data set for accelerating breeding and enhancement of genetic gains in pearl millet.

Results

Genome assembly

To assemble the pearl millet genome, we used whole genome shotgun (WGS) and bacterial artificial chromosome (BAC) sequencing. Ten small-insert (∼170, 250, 500 and 800 bp) and 13 large-insert (∼2, 5, 10, 20 and 40 kb) WGS libraries were constructed from the Tift 23D₂B₁-P1-P5¹⁰ genotype. These libraries were sequenced on the Illumina HiSeq 2000, producing 520 Gb of sequence data, representing 296× genome coverage (Supplementary Table 1). Two BAC libraries, with an average insert size of ∼120 kb, were constructed from Tift 23D₂B₁-P1-P5 using EcoRI and HindIII. A total of 972 Gb of sequence data were generated from 100,608 BAC clones at ∼80× genome coverage (Supplementary Table 2 and Supplementary Fig. 1). In brief, 1.49 Tb of sequence data, after stringent filtering and correction steps, were assembled into 1.58 Gb of contigs (sequences without gaps or Ns) and 1.82 Gb of scaffolds (contigs joined with estimated gaps filled in).

Based on k-mer statistics, the pearl millet genome size was estimated to be 1.76 Gb (Supplementary Fig. 2), indicating that ∼90% of the genome was assembled. Scaffolds longer than 1 kb totaled 1.79 Gb, with a scaffold N50 of 884.95 kb (that is, half of the assembled sequence lies in scaffolds at least this long; contig N50 = 18,180 bp) and the largest scaffold spanning 4.82 Mb (Supplementary Table 3). To evaluate the assembly, we generated additional whole genome sequence data with 1× coverage on the PacBio platform. More than 90% of these long reads were mapped back to a scaffold with more than 90% similarity and 90% ratio of aligned length (Supplementary Fig. 3).
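
The N50 statistic and the assembled fraction quoted above can be reproduced from a list of sequence lengths. The following is a minimal sketch in Python; the function name `n50` and the toy lengths are hypothetical and are not taken from the assembly, whereas the 1.58 Gb and 1.76 Gb figures come from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")

# Toy scaffold lengths (hypothetical, not from the pearl millet assembly).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # 80, because 100 + 80 = 180 >= 150

# Assembled fraction from the figures in the text: 1.58 Gb of contigs
# against a k-mer-based genome size estimate of 1.76 Gb.
assembled_fraction = 1.58 / 1.76
print(f"{assembled_fraction:.1%}")  # 89.8%
```

Note that N50 is weighted by bases, not by the number of scaffolds, so a few very long scaffolds can dominate it.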

Linkage information from three biparental mapping populations, and collinearity with the genome of foxtail millet (Setaria italica)¹¹ were used to assemble genomic scaffolds into pseudomolecules. We assembled 1.56 Gb into seven pseudomolecules (Pg1 to Pg7, Fig. 1 and Supplementary Table 4). The average GC content of pearl millet (47.9%) is higher than that of foxtail millet (46.1%), sorghum (44.5%), barley (Hordeum vulgare, 44.4%), and rice (43.5%) (Supplementary Fig. 4). We assessed the variability in GC content in 10-kb non-overlapping sliding windows (Supplementary Fig. 5) to show that the observed GC content did not arise from sequencing-based GC bias. The GC content in whole genome coding sequence (CDS; 54.76%) and in 384 expanded gene families (53.14%) was examined as well; it was at a similar proportion to the total genome, providing confidence in this result (Supplementary Table 5 and Supplementary Fig. 6). Analysis of completeness was carried out using the core eukaryotic gene mapping approach (CEGMA), which revealed that >97% of genes were present in the assembly (Supplementary Table 6).
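
As an illustration of the windowed GC analysis described above, the sketch below computes GC content in consecutive non-overlapping windows. It is a simplified stand-in under stated assumptions: the function name, the handling of ambiguous bases (excluded from the denominator) and the toy sequence are all hypothetical choices, not details reported in the study.

```python
def gc_content_windows(sequence, window=10_000):
    """Yield (start, gc_fraction) for consecutive non-overlapping windows.

    Ambiguous bases such as N are excluded from the denominator
    (an assumption of this sketch); windows with no A/C/G/T are skipped.
    """
    seq = sequence.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        at = chunk.count("A") + chunk.count("T")
        if gc + at:
            yield start, gc / (gc + at)

# Toy sequence with a 4-bp window so the result is easy to check by eye.
toy_sequence = "GGCC" + "ATAT" + "GCAT" + "NNNN"
print(list(gc_content_windows(toy_sequence, window=4)))
# [(0, 1.0), (4, 0.0), (8, 0.5)]
```

With real scaffolds the default 10-kb window would be used, and the spread of the per-window values gives a picture of regional GC variability.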

Repetitive sequences

In total, 1.22 Gb of repeat elements were identified in a 1.58-Gb genome assembly, indicating that 77.2% of the assembled genome is repetitive. In addition, because the repetitive parts of the genome are always the parts that are under-represented in the genome assembly, most of the unassembled DNA (0.18 Gb) is most likely repetitive, too. This is not surprising, because multiple repeats will often collapse into a single repeat in an assembly and also because “repeat masking” is often performed before some assembly steps^11,12,13. We expect the true percentage of repetitive DNA to be a minimum of 80%. This is similar to the proportion of repetitive DNA found in the 2.3-Gb maize genome (>85%), and considerably more than in 730-Mb sorghum¹⁴ (∼61%), ∼400-Mb foxtail millet¹¹ (∼46%) or 466-Mb rice¹⁵ (∼42%) genomes. In common with the pattern in many other plant genomes, long-terminal repeat (LTR) retrotransposons were the most abundant class of repetitive DNA, and comprise >50% of the nuclear genome of pearl millet (Supplementary Table 7). Using RepeatMasker, we found that sequence divergence rates were high (peak at 28%) among long interspersed nuclear elements (Supplementary Fig. 7).
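
The arithmetic behind these repeat estimates can be checked directly. The sketch below uses only the sizes given in the text; the scenario in which every unassembled base is repetitive is an illustrative assumption, and the variable names are hypothetical.

```python
repeats_gb = 1.22    # repeat elements identified in the assembly
assembly_gb = 1.58   # assembled contig sequence
genome_gb = 1.76     # k-mer-based genome size estimate

unassembled_gb = genome_gb - assembly_gb  # about 0.18 Gb

# Repetitive share of the assembled sequence.
assembled_repeat_fraction = repeats_gb / assembly_gb

# Share of the whole genome if all unassembled DNA were repetitive
# (an illustrative assumption, not a measured quantity).
whole_genome_repeat_fraction = (repeats_gb + unassembled_gb) / genome_gb

print(f"{assembled_repeat_fraction:.1%}")     # 77.2%
print(f"{whole_genome_repeat_fraction:.1%}")  # 79.5%
```

Under that assumption the estimate comes to roughly 80%; repeats collapsed during assembly would push the true value higher still.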

Genes and annotation

A total of 69,398 transcriptome assembled contigs (TACs), amounting to 43 Mb in total, were identified using pearl millet transcriptome sequences from two different studies^16,17 and a new pearl millet transcriptome assembly generated for this study (Supplementary Table 8). Ab initio and homology-based gene predictions were combined with transcript assembly to infer a non-redundant set of 38,579 gene models with an average transcript size of 2,420 bp and an average coding sequence of 1,014 bp (Table 1; Supplementary Table 9). The average lengths of mRNA, CDS, introns and exons in pearl millet were similar to those reported for other cereal genomes (Supplementary Fig. 8). Among 458 of the most conserved genes in CEGMA, 437 (95.4%) genes were complete but 8 (1.7%) genes were not found in the genome sequence, 8 (1.7%) genes were not included in the gene set, and 5 (1.1%) genes had more than one copy (possibly fragmented genes). In addition, for 956 genes in the benchmarking universal single-copy orthologs (BUSCO) analysis, we annotated 96.7% of genes, and 95.4% of these are complete. Gene models of rice and Arabidopsis thaliana have been annotated and carefully validated. We chose to use the gene models of rice, which is more closely related to pearl millet than A. thaliana, to investigate the completeness of pearl millet genes. Of the 4,202 single-copy genes in rice, 90.86% have homologs in pearl millet, and 86% of these pearl millet genes were complete when compared with rice gene models (ratio of pearl millet length/rice length ≥0.8), reflecting the completeness of single-copy genes. Gene density increased toward the ends of pseudomolecules (Fig. 1), consistent with findings in all other cereal genomes published to date^11,14,15. Most of the annotated genes coded for proteins with homology to proteins in SwissProt¹⁸ (55.61%) and InterPro (ref. 19) (65.53%). Functions were assigned to 27,893 (72.30%) genes, leaving 10,686 (27.70%) genes unannotated (Supplementary Table 10).

Table 1 Statistics of genome assembly

Predicted pearl millet proteins were compared to those already annotated in ten plant species (Arabidopsis²⁰, Brachypodium (Brachypodium distachyon)²¹, banana (Musa acuminata)²², barley²³, foxtail millet¹¹, maize²⁴, rice¹⁵, sorghum¹⁴, soybean (Glycine max)²⁵ and bread wheat²⁶). As expected from evolutionary relatedness, the highest number of orthologs was identified in foxtail millet (74.16%) and the lowest number in Arabidopsis (61.88%; Supplementary Table 11). Reciprocal pairwise comparisons of predicted proteins for 38,579 pearl millet gene models with 385,891 gene models from the same ten plant species identified 17,949 orthologous groups (Supplementary Table 12), of which 5,232 contained only a single pearl millet gene, which is suggestive of simple orthology (Supplementary Table 13; Supplementary Fig. 9). In addition to protein-coding genes, we predicted 909 tRNA, 235 rRNA, 183 microRNA (miRNA) and 752 small nuclear RNA (snRNA) genes in our assembly (Supplementary Table 14).

Gene families

We identified unique and shared gene families among different species in the grass subfamilies Panicoideae, Pooideae and Ehrhartoideae using OrthoMCL (Ortho Markov Cluster Algorithm http://orthomcl.org/orthomcl/)²⁷. Pearl millet and foxtail millet share 15,887 gene families (of those, 14,398 are also found in sorghum) while pearl millet and barley share 13,607 gene families (Fig. 2a). A total of 15,869 gene families are present in at least one species in each of the three subfamilies (i.e., Panicoideae, Ehrhartoideae and Pooideae) analyzed (Fig. 2a). A total of 384 gene families were substantially expanded in pearl millet and 1,692 gene families were contracted (Fig. 2b). We compared the average length of the genes for the 384 expanded gene families among all ten species and used quantile statistics to identify short CDS. Here, Q₁ is the 25th percentile, Q₃ is the 75th percentile and the interquartile range (IQR) is Q₃–Q₁. We consider a length shorter than Q₁–3(IQR) to be an extreme outlier. By using this method, we found that only 24 (6.25%) genes had substantially shorter CDS in pearl millet compared to other species. Thus, only a small proportion of the expanded gene families might be misidentified because of possible partial genes (Supplementary Fig. 10).
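
The extreme-outlier rule described above (a length shorter than Q₁–3(IQR)) can be sketched as follows. This is an illustration only: the function name and the toy CDS lengths are hypothetical, and the quartile interpolation is whatever Python's `statistics.quantiles` uses by default, which may differ from the method used in the study.

```python
import statistics

def extreme_short_outliers(lengths, k=3.0):
    """Return the lengths that fall below Q1 - k * IQR.

    k = 3 corresponds to the 'extreme outlier' rule in the text.
    """
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    iqr = q3 - q1
    threshold = q1 - k * iqr
    return [x for x in lengths if x < threshold]

# Hypothetical CDS lengths (bp) for one gene family across species.
toy_cds_lengths = [900, 950, 1000, 1010, 1020, 1050, 1100, 150]
print(extreme_short_outliers(toy_cds_lengths))  # [150]

# Share of short-CDS genes reported in the text: 24 of 384.
print(f"{24 / 384:.2%}")  # 6.25%
```

A factor of 3 rather than the more common 1.5 makes the rule conservative, flagging only lengths far below the bulk of the distribution.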

**Figure 2: Gene conservation and gene family expansion and contraction in pearl millet.**

Expansion and contraction of gene families between species might also highlight differences in bioinformatics analysis carried out for different genomes. Bias in gene model identification among different studies might render a comparison of expansion or contraction challenging. One potential source of bias is if a gene is split, that is, a complete gene is instead annotated as two separate genes. Based on eukaryotic orthologous gene sequences, we estimate that 2.3% of our genes might have been misannotated in this way (Supplementary Table 6). We found that 1,692 families were contracted in pearl millet, so contraction appears about 4.4 times as frequent as expansion (1,692 versus 384 families). One explanation may be that there was a far higher proportion of split genes in the reference genomes of the other species that we use for comparison than in our pearl millet assembly. This would make our number of gene family contractions an overestimate.

Gene families that seem to be the most greatly expanded are those encoding cutin, suberin, wax biosynthetic genes (P < 10⁻⁶) and transmembrane transporters of secondary metabolites (ABC transporters, P < 10⁻²⁴) (Supplementary Table 15). Triterpenoids are a component of wax, and we also observed a substantial expansion of the gene families associated with terpenoid backbone biosynthesis, and monoterpenoid (P < 0.05) and di-terpenoid biosynthesis (P < 0.005). Notably, increased cuticular wax synthesis improves drought tolerance in Arabidopsis species²⁸, while reduced wax production has been associated with drought sensitivity in rice²⁹. An enriched repertoire of genes for lipid synthesis and export of macromolecules in pearl millet might contribute to its heat and drought tolerance.

Resistance to pathogens is a crucial contributor to crop yield. The majority of resistance genes in plants contain a nucleotide binding site (NBS). Identification of NBS-containing genes in pearl millet will help to identify putative resistance genes. A total of 378 NBS-encoding genes were manually verified after initial searching, comprising ∼1% of the total gene set, similar to the proportion found in other cereal genomes (Supplementary Table 16). NBS-leucine rich repeat (NBS-LRR) genes made up ∼43% of the NBS genes, with NBS-only genes comprising ∼41%. Of the 378 NBS-encoding genes, 360 were mapped to one of the seven pseudomolecules, with a significantly (Chi-squared test P-value < 10⁻¹⁰) biased distribution among the pseudomolecules; ∼26.2% and ∼25.7% were located on Pg4 and Pg1, respectively (Supplementary Table 17). These are also the two pseudomolecules to which a downy mildew resistance quantitative trait locus (QTL) was mapped³⁰. We observed large tandem arrays of NBS genes near the telomere region of Pg1 (two 4-gene groups, four 5-gene groups and one 6-gene group) followed by Pg4 (three 2-gene groups and two 4-gene groups) (Supplementary Fig. 11 and Supplementary Table 18), consistent with a biased distribution of these loci and suggesting that tandem duplication may be an important source of local gene amplification.
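
A goodness-of-fit test of the kind mentioned above compares observed gene counts per pseudomolecule with the counts expected under a null model. The sketch below computes the Pearson chi-squared statistic. The per-pseudomolecule counts are hypothetical (only the total of 360 mapped genes is from the text), and the even-distribution null is an assumption of this sketch; the null model actually used in the study is not stated here.

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical NBS gene counts for Pg1..Pg7, summing to 360.
observed = [92, 40, 35, 94, 33, 36, 30]

# Assumed null: genes spread evenly over the seven pseudomolecules.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)

# Chi-squared critical value for 6 degrees of freedom at alpha = 0.05.
CRITICAL_DF6_ALPHA_05 = 12.592
print(statistic > CRITICAL_DF6_ALPHA_05)  # True for these toy counts
```

A more realistic null would scale the expected counts by pseudomolecule length or total gene content rather than assuming equal shares.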

Population structure, diversity and domestication

To better elucidate population structure, assess genetic diversity and understand pearl millet domestication, we resequenced 994 lines. The lines resequenced comprised 260 inbred male sterility maintainer (B-) and 320 male fertility restorer (R-) lines, 345 Pearl Millet Inbred Germplasm Association Panel (PMiGAP) lines (including cultivated germplasm from Africa and Asia, elite improved open-pollinated cultivars, hybrid parental inbreds and inbred mapping population parents)³¹, 38 inbred parents of mapping populations and 31 wild accessions. We generated a total of 1.16 Tb whole-genome resequencing (WGRS) data with 1.68× coverage (∼3.05 Gb per line) on PMiGAP lines and a total of 116 Gb WGRS data with 1.86× coverage (∼3.38 Gb per line) on parental lines of mapping populations (Supplementary Tables 19 and 20). In addition, for PMiGAP lines, 78.9 Gb of data at an average coverage of 0.12× was generated using genotyping by sequencing³², while for B- and R-lines, 614.45 Gb of data at 0.59× coverage with an average of 1.06 Gb per sample was generated using RAD sequencing³³ (Supplementary Table 21). Single plants from each of 31 wild accessions sampling the Sahel from Senegal to Sudan were resequenced at an average 2× coverage using WGRS approach (Supplementary Table 22).

We identified 88,256 simple sequence repeat (SSR) motifs using the MIcroSAtellite program³⁴ in the pearl millet genome sequence and designed primers for 74,891 SSR-containing sequences (Supplementary Tables 23 and 24), which can be used by the pearl millet community for genetics and breeding applications. Based on resequencing data, we identified 29,542,173 single-nucleotide polymorphisms (SNPs) in PMiGAP lines (Supplementary Table 25; for parents of mapping populations and hybrid parental lines, see Supplementary Tables 26,27,28), 3,844,446 insertions and deletions shorter than 50 bp (Supplementary Tables 29,30,31), and 423,118 genome-wide structural variations larger than 50 bp such as deletions, duplications and insertions (Supplementary Table 32 and Supplementary Figs. 12–15). We conducted a principal component analysis (PCA) and constructed a neighbor-joining tree based on 450,000 high-quality SNPs. The PCA and phylogenetic tree showed four main clusters, three that contained wild accessions and one that grouped together the cultivated germplasm (Fig. 3a,b). The three wild accession clusters were separated by geographical origin into East, Central and West African clusters (Fig. 3a,b).

**Figure 3: Domestication and genetic diversity in elite and wild accessions of pearl millet.**

The closest of the wild groups to the cultivated samples is from the central part of West Africa (Fig. 3b), indicating that pearl millet originated in this region, consistent with prior research³⁵. The oldest archaeological remains, which date to 4,500 years ago, were found in the north-central Sahel, in accordance with our genetic analyses³⁶. Studies of archaeological remains found that by 3,500 years ago cultivation of pearl millet was widespread in Sahelian Africa^37,38,39. Spread of pearl millet agriculture to Asia, and in particular to India, also dates to 3,500 years ago⁴⁰. Average pairwise nucleotide diversity within populations (θ_π) and Watterson's estimator of segregating sites (θ_ω) both indicated higher diversity among wild accessions (average θ_π = 0.00366 and θ_ω = 0.00342) than in PMiGAP lines (average θ_π = 0.00238 and θ_ω = 0.00289) on all seven pseudomolecules (Supplementary Table 33). In agreement with the PCA and neighbor-joining tree, we observed strong population structure in the wild accessions and weak population structure in PMiGAP lines (Supplementary Figs. 16 and 17). The weak structure of cultivated pearl millet suggests homogeneous genetic diversity across a large geographical scale. This pattern is certainly associated with a rapid spread of pearl millet agriculture in Africa and India without major bottlenecks during diffusion. This pattern is expected for inbreds derived from a highly allogamous species. The strong structuring of wild diversity and the central geographical origin of the cultivated sample suggest strong untapped and unique diversity for breeding from wild populations found in East Africa (Sudan, Chad) and the West (Senegal, Mauritania).
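
The two diversity statistics used above can be illustrated on a small alignment. The sketch below is a textbook-style calculation under simplifying assumptions (phased haploid sequences of equal length, no missing data); it is not the pipeline used for the low-coverage resequencing data, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site (theta_pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L),
    where S is the number of segregating sites and
    a_n = sum of 1/i for i = 1 .. n-1."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1 / i for i in range(1, n))
    return segregating / (a_n * length)

# Toy alignment: four sequences, ten sites, two segregating sites.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print(round(nucleotide_diversity(toy_alignment), 4))  # 0.1
print(round(watterson_theta(toy_alignment), 4))       # 0.1091
```

Both statistics estimate the same population parameter under neutrality, which is why their difference underlies Tajima's D.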

Domestication in pearl millet, like that observed in maize²⁴, was associated with profound modifications of spike morphology and plant architecture (Fig. 3c). We found several genomic regions that showed reduced diversity in the cultivated (but not wild) species that may harbor genes selected for during domestication. We used a negative log ratio of diversity between cultivated (red) and wild (blue) samples, for which values close to 1 indicate a tenfold decrease in diversity whereas values close to 0 indicate that diversity is maintained in the cultivated samples. We also identified regions with an excess of differentiation based on a fixation index (F_ST) measure (Supplementary Fig. 18). These analyses provided orthogonal and consistent results and identified 140 genomic regions with values above the 95% threshold for both loss of diversity and differentiation. Using a stringent threshold of 99.5%, and considering only values identified by both statistics, 24 genomic regions had reduced diversity in the cultivated germplasm, of which eight were located on Pg7, six on Pg6 and five on Pg1 (Supplementary Tables 34 and 35). Linkage groups 6 and 7 have previously been identified as carrying QTL that explain most phenotypic differences between wild and cultivated pearl millet germplasm^41,42. Most of the identified regions have negative Tajima's D values (<−2.0), suggesting a signature of positive selection (Supplementary Table 34). One striking case of diversity loss of more than tenfold was associated with the regulation of an auxin-induced gene PINOID on Pg6. This gene is known as barren inflorescence2 (ref. 43) in maize, and variation in this gene has been associated with phenotypic variation of the inflorescence⁴⁴. Our analyses also pinpointed genes encoding proteins that might be associated with morphogenesis (LIM2 and PINOID on Pg6, Myosin 11 on Pg7) or gene regulation (Basic helix–loop–helix, bHLH110 on Pg3, Zinc Finger on Pg6). Validation of the role(s) of each of these genes in domestication will require functional analyses and further phenotype–genotype association analyses using fine-scale QTL approaches.
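
The negative log ratio of diversity used in this scan can be written out explicitly. The sketch below assumes a base-10 logarithm, which is what the tenfold interpretation in the text implies; the function name is hypothetical, and the second call uses the genome-wide average θ_π values reported earlier only as a worked example.

```python
import math

def diversity_loss(pi_cultivated, pi_wild):
    """Negative log10 ratio of cultivated to wild diversity.

    A value near 1 means roughly a tenfold loss of diversity in the
    cultivated samples; a value near 0 means diversity is maintained.
    """
    return -math.log10(pi_cultivated / pi_wild)

# A window where cultivated diversity is one tenth of wild diversity.
print(round(diversity_loss(0.0001, 0.001), 3))  # 1.0

# Genome-wide averages from the text (PMiGAP 0.00238, wild 0.00366).
print(round(diversity_loss(0.00238, 0.00366), 3))
```

Applied per genomic window, this score highlights candidate domestication regions as peaks well above the genome-wide background.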

GWAS

Genome-wide SNP data were used to compute linkage disequilibrium decay (LDD) in all three germplasm sets. We set the r² threshold as 0.2 and observed rapid LDD of less than 0.5 kb in B- and R- lines (48 bp) as well as in PMiGAP lines (84–444 bp) (Supplementary Fig. 19). LDD in pearl millet is on par with that in maize, and we note that both these plants are allogamous⁴⁵. Relatively rapid LDD is expected in sets of lines that represent the variation present in a highly allogamous panmictic population. Grain and stover yield and their component traits are of crucial importance in pearl millet and have undergone selection during domestication. We carried out GWAS across 288 test-cross progenies of PMiGAP lines for 20 traits, and identified 1,054 strongly significant marker trait associations (MTAs) for 15 traits (Supplementary Table 36): grain number per panicle (91 MTAs), grains per square meter (75 MTAs), stover dry matter yield (kg ha⁻¹; 5 MTAs), fresh stover yield (t ha⁻¹; 38 MTAs), tillers per plant (147 MTAs), panicle diameter (cm; 1 MTA), panicle harvest index (%; 1 MTA), panicle length (cm, EL; 3 MTAs), panicle yield (kg/ha; 9 MTAs), panicle number (ha⁻¹; 246 MTAs), plant population (ha⁻¹; 68 MTAs), grain yield (kg/ha; 11 MTAs), grain harvest index (%; 5 MTAs), plant height (cm; 344 MTAs) and 1000 grain mass (g; 10 MTAs). The MTAs explained 9–27% of phenotypic variation (Supplementary Table 36). Some markers were shared across stress treatments and years for important traits such as grain number per panicle on Pg1 and Pg5 (Supplementary Fig. 20). These markers might be relevant for pearl millet breeding.
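
The r² statistic that underlies the LD decay analysis can be illustrated for a single pair of biallelic loci. This sketch assumes phased haplotypes coded as 0/1, which is a simplification; the function name and toy haplotypes are hypothetical and the study's own LD computation on low-coverage data is not reproduced here.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) pairs,
    each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

# Perfectly associated loci versus independent loci.
print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))  # 1.0
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))  # 0.0
```

LD decay is then summarized as the physical distance at which the average r² between marker pairs drops below the chosen threshold (0.2 in the text).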

Genomic prediction of hybrid performance

We applied our resequencing data to carry out genomic selection to predict grain yield for test crosses. Four scenarios of prediction were investigated, namely the performance of grain yield in each of the three environments (control, early stress and late stress) and across environments. We observe high prediction accuracy, measured as the Pearson correlation coefficient between the predicted and observed values, standardized with the square root of the heritability (h = 0.78), amounting to 0.6 for the performance across environments. Analyses of this kind have been undertaken for grain yield in other crops using genomic selection⁴⁶. A modelling study recently found that with this level of prediction accuracy, genomic selection could substantially improve selection gain per year⁴⁷.
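
Prediction accuracy as defined above (the correlation between predicted and observed values, standardized by the square root of the heritability) can be sketched as follows. The function names and toy values are hypothetical, and the `heritability` argument is taken to mean h²; no value from the study is plugged in.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def prediction_accuracy(predicted, observed, heritability):
    """Predictive ability divided by the square root of h^2."""
    return pearson(predicted, observed) / math.sqrt(heritability)

# Toy values: predictions perfectly correlated with observations.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 4.0, 6.0, 8.0]
print(round(prediction_accuracy(predicted, observed, heritability=1.0), 6))  # 1.0
```

Dividing by the square root of the heritability corrects for the fact that observed phenotypes are themselves noisy measurements of genetic value.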

We also predicted hybrid performance, using a genomic selection strategy that considers additive and dominance effects. The ridge regression best linear unbiased prediction method⁴⁶ was trained using phenotypic grain yield data from 64 pearl millet hybrids grown in five environments in India in replicated trials during the time period 2004–2013. The grain yield data were analyzed with 302,110 SNPs with missing values below 5% and minor allele frequency above 5% for 580 B- and R- lines (Fig. 4a). We found 170 promising hybrid combinations (Supplementary Table 37 and Fig. 4a). Of these, 11 combinations were already used for producing hybrids that showed better performance (Supplementary Table 38). However, 159 combinations have never been used in hybrid breeding (Fig. 4b), and therefore they are good candidates for developing high-yielding hybrids.
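
The two SNP quality filters named above (missing values below 5% and minor allele frequency above 5%) can be expressed as a simple predicate. The sketch assumes genotypes coded as alternate-allele dosages 0, 1 or 2 with `None` for missing calls; that encoding, the function name and the toy data are hypothetical.

```python
def passes_filters(genotypes, max_missing=0.05, min_maf=0.05):
    """Return True if a SNP passes the missing-rate and MAF filters.

    genotypes: per-line alternate-allele dosages (0, 1 or 2),
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called or missing_rate >= max_missing:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf > min_maf

common_snp = [0] * 80 + [2] * 20            # MAF 0.20, no missing calls
rare_snp = [0] * 99 + [1]                   # MAF 0.005
sparse_snp = [None] * 10 + [0] * 45 + [2] * 45  # 10% missing

print(passes_filters(common_snp))  # True
print(passes_filters(rare_snp))    # False
print(passes_filters(sparse_snp))  # False
```

Markers that pass both filters would then form the genotype matrix used to train the prediction model.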

**Figure 4: Prediction of hybrid performance.**

We inspected the predicted hybrid performance of all possible 167,910 single-cross combinations by applying hierarchical clustering combined with a heat plot, and examined the potential of this approach to identify promising heterotic groups. The analyses revealed two sets of lines that are predicted to have an average 8% higher hybrid performance when crossed to each other than the total set of 167,910 single-cross combinations (Fig. 4c and Supplementary Fig. 21). These predicted high-yield hybrids could be used as a nucleus to establish high-yielding heterotic groups for hybrid pearl millet breeding⁴⁸ (Supplementary Tables 37 and 38).
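
The number of single-cross combinations quoted above matches the number of unordered pairs among the 580 B- and R- lines (260 + 320), which can be confirmed with a one-line calculation. The variable names are hypothetical.

```python
import math

n_lines = 260 + 320                # B- lines plus R- lines
n_crosses = math.comb(n_lines, 2)  # unordered pairs of distinct lines
print(n_crosses)  # 167910
```

This count treats every pair of lines as a potential cross, regardless of whether the two parents belong to the same group.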

Discussion

Pearl millet is a staple food for more than 90 million people in Africa and Asia. People living in arid and semi-arid regions, in particular, rely on pearl millet, which can produce a crop in harsh conditions. We sequenced the genome of pearl millet reference genotype Tift 23D₂B₁-P1-P5 (available at https://www.ncbi.nlm.nih.gov/assembly/GCA_002174835.1/). The draft genome assembly represents 90% of the pearl millet genome, with a scaffold N50 of 884.95 kb and 87.2% of the assembled genome anchored to seven pseudomolecules. Genome assembly for cereal species with high levels of repetitive DNA, like pearl millet, is always challenging. Therefore, in addition to a WGS approach, BAC-sequence data were used to develop the draft genome assembly and PacBio data were generated to validate the assembly. To achieve chromosome-level assembly, one can use newer approaches such as Bionano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data in different combinations⁴⁹.

Our analysis identified 38,579 protein-coding genes, of which 27,893 (72.30%) were annotated. CEGMA and BUSCO analyses together with comparison with gene models of rice have indicated completeness of predicted genes in pearl millet. Expansion of gene families associated with terpenoid backbone biosynthesis and monoterpenoid and diterpenoid biosynthesis in the genome might explain the high level of heat and drought tolerance in pearl millet as compared to other cereals.

Genome sequence can provide information either about specific genomic regions or specific genes that are associated with agronomically important traits including grain and fodder yield. Pearl millet fodder is the main feedstock for ruminant (and other) livestock, and breeding to improve fodder quality and yield is of crucial importance to both the meat and the dairy industries. In order to identify loci or variants associated with agronomic features, we undertook a large-scale resequencing effort. Resequencing of the PMiGAP set revealed that small structural rearrangements, such as insertions and deletions in the genome, have occurred throughout the evolution of pearl millet. This is similar to observations made in maize: a third or more of maize genes seem to be optional. Frequent insertions and deletions pose substantial challenges to resequencing efforts: self-pollinated, small-genome species such as rice are easier to sequence and analyze than cross-pollinated, large-genome species like maize, owing to the greater genomic structural variability of the latter⁵⁰. To reduce cost without losing information, 1.68× coverage WGRS data and 0.12× GBS data were generated on PMiGAP lines and 0.59× coverage RAD-sequencing data were generated on B- and R- lines.

The sequence information from the more genetically diverse PMiGAP inbred panel will be of broader use for genome-wide association mapping and allele mining. All of these sequences are available at https://www.ncbi.nlm.nih.gov//sra/?term=SRP063925. Resequencing data of almost 1,000 pearl millet lines (963 inbreds of cultivated pearl millet and 31 heterozygous wild individuals, available at https://www.ncbi.nlm.nih.gov//sra/?term=SRP063925) provides researchers and breeders with an enormous resource of genome-wide variations including SNPs, indels, SSRs and structural variations (Supplementary Tables 23,24,25,26,27,28,29,30,31,32) for mining alleles of genes with significant MTAs and for developing pearl millet hybrids with increased heterosis. Our analysis on resequencing data on PMiGAP lines together with phenotyping data for 20 traits for GWAS and genomic selection suggests that simultaneous improvement of grain and stover yield might be feasible in pearl millet. Indeed, improved grain and stover yield performance of hybrids in India has been noted over the past 50 years, which underlines the potential for further improvements that could be informed by our analyses.

We also show the use of the genome sequence and resequencing information to make predictions of test-cross hybrid performance. After inspecting the predicted hybrid performance of 167,910 single-cross combinations, we identified 159 pairs of lines that have not been used so far for hybrid breeding but are predicted to exhibit high hybrid performance. This type of analysis has considerable potential for accelerating future rates of selection gain. Our prediction models were also applied to define heterotic pools for pearl millet for South Asia, which could be crucial for increasing the efficiency of hybrid breeding programs in the same region.

Together the draft genome and resequencing data provide a resource for the research community that should enable a better understanding of trait variation and accelerate the genetic improvement of pearl millet. For instance, we identified 1,054 MTAs for 15 agronomic traits that will be useful for pearl millet breeding. Our findings will also contribute to a better understanding of the genetic basis of the exceptional drought and heat tolerance of pearl millet as we have identified expansion of gene families associated with drought and heat tolerance. A detailed understanding of how well pearl millet crops do in hot, arid and semi-arid regions might enable engineering of not only pearl millet but also other cereal crops like rice, maize and wheat, which are currently able to provide only limited produce in arid or semi-arid regions. This is especially important owing to the pressing need for heat- and drought-tolerant cereal crops in the coming years.

Methods

Plant material.

The pearl millet genotype Tift23D₂B₁-P1-P5 was bred at the Coastal Plain Experiment Station (Tifton, Georgia, USA) by introducing the d2 dwarfing gene into the genetic background of elite seed parent maintainer line Tift 23B1, and was chosen to generate a draft genome sequence.

Three bi-parental mapping populations were used to develop the genetic map for organizing scaffolds into pseudomolecules. These populations were: (i) a small recombinant inbred line (RIL) population developed at ICRISAT, Patancheru, based on the cross ICMB 841-P3 × ICMB 863B-P2 (MAPPOP1); (ii) a RIL population developed at the Coastal Plain Experiment Station, Tifton, Georgia (USA) based on Tift 99B × Tift 454 (MAPPOP2); and (iii) an F₂ population derived from a wild × domestic cross (MAPPOP3) from Institut de Recherche pour le Developpement (IRD), France. In addition, 580 B- and R- lines, comprising 200 B- and 200 R- lines from ICRISAT plus 60 B- and 120 R- lines from five organizations in India (Haryana Agricultural University, Hisar, Haryana; Junagadh Agricultural University, Jamnagar, Gujarat; Mahatma Phule Krishi Vidyapeeth, Dhule, Maharashtra; Sri Karan Narendra Agriculture University, Durgapura, Rajasthan; and JK Agri Genetics Ltd., Hyderabad, Telangana), were resequenced using restriction-site-associated DNA (RAD) sequencing (Supplementary Table 39). The PMiGAP panel contains 345 lines (263 landraces/traditional cultivars, 46 breeding lines, 25 advanced/improved cultivars and 11 accessions with unknown biological status) and represents germplasm from 27 countries on two continents (Supplementary Table 40). These 345 accessions were subjected to WGRS. In addition, 38 inbred parents of mapping populations segregating for drought, downy mildew and rust (Supplementary Table 41) and 31 wild accessions representing seven countries (Mali, Mauritania, Senegal, Sudan, Chad, Mali and Niger) were also resequenced using the WGRS approach (Supplementary Table 42).

Whole genome shotgun sequencing and assembly.

We constructed 10 small-insert libraries (4 with 170 bp inserts, 2 with 250 bp inserts, 2 with 500 bp inserts and 2 with 800 bp inserts) and 13 mate-pair libraries (4 with 2 kb inserts, 4 with 5 kb inserts, 2 with 10 kb inserts, 2 with 20 kb inserts and 1 with 40 kb inserts) from pearl millet genotype Tift 23D₂B₁-P1-P5. To make libraries with ∼170 to ∼800 bp inserts, high-quality DNA samples were sheared and end-repaired, and 'A' bases were added to the 3′ end of the DNA fragments to facilitate ligation to adapters. Fragments in the appropriate size range were selected after separation on an agarose gel and amplified using PCR. For mate-pair libraries, a biotinylation reaction was performed after fragmentation and end-repair. Then DNA fragments of the required size were selected and circularized. Circular DNAs were sheared into approximately 400-600 bp fragments, and biotinylated fragments were captured for terminal modification and adaptor ligation to construct libraries. Paired-end reads were generated for each library on an Illumina HiSeq 2000 platform.

For BAC library construction, DNA from pearl millet genotype Tift 23D₂B₁-P1-P5 was fragmented using HindIII and EcoRI, and then ligated into vector pCC1BAC. The ligations were transformed into E. coli DH10b host cells. After DNA isolation from BAC clones, a Covaris LE220 system was used to shear the DNA into ∼500 bp fragments. An Agilent Bravo Automated Liquid Handling Platform and an Agilent BenchCel Microplate Handler were used to construct BACs for sequencing. Then 96-microTUBE plates (Covaris) were used as sample vessels for automated batch processing followed by index adaptor ligation and size selection⁵¹. Generally, the sizes of the BACs ranged from 80 to 180 kb and fragments for sequencing were about 500 bp. In total, 100,608 BAC clones were constructed and a HiSeq 2000 was used for sequencing paired-end reads of each BAC clone.

For each library, we filtered the reads that comprised more than 5 percent of “Ns” or polyA structure, and also removed reads that possessed 20 or more bases with quality score less than or equal to 7. Reads with >10 bp aligned to the adaptor sequence (allowing ≤3 bp mismatch) were considered as adaptor contaminants and removed. Additionally, paired-end reads with a total length smaller than the library insert size allowing a window of 30 bp were removed. We also trimmed the reads if the quality of bases at the head or tail of the reads was low.
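The N-content and low-quality criteria described above can be illustrated with a minimal Python sketch. The function name `passes_read_filter`, its arguments and the representation of qualities as a list of integer scores are assumptions made for illustration; the polyA, adaptor and insert-size filters are not shown, and this is not the pipeline that was actually run.

```python
def passes_read_filter(seq, quals, max_n_frac=0.05, low_q=7, max_low_q_bases=20):
    """Return True if a read survives the N-content and low-quality filters.

    seq   : read sequence (string)
    quals : per-base quality scores as integers (assumed representation)
    """
    if not seq:
        return False
    # Discard reads in which more than 5% of the bases are "N".
    n_frac = seq.upper().count("N") / len(seq)
    if n_frac > max_n_frac:
        return False
    # Discard reads with 20 or more bases of quality <= 7.
    low_quality_bases = sum(1 for q in quals if q <= low_q)
    return low_quality_bases < max_low_q_bases
```

In a paired-end setting one would typically drop both mates when either fails, but that choice is not stated in the text.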

k-mer analysis.

We performed k-mer analysis⁵² for the estimation of the genome size of pearl millet genotype Tift 23D₂B₁-P1-P5. Genome size was estimated by the formula: Genome size = k-mer_num/Peak_depth where k-mer_num was the total number of k-mers and Peak_depth was the expected value of k-mer depth obtained from the distribution curve. The number of k-mers (generally K = 17) was calculated from short fragment size reads with a one bp slide, and then the frequency of each k-mer was determined. A distribution curve of depth versus frequency was plotted, where the x-axis represents the depth and the y-axis represents the proportional frequency at that depth divided by the total frequency of all the depths.
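As a hedged illustration of the formula Genome size = k-mer_num/Peak_depth, the following Python sketch counts 17-mers with a one-base slide and divides the total k-mer count by the modal depth. The function names, the `min_depth` heuristic for skipping error k-mers and the toy data are assumptions introduced here, not part of the published analysis.

```python
import random
from collections import Counter

def count_kmers(reads, k=17):
    """Count every k-mer in the reads, sliding one base at a time."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def estimate_genome_size(kmer_counts, min_depth=2):
    """Genome size = k-mer_num / Peak_depth."""
    # depth -> number of distinct k-mers observed at that depth
    depth_hist = Counter(kmer_counts.values())
    # Total number of k-mers, counted with multiplicity.
    kmer_num = sum(depth * n for depth, n in depth_hist.items())
    # Hypothetical heuristic: ignore very low depths, which mostly hold error k-mers.
    candidates = {d: n for d, n in depth_hist.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(candidates, key=candidates.get)
    return kmer_num / peak_depth

# Toy data: a random 200 bp "genome" read ten times without errors.
rng = random.Random(0)
toy_genome = "".join(rng.choice("ACGT") for _ in range(200))
toy_estimate = estimate_genome_size(count_kmers([toy_genome] * 10))
```

For the toy data the estimate equals the number of 17-mer positions in the sequence (200 − 17 + 1 = 184), slightly below the true length, which is negligible for real genomes.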

Development and improvement of genome assembly.

For WGS assembly, clean reads were assembled by SOAPdenovo⁵³ (Version 2.04) (parameters: pregraph -s assembly.lib -K 63 -R -d 1 -o pm; contig -g pm -R; map -s assembly.lib -g pm -k 45; scaff -g pm). The k-mer frequency follows a Poisson distribution when read length << genome size⁵⁴. Short-insert libraries were assembled into contigs. The reads were mapped back onto the contigs to estimate overlap between contigs. Gapcloser⁵³ (Version 1.10, parameter: -a pm.scafSeq.fill -b reads.lib -o pm.scafSeq.fillGap -t 24) within the SOAPdenovo package was used to fill gaps in the scaffolds with paired-end reads. BAC-by-BAC sequencing of 100,608 BAC clones was conducted to improve the quality of the genome assembly. Each sequenced BAC was assembled separately by SOAPdenovo. First, sequences shorter than 2,000 bp or having more than 30% unknown bases in BAC clones were discarded. The remaining sequences were then pooled together with the WGS scaffolds to extend sequences and collapse redundant ones.

To improve the WGS-based assembly, BAC-sequence data were included in the analysis using the Rabbit package⁵⁵. This package consists of three modules: Relation Finder, Overlapper and Redundancy Remover. In the first step, 40 bp at the end of each sequence were trimmed because sequence ends tend to be of lower quality. Then overlaps between sequences were detected by BLAT⁵⁶ with the minimum overlap length set to 3,000 bp. In the second module, for extension, overlaps with identity greater than 90% were merged and sequences were extended. To avoid duplicates in the final assembly, segmental duplications and divergent haplotypes were identified and filtered based on the Poisson-based k-mer model following methods described in Liu et al.⁵². To evaluate the assembly of the pearl millet genome, we first calculated the length and N50 distribution for the BAC sequences. The BAC lengths ranged from 80 to 140 kb, and their N50s from 10 to 40 kb (Supplementary Fig. 22). Gaps can occur in the fragmented BAC assemblies since the insert size of the paired-end reads is 500 bp. PacBio reads were processed using Blasr (processed with the PBJelly pipeline) to evaluate the assembled sequence.
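For readers unfamiliar with the N50 statistic used here and for the scaffolds, the sketch below shows one common way to compute it from a list of sequence lengths. The function name `n50` is illustrative, and the definition used (the length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total) is the usual convention rather than something specified in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First length at which at least half of all bases are covered.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because the three longest sequences (10 + 9 + 8 = 27) already account for half of the 54 total bases.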

GBS and SNP calling on mapping populations.

GBS libraries were prepared using restriction enzyme ApeKI as described by Elshire et al.³². The MAPPOP1 and MAPPOP2 populations were sequenced at 384-plex (that is, 384 samples per flowcell lane) on an Illumina HiSeq 2000, while the MAPPOP3 population was sequenced at 96-plex (96 samples per flowcell lane). SNPs were called using the TASSEL-GBS pipeline in TASSEL v4.1.32⁵⁷. The TASSEL-GBS pipeline incurs an overhead for each separate pseudomolecule processed, hence we concatenated the thousands of individual scaffolds into ∼20 megascaffolds to ease computation. Reads were processed into clean 64 bp “tags” and mapped against the reference scaffolds with Bowtie 2 (ref. 58). SNPs were called with the DiscoverySNPCallerPlugin in TASSEL, with minimal filters to reduce the number of false positives due to sequencing errors (minor allele frequency ≥ 0.01, minor allele count ≥ 10, genotype calls in at least 10% of samples) (Supplementary Code 1).
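The three SNP filters quoted above (minor allele frequency ≥ 0.01, minor allele count ≥ 10, genotype calls in at least 10% of samples) can be expressed as a simple per-site test. The sketch below is not the TASSEL implementation; it assumes diploid genotypes encoded as alternate-allele dosages (0, 1, 2) with `None` for missing calls, and the function name is hypothetical.

```python
def passes_snp_filter(dosages, min_maf=0.01, min_mac=10, min_call_rate=0.10):
    """Apply MAF, minor-allele-count and call-rate filters to one SNP.

    dosages : per-sample alternate-allele counts (0, 1 or 2), None if missing
              (assumed encoding, diploid samples)
    """
    called = [d for d in dosages if d is not None]
    if not dosages or not called:
        return False
    call_rate = len(called) / len(dosages)
    total_alleles = 2 * len(called)
    alt_count = sum(called)
    # The minor allele is whichever of the two alleles is rarer.
    minor_count = min(alt_count, total_alleles - alt_count)
    maf = minor_count / total_alleles
    return maf >= min_maf and minor_count >= min_mac and call_rate >= min_call_rate
```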

RAD sequencing.

Genomic DNA of each B- and R- individual was digested with EcoRI. After electrophoresis, DNA fragments of the desired lengths were gel purified. Adaptor ligation and DNA cluster preparation were performed and fragments were sequenced on an Illumina HiSeq 2000 platform. Similarly, 29 DNA libraries were constructed for B- and R- lines (580 samples) and sequenced using the RAD-Seq approach³³.

Genetic map construction.

SNPs called from the GBS data on three populations (MAPPOP1, MAPPOP2 and MAPPOP3) were first filtered for quality based on minor allele frequency, missingness and heterozygosity (Supplementary Code 2). Linkage groups were defined based on hierarchical clustering of SNPs and ordered with MSTMap. For each population, we created three maps: one from stringently filtered SNPs, one from moderately filtered SNPs, and one mapping GBS sequencing tags back to the stringently filtered map (Supplementary Code 2). The framework map generated in the largest RIL population (Tift 99B × Tift 454) formed the basis of an initial collinearity study between pearl millet and foxtail millet, and the resulting comparative knowledge was used to incorporate into the framework map additional scaffolds for which orthology to the foxtail millet genome had been established using BLASTP (to identify putative orthologous pearl millet and foxtail millet genes at an E-value threshold of 1e-5) and MCScanX⁵⁹ (to identify collinear segments of at least five syntenic genes between pearl millet and foxtail millet) analyses. The genetic maps generated for each of the crosses, and the map that we built based on collinearity information between pearl millet and foxtail millet, were merged using ALLMAPS⁶⁰ with the most weight assigned to the synteny map followed by the stringent SNP maps, the moderately filtered SNP maps, and finally the GBS sequencing tags (Supplementary Code 3). Linkage group numbering was adopted as per an existing consensus map¹⁷ based on mapping SSR sequences to the assembled genome (Supplementary Code 3).

Repeat annotation, gene prediction and genome annotation.

We searched the genome for tandem repeats with Tandem Repeats Finder⁶¹ (Version 4.04) (parameters: 2 7 7 80 10 50 2000 -d -h). Transposable elements (TEs) were identified in the genome by a combination of homology-based and de novo approaches⁶². For homology-based predictions, we used the repeat database Repbase16.10⁶³ to identify known repeats in the genome assembly with the program RepeatMasker⁶⁴ (Version 3.3.0) (parameter: -nolow -no_is -norna -parallel 1 -lib RepeatMaskerLib.embl.lib). At the protein level, RepeatProteinMask, a software in the RepeatMasker package, was used to perform RMBlast against the TE protein database (parameter: -noLowSimple -pvalue 0.0001). For de novo prediction, the programs RepeatModeler⁶⁵ (Version 1.0.5) and LTR_FINDER⁶⁶ (Version 1.0.5) were used on the entire genome to generate a pearl millet repeat database, which was subsequently used as input library with RepeatMasker (Version 3.3.0) to identify TEs.

For predicting genes, we applied several approaches: (i) Homology-based prediction: Proteins previously annotated in other species (Supplementary Table 9) were mapped to the genome using BLAT⁵⁶ (Version 34) with default parameters. Alignments in which the coverage of the query protein was less than 0.3 were removed. In addition, if there were multiple BLAT hits (BLAT output was set to the five best hits), secondary hits were removed if their aligned length was less than 0.3 of the aligned length of the top BLAT hit to filter paralogs with lower sequence identity. GeneWise⁶⁷ (with parameter -trev -sum -genesf) was used to predict spliced alignments. (ii) De novo gene prediction: AUGUSTUS⁶⁸ (Version 2.5.5, --species=maize --uniqueGeneId=true --noInFrameStop=true --gff3=on --strand=both) and Fgenesh⁶⁹ (Version 1.3) were used to detect gene models in the repeat-masked genome. (iii) Prediction based on transcript sequences: The assembled transcriptome sequences were aligned to the genome assembly using BLAT (Version 34) using the parameters identity ≥ 0.98 and coverage ≥ 0.98 to generate spliced alignments. (iv) Integration of evidence: Source evidence generated from the three approaches mentioned above was integrated using GLEAN⁷⁰ to produce a consensus gene set.

To annotate the function of the final gene models, protein sequences were aligned against KEGG⁷¹ (release 58) and SwissProt¹⁸ (release 20156) with BLASTP (E-value ≤ 1.0e-05) to find the best matches. InterProScan¹⁹ (Version4.8, performed with profilescan, blastprodom, hmmsmart, hmmpanther, hmmpfam, fprintscan and patternScan analysis) was used to identify motifs and domains in the proteins encoded by the gene models along with gene ontology annotations⁷². For ncRNA annotation, tRNA genes in the assembly were identified by tRNAscan-SE⁷³ (Version 1.23). rRNA genes were aligned with plant query sequences (rRNA from Arabidopsis and rice species) using BLASTN with an E-value threshold of 1.0e-05. Other non-coding RNAs, such as miRNAs and snRNAs were predicted by homology searches against the Rfam database⁷⁴ using the INFERNAL⁷⁵ (Version 0.81) software.

RNA seq data generation and development of transcriptome assembly.

The transcriptome sequence data were generated from two accessions, “9-8” and “3-9”, at IRD. Library preparation and sequencing (PE 100 bp) on an Illumina HiSeq 2000 platform were performed by Fasteris (Plan-les-Ouates, Switzerland). A total of 81,207,232 and 74,187,066 sequence reads were obtained for “3-9” and “9-8”, respectively. Adaptor sequences were trimmed and reads were processed for de novo assembly using Velvet 1.0.18⁷⁶ and then Oases 0.1.18⁷⁷. Several values of hash length were tested to optimize the assembly: 39, 51, 63, 65, 69 and 73. The assemblies obtained were compared for their ability to map raw reads using BWA⁷⁸. On this basis, we chose a hash length of 73. The transcript assembly was then searched for redundancy. Contigs sharing identity over ≥95% of the length of the shortest sequence in a set of putative homologous sequences were clustered. The final transcript assembly contained 50,313 contigs, with a total of 36,479,993 nucleotides. Three transcriptomes (Zeng et al.¹⁶, Rajaram et al.¹⁷, and the transcriptome data generated at IRD, France, available under BioProject ID PRJNA391885) were combined and clustered using CDHIT-EST⁷⁹ with default parameters to eliminate redundancy at the sequence level. Then, CAP3⁸⁰ was used to assemble the contigs. Ns on either end of the resultant contigs were trimmed. Finally, contigs of at least 200 bp in length were used in gene annotation.

Gene family and phylogenetic analysis.

For gene family analysis, BLASTP with an E-value cutoff of ≤ 1.0e-05 was used to compare all annotated pearl millet protein sequences against a protein data set of 10 sequenced plant species (Arabidopsis²⁰, Brachypodium²¹, banana²², barley²³, foxtail millet¹¹, maize²⁴, rice¹⁵, sorghum¹⁴, soybean²⁵ and T. urartu²⁶). The proteins were clustered using OrthoMCL²⁷ (--mode 3) to define gene families, which included both paralogs and orthologs. The number of gene families in each species and genus was calculated based on the composition of the OrthoMCL clusters. Genes that were single copy in an OrthoMCL cluster for all species analyzed were selected to construct a phylogenetic tree using the PhyML program⁸¹ (Version 3.0; parameters: -d nt -b -4 -m HKY85 -a e -c 4 -t e). Divergence times between pearl millet and other species were estimated using MCMCTREE⁸² with default parameters. To study gene family dynamics, the gene family size for each species was first calculated from the OrthoMCL output, and a rooted tree in Newick format was prepared. CAFE⁸³ (-p 0.05 -t 4 -r 10000 -filter) was then used to predict the expansion and contraction of gene families based on the phylogenetic tree and the gene family statistics.

Population analysis.

Population genetic analyses of the PMiGAP lines, including PCA and diversity detection, were conducted essentially as described for rice by Xu and colleagues⁸⁴. We used a subset of 450,000 SNPs with a missing rate <10% across PMiGAP lines and wild accessions. Briefly, for PCA, eigenvector decomposition of the SNP genotype data was calculated using the R function eigen⁸⁵. A Tracy-Widom test with default parameter settings was performed to determine the significance of axes using the twstats program. To build a phylogenetic tree, the percentage of pairwise nucleotide differences between individuals (p-distance) was calculated⁸⁵. The program fneighbor (PHYLIPNEW v3.69.650 within the package EMBOSS v6.6.0.0; parameter: -matrixtype s -treetype n) was used to construct a neighbor-joining tree. The resulting tree was edited and visualized using MEGA5⁸⁶ by choosing Radiation style. Population structure was assessed using the program Snmf (-k K -c)⁸⁷. Five runs were performed and the values with the smallest cross-entropy for K from 2 to 7 were selected to generate the structure graphs. To better assess the structure, we performed the analysis in a geographical context using TESS3⁸⁸, which takes the geographical coordinates of the samples into account. Furthermore, the population genetic diversity parameters π and θ_ω and the differentiation (F_ST) were calculated based on the SNP data as described earlier⁸⁵. To analyze diversity across the genome, we calculated π, θ_ω and F_ST in sliding windows of 100 kb for PMiGAP lines and wild accessions from the genotype data using BioPerl modules (Bio::PopGen::Statistics and Bio::PopGen::PopStats). The effective sequence length (without Ns) in each window was used as the denominator to calculate per-bp values. We then calculated the negative log of the ratio of diversity between cultivated and wild samples: -log(π cultivated/π wild). For this log ratio of diversity and for differentiation, we retained the most extreme values using a classical threshold of 95% for a one-sided test and a more stringent threshold of 99.5%. The latter, more stringent threshold was used to identify the most likely gene candidates selected during domestication. Loci with higher levels of differentiation (most extreme F_ST) and stronger loss of diversity in the cultivated compared to the wild accessions were considered to be provisionally involved in the domestication process.
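The window-based outlier scan described above can be sketched as follows. This is only an illustration under stated assumptions: the natural logarithm is used (the base is not specified in the text), the empirical threshold is taken as a nearest-rank quantile, windows are required to be extreme for both statistics, and all function names are hypothetical. Windows with zero diversity in either group would need to be excluded beforehand.

```python
import math

def log_diversity_ratio(pi_cultivated, pi_wild):
    """-log(pi_cultivated / pi_wild); positive when diversity is lost in cultivated lines."""
    return -math.log(pi_cultivated / pi_wild)

def empirical_quantile(values, q):
    """Nearest-rank empirical quantile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def candidate_windows(pi_cultivated, pi_wild, fst, q=0.995):
    """Indices of windows at or above the q-quantile for both statistics."""
    ratios = [log_diversity_ratio(c, w) for c, w in zip(pi_cultivated, pi_wild)]
    ratio_cut = empirical_quantile(ratios, q)
    fst_cut = empirical_quantile(fst, q)
    return [i for i, (r, f) in enumerate(zip(ratios, fst))
            if r >= ratio_cut and f >= fst_cut]
```

Calling `candidate_windows(..., q=0.95)` would correspond to the classical threshold, and the default `q=0.995` to the stringent one.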

Identification of NBS domain, TIR domain, LRR motif and CC motif.

All pearl millet proteins were assessed for the presence of NBS domains (PF00931, NB-ARC) using the Hidden Markov Model based method implemented in hmmsearch (version 3.0)⁸⁹ with an e-value cutoff = 1. To filter false positive hits, all identified NBS containing proteins were screened against the Pfam-A database. NBS domains that overlapped with other domains identified at lower e-values were filtered out. Likewise, the TIR domain (PF01582) was used as query against all pearl millet proteins with hmmsearch and further checked by looking at the overlapping domains. To detect LRR motifs, predicted NBS encoding proteins were searched against 10 LRR families in LRR clan (CL0022) with an e-value cutoff = 1. All regions predicted as LRR motifs and not overlapping with other domains identified with lower e-values were considered real LRR motifs.

SNP calling, structural variation and linkage disequilibrium (LD) decay.

Sequence reads generated for the B- and R- lines, PMiGAP lines, parental lines and wild lines were mapped separately to the pearl millet genome assembly using BWA (v0.6) (parameter: aln -n 0.04 -o 1 -e 30 -i 15 -d 10 -l 35 -k 2 -m 2000000 -t 4 -M 3 -O 11 -E 4 -R 30 -q 0 -I; sampe -a 500 -o 100000 -n 3 -N 10 -c 1.0e-05). The BAM files generated by BWA were sorted and provided as input to the GATK software package⁹⁰ (Version 3.1-1). The UnifiedGenotyper module within GATK was used to detect SNP variants. The variants were filtered using VariantFiltration, a module from GATK (parameters: QD < 2.0 || FS > 60.0 || MQ < 40.0 || HaplotypeScore > 13.0; parameters for indel: QD < 2.0 || FS > 200.0), and the distribution of variants across intergenic and coding regions was calculated. The data used in the downstream analysis were filtered at MAF ≥ 0.05 and missing rate ≤ 0.5. SNPs with a mean depth > 100 and missing rate > 0.5 were removed. The remaining SNPs were used in further analyses. Variants for the wild lines that were used in the population structure and domestication analyses were detected together with the PMiGAP accessions and processed with the same strategy (BAM and VCF files available at http://ceg.icrisat.org/ipmgsc/).

The BAM files from each resequenced accession were analyzed by Breakdancer (version 1.1.2)⁹¹ with default parameters to detect structural variation, namely deletions, insertions, inversions and intra-chromosomal translocations. Breakdancer results of accessions that come from the same line (see Supplementary Table 32) were combined to remove redundancy and to calculate the number and length of the rearrangements.

Using SNP data sets from PMiGAP lines, Haploview software⁹² (-maxdistance 250 -minMAF 0.05 -dprime -memory 5096) was used to calculate correlation coefficient (r²) values for LD. The average r² values between pairwise distances (bp) were calculated and figures were plotted using R.
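To make the r² statistic concrete, the following sketch computes it for one pair of biallelic SNPs from phased haplotypes coded 0/1. This is a simplification introduced for illustration: Haploview works from genotype data and estimates haplotype frequencies itself, which is not reproduced here, and the function name is hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci given phased 0/1 alleles per haplotype."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                       # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                       # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # coefficient of disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom
```

Averaging such values within bins of physical distance gives the kind of LD decay curve described above.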

Statistical analysis.

Phenotyping data and GWAS analysis. For establishing marker-trait associations, 288 test cross hybrids were generated by crossing PMiGAP lines as pollen parents with a common seed parent, ICMA 843-22. These hybrids were grouped by maturity (early, medium early, medium and late) and phenotyped for 20 morphological traits under two drought stress conditions (early and late stress) along with controls (no stress) for two years (2011, 2012). Experiments were conducted in alpha-lattice designs with two replications in three test environments during Summer 2011 and 2012 (January to May) in the red precision (RP) experimental fields at ICRISAT, Patancheru, Telangana, India (545 m above mean sea level, 17.53° N latitude and 78.27° E longitude). The early maturity group consisted of lines with days to 50% flowering (DFF) from 42-52 days; the medium-early maturity group consisted of lines with DFF from 53-57 days; the medium maturity group consisted of entries with DFF from 58-62 days; and the late maturity group consisted of lines with DFF of more than 62 days. Early drought stress is a more severe stress imposed by withholding irrigation from about one week before flowering until maturity. Late stress is a less severe drought stress initiated during early grain-filling by withholding irrigation from 50% flowering time until maturity.

The three test environments consisted of early-onset of stress, late-onset stress, and a common, fully-irrigated non-stress treatment. Drought stress was imposed by withholding irrigation from about one week before flowering in early-onset treatment, while drought stress in the late-onset treatment was imposed by withholding irrigation from 50% flowering. Data were recorded for a total of 20 traits namely, grain yield (GYHA), panicle yield (HYHA), panicle harvest index (PHI), time to 75% flowering (TB), plant height (PH), panicle length (EL), panicle diameter (ED), panicle number (HCHA), number of tillers per plant (Till), biomass yield (BM), grain harvest index (HI), thousand grain weight (TGW), grain number per panicle (GNP), grain number per m² (GNM2), agronomic score (AgS), stover dry matter fraction (DMF) and vegetative growth index (GI). PH, EL, and ED were measured on the main stems of five representative plants of each entry in a plot at maturity. At harvest, data were recorded from the harvested area on plant population (PCHA), panicle numbers (HCHA) and fresh stover yield (FSWTHA). Effective tiller number (Till) was calculated as the ratio HCHA/PCHA. HYHA, GYHA and TGW were recorded after oven drying for about 24 h. Stover dry matter yield (DMY) was estimated from plot FSWTHA using the fresh and dry weights of a chopped subsample of stover from each plot. BM was calculated as HYHA + DMY on a plot basis. Grain number per panicle (GNP) was derived from primary data as [(GYHA/HCHA)/ (TGW/1000)]. Grain harvest index was calculated as the ratio between grain yield and biomass yield at harvest, and panicle harvest index as the ratio between grain weight and panicle weight. Flowering time was recorded as days from seedling emergence to stigma emergence for 75% of the main shoots in a plot. 
The traits measured include grain yield (kg/ha), panicle yield (kg/ha), panicle harvest index (%), time to 50% flowering (number of days), plant height (cm), panicle length (cm), panicle diameter (cm), panicle number, tillers per plant, biomass yield (kg/ha), vegetative growth index (kg/ha/day), grain harvest index (%), fresh stover yield (t/ha), stover dry matter yield (kg/ha), stover dry matter fraction, 1000-grain mass (g), grain number per panicle, and grain number per m² (Supplementary Data set 2). Analysis of variance for all traits was performed using the PROC MIXED procedure in SAS 9.3 (SAS Institute Inc 2013) with Kenward-Roger degree of freedom approximation method considering replicates and accessions as fixed effects, whereas incomplete blocks within each replication were considered as random effects for combined intra and inter block analysis. Best linear unbiased estimates (BLUEs) were calculated for all accessions.
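The derived traits defined above (Till, BM, GNP, HI and PHI) follow directly from the primary plot data. The sketch below simply restates those formulas in Python; the function name and argument order are illustrative, the inputs are assumed to be in mutually consistent units, and PHI is computed here as GYHA/HYHA on the assumption that grain and panicle weights are taken on the same plot basis.

```python
def derived_traits(gyha, hyha, hcha, pcha, tgw, dmy):
    """Compute derived traits from primary plot measurements.

    gyha : grain yield            hyha : panicle yield
    hcha : panicle number         pcha : plant population
    tgw  : thousand grain weight  dmy  : stover dry matter yield
    """
    bm = hyha + dmy                           # biomass yield, BM = HYHA + DMY
    return {
        "Till": hcha / pcha,                  # effective tillers per plant
        "BM": bm,
        "GNP": (gyha / hcha) / (tgw / 1000),  # grain number per panicle
        "HI": gyha / bm,                      # grain harvest index (ratio)
        "PHI": gyha / hyha,                   # panicle harvest index (ratio)
    }
```

HI and PHI are returned as ratios; multiplying by 100 gives the percentages listed among the measured traits.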

For GWAS analysis, a total of 3,117,056 SNPs retained after filtering out SNPs with minor allele frequency below 0.05 (MAF<0.05) or with more than 20% missing data were used. Marker-trait associations were established using an AOV model with a block effect for maturity group in R (Phenotype∼Bloc+SNP). We tested the suitability of the model by plotting the observed P-values from the association test against an expected (cumulative) probability distribution. These quantile-quantile (q-q) plots indicated that we corrected properly for population stratification (Supplementary Fig. 23). Significance of associations between loci and traits was determined by adjusting for multiple testing using FDR at a 0.001 threshold level and considering only P values lower than 10⁻¹⁰.
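The combined significance criterion (FDR at 0.001 together with P < 10⁻¹⁰) could be applied as in the sketch below. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_associations(p_values, fdr=0.001, p_cutoff=1e-10):
    """Indices of tests passing both the FDR level and the raw P-value cutoff."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, (p, q) in enumerate(zip(p_values, q_values))
            if q <= fdr and p < p_cutoff]
```

With millions of SNPs, the raw cutoff of 10⁻¹⁰ is usually the more restrictive of the two conditions.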

Genomic prediction analysis for testcross performance.

Grain yield performance of 259 PMiGAP lines was used for hybrid prediction analysis. In our analysis, flowering time was considered as a cofactor. For genomic prediction analysis, we performed a one-stage phenotypic data analysis on 259 PMiGAP lines as test cross hybrid trials using a linear mixed model that included genotype, flowering time, year, stress, interaction among genotype, stress and year, replication, incomplete block and residual effects. The effect of flowering time was always assumed to be fixed. When estimating variance components, all other effects were assumed to be random. To get the BLUE of each line, we set the genotype effect as fixed.

The heritability on the line mean basis was estimated as $h^2 = \sigma_G^2 / \left(\sigma_G^2 + \sigma_{G \times Y}^2/y + \sigma_{G \times S}^2/s + \sigma_{G \times Y \times S}^2/(ys) + \sigma_e^2/(ysr)\right)$, where $\sigma_G^2$, $\sigma_{G \times Y}^2$, $\sigma_{G \times S}^2$, $\sigma_{G \times Y \times S}^2$ and $\sigma_e^2$ are variance components arising from genotype, genotype × year interaction, genotype × stress interaction, the three-way interaction and the residual, respectively, and y, s and r are the number of different years, stresses and replications. In addition, we calculated the BLUE for each genotype in each environment (stress versus control) across years. That is, for each environment we fitted a linear mixed model including genotype, flowering time, year, genotype × year interaction, replication, incomplete block and residual effects. The assumptions about the parameters were similar to those above. The heritability in this case was estimated as $h^2 = \sigma_G^2 / \left(\sigma_G^2 + \sigma_{G \times Y}^2/y + \sigma_e^2/(yr)\right)$. All phenotypic data analyses were done using the ASReml-R 3 software⁹³.

A total of 2,235,060 SNPs with <20% missing rates were used with the above-mentioned phenotyping data for genomic prediction analysis. We used the genomic best linear unbiased prediction (G-BLUP) model for genomic selection: $y = 1_n \mu + g + e$, where y refers to the n-dimensional vector of phenotypic records, 1_n is an n-dimensional vector of ones, $\mu$ is the mean, g is an n-dimensional vector of additive genotypic values and e is an n-dimensional vector of residual terms.

In the model we assume that $\mu$ is a fixed parameter, and g, e are random parameters with $g \sim N(0, G\sigma_g^2)$ and $e \sim N(0, I_n\sigma_e^2)$, where G denotes the n × n genomic relationship matrix and I_n the n × n identity matrix. G was calculated as follows: Let X = (x_ij) be the n × p matrix of SNP markers, where x_ij equals the number of a chosen allele at the j^th locus for the i^th genotype. Let p_j be the allele frequency of the j^th marker. W = (w_ij) is an n × p matrix with w_ij = x_ij − 2p_j.

Then we have $G = WW^{\top} / \left(2\sum_{j=1}^{p} p_j(1 - p_j)\right)$. Note that when calculating the kinship coefficient for two genotypes, only those markers without missing values in both genotypes were considered.
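A minimal pure-Python sketch of the genomic relationship matrix is given below. It assumes the widely used scaling of $WW^{\top}$ by $2\sum_j p_j(1-p_j)$ and, following the note above, uses for each pair of genotypes only the markers observed in both; restricting the scaling sum to those shared markers is an additional assumption made here. Missing calls are encoded as `None` and the function name is hypothetical.

```python
def genomic_relationship_matrix(X):
    """Compute G from an n x p matrix of allele counts (0/1/2, None = missing)."""
    n, p = len(X), len(X[0])
    # Allele frequency p_j of the counted allele, from non-missing calls only.
    freqs = []
    for j in range(p):
        col = [row[j] for row in X if row[j] is not None]
        freqs.append(sum(col) / (2 * len(col)) if col else None)
    G = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):
            num = den = 0.0
            for j in range(p):
                xa, xb, pj = X[a][j], X[b][j], freqs[j]
                if xa is None or xb is None or pj is None:
                    continue  # use only markers observed in both genotypes
                num += (xa - 2 * pj) * (xb - 2 * pj)   # w_aj * w_bj
                den += 2 * pj * (1 - pj)
            G[a][b] = G[b][a] = num / den if den > 0 else float("nan")
    return G
```

For millions of markers a vectorized implementation would be needed; the nested loops here are only meant to make each term explicit.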

The accuracy of genomic prediction was evaluated by fivefold cross-validation with a total of 100 cross-validation runs. The cross-validated prediction accuracy was calculated as the Pearson product-moment correlation between predicted and observed genotypic values of the lines in the test set. The GBLUP model was implemented using the R software⁹⁴.
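The cross-validation scheme can be outlined as below. The sketch is generic: `fit_predict` is a hypothetical callback that would wrap the G-BLUP fit on the training lines and return predictions for the test lines, and "runs" is interpreted here as the number of repetitions of the random fivefold split, which is one possible reading of the text.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cross_validated_accuracy(y, fit_predict, k=5, runs=100, seed=0):
    """Mean correlation between predicted and observed values over k-fold CV runs.

    fit_predict(train_idx, test_idx) must return predictions for test_idx.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]                 # every k-th shuffled index
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            preds = fit_predict(train, test)
            accuracies.append(pearson(preds, [y[i] for i in test]))
    return mean(accuracies)
```

The correlation is undefined when a fold has no variation in predicted or observed values, so very small test sets should be avoided.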

Hybrid prediction analysis.

Grain yield of 64 pearl millet hybrids grown at five locations in India (Jamnagar, Anand, SK Nagar, Mahuva and Kothara) was measured from 2004 to 2013. In 2004, 2005, 2006, 2008, 2011 and 2012, trials were conducted in the kharif, summer and pre-rabi seasons; in 2007, 2009, 2010 and 2013, trials were conducted in the kharif and summer seasons only. We used a randomized block design with 60 cm between rows and 10–15 cm between plants, and followed standard agronomic practices. The 64 hybrids were generated by crossing 20 male and 23 female lines.

Using the grain yield data for the 64 hybrids described above, we fitted the following linear mixed model to estimate the variance components and the BLUEs:

Yield ∼ Genotype + Replication.

To estimate variance components, all effects were treated as random. The BLUEs for each environment were calculated with the same mixed model but with genotype modelled as a fixed effect. Repeatability was estimated as $\sigma_G^2 / \left(\sigma_G^2 + \sigma_e^2/N_R\right)$, where N_R is the number of replications, $\sigma_G^2$ is the genetic variance and $\sigma_e^2$ is the residual variance. Four environments with repeatability lower than 0.5 were removed from further analysis. The BLUEs of the 64 hybrids in each environment were used for an analysis across environments by fitting the following model:

Yield ∼ Genotype + Environment.
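
The repeatability filter applied before the across-environment analysis can be sketched as below, assuming repeatability is the genetic variance divided by the sum of the genetic variance and the residual variance over the number of replications. The environment names and variance components are hypothetical.

```python
def repeatability(var_g, var_e, n_rep):
    """Repeatability = var_g / (var_g + var_e / n_rep)."""
    return var_g / (var_g + var_e / n_rep)


# Hypothetical per-environment values: (genetic variance, residual variance, replications).
environments = {
    "env_A": (2.0, 3.0, 3),
    "env_B": (0.5, 4.0, 2),
}

# Keep only environments whose repeatability reaches the 0.5 threshold.
kept = {name: repeatability(*vc) for name, vc in environments.items()
        if repeatability(*vc) >= 0.5}
print(sorted(kept))
```

Here the second environment would be dropped, because its residual variance is large relative to its genetic variance.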

The genotype effects were treated as fixed and the environment effects as random. The distribution of the BLUEs across environments approximated a normal distribution. The variance components of genotypes ($\sigma_G^2$), genotype × environment interactions ($\sigma_{G \times E}^2$) and the residuals ($\sigma_e^2$) were estimated using a one-step model. Broad-sense heritability was then calculated as the ratio of genotypic to phenotypic variance:

$h^2 = \sigma_G^2 / \left(\sigma_G^2 + \sigma_{G \times E}^2/l + \sigma_e^2/(lr)\right)$,

where l refers to the number of environments and r is the average number of replications per environment. The hybrid prediction was based on 302,110 high-quality SNP markers obtained from 580 B- and R-lines. We used ridge regression BLUP considering additive and dominance effects to predict hybrid performance. Details of the implementation of the models have been described earlier⁹⁵. Briefly, with y denoting the vector of phenotypic values of the hybrids, the general form of the model is:

$y = 1_n \mu + Z_A a + Z_D d + e$,

where 1_n is an n-dimensional vector of ones, n is the number of hybrids and μ is the overall mean across all four locations. Z_A and Z_D are n × m design matrices for the additive and dominance effects of the markers, where m is the number of markers. The elements of Z_A are −1, 0 or 1, and the elements of Z_D are 0 or 1. a = (a₁, a₂, ..., a_m)^T and d = (d₁, d₂, ..., d_m)^T are vectors of length m, where a_i and d_i denote the additive and dominance effects of the i^th marker, respectively. e = (e₁, e₂, ..., e_n)^T is a vector of length n, where e_j is the residual for the j^th hybrid.
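
To make the additive and dominance coding concrete, the sketch below derives one row of Z_A and Z_D for a hybrid from its two parents. It assumes fully homozygous parents coded as 0 or 2 copies of a chosen allele, an additive code equal to the hybrid allele count minus one, and a dominance code of 1 for heterozygous loci. The function names and marker effects are hypothetical.

```python
def hybrid_design(female, male):
    """Additive and dominance codes of a single-cross hybrid.

    female and male are marker profiles of inbred parents, coded as the
    count (0 or 2) of a chosen allele. Heterozygous parental calls are
    not handled in this sketch.
    """
    z_a, z_d = [], []
    for f, m in zip(female, male):
        count = (f + m) // 2                # allele count in the hybrid: 0, 1 or 2
        z_a.append(count - 1)               # additive code: -1, 0 or 1
        z_d.append(1 if count == 1 else 0)  # dominance code: 1 if heterozygous
    return z_a, z_d


def predict_hybrid(mu, z_a, z_d, a, d):
    """Predicted performance mu + Z_A a + Z_D d for one hybrid."""
    additive = sum(z * e for z, e in zip(z_a, a))
    dominance = sum(z * e for z, e in zip(z_d, d))
    return mu + additive + dominance


# Toy example with three markers and made-up marker effects.
z_a, z_d = hybrid_design(female=[2, 0, 2], male=[2, 2, 0])
y_hat = predict_hybrid(10.0, z_a, z_d, a=[0.5, 0.2, -0.1], d=[0.3, 0.4, 0.1])
```

In the model itself the effect vectors a and d are estimated by ridge regression from the training hybrids; here they are supplied directly to show how the design rows enter the prediction.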

Prediction accuracy was assessed by cross-validation. In each cross-validation run, 48 hybrids were randomly selected as the training set and the remaining 16 hybrids were used as the test set. Cross-validation was run 500 times, and accuracy was estimated as the Pearson correlation coefficient between predicted and observed values, standardized by the square root of the heritability (h = 0.76). Next, we used all 64 hybrids as a training set and predicted the hybrid performance of 167,910 possible single-cross combinations among the 580 inbred lines (260 B-lines and 320 R-lines). Based on the predicted values, we selected the top 0.1% of hybrids with the highest predicted yields (170 of 167,910 hybrids). Of those 170 hybrids, 11 have been bred so far and are thus a subset of the 64 phenotyped hybrids. The remaining 159 hybrids are based on parental inbred lines that have never been used for hybrid breeding and could be tested in the field. All analyses were done using the ASReml-R 3 software⁹³.
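
The counts quoted above can be checked with a few lines of arithmetic. The sketch assumes that the 167,910 combinations are all unordered pairs among the 580 lines and that the standardized accuracy is the raw correlation divided by h; the correlation value used is hypothetical.

```python
import math

n_lines = 580
n_crosses = math.comb(n_lines, 2)   # unordered pairs of distinct lines
n_top = 0.001 * n_crosses           # size of the top 0.1% before rounding


def standardized_accuracy(r, h):
    """Correlation between predicted and observed values divided by h."""
    return r / h


print(n_crosses)                    # 167910
print(standardized_accuracy(0.38, 0.76))
```

The pair count matches the reported 167,910, and 0.1% of it is about 168, consistent with the rounded figure of 170 hybrids. By comparison, crossing only the 260 B-lines with the 320 R-lines would give 260 × 320 = 83,200 combinations.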

Data availability.

Genome sequence assembly and annotation data: BioProject ID PRJNA294988; BioSample ID SAMN04124419. Resequencing data: SRA SRP063925. Transcriptome data: BioProject ID PRJNA391885. BAM and SNP files are available at http://ceg.icrisat.org/ipmgsc. GigaScience Database record: http://dx.doi.org/10.5524/100192. Scripts used in the manuscript are available at https://github.com/ICRISAT-CEG/PM-Scripts.git.

A Life Sciences Reporting Summary is available.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Accession codes

Primary accessions

Sequence Read Archive: SRP063925

Change history

28 February 2018
In the version of this article initially published, in the HTML, the wrong Creative Commons Attribution license (cc-by-nc rather than cc-by) was inserted. The error has been corrected in the HTML version of the article.

References

1. National Research Council (NRC). Advancing the science of climate change (The National Academies Press, Washington, DC, 2010).
2. FAO. http://www.fao.org/fileadmin/templates/wsfs/docs/expert_paper/How_to_Feed_the_World_in_2050.pdf (2009).
3. Beddington, J. et al. Achieving food security in the face of climate change. Final report from the Commission on Sustainable Agriculture and Climate Change. Copenhagen, CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS). (available at http://www.ccafs.cgiar.org/commission) (2012).
4. FAO. World hunger falls, but 805 million still chronically undernourished. http://www.fao.org/news/story/en/item/243839/icode/ (2014).
5. Vadez, V., Hash, T., Bidinger, F.R. & Kholova, J. II 1.5 Phenotyping pearl millet for adaptation to drought. Front. Physiol. 3, 386 (2012).
6. Nambiar, V.S., Dhaduk, J.J., Sareen, N., Shahu, T. & Desai, R. Potential functional implications of pearl millet (Pennisetum glaucum) in health and disease. J. Appl. Pharm. Sci. 01, 62–67 (2011).
7. Tako, E., Reed, S.M., Budiman, J., Hart, J.J. & Glahn, R.P. Higher iron pearl millet (Pennisetum glaucum L.) provides more absorbable iron that is limited by increased polyphenolic content. Nutr. J. 14, 11 (2015).
8. Gupta, S.K. et al. Seed set variability under high temperatures during flowering period in pearl millet (Pennisetum glaucum L. (R.) Br.). Field Crops Res. 171, 41–53 (2015).
9. Yadav, O.P. & Rai, K.N. Genetic improvement of pearl millet in India. Agric. Res. 2, 275–292 (2013).
10. Liu, C.J. et al. An RFLP-based genetic map of pearl millet (Pennisetum glaucum). Theor. Appl. Genet. 89, 481–487 (1994).
11. Bennetzen, J.L. et al. Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30, 555–561 (2012).
12. Liu, R. & Bennetzen, J.L. Enchilada redux: how complete is your genome sequence? New Phytol. 179, 249–250 (2008).
13. Al-Dous, E.K. et al. De novo genome sequencing and comparative genomics of the date palm (Phoenix dactylifera). Nat. Biotechnol. 29, 521–527 (2011).
14. Paterson, A.H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
15. Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92 (2002).
16. Zeng, Y., Conner, J. & Ozias-Akins, P. Identification of ovule transcripts from the Apospory-Specific Genomic Region (ASGR)-carrier chromosome. BMC Genomics 12, 206 (2011).
17. Rajaram, V. et al. Pearl millet [Pennisetum glaucum (L.) R. Br.] consensus linkage map constructed using four RIL mapping populations and newly developed EST-SSRs. BMC Genomics 14, 159 (2013).
18. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
19. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).
20. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
21. International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).
22. D'Hont, A. et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488, 213–217 (2012).
23. Mayer, K.F. et al. A physical, genetic and functional sequence assembly of the barley genome. Nature 491, 711–716 (2012).
24. Schnable, P.S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009).
25. Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
26. Ling, H.Q. et al. Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 496, 87–90 (2013).
27. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
28. Seo, P.J. et al. The MYB96 transcription factor regulates cuticular wax biosynthesis under drought conditions in Arabidopsis. Plant Cell 23, 1138–1152 (2011).
29. Zhu, X. & Xiong, L. Putative megaenzyme DWA1 plays essential roles in drought resistance by regulating stress-induced wax deposition in rice. Proc. Natl. Acad. Sci. USA 110, 17790–17795 (2013).
30. Hash, C.T. & Witcombe, J.R. Pearl millet molecular marker research. Internatl. Sorghum Millets Newslett. 42, 8–15 (2001).
31. Sehgal, D. et al. Exploring potential of pearl millet germplasm association panel for association mapping of drought tolerance traits. PLoS One 10, e0122165 (2015).
32. Elshire, R.J. et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6, e19379 (2011).
33. Miller, M.R., Dunham, J.P., Amores, A., Cresko, W.A. & Johnson, E.A. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res. 17, 240–248 (2007).
34. Thiel, T., Michalek, W., Varshney, R.K. & Graner, A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422 (2003).
35. Oumar, I., Mariac, C., Pham, J.L. & Vigouroux, Y. Phylogeny and origin of pearl millet (Pennisetum glaucum [L.] R. Br) as revealed by microsatellite loci. Theor. Appl. Genet. 117, 489–497 (2008).
36. Manning, K., Pelling, R., Higham, T., Schwenniger, J.C. & Fuller, D.Q. 4500-Year old domesticated pearl millet (Pennisetum glaucum) from the Tilemsi Valley, Mali: new insights into an alternative cereal domestication pathway. J. Archaeol. Sci. 38, 312–322 (2011).
37. Amblard, S. & Pernès, J. The identification of cultivated pearl millet (Pennisetum) amongst plant impressions on pottery from Oued Chebbi (Dhar Oualata, Mauritania). Afr. Archaeol. Rev. 7, 117–126 (1989).
38. Klee, M., Zach, B. & Neumann, K. Four thousand years of plant exploitation in the Chad Basin of northeast Nigeria I: The archaeobotany of Kursakata. Veg. Hist. Archaeobot. 9, 223–237 (2000).
39. Kahlheber, S., Bostoen, K. & Neumann, K. Early plant cultivation in the central African rain forest. First millennium BC pearl millet from south Cameroon. J. Afr. Archaeol. 7, 253–272 (2009).
40. Fuller, D., Korisettar, R., Venkatasubbaiah, P.C. & Jones, M.K. Early plant domestications in southern India: some preliminary archaeobotanical results. Veg. Hist. Archaeobot. 13, 115–129 (2004).
41. Poncet, V. et al. Genetic control of domestication traits in pearl millet (Pennisetum glaucum L., Poaceae). Theor. Appl. Genet. 100, 147–159 (2000).
42. Poncet, V. et al. Comparative analysis of QTLs affecting domestication traits between two domesticated x wild pearl millet (Pennisetum glaucum L., Poaceae) crosses. Theor. Appl. Genet. 104, 965–975 (2002).
43. McSteen, P. et al. barren inflorescence2 Encodes a co-ortholog of the PINOID serine/threonine kinase and is required for organogenesis during inflorescence and vegetative development in maize. Plant Physiol. 144, 1000–1011 (2007).
44. Pressoir, G. et al. Natural variation in maize architecture is mediated by allelic differences at the PINOID co-ortholog barren inflorescence2. Plant J. 58, 618–628 (2009).
45. Chia, J.M. et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat. Genet. 44, 803–807 (2012).
46. Riedelsheimer, C. et al. Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat. Genet. 44, 217–220 (2012).
47. Longin, C.F., Mi, X. & Würschum, T. Genomic selection in wheat: optimum allocation of test resources and comparison of breeding strategies for line and hybrid breeding. Theor. Appl. Genet. 128, 1297–1306 (2015).
48. Zhao, Y. et al. Genome-based establishment of a high-yielding heterotic pattern for hybrid wheat breeding. Proc. Natl. Acad. Sci. USA 112, 15624–15629 (2015).
49. Jiao, W.B. et al. Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data. Genome Res. 27, 778–786 (2017).
50. Jiao, Y. et al. Genome-wide genetic changes during modern breeding of maize. Nat. Genet. 44, 812–815 (2012).
51. Wu, J. et al. The genome of the pear (Pyrus bretschneideri Rehd.). Genome Res. 23, 396–408 (2013).
52. Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Preprint at https://arxiv.org/abs/1308.2012 (2013).
53. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
54. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
55. You, M. et al. A heterozygous moth genome provides insights into herbivory and detoxification. Nat. Genet. 45, 220–225 (2013).
56. Kent, W.J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
57. Glaubitz, J.C. et al. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One 9, e90346 (2014).
58. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
59. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
60. Tang, H. et al. ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol. 16, 3 (2015).
61. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
62. Varshney, R.K. et al. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat. Biotechnol. 31, 240–246 (2013).
63. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
64. Smit, A.F.A., Hubley, R. & Green, P. RepeatMasker Open-3.0 1996–2010 http://www.repeatmasker.org (1996).
65. Smit, A.F.A. & Hubley, R. RepeatModeler Open-1.0 2008–2015 http://www.repeatmasker.org (2008).
66. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
67. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
68. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
69. Salamov, A.A. & Solovyev, V.V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516–522 (2000).
70. Elsik, C.G. et al. Creating a honey bee consensus gene set. Genome Biol. 8, R13 (2007).
71. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
72. Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
73. Lowe, T.M. & Eddy, S.R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
74. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
75. Nawrocki, E.P., Kolbe, D.L. & Eddy, S.R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
76. Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
77. Schulz, M.H., Zerbino, D.R., Vingron, M. & Birney, E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28, 1086–1092 (2012).
78. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
79. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
80. Huang, X. & Madan, A. CAP3: A DNA sequence assembly program. Genome Res. 9, 868–877 (1999).
81. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
82. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
83. De Bie, T., Cristianini, N., Demuth, J.P. & Hahn, M.W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271 (2006).
84. Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat. Biotechnol. 30, 105–111 (2011).
85. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
86. Tamura, K. et al. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739 (2011).
87. Frichot, E., Mathieu, F., Trouillon, T., Bouchard, G. & François, O. Fast and efficient estimation of individual ancestry coefficients. Genetics 196, 973–983 (2014).
88. Caye, K., Deist, T.M., Martins, H., Michel, O. & François, O. TESS3: fast inference of spatial population structure and genome scans for selection. Mol. Ecol. Resour. 16, 540–548 (2016).
89. Eddy, S.R. Profile hidden Markov models. Bioinformatics 14, 755–763 (1998).
90. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
91. Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods 6, 677–681 (2009).
92. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
93. Butler, D.G., Cullis, B.R., Gilmour, A.R. & Gogel, B.J. ASReml-R Reference Manual. Technical report, Queensland Department of Primary Industries. http://www.vsni.co.uk/software/asreml/ (2009).
94. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/ (2014).
95. Zhao, Y., Zeng, J., Fernando, R. & Reif, J.C. Genomic prediction of hybrid wheat performance. Crop Sci. 53, 802–810 (2013).


Acknowledgements

We are thankful to several colleagues and collaborators, especially X. Tan from The University of Georgia and C.T. Satyavathi from the ICAR-All India Coordinated Research Project on Pearl Millet, for their help in the analysis and interpretation of some data. This study was supported in part by the Bill and Melinda Gates Foundation, USA (Grant ID# OPP1052922), Agence Nationale de la Recherche, France (Grant ID: ANR-13-BSV7-0017), and the Basic Research Program of the Shenzhen Municipal Government, China (No. JCYJ20150529150505656). This work has been undertaken as part of the CGIAR Research Program on Dryland Cereals, ICRISAT, India. ICRISAT is a member of the CGIAR.

Author information

Rajeev K Varshney and Chengcheng Shi: These authors contributed equally to this work.

Authors and Affiliations

International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana State, India
Rajeev K Varshney, Mahendar Thudi, Abhishek Rathore, Rakesh K Srivastava, Annapurna Chitikineni, Prasad Bajaj, S K Gupta, Mohan A V S K Katta, Vanika Garg, Dadakhalandar Doddamani, Hari D Upadhyaya & Stefania Grando
BGI-Shenzhen, Shenzhen, China
Chengcheng Shi, He Zhang, Guangyi Fan, Wenbin Chen, Xinming Liang, Shifeng Cheng, Jun Wang, Xin Liu & Xun Xu
Institut de recherche pour le développement (IRD), Montpellier, France
Cedric Mariac, Marie Couderc, Cécile Berthouly-Salazar, Jérémy Clotault, Philippe Cubry, Bénédicte Rhoné & Yves Vigouroux
University of Georgia, Athens, Georgia, USA
Jason Wallace, Peng Qi, Xiyin Wang, Katrien M Devos, Jeffrey L Bennetzen & Andrew H Paterson
Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
Yusheng Zhao, Yong Jiang & Jochen C Reif
Fort Valley State University, Fort Valley, Georgia, USA
Somashekhar Punnuri & Bharat Singh
Cornell University, Ithaca, New York, USA
Hao Wang & Edward Buckler
University of Florida, Gainesville, Florida, USA
Dev R Paudel & Jianping Wang
Junagadh Agricultural University, Jamnagar, Gujarat, India
K D Mungra
United States Department of Agriculture—Agricultural Research Service (USDA-ARS), Tifton, Georgia, USA
Karen R Harris-Shultz
Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
Neetin Desai, Arindam Ghatak, Palak Chaturvedi & Wolfram Weckwerth
Amity University, Mumbai, Maharashtra, India
Neetin Desai
Institut Sénégalais de Recherches Agricoles (ISRA), Dakar, Senegal
Ndjido Ardo Kane & Mame Codou Gueye
University of Georgia, Tifton, Georgia, USA
Joann A Conner & Peggy Ozias-Akins
School of Bioinformatics and Biotechnology, D.Y. Patil University, Mumbai, Maharashtra, India
Arindam Ghatak
University of Arizona, Tucson, Arizona, USA
Sabarinath Subramaniam & Eric Lyons
Phoenix Bioinformatics, Redwood City, California, USA
Sabarinath Subramaniam
Indian Council of Agricultural Research (ICAR)—Central Arid Zone Research Institute (CAZRI), Jodhpur, Rajasthan, India
Om Parkash Yadav
Laboratoire Mixte International Adaptation des Plantes et Microorganismes Associés aux Stress Environnementaux, Centre de Recherche de Bel Air, Dakar, Senegal
Cécile Berthouly-Salazar
ICRISAT Sahelian Center, Niamey, Niger
Falalou Hamidou & C Tom Hash
Faculty of Sciences and Techniques, University Abdou Moumouni, Niamey, Niger
Falalou Hamidou
University of Montpellier, Montpellier, France
Jérémy Clotault & Yves Vigouroux
Laboratoire de biométrie et Biologie Evolutive, Université Lyon 1, Villeurbanne, France
Bénédicte Rhoné
Oklahoma State University, Stillwater, Oklahoma, USA
Ramanjulu Sunkar
Institut des Mondes Africains (IMAf), Paris, France
Christian Dupuy
CNR-Consiglio Nazionale delle Ricerche, Istituto di Biologia e Biotecnologia Agraria, Milan, Italy
Francesca Sparvoli
Pioneer Hi-Bred Private Limited, Hyderabad, Telangana State, India
R S Mahala
Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Ceredigion, UK
Rattan S Yadav
Visva-Bharati, Santiniketan, West Bengal, India
Swapan K Datta
USDA-ARS, Ithaca, New York, USA
Edward Buckler
Indian Council of Agricultural Research (ICAR), New Delhi, India
Trilochan Mohapatra
Vienna Metabolomics Center (VIME), University of Vienna, Vienna, Austria
Wolfram Weckwerth
BGI-Qingdao, Qingdao, China
Xin Liu & Xun Xu
China National GeneBank (CNGB), Shenzhen, China
Xun Xu

Contributions

R.K.V. conceived and designed the experiments; R.K.V., X.L., M.T. and Y.V. jointly supervised research; C.S., M.T., C.M., Ji.W., H.Z., A.G., P.C., Y.Z., X.W., R.K.S., G.F., Y.J., M.C., D.R.P., W.C., K.R.H.-S., N.D., C.B.-S., Xm.L., J.C., C.D. and S.C. performed the experiments; A.R., J.C.R., Y.V. and C.S. performed statistical analysis; M.T., Ji.W., A.R., P.B., A.G., Ph.C., V.G., D.D., M.A.V.S.K.K., H.W., J.A.C., P.Q., K.M.D., P.C. and B.R. analyzed and interpreted the data; R.K.S., S.K.G., W.W., Y.V., X.X., A.C., M.T., S.P., B.S., W.J., Ju.W., E.L., K.D.M., S.S., H.D.U., M.C.G., R.S., C.T.H., A.H.P., K.M.D., E.B., J.L.B., P.O.-A., F.H., M.C.G., S.G., R.S.M., R.S.Y., F.S., N.A.K., O.P.Y., S.K.D. and T.M. contributed to reagents/materials/analysis tools; R.K.V., M.T., X.L., Y.V., W.W., Ju.W., Ja.W., K.M.D. and J.L.B. wrote the manuscript.

Corresponding authors

Correspondence to Rajeev K Varshney, Xin Liu, Yves Vigouroux or Xun Xu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–23. (PDF 4617 kb)

Life Sciences Reporting Summary

Life Sciences Reporting Summary. (PDF 114 kb)

Supplementary Tables

Supplementary Tables 1–14, 16–18, 22, 24, 28, 32–33, 42. (PDF 1754 kb)

Supplementary Table 15

Summary of genes expanded during pearl millet evolution. (XLSX 12 kb)

Supplementary Table 19

Summary of data generated on the PMiGAP lines using whole genome sequencing. (XLSX 30 kb)

Supplementary Table 20

Data generated on 38 inbred parents of different mapping populations using whole genome resequencing. (XLSX 11 kb)

Supplementary Table 21

Data generated for B- and R-lines of pearl millet using RAD-Seq approach. (XLSX 41 kb)

Supplementary Table 23

Summary of SSR motifs identified, primers designed and their genome coordinates. (XLSX 8049 kb)

Supplementary Table 25

Distribution of SNPs in intra-genic and inter-genic regions across PMiGAP lines. (XLSX 46 kb)

Supplementary Table 26

Distribution of SNPs in intra-genic and inter-genic regions across parental lines of mapping populations. (XLSX 15 kb)

Supplementary Table 27

Distribution of SNPs in intra-genic and inter-genic regions across B- and R-lines. (XLSX 69 kb)

Supplementary Table 29

Insertions and deletions identified in the PMiGAP lines. (XLSX 10 kb)

Supplementary Table 30

Insertions and deletions identified in the parental lines of mapping populations. (XLSX 10 kb)

Supplementary Table 31

Insertions and deletions identified in B- and R-lines. (XLSX 10 kb)

Supplementary Table 34

Regions with loss of diversity and strong differentiation between wild and cultivated pearl millet. (XLSX 101 kb)

Supplementary Table 35

List of the genes found in the regions showing strong differentiation between wild and cultivated germplasm and diversity loss in cultigen. (XLSX 15 kb)

Supplementary Table 36

Genome-wide marker-trait associations for grain and stover yield. (XLSX 109 kb)

Supplementary Table 37

Best 170 predicted hybrid combinations. (XLSX 15 kb)

Supplementary Table 38

Best 11 tested hybrid combinations. (XLSX 10 kb)

Supplementary Table 39

Pedigree details of B- and R-lines used in the study. (XLSX 31 kb)

Supplementary Table 40

Details of 345 Pearl Millet Inbred Germplasm Association Panel (PMiGAP) lines used in the study. (XLSX 35 kb)

Supplementary Table 41

Details of 38 parental lines of mapping populations of pearl millet used in the study. (XLSX 12 kb)

Supplementary Code 1 (ZIP 1 kb)

Supplementary Code 2 (ZIP 56 kb)

Supplementary Code 3 (ZIP 26 kb)

Supplementary Dataset 1 (XLSX 132 kb)

Supplementary Dataset 2 (XLSX 347 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International licence. The images or other third-party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Varshney, R., Shi, C., Thudi, M. et al. Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments. Nat Biotechnol 35, 969–976 (2017). https://doi.org/10.1038/nbt.3943

Received: 18 April 2017
Accepted: 17 July 2017
Published: 18 September 2017
Issue Date: October 2017
DOI: https://doi.org/10.1038/nbt.3943
