Pangenomic analysis identifies structural variation associated with heat tolerance in pearl millet

Pearl millet is an important cereal crop worldwide and shows superior heat tolerance. Here, we developed a graph-based pan-genome by assembling ten chromosomal genomes with one existing assembly adapted to different climates worldwide and captured 424,085 genomic structural variations (SVs). Comparative genomics and transcriptomics analyses revealed the expansion of the RWP-RK transcription factor family and the involvement of endoplasmic reticulum (ER)-related genes in heat tolerance. The overexpression of one RWP-RK gene led to enhanced plant heat tolerance and transactivated ER-related genes quickly, supporting the important roles of RWP-RK transcription factors and ER system in heat tolerance. Furthermore, we found that some SVs affected the gene expression associated with heat tolerance and SVs surrounding ER-related genes shaped adaptation to heat tolerance during domestication in the population. Our study provides a comprehensive genomic resource revealing insights into heat tolerance and laying a foundation for generating more robust crops under the changing climate.

Article https://doi.org/10.1038/s41588-023-01302-4 Additionally, more than 91.60% of the embryophyte Benchmarking Universal Single-Copy Orthologs (BUSCOs) were present in each genome (Supplementary Table 2). The long terminal repeat (LTR) assembly index (LAI) scores all exceeded 24 and thus met the criterion standard 21 (Table 1). Further evaluation using Merqury showed a quality value (QV) over 40 for our ten assemblies, which exceeded the Vertebrate Genomes Project standard of QV40 22 (Supplementary Table 2). These results demonstrate the accuracy, completeness and contiguity of the ten pearl millet genome assemblies. In addition, we predicted an average of 36,847 gene models for each assembly, among which more than 99.30% showed matches with the known functional database (Supplementary Table 2). Transposable elements (TEs) constituted 71.58% of each genome, ranging from 70.44% to 72.62% (Supplementary  Tables 2-4 and Supplementary Note 1).
We constructed a pan-genome using 11 pearl millet assemblies, including the previously released genome 7 . Among the total gene family sets, 14,608 core gene families were obtained across all accessions, accounting for more than half (46.60-52.08%) of the total sets; dispensable families (39.75-49.94%), in which genes were present in 2-10 accessions, constituted the second-largest proportion. The smallest proportion consisted of private gene sets, which were only detected in one genome and accounted for 0.73-8.73% of the total sets (Fig. 1c).
To further evaluate the representativeness of the pan-genome, we compared the distribution of SNPs between the 11 accessions and the aforementioned 394 core lines. They displayed a similar pattern across the genome and showed strong significant correlations in SNP density, nucleotide diversity (π) and synonymous (d S ) and nonsynonymous (d N ) substitution rates (SNP density, rho = 0.95; π, rho = 0.89; d S , rho = 0.98; d N , rho = 0.98) (Extended Data Fig. 3a,b). The number of added gene families declined quickly, with only 301 (0.64% of all gene families; 301 out of 47,344) additional gene families being identified when the eleventh accession was included (Extended Data Fig. 3c,d). Moreover, the accessions used to generate the pan-genome showed a similar Shannon's diversity index (H) and π to the 394 accessions (H: 8.07782 versus 8.03436; π: 0.0001327 versus 0.0001209). In general, these results suggest that the pan-genome accessions are genetically diverse and representative of the diversities of the pearl millet population. We further observed that core genes were more functionally conserved and enriched in general biological processes than the dispensable and private genes, as with previous findings in other plants 17,23,24 (Fig. 1d, Extended Data Fig. 4 and Supplementary Note 2). In total, we built a high-quality pan-genome resource that will contribute to pearl millet improvement.

Graph-based genome and SV identification
A total of 744,364 SVs were identified by realigning the assemblies against the PI537069 reference genome as this accession comes from the geographical origin (Northwest Africa) of pearl millet 25 and has a relatively high assembly quality (Table 1 and Supplementary Table 2 Table 5). Approximately 37.94% of PAVs were less than 2 kb in length, INVs (68.11%) were concentrated within 100 kb, CNVs (62.53%) were enriched in the size range of less than 4 kb and most TRANS (91.10%) were less than 20 kb in length (Extended Data Fig. 5a).
To build the graph-based genome, the SVs from all the pearl millet accessions were merged to yield 424,085 non-redundant SVs. PAVs accounted for 74.70% of private SVs present in only one accession but constituted a relatively high proportion (87.51%) of the non-private SVs. Similar trends were observed for CNVs and TRANS (Fig. 2b). We observed that the SVs were enriched in repeat regions (Fig. 2c). Across these genomes, 37-44% of SVs overlapped with genic and flanking regions (5 kb) (Fig. 2d), suggesting potential roles of SVs in gene regulation. In addition, the SVs and graph-based genome were validated by urgently required to develop crops that can withstand rising global temperatures and could thus be used to maximize agricultural production to help satisfy the food demands of an increasing population.
Pearl millet (Pennisetum glaucum (L.) R. Br., syn. Cenchrus americanus (L.) Morrone) (2n = 2x = 14) is a C 4 cereal crop that is important in safeguarding the security of food and forage in the arid and semiarid tropics due to its superior tolerance to high temperatures [4][5][6][7][8] . It is also a staple food of more than 90 million farmers living in poverty and is grown on more than 31.2 million hectares 9 . Pearl millet is an ideal model for understanding how plants use heat-related genes and mechanisms to thrive at warmer temperatures. However, few studies have investigated the molecular mechanisms underlying the regulation of heat stress responses (HSRs) in pearl millet relative to other major crops 10,11 and the underlying mechanisms are not well understood.
Recent studies revealed that many genes involved in environmental stress responses are strongly affected by structural variations (SVs) [12][13][14] ; however, the causal relationship of SVs with HSRs is poorly understood. SVs have roles in gene expression alterations linked to important plant phenotypes 15 . However, the detection of SVs is challenging when relying on short-read sequencing data 16,17 . This challenge has promoted the development of new approaches for SV detection using graph-based pan-genomes that are based on multiple high-quality assemblies [17][18][19] . Therefore, building graph-based pan-genomic resources has the potential to advance the characterization and understanding of the biological impact of SVs on phenotypic variations and accelerate the breeding of pearl millet.
In this study, we generated de novo genome assemblies of ten pearl millet accessions and constructed a graph-based pan-genome assembly to identify genomic SVs. We leveraged SVs, transcriptomics and in vivo validation to reveal the relationship between SVs and gene expression under heat stress conditions. With this approach, we identified SVs that contributed to heat adaptation during crop domestication. By integrating multi-omics analyses, we suggested a possible mechanism in which the resistance of pearl millet to heat stress depends mainly on the endoplasmic reticulum (ER) and validated an RWP-RK (https:// www.ebi.ac.uk/interpro/entry/pfam/PF02042/) transcription factor as a positive coregulator of heat tolerance along with the ER pathway. Our findings advance the conceptual understanding of heat tolerance in pearl millet, promise to expedite genomics-assisted breeding for heat tolerance in this important crop and will benefit comparative and functional genomics studies of other crops.

Genome assembly and pan-genomic analysis of representative pearl millet accessions
We selected ten representative accessions from eight major geographical regions based on the phylogenetic relationships of a 394-line core collection of pearl millet 7,20 ( Fig. 1a, Table 1). We assembled their chromosome-level genomes by integrating PacBio high-fidelity (HiFi) long-read sequences, Bionano optical mapping data, high-throughput chromosome conformation capture (Hi-C) data and Illumina paired-end sequences ( Fig. 1a, Fig. 2). The contig N50 values were substantially increased from 155 to 3,959-fold over those of the previously published pearl millet reference genome 7 (Table 1 and Supplementary Table 2).
To measure the quality of these ten newly assembled genomes, we realigned high-quality paired-end reads against the assemblies and observed alignment rates ranging from 95 Tables 6-8 and Supplementary  Note 3). Overall, this graph-based pan-genome is an essential genomic resource supporting the study of SVs and will provide a prominent reference for the discovery of SVs in pearl millet populations.
Article https://doi.org/10.1038/s41588-023-01302-4 The relative water content, relative electrical conductivity (REC) and malondialdehyde (MDA) content did not change significantly (P > 0.05) until 21 d of heat treatment (Extended Data Fig. 6b,c), while in maize leaves, the relative water content decreased and the MDA content increased significantly under 4 h of heat stress (40 °C) 26 . The slower responses might indicate better heat tolerance in pearl millet than in maize.
To dissect the molecular mechanism underlying heat tolerance in pearl millet, we first conducted comparative genomic analyses, which revealed that expanded, positively selected and species-specific gene families, as well as genes located near recently expanded LTR TEs (LTRs) were enriched in stress-related pathways in pearl millet (Extended Data Figs. 2g and 7a,b and Supplementary Note 4). Notably, one transcription factor family (RWP-RK) was identified as expanding           Table 9). This family responded to biotic or abiotic stresses [27][28][29][30] , supporting the potential roles of its members in heat tolerance. We investigated LTRs located near the RWP-RK genes and found that early LTR expansion might be associated with RWP-RK transcription factor family expansion and probably caused increases in specific RWP-RK genes in pearl millet ( To further characterize the roles of RWP-RK genes in response to heat stress, we sequenced leaf and root transcriptomes after high-temperature treatment (Supplementary Table 3). A total of ten differentially expressed RWP-RK genes were predicted, including two specific and eight nonspecific transcription factors (Extended Data Fig. 7d, Supplementary Table 10 and Supplementary Note 5). When overexpressing an RWP-RK (PMF0G00024.1) in rice, we found that the leaves of the transgenic lines (RWP-RKox) were less withered than the leaves of wild-type (WT) plants under high temperature ( Fig. 3e and Extended Data Fig. 7e). The RWP-RKox plants showed significantly higher peroxidase (POD) and superoxide dismutase (SOD) activities and lower MDA contents after exposure to heat stress conditions than the WT plants ( Fig. 3f), which provides a potential avenue for the future molecular breeding of heat-tolerant crops. We also characterized this RWP-RK transcription factor in a coregulated network and used a dual luciferase assays to verify that this transcription factor could transactivate two stress-related genes, PMA2G00541.1 and PMA6G02031.1 (Fig. 3g, Supplementary Table 11 and Supplementary Note 5). Taken together, these results indicate that the expansion of the RWP-RK transcription factor family has potentially contributed to heat tolerance in pearl millet.

RWP-RK coregulates a fast heat response with ER-related genes
To further dissect the molecular mechanism underlying heat tolerance in pearl millet, we sequenced the leaf and root transcriptomes of Tifleaf3 under high-temperature treatments at eight time points (dataset A) and selected six accessions to perform leaf transcriptome sequencing under stress for 1 and 24 h (dataset B; Supplementary Table 3). Based on gene functional enrichment analyses, the two transcriptome datasets revealed differentially expressed genes (DEGs) that were enriched mainly in ER-related pathways involved in the repair and elimination of misfolded proteins (Fig. 4a, Extended Data Fig. 8a,b, Supplementary Table 12 and Supplementary Note 6.1). We analyzed the RNA sequencing (RNA-seq) data from maize 31 and rice 32 and identified greater proportions of upregulated ER-related and heat shock factor (HSF) (https://www.ebi.ac.uk/interpro/entry/pfam/PF00447/) genes in pearl millet than in these two crops under heat treatment (1 h and 24 h; Fig. 4b).
In addition, the aforementioned ten RWP-RK genes exhibited significant correlations (Pearson's rho ≥ 0.6, P < 0.05) with most ER-related genes (60.2%; 325 out of 540) and HSF genes (50%; 16 out of 32) in response to heat stress (Supplementary Table 12), suggesting that RWP-RK genes might coregulate the heat tolerance of pearl millet with some ER-related genes and HSF genes. We further predicted potential RWP-RK binding sites upstream of these genes and found that higher proportions of ER-related genes had binding sites in pearl millet than in maize and rice (Extended Data Fig. 8c). The transient coexpression of the aforementioned RWP-RK (PMF0G00024.1) and two ER-related genes, encoding an immunoglobulin protein (BiP) (https://www.kegg.jp/ entry/K09490; PMA2G00107.1) and the oligosaccharyltransferase complex (OST) (https://www.kegg.jp/entry/K12669; PMA4G03758.1), further confirming that RWP-RK functions at least partially by transactivating ER-related genes (Fig. 4c). Collectively, these results indicate that pearl millet may quickly respond to heat stress at the gene transcription level via the coregulation of RWP-RK genes with HSF genes and ER-related genes to eliminate proteins with temperature-induced misfolding (Fig. 4d).

Several focal SVs are associated with heat-related gene expression
Previous reports revealed that SVs could affect the transcription of nearby genes 16,17,33 ; our data showed that nearly half of SVs were near genes (Fig. 2d). Therefore, we investigated the influence of SVs on the expression of nearby genes that responded to heat stress. The results showed that SVs were enriched in nearby genes showing changes in gene expression in all accessions and that genes located near SVs are probably more responsive to heat stress ( Fig. 5a To identify potential SVs related to transcriptional changes of particular heat-related genes, we distinguished four HR (Tifleaf3, PI583800, PI526529 and PI587025) and two HS (PI521612 and PI537069) accessions based on the distinct phenotypes and physiological indicators of these accessions when grown under heat treatment (Fig. 5f, Extended Data Fig. 9i,j and Supplementary Note 6.2). Considering that different breeds in the same group may use different genes to respond to heat stress, we focused on 2,354 SVs present in only three or all four HR accessions and nearby 2,769 genes. We designed an analysis pipeline to screen out 44 candidate SVs potentially related to the expression changes of 34 heat-related genes (Extended Data Fig.  9k, Supplementary Table 13 and Supplementary Note 7). Almost all these genes (33 out of 34) were responsive to heat stress based on our RNA-seq data and 11 genes (32.35%) were included in ER-related gene pathways (Supplementary Table 14), suggesting potential contributions of the neighboring SVs to the HSR. Notably, we found four fixed SVs between the HR and HS groups in the vicinity of PMA1G04478.1 and PMA7G02533.1 encoding two HSP70 proteins (https://www.kegg.jp/ entry/K03283) and PMA5G02838.1 encoding one heat shock chaperonin-binding protein, which were associated with differences in gene expression in the HR group than those in the HS group ( Fig. 5g and Extended Data Fig. 9l). Interestingly, PMA1G04478.1 and PMA5G02838.1 in the ER-related pathway were also identified and the main response of pearl millet to heat stress was found in this pathway ( Fig. 5g and Supplementary Tables 13 and 14). In general, the transcription levels of these three genes, which have essential roles in the HSR, were probably affected by their nearby SVs, further demonstrating that these SVs might have important roles in the heat tolerance of pearl millet.

Contributions of SVs to heat adaptation and domestication
To characterize the SVs underlying heat tolerance during adaptation in a pearl millet population (SRP063925) 7 , we genotyped SVs by mapping all of the resequences to our graph-based pan-genome and identified a total of 124,532 SVs. We focused on the SVs with population frequency differences (fdSVs) between accessions from tropical and temperate zones by applying a sliding window methodology 34 (Supplementary Note 7). In total, 1,471 genes were annotated against 269 selection sweep regions harboring 4,411 fdSVs (Fig. 6a). Interestingly, we found that 27 of these genes were significantly (P = 0.038; chi-squared test) and functionally annotated as belonging to ER-related pathways (that is, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway ko04141) (Supplementary Tables 15 and 16). From the 591 genes whose expression was previously shown to be associated with SVs (Supplementary Table 13), we identified 25 genes near 27 fdSVs that were present only in the HR group; their expression levels were significantly correlated with the presence of fdSVs (Supplementary Table 17). Notably, one of the fdSVs was positioned close to (360 bp) and upstream of PMA2G02653.1, a gene encoding a protein in the zinc finger family that has a role in the ER system [35][36][37] . This gene was enriched in Gene Ontology (GO) terms associated with the response to temperature Article https://doi.org/10.1038/s41588-023-01302-4 stress (GO: 0050826) and was also responsive to heat stress (Extended Data Fig. 10a). We further identified this fdSV as present in accessions that were preferentially located in higher-latitude regions ( Fig. 6a and Supplementary Note 7). In general, these results revealed the contributions of SVs possibly associated with the ER system to heat stress adaptation.   Table 3). b, Comparison of the proportions of upregulated ERrelated and HSF genes after 1 h and 24 h of heat treatment in pearl millet, maize and rice. Heat-resistant (HR) and heat-susceptible (HS) rice samples, respectively. c, Dual luciferase assays were applied to verify that PMF0G00024.1 (RWP) could transactivate the PMA2G00107.1 (107) and PMA4G03758.1 (3758) genes. The error bars represent the mean ± s.d.; n = 3 biological replicates. Significant differences were tested using a two-tailed t-test and are shown as P values. d, Proposed activation network of pearl millet in response to growth under heat stress. After 1 h of high-temperature stress in six pearl millet accessions, many misfolded proteins activated the expression of degradation-related genes in the ER, such as genes encoding recognition proteins, including calnexin (CNX) and calreticulin (CRT), and degradation-related proteins, including heat shock proteins (HSPs), thereby correcting or degrading misfolded proteins to maintain protein homeostasis in cells. In addition, the HSF and RWP-RK genes potentially participate in this process to coregulate HSP genes. Article https://doi.org/10.1038/s41588-023-01302-4 To characterize the domestication of pearl millet with a shift toward higher heat tolerance, we used the above pearl millet population (SRP063925) to identify 113 selection sweep regions harboring 3,952 fdSVs overlapping with 1,285 genes between the landrace and improved cultivars relative to the wild accessions (Extended Data Fig. 10c and Supplementary Table 1). Functional enrichment analyses showed that these genes were associated mainly with stress-related GO terms, including temperature, abiotic stimulus and isoprenoid biosynthetic process (Extended Data Fig. 10c). We also found that 79.3% of those genes (1,019 out of 1,285) exhibited transcriptional  Table 18), indicating that fdSVs potentially influence domestication genes under heat stress. In addition, 17 of these genes near 16 fdSVs were present only in the HR group and the fdSVs were significantly correlated with their gene expression levels; among these genes, PMA2G02653.1 was also related to temperature adaptation (Fig. 6a). Additionally, we found that a 716-bp insertion (SV) was present in a higher proportion of the wild accessions than the landrace ones and improved cultivars (Fig. 6b). This insertion was positioned 2.1 kb upstream of PMA5G03691.1, which encodes a coiled-coil 90B-like protein that is probably responsible for pollen germination and is associated with the grain number per panicle (GNP) trait. The presence of this insertion was possibly correlated with heat-induced gene expression (Extended Data Fig. 10d). We then conducted a genome-wide association study (GWAS) examining the associations of the 124,532 PAVs and 1,455,924 SNPs with GNP in a population reported by Varshney et al. 7 (Supplementary Table 19). An association peak on chromosome 5 showed an overlap between PAVs and SNPs. This quantitative trait locus corresponds to grain number 7 . In our study, we found PMA5G03691.1 and an insertion in the close vicinity of this quantitative trait locus (Fig. 6c, Supplementary Table 19 and Supplementary Note 7). We next observed this insertion in 113 accessions with lower GNP values than 116 accessions without the SV (Fig. 6c). These results suggested that this insertion was probably under positive selection    Fig. 7 | A proposed mechanism by which heat tolerance is integrally related to the transport system of the ER. After heat stress (H) for 21 d, only a small proportion of leaves exhibited wilting relative to the control (CK) group; leaves showed physiological changes (MDA) for up to 21 d, suggesting that pearl millet exhibits strong heat resistance. Significant differences were tested using a twotailed t-test (**P < 0.01). We leveraged multi-omics analyses to reveal a possible mechanism of heat tolerance in the ER transport system: (1) compared with maize and rice, pearl millet showed a higher proportion of ER-related genes that were differentially expressed, indicating a quicker response to heat stress in this system; (2) this heat stress led to the production of misfolded proteins that could be recognized and degraded via the cooperation of ER-related proteins such as CRT, CNX, BiP, NEF, PDA6 and HSP. HSF genes might be involved in the heat response because their expression is upregulated and they can coregulate HSP genes 45,46 . SVs surrounding 11 ER-related genes probably contributed to this response. For instance, one SV was associated with the expression of an ER-related gene, HSP70, which plays a role in the degradation of misfolded proteins. Additionally, 27 genes enriched in the ER system were located in regions with differentiated SV distributions between two populations in temperate and tropical zones; (3) furthermore, an RWP-RK gene (PMF0G00024.1) from an expanded transcription factor family was confirmed as a positive regulator involved in heat resistance and was coregulated by ER-related and HSF genes. We finally used dual luciferase assays to confirm that this PMF0G00024.1 gene could transactivate genes encoding ER-related BiP (PMA2G00107.1) and OST (PMA4G03758.1). Article https://doi.org/10.1038/s41588-023-01302-4 during domestication and influenced the responsiveness of nearby genes to heat, possibly contributing to seed production in pearl millet grown at higher temperatures. Furthermore, we identified a total of 142 PAVs that were each associated with one or more traits (20 traits in total), which might provide insights into the contributions of these SVs to pearl millet molecular breeding (Supplementary Table 19 and Supplementary Note 7). Collectively, these results demonstrate the utility of pearl millet graph-based pan-genome analysis for the identification of both heat tolerance adaptation and its relationship to domestication.

Resistance to heat in pearl millet depends on the ER system
We performed integrated multi-omics analyses supplemented with cis-genetic functional verification to propose a possible mechanism by which the superior heat tolerance of pearl millet is related to the expansion and altered expression of genes involved in the ER system (Fig. 7). In particular, the ER system showed a quicker response to high temperature in pearl millet than in maize and rice. Abundant evidence has shown that SVs participate in the heat tolerance response by affecting gene regulation; for example, SVs between HR and HS materials led to differential expression levels of 11 ER-related genes. Several other distinctly differentiated SVs in ER-related genes were also associated with the heat stress adaptation of pearl millet populations at different temperatures. Moreover, by means of functional analysis, we confirmed that one gene (PMF0G00024.1) from an expanded RWP-RK transcription factor family acted as a positive regulator of heat resistance; this transcription factor also transactivated one ER-related gene. These observations indicate that SVs and RWP-RK genes may coregulate the quick response to heat stress with ER-related genes in pearl millet.

Discussion
Pearl millet is an ideal model for investigating the mechanisms underlying plant heat resistance 9 . We identified distinctly differentiated SVs in ER-related genes that were associated with the heat stress adaptation of pearl millet populations at different temperatures (Fig. 6a); however, we did not find genes in SNP-based selection sweep regions that showed significant enrichment in ER-related pathways. These findings indicate that SV-based population analyses can capture genetic variations complementary to SNPs, providing additional information about the diversity losses caused by population bottlenecks during plant adaptation 38 . In addition, the expansion of RWP-RK transcription factors was likely related to LTR and these factors coregulated heat tolerance with ER-related and heat stress-related genes (Figs. 3 and 4). RWP-RK transcription factors have an important role in the nitrogen starvation response and gametophyte development in plants 39,40 . However, no heat tolerance-related functions of these transcription factors have been reported. Our findings expand the possible functions of RWP-RK transcription factors and illustrate a possible diversification in which this family of transcription factors is responsible for multiple stress condition responses in plants. This finding supports a previous hypothesis that pearl millet probably includes abundant heat tolerance-related genetic resources 6 .
The graph-based pan-genome resource offers several potential tools to improve the breeding process in pearl millet. We developed a comprehensive SV map of pearl millet to identify signals associated with phenotypes (that is, GNP) (Fig. 6c), which enables us to investigate potential mechanisms influencing nearby genes that are challenging to detect based only on SNP genotyping. This pan-genome also provides a new window for identifying evolutionary processes, such as the formation of adaptative SVs, to elucidate demographic and selection processes in pearl millet. The dispensable genome within the pan-genome resource offers a pathway for identifying genes associated with traits such as abiotic stress resistance or production, which would benefit the selection of suitable materials for use as breeding targets in pearl millet. In our pan-genome, PmiG showed a higher ratio of private gene families relative to the other assemblies (Fig. 1c), possibly caused by the relatively fragmented sequences generated by previous short-read sequencing or assembly techniques 7,[41][42][43] . A similar result was reported in a soybean pan-genome study 17 . The relatively lower contig N50 value intuitively suggested that the PmiG genome sequence is more fragmented (Table 1), which would lead to a lower average length of genes and coding sequences and a higher proportion of short genes (<1 kb) (Extended Data Fig. 4c,d). Thus, fragmentation of assembled sequences would result in incomplete prediction of genes, potentially contributing to the private gene set in the PmiG. Nonetheless, the PmiG, as the first published pearl millet genome, has been widely used as a reference genome in the pearl millet community 20,25,44 . Including it in our pan-genome research will help to refer to the basis of previous research and provide a smooth transition to the era of high-quality pearl millet genome research.
In conclusion, our study uses a pan-genome approach coupled with multi-omics to comprehensively investigate plant response mechanisms to heat stress. This work provides an excellent reference for future studies on stress tolerance, especially in non-model plants.
Our study also offers an approach for breeding crop varieties with enhanced tolerance to various stresses that can cope with the diverse challenges imposed by the changing climate.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01302-4.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.

Sampling and sequencing
Ten pearl millet accessions (PI537069, PI521612, PI526529, PI587025, PI583800, PI343841, PI186338, PI250656, PI527388 and Tifleaf3) were obtained as representative plants from different geographical regions. All ten accessions were planted in a greenhouse at a density of three plants per pot (filled with nutrient soil), including nine plants of each accession, and grown at a temperature of 26 °C during the light period (14 h of light) and 22 °C during the dark period (10 h of darkness). Thirteen-week-old leaves were collected and immediately frozen in liquid nitrogen for the extraction of genomic DNA using a DNAsecure Plant Kit (TIANGEN). Library construction and Illumina, Hi-C, PacBio and Bionano sequencing were performed at Novogene (Supplementary Note 1).

Genome survey
The genome size of pearl millet was estimated using k-mer frequency analysis based on the Lander-Waterman algorithm 47 . We divided the total length of sequence reads by the sequencing depth represented by the peak value of the frequency curve. The following formula was used to estimate genome size: (N × (L − k + 1) − B)/D = G, where N is the total number of sequence reads, L is the average length of the sequence reads, k is the k-mer length (17 bp), B is the total number of low-frequency k-mers (frequency ≤ 1 in this analysis), G is the genome size and D is the overall estimated depth based on the k-mer distribution 48 . Additionally, flow cytometry was used to confirm the estimated genome size according to a reported method 49 with a BD FACSCalibur flow cytometer and the fluorochrome propidium iodide.

Initial assembly
The PacBio HiFi reads were used to assemble the initial contigs in the Hifiasm (v.0.13-r308) 50 package with default parameters. The Pruge_ haplotig (v.1.1.0) 51 tool was used to process genomic heterozygous regions to remove redundancy in the genomes using default parameters with several exceptions: -a 50.

Scaffolding with Bionano optical maps
The filtered raw DNA molecules in BNX format were aligned, clustered and assembled into a Bionano optical map using the Bionano Genomics assembly pipeline. Then, a BNX file recorded the basic labeling and DNA length information was converted with the AutoDetect in Bionano Solve package (v3.5.1) (https://bionanogenomics.com/support/software-downloads/). The initial assemblies were aligned to the Bionano data and then analyzed with RefAligner in Bionano Solve package (v3.5.1). The alignments were visualized with a snapshot in IrysView in Bionano Solve package (v3.5.1). Finally, genome maps were combined with the initial assembly to produce hybrid scaffold genome maps using the Bionano Solve package (v.3.5.1) with the parameters -B 1 -N 1.

Pseudochromosome construction
Linkage information for the scaffold and initial assembly was obtained by aligning high-quality Hi-C data to the preceding assemblies using the Burrows-Wheeler Aligner (BWA) software (v.0.7.8) 52 . Chromosome-scale scaffolds were anchored based on linkage information, restriction enzyme sites and the string graph formulation using the ALLHIC (v.0.9.8) 53 package with the following parameters: -K 7 -minREs 50--maxlinkdensity 3--NonInformativeRatio 0. Placement and orientation errors showing obvious discrete chromatin interaction patterns were adjusted manually. For those accessions without Hi-C data, we used collinearity with the PI537069 assembly for clustering and orientation to generate chromosome-level assemblies.

Genome assessment
To evaluate the assembly quality of the genomes, BUSCO (v.4.1.2; http:// busco.ezlab.org/) 54 and the CEGMA (v.2.5) (http://korflab.ucdavis. edu/dataseda/cegma/) 55 were used to check the completeness of the genome assembly or annotation. The draft assemblies were further evaluated by mapping the high-quality Illumina paired-end reads to the genome assembly using the BWA-MEM (v.0.7.8) 52 algorithm. The quality of the genome assemblies was further evaluated using LTR TE completeness based on the LAI tool wrapped in LTR_retriever (v.2.8) 21 and using Merqury (v.1.3) 22 with the default parameters.

Annotation of repetitive sequences
Transposons were annotated by combining two strategies, that is, homolog and de novo predictions. For the homology-based approach, the Repbase TE library 56

Annotation of gene structure
Gene annotation was conducted by combining de novo-, homolog-and transcriptome-based predictions. For the homolog-based approach, we downloaded homologous proteins from the A. thaliana, Z. mays, S. bicolor, O. sativa, S. italica and pearl millet genomes (Phytozome 13, https://phytozome.jgi.doe.gov/pz/portal.html; NCBI, https://www. ncbi.nlm.nih.gov/) and aligned them to the pearl millet genome with Tblastn (v.2.2.26) 64 using an expected value of 1 × 10 −5 . Solar (v.0.9.6) 65 was used to combine the BLAST hits (Homo-set), which were used to predict the exact gene structures of the corresponding genomic regions with GeneWise (v.2.4.1) 66 (https://www.ebi.ac.uk/Tools/psa/ genewise). For the transcriptome-based approach, RNA-seq data from Illumina were mapped to the assembled genome with TopHat (v.2.0.13) 67 , followed by Cufflinks (v.2.1.1) 68 . In addition, Trinity (v.2.1.1) 69 was used to assemble the RNA-seq data and its output was used to create pseudo-expressed sequence tags, which were then mapped to the assembly. Gene models were predicted by using the Program to Assemble Spliced Alignments (PASA) genome annotation tool 70 . This gene set was denoted as the PASA-T-set and was used to train ab initio gene prediction programs. For the de novo-based approach, five ab initio gene prediction programs, including AUGUSTUS (v. 3  These two databases provide a portal for obtaining GO terms (http://geneontology.org/ http://www.geneontology.org/page/ go-database) 81 . The pathways of the genes were identified via BLAST searches against the KEGG database (v.53) (http://www.kegg.jp/kegg/ kegg1.html) 82 with an expected value cutoff of 1 × 10 −5 .

Pan-genome construction
We constructed a pan-genome using the 11 pearl millet assemblies. The core and dispensable gene sets among the 11 pearl millet genomes were estimated based on gene family clustering using OrthoFinder (v.2.3.1) 83 . All protein sequences were subjected to homologous searches using BLASTP with an expected value of 1 × 10 −5 . Protein sequences were clustered into paralogous and orthologous sequences using OrthoFinder with an inflation parameter of 1.5.

SV identification
To build a genetic variance atlas for the 11 pearl millet genomes, we aligned the other ten genomes to the PI537069 reference genome using MUMmer (v.4.0.0) 84 . The alignment of the genomes was performed using NUCmer 84 (--c 1000--maxgap=500) and the alignment block filter was implemented using a delta filter in one-to-one alignment mode (−1). Blocks longer than 1,000 bp were used for further analysis. We used the SV function of the MUMmer (SVMU) pipeline to automate PAV discovery by parsing the results of NUCmer. From the SVMU results, SV-based insertions or deletions (with the tag INS or DEL) were treated as PAVs and CNVs were treated as CNVs. Inversion events (referring to SVs more than 1 kb in length) were identified by SVMU. SyRI (v.1.6.3) (https://github.com/schneebergerlab/syri) 85 was used to identify translocation regions. We also used PI537069 as a reference to construct a graph-based genome with the vg tool (v.1.25.0) (https://github.com/ vgteam/vg) 86 . To genotype the population SVs, the Illumina short reads (SRP063925) of each accession were mapped to the graph-based genome using the vg tool with default parameters.

Transcription factor family identification and analysis
To identify and compare transcription factor families in pearl millet and other species, we collected the protein sequences of A. thaliana (TAIR10) 87 , Z. mays (B73_RefGen_v4) 88 103 . The iTAK tool (v.1.7a) 104 was used for transcription factor prediction with default parameters. To avoid bias caused by differences in the number of genes among the different plants 105 , we calculated the proportion of transcription factor as N TF /N total , where N TF is the number of transcription factors and N total is the total number of genes in the corresponding plant. Moreover, we predicted the binding sites of RWP-RK transcription factors with the FIMO tool (v.5.3.2) (https://meme-suite.org/meme/ meme_5.3.2/doc/fimo.html) 106 .

Contributions of SVs to nearby gene expression
To investigate whether the SVs could broadly influence nearby gene expression, we used RNA-seq dataset B for the six accessions subjected to 1 h of control conditions (Supplementary Table 3

PAV-GWAS
To explore the usefulness of the graph-based genome and identify SV-driven alterations of genes controlling important agronomic traits, we conducted a PAV-GWAS analysis. After PAV filtration (removal of PAVs with a minor allele frequency < 0.05 or missing rate > 0.1), a total of 124,532 PAVs were used to perform PAV-GWAS in 242 accessions. Association analysis was conducted using the GEMMA (v.0.94.1) software package 107 . For the mixed linear model analysis, we used the equation y = Xα + Sβ + Kµ + e, where y represents the phenotype, X represents the genotype, S is the structure matrix and K is the relative kinship matrix. Xα and Sβ represent fixed effects and Kµ and e represent random effects. The top three principal components were used to build the S matrix for population structure correction. The matrix of simple matching coefficients was used to build the K matrix.

Determination of physiological indicators
Seeds (2.00 g) of Tifleaf3 were cultured in a plastic box (

Measurement of POD, MDA and REC
Leaves (0.1 g) were ground and 1.5 ml of PBS solution (150 mM) was added. The mixtures were centrifuged at 12,879.36g for 20 min at 4 °C. The supernatant was then collected. For the determination of MDA activity, 0.5 ml of enzyme extract was added to 1 ml of reaction solution (20% trichloroacetic acid and 0.5% thiobarbituric acid) and the mixture was incubated in a 95 °C water bath for 30 min. Thereafter, the mixture was placed in an ice bath at room temperature (25 °C) and centrifuged at 12,879.36g for 10 min. The absorbance was recorded at 532 nm and 600 nm using a spectrophotometer (Sorvall ST 16). For the determination of POD activity, a 1.5 ml reaction system was used. First, 925 μl sodium acetate (100 mM) was added, after which 0.5 ml guaiacol (0.25%) and 25 μl enzyme extract were added. After mixing, 50 μl of hydrogen peroxide (0.75%) was added to the mixture. The absorbance was recorded at 470 nm every 10 s. SOD enzymatic activity was determined as described by Dhindsa et al. 108 . Starting with 50 μl of crude enzyme solution, 1.1 ml of 50 mM phosphate buffer, 100 μl of 0.06 mM riboflavin, 100 μl of 195 mM l-methionine, 50 μl of 0.003 mM EDTA and 100 μl of 1.125 mM nitroblue tetrazolium were added. In addition, two tubes without enzyme extract were included as controls. The reaction was performed under 3000 lx light for 30 min and the reaction was terminated in the dark. Absorbance was recorded at 560 nm. For the measurement of REC, 0.1 g samples of fresh leaves were collected with six biological replicates. The leaves were wrapped using gauze and placed in a 50-ml Eppendorf tube and 20 ml of pure water was added Article https://doi.org/10.1038/s41588-023-01302-4 to completely cover the leaves. The tube was placed in an incubator at room temperature (25 °C). After 25 h, the S1 EC was measured and the sample was kept in a boiling water bath for 30 min. The S2 EC was measured when the water had cooled to room temperature (25 °C). The REC was calculated using the following equation: REC = S1/S2 × 100%.

Transcriptomic analyses of pearl millet under high temperature
Seeds (2.00 g) of six accessions of pearl millet were cultivated in a 10 × 15 × 6 cm plastic basin filled with quartz sand and placed in a growth chamber (26 °C under light for 14 h and 22 °C in darkness for 10 h). The culture conditions were as described by Sun et al. 109 . The V3 stage seedlings were equally divided into two groups: a high-temperature treatment group and a control (CK) group. The conditions of the high-temperature treatment group were 14 h under light at 45 °C and 10 h in darkness at 40 °C, while the CK group was cultured under unchanged conditions (26 °C and 22 °C). After 1 and 24 h of treatment, leaves were collected and stored at −80 °C. In addition, the seeds (2.00 g) of Tifleaf3 were grown under similar conditions and seedlings were divided into treatment and control groups as described above. The culture conditions of the heat-treated group were 14 h under light at 40 °C and 10 h in darkness at 35 °C; the control group was kept under unchanged conditions (26 °C and 22 °C). After treatment for 1, 3, 5, 7, 24, 48, 96 and 144 h, the roots and leaves of the seedlings were collected and stored at −80 °C. A total of 168 samples were collected and three biological replicates were set for each treatment and control. Each replicate consisted of the mixed tissues of 16 seedlings. To obtain the materials used for the annotation of gene structure, the ten accessions were planted in a greenhouse, with nine plants of each accession (26 °C under light for 14 h and 22 °C in darkness for 10 h). We collected leaves (three biological replicates), stems (one sample) and roots (one sample) 5 weeks after the planting of each accession to build 30 RNA-seq libraries. A Total RNA Kit (QIAGEN) was used to extract RNA from these samples to build a complementary DNA library (NEBNext Ultra Directional RNA Library Prep Kit for Illumina) in preparation for RNA-seq. After sequencing, the raw data were filtered with FastQC (v.0.11.9) (http:// www.bioinformatics.babraham.ac.uk/projects/fastqc/) 110 . Transcripts were quantified with the Kallisto (v.0.46.2) 111 software using PI537069 as a reference. Finally, DEGs (|log 2 (group 1/group 2)| ≥ 1, P adj < 0.05) were identified with DESeq2 (v.1.26.0) 112 . GO and KEGG enrichment analyses were performed using the OmicShare tools (http://omicshare.com/ tools) (P < 0.05). Moreover, for the processing of published maize and rice transcriptomic data, we downloaded raw reads from maize in the V3 stage under 38 °C (14 h under light and 10 h in darkness) stress and normal conditions (25 °C; 14 h under light and 10 h in darkness) 31 and raw data from rice in the V3 stage grown under either 45 °C (13 h under light and 11 h in darkness) stress or normal culture condition (25 °C; 13 h under light and 11 h in darkness) 32 . The same methods and parameters were applied to the RNA-seq analysis of published maize and rice data.

Transgenic plant validation
The PMF0G00024.1 gene sequence was synthesized via synthetic gene sequence generation and was introduced to the pBWA (V)HS-CCDB vector under the control of the 35S promoter. Three hundred rice seeds without mildew spots that showed normal buds were sterilized with 75% alcohol for 1 min, soaked in sodium hypochlorite for 20 min, washed with sterile water three times and then placed into a culture medium to culture calluses. The culture was conducted under light at 26 °C for 20 d. In addition, a single Agrobacterium colony was cultured in medium in a shake flask to obtain an Agrobacterium resuspension with an OD 600 of 0.2. The calluses were added to the Agrobacterium suspension step. After 10-15 min of infection, calluses were picked, placed in a cocultivation medium and incubated at 20 °C for 48-72 h. Subsequently, cultured calluses were transferred to a selection medium containing hygromycin and cultured for 20-30 d (26 °C in darkness) for the first selection. After the first selection, 180 calluses were transferred to a new culture medium and cultured for 7-10 d (26 °C in darkness) for the second selection step. Ninety callus tissues were obtained and differentiation and rooting were induced. Finally, a total of 20 seedlings were obtained. The resistant calluses were differentiated into seedlings and PCR detection was performed using the primers listed in Supplementary Table 20. The PCR-positive seedlings were transplanted into the soil (26 °C under light for 14 h and 22 °C in darkness for 10 h). When they reached the four-leaf stage, quantitative PCR with reverse transcription was performed with the primers RWP1 and RWP2, with three technical repeats for each sample (Supplementary Table 20).

RWP-RK and ER-related genes
The open reading frames of RWP-RK (PMF0G00024.1) were inserted into the pGreenII62-SK vector to generate effector plasmids. The promoter sequence of PMA2G00107.1 was synthesized by Hzykang and then cloned into the pGreenII 0800-LUC vector to generate reporter plasmids. Effector and reporter plasmids were expressed in tobacco leaves, mediated by Agrobacterium injection. Tobacco leaves in the injection area were collected and fluorescence activity was measured using a luciferase assay kit (cat. no. DL101, Vazyme Biotech). The primers used in this section are shown in Supplementary Table 20.

Tobacco leaf transformation assays to assess the impact of SVs on gene expression
The promoter sequences were cloned into the T vector using the 5 min TA/Blunt-Zero Cloning Kit (cat. no.C601, Vazyme Biotech). We used PCR (enzyme mix, cat. no. P520, Vazyme Biotech) to add the vector sequence at the end of the promoter fragment and obtained the PBI121-GUS linearized vector (Supplementary Table 20). Circularization was performed according to the instructions of the Clone Kit (cat no. MC40101, Monad). The recombinant vectors were injected into Nicotiana benthamiana leaf cells using an Agrobacterium-mediated transfection system (GV3101). GV3101-pBI121-35s-GUS, GV3101-pBI121-Promoter-GUS and GV3101-pBI121-Promoter_SV-GUS were cultured to an OD 600 of 0.6 before injection. Two hundred microliters of liquid from each treatment was infiltrated into the tobacco leaves. Gloves were changed after the infiltration of each construct to prevent contamination. Tobacco was pretreated at a high temperature for 24 h (40 °C for 8 h and 35 °C for 16 h) and then cultured under the same conditions for 2 d after injection. The blank group was cultured at 25 °C (8 h under light and 16 h in darkness) and sampled by injection. The histochemical staining and quantitative analysis of GUS in three independent biological replicates were performed as described by Jefferson et al. 113 .

PCR validation of SVs
Genomic DNA was extracted from fresh leaves using a DP360 kit (TIAN-GEN) and PCR was performed using 2× Phanta Flash Master Mix (cat. no. P520, Vazyme Biotech). Five SVs were analyzed by PCR genotyping (condition: followed by 35 cycles of denaturation at 98 °C for 10 s, annealing at 60 °C for 5 s and extension at 72 °C for 5 s kb −1 ) using the primers indicated in Supplementary Table 20.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The raw sequencing data and transcriptome data of PI186338, PI250656, PI343841, PI521612, PI526529, PI527388, PI537069, PI583800, PI587025 and Tifleaf3 have been deposited in the NCBI Sequence Read Archive under BioProject accession no. PRJNA749489, PRJNA689619 and PRJNA756390. The assemblies of ten pearl millet have been deposited in NCBI GenBank under the accession no. JAMZRY000000000 Article https://doi.org/10.1038/s41588-023-01302-4 Extended Data Fig. 7 | Expansion and expression of RWP-RK transcription factor (TF) family members. a, Estimation of the divergence times of seven pearl millet accessions and six closely related species. PmiG represents Tift 23D2B1-P1-P5. The right panel displays the distribution of single-copy, multiplecopy, unique and other gene orthologs. b, Functional enrichment analyses showed the enrichment of some genes in stress-related processes or pathways among expanded family, specific, and positively selected gene sets. c, Estimated insertion times of LTR TEs encompassing the specific RWP-RKs of pearl millet belonging to clade A and the nonspecific RWP-RKs belonging to clade B. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; and dots represent outliers. d, Expression of seven RWP-RK genes under control (CK) and heat treatments (H). Significant differences were performed by DESeq2 to determine fold change (FC) of gene expressions between the CK and H groups (two-tailed Wald test). The FC was determined by * (0 < |log2FC|< 1, FDRadjusted P < 0.05) or ** (|log2FC|>= 1, FDR-adjusted P < 0.05). The blue and red */** represent gene down-and upregulation, respectively. e, Expression of two RWP fragments (RWP1 and RWP2) in 13 transgenic rice lines. A value above 0 indicates the upregulation of this gene. Error bars, mean ± s.d.; n = 3 biological replicates.