The Aegilops tauschii genome reveals multiple impacts of transposons

Zhao, Guangyao; Zou, Cheng; Li, Kui; Wang, Kai; Li, Tianbao; Gao, Lifeng; Zhang, Xiaoxia; Wang, Hongjin; Yang, Zujun; Liu, Xu; Jiang, Wenkai; Mao, Long; Kong, Xiuying; Jiao, Yuannian; Jia, Jizeng

doi:10.1038/s41477-017-0067-8

Article
Open access
Published: 20 November 2017

Nature Plants volume 3, pages 946–955 (2017)

Abstract

Wheat is an important global crop with an extremely large and complex genome that contains more transposable elements (TEs) than any other known crop species. Here, we generated a chromosome-scale, high-quality reference genome of Aegilops tauschii, the donor of the wheat D genome, in which 92.5% of the sequences have been anchored to chromosomes. Using this assembly, we accurately characterized genic loci, gene expression, pseudogenes, methylation, recombination ratios, microRNAs and especially TEs on chromosomes. In addition to discovering a wave of very recent gene duplications, we detected TEs in about half of the genes, and found that such genes are expressed at lower levels than those without TEs, presumably because of their elevated methylation levels. We mapped all wheat molecular markers and constructed a high-resolution integrated genetic map corresponding to genome sequences, thereby placing previously detected agronomically important genes/quantitative trait loci (QTLs) on the Ae. tauschii genome for the first time.

Wheat is one of the most important food crops in the world. It is also a species with an extremely large and complex genome that contains more TEs than any other known species¹. This has for some time caused wheat research to lag behind that of crops with smaller genomes like rice, sorghum, etc. TEs were discovered by Barbara McClintock in 1951², but for about three decades dating from her discovery, TEs were largely thought to be ‘Junk DNA’, parasite genes in a host genome, or selfish DNA³. TEs make up a large fraction of many plant genomes; indeed, they are the major determinant of genome size. For example, more than 80% of the genomes of maize⁴ and wheat¹ are composed of TEs. More recently, TEs have been recognized as important functional components of genomes. Four distinct functional contributions of TEs are now recognized, including their roles in determining genome size and rearrangements, in generating mutations, in altering chromosome architecture and in the regulation of gene expression. TEs are now recognized as an abundant and unexplored natural source of regulatory sequences for host genes, and TE biology has become an active research area in recent years^5,6,7,8,9,10.

Sequencing technologies such as Illumina HiSeq X Ten and PacBio RS II sequencing, and new library construction methods such as 10x Genomics, as well as new assembly techniques such as DeNovoMAGIC2^11,12, are now significantly accelerating the progress of wheat genome sequencing efforts. Common wheat is a hexaploid species with A, B and D subgenomes. Whole genome shotgun sequencing of the Chinese Spring cultivar^13,14 and its diploid ancestors Triticum urartu ¹⁵ and Ae. tauschii ¹⁶ have been reported. Additionally, chromosome sorting was used to sequence chromosome 3B of the hexaploid wheat cultivar Chinese Spring, which advanced wheat research significantly¹. Although previous efforts generated draft genomes for diploid ancestors and for bread wheat, the majority of contigs could not be mapped to chromosomes in these assemblies. We previously constructed the draft genome of Ae. tauschii, and found that it was particularly rich in genes related to adaptation¹⁶. Here, we present a reference D genome in which more than 92.5% of the genome sequences are anchored to chromosomes. We mapped the global distribution of TEs and examined multiple functional impacts of TEs on D genome evolution, gene structure and gene expression.

Results and discussion

Genome assembly and feature annotation

Over 778 Gb of short read sequences were generated using two Illumina sequencing platforms (HiSeq 2000 and HiSeq 2500) (Supplementary Table 1). Short reads were first assembled by DeNovoMAGIC2^11,12 using 450 bp, 2 kb, 5 kb and 8 kb libraries to generate the V1.0 assembly, which comprises 271,060 contigs (N50 = 50.3 kb) and 117,344 scaffolds (N50 = 6.8 Mb). The assembly was further elongated using SSPACE with 20 kb and 40 kb libraries to generate the V1.1 assembly (N50 = 13.1 Mb). Finally, we used long reads generated with the PacBio RS II sequencing platform to further improve the contig assembly (Supplementary Table 2). The final assembly (V1.2) comprises 188,412 contigs (N50 = 112.6 kb) and 112,517 scaffolds (N50 = 12.1 Mb) (Supplementary Table 3). Our assembly represents a greater than 210-fold improvement in contiguity compared with the previously published Ae. tauschii assembly reported by our research group in 2013¹⁶ (see statistics for V0.1 in Supplementary Table 2). The total scaffold length of the V1.2 assembly (4.31 Gb) spans 95.8% of the estimated genome size (4.5 Gb)¹⁶, and the largest 434 scaffolds cover 90% of the genome (Supplementary Table 3).
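To make the contiguity figures quoted above concrete, the following minimal Python sketch shows how an N50 value is computed from a list of sequence lengths, and reproduces the genome-coverage percentage from the two totals given in the text. The function name `n50` and the toy scaffold lengths are hypothetical and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths (hypothetical values, not taken from the assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_scaffolds))

# Share of the estimated genome size (4.5 Gb) spanned by the
# V1.2 scaffolds (4.31 Gb); both totals are quoted in the text.
print("coverage (%):", round(4.31 / 4.5 * 100, 1))
```

Because N50 depends only on the sorted lengths, the same function applies to contigs and scaffolds alike.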

A high-density genetic map containing 164,872 single-nucleotide polymorphism (SNP) loci spanning 1,153.6 cM in seven linkage groups was used to anchor the scaffolds to chromosomes. In total, 658 scaffolds were aligned and anchored to the genetic map, with a total length of 4.0 Gb, spanning 92.5% of the assembled genome; 97.9% of the aforementioned SNPs were placed on the scaffolds (Supplementary Table 4). We confirmed that the sizes of the chromosomes in our assembly were consistent with the results of a cytogenetics analysis that we performed for this study: chromosomes 2, 3, 5 and 7 are relatively longer, whereas chromosomes 1, 4 and 6 are shorter (Table 1 and Supplementary Fig. 2). Attesting to the quality of the assembly, Illumina paired-end reads were mapped to our assembly and 99.96% of them could be mapped, which suggests that our assembly contains almost all of the information in the raw reads (Supplementary Table 6). Completeness of gene regions was assessed using CEGMA (conserved core eukaryotic gene mapping approach) and BUSCO (Benchmarking Universal Single Copy Orthologs). Two hundred and forty-three of the 248 (97.9%) conserved core eukaryotic genes from CEGMA were captured in our assembly, and 240 (98.8%) of these were complete. BUSCO analysis showed that 97% of the plant single-copy orthologues were complete. We also mapped 10,748 EST sequences generated from our Ae. tauschii full-length cDNA libraries¹⁶ to the assembly; 10,471 (97.4%) of these could be mapped to the scaffolds with greater than 90% coverage, which indicated that gene regions were almost complete in our assembly. To assess large-scale accuracy, we compared our assembly with the sequences of 17 BACs obtained either from the NCBI GenBank or from our in-house library. All of these BACs could be aligned to our assembly, in the correct order, with high sequence identity (15 of 17 alignments had identity > 99%), and had greater than 97% coverage.
Only two of the gaps were located in genic regions; all other gaps on the BACs fell in intergenic sequence, repeats or N strings (Supplementary Fig. 1 and Supplementary Table 10). The accuracy of the assembly was further confirmed using contig sequences that were randomly selected from a genome version generated from PacBio data¹⁷. All of the 120 selected PacBio contigs could be unambiguously aligned to our assembly with sequence identity greater than 99.3% (Supplementary Table 11). These dramatic improvements in quality result from the use of emerging techniques such as DeNovoMAGIC2, large-insert mate-pair sequencing and PacBio sequencing, which are particularly helpful for TE-rich genomes.

We annotated protein-coding genes (PCGs), RNA genes and pseudogenes, and catalogued functional annotations for PCGs. Importantly, we also analysed the distribution of TEs on the seven chromosomes (Fig. 1). The Ae. tauschii genome sequence contains 42,828 PCGs, 98% of which were anchored to chromosomes in our assembly (Table 1); this is more than the 39,425 PCGs reported for Chinese Spring¹⁴. The average lengths, the number of exons and the GC content of the coding regions of the PCGs are similar to those of the genes in the genomes of other grass species (Supplementary Table 19). However, both the intron lengths and the intron GC content of the Ae. tauschii PCGs are much greater than those in any other sequenced grass species (Supplementary Table 19). These differences are perhaps due to the dramatically higher number of TE insertions in the introns of the Ae. tauschii PCGs. Of the PCGs, 92.6% were functionally annotated based on information from the NR (non-redundant database in NCBI), SwissProt, InterPro, Pfam and KEGG databases (Supplementary Table 20). We identified 25,893 likely pseudogenes with premature stop codons or frameshift mutations, a much higher number than that in rice (1,439–5,608)^18,19 or Arabidopsis (801–4,108)^18,20.
When including gene fragments without disabling mutations in a broader pseudogene definition, the total number of pseudogenes reached 267,546 in the Ae. tauschii genome, which is two times larger than the number in the maize genome (B73 RefGen_v3, in the 5b + annotation build)²¹. Therefore, Ae. tauschii appears to be the plant species with the highest reported pseudogene content. In addition, 3,630 transfer RNA genes, 238 miRNA genes, 1,271 small nuclear RNA genes and 2,856 ribosomal RNA genes were predicted in the genome (Supplementary Table 22).

Table 1 The gene and TE distribution on the seven chromosomes of the Ae. tauschii genome

**Fig. 1: Distribution of genomic features in the *Ae. tauschii* genome.**

Three whole genome duplication events have been identified in the evolutionary history of all of the sequenced grass genomes, including the tau event at ~150 million years ago (Ma), the sigma event at ~127 Ma and the rho event at ~70 Ma^4,22. To infer the evolutionary history of Ae. tauschii, we used MCScanX²³ to detect syntenic genomic regions among Ae. tauschii, Oryza sativa, Brachypodium distachyon and Sorghum bicolor; all of these intergenomic comparisons showed very strong co-linearity (Supplementary Figs. 12–14), further indicating the high quality of our Ae. tauschii assembly and PCG annotation. An intragenomic comparison of Ae. tauschii showed a smaller number of syntenic regions than that in other sequenced grass genomes (Fig. 2a and Supplementary Fig. 15), with the largest syntenic region occurring between chromosomes 1 and 3. Interestingly, this region is also present in the respective syntenic regions of five other sequenced grass genomes (Fig. 2b). Genome alignments between Ae. tauschii and both O. sativa and S. bicolor revealed a clear 2-to-2 multiplicity ratio between orthologous regions, which supports the idea that the pan-cereal rho event²⁴ was the most recent whole genome duplication event in the evolutionary history of Ae. tauschii (Fig. 2b).

Transposable element analysis

TEs account for fully 85.9% of the assembled sequence of the Ae. tauschii genome, similar to what occurs in bread wheat (chromosome 3B)¹ and maize⁴, but much higher than the TE content in rice²⁵, sorghum²⁶ or B. distachyon ²⁷ (Supplementary Table 13). Note that 85.9% TE content is much higher than what we previously estimated based on our Ae. tauschii V0.1 draft genome (62.3%)¹⁶, and also higher than in Chinese Spring (76.6%)¹⁴. This can be partly attributed to the greater than 210-fold improvement in contiguity of our new assembly. Retrotransposons and DNA transposons cover, respectively, 59.9% and 19.6% of the genome (Supplementary Table 14). Long terminal repeats (LTRs) are the most abundant type of TE, covering 58% of the genome. Three superfamilies (Gypsy, 38.3%; CACTA, 16.8%; Copia, 16.5%) account for 71.7% of the total TE component (Supplementary Table 14). Compared to both rice and maize, the proportion of CACTAs in Ae. tauschii is almost tenfold greater^4,25.

To further explore the proliferation in each TE superfamily, we here define ‘family’ according to the peptide sequence similarity of key transposase domains (threshold ≥ 90%; retrotransposase domain (RT) for retrotransposons and the catalytic domains of DD [E/D] (DDE) DNA transposons). In total, we identified 4,669 TE families (100,879 copies with complete transposase domains) among eight superfamilies/orders in our assembly (Supplementary Table 15). Consistent with the DNA-sequence-based TE annotation (above), our domain-based analysis showed that LTR retrotransposons were the most abundant type of TE, and CACTA elements were the most abundant type among the DNA transposons.

We identified 172 CACTA families that formed four clades in a phylogenetic tree (Fig. 3a). The four clades, Aet-CACTA-1 to Aet-CACTA-4, consisted of 66, 49, 38 and 19 families. To investigate their origins, we identified the CACTA families of five published grass genomes and combined them with the Ae. tauschii CACTAs to obtain a six-species CACTA data set containing 855 families (Supplementary Table 17); the genomes included were common wheat^13,14, Triticum urartu ¹⁵, B. distachyon ²⁷, rice²⁵ and sorghum²⁶. The topology of the neighbour-joining phylogenetic tree distinguished four clades of CACTAs in Ae. tauschii, and clearly suggested that the Aet-CACTA-4 clade originated in the Triticeae lineage, whereas the other three clades originated before the divergence of the major grass groups. A total of 19 families are included in Aet-CACTA-4, relatively fewer than in the three other clades. The centromere-enriched distribution of the Aet-CACTA-4 TEs in Ae. tauschii is distinct from the distributions of the other known CACTA subfamilies, which are more abundant in terminal, gene-rich regions (Fig. 3b).

**Fig. 3: Rapid proliferation of transposable elements.**

Analysis of the evolutionary relationships of transposase domains showed that the high-copy-number families accounted for only a small proportion of all families, but 59.1% (2,829/4,782) of families had a copy number of 2 or 3 (we set the lowest copy number as 2 in this study) (Supplementary Table 15 and Supplementary Fig. 9). For example, only eight out of 172 CACTA families were high-copy-number families (copy number > 100), but these eight families (5% of all families) had 5,333 copies, representing 84% of all identified CACTA copies. A similar pattern was observed for LTR elements, where 99 (2.5%) of the high-copy-number families accounted for 65,912 (72.6%) of all identified LTR copies. Since CACTA and LTR elements account for over 70% of all Ae. tauschii TEs, this pattern suggests that a few dominant families might play key roles in recent genome structural evolution, supporting previous suppositions from a study of bread wheat^1,28.
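The skew described above, in which a few high-copy-number families hold most of the copies, can be summarized with two fractions. The sketch below is only an illustration: the function name `high_copy_share` and the toy copy numbers are hypothetical, and the default threshold of 100 copies follows the CACTA example in the text.

```python
def high_copy_share(copy_numbers, threshold=100):
    """Return (fraction of families, fraction of copies) contributed by
    families whose copy number exceeds `threshold`."""
    if not copy_numbers:
        raise ValueError("copy_numbers must not be empty")
    high = [c for c in copy_numbers if c > threshold]
    family_fraction = len(high) / len(copy_numbers)
    copy_fraction = sum(high) / sum(copy_numbers)
    return family_fraction, copy_fraction


# Hypothetical copy numbers for ten TE families (not data from this study).
toy_families = [900, 600, 3, 2, 2, 2, 3, 2, 4, 2]
family_fraction, copy_fraction = high_copy_share(toy_families)
print(f"{family_fraction:.0%} of families hold {copy_fraction:.1%} of copies")

# Family fraction for the CACTA example quoted in the text (8 of 172 families).
print(round(8 / 172 * 100, 1))
```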

We next analysed the global distribution of TEs on chromosomes and found that they tended to increase in density along the chromosome from distal regions towards the centromere (Fig. 1 and Supplementary Fig. 19). The distribution of Gypsy TEs is similar to that of total TEs, as indicated by a significant positive correlation (r = 0.92, P < 10⁻¹³). Given that Gypsy TEs represent fully 38.3% (Supplementary Table 14) of total TEs, this high correlation is not surprising. However, the density of both Copia and CACTA TEs decreased from distal regions to centromeric/pericentromeric regions, showing a significant negative correlation with the distribution of total TEs and of Gypsy TEs (P < 10⁻¹³). To explore the impact of TEs on the distribution of genomic features, we plotted the distributions of both gene and pseudogene density, and also examined gene expression, DNA methylation and recombination on every chromosome (Fig. 1). We found that TEs are associated significantly with all of the other genomic features, highlighting the consequential impact of TEs on this genome. The density of Gypsy elements is significantly positively correlated with average gene expression levels, but is negatively correlated with both gene density and recombination. Similarly, negative correlations were observed for Copia and DNA TEs, although these trends were less obvious. All of these trends are consistent with the findings from an analysis of chromosome 3B of bread wheat¹.
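The correlations reported above compare feature densities measured in windows along each chromosome. The following sketch computes a Pearson correlation coefficient in plain Python for that kind of comparison. The window densities are invented for illustration, and the windowing scheme used in the study is not specified here, so this is a sketch of the statistic only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical densities in five windows, ordered from a distal region
# towards the centromere (fractions of each window covered by the feature).
total_te = [0.70, 0.78, 0.85, 0.92, 0.95]
gypsy = [0.20, 0.28, 0.37, 0.45, 0.50]
cacta = [0.25, 0.20, 0.15, 0.10, 0.07]

print("total TE vs Gypsy:", round(pearson_r(total_te, gypsy), 3))
print("total TE vs CACTA:", round(pearson_r(total_te, cacta), 3))
```

In these toy data Gypsy density rises with total TE density while CACTA density falls, mirroring the direction of the trends described in the text.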

Impact of TEs on gene evolution and regulation

Considering that 85.9% of the Ae. tauschii genome is composed of TEs, we explored the impact of TEs on gene duplication, gene structure, methylation, expression and pseudogenization. We found a large number of recently duplicated genes in the Ae. tauschii genome (not resulting from a whole genome duplication event), some of which in theory may have resulted from TE movement. All-against-all best reciprocal BLASTP searches identified a total of 9,569 paralogous gene pairs. The synonymous nucleotide substitution rate (K_s) for the paralogues was calculated using codeml (PAML)²⁹. Interestingly, the K_s distribution showed an older peak at ~0.65 and a more recent peak at ~0.25 (Fig. 2c), suggesting that, in addition to the pan-cereal rho event, a larger number of dispersed genes were duplicated more recently in the Ae. tauschii genome (3,034 gene pairs with K_s values less than 0.3) than in other sequenced grass genomes (Fig. 2d and Supplementary Fig. 16). These genes were classified into gene family categories that are significantly enriched for wounding responses, endopeptidase inhibitor activity, serine-type endopeptidase inhibitor activity and enzyme inhibitor activity (Supplementary Table 23). Careful classification of these recently duplicated gene pairs showed that 1,204 were specific to the wheat D genome (Supplementary Fig. 17). A separate positional analysis revealed that 1,102 of the recently duplicated gene pairs are likely to have arisen via tandem duplication, and the remaining 1,932 non-tandem duplicate pairs were uniformly distributed across all chromosomes (Supplementary Fig. 16). We investigated the sequence similarities between the 1.5 kb flanking sequences of the 1,932 non-tandem duplicates, and found that 23 pairs had similar 5′ and 3′ flanking sequences (defined as 80% identity and 50% of the length), 273 pairs had similar 5′ flanking sequences and 46 pairs had similar 3′ flanking sequences.
We further investigated whether the flanking regions of these recently duplicated genes overlapped with TEs, and found that the flanking sequences of 3,415 (88%) genes had TEs. Given the tremendous number of TEs in the Ae. tauschii genome, and considering that only about 36% of these recent duplicates resulted from tandem duplication, it seems reasonable to speculate that a burst of TE-associated gene duplication may explain the unexpectedly large number of recently duplicated genes that we observed in our K_s analysis.
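The partition of paralogous pairs described above (older versus recent by K_s, then tandem versus dispersed by position) can be sketched as follows. The K_s cut-off of 0.3 comes from the text. The tandem criterion used here (same chromosome, gene ranks differing by at most `max_gene_gap`) is an assumption made for illustration, since the text does not state the rule that was applied, and the `GenePair` record and `classify_pair` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    ks: float      # synonymous substitution rate between the two paralogues
    chrom_a: str   # chromosome of the first gene
    rank_a: int    # rank of the first gene along its chromosome
    chrom_b: str   # chromosome of the second gene
    rank_b: int    # rank of the second gene along its chromosome


def classify_pair(pair, ks_cutoff=0.3, max_gene_gap=1):
    """Label a paralogous pair as 'older', 'recent_tandem' or 'recent_dispersed'.

    ks_cutoff follows the text; max_gene_gap is an assumed tandem criterion.
    """
    if pair.ks >= ks_cutoff:
        return "older"
    same_chrom = pair.chrom_a == pair.chrom_b
    if same_chrom and abs(pair.rank_a - pair.rank_b) <= max_gene_gap:
        return "recent_tandem"
    return "recent_dispersed"


# Hypothetical pairs, not data from this study.
print(classify_pair(GenePair(0.65, "1D", 10, "3D", 200)))
print(classify_pair(GenePair(0.10, "2D", 5, "2D", 6)))
print(classify_pair(GenePair(0.10, "2D", 5, "5D", 6)))

# Consistency check on the counts quoted in the text.
recent, tandem, dispersed = 3034, 1102, 1932
print(tandem + dispersed == recent, round(tandem / recent * 100))
```

The final line confirms that the tandem and non-tandem counts sum to the 3,034 recent pairs and that the tandem share is about 36%.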

Compared to other grass species, including maize, rice and B. distachyon, the occurrence of both retrotransposons and DNA transposons in both gene bodies and in flanking regions is highest in Ae. tauschii (Fig. 4a). Of the predicted genes of Ae. tauschii, 45.5% contain at least one TE (Supplementary Table 21), which is two times higher than in maize, another TE-rich genome⁴. Among the 12 TE superfamilies we examined, CACTAs are the most abundant superfamily of TEs inserted in introns; these are present in 8,547 genes (43.9% of the genes with TE insertions, Supplementary Table 21). We compared the length of inserted TEs in the genic regions and intergenic regions, and found that the TEs inserted in genic regions are significantly shorter than those inserted in intergenic regions (two-sample Kolmogorov–Smirnov (KS) test, P < 0.001) (Supplementary Fig. 11). As expected, genes containing TEs were on the whole expressed at lower levels than were genes without TEs (Supplementary Fig. 18, KS test, P < 10⁻¹³).
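The two-sample KS test used above compares the empirical cumulative distribution functions (ECDFs) of two samples. The sketch below computes only the KS statistic D, the largest vertical gap between the two ECDFs, and does not compute a P value. The function name and the TE lengths are hypothetical; an established statistics library would normally be used in practice.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a - ECDF_b|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The supremum of the ECDF difference is attained at an observed value.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical TE lengths in bp (not data from this study).
genic_te_lengths = [120, 150, 200, 260, 300]
intergenic_te_lengths = [900, 1500, 3000, 5200, 7000]
print(ks_statistic(genic_te_lengths, intergenic_te_lengths))
```

A value of D near 1 indicates almost no overlap between the two distributions, whereas a value near 0 indicates nearly identical distributions.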

**Fig. 4: TE distribution and methylation profiles across genes in the *Ae. tauschii* genome.**

Pseudogenes can be regulators of biological function^30,31. To test if the high pseudogene content in the Ae. tauschii genome detected above is correlated with historic bursts of TE movement in the Ae. tauschii genome, we identified processed pseudogenes (those resulting from retrotransposition) and examined the distribution of TEs across pseudogenes. Among the pseudogenes, about 29% of the multi-exon ancestor genes had lost their introns, suggesting that retrotransposition was involved in their pseudogenization. We also found that several superfamilies of retrotransposons and DNA TEs (for example, Gypsy, CACTAs, Helitrons, and so on, Fig. 4b) were enriched in pseudogene bodies and/or in the flanking regions of pseudogenes. More than 80% of the pseudogenes are somehow disrupted by TEs from these superfamilies, suggesting a pivotal role of TE movement in pseudogenization.

Cytosine DNA methylation is important in the epigenetic regulation of gene expression and in silencing transposons and other repetitive sequences³². To investigate DNA methylation and gene expression profiles in the Ae. tauschii genome, we conducted whole genome bisulfite sequencing and RNA-seq using the same sample tissues. The average percentages of methylation of CG, CHG and CHH contexts in leaf were 89.7%, 59.1% and 2.1%, respectively (Supplementary Table 24). The genome CG and CHG methylation levels are about the same as those in maize (86.4%, 70.9%)³³, and are much higher than those in B. distachyon (56.5%, 35.3%)³⁴ and rice (44%, 24%)³⁵. This is consistent with previous observations indicating a positive correlation between genome size, TE content and genomic methylation levels³⁶. The methylation level of CG and CHG contexts within genes is much lower than that in the intergenic regions, which might be attributable to the flanking sequences of genes in the Ae. tauschii genome being frequently occupied by TEs, which are typically heavily methylated (Fig. 4c). Similar to Arabidopsis, rice and maize, CG methylation is the most abundant type of methylation in the Ae. tauschii genome, and the CG methylation levels are low near both transcriptional start sites (TSSs) and transcriptional terminal sites (TTSs). Intriguingly, this difference is more obvious in expressed genes than in non-expressed genes, which suggests that the lower level of CG methylation around TSSs and TTSs might be important in regulating the activation of gene expression. In contrast, the level of CHG methylation is higher in non-expressed genes than in expressed genes, and this trend is consistent throughout gene bodies (in both exons and introns). Regions with elevated CHH methylation levels located immediately adjacent to TSSs and TTSs are referred to as ‘mCHH islands’.
mCHH islands were first identified in maize, and have been proposed to function as ‘insulators’ that separate the silencing of TEs from that of nearby genes. However, there is as yet no clear conclusion about the relationship between mCHH islands and gene expression^37,38,39. We found that mCHH islands are obvious in Ae. tauschii and, further, we found that the CHG methylation level is higher in non-expressed genes than in expressed genes (Fig. 4c), a finding that appears to provide an additional line of evidence to support this gene expression insulator theory. Our combined bisulfite sequencing and RNA-seq analysis revealed that genes with TEs inserted in their introns are expressed at lower levels than are genes without TEs, so we examined the methylation profile around TEs to test if TE insertions affect methylation of the surrounding region (Fig. 4d). Indeed, we found that introns with TE insertions showed higher methylation levels than introns without any TEs, although the level was lower than that for TEs in intergenic regions.
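The three methylation contexts discussed above are defined by the bases that follow a cytosine: CG, CHG and CHH, where H is A, C or T. The sketch below classifies a forward-strand cytosine by context and computes a read-weighted methylation level per context. It is a simplified illustration under stated assumptions: only the forward strand is handled, the function names are hypothetical, the read counts are invented, and the exact definition of methylation level used in the study may differ.

```python
def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at 0-based position i.

    Returns 'CG', 'CHG' or 'CHH' (H = A, C or T), or None if the base is
    not a C or the downstream bases needed for classification are missing.
    """
    seq = seq.upper()
    if i < 0 or i >= len(seq) or seq[i] != "C":
        return None
    second = seq[i + 1:i + 2]
    if second == "G":
        return "CG"
    third = seq[i + 2:i + 3]
    if second in ("A", "C", "T") and third:
        if third == "G":
            return "CHG"
        if third in ("A", "C", "T"):
            return "CHH"
    return None


def methylation_levels(calls):
    """Read-weighted methylation level per context.

    `calls` is an iterable of (context, methylated_reads, total_reads).
    """
    methylated = {}
    total = {}
    for context, m_reads, t_reads in calls:
        methylated[context] = methylated.get(context, 0) + m_reads
        total[context] = total.get(context, 0) + t_reads
    return {c: methylated[c] / total[c] for c in total if total[c] > 0}


print(cytosine_context("ACGT", 1), cytosine_context("ACAGT", 1),
      cytosine_context("ACATT", 1))
# Hypothetical bisulfite read counts at three cytosines.
print(methylation_levels([("CG", 9, 10), ("CG", 8, 10), ("CHH", 1, 50)]))
```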

Integrated genetic map and key agronomic genes/QTLs map

Extensive previous research efforts have genetically mapped a large number of agronomically important genes/QTLs. However, most of these QTLs could not be physically mapped, and individual results are not typically comparable because studies used different mapping populations and different sets of markers. To address this deficiency, we mapped all available markers, including 735 first-generation restriction-fragment length polymorphisms (RFLPs)⁴⁰, 3,536 second-generation marker SSRs⁴¹ and millions of third-generation SNPs (90 K⁴², 820 K⁴³ and 660 K used in our laboratory) to the Ae. tauschii genome, and generated a high-resolution integrated genetic map corresponding to the genome sequences (Fig. 5). By using this integrated genetic map, we anchored 50 genes/QTLs (with marker sequences previously detected in various populations) to the D genome. We also identified 256 agronomically important genes that have been identified by map-based cloning in cereal crops (203 from rice, 22 from wheat, 16 from barley and 15 from maize). These genes include 53 conferring disease resistance, 19 for abiotic stress tolerance, nine for domestication, 135 for development, 12 for quality and 28 for yield (Supplementary Table 25). Thereby, we generated the first genome-based gene/QTL map for Ae. tauschii (Fig. 5 and Supplementary Table 26). A total of 33, 54, 38, 33, 37, 20 and 58 genes/QTLs were anchored to chromosomes 1D to 7D respectively, suggesting that 2D and 7D have made relatively stronger positive contributions to wheat improvement than the other chromosomes. All of these results highlight that, in addition to its utility in genomics research, a chromosome-scale reference genome is a valuable resource for molecular breeding and gene cloning.

**Fig. 5: A high-resolution integrated genetic map that assists in anchoring agronomically important genes/QTLs of *Ae. tauschii*.**

Conclusions

We used highly efficient sequencing and assembly techniques to construct a high-quality reference genome for the TE-rich species Ae. tauschii. Our assembly has a scaffold N50 value of 13.1 Mb, and 92.5% of the scaffolds were anchored to chromosomes. We also developed a recombination map and anchored 97.9% of genes and 94.7% of pseudogenes and, impressively, 94.1% of TEs. We discovered a new, Triticeae-specific CACTA family, namely Aet-CACTA-4, which contains 19 subfamilies. The physical distribution of Aet-CACTA-4 is distinct from the other three known CACTA subfamilies. The Ae. tauschii genome has the largest number of pseudogenes among all examined genomes, which may be attributable to historical bursts of TE activity. About half of the Ae. tauschii genes have TE insertions, and bisulfite and transcriptome sequencing revealed that these genes had both elevated methylation levels and reduced transcription. We mapped all of the genetic markers from wheat studies spanning three decades, and constructed the first accurate integrated genetic–physical map of Ae. tauschii. We used these tools to map almost all of the previously detected agronomically important genes/QTLs to chromosomes. These resources should contribute immediately to advancing molecular breeding programmes and disease resistance initiatives, and will facilitate the basic functional characterization of many important and long-sought wheat genes.

Methods

Genome sequencing

The genomic DNA of Ae. tauschii AL8/78 was used to construct multiple types of libraries, including short insert size (450 bp) libraries, mate-paired (2 kb, 5 kb, 8 kb, 20 kb and 40 kb) libraries and PacBio SMRT Cell libraries. The 450 bp short-insert library was sequenced on an Illumina HiSeq 2500 instrument with 250 bp per end. In total, we produced over 778 Gb of short read sequences (Supplementary Table 1). PacBio SMRT Cell libraries were sequenced with a PacBio RS II instrument; over 53 Gb of raw data were obtained. No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Genome assembly and evaluation

The genome was assembled using the DeNovoMAGIC2 software package (NRGene, Nes Ziona, Israel)^11,12 (Supplementary Information 2.2), a De Bruijn graph-based assembler designed to efficiently extract the underlying information from raw reads in order to resolve the complexity that genome repetitiveness introduces into the De Bruijn graph. Sequencing data from the PCR-Free library and the Nextera MP libraries were used for DeNovoMAGIC2 assembly. PCR duplicates, an Illumina adaptor (AGATCGGAAGAGC), and Nextera linkers (for MP libraries) were removed from the raw sequencing data. Overlapping reads from the PE 450 bp 2 × 250 bp libraries were then merged with a minimal required overlap of 10 bp to create the stitched reads. Following these pre-processing steps, merged PE reads were scanned to detect and filter reads with putative sequencing errors (reads containing a sub-sequence that does not reappear several times in other reads). The first step of the DeNovoMAGIC2 assembly algorithm consists of building a De Bruijn graph (kmer = 191 bp) of contigs from the overlapping PE reads. Next, PE reads are used to find reliable paths in the graph between contigs for repeat resolution and contig extension. Later, contigs are linked into scaffolds with PE and MP information, estimating gaps between the contigs according to the distance of PE and MP links. A final gap-filling step uses PE and MP links, as well as De Bruijn graph information, to detect a unique path connecting the gap edges.
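For readers unfamiliar with De Bruijn graphs, the following toy sketch builds one from a handful of reads. It is a generic textbook construction and not the DeNovoMAGIC2 implementation, whose internals are not described here. The k-mer size of 4 is chosen only so that the example is readable (the assembly used 191 bp), and the reads are invented.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer found in a read adds an edge from
    its (k-1)-base prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical overlapping reads that share the repeat-like motif "ACG".
toy_reads = ["ACGTAC", "GTACGG"]
graph = de_bruijn_graph(toy_reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

The node `ACG` has two outgoing edges in this example, which is the kind of branch that repeats create and that an assembler must resolve, for instance by using paired-end links as described above.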

Mate-paired data (20 kb, 40 kb) were mapped to the basic assembly using Bowtie (http://bowtie-bio.sourceforge.net/index.shtml), and only uniquely mapped reads were retained. Further scaffolding was performed using SSPACE (https://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE, V3.0). PBJelly (http://www.winsite.com/Home-Education/Science/PBJelly/) was used to fill gaps using approximately 10X of SMRT sequencing data. We generated an F₂ mapping population of 490 individuals derived from a cross between the Ae. tauschii accessions Y2280 and AL8/78. The F₂ individuals were grouped using JoinMap4.0 and then ordered using MSTmap. Finally, using the Kosambi function, we generated a high-resolution genetic map, which includes 164,872 SNPs developed by restriction site-associated DNA (RAD) tag sequencing technology; the total length is 1,153.58 cM¹⁶. The high-density genetic map was used to anchor the scaffolds to chromosomes using BLAST⁴⁴. The completeness of gene regions of our assembly was evaluated using both CEGMA (Core Eukaryotic Gene Mapping Approach, http://korflab.ucdavis.edu/datasets/cegma/) and BUSCO (Benchmarking Universal Single-Copy Orthologs, http://busco.ezlab.org/).
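The Kosambi mapping function mentioned above converts a recombination fraction r into a map distance, d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans, and its inverse is r = (1/2) tanh(2d). A minimal sketch of both directions is given below. The function names are hypothetical, and the code illustrates the standard formula rather than the behaviour of any particular mapping program.

```python
from math import log, tanh


def kosambi_cm(r):
    """Map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(cm):
    """Recombination fraction for a Kosambi map distance in centimorgans."""
    return 0.5 * tanh(2 * cm / 100.0)


r = 0.10
distance = kosambi_cm(r)
print(round(distance, 2), "cM")
print(round(kosambi_r(distance), 4))
```

For small r the Kosambi distance is close to 100r cM, and it grows faster than r as r approaches 0.5 because the function accounts for undetected double crossovers under partial interference.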

Genome annotation

Protein-coding region identification and gene prediction were conducted using a combination of homology-based prediction, de novo prediction, and transcriptome-based prediction methods. Protein sequences from nine plant genomes (B. distachyon, S. bicolor, O. sativa, Zea mays, Hordeum vulgare, Triticum aestivum, T. urartu, Setaria italica and Panicum virgatum) were downloaded from Ensembl (Release 33) and were aligned to the Ae. tauschii assembly using TBLASTN⁴⁴ with an E-value cut-off of 1 × 10⁻⁵. The BLAST hits were conjoined using Solar software⁴⁵. GeneWise (https://www.ebi.ac.uk/Tools/psa/genewise) was used to predict the exact gene structure of the corresponding genomic regions for each BLAST hit. A collection of wheat FLcDNAs (16,807 sequences) was directly mapped to the Ae. tauschii genome and assembled by PASA (http://pasapipeline.github.io/). Five ab initio gene prediction programs, Augustus (http://augustus.gobics.de/, version 2.5.5), Genscan (http://genes.mit.edu/GENSCAN.html, version 1.0), GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/, version 3.0.1), Geneid (http://genome.crg.es/software/geneid/) and SNAP (http://korflab.ucdavis.edu/software.html), were used to predict coding regions in the repeat-masked genome. RNA-seq data were mapped to the assembly using TopHat (http://ccb.jhu.edu/software/tophat/index.shtml, version 2.0.8). Cufflinks (http://cole-trapnell-lab.github.io/cufflinks/, version 2.1.1) was then used to assemble the transcripts into gene models. Functional annotation of protein-coding genes was achieved using BLASTP⁴⁶ (E-value 1 × 10⁻⁵) against two integrated protein sequence databases: SwissProt (http://web.expasy.org/docs/swiss-prot_guideline.html) and NR. Protein domains were annotated by searching against the InterPro (http://www.ebi.ac.uk/interpro/, V32.0) and Pfam databases (http://pfam.xfam.org/, V27.0), using InterProScan (V4.8) and HMMER (http://www.hmmer.org/, V3.1), respectively.
The Gene Ontology (GO, http://www.geneontology.org/page/go-database) terms for each gene were obtained from the corresponding InterPro or Pfam entry. The pathways in which the genes might be involved were assigned by BLAST against the KEGG database (http://www.kegg.jp/kegg/kegg1.html, release 53), with an E-value cut-off of 1 × 10⁻⁵.
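The E-value cut-offs used throughout the annotation steps can be illustrated with a short sketch. It assumes the BLAST results are in the standard 12-column tab-separated layout (E-value in column 11, bit score in column 12); the output format actually used in this study is not stated, and the names `Hit` and `filter_hits` are hypothetical. Treating hits exactly at the cut-off as passing is also an assumption.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Iterator[Hit]:
    """Yield hits from 12-column tabular BLAST output at or below max_evalue."""
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        hit = Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))
        if hit.evalue <= max_evalue:
            yield hit
```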

RNA sequencing and analysis

To aid gene annotation and to provide expression-level information for downstream analyses, we produced a total of 53.21 Gb of RNA-seq data from eight organs: pistil, root, young seed, young spike, young stamen, stem, young leaf and sheath. The treatment of plant material and the experimental procedures have been described previously¹⁶. RNA-seq data were mapped to the genome using TopHat (version 2.0.8). Only read pairs whose mates aligned within 600 bp of each other were defined as concordantly mapped pairs; these were used in the downstream quantification analysis. The minimum and maximum intron lengths were set to 5 bp and 50,000 bp, respectively. All other parameters were left at their default values. Cufflinks (version 2.1.1) (http://cufflinks.cbcb.umd.edu/) was then used to estimate the expression level of each gene based on reads uniquely mapped to the genome. An ‘expressed gene’ was defined as a gene with RPKM > 1 in leaf or root organs of 3-week-old seedlings. The remaining genes were defined as ‘non-expressed genes’.
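The expressed/non-expressed classification can be sketched as follows. The RPKM formula is the textbook definition (reads per kilobase of gene per million mapped reads), not the model-based estimate that Cufflinks computes, and the function names and organ keys are hypothetical.

```python
from typing import Mapping, Sequence


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Textbook RPKM: reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(
    rpkm_by_organ: Mapping[str, float],
    organs: Sequence[str] = ("leaf", "root"),
    threshold: float = 1.0,
) -> bool:
    """True if RPKM exceeds the threshold in at least one of the listed organs."""
    return any(rpkm_by_organ.get(organ, 0.0) > threshold for organ in organs)
```

For example, a 2,000-bp gene with 100 reads in a library of 10 million mapped reads has an RPKM of 5 and would be classed as expressed if that library came from leaf or root.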

TE analysis

We studied the evolutionary relationships of two class I and five class II TEs by identifying RT/DDE domains genome-wide and then constructing phylogenetic trees. To identify RT and DDE domains, we first searched known representative RT/DDE sequences against the assembly using TBLASTN⁴⁴ and extracted all regions with an E-value below 1 × 10⁻¹⁰. We then excluded overlapping hits and retained regions whose size was at least 50% of that of the representative sequence. Known representative sequences for DDE domains were from Yuan and Wessler⁴⁷, and representative sequences for RT domains were from an in-house plant TE database. A total of 100,879 domain regions were identified.
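The three filtering steps above (E-value threshold, removal of overlapping hits, minimum size relative to the representative sequence) can be sketched as below. How overlaps were resolved and how "size" was measured are not specified in the text; this sketch assumes the hit with the lowest E-value is kept among overlapping hits, and that size means aligned length in residues. All names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    seqid: str      # scaffold or chromosome
    start: int      # genomic start (inclusive)
    end: int        # genomic end (inclusive)
    evalue: float
    aln_len: int    # aligned length in residues (assumed meaning of "size")
    ref_len: int    # length of the representative domain sequence


def select_domains(
    hits: Iterable[DomainHit], max_evalue: float = 1e-10, min_cov: float = 0.5
) -> List[DomainHit]:
    kept: List[DomainHit] = []
    # Step 1: E-value filter; visit the strongest hits first.
    passing = sorted((h for h in hits if h.evalue < max_evalue), key=lambda h: h.evalue)
    for h in passing:
        # Step 2 (assumed rule): drop a hit overlapping an already kept, stronger hit.
        overlaps = any(
            k.seqid == h.seqid and h.start <= k.end and k.start <= h.end for k in kept
        )
        if not overlaps:
            kept.append(h)
    # Step 3: keep regions covering at least min_cov of the representative sequence.
    return [h for h in kept if h.aln_len >= min_cov * h.ref_len]
```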

Giemsa-C banding and fluorescence in situ hybridization (FISH)

Chromosome preparation from Ae. tauschii accession AL8/78 root tips was performed as described by Han et al.⁴⁸. The chromosome Giemsa-C banding procedure followed Gill et al.⁴⁹. Sequential fluorescence in situ hybridization (FISH) with synthesized labelled oligonucleotide probes Oligo-pSc119.2 and Oligo-pTa535 was performed as described by Tang et al.⁵⁰. Images were captured with an Olympus BX-51 microscope equipped with a DP-70 CCD camera. Ae. tauschii chromosomes were identified from their C-banding and FISH patterns according to homoeology with the D genome of wheat⁴⁹.

Whole genome bisulfite sequencing

Bisulfite data from Illumina sequencing were aligned to the Ae. tauschii genome using Bismark⁵¹ (version 0.12.5) with Bowtie 2, requiring perfect matches. A mapping-quality threshold of 10 was applied to retain only uniquely mapped reads. Bismark extractors were used to identify three types of methylation (CHG, CHH and CG). The methylation results were visualized using deepTools⁵².
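The three methylation contexts are defined by the two bases downstream of a cytosine, where H stands for A, C or T. The sketch below classifies a forward-strand cytosine accordingly; it is an illustration of the definitions, not Bismark's own implementation, and the function name is hypothetical. Cytosines on the reverse strand would be handled by applying the same rule to the reverse complement.

```python
from typing import Optional

H_BASES = set("ACT")  # IUPAC code H: any base except G


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at 0-based index i, else None."""
    s = seq.upper()
    if i < 0 or i >= len(s) or s[i] != "C":
        return None
    downstream = s[i + 1 : i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in H_BASES:
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in H_BASES:
            return "CHH"
    # Too close to the sequence end, or an ambiguous base such as N.
    return None
```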

Genome evolutionary analysis

Genome structural syntenic analyses were performed with the MCScanX toolkit²³. Owing to the large number of recent duplicates found in the wheat D genome, the top 20 BLASTP gene pairs with E-values below 1 × 10⁻⁵ were used as input for MCScanX. Paralogous pairs of sequences were identified from the best reciprocal matches in all-by-all BLASTP searches. For each pair of homologous genes, protein sequences were aligned using CLUSTALW2⁵³, and nucleotide sequences were then forced to fit the amino acid alignments using PAL2NAL⁵⁴. Kₛ values were calculated using the Nei–Gojobori algorithm⁵⁵ implemented in the codeml package of PAML²⁹. Orthogroups, or putative gene families, were constructed using the OrthoMCL method⁵⁶.
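Identifying paralogous pairs from best reciprocal matches can be sketched as follows. The input is assumed to be (query, subject, bit score) triples from an all-by-all search; ranking by bit score and skipping self-hits are assumptions, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

ScoredHit = Tuple[str, str, float]  # (query, subject, bit score)


def best_hits(hits: Iterable[ScoredHit]) -> Dict[str, str]:
    """Map each query to its highest-scoring non-self subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # an all-by-all search always matches a gene to itself
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_pairs(hits: Iterable[ScoredHit]) -> List[Tuple[str, str]]:
    """Return sorted gene pairs in which each gene is the other's best hit."""
    bh = best_hits(hits)
    pairs = {tuple(sorted((q, s))) for q, s in bh.items() if bh.get(s) == q}
    return sorted(pairs)
```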

Integrated genetic map and key agronomic genes/QTLs map

We collected all of the known sequences for the molecular markers of the D genome that have been generated in the past three decades, among which there were 735 RFLP markers, 3,536 SSR markers and nearly one million SNP markers. We next anchored these markers to the genome by matching marker sequences with the D genome sequences. QTLs located on the D subgenome of common wheat were also mapped to the integrated map based on their flanking marker information. We also anchored agronomically important genes on the D genome by using a similar approach.
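Placing a QTL from its flanking markers can be sketched as a simple lookup, assuming each marker has already been anchored to a single chromosome position. The marker names and the function below are hypothetical, and the rule of skipping QTLs whose markers are missing or lie on different chromosomes is an assumption rather than the procedure used in this study.

```python
from typing import Mapping, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, position in bp)


def qtl_interval(
    left_marker: str, right_marker: str, marker_pos: Mapping[str, Position]
) -> Optional[Tuple[str, int, int]]:
    """Return (chromosome, start, end) spanned by two flanking markers, or None."""
    a = marker_pos.get(left_marker)
    b = marker_pos.get(right_marker)
    # Skip QTLs with an unanchored marker or markers on different chromosomes.
    if a is None or b is None or a[0] != b[0]:
        return None
    return (a[0], min(a[1], b[1]), max(a[1], b[1]))
```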

Life Sciences Reporting Summary

Further information on experimental design and reagents is available in the Life Sciences Reporting Summary.

Data availability

The genome sequence and the annotation are available from the National Center for Biotechnology Information (NCBI) as BioProject ID PRJNA182898. This Whole Genome Shotgun project is deposited at DDBJ/EMBL/GenBank under accession number AOCO00000000. The version described in this paper is the second version, AOCO02000000. The data that support the findings of this study are available from the corresponding author upon request.

References

Choulet, F. et al. Structural and functional partitioning of bread wheat chromosome 3B. Science 345, 1249721 (2014).
McClintock, B. Chromosome organization and genic expression. Cold Spring Harb. Symp. Quant. Biol. 16, 13–47 (1951).
Orgel, L. E. & Crick, F. H. C. Selfish DNA: the ultimate parasite. Nature 284, 604–607 (1980).
Schnable, P. S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009).
Feschotte, C., Jiang, N. & Wessler, S. R. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3, 329–341 (2002).
Slotkin, R. K. & Martienssen, R. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 8, 272–285 (2007).
Feschotte, C. Transposable elements and the evolution of regulatory networks. Nat. Rev. Genet. 9, 397–405 (2008).
Lisch, D. Epigenetic regulation of transposable elements in plants. Annu. Rev. Plant Biol. 60, 43–66 (2009).
Bucher, E., Reinders, J. & Mirouze, M. Epigenetic control of transposon transcription and mobility in Arabidopsis. Curr. Opin. Plant Biol. 15, 503–510 (2012).
Lisch, D. Regulation of the mutator system of transposons in maize. Methods Mol. Biol. 1057, 123–142 (2013).
Lu, F. et al. High-resolution genetic mapping of maize pan-genome sequence anchors. Nat. Commun. 6, 6914 (2015).
Avni, R. et al. Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science 357, 93–97 (2017).
Brenchley, R. et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 491, 705–710 (2012).
The International Wheat Genome Sequencing Consortium. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014).
Ling, H. Q. et al. Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 496, 87–90 (2013).
Jia, J. et al. Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91–95 (2013).
Zimin, A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 27, 787–792 (2017).
Zou, C. et al. Evolutionary and expression signatures of pseudogenes in Arabidopsis and rice. Plant Physiol. 151, 3–15 (2009).
Thibaud-Nissen, F., Ouyang, S. & Buell, C. R. Identification and characterization of pseudogenes in the rice gene complement. BMC Genomics 10, 317 (2009).
Xiao, J. et al. Pseudogenes and their genome-wide prediction in plants. Int. J. Mol. Sci. 17, 1991 (2016).
Law, M. et al. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiol. 167, 25–39 (2015).
Jiao, Y., Li, J., Tang, H. & Paterson, A. H. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. The Plant Cell 26, 2792–2802 (2014).
Wang, Y. et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucl. Acids Res. 40, e49 (2012).
Tang, H., Bowers, J. E., Wang, X. & Paterson, A. H. Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl Acad. Sci. USA 107, 472–477 (2010).
International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
The International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).
Devos, K. M. Grass genome organization and evolution. Curr. Opin. Plant Biol. 13, 139–145 (2010).
Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997).
Pink, R. C. & Carter, D. R. Pseudogenes as regulators of biological function. Essays in Biochemistry 54, 103 (2013).
Poliseno, L. et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465, 1033–1038 (2010).
Law, J. A. & Jacobsen, S. E. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet. 11, 204–220 (2010).
Li, Q. & Eichten, S. R. Genetic perturbation of the maize methylome. The Plant Cell 26, 4602–4616 (2014).
Eichten, S. R. & Stuart, T. DNA methylation profiles of diverse Brachypodium distachyon align with underlying genetic diversity. Genome Res. 26, 1520–1531 (2016).
Li, X. et al. Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression. BMC Genomics 13, 300 (2012).
Alonso, C., Perez, R., Bazaga, P. & Herrera, C. M. Global DNA cytosine methylation as an evolving trait: phylogenetic signal and correlated evolution with genome size in angiosperms. Front. Genet. 6, 4 (2015).
Li, Q. et al. RNA-directed DNA methylation enforces boundaries between heterochromatin and euchromatin in the maize genome. Proc. Natl Acad. Sci. USA 112, 14728–14733 (2015).
Gent, J. I. et al. CHH islands: de novo DNA methylation in near-gene chromatin regulation in maize. Genome Res. 23, 628–637 (2013).
Regulski, M. et al. The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Res. 23, 1651–1662 (2013).
Sharp, P. J., Chao, S., Desai, S. & Gale, M. D. The isolation, characterization and application in the Triticeae of a set of wheat RFLP probes identifying each homoeologous chromosome arm. Theor. Appl. Genet. 78, 342–348 (1989).
Somers, D. J., Isaac, P. & Edwards, K. A high-density microsatellite consensus map for bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 109, 1105–1114 (2004).
Wang, S. et al. Characterization of polyploid wheat genomic diversity using a high-density 90,000 single nucleotide polymorphism array. Plant Biotechnol. J. 12, 787–796 (2014).
Winfield, M. O. et al. High-density SNP genotyping array for hexaploid wheat and its secondary and tertiary gene pool. Plant Biotechnol. J. 14, 1195–1206 (2016).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Yu, X. J., Zheng, H. K., Wang, J., Wang, W. & Su, B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics 88, 745–751 (2006).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389–3402 (1997).
Yuan, Y. W. & Wessler, S. R. The catalytic domain of all eukaryotic cut-and-paste transposase superfamilies. Proc. Natl Acad. Sci. USA 108, 7884–7889 (2011).
Han, F., Lamb, J. C. & Birchler, J. A. High frequency of centromere inactivation resulting in stable dicentric chromosomes of maize. Proc. Natl Acad. Sci. USA 103, 3238–3243 (2006).
Gill, B. S., Friebe, B. & Endo, T. R. Standard karyotype and nomenclature system for description of chromosome bands and structural aberrations in wheat (Triticum aestivum). Genome 34, 830–839 (1991).
Tang, Z., Yang, Z. & Fu, S. Oligonucleotides replacing the roles of repetitive sequences pAs1, pSc119.2, pTa-535, pTa71, CCS1, and pAWRC.1 for FISH analysis. J. Appl. Genet. 55, 313–318 (2014).
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Ramírez, F., Dündar, F., Diehl, S., Grüning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucl. Acids Res. 42, 187–191 (2014).
Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucl. Acids Res. 34, W609–W612 (2006).
Nei, M. & Gojobori, T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426 (1986).
Li, L., Stoeckert, C. J. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).

Acknowledgements

We thank J. Dvorak and M.C. Luo for the AL8/78 line; L. Pan, L.L. Zheng, Y.H. Liu, X.Y. Zhang and D.P. Li for material preparation. This research was supported by the National Key R&D Program for Crop Breeding (2016YFD0101004), the National Natural Science Foundation of China (31261140368), and the CAAS-Innovation Team Project. Y.N. Jiao acknowledges start-up funding from the Youth Thousand Talents Program.

Author information

Guangyao Zhao, Cheng Zou and Kui Li contributed equally to this work.

Authors and Affiliations

Key Laboratory of Crop Gene Resources and Germplasm Enhancement, Ministry of Agriculture, The National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
Guangyao Zhao, Cheng Zou, Lifeng Gao, Xu Liu, Long Mao, Xiuying Kong & Jizeng Jia
Novogene Bioinformatics Institute, 100083, Beijing, China
Kui Li, Kai Wang & Wenkai Jiang
Agronomy College, Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, 450002, Zhengzhou, China
Tianbao Li
State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, 100093, Beijing, China
Xiaoxia Zhang & Yuannian Jiao
Center for Information in Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, 610054, Chengdu, China
Hongjin Wang & Zujun Yang

Contributions

J.J., Y.J., X.K., L.M. and W.J. initiated the project and designed the study. G.Z., C.Z., K.L., K.W., T.L., L.G., X.Z., H.W. and Z.Y. performed the research. G.Z., C.Z., K.L., K.W., T.L., L.G., X.Z., H.W. and Z.Y. generated and analysed the data. J.J., Y.J., X.K., W.J., G.Z., C.Z., K.L. and X.L. wrote the paper.

Corresponding authors

Correspondence to Wenkai Jiang, Long Mao, Xiuying Kong, Yuannian Jiao or Jizeng Jia.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Supplementary Figures 1–20, Supplementary Tables 1–24, Supplementary References

Life Sciences Reporting Summary

Supplementary Table 25

Important agronomic genes isolated by map-based cloning approach in major crops

Supplementary Table 26

Mapping result of QTLs and important agronomic genes in major crops

Supplementary Table 27

Mapping results of SSR markers on the Ae. tauschii genome

Supplementary Table 28

Mapping results of RFLP markers on the Ae. tauschii genome

Supplementary Table 29

Mapping results of near one million SNP probes on the Ae. tauschii genome

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhao, G., Zou, C., Li, K. et al. The Aegilops tauschii genome reveals multiple impacts of transposons. Nature Plants 3, 946–955 (2017). https://doi.org/10.1038/s41477-017-0067-8

Received: 21 September 2017
Accepted: 30 October 2017
Published: 20 November 2017
Issue Date: December 2017
DOI: https://doi.org/10.1038/s41477-017-0067-8
