The genome of the extremophile crucifer Thellungiella parvula

Dassanayake, Maheshi; Oh, Dong-Ha; Haas, Jeffrey S; Hernandez, Alvaro; Hong, Hyewon; Ali, Shahjahan; Yun, Dae-Jin; Bressan, Ray A; Zhu, Jian-Kang; Bohnert, Hans J; Cheeseman, John M

doi:10.1038/ng.889

Letter
Published: 07 August 2011

Nature Genetics volume 43, pages 913–918 (2011)

Abstract

Thellungiella parvula¹ is related to Arabidopsis thaliana and is endemic to saline, resource-poor habitats², making it a model for the evolution of plant adaptation to extreme environments. Here we present the draft genome for this extremophile species. Exclusively by next generation sequencing, we obtained the de novo assembled genome in 1,496 gap-free contigs, closely approximating the estimated genome size of 140 Mb. We anchored these contigs to seven pseudo chromosomes without the use of maps. We show that short reads can be assembled to a near-complete chromosome level for a eukaryotic species lacking prior genetic information. The sequence identifies a number of tandem duplications that, by the nature of the duplicated genes, suggest a possible basis for T. parvula's extremophile lifestyle. Our results provide essential background for developing genomically influenced testable hypotheses for the evolution of environmental stress tolerance.

Main

According to phylogenetic studies based on fossil evidence³, the split between A. thaliana and the main Brassica group encompassing T. parvula in the subclade Eutremeae is thought to have occurred about 43 million years ago. Both T. parvula and A. thaliana have similar genome sizes, and their close taxonomic relationship provides unique opportunities for tracing evolutionary rearrangements between the two species.

The main goal of this project was to produce a de novo, scaffold-level, gap-free assembly of the T. parvula genome. To achieve this, we used second generation sequencing exclusively, including ROCHE-454 GS FLX Titanium sequencing for its read length advantage and Illumina GA2 sequencing for its higher quality reads. We included varying insert sizes of paired-end libraries in addition to single-end reads (Online Methods). In total, we obtained 7.8 × 10⁹ high quality basepairs, equivalent to ∼50-fold genome coverage. Of these, 85% came from the 454 sequencing (Supplementary Fig. 1 and Supplementary Table 1).
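The coverage figure follows from simple arithmetic, sketched below. The function name is illustrative and not part of the study's pipeline; the genome sizes are the assembled size and the flow-cytometry estimate reported elsewhere in this article.

```python
def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_bases / genome_size_bp


TOTAL_BASES = 7.8e9   # high-quality bases reported in the text
FRACTION_454 = 0.85   # share of bases from 454 sequencing, from the text

# Genome sizes in bp: assembled size and flow-cytometry estimate from the article.
for label, size in [("assembled, 137.09 Mb", 137.09e6), ("estimated, 160 Mb", 160e6)]:
    depth = fold_coverage(TOTAL_BASES, size)
    print(f"{label}: {depth:.1f}x total, {depth * FRACTION_454:.1f}x from 454 reads")
```

Depending on which genome size is used as the denominator, the depth falls on either side of the roughly 50-fold value quoted above.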

In the absence of genetic maps, with only limited physiological studies⁴, without prior genome information and with only very limited transcriptome sequences, we used an iterative hybrid approach to construct a draft genome (Online Methods). The result was a total of 1,496 meta-contigs (scaffolds) of merged primary contigs, ranging in size from 1 kb to 13.08 Mb (Table 1). However, unlike typical scaffold sequences, these meta-contigs were free of gaps. Overall, 73% of the length of the T. parvula draft genome was represented in 20 contigs longer than 1.5 Mb, and 85% of the sequenced genome was represented by the largest 60 contigs, each 100 kb or greater in length. Based on flow cytometry of propidium-iodide–stained nuclei⁵, the T. parvula genome had previously been estimated to be about 160 Mb, or 15% larger than that of A. thaliana. The total size of the curated and assembled T. parvula genome sequence space, however, was 137.09 Mb. This discrepancy is similar to those for A. thaliana (estimated as ∼150 Mb, or 25% longer than the sequenced genome⁶) and Cucumis sativus (estimated at 30% greater than the draft genome⁷).

Table 1 Overview of the T. parvula draft genome sequence

Syntenic regions between T. parvula contigs and other Brassicaceae chromosomes were apparent after aligning T. parvula contigs with the A. thaliana genome (Fig. 1) and chromosome A3 of Brassica rapa⁸ (Supplementary Fig. 2). The 20 longest contigs covered all five A. thaliana chromosomes, with the exception of positions that approached and included centromeric regions. The largest T. parvula contig, c1 (13.08 Mb), aligned with the entire length of one arm of A. thaliana chromosome 1 (Fig. 1a).

**Figure 1: Macro synteny between *T. parvula* contigs and *A. thaliana* chromosomes.**

For T. parvula contigs and A. thaliana chromosomes, we annotated repetitive elements (Online Methods). Overall, repetitive sequences amounted to 7.5% of the T. parvula genome based on similarity searches against genomic repeat databases and de novo clustering of repetitive sequences (Supplementary Tables 2,3). Figure 1a and b show repeat distributions in combination with overall sequence alignment comparisons using Circos plots.

Repetitive sequences were distributed unevenly in both species. Repeat-rich sequences were concentrated near the centromeric regions in A. thaliana chromosomes⁶, as reported for other plant genomes^9,10; these sequences were, however, enriched toward the ends of T. parvula contigs (Fig. 1a). As a result of established difficulties in assembling repetitive sequences¹¹, we found repeat-rich sequences more frequently among the smaller T. parvula contigs (Fig. 1b). Thus, the average repeat content in the largest 20 contigs was 5.5%, whereas the next 40 contigs, c21–c60, contained 17.5% repeat content.

We predicted gene models using FGENESH++, GENSCAN and BLAST (see URLs) searches to minimize false positive predictions. We based annotations on sequence similarity identified using independent BLAST searches and the Blast2GO pipeline (Online Methods). We manually inspected predicted open reading frames (ORFs) whose length deviated more than 20% from the putative A. thaliana homologs for exon merging or splitting. T. parvula contained a total of 28,901 predicted protein-coding ORFs. This is about 7% more than A. thaliana, which contains 27,059 protein-coding complementary DNAs (cDNAs) (excluding chloroplast and mitochondrial genes and based on the TAIR9 release). We mapped Illumina short read sequences from the transcriptome of young T. parvula plant tissues to 19,176 of these predicted ORFs (Online Methods and Supplementary Table 4).

The mean size of the predicted ORFs was 1,252 bp, with 71% of the ORFs between 201 bp and 1,500 bp in length (Fig. 2a). This distribution is similar to that of A. thaliana protein-coding cDNAs (Supplementary Fig. 3). The GC contents were substantially higher in exons than in introns and intergenic regions (Table 1). Based on sequence similarity searches to the NCBI nucleotide database, the primary matches for the T. parvula predicted ORFs were most frequently coding regions from Arabidopsis lyrata (53%), A. thaliana (29%) and B. rapa (5%) (Supplementary Table 5). BLASTn searches of T. parvula ORFs against A. thaliana cDNAs identified 25,783 (89%) hits (e value < 0.00001). Among these, 21,523 ORFs were of very similar lengths (80–120%) to their putative A. thaliana homologs (Fig. 2b). The arrangement of predicted ORFs in the T. parvula genome showed extensive macro-synteny with A. thaliana with infrequent rearrangements (Supplementary Table 4), mirroring the genome-wide alignments observed between A. thaliana chromosomes and T. parvula contigs. Each of the 20 largest T. parvula contigs consisted mostly of ORFs that shared sequence similarity with genes from a single A. thaliana chromosome, the exception being contig c3, which shared similarity with genes from three chromosomes (Supplementary Table 6).
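The 80–120% length criterion can be expressed as a small classifier. This is a minimal sketch, assuming ORF and homolog lengths are already paired from the BLASTn results; the function name and the example pairs are hypothetical.

```python
def length_ratio_class(orf_len: int, homolog_len: int,
                       lo: float = 0.8, hi: float = 1.2) -> str:
    """Classify an ORF by its length relative to its putative homolog."""
    if orf_len <= 0 or homolog_len <= 0:
        raise ValueError("lengths must be positive")
    ratio = orf_len / homolog_len
    if ratio < lo:
        return "shorter"
    if ratio > hi:
        return "longer"
    return "similar"


# Hypothetical (ORF length, homolog length) pairs in bp, for illustration only.
pairs = [(1000, 1000), (700, 1000), (1300, 1000)]
for orf_len, homolog_len in pairs:
    print(orf_len, homolog_len, length_ratio_class(orf_len, homolog_len))
```

ORFs classified as "shorter" or "longer" correspond to those that were inspected manually for exon merging or splitting.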

**Figure 2: Prediction and annotation of ORFs in the *T. parvula* draft genome.**

A total of 3,118 predicted ORFs had no BLASTn hits to A. thaliana cDNAs even at lowered stringency levels (e value > 0.001). We have listed these as unidentified ORFs in Supplementary Table 4. Notably, these putative ORFs were enriched in regions containing larger numbers of repetitive sequences, possibly indicating T. parvula–specific transposable elements (for example, contigs c17 and c18 in Supplementary Table 6 and the histograms in the outer circle of Fig. 1a). The draft genome also includes 86.6 kb of noncoding RNAs based on sequence searches against microRNA (miRNA) and other noncoding RNA databases (Supplementary Tables 2,7).

We assigned Gene Ontology (GO) terms for the T. parvula predicted ORFs using the Blast2GO pipeline¹² and compared them with the A. thaliana transcriptome (Fig. 2 and Supplementary Table 8). In the GO class 'biological processes', subcategories of 'response to abiotic or biotic stimulus' and 'developmental processes' were enriched in T. parvula, whereas genes in the subcategory 'signal transduction' were underrepresented (Fig. 2c). In the GO class 'molecular function', we found the subcategories of 'transporter activity' and 'receptor binding or activity' to be significantly different between the species (Fig. 2d). Among genes annotated as performing transporter activities, the numbers of ATPase and nucleotide, cation and sugar transporters were significantly higher in T. parvula than in A. thaliana (Table 2). These differences may reflect the different habitats and environmental pressures to which the species adapted. ATPase and nucleotide transporters with functions in pH homeostasis and cellular energy generation have, for example, been related to protection under salinity stress^13,14, whereas transport and accumulation of soluble sugars or polyols are considered key mechanisms that provide osmotic stress tolerance¹⁵. We found the most significant difference in gene copy numbers for transporters of cations other than Na⁺ and K⁺, perhaps reflecting the adaptation of T. parvula to soils that are not only saline but also imbalanced in other ions^2,16.

Table 2 Detailed comparison of GO categories with transporter activity or receptor binding or activity between T. parvula and A. thaliana

Gene copy number variation has also been proposed as a major mechanism of phenotypic differentiation and as reflecting evolutionary adaptation to the environment^17,18,19. The T. parvula genome included 1,842 more predicted ORFs than protein coding cDNAs in A. thaliana (Supplementary Fig. 3). Mirroring the observed differences in the GO subcategories, the T. parvula genome contained higher copy numbers of orthologous genes related to stress adaptation, for example, AVP1, HKT1, NHX8 (ref. 20), CBL10 (ref. 21) and MYB47 (Supplementary Table 4).

Gene duplication as a vehicle for evolution has long been hypothesized²², and experimental evidence of this has recently been accumulating^7,23,24,25. In both T. parvula and A. thaliana, the major role in generating copy number variation has been played by tandem gene duplication (Fig. 3a) rather than by large gains or losses in segment composition following the A. thaliana and T. parvula divergence after the most recent whole-genome duplication³. We found a total of 1,278 and 1,113 tandem duplication events in the T. parvula and A. thaliana genomes, respectively. Only half of these were shared between the two species (Fig. 3b and Supplementary Table 9). Inspection of the GO class representation of tandem duplications revealed significantly different GO 'biological process' (Fig. 3c) and GO 'molecular function' (Fig. 3d) subcategories. Differences in gene numbers in the subcategories 'response to abiotic or biotic stimulus' and 'developmental processes' (Fig. 2c) were most prominent among genes multiplied by tandem duplication (Fig. 3c), as supported by substantially lower P values (Supplementary Table 8).

**Figure 3: Comparison of local tandem duplication (T.D.) events in the *A. thaliana* genome and the *T. parvula* draft genome.**

Finally, Figure 4 and Supplementary Table 10 show the assembly of the T. parvula contigs into the seven chromosomes that characterize this species. The evolution of chromosome structures in Brassicaceae has previously been traced through comparative 'chromosome painting' techniques using BAC-size sequence probes from the A. thaliana genome²⁶. With these techniques, Lysak and colleagues²⁶ identified large genome segments, termed A to X, derived from an ancestral karyotype (n = 8). These ancestral karyotypes can be found in different assemblages in chromosome structures of different Brassicaceae clades²⁶, including A. thaliana²⁷ (Fig. 4a) and Eutremeae²⁸ (n = 7). Using these as guides, the 40 largest T. parvula contigs could be unambiguously assembled into seven chromosomes²⁸ covering 114.39 Mb (83% of the draft genome) (Fig. 4b and Supplementary Table 10). Each of these has a distinct, repeat-rich region, signifying the centromere (Fig. 4c, outer histogram). That the five largest contigs (c1–c5) covered the entire lengths of single chromosome arms attests to the quality of the de novo assembly. It is further noteworthy that the genomic regions in T. parvula contigs c2 and c3, although showing extensive rearrangements compared to the A. thaliana genome sequence, matched distinct ancestral karyotype blocks (Supplementary Fig. 4a, ancestral karyotype blocks R and W for c2, and Supplementary Fig. 4b, ancestral karyotype blocks V, K, L, Q, V′ and X for c3). Thus, our model for the T. parvula chromosomes provides sequence-based evidence for the Lysak model for crucifer species with n = 7, including the clade Eutremeae²⁸. It also defines the boundaries of ancestral karyotype blocks more clearly and suggests more detailed structure than can be captured by chromosome painting experiments alone. This is particularly clear with respect to ancestral karyotype block V, which, based on sequence information, was divided into the blocks V and V′ in the T. parvula genome (Fig. 4b and Supplementary Fig. 4b). Also, ancestral karyotype block I extended to the pericentromeric region of T. parvula chromosome 4 (Fig. 4b,c) rather than falling entirely to one side of the centromere, as previously indicated by the chromosome painting experiments in various crucifer species with n = 7 (ref. 28).

**Figure 4: Assembly of the seven chromosomes of *T. parvula*.**

A number of angiosperm families include extremophile species, although fewer than 10% of all plant species may be classified this way. Extremophiles' presence in evolutionarily distinct lineages reveals genetic complexities that appear to have evolved from the common genetic makeup of all plants. In adaptation to various combinations of environmental stresses, these extremophiles show tolerance of stresses against which crop plants in particular have no defenses. Knowing how extremophiles operate can, however, instruct us about the underlying genetic requisites and mechanisms for successful stress defenses. In this report, we have now shown that it is possible to determine the genome sequence of extremophiles, as well as model glycophytes, exclusively relying on next-generation DNA sequencing tools and de novo assembly.

The availability of the T. parvula genome provides a unique view of chromosome structure, organization and gene complement. Of particular importance is the comparison of this genome with that of the related A. thaliana, which is unquestionably a stress-sensitive species. In our initial analysis, this halophyte, with a genome only ∼15% larger than that of A. thaliana, shows striking differences in gene complement. The differences are partly because of tandem duplications in T. parvula of single copy genes in A. thaliana and preferential amplification of genes with known or assumed functions in stress defense responses. Within these differences, we expect, lie the unique solutions to understanding T. parvula's particular lifestyle and adaptation to its demanding ecological niche. More detailed examination of genome structure, coding complexity, and gene structure and expression in stress response pathways in comparative studies will point the way toward correlating the T. parvula phenotype with its genetic makeup.

URLs.

SeqAnswers online forum, http://seqanswers.com/; GENSCAN, http://genes.mit.edu/GENSCAN.html; BLAST, http://blast.ncbi.nlm.nih.gov/Blast.cgi; FGENESH++, http://linux1.softberry.com/berry.phtml?topic=fgeneshplus2; TAIR GOslim, ftp://ftp.arabidopsis.org/home/tair/Ontologies/; sff_extract program, http://bioinf.comav.upv.es/sff_extract/; Vmatch suite, http://www.vmatch.de/; AMOS pipeline, http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS; Repbase, http://www.girinst.org/repbase/; Plant Repeat database, ftp://ftp.plantbiology.msu.edu/pub/data/TIGR_Plant_Repeats/; TAIR9 cDNA database, ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR9_blastsets/TAIR9_cdna_20090619; miRBase, http://www.mirbase.org/; Rfam database, http://rfam.sanger.ac.uk/; GraphPad QuickCalcs, http://www.graphpad.com/quickcalcs/contingency1.cfm.

Methods

Plant material and DNA extraction.

Total DNA was isolated from 10-day-old seedlings of T. parvula. The seeds were derived from a single plant propagated from single seeds over eight successive generations. The original accession was collected from a salt lake in Tuz Golu, central Turkey at an elevation of 905 m above sea level. At the collection site, the soil bulk density was 1.225 g/cm³ with 32.4% salts by weight. Genomic DNA was prepared using the Nucleon Phytopure Genomic DNA Extraction kit (GE Healthcare).

Strategy for a highly contiguous draft genome.

Compared to Sanger sequencing, the shorter reads associated with either 454 or Illumina sequencing manifest decreased connectivity. As a result, considerably deeper coverage is required to generate contiguous assemblies. Deeper coverage alone, however, does not in itself solve the problem of fragmented assemblies; if reads are shorter than a repeat, gaps are unavoidable, and with deeper coverage, accumulated sequencing errors make assembly more computationally challenging. In assembling the T. parvula genome, the problem was mitigated by (i) using reads from different technologies, (ii) using paired reads with different insert lengths to span different repeat lengths and (iii) computationally selecting high quality reads.

Overview of sequencing, assembly and annotation.

Library construction and sequencing were performed in the W.M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign. Random shotgun genomic libraries were constructed according to the manufacturer's recommendations for each of the two sequencing platforms, GS FLX Titanium (454 Life Sciences) and Illumina GA2 (Illumina). Newbler (454-Roche), ABySS²⁹ and minimus2 (ref. 30) were used as the main assembly programs to generate the draft genome, and FGENESH++ (SoftBerry), GENSCAN, BLAST (see URLs) and Blast2GO¹² were used to predict and annotate gene models.

DNA library preparation and sequencing.

For 454 pyrosequencing, both shotgun and paired-end libraries were constructed. Genomic DNA was randomly sheared by nebulization to fragments of 500–800 bp in length to construct two shotgun libraries. Additional DNA was processed to construct paired-end libraries with size spans of 3 kb (three libraries), 8 kb (two libraries) and 20 kb (two libraries). All libraries were constructed, clonally amplified and sequenced on the 454 Genome Sequencer FLX-Titanium according to the manufacturer's kits and protocols (454 Life Sciences). Signal processing and base calling were performed using the bundled GS FLX software version 2.0.01.

For Illumina sequencing, genomic DNA was nebulized, and fragments 200–500 bp in length were size selected to construct a shotgun library using the Illumina Genomic DNA Sample Prep kit (Illumina). The library was sequenced on three lanes of a flowcell from one end (single read) for 81 cycles on a Genome Analyzer IIx. The Illumina Pipeline 1.5 was used to generate fastq sequence files from the raw data.

Hybrid genome assembly.

A combined total of 7.8 × 10⁹ bases resulted from sequencing using both platforms. Average read sizes were 355 bp and 80 bp for the 454 and Illumina sequences, respectively. Approximately 85% of all sequences were derived from the 454 sequencing. We followed an iterative approach for assembly starting from raw sequence reads assembled into primary contigs. We used two assembly programs and combined the primary contigs and paired-end data to build scaffolds in successive assemblies. Single and paired-end 454 sequences were assembled using the Roche GS assembler, Newbler (version 2.0.01.14), with a 40 bp minimum overlap and 90% identity. In both instances, reads were first assembled as single-end reads, after which the paired-end information was used to construct scaffolds.

To assemble Illumina reads, we tested both Velvet³¹ (v1.3) and ABySS²⁹ (v1.2) short-read assemblers using only reads that passed the Illumina chastity filter (base call values for chastity greater than 0.6 in the first 25 cycles); the k-mer size was set to 31 bp, and the coverage cutoff was set to 4. Both assemblers produced comparable results, but ABySS was much faster and was, therefore, chosen for further optimized short read assemblies. We also used Newbler contigs as single reads with the Illumina reads as the input in ABySS. We tested every odd-numbered length from 29 bp to 61 bp as the k-mer size to find the optimal size, meaning that which yielded the longest N50 and the fewest total contigs while maintaining total contig length near the flow-cytometry–estimated genome size of 160–180 Mb.
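The selection criterion for the k-mer size can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually scripted for the study: the function names, the 25% tolerance on total length, the tie-breaking order and the toy contig lengths are all hypothetical.

```python
def n50(lengths):
    """Contig length at which the running total of length-sorted contigs
    first reaches half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


def pick_k(assemblies, target_bp, tolerance=0.25):
    """Choose the k whose assembly has the largest N50 (ties: fewest contigs),
    considering only assemblies whose total length is near the target size."""
    candidates = {
        k: lens for k, lens in assemblies.items()
        if lens and abs(sum(lens) - target_bp) <= tolerance * target_bp
    }
    if not candidates:
        raise ValueError("no assembly is close enough to the target size")
    return max(candidates, key=lambda k: (n50(candidates[k]), -len(candidates[k])))


# Toy contig lengths keyed by k-mer size (hypothetical values, not study data).
toy = {29: [40, 30, 20, 10], 41: [70, 20, 10], 61: [30, 10]}
print(pick_k(toy, target_bp=100))
```

In practice the three criteria can pull in different directions, so a ranking like this one is only one reasonable way to combine them.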

Because ABySS can be very sensitive to sequencing errors such as short indels, when using raw 454 reads in ABySS, custom Perl scripts were used to remove any raw 454 read that had homopolymers exceeding 10 bases (not all 454 reads with homopolymer errors can be removed this way, but their number can be minimized). To enable the scaffold generating step in ABySS to proceed when 454 raw paired-end reads were used, the program sff_extract (see URLs) was used to process the standard flowgram (sff) files generated by the GS FLX sequencer. Different k-mer sizes were selected based on the different paired-end libraries used for scaffolding, but in most instances, the optimum k-mer size found with ABySS was 41.
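The homopolymer filter described above was implemented in custom Perl scripts that are not reproduced here. A minimal Python sketch of the same idea, assuming reads are available as (name, sequence) pairs and that "exceeding 10 bases" means a run of 11 or more identical bases, could look like this; all names are illustrative.

```python
import re


def make_run_pattern(max_run: int):
    """Regex matching any base repeated more than max_run times in a row."""
    # One captured base followed by at least max_run further copies of it.
    return re.compile(r"([ACGTN])\1{%d,}" % max_run, re.IGNORECASE)


def filter_reads(reads, max_run: int = 10):
    """Keep only reads without a homopolymer run longer than max_run."""
    pattern = make_run_pattern(max_run)
    return [(name, seq) for name, seq in reads if not pattern.search(seq)]


# Hypothetical reads for illustration.
reads = [
    ("read1", "ACGT" + "A" * 10 + "CGT"),   # run of exactly 10: kept
    ("read2", "ACGT" + "T" * 11 + "CGA"),   # run of 12 (T before the run): removed
    ("read3", "GGCATCGATCG"),               # no long run: kept
]
print([name for name, _ in filter_reads(reads)])
```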

The collection of contigs and scaffolds created in the primary assemblies was an overlapping set with a high level of redundancy. To select a non-redundant set, we used mkvtree in the Vmatch suite (see URLs) to index the sequences by length; Vmatch was used to cluster sequences, including clusters with size of 1 (singlets). We matched for 100% identity and full coverage of the smaller sequences in pairs. Contigs longer than 1,000 bp were used for further processing. This set was further inspected with all-against-all BLAST searches and the aligner NUCmer in MUMmer 3.22 (ref. 32) to remove duplicate contigs that may have been assembled for the same region of the genome.
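The core of the redundancy filter, discarding any sequence that is contained at 100% identity within a longer one, can be illustrated with a naive sketch. This is not the Vmatch algorithm or its interface; it is a quadratic-time stand-in that assumes exact substring containment on either strand, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N is left as N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def nonredundant(contigs, min_len: int = 1000):
    """Keep contigs longer than min_len that are not exactly contained,
    on either strand, in a longer contig that was already kept."""
    kept = []
    for seq in sorted((c.upper() for c in contigs), key=len, reverse=True):
        if len(seq) <= min_len:
            continue
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept):
            continue
        kept.append(seq)
    return kept


# Toy sequences; min_len is lowered so the short examples are not discarded.
toy = ["ACGTACGGTTCA", "GTACGG", "CCGTAC", "TTTTGG"]
print(nonredundant(toy, min_len=3))
```

A suffix-index-based tool is needed at genome scale; the sketch only shows which sequences the criterion removes.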

The meta-assembly of selected contigs from primary assemblies was carried out with the overlap-layout-consensus assembler, Minimus2 in the AMOS pipeline (see URLs), using a minimum 40-bp overlap with 95% identity. The resulting contigs and singlets were combined and purged of further redundancy, contaminating DNA and mitochondrial and chloroplast DNA using BLAST searches. This resulted in 1,496 contigs with a total length of ∼137 Mb.

Genome annotation.

The T. parvula draft genome was masked for repetitive sequences by RepeatMasker³³ searching Repbase 14.01 and with BLASTn using the Plant Repeat database (see URLs). The masked contigs for known repetitive elements were further analyzed with NUCmer and custom scripts to search for long tandem repeats and for T. parvula–specific unclassified, non-exact, long repeats. Any sequences that were found more than five times were considered as repeats in this search.

FGENESH++ (SoftBerry) was used to predict protein coding ORFs in the T. parvula draft genome masked for repetitive sequences, with parameters optimized for dicot plants and protein sequences from the NCBI non-redundant (NR) database as reference. A total of 29,338 ORFs were predicted, of which 437 were further annotated as transposable elements based on BLASTn searches. Genomic regions that contained FGENESH++-predicted ORFs with lengths similar to their Arabidopsis homologs (± 20%) were tested with another gene prediction program, GENSCAN. When the predictions from the two programs deviated for the same genomic region, the ORF closest in length to another known homologous cDNA was taken as the more likely prediction. All genomic contigs and predicted ORFs were searched against NCBI nucleotide and protein databases and TAIR9 cDNA database (see URLs) using BLASTn and BLASTx searches. The predicted proteins were further annotated with the Blast2GO pipeline¹² to assign GO and GOslim-plant terms based on NCBI plant databases and InterProScan³⁴. To obtain experimental evidence for our ab initio predictions, we mapped the ORFs to high quality Illumina reads trimmed to 80 bp from a transcriptome sequence library generated from young seedlings. Using the program Bowtie³⁵ with 100% identity to a minimum length of 50 bp and with '-m' set to 1 to ensure unique mapping, we found that 73% of the high quality reads mapped uniquely to the predicted ORFs (Supplementary Table 4). The remaining reads are too repetitive in nature, map to multiple ORFs or contain low complexity regions and are therefore unusable in mapping.

BLAST searches were performed to identify miRNA genes and other RNA genes by searching against the miRBase database of plant miRNA collections (release 16) and the Rfam database (see URLs) (release 10) for other non-coding RNA families including rRNA and tRNA genes.

Statistical analyses.

When comparing the distributions of GO subcategories between A. thaliana and T. parvula (Figs. 2 and 3), two-tailed χ² tests were used (see URLs). For each GO subcategory, a 2 × 2 contingency table was constructed by recording the numbers of genes included or not included in a subcategory for each species and ranking the statistical significance of the differences.
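The 2 × 2 test can be reproduced with a few lines of standard-library Python. The sketch below assumes a Pearson χ² statistic without continuity correction, which may differ from the option chosen in the online calculator; the function name and the in-category counts in the example are hypothetical, whereas the gene totals are those given in the text.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and two-tailed P value (1 degree of
    freedom) for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: species; columns: genes in / not in a GO subcategory.
T_PARVULA_TOTAL, A_THALIANA_TOTAL = 28901, 27059   # totals from the text
in_tp, in_at = 1200, 900                            # hypothetical counts
chi2, p = chi_square_2x2(in_tp, T_PARVULA_TOTAL - in_tp,
                         in_at, A_THALIANA_TOTAL - in_at)
print(f"chi2 = {chi2:.2f}, P = {p:.3g}")
```

Repeating the calculation for every subcategory and sorting by P value gives the ranking described above.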

Accession codes.

The raw reads for this project are deposited in the NCBI SRA project under the accession number SRA026763. The Illumina reads can be accessed under SRX047632 and the 454 reads under SRX032604. The genome assembly is deposited with the NCBI Genome Project ID 63843, and the sequences are deposited with the GenBank ID AFAN00000000.1.

References

Al-Shehbaz, I.A. & O'Kane, S.L. Placement of Arabidopsis parvula in Thellungiella (Brassicaceae). Novon. 5, 309–310 (1995).
Article Google Scholar
Amtmann, A. Learning from evolution: Thellungiella generates new knowledge on essential and critical components of abiotic stress tolerance in plants. Mol. Plant 2, 3–12 (2009).
Article CAS Google Scholar
Beilstein, M.A. et al. Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 107, 18724–18728 (2010).
Article CAS Google Scholar
Orsini, F. et al. A comparative study of salt tolerance parameters in 11 wild relatives of Arabidopsis thaliana. J. Exp. Bot. 61, 3787–3798 (2010).
Article CAS Google Scholar
Oh, D.-H. et al. Genome structures and halophyte-specific gene expression of the extremophile Thellungiella parvula in comparison with Thellungiella salsuginea (Thellungiella halophila) and Arabidopsis. Plant Physiol. 154, 1040–1052 (2010).
Article CAS Google Scholar
Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275–1281 (2009).
Article CAS Google Scholar
Mun, J.-H. et al. Sequence and structure of Brassica rapa chromosome A3. Genome Biol. 11, R94 (2010).
Article Google Scholar
Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
Article CAS Google Scholar
International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).
Alkan, C., Sajjadian, S. & Eichler, E.E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
Article CAS Google Scholar
Conesa, A. & Götz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Intl. J. Plant Genomics 2008, 619832 (2008).
Article Google Scholar
Oh, D.-H. et al. Intracellular consequences of SOS1 deficiency during salt stress. J. Exp. Bot. 61, 1205–1213 (2010).
Article CAS Google Scholar
Gao, F. et al. Cloning of an H⁺-PPase gene from Thellungiella halophila and its heterologous expression to improve tobacco salt tolerance. J. Exp. Bot. 57, 3259–3270 (2006).
Article CAS Google Scholar
Lugan, R. et al. Metabolome and water homeostasis analysis of Thellungiella salsuginea suggests that dehydration tolerance is a key response to osmotic stress in this halophyte. Plant J. 64, 215–229 (2010).
Article CAS Google Scholar
Inan, G. et al. Salt cress. A halophyte and cryophyte Arabidopsis relative model system and its applicability to molecular genetic analyses of growth and development of extremophiles. Plant Physiol. 135, 1718–1737 (2004).
Article CAS Google Scholar
Sudmant, P.H. et al. Diversity of human copy number variation and multicopy genes. Science 330, 641–646 (2010).
Article CAS Google Scholar
Dassanayake, M. et al. Transcription strength and halophytic lifestyle. Trends Plant Sci. 16, 1–3 (2011).
Article CAS Google Scholar
Hastings, P.J. et al. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10, 551–564 (2009).
Article CAS Google Scholar
Craig Plett, D. & Møller, I.S. Na⁺ transport in glycophytic plants: what we know and would like to know. Plant Cell Environ. 33, 612–626 (2010).
Article Google Scholar
Quan, R. et al. SCABP8/CBL10, a putative calcium sensor, interacts with the protein kinase SOS2 to protect Arabidopsis shoots from salt stress. Plant Cell 19, 1415–1431 (2007).
Article CAS Google Scholar
Ohno, S. Evolution by Gene Duplication 160 (Springer, New York, New York, USA, 1970).
Hanada, K. et al. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 148, 993–1003 (2008).
Article CAS Google Scholar
Cannon, S.B. et al. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4, 10 (2004).
Article Google Scholar
DeBolt, S. Copy number variation shapes genome diversity in Arabidopsis over immediate family generational scales. Genome Biol. Evol. 2, 441–453 (2010).
Article Google Scholar
Lysak, M.A. & Koch, M.A. Phylogeny, genome, and karyotype evolution of crucifers (Brassicaceae). in Genetics and Genomics of the Brassicaceae (eds. Schmidt, R. & Bancroft, I.). 1–31 (Springer, New York, New York, USA, 2011).
Lysak, M.A. et al. Mechanisms of chromosome number reduction in Arabidopsis thaliana and related Brassicaceae species. Proc. Natl. Acad. Sci. USA 103, 5224–5229 (2006).
Article CAS Google Scholar
Mandáková, T. & Lysak, M.A. Chromosomal phylogeny and karyotype evolution in x = 7 crucifer species (Brassicaceae). Plant Cell 20, 2559–2570 (2008).
Simpson, J.T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).
Sommer, D.D. et al. Minimus: a fast, lightweight genome assembler. BMC Bioinformatics 8, 64 (2007).
Zerbino, D.R. & Birney, E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Zdobnov, E.M. & Apweiler, R. InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

Acknowledgements

We thank M.P. D'Urzo (Purdue University, West Lafayette, Indiana, USA) for providing plant materials and J.-H. Mun (National Academy of Agricultural Science, Suwon, Korea) for providing the B. rapa chromosome sequence. We also gratefully acknowledge M. Vaughn (University of Texas, Austin, Texas, USA), S. Jackman, M. Krzywinski (Michael Smith Genome Sciences Center, Vancouver, British Columbia, Canada) and the SeqAnswers online forum (see URLs) for advice on genome assembly and visualization. Funding was provided by King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; by the World Class University Program (R32–10148) at Gyeongsang National University, Republic of Korea; and by the Next-generation BioGreen21 Program (SSAC, PJ008025), Rural Development Administration, Republic of Korea.

Author information

Maheshi Dassanayake and Dong-Ha Oh: These authors contributed equally to this work.

Authors and Affiliations

Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Maheshi Dassanayake, Dong-Ha Oh, Jeffrey S Haas, Hyewon Hong, Hans J Bohnert & John M Cheeseman
Office of Networked Information Technology, School of Integrative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Jeffrey S Haas
Center for Comparative & Functional Genomics, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Alvaro Hernandez
Division of Applied Life Science (BK21 program), Gyeongsang National University, Jinju, Korea
Hyewon Hong, Dae-Jin Yun, Ray A Bressan & Hans J Bohnert
Bioscience Core Laboratory-Genomics, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Shahjahan Ali
Department of Horticulture & Landscape Architecture, Purdue University, West Lafayette, Indiana, USA
Ray A Bressan & Jian-Kang Zhu
Center for Plant Stress Genomics and Biotechnology, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Ray A Bressan, Jian-Kang Zhu & Hans J Bohnert
Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
Hans J Bohnert

Authors

Maheshi Dassanayake
Dong-Ha Oh
Jeffrey S Haas
Alvaro Hernandez
Hyewon Hong
Shahjahan Ali
Dae-Jin Yun
Ray A Bressan
Jian-Kang Zhu
Hans J Bohnert
John M Cheeseman

Contributions

M.D., D.-H.O., H.J.B. and J.M.C. designed, performed and analyzed the experiments and wrote the paper; J.S.H. compiled programs and wrote custom scripts; A.H. and S.A. performed sequencing; H.H. prepared materials; D.-J.Y., R.A.B. and J.-K.Z. provided materials and intellectual feedback.

Corresponding authors

Correspondence to Maheshi Dassanayake, Dong-Ha Oh or Dae-Jin Yun.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Dassanayake, M., Oh, DH., Haas, J. et al. The genome of the extremophile crucifer Thellungiella parvula. Nat Genet 43, 913–918 (2011). https://doi.org/10.1038/ng.889

Received: 21 March 2011
Accepted: 24 June 2011
Published: 07 August 2011
Issue Date: September 2011
DOI: https://doi.org/10.1038/ng.889
