The genome of the extremophile crucifer Thellungiella parvula

Journal name: Nature Genetics

Thellungiella parvula¹ is related to Arabidopsis thaliana and is endemic to saline, resource-poor habitats², making it a model for the evolution of plant adaptation to extreme environments. Here we present the draft genome of this extremophile species. Using next-generation sequencing exclusively, we obtained a de novo assembled genome in 1,496 gap-free contigs, closely approximating the estimated genome size of 140 Mb. We anchored these contigs to seven pseudochromosomes without the use of maps. We show that short reads can be assembled to a near-complete chromosome level for a eukaryotic species lacking prior genetic information. The sequence identifies a number of tandem duplications that, by the nature of the duplicated genes, suggest a possible basis for T. parvula's extremophile lifestyle. Our results provide essential background for developing genomically influenced, testable hypotheses for the evolution of environmental stress tolerance.


    Figure 1: Macro synteny between T. parvula contigs and A. thaliana chromosomes.

    Comparison of the 20 largest T. parvula contigs, c1–c20 (a), and the 40 next largest contigs, c21–c60 (b), with A. thaliana chromosomes. A. thaliana chromosomes 1–5 are depicted in red, green, yellow, purple and blue, respectively, with the centromeric regions indicated by black bands. T. parvula contigs are represented by gray blocks. Regions with more than 75% similarity over a minimum of 2,000 bp, with a maximum gap allowance of 1,000 bp, are connected by lines whose colors match those of the A. thaliana chromosomes. Ticks in each chromosome or contig block mark 1-Mb intervals. The distributions of protein coding regions and repetitive sequences are shown in the outer circles, with the percentages of protein coding genes, DNA transposons and retrotransposons shown in blue, yellow and orange, respectively, using a window size of 0.1 Mb. In the T. parvula contigs, predicted protein coding genes without BLASTn hits (E value < 0.0001) against the A. thaliana cDNA database are shown in green.
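
The windowed percentages shown in the outer circles can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `window_coverage`, the 0-based half-open interval convention and the toy coordinates are assumptions made for illustration. Only the 0.1 Mb window size is taken from the caption.

```python
from typing import List, Tuple


def window_coverage(
    features: List[Tuple[int, int]],
    seq_len: int,
    window: int = 100_000,  # 0.1 Mb, as stated in the caption
) -> List[float]:
    """Percentage of each window covered by at least one feature.

    `features` are hypothetical (start, end) intervals, 0-based and
    half-open, assumed to lie within [0, seq_len).
    """
    # Merge overlapping intervals so no base is counted twice.
    merged: List[List[int]] = []
    for start, end in sorted(features):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    n_windows = -(-seq_len // window)  # ceiling division
    covered = [0] * n_windows

    # Split each merged interval at window boundaries.
    for start, end in merged:
        pos = start
        while pos < end:
            w = pos // window
            piece_end = min(end, (w + 1) * window)
            covered[w] += piece_end - pos
            pos = piece_end

    # The last window may be shorter than `window`.
    percentages = []
    for w, bases in enumerate(covered):
        w_len = min(window, seq_len - w * window)
        percentages.append(100.0 * bases / w_len)
    return percentages


# Toy coordinates, not data from the genome.
toy = [(0, 50_000), (90_000, 110_000), (100_000, 120_000)]
print(window_coverage(toy, seq_len=250_000))
```

Running the same function once per feature class (for example coding genes, DNA transposons and retrotransposons) would give one histogram track per class.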

    Figure 2: Prediction and annotation of ORFs in the T. parvula draft genome.

    (a) Length distribution of predicted T. parvula ORFs. (b) Comparison of T. parvula predicted ORFs with A. thaliana cDNAs showing the highest BLASTn hit score. The ratio of T. parvula ORF length to A. thaliana cDNA length is given as a percentage. In both a and b, the vertical axes and numbers above the bars are counts. Comparison of GO 'biological processes' (c) and GO 'molecular function' categories (d) between A. thaliana cDNAs (At) and T. parvula predicted ORFs (Tp). The GO categories are as defined in TAIR GOslim (see URLs). Categories with significant differences calculated using a χ² test, as described in the Online Methods, are indicated as *P < 0.05 or **P < 0.01. In c, the GOslim categories 'other metabolic processes' (GO:0008152), 'other physiological processes' (GO:0007582) and 'other biological processes' (GO:0008150) are not shown. The complete list of cDNA and ORF numbers in each of the GO categories and their associated P values are listed in Supplementary Table 8.
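
As a rough illustration of how a per-category χ² comparison could work, the sketch below tests one GO category with a 2 × 2 table (in category vs. not in category, At vs. Tp). The exact procedure is given in the Online Methods, which are not reproduced here, so the table layout, the absence of a continuity correction, the function name `chi_square_2x2` and the counts are all assumptions.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test on a 2 x 2 table, without continuity correction.

    Assumed layout (hypothetical, for illustration only):
        a = At sequences in the GO category, b = At sequences outside it
        c = Tp sequences in the GO category, d = Tp sequences outside it
    Returns (statistic, P value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d

    observed = [a, b, c, d]
    expected = [
        row1 * col1 / n,
        row1 * col2 / n,
        row2 * col1 / n,
        row2 * col2 / n,
    ]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts, not values from Supplementary Table 8.
stat, p = chi_square_2x2(30, 70, 10, 90)
flag = "**" if p < 0.01 else "*" if p < 0.05 else ""
print(f"chi2 = {stat:.2f}, P = {p:.2g} {flag}")
```

With many GO categories tested side by side, a multiple-testing correction may also be appropriate; whether one was applied is not stated in the caption.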

    Figure 3: Comparison of local tandem duplication (T.D.) events in the A. thaliana genome and the T. parvula draft genome.

    (a) Examples of tandem duplications. Examples shown are for the chromosome and contig regions containing HKT1, CBL10 and MYB47. (b) A Venn diagram showing shared and specific tandem duplication events in T. parvula and A. thaliana. We defined a tandem duplication event as the presence of more than one gene with the same annotation in one location or more than one gene in one location separated by not more than one other gene with a different annotation. The numbers of genes involved in the duplication events are given in parentheses. Tandem duplications of genes with the same annotations in both species are counted as shared events. Comparison of the GO 'biological processes' (c) and 'molecular function' categories (d) between T. parvula ORFs and A. thaliana cDNAs for genes showing tandem duplications. The radial axes are the percentages of cDNA or ORFs in each GO category compared to the number of total tandem duplicated cDNA or ORFs. Categories showing significant differences are marked as *P < 0.05 or **P < 0.01. The number of tandem duplicated cDNAs or ORFs in each GO category and P values are listed in Supplementary Table 8. The complete list of tandem duplicated cDNAs and ORFs is presented as Supplementary Table 9.
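
The tandem duplication definition in the caption can be sketched in Python as follows. This is one possible reading of that definition, not the authors' script: the input format (gene ID and annotation pairs in chromosomal order), the function name `find_tandem_duplications` and the toy annotations are assumptions.

```python
from typing import List, Tuple


def find_tandem_duplications(
    genes: List[Tuple[str, str]],
    max_spacers: int = 1,  # "not more than one other gene" in the caption
) -> List[List[str]]:
    """Group genes with the same annotation into tandem duplication events.

    `genes` is a hypothetical list of (gene_id, annotation) pairs in
    chromosomal order. Consecutive members of an event may be separated
    by at most `max_spacers` genes carrying a different annotation.
    """
    events: List[List[str]] = []
    used = set()

    for i, (_, annotation) in enumerate(genes):
        if i in used:
            continue
        cluster = [i]
        last = i
        j = i + 1
        # Extend while the gap since the last member stays within the limit.
        while j < len(genes) and j - last - 1 <= max_spacers:
            if genes[j][1] == annotation:
                cluster.append(j)
                last = j
            j += 1
        if len(cluster) > 1:
            used.update(cluster)
            events.append([genes[k][0] for k in cluster])
    return events


# Toy gene order with made-up annotations, not data from either genome.
toy_genes = [
    ("g1", "transporter"), ("g2", "transporter"), ("g3", "kinase"),
    ("g4", "sensor"), ("g5", "protease"), ("g6", "sensor"),
    ("g7", "ligase"), ("g8", "helicase"),
    ("g9", "myb"), ("g10", "lyase"), ("g11", "oxidase"), ("g12", "myb"),
]
print(find_tandem_duplications(toy_genes))
# g9 and g12 are not grouped: two unrelated genes lie between them.
```

Counting the events and the genes they contain separately for each species, then intersecting the annotations, would yield the shared and specific sets of the kind summarized in a Venn diagram.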

    Figure 4: Assembly of the seven chromosomes of T. parvula.

    (a) Outline of the ancestral karyotype segments determined by comparative chromosome painting techniques²⁶,²⁷ in A. thaliana chromosomes. The ancestral karyotype segments, denoted A to X, are drawn to scale based on the A. thaliana genome sequence. (b) T. parvula contigs aligned to the Eutremeae (n = 7) karyotype schema²⁶,²⁸ and the ORFs defining the borders of the ancestral karyotype segments. A. thaliana locus IDs showing the highest homology with each ORF are given in parentheses. Shown are T. parvula contigs covering the ancestral karyotype segments. Complete chromosome assignments of the 40 largest contigs, including the contigs covering the centromeric regions, are presented in Supplementary Table 10. (c) Circos plot presenting the assembly of seven chromosomes. The 40 largest T. parvula contigs are shown. The links and histograms in the outer circles showing the distribution of protein coding genes and repetitive sequences were generated as in Figure 1. The ancestral karyotype segments in the A. thaliana chromosomes and T. parvula contigs and the links connecting them are depicted with colors as in a and b.


References

  1. Al-Shehbaz, I.A. & O'Kane, S.L. Placement of Arabidopsis parvula in Thellungiella (Brassicaceae). Novon 5, 309–310 (1995).
  2. Amtmann, A. Learning from evolution: Thellungiella generates new knowledge on essential and critical components of abiotic stress tolerance in plants. Mol. Plant 2, 3–12 (2009).
  3. Beilstein, M.A. et al. Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 107, 18724–18728 (2010).
  4. Orsini, F. et al. A comparative study of salt tolerance parameters in 11 wild relatives of Arabidopsis thaliana. J. Exp. Bot. 61, 3787–3798 (2010).
  5. Oh, D.-H. et al. Genome structures and halophyte-specific gene expression of the extremophile Thellungiella parvula in comparison with Thellungiella salsuginea (Thellungiella halophila) and Arabidopsis. Plant Physiol. 154, 1040–1052 (2010).
  6. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
  7. Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 41, 1275–1281 (2009).
  8. Mun, J.-H. et al. Sequence and structure of Brassica rapa chromosome A3. Genome Biol. 11, R94 (2010).
  9. Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
  10. International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).
  11. Alkan, C., Sajjadian, S. & Eichler, E.E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
  12. Conesa, A. & Götz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Intl. J. Plant Genomics 2008, 619832 (2008).
  13. Oh, D.-H. et al. Intracellular consequences of SOS1 deficiency during salt stress. J. Exp. Bot. 61, 1205–1213 (2010).
  14. Gao, F. et al. Cloning of an H+-PPase gene from Thellungiella halophila and its heterologous expression to improve tobacco salt tolerance. J. Exp. Bot. 57, 3259–3270 (2006).
  15. Lugan, R. et al. Metabolome and water homeostasis analysis of Thellungiella salsuginea suggests that dehydration tolerance is a key response to osmotic stress in this halophyte. Plant J. 64, 215–229 (2010).
  16. Inan, G. et al. Salt cress. A halophyte and cryophyte Arabidopsis relative model system and its applicability to molecular genetic analyses of growth and development of extremophiles. Plant Physiol. 135, 1718–1737 (2004).
  17. Sudmant, P.H. et al. Diversity of human copy number variation and multicopy genes. Science 330, 641–646 (2010).
  18. Dassanayake, M. et al. Transcription strength and halophytic lifestyle. Trends Plant Sci. 16, 1–3 (2011).
  19. Hastings, P.J. et al. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10, 551–564 (2009).
  20. Craig Plett, D. & Møller, I.S. Na+ transport in glycophytic plants: what we know and would like to know. Plant Cell Environ. 33, 612–626 (2010).
  21. Quan, R. et al. SCABP8/CBL10, a putative calcium sensor, interacts with the protein kinase SOS2 to protect Arabidopsis shoots from salt stress. Plant Cell 19, 1415–1431 (2007).
  22. Ohno, S. Evolution by Gene Duplication 1–160 (Springer, New York, New York, USA, 1970).
  23. Hanada, K. et al. Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol. 148, 993–1003 (2008).
  24. Cannon, S.B. et al. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4, 10 (2004).
  25. DeBolt, S. Copy number variation shapes genome diversity in Arabidopsis over immediate family generational scales. Genome Biol. Evol. 2, 441–453 (2010).
  26. Lysak, M.A. & Koch, M.A. Phylogeny, genome, and karyotype evolution of crucifers (Brassicaceae). in Genetics and Genomics of the Brassicaceae (eds. Schmidt, R. & Bancroft, I.) 1–31 (Springer, New York, New York, USA, 2011).
  27. Lysak, M.A. et al. Mechanisms of chromosome number reduction in Arabidopsis thaliana and related Brassicaceae species. Proc. Natl. Acad. Sci. USA 103, 5224–5229 (2006).
  28. Mandáková, T. & Lysak, M.A. Chromosomal phylogeny and karyotype evolution in x = 7 crucifer species (Brassicaceae). Plant Cell 20, 2559–2570 (2008).
  29. Simpson, J.T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).
  30. Sommer, D.D. et al. Minimus: a fast, lightweight genome assembler. BMC Bioinformatics 8, 64 (2007).
  31. Zerbino, D.R. & Birney, E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
  32. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
  33. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
  34. Zdobnov, E.M. & Apweiler, R. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001).
  35. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

Author information

  1. These authors contributed equally to this work.

    • Maheshi Dassanayake &
    • Dong-Ha Oh


  1. Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Maheshi Dassanayake,
    • Dong-Ha Oh,
    • Jeffrey S Haas,
    • Hyewon Hong,
    • Hans J Bohnert &
    • John M Cheeseman
  2. Office of Networked Information Technology, School of Integrative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Jeffrey S Haas
  3. Center for Comparative & Functional Genomics, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Alvaro Hernandez
  4. Division of Applied Life Science (BK21 program), Gyeongsang National University, Jinju, Korea.

    • Hyewon Hong,
    • Dae-Jin Yun,
    • Ray A Bressan &
    • Hans J Bohnert
  5. Bioscience Core Laboratory-Genomics, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

    • Shahjahan Ali
  6. Department of Horticulture & Landscape Architecture, Purdue University, West Lafayette, Indiana, USA.

    • Ray A Bressan &
    • Jian-Kang Zhu
  7. Center for Plant Stress Genomics and Biotechnology, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.

    • Ray A Bressan,
    • Jian-Kang Zhu &
    • Hans J Bohnert
  8. Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

    • Hans J Bohnert


M.D., D.-H.O., H.J.B. and J.M.C. designed, performed and analyzed the experiments and wrote the paper; J.S.H. compiled programs and wrote custom scripts; A.H. and S.A. performed sequencing; H.H. prepared materials; D.-J.Y., R.A.B. and J.-K.Z. provided materials and intellectual feedback.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (952K)

    Supplementary Tables 1, 2, 5 and 6 and Supplementary Figures 1–4.

Excel files

  1. Supplementary Table 3 (4M)

    Repetitive sequences in T. parvula draft genome

  2. Supplementary Table 4 (11M)

    List of T. parvula predicted ORFs and their annotations

  3. Supplementary Table 7 (244K)

    List of non-coding RNAs in T. parvula draft genome

  4. Supplementary Table 8 (6M)

    List and comparison of GO annotations of the T. parvula predicted ORFs and A. thaliana cDNAs

  5. Supplementary Table 9 (1004K)

    Tandem local duplications in the T. parvula draft genome and the A. thaliana genome

  6. Supplementary Table 10 (56K)

    Assignments of the largest 40 T. parvula contigs in seven chromosomes