The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

Abstract

We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis.

Figure 1: Comparison of A. lyrata and A. thaliana genomes.
Figure 2: Apparent deletions by size and annotation.
Figure 3: Changes in genomic intervals along the A. thaliana genome.
Figure 4: Change in size of collinear and rearranged regions, intergenic regions and gene families.
Figure 5: Comparison of transposable elements.
Figure 6: Sizes and allele frequency distribution of insertions and deletions that were either fixed or still segregating in 95 A. thaliana individuals (ref. 43) and that are presumed to be derived based on comparison with the A. lyrata allele.

References

  1. Greilhuber, J. et al. Smallest angiosperm genomes found in Lentibulariaceae, with chromosomes of bacterial size. Plant Biol. 8, 770–777 (2006).

  2. Gregory, T.R. et al. Eukaryotic genome size databases. Nucleic Acids Res. 35, D332–D338 (2007).

  3. Gaut, B.S. & Ross-Ibarra, J. Selection on major components of angiosperm genomes. Science 320, 484–486 (2008).

  4. Pellicer, J., Fay, M.F. & Leitch, I.J. The largest eukaryotic genome of them all? Bot. J. Linn. Soc. 164, 10–15 (2010).

  5. Bennetzen, J.L., Ma, J. & Devos, K.M. Mechanisms of recent genome size variation in flowering plants. Ann. Bot. 95, 127–132 (2005).

  6. Hawkins, J.S., Proulx, S.R., Rapp, R.A. & Wendel, J.F. Rapid DNA loss as a counterbalance to genome expansion through retrotransposon proliferation in plants. Proc. Natl. Acad. Sci. USA 106, 17811–17816 (2009).

  7. Piegu, B. et al. Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res. 16, 1262–1269 (2006).

  8. Vitte, C., Panaud, O. & Quesneville, H. LTR retrotransposons in rice (Oryza sativa, L.): recent burst amplifications followed by rapid DNA loss. BMC Genomics 8, 218 (2007).

  9. Woodhouse, M.R. et al. Following tetraploidy in maize, a short deletion mechanism removed genes preferentially from one of the two homologs. PLoS Biol. 8, e1000409 (2010).

  10. Paterson, A.H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).

  11. Johnston, J.S. et al. Evolution of genome size in Brassicaceae. Ann. Bot. 95, 229–235 (2005).

  12. Oyama, R.K. et al. The shrunken genome of Arabidopsis thaliana. Plant Syst. Evol. 273, 257–271 (2008).

  13. Wright, S.I., Lauga, B. & Charlesworth, D. Rates and patterns of molecular evolution in inbred and outbred Arabidopsis. Mol. Biol. Evol. 19, 1407–1420 (2002).

  14. Ossowski, S. et al. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94 (2010).

  15. Beilstein, M.A., Nagalingum, N.S., Clements, M.D., Manchester, S.R. & Mathews, S. Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 107, 18724–18728 (2010).

  16. Kuittinen, H. et al. Comparing the linkage maps of the close relatives Arabidopsis lyrata and A. thaliana. Genetics 168, 1575–1584 (2004).

  17. Koch, M.A. & Kiefer, M. Genome evolution among cruciferous plants: a lecture from the comparison of the genetic maps of three diploid species – Capsella rubella, Arabidopsis lyrata subsp. petraea, and A. thaliana. Am. J. Bot. 92, 761–767 (2005).

  18. Yogeeswaran, K. et al. Comparative genome analyses of Arabidopsis spp.: inferring chromosomal rearrangement events in the evolutionary history of A. thaliana. Genome Res. 15, 505–515 (2005).

  19. Lysak, M.A. et al. Mechanisms of chromosome number reduction in Arabidopsis thaliana and related Brassicaceae species. Proc. Natl. Acad. Sci. USA 103, 5224–5229 (2006).

  20. Berr, A. et al. Chromosome arrangement and nuclear architecture but not centromeric sequences are conserved between Arabidopsis thaliana and Arabidopsis lyrata. Plant J. 48, 771–783 (2006).

  21. Swarbreck, D. et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 36, D1009–D1014 (2007).

  22. Lim, J.K. & Simmons, M.J. Gross chromosome rearrangements mediated by transposable elements in Drosophila melanogaster. Bioessays 16, 269–275 (1994).

  23. Stankiewicz, P. et al. Genome architecture catalyzes nonrecurrent chromosomal rearrangements. Am. J. Hum. Genet. 72, 1101–1116 (2003).

  24. Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).

  25. Lee, J., Han, K., Meyer, T.J., Kim, H.S. & Batzer, M.A. Chromosomal inversions between human and chimpanzee lineages caused by retrotransposons. PLoS ONE 3, e4047 (2008).

  26. Braumann, I., van den Berg, M.A. & Kempken, F. Strain-specific retrotransposon-mediated recombination in commercially used Aspergillus niger strain. Mol. Genet. Genomics 280, 319–325 (2008).

  27. Woodhouse, M.R., Pedersen, B. & Freeling, M. Transposed genes in Arabidopsis are often associated with flanking repeats. PLoS Genet. 6, e1000949 (2010).

  28. Ranz, J.M. et al. Principles of genome evolution in the Drosophila melanogaster species group. PLoS Biol. 5, e152 (2007).

  29. The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005).

  30. Clark, R.M. et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science 317, 338–342 (2007).

  31. Borevitz, J.O. et al. Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 104, 12057–12062 (2007).

  32. Enright, A.J., Van Dongen, S. & Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).

  33. Michelmore, R.W. & Meyers, B.C. Clusters of resistance genes in plants evolve by divergent selection and a birth-and-death process. Genome Res. 8, 1113–1130 (1998).

  34. Thomas, J.H. Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants. Genome Res. 16, 1017–1030 (2006).

  35. Yang, X. et al. The F-box gene family is expanded in herbaceous annual plants relative to woody perennial plants. Plant Physiol. 148, 1189–1200 (2008).

  36. Tuskan, G.A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).

  37. Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007).

  38. Velasco, R. et al. A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS ONE 2, e1326 (2007).

  39. Li, L., Stoeckert, C.J. Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).

  40. SanMiguel, P., Gaut, B.S., Tikhonov, A., Nakajima, Y. & Bennetzen, J.L. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20, 43–45 (1998).

  41. Devos, K.M., Brown, J.K. & Bennetzen, J.L. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 12, 1075–1079 (2002).

  42. Hollister, J.D. & Gaut, B.S. Epigenetic silencing of transposable elements: a trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res. 19, 1419–1428 (2009).

  43. Nordborg, M. et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, e196 (2005).

  44. Petrov, D.A., Sangster, T.A., Johnston, J.S., Hartl, D.L. & Shaw, K.L. Evidence for DNA loss as a determinant of genome size. Science 287, 1060–1062 (2000).

  45. Petrov, D.A., Lozovskaya, E.R. & Hartl, D.L. High intrinsic rate of DNA loss in Drosophila. Nature 384, 346–349 (1996).

  46. Charlesworth, B. Evolutionary rates in partially self-fertilizing species. Am. Nat. 140, 126–148 (1992).

  47. Knight, C.A., Molinari, N.A. & Petrov, D.A. The large genome constraint hypothesis: evolution, ecology and phenotype. Ann. Bot. 95, 177–190 (2005).

  48. Jaffe, D.B. et al. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res. 13, 91–96 (2003).

  49. Demuth, J.P., De Bie, T., Stajich, J.E., Cristianini, N. & Hahn, M.W. The evolution of mammalian gene families. PLoS ONE 1, e85 (2006).

  50. Prachumwat, A. & Li, W.H. Gene number expansion and contraction in vertebrate genomes with respect to invertebrate genomes. Genome Res. 18, 221–232 (2008).

  51. Drosophila 12 Genomes Consortium et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203–218 (2007).

  52. Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. & Higgins, D.G. The CLUSTAL-X Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882 (1997).

  53. Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).

  54. McCarthy, E.M. & McDonald, J.F. LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics 19, 362–367 (2003).

  55. Edgar, R.C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).

  56. Xiong, Y. & Eickbush, T.H. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9, 3353–3362 (1990).

  57. Zhang, X. & Wessler, S.R. Genome-wide comparative analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. Proc. Natl. Acad. Sci. USA 101, 5589–5594 (2004).

  58. Swofford, D.L. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods): Version 4. (Sinauer Associates, Sunderland, Massachusetts, USA, 2003).

  59. Simillion, C., Vandepoele, K., Saeys, Y. & Van de Peer, Y. Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res. 14, 1095–1106 (2004).

  60. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  61. Smith, T.F. & Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).

  62. Pearson, W.R. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11, 635–650 (1991).

  63. Kent, W.J. BLAT – the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  64. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  65. Katoh, K., Asimenos, G. & Toh, H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 537, 39–64 (2009).

Acknowledgements

The US Department of Energy Joint Genome Institute (JGI) provided sequencing and analyses under the Community Sequencing Program supported by the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231. We are particularly grateful to D. Rokhsar and K. Barry for providing leadership for the project at JGI. We thank J. Borevitz, A. Hall, C. Langley, J. Nasrallah, B. Neuffer, O. Savolainen and S. Wright for contributing to the initial sequencing proposal submitted to the Community Sequencing Program at JGI, C. Lanz and K. Lett for technical assistance, and P. Andolfatto and R. Wing for comments on the manuscript. This work was supported by National Science Foundation (NSF) DEB-0723860 (B.S.G.), NSF DEB-0723935 (M.N.), NSF MCB-0618433 (J.C.C.), NSF IOS-0744579 (M.E.N.), NIH GM057994 (J.B.), grant GABI-DUPLO 0315055 of the German Federal Ministry of Education and Research (K.F.X.M.), ERA-NET on Plant Genomics (ERA-PG) grant ARelatives from the Deutsche Forschungsgemeinschaft (D.W.) and Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT) and the Inter-University Network for Fundamental Research (P6/25, BioMaGNet) (Y.V.d.P.), a Gottfried Wilhelm Leibniz Award of Deutsche Forschungsgemeinschaft (DFG) (D.W.), the Austrian Academy of Sciences (M.N.) and the Max Planck Society (D.W. and Y.-L.G.).

Author information

Contributions

J.B., J.C.C., B.S.G., I.V.G., Y.-L.G., K.F.X.M., M.N., Y.V.d.P. and D.W. conceived the study; M.E.N. provided the biological material; J.C., J.-F.C., R.M.C., N.F., J.G. and Y.-L.G. performed the experiments; E.G.B., J.A.F., N.F., H.G., Y.-L.G., G.H., J.D.H., T.T.H., R.P.O., S.O., P.P., A.A.S., J.S., K.S., M.S., X.W. and L.Y. analyzed the data; and Y.-L.G., T.T.H., M.N. and D.W. wrote the paper with contributions from all authors.

Corresponding authors

Correspondence to Detlef Weigel or Ya-Long Guo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Tables 1–5 and Supplementary Figures 1–4 (PDF 1278 kb)

About this article

Cite this article

Hu, T., Pattyn, P., Bakker, E. et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43, 476–481 (2011). https://doi.org/10.1038/ng.807

  • DOI: https://doi.org/10.1038/ng.807
