An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions

Journal name: Nature Genetics
Published online


Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.

At a glance


    Figure 1: Genome alignments and whole-genome triplications.

    (a) Typical example of a 160-kb A. lyrata region aligning against the other eight Brassicaceae genomes. Alignment blocks (solid rectangles) are linked to form collinear chains. In alignments with most species, most positions in A. lyrata are contained within a single alignment chain, except in alignments with B. rapa and L. alabamica, where three alignment chains each were seen, suggesting whole-genome triplications in these two species. (b) Genome-wide level of chain coverage by each of the eight species of the A. lyrata genome. (c–f) Comparative chromosome painting (CCP) analysis in L. alabamica (2n = 22 chromosomes). (c) DAPI-stained mitotic chromosomes. Chromosomes of different size and heterochromatin content suggest an allopolyploid origin of the species. Arrowheads mark the four largest chromosomes. (d–f) CCP analysis of Cardamineae-specific ancestral chromosomes on pachytene (meiotic) chromosome spreads. (d) Three genomic copies of the genomic block A (the A1 copy is split between two different chromosomes). (e) Three genomic copies of chromosome AK6/8. (f) Two genomic copies of chromosome AK8/6; the third copy is absent. In e and f, copy 1 being substantially longer than the other homeolog(s) suggests an allohexaploid origin or differential fractionation of the three subgenomes. All scale bars, 10 μm.

    Figure 2: A phylogenetic tree obtained using a set of 1,048,889 fourfold-degenerate sites in PhyML (general time-reversible (GTR) substitution model) (ref. 73).

    The positions (triangles) of the At-α duplication event and of the two whole-genome triplication events (Br-α and La-α) are approximate.

    Figure 3: Estimation of the fraction of sites under selection in the A. thaliana genome.

    (a) Breakdown of sites under selection between coding (CDS) and noncoding regions (left) and among different types of noncoding regions (right). (b) Percentage of sites under selection for different sets of regions of the A. thaliana genome. Noncoding region categories include UTRs: 5′ (blue) and 3′ (light blue); intronic regions: 500 bp of intron 1 (tan), 30-bp intronic regions flanking exons (orange) and middle of introns other than intron 1 (brown); and intergenic regions: bases are assigned to the closest annotated TSS or TES.

    Figure 4: Evidence of selection on CNSs in populations.

    Minor allele frequency (MAF) distributions are shown at polymorphic sites within a population of 80 A. thaliana individuals (left) and a population of 13 C. grandiflora individuals with 26 haplotypes (right). All types of CNSs show lower population diversity and lower MAF compared to fourfold-degenerate (4D) and intronic sites, confirming that a substantial fraction of CNSs are under selective pressure. Whereas most types of CNSs have similar diversity levels, intronic CNSs seem to be under the weakest selective pressure, and smRNA CNSs seem to be under the strongest selective pressure. 0D, zero-fold degenerate.
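
    As a rough illustration of the quantity plotted in Figure 4, the MAF at a biallelic site can be computed from haplotype calls as sketched below. This is not the authors' pipeline: the function name `minor_allele_frequency`, the single-character allele encoding, the use of 'N' for missing calls and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Return the MAF at one site, or None if the site is not biallelic.

    `alleles` holds one single-character allele call per haplotype.
    Missing calls are encoded as 'N' (an assumption of this sketch).
    """
    counts = Counter(a for a in alleles if a != "N")
    if len(counts) != 2:
        return None  # monomorphic or multiallelic sites are skipped
    total = sum(counts.values())
    # The minor allele is the less frequent of the two observed alleles.
    return min(counts.values()) / total

# Toy site with 26 haplotypes (the C. grandiflora sample size in Figure 4):
# 23 copies of one allele and 3 copies of the other.
toy_site = "A" * 23 + "G" * 3
toy_maf = minor_allele_frequency(toy_site)
```

    Sites under purifying selection are expected to contribute disproportionately to the low-frequency end of such a distribution, which is the comparison the figure draws against 4D and intronic sites.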

    Figure 5: The majority of A. lyrata CNSs are shared with most other Brassicaceae, but few are conserved outside that clade, with the exception of those corresponding to smRNAs.

    A. lyrata CNSs were first symmetrically extended to at least 120 bp, except when this reached into coding exons. These sequences were then aligned against each genome using BLAST. The number of CNSs with at least one hit with an E value below 0.0001 is shown. Whereas smRNA CNSs constitute only 1.1% of eligible CNSs, they account for more than 18% of CNSs mapped to C. papaya, and this fraction increases to 66% for the most distant species considered, O. sativa.
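
    The symmetric extension step described in this caption could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: coordinates are taken to be 0-based and half-open, coding exons are assumed not to overlap the CNS itself, and the name `extend_cns` and its parameters are hypothetical.

```python
def extend_cns(start, end, min_len=120, exons=()):
    """Symmetrically pad the interval [start, end) to at least `min_len` bp.

    `exons` is an iterable of (start, end) coding-exon intervals. The
    extension is clipped so that it never reaches into an exon, which
    means the result can be shorter than `min_len`.
    """
    new_start, new_end = start, end
    shortfall = min_len - (end - start)
    if shortfall > 0:
        left = shortfall // 2
        new_start = start - left
        new_end = end + (shortfall - left)
    new_start = max(new_start, 0)  # do not run off the start of the sequence
    for ex_start, ex_end in exons:
        if ex_end <= start:
            # Exon lies upstream of the CNS: clip the left flank.
            new_start = max(new_start, ex_end)
        elif ex_start >= end:
            # Exon lies downstream of the CNS: clip the right flank.
            new_end = min(new_end, ex_start)
    return new_start, new_end

# A 40-bp CNS padded to 120 bp, then the same CNS next to an upstream exon.
padded = extend_cns(1000, 1040)
clipped = extend_cns(1000, 1040, exons=[(900, 970)])
```

    The sketch does not clip at the end of a chromosome or scaffold, which a real implementation would also need to handle.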

    Figure 6: CNSs are enriched for sequence motifs with evidence of constraint.

    (a) Motifs found to be enriched (z score > 4) in at least one type of CNS. Numbers and coloring correspond to the fold enrichment compared to the permuted version of the motif (red, enrichment; blue, depletion). Motif annotations come from the PLACE database (ref. 74) (W, A or T; K, G or T; M, A or C; S, C or G; Y, C or T; N, any base). (b–d) Characteristics of bases in and around proximal upstream instances of the CACGTG motifs located in CNSs. (b) SNP density in the A. thaliana population (blue) and MAF distribution in the C. grandiflora population (red). (c) Position-specific average PhyloP scores over the set of motif instances. (d) Position-specific PhyloP scores for every instance of the set of motifs, ranging from −1 (non-conserved, blue) to +1 (highly conserved, red).
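
    The caption does not say how the fold enrichment and z score were computed. One plausible reading, sketched below purely as an assumption, is to compare the number of occurrences of a motif with the counts obtained for every distinct rearrangement of its letters. The function names are hypothetical, matching is exact and forward-strand only, and degenerate IUPAC symbols are not handled.

```python
from itertools import permutations
from statistics import mean, pstdev

def count_occurrences(motif, sequences):
    """Count exact, possibly overlapping, forward-strand matches."""
    k = len(motif)
    return sum(
        seq[i:i + k] == motif
        for seq in sequences
        for i in range(len(seq) - k + 1)
    )

def permuted_motif_enrichment(motif, sequences):
    """Return (observed, fold, z) relative to permuted versions of `motif`.

    The background is every distinct permutation of the motif's letters,
    excluding the motif itself. `fold` or `z` is None when undefined.
    """
    observed = count_occurrences(motif, sequences)
    shuffled = sorted({"".join(p) for p in permutations(motif)} - {motif})
    if not shuffled:
        raise ValueError("motif has no distinct permutations")
    background = [count_occurrences(m, sequences) for m in shuffled]
    mu = mean(background)
    sd = pstdev(background)
    fold = observed / mu if mu > 0 else None
    z = (observed - mu) / sd if sd > 0 else None
    return observed, fold, z

# Toy sequences (invented for illustration) containing three CACGTG sites.
toy_sequences = ["TTCACGTGTT", "CACGTGCACGTG", "GGGGGGGG"]
observed, fold, z = permuted_motif_enrichment("CACGTG", toy_sequences)
```

    Enumerating all permutations is only practical for short motifs such as CACGTG (180 distinct rearrangements); longer motifs would call for random shuffles instead.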


References

  1. Duret, L. & Bucher, P. Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7, 399–406 (1997).
  2. Boffelli, D. et al. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394 (2003).
  3. Hong, R.L., Hamaguchi, L., Busch, M.A. & Weigel, D. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing. Plant Cell 15, 1296–1309 (2003).
  4. Bejerano, G. et al. Ultraconserved elements in the human genome. Science 304, 1321–1325 (2004).
  5. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
  6. Margulies, E.H. et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17, 760–774 (2007).
  7. Stark, A. et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450, 219–232 (2007).
  8. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. & Lander, E.S. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254 (2003).
  9. Adrian, J. et al. cis-Regulatory elements and chromatin state coordinately control temporal and spatial expression of FLOWERING LOCUS T in Arabidopsis. Plant Cell 22, 1425–1440 (2010).
  10. Lyons, E. & Freeling, M. How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J. 53, 661–673 (2008).
  11. Freeling, M. & Subramaniam, S. Conserved noncoding sequences (CNSs) in higher plants. Curr. Opin. Plant Biol. 12, 126–132 (2009).
  12. Zou, C. et al. Cis-regulatory code of stress-responsive transcription in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 108, 14992–14997 (2011).
  13. Hsieh, T.F. et al. Regulation of imprinted gene expression in Arabidopsis endosperm. Proc. Natl. Acad. Sci. USA 108, 1755–1762 (2011).
  14. Inada, D.C. et al. Conserved noncoding sequences in the grasses. Genome Res. 13, 2030–2041 (2003).
  15. Guo, H. & Moose, S.P. Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell 15, 1143–1158 (2003).
  16. Kaplinsky, N.J., Braun, D.M., Penterman, J., Goff, S.A. & Freeling, M. Utility and distribution of conserved noncoding sequences in the grasses. Proc. Natl. Acad. Sci. USA 99, 6147–6151 (2002).
  17. Bossolini, E., Wicker, T., Knobel, P.A. & Keller, B. Comparison of orthologous loci from small grass genomes Brachypodium and rice: implications for wheat genomics and grass genome annotation. Plant J. 49, 704–717 (2007).
  18. Colinas, J., Birnbaum, K. & Benfey, P.N. Using cauliflower to find conserved non-coding regions in Arabidopsis. Plant Physiol. 129, 451–454 (2002).
  19. Haberer, G. et al. Large-scale cis-element detection by analysis of correlated expression and sequence conservation between Arabidopsis and Brassica oleracea. Plant Physiol. 142, 1589–1602 (2006).
  20. Hupalo, D. & Kern, A.D. Conservation and functional element discovery in 20 angiosperm plant genomes. Mol. Biol. Evol. published online; doi:10.1093/molbev/mst082 (27 May 2013).
  21. Reineke, A.R., Bornberg-Bauer, E. & Gu, J. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes. Nucleic Acids Res. 39, 6029–6043 (2011).
  22. Thomas, B.C., Rapaka, L., Lyons, E., Pedersen, B. & Freeling, M. Arabidopsis intragenomic conserved noncoding sequence. Proc. Natl. Acad. Sci. USA 104, 3348–3353 (2007).
  23. Schranz, M.E., Lysak, M.A. & Mitchell-Olds, T. The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci. 11, 535–542 (2006).
  24. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
  25. Hu, T.T. et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat. Genet. 43, 476–481 (2011).
  26. Slotte, T. et al. The Capsella rubella genome and the genomic consequences of rapid mating system evolution. Nat. Genet. 45, 831–835 (2013).
  27. Wang, X. et al. The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035–1039 (2011).
  28. Cheng, F. et al. Deciphering the diploid ancestral genome of the mesohexaploid Brassica rapa. Plant Cell published online; doi:10.1105/tpc.113.110486 (7 May 2013).
  29. Yang, R. et al. The reference genome of the halophytic plant Eutrema salsugineum. Front. Plant Sci. 4, 46 (2013).
  30. Dassanayake, M. et al. The genome of the extremophile crucifer Thellungiella parvula. Nat. Genet. 43, 913–918 (2011).
  31. Schranz, M.E., Song, B.H., Windsor, A.J. & Mitchell-Olds, T. Comparative genomics in the Brassicaceae: a family-wide perspective. Curr. Opin. Plant Biol. 10, 168–175 (2007).
  32. Couvreur, T.L. et al. Molecular phylogenetics, temporal diversification, and principles of evolution in the mustard family (Brassicaceae). Mol. Biol. Evol. 27, 55–71 (2010).
  33. Bowers, J.E., Chapman, B.A., Rong, J. & Paterson, A.H. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438 (2003).
  34. Parra, G., Bradnam, K., Ning, Z., Keane, T. & Korf, I. Assessing the gene space in draft genomes. Nucleic Acids Res. 37, 289–297 (2009).
  35. Lysak, M.A., Koch, M.A., Pecinka, A. & Schubert, I. Chromosome triplication found across the tribe Brassiceae. Genome Res. 15, 516–525 (2005).
  36. Schnable, J.C., Wang, X., Pires, J.C. & Freeling, M. Escape from preferential retention following repeated whole genome duplications in plants. Front. Plant Sci. 3, 94 (2012).
  37. Edger, P.P. & Pires, J.C. Gene and genome duplications: the impact of dosage-sensitivity on the fate of nuclear genes. Chromosome Res. 17, 699–717 (2009).
  38. Ming, R. et al. The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452, 991–996 (2008).
  39. Bailey, C.D. et al. Toward a global phylogeny of the Brassicaceae. Mol. Biol. Evol. 23, 2142–2160 (2006).
  40. Thomas, J.W. et al. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793 (2003).
  41. Yang, Y.W., Lai, K.N., Tai, P.Y. & Li, W.H. Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages. J. Mol. Evol. 48, 597–604 (1999).
  42. Town, C.D. et al. Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell 18, 1348–1359 (2006).
  43. Yang, L. & Gaut, B.S. Factors that contribute to variation in evolutionary rate among Arabidopsis genes. Mol. Biol. Evol. 28, 2359–2369 (2011).
  44. Ponting, C.P. & Hardison, R.C. What fraction of the human genome is functional? Genome Res. 21, 1769–1776 (2011).
  45. Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
  46. Margulies, E.H. & Birney, E. Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat. Rev. Genet. 9, 303–313 (2008).
  47. Sorek, R. & Ast, G. Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res. 13, 1631–1637 (2003).
  48. Halligan, D.L., Eyre-Walker, A., Andolfatto, P. & Keightley, P.D. Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14, 273–279 (2004).
  49. Halligan, D.L. & Keightley, P.D. Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison. Genome Res. 16, 875–884 (2006).
  50. Blanchette, M. et al. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 16, 656–668 (2006).
  51. Cao, J. et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat. Genet. 43, 956–963 (2011).
  52. Nei, M. & Li, W.H. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76, 5269–5273 (1979).
  53. Watterson, G.A. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7, 256–276 (1975).
  54. Katzman, S. et al. Human genome ultraconserved elements are ultraselected. Science 317, 915 (2007).
  55. Casillas, S., Barbadilla, A. & Bergman, C.M. Purifying selection maintains highly conserved noncoding sequences in Drosophila. Mol. Biol. Evol. 24, 2222–2234 (2007).
  56. Feldman, M. & Levy, A.A. Allopolyploidy—a shaping force in the evolution of wheat genomes. Cytogenet. Genome Res. 109, 250–258 (2005).
  57. Thomas, B.C., Pedersen, B. & Freeling, M. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 16, 934–946 (2006).
  58. Luo, F., Liu, J. & Li, J. Discovering conditional co-regulated protein complexes by integrating diverse data sources. BMC Syst. Biol. 4 (suppl. 2), S4 (2010).
  59. Muiño, J.M., Hoogstraat, M., van Ham, R.C. & van Dijk, A.D. PRI-CAT: a web-tool for the analysis, storage and visualization of plant ChIP-seq experiments. Nucleic Acids Res. 39, W524–W527 (2011).
  60. Zhang, W., Zhang, T., Wu, Y. & Jiang, J. Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell 24, 2719–2731 (2012).
  61. Kim, J. et al. microRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. Plant J. 42, 84–94 (2005).
  62. Nogueira, F.T. et al. Regulation of small RNA accumulation in the maize shoot apex. PLoS Genet. 5, e1000320 (2009).
  63. Nogueira, F.T., Madi, S., Chitwood, D.H., Juarez, M.T. & Timmermans, M.C. Two small regulatory RNAs establish opposing fates of a developmental axis. Genes Dev. 21, 750–755 (2007).
  64. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
  65. Lyons, E. et al. Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol. 148, 1772–1781 (2008).
  66. Ossowski, S. et al. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94 (2010).
  67. Schultz, S.T., Lynch, M. & Willis, J.H. Spontaneous deleterious mutation in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 96, 11393–11398 (1999).
  68. Ahituv, N., Prabhakar, S., Poulin, F., Rubin, E.M. & Couronne, O. Mapping cis-regulatory domains in the human genome using multi-species conservation of synteny. Hum. Mol. Genet. 14, 3057–3063 (2005).
  69. Lockton, S. & Gaut, B.S. Plant conserved non-coding sequences and paralogue evolution. Trends Genet. 21, 60–65 (2005).
  70. Margulies, E.H., Blanchette, M., Haussler, D. & Green, E.D. Identification and characterization of multi-species conserved sequences. Genome Res. 13, 2507–2518 (2003).
  71. Hong, X., Scofield, D.G. & Lynch, M. Intron size, abundance, and distribution within untranslated regions of genes. Mol. Biol. Evol. 23, 2392–2404 (2006).
  72. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  73. Guindon, S., Lethiec, F., Duroux, P. & Gascuel, O. PHYML Online—a web server for fast maximum likelihood–based phylogenetic inference. Nucleic Acids Res. 33, W557–W559 (2005).
  74. Higo, K., Ugawa, Y., Iwamoto, M. & Korenaga, T. Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 27, 297–300 (1999).
  75. Lamesch, P. et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 40, D1202–D1210 (2012).
  76. Johnston, J.S. et al. Evolution of genome size in Brassicaceae. Ann. Bot. (Lond.) 95, 229–235 (2005).
  77. Lysak, M.A., Koch, M.A., Beaulieu, J.M., Meister, A. & Leitch, I.J. The dynamic ups and downs of genome size evolution in Brassicaceae. Mol. Biol. Evol. 26, 85–98 (2009).
  78. Boisvert, S., Laviolette, F. & Corbeil, J. Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. J. Comput. Biol. 17, 1519–1533 (2010).
  79. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
  80. Chantha, S.C., Herman, A.C., Platts, A.E., Vekemans, X. & Schoen, D.J. Secondary evolution of a self-incompatibility locus in the Brassicaceae genus Leavenworthia. PLoS Biol. 11, e1001560 (2013).
  81. Lysak, M.A. & Mandáková, T. Analysis of plant meiotic chromosomes by chromosome painting. Methods Mol. Biol. 990, 13–24 (2013).
  82. Cantarel, B.L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
  83. Salamov, A.A. & Solovyev, V.V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516–522 (2000).
  84. Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24, 637–644 (2008).
  85. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
  86. Kent, W.J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
  87. Gan, X. et al. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 477, 419–423 (2011).
  88. Au, K.F., Jiang, H., Lin, L., Xing, Y. & Wong, W.H. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res. 38, 4570–4578 (2010).
  89. Harris, R.S. Improved Pairwise Alignment of Genomic DNA. PhD thesis, Penn. State Univ. (2007).
  90. Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W. & Haussler, D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc. Natl. Acad. Sci. USA 100, 11484–11489 (2003).
  91. Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004).
  92. Waterston, R.H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
  93. Washietl, S. et al. RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data. RNA 17, 578–594 (2011).
  94. Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936–939 (2011).
  95. Dutheil, J. et al. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. BMC Bioinformatics 7, 188 (2006).


Author information

  1. These authors contributed equally to this work.

    • Annabelle Haudry &
    • Adrian E Platts


Affiliations

  1. Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada.

    • Annabelle Haudry,
    • Robert J Williamson,
    • Khaled M Hazzouri,
    • John R Stinchcombe,
    • Alan M Moses &
    • Stephen I Wright
  2. Université Lyon 1, Centre National de la Recherche Scientifique (CNRS), Unité Mixte de Recherche (UMR) 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France.

    • Annabelle Haudry
  3. School of Computer Science, McGill University, Montreal, Quebec, Canada.

    • Adrian E Platts,
    • Emilio Vello,
    • Mickael Leclercq &
    • Mathieu Blanchette
  4. McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada.

    • Adrian E Platts,
    • Emilio Vello,
    • Mickael Leclercq &
    • Mathieu Blanchette
  5. Department of Biology, McGill University, Montreal, Quebec, Canada.

    • Douglas R Hoen,
    • Ewa Forczek,
    • Zoé Joly-Lopez,
    • Daniel J Schoen,
    • Paul M Harrison &
    • Thomas E Bureau
  6. Nature Sciences Department, Colby-Sawyer College, New London, New Hampshire, USA.

    • Joshua G Steffen
  7. Department of Human Genetics, McGill University, Montreal, Quebec, Canada.

    • Ken Dewar
  8. Institute of Vegetables and Flowers (IVF), Chinese Academy of Agricultural Sciences (CAAS), Beijing, China.

    • Xiaowu Wang
  9. US Department of Energy Joint Genome Institute, Walnut Creek, California, USA.

    • Jeremy Schmutz
  10. HudsonAlpha Institute of Biotechnology, Huntsville, Alabama, USA.

    • Jeremy Schmutz
  11. J. Craig Venter Institute, Rockville, Maryland, USA.

    • Christopher D Town
  12. Division of Biological Sciences, University of Missouri, Columbia, Missouri, USA.

    • Patrick P Edger &
    • J Chris Pires
  13. The School of Plant Sciences, University of Arizona, Tucson, Arizona, USA.

    • Karen S Schumaker &
    • David E Jarvis
  14. Plant Cytogenomics, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czech Republic.

    • Terezie Mandáková &
    • Martin A Lysak
  15. Biosystematics Group, Plant Sciences, Wageningen University, Wageningen, The Netherlands.

    • Erik van den Bergh &
    • M Eric Schranz
  16. Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada.

    • Stephen I Wright


Contributions

The study was conceived by M.B., S.I.W., A.M.M., T.E.B., D.J.S., P.M.H. and J.R.S. Computational experiments were designed by A.H., A.E.P., A.M.M., S.I.W., T.E.B. and M.B. E.F., Z.J.-L., J.C.P., M.E.S., D.J.S. and T.E.B. obtained material for genome sequencing for L. alabamica, S. irio and A. arabicum. A.E.P., K.D. and T.E.B. sequenced the DNA, and A.E.P. assembled the genomes, using additional data provided by C.D.T., P.P.E., M.E.S., E.v.d.B. and J.C.P. Additional RNA sequencing data were obtained from J.G.S.; B. rapa genome sequence data were provided by X.W.; and E. salsugineum genome data were provided by J.S., D.E.J. and K.S.S. T.M. and M.A.L. performed the multicolor FISH study on L. alabamica. P.M.H. and A.E.P. performed the gene annotation, D.R.H. and T.E.B. annotated TEs, and M.L. identified structural RNAs. Multiple-genome alignments and identification and analysis of CNSs were performed by A.E.P., A.H., E.V. and M.B. Population genetics analyses were performed by A.H., A.E.P., R.J.W., K.M.H., A.M.M. and S.I.W. The manuscript was written primarily by A.H., A.E.P., S.I.W. and M.B., with input from all coauthors.

Competing financial interests

The authors declare no competing financial interests.


Supplementary information

PDF files

  1. Supplementary Text and Figures (1.2 MB)

    Supplementary Figures 1–11, Supplementary Tables 1–9 and Supplementary Note
