We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein-coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup to investigate the consequences of genome triplication, including structural and functional evolution. The extent of gene loss (fractionation) varies among the triplicated genome segments, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.
Supplementary Text and Figures
Supplementary Note, Supplementary Tables 1–21 and Supplementary Figures 1–25.