The first completely sequenced plant chromosomes, from the mustard Arabidopsis thaliana, reveal a dynamic genome that is constantly being rearranged.
Although we share the planet with plants and depend on them for food, raw materials and energy, we know very little about them. But this is beginning to change. On pages 761 and 769 of this issue, Lin et al.1 and Wambutt et al.2 report the DNA sequences of chromosomes 2 and 4, respectively, from Arabidopsis thaliana. The sequences include about 30% of the plant's genes, and show that although many plant genes are familiar homologues of known animal and fungal genes, many others are plant specific. The sequences also show an evolving genome, with evidence for ancient and recent duplications of both single genes and large chromosomal regions.
Arabidopsis (Fig. 1) was chosen almost a decade ago as the subject of the first plant genome project because it has a very small genome, yet performs the same functions and contains essentially the same genes as other flowering plants. In other words, it has the same list of parts as more familiar crop and ornamental plants, yet lacks the large arrays of repetitive DNA sequences found between genes that characterize larger genomes. Arabidopsis is the main model system for laboratory studies in basic plant biology, so we know much about its genetics, physiology, development and structure3. Its nuclear genome contains five chromosome pairs, and Lin et al. and Wambutt et al. have now sequenced the two smallest, chromosomes 2 and 4. Their sequences are complete, with the exception of the regions around the centromeres (specialized structures important in cell division), which do not contain many genes. The total of 37 megabases (Mb) is more than a quarter of the Arabidopsis genome, and the rest is expected to be done within a year.
What is found in this list of parts? The authors have identified 7,781 protein-coding genes. Because the gene-rich part of the whole Arabidopsis nuclear genome is around 120 Mb, the 37 Mb sequenced so far should hold about 31% of the genes, meaning that the entire genome should code for around 25,000 proteins. This is a surprisingly large number compared with the predicted totals of fewer than 20,000 genes in the nematode worm Caenorhabditis elegans4 and fruit fly Drosophila melanogaster5 genomes. With their nervous systems, behavioural responses and ability to move (plus, in one case, to see and fly too), these animals seem much more complex than Arabidopsis, which has only around 50 identified cell types. But this apparent greater complexity may be an illusion, as plants do many things that animals cannot. As well as producing oxygen by photosynthesis and synthesizing all their own amino acids and vitamins, plants can assess (and respond to) environmental features such as gravity, day length and the quality of light.
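The extrapolation behind the figure of around 25,000 genes can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; the variable names are illustrative, and the calculation assumes that genes are spread evenly across the gene-rich sequence, which is a simplification.

```python
# Figures quoted in the article
genes_chr2_chr4 = 7781       # protein-coding genes found on chromosomes 2 and 4
sequenced_mb = 37            # megabases sequenced on those two chromosomes
gene_rich_genome_mb = 120    # approximate gene-rich part of the nuclear genome

# Assumption: gene density is uniform across the gene-rich sequence
fraction_sequenced = sequenced_mb / gene_rich_genome_mb
estimated_total_genes = genes_chr2_chr4 / fraction_sequenced

print(f"{fraction_sequenced:.0%}")      # 31%
print(round(estimated_total_genes))     # 25236
```

The result, a little over 25,000, matches the rounded estimate in the text. A different gene density on the three unsequenced chromosomes would shift the total accordingly.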
The evolutionary lineages leading to plants and animals diverged around 1.5 billion years ago, when the last common ancestor was some unknown sort of unicellular organism6. All the functions of a multicellular, land-dwelling organism — such as cell–cell communication and water uptake — then evolved independently in the two lineages. This is why plants and animals are so different anatomically. But how different are they at the level of their gene lists?
Gene function can be predicted, to a certain degree, by comparison with known gene sequences. For almost 60% of the Arabidopsis proteins encoded on chromosomes 2 and 4, the functions (including plant- and bacteria-specific processes such as photosynthesis, as well as more general processes shared with animals) are familiar. For example, 5% (218) of the 4,037 Arabidopsis genes on chromosome 2 code for DNA-binding proteins, which are implicated in gene expression in both plants and animals. Some of the families of such proteins found in Arabidopsis, such as myb relatives and homeobox proteins, are also present in animals. Other families, though, are so far found only in plants7.
Similarly, animal and plant genomes code for thousands of receptor proteins. These proteins, which can be at the cell surface or internal to the cell, allow cells to sense signalling molecules such as hormones. In plants, as in animals, such cell-surface receptors are often protein kinases. But those found so far in plants have all been serine/threonine or histidine kinases, whereas tyrosine kinases are usually found in animals. So although the general function of many genes seems to be similar in plants and animals, in many instances the related function is carried out by a protein with a different evolutionary history.
Why is it important to know this? First, because comparisons between plants and animals can show which general functions are essential for complex organisms, and which different proteins can carry out these functions. And second, because the availability of plant proteins and plant pathways that can be transferred to transgenic animals (where they could act without interacting with existing animal pathways), and vice versa, gives a large palette of genes for experimental use. If almost 60% of the genes can be assigned a general function by comparison with known proteins, this means that more than 40% cannot. Many of these are specific to plants, so there are also hitherto unknown protein functions — and, perhaps, unknown functions of organisms — waiting to be discovered.
As well as the list of parts, the chromosomal sequences reported by Lin et al.1 and Wambutt et al.2 show some of the processes that lead to large-scale genome structure. Both chromosomes show many tandem gene duplications (where two or more close copies of a gene are found next to each other). For example, on chromosome 2 there are 239 tandem duplications, involving 593 genes. There are larger and more distant duplications too — almost 2.5 Mb of sequence (with considerable divergence) is duplicated on both chromosomes in four large blocks. Moreover, a portion of chromosome 4, containing 37 genes, is duplicated on chromosome 5.
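To give a sense of scale for the tandem duplications on chromosome 2, the quoted counts can be turned into two rough ratios. This is only a sketch: it treats each reported tandem duplication as a single array of adjacent copies, and it takes the 4,037 genes given earlier for chromosome 2 as the denominator. Both choices are assumptions made for illustration, not figures reported by the authors.

```python
# Counts quoted in the article for chromosome 2
tandem_arrays = 239       # tandem duplications (treated here as one array each)
genes_in_arrays = 593     # genes involved in those duplications
genes_chr2 = 4037         # genes on chromosome 2, as quoted earlier

mean_array_size = genes_in_arrays / tandem_arrays   # genes per array
share_of_genes = genes_in_arrays / genes_chr2       # fraction of genes in arrays

print(f"{mean_array_size:.2f}")   # 2.48
print(f"{share_of_genes:.1%}")    # 14.7%
```

On these assumptions, a typical array holds two or three copies, and roughly one gene in seven on chromosome 2 sits in a tandem array.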
Chromosome 2 also contains a region equivalent to 75% of the mitochondrial genome. (Mitochondria, where respiration takes place, have their own genome, which is thought to have arisen through engulfment of a bacterium.) This region of chromosome 2 is nearly identical to the mitochondrial genome, indicating the recent transfer of a large piece of DNA from a membrane-bound organelle to the nucleus. Overall, more than 60% of the predicted proteins have a relative with considerable sequence conservation elsewhere in the Arabidopsis genome. The degree of sequence conservation between duplicated genes and regions indicates that large and small regions have duplicated, both locally and distantly, over the course of chromosome evolution in this tiny mustard.
This first glimpse of plant chromosomes — and the complete lists of the genes on them — shows a remarkably dynamic genome, one in constant motion and undergoing constant rearrangement. Nonetheless, there is a large enough core set of genes similar to those of distant relatives (such as animals) that functional conservation seems likely. Moreover, there are enough related functions to see that the independent solutions to the problems of multicellular existence found in plant and animal lineages can involve similar biochemistry. Such reactions are carried out sometimes by related, and other times by unrelated, proteins. Finally, there is a large collection of genes that do not correspond to anything yet seen — enough to guarantee many years of discovery in plant science.
1. Lin, X. et al. Nature 402, 761–768 (1999).
2. Wambutt, R. et al. Nature 402, 769–777 (1999).
3. Meyerowitz, E. M. & Somerville, C. R. (eds) Arabidopsis (Cold Spring Harbor Laboratory Press, New York, 1994).
4. C. elegans Sequencing Consortium Science 282, 2012–2018 (1998).
5. Miklos, G. L. G. & Rubin, G. M. Cell 86, 521–529 (1996).
6. Wang, D. Y. C., Kumar, S. & Hedges, S. B. Proc. R. Soc. Lond. B 266, 163–171 (1999).
7. Riechmann, J.-L. & Meyerowitz, E. M. Biol. Chem. 379, 633–646 (1998).