Genome sequencing

Genes blossom from a weed

The tiny weed Arabidopsis thaliana (Fig. 1) has received much attention lately, not only from plant scientists (who are already familiar with its attributes) but also from genome sequencers, federal granting agencies and even the US Congress1. Why is this so? With its small genome2 of around 120 million base pairs (Mb), its compact growth and the ease with which it can be genetically manipulated, Arabidopsis serves as a model for physiological, biochemical, cell biological and developmental studies of over 250,000 species of plant.

Figure 1: Arabidopsis thaliana, a little weed but a powerful model. A tiny-seeded member of the mustard family, Arabidopsis is the model for over 250,000 species of plant. (TONI HAYDEN/JOHN INNES CENTRE)

On page 485 of this issue, Bevan et al.3 describe their analysis of just under 1.9 Mb of contiguous DNA sequence produced by the European Union Arabidopsis Genome Project. Unlike the larger genomes of its distant cousins — soybean, corn, wheat and most other agriculturally important crop plants — the Arabidopsis genome is chock-full of genes. On average, the authors found one gene every 4,800 bases and, by extrapolation from this gene density, they predict that the maximum number of genes needed to ‘grow’ a plant is about 21,000. This number is in line with estimates based on complementary DNA sequencing programmes4,5.
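
The gene-density arithmetic is easy to reproduce. The short Python sketch below uses only numbers stated in this article (389 predicted genes in 1.87 Mb, a genome of about 120 Mb, an estimate of about 21,000 genes); the variable and function names are illustrative, not taken from Bevan et al. Spreading the contig's density over the full 120 Mb gives closer to 25,000 genes, while 21,000 genes at the same density would occupy roughly 101 Mb. That suggests the published estimate assumes a smaller gene-bearing portion of the genome, but the article does not say how that portion was defined, so treat this reading as an assumption.

```python
# Back-of-envelope check of the gene-density figures quoted in the text.
# The 120 Mb genome size is the approximate estimate given in the article,
# not an exact assembly length.
CONTIG_BP = 1_870_000       # contiguous sequence analysed by Bevan et al.
PREDICTED_GENES = 389       # genes predicted in that sequence
GENOME_BP = 120_000_000     # approximate Arabidopsis genome size
ESTIMATED_GENES = 21_000    # genome-wide gene estimate quoted in the text


def bases_per_gene(length_bp: float, genes: int) -> float:
    """Average spacing between predicted genes, in base pairs."""
    return length_bp / genes


spacing = bases_per_gene(CONTIG_BP, PREDICTED_GENES)

# Naive extrapolation: assume the whole genome is as gene-dense as the contig.
naive_total = GENOME_BP / spacing

# Reverse check: how much sequence would 21,000 genes occupy at this density?
implied_bp = ESTIMATED_GENES * spacing

print(f"one gene every {spacing:,.0f} bp")
print(f"whole-genome extrapolation: {naive_total:,.0f} genes")
print(f"21,000 genes at this density: ~{implied_bp / 1e6:.0f} Mb of sequence")
```

Run as is, it reports one gene every 4,807 bp, in line with the "one gene every 4,800 bases" quoted above.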

Although this sequence represents only about 1.5% of the Arabidopsis genome, it provides a ‘nuts and bolts’ view of the largest contiguous segment of plant DNA sequenced to date. To put things in perspective, it is longer than most of the completely sequenced prokaryote genomes. Like previous reports of functional cataloguing6 and whole-genome annotation, Bevan and colleagues' snapshot of predicted Arabidopsis genes (and the stuff in between) will probably be out of date soon after publication. However, their careful analysis of this 1.87-Mb sequence provides a tantalizing preview of the bricks and mortar needed to build a plant.

Searches of sequence databases with the 389 predicted Arabidopsis genes revealed that bits and pieces of just over half of them (209 genes) can be recognized in the genomes of organisms ranging from the bacterium Escherichia coli to humans. Because the existing functional catalogue6 of genes did not cover processes peculiar to plants, the authors had to add several new categories and subcategories to it. These included genes involved in the production of secondary metabolites, the source material for numerous pharmaceutical products. Moreover, because plants can't up and run from their predators, a category was established for genes involved in disease resistance, defence and responses to a variety of stresses.

Plant species diverged relatively recently, so, once complete, this catalogue of gene sequences will allow Arabidopsis to serve as a reference genome or ‘gene bank’ for all flowering plants. The flip side, however, is that it may not be easy to assign a function to the remaining 46% of the genes. By extrapolation, nearly 10,000 genes in the Arabidopsis genome will fall into the category of ‘function unknown’, providing plant biologists both with fertile ground for new investigation and with a big challenge: assigning functions to these genes.
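
The 46% and "nearly 10,000" figures follow directly from the counts above. The sketch below reproduces them in Python, assuming (as the text does) that genes without database matches are the ones whose function is hard to assign, and that the 1.87-Mb contig is representative of the whole genome; the names are illustrative.

```python
# Share of the 389 predicted genes with recognizable database matches,
# and a naive genome-wide extrapolation of the remainder.
PREDICTED_GENES = 389   # genes predicted in the 1.87 Mb contig
MATCHED_GENES = 209     # genes with similarity to sequences from other organisms
GENOME_GENES = 21_000   # genome-wide gene estimate quoted in the article

matched_fraction = MATCHED_GENES / PREDICTED_GENES
unmatched_fraction = 1 - matched_fraction

# Assumption: unmatched genes are treated as 'function unknown', and the
# contig is taken to be representative of the whole genome.
unknown_genes = unmatched_fraction * GENOME_GENES

print(f"matched: {matched_fraction:.1%}, unmatched: {unmatched_fraction:.1%}")
print(f"projected 'function unknown' genes: {unknown_genes:,.0f}")
```

The projection lands just under 10,000, which is the "nearly 10,000" in the text; a different genome-wide gene estimate would shift it proportionally.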

These estimates are supported by the analysis of a more broadly based collection of largely non-contiguous sequences produced by the Arabidopsis Genome Initiative7,8 (Fig. 2). This international consortium of genome sequencers was established in 1996 as part of the Multinational Coordinated Arabidopsis Genome Research Project, itself a model for international cooperation in science. Bevan et al.3 are part of this effort, as are genome researchers in France, Japan and the United States. By mid-1998, the US, Japanese and EU groups are expected to deliver around 30 Mb of sequence (about one-quarter of the genome), distributed across the five Arabidopsis chromosomes (Fig. 2). These groups will soon be joined by researchers at a genome centre in France, which is just coming on-line.

Figure 2: Progress in the Arabidopsis Genome Initiative. As of December 1997, 18.90 Mb of the Arabidopsis genome had been sequenced by the Arabidopsis Genome Initiative, 1.9 Mb of which are described by Bevan et al.3. Another 10.69 Mb of sequence is in production and should be completed by the middle of this year.
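
The Figure 2 numbers add up to the roughly 30 Mb, about one-quarter of the genome, that the consortium expects to deliver by mid-1998. The Python sketch below checks this, again assuming the approximate 120 Mb genome size given at the start of the article (so the "still to sequence" line is only a rough figure); the labels and variable names are illustrative.

```python
# Sequencing progress figures from the Fig. 2 legend, expressed as
# fractions of the ~120 Mb genome size quoted earlier in the article.
GENOME_MB = 120.0         # approximate genome size, not an exact assembly length
FINISHED_MB = 18.90       # sequenced by the Initiative as of December 1997
IN_PRODUCTION_MB = 10.69  # in production, due by mid-1998
BEVAN_MB = 1.9            # portion described by Bevan et al.

expected_mid_1998 = FINISHED_MB + IN_PRODUCTION_MB
remaining = GENOME_MB - expected_mid_1998

for label, mb in [
    ("finished, Dec 1997", FINISHED_MB),
    ("Bevan et al. contig", BEVAN_MB),
    ("expected by mid-1998", expected_mid_1998),
    ("still to sequence", remaining),
]:
    # Right-align labels and show each amount as a share of the genome.
    print(f"{label:>22}: {mb:6.2f} Mb ({mb / GENOME_MB:5.1%} of genome)")
```

Summing the finished and in-production sequence gives 29.59 Mb, just under a quarter of a 120 Mb genome.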

The current rate of sequencing by the Arabidopsis Genome Initiative is on target for the agreed completion date of 2004. This year, however, Congress has provided additional funding to the National Science Foundation (NSF) to establish a Plant Genome Research Program and to scale up the US effort, and these extra funds will allow the target date to be brought forward. According to a recent NSF announcement, the planned scale-up of the NSF/Department of Energy/US Department of Agriculture Interagency Arabidopsis Genome Sequencing Program calls for the genome sequence to be finished by the end of 2000. Given this boost in funding, and pending continued support for the other members of the Arabidopsis Genome Initiative, the international community of plant scientists should be prepared for a bountiful harvest of genes at the dawn of the new millennium.

References

  1. Briggs, S. P. & Helentjaris, T. Genome Res. 7, 856–857 (1997).
  2. Goodman, H., Ecker, J. R. & Dean, C. Proc. Natl Acad. Sci. USA 92, 10831–10835 (1995).
  3. Bevan, M. et al. Nature 391, 485–488 (1998).
  4. Newman, T., Bruijn, F. J. & Green, P. Plant Physiol. 106, 1241–1255 (1994).
  5. Cooke, R. et al. Plant J. 9, 101–124 (1996).
  6. Riley, M. Microbiol. Rev. 57, 862–952 (1993).
  7. Sato, S. et al. DNA Res. 4, 215–230 (1997).
  8. Kotani, H. et al. DNA Res. 4, 291–300 (1997).
