Letters to Nature

Nature 402, 404-407 (25 November 1999) | doi:10.1038/46536

The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes

Yin-Long Qiu1, Jungho Lee1, Fabiana Bernasconi-Quadroni1, Douglas E. Soltis2, Pamela S. Soltis2, Michael Zanis2, Elizabeth A. Zimmer3, Zhiduan Chen1,5, Vincent Savolainen4 & Mark W. Chase4

  1. Institute of Systematic Botany, University of Zurich, Zollikerstrasse 107, 8008 Zurich, Switzerland
  2. School of Biological Sciences, Washington State University, Pullman, Washington 99164-4236, USA
  3. Laboratory of Molecular Systematics, Smithsonian Institution, Washington DC 20560, USA
  4. Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK
  5. Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China.

Correspondence and requests for materials should be addressed to Y.-L.Q. (e-mail: yqiu@systbot.unizh.ch; or after February 2000, yqiu@bio.umass.edu).


Angiosperms have dominated the Earth's vegetation since the mid-Cretaceous (90 million years ago)1, providing much of our food, fibre, medicine and timber, yet their origin and early evolution have remained enigmatic for over a century2, 3, 4, 5, 6, 7, 8. One part of the enigma lies in the difficulty of identifying the earliest angiosperms; the other involves the uncertainty regarding the sister group of angiosperms among extant and fossil gymnosperms. Here we report a phylogenetic analysis of DNA sequences of five mitochondrial, plastid and nuclear genes (total aligned length 8,733 base pairs), from all basal angiosperm and gymnosperm lineages (105 species, 103 genera and 63 families). Our study demonstrates that Amborella, Nymphaeales and Illiciales-Trimeniaceae-Austrobaileya represent the first stage of angiosperm evolution, with Amborella being sister to all other angiosperms. We also show that Gnetales are related to the conifers and are not sister to the angiosperms, thus refuting the Anthophyte Hypothesis1. These results have far-reaching implications for our understanding of diversification, adaptation, genome evolution and development of the angiosperms.

Difficulty in identifying the earliest angiosperms is the result of three problems that characterize diversification of most major clades. First, the great divergence between gymnosperms and angiosperms makes assessment of character homology difficult and thus renders the otherwise powerful outgroup approach problematic in morphological cladistic analyses1, 9. Second, extinction, which is partly responsible for this divergence, has almost certainly occurred in both groups1, 10, 11, 12, 13, 14, 15, 16, and highlights the need for extensive taxon sampling when relying on the living diversity. Last, the fossil evidence indicates that the early angiosperms went through an explosive radiation1, 10, 11, 12, 13, 14, 15, 16, the resolution of which requires sampling a large number of characters. Previous molecular analyses have had some success in resolving relationships among basal angiosperms17, 18, 19, 20, 21; however, their results are only weakly supported, and worse, are often contradictory because of evolutionary rate heterogeneity among lineages of the particular gene used, weak phylogenetic signal in single genes, and insufficient taxon sampling. From both theoretical and empirical studies, it is becoming increasingly clear that to address such a difficult issue as basal angiosperm phylogeny, extensive sampling in both dimensions of taxa and characters (genes) is necessary22, 23, 24.

We obtained sequences of five genes from all three plant genomes: mitochondrial atp1 and matR, plastid atpB and rbcL and nuclear 18S rDNA. They encode products involved in energy metabolism, carbohydrate synthesis and information processing. Thus, our character sampling strategy of taking multiple genes of different functions from all three genomes is designed to reduce homoplasy generated by gene-, function- and genome-specific molecular evolutionary phenomena such as rate heterogeneity, GC content bias, RNA editing and protein structural constraints25, 26. To optimize the performance of phylogenetic methods in analysing complex diversification patterns in early angiosperms22, 23, we included 97 species, 95 genera and 55 families of basal angiosperms, essentially sampling all living families5, 8, 11, 20, 21. Eight gymnosperms from eight families were used as outgroups. The DNA sequences were analysed with parsimony methods; bootstrap (BS) and jackknife (JK) analyses were conducted to measure stability of phylogenetic patterns.
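As an illustration of what equal-weight parsimony scoring involves, the following is a minimal sketch of Fitch's algorithm for counting the state changes a given rooted binary tree requires on an alignment. It is not the PAUP* implementation used in this study. The taxon labels, sequences and function names are hypothetical, and gaps, missing data and the heuristic tree search are deliberately left out.

```python
def fitch_site_cost(tree, states):
    """Return the minimum number of state changes for one alignment column.

    tree   : nested 2-tuples of taxon names (a rooted binary tree)
    states : dict mapping taxon name -> observed character state, e.g. "A"
    """
    def visit(node):
        if isinstance(node, str):            # leaf: observed state, no changes yet
            return {states[node]}, 0
        left, right = node
        lset, lcost = visit(left)
        rset, rcost = visit(right)
        shared = lset & rset
        if shared:                           # child state sets overlap: no extra step
            return shared, lcost + rcost
        return lset | rset, lcost + rcost + 1  # disjoint sets: one extra step

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Sum the Fitch cost over all columns, every column weighted equally."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_cost(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical toy alignment (not data from this study).
toy_alignment = {
    "outgroup": "ACGT",
    "taxonA": "ACGA",
    "taxonB": "ATGA",
    "taxonC": "ATCA",
}
tree_a = ("outgroup", ("taxonA", ("taxonB", "taxonC")))
tree_b = ("outgroup", ("taxonC", ("taxonB", "taxonA")))

length_a = tree_length(tree_a, toy_alignment)
length_b = tree_length(tree_b, toy_alignment)
print(length_a, length_b)
```

Under the parsimony criterion the tree with the smaller length is preferred; a real analysis compares very many candidate topologies rather than two.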

The same single most parsimonious tree was found in each of 1,000 random taxon-addition replicates in the analysis (Fig. 1). Amborella, a shrub of the monotypic New Caledonian family Amborellaceae, is sister to all other angiosperms, which are strongly (90% BS and 92% JK) supported as a monophyletic group. The next diverging lineage corresponds to Nymphaeales, the water lilies; its sister clade of the remaining angiosperms receives 98% BS and 99% JK support. The third clade consists of two small Australasian families, Austrobaileyaceae and Trimeniaceae, and two small eastern Asia-eastern North America disjunct families, Illiciaceae and Schisandraceae (Illiciales). All remaining angiosperms (euangiosperms) make up a strongly supported large clade (97% BS and 99% JK). The relationships among lineages within euangiosperms are resolved in the shortest tree but generally receive less than 50% BS support. All major lineages, however, are strongly supported; these agree with previous classifications5, 8, 11 and results of cladistic analyses of morphological and molecular data9, 20, 21, 27. Among gymnosperms, two gnetalean genera, Gnetum and Welwitschia, are not sister to angiosperms as suggested by the Anthophyte Hypothesis1, but fall close to the conifers.

Figure 1: The single most parsimonious tree found in the five-gene DNA sequence analysis (tree length, 13,240 steps; consistency index, 0.413; retention index, 0.604).

Numbers above branches are branch lengths (ACCTRAN optimization); those below in italics are bootstrap values (only those above 50% are shown; for branches related to ANITA (bold type), numbers below branches before the slash are bootstrap values and those after are jackknife values). GYM, gymnosperms; AMB, Amborella; NYM, Nymphaeales; ITA, Illiciales, Trimeniaceae and Austrobaileya; CER, Ceratophyllum; MON, monocots; CHL, Chloranthaceae; WIN, Winterales; PIP, Piperales; MAG, Magnoliales; LAU, Laurales; EUD, eudicots; Acorus_c, A. calamus; Acorus_g, A. gramineus; Ceratophyllum_d, C. demersum; Ceratophyllum_s, C. submersum.
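The consistency and retention indices quoted in the caption can be read through their commonly used ensemble definitions, sketched below. Here m is the minimum number of steps each character could require on any tree, s the number of steps it requires on the tree in question, and g the maximum it could require on any tree. The function names and the per-character values are hypothetical, and whether the values reported for Fig. 1 include uninformative characters is not stated in the text.

```python
def consistency_index(min_steps, tree_steps):
    """Ensemble CI: total minimum steps divided by total observed steps."""
    return sum(min_steps) / sum(tree_steps)


def retention_index(min_steps, tree_steps, max_steps):
    """Ensemble RI: (g - s) / (g - m), using totals over all characters."""
    m, s, g = sum(min_steps), sum(tree_steps), sum(max_steps)
    return (g - s) / (g - m)


# Hypothetical per-character step counts (not values from this study).
m_vals = [1, 1, 2]   # minimum possible steps per character
s_vals = [1, 3, 2]   # steps required on the tree being evaluated
g_vals = [2, 4, 3]   # maximum possible steps per character

ci = consistency_index(m_vals, s_vals)
ri = retention_index(m_vals, s_vals, g_vals)
print(round(ci, 3), round(ri, 3))
```

Both indices equal 1 when the tree requires no homoplasy and decrease as extra steps accumulate.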


We observed one INDEL (insertion/deletion) in matR that supports the basal position of Amborella, Nymphaeales and Illiciales-Trimeniaceae-Austrobaileya (ANITA) in angiosperms: an 18-base-pair (bp) deletion in all euangiosperms but not in ANITA or gymnosperms, some of which have 6–15-bp deletions (Fig. 2). Although we cannot rule out the possibility that the sequence in the INDEL region of ANITA and gymnosperms results from independent insertions, two lines of evidence suggest that this scenario is unlikely. The sequences are found in all three ANITA lineages and all four gymnosperm lineages. Furthermore, there are identical or similar codons shared by ANITA and gymnosperms in the INDEL.
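A comparison of this kind can be sketched as a count of gap characters per sequence within a fixed window of the alignment. The sketch below assumes gaps are written as dashes, as in Fig. 2. The sequence names, the toy sequences and the window coordinates are hypothetical, and a 6-column window stands in for the 18-bp region.

```python
def gap_lengths(alignment, start, end):
    """Count gap characters per sequence in alignment columns [start, end)."""
    return {name: seq[start:end].count("-") for name, seq in alignment.items()}


def taxa_with_full_deletion(alignment, start, end):
    """Names of sequences that are entirely gapped across the window."""
    width = end - start
    counts = gap_lengths(alignment, start, end)
    return sorted(name for name, k in counts.items() if k == width)


# Hypothetical toy alignment (not the matR data).
toy = {
    "seq_full": "ATGGCTAAAGGT",     # no deletion in the window
    "seq_deleted": "ATG------GGT",  # whole window deleted
    "seq_partial": "ATGGCT---GGT",  # shorter deletion inside the window
}

print(gap_lengths(toy, 3, 9))
print(taxa_with_full_deletion(toy, 3, 9))
```

Sequences that share the full-length deletion would group together, while those with no gap or a shorter gap would not.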

Figure 2: The portion of the aligned matrix from mitochondrial matR showing the INDEL that distinguishes euangiosperms (top block) from ANITA (middle block) and gymnosperms (bottom block).

Dots indicate nucleotides identical to the top sequence, and dashes indicate gaps. Sequences of Arabidopsis, Oenothera, Vicia, Solanum and Triticum (from GenBank; not used in the phylogenetic analysis) are included here to show the INDEL status in derived eudicots and monocots. The codon grouping shown is the correct reading frame. Abbreviations as in Fig. 1.


Reconstruction of deep phylogenies using DNA sequences has been plagued by problems caused by rate heterogeneity, weak phylogenetic signal in single genes, insufficient taxon sampling, explosive radiation, extinction and protein structural constraints25, 26. In retrospect, our earlier studies using single genes suffered from some of these problems when Ceratophyllum was found to be sister to all other angiosperms20, 21. The same concern could still be raised about our results presented here; however, the use of five genes with different functions from all three genomes and the sampling of almost all basal angiosperm families are likely to have considerably reduced the effect of these problems. Five lines of independent evidence support our identification of ANITA as the earliest angiosperms from the DNA sequence analysis. First, we carried out five single-gene analyses. Three of them, atp1, matR and atpB, placed ANITA at the base of angiosperms, with the exception that in the atp1 analysis, Amborella fell into eudicots because of its divergent sequence. This example, together with that of Ceratophyllum in rbcL analyses20, 21, exposes the weakness of single-gene analysis, the very reason we conducted this multigene analysis. The 18S rDNA analysis yielded a polytomy among basal angiosperms (but an earlier study of 18S rDNA sampling 223 seed plants identified ANITA as the earliest angiosperms19), whereas analysis of rbcL still rooted the angiosperm tree at Ceratophyllum, consistent with earlier results20, 21 (removal of Ceratophyllum, however, changed the root to ANITA). We also analysed amino-acid sequences of the four protein-coding genes and still found ANITA at the base of angiosperms.

Second, the INDEL in matR, being a non-point-mutation character and thus less prone to convergence than nucleotide substitutions, clearly separates ANITA from euangiosperms and places them closer to the gymnosperms. Third, a study using duplicated phytochrome genes to root the angiosperm phylogeny and two analyses of multiple genes with different taxon sampling have independently corroborated that ANITA are the earliest angiosperms27, 28, 31. Fourth, all ANITA members (except Illiciaceae) share one morphological feature: carpel closure at anthesis through occlusion of the inner space by secretion. This feature is rare in euangiosperms and is probably a primitive condition in the earliest angiosperms29. Last, fossil evidence, although awaiting further discovery, is beginning to show a pattern consistent with our topology. Many floral structures from the Early Cretaceous show similarities to those of ANITA1, 12, 13. Taken together, these lines of evidence provide strong and clear support for this deep split of angiosperms.

Identifying the earliest angiosperms solves one of the two critical pieces of the angiosperm origin mystery2, and will contribute to solving the other piece: determining the sister group of angiosperms among extant and fossil gymnosperms. Our analyses indicate that the prevalent view of the past two decades that Gnetales are sister to angiosperms is incorrect. These results will also help elucidate the evolution of several features that contributed to the rise and ultimate dominance of angiosperms in modern terrestrial ecosystems, for example, floral development, carpel closure, insect pollination and double fertilization. Finally, these findings will allow selection of appropriate model organisms to investigate critical changes in genomic evolution and developmental programmes, which have led to the overwhelming diversity we see today.

This study also has important implications for future work on reconstructing ancient phylogenies. DNA sequence analyses, although proven to be powerful in reconstructing recent evolutionary histories, have often failed to resolve difficult, long-standing controversies in organismal diversification patterns25. Although many problems may be responsible for this failure, one obvious aspect of experimental design has been paradoxically overlooked: collection of enough data to reduce sampling error. Several studies experimenting with extensive sampling of either taxa or characters have been published17, 19, 20, 26, but few with both (see ref. 27). Our analysis demonstrates that extensive sampling of both taxa and characters is important for resolving difficult phylogenetic issues.



Gene sequencing

Total cellular DNA was extracted from fresh or silica-gel-dried leaves using the standard CTAB method (see ref. 21). The genes were amplified by conventional PCR and sequenced using an ABI-PRISM 377 DNA sequencer (PE Applied Biosystems) according to manufacturer's protocols with modifications. The five genes were aligned individually, using XPILEUP from the GCG package (Genetics Computer Group, Inc.) with different settings for gap-creation penalty and gap-extension penalty. Minor manual adjustments were made after computer alignment.

Phylogenetic analyses

For the taxa analysed, all 105 species had rbcL sequences, and 93, 88, 98 and 101 species had atpB, 18S rDNA, matR and atp1 sequences, respectively (missing data for critical taxa: Kadsura: 18S rDNA, Trimenia: atp1, Cycas and Zamia: atpB, and Metasequoia and Podocarpus: matR). Each taxon had data for at least three out of the five genes. Parsimony (equal weighting) analyses were carried out using PAUP*4.0b2 (ref. 30). To search for islands of shortest trees, a heuristic search was conducted using 1,000 random taxon-addition replicates, one tree held at each step during stepwise addition, TBR branch swapping, steepest descent option in effect, MulTrees option in effect and no upper limit of MaxTrees. Both bootstrap and jackknife (50% character deletion) analyses were conducted using 1,000 resampling replicates and the same tree search procedure as described above except with simple taxon addition. The data matrix is available as Supplementary Information at http://www.nature.com.
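The two resampling schemes can be sketched as operations on alignment columns. The sketch below is a minimal illustration and not the PAUP* procedure: the function names are hypothetical, the jackknife shows one simple reading of "50% character deletion" (exactly half of the columns dropped without replacement), and the tree search that would be run on each replicate is omitted.

```python
import random


def bootstrap_columns(n_sites, rng):
    """Draw n_sites column indices with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


def jackknife_columns(n_sites, rng, deletion=0.5):
    """Keep a random subset of columns, dropping the given fraction."""
    keep = n_sites - int(n_sites * deletion)
    return sorted(rng.sample(range(n_sites), keep))


def resample(alignment, columns):
    """Build a pseudo-replicate alignment from the chosen column indices."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicates whose tree contains the clade.

    replicate_clades : list of sets of frozensets, one set per replicate tree
    clade            : frozenset of taxon names
    """
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(0)                      # fixed seed for repeatability
boot = bootstrap_columns(10, rng)           # 10 indices, repeats allowed
jack = jackknife_columns(10, rng)           # 5 distinct indices
replicate = resample({"taxonA": "ACGTACGTAC"}, jack)
```

Support for a branch is then the percentage of replicate trees in which the corresponding clade is recovered, which is what `clade_support` computes from per-replicate clade sets.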

All atp1 and matR, and some atpB, rbcL and 18S rDNA sequences were generated in this study, deposited in GenBank under accession numbers AF197576-AF197815; remaining sequences were from GenBank and ref. 27.



  1. Crane, P. R., Friis, E. M. & Pedersen, K. R. The origin and early diversification of angiosperms. Nature 374, 27–33 (1995).
  2. Darwin, C. in More Letters of Charles Darwin: A Record of His Work in a Series of Hitherto Unpublished Letters Vol. 2 (eds Darwin, F. & Seward, A. C.) 20–22, 26–27 (John Murray, London, 1903).
  3. Arber, E. A. N. & Parkin, J. On the origin of angiosperms. Bot. J. Linnean Soc. 38, 29–80 (1907).
  4. von Wettstein, R. R. Handbuch der Systematischen Botanik. II. Band (Franz Deuticke, Wien, 1907).
  5. Takhtajan, A. Flowering Plants: Origin and Dispersal (Oliver and Boyd, Edinburgh, 1969).
  6. Doyle, J. A. Origin of angiosperms. Annu. Rev. Ecol. Syst. 9, 365–392 (1978).
  7. Endress, P. K. Reproductive structures and phylogenetic significance of extant primitive angiosperms. Pl. Syst. Evol. 152, 1–28 (1986).
  8. Cronquist, A. The Evolution and Classification of Flowering Plants 2nd edn (The New York Botanical Garden, New York, 1988).
  9. Donoghue, M. J. & Doyle, J. A. in Evolution, Systematics, and Fossil History of the Hamamelidae Vol. 1 (eds Crane, P. R. & Blackmore, S.) 17–45 (Clarendon, Oxford, 1989).
  10. Doyle, J. A. Cretaceous angiosperm pollen of the Atlantic Coastal Plain and its evolutionary significance. J. Arnold Arbor. 50, 1–35 (1969).
  11. Walker, J. W. & Walker, A. G. Ultrastructure of lower Cretaceous angiosperm pollen and the origin and early evolution of flowering plants. Ann. Missouri Bot. Gard. 71, 464–521 (1984).
  12. Friis, E. M., Pedersen, K. R. & Crane, P. R. Angiosperm floral structures from the Early Cretaceous of Portugal. Pl. Syst. Evol. (Suppl.) 8, 31–49 (1994).
  13. Friis, E. M., Pedersen, K. R. & Crane, P. R. Early angiosperm diversification: the diversity of pollen associated with angiosperm reproductive structures in Early Cretaceous floras from Portugal. Ann. Missouri Bot. Gard. 86, 259–296 (1999).
  14. Walker, J. W., Brenner, G. J. & Walker, A. G. Winteraceous pollen in the lower Cretaceous of Israel: early evidence of a magnolialean angiosperm family. Science 220, 1273–1275 (1983).
  15. Taylor, D. W. & Hickey, L. J. An Aptian plant with attached leaves and flowers: implications for angiosperm origin. Science 247, 702–704 (1990).
  16. Sun, G., Dilcher, D. L., Zheng, S. & Zhou, Z. In search of the first flower: a Jurassic angiosperm, Archaefructus, from Northeast China. Science 282, 1692–1695 (1998).
  17. Martin, P. G. & Dowd, J. M. Studies of angiosperm phylogeny using protein sequences. Ann. Missouri Bot. Gard. 78, 296–337 (1991).
  18. Hamby, R. K. & Zimmer, E. A. in Molecular Systematics of Plants (eds Soltis, P. S., Soltis, D. E. & Doyle, J. J.) 50–91 (Chapman and Hall, New York, 1992).
  19. Soltis, D. E. et al. Angiosperm phylogeny inferred from 18S ribosomal DNA sequences. Ann. Missouri Bot. Gard. 84, 1–49 (1997).
  20. Chase, M. W. et al. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Missouri Bot. Gard. 80, 528–580 (1993).
  21. Qiu, Y.-L., Chase, M. W., Les, D. H. & Parks, C. R. Molecular phylogenetics of the Magnoliidae: cladistic analyses of nucleotide sequences of the plastid gene rbcL. Ann. Missouri Bot. Gard. 80, 587–606 (1993).
  22. Hillis, D. M. Inferring complex phylogenies. Nature 383, 130–131 (1996).
  23. Graybeal, A. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47, 9–17 (1998).
  24. Soltis, D. E. et al. Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. Syst. Biol. 47, 32–42 (1998).
  25. Qiu, Y.-L. & Palmer, J. D. Phylogeny of early land plants: insights from genes and genomes. Trends Plant Sci. 4, 26–30 (1999).
  26. Naylor, G. J. P. & Brown, W. M. Structural biology and phylogenetic estimation. Nature 388, 527–528 (1997).
  27. Soltis, P. S., Soltis, D. E. & Chase, M. W. Angiosperm phylogeny inferred from multiple genes as a research tool for comparative biology. Nature 402, 402–404 (1999).
  28. Mathews, S. & Donoghue, M. J. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286, 947–950 (1999).
  29. Endress, P. K. & Igersheim, A. Gynoecium diversity and systematics of the Laurales. Bot. J. Linnean Soc. 125, 93–168 (1997).
  30. Swofford, D. L. PAUP*4.0b2: Phylogenetic Analysis Using Parsimony. (Sinauer, Sunderland, Massachusetts, 1998).
  31. Parkinson, C. L., Adams, K. L. & Palmer, J. D. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. (in the press).

Supplementary Information

Supplementary information accompanies this paper.



We thank C. D. K. Cook, M. E. Endress, P. K. Endress, E. M. Friis, O. Nandi and R. Rutishauser for critical reading of the manuscript, R. Collett, A. Floyd, B. Hall and S. S. Renner for plant material, and the Swiss NF and US NSF for financial support.