Abstract
Comparative biology requires a firm phylogenetic foundation to uncover and understand patterns of diversification and evaluate hypotheses of the processes responsible for these patterns. In the angiosperms, studies of diversification in floral form1, 2, stamen organization3, reproductive biology4, photosynthetic pathway5, nitrogen-fixing symbioses6 and life histories7 have relied on either explicit or implied phylogenetic trees. Furthermore, to understand the evolution of specific genes and gene families, evaluate the extent of conservation of plant genomes and make proper sense of the huge volume of molecular genetic data available for model organisms8 such as Arabidopsis, Antirrhinum, maize, rice and wheat, a phylogenetic perspective is necessary. Here we report the results of parsimony analyses of DNA sequences of the plastid genes rbcL and atpB and the nuclear 18S rDNA for 560 species of angiosperms and seven non-flowering seed plants and show a well-resolved and well-supported phylogenetic tree for the angiosperms for use in comparative biology.
Efforts to infer angiosperm phylogeny have greatly improved our understanding of the major lineages of flowering plants9, 10, 11, 12, 13, 14. However, despite these advances, the phylogenetic trees inferred from these studies are not completely congruent in the inter-relationships portrayed among the major lineages (although the alternatives are not strongly supported), and nearly all trees suffer from areas of poor resolution and/or weak support, usually due to low levels of divergence. In addition, molecular studies using hundreds of taxa present difficult analytical problems; the large number of taxa results in a huge number of possible trees15, increasing the amount of time needed to conduct a thorough analysis and decreasing the chances of finding the optimum tree(s). Phylogenetic analyses of 500 rbcL sequences did not find the most parsimonious trees9; further analyses of these data found slightly shorter trees16, 17. An analysis based on 18S rDNA11 also probably failed to find the most parsimonious trees, in spite of extensive analyses. Thus, despite a decade of major improvements in our understanding of angiosperm phylogeny, the picture is not yet complete.
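To give a sense of the scale of this search problem, the number of distinct unrooted, fully bifurcating trees on n labelled taxa is (2n − 5)!!, a standard combinatorial result. The sketch below evaluates it exactly; the function name is ours, for illustration only, and is not part of any software used in this study.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating trees on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        return 1  # one or two taxa admit only a single (trivial) tree
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # 560 angiosperms plus seven outgroup taxa, as in the analysis reported here.
    n_trees = num_unrooted_trees(567)
    print(f"decimal digits in the tree count: {len(str(n_trees))}")
```

Because the count grows super-exponentially with the number of taxa, exhaustive evaluation is out of reach for data sets of this size and heuristic searches are the only practical option.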
The analytical issues involved in large-scale phylogenetic analyses have been discussed18, 19. Both simulations20, 21 and empirical studies18 indicate that additional data can improve phylogenetic inferences from large data sets. Analyses of angiosperm relationships on the basis of gene sequences for rbcL, atpB and 18S rDNA for 190 angiosperm species and 3 outgroups showed increased resolution and internal support (as measured by bootstrap values), and faster run times when the data sets for these genes were combined rather than analysed separately. These studies indicated that additional data (that is, gene sequences) and taxa could improve inferences of angiosperm phylogeny.
Our analysis, which is based on parsimony analyses of 4,733 aligned nucleotides, provides a well-resolved and well-supported estimate of angiosperm phylogeny. Most of the major clades and some of the smaller ones recovered in this analysis were also found in previous studies9, 11, 12, 13. However, contrary to previous analyses based on data for one or two genes, all major clades and most of the spine of the tree receive jackknife (JK) support equal to or greater than 50%. Amborellaceae are the sister to all remaining angiosperms, consistent with results inferred from 18S rDNA11 and atpB13 alone; the JK value for the clade of all angiosperms except Amborella is 65%. Nymphaeaceae (with a JK value of 100%) are then sister to all remaining angiosperms (JK 72%), followed by a clade of Austrobaileya, Illicium and Schisandra (JK for this clade of three genera 100%). This same branching order was found with stronger support in analyses based on five genes and nearly 9,000 base pairs (bp) of sequence data14 and is further corroborated by analyses of duplicate phytochrome genes22. The remaining angiosperms (JK 71%) form two major clades. One of these (JK 56%) consists of Chloranthaceae, Magnoliales, Laurales, Winterales and Piperales, all classified in subclass Magnoliidae23, and the monocots. Each of these branches is strongly supported (JK 95%), but the relationships among these clades have JK values of less than 50%.
Within the monocots, Acorus is sister to a clade containing all other monocots (JK 99%). Alismatales are the next branching monocots and sister to a large clade (JK 99%) that comprises six main lineages: Petrosaviaceae, Dioscoreales, Pandanales, Liliales, Asparagales and commelinoids. Although all but Asparagales (JK 56%) and commelinoids (JK 68%) have JK percentages greater than 80%, relationships among clades of monocots are poorly resolved and/or weakly supported. The commelinoids, in turn, comprise Arecales (palms), Poales (including grasses), Commelinales and Zingiberales (gingers) as successive sister groups, although relationships among these four clades are not strongly supported. Only 102 out of the roughly 65,000 species of monocots24 were included in this analysis; more extensive sampling of monocots and discussion of relationships are given in ref. 25.
The second major clade above the first three basal branches comprises Ceratophyllum as the weakly supported sister (JK 53%) to the eudicots, that is, all remaining angiosperms. The eudicots (JK 99%) comprise a series of successively branching orders: Ranunculales, Proteales, Trochodendrales, Gunneraceae/Myrothamnaceae, and a large clade of 'core eudicots' (JK 100%). This last clade contains the majority of all angiosperm species. Major clades within the core eudicots are Saxifragales as sister to the remaining rosids; Caryophyllales; and asterids, which comprise Cornales, Ericales, euasterids I and euasterids II. Model organisms in the rosids include Arabidopsis, Brassica, Gossypium and legumes. Nitrogen-fixing symbioses with nodulating bacteria arose within this clade and are confined to a portion of the eurosid I subclade. Although this 'nitrogen-fixing clade' has been found in previous analyses6, 11, 13, this is the first study to our knowledge to provide strong support for it. Like the rosids, the asterid clade was recognized in previous studies9, 11, 12, 13, but without strong support. The model organisms Antirrhinum, Nicotiana, Petunia, Solanum and Helianthus are found in this large asterid clade. Familiar families found within each clade are given in Table 1. All three major clades of core eudicots identified here cut across subclass boundaries established in previous classifications23, 24. For the complete tree see Supplementary Information.
Figure 1: Summary of phylogenetic relationships for angiosperms inferred from analysis of rbcL, atpB and 18S rDNA sequences; the jackknife consensus tree (for groups receiving >50% support) is shown.

The shortest trees found were 45,100 steps. The number of species in each clade is given in parentheses; not all 560 species occurred in a clade portrayed in this summary tree. Jackknife support is given below branches.
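As an illustration of what a tree length in 'steps' measures, the following minimal sketch counts the minimum number of state changes on a fixed tree with the Fitch algorithm, assuming unordered, equally weighted characters. The nested-tuple tree encoding and the function names are ours and hypothetical; this is a toy example and not the implementation used by the programs cited in Methods.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a taxon name or a pair of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible ancestral states, number of steps) for one aligned site."""
    if isinstance(tree, str):
        return {states[tree]}, 0  # a leaf has its observed state and costs nothing
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change is needed here.
        return shared, left_steps + right_steps
    # No shared state: one change is required on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum Fitch steps over all sites of an alignment (taxon name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


if __name__ == "__main__":
    toy_tree = (("A", "B"), ("C", "D"))
    toy_alignment = {"A": "AA", "B": "AC", "C": "CC", "D": "CA"}
    print(tree_length(toy_tree, toy_alignment))
```

A parsimony search seeks the tree or trees that minimize this total over all sites. The sketch treats every character, including any gap or ambiguity symbol, as an ordinary state, which real analyses generally do not.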
Table 1: Familiar taxa and the clades in which they are placed in the phylogenetic tree shown in Fig. 1
Although some minor portions of the tree are weakly supported, the first branches of angiosperm phylogeny, Amborella, Nymphaeales, Austrobaileya-Illicium-Schisandra (plus Trimenia14) now seem clear14, 22. The relationships among orders of magnoliids are only weakly supported, as is the position of the monocots. Within the eudicots, the branching order of the major clades is fairly clear, but levels of support could be increased by additional data. Finally, although the core eudicots form a clade (JK 100%), the relationships among rosids, Caryophyllales, asterids and a number of smaller clades lack substantial JK support. Additional data, both molecular and morphological, could increase support for these relationships.
Despite some areas of poor resolution and/or weak support, the general structure of angiosperm phylogeny is now clear and well supported. Using this framework, long-standing questions of plant diversification can be addressed, and recent discoveries can be placed in the proper historical context. For example, petals have apparently arisen multiple times within the angiosperms, from sepals in some cases to stamens in others1, 26, 27, but the phylogenetic pattern of petal derivation is not clear. Using a phylogenetic tree such as ours, it would be possible to assess the homology of 'petals' throughout the angiosperms and then to search for genetic and developmental mechanisms of 'petal' formation28. Other aspects of floral evolution, such as the origin, maintenance and diversification of synorganized flowers1 in the core eudicots and patterns of stamen organization3, can also be addressed. Inferences of diversification of other angiosperm characteristics, such as morphology, physiology, ecology and genome structure, can now be made on a much more robust basis than before. Although large analyses of the angiosperms have been reported9, 11, 12, 13, this study provides both greater resolution and stronger support for the relationships inferred. The use of broad taxon sampling and many characters limits spurious results due to homoplasy and unequal rates of evolution among taxa and provides stronger support for relationships because historical information from all three genes is combined. Our tree, which employs three times the quantity of data used to construct most other trees, provides a substantially more secure foundation for inferring the evolution of key features in the angiosperms.
Methods
Selection of species was designed to sample all major groups of angiosperms, using recent classifications23, 24 and phylogenetic studies11, 13, 16 as guides. Sequences of rbcL, atpB, and 18S rDNA were amplified by polymerase chain reaction (PCR) and sequenced using an ABI 377 automated DNA sequencer. Details of sampling, DNA extraction, PCR amplification, sequencing and alignment will be published elsewhere (D.E.S. et al., manuscript in preparation). Voucher information, GenBank numbers and the aligned data matrix are available at http://www.wsu.edu:8080/~soltilab/, http://www.kewgardens.org and http://www.ucjeps.berkeley.edu/bryolab/greenplantpage.html. Heuristic parsimony analyses were conducted using PAUP* 4.0 (ref. 29) and the ratchet (K. Nixon) with NONA (P. Goloboff). Parsimony JK analyses30 (1,200 replicates, each with ten random-entry order replicates and branch swapping) were used to measure support for the topology. The gymnosperms Ephedra, Gnetum, Welwitschia, Ginkgo, Pinus, Podocarpus and Taxus were specified as the outgroup.
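The logic of jackknife support can be sketched as follows: characters (alignment columns) are deleted at random, a tree search is run on each reduced matrix, and the support for a group is the percentage of replicates in which it is recovered. The sketch below is illustrative only. The default deletion probability of e^-1 is an assumption taken from a common formulation of parsimony jackknifing and is not stated in this text, and `find_clades` is a hypothetical placeholder for whatever tree search is used; it does not reproduce the PAUP* or NONA procedures.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, Iterable

Alignment = Dict[str, str]   # taxon name -> aligned sequence
Clade = FrozenSet[str]       # set of taxon names forming a group


def jackknife_sample(alignment: Alignment, delete_prob: float,
                     rng: random.Random) -> Alignment:
    """Delete each alignment column independently with probability delete_prob."""
    n_sites = len(next(iter(alignment.values())))
    kept = [i for i in range(n_sites) if rng.random() >= delete_prob]
    return {taxon: "".join(seq[i] for i in kept) for taxon, seq in alignment.items()}


def jackknife_support(alignment: Alignment,
                      find_clades: Callable[[Alignment], Iterable[Clade]],
                      replicates: int = 100,
                      delete_prob: float = math.exp(-1),  # assumed value, see lead-in
                      seed: int = 0) -> Dict[Clade, float]:
    """Percentage of replicates in which each clade is recovered.

    find_clades is a hypothetical callable that runs a tree search on the
    resampled alignment and returns the clades present in its result.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(replicates):
        sample = jackknife_sample(alignment, delete_prob, rng)
        counts.update(set(find_clades(sample)))  # count each clade once per replicate
    return {clade: 100.0 * n / replicates for clade, n in counts.items()}
```

Under this scheme, a clade reported with a JK value of 50% or more is one that was recovered in at least half of the resampled data sets.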


