Plants form belowground associations with mycorrhizal fungi in one of the most common symbioses on Earth. However, few large-scale generalizations exist for the structure and function of mycorrhizal symbioses, as the nature of this relationship varies from mutualistic to parasitic and is largely context-dependent. We announce the public release of MycoDB, a database of 4,010 studies (from 438 unique publications) to aid in multi-factor meta-analyses elucidating the ecological and evolutionary context in which mycorrhizal fungi alter plant productivity. Over 10 years with nearly 80 collaborators, we compiled data on the response of plant biomass to mycorrhizal fungal inoculation, including meta-analysis metrics and 24 additional explanatory variables that describe the biotic and abiotic context of each study. We also include phylogenetic trees for all plants and fungi in the database. To our knowledge, MycoDB is the largest ecological meta-analysis database. We aim to share these data to highlight significant gaps in mycorrhizal research and encourage synthesis to explore the ecological and evolutionary generalities that govern mycorrhizal functioning in ecosystems.
Background & Summary
Plant performance is largely a function of the plant-symbiotic microbiome1. As a result, ecosystem functions and the vital services humans derive from them (e.g., food and fiber production, carbon sequestration) are fundamentally dependent on the interactions plants have with symbionts. Although symbioses are common, our knowledge of their impact on ecosystem functions and services is relatively incomplete. Broad generalizations about the relationships between plants and their symbionts are limited due to the context-dependent nature of such symbioses, which exist along a continuum of possible outcomes, from mutualistic to parasitic. Examining the results of many experiments through meta-analysis allows for larger-scale generalizations than individual experiments can provide independently and can lead to synthesis-generated evidence2. Furthermore, including phylogenetic information in meta-analyses can account for correlated evolutionary relationships among taxa used in multiple studies. Understanding ecological outcomes of symbioses, and the environmental and evolutionary context contributing to such outcomes, is crucial to maintaining and restoring the ecosystem functions and services upon which humans depend.
Mycorrhizal fungi form an ancient symbiosis with most plants on Earth3,4. The host plant and associated fungi form a trading partnership where the fungi increase the effective absorptive capabilities of the plant, delivering nutrients and water in exchange for plant-derived photosynthates5. Since most plants associate with mycorrhizal fungi, the outcome of this symbiosis can influence ecosystem structure, function, and services mediated by plant productivity. Several empirical studies have individually demonstrated how host plant traits and identities, fungal partners, soil biotic and abiotic conditions, and experimental conditions can alter the structure and function of mycorrhizal symbioses6,
Statistical methods to simultaneously examine multiple ecological and evolutionary factors in a meta-analysis framework have been recently developed16,
Here we present MycoDB, a large database of mycorrhizal inoculation experiments, linked with plant and fungal phylogenies (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1), to facilitate tests on the ecological and evolutionary contexts in which the addition of mycorrhizal fungi is beneficial or parasitic to plant hosts. MycoDB focuses on studies of two dominant types of mycorrhizal fungi, ectomycorrhizal (EM) fungi and arbuscular mycorrhizal (AM) fungi, because they predominate among published studies on mycorrhizal symbioses. MycoDB contains data on plant productivity response to mycorrhizal fungi from 4,010 studies (from 438 unique publications) and is organized in a hierarchical fashion such that a single publication can contain multiple discrete experiments and a single experiment can contain multiple studies. The ecological and evolutionary context of studies can be explored with 24 additional explanatory variables (e.g., plant functional group, inoculum complexity, plant or fungal origin; Table 1 (available online only) and Table 2) and mycorrhizal fungal and plant host phylogenetic trees (Figs 1 and 2). MycoDB can be used to model phylogenetic heritability of plant response to mycorrhizal fungi in host plant lineages, fungal lineages, and their interaction, as well as explore the relationship among explanatory variables and plant response to mycorrhizal fungi, while controlling for the influence of plant and fungal phylogenies.
Overview and literature searches
MycoDB (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1) contains data from three main phases of data collection and validation. Phase I occurred in 2005, when we identified 1,852 publications by conducting an initial literature search of the ISI Web of Science database using the key words mycorrhiz* and inocul* (on January 22, 2005). From this initial list, 134 publications were selected, in random order, as having met our inclusion criteria for meta-analysis such as reporting plant biomass response, use of a mycorrhizal addition treatment, and inclusion of a non-inoculated control (see ‘Criteria for inclusion’ below). More publications from the initial list of 1,852 publications likely met our criteria, but were excluded from Phase I of database construction because of time constraints. Data from 49 additional publications on EM fungi were added from a previous meta-analysis22 to reduce dominance of the data by studies on AM symbioses. This process resulted in a total of 183 publications summarized in MycoDB during Phase I. In 2010, as part of an NCEAS (National Center for Ecological Analysis and Synthesis) Distributed Graduate Seminar conducted across nine institutions, we began Phase II of the data collection to dramatically increase the size of MycoDB with several targeted literature searches. 
On September 21, 2010, we conducted searches of the ISI Web of Science database using the following search terms: (1) (mycorrhiz* or ectomyc* or endomyc* or arbuscul* or vesicular*) and inocul* resulting in 4,013 papers; (2) search terms from (1) AND restoration or rehabilitation or reclamation or revegetation or reforestation resulting in 305 papers; (3) search terms from (1) AND local adaptation or strain or isolate or genotype or ecotype or geograph* resulting in 627 papers; (4) search terms from (1) AND tissue P or tissue N or shoot P or shoot N or leaf P or leaf N resulting in 793 papers; (5) search terms from (1) AND Gigaspor* or Acaulospor* or Scutellospora or Archaeospora resulting in 387 papers. Searches 2–5 were designed to enrich the database in studies relevant to restoration ecology, local adaptation, influences of nutrients, and AM fungi besides Glomus, because these topics were identified either as being of interest for planned focused meta-analyses (restoration, local adaptation) or as under-represented in the Phase I search results (influences of nutrients, AM fungi besides Glomus). The results from all five searches were collated, duplicates (as well as publications already included in MycoDB) were removed, and papers were selected that appeared in at least one of the four focused searches (2–5), resulting in a list of 1,768 publications. Again, from this larger list, 255 publications were selected, in random order, as having met our inclusion criteria for meta-analysis such as reporting plant biomass response, use of a mycorrhizal addition treatment, and inclusion of a non-inoculated control (see below ‘Criteria for inclusion’), and their data were added to MycoDB. After Phase II, MycoDB contained data from a total of 4,010 studies from 438 publications. Phase III of the creation of MycoDB consisted of extensive data validation and the creation of phylogenetic trees for all plant species and fungal genera in the database.
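The collation step described above (pool the five result lists, drop duplicates and publications already in MycoDB, keep only papers returned by at least one focused search, then screen in random order) can be sketched with set operations. This is a minimal illustration, not the procedure's actual implementation: the function name, the publication identifiers, and the random seed are all hypothetical.

```python
import random
from typing import Iterable, List, Set


def collate_candidates(
    broad: Iterable[str],
    focused: List[Iterable[str]],
    already_included: Iterable[str],
) -> List[str]:
    """Return de-duplicated publication IDs found by >= 1 focused search."""
    focused_union: Set[str] = set()
    for result in focused:
        focused_union.update(result)
    # Collate every search; using sets removes duplicates automatically.
    collated = set(broad) | focused_union
    # Drop publications that are already in the database.
    collated -= set(already_included)
    # Keep only papers that appeared in at least one focused search.
    return sorted(p for p in collated if p in focused_union)


# Hypothetical toy identifiers standing in for Web of Science records.
broad = ["pubA", "pubB", "pubC", "pubD"]
focused = [["pubB", "pubC"], ["pubC", "pubE"]]
already = ["pubC"]

candidates = collate_candidates(broad, focused, already)
# Screen candidates in random order (seed chosen only for reproducibility).
screening_order = random.Random(0).sample(candidates, k=len(candidates))
print(candidates)
```

Screening in a shuffled order means that stopping early, as happened under the time constraints noted for Phase I, still leaves a random sample of the candidate list.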
A subset of MycoDB was used to study how edaphic properties, plant functional groups, and microbial community complexity determine the outcome of mycorrhizal symbioses21. Different subsets of these data have been used to explore local adaptation among plants, mycorrhizal fungi, and soils (Rúa et al. in review) as well as how partner identity, colonization levels, and P fertilization impact plant host response to EM associations22.
Criteria for inclusion
Prior to inclusion in MycoDB, publications were screened for meta-analysis appropriateness and for an experimental design that was amenable to our research questions. All studies were required to compare the results of a mycorrhizal inoculation treatment (or several treatments) to a non-inoculated control. In other words, studies had to compare plant response to some addition of mycorrhizal fungi against plant response to no addition. The method of inoculation varies among studies in MycoDB and can include the addition of spores, roots, mycelia, pot culture, field soil, or any combination thereof. Studies that applied mycorrhizal fungi to all treatments, eliminated fungal presence (e.g., using fungicide application), or otherwise manipulated ecological factors to promote or suppress fungi were not included. We included studies with unsterilized background soil, many of which likely contained propagules of mycorrhizal fungi in all treatments, though this could not be confirmed because fungal colonization data were not consistently reported. If an experiment manipulated a factor in addition to mycorrhizal fungi (e.g., fertilizer treatment, soil amendment), the results are included in MycoDB as separate studies within the same experiment. All studies report mean plant biomass data as the response variable. Studies could report shoot biomass, total biomass, or root biomass and shoot biomass. Studies had to report means, but were still included if measures of dispersion (e.g., standard error, standard deviation, or error bars on figures) were not given, as was the case in 91% of studies. If sample size was not given, the associated parameter ‘n’ was coded as 1 for both inoculated treatments and non-inoculated controls, reducing the weight of the study relative to what it would be if the sample size were known. Data presented in tables were extracted directly, but data presented in figures were extracted using Engauge Digitizer software version 4.1 (ref. 23). 
Data on additional explanatory variables (e.g., plant family, plant functional group) were also extracted from the publication text when available or looked up using supplementary peer-reviewed resources. Both lab studies and field studies are included in MycoDB and coded separately using the variable LOCATION (Table 1 (available online only)). No limitation was placed on the duration of study for inclusion in MycoDB; in the case of studies that examined plant biomass over a time series, only data from the last sampling event was included. Data were then entered into MycoDB using a custom web-based data entry interface and database that matched inoculated treatments with non-inoculated controls24.
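The effect of coding an unreported sample size as n = 1 can be illustrated with a short sketch. Two assumptions are made here that this section does not state: that the effect size is a log response ratio of inoculated to control mean biomass, and that studies are weighted by the sample-size function n_i × n_c / (n_i + n_c), a common choice when variances are unavailable. The function names and numbers are hypothetical.

```python
import math


def log_response_ratio(mean_inoculated: float, mean_control: float) -> float:
    """Assumed effect size: ln(inoculated mean / non-inoculated mean)."""
    if mean_inoculated <= 0 or mean_control <= 0:
        raise ValueError("biomass means must be positive")
    return math.log(mean_inoculated / mean_control)


def sample_size_weight(n_inoculated: int = 1, n_control: int = 1) -> float:
    """Assumed weight based only on sample sizes.

    The defaults of 1 mirror the coding used when a publication
    did not report its sample size.
    """
    return (n_inoculated * n_control) / (n_inoculated + n_control)


# Hypothetical study: 3.0 g inoculated vs 2.0 g control biomass.
lrr = log_response_ratio(3.0, 2.0)
w_known = sample_size_weight(5, 5)   # sample size reported as 5 per group
w_unknown = sample_size_weight()     # sample size missing, coded as 1
print(lrr, w_known, w_unknown)
```

Under these assumptions the same effect size receives a weight of 0.5 rather than 2.5 when the sample size is missing, which is the down-weighting described above.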
Plant and fungal phylogenetic tree construction
We constructed plant and fungal phylogenetic trees for all the species in MycoDB using a composite phylogeny approach, which combines taxonomic and phylogenetic information into a single tree16,25,26. For the plant phylogeny, we derived phylogenetic topology from existing ‘supertrees’ and assigned well-supported divergence times to all possible internal bifurcations (evolutionary divergence events) using TimeTree27 as a source of published divergence times. The remaining unknown branch lengths were rooted with known divergence dates28, then arbitrarily set and scaled to yield an ultrametric tree in which all extant species line up at the present date. In cases where taxonomic nomenclature has changed since the creation of the original ‘supertrees’ or publication of papers used in the database, names were changed manually to reflect current consensus taxonomy. Pairwise shared branch lengths were then used to calculate a variance-covariance matrix, which can be used in mixed multifactor meta-analyses.
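The final step above, turning pairwise shared branch lengths into a variance-covariance matrix, can be sketched as follows. This is an illustrative pure-Python version on a hypothetical three-species ultrametric tree, not the workflow used to build the MycoDB trees. The covariance between two tips is taken as the summed length of the branches they share between the root and their most recent common ancestor, and each diagonal entry is the root-to-tip distance.

```python
from typing import Dict, List, Optional, Tuple

# Each node maps to (parent, length of the branch leading to the node).
Tree = Dict[str, Tuple[Optional[str], float]]


def path_to_root(tree: Tree, tip: str) -> List[Tuple[str, float]]:
    """List the branches (keyed by child node) from a tip up to the root."""
    path = []
    node = tip
    while tree[node][0] is not None:
        parent, length = tree[node]
        path.append((node, length))
        node = parent
    return path


def shared_branch_length(tree: Tree, a: str, b: str) -> float:
    """Sum the lengths of branches that lie on both root-to-tip paths."""
    branches_a = dict(path_to_root(tree, a))
    return sum(
        length for node, length in path_to_root(tree, b) if node in branches_a
    )


# Hypothetical ultrametric tree: ((sp1:1,sp2:1)anc:1,sp3:2)root;
toy_tree: Tree = {
    "root": (None, 0.0),
    "anc": ("root", 1.0),
    "sp1": ("anc", 1.0),
    "sp2": ("anc", 1.0),
    "sp3": ("root", 2.0),
}
tips = ["sp1", "sp2", "sp3"]
vcv = [[shared_branch_length(toy_tree, a, b) for b in tips] for a in tips]
for row in vcv:
    print(row)
```

Because the toy tree is ultrametric, every diagonal entry equals the tree height (2.0), and sp3 shares no branch with the other two species, giving it zero covariance with them.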
For the fungal composite phylogeny, we manually reconstructed the evolutionary relationships among different genera based on known or commonly accepted taxonomy using information from previously published reports. Fungal taxonomy, particularly of AM fungi, has undergone major revisions during the duration of the compilation of MycoDB. We traced the evolution of fungal taxa into current consensus systematics29; however, because of ambiguity in species identification of AM fungi, we only included fungal taxonomic identification to the genus level. Even so, some taxa formerly named Glomus could not be placed definitively within current genera and were therefore left as Glomus. In the case of the AM fungal phylogenetic branch, the composite tree topologies between different genera of this clade were informed by taxonomic position of the type species of each genus (when possible) in relationship with the taxonomic position of the other type species of another genus29,
Construction and manipulation of composite phylogenies were conducted using R Statistical Software35 (version 3.0.2), the ape package in R36, Phylocom37, and Phylomatic37. The two resulting tree files contain the fungal and plant composite phylogenies in the Newick tree format. The FungalTree_version1.txt file (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1) represents the evolutionary relationships among the fungal genera in the MycoDB database. Similarly, the PlantTree_version1.txt file (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1) represents the evolutionary relationships among the plant species in this database, with each node of the plant composite tree labeled with the corresponding higher taxonomic classification.
Data in MycoDB are organized in a hierarchical manner, as a single publication often contained data from multiple discrete experiments and studies (i.e., trials) on multiple plant hosts. As such, the 438 publications in MycoDB contain data for 4,010 studies (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1). A study is defined as a comparison of average plant performance between plants that were inoculated with mycorrhizal fungi (AM or EM, never both) and plants that were not inoculated. Table 1 (available online only) contains detailed meta-data for all variables in MycoDB including descriptions of variables and levels. Figure 1 shows the frequency and distribution of unique plant species and fungal genus combinations (669 total) contained in MycoDB. For example, the most frequently reported mycorrhizal combination was Pinus pinaster (maritime pine) inoculated with species from the EM fungal genus Pisolithus (106 studies). The two inset graphs represent the most common plant species in MycoDB, separated by mycorrhizal type (AM vs EM). Lines to plant species indicate the number of fungal genera in association with each plant species. For example, the most common plant species in MycoDB are Zea mays (corn, 217 studies) and Eucalyptus globulus (blue gum, 148 studies), in association with AM fungi and EM fungi, respectively. Figure 2 is a heat map representing the frequency of studies according to unique plant host and fungal genus combinations and their phylogenetic relationships. For EM fungi, the most commonly represented plant-fungal combinations in MycoDB occur between plants within the Pinaceae growing in association with Pisolithus fungi. For AM fungi, hotspots occur within the Poaceae, Solanaceae, and Fabaceae grown in association with Rhizoglomus and Funneliformis fungi. 
As these plant families are important to forestry and agriculture, their prevalence in the literature makes sense, but the tropics and thus a large portion of plant and fungal biodiversity are underrepresented. Figure 2 suggests that empirical work thus far regarding the mycorrhizal symbiosis is not only limited with respect to the plant and fungal species examined, but also relatively poorly represented among phylogenetically diverse clades of plants and fungi.
Although this database represents the efforts of over 80 people distributed over 10 years, these data still only represent a fraction of the total available data on plant response to mycorrhizal fungi. The number of papers published each year that fit our search criteria has grown exponentially in the time since our initial search. The 351 plant species in our database represent a small proportion of the 450,000 total plant species on Earth38, the majority of which likely associate with mycorrhizal fungi. Moreover, as might be expected, the plant taxa represented are heavily biased toward species important for agriculture and forestry (e.g., corn, tobacco, pine, eucalyptus). Similarly, the fungal taxa that are best represented in the database are taxa that have been commercially marketed, such as the ectomycorrhizal fungus Pisolithus tinctorius39. Given this uneven representation of plant and fungal species and potential correlations among closely related plant and fungal species, it is important to analyze these data using phylogenetic mixed models even when testing environmental moderators of plant responsiveness to mycorrhizal fungal inoculation.
MycoDB was prepared in anticipation of common technical problems (and statistical solutions thereof) encountered in meta-analyses. In particular, a difficulty in many ecological meta-analyses is the lack of independence of the estimates. Multiple estimates (‘studies’) extracted from the same publication may be more similar to each other than those arising from different publications due to similarities in experimental methods or context within the same publication. Multilevel meta-analytic models (with estimates nested within publications) can be used to account for such correlated data structures18. Similarly, multiple estimates may represent contrasts of different treatments that are compared against a common control condition, leading to statistical dependencies in the estimates due to reuse of information from the control condition40. Hence, identification of estimates that share a common control condition is of crucial importance (i.e., variable CTLTRTSETID in Table 1 (available online only)). Moreover, multiple studies may use the same species or different species that are phylogenetically related, and such taxonomic overrepresentation may limit the scale of inference of the meta-analysis. Inclusion of information on phylogenetic relations within a mixed-effects model can account for these correlations and allow tests of generality of results. This can be done through inclusion of categorical taxonomic (e.g., family, genus, species) variables as random effects21 or through an analysis that includes the full phylogeny41,42. Including phylogenetic information in meta-analyses can account for correlated evolutionary relationships as well as allow inference on the rates and constraints of evolution on the phenotypic character being considered18. Finally, the availability of phylogenies for both plant species and fungi offers the possibility of modeling potential coevolution and ecological interactions using appropriate mixed-effects models43.
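Identifying estimates that share a common control condition, as described above, amounts to grouping studies by the CTLTRTSETID variable. The sketch below assumes each database row is available as a dictionary; CTLTRTSETID is the variable named in Table 1, whereas the `study_id` key, the toy rows, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rows; only the CTLTRTSETID name comes from the database.
rows = [
    {"study_id": "s1", "CTLTRTSETID": "c1"},
    {"study_id": "s2", "CTLTRTSETID": "c1"},
    {"study_id": "s3", "CTLTRTSETID": "c2"},
]


def shared_control_groups(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each control-set ID to its studies, keeping shared controls only."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row["CTLTRTSETID"]].append(row["study_id"])
    # A control used by a single study introduces no dependency.
    return {cid: ids for cid, ids in groups.items() if len(ids) > 1}


dependent = shared_control_groups(rows)
print(dependent)
```

Groups returned by such a function flag the sets of estimates whose sampling errors are correlated through a reused control, which a multilevel model would then need to account for.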
As the largest database concerning this important symbiosis, MycoDB may prove particularly useful for the development of meta-analysis educational curriculum or statistics tutorials using ecological data. In a classroom setting, for example, the database could be used in demonstrations of single predictor meta-analyses, which could then be followed up with comparison of multiple moderator meta-analyses to demonstrate the consequences of correlated predictors. Subsets of the data could support investigation of the advantages of larger datasets in overcoming problems of correlated predictors, thereby promoting exploration of context-dependency in future meta-analyses.
Data record 1
The database file in csv format, titled ‘MycoDB_version1.csv’ (February 5, 2016 version), was uploaded to the Dryad Digital Repository (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1). Detailed meta-data for each column are located in Table 1 (available online only) of this Data Descriptor article. Taxonomic information on the plant species and fungal genera contained in the database is included in Supplementary File 1 of this article.
Data record 2
The phylogenetic tree of mycorrhizal fungal genera present in Data record 1, in txt format and titled ‘FungalTree_version1.txt’ (February 5, 2016 version), was uploaded to the Dryad Digital Repository (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1). The file contains the fungal genera composite phylogeny in the Newick tree format.
Data record 3
The phylogenetic tree of plant hosts present in Data record 1, in txt format and titled ‘PlantTree_version1.txt’ (February 5, 2016 version), was uploaded to the Dryad Digital Repository (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1). The file contains the plant composite phylogeny in the Newick tree format.
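Because both trees are distributed as Newick text, their tip labels can be listed without specialized software, for instance to check them against the taxa in Data record 1. The sketch below is a deliberately minimal reader, not a full Newick parser: it assumes unquoted labels, and the example string is a hypothetical tree rather than an excerpt from the deposited files. In Newick, tip labels follow an opening parenthesis or a comma, whereas internal node labels (such as the higher taxonomic classifications on the plant tree) follow a closing parenthesis.

```python
import re
from typing import List


def newick_tip_labels(newick: str) -> List[str]:
    """Return tip labels from an unquoted Newick string, in file order."""
    # A tip label starts right after "(" or "," and runs until the next
    # structural character; labels following ")" are internal nodes.
    pattern = r"[(,]\s*([^(),:;\s][^(),:;]*)"
    return [label.strip() for label in re.findall(pattern, newick)]


# Hypothetical example with one labeled internal node.
example = "((Zea_mays:1,Oryza_sativa:1)Poaceae:1,Pinus_pinaster:2);"
tips = newick_tip_labels(example)
print(tips)
```

For the deposited files, the same function could be applied to the text read from ‘PlantTree_version1.txt’ or ‘FungalTree_version1.txt’, provided the labels there are unquoted.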
We devised several layers of methods to ensure the quality of the data in MycoDB. First, random sampling of publications that resulted from our initial searches was conducted to reduce bias in which data were included in the database. Second, on the front end, data collection was conducted using a web-based custom data entry system with organized fields and drop-down menus to reduce data entry error24. This approach also allowed data collection to be conducted simultaneously and remotely by multiple users. After front-end data entry by users, database administrators conducted back-end database content management to validate data integrity by examining distributions and outliers as well as iteratively hand-checking random subsets of papers for accuracy. We used the United States Department of Agriculture PLANTS Database44 and The Plant List version 1.1 (ref. 45) to verify and update scientific names and life histories for each species included in MycoDB. Database administrators hand-checked outliers and returned to original papers when data were missing, as well as added moderator variables and edited moderator levels to facilitate specific meta-analyses. Finally, for the local adaptation study subset of MycoDB, all entries were compared with the original paper and corrected when necessary. The data validation methods used to create MycoDB satisfy all data-related compliance criteria designed to promote methodological quality in ecological meta-analyses46.
MycoDB is deposited in the Dryad Digital Repository at http://dx.doi.org/10.5061/dryad.723m1 and publicly available under the CC0 public domain dedication, given proper scholarly citation of the version used and this data descriptor. We recommend that, prior to publication, users validate data subsets against original publications.
How to cite this article: Chaudhary, V. B. et al. MycoDB, a global database of plant response to mycorrhizal fungi. Sci. Data 3:160028 doi: 10.1038/sdata.2016.28 (2016).
Chaudhary, V. B. Dryad Digital Repository http://dx.doi.org/10.5061/dryad.723m1 (2016)
We are thankful to all participants who contributed to the creation of MycoDB: Lynette Abbott, Chris Allen, Brady Allred, Markandu Anputhas, Kristine Averill, Allison Baker, Wes Beaulieu, Reed Couch, Ahn-Heum Eom, Hui Fang, Christopher Fernandez, Aaron Godin, Mitchell Greer, Jill Grenon, Kristin Haider, Brynn Heckel, Kris Hennig, Nicole Hergott, Taylor Holland, David Johnson, John Klironomos, Alexander Koch, Liz Koziol, Andy Krohn, Anna Larimer, Becky Mau, Ann McCauley, Luke McCormack, Elizabeth Middleton, Meghan Midgley, R. Michael Miller, Peter Moutoglis, Bailey Nicholson, Brian Ohsowski, Gregory Penn, Michael Perkins, Krittika Petprakob, Anne Pringle, Stephen Russell, Mark Schwartz, Ben Sullivan, Bill Swenson, Miranda Welsh, Raymond West, Steven Wilkening, and Catherine Zabinski. This project was supported financially by a working group and Distributed Graduate Seminar through the National Center for Ecological Analysis and Synthesis, which in turn is supported by the National Science Foundation (NSF; DEB-0072909), the University of California at Santa Barbara and the state of California; The Radcliffe Institute for Advanced Study at Harvard University; and the National Evolutionary Synthesis Center, which receives support from an NSF grant (EF-0423641), Duke University, the University of North Carolina, Chapel Hill and North Carolina State University. V.B.C. was supported by the NSF (grants DGE-0549505, DGE-0742483) and Loyola University Chicago. M.A.R. was supported by an NSF Postdoctoral Research Fellowship in Biology under Grant No. DBI-12-02676 and a postdoctoral fellowship at NIMBioS, sponsored by NSF grant no. DBI-1300426 and the Univ. of Tennessee, Knoxville. J.D.H. was supported by NSF DEB Award #1119865.