Introduction

Interspecific interactions are ubiquitous across the Tree of Life. With the realization that eukaryotic organisms can harbor rich microbial communities came the view that these smaller partners may play important roles in mediating host-symbiont associations, thus adding a further layer to this complex set of nested interactions, i.e., host-symbiont-microbe [1,2,3,4,5,6,7,8]. As the number of studies exploring the microorganisms associated with symbiotic organisms increases, so does the need for compliant, standardized metadata that provide contextual information for each study and sample. Standardized metadata allow data to be integrated across organisms and resources, and within data repositories. Here, we present the symbiont-associated (SA) environmental package as a new extension to the minimum information about any (x) sequence (MIxS) standards [9], which will be included in MIxS version 6. Whilst the MIxS-SA expands upon the MIxS host-associated environmental package [9], it reflects the need for a new standard that takes into account the distinct life history traits of symbionts, their association with one or multiple host organisms, the complex nature of host-symbiont interactions along the mutualism-parasitism continuum, and the nestedness of symbiont-associated microbiota. We also propose adding the term ‘relationship to other packages’ to all environmental packages across the domains of life, to allow for integrated analysis of symbiont and host microbiota by linking metadata elements across environmental packages. This will allow users to nest a package within another package, and to identify replicates. This added feature is pivotal for the study of the microbiome of symbionts that are themselves nested within a host, reflects the inherent nestedness of all ecosystems, and will facilitate the reporting of complex sampling designs from a myriad of environments.

Collecting relevant metadata (data describing data) is now widely recognized as critical to contextualize samples and increase their reusability and reproducibility [9,10,11]. The Genomic Standards Consortium (GSC, https://gensc.org) has developed and maintains a suite of minimum information standards: checklists for describing sequence metadata for genomes (MIGS), metagenomes (MIMS), marker gene sequences (MIMARKS), single amplified genomes (MISAG), metagenome-assembled genomes (MIMAG) and uncultivated virus genomes (MIUViG), and environmental packages for describing habitat-specific contextual data of the sampling environment [9, 10, 12, 13]. Together, these are referred to as the Minimum Information about any (x) Sequence (MIxS) standard (ref. [9], https://gensc.org/mixs/).

The MIxS standards are used broadly across the microbiome research communities. These standards have been integrated into large-scale microbiome projects (e.g., Human Microbiome Project, https://www.hmpdacc.org/; Earth Microbiome Project, https://earthmicrobiome.org/; Microbiology of the Built Environment (MoBE), https://www.microbe.net) and microbiome bioinformatics platforms (e.g., QIIME, Qiita, mothur, JGI GOLD, MG-RAST, EBI, NCBI), and are now required upon manuscript submission. A primary advantage of the MIxS standards is the collation of large aggregates of associated metadata that can be harnessed to uncover, and eventually comprehend, patterns of microbial diversity and ecology.

The MIxS-SA package was initially drafted during the 1st Parasite Microbiome project workshop, which involved members of the GSC in addition to microbial ecologists, parasitologists, pathologists and marine biologists [14]. Participants rapidly identified the need to incorporate information on the nestedness of symbiont-associated systems, and the absence within the MIxS host-associated package of descriptors of the complex life histories of mutualistic and parasitic symbionts. Until now, researchers have either omitted this information or added research-specific symbiont-associated annotations, significantly limiting the potential to compare, combine and/or reuse data from different systems and studies. Whereas the MIxS-SA package was initially designed for the study of parasite-microbe interactions, the scope of the package was expanded to include non-parasitic symbionts. This expansion is necessary because symbiotic interactions are context-dependent and a given symbiont can interact differently with different organisms. Notably, the resulting MIxS-SA package reduces the need to develop additional, highly similar packages for different types of symbionts.

Symbiotic associations are generally classified as mutualistic (mutually beneficial association), commensal (association beneficial to one of the partners, but not harmful to the other), or parasitic (association detrimental to one of the partners) [15]. In the context of the symbiont-associated package, the term symbiont applies to macro- and microorganisms that can establish a physical interaction with at least one other organism at some stage of their life cycle, regardless of the nature and dependence of the interaction. As such, this definition also covers symbiotic organisms that establish facultative and accidental associations (e.g., dead-end hosts), not requiring evolutionary processes to explain their association, but excludes free-living organisms that establish a symbiotic relationship with another free-living organism (e.g., flowers and bees). The MIxS-SA package presented herein has gone through an open and iterative review process engaging the GSC community and experts studying symbiotic organisms across various symbiont and host taxa.

Here, we present the selected list of metadata descriptors for symbiont-associated microbiota studies, including a subset of mandatory (M) terms that underpin metadata compliance (Table 1; Supplementary Information SI-1 contains all MIxS-SA items). To allow comparative studies of the microbiota of free-living and symbiotic organisms, which are sometimes closely related, the MIxS-SA includes terms already found in the MIxS host-associated package. Thus, in MIxS-SA, the term “host” (when used alone) refers to the host of the biological sample, which is the symbiotic organism. New terms were created to characterise the “host of the symbiotic host”. We provide “Expected values” and “Examples” specific to the symbiont-associated package. Changes to the package (addition of terms, modification, etc.) can be proposed by the community by creating a ticket on the MIxS GitHub page (https://github.com/GenomicsStandardsConsortium/mixs).

Table 1 MIxS symbiont-associated environmental package representative terms, along with requirement status, description and MIxS IDs.

Given the diversity of symbiotic interactions, and given that the nature and dependence of such interactions can be context-dependent rather than a fixed trait, it was crucial to define terms and provide value syntax that were inclusive of diverse types of symbioses and of the range of symbiont life histories and transmission processes. For example, the terms “host dependence” (a mandatory item) and “type of symbiosis” (a conditional item) are discrete but complementary. While “host dependence” aims to provide a general characterization of the known type of host dependence for the symbiotic organism (e.g., facultative), “type of symbiosis” was specifically designed to further characterize the type of biological interaction established between the symbiotic organism and its respective host at the moment the biological sample was taken (e.g., mutualistic). As a result, the MIxS-SA package features mandatory, conditionally mandatory, and optional terms that enable flexibility according to the knowledge of the study system at the time of sampling. Two examples of MIxS-SA-compliant metadata are provided in Supplementary Information (SI-2), and the respective study designs are presented in Fig. 1. The examples refer to 16S rRNA gene studies of (a) the bacterial communities of the parasite Coitocaecum parvum, a trematode, across four of its life stages: the sporocyst, the metacercaria and the adult, as well as the free-living cercaria [16], and (b) the leaves and roots of the parasitic plant Orobanche hederae and its ivy host, Hedera spp. [17].
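
The distinction between mandatory and conditional terms can be illustrated with a minimal sketch of a compliance check. The key names `host_dependence` and `type_of_symbiosis`, the helper `missing_mandatory` and the two sample records are hypothetical illustrations and are not taken from the MIxS-SA specification; the authoritative term names and value syntax are those given in Table 1 and SI-1.

```python
# Hypothetical term keys for illustration only; see Table 1 and SI-1
# for the actual MIxS-SA term names and value syntax.
MANDATORY_TERMS = {"host_dependence"}
CONDITIONAL_TERMS = {"type_of_symbiosis"}  # reported when known at sampling time


def missing_mandatory(record):
    """Return the mandatory terms that are absent or empty in a sample record."""
    return sorted(term for term in MANDATORY_TERMS if not record.get(term))


# A record carrying both the general trait and the interaction at sampling time.
complete = {
    "sample_name": "example_sample_1",
    "host_dependence": "facultative",
    "type_of_symbiosis": "mutualistic",
}

# A record that reports the conditional term but omits the mandatory one.
incomplete = {
    "sample_name": "example_sample_2",
    "type_of_symbiosis": "parasitic",
}

for record in (complete, incomplete):
    print(record["sample_name"], "missing:", missing_mandatory(record))
```

In this sketch a record is flagged only when a mandatory term is missing; a conditional term may be left out when the interaction at the time of sampling is unknown.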

Fig. 1: Examples of study design for the sampling of microbes of symbiotic organisms, their hosts and environment.

a The different life stages (S1, S2, S3, S4) of the trematode Coitocaecum parvum are reported with the MIxS-SA package. The microbiomes of infected snail (H1), amphipod (H2) and fish (H3) hosts are reported with the MIxS host-associated package. In addition, the microbiomes of environmental sediment (E1) and water (E2) from which these organisms were collected can be reported with MIxS-sediment and MIxS-water, respectively. The following relationships are reported: S1 “within” H1, S2 “within” E2, S3 “within” H2, S4 “within” H3, H1 “next to” E1, H1 “within” E2, H2 “within” E2, H3 “within” E2, E1 “next to” E2. b The angiosperm Orobanche hederae (S5, S6) parasitizing a host plant (P1, P2, P3) is reported using the MIxS-SA and MIxS-PA (plant-associated) packages, respectively. In addition, the MIxS-soil package is used to report corresponding soil samples. The following relationships are reported: S6 “within” P1, S5 “next to” S6, P1 “next to” P2, P1 “next to” P3, P2 “next to” P3, P1 “within” E3, P3 “within” E4, E3 “next to” E4.
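
The relationships listed for panel a can be written as (sample, relationship, sample) triples and queried for nesting. The sketch below is an illustration under two assumptions that are not stated in the package itself: each sample is “within” at most one other sample, and “next to” is symmetric. The triple representation and the function names `containers` and `neighbours` are hypothetical and do not represent a prescribed MIxS syntax.

```python
# Relationships reported for Fig. 1a, copied from the caption.
RELATIONSHIPS = [
    ("S1", "within", "H1"),
    ("S2", "within", "E2"),
    ("S3", "within", "H2"),
    ("S4", "within", "H3"),
    ("H1", "next to", "E1"),
    ("H1", "within", "E2"),
    ("H2", "within", "E2"),
    ("H3", "within", "E2"),
    ("E1", "next to", "E2"),
]


def containers(sample, relationships):
    """Follow 'within' links outward from a sample, innermost container first.

    Assumes each sample is 'within' at most one other sample.
    """
    parent = {inner: outer for inner, rel, outer in relationships if rel == "within"}
    chain, seen, current = [], {sample}, sample
    while current in parent:
        current = parent[current]
        if current in seen:  # guard against accidental cycles in the metadata
            break
        seen.add(current)
        chain.append(current)
    return chain


def neighbours(sample, relationships):
    """Return samples linked by 'next to', treating the relationship as symmetric."""
    linked = set()
    for a, rel, b in relationships:
        if rel == "next to":
            if a == sample:
                linked.add(b)
            elif b == sample:
                linked.add(a)
    return sorted(linked)


print(containers("S1", RELATIONSHIPS))  # symbiont -> host -> water
print(neighbours("E1", RELATIONSHIPS))
```

For sample S1, following the “within” links recovers the host-symbiont-environment nesting (S1 inside snail H1, itself inside water sample E2), which is the kind of integrated query that linking packages is intended to support.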

While identical terms are often used in several of the 17 environmental packages currently available (https://gensc.org/mixs/), here we introduce three new terms: one is shared by several relevant MIxS environmental packages, and the other two will feature within the core MIxS package. The new term “observed host symbionts” provides a more comprehensive descriptor for the subject organism's associations with smaller symbionts, and it has been added to the host-associated, human-associated, plant-associated, human-vaginal, human-skin, human-oral and human-gut packages. The term “biotic relationship” has been added to the core package as a conditional descriptor of the relationship between the subject organism and other larger host organism(s). Finally, it appears necessary to include in the MIxS core a new term that takes into account the nested feature of most associations found in nature, such as host-symbiont-microorganism, in which multiple packages are necessary to describe the samples of the study (e.g., water, sediment, host-associated, and symbiont-associated). The proposed term “relationship to other samples” indicates the direct relationship between two samples from the same BioProject that are described in different environmental packages. This proposed feature, still under development, will allow for integrated analyses of the microbiota of symbiotic organisms and their direct environment, even in the context of co-infections (e.g., symbiont-associated sample SA1234 is “within” host-associated sample HA8974, “next to” symbiont-associated sample SA7890). This feature will also benefit other studies by providing ecologically relevant contextual information (e.g., host-associated sample HA2567 is “within” environmental water sample W1234, “next to” host-associated sample HA5679, “next to” environmental soil sample S5897).

In conclusion, it is our hope that the MIxS-SA, together with the new terms, will enable researchers to better conduct integrated analyses of multi-level biological systems, with the ultimate goal of better understanding the role of microbes associated with symbionts.