MIxS-SA: a MIxS extension defining the minimum information standard for sequence data from symbiont-associated micro-organisms

The symbiont-associated (SA) environmental package is a new extension to the minimum information about any (x) sequence (MIxS) standards, established by the Parasite Microbiome Project (PMP) consortium in collaboration with the Genomic Standards Consortium (GSC). The SA package builds upon the host-associated MIxS standard, but reflects the nestedness of symbiont-associated microbiota within and across host-symbiont-microbe interactions. It is designed to facilitate the collection and reporting of a broad range of metadata that apply to symbionts, such as life history traits, association with one or multiple host organisms, or the nature of host-symbiont interactions along the mutualism-parasitism continuum. To better reflect the inherent nestedness of all biological systems, we present a novel feature that allows users to co-localize samples, to nest a package within another package, and to identify replicates. Adoption of the MIxS-SA and of the new terms will facilitate reports of complex sampling designs from a myriad of environments.


INTRODUCTION
Interspecific interactions are ubiquitous across the Tree of Life. With the realization that eukaryotic organisms can harbor rich microbial communities came the view that these smaller partners may in fact play important roles in mediating host-symbiont associations, thus adding a further layer to this complex set of nested interactions, i.e. host-symbiont-microbe [1][2][3][4][5][6][7][8]. As the number of studies exploring the microorganisms associated with symbiotic organisms increases, so does the need for compliant, standardized metadata that provide contextual information for each study and sample. Standardized metadata allow for the integration of data across organisms, resources, and data repositories. Here, we present the symbiont-associated (SA) environmental package as a new extension to the minimum information about any (x) sequence (MIxS) standards [9], which will be included in MIxS version 6. Whilst the MIxS-SA expands upon the MIxS host-associated environmental package [9], it reflects the need for a new standard that takes into account the distinct life history traits of symbionts, their association with one or multiple host organisms, the complex nature of host-symbiont interactions along the mutualism-parasitism continuum, and the nestedness of symbiont-associated microbiota. We also propose adding the term 'relationship to other packages' to all environmental packages across the domains of life, to allow for integrated analysis of symbiont and host microbiota by linking metadata elements across environmental packages. This will allow users to nest a package within another package, and to identify replicates. This added feature is pivotal for the study of the microbiome of symbionts that are themselves nested within a host, reflects the inherent nestedness of all ecosystems, and will facilitate reports of complex sampling designs from a myriad of environments.
The MIxS standards are used broadly across the microbiome research communities. These standards have been integrated into large-scale microbiome projects (e.g., the Human Microbiome Project, https://www.hmpdacc.org/; the Earth Microbiome Project, https://earthmicrobiome.org/; and Microbiology of the Built Environment, MoBE, https://www.microbe.net) and microbiome bioinformatics platforms (e.g., QIIME, Qiita, mothur, JGI GOLD, MG-RAST, EBI, NCBI), and are now required upon manuscript submission. A primary advantage of the MIxS standards is the collation of large aggregates of associated metadata that can be harnessed to uncover, and eventually comprehend, patterns of microbial diversity and ecology.
The MIxS-SA package was initially drafted during the first Parasite Microbiome Project workshop, which brought together members of the GSC as well as microbial ecologists, parasitologists, pathologists and marine biologists [14]. Participants rapidly identified the need to incorporate information on the nestedness of symbiont-associated systems, and noted that the MIxS host-associated package lacks descriptors for the complex life histories of mutualistic and parasitic symbionts. Until now, researchers have either omitted this information or added research-specific symbiont-associated annotations, significantly limiting the potential to compare, combine and/or reuse data from different systems and studies. Although the MIxS-SA package was initially designed for the study of parasite-microbe interactions, its scope was expanded to include non-parasitic symbionts. This expansion is necessary because symbiotic interactions are context-dependent and a given symbiont can interact differently with different organisms. Notably, the resulting MIxS-SA package reduces the need to develop additional, highly similar packages for different types of symbionts.
Symbiotic associations are generally classified as mutualistic (mutually beneficial association), commensal (association beneficial to one of the partners, but not harmful to the other), or parasitic (association detrimental to one of the partners) [15]. In the context of the symbiont-associated package, the term symbiont applies to macro- and microorganisms that can establish a physical interaction with at least one other organism at some stage of their life cycle, regardless of the nature and dependence of the interaction. As such, this definition also covers symbiotic organisms that establish facultative and accidental associations (e.g., with dead-end hosts), which do not require evolutionary processes to explain their association, but excludes free-living organisms that establish a symbiotic relationship with another free-living organism (e.g., flowers and bees). The MIxS-SA package presented herein has gone through an open and iterative review process engaging the GSC community and experts studying symbiotic organisms across various symbiont and host taxa.
Here, we present the selected list of metadata descriptors for symbiont-associated microbiota studies, including a subset of mandatory (M) terms that underpin metadata compliance (Table 1; Supplementary Information SI-1 contains all MIxS-SA items). To allow comparative studies of the microbiota of free-living and symbiotic organisms, which are sometimes closely related, the MIxS-SA includes terms already found in the MIxS host-associated package. Thus, in MIxS-SA, the term "host" (when used alone) refers to the host of the biological sample, that is, the symbiotic organism. New terms were created to characterise the "host of the symbiotic host". We provide symbiont-associated package-specific "Expected values" and "Examples". Changes to the package (addition of terms, modification etc.) can be proposed by the community by creating a ticket on the MIxS GitHub page (https://github.com/GenomicsStandardsConsortium/mixs).
Given the diversity of symbiotic interactions, and given that the nature and dependence of such interactions can be context-dependent rather than a fixed trait, it was crucial to define terms and provide value syntax that are inclusive of diverse types of symbioses and of the range of symbiont life histories and transmission processes. For example, the terms "host dependence" (a mandatory item) and "type of symbiosis" (a conditional item) are discrete but complementary. While "host dependence" aims to provide a general characterization of the known type of host dependence for the symbiotic organism (e.g., facultative), "type of symbiosis" was specifically designed to further characterize the type of biological interaction established between the symbiotic organism and its respective host at the moment the biological sample was taken (e.g., mutualistic). As a result, the MIxS-SA package features mandatory, conditionally mandatory, and optional items that enable flexibility according to the knowledge of the study system at the time of sampling. Two examples of MIxS-SA-compliant metadata are provided in Supplementary Information (SI-2), and the respective study designs are presented in Fig. 1. The examples refer to 16S rRNA gene studies of (a) the bacterial communities of the parasite Coitocaecum parvum, a trematode, across four of its life stages: the sporocyst, the metacercaria and the adult, as well as the free-living cercaria [16], and (b) the leaves and roots of the parasitic plant Orobanche hederae and its ivy host, Hedera spp. [17].
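The distinction between mandatory, conditional and optional items can be illustrated with a minimal Python sketch that reports which mandatory items are missing from a sample record. The item identifiers (`host_dependence`, `type_of_symbiosis`, `example_optional_item`), the requirement codes and the function name are illustrative assumptions based on the examples in the text; they are not the official MIxS-SA identifiers or controlled vocabulary, which are given in Table 1 and SI-1.

```python
# Hypothetical requirement levels: "M" = mandatory, "C" = conditional,
# "X" = optional. Item names are illustrative, not the official checklist.
REQUIREMENTS = {
    "host_dependence": "M",
    "type_of_symbiosis": "C",
    "example_optional_item": "X",
}


def missing_mandatory(record):
    """Return the mandatory items that are absent or empty in a sample record."""
    return sorted(
        item
        for item, level in REQUIREMENTS.items()
        if level == "M" and not record.get(item)
    )


# A record that only states the interaction observed at sampling time.
sample = {"type_of_symbiosis": "mutualistic"}
print(missing_mandatory(sample))  # host_dependence is reported as missing

# Adding the general host dependence makes the record pass this check.
sample["host_dependence"] = "facultative"
print(missing_mandatory(sample))  # nothing is reported as missing
```

In this sketch only mandatory items block compliance; a conditional item such as "type of symbiosis" is recorded when known but is not enforced, which mirrors the flexibility described above.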
While identical terms are often used in several of the 17 environmental packages currently available (https://gensc.org/mixs/), here we introduce three new terms: one is shared by several relevant MIxS environmental packages, and the other two will feature within the core MIxS package. The new term "observed host symbionts" provides a more comprehensive descriptor of the associations between the subject organism and smaller symbionts, and it has been added to the host-associated, human-associated, plant-associated, human-vaginal, human-skin, human-oral and human-gut packages. The term "biotic relationship" has been added to the core package as a conditional descriptor of the relationship between the subject organism and other, larger host organism(s). Finally, it appears necessary to include in the MIxS core a new term that takes into account the nested nature of most associations found in nature, such as host-symbiont-microorganism, in which multiple packages are necessary to describe the samples of the study (e.g., water, sediment, host-associated, and symbiont-associated). The proposed term "relationship to other samples" indicates the direct relationship between two samples from the same BioProject that are described in different environmental packages. This proposed feature, still under development, will allow for integrated analyses of the microbiota of symbiotic organisms and their direct environment, even in the context of co-infections (e.g., symbiont-associated sample SA1234 is "within" host-associated sample HA8974 and "next to" symbiont-associated sample SA7890). This feature will also benefit other studies by providing ecologically relevant contextual information (e.g., host-associated sample HA2567 is "within" environmental water sample W1234, "next to" host-associated sample HA5679, and "next to" environmental soil sample S5897).
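As a rough illustration of how such sample-to-sample relationships might be represented, the Python sketch below stores each relationship as a (subject, relation, target) triple and encodes the two examples given above. Because the feature is still under development, the triple layout, the `Relationship` type, the helper functions and the restriction to the two relation labels "within" and "next to" are all assumptions made for this example, not the value syntax of the standard.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    subject: str   # sample being described
    relation: str  # relation label, e.g. "within" or "next to"
    target: str    # sample the subject is related to


# Relation labels taken from the examples in the text; treating them as a
# closed set is an assumption of this sketch.
ALLOWED_RELATIONS = {"within", "next to"}


def make_relationships(subject, pairs):
    """Build validated Relationship triples for one subject sample."""
    triples = []
    for relation, target in pairs:
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unsupported relation: {relation!r}")
        triples.append(Relationship(subject, relation, target))
    return triples


def related(triples, sample, relation):
    """List the samples that `sample` is linked to by `relation`."""
    return [
        t.target
        for t in triples
        if t.subject == sample and t.relation == relation
    ]


# The two examples from the text, using the sample identifiers given there.
triples = make_relationships(
    "SA1234", [("within", "HA8974"), ("next to", "SA7890")]
)
triples += make_relationships(
    "HA2567", [("within", "W1234"), ("next to", "HA5679"), ("next to", "S5897")]
)

print(related(triples, "SA1234", "within"))   # host sample containing SA1234
print(related(triples, "HA2567", "next to"))  # samples adjacent to HA2567
```

Keeping each relationship as a separate triple, rather than a single free-text string, makes it straightforward to query across packages, for example to retrieve every symbiont-associated sample nested within a given host-associated sample.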
In conclusion, it is our hope that the MIxS-SA, together with the new terms, will enable researchers to better conduct integrated analyses of multi-level biological systems with the ultimate goal of better understanding the role of microbes associated with symbionts.

Term added to host-associated, human-associated, plant-associated, human-vaginal, human-skin, human-oral, and human-gut packages.

Fig. 1 Examples of study design for the sampling of microbes of symbiotic organisms, their hosts and environment. (a) The different life stages of the trematode Coitocaecum parvum (S1, S2, S3, S4) are reported with the MIxS-SA package. The microbiomes of infected snail (H1), amphipod (H2) and fish (H3) hosts are reported with the MIxS host-associated package. In addition, the microbiomes of the environmental sediment (E1) and water (E2) from which these organisms were collected can be reported with MIxS-sediment and MIxS-water, respectively. The following relationships are reported: S1 "within" H1, S2 "within" E2, S3 "within" H2, S4 "within" H3, H1 "next to" E1, H1 "within" E2, H2 "within" E2, H3 "within" E2, E1 "next to" E2. (b) The angiosperm Orobanche hederae (S5, S6) and the host plant it parasitizes (P1, P2, P3) are reported using the MIxS-SA and MIxS-PA (plant-associated) packages, respectively. In addition, the MIxS-soil package is used to report the corresponding soil samples. The following relationships are reported: S6 "within" P1, S5 "next to" S6, P1 "next to" P2, P1 "next to" P3, P2 "next to" P3, P1 "within" E3, P3 "within" E4, E3 "next to" E4.
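The "within" relationships listed for Fig. 1a can be followed from a symbiont sample out to its environment, which is one way to make the nesting of samples explicit. The Python sketch below is a minimal illustration of that idea; the dictionary encoding, the function name and the assumption that each sample sits "within" at most one other sample are choices made for this example rather than part of the standard.

```python
# "within" relationships listed for Fig. 1a, stored as sample -> container.
# Assumes each sample is "within" at most one other sample.
WITHIN = {
    "S1": "H1", "S2": "E2", "S3": "H2", "S4": "H3",
    "H1": "E2", "H2": "E2", "H3": "E2",
}


def nesting_chain(sample, within=WITHIN):
    """Follow "within" links from a sample to its outermost container."""
    chain = [sample]
    seen = {sample}
    while chain[-1] in within:
        parent = within[chain[-1]]
        if parent in seen:
            # Guard against inconsistent metadata that would loop forever.
            raise ValueError("cyclic 'within' relationships")
        chain.append(parent)
        seen.add(parent)
    return chain


print(nesting_chain("S1"))  # ['S1', 'H1', 'E2']
print(nesting_chain("S2"))  # ['S2', 'E2']
```

Here S1 resolves to a parasite stage nested in its snail host, which is in turn nested in the water sample, whereas the free-living stage S2 is nested directly in the water sample.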