Open Science principles for accelerating trait-based science across the Tree of Life

Synthesizing trait observations and knowledge across the Tree of Life remains a grand challenge for biodiversity science. Species traits are widely used in ecological and evolutionary science, and new data and methods have proliferated rapidly. Yet accessing and integrating disparate data sources remains a considerable challenge, slowing progress toward a global synthesis to integrate trait data across organisms. Trait science needs a vision for achieving global integration across all organisms. Here, we outline how the adoption of key Open Science principles (open data, open source and open methods) is transforming trait science, increasing transparency, democratizing access and accelerating global synthesis. To enhance widespread adoption of these principles, we introduce the Open Traits Network (OTN), a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across organisms. We demonstrate how adherence to Open Science principles is key to the OTN community and outline five activities that can accelerate the synthesis of trait data across the Tree of Life, thereby facilitating rapid advances to address scientific inquiries and environmental issues. Lessons learned along the path to a global synthesis of trait data will provide a framework for addressing similarly complex data science and informatics challenges.


Current barriers to global trait-based science
Despite the recognized importance of traits, several common research practices limit our capacity for meaningful synthesis across the Tree of Life. These practices include failure to publish usable datasets alongside new findings 40 , missing or inadequate metadata 41 , minimal descriptions of methods used to collate, clean and analyse trait datasets in published works 42 , and inadequate coordination between researchers and institutions with common goals, such as filling strategic spatial or taxonomic gaps in trait knowledge 43,44 . Our limited ability to access and redistribute trait data contributes to the widespread reproducibility crisis within science 45 . Any study relying on data that cannot easily be re-used introduces barriers to verifying the claims made by those studies and thereby questions the reproducibility of the science 46 , which is becoming of prime importance to many scientific journals. Such limitations have been common within trait-based science.
Access to data is not the only impediment to a global synthesis of trait knowledge. Barriers to synthesis exist because researchers and institutions are apprehensive that the time and resources they spend to create new observations or share legacy data (for example, observations from field guides, specimens, or publications without data supplements) will not be recognized. Identifying who should receive credit for contributing trait observations (whether via co-authorship or other formal recognition) is a complex issue, particularly where data involve a chain of expertise (for example, when trait data are extracted from taxonomic treatments involving specimen collectors, digitizers, taxonomists and curators). Funding bodies are often reluctant to support data management, limiting recognition of the sizeable effort expended on creating bespoke solutions to curating and harmonizing trait data from different sources 46 .
Opportunities exist for expanding the spatial and taxonomic coverage of trait observations, particularly by strengthening interdisciplinary connections across single organismic groups. Despite certain plant traits (for example, growth form, height and leaf size) being carefully catalogued in taxonomic species descriptions 47 , these data have only recently been exchanged with large-scale databases such as TRY 21 or BIEN (http://bien.nceas.ucsb.edu/bien/). Although several informatics challenges in biodiversity science have now been overcome (for example, synthesizing global species occurrence information (https://www.gbif.org/) and sharing genetic data on individuals (https://www.ncbi.nlm.nih.gov/genbank/)), trait science lacks a vision for achieving global integration across all organisms. We argue that this is not simply a failure of the traits community to learn from existing successful networks. Instead, cataloguing traits is a more complex task that is highly context-dependent and therefore needs a more refined network model than that offered by a centralized repository.

Box 1 | Research programs dependent on comprehensive trait data across the Tree of Life
Access to open data on the traits of all organisms will allow the pursuit of long-standing questions in ecological science, including: Defining major axes of strategy variation across the Tree of Life. Measurements of organismal traits can be used to identify the position of species along trade-off spectrums that shape fitness. For instance, traits (for example, leaf nitrogen content, leaf mass per area, seed mass and maximum height) have been used to capture trade-offs between plant species at global scales 91 . With access to open data on the traits of a wide breadth of organisms, we will be able to identify major axes of functional specialization across the Tree of Life. Traits such as adult body mass, offspring mass at independence, mass-specific metabolic rate, and body temperature capture core differences in the ecological strategies within and across groups [92][93][94] . Though there is tremendous potential in the comparison of traits across the Tree of Life via the OTN, consideration of scale and biology are needed to identify where it is appropriate to make such comparisons 95 .
Conservation of functional diversity in protected areas. Reserve-selection procedures 96 seek to identify representative networks of potential protected areas at the lowest financial cost. Species are often the biological unit being targeted regardless of the functional diversity they collectively represent. However, there is an increased focus on ecosystem services and protecting ecosystem function by better representing phylogenetic or functional diversity 97 . With trait data from the Tree of Life, systematic conservation planning 98 could: (1) optimize reserve designs to maximize the conservation of ecosystem function predicted from species traits; or (2) assess the adequacy of current protected areas in representing ecosystem function.
Reserve design typically targets species or populations, not ecosystem functions, and is based on three fundamental principles: comprehensiveness, adequacy and representativeness 98 . Using theoretical predictions about the relationship between traits and ecosystem functions we could predict which traits, or range of trait values 99 , should be preserved when designing protected area networks. In this context, comprehensiveness would allow for the inclusion of the full range of ecosystem functions (as captured by traits of species present) recognized at an appropriate scale within and across relevant ecosystem units (for example, bioregions and biomes); adequacy would seek to maintain the viability and integrity of ecosystem function and model how functional redundancy (that is, different species performing the same functional role 100 ) may scale across landscapes to identify the minimum viable reserve size for maintaining ecosystem functions; and representativeness would seek to capture the diversity of functions and the gradients across which they occur, including the level of intraspecific variation and plasticity inherent to the trait being examined.
Strengthening predictions of the effects of global change on biodiversity. Traits already provide valuable information for models predicting global change impacts on the biosphere. Access to large-scale data on plant traits allows the distribution of traits to be captured in Earth system models rather than only modelling competition among broad plant functional types [101][102][103] . Species-based predictions from correlative niche models may also benefit from the integration of data on species biology to capture mechanistic links between function and the environment [104][105][106][107] .
The next generation of Earth system models may also benefit from integration of open trait data for (at least) two types of organisms: (1) terrestrial vertebrates; and (2) microbes. Terrestrial vertebrates provide key ecological functions (for example, dispersal and disturbance 108 ) and their loss may result in significant changes to ecosystem function 109 . However, capturing the influence of terrestrial megafauna on forest structure, function and biogeochemical cycles will be improved by access to data on traits (for example, size and diet) to parameterize process-based models. Access to knowledge about the traits of soil-borne microbes, whose activities increase atmospheric warming via the decomposition of soil carbon, may help predict planetary responses to climate change [110][111][112] . Molecular-level traits in microbes (for example, enzyme activity) allow the estimation of decomposition rates but are difficult to measure, amplifying the need to share these data openly.
We propose that widespread adoption of key Open Science principles (Box 2) could be transformative for trait science in achieving a global synthesis. These principles would lay a strong foundation for transparency, reproducibility and recognition, and encourage a culture of data sharing and collaboration beyond established networks. Openness reinforces the scientific process by allowing increased scrutiny of methods and results, resulting in the deeper exploration of findings and their significance 42,[48][49][50][51] . The scope of trait science would increase if researchers and institutions: (1) made datasets available in machine-accessible formats under clear licensing arrangements; (2) created and adopted standardized protocols, handbooks or metadata formats for data collection, documentation and management (see refs. 48,49 ); and (3) created human-centred networks to reduce the complexity of integrating existing data from disparate sources (for example, specimens, published literature, citizen-science initiatives 50,51 and large-scale digitization efforts). These different sources exhibit systematic differences in error rates, validation, context, reproducibility and objectivity relative to field-collected trait observations. Without a model of recognition that embraces transparency and fairness, much trait data will remain hidden from science.

Introducing the Open Traits Network
The Open Traits Network (OTN) is a collaborative initiative for accelerating trait data synthesis. Specifically, it is a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across all organisms. We promote five main objectives built upon Open Science ideals that could transform trait science: (1) Openly sharing data, methods, protocols, codes and workflows. (2) Citing original data collectors and providing scholarly credit. (3) Providing appropriate metadata together with trait observations. (4) Collecting trait data following reproducible, standardized methods and protocols (when available), or committing to their development. (5) Providing training resources in trait collection and database construction using Open Science principles.
We envision a future for trait research where protocols for data exchange and re-use are transparent, research findings are reproducible, and all trait data (either newly collected or from legacy sources) are openly available to the research community and broader public. While several network models exist in trait research (Fig. 2), the OTN adopts a decentralized but connected structure with an emphasis on bringing people together through data and expertise.
Often, groups building smaller-scale databases do so in isolation, using their own tools and workflows tailored to their research question; they are decentralized and disconnected (Fig. 2a). Decentralization has certain advantages, including retaining the power to determine which traits are most useful in a study system and how they should be compiled. There is little formal support or interaction across this style of network, so researchers often collect redundant data and develop similar tools for data collection, cleaning and integration, which can lead to duplication of effort. There are many small, isolated and heterogeneous data sources of this sort, increasing the disconnect between pools of trait data 52 .
For some organisms, centralized hubs exist to aggregate and standardize trait data across disparate sources (see refs. 21,32,53-57 ) (Fig. 2b). These trait repositories have become the main access point for trait data on well-studied taxa such as plants and corals, but they remain mostly isolated, limiting the sharing of expertise and information across taxa. As these repositories continue to grow, difficulties with data integration and synthesis will also increase due to the momentum of entrenched workflows and exchange protocols that may not be interoperable.
Some successful large-scale initiatives have followed the centralized and connected network model (for example, the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/) and GenBank (https://www.ncbi.nlm.nih.gov/genbank/)). These platforms mandate strict data exchange protocols to facilitate synthesis using standardized vocabularies (for example, the Darwin Core 58 and Humboldt Core 59 ). These protocols have been central to the explosive growth of biodiversity data as they facilitate the exchange of information using common data formats [58][59][60] . Ontologies that provide unified terms and concepts necessary to represent traits have been developed (for example, Uberon, the multispecies anatomy ontology for animals 61 , and TOP, the Thesaurus of Plant characteristics 62 ).

Figure panels: Birds, Plants, Mammals.
The plant phylogeny is sparsely populated for traits but contains more taxa (n = 10,596) than the mammal and bird phylogenies (n = 5,747 and 9,993, respectively). Trait data were downloaded from refs. 25,34,87 . We counted the number of traits present across these datasets for each species and mapped those onto phylogenies using posteriors 37,88 and a random subset of plant species within a single phylogeny 89 . Terminal branches (representing species) and ancestral lineages (using ancestral state reconstruction 90 ) were coloured according to the number of reconstructed traits. Note that this is an exploratory analysis conducted purely to show variation in the availability of trait data across taxonomic groups.
These provide integration with other data types (for example, genetic and environmental) and their corresponding ontologies (for example, Gene Ontology 63 and Environmental Ontology 64 ). Despite these successes, we argue that a centralized and connected network structure will not facilitate trait data synthesis. Trait observations are highly nuanced and hierarchical. Describing multiple aspects of a phenotype for any organism with traits is not amenable to a simplified set of exchange fields that apply across the Tree of Life. While the centralized and connected model (Fig. 2b) does have benefits, it lacks the necessary flexibility to connect trait data where ontologies and exchange formats do not exist. The likely result is that established trait networks will remain isolated and disconnected.
The decentralized but connected model (orange connections in Fig. 2c) adopted by the OTN maintains the key advantages of a decentralized network (for example, taxon- or discipline-specific decision making) while enhancing the level of connectivity among groups, allowing for easier sharing of expertise, tools and data. These network characteristics also buffer against node loss (for example, due to lack of funding). Decentralized and connected networks are characterized by socially mediated improvements in learning 65 as they capitalize on the aggregated judgement of many experts rather than singular opinions 66 . The OTN model capitalizes on existing connections within disciplines and links domains across the Tree of Life to disseminate knowledge about traits. By recognizing the importance of specialist taxon groups (light-green nodes in Fig. 2c) and accommodating their needs into the development of cross-domain tools for synthesis (dark-green nodes in Fig. 2c), the OTN model will be particularly beneficial for low-profile taxa that may not be accommodated by a centralized effort to synthesize data. The OTN's open, decentralized network structure will allow researchers to retain agency and independence while also creating a collaborative effort to minimize the duplication of effort.

How (and why) to participate in the OTN
The OTN seeks to broaden its membership by lowering barriers to inclusion and advocating for approaches to trait science that benefit data custodians. New members can join the OTN via our website (www.opentraits.org) through two mechanisms: (1) adding a member profile (for example, name, location, expertise and collaboration statement); and/or (2) registering their open-source (or embargoed) trait datasets in the OTN Trait Dataset Registry (see Activity 1). The registry contains metadata for trait datasets and links users to the open dataset. New entries to the registry will be reviewed by OTN members before being added. This step will facilitate interaction between new and established OTN members and encourage deeper collaboration. Once registered, members will receive regular updates about the OTN, including newly registered trait datasets, notifications about upcoming chances for face-to-face meetings, and funding opportunities. Members will also benefit from the OTN through the sharing of resources, funding calls and workshops where appropriate.
OTN membership spans scientists (and institutions) with high-level expertise in trait data science and synthesis activities through to those with strong motivations to work with traits but little expertise. The OTN has already conducted an international workshop facilitated by an open call for participants, with more workshops planned. Following this initial communication process, we are

Box 2 | Using Open Science principles in trait research
Open Science principles outline a movement towards making all aspects of the scientific process transparent and accessible to a wide audience 51 . Open Science principles are rapidly being adopted across the sciences.
The Box 2 figure shows the six core principles of Open Science and their potential benefits to trait science. Three Open Science principles are particularly relevant to the Open Traits Network and trait-based research more broadly: Open Data, Open Source and Open Methods.
In this context, knowledge is considered open if anyone can freely access, use, modify and share it, subject at most to measures that preserve provenance and openness (http://opendefinition.org/). Several pronouncements about Open Science principles have already been made, including the Berlin Declaration (https://openaccess.mpg.de/Berlin-Declaration), the Bouchout Declaration (http://www.bouchoutdeclaration.org/declaration/), and the Denton Declaration (https://openaccess.unt.edu/denton-declaration) on open access to science data. Other initiatives champion open practices such as the Bari Manifesto on interoperability 113 and the FORCE 11 network, which developed the 'Joint Declaration of Data Citation Principles' (https://www.force11.org/datacitationprinciples) and the 'FAIR' principles (https://www.force11.org/group/fairgroup/fairprinciples).
The FAIR principles address several of the challenges facing trait-based research, namely making data findable, accessible, interoperable and reusable.

Science principles
Open data Transparency through reproducible analysis, better outreach through exchange of data with partners like Encyclopedia of Life, and accelerated discovery through data reuse

Open source
Reproducible analyses, accelerated synthesis through data and tool sharing, and improvement via shared data cleaning and checking

Open access
Faster knowledge transfer as published works become more easily shareable

Open peer review
Greater scientific rigour through increased scrutiny of data and methods

Open methods
Standards development for collection protocols and metadata, and easier interpretation and decision-making scrutiny

Open resources
Better training in Open Science methods and increasing access to resources for data collection and database construction

currently sharing ideas and acting upon them within subgroups. Being a decentralized network, the OTN does not need to rely on funding and dedicated personnel to complete tasks, though larger goals will benefit from financial support. Instead, we will communicate the joint aims and gaps between network nodes (Fig. 2) and arrange workshops and activities where necessary.
We recognize that altruism is unlikely to offer enough motivation to ensure widespread participation in the OTN. The sharing of trait datasets is not merely a technical problem to be solved; it relies on custodians having the skills, incentives and motivation to contribute. The key incentives for individuals to join the OTN include increasing the findability of their data and expertise and having access to a ready-made network of trait scientists and institutions engaging in relevant initiatives. Data are a powerful asset for researchers, and release under open-license schemes accompanied by well-defined metadata offers great potential for new collaborations and increased visibility. A persistent concern is that scientists will lose control of their hard-earned data under open licensing, though this underestimates the potential for new collaborations and may unnecessarily increase distrust within the scientific community 67 . Access to scientific networks can provide valuable exposure and connection 49 , particularly for early-career researchers and those in developing nations, although it is important to understand the risks involved. By emphasizing the importance of community engagement and support, the OTN seeks to make trait-data sharing and synthesis an opportunity for all involved rather than simply a technical challenge to be solved.

Activity 1: Maintaining a global registry of trait-based initiatives.
Several data gaps impede synthetic analyses across taxa, geographical locations and ontogeny. The heterogeneous ways in which trait data have been collected to date have resulted in a patchy and unrepresentative data landscape across trait types, taxa, regions and times of the year 68,69 . The OTN bridges these gaps by maintaining a Trait Dataset Registry that can be accessed at http://opentraits.org/datasets.html.
The OTN Registry contains information on existing open (or embargoed) datasets so that gaps can be identified and ultimately filled through collective effort. Core information for the registry includes Digital Object Identifier (DOI), taxonomic coverage, curator and format. The OTN Registry also provides the opportunity for contributors to identify if and where code to process and manipulate raw data is located (see Activity 2). As it develops, the OTN Registry will relate trait concepts to ontologies provided through the Open Biomedical Ontologies Foundry (http://www.obofoundry.org). The OTN Registry maps to several Open Science principles (for example, Open Source, Open Data and Open Access; Box 2) and is designed to support data retrieval and integration.
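As a rough illustration of how registry metadata could support gap identification, the sketch below models one entry using the core fields named above (DOI, taxonomic coverage, curator and format, plus an optional code location) and lists the taxa that no registered dataset covers. The class, the function and all values are hypothetical and written in Python for illustration only; they do not describe the actual structure of the OTN Registry.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical record holding the core registry fields named in the text."""
    doi: str
    taxonomic_coverage: Tuple[str, ...]
    curator: str
    data_format: str
    code_url: Optional[str] = None  # where processing code lives, if declared


def coverage_gaps(entries: Iterable[RegistryEntry],
                  taxa_of_interest: Iterable[str]) -> List[str]:
    """Return the taxa of interest that no registered dataset covers."""
    covered = {taxon for entry in entries for taxon in entry.taxonomic_coverage}
    return [taxon for taxon in taxa_of_interest if taxon not in covered]


# Placeholder entries: the DOIs, curators and formats are invented.
example_entries = [
    RegistryEntry("10.0000/example.1", ("Plantae",), "Curator A", "csv"),
    RegistryEntry("10.0000/example.2", ("Aves", "Mammalia"), "Curator B", "csv"),
]
print(coverage_gaps(example_entries, ["Plantae", "Aves", "Fungi"]))
```

A real registry would need controlled taxon names (or identifiers) for such a comparison to be reliable; plain strings are used here only to keep the sketch short.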
The OTN does not place restrictions on what members may consider traits of importance to a taxonomic group. Most traits can be measured from individuals and fit into existing definitions, though this may not be appropriate for organisms where individual or taxonomic boundaries are unclear (for example, microbes 70 and fungi 71 ). It can be argued that traits encompass emergent properties of populations (for example, abundance and geographic range size) or represent interactions among species (for example, diet type). Within the OTN, we believe that more important than imposing strict definitions around traits is engaging the community in discussion about the utility of available data for answering novel ecological and evolutionary questions.

data with seamless piping of data from one software tool to the next. OTN contributors have already developed several open-source tools such as the traitdataform package, which assists R users to format their data and harmonize units (http://ecologicaltraitdata.github.io/traitdataform). The code for the Coral Traits database 32 (https://github.com/jmadin/traits) could be modified to guide the creation of databases on other organisms. The FENNEC project provides a tool for accessing and viewing community trait data as a self-hosted website service 72 (https://github.com/molbiodiv/fennec). The OTN can act as a connector between developers and the broader community seeking to synthesize trait data, facilitating the training of scientists in all aspects of reproducible data management.
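To make the idea of unit harmonization concrete, the following minimal Python sketch converts trait values reported in different units to a single base unit per quantity. It is not the traitdataform API (which is an R package); the conversion table, function name and example values are assumptions chosen for illustration.

```python
# Hypothetical conversion factors to one base unit per quantity
# (grams for mass, metres for length).
TO_BASE = {
    "mass": {"mg": 1e-3, "g": 1.0, "kg": 1e3},
    "length": {"mm": 1e-3, "cm": 1e-2, "m": 1.0},
}


def harmonize(value: float, unit: str, quantity: str) -> float:
    """Convert a measurement to the base unit of its quantity.

    Raises ValueError for unknown quantities or units, so unrecognized
    records are flagged rather than silently passed through.
    """
    try:
        factor = TO_BASE[quantity][unit]
    except KeyError:
        raise ValueError(f"cannot harmonize {quantity!r} given in {unit!r}")
    return value * factor


# Two body-mass records reported in different units, both expressed in grams.
records = [(2.5, "kg"), (840.0, "g")]
print([harmonize(value, unit, "mass") for value, unit in records])
```

Failing loudly on unknown units is a deliberate choice in this sketch: when pooling data from many sources, a silent pass-through would mix incompatible values.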
Activity 3: Advocating for a free flow of data and appropriate credit. One goal of the OTN is to increase the use of open datasets and to ensure due credit is given to researchers who collect or synthesize primary data. Without effective reward or motivation for collecting new trait observations or sharing legacy data, a trait synthesis across the Tree of Life will remain unattainable. Currently, motivation for collecting and sharing new primary data is not strong and direct funding for trait data management is scarce. The OTN can strengthen the attribution of credit to data providers and promote new data collection via two paths. Firstly, the OTN will encourage citation back to primary source via a permissive license model that secures authorship attribution (for example, Creative Commons Attribution 4.0 Int; CC BY 4.0) and the use of DOIs and Open Researcher and Contributor ID (ORCID) identifiers. Open-access datasets with a DOI can be tracked to understand patterns of re-use and to assess the impact of the author's decision to share.
There is an important distinction between sharing data within a network and making data publicly available under an open license. Clear license arrangements increase visibility and promote fair attribution and citation (for example, using Creative Commons licenses such as CC-BY or CC0). CC-BY requires attribution (that is, citation) to the original creator whereas CC0 does not legally require users of the data to cite the source, though this does not affect ethical norms for attribution in research communities (https://creativecommons.org/share-your-work/public-domain/cc0/). Identifying who should be credited for prior work on legacy data is complicated by the involvement of many individuals. This issue could be solved, in part, by inviting organizations to be named as contributors or co-authors on outputs using their data or (looking forward) implementing new ways of documenting who should be credited for making specimens or datasets usable in trait science.
Incentives to collect new trait data can be linked to the Open Science practice of pre-registration. In pre-registration, authors archive a public proposal for research activities (for example, via the Centre for Open Science; https://cos.io/prereg/) which, if approved, may receive in-principle acceptance from participating journals. As of March 2019, 168 journals are willing to give in-principle acceptance following pre-review of the study design prior to conducting field or experimental work. Ten of these participating journals regularly feature papers on trait-based science (for example, BMC Ecology and Ecology and Evolution). We envision a situation where the OTN Trait Registry (Activity 1) could be used to identify spatial or taxonomic gaps in trait data that could be coupled to pre-registered hypotheses. Together, pre-registration and in-principle acceptance of findings could incentivize the collection of new data, circumventing the growing reliance on available data with known gaps.
Activity 4: Creating a trait core to facilitate synthesis and standardization. Trait science requires its own 'core' terminology or data standard that is flexible enough to capture the complexity of trait data. Building on efforts to standardize occurrence data (that is, Darwin Core 58 ) and biological inventories (that is, Humboldt Core 43,59 ), the OTN envisions a trait core offering a set of cross-domain metadata standards and controlled vocabularies that are (ideally) connected to trait ontologies via unambiguous identifiers. This standard terminology would be implemented across trait-data publications, unifying data in decentralized repositories as well as centralized data portals.
A trait core would allow trait data to be: (1) interpreted accurately within the context of their collection (that is, including information on associated data on factors such as environmental conditions at collection sites, taxa covered, data custodians or collection methods); and (2) known by compatible terms so that observations of similar phenomena can be grouped and compared (that is, what is meant by 'generation time' or 'establishment' across taxonomic groups 73,74 ). Existing initiatives may provide logical cornerstones for referencing terms and concepts, including Ecological Metadata Language 41 . Several initiatives implement the Ecological Metadata Language (for example, The Knowledge Network for Biocomplexity 75 , Darwin Core 58 and Humboldt Core 59 ) and the use of referencing terms from anatomy or phenotype ontologies (for example, the Plant Ontology 66 and the Vertebrate Trait Ontology 67 ) to relate traits to publicly defined terms, allowing annotated data to be processed computationally (http://www.obofoundry.org).
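The second point, grouping observations of similar phenomena under compatible terms, can be sketched as a lookup from dataset-specific trait names to shared term identifiers. Everything in the Python example below is hypothetical: the dataset names, the column names and the "EX:" identifiers are placeholders and do not come from any existing ontology or from the Ecological Trait Standard.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from (dataset, local trait name) to a shared term ID.
TERM_MAP: Dict[Tuple[str, str], str] = {
    ("dataset_a", "body_mass_g"): "EX:0000001",  # placeholder: adult body mass
    ("dataset_b", "adult mass"): "EX:0000001",
    ("dataset_b", "gen_time"): "EX:0000002",     # placeholder: generation time
}


def group_by_term(observations: List[dict]) -> Tuple[Dict[str, List[dict]], List[dict]]:
    """Group observations by shared term; collect those with no mapping."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    unmapped: List[dict] = []
    for obs in observations:
        term = TERM_MAP.get((obs["dataset"], obs["trait"]))
        if term is None:
            unmapped.append(obs)  # kept for review rather than guessed
        else:
            grouped[term].append(obs)
    return dict(grouped), unmapped
```

Returning the unmapped observations separately reflects the point made in the text: terms that lack an agreed definition need community discussion before they can be pooled.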
Progress towards a trait core is already being made through the development of a prototypal Ecological Trait Standard 76 (Box 3). However, the development and adoption of a trait core requires consultation and coordination within the broader scientific community, a goal which the OTN is ideally placed to advance. The OTN can mobilize expertise for cross-domain workshops and advocate for funding, which allows not only meetings of experts but also the creation of cyber-infrastructure for synthesis nodes (dark-green nodes in Fig. 2c). Links to emerging initiatives for biodiversity data standardization (for example, Species Index of Knowledge 57 ) will also be vital for success, as will ratification of the core through the Biodiversity Information Standards (TDWG, www.tdwg.org).
Activity 5: Facilitating consistent approaches to measuring traits within major groups. The OTN will share new developments towards protocols and handbooks for major clades that standardize approaches to capture trait observations. Protocols are necessary because downstream activities such as developing metadata standards (Activity 4) will be impossible if trait measurement protocols do not exist. Some research communities have adopted standardized terms 56,62 and data collection protocols (for example, plants 20,[77][78][79][80] , invertebrates 29,81-83 , mammals 36 and aquatic life 30,32,84 ), though these may not always fit the requirements of some studies (for example, where trait variability rather than the average trait of species is targeted 85 ). Protocols and handbooks may not emerge rapidly and should have the flexibility to be open to innovation through a commitment to version control and updates as techniques evolve. Two versions of the plant trait measurement handbook have been published 77,86 and several online resources exist that can be updated regularly (see http://prometheuswiki.org/tiki-custom_home.php).
Standardizing approaches to trait measurement across research communities will reduce ambiguity when aggregating data and improve the quality of resulting datasets. Integrating trait standardization and databasing into taxonomic workflows is both a challenge and an opportunity 7 that holds the promise of bridging the long-standing disconnect between structural and functional traits. The presence of a range of biodiversity collections personnel in the OTN, and an open invitation for more to join, is expected to catalyse the adoption of trait-based thinking into taxonomic practices.

Concluding remarks
This is the opportune time to push towards a new approach to sharing and synthesizing trait data across all organisms. Trait science has great potential to increase its taxonomic, phylogenetic and spatial scopes by leveraging data-science tools, embracing Open Science principles, and creating stronger connections between researchers, institutions, publishers and funding bodies. We hope that trait enthusiasts, regardless of field and research stage, will engage with the OTN via our website (www.opentraits.org) and help build new connections between disciplines, institutions and taxonomic domains. By adding metadata profiles for datasets to the OTN Trait Dataset Registry, trait collection efforts become more findable, as do the researchers who have compiled them. We envision that by connecting people with common goals, we can work collectively towards a synthesis of global trait data to preserve the nuances of taxon-specific expertise while also facilitating collaboration across domains. We urge scientists and institutions keen to commit to Open Science principles to make use of existing resources, including those offered by the Centre for Open Science (https://cos.io/), the Open Science Training Handbook (https://open-science-training-handbook.gitbook.io/book/), the Open Science Training Initiative (http://www.opensciencetraining.com/index.php) and FOSTER (https://www.fosteropenscience.eu/toolkit).
To support and expand the activities of the OTN, we will grow membership and develop communities around synthesis nodes to undertake key activities and secure funding support, in particular for the development of a trait core. Funding for international workshops, technical support and implementation meetings could drive a new era of trait-based synthesis that mirrors the achievement of similar initiatives such as GBIF, which now houses >1 billion occurrence records.
By supporting a reciprocal exchange of expertise and outputs using Open Science principles between researchers and institutions, we can mobilize data for a cross-taxa, worldwide, trait-based data resource to examine, understand and predict nature's responses to global change. As a better-connected OTN emerges, data streams and coordination will improve, allowing us to deliver information to support globally important research agendas (Box 1) as well as specific data and knowledge to the public through integration with third-party portals. Lessons learned along the path to a global synthesis of trait data across all organisms will provide a framework for addressing similarly complex, context-dependent challenges in biodiversity informatics and beyond.

Box 3 | A pipeline for harmonizing trait data from disparate sources
Preparation and re-use of open trait data require the following action steps 76 :
1. Data collection and data handling within the project context. Applying project-specific methodology and storing data in tables suitable for data analysis, applying project-specific terms for taxa, traits and column labels. Adopting standard methodologies and terminologies from the start will simplify steps 2 and 3 and facilitate data publication. OTN Activity 5 aims to build consensus for common trait definitions and measurement methodologies for major organismal groups.
2. Harmonization of taxa and traits into standard terms using ontologies. Prior to publication, all taxa should be harmonized into accepted names linked to ontologies using uniform resource identifiers (URIs). Those can be provided in metadata or secondary data tables. Ontologies for traits are scarce, but if available should also be referred to via URIs to deliver unambiguous trait definitions. OTN Activity 2 will foster the development of ontologies linking trait data to publicly available resources.
3. Standardization of table descriptors and metadata using a standard vocabulary. Data should be published in tables using standard terms for column names, such as those provided by the Ecological Trait-data Standard vocabulary (ETS; DOI: 10.5281/zenodo.2605377). The ETS implements a minimal terminology that can be adapted to include traits from a variety of organisms and uses uniform resource identifiers for taxa, traits, methods and units, thereby following the standards for a semantic web of scientific data. Metadata should point to the applied standard to facilitate interpretation by humans and machines. OTN Activity 4 will engage in the cross-domain, community-based development of standard vocabularies for trait data.
4. Publication of data and upload to a public repository. Open Access file hosting services offer permanent hosting and findability of data by assigning a DOI to data publications and stating authorship and conditions for re-use under Creative Commons licenses. OTN Activity 3 will support recognition for data publications and thereby mitigate the investment into data standardization for smaller research projects.
5. Synthesis of trait data and re-use in downstream products. Open-access data publications with high-quality metadata are a valuable complement for meta-analysis or the functional analysis of abundance data. By keeping original author terms and values, the quality of the derived datasets can be assured and controlled for a better integration of multiple datasets. Availability of such high-quality data will facilitate reproducibility and enable computer-aided analysis of large databases. OTN Activity 1 supports the findability of trait data by creating a public registry, and OTN Activity 2 will develop tools to aid the compilation of databases.
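Steps 2, 3 and 5 of the pipeline above can be illustrated with a minimal Python sketch. It is not the ETS implementation or any OTN tool. The lookup tables, the example.org URIs, the input column labels (species, variable, value) and the unit conversions are hypothetical placeholders, and the output column names are only loosely modelled on the kind of terms a standard vocabulary provides; the vocabulary itself should be consulted for the actual terms.

```python
# Hypothetical lookup tables. Every URI is a placeholder under example.org,
# not a real taxonomy or ontology identifier.
TAXON_LOOKUP = {
    # project-specific name -> (accepted name, placeholder taxon URI)
    "Q. robur": ("Quercus robur", "https://example.org/taxon/quercus-robur"),
    "Quercus robur L.": ("Quercus robur", "https://example.org/taxon/quercus-robur"),
}
TRAIT_LOOKUP = {
    # project-specific term -> (standard name, placeholder trait URI, unit, factor)
    "leaf_area_cm2": ("leaf area", "https://example.org/trait/leaf-area", "mm2", 100.0),
    "height_m": ("plant height", "https://example.org/trait/plant-height", "m", 1.0),
}


def harmonize(row):
    """Map one project-specific observation onto standard terms (steps 2 and 3).

    The original author terms and values are kept in 'verbatim' columns so
    that each derived record can be checked against its source (step 5).
    """
    name, taxon_uri = TAXON_LOOKUP[row["species"]]
    trait, trait_uri, unit, factor = TRAIT_LOOKUP[row["variable"]]
    return {
        "verbatimScientificName": row["species"],
        "verbatimTraitName": row["variable"],
        "verbatimTraitValue": row["value"],
        "scientificName": name,
        "taxonID": taxon_uri,
        "traitName": trait,
        "traitID": trait_uri,
        "traitValue": float(row["value"]) * factor,
        "traitUnit": unit,
    }


def harmonize_table(rows):
    """Harmonize all rows; rows with unknown taxa or traits are set aside."""
    harmonized, unmatched = [], []
    for row in rows:
        try:
            harmonized.append(harmonize(row))
        except KeyError:
            # No match in the lookup tables: keep for manual curation
            # instead of guessing a mapping.
            unmatched.append(row)
    return harmonized, unmatched


# A small project table using project-specific labels (step 1).
project_rows = [
    {"species": "Q. robur", "variable": "leaf_area_cm2", "value": "25.5"},
    {"species": "Quercus robur L.", "variable": "height_m", "value": "18"},
    {"species": "Unknown sp.", "variable": "height_m", "value": "2"},
]

harmonized, unmatched = harmonize_table(project_rows)
for record in harmonized:
    print(record["scientificName"], record["traitName"],
          record["traitValue"], record["traitUnit"])
print("rows needing manual curation:", len(unmatched))
```

In this sketch, two different spellings of the same species resolve to one identifier, which is what allows observations from disparate sources to be grouped. Rows that cannot be matched are returned separately rather than dropped, so that no observation is lost silently during harmonization.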