Introduction

The rapid progress in data science methods for the machine-based analysis of big data offers enormous potential for new data-driven sciences and for the development and optimization of innovative technologies. In the broad field of plasma science, machine learning methods have been in common use for several years, e.g. for the investigation and control of fusion plasmas, for particle and event identification in high energy physics, and for the discovery of space phenomena in astrophysics, see1,2,3 and references therein. Recently, first approaches have been published that use machine learning methods for the simulation, diagnostics and control of technological plasmas4,5. This is of particular interest because technological plasmas are used in many applications and industrial processes. Examples are the deposition of thin films, plasma etching, and plasma decontamination6,7,8. During the last ten years, plasma medicine has been established as an additional important research topic in the field of cold plasmas, and the first certified medical devices are already in practical use9,10. Applications of cold plasmas in medicine include the plasma-based synthesis of biomedical surfaces, wound healing, and cancer treatment11.

The potential of data-driven science in plasma physics, as in all other fields, can only be fully exploited if research data is findable, accessible, interoperable and reusable (FAIR) for both humans and computers. This requirement has recently been formalized in the FAIR data principles12 (cf. Table 1). The minimum requirements for “fair” research data are that the data is made public and that it is well documented by additional metadata. The quality of the metadata plays a key role for the degree of “fairness”. Once the (meta)data is registered or indexed in a searchable resource with a unique and persistent identifier, the machine-readable metadata should contain information on how the data can be accessed, how it can interoperate with applications or workflows for analysis, storage and processing, and in which context it can be reused, i.e. detailed information on the scope of the data, lab conditions, process parameters, etc.13.

Table 1 The FAIR Guiding Principles according to Wilkinson et al.12.

In many scientific disciplines, established scope-specific metadata standards exist that are recognised and broadly used by the community. For example, the data tag suite DATS has recently been introduced to enable the discoverability of datasets in the field of biomedical research14. Dictionaries of disciplinary metadata standards are provided, for example, by the Digital Curation Centre (http://www.dcc.ac.uk/resources/metadata-standards/, accessed: 2020-06-14) and the Research Data Alliance (http://rd-alliance.github.io/metadata-directory/, accessed: 2020-06-14). In some cases, elaborate data portals already provide public access to data for reuse in data-driven research. Examples are the proteomics identifications (PRIDE) database15, the novel materials discovery (NOMAD) repository16, and the repository for high energy physics data (HEPData)17. A number of databases are relevant for research in applied plasma science and technology, e.g. the NIST Atomic Spectra Database (https://www.nist.gov/pml/atomic-spectra-database, accessed: 2020-06-14), the LXcat database18, and Quantemol-DB19. However, there is no repository specifically designed for the curation of the heterogeneous data from research in applied plasma physics. This hinders access to and reuse of data in this specific domain. Even more importantly, there are no metadata standards for a unified categorisation and detailed description of research data in plasma physics. Certain data models do exist. They mainly aim at a uniform storage of rather homogeneous data and thereby strongly support the interoperability of data in specific areas. Examples of such physical data models in plasma-related sciences include the data model of the HEPData platform17, the ITER Physics Data Model20, and the data acquisition and analysis system MDSplus21.
However, these data models are more or less specific to the resources deposited in the corresponding databases. They do not allow for the more general categorisation and findability of research data that the metadata schema proposed here is intended to provide.

The present manuscript proposes a metadata schema for research data in plasma science and technology, named Plasma-MDS. Plasma-MDS is intended to complement established data models and to serve as a starting point for the development of a standard for the categorisation and documentation of digital data obtained from research in plasma physics. In this way, it supports recent efforts to enable “a new era of plasma science and technology research and development” (https://www.york.ac.uk/physics/ypi/conferencesevents/icddps/, accessed: 2020-06-14) through data-driven discovery in plasma science. It also provides a basis for the community to participate in comprehensive developments in research data management, such as the European Open Science Cloud (EOSC) (https://www.eosc-portal.eu, accessed: 2020-06-14) and the National Research Data Infrastructure (NFDI) in Germany22. This manuscript gives examples of using Plasma-MDS to categorise and describe datasets from research in plasma technology.

Results

Metadata are additional information attached to data that allows people and automated processes to find, access and ultimately reuse the data. Dublin Core (http://dublincore.org/schemas/, accessed: 2020-06-14), the DataCite Metadata Schema23, and DCAT (https://www.w3.org/TR/vocab-dcat-2/, accessed: 2020-06-14), among others, are fundamental metadata schemata. They are widely used for the collection and indexing of general metadata of digital objects, such as title, publication year, and persistent identifier. However, the “degree of fairness” of public research data (cf. Table 1) can be enhanced considerably by adding domain-specific metadata.

The new plasma metadata schema Plasma-MDS can be used as an extension of general metadata schemata in plasma science and technology. It follows the nomenclature “schema.element.qualifier” and comprises metadata fields related to the plasma source, the plasma medium, and the plasma target that may be involved in the study. In addition, metadata fields related to the applied diagnostics and the published resource are included. Here, the schema element “diagnostics” also covers applied modelling and simulation methods. The main schema elements “source”, “medium”, “target”, and “diagnostics” are motivated by the fact that scientific results in plasma physics frequently refer to a plasma source (e.g. an atmospheric pressure plasma jet) that is operated in a medium (e.g. argon) and acts on a target (e.g. biological tissue). Furthermore, plasma physics uses a variety of diagnostic methods (e.g. laser absorption spectroscopy). Numerous scientific papers concentrate on specific aspects of plasma diagnostics rather than on a certain plasma (source, medium, and target). Therefore, plasma diagnostics is treated as a separate schema element. Finally, the schema element “plasma.resource” specifies details of the digital data object obtained by the applied diagnostic procedures.
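The following Python sketch makes the “schema.element.qualifier” nomenclature concrete. It holds a hypothetical record as nested dictionaries and flattens it into dotted field names. The nested layout, the function name `flatten`, and the values are assumptions made for illustration. The values are borrowed from examples in this section. None of this is taken from the published schema file.

```python
from typing import Any, Dict

# Hypothetical Plasma-MDS record: schema -> element -> qualifier -> value.
# The layout is an illustrative assumption, not the official serialization.
record = {
    "plasma": {
        "source": {"name": ["kINPen MED", "HF plasma jet"]},
        "medium": {"name": ["Ar"]},
        "target": {"name": ["E. coli"]},
        "diagnostics": {"name": ["OES"]},
    }
}


def flatten(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested element/qualifier dicts into 'schema.element.qualifier' keys."""
    flat: Dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # descend one level deeper
        else:
            flat[path] = value  # leaf reached: store under the dotted name
    return flat


for field, value in flatten(record).items():
    print(field, "=", value)
```

Flat dotted keys of this kind map directly onto search-index fields, which is one reason a repository might prefer them over nested documents.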

Only the diagnostics and resource metadata fields are mandatory. One study might focus on the diagnostics applied to a certain target, while another might be concerned with the simulation of a plasma source without any target. Nevertheless, users are strongly encouraged to complete as many metadata fields as possible in order to ensure a high level of “fairness” of the data. Furthermore, apart from a few controlled lists, no controlled vocabulary is prescribed so far, in order to provide maximum flexibility in the definition of metadata. It is intended to review the plasma metadata schema after an initial phase of growing usage and to evaluate the establishment of a community standard including controlled vocabularies.
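A repository could enforce this rule with a check like the following minimal Python sketch. It reports missing mandatory elements separately from omitted optional ones. The record layout, the function name `check_elements`, and the representation of “plasma.resource” as a list with one entry per resource are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Per the text, only "diagnostics" and "resource" are mandatory.
MANDATORY = ("diagnostics", "resource")
OPTIONAL = ("source", "medium", "target")


def check_elements(record: Dict) -> Tuple[List[str], List[str]]:
    """Return (missing mandatory elements, omitted optional elements)."""
    plasma = record.get("plasma", {})
    missing = [f"plasma.{e}" for e in MANDATORY if not plasma.get(e)]
    omitted = [f"plasma.{e}" for e in OPTIONAL if not plasma.get(e)]
    return missing, omitted


# Simulation of a plasma source without any target: valid, but less rich.
record = {"plasma": {
    "source": {"name": ["HF plasma jet"]},
    "diagnostics": {"name": ["PIC-MCC"]},
    "resource": [{"filetype": "csv", "datatype": "data table"}],
}}
missing, omitted = check_elements(record)
print("missing mandatory:", missing)  # []
print("omitted optional:", omitted)   # ['plasma.medium', 'plasma.target']
```

Only the first list should block a submission. The second list could instead be shown as a hint, in line with the recommendation to fill in as many fields as possible.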

Fig. 1 gives an overview of the schema elements with their qualifiers. It also shows how Plasma-MDS is used as a domain-specific extension of general metadata schemata such as Dublin Core, DataCite, or DCAT. Plasma-MDS thus does not aim to replace existing standards. Instead, it adds the option to describe studies conducted in the field of plasma physics in more detail. Note that the specific sub-domain and/or topic (e.g. inertial confinement fusion, low-temperature plasmas, plasma medicine) to which the record refers should be named by the general schema used in the respective case, e.g. in “dcterms.subject”. Likewise, data models that may be used to store the digital resources should be referenced by the general schemata, e.g. in “dcterms.conformsto”.

Fig. 1

Overview of the schema elements and qualifiers of Plasma-MDS (blue). The sketch illustrates how the domain-specific schema extends general metadata of datasets according to basic metadata schemata like Dublin Core (DC), DataCite, or DCAT.

Plasma-MDS distinguishes three field types: “controlled list”, “term list”, and “free text”. A controlled list offers pre-defined categories for selection. Term lists are compilations of keywords generated on-the-fly from the terms already used to describe the respective element in a specific data repository that uses Plasma-MDS. They are used wherever community-maintained controlled lists could be established in the long term. Free-text fields, finally, give more detailed information on the respective element that cannot be represented by well-defined terms.
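A term list can be generated on-the-fly from the values already stored in a repository. The following minimal Python sketch shows one way to do this. The stored values are hypothetical, and ordering by frequency of use with alphabetical tie-breaking is an assumed design choice, not something Plasma-MDS prescribes.

```python
from collections import Counter
from typing import List

# Hypothetical repository content: "plasma.source.name" values of stored records.
stored_values = [
    ["kINPen MED", "HF plasma jet"],
    ["HF plasma jet"],
    ["atmospheric pressure plasma jet"],
]


def term_list(values_per_record: List[List[str]]) -> List[str]:
    """Compile a term list from values already in use, most frequent first
    (ties broken alphabetically)."""
    counts = Counter(term for values in values_per_record for term in values)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked]


print(term_list(stored_values))
# ['HF plasma jet', 'atmospheric pressure plasma jet', 'kINPen MED']
```

Offering such a list as autocomplete suggestions nudges users towards terms that are already in use, which is a first step towards a controlled list.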

Schema element plasma.source

The schema element “plasma.source” has five qualifiers. First, “plasma.source.name” designates the plasma source. Several plasma source names can be entered here if the data are related to several plasma sources. It can also be helpful to give not only the trademark name (e.g. “kINPen MED”, a plasma source developed at the INP) but also the type of the plasma source (e.g. “HF plasma jet”, indicating that the “kINPen MED” is a high-frequency plasma jet). This should make it easier to find datasets that involve a particular plasma source or source type. Before a new plasma source name is added to a database that uses Plasma-MDS as metadata schema, it should be verified that the name is not already present, taking differing notations into account.
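The duplicate check recommended above could look like the following minimal Python sketch. It assumes that notations differing only in case, whitespace, hyphens or underscores denote the same source. That normalisation rule and both function names are illustrative assumptions, and real synonyms would still need human review.

```python
import re
from typing import List


def normalise(name: str) -> str:
    """Collapse case, whitespace, hyphens and underscores, so that
    'kINPen MED', 'kinpen-med' and 'KINPEN  MED' compare equal."""
    return re.sub(r"[\s\-_]+", "", name).casefold()


def find_existing(candidate: str, existing: List[str]) -> List[str]:
    """Return stored names that match the candidate after normalisation."""
    key = normalise(candidate)
    return [name for name in existing if normalise(name) == key]


existing_names = ["kINPen MED", "HF plasma jet"]
print(find_existing("kinpen-med", existing_names))       # ['kINPen MED']
print(find_existing("microwave torch", existing_names))  # []
```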

Next, the qualifier “plasma.source.application” states the application area to which the plasma source and the dataset are related. Several terms describing different aspects of the application can be given here, for example “plasma medicine” and “surface treatment”. The first term identifies the dataset as related to the topic of plasma medicine. The second describes the more technical aspect of surface treatment, in contrast to, e.g., plasma (volume) chemistry. Terms like “antimicrobial reduction” indicate the purpose of the application if it should be distinguished from others, e.g. “modification of wettability”.

The qualifier “plasma.source.specification” allows the definition of basic specifications of the plasma source, namely i) current/voltage waveform, ii) frequency range, iii) pressure range and iv) temperature range. These four specifications describe basic properties that apply to almost every plasma source and ensure a rough categorisation of the plasma:

  1. “waveform” specifies the power delivery waveform and can take the values “pulsed”, “DC” (direct current), and “AC” (alternating current);

  2. “frequency” specifies the pulse repetition frequency or the frequency of the waveform, and can take the values “low frequency” (<300 kHz), “high frequency” (300 kHz to 300 MHz), and “microwave” (>300 MHz). No value is to be added if “waveform” is set to “DC”;

  3. “pressure” specifies the gas pressure and can take the values “low pressure” (\(\lesssim 10^{3}\) Pa), “medium pressure” (\(10^{3}\) to \(10^{5}\) Pa), “atmospheric pressure” (\(\approx 10^{5}\) Pa), and “high pressure” (\(\gtrsim 10^{5}\) Pa);

  4. “temperature” specifies the state of thermodynamic equilibrium and can take the values “thermal” and “non-thermal”, which are fundamental categories to describe if a plasma is in local thermal equilibrium or not.
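The following Python sketch maps measured operating parameters onto the “frequency” and “pressure” categories listed above. The function names are hypothetical. The boundary handling is also an assumption, because the published ranges only give approximate limits. Exactly 300 kHz and 300 MHz are counted as “high frequency”. “atmospheric pressure” is read as within ±10 % of \(10^{5}\) Pa and is checked first, because the ranges overlap near \(10^{5}\) Pa.

```python
from typing import Optional


def frequency_category(waveform: str, frequency_hz: Optional[float]) -> Optional[str]:
    """Map a waveform and a frequency in Hz onto the "frequency" categories."""
    if waveform == "DC":
        return None  # no frequency value is added for DC operation
    if frequency_hz is None:
        raise ValueError("pulsed or AC operation needs a frequency")
    if frequency_hz < 300e3:
        return "low frequency"
    if frequency_hz <= 300e6:  # assumed inclusive upper limit
        return "high frequency"
    return "microwave"


def pressure_category(pressure_pa: float, atm_tolerance: float = 0.1) -> str:
    """Map a gas pressure in Pa onto the "pressure" categories.

    Assumption: "atmospheric pressure" means within +/- atm_tolerance * 1e5 Pa;
    this test runs first because the published ranges overlap near 1e5 Pa.
    """
    if abs(pressure_pa - 1e5) <= atm_tolerance * 1e5:
        return "atmospheric pressure"
    if pressure_pa <= 1e3:
        return "low pressure"
    if pressure_pa < 1e5:
        return "medium pressure"
    return "high pressure"


print(frequency_category("AC", 13.56e6))  # high frequency
print(pressure_category(101325))          # atmospheric pressure
```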

The qualifier “plasma.source.properties” allows further plasma properties to be described as free text. Finally, “plasma.source.procedure” is a free-text container for describing the procedures used to put the plasma source into operation. It can also be used to give details on the whole (experimental) setup needed to produce the data resource. Table 2 gives an overview of all qualifiers of the schema element “plasma.source”.

Table 2 Plasma-MDS fields related to the plasma source.

The metadata collected by the schema element “plasma.source” are not mandatory and can be omitted if the dataset includes data that are not specific to a plasma source, e.g. data from target analysis. However, metadata for the plasma source should be included whenever applicable. For instance, the metadata of a plasma source used for pre-treatment of a specific target can be included even if the dataset contains only research data from target analysis.

Schema element plasma.medium

The schema element “plasma.medium” has three qualifiers and describes the medium in which the plasma is operated or of which it consists. First, “plasma.medium.name” names the medium. Examples are noble gases (e.g. Ar), molecular gases (e.g. CO2), or complex mixtures, e.g. plasma compositions consisting of sulfur hexafluoride (SF6) and polytetrafluoroethylene (PTFE). Arc plasmas operated in vacuum usually consist of evaporated electrode material such as copper and chromium (Cu-Cr). For gas mixtures, it is preferable to fill this field with a list of the individual species rather than to name each particular mixture. Furthermore, chemical element symbols and common abbreviations are preferred, e.g. “Ar” instead of “argon” and “HMDSO” instead of “hexamethyldisiloxane”. Next, “plasma.medium.properties” is a free-text qualifier for unstructured information describing details of the plasma medium, e.g. gas flow rates, the carrier gas in a mixture, or the gas purity. Finally, “plasma.medium.procedure” is a free-text container for describing how the medium is prepared before plasma operation and how it is treated during plasma operation. Table 3 gives an overview of all qualifiers of the schema element “plasma.medium”.
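The two naming recommendations above can be applied automatically. Mixtures are listed species by species, and full names are replaced by symbols or common abbreviations. The following minimal Python sketch does both. The lookup table, the choice of separators (`+`, `/`, `,`) and the function name are hypothetical. In practice, a community-maintained list would take the place of the table.

```python
import re
from typing import Dict, List

# Hypothetical lookup table built from the examples in the text.
ABBREVIATIONS: Dict[str, str] = {
    "argon": "Ar",
    "carbon dioxide": "CO2",
    "sulfur hexafluoride": "SF6",
    "polytetrafluoroethylene": "PTFE",
    "hexamethyldisiloxane": "HMDSO",
    "copper": "Cu",
    "chromium": "Cr",
}


def medium_names(mixture: str) -> List[str]:
    """Split a mixture such as 'argon + HMDSO' into a list of species,
    replacing full names by symbols or common abbreviations."""
    parts = [p.strip() for p in re.split(r"[+/,]", mixture) if p.strip()]
    return [ABBREVIATIONS.get(p.lower(), p) for p in parts]


print(medium_names("argon + hexamethyldisiloxane"))  # ['Ar', 'HMDSO']
print(medium_names("SF6/PTFE"))                      # ['SF6', 'PTFE']
```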

Table 3 Plasma-MDS metadata fields related to the plasma medium.

The schema element “plasma.medium” is not mandatory and can be omitted, e.g. if the description of the plasma source already provides sufficient information on the plasma medium. This might be the case if the plasma source is a low-pressure sodium lamp, where the lamp fill is part of the plasma source specification. However, for the sake of findability, it is suggested to repeat the information on the plasma medium in the corresponding schema element.

Schema element plasma.target

As for “plasma.medium”, the schema element “plasma.target” has three qualifiers, which specify the name, properties, and procedure of the target. The qualifier “plasma.target.name” should designate the target the plasma source acts on, either directly or mediated by a substance. Examples of possible target names are “Si wafer”, “distilled water” and “E. coli”. It is suggested to use chemical element symbols or common abbreviations where applicable. Multiple targets can be named here, which is of particular importance if the action of the plasma is mediated by a substance. For instance, water or pharmaceuticals may be treated in a plasma reactor, and a cell line may subsequently be exposed to these treated substances. Such cases are covered by specifying multiple plasma targets.

The qualifier “plasma.target.properties” collects details of the plasma target, e.g. geometric dimensions, grade, and orientation of a silicon wafer. Correspondingly, the qualifier “plasma.target.procedure” describes any processing steps to prepare targets before plasma treatment (e.g. growth of cell lines) as well as their handling during the plasma treatment. Table 4 gives an overview of all qualifiers of the schema element “plasma.target”.

Table 4 Plasma-MDS metadata fields related to the plasma target.

The schema element “plasma.target” is not mandatory and can be omitted, e.g. if only the characterization of a plasma source is intended.

Schema element plasma.diagnostics

The schema element “plasma.diagnostics” provides details on the plasma diagnostics and modelling/simulation procedures. These are the methods used in the study to produce the data resource, either experimentally or theoretically. This is of particular importance because numerous specialized diagnostic methods are relevant in plasma physics, and filtering datasets by the applied diagnostics can be helpful. Another advantage of this schema element is that it also covers datasets that are related to plasma physics but do not deal with a specific plasma source or plasma application. This is the case, for instance, if a diagnostic or modelling/simulation method is reported that applies not only to plasmas but also to non-ionized gases, e.g. vapours or cold gas. Examples of plasma diagnostics names, entered with the qualifier “plasma.diagnostics.name”, include “OES” (optical emission spectroscopy), “LSD” (laser schlieren deflectometry), and “PIC-MCC” (particle-in-cell/Monte Carlo collision simulations). It is suggested to use common abbreviations where available. Synonyms can be entered as well, and several diagnostics applied within one dataset can be listed together.
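Filtering by applied diagnostics only works reliably if synonyms are resolved to one canonical name. The following Python sketch shows one way to do that. The catalogue, the synonym table and both function names are hypothetical, and mapping full names to the abbreviations suggested above is an assumed design choice.

```python
from typing import Dict, List

# Hypothetical synonym table: full name (lower case) -> canonical abbreviation.
SYNONYMS: Dict[str, str] = {
    "optical emission spectroscopy": "OES",
    "laser schlieren deflectometry": "LSD",
}

# Hypothetical catalogue: dataset identifier -> "plasma.diagnostics.name" values.
catalogue: Dict[str, List[str]] = {
    "dataset-A": ["OES", "LSD"],
    "dataset-B": ["PIC-MCC"],
    "dataset-C": ["optical emission spectroscopy"],
}


def canonical(name: str) -> str:
    """Resolve a synonym to its canonical abbreviation, if one is known."""
    return SYNONYMS.get(name.lower(), name)


def datasets_with(diagnostic: str, catalogue: Dict[str, List[str]]) -> List[str]:
    """Return identifiers of datasets that used the requested diagnostic."""
    wanted = canonical(diagnostic)
    return sorted(ds for ds, names in catalogue.items()
                  if wanted in {canonical(n) for n in names})


print(datasets_with("optical emission spectroscopy", catalogue))
# ['dataset-A', 'dataset-C']
```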

The second and third qualifiers, “plasma.diagnostics.properties” and “plasma.diagnostics.procedure”, contain further details on the properties and procedures of the applied diagnostics and modelling/simulation methods. References to journal publications with more details on the applied methods may be provided here. Table 5 gives an overview of all qualifiers of the schema element “plasma.diagnostics”.

Table 5 Plasma-MDS metadata fields related to diagnostics.

The schema element “plasma.diagnostics” is mandatory because knowledge of the applied experimental/modelling/simulation method is assumed to be crucial for the reusability of the data.

Schema element plasma.resource

Plasma-MDS is designed to describe datasets that can contain several resources, i.e. digital representations of research data. Hence, the metadata defined above do not necessarily describe a single resource but possibly a set of resources. The schema element “plasma.resource” is introduced to give details on each individual resource. Its qualifiers may partly duplicate metadata of other schemata such as Dublin Core and the DataCite Metadata Schema. However, they provide key information on each single resource, which should be compiled in the schema element “plasma.resource” for the sake of clarity. The qualifiers are defined as follows:

  1. “filetype” contains the file extension, e.g. pdf, csv or jpg. It corresponds to “dcterms.format”, and the file type should be suitable for long-term preservation;

  2. “datatype” describes the type of data and can take values such as “data table”, “SEM image” (from a scanning electron microscope), or “cfu-plot” (colony forming units, e.g. of bacteria). It corresponds to “dcterms.type”;

  3. “range” details the parameter range for which the resource is valid. Examples are a wavelength range (e.g. 400 to 800 nm in the case of emission spectra), or a magnification and an accelerating voltage in the case of scanning electron microscope images;

  4. “quality” ranks the level of scientific quality control. Allowed values are given by a controlled vocabulary consisting of “verified”, “published”, and “reviewed”. Here, “verified” is the lowest quality level and means that the data creators and the data curators have checked the resource for plausibility. “published” means that the data of the resource have already been published in a peer-reviewed paper. Finally, “reviewed” implies that the data resource has been peer-reviewed by an independent expert.

Table 6 provides an overview of all qualifiers of the schema element “plasma.resource”. This schema element is mandatory because information on the available resources is crucial for finding and selecting relevant datasets. Note that this element must be provided for each resource if several digital objects are attached to the dataset.
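Because one “plasma.resource” entry is required for each attached resource, a repository might check the entries resource by resource. The following Python sketch does this and enforces the controlled “quality” vocabulary. It treats the order “verified” < “published” < “reviewed” as a ranking. That ranking is an assumption based on the definitions above, in which “verified” is stated to be the lowest level. The function names and the sample entries are hypothetical.

```python
from typing import Dict, List

QUALITY_LEVELS = ["verified", "published", "reviewed"]  # lowest first (assumed ranking)
QUALIFIERS = {"filetype", "datatype", "range", "quality"}


def check_resources(resources: List[Dict[str, str]]) -> List[str]:
    """Report problems in the per-resource entries of one dataset."""
    if not resources:
        return ["plasma.resource is mandatory: no resource entry found"]
    problems: List[str] = []
    for i, res in enumerate(resources):
        for key in sorted(set(res) - QUALIFIERS):
            problems.append(f"resource {i}: unknown qualifier '{key}'")
        quality = res.get("quality")
        if quality is not None and quality not in QUALITY_LEVELS:
            problems.append(f"resource {i}: '{quality}' is not an allowed quality value")
    return problems


def at_least(resources: List[Dict[str, str]], level: str) -> List[Dict[str, str]]:
    """Keep resources whose quality is at or above the given level."""
    threshold = QUALITY_LEVELS.index(level)
    return [r for r in resources
            if r.get("quality") in QUALITY_LEVELS
            and QUALITY_LEVELS.index(r["quality"]) >= threshold]


resources = [
    {"filetype": "csv", "datatype": "data table", "range": "400 to 800 nm", "quality": "published"},
    {"filetype": "jpg", "datatype": "SEM image", "quality": "checked"},
]
print(check_resources(resources))
# ["resource 1: 'checked' is not an allowed quality value"]
print(len(at_least(resources, "published")))  # 1
```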

Table 6 Plasma-MDS metadata fields related to the resource.

Examples

Two examples from research in plasma technology demonstrate how Plasma-MDS can be used to annotate research data. The first example comes from basic research on order phenomena in atmospheric pressure plasma jets and does not involve a plasma target. The second example examines the origin of species in a liquid upon interaction with a plasma. The Plasma-MDS metadata of both examples are freely accessible via INPTDAT, a new interdisciplinary data platform for plasma technology (https://www.inptdat.de, accessed: 2020-06-14). INPTDAT was built at INP with the aim of providing free and easy access to research data and information from all fields of plasma technology and plasma medicine. It aims to support the findability, accessibility, interoperability and reuse of data for the low-temperature plasma community. Note that, in the same way and in accordance with the approach shown in Fig. 1, Plasma-MDS may be used for a unified extension of data catalogues in other plasma-related domains.

The first example demonstrates how Plasma-MDS was used to annotate digital data from an analysis of the correlation between helicality and rotation frequency of filaments in a non-thermal atmospheric pressure plasma jet (ntAPPJ). Parts of the dataset are shown in Figure 4 of Schäfer et al.24. Tables 7 and 8 provide a preview of the Plasma-MDS metadata. Public access to the digital data and all metadata is provided by the dataset published with INPTDAT25.

Table 7 Preview of Plasma-MDS metadata for the dataset “Correlation of helicality and rotation frequency of filaments in the ntAPPJ”25.
Table 8 Preview of Plasma-MDS resource metadata for the dataset “Correlation of helicality and rotation frequency of filaments in the ntAPPJ”25.

In the second example, Plasma-MDS was used to annotate a dataset published in the York Research Database (https://pure.york.ac.uk/portal/, accessed: 2020-06-14). This shows how Plasma-MDS and the research data platform INPTDAT can be used to improve the findability and reusability of digital research data published elsewhere. The dataset published with INPTDAT at https://www.inptdat.de/node/98 includes all Plasma-MDS metadata and refers to the original dataset published in the York Research Database26. In this case, detailed information on the plasma source, plasma medium, plasma target, applied diagnostics, and published resources was extracted from the journal article to which the digital data belong and from its supporting information27. Tables 9 and 10 provide a preview of the Plasma-MDS metadata.

Table 9 Preview of Plasma-MDS metadata for the dataset “Non-thermal plasma in contact with water: the origin of species”26.
Table 10 Preview of Plasma-MDS resource metadata for the dataset “Non-thermal plasma in contact with water: the origin of species”26.

Here, INPTDAT does not provide direct access to the digital object, i.e. the research data. Instead, it strongly enhances the findability and reusability of the original data by means of Plasma-MDS.

Discussion

The plasma metadata schema Plasma-MDS was developed to complement basic metadata schemata with metadata fields for the collection of domain-specific information. Two examples demonstrated this (Tables 7–10). This section discusses how Plasma-MDS helps to put the FAIR data principles (Table 1) into practice and thereby enables data-driven plasma science.

Findability

To be findable, machine-readable metadata should allow humans and computer systems to discover relevant datasets28. According to the FAIR data principles (Table 1), a globally unique and persistent identifier should be assigned to each dataset. This identifier allows data and their metadata to be found, tracked and cited. Plasma-MDS relies on the identifier fields of basic metadata schemata (e.g. the Dublin Core metadata field “dcterms.identifier”). Beyond that, Plasma-MDS strongly supports findability by providing specific fields for rich domain-specific metadata. The metadata collected by Plasma-MDS aim to help researchers properly understand the nature of a dataset. To this end, they include descriptive information about the context, conditions, and quality of the data, as demonstrated by the two examples in Tables 7–10. In particular, the metadata fields that use controlled lists (plasma.source.specification and plasma.resource.quality) or term lists (e.g. the qualifier “name” of all schema elements) also support the automated processing of metadata by computer systems. Basic metadata schemata (Dublin Core, DataCite Metadata Schema, and others) already support the collection of general information such as creator, date, and license. To take full advantage of Plasma-MDS, it needs to be implemented in a (meta)data repository. However, generic or institutional data repositories usually cannot collect domain-specific information on deposited datasets. INPTDAT is the first data platform that implements Plasma-MDS and indexes the domain-specific metadata to provide elaborate search features for interdisciplinary datasets in the field of plasma technology.
INPTDAT uses the Schema.org representation of the DataCite Metadata Schema29 for the registration of digital object identifiers (DOI). This representation is also an important means of making datasets findable by the major search engines of Google, Microsoft, Yahoo and Yandex14.

Accessibility

To be accessible, data and metadata should be stored for the long term such that humans and computer systems can easily access them using standard communication protocols28. The metadata schema itself cannot meet this requirement; only the repository in which the (meta)data are stored can. Any (meta)data repository providing Plasma-MDS as metadata schema should meet the accessibility requirements of the FAIR data principles (Table 1). Accordingly, the data platform INPTDAT uses public APIs (application programming interfaces) to provide open access to general as well as domain-specific metadata in different formats. No user authentication or authorization is required to access the metadata. Furthermore, INPTDAT keeps all metadata physically separate from the data files, and its public APIs make it easy to extract metadata and move them to other repositories.

Interoperability

To be interoperable, data should be ready to be exchanged, interpreted, and combined with other datasets in a (semi)automated way by humans and computer systems28. This requires community standards for data management and, in particular, established vocabularies, ontologies, and thesauri. In this respect, Plasma-MDS can be seen as a first step towards raising awareness and developing a common standard. Appropriate collaborative structures need to be set up to develop, maintain, and document controlled vocabularies that themselves fulfil the FAIR data principles (Table 1). Where appropriate, this should build on established ontological resources in related areas. Finally, basic metadata schemata provide the possibility to include qualified references to other (meta)data (e.g. the Dublin Core metadata field “dcterms.relation”).

Reusability

To be reusable, the provided metadata must ensure that the dataset can be used in future research and integrated with other compatible data sources. The conditions under which the data can be reused should be clear to humans as well as computer systems28. Therefore, the FAIR data principles (Table 1) demand a detailed description of the dataset. This includes what the dataset contains, how it was generated and processed, and under which conditions the data can be reused. Future use of Plasma-MDS will show whether the qualifiers implemented for the schema elements “plasma.source”, “plasma.medium”, “plasma.target”, “plasma.diagnostics”, and “plasma.resource” suffice to meet this requirement. It will also show which adjustments might be necessary in revised versions of Plasma-MDS. Metadata schemata in general are not fixed but subject to regular updates. Furthermore, a metadata schema is only as useful as the information that users actually provide with it, and it is the responsibility of users to supply the required information with their data. Basic metadata schemata already provide fields for information on the data usage license (e.g. the Dublin Core metadata field “dcterms.rights”). A discussion of appropriate licenses for data publications is beyond the scope of this paper. In general, the Creative Commons (CC) license CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/, accessed: 2020-06-14) is recommended for research data publications. More information and recommendations on how to license research data are provided, e.g., by the Digital Curation Centre30. Associating data with detailed provenance appears to be the most challenging FAIR data principle (Table 1). At the same time, it is the most important precondition for the reliable reuse of research data.
Detailed information about the provenance of data allows researchers to understand how the data were generated, in which context they can be reused, and how reliable they are. With the qualifiers “properties” and “procedure” of the different schema elements, Plasma-MDS is prepared to collect the relevant metadata. However, it is difficult to ensure that third parties will be able to fully understand and reproduce the workflow of data creation. This applies especially to the many experimental methods in the field of low-temperature plasmas for which no standard operating procedures (SOPs) are available yet. In this respect, Plasma-MDS may provide the impetus to agree on SOPs and data annotation standards for widely used experimental methods in the low-temperature plasma community. Finally, data and metadata should meet domain-relevant community standards. This requirement can, of course, only be met where community standards or best practices for data archiving and sharing exist. Plasma-MDS may contribute to the establishment of such standards.

Outlook

In conclusion, action is still needed, particularly in introducing community standards for controlled terminologies. Some sub-areas of plasma science and technology could benefit from already established vocabularies and ontologies. Examples exist in biomedical science (e.g. OBI, see http://obi-ontology.org, accessed: 2020-06-14; relevant for plasma medicine) and in materials science (e.g. EMMO, see https://emmc.info/emmo-info/, accessed: 2020-06-14; relevant for plasma surface technology). In this regard, it is hoped that Plasma-MDS will be integrated into other data repositories and further developed by the plasma community, particularly with respect to controlled terminologies. The establishment of research data management standards in widespread plasma research areas is regarded as a basic prerequisite for extensive data-driven research. This will be pursued, e.g., within the framework of the initiative on data-driven plasma science. The long-term goal is to establish Plasma-MDS as a widely used community standard. Such a standard would support the reuse of research data and promote data-driven plasma science until research data management becomes everyday practice in the plasma community.

Methods

Plasma-MDS was developed on the basis of the practical requirements of research in plasma technology. In accordance with typical plasma processes and applications, the schema contains metadata fields for annotating the plasma source, plasma medium, and plasma target involved in the study from which the research data were obtained. Furthermore, metadata on the applied diagnostics and modelling/simulation methods are collected. Finally, resource metadata fields are included to describe the individual digital objects belonging to the dataset. The following approaches were used to compile the five schema elements of Plasma-MDS:

  1. Scientist interviews and use cases. A number of interviews were conducted with scientists active in different fields of applied plasma physics and plasma medicine. These interviews broadened the expert knowledge of the authors. They also identified use cases that showed which information is needed to ensure optimal findability and reuse of heterogeneous data in this field of research.

  2. Exploration of existing metadata schemata. Different metadata schemata were investigated in order to include existing standards like Dublin Core and DataCite and to match the metadata fields of existing software systems like DSpace, CKAN and DKAN. Domain-specific metadata schemata, like HEPData17 and the ETHZ metadata schema (https://documentation.library.ethz.ch/display/RC/Metadata+Schema, accessed: 2020-06-14), guided the structuring of Plasma-MDS according to the requirements of the addressed scientific community.

  3. Expert workshops. Several workshops were held to present and discuss the state of development of Plasma-MDS. The workshops acted as expert audits in the domain of applied plasma physics and plasma medicine as well as in the field of research data management. They identified missing metadata fields and refined the controlled lists.

  4. Keywording of publications. Several publications in the field of applied plasma physics and plasma medicine were annotated with metadata on a trial basis to ensure the proper choice of metadata schema elements and qualifiers. In this way, Plasma-MDS was checked for consistency and completeness.

Once the basic schema elements and their attributes had been fixed, JSON Schema (https://json-schema.org, accessed: 2020-06-14) was used for the formal representation of Plasma-MDS. The JSON representation of Plasma-MDS is retrievable at https://purl.org/plasma-mds. The JSON schema file may also serve for the implementation of Plasma-MDS in third-party systems. Instance files can be validated against the schema with existing JSON validation tools, e.g. Ajv: Another JSON Schema Validator (https://ajv.js.org, accessed: 2020-06-14) or JSON Schema Validator (https://www.jsonschemavalidator.net, accessed: 2020-06-14).
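The following Python sketch illustrates what validating an instance file against a JSON Schema involves. It supports only a small subset of JSON Schema keywords (`type`, `enum`, `required`, `properties`, `items`). It is applied to a hypothetical fragment written in the style of a metadata schema. The fragment is not the published Plasma-MDS schema at https://purl.org/plasma-mds. The function `validate` and the instance layout are illustrative assumptions. Real work should rely on a complete validator such as those cited above.

```python
import json
from typing import Any, Dict, List

# Hypothetical JSON Schema fragment, NOT the published Plasma-MDS schema.
SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["diagnostics", "resource"],
    "properties": {
        "diagnostics": {
            "type": "object",
            "properties": {"name": {"type": "array", "items": {"type": "string"}}},
        },
        "resource": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "filetype": {"type": "string"},
                    "quality": {"enum": ["verified", "published", "reviewed"]},
                },
            },
        },
    },
}

TYPES = {"object": dict, "array": list, "string": str}  # subset of JSON types


def validate(instance: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Check the keywords type, enum, required, properties and items."""
    expected = schema.get("type")
    if expected and not isinstance(instance, TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} not in {schema['enum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


instance = json.loads('{"diagnostics": {"name": ["OES"]}, '
                      '"resource": [{"filetype": "csv", "quality": "checked"}]}')
print(validate(instance, SCHEMA))
```

The printed error points to the offending value by path (`$.resource[0].quality`), similar in spirit to the error reports of full validators.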