Introduction

The rise of cutting-edge robotic platforms1,2,3 in the context of the Global Ocean Observing System (GOOS)4 is rapidly producing a new generation of data. Both the environmental and the robotic data need a rigorous treatment5 capable of aligning marine robotics data with the long tradition of observational oceanography. The FAIR principles - Findability, Accessibility, Interoperability, and Reusability - have to be the steering factors when handling these coupled datasets6.

They are, in fact, fundamental to cement the data conduit that renders scientific studies scrutable and scientific data repeatable and reusable. By following the FAIR principles, the gathered data can be used in multiple fields of science, not only within the collecting community, which is more likely to share the same vocabulary and background knowledge. Rendering a dataset FAIR encompasses, among other things, establishing a set of discovery metadata, i.e., descriptive information. Particular data centers/observing systems/research institutes provide tailored guidelines for data publication in their portals. For example, Earth Sciences data follow the NASA Global Change Master Directory - Directory Interchange Format (GCMD DIF)7; Arctic data can follow the Svalbard Integrated Arctic Earth Observing System (SIOS)8,9,10 guidelines. GCMD DIF and SIOS encompass the more general standard, ISO 1911511, but they also integrate requirements that shape data treatment in Earth Sciences. However, in many other contexts, such guidelines are not provided and datasets are published without sufficient descriptive metadata. In marine robotics, data are often collected during field expeditions and published as raw telemetry. While there are some efforts in place to improve data standardisation12, these are mostly aimed either at industrial applications, such as Remotely Operated Vehicles (ROVs)13,14,15,16, or at military use17; other attempts only address the construction of marine robots, not their collected data18. The result is a lack of scrutiny when it comes to metadata and metadata standards as FAIRness enablers.

On the semantic level, the nomenclature used for metadata and for the description of variables and their attributes has to comply with a controlled vocabulary. For Earth Sciences, this is described in the Climate and Forecast (CF) Metadata Convention19, designed to promote the processing and sharing of files created using the NetCDF (Network Common Data Form) software libraries and machine-independent data formats20,21. A shared/controlled vocabulary for a given discipline ideally contains the standard names of all the variables that can be stored in a FAIR dataset. A few efforts exist, such as Marine Regions22 for the georeferencing of marine areas (e.g., trial sites), or the vocabularies defined in the NERC Vocabulary Server23, for instance for Argo floats24. Environmental and robotic variables are constantly evolving, hence a shared vocabulary is a living entity, continuously developing25,26. To the best of our knowledge, such a vocabulary does not exist in the field of marine robotics, hence FAIRness cannot currently be fulfilled. We are therefore working on a controlled vocabulary dedicated to robotic variables, not yet present in the literature, capable of preserving the distinction between robotic and environmental data. For the data collected by marine vehicles, the vocabulary is based upon the Fossen nomenclature27, which is commonly consulted by robotic scientists, for instance when developing the control algorithm of a marine robot. As far as the environmental data are concerned, we base our controlled vocabulary on the Climate and Forecast Convention, which covers both the individual variable concepts and the associated metadata concepts. In fact, fundamental aspects such as interoperability and reusability demand domain-specific standards, such as the Climate and Forecast Convention. This paper wishes to lay the foundations of a data framework capable of dynamically including new concepts and their related entities. Ultimately, FAIR datasets need to be identified by a unique identifier, i.e. a DOI (Digital Object Identifier). An identifier uniquely associated with an object (data, article, abstract) allows the community to keep track of a specific product28. Moreover, given the importance of instruments and associated metadata for the assessment of data quality and data reuse, a globally unique, persistent and resolvable identification of instruments is crucial. To this end, the Research Data Alliance (RDA) Working Group (WG) Persistent Identification of Instruments (PIDINST) explored a community-driven solution for globally unambiguous and persistent identification of operational scientific instruments29.

In this paper, we present Free and Open Source Software (FOSS) to render marine robotics datasets FAIR-compliant. The method can be automated to generate a FAIR dataset right after a field mission. Achieving complete FAIR compliance using our framework is an iterative and incremental process, which we have begun to design starting from the management of metadata, a fundamental aspect in the context of the FAIR principles. In particular, we suggest a minimum set of descriptive metadata for the coupled datasets of robotic and environmental data, in order to guarantee the principles of findability and accessibility. Furthermore, we define use metadata for the variables (attributes) to ensure the reusability of the datasets. Finally, for the sake of interoperability, we propose standard names for the robotic variables that do not belong to any controlled vocabulary, following the most common terminology in the robotic domain. The method comes with its software implementation, which is also described in the paper and available on GitHub30. A practical example of the applicability of the method to real data acquired during field tests is also available in a Zenodo repository31.

Results

The proposed FOSS consists of a set of Python scripts, Jupyter32 notebooks and modules that provide a metadata infrastructure, described in detail in the Methods section. Applying the FOSS pipeline, hence the metadata infrastructure, during a field mission results in the generation of a FAIR dataset, stemming from the telemetry of a robotic platform, namely SWAMP (Shallow Water Autonomous Multipurpose Platform)33. SWAMP is a catamaran-shaped vehicle with double-ended hulls that can host a number of different sensors on its deck. During the considered data campaign, SWAMP automatically performed a number of standardised maneuvers34 following the International Towing Tank Conference (ITTC) criteria35. Fieldwork and expeditions in marine robotics36 are typically characterised by both a large number of daily planned activities and unexpected events. Therefore, it is unrealistic to rely on the operators to manually create a FAIR dataset during such missions. The discussed FOSS supports the automation of the process by attaching a minimum set of global and variable metadata after the data acquisition, with minimal intervention on the operator’s behalf. Figure 1 schematises the proposed approach. The two files on the left represent the inputs to the FOSS; they contain, respectively, the telemetry of the marine platform and a configuration file. In this phase, a dedicated Python module is used to connect to the database which contains the global and variable metadata, and to retrieve all the metadata information. We provide our data in the form of NetCDF files, a widely used format that enables the creation, access, and sharing of labeled and array-oriented data20,21. In addition to the NetCDF, the software pipeline also outputs an eXtensible Markup Language (XML) ISO 19115-compliant file, containing the descriptive metadata. The two input files are:

  1. Log&Trace.csv: the vehicle’s telemetry data, in the form of a raw log table, provided by SWAMP.

  2. Conf.ini: the descriptive metadata, in the form of a configuration file, generated by the Human-Computer Interface (HCI) used to remotely control the vehicle (a sketch of such a file is given below).
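For illustration, a minimal sketch of how such a configuration file could look and be read (the section name, keys, and values shown are illustrative assumptions; the actual Conf.ini produced by the HCI may differ, but its keys are ACDD attribute names as described below):

```python
# Sketch only: parsing a Conf.ini-style file whose keys are ACDD attribute names.
# The [global_attributes] section and the example values are assumptions.
import configparser

EXAMPLE_CONF = """
[global_attributes]
title = SWAMP field trials - standard manoeuvres
summary = Telemetry collected during ITTC-style manoeuvres with the SWAMP platform.
license = CC-BY-4.0
creator_name = <operator or institution>
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONF)

# Collect the descriptive (global) metadata as a flat dict of ACDD keys.
global_attrs = dict(config["global_attributes"])
print(global_attrs["title"])
```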

Fig. 1. Schema for the automation and application of the method.

The telemetry file is a text file storing the values of each variable in a dedicated column. For each column, we set a 2-level name, followed by all the values. The first-level name is the custom name given to a specific variable (i.e., the name already used in our logs), whilst the second-level name is the long_name, which indicates the standard variable name it refers to. Since the long_name is also the unique ID of the variable database, this information can be used by the scripts to retrieve from the database all the attributes of the referenced variable. A 2-level name for the columns is fundamental because it is very common to have multiple measurements of the same type of data. For example, SWAMP carries more than one Global Navigation Satellite System (GNSS) receiver on board. One is part of the Navigation, Guidance and Control (NGC) unit, whilst the others are contained in the independent propulsion modules, called minions, located in the hulls. During operations, the minions are identified by their positions: Front-Left (FL), Front-Right (FR), Rear-Left (RL), and Rear-Right (RR). Therefore, multiple latitude and longitude measurements will be collected during any SWAMP field deployment. Furthermore, to guarantee the replicability of the datasets, it is important to know the specific instrument used to obtain each individual measurement, or the particular algorithm used to process the data; this also makes it possible to establish the accuracy of each collected dataset. For these reasons, we include such information, when available, in the first-level name, within square brackets used as delimiters. Table 1 shows the 2-level names in the first row and the different instruments used, as they should be reported in the log file. The script uses the first name to set the variable name in the NetCDF file and the second name as the ID in the database to retrieve all the attributes (e.g., unit, coverage_content_type, description, etc.) and append them to the NetCDF variable just created. The instrument name, if present, is also extracted and removed from the variable name and saved as variable metadata, under the CF attribute named source. When the NetCDF file is opened in Python, using the module xarray, the corresponding variable is shown as reported in Fig. 2.

For the descriptive metadata, the vehicle’s interface provides a specific panel view to generate the input configuration file, as shown in Fig. 3. Such a view is directly connected to the database and gets automatically populated with the latest version of the global metadata and their attributes. It is therefore sufficient to add or edit a database entry and commit it to the remote repository to see it appear in the operator’s HCI view as well. Each operator can specify new default values locally, in addition to the general ones provided by the database. In this fashion, most of the fields in the view can be filled with a single click. Once the mandatory fields are filled, the configuration file can be generated. Such a file contains a set of key-value entries, where the key is the name found in the Attribute Convention for Data Discovery (ACDD)37, i.e., the unique ID of the database. Ultimately, the scripts use this file to append the metadata to the NetCDF. Furthermore, they connect to the database to retrieve all metadata that should be automatically calculated (i.e., those whose auto attribute is set to “True”) and generate them as well. For example, the time_coverage_duration can be calculated from the date and time standard variables of the vehicle’s log file, by following the ISO 8601-1:201938 proposed format39. Figure 4 shows how some of the generated global metadata appear in xarray.
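As an illustration of the 2-level header handling described above, the following minimal sketch parses a small in-memory log and attaches the retrieved attributes with xarray. Column names, the instrument label, and the database content are assumptions made for the example; the actual pipeline reads Log&Trace.csv and queries the JSON metadata database.

```python
import io
import re
import pandas as pd
import xarray as xr

# Illustrative 2-level log: level 1 is the custom name (optionally with "[instrument]"),
# level 2 is the long_name used as database ID.
RAW_CSV = io.StringIO(
    "NGC_latitude [gnss-receiver-X],FL_latitude\n"
    "latitude,latitude\n"
    "44.10212,44.10214\n"
    "44.10215,44.10217\n"
)

# Stand-in for the variable-metadata database (keyed by long_name).
VARIABLE_DB = {
    "latitude": {"standard_name": "latitude", "units": "degrees_north",
                 "coverage_content_type": "physicalMeasurement"},
}

df = pd.read_csv(RAW_CSV, header=[0, 1])

data_vars = {}
for custom, long_name in df.columns:
    # Look up the variable attributes by long_name and record the long_name itself.
    attrs = dict(VARIABLE_DB.get(long_name, {}), long_name=long_name)
    # Move the optional "[instrument]" suffix to the 'source' attribute.
    match = re.search(r"\[(.+?)\]", custom)
    if match:
        attrs["source"] = match.group(1)
    var_name = re.sub(r"\s*\[.+?\]", "", custom).strip()
    data_vars[var_name] = xr.DataArray(df[(custom, long_name)].values,
                                       dims="time", attrs=attrs)

ds = xr.Dataset(data_vars)
print(ds["NGC_latitude"])
```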

Table 1 Example of 2-level names in SWAMP telemetry, followed by values.
Fig. 2. How the NGC_latitude variable appears in the NetCDF file, opened in Python with xarray. The instrument name is extracted from the variable and attached as an attribute.

Fig. 3. Interface view to add the global metadata. Mandatory values are in orange. All global metadata that can be automatically generated from the dataset are not shown in the view. It is possible to hover over the information icon to read the description of each metadata. The buttons on the bottom right can be used to add default values, to clear, and to generate the files.

Fig. 4. How the global metadata appears in the NetCDF file, opened in Python with xarray (only a subset is shown here).

Discussion

Coupled environmental and robotics datasets, collected through innovative marine platforms, are fundamental to enable cross-analysis between the measurements, e.g., to decipher the environmental data based on the platform behaviour, to settle on the best interpretation of marine telemetry, and to generally increase the transparency of the performed field and model experiments. As full reproducibility of the results is often not possible due to the nature of the domain and the type of field missions, the proposed framework focuses on allowing the replication or re-use of the data and on optimising the sharing of the obtained research results, in line with the EU research and innovation strategies40. In fact, rendering these unique datasets FAIR enables interoperability and enhances their visibility, bringing them within reach of the wider scientific community. The framework presented here can also be applied in simulations, where reproducibility is more likely; this could be part of future work, for instance employing the MARUS simulator41 (https://marusimulator.github.io/).

The Free and Open Source Software described here implements best practices already in use in the Earth Sciences and is the ideal first step to create a “FAIR by default” dataset at the conclusion of a marine expedition involving emerging technology42. The automation provided by the scripts facilitates and fosters the creation of datasets equipped with the appropriate metadata, which in turn eases the correct storage and publication of data and eventually enables the creation of consistent time series, which are fundamental to study the evolution of natural phenomena taking place, in this instance, in the ocean. Furthermore, the work presented in this manuscript is a first fundamental step towards a deeper interdisciplinary discussion around a continuously evolving technological and observational system.

In this manuscript, we use SWAMP as an example of an emerging, non-commercialised robotic platform. SWAMP is, in fact, an autonomous platform, highly modular and reconfigurable, hence capable of different sensing modalities. Its unconventional configuration renders the standardisation of the logged data more complex when compared to commercialised platforms. Other autonomous vehicles, such as gliders, are part of dedicated programs, are commercially available, and their data logging is tailored for ocean observations. The framework described in this manuscript offers a coupled dataset, reporting both the data describing the performance of the observing platform and the environmental data gathered during the same field campaign. A shared dedicated vocabulary in support of FAIR coupled datasets is novel and fundamental to accelerate the uptake of emerging robotic platforms into the ocean observing system. For instance, it is useful to log variables related to the functioning of the control algorithms applied to sustain marine robotic research. Gliders are among the established autonomous vehicles that benefit from bespoke data and metadata documentation by the OceanGliders group43 - formerly Everyone’s Gliding Observatories (EGO). The Data Management Team of EGO provides guidelines to format a NetCDF file for distributing glider data44 and metadata43 in an EGO-compliant manner45. They also provide a NERC-compliant vocabulary containing standard names for glider measurements. The EGO community therefore promotes glider technology and focuses on homogenising44 and standardising data collected by ocean gliders, hence greatly improving glider data sharing and scientific and international collaboration. The objective of our study is to provide a general-purpose framework, and its related code, to generate FAIR datasets from the collected data in an automated way. Such a framework is dedicated to emerging robotic platforms, not yet commercialised, but apt to perform unconventional measurements, at times in remote areas of the ocean36,46, thus operating in the observational gap left by established platforms. The software proposed in this manuscript is simple and open, to guarantee prompt and effortless adaptation to the different operational and sensing modes that characterise emerging platforms.

Methods

The proposed method can be split into two parts.

The first one concerns a review of the existing conventions that provide standardised metadata and their possible adaptation to marine robotics. Our objective is not to create yet another standard or data format, but to offer a set of simple tools that draw from what has already been developed for other scientific contexts. As already mentioned, the output file format is NetCDF. It allows the attachment of metadata information on both a global and variable level, which renders the dataset self-describing.

The second part is to build an infrastructure to access such metadata automatically, in order to create a “FAIR by default” dataset. In fact, one of the goals of this work is to generate FAIR robotic and scientific datasets at the end of every field mission. Such infrastructure is set to be as comprehensive as possible and can be adopted in other scientific contexts. In order to build our infrastructure, the fundamental components that we need to include to fully describe a dataset are the following:

  • Descriptive (or global) Metadata

  • Variables Standard Names

  • Use (or variable) Metadata

A FAIR dataset shall be findable and accessible. To enable that, the data must be enriched with a set of “global” metadata typically referred to as descriptive metadata. Usually, such metadata are domain-agnostic; many standards exist and are used and shared across different fields. For example, ISO 1911511 provides a schema to describe any kind of geographic information47. Descriptive metadata answer questions about the spatial and temporal extent of the collected data, as well as its identification, scientific summary, and license.

Therefore, we started to define a minimum set of descriptive metadata for our datasets. An extract of such a selection is reported in Table 2. Each entry comes with a number of attributes, e.g., the name, the description, whether the metadata can be automatically generated from the dataset, the default value, and whether it is mandatory or optional to include. However, the most important field is the standard name (here reported as ACDD), because it has to come from a shared and controlled vocabulary, where the standard names are recognised worldwide, allowing the datasets of interest to be found automatically.

Table 2 Defining a minimum set of global metadata to be associated with a given dataset.

To define such a minimum set of global metadata, we reviewed the descriptive information that is typically requested in the marine robotics field and, more generally, in the Earth Sciences domain. The SIOS (Svalbard Integrated Arctic Earth Observing System)9,10 and the Open Geospatial Consortium Catalogue Service for the Web (OGC CSW)47 were the starting point. Once we identified the metadata of interest, we aligned them with the standard schema proposed by ISO 1911511. Finally, since our goal is to create FAIR-compliant NetCDF datasets, we converted the ISO 1911511 standard names, which are provided in the form of XML elements, into human-readable global metadata. To do so we used the ACDD. This convention contains the list of recommended global metadata for describing a NetCDF dataset and provides the mapping with the ISO 1911511 standard schema, which was used in this context. The final minimum set of global metadata is stored in a database-like format and is available in the dedicated GitHub space30. The unique field of each entry is named ACDD; it contains the standard name and is used as the ID of the database.
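For illustration, an entry of the global-metadata database could be structured as in the sketch below, together with a possible way to derive an automatically computed attribute such as time_coverage_duration. Field names and timestamps are assumptions made for the example; the actual schema is the one published on GitHub30.

```python
import pandas as pd

# Illustrative global-metadata entry (field names assumed from the attributes listed above).
global_metadata_entry = {
    "ACDD": "time_coverage_duration",   # unique ID: the ACDD standard name
    "description": "Duration of the dataset (ISO 8601-1 duration).",
    "auto": True,                        # can be computed from the log's date/time variables
    "default": None,
    "mandatory": False,
}

# Example of how an "auto" attribute could be computed from the log timestamps.
timestamps = pd.to_datetime(["2023-05-10T09:00:00Z", "2023-05-10T10:30:00Z"])
duration = (timestamps.max() - timestamps.min()).isoformat()   # e.g. 'P0DT1H30M0S'
```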

Once the dataset has all the ingredients to be discoverable, the second major issue to consider is interoperability. As a matter of fact, it is not enough to guarantee the automated discovery, access, and download of a dataset; it is equally, if not more, important to enable the automated processing of the data it contains. A typical use case is a script that downloads different datasets and analyses the data they contain, for example by merging or comparing their measurements. In the marine robotics field, if such datasets are produced by different researchers, or even different research groups or institutions, there is a high probability that the variable names used to describe the same measurements are chosen independently and hence do not match. This makes automated analysis cumbersome, if not impossible, and drastically increases the time needed to align the input data.

As a consequence, it is necessary to establish standard names also for the variables contained in the dataset. Currently, many EOVs (Essential Ocean Variables)48 do have a corresponding standard name in a shared and controlled vocabulary, which enables interoperability. One of the most common conventions is the Climate and Forecast (CF) Metadata Convention19, which gathers the standard names that can be attached as a variable attribute in a NetCDF file (the attribute field is named standard_name by the CF convention itself). In this way, the operator can still set a custom name for a variable and then include the corresponding standard name as an attribute. However, for the robotic variables, i.e., the variables describing the performance of the robotic platform, there is no such agreement and it is still a challenge to find standard names shared by the community.
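A minimal sketch of this mechanism, with an illustrative environmental variable and a custom name chosen for the example (the standard name sea_water_temperature is from the CF vocabulary):

```python
import numpy as np
import xarray as xr

# The operator keeps a custom variable name and records the CF standard name as an attribute.
ds = xr.Dataset({"water_temp_ctd1": ("time", np.array([14.2, 14.3]))})
ds["water_temp_ctd1"].attrs["standard_name"] = "sea_water_temperature"
ds["water_temp_ctd1"].attrs["units"] = "degree_Celsius"
```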

Similarly to what was done for the global metadata, we started to define standard names for the robotic variables found in the telemetry of our robotic vehicles and to store them in a light database for quick access when needed. When possible, we used existing standard names for the variables; e.g., for common measurements such as yaw, pitch, and roll we used the CF names platform_yaw, platform_pitch and platform_roll. For all the other cases, where we could not find a name in a controlled vocabulary, we proposed a new standard name, adhering as closely as possible to the terminology commonly used in the robotics field by following Fossen’s guidelines. As shown in Table 3, for each log name we filled both the standard_name and the long_name fields when we found a standard name in a controlled vocabulary such as CF, e.g., for platform_yaw. On the other hand, we filled only the long_name attribute when we proposed a new name, e.g., in the case of platform_heave_acceleration_down. In this way, the long_name attribute always contains a unique value and can be used as the ID of the database. The complete, though preliminary, list can be found in the dedicated GitHub space30.
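For illustration, two entries of the variable database could be structured as follows; attribute keys are assumptions based on the description above, and the published database30 may use a different schema.

```python
# One entry with an existing CF standard name, one with only a proposed long_name.
robotic_variables = [
    {
        "long_name": "platform_yaw",                      # database ID
        "standard_name": "platform_yaw",                  # present in the CF vocabulary
        "units": "degree",
        "coverage_content_type": "physicalMeasurement",
    },
    {
        "long_name": "platform_heave_acceleration_down",  # proposed name, no CF entry yet
        "standard_name": None,
        "units": "m s-2",
        "coverage_content_type": "physicalMeasurement",
    },
]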

Table 3 Mapping Standard Names with Log Names, exemplary table.

Finally, for each log variable, besides the standard_name and long_name attributes, it is important to set other attributes that help the user or the machine to understand the characteristics of each measurement. These additional attributes include, for example, the unit of measurement, the fill value in case of missing data, or whether the log variable refers to a direct (physical) measurement or not. Such attributes shall also have standardised names so that they can be processed automatically; we refer to them as variable metadata. Table 4 summarises the most important attributes, which can be attached to each variable in a NetCDF file. At present, we include as mandatory only the ones indicated with “M” in the last column. The other fields are optional, but highly recommended. The source attribute is the only one that is not persistently assigned to a standard variable, but depends on the instruments used on a specific field mission or on the algorithms employed to pre-process the output data. Therefore, such information is attached to the custom variable name.

The two databases generated, one containing the global metadata and one containing the variable standard names with their attributes (variable metadata), are living entities and are constantly updated. They are provided in the form of lightweight JSON-based databases. They can be accessed or modified, by adding, removing, or updating one or more entries, with a simple Python module named “metadataDB”, which is provided as part of the FOSS. The module is a customised wrapper around the public module PysonDB-V2 and connects directly to the databases. An additional notebook named “database.ipynb” explains the module and how to access the database. The notebook, the module, and the JSON database files are available on GitHub30. These databases, together with the module to access them, represent the simplest form of the proposed infrastructure and enable us to produce, in an automated fashion, datasets composed of a single NetCDF file containing all global metadata, all variable metadata (attributes), and all the actual measurements.
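As a purely conceptual sketch of this access workflow (the repository ships the metadataDB wrapper around PysonDB-V2 and the database.ipynb notebook; here the standard json module and an assumed file name are used instead):

```python
import json
from pathlib import Path

DB_FILE = Path("variable_metadata.json")   # assumed file name and layout

def load_entries(path):
    # Return the list of variable-metadata entries, or an empty list if the file is absent.
    return json.loads(path.read_text()) if path.exists() else []

def find_by_long_name(entries, long_name):
    # long_name is the unique ID of the variable database.
    return next((e for e in entries if e.get("long_name") == long_name), None)

entries = load_entries(DB_FILE)
entry = find_by_long_name(entries, "platform_yaw")
```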

Table 4 Attributes table, example.