The BlueBio project’s database: web-mapping cooperation to create value for the Blue Bioeconomy

Funding innovation requires knowledge of previous and on-going research and the identification of gaps and synergies among actors, networks and projects, but targeted databases remain scattered, incomplete and scarcely searchable. Here we present the BlueBio database: a first comprehensive and robust compilation of internationally and nationally funded research projects active in the years 2003–2019 in Fisheries, Aquaculture, Seafood Processing and Marine Biotechnology. Based on the previous research project database built in the framework of the COFASP ERA-NET, it was implemented within the ERA-NET Cofund BlueBio project through a 4-year data collection including 4 surveys and a wide data retrieval. After being integrated, data were harmonised, shared as open data and disseminated through a WebGIS that was key for data entry, update and validation. The database consists of 3,254 “georeferenced” projects, described by 22 parameters that are clustered into textual and spatial, some directly collected and others deduced. The database is a living archive to inform actors of the Blue Bioeconomy sector in a period of rapid transformations and research needs and is freely available at: 10.6084/m9.figshare.21507837.v3.

The ERA-NET scheme was launched by the EU in 2002 to support the coordination and collaboration between national and regional research programmes and to increase the share of funding that Member States jointly dedicate to challenge-driven research and innovation agendas. The focus and role of ERA-NETs have varied across the Framework Programmes:
• ERA-NET actions in FP6 provided support for actors implementing public research programmes to coordinate their activities, e.g. by developing joint activities such as joint calls for transnational proposals;
• ERA-NET Plus actions in FP7 provided, in a limited number of cases with high European added value, additional EU financial support to top up the research funding of a single joint call for proposals between national and/or regional programmes;
• the ERA-NET Cofund under Horizon 2020 merged the former ERA-NET and ERA-NET Plus into a single instrument with the central and compulsory element of implementing one substantial call with top-up funding from the Commission.
The main objective of the BlueBio project is to establish a coordinated R&D funding scheme that will strengthen Europe's position in the blue bioeconomy, identifying new and improving existing ways of bringing bio-based products and services to the market and finding new ways of creating value from the blue bioeconomy. In this process a key issue is the development of a future research agenda in the field of marine biotechnologies associated with fisheries, aquaculture and seafood processing, based on the past, current and future challenges posed to research. To achieve this objective, a focal activity was the analysis of past and on-going research projects on aquaculture, fisheries and seafood processing funded at European, national and regional level, including an identification of possible duplications and gaps.
The PostgreSQL BlueBio database was built on the basis of the previous COFASP database 10. The analysis of the collected information allowed us to cluster projects into research topics and to list those that would need further investigation in the short to medium term and in the different EU areas 12. This information could be pivotal to good policy making and development 10.
Information is disseminated through a web mapping application available on the ERA-NET Cofund BlueBio project website (https://bluebioeconomy.eu/the-bluebio-projects-online-database/; Fig. 1). It allows for: (i) easy and visual search of all projects carried out on a specific issue and/or geographical area and (ii) collaborative work, as users were/are invited and enabled to contribute to the collection of the data and the improvement of its reliability and availability.
From a technical point of view, the web application "spatially enables" the PostgreSQL object-relational database by relying on OpenLayers 3 (User Interface map component), Apache Web Server, MapQuest map server and GeoJSON features of countries and marine divisions (Web Map/Feature Access). The baseline PostgreSQL database and the web mapping application are hosted at CNR-IRBIM in Ancona, Italy.
The data set reported here is derived from the baseline PostgreSQL database and covers projects active in the period 2003-2019. The baseline database is continuously updated and extended, and it is expected to store projects active in the period 2003-2022 by the end of the BlueBio project.
The data collection and publication represent an unprecedented, consistent and robust survey of the research carried out in Fisheries, Aquaculture, Seafood Processing and Marine Biotechnology at EU and country level. Although it does not aim to include the entire universe of funded projects, the BlueBio database represents a unique collection gathering information from international and national repositories, archives of research institutes, individual researchers and research projects' websites.
It gives a picture and a map of the main research topics targeted by research in the EU and of the funding resources devoted to them. This information can be used by a range of stakeholders, from policy makers to researchers and producers, as it allows them to:
• identify relevant gaps and overlaps in the research on Fisheries, Aquaculture, Seafood Processing and Marine Biotechnologies at national/international level;
• take stock of available knowledge to support the development of future research programmes at national, regional and EU level;
• provide suitable material to identify potential synergies among actors and networks for future research projects.

Methods
Data collection. The data collection carried out within the COFASP project was extended in the ERA-NET Cofund BlueBio through 4 surveys (once a year from 2019 to 2022) and an in-depth web and database search and review. The latter was carried out by the database administrators (CNR-IRBIM) on the EU projects' websites, the websites of research institutes and universities, and those of national and international funding agencies. Surveys consisted of circulating a questionnaire (an .xls file with predefined fields to fill in) amongst information producers (i.e., project coordinators and national research funding agencies involved in BlueBio projects); several reminders were sent to increase the response rate. Each BlueBio partner was in charge of collecting its own national projects, drawing on its own network of research institutions and making a considerable effort to contact relevant and priority projects directly. The questionnaire (BlueBio_database_data collection questionnaire.xlsx) is made available through the unrestricted repository at figshare 11.
www.nature.com/scientificdata
Note that the collection process ultimately depended on the identified key national contacts/information providers and their level of engagement with COFASP (before) and BlueBio (after) network and partners.
Data collection also made use of anonymous users, who were able to submit independent records through the "New project" module of the webGIS (Fig. 1). For this purpose, the webGIS and its web-based data entry module were promoted during several BlueBio project meetings.
Data harmonisation. The harmonisation process involved refining and cross-validating the collected information to allow comparison and analysis. It was long and time consuming.
First, a content cleaning process took place whereby the grammar, spelling and format were checked (e.g., institutions were standardised and traced back to predefined institutional signatures). Then, each entry (both by anonymous web users and by interviewed partners) was cross-validated against all the available data sources (e.g., questionnaires, institutional project databases and project-specific websites) and, if necessary, integrated and edited by administrators before it was stored in the database. This process is hereinafter referred to as data retrieval (Table 1 and Table 2).
For a better characterization of the projects, based on the action fields of the BlueBio project, new fields of information were added such as identification by research category and source of funding.
Four main research categories were considered: Fisheries, Aquaculture, Seafood Processing and Marine Biotechnology. The combination of 2 or more categories was also considered to characterise cross-cutting research projects.
According to the related supporting programmes and funding instruments, each project was also assigned to one of the following funding sources: National, European, European/National and Other. The first includes those projects that were exclusively funded within national programmes or funding instruments (e.g. [...]).
An additional effort was made to harmonise the programme field. For example, the overarching funding programme was reported for each EU-funded project (e.g., FP4, FP5, Horizon 2020), the national projects were generalised as "National Programme", while the projects co-funded through the ERA-NET scheme were included in "International Cooperation".
Where necessary, project budgets were converted into euros using the exchange rate of the starting year of the project. Similarly, project abstracts written in other languages were translated into English before they were stored in the database.
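The conversion rule for budgets can be sketched as follows. This is a minimal illustration, not the procedure actually used by the database administrators: the function name, the rate table and its values are hypothetical placeholders, and real exchange rates would have to be taken from an official source.

```python
from typing import Optional

# Hypothetical units of foreign currency per 1 EUR, keyed by (currency, year).
# The values are placeholders for illustration, not official exchange rates.
RATES_PER_EUR = {
    ("NOK", 2010): 8.0,
    ("NOK", 2015): 9.0,
    ("GBP", 2010): 0.85,
}


def budget_in_eur(amount: Optional[float], currency: str, start_year: int) -> Optional[float]:
    """Convert a budget to euros using the rate of the project's starting year."""
    if amount is None:          # budget not available: keep it missing
        return None
    if currency == "EUR":       # already in euros, nothing to convert
        return amount
    rate = RATES_PER_EUR[(currency, start_year)]  # KeyError if the rate is unknown
    return round(amount / rate, 2)
```

Keying the rate on the starting year means that a multi-year project is converted once, with a single rate, which keeps budgets comparable across records.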
Last, to allow a better characterization of the projects and their easier search in the database, each project was associated with keywords taken from a list previously identified by database administrators (Online-only Table 1).

Geographical extension.
Projects were geographically allocated based on the marine area(s) where the research was carried out and the countries of the institutions involved. This made it possible to highlight any differences between the European seas and/or countries.
Countries were directly linked to each institution, while projects were allocated to marine areas following these criteria:
• if the study area and/or case studies were clearly recognisable, the project was associated with the specific marine area(s);
• if the study area was not indicated but the project dealt with field experiments, the marine area of the coordinator's country was used;
• if the study area was not indicated and the project did not deal with field experiments (e.g., laboratory genetic projects), the project was labelled as Not associated with marine areas. This was the case for projects dealing with Aquaculture, Seafood Processing and Marine Biotechnology that were not specifically carried out at sea.
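The three allocation criteria can be read as an ordered decision rule. The sketch below is illustrative only: the function and argument names are hypothetical, and the handling of a coordinator country without an associated marine area is an assumption that the text does not address.

```python
from typing import List, Optional

NOT_ASSOCIATED = "Not associated with marine areas"


def allocate_marine_areas(
    study_areas: Optional[List[str]],
    has_field_experiments: bool,
    coordinator_country_area: Optional[str],
) -> List[str]:
    """Apply the three allocation criteria in order."""
    # 1) Study area or case studies clearly recognisable: use them directly.
    if study_areas:
        return list(study_areas)
    # 2) No study area, but field experiments: fall back on the marine area
    #    of the coordinator's country (assumed to be known).
    if has_field_experiments and coordinator_country_area:
        return [coordinator_country_area]
    # 3) No study area and no field experiments (e.g. laboratory genetics).
    #    Assumption: also used when no coordinator area is available.
    return [NOT_ASSOCIATED]
```

For example, a laboratory project with no declared study area returns the "Not associated with marine areas" label regardless of the coordinator's country.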
The marine areas were identified following a hierarchical structure composed of 3 levels of detail: Area, Subarea and Division. The identification of the Areas and Subareas was based on the Food and Agriculture Organization (FAO) Fishing Areas 12: Atlantic Northeast (FAO Area 27), Atlantic Eastern Central (FAO Area 34) and Mediterranean and Black Sea (FAO Area 37). The FAO Fishing Divisions were also considered for the Atlantic Northeast and Atlantic Eastern Central, whereas the FAO-GFCM Geographical Subareas (GSAs) were used for the Mediterranean and Black Sea. Overall, the 3 major marine areas (Atlantic Northeast, Atlantic Eastern Central, Mediterranean and Black Sea) were divided into 18 subareas and 75 divisions (Fig. 2, Online-only Table 2).
Construction of the database. Tables 1 and 2 summarise the information fields gathered through the collection and harmonisation process. Some were directly collected through questionnaires or by searching and comparing different sources of information such as institutional project databases and project-specific websites, while others were assigned by database administrators. Overall, 22 fields were associated with each record (project). Coordinator names and emails are stored in the database but not shared, for privacy reasons.
Programme details (information fields: Programme1 and Programme2) are currently being harmonised by database administrators and will soon be released in the next version of the data repository.
The relational database was built in PostgreSQL and consists of a collection of tables that store interrelated data (Fig. 3). Each record is associated with a unique ID (e.g., pkid for each project), which makes it possible to create relationships between tables. The database was managed and maintained using different database management tools (e.g., pgAdmin).
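How a unique project ID links tables can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server; apart from pkid, all table names, column names and sample rows are hypothetical and do not reproduce the actual BlueBio schema shown in Fig. 3.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE project (
        pkid    INTEGER PRIMARY KEY,   -- unique project ID
        acronym TEXT NOT NULL
    );
    CREATE TABLE institution (
        inst_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        country TEXT NOT NULL
    );
    -- Link table: one row per institution taking part in a project.
    CREATE TABLE project_institution (
        pkid    INTEGER NOT NULL REFERENCES project(pkid),
        inst_id INTEGER NOT NULL REFERENCES institution(inst_id),
        role    TEXT NOT NULL CHECK (role IN ('coordinator', 'partner'))
    );
    """
)
conn.execute("INSERT INTO project VALUES (1, 'DEMO')")
conn.executemany(
    "INSERT INTO institution VALUES (?, ?, ?)",
    [(10, "Institute A", "Italy"), (11, "Institute B", "Norway")],
)
conn.executemany(
    "INSERT INTO project_institution VALUES (?, ?, ?)",
    [(1, 10, "coordinator"), (1, 11, "partner")],
)

# Join the tables through pkid to list the countries involved in each project.
rows = conn.execute(
    """
    SELECT p.acronym, i.country, pi.role
    FROM project AS p
    JOIN project_institution AS pi ON pi.pkid = p.pkid
    JOIN institution AS i ON i.inst_id = pi.inst_id
    ORDER BY i.country
    """
).fetchall()
```

A link table of this kind is one common way to represent projects with several participating institutions; whether the BlueBio database uses the same design is not stated in the text.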

Data Records.
Once subjected to the quality control procedures, the dataset presented in figshare 11 [...]
Table 2. Standard spatial information fields used by the BlueBio database. *Data retrieval: search made by DB administrators comparing and integrating different sources of information (e.g., institutional project databases and project-specific websites).
The repository follows the FAIR principles of Findability, Accessibility, Interoperability and Reusability of data.
Some examples of the information that can be drawn from the analysis of the data stored in the released BlueBio database are shown hereafter.
Most of the projects started in the period 2004-2017 and mainly focused on Fisheries, Aquaculture and Seafood Processing (Fig. 4). Among the cross-cutting categories instead, Aquaculture & Marine Biotechnology [...]
Overall, 26 out of the 96 countries involved were EU MS (including the United Kingdom, as Brexit entered into force in 2020) and 58 were non-EU countries. Norway dealt with the highest number of projects (1,649), followed by Italy, Spain and the United Kingdom, which however participated in a far lower number of projects, ranging from 427 to 467 (Fig. 5). Norway also coordinated the highest number of projects (more than 45% of the total universe of the database), followed by Italy (8%) and Germany (6%). On the other hand, Spain, the United Kingdom and France were involved in the highest number of projects as partners (8-9%; Fig. 5). A few countries (e.g., Germany, Poland and Finland) maintained a similar importance in the categorisation both by coordinator and by involved country, while others were only involved and never coordinated (e.g., Hungary and Lithuania).
The majority of the projects were funded at national level, while 18% were funded by the European Commission (Fig. 6). The projects co-financed by European and National funds and those supported by Other funding sources accounted for 11% and 0.09%, respectively. Budget information was not available for 948 projects; of the collected projects, 34% have a budget greater than 500 k€, 24% between 100 k€ and 500 k€ and 13% lower than 100 k€.
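The three budget classes (lower than 100 k€, between 100 k€ and 500 k€, greater than 500 k€) and their shares can be computed along the following lines. This is a sketch under stated assumptions: the function names and labels are hypothetical, budgets of exactly 100 or 500 k€ are assigned to the middle class, and shares are taken over all records, including those without budget information.

```python
from collections import Counter
from typing import Iterable, Optional


def budget_class(budget_keur: Optional[float]) -> str:
    """Assign a budget (in thousands of euros) to a class."""
    if budget_keur is None:
        return "not available"
    if budget_keur < 100:
        return "<100k"
    if budget_keur <= 500:      # assumption: boundaries belong to the middle class
        return "100-500k"
    return ">500k"


def class_shares(budgets: Iterable[Optional[float]]) -> dict:
    """Percentage of projects per budget class, computed over all records."""
    counts = Counter(budget_class(b) for b in budgets)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Keeping "not available" as an explicit class makes the denominator visible, so that the reported shares and the number of records without a budget always add up to the full dataset.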
Projects with budgets >500 k€ represented on average 50% of the projects in each research category and around 90% of the total funding, while low-budget projects (<100 k€) generally did not exceed 2% of the total funds of a research category.
Project budgets ranging between 100 and 500 k€ were quite important in almost all research categories, exceeding 30% of the total projects, with the exception of Marine Biotechnology, Aquaculture & Fisheries, and Aquaculture & Fisheries & Marine Biotechnology & Seafood Processing. However, they never exceeded 10% of the total funding of a research category.
Bringing together information on funding sources and budget categories, the data highlight that most of the projects coordinated by Norway had a budget >500 k€ and were funded by national programmes (Fig. 7). The same held for Italy, the United Kingdom, Spain and France but, in these cases, most of the projects were funded by European funding programmes. In contrast, in all countries the majority of projects with a budget of less than 500 k€ were national, while the National-European funding programme mainly financed a few projects with a budget >500 k€.
Deeper analysis of topics within each research category highlighted research priorities and needs as well as possible differences among European marine areas and countries. Outcomes strongly depend on the identification of topics by users. With regard to Aquaculture, for example, Table 3 lists the 16 main topics identified, the most prominent of which are Aquaculture development and/or management, Animal welfare and/or health, Animal feed, and Engineering.
However, Animal welfare and/or health, which mainly consists of the development of farming systems that improve productivity and product quality by increasing welfare and lowering the risk of disease in farmed species, seemed to be the priority almost everywhere in the Atlantic Northeast (FAO Area 27) and in the coastal waters of Morocco (FAO Area 34) (Fig. 8). Other relevant issues in these areas were related to open-sea aquaculture and the evaluation of impacts induced on farmed species by other human activities or environmental stressors (e.g., climate change, ocean acidification, algal toxins). In the remaining areas of the Atlantic Northeast (Iceland Grounds, central and southern Baltic Sea, Bay of Biscay, Portuguese waters) and in the Mediterranean and Black Sea (FAO Area 37), instead, the most addressed topic was Aquaculture development and/or management, which comprises both projects aimed at boosting the sector's production by implementing larval rearing for already farmed and new species and projects dealing with the management of aquaculture and its sustainability, including marine spatial planning. Additional relevant topics are Engineering (i.e., technological development of aquaculture systems both at open sea and on land) and Animal welfare and/or health, the latter limited to the Black Sea. In contrast, very few projects dealt with the implementation of integrated multitrophic aquaculture and offshore integrated platforms. Also, the assessment of impacts induced by aquaculture on the marine ecosystem did not appear to be a priority. Fish appeared to be the most investigated taxonomic group, followed by shellfish (molluscs and crustaceans) and algae. A very low number of projects targeted other low trophic organisms, e.g., ascidians, sea cucumbers, jellyfish, krill.

Technical Validation
The data collected through the questionnaires were checked against additional sources such as the CORDIS website for EU-funded projects and individual project websites. This process, carried out in parallel with the data harmonisation, was manual and very time consuming.
The webGIS itself was developed and used to validate the dataset, as users could submit their own edits through the dedicated update module (Fig. 9) while querying the database. Each proposed online update, as well as each new entry, needed to be validated (through a cross-check among different databases and project websites) and approved by database administrators before becoming permanent and available online. When updating a pre-existing project, online users can update the information already reported in the database and/or add new information in the empty fields.
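The approval step described above, in which a user submission only becomes permanent after administrator validation, can be sketched as a small state machine. The class names, field names and status labels below are hypothetical and are not taken from the webGIS code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProposedUpdate:
    pkid: int                      # project the update refers to
    changes: Dict[str, str]        # field name -> proposed value
    status: str = "pending"        # pending -> approved | rejected


@dataclass
class ProjectStore:
    records: Dict[int, Dict[str, str]] = field(default_factory=dict)

    def review(self, update: ProposedUpdate, validated: bool) -> None:
        """Apply an update only after an administrator has validated it."""
        if update.status != "pending":
            raise ValueError("update already reviewed")
        if validated:
            # Overwrite existing fields and fill in empty ones; a new pkid
            # creates a new record.
            self.records.setdefault(update.pkid, {}).update(update.changes)
            update.status = "approved"
        else:
            update.status = "rejected"
```

In this sketch, a rejected submission leaves the stored record untouched, which mirrors the requirement that nothing becomes available online before approval.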
The dedicated email address was less useful, as only a few users reported incorrect or missing information to the DB administrators.
Overall, the whole process (data collection, verification, harmonisation) required more than 8,000 hours.
[...] Innovation policy aimed to strengthen the scientific and technological development of the Union and foster its competitiveness, including in its industry, in a context of sustainable growth 12.
For instance, scientists who are going to draft research proposals could examine the data to learn about the state of the art in their field of interest in terms of topics addressed and geographical areas covered in Europe. It would allow them to contextualise their research and make it innovative, e.g., they could check which are the most targeted species or the most recent technological developments in the aquaculture sector.
Searching the dataset also allows scientists to create networks with other research institutes and/or universities and private companies working in the same field, thus encouraging the sharing of knowledge and know-how, benchmarking and cooperation among different actors on strategic research issues, and promoting an efficient use of resources among projects. On the other hand, private companies could query the dataset to search for research institutes able to support their R&D team in the development of new technologies and products.
The dataset could also be used by policy makers to identify potential experts to be involved in scientific advisory bodies called upon to provide advice and support decision-making processes, e.g. the Scientific, Technical and Economic Committee for Fisheries (STECF), the International Council for the Exploration of the Sea (ICES) and the General Fisheries Commission for the Mediterranean (GFCM). Moreover, it could be key to verify available information on short-term needs and to identify gaps to be addressed in the short term through scientific advice studies (e.g., calls for tenders and calls for proposals) or in the long term through research projects supported under national/international research framework programmes.
In spite of its large coverage in terms of projects funded under different funding programmes, geographical areas and countries, the dataset has some limitations that users should take into consideration. First, it mainly covers projects involving the countries participating in the COFASP ERA-NET and the ERA-NET Cofund BlueBio (i.e., Belgium, Croatia, Denmark, Estonia, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Latvia, Malta, the Netherlands, Norway, Portugal, Romania, Spain, Sweden, Turkey and the United Kingdom). Second, it was not possible to update the database with respect to the national projects funded by France, the Netherlands, the United Kingdom and Turkey because these countries joined the COFASP ERA-NET but not the ERA-NET Cofund BlueBio.
To geographically reallocate projects, shapefiles of marine areas (FAO Fishing Divisions and FAO-GFCM GSAs) can be downloaded at https://www.fao.org/fishery/area/search/en 13 and https://www.fao.org/gfcm/data/maps/gsas/en/, while vector data of world countries are available in several public domain map datasets such as Natural Earth (https://www.naturalearthdata.com/).

Code availability
The .csv dataset was extracted from the BlueBio PostgreSQL database via queries run in the free software environment R 14, using the dbConnect function from the RPostgreSQL package 15, a database interface and 'PostgreSQL' driver for R. The related R code (dbconnect_csv.R) is freely available through figshare 11.
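For readers who do not use R, the same query-to-CSV pattern can be sketched in Python. This is not a translation of dbconnect_csv.R: it uses the built-in sqlite3 and csv modules as stand-ins for a PostgreSQL connection, and the function name, table and columns are hypothetical.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn: sqlite3.Connection, query: str, out) -> int:
    """Write the result of `query` to `out` as CSV; return the number of data rows."""
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header from column names
    n_rows = 0
    for row in cursor:
        writer.writerow(row)
        n_rows += 1
    return n_rows


# Tiny in-memory example; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (pkid INTEGER PRIMARY KEY, acronym TEXT)")
conn.executemany("INSERT INTO project VALUES (?, ?)", [(1, "DEMO"), (2, "TEST")])

buffer = io.StringIO()
exported = export_query_to_csv(
    conn, "SELECT pkid, acronym FROM project ORDER BY pkid", buffer
)
```

Writing to a file instead only requires passing an object opened with `open(path, "w", newline="")` in place of the in-memory buffer.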
The maps in Figs. 2, 5, 7 and 8 were created using several R packages: tidyverse 16 for data handling, rgdal 17 for bindings to the "Geospatial Data Abstraction Library", sf for a standardised way of encoding spatial vector data in R, and ggplot2 for graphical visualisation. The R code (scidata_workflow.R) and the additional data layer for the creation of all the reported figures and summary statistics are deposited in the figshare 11 public repository, to facilitate data re-use and analysis.