Introduction

The ban of one of industry’s main solutions against corrosion, i.e. compounds based on hexavalent chromium1, has recently started in Europe due to health and environmental concerns, resulting in a need to find effective replacements2. As a result, a large number of corrosion inhibition efficiencies have been reported in the literature over the years, and this volume is expected to keep growing. Moreover, the development of high-throughput testing methodologies3,4,5,6,7,8 has made it possible to obtain sizeable databases in shorter timeframes for different substrates, application conditions and molecular structures3,9,10. This calls for a data-driven application, such as the one developed in this work, which allows academic and industry researchers to swiftly select the most adequate condition-specific corrosion inhibitor, to be embedded directly into protective coating systems or delivered through smart nanocontainers. We envision the CORDATA app as the first step in the corrosion inhibitor selection process, before going to the laboratory to perform further research and development activities. Although there are many accounts in the literature focusing on corrosion inhibition efficiencies, to the best of our knowledge this is the first web application dealing with data management for this particular issue. It allows many different data sources to be compared efficiently at the same time, making it easier to find appropriate solutions that have already been tested experimentally but that might otherwise be lost in the large volume of experimental data obtained in the past. Moreover, the dynamic nature of a data management web application will allow it to grow in size and evolve in functionality over the years, adapting to the needs of the corrosion science community to better solve societal challenges through open data.

Results and discussion

Data-driven technologies and machine learning are among the latest developments and most promising approaches in corrosion science to guide the discovery and design of more effective and environmentally benign corrosion inhibitors and protective coating systems3,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25. However, one of the main challenges in applying machine learning to understand and design protective systems is building the datasets required for training the predictive models26,27. The collection of experimental data, as well as data management and curation, are among the most time-consuming tasks in the machine learning workflow. Therefore, a web application like the one presented herein will fulfill two main purposes: (1) it can be used by scientists and engineers working in academia and industry to quickly compare the performance of different corrosion inhibitors and select the most appropriate condition-specific corrosion inhibitor for each application; and (2) it will provide a framework to organize curated datasets for different substrates, which will trigger further machine learning and data-driven developments to design corrosion inhibitors.

A general view of the CORDATA application can be seen in Fig. 1, and the application can be accessed free of charge at the following URL: https://datacor.shinyapps.io/cordata/. The web application was designed to work on personal computers, tablets and mobile phones, and includes several different functionalities (Fig. 2): (1) search by the appropriate application conditions, such as the type of metal or alloy, the possible synergistic combination of inhibitors, the minimum efficiency, a range of temperature and pH, and a minimum aggressive salt concentration; (2) quickly check the inhibitor structure and the reference used to obtain its corrosion inhibition efficiency; (3) search for specific corrosion inhibitors through an internal search engine; (4) select and compare other properties and aspects of the data, such as the molecular weight, SMILES notation, measurement time, corrosion inhibitor concentration, synergistic inhibitor concentration, experimental methodology, literature reference, and name and institution of the contributor that added each specific data entry; and (5) use an interface with detailed instructions to submit additional data, request the whole dataset or provide feedback. A spreadsheet template file can be downloaded for users to include their own data, while the whole updated dataset will be available to contributors, to be used in their own machine learning and data-driven research.
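The search criteria listed under functionality (1) amount to a conjunction of simple conditions applied to each data entry. As an illustration only, this logic can be sketched in Python as shown here. The application itself is built in R, and the field names, the `select_entries` helper and the example rows are hypothetical; none of them is taken from the CORDATA code or database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One hypothetical database row; field names are illustrative only."""
    inhibitor: str
    metal: str
    efficiency: float              # inhibition efficiency, in %
    temperature_c: float           # measurement temperature, in deg C
    ph: float
    salt_conc_m: float             # aggressive salt concentration, in mol/L
    synergist: Optional[str] = None  # second inhibitor, if a combination was tested


def select_entries(entries, metal, min_efficiency, temp_range, ph_range,
                   min_salt_conc_m, allow_synergistic=True):
    """Return entries matching all criteria, highest efficiency first."""
    t_lo, t_hi = temp_range
    ph_lo, ph_hi = ph_range
    hits = [
        e for e in entries
        if e.metal == metal
        and e.efficiency >= min_efficiency
        and t_lo <= e.temperature_c <= t_hi
        and ph_lo <= e.ph <= ph_hi
        and e.salt_conc_m >= min_salt_conc_m
        and (allow_synergistic or e.synergist is None)
    ]
    return sorted(hits, key=lambda e: e.efficiency, reverse=True)


# Made-up rows for demonstration; these are NOT real measurements.
EXAMPLE_ENTRIES = [
    Entry("inhibitor-A", "Al", 92.0, 25.0, 7.0, 0.05),
    Entry("inhibitor-B", "Al", 60.0, 25.0, 7.0, 0.05),
    Entry("inhibitor-C", "Cu", 95.0, 25.0, 7.0, 0.05),
    Entry("inhibitor-D", "Al", 97.0, 25.0, 3.0, 0.05),
    Entry("inhibitor-E", "Al", 99.0, 25.0, 7.0, 0.05, synergist="inhibitor-A"),
]

if __name__ == "__main__":
    for e in select_entries(EXAMPLE_ENTRIES, metal="Al", min_efficiency=80.0,
                            temp_range=(20.0, 30.0), ph_range=(6.0, 8.0),
                            min_salt_conc_m=0.01):
        print(e.inhibitor, e.efficiency, e.synergist)
```

Sorting the matches by efficiency is a design choice of this sketch, made so that the best-performing candidates for the chosen conditions appear first. Setting `allow_synergistic=False` restricts the results to single-inhibitor entries.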

Fig. 1: Graphical user interface (GUI) of the CORDATA web application.

The CORDATA GUI (https://datacor.shinyapps.io/cordata/) includes digital features to search and select the data, according to intended application conditions of the corrosion inhibitors, as well as the main table with corrosion inhibition efficiencies and corresponding measurement properties.

Fig. 2: Main features of the CORDATA web application.

The CORDATA application includes (1) digital features to search by the appropriate application conditions of the corrosion inhibitors, (2) the possibility to check the inhibitor structure and the respective literature reference, (3) an embedded search engine to find specific corrosion inhibitors, (4) an option to visualize additional properties, and (5) instructions for users to contribute data and feedback.

At the time of this publication, nearly five thousand corrosion inhibition efficiencies and almost four hundred compounds have already been added to the database. The data originate from more than one hundred and twenty publications, mainly for aluminum, copper, magnesium, iron and their main alloys. More specific information about the data included in the database can be found in Table 1.

Table 1 Overview of the data included in the CORDATA database for different metals and respective alloys.

The number of efficiency values and compounds is already sufficient to find efficient corrosion inhibitor solutions for a broad range of application cases and conditions, so the application is expected to be immediately helpful for corrosion scientists and engineers working on the design of more efficient corrosion protective systems. Nevertheless, the data currently included in the application are still only a small part of all the information existing in the literature. The amount of data will increase over the years, as more is added by the authors and by other research groups that see value in contributing to the database, and as the web application gains traction among the corrosion science community.

Methods

The open data management web application developed in this work was built using the R programming language28, a free software environment for statistical computing and graphics. In particular, it relies mainly on the Shiny package29, which makes it easy to build interactive web applications from R.