The long development cycle and cumbersome performance evaluation of lubricating materials seriously hinder the development and application of database systems in tribology. High-throughput calculation can address this problem effectively. However, it generates a vast number of output files from first-principles computation, whose data formats differ from those of experimental data. The problem can be addressed by handling the input, storage and management of first-principles calculation files together with the corresponding test data, by implementing fast query and display in the database, and by using physical parameters calculated from first principles to predict performance. This work therefore establishes a database website specifically for lubricating materials that accommodates both kinds of data: (i) data calculated from first principles and (ii) data obtained by experiment. It also makes a preliminary exploration of the relationship between the calculated physical parameters of a lubricating oil and its tribological and anti-oxidative performance, as predicted by a machine learning model. The method can guide the design, preparation and application of new high-performance lubricating materials.
Lubricating materials are key to improving the functionality, and to the further development, of new energy vehicles, aerospace equipment, marine ships and intelligent machinery1,2,3. A great deal of research on lubricating materials has been carried out, producing abundant simulation and test data4,5,6,7. However, these data are still not used effectively, and evaluating the performance of lubricating materials in various industrial applications remains a major difficulty. One way to relieve this problem is to combine first-principles high-throughput calculation with a database that holds large amounts of research information and analytical tools. Together with screening or searching according to specific needs, this approach is becoming the trend in developing and exploring new materials8. However, the particular nature of simulation output causes problems in file structure, data storage and data processing for first-principles results; these are barriers for a database platform and need to be overcome.
Building an appropriate database platform first requires an effective and efficient data structure. This, in turn, inevitably affects the structure and techniques of data storage, data retrieval and machine learning9. In practice, the logical and physical relationships between the data often need to be adjusted or compromised. Typically, simulated data are simply stored on local hard disks, where they are easily erased by accident. A storage mode that offers long-term, secure storage is therefore necessary. A database platform not only solves the problem of the massive simulated data generated by calculation; it also helps to standardize and integrate simulated and test data, and so improves their utilization.
The input and output files of first-principles calculations usually contain a large amount of vital information with rather complex contents. Key information such as energy values, Fermi energies and eigenvalues must be extracted efficiently and stored again for later viewing. Furthermore, whether information such as band structures and spectra can be displayed often depends on the capability of the graphics software used. Output visualization is therefore another problem to be solved when establishing the database architecture.
To solve the problems described above, researchers have made progress in constructing first-principles databases, typically AFLOW10, Materials Project11, MedeA12, MatCloud13, NOMAD14 and the software associated with the Materials Informatics Platform (MIP)15. The first four are high-throughput platforms built on the first-principles VASP software package, and they allow data to be shared between computing platforms and calculation results. These platforms mainly hold inorganic crystal structure data, restrict the choice of first-principles calculation software and lack support for experimental data. NOMAD lets users upload and download input and output files for various computing software, but its data analysis is not powerful enough. The Materials Informatics Platform is generally regarded as a computing platform for thermoelectric materials. A review of these platforms suggests that they lack lubrication data and cannot combine simulated data with their experimental counterparts. That is, they do not solve the problems of diverse and sparse data storage and of information extraction. Theoretical study is also needed to correlate calculated physical parameters with material properties. Predictive research on material performance is therefore critical for accelerating material development. This realization led to the launch of the Materials Genome Initiative in the United States in 2011. Because its research methods involve high-throughput experiments, high-performance computing and in-depth data analysis, it has drawn many researchers and substantial resources, particularly into data mining and performance prediction16,17,18,19. Its massive accumulation of data, combined with material performance prediction, has led to the discovery and development of many new materials.
In view of the above, this work uses the Windows operating system to construct an integrated development environment of PHP (Hypertext Preprocessor)/MySQL and PHP/MongoDB. On this basis, a structure-performance database of lubricating materials has been built to store and analyze data from high-throughput calculation and experiment. Its functions comprise: uploading and inputting data (both tested and simulated) in different formats; searching by keyword or parameter; and predicting performance from first-principles calculation results. By studying the relationship between the calculated parameters and the corresponding performance of lubricating materials, a preliminary machine learning prediction model is established for the tribological and anti-oxidative properties of lubricating oils, with the aim of more accurate and meaningful predictions.
Architecture of Lubricating Materials Database
Construction of integrated development environment for high-throughput computing
To meet the data processing requirements of high-throughput computing and of lubricant research and development, a stable integrated development environment for PHP, based on the Apache web server under Windows, has been built (Fig. 1). MySQL (a relational database) and MongoDB (a non-relational database) serve as the data stores of the integrated environment. Communication between (i) PHP and the MySQL database and (ii) PHP and the MongoDB database is realized through suitable configuration. Linking the two databases dynamically provides a variety of storage forms for importing test data and high-throughput calculation results, so that each kind of data can retain its original form20. Furthermore, converting between the different data formats facilitates data retrieval.
Construction of front-end and back-end system framework for database website
Back-end management system
The back-end management system is a prerequisite module of the database and is highly versatile for its internal users. In this design, the back-end management system shares the same interface as the front-end system. To meet the needs of the development stage, it is divided into three parts: data management, user management and system management. It is mainly used to manage data and users, for example data deletion, user addition and deletion, role management and the setting of user permissions.
Front-end application system
The front-end pages are the interactive interface between the users and the system, and are personalized according to each user's level of analysis. The welcome interface of the database website is shown in Fig. 2; it gives a brief introduction to the website and provides the login and registration pages. User login and registration are protected by the security function block of the website. Moreover, users are prevented from logging in repeatedly, which saves resources and improves access speed.
Construction of functional module system
Logical relationship of functional module system
The structural layers and functional modules of the system are shown in Fig. 3. A one-to-many tree structure is adopted as the logical structure of the data, with sequential, linked and indexed forms as its physical structure. The clearly layered structure also makes the design scheme easier to realize. As seen in Fig. 3, the functional modules are divided into three parts: (i) the data input module, (ii) the search module and (iii) the prediction module.
Relational schema/entity-relationship diagram for the MySQL database
The MySQL database is defined by its table names, table structures, fields, field types and so on. Optimization of the database is also very important for its users. If the MySQL structure is poorly designed, the efficiency of the subsequent coding suffers significantly. Figure 4 shows the relational schema/entity-relationship diagram and the interconnection of the individual components of the proposed MySQL database.
Implementation scheme of functional module system
Figure 5 shows the interface of the functional modules in the lubricating material database. Data input for both test and simulation results, queries by (i) keyword and (ii) parameter, and prediction of material properties are each reached by clicking the corresponding module box. The remainder of this section briefly introduces the specific functions of the individual modules.
Data input module: Standardizing data formats facilitates data retrieval and improves utilization efficiency. The variety of testers and of high-throughput computing software leads to a variety of output file types. The data input module is designed to accept both experimental data and files calculated from first principles, so that the physical parameter values and the service performance of the associated materials are brought together. Test data are accepted only as Excel tables in a standard format (Fig. 6(a)), and simulation data as TXT documents (Fig. 6(b)). Data are imported as follows: (i) enter the name of the compound; (ii) select the file to upload from the personal computer; and (iii) click “Submit” to complete the upload. The system then automatically connects to the MySQL and MongoDB databases on the PC, stores test tables in MySQL and high-throughput calculation results in MongoDB, filters and retrieves key data in the background, and finally inserts the extracted data into the MySQL database (Fig. 7). The filtering, extraction and insertion involve: (i) converting the obtained string into an array; (ii) selecting from the array the variables and values to be extracted; (iii) creating a new array whose variables match the header of the target table; and (iv) updating the variables and values in the MySQL database. Combining relational and non-relational databases in this way not only meets the requirements of processing special data and of high-throughput computing, but also supports data query and management.
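The four-step transformation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' PHP implementation: the layout of the TXT output, the column names in TABLE_HEADER and the table name data_entry are hypothetical, and Python's built-in sqlite3 module stands in for the MySQL database.

```python
import sqlite3

# Hypothetical excerpt of a first-principles TXT output.
# The real file layout is not given in the paper.
RAW_OUTPUT = """\
Total Energy = -1523.4471
Dipole moment = 2.3150
Fermi energy = -0.1842
Binding energy = -7.9920
"""

# Hypothetical header of the target table (the paper does not list its columns).
TABLE_HEADER = ("total_energy", "dipole_moment")


def extract_parameters(text, header):
    # (i) convert the obtained string into an array of lines
    lines = text.splitlines()
    # (ii) select the variables and values from the array
    found = {}
    for line in lines:
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        key = name.strip().lower().replace(" ", "_")
        found[key] = float(value)
    # (iii) build a new array ordered like the table header
    return [found[column] for column in header]


def store(connection, compound, values):
    # (iv) write the variables and values to the relational table
    connection.execute(
        "CREATE TABLE IF NOT EXISTS data_entry "
        "(full_formula TEXT PRIMARY KEY, total_energy REAL, dipole_moment REAL)"
    )
    connection.execute(
        "INSERT OR REPLACE INTO data_entry VALUES (?, ?, ?)",
        (compound, *values),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
row = extract_parameters(RAW_OUTPUT, TABLE_HEADER)
store(connection, "C20H42", row)
stored = connection.execute("SELECT * FROM data_entry").fetchall()
```

Only the values named in the table header reach the relational table; the full output file would remain in the document store, which matches the division of labour between MongoDB and MySQL described in the text.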
Search module: The query system supports keyword queries and physical parameter queries by sending SQL statements to the database. The keyword query statement is: select * from search_keyword where full_formula = '$_POST[keyword]'; Queries on performance and physical parameters are implemented as fuzzy (range) queries, for example a query for lubricating materials whose dipole moment falls within a given range: select * from data entry where $_POST[keyword] between $_POST[a] and $_POST[b]. Figure 8 shows results of keyword queries: the test data for PAO and the first-principles results for graphene. Figure 9 shows the results of querying by a performance measure (e.g. wear) or a physical parameter (e.g. density) together with a search range, which returns the compounds whose values fall within that range.
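The two kinds of query can be sketched as follows. This is an illustration under stated assumptions rather than the site's PHP code: sqlite3 stands in for MySQL, and the table layout, the sample rows and the SEARCHABLE whitelist are hypothetical. The sketch binds user-supplied values as parameters instead of interpolating them into the statement, and checks the column name against a whitelist because identifiers cannot be bound as parameters.

```python
import sqlite3

# Hypothetical set of searchable columns. Column names cannot be bound as
# query parameters, so they are validated against this whitelist instead.
SEARCHABLE = {"dipole_moment", "density", "wear"}


def keyword_query(connection, keyword):
    # Keyword query: exact match on the compound formula.
    return connection.execute(
        "SELECT * FROM data_entry WHERE full_formula = ?", (keyword,)
    ).fetchall()


def range_query(connection, column, low, high):
    # Range query: compounds whose value of `column` lies in [low, high].
    if column not in SEARCHABLE:
        raise ValueError(f"unsupported column: {column}")
    sql = (
        f"SELECT full_formula FROM data_entry "
        f"WHERE {column} BETWEEN ? AND ? ORDER BY full_formula"
    )
    return [r[0] for r in connection.execute(sql, (low, high))]


connection = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
connection.execute(
    "CREATE TABLE data_entry "
    "(full_formula TEXT, dipole_moment REAL, density REAL, wear REAL)"
)
# Hypothetical sample rows, for illustration only.
connection.executemany(
    "INSERT INTO data_entry VALUES (?, ?, ?, ?)",
    [("A", 1.2, 0.85, 0.40), ("B", 2.6, 0.91, 0.55), ("C", 3.9, 0.88, 0.62)],
)
hits = range_query(connection, "dipole_moment", 2.0, 4.0)
```

With the sample rows, the range query on dipole moment between 2.0 and 4.0 returns compounds B and C, and an unexpected column name is rejected before any SQL is executed.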
Prediction module: First-principles methods for predicting material properties are mainly based on the Schrödinger equation. The database is therefore intended to support both material screening and performance prediction. To this end, a relationship model between parameters and properties is established in Python using data mining techniques. An interface connecting Python to PHP has been built, and the performance of lubricating materials is predicted by applying the relationship model to the prediction-group data. The prediction system is designed to predict the physical and chemical performance, anti-oxidative performance and tribological performance of lubricating materials.
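The paper does not describe how the Python-PHP interface is implemented. One possible arrangement, sketched here purely as an assumption, is a Python routine that reads a JSON request and writes a JSON reply, which a web back end could run as a separate process. The coefficients in MODEL, the feature names and the request layout are all hypothetical.

```python
import io
import json

# Hypothetical linear model coefficients. A real deployment would load the
# trained relationship model instead of hard-coding values.
MODEL = {
    "weights": {"dipole_moment": 0.2, "low_orbital_energy": -0.1},
    "bias": 0.5,
}


def predict(features):
    # Weighted sum of the (normalized) feature values plus a bias term.
    total = MODEL["bias"]
    for name, weight in MODEL["weights"].items():
        total += weight * features[name]
    return total


def handle_request(stream_in, stream_out):
    # Read one JSON request and write one JSON reply.
    request = json.load(stream_in)
    result = {
        "compound": request["compound"],
        "predicted": round(predict(request["features"]), 6),
    }
    json.dump(result, stream_out)


# In-memory streams are used here; a deployed script would pass
# sys.stdin and sys.stdout instead.
request_text = json.dumps(
    {
        "compound": "additive-1",
        "features": {"dipole_moment": 0.5, "low_orbital_energy": 0.25},
    }
)
reply = io.StringIO()
handle_request(io.StringIO(request_text), reply)
response = json.loads(reply.getvalue())
```

Exchanging plain JSON keeps the two languages loosely coupled: the web side only needs to serialize the stored parameters and parse the reply.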
When using the database system, the user imports the first-principles calculation results for a lubricating material and then clicks the “prediction” button to complete the input. Figure 10 shows the friction coefficient predicted after the simulation data of a lubricating oil are imported.
Performance Prediction Method of Lubricating Oil Based on Machine Learning
Material structure affects properties and performance. A large number of experiments have shown that unsaturated bonds in lubricating oil molecules affect low-temperature performance21,22,23,24. Studies also indicate that the length and scission of the carbon chain change viscosity and viscosity index, and that branching of the ester carbon chain improves hydrolytic stability. Breaking of O-H and N-H bonds in additives is also likely to cause rapid deterioration of lubricating oil performance. In 2016, an article in Nature reported a machine learning prediction method focusing on simulated physical characteristics (such as charge mobility and photovoltaic characteristics), which guides the synthesis of target materials with specific functions25. This demonstrates the important role that machine learning plays in discovering relationships between microstructure and material properties that are not normally revealed by testing.
First-principles calculations can be carried out with software such as VASP26, Materials Studio27 and CASTEP28. In this work, Materials Studio was used to calculate the physical parameters of the lubricating additives; the outputs were imported into the relevant modules of the database, and machine learning models built with TensorFlow were used to predict the associated performance.
Calculation of quantum mechanics parameters of lubricating oil
Molecular models of the additives are built in Materials Studio. The structure of each additive is first optimized with the Forcite package29 and then with the DMol3 package30. After geometry optimization, the DMol3 package is used to calculate the required parameters. Figure 11 shows six typical molecular models.
Quantum parameters of the molecules (such as molecular surface area, molecular energy, molecular volume, dipole moment and orbital energies) are then calculated and normalized according to Formula 1, in which X is a calculated value, Xmin and Xmax are the minimum and maximum calculated values, and Xnorm is the normalized value. The normalized parameter values are used to build the relationship model.
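Formula 1 is not reproduced in this text. Given the symbols defined above, it is presumably the standard min-max scaling, Xnorm = (X - Xmin) / (Xmax - Xmin); the sketch below assumes that form. The input values are hypothetical.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # All values identical: the scaling is undefined, so return zeros.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


dipole_moments = [1.0, 2.5, 4.0]  # hypothetical calculated values
normalized = min_max_normalize(dipole_moments)  # [0.0, 0.5, 1.0]
```

Scaling each parameter to the same range keeps quantities with very different magnitudes, such as molecular energy and dipole moment, from dominating the fit.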
Construction and importation of machine learning model between feature parameters and wear
The wear data for the 36 groups of lubricants used to construct the model in Sec. 3.1 are all taken from the doctoral thesis of Junyan Zhang31. The base oil used in the tests was liquid paraffin. Friction and wear tests were carried out on a standard four-ball tester at a rotational speed of 1450 rpm for 30 min, and the wear scar diameter was measured to estimate the wear volume32. A linear regression machine learning model relating the characteristic parameters of the lubricating oil (low orbital energy and dipole moment) to wear was built with TensorFlow (Fig. 12). Of the 36 groups, 29 form the training group and the other 7 form the prediction group. Both groups were randomly selected from the 36 numbered groups of data. The 29 training groups were imported as an Excel file, while the 7 prediction groups were used to verify the validity of the model.
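The random division of the 36 numbered groups into 29 training groups and 7 prediction groups can be sketched as below. The paper does not state how the random selection was made; the fixed seed is an assumption added only to make the split reproducible.

```python
import random


def split_groups(group_ids, n_train, seed):
    """Randomly split numbered groups into a training and a prediction set."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(group_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# Groups are numbered 1..36; the seed value is hypothetical.
train_ids, predict_ids = split_groups(range(1, 37), n_train=29, seed=0)
```

Every group lands in exactly one of the two sets, so the prediction groups are never seen during training.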
The model is trained on the training-group data by defining weights, building a cost function and applying gradient descent. The trained model is invoked through the database platform, and the prediction-group data are substituted into the model to predict wear volume. The predicted wear was compared with the measured wear, and the prediction accuracy is listed in Tables 1–3. The loss plot in Fig. 13 compares training and verification.
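The three ingredients named above (weights, a cost function and gradient descent) can be illustrated without TensorFlow. The sketch below is a plain-Python substitute for the authors' TensorFlow model, using a mean squared error cost; the learning rate, the number of epochs and the data are assumptions, and the data are synthetic values generated from a known linear relation rather than measured wear.

```python
def train_linear_model(rows, targets, learning_rate=0.1, epochs=5000):
    """Fit y = w . x + b by batch gradient descent on the mean squared error."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        # Residuals of the current model on every training row.
        errors = [
            sum(w * x for w, x in zip(weights, row)) + bias - y
            for row, y in zip(rows, targets)
        ]
        # Gradient of the mean squared error with respect to each weight.
        for j in range(n_features):
            grad = 2.0 / n * sum(e * row[j] for e, row in zip(errors, rows))
            weights[j] -= learning_rate * grad
        bias -= learning_rate * 2.0 / n * sum(errors)
    return weights, bias


def predict(weights, bias, row):
    return sum(w * x for w, x in zip(weights, row)) + bias


# Synthetic, already-normalized features generated from
# y = 0.3 * x1 - 0.2 * x2 + 0.5 (not data from the paper).
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.25]]
targets = [0.3 * a - 0.2 * b + 0.5 for a, b in rows]
weights, bias = train_linear_model(rows, targets)
```

Because the synthetic targets are exactly linear, the fitted weights and bias approach the generating coefficients; with measured data the residual cost would stay above zero.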
Construction and importation of machine learning model between feature parameters and oxidation onset temperature
The oxidation onset temperature (OOT) data for the 17 groups of lubricants used to construct the model are taken from the article by Shengpeng Zhan et al.33. The base oil used in the tests was the ester oil TMPTO. Thermal oxidation tests were carried out on a differential scanning calorimeter (DSC). A linear regression machine learning model relating the characteristic parameters of the lubricating oil to OOT was built with TensorFlow (Fig. 10). The characteristic parameters are molecular energy, low orbital energy, LUMO-HOMO energy, dipole moment and lipid-water partition coefficient. Thirteen groups form the training group and the other 4 form the prediction group. The data in each set are imported as Excel files, and the relationship model is obtained after many training iterations. The model is invoked through the database platform, and the prediction group is substituted into the model to predict the oxidation onset temperature. The predicted oxidation onset temperatures are compared with the experimental values, and the prediction accuracy is given in Table 4.
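The text does not define how the prediction accuracy is computed. A common choice, assumed here, is one minus the relative error between the predicted and the measured value. The temperatures below are hypothetical and are not taken from Table 4.

```python
def relative_accuracy(predicted, measured):
    """Accuracy as 1 - |predicted - measured| / |measured| (assumed definition)."""
    return 1.0 - abs(predicted - measured) / abs(measured)


# Hypothetical oxidation onset temperatures, for illustration only.
measured_oot = [200.0, 250.0]
predicted_oot = [190.0, 255.0]
accuracies = [
    relative_accuracy(p, m) for p, m in zip(predicted_oot, measured_oot)
]
```

Under this definition, a prediction that misses a 200-degree measurement by 10 degrees scores 0.95.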
Lubrication is a core technology that supports advanced manufacturing, ensures the smooth operation of machinery and saves energy. However, the development of high-performance lubricating materials is still relatively slow. New concepts that fundamentally improve material design are urgently needed to develop efficient lubricating materials.
This work follows the research idea of materials genome engineering, predicting material performance so as to meet extreme service requirements. To this end, it proposes database software that integrates simulation and experimental data and links the composition and structure of lubricating materials, their physical parameters and their lubrication performance. The technique combines databases, data mining and machine learning. The database platform for lubricating materials has been established with the following components.
A Web-based database platform that holds high-throughput computing results and the corresponding test data, with strong capability to store and analyze them, has been constructed; it realizes data entry, data query and performance prediction for lubricating materials.
By combining the database with machine learning, relationship models between the calculated physical parameters and the properties of lubricating materials have been established to facilitate the prediction of material properties.
Models relating calculated physical parameters to wear rate, and calculated physical parameters to oxidation onset temperature, were constructed to predict the tribological and anti-oxidative properties of lubricating oils. Comparison of the experimental results with the model predictions shows a high level of agreement.
Wang, R., Gao, D. Q., He, N. R. & Wang, Z. Research Progress of Oxide Lubricating Materials. Surface Technology 46, 127–133 (2017).
Moon, S. M., Cho, Y. J. & Kim, T. W. Evaluation of lubrication performance of crank pin bearing in a marine diesel engine. Friction 4, 464–471 (2018).
Cai, M. R., Guo, R. S., Zhou, F. & Liu, W. M. Lubricating a bright future: Lubrication contribution to energy saving and low carbon emission. Science China Technological Sciences 56, 2888–2913 (2013).
Fouts, J. A., Shiller, P. J., Mistry, K. K., Evans, R. D. & Dolla, G. L. Additive effects on the tribological performance of WC/a-C:H and TiC/a-C:H coatings in boundary lubrication. Wear 372-373, 104–115 (2017).
Shoaib, T. et al. Stick-slip friction reveals hydrogel lubrication mechanisms. Langmuir 34, 756–765 (2018).
Waara, P., Hannu, J., Norrby, T. & Byheden, A. Additive influence on wear and friction performance of environmentally adapted lubricants. Tribology International 34, 547–556 (2001).
Njiwa, P. et al. Zinc Dialkyl Phosphate (ZP) as an Anti-Wear Additive: Comparison with ZDDP. Tribology Letters 44, 19–30 (2011).
Fan, X. L. Materials Genome Initiative and First-Principles High-Throughput Computation. Materials China 6, 689–695 (2015).
Denney, M. J., Long, D. M., Armistead, M. G., Anderson, J. L. & Conway, B. N. Validating the extract, transform, load process used to populate a large clinical research database. International Journal of Medical Informatics 94, 271–274 (2016).
Curtarolo, S. et al. AFLOW: An automatic framework for high-throughput materials discovery. Computational Materials Science 58, 218–226 (2012).
Jain, A., Ong, S. P., Hautier, G. & Chen, W. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 1, 011002 (2013).
Rozanska, X. et al. High-Throughput Calculations of Molecular Properties in the MedeA Environment: Accuracy of PM7 in Predicting Vibrational Frequencies, Ideal Gas Entropies, Heat Capacities, and Gibbs Free Energies of Organic Molecules. Journal of Chemical & Engineering Data 59, 3136–3143 (2014).
Yang, X. Y. et al. MatCloud, a high-throughput computational materials infrastructure: Present, future visions, and challenges. Chinese Physics B 27, 108–115 (2018).
Draxl, C. & Scheffler, M. NOMAD: The FAIR concept for big data-driven materials science. MRS Bulletin 43, 676–682 (2018).
Lin, X., Xi, L. L. & Yang, J. First Principles High-throughput Research on Thermoelectric Materials: a Review. Journal of Inorganic Materials 34, 6–16 (2019).
Pilania, G., Wang, C., Jiang, X., Rajasekaran, S. & Ramprasad, R. Accelerating materials property predictions using machine learning. Scientific Reports 3, 2810 (2013).
Lakshmi, A. A. et al. Prediction of mechanical properties of ASS 304 in superplastic region using artificial neural networks. Materials Today: Proceedings 5, 3704–3712 (2018).
Attarian Shandiz, M. & Gauvin, R. Application of machine learning methods for the prediction of crystal system of cathode materials in lithium-ion batteries. Computational Materials Science 117, 270–278 (2016).
Inokuchi, T., Li, N., Morohoshi, K. & Arai, N. Multiscale prediction of functional self-assembled materials using machine learning: High-performance surfactant molecules. Nanoscale 10, https://doi.org/10.1039/C8NR03332C (2018).
Shi, Y., Wang, F., Li, P. P. & Liu, Y. B. The Study of the Data Storage and Retrieval for the Massive Data of MUSER Based on Cassandra. Astronomical Research & Technology 15, 361–368 (2018).
Erhan, S. Z., Sharma, B. K. & Perez, J. M. Oxidation and low temperature stability of vegetable oil-based lubricants. Industrial Crops & Products 24, 292–299 (2006).
Cheng, B. X. et al. Raman Spectroscopic Analysis of Ester Base Oil During the Thermal Oxidation Process. Journal of Instrumental Analysis 36, 507–512 (2017).
Zheng, Z. et al. Synthesis, hydrolytic stability and tribological properties of novel borate esters containing nitrogen as lubricant additives. Wear 222, 135–144 (1998).
Cheng, B. X. et al. Effect of Antioxidants on the Oxidation Resistance of TMPTO under High-temperature Friction. Lubrication Engineering 42, 17–22 (2017).
Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).
Ma, S. G. et al. Exploring the catalytic activity of MXenes Mn+1CnO2 for hydrogen evolution. Journal of Materials Science 54, 11378–11389 (2019).
Jin, Y. L. et al. Synthesis of a Multi-phenol Antioxidant and Its Compatibility with Alkyl Diphenylamine and ZDDP in Ester Oil. Tribology Letters 67, 58 (2019).
Zhang, R. H., Leng, S. L., Yang, Y. C., Shi, W. & Lu, Z. B. Atomistic simulation of the mechanical properties of β-SiC based on the first-principles. Physica B: Condensed Matter 512, 1–5 (2017).
Sim, J. et al. Gas adsorption properties of highly porous metal-organic frameworks containing functionalized naphthalene dicarboxylate linkers. Dalton Transactions 43, 18017 (2014).
Dance, I. Evaluations of the accuracies of DMol3 density functionals for calculations of experimental binding enthalpies of N2, CO, H2, C2H2 at catalytic metal sites. Molecular Simulation 44, 1–14 (2017).
Zhang, J. Y. Structure-activity relationship of additives and mechanism of boundary lubrication. Lanzhou Institute of Chemical Physics. (1999).
Wang, T. T., Dai, K., Whang, Z., Peng, H. & Gao, X. L. A Quantitative Structure Tribo-ability Relationship Model for the Antiwear Properties of N/S-containing Heterocyclic Lubricant Additives using Back Propagation Neural Network. Tribology 37, 495–500 (2017).
Zhan, S. P. et al. Studies of antioxidant performance of amine additives in lubricating oil using 3D-QSAR. Science China (Technological Sciences) 60, 299–305 (2017).
This research is financially supported by the National Key Research and Development Project under Grant no. 2018YFB0703801 and the National Natural Science Foundation of China under Grant no. 51905385.
The authors declare no competing interests.
Cite this article
Jia, D., Duan, H., Zhan, S. et al. Design and Development of Lubricating Material Database and Research on Performance Prediction Method of Machine Learning. Sci Rep 9, 20277 (2019). https://doi.org/10.1038/s41598-019-56776-2