Introduction

The interior of a deep-sea manned submersible is confined and narrow. In this harsh and complex operating environment, the submariner must judge and control the mission by relying on experience and on feedback from the equipment. The human–machine interface (HMI) of the submersible mainly consists of the interface display and the operating console, and it is the main medium of interaction between the submariner and the machine. Studies have shown that the main cause of HMI accidents is human factors1. The confined space, complex structure and high operational precision required inside the submersible make it extremely prone to human error. Scholars have investigated the human reliability of HMI in many ways. Evica analysed the HMI in an aircraft cockpit using a systematic human error prediction method and showed that low human reliability of the interface was the main cause of pilot error2. Zhang et al. provided quantitative assessment data for the interface human reliability analysis of ship navigators by building an improved CREAM (Cognitive Reliability and Error Analysis Method) model3. Zhang et al. optimised the design of the navigation interface of a deep-sea manned submersible to improve its usability from a human factors perspective4. Zhang et al. assessed the human reliability of submariners by establishing the probability of cognitive errors as a function of efficiency in a thermal environment5. Miao et al. decomposed the interactive interface of a manned submersible and designed a new interface system optimised for the operational process6.

Methods for studying human reliability are equally numerous. Yuan et al. proposed a new controller interface-oriented human reliability analysis method based on the Delfino method and Bayesian networks, and verified the reliability of the method against actual cases7. Zhang et al. identified human reliability assessment models for the HMI by improving the CREAM method between controllers and pilots and between dispatchers and pilots, respectively8, 9. Hao et al. established a pilot human error analysis model from qualitative and quantitative perspectives based on the human reliability design requirements of the HMI in the aircraft cockpit, combined with the basic theory of cognitive behavior10. Wang et al. enhanced aircraft cockpit HMI human reliability by constructing a CPC (Common Performance Condition) effect-based fuzzy set and extending CREAM to calculate pilot cognitive failure probability11. Zhu et al. used fuzzy deduction, a BP neural network and forward–backward algorithms to calculate the reliability of the HMI in the cockpit during the landing phase of a civilian aircraft12.

Performance shaping factors (PSF), the essential behavior-forming factors that affect human reliability, have also been investigated. Kim et al. pointed out that the study of PSF in different contexts could significantly improve the human reliability of HMI in nuclear power plants13. Liu et al. redefined four types of PSF for nuclear power plant control rooms based on the expert correction method and successfully reduced the probability of human errors14. Yeong et al. used the CREAM analysis method to analyse HMI PSF in nuclear power plants, and the results showed that optimal HMI design and adequate training helped to improve operator performance15. Liu et al. established a basis for quantitatively studying the causal relationships between PSF by improving the Standardized Plant Analysis Risk-Human Reliability Analysis (SPAR-H) method16. Yang et al. constructed a Bayesian network to predict controllers' probability of human error in multiple tasks using air control behavior formation factors as root nodes, and the results showed that Bayesian networks are more advantageous in studying this problem17.

From the above, most scholars have analysed the essential PSF affecting human reliability, but few have studied the interactions between PSF. Bandeira pointed out that correlations between PSF are prevalent in complex civil air transport systems and that they have a significant impact on pilot performance and the success or failure of tasks related to flight procedures18. Clearly, exploring the correlations between PSF is also one of the keys to improving the human reliability of HMI, but few studies have addressed the reliability of the HMI of deep-sea submersibles. Previous studies have only compared the sensitivities of different types of PSF, so their data lack comparability and their conclusions are not convincing. Traditional PSF studies consider only a single dimension affecting human reliability, and most were conducted in the dimension of "human" or "machine"16. Compared with analysing PSF from the perspective of individual factors, analysing them across different dimensions allows the interactions between the various factors to be identified better.

This study investigates the interactions between the factors that affect the HMI PSF of a deep-sea manned submersible. A more comprehensive and systematic evaluation method is built to improve the HMI human reliability of manned submersibles. It provides a more scientific and effective guidance for the design of the HMI while improving the operational efficiency of deep-sea manned submersible.

Method

HMI PSF for submersibles

The cockpit of an aircraft and the cabin of a deep-sea submersible are both confined and complex human–machine environments. In 1972, Edwards first proposed the principle of a specific system interface for the "human" in safety work, which consists of the following elements: Software, Hardware, Environment and Liveware19. The initials of these four elements give the SHEL model its name. Errors tend to occur at the points of contact between the central human and the hardware, software, environment and other liveware. The model depicts the vulnerability of modern production and is a direct guide to safety work. The interfaces it describes are found not only on the front line but at all levels of the production organization, so the model is universally relevant. Based on the definition of the SHEL model, this study divided the elements covered by the submersible HMI into four aspects: system staff (L), system software (S), system hardware (H) and system environment (E). The assessment was therefore framed as a study of the L–L, L–S, L–H and L–E interactions. A summary of the navigation-related literature and an interview survey with experts in the field of navigation yielded a total of 28 PSF, as shown in Table 1.

(1) L–L: Study of the interactions between submariner and team members in terms of information exchange and operational collaboration capabilities.

(2) L–H: Study of the interaction between submariner and hardware operational equipment.

(3) L–S: Study of the interactions between submariner and software interfaces.

(4) L–E: Study of the interactions between submariner and the operating environment of the submersible's working chamber.

Table 1 Deep-sea manned submersible HMI PSF summary.

A system of PSF for submersible HMI

A questionnaire was used to investigate and analyse the 28 PSF obtained, in order to build an HMI human reliability PSF system for deep-sea manned submersibles.

Questionnaire study

The questionnaire was administered to people with experience in operating deep-sea submersibles (e.g. submariners and submariner trainees). All respondents were male, with an average age of about 37 years. The questionnaire collected basic information and ratings of the PSF affecting human reliability on a 5-point Likert scale, with 1 being "minimal impact" and 5 being "great impact". A small pre-sample survey was conducted to ensure the validity of the questionnaire before distribution. Before completing the questionnaire, all participants were informed of the purpose of the study and signed an informed consent form. We prepared a small gift for each participant who completed the questionnaire. A total of 260 questionnaires were returned, of which 243 were valid, and a reliability check was conducted on the returned questionnaires to ensure the validity of the data. The demographic information from the questionnaire is shown in Table 2.

Table 2 Demographics of the questionnaire.

Usability testing of questionnaire

The reliability coefficient of the questionnaire as a whole was calculated by SPSS software to be 0.88, which indicates good consistency of the questionnaire. The same reliability test was conducted for the four pre-defined assessment dimensions in this study, and the results are shown in Table 3.

Table 3 Questionnaire reliability testing.

Table 3 shows that the alpha coefficients of the four assessment dimensions were L–L (0.94), L–H (0.86), L–S (0.82) and L–E (0.95). All exceed the 0.6 threshold required by the reliability test, indicating that the factors are internally consistent within every interaction category.
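As an illustration of how such an alpha coefficient can be obtained, the following minimal sketch computes Cronbach's alpha, assuming this is the statistic reported by SPSS here. The function name `cronbach_alpha` and the small response matrix are hypothetical and are not the survey data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert answers: 5 respondents x 3 items.
sample = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(sample)
print(round(alpha, 3))
```

A value above the 0.6 threshold would be read, as in the text, as acceptable internal consistency for that dimension.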

The 28 PSF were analysed for association validity with the four dimensions (L–L, L–H, L–S, L–E). The PSF for the four dimensions were S1–S9 (L–L), S10–S17 (L–H), S18–S25 (L–S) and S26–S28 (L–E), and the results of the analysis are shown in Table 4.

Table 4 Correlation analysis of PSF with dimensions.

The KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy indicates the correlation and bias between sample data. The higher the KMO value, the stronger the correlation between the sample data. Bartlett's sphericity test checks whether the variables are independent of one another. In this study, the questionnaire data were analysed using SPSS software. The KMO test value for the questionnaire was 0.856 and the approximate chi-square of Bartlett's sphericity test was 1868.7, so the data met the requirements of factor analysis. The initial component matrix was rotated using the maximum variance method to obtain the rotated component matrix. After removing from the rotated matrix the factors with loadings of less than 0.6 (PSF numbers S1, S7, S8 and S12) and those with multiple loadings greater than 0.2, the KMO value and Bartlett's sphericity test were recomputed on the remaining data.

PSF system

The four PSF that did not match the data test results were Physical performance, Reasonable staff selection and deployment, Clear division of labour and responsibility, Display and control device layout. After removing the unqualified data (Sig. P > 0.05), all PSF were renumbered. A final system of PSF containing 4 dimensions was established. This system of indicators reflects the influence of the HMI of deep-sea manned submersibles on the behavioral operations of submariners, as shown in Fig. 1.

Figure 1 System of PSF for manned submersibles.

Ethical approvals

The study received ethical approval from the Human Research Ethics Committee of Northwestern Polytechnical University (Ref No: 245/2023). In addition, the Key Laboratory of Ergonomics of the Ministry of Industry and Information Technology of China and the Institute of Industrial Design of Northwestern Polytechnical University approved the use of the research site (Ref No: 24/2023). All relevant guidelines, procedures and regulations were followed. The experts involved in the study provided written informed consent. All participants were informed that they were free to withdraw from the study at any time without consequences.

Model for human reliability evaluation

To identify the effects between the factors, this study combines an interpretative structural model (ISM) with a Bayesian network to model the interactions of PSF for manned submersibles. Firstly, the ISM is used to obtain the hierarchical structure, which is then mapped into a Bayesian network to complete the topology. Secondly, the Bayesian network is populated with the prior probabilities of the root nodes and the conditional probabilities. Finally, a complete Bayesian network model is built to quantify the strength of the coupling interactions between the PSF.

Interpretative structural models for PSF

ISM establishes the relationships between elements and builds a multi-layer hierarchical model through matrix operations and directed graphs, yielding a clear system structure and hierarchy. In this study, we used ISM to sort out the PSF affecting human reliability and determined the interactions between PSF by building a reachability matrix. All PSF were then classified into levels based on the reachability matrix. The related PSF were connected through directed arcs to build an ISM of PSF for the HMI of a deep-sea manned submersible, shown in Fig. 2.
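The two ISM steps described above, building a reachability matrix and partitioning the elements into levels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Fig. 2: the four-factor adjacency matrix is hypothetical, and the function names `reachability` and `partition_levels` are introduced only for this example.

```python
def reachability(adj):
    """Reachability matrix: transitive closure of (A + I) via Warshall's algorithm."""
    n = len(adj)
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def partition_levels(reach):
    """Level 1 holds the elements that influence nothing outside themselves."""
    remaining = set(range(len(reach)))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            reach_set = {j for j in remaining if reach[i][j]}
            ante_set = {j for j in remaining if reach[j][i]}
            # An element belongs to the current level when R(i) ∩ A(i) == R(i).
            if reach_set <= ante_set:
                level.append(i)
        levels.append(sorted(level))
        remaining -= set(level)
    return levels

# Hypothetical influence relations: adj[i][j] = 1 means factor i affects factor j.
adj = [
    [0, 1, 0, 0],  # factor 0 -> factor 1
    [0, 0, 0, 1],  # factor 1 -> factor 3
    [0, 0, 0, 1],  # factor 2 -> factor 3
    [0, 0, 0, 0],  # factor 3 affects nothing
]
levels = partition_levels(reachability(adj))
print(levels)
```

In this toy case factor 3 lands on the first level (a direct cause in the terminology of the text), factors 1 and 2 on the second, and factor 0 on the third.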

Figure 2 ISM of PSF.

As shown in Fig. 2, the ISM of PSF was divided into 3 levels. A hierarchical progressive interpretive relationship existed at each level from bottom to top. This study combined the four dimensions of L–L, L–H, L–S and L–E to analyse the model as follows:

(1) The first level contains the direct causes of errors. The submariner's fatigue level (S1), knowledge-skills and performance (S3), concentration level (S4) and level of teamwork (S6), all in the L–L dimension, were the direct causes of human error by submariners.

(2) The second level contains the indirect causes of errors. In the L–L dimension, this is emotional status (S2). In the L–H dimension, these are information conveyed through digital interfaces (S7), signs for directions (S8), display and control device layout (S9), seats & chairs (S11), communication equipment (S12) and workstation alarm equipment (S13). In the L–S dimension, these are integrity of the interface display (S14), reasonableness of the software feedback system (S15), integrity of the software operating procedures (S17), reasonableness of system operation time (S18), emergencies and preparedness (S19), system security level (S20) and system interconnection level (S21).

(3) The third level contains the deeper causes of errors: awareness of work responsibility (S5) in the L–L dimension, the general layout of the space (S10) in the L–H dimension, the adequacy of software system training (S16) in the L–S dimension, and all factors (S22–S24) of the L–E dimension.

Human reliability analysis based on Bayesian networks

Fuzzification of node occurrence probabilities

The model was adjusted using the causal graph correction method33. The final Bayesian network topology based on the interpreted structural model was established, as shown in Fig. 3.

Figure 3 The ISM-based Bayesian network.

This study assumed that each node in the network hierarchy has two states, one with a positive and one with a negative impact on human reliability. The node state settings and meanings are shown in Table 5. Natural linguistic variables were mapped to fuzzy numbers using the natural linguistic variable description method, and the correspondence between linguistic variables and triangular fuzzy numbers is shown in Table 6.

Table 5 Meaning of all node states.
Table 6 Correspondence between natural language variables and triangular fuzzy numbers.

Synthesis of fuzzy probabilities

When inviting experts to score, because each expert has a different educational background, knowledge base and level of perception, it can easily lead to conflicting opinions during the group's decision-making process. In this study, the Similarity Aggregation Method (SAM)34 was used to process the expert opinions in order to enable a consensus of expert opinions. The steps of SAM were as follows:

Step 1: Experts' similarity calculations for opinions.

Suppose the set of experts is \({E}_{k}\ (k=1,2,\dots ,n)\), and let \({\widetilde{R}}_{u}=\left({r}_{u1},{r}_{u2},{r}_{u3}\right)\) and \({\widetilde{R}}_{v}=({r}_{v1},{r}_{v2},{r}_{v3})\) be the standard triangular fuzzy numbers representing the opinions of any two experts \({E}_{u}\) and \({E}_{v}\). The similarity function \({S}_{uv}\) of experts \({E}_{u}\) and \({E}_{v}\) is given in formula (1). It takes a value between 0 and 1, with larger values representing higher similarity. In these formulas, \(n\) is the number of experts and \(k\) indexes them, and \({R}_{u}\) and \({R}_{v}\) refer to the u-th and v-th experts, respectively. \({r}_{u1}\), \({r}_{u2}\) and \({r}_{u3}\) represent the education level, knowledge level and perception level of expert \({R}_{u}\); \({r}_{v1}\), \({r}_{v2}\) and \({r}_{v3}\) represent the same three levels for expert \({R}_{v}\).

$${S}_{uv}=1-\frac{1}{3}\sum_{i=1}^{3}\left|{r}_{ui}-{r}_{vi}\right|$$
(1)

Step 2: Calculation of the average agreement of experts.

The average agreement \({A}_{A}({E}_{k})\) was calculated using the formula (2).

$${A}_{A}({E}_{k})=\frac{1}{n-1}{\sum }_{\begin{array}{c}v=1\\ v\ne k\end{array}}^{n}{S}_{kv}$$
(2)

Step 3: Calculation of the relative agreement of experts.

The relative agreement \({A}_{R}({E}_{k})\) was calculated using the formula (3).

$${A}_{R}({E}_{k})=\frac{{A}_{A}({E}_{k})}{{\sum }_{k=1}^{n}{A}_{A}({E}_{k})}$$
(3)

Step 4: The agreement factor for experts \(C({E}_{k})\) was calculated as shown in formula (4), where \(w({E}_{k})\) is the weight of expert \({E}_{k}\) and β is the slack factor; β = 0 when expert weights are not considered.

$$C({E}_{k})=\beta \cdot w({E}_{k})+(1-\beta )\cdot {A}_{R}({E}_{k})$$
(4)

Step 5: The aggregation of expert opinion \({\widetilde{R}}_{AG}\) is calculated as shown in formula (5).

$${\widetilde{R}}_{AG}=\mathrm{C}({E}_{1})\cdot {\widetilde{R}}_{1}+\mathrm{C}({E}_{2})\cdot {\widetilde{R}}_{2}+\dots +\mathrm{C}({E}_{n})\cdot {\widetilde{R}}_{n}$$
(5)

The aggregation of PSF S14 was used as an example to illustrate the calculation process. Firstly, the semantic values of experts' fuzzy judgments for node S14 at "STATE = 1" were collected and then mathematically calculated according to the above method. After the aggregation of experts' opinions, the triangular fuzzy number of S14 was (0.00, 0.08, 0.21).
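The five SAM steps can be sketched in a few lines of code. This is a minimal illustration, assuming that formula (2) averages each expert's similarity with all other experts and that β = 0 unless weights are supplied. The three triangular fuzzy numbers are hypothetical and are not the experts' actual judgments for S14; the function names are introduced only for this example.

```python
def similarity(ru, rv):
    """Formula (1): similarity of two triangular fuzzy opinions."""
    return 1 - sum(abs(a - b) for a, b in zip(ru, rv)) / 3

def aggregate(opinions, weights=None, beta=0.0):
    """Formulas (2)-(5): consensus coefficients and aggregated fuzzy number."""
    n = len(opinions)
    # Formula (2): average agreement of each expert with the other n - 1 experts.
    avg = [
        sum(similarity(opinions[u], opinions[v]) for v in range(n) if v != u) / (n - 1)
        for u in range(n)
    ]
    # Formula (3): relative agreement, normalised so that the values sum to 1.
    total = sum(avg)
    rel = [a / total for a in avg]
    # Formula (4): blend expert weight and relative agreement with slack factor beta.
    if weights is None:
        weights = [1 / n] * n
    coef = [beta * w + (1 - beta) * r for w, r in zip(weights, rel)]
    # Formula (5): coefficient-weighted sum of the experts' fuzzy numbers.
    fuzzy = tuple(sum(c * op[i] for c, op in zip(coef, opinions)) for i in range(3))
    return coef, fuzzy

# Hypothetical opinions of three experts as triangular fuzzy numbers (a, m, b).
opinions = [(0.0, 0.1, 0.2), (0.0, 0.0, 0.1), (0.1, 0.2, 0.3)]
coef, r_ag = aggregate(opinions)
print([round(c, 4) for c in coef], tuple(round(x, 4) for x in r_ag))
```

Experts whose opinions lie closer to the others receive larger coefficients, so outlying judgments are damped rather than discarded.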

Defuzzification of data

Defuzzification converts a fuzzy probability into an exact value. In this study, the mean area method was used for defuzzification. The calculation is shown in formula (6), where (a, m, b) is a triangular fuzzy number with lower bound a, modal value m and upper bound b, and P is the value obtained after defuzzifying it.

$$P=\frac{a+2m+b}{4}$$
(6)

The results for the root node S14:

$$P=\frac{0.00+2\times 0.08+0.21}{4}=0.0925\approx 0.09$$
(7)
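For reference, formula (6) translates directly into code. The short sketch below reproduces the S14 calculation from the aggregated fuzzy number given in the text; the function name `defuzzify` is introduced only for this example.

```python
def defuzzify(a, m, b):
    """Mean area method, formula (6): crisp value of a triangular fuzzy number."""
    return (a + 2 * m + b) / 4

# Aggregated triangular fuzzy number of S14 from the text.
p_s14 = defuzzify(0.00, 0.08, 0.21)
print(round(p_s14, 4))
```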

Obtaining conditional probabilities for Bayesian networks

The Noisy-OR model can significantly reduce the number of parameters required to populate the conditional probability table (CPT) of a node in a Bayesian network: a full CPT for a binary node with n binary parents needs \(2^{n}\) entries, whereas Noisy-OR needs only n. The Noisy-OR model was used in this study to describe the interaction between the causes of an event and the resulting effect. Suppose a binary variable Y has n binary parents \({X}_{1},{X}_{2},\dots ,{X}_{n}\). Each variable has two states, 0 and 1 (0 for not occurring and 1 for occurring), as shown in Fig. 4.

Figure 4 Diagram of the Noisy-OR model.

After obtaining the conditional probability of a child node under the independent influence of each parent node, the conditional probability under the joint action of multiple parent nodes can be calculated. The calculation is given in formulas (8) and (9).

$$P(Y \leftarrow X_i)=P(Y \mid \overline{X}_1, \overline{X}_2, \dots, X_i, \dots, \overline{X}_n)$$
(8)
$$P(Y \leftarrow X_p)=1-\prod_{X_i \in X_p}\left(1-P(Y \leftarrow X_i)\right)$$
(9)

In these formulas, \({X}_{i}\) indicates that node \({X}_{i}\) occurs and \(\overline{{X }_{i}}\) that it does not occur; \({X}_{p}\) is the set of parent nodes that occur simultaneously; and \({\text{P}}\left( {{\text{Y}} \leftarrow X_{i} } \right)\) is the probability that node Y occurs when only parent node \({X}_{i}\) acts on it. The calculation for node S4 is used as an example. The conditional probabilities of node S4 under the influence of each parent node alone were:

P(S4 ← S22) = 0.34, P(S4 ← S7) = 0.27, P(S4 ← S2) = 0.58, P(S4 ← S20) = 0.65, P(S4 ← S24) = 0.77, P(S4 ← S13) = 0.46, P(S4 ← S14) = 0.75, P(S4 ← S15) = 0.35, P(S4 ← S18) = 0.52.

Calculate the conditional probability under the joint action of multiple parent nodes:

P(S4 ← S22, S7, S2, S20, S24, S13, S14, S15, S18) = 1 − [1 − P(S4 ← S22)]·[1 − P(S4 ← S7)]·[1 − P(S4 ← S2)]·[1 − P(S4 ← S20)]·[1 − P(S4 ← S24)]·[1 − P(S4 ← S13)]·[1 − P(S4 ← S14)]·[1 − P(S4 ← S15)]·[1 − P(S4 ← S18)] = 1 − (1 − 0.34)·(1 − 0.27)·(1 − 0.58)·(1 − 0.65)·(1 − 0.77)·(1 − 0.46)·(1 − 0.75)·(1 − 0.35)·(1 − 0.52) ≈ 0.99.
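The joint calculation for S4 can be checked with a short script. The sketch below applies the Noisy-OR combination to the nine single-parent probabilities listed above; the function name `noisy_or` and the dictionary layout are illustrative choices, not part of the original method description.

```python
from math import prod

def noisy_or(link_probs):
    """Noisy-OR: probability that the child occurs when all listed parents occur."""
    return 1 - prod(1 - p for p in link_probs)

# Single-parent conditional probabilities P(S4 <- parent) taken from the text.
s4_links = {
    "S22": 0.34, "S7": 0.27, "S2": 0.58, "S20": 0.65, "S24": 0.77,
    "S13": 0.46, "S14": 0.75, "S15": 0.35, "S18": 0.52,
}
p_all = noisy_or(s4_links.values())

# Any other CPT entry follows from the subset of parents that occur.
p_two = noisy_or([s4_links["S22"], s4_links["S7"]])
print(round(p_all, 4), round(p_two, 4))
```

With all nine parents active the result is roughly 0.9993, which the text reports as 0.99. Each row of the CPT corresponds to one subset of active parents, so the whole table can be generated from the nine numbers above.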

A case of human reliability analysis

A real-life case from the China Deep-sea Warrior manned submersible safety case compilation was selected for this study. According to the incident report, the submersible was on a 4500 m class sea trial. During the submersible's powered dive to sit on the bottom, the submariner failed to adjust the ballast water tank volume. As a result, the submersible's thrusters were underpowered, which triggered the "Inadequate Power" alarm in the working chamber and constituted a serious error event for the safety of a manned submersible.

Probability calculation of case events

Five experts in the field were interviewed for this study. The experts gave fuzzy judgement values for PSF at "STATE = 1" based on practical experience and basic event information, and we used formula (9) to calculate the conditional probabilities under the influence of different combinations of parent nodes. As shown in Fig. 5, the human reliability (S0) probability for the HMI of this manned submersible was calculated to be 49.1% using Netica software, which is generally consistent with the state of the submersible during operation. Netica is a widely used Bayesian network analysis package.

Figure 5 Diagram of Bayesian network causal inference. The data in the figure were produced with Netica V5.18, available at: http://www.3h3.com/soft/163546.html.

The manned submersible was diving to sit on the bottom when the submariner neglected to adjust the amount of water in the ballast water tanks. After the alarm, the submariner recognized the error and fed ballast water into the tanks in time, so the subsequent dive proceeded normally without a major error. This coincides with the results of this study and verifies the applicability of the proposed ISM-based Bayesian network approach to the human reliability evaluation of the HMI of deep-sea manned submersibles.

Analysis of key PSF affecting human reliability

Suppose that the HMI of the manned submersible is in a negative state due to low human reliability. Setting the state of node S0 to P(S0 = 1) = 100% and updating the probability parameters of the network gives the posterior probability of each node. By comparing the prior probability of each node with its posterior probability, the sensitive factors affecting human reliability can be identified from the size of the change. The results are shown in Table 7.

Table 7 Comparison of probabilities of each node of Bayesian networks.
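The prior-versus-posterior comparison can be illustrated on a toy network. The sketch below is not the Netica model of Fig. 5: it assumes a single child node with two independent root parents combined by Noisy-OR without a leak term, and all priors and link probabilities are hypothetical. It computes P(parent = 1 | child = 1) by full enumeration and reports the change from the prior.

```python
from itertools import product

def posterior_given_effect(priors, links):
    """Return P(Y = 1) and P(X_i = 1 | Y = 1) for a Noisy-OR child Y."""
    n = len(priors)
    p_y = 0.0
    joint = [0.0] * n  # joint[i] accumulates P(X_i = 1, Y = 1)
    for states in product([0, 1], repeat=n):
        p_states = 1.0  # probability of this combination of parent states
        p_not_y = 1.0   # probability that no active parent triggers Y
        for x, prior, link in zip(states, priors, links):
            p_states *= prior if x else 1 - prior
            if x:
                p_not_y *= 1 - link
        p_both = p_states * (1 - p_not_y)
        p_y += p_both
        for i, x in enumerate(states):
            if x:
                joint[i] += p_both
    return p_y, [j / p_y for j in joint]

# Hypothetical priors of two root nodes and their link probabilities to Y.
priors = [0.09, 0.20]
links = [0.75, 0.35]
p_y, post = posterior_given_effect(priors, links)
changes = [round(q - p, 3) for p, q in zip(priors, post)]
print(round(p_y, 4), [round(q, 3) for q in post], changes)
```

The node with the larger increase from prior to posterior is the more sensitive factor in this toy setting, which mirrors how Table 7 is read.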

Discussion

In this study, a system of PSF consisting of four dimensions, L–L, L–H, L–S and L–E, was proposed. The most influential factor in each dimension is discussed below.

The fatigue level factor had the highest impact in the L–L dimension

The results of this study showed that individual fatigue was a key factor affecting the human reliability of the manned submersible HMI, which is consistent with many previous studies. Many safety incidents occur as a direct result of individual fatigue35,36,37. The small and confined space inside a manned submersible can easily cause submariner fatigue. Studies have shown that operators who return to a task after a temporary break are significantly more resourceful, so appropriate breaks can serve as a risk management measure38. All submariners were tested for fatigue before entering the submersible, but the long duration of the dive and the small, confined working area mean that fatigue can still develop easily. Managers need to monitor submariner fatigue in order to develop effective management measures that match the demands of the submariner's position.

The seats & chairs factor had the highest impact in the L–H dimension

In the hardware environment of the HMI, the seats & chairs factor was key to human reliability. This differs from the results of other studies, probably because of the small space inside the submersible and the predominantly sideways working position of the submariners. Such a position is not common in daily work, and prolonged lying on one's side is more likely to cause discomfort than sitting39. This places higher demands on designers, who have to take into account the working characteristics and habits of the submariners and adopt a more human-centred design to meet the special requirements of diving operations.

The reasonableness of system operation time factor had the highest impact in the L–S dimension

At present, most ICAO member states regulate the maximum flight time and the duration of a single work session for pilots40. For manned submersibles, however, there is no standard regulation of work duration, and detailed working time limits and arrangements are lacking. The work of submariners alternates between day and night, which is physically demanding. Previous studies have pointed out that alternating day and night shifts must take full account of human adaptability, with at least 48 h between a night shift and a day shift when they cross over41. The results of this study could inform the development of such regulations for the submersible field.

The noise and vibration factor had the highest impact in the L–E dimension

Noise and vibration have emerged as key causes of psychological and physiological effects on individuals in confined human–computer interaction spaces. This is consistent with the model results presented in this study. The sound pressure level of the noise source can be controlled by, for example, arranging some sound insulation and absorption materials, vibration isolation and vibration absorption structures in the bulkhead of the submersible.

Limitations

There are a number of limitations to the results of this study that may affect the generalisability of the model. Firstly, the 28 PSF initially identified do not fully describe all the factors influencing human reliability. The HMI of a manned submersible is complex and involves many other influencing factors. Although we obtained some important influencing factors through the literature and expert interviews, more PSF will be included in the future to keep the model accurate as the internal design of manned submersibles is continuously updated. Secondly, this study fuzzified the experts' opinions. Although this removes some of the subjective differences, some subjectivity remains, and future studies should remove as much error as possible from subjective results.

Conclusion

This study analyses the human reliability of the HMI of deep-sea manned submersibles. By analysing the relationships between four dimensions of PSF, we proposed a human reliability evaluation method for the HMI of deep-sea manned submersibles. Our main contributions are as follows:

1. Four dimensions were selected to evaluate the human reliability of deep-sea manned submersibles.

2. In addition to the effects of individual PSF on human reliability, we also analysed the correlation effects between PSF.

3. The fatigue level factor had the highest impact in the L–L dimension. The seats & chairs factor had the highest impact in the L–H dimension. The reasonableness of system operation time factor had the highest impact in the L–S dimension. The noise and vibration factor had the highest impact in the L–E dimension.

The method allows for a more scientific evaluation study of the HMI of manned submersibles.