Introduction

Parkinson’s disease (PD) is a progressively debilitating neurodegenerative disorder primarily affecting the dopaminergic system in the basal ganglia, and it impacts millions of individuals worldwide1. The initial description of PD was provided by James Parkinson in “An Essay on the Shaking Palsy”2, published in 1817, which is often regarded as the groundwork for the modern understanding of the disease. Parkinson’s essay described several key clinical characteristics of the condition, encompassing resting tremors, muscle rigidity, bradykinesia (slowness of movement), and postural instability. He referred to the condition as “shaking palsy,” emphasizing the tremors observed in affected individuals.

Currently, PD diagnosis primarily relies on clinical evaluation without reliable diagnostic tests3, and it can be imprecise and inherently subjective. Standardized rating tools, such as the Unified Parkinson’s Disease Rating Scale (UPDRS), were developed to assess various aspects of PD symptoms. Additionally, the finger-tapping test (FTT), as part of the UPDRS, can serve as a clinical marker for evaluating motor performance. The FTT is a psychomotor task that involves repetitive tapping movements and is used to evaluate motor function in individuals affected by PD. The test involves tapping with the thumb or middle finger as quickly and accurately as possible over a set duration. Variations of the test include unilateral and bilateral tapping, with the latter offering insights into coordination and symmetry. Individuals with PD exhibit impaired finger-tapping performance compared to healthy controls: key findings in studies include reduced tapping speed (bradykinesia), increased variability in tapping intervals, and reduced tapping amplitude. The FTT can therefore be a promising tool for detecting PD in its early stages when combined with other tools and measurements.
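To make these measures concrete, the minimal sketch below (in Python, using hypothetical tap timestamps and amplitudes rather than data from any particular study) illustrates how tapping speed, inter-tap interval variability, and mean amplitude could be derived from a single FTT trial.

```python
import numpy as np

def finger_tapping_features(tap_times_s, tap_amplitudes):
    """Illustrative FTT features: tapping speed, interval variability, amplitude.

    tap_times_s    -- timestamps (seconds) of successive taps in one trial
    tap_amplitudes -- per-tap opening amplitude (arbitrary units)
    """
    intervals = np.diff(tap_times_s)                                 # inter-tap intervals
    speed = len(tap_times_s) / (tap_times_s[-1] - tap_times_s[0])    # taps per second
    variability = intervals.std() / intervals.mean()                 # coefficient of variation
    amplitude = np.mean(tap_amplitudes)                              # mean tapping amplitude
    return {"speed_hz": speed, "interval_cv": variability, "mean_amplitude": amplitude}

# Example: slower, more variable, lower-amplitude tapping is expected in PD.
features = finger_tapping_features(
    tap_times_s=np.array([0.00, 0.42, 0.91, 1.45, 2.05]),
    tap_amplitudes=np.array([0.9, 0.8, 0.7, 0.6, 0.5]),
)
print(features)
```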

Artificial intelligence (AI) has revolutionized healthcare by offering data-driven solutions for complex medical problems. Deep learning methods have demonstrated exceptional potential for extracting intricate patterns from medical datasets. In healthcare, the application of Long Short-Term Memory (LSTM) networks, a type of recurrent neural network (RNN), is particularly promising for analyzing sequential medical data. Additionally, metaheuristic optimization algorithms have emerged as powerful methods for optimizing the parameters of complex models, enhancing their performance.

An extensive literature review suggests that there is a research gap concerning the application of optimized LSTM networks for Parkinson’s diagnosis. Additionally, the application of emerging optimizers such as the crayfish optimization algorithm (COA)4 has yet to be explored. The main focus of this work is to address this gap by presenting a novel diagnostic approach for PD that combines the capabilities of deep LSTM networks with a modified metaheuristic algorithm for parameter optimization. The primary goal of this work is to address the need for early and accurate PD diagnosis. An additional scientific contribution of this work is a proposal for a modified version of the COA that tackles some of the observed drawbacks of the original algorithm.

For the experiments, a dataset comprising recordings from inertial wearable sensors with gyroscopes is employed. This dataset encompasses recordings collected from individuals affected by PD, those with atypical PD, and healthy control subjects. During data acquisition, a 3D gyroscope was meticulously positioned inside the patient’s shoe soles, and participants were instructed to walk down a well-lit path while counting backward from 500 in increments of 7, known as a dual-task walk test. Multiple trials were conducted for each participant, and it’s important to note that the data exclusively pertained to the right hand, which was typically the more affected hand in these individuals.

The primary scientific contributions of this work can be outlined as follows:

  • A proposal for a novel time-series classification-based approach for PD detection in affected individuals.

  • An innovative application of the recently proposed COA for parameter optimization of LSTM networks tasked with PD diagnosis.

  • A modified version of the COA, specifically developed for this study to address the drawbacks of the original algorithm.

Background and related works

The integration of AI into the realm of medical diagnostics has garnered substantial scholarly interest and is effecting a profound transformation within the healthcare sector. AI presents a promising means to enhance the accuracy of medical diagnoses, reduce healthcare costs, and improve patient outcomes. AI has been widely applied in radiology to assist in the diagnosis of diseases from X-rays, CT scans, and MRIs. Notable applications include the early detection of lung cancer on CT scans using a 3-dimensional deep learning algorithm5 and the identification of diabetic retinopathy using networks trained on a dataset of retinal fundus photographs6. AI-driven pathology, particularly in the field of digital pathology, has advanced the accuracy of cancer diagnosis and tumor classification. Deep learning models have been employed to aid pathologists in identifying and grading cancers7. In cardiology, AI has shown potential in analyzing electrocardiograms (ECGs) for arrhythmia detection8 and echocardiograms for cardiac disease assessment. Preceding works have demonstrated impressive results for arrhythmia detection, exceeding 98% accuracy, by integrating optimization techniques to tackle the large search spaces involved in parameter optimization9.

Machine learning and deep learning methodologies have unequivocally exhibited their efficacy in the field of neurodiagnostics. These sophisticated algorithms are adept at parsing intricate neurophysiological data, encompassing medical imagery, electrophysiological measurements, and behavioral evaluations, thereby culminating in heightened precision and expedience in the diagnostic process. AI-enabled systems are poised to contribute significantly to the timely identification and categorization of neurological disorders, including but not limited to Alzheimer’s disease, multiple sclerosis, and intracranial neoplasms10,11. AI has engendered notable enhancements in the scrutiny of electrophysiological data, encompassing electroencephalography (EEG) and magnetoencephalography (MEG) signals, with the express purpose of diagnosing and overseeing conditions such as epilepsy, sleep disorders, and various other neurological maladies. Deep learning algorithms remain essential for the identification of aberrations, the precise localization of epileptic foci, and the prognostication of seizure occurrences. Khan et al. conducted an evaluation, comparing two distinct deep learning methodologies12.

The utilization of the finger-tapping test as a diagnostic modality for Parkinson’s disease has garnered attention within the realm of clinical investigation. This test serves as an evaluative measure of the motor function and dexterity of the fingers, presenting itself as a prospective instrument for the early detection and continuous monitoring of Parkinson’s disease. Akram et al.13 developed a new Distal Finger Tapping (DFT) test to assess distal upper-limb function in PD patients, focusing on kinetic parameters such as the kinesia score (KS20), akinesia time (AT20), and incoordination score (IS20). The DFT test effectively discriminated between PD patients and controls, with KS20 exhibiting the highest sensitivity (79%) and an area under the receiver operating characteristic curve (AUC) of 0.90. In research undertaken by Williams et al.14, a new computer vision tool, DeepLabCut, was used to track and measure finger tapping in smartphone videos to objectively assess bradykinesia in Parkinson’s disease. The computed measures, including tapping speed, amplitude, and rhythm, correlated well with clinical ratings from movement disorder neurologists, demonstrating the method’s accuracy (Spearman coefficients ranged from −0.50 to −0.74, \(p < 0.001\)). DeepLabCut offers a ‘contactless’ and easily accessible method for quantifying Parkinson’s bradykinesia during clinical examinations, with potential applications in other neurological disorders characterized by altered movements.

Preceding works have tackled PD diagnosis using MRI image analysis, reporting outcomes ranging from 78%15 to 88%16. However, the cost of MRI is significantly higher compared to shoe-mounted sensing systems. One major advantage of the proposed approach is therefore the significantly lower diagnosis cost as well as the greater availability of the diagnostic tools. Researchers have also considered handwriting analysis for diagnosis. The paper17 tested several classifiers, with the best accuracy demonstrated by Naive Bayes models, reaching 88.63%. Researchers have further considered the use of generative adversarial networks to tackle issues associated with data availability for gait freezing in PD patients18. Models trained on the augmented data attained a reported accuracy exceeding 90%; however, the use of optimization techniques was not considered in that work. There is an evident research gap concerning time-series-based PD detection, as well as the application of parameter tuning via metaheuristic algorithms in the field of PD diagnosis. This work seeks to address the observed gap by proposing a low-cost, AI-powered approach.

Attention-based LSTM

The LSTM19 represents a variant of RNNs. These networks retain prior information and incorporate it into their processing of current input data. However, a limitation of traditional RNNs is their inability to effectively capture long-term dependencies, mainly because of the vanishing gradient issue. LSTMs, on the other hand, are purposefully engineered to avoid these challenges associated with long-term dependencies.

The cell state is a crucial component of the LSTM network, designed to capture and carry information over long-term dependencies. The hidden state is computed at each time step based on the cell state and the input at that time step. It serves as the output of the LSTM at each step and contains information that the network has learned to be significant for making predictions. The third main element of LSTMs is the gates: LSTMs incorporate three gates for controlling information flow, namely the forget gate, the input gate, and the output gate. These gates allow LSTMs to selectively modify and utilize information from the cell state, managing the flow of data within the network. This capability empowers LSTMs to grasp and apply both short-term and long-term dependencies in sequential data.

The forget gate decides which information from the prior cell state should be forgotten. The input gate is responsible for deciding which new information should be incorporated into the cell state. The output gate regulates which information should be extracted from the cell state and utilized in generating the hidden state and output of the LSTM. The LSTM defines the input gate, forget gate, cell state, output gate, and hidden state through the following mathematical formulations:

$$\begin{aligned} i_t = \sigma (W_{xi}x_t + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i) \end{aligned}$$
(1)

where \(i_t\) refers to the input gate activation at time t and \(x_t\) is the input at time t. The hidden state and the cell state at time \(t-1\) are denoted by \(h_{t-1}\) and \(c_{t-1}\), respectively. \(W_{xi}, W_{hi}, W_{ci}, b_i\) are the weight matrices and bias vector for the input gate. \(\sigma\) denotes the Sigmoid activation function.

$$\begin{aligned} f_t = \sigma (W_{xf}x_t + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f) \end{aligned}$$
(2)

where \(f_t\) denotes the forget gate activation at time t.

$$\begin{aligned} c_t = f_t \cdot c_{t-1} + i_t \cdot \tanh (W_{xc}x_t + W_{hc}h_{t-1} + b_c) \end{aligned}$$
(3)

where \(c_t\) denotes the cell state at time t. \(\tanh\) refers to the hyperbolic tangent activation function defined as follows:

$$\begin{aligned} \tanh (x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \end{aligned}$$
(4)
$$\begin{aligned} o_t = \sigma (W_{xo}x_t + W_{ho}h_{t-1} + W_{co}c_t + b_o) \end{aligned}$$
(5)

where \(o_t\) denotes the output gate activation at time t.

$$\begin{aligned} h_t = o_t \cdot \tanh (c_t) \end{aligned}$$
(6)

where \(h_t\) denotes the hidden state at time t.
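As a minimal illustration of Eqs. (1)–(6), the following NumPy sketch performs a single LSTM step. The weight values are random placeholders, and the peephole weights \(W_{ci}, W_{cf}, W_{co}\) are applied element-wise for simplicity, so this is a didactic sketch rather than the exact implementation used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eqs. (1)-(6); peephole weights are applied element-wise."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] * c_prev + b["i"])    # Eq. (1)
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] * c_prev + b["f"])    # Eq. (2)
    c_t = f_t * c_prev + i_t * np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])  # Eq. (3)
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] * c_t + b["o"])       # Eq. (5)
    h_t = o_t * np.tanh(c_t)                                                       # Eq. (6)
    return h_t, c_t

# Toy dimensions: 3 input features, 4 hidden units, random weights for illustration only.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in ("xi", "xf", "xc", "xo")}
W.update({k: rng.normal(size=(n_hid, n_hid)) for k in ("hi", "hf", "hc", "ho")})
W.update({k: rng.normal(size=n_hid) for k in ("ci", "cf", "co")})
b = {k: np.zeros(n_hid) for k in ("i", "f", "c", "o")}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```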

The attention phenomenon lacks a single precise mathematical definition, and its incorporation into the Luong attention-based model should be viewed as a mechanism. Networks that operate with this attention mechanism while possessing LSTM characteristics are considered attention-based. The primary goal of such a mechanism is to assign varying weights to elements of the input sequence, allowing relevant information to be captured and input-output relationships to be exploited. Architecturally, this is realized by implementing a second network.

In pursuit of this objective, the authors opted for the Luong attention-based model. During the decoding process of the attention-based encoder-decoder, a weight \(w_t(s)\) is computed for each timestep s in the source, subject to the constraints \(\sum _s w_t(s) = 1\) and \(\forall s;\, w_t(s) \ge 0\). The context used to predict the token at decoding timestep t is then formed from the encoder hidden states \(\hat{h}_s\) as \(\sum _s w_t(s) \, \hat{h}_s\).

Various mathematical formulations of the attention mechanism differ in how they calculate the weights. In the Luong model, the weights are computed by applying the softmax function to the scores of the tokens, where the score is obtained by taking the dot product of the decoder state \(h_t\) with the encoder state \(\hat{h}_s\) after a linear transformation by the matrix \(W_a\).
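A minimal sketch of this scoring scheme for one decoding step is given below; the dimensions and states are arbitrary placeholders, and only the "general" Luong score and the softmax weighting are shown.

```python
import numpy as np

def luong_attention(h_t, H_enc, W_a):
    """Luong-style 'general' attention for one decoding step.

    h_t   -- decoder hidden state, shape (d,)
    H_enc -- encoder hidden states \\hat{h}_s stacked as rows, shape (S, d)
    W_a   -- learned transformation matrix, shape (d, d)
    """
    scores = H_enc @ (W_a @ h_t)              # score(h_t, h_s) = h_t^T W_a h_s
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax: weights sum to 1, all non-negative
    context = weights @ H_enc                 # sum_s w_t(s) * h_s
    return weights, context

rng = np.random.default_rng(1)
w, ctx = luong_attention(rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(8, 8)))
```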

Metaheuristics and hyperparameter optimization

Metaheuristic algorithms have many successful implementations in different areas, including wireless sensor networks20, hybridizing by K-means algorithm for text-document clustering21, tuning LSTM models22, convolutional neural network architecture design23, feature selection24, fraud detection25,26, and many others27,28,29.

In the domain of metaheuristics, hyperparameter optimization has a crucial role when tuning the operations of specific algorithms. Hyperparameter optimization is the process of selecting the right configuration of hyperparameters in a specific method for a given optimization problem. The choice of hyperparameters significantly influences the algorithm’s convergence, robustness, and overall efficacy. It is important to note that hyperparameter optimization itself is an NP-hard problem and metaheuristics are shown to be successful for tackling NP-hard optimization problems.

The NP-hardness of hyperparameter optimization arises from the large search space of possible configurations and the computational effort required to identify the optimal set of hyperparameters. In an NP-hard problem, the time required to find an optimal solution grows exponentially with the problem size, making it impractical to perform an exhaustive search. Therefore, finding the best set of hyperparameters efficiently is a formidable challenge. To tackle the NP-hard nature of hyperparameter optimization, metaheuristics offer an efficient and effective approach. Metaheuristics are a class of optimization algorithms that are designed to handle complex, large-scale problems, often characterized by non-linearity and high dimensionality.

It is important to highlight that no one-size-fits-all solution exists when it comes to optimization problems. This assertion is underpinned by the No Free Lunch (NFL)30 theorem, which stipulates that no universally optimal approach functions equally well for all existing problems. Consequently, the diverse field of metaheuristics has emerged, each with its own set of advantages and disadvantages. Selection is essential when determining an appropriate metaheuristic for a given problem domain, considering the problem’s characteristics and the algorithm’s strengths and weaknesses.

Proposed method

This section presents the base Crayfish Optimization Algorithm (COA)4, as well as the inspiration behind the preparation of an altered version used for the purposes of our research. Subsequently, details and pseudocode of the modified algorithm are provided.

Original crayfish optimization algorithm

The COA4, a novel optimization metaheuristic, emulates the foraging, avoidance, and social behavior patterns observed in crayfish populations. This algorithm leverages principles from the biological realm to tackle optimization problems in various fields using three distinct operating phases. These phases are designed to establish an equilibrium between exploration and exploitation. In the initial “summer resort” stage, COA focuses on exploring potential solutions. Subsequently, the “competition” and “foraging” stages simulate the exploitation phase. Transitions between these stages are governed by temperature control. Elevated temperatures prompt crayfish to seek shelter or compete for shelter, while optimal temperatures dictate foraging strategies based on food size. Temperature regulation enriches COA’s level of randomness and bolsters its global optimization capabilities.

The following equations describe the functioning of the COA:

$$\begin{aligned} X=\left[ X_1, X_2, \cdots , X_N\right] =\left[ \begin{array}{ccccc} X_{1,1} &{} \cdots &{} X_{1, j} &{} \cdots &{} X_{1, {\text {dim}}} \\ \vdots &{} \cdots &{} \vdots &{} \cdots &{} \vdots \\ X_{i, 1} &{} \cdots &{} X_{i, j} &{} \cdots &{} X_{i, {\text {dim}}} \\ \vdots &{} \cdots &{} \vdots &{} \cdots &{} \vdots \\ X_{N, 1} &{} \cdots &{} X_{N, j} &{} \cdots &{} X_{N, {\text {dim}}} \end{array}\right] \end{aligned}$$
(7)

here X denotes the population, dim the dimensionality of the problem, and N the population size, while \(X_{i,j}\) is the position of agent i in dimension j. Agents are randomly dispersed across the search space according to:

$$\begin{aligned} X_{i, j}=l b_j+\left( u b_j-l b_j\right) \times r a n d, \end{aligned}$$
(8)

in which \(lb_j\) represents the lower bound and \(ub_j\) the upper bound of dimension j, and rand is used to introduce randomness. A major influence on agent behavior is the simulated temperature, defined as follows:

$$\begin{aligned} \text{ temp } = \text{ rand } \times 15+20 \end{aligned}$$
(9)

Once the temperature exceeds 30, agents seek out a cooler region in which to rest and resume foraging once the temperature is more appropriate. Agent food intake can be approximately assumed to follow a normal distribution and is determined in accordance with:

$$\begin{aligned} p=C_1 \times \left( \frac{1}{\sqrt{2 \times \pi } \times \sigma } \times \exp \left( -\frac{(\text{temp}-\mu )^2}{2 \sigma ^2}\right) \right) \end{aligned}$$
(10)

where \(\mu\) denotes the optimal agent temperature, while \(\sigma\) and \(C_1\) are control parameters of the algorithm. Crayfish will fight for cave space. This is simulated by the algorithm as a random event with a 0.5 probability of occurring once the temperature exceeds 30, as follows:

$$\begin{aligned} X_{i, j}^{t+1}=X_{i, j}^t-X_{z, j}^t+X_{\text{ shade } } \end{aligned}$$
(11)

with z denoting a random agent. Positions are therefore adjusted in accordance with other competing individual agents.

During the foraging stage, agent positions are updated according to:

$$\begin{aligned} X_{i, j}^{t+1}=X_{i, j}^t+X_{\text{ food } } \times p \times (\cos (2 \times \pi \times \text{ rand } )-\sin (2 \times \pi \times \text{ rand } )) \end{aligned}$$
(12)

During the foraging phase, COA will progress towards the most effective solution, bolstering the algorithm’s ability to exploit resources and ensuring robust convergence capabilities.
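The following simplified sketch illustrates one COA iteration built from Eqs. (9)–(12). It is a didactic approximation that omits details such as food-size handling and the summer-resort movement toward the shelter, and it assumes \(X_{shade}\) and \(X_{food}\) are both taken as the best solution found so far.

```python
import numpy as np

rng = np.random.default_rng(42)

def coa_step(X, lb, ub, X_shade, X_food, mu=25.0, sigma=3.0, C1=0.2):
    """One simplified COA iteration following Eqs. (9)-(12).

    X        -- population, shape (N, dim)
    X_shade  -- shelter position (assumed here: best solution found so far)
    X_food   -- food position (assumed here: best solution found so far)
    """
    N, dim = X.shape
    temp = rng.random() * 15 + 20                                    # Eq. (9)
    p = C1 * (1 / (np.sqrt(2 * np.pi) * sigma)
              * np.exp(-((temp - mu) ** 2) / (2 * sigma ** 2)))      # Eq. (10)
    X_new = X.copy()
    for i in range(N):
        if temp > 30 and rng.random() < 0.5:                         # competition for shelter
            z = rng.integers(N)                                      # random competing agent
            X_new[i] = X[i] - X[z] + X_shade                         # Eq. (11)
        else:                                                        # foraging movement
            r = rng.random()
            X_new[i] = X[i] + X_food * p * (np.cos(2 * np.pi * r)
                                            - np.sin(2 * np.pi * r)) # Eq. (12)
    return np.clip(X_new, lb, ub)                                    # keep agents within bounds
```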

Modified crayfish optimization algorithm

While the original COA algorithm demonstrates decent performance, it is a relatively novel algorithm with considerable room for improvement. Testing conducted using standard CEC evaluation methods suggests that the algorithm suffers from insufficient exploration. The modified version attempts to tackle this deficiency by introducing two new mechanisms.

The first introduced mechanism comes from the ABC31 algorithm: exhausted solutions that show no improvement are rejected and replaced by newly generated ones. Given the limited number of iterations conducted in this experiment, solutions are rejected after two iterations without improvement. This approach has been shown to boost exploration. The second introduced mechanism is quasi-reflective learning (QRL)32. This technique is utilized to generate new solutions, further boosting exploration, and is additionally applied during the initial generation of potential solutions in the initialization stage of the algorithm. The quasi-reflected component z of a given solution X is determined as:

$$\begin{aligned} X^{qr}_z = rand\bigg (\frac{lb_z + ub_z}{2}, x_z\bigg ) \end{aligned}$$
(13)

where lb and ub denote the lower and upper bounds of the search space and rand denotes a random value within the given interval. The introduced algorithm is named the modified COA (MCOA). The pseudocode for the described optimizer is presented in Algorithm 1.

Algorithm 1. Pseudocode for the described MCOA algorithm.
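A minimal sketch of the two introduced mechanisms is given below; the trial-counter bookkeeping and the way quasi-reflected solutions replace exhausted ones are simplifying assumptions for illustration rather than a verbatim transcription of Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def quasi_reflect(x, lb, ub):
    """Quasi-reflected learning (Eq. 13): sample each component between the
    search-space midpoint and the corresponding component of the solution."""
    mid = (lb + ub) / 2.0
    low, high = np.minimum(mid, x), np.maximum(mid, x)
    return rng.uniform(low, high)

def replace_exhausted(X, trials, lb, ub, limit=2):
    """ABC-style abandonment: solutions that fail to improve for `limit`
    iterations are replaced with quasi-reflected, newly generated solutions."""
    for i in range(len(X)):
        if trials[i] >= limit:
            X[i] = quasi_reflect(rng.uniform(lb, ub, size=X.shape[1]), lb, ub)
            trials[i] = 0
    return X, trials

# QRL can also seed the initial population: pair each random agent with its
# quasi-reflected counterpart and keep the better of the two (fitness check omitted).
X_init = rng.uniform(-5.0, 5.0, size=(5, 10))
X_init_qr = np.array([quasi_reflect(x, -5.0, 5.0) for x in X_init])
```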

Experimental setup

To establish the quality of the introduced approach, data from a publicly available clinical study33 is utilized, accessible at https://physionet.org/content/gaitpdb/1.0.0/. The data is sourced from a collection of shoe-mounted accelerometers and was specifically chosen because it originates from a clinically significant study conducted by experts in the field. Moreover, the dataset is publicly available and well organized. One challenge associated with this dataset is that it is provided in plain-text format.

The preprocessing phase involves converting the data into a suitable data frame, ensuring proper formatting, and applying labels to each patient’s sample. Patient details, including their status, are provided in a separate text file, and labels are assigned to each utilized sample based on this information. The dataset contains no missing values, and all values are normalized, making them appropriate inputs for a model. The original data is structured as a time series, and information from various patients is amalgamated to construct a balanced and unified dataset for time-series classification using the TensorFlow time-series generator. The number of lags is set to 15, and a batch size of 1 is employed in the process.
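A minimal sketch of this preprocessing step is shown below; the file name is a hypothetical example of one record from the dataset, and the labeling is reduced to a single constant for brevity.

```python
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

# Hypothetical record name; the PhysioNet recordings are whitespace-separated text files.
signals = pd.read_csv("GaCo01_01.txt", sep=r"\s+", header=None).values  # one subject's recording
labels = np.ones(len(signals))   # 1 = PD, 0 = control, assigned from the demographics file

# 15 lagged time steps form one input window; a batch size of 1 is used as in the experiments.
generator = TimeseriesGenerator(signals, labels, length=15, batch_size=1)
x0, y0 = generator[0]
print(x0.shape)   # (1, 15, n_features)
```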

Network architecture parameters, including the number of layers and neurons per layer, are optimized for an LSTM attention model (LSTM-ATT). The constraints for these two parameters are [1, 3] layers and [5, 15] neurons per layer. Additionally, training parameters are selected: the number of training epochs, dropout, and learning rate are optimized in the ranges [30, 60], [0.05, 0.2], and [0.0001, 0.01], respectively. Early stopping is also utilized to prevent overtraining, with the patience threshold set to 1/3 of the selected number of training epochs. The respective ranges are presented in Table 1.

Table 1 Hyperparameters and their respective ranges.
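For illustration, the sketch below encodes the search space of Table 1 and builds one candidate LSTM-ATT model from a sampled configuration. The Keras self-attention layer, the parameter names, and the input dimensions are simplifying assumptions that stand in for the Luong mechanism described earlier rather than reproducing the exact study architecture.

```python
import tensorflow as tf

# Search space from Table 1 (names assumed; ranges as given in the text).
SEARCH_SPACE = {
    "layers": (1, 3),            # number of LSTM layers
    "neurons": (5, 15),          # neurons per layer
    "epochs": (30, 60),          # training epochs
    "dropout": (0.05, 0.2),
    "learning_rate": (1e-4, 1e-2),
}

def build_lstm_att(n_layers, n_neurons, dropout, learning_rate, n_lags=15, n_features=2):
    """Minimal LSTM model with a self-attention layer over the LSTM outputs;
    a sketch, not the exact architecture used in the study."""
    inputs = tf.keras.Input(shape=(n_lags, n_features))
    x = inputs
    for _ in range(n_layers):
        x = tf.keras.layers.LSTM(n_neurons, return_sequences=True, dropout=dropout)(x)
    att = tf.keras.layers.Attention()([x, x])                     # attention over LSTM outputs
    x = tf.keras.layers.GlobalAveragePooling1D()(att)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # PD vs. control
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One candidate configuration drawn from the search space.
model = build_lstm_att(n_layers=2, n_neurons=10, dropout=0.1, learning_rate=1e-3)
```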

Several metaheuristics are included in a comparative analysis of LSTM-ATT hyperparameter tuning. The introduced MCOA algorithm is tested alongside the original COA4. Several well-established algorithms are included in the comparison as well, namely the GA34, PSO35, FA36, GWO37, BSO38, and COLSHADE39 algorithms. All metaheuristics are tested under identical conditions with a population size of five agents and six allocated iterations for optimization. All metaheuristics are implemented specifically for this study, with control parameter values set to those suggested in the original works. Finally, experiments are repeated 30 times to ensure a valid comparison that accounts for some of the inherent randomness of these algorithms.

To facilitate a comparison of the optimization potential of the assessed algorithms, standard testing metrics including accuracy, precision, recall, and F1-score are utilized. To support the optimization process, the error rate is used as the objective function, determined as:

$$\begin{aligned} Error\_rate = 1 - Accuracy \end{aligned}$$
(14)

An additional metric, Cohen’s kappa, is included, as it may provide a better assessment on datasets with an inherent class imbalance. This metric is used as the indicator function during the optimization, and outcomes are logged throughout the entire process for each evaluated algorithm. Cohen’s kappa is calculated according to:

$$\begin{aligned} \kappa = \frac{v_o - v_e}{1 - v_e} \end{aligned}$$
(15)

where \(v_o\) denotes the observed agreement and \(v_e\) the agreement expected by chance.
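Both quantities are straightforward to compute with standard tooling; a minimal sketch using scikit-learn is shown below, with toy labels used only to demonstrate the calls.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate(y_true, y_pred):
    """Objective (error rate, Eq. 14) and indicator (Cohen's kappa, Eq. 15) metrics."""
    accuracy = accuracy_score(y_true, y_pred)
    error_rate = 1.0 - accuracy                  # minimized by the metaheuristics
    kappa = cohen_kappa_score(y_true, y_pred)    # more robust to class imbalance
    return error_rate, kappa

print(evaluate([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))   # error rate 0.2, kappa ~ 0.62
```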

A flowchart of the proposed process is provided in Fig. 1.

Figure 1. Flowchart of the proposed model evaluation process.

Simulation outcomes

Objective function outcomes during simulations, in terms of best, worst, mean, and median values, are provided in Table 2, and indicator function outcomes in Table 3.

Table 2 Overall objective function simulation outcomes.
Table 3 Overall indicator function simulation outcomes.

As can be observed in Tables 2 and 3, models optimized by the introduced MCOA attained the best outcomes in terms of both the objective and indicator functions in all test cases. Furthermore, admirable stability has been demonstrated across all cases. Algorithm stability is further showcased in the distribution plots for the objective and indicator functions shown in Fig. 2.

Figure 2. Outcome distributions for the objective and indicator function outcomes.

As shown in Fig. 2, the introduced modified metaheuristic demonstrates reliable outcomes ahead of competing algorithms. The introduced algorithm outperformed the original version of the algorithm as well as the others included in the comparative analysis. Changes in convergence rate for the observed algorithms can be seen in the convergence graphs, in terms of objective and indicator functions, in Fig. 3, and in the average objective and indicator convergence graphs shown in Fig. 4.

Figure 3. Algorithm convergence in terms of objective and indicator function outcomes.

An improvement in convergence rate can be observed for the introduced algorithm. The original COA showcases a slow convergence after stagnating at a local minimum. However, the modification introduced in this work helps the agents locate a better solution within the solution space. A detailed comparison between the best-performing models is showcased in Table 4.

Figure 4. Average algorithm convergence in terms of objective and indicator function outcomes.

Table 4 Detailed metric comparison between the best performing models.

As shown in Table 4, the introduced algorithm demonstrates the highest accuracy and a high F1-score for both PD and control group identification. However, admirable results are shown by the PSO and BSO algorithms for the PD and control groups when observing precision alone. These outcomes are to be expected: as per the NFL theorem, no single approach will work equally well across all metrics and test cases. Further details of the best-performing model are shown in Fig. 5.

Figure 5. Best performing model PR plot and confusion matrix.

Finally, to facilitate experimental repeatability, the hyperparameter choices made by optimizers for the best-performing models are presented in Table 5.

Table 5 Hyperparameter choices made for best-performing models constructed by optimizers.

Outcome statistical validation

Within the realm of optimization problems, the assessment of models emerges as a crucial focal point. Understanding the statistical significance of implemented enhancements becomes imperative, as a reliance solely on outcomes falls short of establishing the superiority of one algorithm over another.

According to prior investigations40, a judicious statistical assessment should transpire only subsequent to the thorough sampling of the evaluated methods. This involves the establishment of objective averages across numerous independent runs, with an additional prerequisite that the samples adhere to a normal distribution to preclude erroneous conclusions. Whether objective function averages should be utilized when comparing stochastic methods remains an open question among researchers41.

In order to establish the statistical significance of the observed results, the optimal values from 30 independent executions of each metaheuristic were employed to construct the samples. However, the judicious application of parametric tests necessitated verification. To this end, compliance with the recommendations of42 was ensured, encompassing considerations of independence, normality, and homoscedasticity of data variances.

The independence criterion is met by virtue of initializing each run with a pseudo-random number seed. Nevertheless, the normality condition remains unmet, as evidenced by the KDE plots shown in Fig. 6 and substantiated by Shapiro-Wilk test outcomes for the single-problem instance analysis43. By performing the Shapiro-Wilk test, p-values are generated for each method-problem combination, and these outcomes are presented in Table 6.

Figure 6. Objective function KDE plot.

Table 6 Shapiro-Wilk scores for the single-problem analysis for testing normality condition.

At the conventional significance levels of \(\alpha = 0.05\) and \(\alpha = 0.1\), the obtained p-values indicate rejection of the null hypothesis (\(H_0\)). This implies that none of the samples, spanning the diverse problem-method combinations, adhere to a normal distribution. These findings signal a failure to meet the normality assumption, a prerequisite for the robust application of parametric tests. Consequently, the verification of homogeneity of variances was considered unnecessary.

Given the unmet prerequisites for the reliable use of parametric tests, non-parametric tests were employed for subsequent statistical analyses. Specifically, the Wilcoxon signed-rank test, acknowledged as a non-parametric statistical method44, was conducted on the MCOA method and all alternative techniques in the conducted experiment. The same data samples utilized in the preceding normality test (Shapiro-Wilk) were applied for each method. The outcomes of this analysis are detailed in Table 7.

Table 7 Wilcoxon signed-rank test findings.
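For reference, the sketch below shows how such a per-sample normality check and the pairwise Wilcoxon signed-rank comparison can be computed with SciPy; the run samples are synthetic placeholders, not the values reported in Tables 6 and 7.

```python
import numpy as np
from scipy.stats import shapiro, wilcoxon

rng = np.random.default_rng(7)
# Hypothetical stand-ins for the best objective values of 30 independent runs per method.
mcoa_runs = rng.normal(loc=0.125, scale=0.004, size=30)
coa_runs = rng.normal(loc=0.131, scale=0.005, size=30)

# Shapiro-Wilk test per method-problem sample (normality check, as in Table 6).
_, p_mcoa = shapiro(mcoa_runs)
_, p_coa = shapiro(coa_runs)
print("Shapiro-Wilk p-values:", p_mcoa, p_coa)

# Paired Wilcoxon signed-rank test between MCOA and a competitor (as in Table 7);
# p < 0.05 indicates a statistically significant difference between the two samples.
_, p_wilcoxon = wilcoxon(mcoa_runs, coa_runs)
print("Wilcoxon p-value:", p_wilcoxon)
```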

Table 7, which presents the p-values obtained from the Wilcoxon signed-rank test, demonstrates that, when tackling LSTM-ATT optimization, the proposed MCOA method achieved significantly better performance than all other techniques in the conducted experiments.

The p-values for all other methods were lower than 0.05. Therefore, the MCOA technique exhibited both robustness and effectiveness as an optimizer in these computationally intensive simulations. Based on the statistical analysis, it can be concluded that the MCOA method outperformed the other metaheuristics investigated in the conducted experiments.

Conclusion

This work tackles PD detection from patient gait data collected by a shoe-mounted accelerometer sensor as a noninvasive means of early diagnosis. Timely treatment is crucial for battling this neurodegenerative disease, as there is currently no way of undoing the damage caused by the condition. This task is tackled through the application of AI algorithms. Attention-based LSTM models are trained on real-world data and assessed on their ability to detect signs of the condition. Furthermore, an altered variation of a relatively novel algorithm is proposed and applied to hyperparameter tuning to improve model performance. The introduced approach has shown admirable outcomes, with the best-constructed models exceeding 87% accuracy. Meticulous statistical validation confirmed these observations and established that the introduced MCOA outperformed the original algorithm, as well as competing optimizers, in a statistically significant way when applied to hyperparameter optimization of LSTM-ATT networks.

Like any research, this study is not without its limitations. The inclusion of optimization algorithms in the comparative analysis has been restricted due to computational constraints. Similarly, the optimization process is constrained by the use of limited model population sizes. The potential for improved outcomes exists with the allocation of additional resources. Moreover, the current testing is based on the limited available data samples from dual-task walking tests with accelerometers, as only a restricted amount of data is presently accessible for Parkinson’s disease diagnosis.

Future research aims to refine early detection methods and explore other contemporary recurrent networks for addressing the task at hand. The introduced optimization algorithm will also be investigated for potential applications in computer security and hyperparameter optimization.