Tuning attention based long-short term memory neural networks for Parkinson’s disease detection using modified metaheuristics

Parkinson’s disease (PD) is a progressively debilitating neurodegenerative disorder that primarily affects the dopaminergic system in the basal ganglia, impacting millions of individuals globally. The clinical manifestations of the disease include resting tremors, muscle rigidity, bradykinesia, and postural instability. Diagnosis relies mainly on clinical evaluation; reliable diagnostic tests are lacking, making the process inherently imprecise and subjective. Early detection of PD is crucial for initiating treatments that, while unable to cure the chronic condition, can enhance patients' quality of life and alleviate symptoms. This study explores the potential of utilizing long-short term memory neural networks (LSTM) with attention mechanisms to detect Parkinson’s disease based on dual-task walking test data. Given that the performance of networks is significantly influenced by architecture and training parameter choices, a modified version of the recently introduced crayfish optimization algorithm (COA) is proposed, specifically tailored to the requirements of this investigation. The proposed optimizer is assessed on a publicly accessible real-world clinical dataset of gait in Parkinson’s disease, and the results demonstrate its promise, achieving an accuracy of 87.4187% for the best-constructed models.


Background and related works
The integration of AI into the realm of medical diagnostics has garnered substantial scholarly interest and is effecting a profound transformation within the healthcare sector. AI presents a promising means to enhance the accuracy of medical diagnoses, reduce healthcare costs, and improve patient outcomes. AI has been widely applied in radiology to assist in the diagnosis of diseases from X-rays, CT scans, and MRIs. Notable applications include the early detection of lung cancer from CT scans using a 3-dimensional deep learning algorithm 5 and the identification of diabetic retinopathy using networks trained on a dataset of retinal fundus photographs 6. AI-driven pathology, particularly digital pathology, has advanced the accuracy of cancer diagnosis and tumor classification. Deep learning models have been employed to aid pathologists in identifying and grading cancers 7. In cardiology, AI has shown potential in analyzing electrocardiograms (ECGs) for arrhythmia detection 8 and echocardiograms for cardiac disease assessment. Preceding works have demonstrated impressive results for arrhythmia detection, exceeding 98% accuracy, by integrating optimization techniques to tackle large search spaces for parameter optimization 9.
Machine learning and deep learning methodologies have unequivocally exhibited their efficacy in the field of neurodiagnostics. These sophisticated algorithms are adept at parsing intricate neurophysiological data, encompassing medical imagery, electrophysiological measurements, and behavioral evaluations, thereby culminating in heightened precision and expedience in the diagnostic process. AI-enabled systems are poised to contribute significantly to the timely identification and categorization of neurological disorders, including but not limited to Alzheimer's disease, multiple sclerosis, and intracranial neoplasms 10,11. AI has engendered notable enhancements in the scrutiny of electrophysiological data, encompassing electroencephalography (EEG) and magnetoencephalography (MEG) signals, with the express purpose of diagnosing and overseeing conditions such as epilepsy, sleep disorders, and various other neurological maladies. Deep learning algorithms remain essential for the identification of aberrations, the precise localization of epileptic foci, and the prognostication of seizure occurrences. Khan et al. conducted an evaluation comparing two distinct deep learning methodologies 12.
The utilization of the finger-tapping test as a diagnostic modality for Parkinson's disease has garnered attention within the realm of clinical investigation. This test serves as an evaluative measure of the motor function and dexterity of the fingers, presenting itself as a prospective instrument for the early detection and continuous monitoring of Parkinson's disease. Akram et al. 13 developed a new Distal Finger Tapping (DFT) test to assess distal upper-limb function in PD patients, focusing on kinetic parameters like kinesia score (KS20), akinesia time (AT20), and incoordination score (IS20). The DFT test effectively discriminated between PD patients and controls, with KS20 exhibiting the highest sensitivity (79%) and an area under the receiver operating characteristic curve (AUC) of 0.90. In research undertaken by Williams et al. 14, a new computer vision technology, DeepLabCut, was used to track and measure finger tapping in smartphone videos to objectively assess bradykinesia in Parkinson's disease. The computer measures, including tapping speed, amplitude, and rhythm, correlated well with clinical ratings from movement disorder neurologists, demonstrating its accuracy (Spearman coefficients ranged from −0.50 to −0.74, p < .001). DeepLabCut offers a 'contactless' and easily accessible method for quantifying Parkinson's bradykinesia during clinical examinations, with potential applications in other neurological disorders characterized by altered movements.
Preceding works have tackled PD diagnosis using MRI image analysis, reporting outcomes ranging from 78% 15 to 88% 16. However, the cost of MRI is significantly higher compared to shoe-mounted sensing systems. One major advantage of the proposed approach is the significantly lower diagnosis cost as well as the greater availability of diagnostic tools. Researchers have also considered handwriting analysis for diagnosis. The paper 17 tested several classifiers, with the best accuracy demonstrated by Naive Bayes models, reaching 88.63%. Researchers have further considered the use of generative adversarial networks to tackle issues associated with data availability for gait freezing in PD patients 18. Models trained on the augmented data attained a reported accuracy exceeding 90%; however, the use of parameter optimization techniques was not considered in that work. There is an evident research gap in time-series-based PD detection, as well as in the application of parameter tuning via metaheuristic algorithms in the field of PD diagnosis. This work seeks to address the observed gap by proposing a low-cost, AI-powered approach.

Attention based LSTM
The LSTM 19 represents a variant of RNNs. These networks retain prior information and incorporate it into their processing of current input data. However, a limitation of traditional RNNs is their inability to effectively capture long-term dependencies, mainly because of the vanishing gradient issue. LSTMs, on the other hand, are purposefully engineered to avoid these challenges associated with long-term dependencies.
The cell state is a crucial component of the LSTM network, designed to capture and carry information over long-term dependencies. The hidden state is computed at each time step based on the cell state and the input at that time step. It serves as the output of the LSTM at each step and contains information that the network has learned to be significant for making predictions. The third main element of LSTMs is the gates: three different gates control the information flow, namely the forget gate, the input gate, and the output gate. These gates play an important role in LSTMs by selectively modifying and utilizing information from the cell state, managing the flow of data within the network. This capability empowers LSTMs to grasp and apply both short-term and long-term dependencies in sequential data.
The forget gate decides which information from the prior cell state should be forgotten. The input gate is responsible for deciding which new information should be incorporated into the cell state. The output gate regulates which information should be extracted from the cell state and utilized in generating the hidden state and output of the LSTM. The LSTM defines the input gate, forget gate, cell state, output gate, and hidden state through the following mathematical formulations:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$$

where $i_t$ refers to the input gate activation at time $t$, $x_t$ is the input at time $t$, $h_{t-1}$ and $c_{t-1}$ denote the hidden state and the cell state at time $t-1$, $W_{xi}$, $W_{hi}$, $W_{ci}$, and $b_i$ are the weight matrices and bias vector for the input gate, and $\sigma$ denotes the Sigmoid activation function.

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$$

where $f_t$ denotes the forget gate activation at time $t$.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$

where $c_t$ denotes the cell state at time $t$ and $\tanh$ refers to the hyperbolic tangent activation function defined as:

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$$

where $o_t$ denotes the output gate activation at time $t$.

$$h_t = o_t \odot \tanh(c_t)$$

where $h_t$ denotes the hidden state at time $t$.
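As a concrete illustration, the gate equations above can be sketched as a single step of a peephole LSTM cell in NumPy. This is a minimal sketch for exposition, not the TensorFlow implementation used in the study; the dictionary keys and random weight initialization are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One peephole-LSTM step using the weight matrices named in the text.
    Peephole weights (W['ci'], W['cf'], W['co']) act elementwise on the cell state."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] * c_prev + b["i"])
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] * c_prev + b["f"])
    c_t = f_t * c_prev + i_t * np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] * c_t + b["o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Illustrative dimensions and random weights (not from the paper)
rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = {k: rng.normal(size=(n_h, n_in)) for k in ("xi", "xf", "xc", "xo")}
W.update({k: rng.normal(size=(n_h, n_h)) for k in ("hi", "hf", "hc", "ho")})
W.update({k: rng.normal(size=n_h) for k in ("ci", "cf", "co")})
b = {k: np.zeros(n_h) for k in "ifco"}
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, b)
```

Because $h_t = o_t \odot \tanh(c_t)$ with $o_t \in (0,1)$, every component of the hidden state is bounded in $(-1, 1)$.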
The attention phenomenon lacks a precise mathematical definition, and its incorporation into the Luong attention-based model should be viewed as a mechanism. Networks capable of operating with this attention mechanism and possessing LSTM characteristics are considered attention-based. The primary goal of such a mechanism is to assign varying weights to the input sequence, allowing the network to capture relevant data and exploit input-output relationships. The fundamental realization of this architecture involves implementing a second network.
In pursuit of this objective, the authors opted for the Luong attention-based model. A weight $w_t(s)$ is computed for each source timestep $s$ during the decoding process of the attention-based encoder-decoder, subject to the constraints $\sum_s w_t(s) = 1$ and $w_t(s) \geq 0\ \forall s$. The hidden state $h_t$ serves as a function representing the predicted token for the corresponding timestep, given by the context vector $\sum_s w_t(s)\,\hat{h}_s$.
Various formulations of the attention mechanism exhibit differences in how they calculate the weights. In the Luong model, the computation involves applying the softmax function to the scaled scores of each token, where the matrix $W_a$ linearly transforms the product of the decoder's $h_t$ and the encoder's $\hat{h}_s$ to obtain the score:

$$score(h_t, \hat{h}_s) = h_t^\top W_a \hat{h}_s, \qquad w_t(s) = \frac{\exp(score(h_t, \hat{h}_s))}{\sum_{s'} \exp(score(h_t, \hat{h}_{s'}))} \tag{1}$$
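The Luong "general" scoring and weighting described above can be sketched in NumPy as follows; this is an illustrative fragment, not the study's implementation, and the dimensions are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def luong_general_attention(h_t, H_s, W_a):
    """Luong 'general' attention: score(h_t, hs) = h_t^T W_a hs for each
    encoder state hs (rows of H_s), weights via softmax, then a context
    vector as the weighted average of encoder states."""
    scores = H_s @ (W_a @ h_t)   # one score per source timestep
    w = softmax(scores)          # non-negative, sums to 1
    context = w @ H_s            # weighted combination of encoder states
    return w, context

# Illustrative usage with random states
rng = np.random.default_rng(1)
h_t = rng.normal(size=4)          # decoder hidden state
H_s = rng.normal(size=(6, 4))     # six encoder hidden states
W_a = rng.normal(size=(4, 4))
w, context = luong_general_attention(h_t, H_s, W_a)
```

The softmax guarantees the two constraints stated above: the weights are non-negative and sum to one across source timesteps.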
In the domain of metaheuristics, hyperparameter optimization plays a crucial role when tuning the operations of specific algorithms. Hyperparameter optimization is the process of selecting the right configuration of hyperparameters in a specific method for a given optimization problem. The choice of hyperparameters significantly influences the algorithm's convergence, robustness, and overall efficacy. It is important to note that hyperparameter optimization is itself an NP-hard problem, and metaheuristics have been shown to be successful at tackling NP-hard optimization problems.
The NP-hardness of hyperparameter optimization arises from the large search space of possible configurations and the computational effort required to identify the optimal set of hyperparameters. In an NP-hard problem, the time required to find an optimal solution grows exponentially with the problem size, making it impractical to perform an exhaustive search. Therefore, finding the best set of hyperparameters efficiently is a formidable challenge. To tackle the NP-hard nature of hyperparameter optimization, metaheuristics offer an efficient and effective approach. Metaheuristics are a class of optimization algorithms designed to handle complex, large-scale problems, often characterized by non-linearity and high dimensionality.
It is important to highlight that no one-size-fits-all solution exists when it comes to optimization problems. This assertion is underpinned by the No Free Lunch (NFL) theorem 30, which stipulates that no universally optimal approach functions equally well for all existing problems. Consequently, the diverse field of metaheuristics has emerged, each algorithm with its own set of advantages and disadvantages. Careful selection is essential when determining an appropriate metaheuristic for a given problem domain, considering the problem's characteristics and the algorithm's strengths and weaknesses.

Proposed method
This section presents the base Crayfish Optimization Algorithm (COA) 4 , as well as the inspiration behind the preparation of an altered version used for the purposes of our research.Subsequently, details and pseudocode of the modified algorithm are provided.

Original crayfish optimization algorithm
The COA 4 is a novel optimization metaheuristic that emulates the foraging, avoidance, and social behavior patterns observed in crayfish populations. This algorithm leverages principles from the biological realm to tackle optimization problems in various fields using three distinct operating phases. These phases are designed to establish an equilibrium between exploration and exploitation. In the initial "summer resort" stage, COA focuses on exploring potential solutions. Subsequently, the "competition" and "foraging" stages simulate the exploitation phase. Transitions between these stages are influenced by temperature control. Elevated temperatures prompt crayfish to seek shelter or engage in competition for shelter, while optimal temperatures dictate foraging strategies based on food size. Temperature regulation enriches COA's level of randomness and bolsters its global optimization capabilities.
The following equations describe the functioning of the COA. The population $P$ consists of $N$ agents in a $k$-dimensional search space, where $X_{i,j}$ is the position of agent $i$ in dimension $j$. Agents are randomly dispersed across the search space according to:

$$X_{i,j} = ll_j + rnd \times (ul_j - ll_j)$$

in which $ll$ represents the lower limit, $ul$ the upper limit, and $rnd$ is used to introduce randomness as a uniform random value in $[0, 1]$. A major influence on agent behavior is the simulated temperature, defined as:

$$temp = rnd \times 15 + 20$$
Once the temperature exceeds 30, agents choose to locate a cooler region to vacation in and resume foraging at a more appropriate temperature. Agent food intake can be approximated as normally distributed and is determined in accordance with:

$$p = C_1 \times \frac{1}{\sqrt{2\pi}\,\sigma} \times \exp\left( -\frac{(temp - \mu)^2}{2\sigma^2} \right)$$

where $\mu$ denotes the optimal agent temperature, and $\sigma$ and $C_1$ define control parameters of the algorithm. Crayfish also fight for cave space; this is simulated as a random event with a 0.5 probability of occurring once $temp$ exceeds 30. During the foraging phase, COA progresses towards the most effective solution, bolstering the algorithm's ability to exploit resources and ensuring robust convergence capabilities.
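The temperature-driven phase selection described above can be sketched as follows. This is a simplified illustration; the control values for $\mu$, $\sigma$, and $C_1$ are placeholders, not the tuned values from the original COA paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulated_temperature():
    # Temperature drawn uniformly from roughly [20, 35] degrees each iteration
    return rng.uniform() * 15 + 20

def food_intake(temp, mu=25.0, sigma=3.0, c1=0.2):
    # Gaussian intake model centered on the optimal temperature mu
    # (mu, sigma, c1 are illustrative control-parameter values)
    return c1 * (1.0 / (np.sqrt(2.0 * np.pi) * sigma)) * np.exp(
        -((temp - mu) ** 2) / (2.0 * sigma ** 2))

def select_phase(temp):
    # Above 30 degrees: shelter ("summer resort") or compete for caves
    # with equal probability; otherwise crayfish forage toward the best solution
    if temp > 30.0:
        return "summer_resort" if rng.uniform() < 0.5 else "competition"
    return "foraging"
```

Intake peaks at the optimal temperature $\mu$ and decays symmetrically away from it, which is what drives the food-size-dependent foraging strategy.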

Modified crayfish optimization algorithm
While the original COA algorithm demonstrates decent performance, it is a relatively novel algorithm with substantial room for growth. Testing conducted using CEC standard evaluation methods suggests that a lack of exploration can be associated with this algorithm. The modified version attempts to tackle this deficiency by introducing two new mechanisms.
The first introduced mechanism comes from the ABC 31 algorithm. Depleted solutions that do not show improvement are rejected and replaced by newly generated solutions. Given the limited number of iterations conducted in this experiment, solutions are rejected after two iterations without improvement. This approach has been shown to boost exploration. The second mechanism introduced is quasi-reflective learning (QRL) 32. This technique is utilized to generate new solutions, further boosting exploration; additionally, it is utilized for the initial generation of potential solutions in the initialization stage of the algorithm. The quasi-reflected component $z_j$ of a given solution $X$ is determined as:

$$z_j = rnd\left( \frac{lb_j + ub_j}{2},\; X_j \right)$$

where $lb$ and $ub$ denote the lower and upper bounds of the search space and $rnd$ denotes a random value drawn uniformly within the given interval. The introduced algorithm is named the modified COA (MCOA). The pseudocode for the described optimizer is presented in Algorithm 1.
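The QRL generation rule above can be sketched in NumPy as follows (an illustrative fragment; the bounds and random generator are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(7)

def quasi_reflect(x, lb, ub):
    """Quasi-reflected solution: each component is drawn uniformly between
    the search-space midpoint (lb + ub)/2 and the original component x_j."""
    mid = (lb + ub) / 2.0
    lo = np.minimum(mid, x)   # interval endpoints in either order
    hi = np.maximum(mid, x)
    return rng.uniform(lo, hi)

# Illustrative usage on a unit hypercube
lb = np.zeros(4)
ub = np.ones(4)
x = np.array([0.1, 0.9, 0.5, 0.2])
z = quasi_reflect(x, lb, ub)
```

Each quasi-reflected component stays between the midpoint and the original component, so the new solution always remains inside the search space while jumping toward its opposite half.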

Set initial parameter values
Initialize population using the QRL mechanism
while T > t do
    Determine simulated temperature temp
    Utilize the appropriate COA phase to update agent locations depending on temp
    Determine agent fitness using the objective function
    for agent p in Population do
        if p did not improve for 2 iterations then
            Generate a new solution using the QRL mechanism and replace p
        end if
    end for
end while
return optimal solution from the population

Algorithm 1. Pseudocode for the described MCOA algorithm.

Experimental setup
To establish the quality of the introduced approach, data from a publicly available clinical study 33 is utilized, which can be found at the following link: https://physionet.org/content/gaitpdb/1.0.0/. The data is sourced from a collection of shoe-mounted accelerometers and was specifically chosen because it represents a clinically significant study conducted by experts in the field. Moreover, the dataset is publicly available and well-organized. One challenge associated with this dataset is its presentation in text format.
The preprocessing phase involves converting it into a suitable data frame, ensuring proper formatting, and applying labels to each patient's sample. Patient details, including their status, are provided in a separate text file, and labels are assigned to each utilized sample based on this information. The dataset contains no missing values and all values are normalized, making them appropriate as inputs for a model. The original data is structured as a time series, and information from various patients is amalgamated to construct a balanced and unified dataset for time-series classification using the TensorFlow time series generator. The number of lags is set to 15, and a batch size of 1 is employed in the process.
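The windowing step can be illustrated with a plain NumPy equivalent of the Keras time series generator configuration described above (window length 15). This is a sketch of the transformation only, not the study's exact preprocessing pipeline, and the sample dimensions are invented for the example.

```python
import numpy as np

def make_windows(series, labels, lags=15):
    """Mimics the time-series generator with length=15: each sample consists
    of the previous `lags` observations, and the target is the label at the
    current step (here, the patient's PD/control status)."""
    X, y = [], []
    for t in range(lags, len(series)):
        X.append(series[t - lags:t])
        y.append(labels[t])
    return np.asarray(X), np.asarray(y)

# Illustrative usage: 100 timesteps of 2 sensor channels
series = np.zeros((100, 2))
labels = np.arange(100)
X, y = make_windows(series, labels, lags=15)
```

With 100 timesteps and 15 lags, the first 15 steps cannot form a full window, leaving 85 samples of shape (15, 2).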
Network architecture parameters, including the number of layers and neurons per layer, are optimized for an LSTM attention model (LSTM-ATT). Constraints for these two parameters are as follows: [1, 3] layers and [5, 15] neurons per layer. Additionally, training parameters are selected: the number of training epochs, dropout, and learning rate are optimized in the ranges [30, 60], [0.05, 0.2], and [0.0001, 0.01], respectively. Early stopping is also utilized to prevent overtraining, with the threshold set to 1/3 of the selected number of training epochs. Respective ranges are presented in Table 1.
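The search space above can be encoded as follows. This is a hypothetical encoding for illustration; the key names and sampling scheme are not taken from the paper, only the ranges and the early-stopping rule are.

```python
import numpy as np

def random_solution(rng):
    """Sample one candidate hyperparameter set within the stated ranges.
    Integer parameters use inclusive bounds; rng.integers' upper bound is
    exclusive, hence the +1."""
    s = {
        "layers": int(rng.integers(1, 4)),           # [1, 3]
        "neurons": int(rng.integers(5, 16)),         # [5, 15] per layer
        "epochs": int(rng.integers(30, 61)),         # [30, 60]
        "dropout": float(rng.uniform(0.05, 0.2)),    # [0.05, 0.2]
        "learning_rate": float(rng.uniform(1e-4, 1e-2)),  # [0.0001, 0.01]
    }
    # Early-stopping threshold: 1/3 of the selected number of training epochs
    s["patience"] = s["epochs"] // 3
    return s

sol = random_solution(np.random.default_rng(0))
```

Each metaheuristic agent would carry one such vector, and the optimizer's update rules move it within these bounds.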
Several metaheuristics are included in a comparative analysis of LSTM-ATT hyperparameter tuning. The introduced MCOA algorithm is tested alongside the original COA 4. Several well-established algorithms are included in the comparison as well, such as the GA 34, PSO 35, FA 36, GWO 37, BSO 38 and COLSHADE 39 algorithms. All metaheuristics are evaluated under identical testing conditions with a population size of five agents and six allocated iterations for optimization. All metaheuristics are implemented specifically for this study, with control parameter values set to those suggested in the original works. Finally, experiments are repeated 30 times to ensure a valid comparison that accounts for the inherent randomness of these algorithms.
To facilitate a comparison between the optimization potential of the assessed algorithms, standard testing metrics including accuracy, precision, recall, and F1-score are utilized. To support the optimization process, the error rate is used as the objective function, determined as:

$$error\ rate = 1 - accuracy$$

An additional metric, Cohen's kappa, is included as it may provide a better assessment of datasets that have an inherent imbalance. These metrics are used as the indicator function during the optimization, and outcomes are logged through the entire process for each evaluated algorithm. Cohen's kappa is calculated according to:

$$\kappa = \frac{v_o - v_e}{1 - v_e}$$

where $v_o$ denotes the observed and $v_e$ the expected agreement values.
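The objective function and Cohen's kappa described above can be computed as in the following sketch (plain Python, assuming binary labels with 1 denoting PD; this is an illustrative implementation, not the study's code):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, error rate (the minimized objective), and Cohen's kappa
    for binary labels (1 = PD, 0 = control)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    n = len(y_true)
    accuracy = (tp + tn) / n
    error_rate = 1.0 - accuracy              # objective being minimized
    v_o = accuracy                           # observed agreement
    # Expected agreement by chance, from the marginal label frequencies
    v_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (v_o - v_e) / (1.0 - v_e)
    return {"accuracy": accuracy, "error_rate": error_rate, "kappa": kappa}
```

Unlike raw accuracy, kappa discounts the agreement expected by chance, which is why it is better suited to imbalanced datasets.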
A flowchart of the proposed process is provided in Fig. 1.

Simulation outcomes
Objective function outcomes during simulations, in terms of best, worst, mean, and median values, are provided in Table 2, with indicator function outcomes in Table 3. As can be observed in Tables 2 and 3, models optimized by the introduced MCOA attained the best outcomes in terms of both objective and indicator functions in all test cases. Furthermore, admirable stability has been demonstrated across all cases. Algorithm stability is further showcased in the distribution plots for the objective and indicator functions shown in Fig. 2. As shown in Fig. 2, the introduced modified metaheuristic demonstrates reliable outcomes ahead of competing algorithms. The introduced algorithm outperformed the original version as well as the others included in the comparative analysis. Convergence rate changes can be seen in the convergence graphs in terms of objective and indicator functions in Fig. 3, with average objective and indicator convergence graphs shown in Fig. 4. An improvement in convergence rate can be observed for the introduced algorithm. The original COA showcases slow convergence after stagnating at a local minimum; however, the modification introduced in this work helps the agents locate a better solution within the solution space. A detailed comparison between the best-performing models is showcased in Table 4.
As shown in Table 4, the introduced algorithm demonstrates the highest accuracy and a high F1-score for both PD and control group identification. However, admirable results are shown by the PSO and BSO algorithms for the PD and control groups when observing precision alone. These outcomes are to be expected: as per the NFL theorem, no single approach will work equally well across all metrics and test cases. Further details of the best-performing model are shown in Fig. 5.
Finally, to facilitate experimental repeatability, the hyperparameter choices made by the optimizers for the best-performing models are presented in Table 5.

Outcome statistical validation
Within the realm of optimization problems, the assessment of models emerges as a crucial focal point. Understanding the statistical significance of implemented enhancements becomes imperative, as reliance solely on outcomes falls short of establishing the superiority of one algorithm over another.
According to prior investigations 40 , a judicious statistical assessment should transpire only subsequent to the thorough sampling of the evaluated methods.This involves the establishment of objective averages across numerous independent runs, with an additional prerequisite that the samples adhere to a normal distribution to preclude erroneous conclusions.The utilization of objective function averages remains an unresolved inquiry in the comparison of stochastic methods among researchers 41 .
In order to establish the statistical significance of the observed results, the optimal values from 30 independent executions of each metaheuristic were employed to construct the samples.However, the judicious application of parametric tests necessitated verification.To this end, compliance with the recommendations of 42 was ensured, encompassing considerations of independence, normality, and homoscedasticity of data variances.
The independence criterion is met by virtue of initializing each run with a pseudo-random number seed. Nevertheless, the normality condition remains unmet, as evidenced by the KDE plots shown in Fig. 6 and substantiated by Shapiro-Wilk test outcomes for the single-problem instance analysis 43. By performing the Shapiro-Wilk test, p-values are generated for each method-problem combination, and these outcomes are presented in Table 6.
The conventional significance levels of α = 0.05 and α = 0.1 indicate the potential rejection of the null hypothesis (H0). This implies that none of the samples, spanning diverse problem-method combinations, adhere to a normal distribution. These findings signal the failure to meet the normality assumption, a prerequisite for the robust application of parametric tests. Consequently, the verification of homogeneity of variances was considered unnecessary. Given the unmet prerequisites for the reliable use of parametric tests, non-parametric tests were employed for subsequent statistical analyses. Specifically, the Wilcoxon signed-rank test, a well-established non-parametric statistical method 44, was conducted comparing the MCOA method against all alternative techniques in the conducted experiment. The same data samples utilized in the preceding normality test (Shapiro-Wilk) were applied for each method. The outcomes of this analysis are detailed in Table 7.
Table 7, which presents the p-values obtained from the Wilcoxon signed-rank test, demonstrates that when tackling LSTM-ATT optimization the proposed MCOA method achieved significantly better performance than all other techniques in all three experiments.
The p-values for all other methods were lower than 0.05. Therefore, the MCOA technique exhibited both robustness and effectiveness as an optimizer in these computationally intensive simulations.
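The two-stage validation described above (Shapiro-Wilk for normality, followed by the Wilcoxon signed-rank test) can be reproduced with SciPy. The samples below are synthetic stand-ins for the 30-run objective values, not the study's actual results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic objective-function samples over 30 independent runs (illustrative)
mcoa = rng.normal(0.10, 0.01, 30)
coa = rng.normal(0.13, 0.01, 30)

# Shapiro-Wilk per sample: a small p-value rejects the normality hypothesis
sw_p = stats.shapiro(mcoa).pvalue

# Paired, non-parametric comparison between the two optimizers' run outcomes
w_stat, w_p = stats.wilcoxon(mcoa, coa)
significant = w_p < 0.05
```

When normality fails (as in the paper), the Wilcoxon test is the appropriate paired comparison because it ranks the per-run differences rather than assuming a Gaussian distribution for them.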

Conclusion
This work tackles PD detection from patient gait data collected from a shoe-mounted accelerometer sensor as a noninvasive means of early diagnosis. Timely treatment is crucial for battling this neurodegenerative disease, as there is currently no way of undoing the damage caused by the condition. This task is tackled through the application of AI algorithms. Attention-based LSTM models are trained on real-world data and assessed on their ability to detect signs of the condition. Furthermore, an altered variation of a relatively novel algorithm is proposed and applied to hyperparameter tuning to improve model performance. The introduced approach has shown admirable outcomes, with the best-constructed models exceeding 87% accuracy. Meticulous statistical validation confirmed the observations and reinforced that the introduced MCOA outperformed the original algorithm, as well as competing optimizers, in a statistically significant way when applied to hyperparameter optimization of LSTM-ATT networks. Like any research, this study is not without limitations. The number of optimization algorithms included in the comparative analysis has been restricted due to computational constraints. Similarly, the optimization process is constrained by the use of limited population sizes. The potential for improved outcomes exists with the allocation of additional resources. Moreover, the current testing is based on the limited available data samples from dual-task walking tests with accelerometers, as only a restricted amount of data is presently accessible for Parkinson's disease diagnosis.
Future research aims to refine early detection methods and explore other contemporary recurrent networks for addressing the task at hand. The introduced optimization algorithm will also be investigated for potential applications in computer security and hyperparameter optimization.

Figure 3. Algorithm convergence in terms of objective and indicator function outcomes.

Figure 4. Average algorithm convergence in terms of objective and indicator function outcomes.

Table 1. Hyperparameters and their respective ranges.

Figure 1. Flowchart of the proposed model evaluation process.

Table 2. Overall objective function simulation outcomes. The best metrics' values are in [bold].

Table 3. Overall indicator function simulation outcomes. The best metrics' values are in [bold].

Table 4. Detailed metric comparison between the best-performing models. The best metrics' values are in [bold].

Table 5. Hyperparameter choices made for best-performing models constructed by optimizers.

Table 6. Shapiro-Wilk scores for the single-problem analysis for testing the normality condition.