Introduction

Gait disorders are described as any deviation from normal walking or gait1. Their prevalence among adults rises with age: in the elderly population, they are present in approximately 35% of patients over 70 years2,3 and in 72% of patients over 80 years2. These statistics include gait disorders of both neurological and non-neurological origin, which can be distinguished through laboratory work, clinical presentation, and diagnostic testing2. Indeed, gait disorder etiologies include neurological conditions (e.g., sensory or motor impairments), orthopedic abnormalities (e.g., osteoarthritis and skeletal deformities), and medical conditions (e.g., heart failure, respiratory insufficiency, peripheral arterial occlusive disease, obesity)4,5. Cerebral palsy, a group of neurological disorders, affects about 2 in every 1000 newborns, and its prevalence reaches 5–8% among newborns with very low birth weights or very pre-term deliveries. Gait disturbances have a tremendous impact on patients, especially on their quality of life1: they complain most often of pain, joint stiffness, numbness, or weakness6. Neurological gait disorders, in particular, are associated with lower cognitive function, depressed mood, and diminished quality of life7.

To gain insight into patients’ conditions and treat their gait disorders, clinicians historically used Observational Gait Analysis (OGA)8. OGA usually relies on a clinician’s direct observation, freeze-frame techniques, and slow-motion video replay to record and analyze a patient’s gait. It is subject to bias and has limited precision because it depends on the clinician’s experience. To overcome this limitation, Quantitative Gait Analysis (QGA) is considered: it uses instrumentation to quantify the gait cycle by recording temporal-spatial, kinematic, and kinetic data that can rarely be gathered by observation alone. The challenge facing clinicians is then to analyze a large amount of clinical data from QGA in order to determine the severity of the illness and select the most effective therapeutic strategy. This is a difficult task because of the great disparity between patients (e.g., children and adults) and the diversity of their pathologies.

In this context, our aim is to assist clinicians in analyzing this large amount of clinical data with artificial intelligence applied to kinematic data from QGA. The target objective is to go beyond objectively quantifying gait quality by predicting whether it will improve by the next visit. Such predictions are intended to help clinicians select the most effective treatment strategy.

For this purpose, two approaches were considered: signal-based, which uses raw gait cycles, and image-based, which converts gait cycles into image-like representations suitable for training image-based deep neural networks, especially pre-trained ones. In the signal-based approach, a Long Short-Term Memory (LSTM) network and a MultiLayer Perceptron (MLP) were designed from scratch. Their hyper-parameters were tuned with KerasTuner9. The obtained results were compared to five state-of-the-art architectures10: Fully Convolutional Network (FCN), Residual Network (ResNet), Encoder, Time Le-Net (t-LeNet), and Transformer. For both the tailored and state-of-the-art architectures, the influence of data augmentation was studied. In the image-based approach, the first step was to map the time representation of 1D gait cycles to a 2D frequency representation using the two-dimensional Fast Fourier Transform (2D FFT).
Then, the obtained 2D FFT images were processed with four pre-trained deep models: three Convolutional Neural Networks (CNNs), namely VGG16, ResNet34, and EfficientNet_b0, and a Vision Transformer (ViT). The obtained results were compared to those of a tailored CNN with a much smaller number of parameters. The effectiveness of the proposed models was evaluated on a gait dataset collected from more than 700 patients.

Materials and methods

Data acquisition

This study was carried out in accordance with the tenets of the Declaration of Helsinki and with the approval of the Ethics Committee of the Brest University Hospital (CHRU Brest, France). All patients signed an informed consent form. Our work was conducted between 2021 and 2022. Data collected between June 2006 and June 2021 from 734 patients (115 adults and 619 children) who had undergone clinical 3D gait analysis were used. Patient identities were protected in accordance with medical confidentiality requirements. All data were recorded using the same motion analysis system (Vicon MX, Oxford Metrics, UK) and four force platforms (Advanced Mechanical Technology, Inc., Watertown, MA, USA) in the same motion laboratory (CHRU Brest). The data collected by the 15 infrared cameras (sampling rate of 100 or 120 Hz) were synchronized with the ground reaction forces recorded by the force platforms (1000 or 1200 Hz). The 16 markers were placed according to the protocol by Kadaba et al.11. Marker trajectories and ground reaction forces were dual-pass filtered with a low-pass Butterworth filter at a cut-off frequency of 6 Hz. After an initial calibration in the standing position, all patients were asked to walk at a self-selected speed along a 10 m walkway.
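The filtering step can be sketched as follows, assuming scipy; the filter order (here 4) and the example sampling rate are illustrative assumptions, since only the 6 Hz cut-off and the dual-pass design are specified above.

```python
# Minimal sketch of the dual-pass (zero-phase) low-pass filtering step.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_dual_pass(trajectory, fs=100.0, cutoff=6.0, order=4):
    """Filter a 1D marker trajectory sampled at fs Hz."""
    nyquist = fs / 2.0
    b, a = butter(order, cutoff / nyquist, btype="low")
    # filtfilt applies the filter forward and backward (dual-pass),
    # which removes phase lag
    return filtfilt(b, a, trajectory)

# Example: smooth a noisy 10 s marker coordinate recorded at 100 Hz
t = np.linspace(0, 10, 1000)
raw = np.sin(2 * np.pi * 1.0 * t) + 0.1 * np.random.randn(t.size)
smoothed = lowpass_dual_pass(raw, fs=100.0)
```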

Gait kinematics were processed using the Vicon Plug-in Gait model. Kinematics were time-normalized to stride duration, from 0 to 100%, i.e., from initial contact (IC) to the next IC of the ipsilateral foot. Nine gait joint angles (kinematic gait variables) were used: pelvic tilt (anteversion/retroversion), pelvic obliquity, pelvic rotation, flexion/extension of the hip, abduction/adduction of the hip, internal/external rotation of the hip, flexion/extension of the knee, plantar/dorsiflexion of the ankle, and the foot’s angle of progression. As a result, a gait cycle yielded 101 \(\times\) 9 measurements. Let \(E_{p,d}\) denote the gait session of patient p at datetime d. It can be written as follows:

$$\begin{aligned} E_{p,d} = \left\{ {C_{ E_{p,d}}}^{1}, {C_{ E_{p,d}}}^{2}, \ldots , {C_{ E_{p,d}}}^{K} \right\} \end{aligned}$$
(1)

where \({C_{ E_{p,d}}}^{k}\) is the k-th gait cycle of a gait session \(E_{p,d}\) and K the total number of gait cycles. Let \(c_{t,n}^{E_{p,d}^{k}}\) denote the value of gait cycle \({C_{E_{p,d}}}^{k}\) at time step t and joint angle n. To keep notations simple, \(c_{t,n}^{E_{p,d}^{k}}\) is referred to as \(c_{t,n}\) in what follows. \({C_{E_{p,d}}}^{k}\) can simply be represented by a matrix of 101 rows and 9 columns, as follows:

$$\begin{aligned} {C_{ E_{p,d}}}^{k} = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,9} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,9} \\ \vdots & \vdots & & \vdots \\ c_{101,1} & c_{101,2} & \cdots & c_{101,9} \end{bmatrix} \end{aligned}$$
(2)
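As an illustration (not the authors’ code), the time normalization described above can be sketched in numpy: each of the nine joint-angle traces recorded between two initial contacts is resampled onto the 0–100% grid to yield the \(101 \times 9\) matrix of Eq. (2).

```python
import numpy as np

def normalize_cycle(raw_cycle):
    """raw_cycle: (n_frames, 9) array between two initial contacts."""
    n_frames = raw_cycle.shape[0]
    src = np.linspace(0.0, 100.0, n_frames)   # original time base, in % of stride
    dst = np.linspace(0.0, 100.0, 101)        # 0..100% in 1% steps
    # Interpolate each of the 9 kinematic variables independently
    return np.column_stack(
        [np.interp(dst, src, raw_cycle[:, n]) for n in range(9)]
    )

cycle = normalize_cycle(np.random.randn(117, 9))  # e.g. 117 camera frames
assert cycle.shape == (101, 9)
```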

The Gait Profile Score (GPS), a “walking behavior score”, was computed for each gait cycle from the previously described joint angles12,13,14. The GPS is a single index measure that summarizes the overall deviation of kinematic gait data relative to normative data. It can be decomposed to provide Gait Variable Scores (GVS) for nine key component kinematic gait variables, which are presented as a Movement Analysis Profile (MAP). The GVS corresponding to the n-th kinematic variable, GVS\(_{\textrm{n}}\), is given by15,16,17:

$$\begin{aligned} GVS_n = \sqrt{\frac{1}{T}\sum _{t=1}^{T}(c_{t,n} - c_{t,n} ^{ref})^{2}} \end{aligned}$$
(3)

where t is a specific point in the gait cycle, T the total number of points (typically equal to 101)18,19, \(c_{t,n}\) the value of the kinematic variable n at point t, and \(c_{t,n}^{\textrm{ref}}\) its mean over the reference population (physiological norm). The GPS is obtained from the GVS scores15,17 as follows:

$$\begin{aligned} GPS = \sqrt{\frac{1}{N}\sum _{n=1}^{N}GVS_n^{2}} \end{aligned}$$
(4)

where N is the total number of kinematic variables (equal to 9 by definition).
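Eqs. (3) and (4) translate directly into code; a minimal numpy sketch is given below, where `cycle` is the \(101 \times 9\) matrix of Eq. (2) and `ref_mean` the mean curve of the reference population with the same shape (both hypothetical variable names).

```python
import numpy as np

def gvs(cycle, ref_mean):
    """Eq. (3): root-mean-square deviation per kinematic variable -> 9 GVS values."""
    return np.sqrt(np.mean((cycle - ref_mean) ** 2, axis=0))

def gps(cycle, ref_mean):
    """Eq. (4): root-mean-square of the nine GVS values -> a single GPS index."""
    return np.sqrt(np.mean(gvs(cycle, ref_mean) ** 2))
```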

Gait database

We had a total of 1459 gait sessions from 734 patients (115 adults and 619 children). Each patient had an average of 1.988 gait sessions (standard deviation 1.515). In total, 53,693 gait cycles were collected, with an average of 18 cycles per gait session (standard deviation 6). Neurological conditions, notably cerebral palsy, are the most frequent etiologies, as shown in Fig. 1.

Figure 1. Etiologies pie chart.

The average patient age at the first gait session is 14 years, with a standard deviation of 16 years. The time delay between the first and last gait sessions (for the 319 patients with more than one gait session) is 3.92 years on average, with a standard deviation of 3.24 years. Directly consecutive gait sessions are, on average, separated by approximately 740 days, with a standard deviation of 577 days. The shortest (resp. longest) time delay was 4 (resp. 4438) days. We had 1384 pairs of directly consecutive gait sessions belonging to 319 patients (the remaining patients were removed since they had only one gait session). The gait conditions involved are varied: without any equipment, with a cane, with a rollator, with an orthosis, or with a prosthesis. Only pairs of gait sessions without equipment were selected, so as to compare sessions under the same conditions (79% of all available pairs, i.e., 1152). The first gait sessions in these pairs were used for training: models were fed their gait cycles (21,167 gait cycles in total). A sketch of how such pairs can be built is given below.
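This is a minimal pandas sketch of the pairing step, assuming a hypothetical table with one row per session (columns `patient`, `date`, and `equipment`); it is an illustration, not the authors’ pipeline.

```python
import pandas as pd

def consecutive_pairs(sessions: pd.DataFrame) -> pd.DataFrame:
    s = sessions.sort_values(["patient", "date"])
    s["next_date"] = s.groupby("patient")["date"].shift(-1)
    s["next_equipment"] = s.groupby("patient")["equipment"].shift(-1)
    pairs = s.dropna(subset=["next_date"])   # drops single-session patients
    # Keep only pairs where both sessions were walked without equipment
    mask = (pairs["equipment"] == "none") & (pairs["next_equipment"] == "none")
    return pairs[mask]
```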

GPS variation prediction

GPS variation prediction is close enough to a Time Series Classification (TSC) problem that popular TSC architectures can be adopted. Consecutive gait session pairs \((E_{p,d}, E_{p,d+\Delta d})\) were considered. For each gait cycle \({C_{ E_{p,d}}}^{k}\) of the current gait session \(E_{p,d}\), a GPS variation \(\Delta {}GPS\) was computed using:

$$\begin{aligned} \Delta {}GPS({C_{ E_{p,d}}}^{k}) = GPS_{avg}( E_{p,d+\Delta d}) - GPS({C_{ E_{p,d}}}^{k}) \end{aligned}$$
(5)

where \(GPS_{avg}(E_{p,d+\Delta d})\) is the average GPS per cycle of \(E_{p,d+\Delta d}\) and \(GPS({C_{ E_{p,d}}}^{k})\) the GPS of the current gait cycle \({C_{E_{p,d}}}^{k}\). The average GPS per cycle \(GPS_{avg}(E_{p,d})\) of a gait session \(E_{p,d}\) is simply:

$$\begin{aligned} GPS_{avg}(E_{p,d}) = \frac{\sum _{k=1}^{K} GPS({C_{ E_{p,d}}}^{k}) }{K} \end{aligned}$$
(6)

\(\Delta {}GPS\) was binarized: if it is negative, the patient’s gait improves (class 1); if it is positive, the patient’s gait worsens (class 0). The evaluation metric is the Area Under the ROC Curve (AUC).
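A sketch of Eqs. (5) and (6) together with this binarization is given below, reusing the gps() helper sketched earlier; `session_cycles` and `next_cycles` are hypothetical lists of \(101 \times 9\) cycle matrices for \(E_{p,d}\) and \(E_{p,d+\Delta d}\).

```python
import numpy as np

def labels_for_session(session_cycles, next_cycles, ref_mean):
    # Eq. (6): average GPS per cycle of the next session
    gps_avg_next = np.mean([gps(c, ref_mean) for c in next_cycles])
    labels = []
    for c in session_cycles:
        delta_gps = gps_avg_next - gps(c, ref_mean)   # Eq. (5)
        labels.append(1 if delta_gps < 0 else 0)      # 1 = gait improves
    return labels
```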

The distribution of patients between the training, validation, and test groups is provided in Table 1. This split placed 73%, 12%, and 14% of all gait cycles in the training, validation, and test groups, respectively.

Table 1 Data distribution for \(\Delta {}\)GPS prediction.

Signal-based approach

To be exhaustive, one MLP, one recurrent neural network (LSTM), one hybrid architecture (Encoder), several CNN architectures (FCN, ResNet, t-LeNet), and a one-dimensional Transformer20 were included. The MLP and LSTM were designed and developed from scratch, and their hyper-parameters were optimized manually. FCN, ResNet, Encoder, and t-LeNet are among the most effective end-to-end discriminative architectures in the TSC state of the art10. These methods were also compared to the Transformer, a more recent and popular architecture. Unlike the LSTM, the Transformer does not suffer from long-range context dependency issues21; it is also notable for requiring less training time. The Adam optimizer22 and the binary cross-entropy loss23 were employed.

For the MLP, gait cycles were flattened so that the input length was 909 values (101 time steps \(\times\) 9 angles). The number of neurons was the same across all the fully connected layers; several values were tested to find the best structure for our task, and the number of layers was optimized in the same way. The corresponding architecture is shown in Fig. 2.

Figure 2. MLP architecture for prediction.
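A minimal Keras sketch of such an MLP is given below, assuming tensorflow.keras; the 4-layer, 200-neuron configuration corresponds to the best model reported in the Results (MLP_4_200), while the optimizer and loss follow the choices stated earlier.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(num_layers=4, units=200):
    model = keras.Sequential()
    model.add(keras.Input(shape=(909,)))   # 101 time steps x 9 angles, flattened
    for _ in range(num_layers):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))   # P(gait improves)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model
```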

LSTM layers were stacked, and a dropout layer was added before the output layer to avoid overfitting. The corresponding architecture is shown in Fig. 3.

Figure 3. LSTM architecture for prediction.
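A corresponding sketch for the stacked LSTM, under the same assumptions; the dropout rate is an illustrative placeholder, and the 4-layer, 500-unit configuration corresponds to LSTM_4_500 reported in the Results.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(num_layers=4, units=500, dropout_rate=0.5):
    model = keras.Sequential()
    model.add(keras.Input(shape=(101, 9)))   # 101 time steps x 9 joint angles
    for i in range(num_layers):
        last = i == num_layers - 1
        # Intermediate layers return full sequences so they can be stacked
        model.add(layers.LSTM(units, return_sequences=not last))
    model.add(layers.Dropout(dropout_rate))  # regularization before the output layer
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model
```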

For FCN, ResNet, Encoder and t-LeNet, the architectures proposed in Ref.10 were considered. They are shown in Figs. 4, 5, 6 and 7, respectively. We followed an existing implementation24 to set up the Transformer.

Figure 4. FCN architecture for prediction.

Figure 5. ResNet architecture for prediction.

Figure 6. Encoder architecture for prediction.

Figure 7. t-LeNet architecture for prediction.

Data augmentation

Different data augmentation techniques were tested as a pre-processing step to avoid overfitting: jittering, scaling, window warping, permutation, and window slicing. Their hyper-parameters were empirically optimized for each model. These are among the most frequently used techniques in the TSC literature, particularly for sensor data10.
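As an illustration, two of these techniques (jittering and scaling) can be sketched in numpy as follows; the sigma values are placeholders, since the actual hyper-parameters were tuned per model.

```python
import numpy as np

def jitter(cycle, sigma=0.05):
    """Add Gaussian noise to a 101 x 9 gait cycle."""
    return cycle + np.random.normal(0.0, sigma, size=cycle.shape)

def scale(cycle, sigma=0.1):
    """Multiply each joint-angle channel by a random factor around 1."""
    factors = np.random.normal(1.0, sigma, size=(1, cycle.shape[1]))
    return cycle * factors
```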

Image-based approach

Image-based time series representation initiated a new branch of deep learning approaches that consider image transformation as an innovative feature-engineering pre-processing step25. In an attempt to reveal features and patterns that are less visible in the one-dimensional sequence of the original time series, many transformation methods have been developed to encode time series as images.

In our study, sensor modalities are transformed to the visual domain using the 2D FFT, in order to leverage a set of pre-trained CNN models for transfer learning on the converted image data. The full workflow of our framework is represented in Fig. 8.

Figure 8. Proposed \(\Delta GPS\) prediction workflow for the image-based approach.

The 2D FFT is used to work in the frequency (or Fourier) domain because it efficiently extracts features based on the frequency content of the time series. It can be defined as:

$$\begin{aligned} F(u,v) = \frac{1}{TN}\sum _{t=0}^{T-1} \sum _{n=0}^{N-1} c_{t,n} \exp \left( -j2\pi \left( \frac{ut}{T} + \frac{vn}{N}\right) \right) \end{aligned}$$
(7)

where \(F(u,v)\) is the direct Fourier transform of the gait cycle. It is a complex function that captures the phase and magnitude of the signal in the frequency domain, u and v being the frequency-space coordinates. The magnitude \(|F(u,v)|\) of the 2D FFT, also known as the spectrum, is a two-dimensional signal that represents the frequency information. Thanks to the translation and rotation properties of the 2D FFT, the zero-frequency component can be moved to the center of \(|F(u,v)|\) without losing any information, making the spectrum image more readable. The centralized FFT spectra were computed and fed to the proposed deep learning models. A centralized FFT spectrum for a given gait cycle is represented in Fig. 9.
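This pre-processing can be sketched in numpy, where `cycle` is a \(101 \times 9\) matrix: the spectrum of Eq. (7) is computed with fft2, and fftshift moves the zero-frequency component to the center, as in Fig. 9c.

```python
import numpy as np

def centralized_spectrum(cycle):
    """cycle: (101, 9) matrix of joint angles -> magnitude spectrum image."""
    spectrum = np.fft.fft2(cycle) / cycle.size  # 2D FFT, normalized by T*N as in Eq. (7)
    centered = np.fft.fftshift(spectrum)        # move zero frequency to the center
    return np.abs(centered)                     # |F(u,v)|, the image fed to the models
```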

Figure 9. 2D FFT for a given gait cycle. (a) The gait cycle; (b) FFT spectrum of the gait cycle; (c) Centralized FFT spectrum of the gait cycle.

Proposed deep learning models

Timm pre-trained models

The Timm library’s26 pre-trained VGG16, ResNet34, EfficientNet_b0, and the Vision Transformer 'vit_base_patch16_224' were investigated. They were pre-trained on a large collection of images, in a supervised fashion. For the Transformer, the pre-training was at a resolution of \(224 \times 224\) pixels. Its input images were considered as a sequence of fixed-size patches (resolution \(16 \times 16\)), which were linearly embedded.

Converting our grayscale images to RGB was not necessary because Timm’s implementations support any number of input channels. The minimum input size of VGG16 is \(32 \times 32\), whereas the width dimension (N) of our images equals 9. In order to meet the minimum required size, the 2D FFT images were repeated 4 times along this width dimension. Transfer learning with fine-tuning was employed: the final fully connected layer was reduced to a single neuron, and all convolutional blocks were left trainable, as were the top layers.
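The loading step can be sketched with the timm API as follows; resnet34 is shown as an example, and the dummy batch merely illustrates the width repetition (`in_chans=1` reflects the single-channel spectra, `num_classes=1` the single-neuron head).

```python
import timm
import torch

# Load a backbone with a single input channel and a one-neuron head;
# timm adapts the first convolution to the requested number of channels
model = timm.create_model("resnet34", pretrained=True, in_chans=1, num_classes=1)

# Dummy batch of 101 x 9 spectra, repeated 4 times along the width so the
# image exceeds the 32-pixel minimum required by VGG16
x = torch.randn(8, 1, 101, 9).repeat(1, 1, 1, 4)   # -> (8, 1, 101, 36)
logits = model(x)                                  # shape: (8, 1)
```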

Tailored 2D CNN

The pre-trained Timm models are deep and sophisticated, with many layers. Therefore, a CNN model with fewer parameters was designed from scratch. The number of two-dimensional convolutional layers was a hyper-parameter optimized over a finite range of values {1, 2, 3, 4, 5}. After the convolutional block, a dropout function was applied, followed by two-dimensional max-pooling (MaxPooling2D) and batch normalization. The flattened output of the batch normalization was then fed to a dense layer whose number of neurons was tuned. In order to predict the \(\Delta GPS\), the model had a dense output layer with a single neuron. The corresponding architecture is shown in Fig. 10.

Figure 10. Tailored 2D CNN for prediction.

The following are all of the architecture hyper-parameters to tune: the number of convolutional layers (num_layers), the number of filters for each convolution layer (num_filters), the kernel size of each convolution layer (kernel_size), the dropout rate (dropout), the pooling size of the MaxPooling2D (pool_size), the number of neurons in the dense layer (units), and the learning rate (lr). Five models with a varying number of convolutional layers (from 1 to 5) were tested. For each of them, the rest of the hyper-parameters were tuned using KerasTuner9 to maximize the validation AUC.
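A sketch of such a search with KerasTuner is given below for one fixed depth; the search ranges are illustrative assumptions, not the ones actually used, and the input shape assumes the original \(101 \times 9\) spectra.

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(hp):
    model = keras.Sequential()
    model.add(keras.Input(shape=(101, 9, 1)))          # one centralized spectrum
    num_filters = hp.Int("num_filters", 4, 64, step=4)
    kernel_size = hp.Choice("kernel_size", [3, 8, 16, 32])
    for _ in range(2):                                 # depth fixed here; 1..5 were tested
        model.add(layers.Conv2D(num_filters, kernel_size,
                                padding="same", activation="relu"))
    model.add(layers.Dropout(hp.Float("dropout", 0.0, 0.5)))
    model.add(layers.MaxPooling2D(hp.Choice("pool_size", [2, 4, 8]), padding="same"))
    model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(hp.Int("units", 50, 500, step=50), activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Float("lr", 1e-5, 1e-2, sampling="log")),
        loss="binary_crossentropy",
        metrics=[keras.metrics.AUC(name="auc")])
    return model

# Maximize the validation AUC, as described above
tuner = kt.RandomSearch(build_cnn,
                        objective=kt.Objective("val_auc", direction="max"),
                        max_trials=50)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val)) would run the search
```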

Results

In this section, prediction results are presented in terms of AUC.

Signal-based approach

Without data augmentation

Results are given in Table 2. They are homogeneous on the validation set, where the LSTM and MLP perform equally well. Among the state-of-the-art architectures, ResNet has the highest val AUC (0.709), and FCN achieves a comparable result (0.705). Encoder, t-LeNet, and Transformer perform nearly equally well, with val AUCs above 0.63.

Table 2 GPS variation prediction results. Best results in bold.

MLP

The best model has 4 layers of 200 neurons each. It is referred to as MLP_4_200. It gives a val AUC equal to 0.717.

LSTM

The best model has 4 LSTM layers of 500 units. It is referred to as LSTM_4_500. It gives a val AUC of 0.701.

Data augmentation

All of the architectures exhibited overfitting with very quick convergence. To mitigate this, the five data augmentation techniques presented earlier were combined, the order of their application being chosen randomly for each training batch. The best data augmentation parameters were found for each architecture. Results are presented in Table 3. Performance is slightly better after data augmentation, except for the Transformer. In general, convergence is slower: it no longer occurs within the first few epochs. FCN gives the best val AUC (0.723).

Table 3 Prediction results after data augmentation.

Image-based approach

Table 4 presents the obtained results.

Table 4 Quantitative results. Best results in bold.

Despite the overfitting behavior of the pre-trained models, the test set results are nearly identical to those on the validation set. The tested Timm models all produced comparable results, with val AUCs greater than 0.63. The best-performing model, the CNN trained entirely from scratch, achieves a val AUC of 0.726. It has two convolutional layers, and its hyper-parameter values are as follows: num_filters = 4, kernel_size = 32, dropout = 0, pool_size = 8, units = 300, and lr = \(4.127 \times 10^{-4}\). Transfer learning is unlikely to have made a significant contribution because our images carry too little visual meaning. Besides, no sufficiently large datasets from the same domain are available within the community to carry out such a transfer learning task.

Discussion and conclusion

The goal of our study was to predict, in a binary fashion, the \(\Delta GPS\) between two consecutive gait sessions: a negative variation means that gait quality improves, and a positive one that it worsens. Globally, architectures designed from scratch gave slightly better results than state-of-the-art ones, which introduce too many parameters to optimize given the relatively small quantity of available data. A trade-off should therefore be made between the amount of available training data, the complexity of the task, and performance. In the signal-based approach, data augmentation techniques generally brought modest performance improvements; finding ways to increase their efficiency seems a promising direction. In the image-based approach, the CNN developed from scratch surpassed the pre-trained Timm models, which can be explained by the large gap between the source and target domains. ROC (Receiver Operating Characteristic) curves for all models are presented in Fig. 11.

Figure 11. ROC curves for both approaches.

To gain better insight into the results, the ROC curves of the best models (i.e., FCN after data augmentation for the signal-based approach and the tailored CNN for the image-based one) were compared using DeLong’s test. It yielded a p-value of \(2.316 \times 10^{-4}\), which means that the AUCs of the two models are significantly different. In other words, the FCN model after data augmentation, with a val AUC of 0.723 and a test AUC of 0.717, is meaningfully better than the CNN model. This outcome suggests that knowledge extraction is more efficient on raw signals than on synthetic images. In summary, for both approaches, the prediction results are encouraging despite the complexity of such a prediction task on such heterogeneous data: the val AUC and test AUC are above 0.7 for both approaches.

One limitation of this study is that we were not able to validate our findings on external datasets, as we did not have any other data at our disposal; indeed, we were unable to find any suitable publicly available medical gait databases.

Our future work will focus on taking the different pathologies into account. Ways of obtaining more data should also be explored.