A multimodal dataset of human gait at different walking speeds established on injury-free adult participants

Human motion capture is used in various fields to analyse, understand and reproduce the diversity of movements that are required during daily-life activities. The proposed dataset of human gait has been established on 50 adults healthy and injury-free for lower and upper extremities in the most recent six months, with no lower and upper extremity surgery in the last two years. Participants were asked to walk on a straight level walkway at 5 speeds during one unique session: 0–0.4 m.s−1, 0.4–0.8 m.s−1, 0.8–1.2 m.s−1, self-selected spontaneous and fast speeds. Three dimensional trajectories of 52 reflective markers spread over the whole body, 3D ground reaction forces and moment, and electromyographic signals were simultaneously recorded. For each participants, a minimum of 3 trials per condition have been made available in the dataset for a total of 1143 trials. This dataset could increase the sample size of similar datasets, lead to analyse the effect of walking speed on gait or conduct unusual analysis of gait thanks to the full body markerset used.


Background & Summary
Human motion capture is nowadays commonly used in various fields to analyse, understand and reproduce the diversity of movements that can be produced during daily-life activities. In clinical practice, the emergence of evidence-based medicine promoted the development of quantitative assessment tools for the diagnosis and treatment of pathology-related movement disorders. In particular, the process of gait disorders analysis currently often consists of the measurement of joint kinematics and kinetics in three dimensions 1 . This assessment is called clinical gait analysis (CGA) and attempts to provide an objective record that quantifies the magnitude of deviations from normal gait 2 . On this basis, a set of pathology-related impairments having the most impact on gait is identified and can be used to target the treatment 3 .
However, the identification of deviations is highly dependent with the characteristics of the normative database used 4 . Special attention is then required to discriminate the differences between pathological and asymptomatic populations that could confound deviations. In particular, the gait of pathological populations is often observed at their own self-selected walking speed and compared to normative data established at the spontaneous walking speed of an asymptomatic population 5 . Since the spontaneous walking speed of pathological populations (e.g. ranged between 0.18 and 1.03 m.s −1 for stroke 6 ) is often slower than for an asymptomatic population (ranged between 1.04 and 1.60 m.s −1 7 ), a walking speed mismatch appears. Because walking speed is known to affect kinematics, kinetics, spatiotemporal parameters and muscular activity 8 , the identification of gait deviations can then become challenging since both pathology and walking speed difference may contribute to them 9 . But walking speed is not the only variable that could be source of a mismatch in comparison of a patient and an asymptomatic population. Demographic and anthropometric parameters may also affect CGA interpretation. Recently, Chehab et al. 10 demonstrated the impact of walking speed, but also age, sex and body mass index (BMI) on 3D kinematics and kinetics of the lower limb during gait. While walking speed was the most influential variable, the authors highlighted the influence of demographic and anthropometric parameters on very common parameters (e.g. pelvis tilt, peak of hip extension) used in the identification of gait deviations. Several datasets have been made available in the literature and can be used to ease the establishment of a broad normative database allowing to match patient characteristics [11][12][13][14] . However, few datasets include all the common parameters on a large number of subjects (i.e. spatio-temporal, kinematics, kinetics, electromyography signals). The proposed dataset has been established on 50 healthy participants aged between 19 and 67 years. They were asked to walk on a straight level walkway at five different walking speeds: between 0 and 0.4 m.s −1 , between 0.4 and 0.8 m.s −1 , between 0.8 and 1.2 m.s −1 , self-selected spontaneous speed and self-selected fast speed. Three dimensional trajectories of 52 cutaneous reflective markers spread over the whole body, 3D ground reaction forces and moment, and electromyographic signals were simultaneously recorded. For each participant, 3 trials for each walking speed condition plus one static were recorded and pre-processed, for a total of 1143 trials. This dataset could increase population sample size of similar datasets, lead to analyse the effect of walking speed on gait or conduct unusual analysis of gait characteristics thanks to the full body markerset used.

Methods
Participants. Fifty participants (24 women and 26 men, 37.0 ± 13.6 years, 1.74 ± 0.09 m, 71.0 ± 12.3 kg) were recruited on a voluntary basis. The study was approved by the institutional medical ethic committee of the Rehazenter and follows the recommendations of the declaration of Helsinki. The participants gave their informed consent to participate in the study. All participants were asymptomatic, i.e. healthy and injury free for both lower and upper extremities in the most recent six months, and no lower or upper extremity surgery in the last two years. Furthermore, only participants having a leg length difference lower than 1.5% of the height (corresponding to a maximum of 0.03 m) were included in this study to avoid an effect of a leg length discrepancy in the dataset.
Procedure. For each participant, the entire data collection was acquired in a single session which lasted approximately 2 hours. All the sessions were managed by the same experienced operator. The following procedure was adopted: 1. Calibration of the systems: This calibration was performed following the instructions available in the manufacturer's documentation, including the definition of the inertial coordinate system, the dynamic calibration of the cameras, and the zeroing of forceplates. 2. Introduction to the participant: The operator introduced the laboratory, outlined the need to establish the database, and briefly explained the conduct of the session, including the material used. The participant could ask questions at any time. Table 1).

Preparation of the participant:
The participant was asked to change clothes to tight-fitting clothes or underwear, including removing shoes and socks as the acquisition was barefoot, and tied up their hair if necessary. The operator also collected participants' anthropometric and demographic information (Online-only Table 1). The participant was then equipped with EMG electrodes and cutaneous reflective markers (see section Records). 5. Static record: The participant was standing upright with lower and upper limbs outstretched, palms facing forward, right head with straight eyes. Five seconds without any movement were recorded. The record was verified by the operator. A new standing trial was performed if any marker was missing or movements perturbed the record. 6. Walking trials: The participant was asked to walk back and forth on a 10-m straight level walkway. The instruction given was "to walk as naturally as possible, looking forward". No directive was given about the forceplates to avoid a conscious adaptation of the walk. A minimum of 3 trials were recorded for each condition. All trials were rapidly verified by the operator. Five conditions of walking speed were recorded: between 0 and 0.4 m.s −1 (C1), between 0.4 and 0.8 m.s −1 (C2), between 0.8 and 1.2 m.s −1 (C3), self-selected spontaneous speed (C4) and self-selected fast speed (C5). Conditions C1, C2 and C3 were induced by a metronome 15 and correspond to the three groups described by Perry 16 (i.e. household ambulators, limited community ambulators and community ambulators). An adaptation time to the imposed cadence was foreseen for these 3 conditions and the velocity of the first trial was checked to be in the expected range of speed. C4 and C5 were self-selected conditions in response to the instructions to walk respectively "as usual" and "fast but not running". 7. Session ending: All markers and electrodes were removed. Additional explanations about the records were given to the participants while showing some videos and 3D animations.

records.
A 10-camera optoelectronic system sampled at 100 Hz (OQUS4, Qualisys, Sweden) was used to track the three-dimensional (3D) trajectories of a set of 52 cutaneous reflective markers. The markerset (Fig. 1, Table 1) was defined to allow the use of the biomechanical model proposed by Dumas and Wojtusch 17 . This model follows the recommendations of the International Society of Biomechanics (ISB) 18,19 for the definitions of joint coordinate systems and joint centres. Marker placement was achieved by anatomical palpation (anatomical landmarks reported in Table 1) following the guideline provided by Van Sint Jan 20 and remained unchanged during all trials. The same experienced physiotherapist performed both anatomical palpation and marker placement on all included participants. Two forceplates sampled at 1500 Hz (OR6-5, AMTI, USA) were used to record 3D ground reaction force and moment. These forceplates were embedded in the middle of the walkway travelled during the overground walking trials. A wireless electromyographic (EMG) system sampled at 1500 Hz (Desktop DTS, Noraxon, USA) was used to record the EMG signals collected by 8 probes connected to pairs of surface electrodes with a diameter of 10 mm (Ambu Neuroline 720, Ambu, Denmark). Skin preparation, inter-electrode distance, and electrode locations followed the recommendations of the Surface Electromyography for the Non-Invasive Assessment of Muscles (SENIAM) project 21 . Skin preparation consisted in cleaning with alcohol, preceded www.nature.com/scientificdata www.nature.com/scientificdata/ by shaving, when necessary. An inter-electrode distance of 20 mm was applied for each muscle. EMG signals were recorded on 8 muscles of the right leg: gluteus maximus, gluteus medius, rectus femoris, vastus medialis, semitendinosus, gastrocnemius medialis, soleus, and tibialis anterior. In order to reduce the baseline noise contamination due to movement artefacts, each probe with related cables and electrodes were maintained using a self-adherent wrap (Coban, 3 M, USA). All these systems were synchronised using the Qualisys Track Manager software (QTM 2.8.1065, Qualisys, Sweden).
Data processing. Labelling of the marker trajectories was performed in the Qualisys Tracking Manager software (QTM 2.8.1065, Qualisys, Sweden) and all foot strike and foot off events were manually detected by the same experienced operator. Events were defined based on the threshold of 5 N applied on the vertical ground reaction force, or based on markers trajectories when ground reaction forces were not available. Raw marker trajectories, ground reaction forces and moments and EMG signals, as well as time events, were then exported in the standard c3d file format (https://www.c3d.org) and then imported and processed under Matlab (R2018a, The MathWorks, USA) using the Biomechanics ToolKit (BTK) 22 . Markers trajectories (expressed in mm) were interpolated when necessary using a reconstruction based on marker inter-correlations obtained from a principal component analysis 23 . Then, trajectories were smoothed using a 4 th order Butterworth low pass filter with a 6 Hz cut-off frequency. Ground reaction forces and moments (expressed in N and N.mm, respectively) were smoothed using a 2 th order Butterworth low pass filter with a 15 Hz cut-off frequency. Below the threshold of 5 N defined on the vertical ground reaction force, all of these forces and moments were set to zero. EMG signals (expressed in V) were band pass filtered between 30 and 300 Hz (4 th order Butterworth filter) to reduce artefacts due to motion and electromagnetic fields. All processed data were cropped few frames before the first event and few frames after the last event, depending on the available data. Finally, they were stored in a new c3d file using BTK. These final c3d files are the ones reported in the present dataset.

Data records
All data records are available from figshare 24 . They are all stored in c3d file format (https://www.c3d.org). This file format is a public binary file format supported by all motion capture system manufacturers and biomechanics software programs. It is commonly used to store, for a single trial, synchronized 3D markers coordinates and analog data as well as a set of metadata (e.g. measurement units, custom parameters specific to the manufacturer software application).  Table 1. www.nature.com/scientificdata www.nature.com/scientificdata/  For each of the 50 participants, at least 3 trials (one right and one left gait cycle per trial) for each of the 5 conditions plus one static have been made available in the dataset, for a total of 1143 trials. Structure, labels, format, dimension, unit and description of each variable stored in the c3d files are given in Tables 1-4. Trial by trial information about the availability of forceplate data is given in Supplementary Table 2.

Technical Validation
Calibration of the optoelectronic system. As detailed in the procedure (see Methods), the optoelectronic system was calibrated before each session following the instructions available in the manufacturer's documentation. In all calibration files, residuals (i.e. average of the different residuals of the 2D marker rays that belongs to the same 3D point) were below 2 mm, and the standard deviation of the reconstructed wand (i.e. calibration tool) length remained below 1.5 mm.
3D trajectories of cutaneous reflective markers. In all static and trial files, the 3D trajectories of cutaneous reflective markers were fully reconstructed (i.e. 0% of gap in the trajectories), and residuals remained below 4 mm.   Table 3. Forceplate data stored in c3d files. *Number of frames recorded at 1500 Hz. ¤ All centres of pressure, forces and moments are expressed here in the inertial coordinate system.  Table 4. Metadata* stored in c3d files. *Additional metadata are stored by default (i.e. Copyright, Force_ Platform, Point, Analog, Trial, Event_Context). + Leg length is measured between the anterior-superior iliac spine and the medial tibial malleolus.