Abstract
The complex dynamics of animal manoeuvrability in the wild is extremely challenging to study. The cheetah (Acinonyx jubatus) is a perfect example: despite great interest in its unmatched speed and manoeuvrability, obtaining complete whole-body motion data from these animals remains an unsolved problem. This is especially difficult in wild cheetahs, where it is essential that the methods used are remote and do not constrain the animal’s motion. In this work, we use data obtained from cheetahs in the wild to present a trajectory optimisation approach for estimating the 3D kinematics and joint torques of subjects remotely. We call this approach kinetic full trajectory estimation (KFTE). We validate the method on a dataset comprising synchronised video and force plate data. We are able to reconstruct the 3D kinematics with an average reprojection error of 17.69 pixels (62.94% PCK using the nose-to-eye(s) length segment as a threshold), while the estimates produce an average root-mean-square error of 171.3 N (\(\approx\) 17.16% of peak force during stride) for the estimated ground reaction force when compared against the force plate data. While the joint torques cannot be directly validated against ground truth data, as no such data is available for cheetahs, the estimated torques agree with previous studies of quadrupeds in controlled settings. These results will enable deeper insight into the study of animal locomotion in a more natural environment for both biologists and roboticists.
Introduction
High-speed manoeuvrability is the “final frontier” in the study of legged locomotion. While constant-speed gait has been studied extensively, the dynamics and control of these transient movements are still sparsely investigated. This could be attributed to the complex dynamics manoeuvring entails, which require whole-body motion and force data to quantify. This data can be gathered in a lab setting^{1} but does not reflect the natural locomotor conditions or animal motivations experienced in the field. Rapid manoeuvres are important to understand, however. From a biomechanics perspective, they push animals to perform at their mechanical limits, revealing what they are capable of in the most extreme circumstances. They are also interesting to the robotics field, as understanding this category of motion will be crucial for the development of more agile robotic systems that better match the capabilities of animals^{2}.
The cheetah (Acinonyx jubatus) is the perfect model for studying quadruped dynamics as it is not only the fastest terrestrial animal, but also one of the most manoeuvrable. In fact, a study which used GPS-IMU collars to investigate the behaviour of wild cheetahs revealed that it is the ability to rapidly accelerate that is most critical to their hunting success^{3}. Tracking collars are, however, only able to treat the animal as one lumped rigid body and are thus unable to provide information about leg, spine or tail kinematics or joint loading.
By contrast, vision-based pose estimation methods provide a means for non-invasive full-body kinematic and kinetic estimation (“dynamic” estimation). In human research, this technique has been adopted to obtain 3D pose estimates from a single camera, i.e. monocular 3D pose estimation^{4,5}, focusing on full-body kinematics. While data-driven models or purely kinematic formulations are widely applied in research of human locomotion, there have been efforts to incorporate a more complete physics-based model^{6,7}, which is often formulated as a nonlinear program (NLP) to solve for both the kinematic and kinetic parameters.
These models provide a strong prior on the motion, making it a popular choice for potentially ambiguous pose detection from a single camera setup. The internal and external torques and forces are a natural byproduct of modelling the kinetics together with the kinematics. Some researchers have used this to explicitly analyse joint torques produced by humans while performing different activities^{8,9}. In conjunction, optimal-control-based approaches have been used to fit a human model to corresponding kinematic data to obtain joint torques and contact forces^{10,11}.
For all the work done on humans, there is little to show for markerless dynamic motion estimation of animals in the wild. That said, animal 3D pose estimation (without explicit modelling of physics), for both multi-view camera and monocular systems, has been well established in recent years. The multi-view camera techniques often resort to triangulation-based methods^{12,13,14}, while the monocular systems have used the skinned multi-animal linear model (SMAL)^{15,16} and pose “lifting”^{17} to disambiguate 3D pose estimates from a single view. These methods have been successful at 3D pose estimation of animals in the wild, but they lack the joint torques and ground reaction forces (GRFs) that are important for biomechanic analysis and robotic design.
Researchers have estimated animal torques and GRFs using invasive methods in controlled experiments. In one study, researchers placed reflective markers on racing greyhounds and, together with a six-camera setup and force plates, were able to determine joint torque and power profiles of the hind limbs^{18}. Similar work was done on the sit-to-stand dynamics of greyhounds^{19}. In the same line of work, a study was done on the forelimb muscle activity of horses^{20}.
While there are examples of dynamic motion estimation being applied to animals, work has mostly been limited to a laboratory setting, or involved invasive markers (or trackers) being placed on the subjects in the wild. In contrast, the proposed method performs markerless full-body kinetic and kinematic estimation of animals (in this case, cheetahs) with a multi-camera setup. To the best of our knowledge, this is a novel approach to the dynamic estimation of wildlife locomotion in the wild.
In our previous work, AcinoSet^{14,21}, we were able to reconstruct the 3D kinematics of free-running cheetahs from multiple cameras. Here, we extend this work to include kinetic modelling for the cheetah using a complete physics-based model to describe the motion. In particular, we are interested in the joint torques during a gallop. To achieve this, we created a new dataset (denoted “kinetic dataset” in this work) that contains synchronised video and force plate data. As with AcinoSet, a trajectory optimisation problem (henceforth referred to as the kinetic full trajectory estimation (KFTE) method) is formulated and solved for the dynamic motion estimate, and then evaluated against the kinetic dataset. This differs from the purely kinematic full trajectory estimation (FTE) method developed for AcinoSet. Once the KFTE method has been validated with the kinetic dataset, we can use this method to perform motion and torque estimation on a test set of AcinoSet. This provides valuable information about the joint loading of the limbs of the cheetah during locomotion.
Even though this work focuses on the cheetah, the proposed method can easily be generalised to work with any wild animal. This will enable biologists to understand animal locomotion in much more detail and can provide data for the design of new robotic controllers.
Results
We first present the quantitative and qualitative results of the motion and torque estimation on both datasets, and then we validate the method by comparing the estimated GRFs against the ground truth measurements from the force plates in the kinetic dataset.
Motion and torque estimation
The cheetah was represented using a model consisting of 17 rigid bodies, shown in Fig. 1. The 3D pose estimation results are shown in Table 1. The mean position error (MPE) provides a relative measure of how well the reconstruction matches the baseline FTE result using our previous kinematic methods^{14,21}. The MPE is minimal, suggesting that the KFTE method produces similar 3D kinematics to the baseline FTE.
Note that we calculated an average reprojection error of 17.69 pixels, or 62.94 % percentage of correct keypoints (PCK) using the nose-to-eye(s) length threshold, using a subset of 540 frames of 2D hand-labelled data as ground truth from two different trials, T4 and T5. This result complements the MPE shown in Table 1, while also providing an absolute measure of the accuracy of the 3D reconstruction.
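For reference, the PCK metric can be computed as in the following sketch; the array shapes and per-frame threshold argument are illustrative choices, not taken from this work's code.

```python
import numpy as np

def pck(pred, gt, threshold):
    """Percentage of correct keypoints: the fraction of predicted 2D
    keypoints lying within `threshold` pixels of the ground truth.
    `pred` and `gt` have shape (num_keypoints, 2)."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return 100.0 * float(np.mean(dists <= threshold))
```

In this work the threshold is set per frame to the cheetah's nose-to-eye(s) segment length.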
Figure 2 shows a single pose viewed from all six camera angles. Visually, the pose estimate is reasonably accurate given the estimate is wellmatched to the cheetah’s skeleton in each view.
The estimated joint angles and torques for the fore and hind limbs, while in contact with the ground (stance phase of the gait), are shown in Figs. 3 and 4 for the kinetic dataset and AcinoSet respectively. The torque estimates produced for the kinetic dataset serve as a reference for what is expected given that we know the corresponding GRF. There is a good correlation between the torque estimates during the stance for both datasets.
Based on the kinetic dataset (see Fig. 3), both the fore and hind limbs exhibit similar torque trajectories. For the front and back ankle joints, the torque profile during the stance follows a half sine wave, with the peak torque occurring on average at around 50 % of the stance. The hindlimb produces a slightly higher peak torque on average than the forelimb.
The hip torques do not conform to as consistent a profile as the ankle torques do, but they are consistently bounded between 100 and −50 N m. The leading shoulder joint torques peak at around 20 % of the stance, and quickly return to zero thereafter. A similar pattern is found in the trailing shoulder joint; however, the torque peaks on average around two points in the stance: a negative peak at 20 % and a positive peak at 40 %.
There is consistency in the joint angles for the duration of the stance. The shoulder and hip joints increase linearly from start to finish, while the front and back ankle joints display a parabolalike change during the stance.
Method validation
It must be stressed that the force and torque values obtained using this method are estimates at best—not remote measurements. If multiple feet are on the ground, the inverse dynamics problem does not have a unique solution, so the results are merely one possible combination of forces out of many that could produce the observed motion in the 3D model. Still, these estimates should give a reliable indication of the magnitudes of the forces and torques involved, even if the precise profiles cannot be confirmed.
Without ground truth joint torques, we set up an alternative evaluation procedure to determine whether the observed joint torques are plausible. We make use of the measurements from the force plate as a proxy for a “ground truth” to validate motion and torque estimates. Two KFTE methods for estimating the ground reaction forces were compared: sinusoidal GRF and free-form GRF. The sinusoidal approach assumes a “sinusoidal” GRF profile, while the free-form approach does not make any assumptions about the GRF profile. The idea of using a sinusoidal profile for the GRF was taken from a previous study that examined the GRFs on horses during steady-state galloping^{22}. In addition, the GRF profiles obtained from the force plates in the kinetic dataset exhibited the familiar sinusoidal shape, and therefore, it made sense to develop a method that uses this inherent structure.
Table 2 provides the GRF estimation error for both methods, and Fig. 5 provides a visual example of the GRFs.
The sinusoidal KFTE makes a drastic improvement on the GRF error that is evident in both Table 2 and Fig. 5. From the example in the figure, the sinusoidal KFTE provides good tracking of the GRF trajectory, whereas the free-form KFTE fails to produce a valid GRF trajectory.
Discussion
The kinetic dataset facilitated the development of a whole-body dynamic state estimator using the KFTE method. From Table 2, it is clear that the sinusoidal KFTE produces more accurate GRF estimates. The GRF estimate resembles that of the force measurements from the force plates (see Fig. 5a). Many of the state-of-the-art physics-based methods have validated GRF estimate errors in a similar fashion, by using existing studies on predetermined locomotion styles^{7,23}. However, one study validated its GRF estimates using a parkour dataset^{6}, which quotes similar mean errors to our work (see Table 2).
Furthermore, this result confirms the use of the assumed sinusoidal GRF profile for steady-state galloping of quadrupeds. This was previously applied to horses, where it was also found to be a good fit for the data^{22}. Limiting the GRF to a sinusoidal profile allowed for an easier optimisation problem to solve, and reduced the problem of non-unique solutions by restricting the space of possible GRFs to those matching a widely observed form from previous quadruped studies.
Note that although the GRF error is vastly different between the two methods reported in Table 2, the resultant 3D reconstructions are similar (calculated average MPE of 13.0). Therefore, the free-form KFTE does yield forces with plausible magnitudes even if the profile does not match the observed truth data. We compared the ground truth impulse to the estimated impulse for both methods for the same trial (T4 of the kinetic dataset). The ground truth impulse was calculated as \(119.4\,\hbox {kg}\,\hbox {m}\,\hbox {s}^{-1}\); the sinusoidal KFTE produced an impulse of \(122.32\,\hbox {kg}\,\hbox {m}\,\hbox {s}^{-1}\), whereas an impulse of \(97.50\,\hbox {kg}\,\hbox {m}\,\hbox {s}^{-1}\) was obtained for the free-form KFTE. Furthermore, this is confirmed by the GRF trajectory in Fig. 5a, where the area under the curve (i.e. the impulse) is similar to the ground truth (\(< 20\,\%\) error).
The estimated joint torques produced a consistent torque profile on both datasets (see Figs. 3 and 4). This result is in line with the results on the hindlimbs of racing greyhounds (similar to the cheetah in that they are very fast quadrupeds)^{18}. However, the quoted torques are of different magnitudes, with the greyhound’s back ankle average peak roughly calculated as 40 N m, while the cheetah’s average peak was calculated to be around 100 N m. This is roughly twice the torque generated during the stance when compared to the greyhound. It is known that the cheetah generates larger joint torques to resist larger GRFs than the greyhound^{24}. However, it would not be advisable to place too much emphasis on this particular result given the simplified kinematic model used (in addition to inverse dynamics) to generate these quantities in this work.
The 3D pose estimation is adequate and has not drastically changed from the previous AcinoSet baseline^{21}, as shown in Table 1 and Fig. 2. The MPE is very low and suggests that adding a more complete physics model has not diminished the resultant kinematics. Therefore, without compromising our pose estimation accuracy, we have gained kinetic information, and with it, a more complete picture of the dynamics of the cheetah.
That said, an average PCK of 62.94 % suggests that there is room for improvement in our 2D pose estimation. This result could be attributed to the shortcomings of using a simplified rigid body model for the cheetah. The rigid body model, together with fixed joint centres, is not representative of the flexibility within the cheetah’s skeleton. In addition, modelling the muscles as torque-driven actuators is clearly not characteristic of a cheetah’s muscle system. Lastly, point contacts are used to model the dynamics between the foot and the ground, which completely overlooks the complexity of the cheetah’s foot/paw. As highlighted in Fig. 6, there are times when a rigid body model cannot accommodate the forelimbs and neck body parts because of a lack of extension/contraction.
Nevertheless, the developed method provides a stepping stone to accurate dynamic analysis of wild quadrupeds using a non-invasive multi-camera system. Ultimately, the accuracy of the estimated 3D kinematics was adequate for our purposes.
Looking to the future, we have identified two aspects of the current implementation that could be improved: the explicit assumption on the GRF profile, and the rigid body model. First, we would like to investigate the incorporation of a learned prior model of the GRF profile in the cost function as a possible improvement to the existing free-form GRF method. The probability model can use examples from existing force data on quadrupeds. This would enable the study of more dynamic movements (turns and rapid acceleration or deceleration).
Second, even though the overall 3D kinematic estimate was sufficient for this work, we can potentially obtain more accurate predictions by remodelling the shoulder and neck links as variable length^{25}, which should address the problem highlighted in Fig. 6.
Methods
We first provide background theory to multibody dynamics and the KFTE method used in this work. Then, we provide more details on the kinetic and AcinoSet datasets. Lastly, we present the evaluation metrics used in this work.
Multibody dynamics
The dynamics of the cheetah was modelled as a rigid multibody system using absolute angle coordinates \(\textbf{q}(t)\)^{26}, often expressed as

$$\textbf{M}(\textbf{q})\ddot{\textbf{q}} + \textbf{C}(\textbf{q}, \dot{\textbf{q}})\dot{\textbf{q}} + \textbf{G}(\textbf{q}) = \textbf{B}\textbf{u} + \textbf{J}^{T}_{g}(\textbf{q})\varvec{\lambda }_{g} + \textbf{J}^{T}_{c}(\textbf{q})\varvec{\lambda }_{c},$$

where \(\textbf{M}(\textbf{q})\) represents the inertia matrix, \(\textbf{C}(\textbf{q}, {\dot{\textbf{q}}})\) captures Coriolis and centrifugal terms, \(\textbf{G}(\textbf{q})\) is the gravity vector, \(\textbf{B}\) maps \(\textbf{u}\) inputs to generalised forces, \(\varvec{\lambda }_{g}\) are the ground constraint forces, and \(\varvec{\lambda }_{c}\) are the constraint torques that enforce the geometry of the model. These constraint forces are mapped to the generalised coordinates of the model through the Jacobians, \(\textbf{J}^{T}_{g}(\textbf{q})\) and \(\textbf{J}^{T}_{c}(\textbf{q})\). For brevity, we will combine these contact forces into a single constraint force vector \(\varvec{\lambda }\), mapped by a single Jacobian \(\textbf{J}\). In this work, \(\textbf{B}\textbf{u}\) represents the joint torques produced by the cheetah during locomotion, henceforth denoted by \(\varvec{\tau }\).
The 17-link rigid multibody model of the cheetah results in a state vector \(\textbf{q}\) of size 54 (3 angles for each link and an additional 3 for the position of the base link in the inertial frame).
Each link was modelled as a cylinder described by three parameters: length, mass, and radius. These parameters were determined per subject, using the dimensions and masses quoted in multiple resources produced by Hudson et al.^{24,27,28}. Note that the length parameters were fine-tuned by projecting the cheetah skeleton into one of the cameras and validating that the links match the limb lengths of the cheetah.
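As an illustration, the inertial properties of each cylindrical link follow from these three parameters. The sketch below uses the standard solid-cylinder moments of inertia; the convention of the x axis running along the link is our assumption.

```python
def cylinder_inertia(mass, length, radius):
    """Principal moments of inertia of a solid cylinder about its
    centre of mass, with the x axis along the cylinder's length.
    Returns (ixx, iyy, izz) in kg m^2."""
    ixx = 0.5 * mass * radius ** 2                             # about the long axis
    iyy = izz = mass * (3 * radius ** 2 + length ** 2) / 12.0  # transverse axes
    return ixx, iyy, izz
```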
Trajectory optimisation
In general, the trajectory optimisation problem can be described as a nonlinear program (NLP). Here, we formulate an NLP of the form

$$\min \; g(\cdot )$$

subject to

$$f(\cdot ) = \textbf{0}, \quad C(\cdot ) \le \textbf{0},$$
where \(f(\cdot )\) denotes the multibody dynamics (Equation (1)) and \(C(\cdot )\) are the path and contact constraints. The cost function to optimise is \(g(\cdot )\). The index k indicates the discrete instant in time (or node) within the trajectory. The state variables at each node are related to those at the previous node by numerical integration constraints

$$\textbf{q}_{k} = \textbf{q}_{k-1} + h\dot{\textbf{q}}_{k}, \quad \dot{\textbf{q}}_{k} = \dot{\textbf{q}}_{k-1} + h\ddot{\textbf{q}}_{k},$$

where h is the timestep between nodes.
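As a sketch, the integration constraints at node k can be written as residuals that the solver drives to zero. The implicit (backward) Euler form here is an assumption consistent with the earlier FTE formulation, and the function name is illustrative.

```python
def integration_residuals(q_prev, dq_prev, q_k, dq_k, ddq_k, h):
    """Backward-Euler residuals linking node k-1 to node k.
    Each residual is zero when the state satisfies
    q_k = q_{k-1} + h*dq_k and dq_k = dq_{k-1} + h*ddq_k."""
    r_q = [qk - (qp + h * dk) for qp, qk, dk in zip(q_prev, q_k, dq_k)]
    r_dq = [dk - (dp + h * ak) for dp, dk, ak in zip(dq_prev, dq_k, ddq_k)]
    return r_q, r_dq
```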
An equality constraint is used to enforce the dynamics of the system (i.e. \(f(\textbf{q}_{k}, \varvec{\tau }_{k}, \varvec{\lambda }_{k})\), see Equation (1)):

$$\textbf{M}\ddot{\textbf{q}}_{k} + \textbf{C}\dot{\textbf{q}}_{k} + \textbf{G} = \varvec{\tau }_{k} + \textbf{J}^{T}\varvec{\lambda }_{k} + \textbf{w}_{k}.$$
The function notation has been omitted for brevity (e.g. \(\textbf{M}(\textbf{q})\) is simply stated as \(\textbf{M}\)). The additional variable \(\mathbf {w_k}\) added to this equation accounts for unknown disturbances to the model, as might be introduced by factors such as variation in the animal subjects and ground conditions.
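Numerically, this dynamics constraint reduces to a residual that should vanish at every node. The matrix shapes and the signature below are illustrative, not taken from the implementation.

```python
import numpy as np

def dynamics_residual(M, C, G, tau, J, lam, ddq, dq, w):
    """Residual of the constrained equation of motion at one node:
    zero when M*ddq + C*dq + G = tau + J^T*lam + w holds, where w is
    the disturbance variable absorbing model mismatch."""
    return M @ ddq + C @ dq + G - tau - J.T @ lam - w
```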
In addition to the motion constraint above, two constraints to incorporate the measurement data from the cameras are specified:
1. A constraint that relates the current pose \(\textbf{q}\) to a set of 3D marker positions.
2. A constraint that relates the reprojection of the 3D marker positions to the 2D pixel coordinates for each camera.
Constraint 1 can be determined using the pose equations that relate the generalised coordinates to positions. This is defined as a general function \(g(\textbf{q}_{k})=\textbf{x}_{k}\) made into an equality constraint, \(g(\textbf{q}_{k}) - \textbf{x}_{k} = \textbf{0}\), at finite time k. Constraint 2 uses a camera projection matrix, \(\textbf{P}\), to transform a 3D marker from world coordinates to 2D image space. The aim is to minimise the error between the ground truth and reprojected pixel locations; therefore, \(\textbf{v}\) is used to capture this error so that it can be added to the cost function. The resultant equality constraint is defined as

$$\textbf{P}_{c}\,g_{m}(\textbf{q}_{k}) - \textbf{y}_{c,m,k} = \textbf{v}_{c,m,k},$$

where \(\textbf{y}_{c,m,k}\) is the measured pixel location, c selects a specific camera to project to, m selects a specific marker on the cheetah, and k is the finite time that this operation is performed at.
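The reprojection residual can be sketched as follows for an ideal pinhole camera; lens distortion (present in the real pipeline) is omitted, and the array shapes are illustrative.

```python
import numpy as np

def reprojection_residual(P, X, uv):
    """Error v between a 3D marker X reprojected through the 3x4
    projection matrix P and its measured pixel location uv.
    Distortion is omitted in this pinhole sketch."""
    x = P @ np.append(X, 1.0)   # project the homogeneous world point
    return x[:2] / x[2] - uv    # perspective division, then the slack v
```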
The constraints used in the optimisation include not only the measurement data but also two additional types of constraints: contact constraints to enforce the correct GRF at nodes where a foot has been determined to be grounded, represented by \(\varvec{\lambda }_{g,k}\), and joint constraints that apply constraint torques, \(\varvec{\lambda }_{c,k}\), to prevent motion in forbidden directions (which is necessary when using absolute angle coordinates).
The ground contact constraints are enforced only at those nodes where a foot has been determined to be grounded. We will refer to this subset of nodes as D, and use the notation \(\varvec{\hat{\lambda }_g}\) to refer to the contact force vector acting on the grounded foot. A no-slip model of contact is developed using the foot velocity \(\textbf{v}_{f}\) and contact force \(\varvec{\lambda _g}\) to construct the following constraints,
The first constraint enables the generation of a GRF vector, while the second constraint ensures the foot is approximately fixed (\(\epsilon \le 1\)) to the ground during the contact phase. The contact constraints are relaxed to within a small penalty variable, \(\epsilon\), which is minimised in the cost function. This penalty approach makes the optimisation problem easier to solve.
Since sliding contact is not considered, the contact force \(\varvec{\lambda }_g\) vector has to satisfy friction cone constraints^{29} to estimate valid forces. To obtain a simpler model of contact, linearised friction cone constraints are formulated (resulting in a friction pyramid):

$$|\hat{\lambda }_{x}| \le \mu _{s}\hat{\lambda }_{z}, \quad |\hat{\lambda }_{y}| \le \mu _{s}\hat{\lambda }_{z},$$

where \(\hat{\lambda }_{x}\) and \(\hat{\lambda }_{y}\) are the tangential components of the contact force, \(\hat{\lambda }_{z}\) is the vertical component (\(\hat{\lambda }_{z} > 0\)), and \(\mu _{s}\) is the friction coefficient (set to 1.3 for all experiments^{3}). The tangential components of the contact force are decomposed into two positive variables in implementation, i.e. \(\hat{\lambda }_{x}=\hat{\lambda }_{x}^{+} - \hat{\lambda }_{x}^{-}\) where \(\hat{\lambda }_{x}^{+},\hat{\lambda }_{x}^{-}\ge 0\). This facilitates the optimisation by replacing the nonlinear absolute value function with a linear version (e.g. \(|\hat{\lambda }_{x}|\equiv \hat{\lambda }_{x}^{+} + \hat{\lambda }_{x}^{-}\)) in the above friction cone constraints. (Note that this is only true if either \(\hat{\lambda }_{x}^{+}\) or \(\hat{\lambda }_{x}^{-}\) is zero. IPOPT ensured that this condition was true.)
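A feasibility check for the linearised constraints might look like the following sketch; applying the bound per tangential axis is our reading of the text, and the function name is illustrative.

```python
def friction_pyramid_ok(lx_p, lx_m, ly_p, ly_m, lz, mu=1.3):
    """True when a decomposed contact force lies inside the friction
    pyramid: each tangential magnitude |l| = l_plus + l_minus (valid
    when one of the pair is zero) must not exceed mu * lz."""
    if min(lx_p, lx_m, ly_p, ly_m) < 0 or lz < 0:
        return False
    return (lx_p + lx_m) <= mu * lz and (ly_p + ly_m) <= mu * lz
```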
There is a requirement for explicit modelling of joints to remove redundant degrees of freedom introduced by the absolute angle coordinates. Two different types of joints are supported: revolute and universal. The former is created by the constraints

$$\textbf{r}_{i,y} \cdot \textbf{r}_{i+1,x} = 0, \quad \textbf{r}_{i,y} \cdot \textbf{r}_{i+1,z} = 0,$$

where the vector \(\textbf{r}_{i,y}\) is aligned with the y axis of the first link, and \(\textbf{r}_{i+1,x}\) and \(\textbf{r}_{i+1,z}\) are aligned with the x and z axes of the subsequent link. We have left out the index k for clarity, as these constraints are applied at all nodes. When these constraints are met, the joint is restricted to rotate about the y-axis of the first link only, removing two degrees of freedom. This type of joint is applied at the cheetah’s elbows and knees. Note that \(\textbf{r}_{i,y}\) is the column vector of the rotation matrix \(\textbf{R}\) associated with rotations about the y-axis.
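Numerically, these revolute-joint constraints reduce to dot products between rotation-matrix columns, as in this sketch (the x, y, z column ordering is assumed):

```python
import numpy as np

def revolute_residuals(R_i, R_next):
    """Residuals that vanish when the joint between two links can only
    rotate about the y axis of link i: that axis must be orthogonal to
    the x and z axes of the next link."""
    y_i = R_i[:, 1]
    return np.array([y_i @ R_next[:, 0], y_i @ R_next[:, 2]])
```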
A universal joint is represented by the constraint

$$\textbf{r}_{i,x} \cdot \textbf{r}_{i+1,y} = 0,$$

preventing a rotation about the z-axis and removing a single degree of freedom. This type of joint is used in the middle of the model’s spine.
Finally, the cost function to be minimised is defined as

$$g(\cdot ) = \alpha _{1}\,e_{\text{meas}} + \alpha _{2}\,e_{\text{model}} + \alpha _{3}\,e_{\text{smooth}},$$

where \(\alpha _{1}=1\), \(\alpha _{2}={10000}\), and \(\alpha _{3}=1\). The \(e_{\text{meas}}\) term is defined as

$$e_{\text{meas}} = \sum _{k}\sum _{c}\sum _{m} C\!\left( \frac{\textbf{v}_{c,m,k}}{\sigma _{i}}\right) ,$$
where \(C(\cdot )\) is the redescending robust cost function^{30} and \(\sigma _{i}\) is used to normalise the measurement error. This uncertainty parameter is derived from the predicted 2D keypoint error distribution. The uncertainty parameters were obtained in a similar fashion to our previous work^{14}. In addition, \(e_{\text{model}}\) now represents the error in the dynamic equation of motion instead of the assumed constant acceleration motion:

$$e_{\text{model}} = \sum _{k} \Vert \textbf{w}_{k}\Vert ^{2}.$$

Note that the above error did not require normalisation.
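To illustrate the role of the robust kernel, the sketch below substitutes the Geman-McClure kernel for the redescending cost of ref. 30 (whose exact form we do not reproduce here); both suppress the influence of outlier keypoint detections.

```python
def robust_cost(residual, sigma):
    """Geman-McClure robust kernel applied to a normalised residual.
    Small residuals behave quadratically; large (outlier) residuals
    saturate towards 1, limiting their influence on the optimiser."""
    z = (residual / sigma) ** 2
    return z / (1.0 + z)
```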
Lastly, the \(e_{\text{smooth}}\) term is defined as:

$$e_{\text{smooth}} = \sum _{k}\left( 10\,\Vert \varvec{\tau }_{k}\Vert ^{2} + 0.1\,h^{2}\,\Vert \ddot{\textbf{x}}_{k}\Vert ^{2}\right) ,$$

where h is the frame rate. The first term penalises the joint torques to ensure a minimum energy solution. The literature suggests that it is a common assumption that humans and animals conserve energy during movement^{31}. The weight of 10 is used to increase its contribution during minimisation.
During experiments, it was noted that the optimal solutions would produce jittery motion. For this reason, the second term in Equation (19) was added to the cost function. This term penalises the acceleration of the 3D markers, favouring solutions that have smooth trajectories for all 3D markers. The \(h^2\) weighting term performs normalisation, and the 0.1 weight is used to decrease its contribution during minimisation. Note that through normalisation, each term is assumed to be equally weighted in Equation (16). However, the goal is for the resultant motion estimate to be as physically plausible as possible, and therefore an enormous weight is placed on the \(e_{\text { model}}\) to direct the optimiser to find solutions that essentially force the model noise to be zero, i.e. \(\textbf{w}=\textbf{0}\). Note that IPOPT^{32} was used and configured with the MA97 linear solver^{33} to implement the optimisation.
KFTE
Trajectory optimisation was used to investigate two KFTE methods: sinusoidal GRF and free-form GRF. The sinusoidal KFTE assumes a “sinusoidal” GRF profile that essentially reduces the optimisation problem to solely estimating the joint torques (we allow for 20 % variation from this assumption). The free-form KFTE, by contrast, does not make any assumptions about the GRF profile; instead, it performs a complete joint estimation of the torques and GRFs. Both methods set up the same optimisation problem (as outlined above), but the sinusoidal KFTE has an extra preprocessing step that generates the initial GRF profile. An example of the expected profile is shown in Fig. 7. We assume contact timing is known and therefore did not require an automatic contact detection algorithm.
In Fig. 7, the \(F_{z}\) component is created using a half sine wave with an amplitude that is determined through a linear model for each limb. The linear model relates the running speed of the cheetah with the peak vertical force^{27}. The \(F_{x}\) component is approximated by a spline using 5 control points: zero points for the start, midpoint, and end of the stance, \(\frac{F_{z}}{2}\) for the deceleration peak, and \(\frac{F_{z}}{4}\) for the acceleration peak. We assume that there is no \(F_{y}\) component present in the straight-line steady-state gallops that are included in the datasets (in every trial, the cheetah runs along the x-axis).
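A minimal sketch of this preprocessing step, assuming a stance discretised into n nodes (n ≥ 3): the piecewise half-sine Fx below stands in for the 5-control-point spline, and the peak signs (braking negative, propulsion positive) are our convention.

```python
import math

def seed_grf(n, fz_peak):
    """Seed GRF profile for one stance phase of n samples.
    Fz is a half sine with the given peak; Fx is approximated by a
    braking half sine (peak magnitude Fz/2) followed by a propulsive
    half sine (peak Fz/4), zero at start, mid-stance, and end."""
    fz = [fz_peak * math.sin(math.pi * i / (n - 1)) for i in range(n)]
    mid = (n - 1) / 2.0
    fx = []
    for i in range(n):
        if i <= mid:   # deceleration (braking) half of the stance
            fx.append(-(fz_peak / 2.0) * math.sin(math.pi * i / mid))
        else:          # acceleration (propulsive) half of the stance
            fx.append((fz_peak / 4.0) * math.sin(math.pi * (i - mid) / mid))
    return fx, fz
```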
Datasets
Two cheetah locomotion datasets were used to develop and evaluate the proposed approach. Neither was a perfect test case – one (AcinoSet) lacked force plate data, while the other used a non-standard camera calibration that resulted in slightly higher uncertainty in the animal’s pose – however, the combination of the two with these complementary deficiencies was sufficient to prove the concept. The availability of more complete and accurate locomotion data will likely improve this method in future.
AcinoSet
AcinoSet ^{14} is a cheetahrunning dataset that is used for the evaluation of the motion and torque estimation. The dataset contains 90 running videos with six different views from lowcost GoPro cameras (2704 x 1520 resolution filmed at 120 frames per second (FPS) and 1920 x 1080 resolution, filmed at 90 FPS). There are 7588 humanannotated frames and the average video length is approximately 2 s. Furthermore, the dataset includes DeepLabCut ^{34} 2D pose estimates for each camera view for each video set, as well as the corresponding 3D reconstruction data, in addition to both the intrinsic and extrinsic calibration data. The intrinsic camera calibration data included fisheye lens distortion parameters.
Here, we use the DeepLabCut network that was developed in our previous work^{21} and a subset of five trials from AcinoSet for evaluation (see Table 3). The trials cover two different subjects, each performing a steady-state gallop.
Kinetic dataset
This dataset was generated for this work to validate the proposed method of dynamic data estimation through the use of force plate measurements. The dataset was acquired from The Royal Veterinary College, and methods used to capture the data are described in their previous work^{27}.
The dataset contains grayscale videos (including calibration targets) filmed at 1000 FPS (1280 x 560 and 800 x 600 resolutions) with synchronised force plate data sampled at 3500 Hz (see Fig. 8). From the original dataset, a subset of five trials of two different subjects was selected (Cheetah 1 and Cheetah 2, referred to in Table 1 of the paper by Hudson et al.^{27}). This included only those strides where each foot strike landed on a single force plate, to reduce ambiguity in the data. We denote this subset the “kinetic dataset” in this work (see Table 4).
The original work gathered trials of the cheetahs chasing a mechanical lure across eight Kistler force plates (model 9287A and 9287B) and in front of four cameras^{27}. Both the video and the force plate data were resampled to a 200 Hz sample rate for the kinetic dataset. This allows for a more tractable KFTE problem formulation.
As the dataset only consisted of grayscale videos and synchronised force plate measurements, 2D keypoint inference and a scene calibration are required to enable 3D reconstructions of the cheetah.
2D pose estimation
The DeepLabCut ^{34} network trained for AcinoSet (RMSE of 8.38 pixels on the test set) was first used to obtain 2D keypoints on the cheetah in this dataset. However, to quantify the network’s ability to infer 2D keypoints on an entirely new dataset, we needed to produce a new test set from the kinetic dataset. A subset of 260 frames of the new data was hand-labelled to form a test set, and the RMSE between ground truth and inferred 2D keypoints was 10.85 pixels. To reduce the error on the test set, we trained a new network to improve inference on the kinetic dataset. An additional 95 frames were manually labelled from the kinetic dataset to augment the AcinoSet training dataset. This improved performance on the kinetic test set by reducing the RMSE to 6.10 pixels.
Camera calibration
The calibration techniques developed for AcinoSet were used to perform intrinsic and extrinsic calibration for the kinetic dataset. There were some hurdles to overcome, in that the dataset did not contain a standard checkerboard calibration target that was compatible with OpenCV’s calibration library^{35} (see Fig. 8b). This meant that the automatic corner finder could not be used. Instead, multiple frames of the calibration target from each camera were gathered and the points of interest were hand-labelled. Nine points of interest were chosen on the calibration target shown in Fig. 8b: the four inside corners of the black border, the four centre points of the outer squares, and the single centre point of the centre square. This is a smaller number of points than is typically used for intrinsic calibration, which could decrease the accuracy of the reconstruction of the fisheye distortion model (particularly components that are orthogonal to the relatively narrow horizontal band in which the calibration points were concentrated). However, as these components are not likely to have a large effect on the parameters being estimated, we decided to proceed with this dataset in spite of these deficiencies.
The hand labelling of the calibration target was done for roughly 16 frames across each camera's field of view, for every trial included in the dataset. Once hand labelling was complete, a standard intrinsic calibration was performed to obtain the camera intrinsic parameters; radial distortion parameters were estimated during this process.
The extrinsic calibration of all four cameras simultaneously proved difficult due to the lack of shared scenes between the cameras and the calibration target. In particular, the kinetic dataset used in the calibration process contained videos of the calibration target that were not seen in more than two cameras at the same time, and the calibration target was not viewed at the same time across the force plate track. This meant that there was no shared scene data between the first stereo pair (\(C_{1}\) and \(C_{2}\)) and the second stereo pair (\(C_{3}\) and \(C_{4}\)) shown in Fig. 8b. Based on the distance from the force plate track, cameras \(C_{1}\) and \(C_{2}\) are referred to as the near side stereo pair, and cameras \(C_{3}\) and \(C_{4}\) are referred to as the far side stereo pair.
To overcome this challenge, a stereo calibration was performed between cameras \(C_{1}\) and \(C_{2}\) to obtain the pose of camera \(C_{2}\) relative to camera \(C_{1}\). The world frame was set to the first force plate, and the pose of camera \(C_{1}\) was determined using the perspective-n-point (PnP) algorithm^{35}. This algorithm uses a set of 3D-2D point correspondences on the force plates, viewed by camera \(C_{1}\), to compute the pose of the camera relative to the world frame.
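Chaining these two calibrations amounts to composing rigid transforms. The sketch below (our own illustration; the function name is hypothetical) combines the world-to-\(C_{1}\) pose from PnP with the \(C_{1}\)-to-\(C_{2}\) pose from stereo calibration to obtain the world-to-\(C_{2}\) extrinsics:

```python
import numpy as np

def compose_extrinsics(R_w1, t_w1, R_12, t_12):
    """Chain world->C1 (from PnP) with C1->C2 (from stereo calibration)
    to obtain world->C2. Rotations are 3x3 matrices, translations are
    3-vectors, following the convention x_cam = R @ x_world + t."""
    R_w2 = R_12 @ R_w1
    t_w2 = R_12 @ t_w1 + t_12
    return R_w2, t_w2
```

The composed pose maps a world point directly into \(C_{2}\)'s frame, identically to passing it through \(C_{1}\) first.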
After this stereo calibration, the extrinsic parameters for the near side stereo pair had been estimated. However, it was not possible to determine the pose of the far side cameras relative to the near side due to the lack of shared scenes of the calibration target. To overcome this limitation, a joint optimisation process was utilised to obtain the pose of the cameras on the far side. By using the points on the cheetah, and assuming the cheetah runs in a straight line, it was possible to set up an optimisation problem that jointly estimated the cheetah’s 3D pose and the pose of one of the far side cameras. The stereo calibration of the near side cameras was enough to seed the optimisation for roughly 30 frames of an example trial. The optimisation cost function was defined to minimise reprojection error. This process was performed for each camera on the far side separately. The optimal solutions from the optimisation were found to be reasonable by comparing them with the rough measurements that were taken of the scene by the original researchers^{27}.
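A toy version of this optimisation illustrates the idea. The sketch below is a deliberate simplification of the joint estimation described above, not the actual KFTE code: lens distortion is ignored, the 3D points (stand-ins for the triangulated cheetah keypoints) are held fixed, and only the far-side camera's translation is refined by minimising reprojection error with SciPy:

```python
import numpy as np
from scipy.optimize import least_squares

def project(points_w, R, t, K):
    """Pinhole projection of Nx3 world points to pixel coordinates
    (no lens distortion)."""
    pc = points_w @ R.T + t          # world -> camera frame
    uv = pc[:, :2] / pc[:, 2:3]      # perspective divide
    return uv * np.array([K[0, 0], K[1, 1]]) + np.array([K[0, 2], K[1, 2]])

def refine_translation(t0, points_w, obs_uv, R, K):
    """Refine a camera translation by minimising the reprojection error
    of fixed 3D points against their observed pixel locations."""
    def residuals(t):
        return (project(points_w, R, t, K) - obs_uv).ravel()
    return least_squares(residuals, t0).x
```

In the full problem, the cheetah's 3D pose and the far-side camera's full 6-DoF pose are estimated jointly over roughly 30 frames, seeded from the near-side stereo reconstruction.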
It is worth noting that this non-standard approach added uncertainty to the observations made by the far side cameras. Therefore, when using KFTE for 3D reconstructions of the kinetic dataset, the confidence in observations from the far side cameras was reduced by an extra 40% (i.e. their observation weights were multiplied by 0.6). This figure was found to provide good solutions during experimentation.
Evaluation
Root-mean-square error
The RMSE is a general error metric used to quantify the performance of a model by measuring the differences between estimated \(\hat{\mathbf{x}}\) and ground truth \(\textbf{x}\) values in \(\mathbb{R}^{n}\). It is defined as

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_{i} - x_{i}\right)^{2}},$$

where \(x_{i}\) is the ith element of the vector \(\textbf{x}\) and \(\hat{x}_{i}\) is the corresponding element of \(\hat{\mathbf{x}}\).
2D metrics
To assess pose estimation methods in image space, two reprojection error metrics are used: the \(\ell^2\) norm in pixels and the percentage of correct keypoints (PCK)^{36}. The \(\ell^2\) norm of a vector \(\textbf{x}\in \mathbb{R}^{n}\) is defined as

$$\Vert \textbf{x}\Vert_{2} = \sqrt{\sum_{i=1}^{n} x_{i}^{2}}.$$
PCK measures the percentage of estimated keypoints that fall within a threshold distance of the ground truth keypoints. In this project, a matching threshold of the nose-to-eye(s) segment length was used, similar to the approach of Mathis et al.^{37}. The threshold is computed in pixels from the ground truth keypoints, making the metric normalised to the scale of the animal in the image. Note that this PCK threshold is 'strict': the segment length is small relative to the body, so only a small region around each ground truth keypoint counts as correct. In addition, the metric is only evaluated on images where the nose and at least one eye are clearly visible, so that the threshold can be determined.
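A minimal sketch of this PCK computation (an illustrative helper of our own, assuming the nose and one eye have already been located in the ground truth labels of the frame):

```python
import numpy as np

def pck(pred, truth, nose_xy, eye_xy):
    """Percentage of predicted keypoints lying within the nose-to-eye
    segment length (in pixels, from the ground truth) of their true position."""
    thresh = np.linalg.norm(np.asarray(nose_xy, float) - np.asarray(eye_xy, float))
    dists = np.linalg.norm(np.asarray(pred, float) - np.asarray(truth, float),
                           axis=1)
    return 100.0 * float(np.mean(dists <= thresh))
```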
Unless stated otherwise, the \(\ell ^2\) norm is used to denote a ‘general error’ value for prediction performance.
3D metrics
To assess 3D pose estimation methods, a 3D position error metric is used: the global mean position error (MPE) over the full trajectory. The MPE accounts for the absolute position in 3D space and is defined as

$$\text{MPE} = \frac{1}{NM}\sum_{k=1}^{N}\sum_{j=1}^{M}\left\Vert \textbf{x}_{k,j} - {\hat{\textbf{x}}}_{k,j}\right\Vert_{2},$$

where N is the length (in frames) of the trajectory, M is the number of markers (placed on the joints of the cheetah), \(\textbf{x}_{k,j}\) is the position of marker j in \(\mathbb{R}^{3}\) at time k, and \({\hat{\textbf{x}}}_{k,j}\) is an estimate of \(\textbf{x}_{k,j}\).
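As a sketch (an illustrative helper, not the published evaluation code), the MPE reduces to a mean of Euclidean distances over an \(N \times M \times 3\) array:

```python
import numpy as np

def mpe(truth, est):
    """Global mean position error over a full trajectory.

    truth, est: arrays of shape (N, M, 3) -- N frames, M markers,
    3D marker positions. Returns the Euclidean distance averaged
    over all frames and markers."""
    diff = np.asarray(truth, float) - np.asarray(est, float)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))
```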
Conclusions
In this paper, we present kinetic full trajectory estimation: a model-based nonlinear optimisation approach that incorporates estimation of the ground reaction forces to improve markerless 3D pose estimation in wild animals. Although this cannot be regarded as a remote force measurement method, the GRF profiles obtained using the sinusoidal model agree with those observed in this study from force plate data obtained from running cheetahs, as well as with other studies of running quadrupeds. The approach produced promising results when tested on running cheetahs. In future, we hope that access to more complete ground truth data, and the incorporation of more advanced ground reaction force and skeletal models, will improve the approach sufficiently to support its use in the study of fast, dynamic manoeuvres.
Data availability
All data and code are available at https://github.com/AfricanRoboticsUnit/torqueestimation.
References
Robertson, D. G. E., Caldwell, G. E., Hamill, J., Kamen, G. & Whittlesey, S. Research Methods in Biomechanics (Human Kinetics, 2013).
Daley, M. A. Non-steady locomotion. In Understanding Mammalian Locomotion: Concepts and Applications. 277–306 (2016).
Wilson, A. M. et al. Locomotion dynamics of hunting in wild cheetahs. Nature 498, 185–189 (2013).
Mehta, D. et al. Monocular 3D human pose estimation in the wild using improved CNN supervision. In 2017 International Conference on 3D Vision (3DV). 506–516 (IEEE, 2017).
Martinez, J., Hossain, R., Romero, J. & Little, J. J. A simple yet effective baseline for 3D human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision. 2640–2649 (2017).
Li, Z. et al. Estimating 3D motion and forces of human–object interactions from internet videos. Int. J. Comput. Vis. 130, 363–383 (2022).
Shimada, S., Golyanik, V., Xu, W. & Theobalt, C. PhysCap: Physically plausible monocular 3D motion capture in real time. ACM Trans. Graph. (ToG) 39, 1–16 (2020).
Riemer, R. & Hsiao-Wecksler, E. T. Improving joint torque calculations: Optimization-based inverse dynamics to reduce the effect of motion errors. J. Biomech. 41, 1503–1509 (2008).
Zell, P., Wandt, B. & Rosenhahn, B. Joint 3D human motion capture and physical analysis from monocular videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 17–26 (2017).
Felis, M. L., Mombaur, K. & Berthoz, A. An optimal control approach to reconstruct human gait dynamics from kinematic data. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). 1044–1051 (IEEE, 2015).
Schemschat, R. M. et al. Joint torque analysis of push recovery motions during human walking. In 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob). 133–139 (IEEE, 2016).
Nath, T. et al. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nat. Protoc. 14, 2152–2176. https://doi.org/10.1038/s41596-019-0176-0 (2019).
Karashchuk, P. et al. Anipose: A toolkit for robust markerless 3D pose estimation. https://doi.org/10.1101/2020.05.26.117325 (2021).
Joska, D. et al. AcinoSet: A 3D pose estimation dataset and baseline models for cheetahs in the wild. In 2021 IEEE International Conference on Robotics and Automation (ICRA). 13901–13908 (IEEE, 2021).
Zuffi, S., Kanazawa, A., Jacobs, D. W. & Black, M. J. 3D menagerie: Modeling the 3D shape and pose of animals. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6365–6373 (2017).
Biggs, B., Roddick, T., Fitzgibbon, A. & Cipolla, R. Creatures great and small: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision. 3–19 (Springer, 2018).
Gosztolai, A. et al. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat. Methods 18, 975–981 (2021).
Williams, S., Usherwood, J., Jespers, K., Channon, A. & Wilson, A. Exploring the mechanical basis for acceleration: Pelvic limb locomotor function during accelerations in racing greyhounds (Canis familiaris). J. Exp. Biol. 212, 550–565 (2009).
Ellis, R. G., Rankin, J. W. & Hutchinson, J. R. Limb kinematics, kinetics and muscle dynamics during the sittostand transition in greyhounds. Front. Bioeng. Biotechnol. 6, 162 (2018).
Harrison, S. M. et al. Forelimb muscle activity during equine locomotion. J. Exp. Biol. 215, 2980–2991 (2012).
Muramatsu, N., da Silva, Z., Joska, D., Nicolls, F. & Patel, A. Improving 3D markerless pose estimation of animals in the wild using low-cost cameras. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 3770–3776 (IEEE, 2022).
Witte, T., Knill, K. & Wilson, A. Determination of peak vertical ground reaction force from duty factor in the horse (Equus caballus). J. Exp. Biol. 207, 3639–3648 (2004).
Rempe, D. et al. Contact and human dynamics from monocular video. In Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16. 71–87 (Springer, 2020).
Hudson, P. E. et al. Functional anatomy of the cheetah (Acinonyx jubatus) hindlimb. J. Anat. 218, 363–374 (2011).
Fukuhara, A., Gunji, M., Masuda, Y., Tadakuma, K. & Ishiguro, A. A bioinspired quadruped robot exploiting flexible shoulder for stable and efficient walking. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 7832–7839 (IEEE, 2020).
Knemeyer, A., Shield, S. & Patel, A. Minor change, major gains: The effect of orientation formulation on solving time for multibody trajectory optimization. IEEE Robot. Autom. Lett. 5, 5331–5338 (2020).
Hudson, P. E., Corr, S. A. & Wilson, A. M. High speed galloping in the cheetah (Acinonyx jubatus) and the racing greyhound (Canis familiaris): Spatiotemporal and kinetic characteristics. J. Exp. Biol. 215, 2425–2434 (2012).
Hudson, P. E. et al. Functional anatomy of the cheetah (Acinonyx jubatus) forelimb. J. Anat. 218, 375–385 (2011).
Trinkle, J. C., Pang, J.-S., Sudarsky, S. & Lo, G. On dynamic multi-rigid-body contact problems with Coulomb friction. ZAMM-J. Appl. Math. Mech./Z. Angew. Math. Mech. 77, 267–279 (1997).
Nicholson, B., Lopez-Negrete, R. & Biegler, L. On-line state estimation of nonlinear dynamic systems with gross errors. Comput. Chem. Eng. 70, 149–159. https://doi.org/10.1016/j.compchemeng.2013.11.018 (2014).
Brubaker, M. A., Sigal, L. & Fleet, D. J. Physics-based human motion modeling for people tracking: A short tutorial. Image (Rochester, NY) 1–48 (2009).
Wächter, A. & Biegler, L. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106, 25–57. https://doi.org/10.1007/s10107-004-0559-y (2006).
HSL. A Collection of Fortran Codes for Large Scale Scientific Computation. https://www.hsl.rl.ac.uk/ (2007).
Mathis, A. et al. DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21, 1281–1289 (2018).
Bradski, G. The OpenCV library. Dr. Dobb’s J. Softw. Tools (2000).
Andriluka, M., Pishchulin, L., Gehler, P. & Schiele, B. 2D human pose estimation: New benchmark and state of the art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3686–3693 (2014).
Mathis, A. et al. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1859–1868 (2021).
Author information
Contributions
Z.D.S. performed algorithm design, experiments and primary writing of the manuscript. S.S. assisted in algorithm design and review/revision of the manuscript. A.P. conceived the study and revised the manuscript. P.E.H. and A.M.W. collected the cheetah kinetic data, and F.N. provided technical inputs on the image processing and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
da Silva, Z., Shield, S., Hudson, P.E. et al. Markerless 3D kinematics and force estimation in cheetahs. Sci Rep 14, 10579 (2024). https://doi.org/10.1038/s41598-024-60731-1