Introduction

In the United States, traumatic brain injury (TBI) is a serious public health issue. In 2014, approximately 2.87 million TBI-related emergency department (ED) visits, hospitalizations, and deaths occurred in the United States1. Falls and motor vehicle crashes (MVC) were the first and second leading causes of TBI-related hospitalizations1. The lifetime economic cost of TBI, including direct and indirect medical costs, was estimated to be approximately $76.5 billion (in 2010 dollars)2. Given the cost and number of TBI cases, understanding the mechanisms of brain injury and preventing these injuries is critical. Over the years, researchers have found head motion kinematics to be an important correlate of brain injury. Many head/brain injury metrics, such as the head injury criterion (HIC)3, the brain injury criterion (BrIC)4, and the rotational injury criterion (RIC)5, make use of head motion kinematics: HIC is based on linear accelerations and is part of vehicle safety regulation6, BrIC is based on angular velocities, and RIC is based on angular accelerations. Measuring head kinematics is thus essential for understanding the risk of brain injury.

Deep learning is a branch of machine learning based on artificial neural networks and has been shown to be very effective in solving complex problems in areas such as computer vision, natural language processing, drug discovery, and medical image analysis. Recently, deep learning models have also been used in the brain injury biomechanics field. Wu et al.7 used American college football, boxing, and mixed martial arts (MMA) datasets, along with a dataset of laboratory-reconstructed National Football League impacts, to develop a deep learning model to predict the 95th percentile maximum principal strain of the entire brain and of the corpus callosum, along with the fiber strain of the corpus callosum. Zhan et al.8 used kinematic data generated by FE simulations and data collected from on-field football and MMA using instrumented mouthguards to develop a deep learning head model that predicts the peak maximum principal strain (MPS) of every element in the brain. Ghazi et al.9 developed a convolutional neural network (CNN) to instantly estimate the element-wise distribution of peak maximum principal strain of the entire brain, using two-dimensional images of head rotational velocity and acceleration temporal profiles as input to the CNN model. Bourdet et al.10 developed a deep learning model that takes linear accelerations and linear velocities from helmet tests as input to predict the maximum von Mises stress within the brain.

In addition to predicting strains and stresses in the brain, deep learning models have also been developed to detect head impacts in American football. Gabler et al.11 evaluated a broad range of machine learning (ML) models and developed an AdaBoost-based ML model to discriminate between head impacts and spurious events using 6DOF head kinematic data collected from a custom-fit mouthguard sensor. More recently, Raymond et al.12 used head kinematic data from instrumented mouthguards, augmented with synthetic head kinematic data obtained from FE head impacts, to detect head impacts using a physics-informed ML model.

One common thread in these studies is that they use head kinematics data obtained from FE simulations, wearable devices, or head instrumentation as input to their deep learning models, either to predict strains in the brain or to detect impacts to the head.

Video analysis has also been used in work related to head kinematics. For example, Sanchez et al.13 evaluated laboratory-reconstruction videos of head impacts collected from professional football games, recorded with a high-speed camera at 500 frames per second. These videos were not used to predict or compute head kinematics; rather, they were analyzed to identify a time region of applicability (RoA) for head kinematics and for application to FE brain models to determine MPS and the cumulative strain damage measure (CSDM)14.

The goal of this study was to evaluate the feasibility of a deep learning methodology for predicting the time history of head angular kinematics directly from simulated crash videos, and its applicability to controlled testing environments such as National Highway Traffic Safety Administration (NHTSA) commissioned vehicle crash tests. As a proof of concept, a deep learning model was developed to predict the time histories of the X, Y, and Z components of the head angular velocity vector from FE-based crash videos. Angular velocity has been shown to correlate better with brain strains than angular acceleration15 and is used in the brain injury criterion (BrIC) developed by NHTSA for assessing the risk of TBI. For these reasons, predicting angular velocity time histories was chosen for this study; the corresponding angular accelerations can be readily computed from the angular velocity time histories. Skull fracture is not a major concern in vehicular crashes16 due to the presence of airbags, and thus the linear acceleration-based head injury criterion (HIC) was not considered.

Methods

Data

A supervised deep learning model takes in inputs and the corresponding outputs and learns the mapping between them. To develop a deep learning model that predicts the time history of angular velocities from crash videos, crash videos are required as inputs and the corresponding angular velocity time histories are required as outputs. FE-based crash simulation data were used in this proof-of-concept study.

To generate the data, validated simplified Global Human Body Models Consortium (GHBMC) 50th percentile male17,18 and 5th percentile female19,20 FE human models were used in a variety of frontal crash simulations. These human models were positioned in the driver compartment (Fig. 1) that was extracted from the validated FE model of a 2014 Honda Accord21.

Figure 1

GHBMC human models in driving position (a) 5th female, (b) 50th male.

A validated generic seatbelt system with a retractor, pretensioner, and load limiter was included in the model, along with validated frontal and side airbags21. In addition, steering column collapse was implemented and included in these simulations. The roof rails, side door, B-pillar, and floor were deformable in the full FE model but were made rigid in this study; the knee bolster and A-pillar were kept deformable. The human models were positioned in the driver compartment based on clearance measurements taken from physical crash tests (NHTSA test number 8035 for the 50th percentile male, NHTSA test number 8380 for the 5th percentile female; https://www-nrd.nhtsa.dot.gov/database/veh/veh.htm). The crash pulse used for the simulations was taken from a physical crash test (NHTSA test number 9476) and is shown in Supplementary document Sect. 1.

These human models were first evaluated in a full frontal test condition, after which a design of experiments (DOE) study was conducted. For the DOE study, both crash-related and restraint-related parameters were varied (Table 1). The crash-related parameters were Delta-V and the principal direction of force (PDOF); the restraint parameters were both seatbelt- and airbag-related. The parameters were varied over a wide range to generate a range of head motions, including cases where the head hits the steering wheel.

Table 1 Parameters and their ranges.

The crash pulse for the same vehicle may differ with PDOF, frontal overlap, and the type and stiffness of the impacting surface. In addition, for the same PDOF, frontal overlap, and impacting surface, the crash pulse can vary across different vehicles of the same size (e.g., mid-size sedans). To keep the number of variables manageable for the DOE study, the crash pulse shape was kept constant; only the crash pulse magnitude was scaled to achieve different Delta-Vs.
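As an illustration of this scaling, the acceleration trace can simply be multiplied by the ratio of the target Delta-V to the baseline pulse's velocity change (a minimal sketch; the function and units are assumptions for illustration, not the actual simulation tooling):

```python
import numpy as np

def scale_pulse_to_delta_v(time_s, accel_ms2, target_delta_v_ms):
    """Scale a crash pulse so its velocity change matches a target Delta-V.

    Multiplying the acceleration trace by a constant preserves the pulse
    shape while scaling its integral (the Delta-V) by the same constant.
    """
    base_delta_v = np.trapz(accel_ms2, time_s)  # integral of a(t) dt
    return accel_ms2 * (target_delta_v_ms / base_delta_v)
```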

A total of 1010 scenarios were simulated, covering a wide range of crash conditions. Each crash scenario was simulated for a duration of 150 ms, which took approximately 4.5 h on 28 processors. For each simulation, the time history of head angular velocities about the three head rotational axes (Fig. 2a) was computed, and four crash videos with different views were generated (Fig. 2b).

Figure 2

(a) Head rotational axes, and (b) Views used to generate videos: left isometric, back isometric, front isometric, and left.

The views chosen were similar to camera views available from NHTSA crash tests. Since the aim of the study was to predict the time history of head angular velocities from any view, each crash view was treated as a separate sample. Thus, we had a total of 4040 crash videos and their corresponding head angular velocity time histories (ωx, ωy, ωz) about the three head rotational axes.

The crash videos were then used as inputs for the deep learning model and the corresponding angular velocity time histories were used as the “ground truth” outputs. For the purposes of this study, all crash videos were generated such that only the human model was visible. The vehicle structure and the airbags were removed from the videos to prevent any head occlusion.

Since videos are used as inputs to the deep learning model in the form of sequences of images, an additional pre-processing step was carried out to convert the FE-based crash videos into sequences of RGB images. Given that the goal of this study was to predict the time histories of head angular velocities, the motion of the head was extracted as a sequence of RGB images over time from each FE crash video (Supplementary document Sect. 2). These sequences of images were then used as inputs to the deep learning model.

The images were extracted every 2 ms from the 150 ms crash event, and thus each sequence had a length of 76 images. The corresponding “ground truth” time histories of angular velocities (outputs or targets) were also sampled every 2 ms to match the corresponding sequence of images. This was done to support the deep learning architecture used in this study, as described below in the deep learning model section. An example of the input and corresponding output for training the deep learning model is shown in Fig. 3. For visualization purposes, the input sequence of images in Fig. 3 is shown every 20 ms.

Figure 3

(a) Sample input, and (b) corresponding “ground truth” output for training.
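A minimal sketch of the frame-extraction step described above, assuming the rendered crash videos are readable with OpenCV (the actual pipeline is described in Supplementary document Sect. 2):

```python
import cv2
import numpy as np

def video_to_sequence(video_path, event_ms=150, step_ms=2):
    """Sample one frame every step_ms from a crash video.

    For a 150 ms event sampled every 2 ms this yields 76 frames
    (at 0, 2, ..., 150 ms).
    """
    cap = cv2.VideoCapture(video_path)
    frames = []
    for t_ms in range(0, event_ms + step_ms, step_ms):
        cap.set(cv2.CAP_PROP_POS_MSEC, t_ms)   # seek to the requested time
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # OpenCV reads BGR
    cap.release()
    return np.stack(frames)  # shape: (76, H, W, 3)
```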

Input data transformation

The input data (sequences of images) were RGB images with pixel values in the range 0 to 255. Deep learning models train better and faster when the input data are on the same scale; thus, all input sequences of images were normalized so that the pixel values were in the range 0 to 1. Due to resource limitations, all images were resized to a height and width of 64 pixels and subsequently converted to grayscale, such that each sequence of images had a shape of (76, 64, 64, 1), where 76 is the number of images in a sequence, 64 × 64 is the image size, and 1 is the number of channels (grayscale).
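These transformations can be sketched as follows (assuming TensorFlow; the input is one (76, H, W, 3) uint8 RGB sequence):

```python
import tensorflow as tf

def preprocess_sequence(rgb_frames):
    """Resize to 64 x 64, convert to grayscale, and scale pixels to [0, 1].

    Input: (76, H, W, 3) uint8 RGB frames; output: (76, 64, 64, 1) float32.
    """
    x = tf.image.resize(rgb_frames, (64, 64))   # (76, 64, 64, 3), float32
    x = tf.image.rgb_to_grayscale(x)            # (76, 64, 64, 1)
    return tf.cast(x, tf.float32) / 255.0       # pixel values in [0, 1]
```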

Data splitting

The entire dataset had 4040 samples. For developing the deep learning model, this dataset was split into three parts: training, validation, and test datasets. 74% of the data was used for training, 13% for validation, and 13% for testing. Data splitting was carried out using stratified sampling based on human model size and crash view to ensure that each was equally represented in all three datasets (Supplementary document Sect. 3).

The training and validation datasets (87% of the data) were used for model development; the validation dataset was used for hyperparameter tuning. The test dataset was not used in model development and was treated as an unseen dataset to evaluate the final performance of the model.
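A minimal sketch of such a stratified 74/13/13 split, assuming scikit-learn (the strata labels and random seed are illustrative):

```python
from sklearn.model_selection import train_test_split

# Strata combine human model size and camera view (labels illustrative).
strata = [f"{size}_{view}" for size, view in zip(model_sizes, crash_views)]

# First split off 26%, then halve it into validation and test (13% each).
X_train, X_rest, y_train, y_rest, s_train, s_rest = train_test_split(
    X, y, strata, test_size=0.26, stratify=strata, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=s_rest, random_state=42)
```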

Deep learning model

The overall architecture of a deep learning model depends on the type of input data; in this study, the input data are sequences of images over time. Convolutional neural networks (CNNs) capture spatial dependencies and are among the most common types of neural networks used in computer vision to recognize objects and patterns in images. Recurrent neural networks (RNNs), on the other hand, capture temporal dependencies and are commonly used for sequential data processing. Thus, to process sequences of images in this study, a deep learning model that combines a CNN22 with a Long Short-Term Memory (LSTM)23 based RNN was used. The CNN-LSTM architecture uses CNN layers for feature extraction from the input data, combined with LSTMs to support sequence prediction.

Since the best architecture for our problem was not known at the start of model development, a lightweight baseline model (with a small number of trainable parameters) was developed first and later improved through hyperparameter tuning. For the CNN part of the baseline model, a Visual Geometry Group (VGG) style architecture24 was used, consisting of a three-block network with two convolutional layers per block followed by a max pooling layer. Batch normalization25 and a rectified linear unit (ReLU) activation function26 were used after each convolutional layer. The baseline (initial) numbers of convolutional filters for the three blocks were 16, 32, and 64, respectively. A global average pooling layer was added as the last layer of the CNN to obtain the feature vector. Since each input sample is a sequence of images, the CNN part of the model was wrapped in a time distributed layer27 to obtain a feature vector for the entire sequence; the time distributed wrapper applies the same CNN to every temporal slice (image) of the input. The output of the CNN was used as the input to the LSTM network. The LSTM network can be set up in multiple ways, the details of which are provided in Supplementary document Sect. 4. For the LSTM part of the baseline model, one LSTM layer with a hidden size of 128 was used. Since the input sequence has a length of 76 and the goal is to predict the time history of angular velocity, an output was obtained from the LSTM layer at each timestep. The output of the LSTM was then fed into a fully-connected layer with the ReLU activation function, followed by a dropout layer28 to control overfitting. The output of the dropout layer was then fed into a fully-connected layer with a linear activation function to generate the final output, i.e., the predicted time history of angular velocity. A linear activation generates continuous numerical values and was therefore used in the final output layer, as angular velocity time history prediction was solved as a regression task.
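The baseline architecture described above can be sketched in Keras as follows (a minimal sketch; the width of the fully-connected layer and the dropout rate are illustrative placeholders, with the tuned values given in Supplementary Sect. 5):

```python
from tensorflow.keras import layers, models

def conv_block(x, filters):
    """VGG-style block: two conv layers (each with BatchNorm + ReLU), then max pooling."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same",
                          kernel_initializer="he_normal")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return layers.MaxPooling2D()(x)

def build_baseline(seq_len=76, img=64):
    # CNN feature extractor, applied identically to every frame.
    frame_in = layers.Input(shape=(img, img, 1))
    x = frame_in
    for f in (16, 32, 64):                 # baseline filter counts per block
        x = conv_block(x, f)
    cnn = models.Model(frame_in, layers.GlobalAveragePooling2D()(x))

    seq_in = layers.Input(shape=(seq_len, img, img, 1))
    h = layers.TimeDistributed(cnn)(seq_in)          # (batch, 76, 64)
    h = layers.LSTM(128, return_sequences=True)(h)   # output at every timestep
    h = layers.Dense(64, activation="relu",
                     kernel_initializer="he_normal")(h)  # width is illustrative
    h = layers.Dropout(0.3)(h)                       # rate is illustrative
    out = layers.Dense(1, activation="linear")(h)    # angular velocity per timestep
    return models.Model(seq_in, out)
```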

The mean squared error (MSE) between the actual and predicted time histories was used as the loss function for training the entire model. The adaptive moment estimation (Adam) optimizer29 was used for optimization. Since the ReLU activation was used in the network, the He-normal initializer30 was used to initialize the trainable weights of the model. The model was developed using TensorFlow v2.427. Training was carried out on Google Colab using a single Tesla P100 GPU; model training time ranged from 1.5 to 2 h.

Individual deep learning models and training

Training a single deep learning model to predict the time histories of all three components of angular velocity did not produce good results. Since the three components of angular velocity (ωx, ωy, ωz) are independent of each other, three separate deep learning models were trained, one for each component, which led to a marked improvement in the results. The same training and validation inputs were used for all three models; only the “ground truth” targets were changed depending on the model. The baseline models for ωx, ωy, and ωz were trained with a learning rate of 0.0001 and a batch size of 4 for a maximum of 80 epochs. Early stopping27 with a patience of 10 and model checkpointing27 callbacks were used to save the best model based on validation loss. Models often benefit from reducing the learning rate by a factor of 2–10 once learning stagnates; for this purpose, the ReduceLROnPlateau27 callback was used, which monitors the validation loss and reduces the learning rate if no improvement is seen for 5 epochs.
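A sketch of the corresponding training setup (the learning-rate reduction factor and the checkpoint file name are assumptions; the other settings follow the text):

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)
from tensorflow.keras.optimizers import Adam

model = build_baseline()
model.compile(optimizer=Adam(learning_rate=1e-4), loss="mse")  # MSE loss, Adam

callbacks = [
    EarlyStopping(monitor="val_loss", patience=10),
    ModelCheckpoint("best_wx_model.h5", monitor="val_loss", save_best_only=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]

# Same inputs for all three models; only the targets (here, wx) change.
model.fit(X_train, y_train_wx, validation_data=(X_val, y_val_wx),
          batch_size=4, epochs=80, callbacks=callbacks)
```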

The hyperparameter values chosen for the CNN, LSTM, and the extended part of the baseline model were selected at random and did not necessarily correspond to the best architecture for the problem. To improve the models, hyperparameter tuning (Supplementary document Sect. 5) was carried out to find the set of hyperparameter values that gave the best results for our problem.

Because of resource limitations, hyperparameter tuning was performed only for the ωx model. The resulting set of hyperparameters was then used to train the final deep learning models for all three components of angular velocity.

Combined model

The three individually trained models for ωx, ωy, and ωz were combined into a single deep learning model as shown in Fig. 4. To predict the time histories of the three components of angular velocity from a video input of any view, the video (preprocessed as a sequence of images) is passed into the combined model and propagated (forward pass) through the individually trained networks, which output the time histories of ωx, ωy, and ωz.

Figure 4

Combined deep learning model.
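A sketch of this combination using the Keras functional API (model_wx, model_wy, and model_wz stand for the three trained networks and are assumed to be loaded):

```python
from tensorflow.keras import layers, models

# Route one shared video input through the three trained networks.
video_in = layers.Input(shape=(76, 64, 64, 1))
combined = models.Model(inputs=video_in,
                        outputs=[model_wx(video_in),
                                 model_wy(video_in),
                                 model_wz(video_in)])

# Predict the full 3D angular velocity history from a single preprocessed
# sequence (None indexing adds the batch dimension).
wx_hat, wy_hat, wz_hat = combined.predict(sequence[None, ...])
```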

Model evaluation

Individual model evaluation

The three individually trained deep learning models for ωx, ωy, and ωz were evaluated on the test dataset to see how well they generalize to unseen data. The actual and predicted time histories for cases from the test dataset were compared quantitatively using CORA31. While time histories of angular velocities are important for assessing overall head kinematics, peak values are usually used for computing brain injury metrics. For example, the brain injury criterion (BrIC)4 is computed from the absolute peaks of ωx, ωy, and ωz (Eq. (1)).

$$\text{BrIC} = \sqrt{\left(\frac{\omega_{xm}}{66.25}\right)^{2} + \left(\frac{\omega_{ym}}{56.45}\right)^{2} + \left(\frac{\omega_{zm}}{42.87}\right)^{2}}$$
(1)

To evaluate the prediction of peak angular velocities, the correlation coefficient between the actual and predicted peaks was computed for all three models using the test dataset.
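For illustration, BrIC (Eq. (1)) and the peak correlation can be computed as follows (a sketch; the variable names are assumptions, and the angular velocities are in rad/s):

```python
import numpy as np

def bric(wx, wy, wz):
    """BrIC per Eq. (1): absolute peaks of the three angular velocity
    components divided by their critical values."""
    return np.sqrt((np.max(np.abs(wx)) / 66.25) ** 2
                   + (np.max(np.abs(wy)) / 56.45) ** 2
                   + (np.max(np.abs(wz)) / 42.87) ** 2)

# Correlation between actual and predicted absolute peaks on the test set.
peaks_actual = [np.max(np.abs(w)) for w in y_test_wx]
peaks_pred = [np.max(np.abs(w)) for w in predictions_wx]
r = np.corrcoef(peaks_actual, peaks_pred)[0, 1]
```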

Frame rate evaluation

The individual models were trained on sequences of images captured every 2 ms, i.e., 500 frames per second (fps) videos (Fig. 5a). An evaluation was carried out to determine the influence of frame rate on both time history and peak predictions. Three lower frame rates were evaluated: 250 fps, 125 fps, and real-time video at 25 fps. For this evaluation, the “ground truth” angular velocities sampled every 2 ms were kept the same, but the input sequences of images were changed. For 250 fps, images sampled every 4 ms were kept in the sequence while the others were converted to black images (Fig. 5b). Similarly, only images sampled every 8 ms and every 40 ms were kept for 125 fps (Fig. 5c) and 25 fps (Fig. 5d), respectively, while the rest were converted to black images. This procedure was followed to support the deep learning architecture described above.

Figure 5

Input sequences of images for (a) 500 fps, (b) 250 fps, (c) 125 fps, and (d) 25 fps. Image sequences are shown for the first 50 ms only for visualization purposes.
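A minimal sketch of this masking procedure (the sequence length stays fixed at 76 so the model input shape is unchanged):

```python
import numpy as np

def mask_to_frame_rate(seq, step_ms=2, keep_ms=4):
    """Blank out frames that are absent at a lower frame rate.

    keep_ms = 4 emulates 250 fps, 8 emulates 125 fps, 40 emulates 25 fps.
    """
    masked = np.zeros_like(seq)            # black images everywhere
    keep_every = keep_ms // step_ms        # e.g., keep every 2nd frame for 250 fps
    masked[::keep_every] = seq[::keep_every]
    return masked
```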

Models for ωx, ωy, and ωz were trained for each frame rate using the same hyperparameters as in the 500 fps study. These models were then evaluated on the test dataset by quantitatively comparing the actual and predicted time histories using CORA for the same cases as in the 500 fps study. In addition, correlation coefficients between the actual and predicted peaks were evaluated.

Combined model evaluation

The combined model was evaluated on a few cases from the test dataset. For these cases, the actual and predicted time histories were compared using CORA. These actual and predicted time histories were also used as inputs to SIMon head model15 simulations to compare the actual and predicted brain strains. In addition, the actual and predicted BrIC values were compared.

Camera view performance evaluation

Since four different views were used in this study to train the models, an evaluation was conducted using the test dataset to determine the performance of each view. For each view, average CORA scores were computed for all three components of angular velocity (ωx, ωy, and ωz), and correlation coefficients between the actual and predicted peaks were computed as well. Both the average CORA scores and the correlation coefficients were used to assess performance.
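A sketch of the per-view aggregation, assuming the evaluation results are collected in a pandas DataFrame with one row per test sample (the column layout is illustrative):

```python
import pandas as pd

# results has columns: view, cora_wx, cora_wy, cora_wz (one row per sample).
view_scores = results.groupby("view")[["cora_wx", "cora_wy", "cora_wz"]].mean()
view_scores["overall"] = view_scores.mean(axis=1)   # average of averages per view
```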

Additional crash pulse evaluation

The models for ωx, ωy, and ωz were trained using a single crash pulse (Supplementary document Sect. 1), taken from NHTSA test 9476 (2015 Chevrolet Malibu in a frontal oblique offset test); only the pulse magnitude was changed while the shape was kept the same. To test robustness, the combined model was further evaluated using three additional crash pulses (Fig. 6), taken from NHTSA test numbers 8035 (2013 Honda Accord, frontal impact), 9010 (2015 Ford Escape, frontal impact), and 9011 (2015 Dodge Challenger, frontal impact). Frontal impacts were simulated with these additional crash pulses using the 50th percentile male GHBMC model. Videos were then generated for these simulations, and the time history predictions of the combined deep learning model were compared with the actual head angular velocity time histories.

Figure 6

Crash pulses from NHTSA crash tests.

Results

Final model architecture

Figure 7 shows the final model architecture with tuned hyperparameters (Supplementary document Sect. 5, Supplementary Table S2). The data shapes shown in Fig. 7 are the output shapes from each layer. This model has approximately 845,000 trainable parameters.

Figure 7

Final model architecture.

Model evaluation

Individual model evaluation

The final deep learning models (with tuned hyperparameters) for ωx, ωy, and ωz were assessed on the test dataset to evaluate how well they generalize to unseen data. Figure 8 shows the actual and predicted time histories of ωx for 5 randomly selected cases from the test dataset, along with their respective CORA scores. The ωx deep learning model predicts the time histories reasonably well for these cases.

Figure 8

Actual and predicted time histories for ωx for 5 random cases from the test dataset.

Figure 9 shows the actual and predicted time histories of ωy for 5 randomly selected cases from the test dataset, along with their respective CORA scores; the predictions for the ωy component are better than those for ωx.

Figure 9

Actual and predicted time histories for ωy for 5 random cases from the test dataset.

Similarly, Fig. 10 shows the actual and predicted time histories of ωz for 5 randomly selected cases from the test dataset, along with their respective CORA scores, demonstrating a reasonable match between the actual and predicted time histories. Additionally, Fig. 10 covers different types of head impact: (1) the head contacts the airbag (Fig. 10a,b,d,e), and (2) the head contacts the steering wheel (Fig. 10c). The model predicts the time history for both impact types, demonstrating that the trained deep learning models are capable of learning features that distinguish between different types of head impacts.

Figure 10

Actual and predicted time histories for ωz for 5 random cases from the test dataset.

The peak angular velocities were evaluated quantitatively (Fig. 11) with correlation plots using only the unseen test dataset. The correlation coefficients were 0.73 for ωx, 0.85 for ωy, and 0.92 for ωz.

Figure 11

Correlation plots for ωx, ωy, and ωz.

Frame rate evaluation

Figures 8, 9, 10 and 11 above show the results obtained with 500 fps videos. The time history results for the other three frame rates (250 fps, 125 fps, and 25 fps) are shown in Supplementary document Sect. 6. Table 2 shows the correlation coefficients across the different frame rates. At 250 fps, the CORA scores and correlation coefficients drop slightly compared with the 500 fps results; predictions degrade further at lower frame rates, with poor predictions and correlations at 25 fps (real-time video). This is expected, as crash events last ~150 ms and high-speed video is necessary to capture head kinematics.

Table 2 Correlation coefficients for different frame rates.

Combined model evaluation

The combined model's performance was also assessed on the test dataset. Three random videos with different views (back isometric, front isometric, and left) were selected from the test dataset, and the 3D angular kinematics predicted by the combined model were compared with the actual data. The actual and predicted time histories for these three videos, along with their respective CORA scores, are shown in Fig. 12; the actual and predicted BrIC values are also included in Fig. 12.

Figure 12

Combined model results.

The actual and predicted time histories from the three cases (Fig. 12) were applied to the SIMon head model to compare brain strains. Similar actual and predicted brain strain patterns were observed for the three cases (Fig. 13). MPS and CSDM (0.25) values were also computed and are given in Fig. 13.

Figure 13

Brain strain comparison.

Camera view performance evaluation

The four camera views used in this study were evaluated on the test dataset to determine the view with the best predictions. Figure 14 shows the average CORA scores for ωx, ωy, and ωz for each view. An overall average CORA score (average of averages) across the three angular velocities was computed for all four views. The left and left isometric views showed the best overall average CORA score (~0.73), followed by the back isometric and front isometric views (~0.70).

Figure 14

Average CORA scores for each camera view.

Table 3 shows the correlation coefficients between the actual and predicted angular velocity peaks for the four views. The average correlation coefficient across the three angular velocities was computed for all four views. Based on this average, the left isometric view showed the best results (0.88), followed by the left view (0.84), the back isometric view (0.82), and the front isometric view (0.79).

Table 3 Correlation coefficients for different camera views.

Based on the CORA scores (which consider various aspects of the time history signal, such as size, shape, and phase31) and the correlation coefficients (based on peaks in this study and important for injury metric computation), the left isometric view demonstrates the best performance.

Additional crash pulse evaluation

Comparisons between the actual and predicted angular velocity time histories for simulated crash videos based on crash pulses from NHTSA crash tests 8035, 9010, and 9011 are shown in Fig. 15. Results are presented from two different views, with their respective CORA scores shown in parentheses. For all cases, results from the left isometric view are presented (as it showed the best performance; Fig. 14 and Table 3) along with a different second view. The model shows promising predictions for all three angular velocities on these out-of-sample (unseen) crash videos.

Figure 15

Angular velocity time history comparison: (a) left and left isometric views (Crash pulse—NHTSA test 8035), (b) left isometric and front isometric views (Crash pulse—NHTSA test 9010), and (c) left isometric and back isometric views (Crash pulse—NHTSA test 9011).

Discussion

The objective of this study was to evaluate the feasibility of a deep learning methodology for predicting head angular kinematics directly from simulated crash videos. The methodology was demonstrated by developing a proof-of-concept model using FE data to predict angular velocity time histories.

Main findings

Deep learning models are function approximators that approximate the unknown underlying mapping from inputs to outputs. The results of this study show that the deep learning models for ωx, ωy, and ωz are capable of effectively capturing important features from the input sequences of images and mapping them to the output angular velocities, demonstrating the feasibility of the deep learning approach for predicting head angular kinematics.

The deep learning models showed promising results when evaluated on the test dataset (Figs. 8, 9, 10, 11 and 12) and on the additional crash pulse dataset (Fig. 15), demonstrating their ability to generalize well to unseen data. The SIMon head model simulations with three different actual and predicted angular velocity time histories showed comparable MPS, CSDM, and brain strain patterns (Fig. 13). The actual and predicted BrIC values were also similar for these cases (Fig. 12).

Deep learning models have been shown to work well on unstructured data such as images, videos, and text. Depending on the task, deep learning models can have a large number of trainable parameters (tens of millions) and are thus trained on large datasets to avoid overfitting; for example, image classification models are trained on the popular ImageNet dataset, which has millions of images. Biomechanical datasets, by contrast, are very limited: Wu et al.7 used a dataset of 3069 samples to develop a deep learning model for efficient estimation of regional brain strains, and Zhan et al.8 used data from 1803 head impacts to develop their deep learning model for estimating entire-brain deformation in concussion. In this study, the dataset had 4040 samples, generated from 1010 FE simulations. Because the dataset is small, the number of trainable parameters was kept in check, in addition to using regularization techniques such as dropout to avoid overfitting; the final deep learning model had approximately 845 K trainable parameters. Despite this rather small dataset, the deep learning models for ωx, ωy, and ωz were capable of predicting the time histories and their respective peaks well (Figs. 8, 9, 10 and 11). This demonstrates that the deep learning methodology may be feasible with small datasets, while more data could lead to better predictive models.

The 4040 data points used in this study come from 1010 different FE simulations. The four views generated per simulation were used as separate inputs rather than combined into one input. Combining the four views into a single sample would require the availability of all four views for every prediction, which might become a bottleneck if certain views are not available. The methodology used herein, treating each view as a separate input, provides the advantage of predicting kinematics from any single available view that the model is trained on. In addition, the deep learning methodology does not need multiple views to obtain 3D kinematics, providing the advantage of obtaining 3D kinematics from any 2D view, as shown in Fig. 12.

We used a deep learning approach to predict head angular kinematics from crash events that last ~150 ms. To predict angular velocity time histories from events of similar duration, high-speed cameras are preferred for better accuracy (Figs. 8, 9 and 10, Table 2, Supplementary document Sect. 6). For short-duration dynamic events such as a car crash, it is difficult to predict the time history using frames spaced 40 ms apart (real-time video at 25 fps) compared with frames 2 ms apart (high-speed camera at 500 fps): the higher the frame rate, the better the time history predictions. This may also be due to the insufficient number of frames obtained when sampling a short event at a lower frame rate. Longer events (greater than 150 ms) at low frame rates were not evaluated in this study. The results (Figs. 8, 9 and 10, Table 2, Supplementary document Sect. 6) show that high-speed cameras recording at or above 250 fps provide reasonable results for 150 ms events.

The x-component of the angular velocity vector (ωx) showed lower CORA scores (Fig. 14) and more discrepancy in predicted peak values (Fig. 11) compared with ωy and ωz. The reason may be the views (left, left isometric, front isometric, and back isometric) selected for training the models: these views may be more conducive to learning features relevant to predicting ωy and ωz than ωx. Based on the CORA scores and correlation coefficients (Fig. 14, Table 3), the best camera view for ωx and ωy was the left isometric view. For ωz, the left, left isometric, and front isometric views provided equally good results. Overall, the left isometric camera view demonstrated the best performance.

The deep learning methodology shows promising results even with low-resolution (64 × 64) grayscale images and limited hyperparameter tuning. Given sufficient resources, better models may be developed for real-world applications by using this methodology with higher-resolution RGB images, exploring a wider range of hyperparameters, and using more advanced CNN architectures such as residual networks32 and EfficientNets33, which may help the networks learn better features from the input sequences of images and thus further improve predictions.

Applicability to NHTSA crash tests

The methodology in this study demonstrated good results in a controlled FE environment where the head characteristics are the same for all data samples, similar camera angles are available, and an unobstructed view is available for the duration of the crash event. Thus, this methodology holds promise for real-world controlled-environment applications, for example testing environments like NHTSA crash tests, where anthropomorphic test devices (ATDs) have similar head characteristics, fixed camera angles are available, and an unobstructed view is available from a back camera. Predictive models developed from NHTSA crash test data using this deep learning methodology may then be used to obtain 3D head rotational kinematics from crash test videos involving the 5th percentile ATD, which has limited head instrumentation and currently lacks sensors that output rotational head kinematics.

Comparison with other related work

We used a deep learning architecture that combines a CNN and an LSTM to capture spatial and temporal dependencies, respectively. A sequence of 2D head images served as the input, while the time history of head angular velocity was the model output. Various deep learning architectures have been used in the brain injury biomechanics field. For example, Wu et al.7 used a CNN architecture to predict regional brain strain using 2D images of head angular kinematics profiles as input. Similarly, Ghazi et al.9 applied a CNN model to estimate element-wise strain in the brain using 2D images of rotational velocity and acceleration profiles. In both studies, the input was a single image (not a sequence of images) and the output was not a time history profile, so LSTMs were not required. Bourdet et al.10 used a U-Net style architecture34 with 1D convolutions to output the time history of the maximum von Mises stress in the entire brain, using the time histories of three linear accelerations and three angular velocities as input. U-Net architectures are fully convolutional networks that were developed for biomedical image segmentation. Because the U-Net's encoder-decoder structure generates output with the same spatial dimensions as the input, it was not suitable for our 2D image dataset, as the goal of our study was to extract feature vectors from sequences of 2D images for time history prediction.

Limitations

The deep learning methodology has many advantages, but it is not without limitations. For example, the proof-of-concept model developed in this study cannot be directly applied to real-world scenarios (for example, NHTSA physical crash tests with ATDs) because the features learned by the model come from the GHBMC human head model and may not transfer directly to other images, such as other human heads/models or an ATD head. However, the methodology described herein may be extended to build predictive models for such real-world scenarios. We developed the deep learning models from FE crash videos with an unobstructed view of the head. Head occlusion can prevent head features from being correctly identified, which would give inaccurate predictions of angular velocities. In real-world data, the head may become obstructed at some point in the event in some camera views. Thus, it is recommended that predictive models for real-world applications be developed using data from camera angles in which the head remains visible (or partially visible) throughout the event. In addition, deep learning models work well on the type of data they are trained on (i.e., they may not be well suited for extrapolation). For example, models trained only on front views cannot be expected to make correct predictions from back-view videos. Thus, when making predictions, videos from camera angles similar to those used in model training may be necessary. This kind of limitation has also been pointed out by Bourdet et al.10 and Ghazi et al.9 for their models.

Advantages of deep learning approach

Training deep learning models requires high-end machines with GPUs. However, once trained, such models can make angular velocity predictions from a given crash video in a fraction of a second; the inference (prediction) time for the combined model in this study was 117 ms on a GPU. Another advantage of the deep learning approach is that, as new data become available, the training dataset may be appended for retraining, allowing iterative improvement of the model.

While we developed deep learning models for predicting angular velocity time histories, the approach is not limited to angular velocities; it may be used to predict other kinematic parameters, such as linear accelerations, and derived metrics such as the linear acceleration-based head injury criterion (HIC).

Predicting head kinematics using the deep learning methodology provides an alternative method of computing head kinematics and may be deployed in situations where sensor data are either unavailable or insufficient to determine head angular kinematics (assuming that videos of such events are available).

Conclusions

Proof-of-concept deep learning models developed in this study showed promising results for predicting angular velocity time histories, thus demonstrating the feasibility of deep learning methodology.

Future work

Future work involves extending this deep learning methodology to NHTSA crash tests for predicting ATD head angular kinematics.