Dynamic forecasting of severe acute graft-versus-host disease after transplantation

Forecasting of severe acute graft-versus-host disease (aGVHD) after transplantation is a challenging ‘large p, small n’ problem that suffers from nonuniform data sampling. We propose a dynamic probabilistic algorithm, daGOAT, that accommodates sampling heterogeneity, integrates multidimensional clinical data and continuously updates the daily risk score for severe aGVHD onset within a two-week moving window. In the studied cohorts, the cross-validated area under the receiver operator characteristic curve (AUROC) of daGOAT rose steadily after transplantation and peaked at ≥0.78 in both the adult and pediatric cohorts, outperforming the two-biomarker MAGIC score, three-biomarker Ann Arbor score, peri-transplantation features-based models and XGBoost. Simulation experiments indicated that the daGOAT algorithm is well suited for short time-series scenarios where the underlying process for event generation is smooth, multidimensional and where there are frequent and irregular data missing. daGOAT’s broader utility was demonstrated by performance testing on a remotely different task, that is, prediction of imminent human postural change based on smartphone inertial sensor time-series data.

Few studies have used time-series of all available patient information for forecasting severe aGVHD. One recent study applied penalized logistic regression to vital signs (body temperature, heart rate and so on) that were consistently recorded within the first 10 days after HSCT and achieved an AUROC value of 0.66 for forecasting grade II-IV aGVHD 11 . Modeling in HSCT-unlike in more common medical situations-is challenged by the small sample size (n), high feature number (p) and nonuniform data sampling 12 . This study thus aims to utilize evidence from all available dynamic variables to forecast severe aGVHD better.
We compiled and curated the post-transplant multidimensional time-series data of patients treated with human leukocyte antigen (HLA)-mismatched allo-HSCT using stem cells derived from peripheral blood, bone marrow or both at the Institute of Hematology, Chinese Academy of Medical Sciences (IHCAMS) between 2012 and 2021 (hereafter referred to as the 'aGOAT' (aGVHD Onset Anticipation Tianjin) dataset).
aGOAT contained data from 584 adult and 45 pediatric cases, and 16% of the adult cohort and 24% of the pediatric cohort suffered from severe aGVHD (Supplementary Table 1). There was a substantial difference in overall survival between the severe aGVHD cases and the other patients in the adult cohort (Fig. 1a). aGOAT encompassed a total of 194 dynamic variables for the adult cohort and 159 dynamic variables for the pediatric cohort (Supplementary Table 2). The dynamic variables were not measured uniformly across all the patients ( Fig. 1b and Supplementary Table 3). Fifteen peri-transplantation variables were also included in aGOAT (Supplementary Table 4). We have devised a dynamic probabilistic model-'daGOAT' (dynamic aGVHD Onset Anticipation Tianjin)-that integrates multidimensional time-series data to calculate the risk for severe aGVHD. Our model updates the risk score φ i (t) for developing severe aGVHD between t + 1 and t + δ according to

Dynamic forecasting of severe acute graftversus-host disease after transplantation
where ρ(z i ) and θ k (x ik (τ), τ) define the contribution of all peritransplantation features z i and the contribution of the individual dynamic variable x ik (τ), respectively, to the relative risk of the ith patient developing severe aGVHD between t + 1 and t + δ. The motivation behind equation (1) was to first highlight shortstretch temporal patterns that were suggestive of prodromes for severe aGVHD, with δ being the length of the moving time window (we set δ = 14). The higher the φ i (t), the likelier there was a prodrome between τ = t − δ + 1 and τ = t. Considering the evolving nature of immune reconstitution after HSCT, we assumed the development of severe aGVHD was not a stationary process and defined θ k (·) to be a function of time. In other words, the same value for the same clinical feature might have different implications at different times after transplantation. The final risk score on each day was highly dependent on all available clinical information and computed scores in previous days, so, in this sense, daGOAT is an adaptive algorithm.
In the adult cohort, plasma biomarker data at days 6-8 were available for 67 patients. The MAGIC score achieved AUROCs of 0.86 and 0.49 for six-month NRM and severe aGVHD, respectively, and the Ann Arbor score achieved AUROCs of 0.59 and 0.50 for six-month NRM and severe aGVHD, respectively (Fig. 1c). The pediatric cohort's biomarker data were too small in size to test the biomarker models.
To evaluate daGOAT, PeriHSCT-NB, PeriHSCT-RF and XGBoost, the patients who received HSCT before (excluding) 1 December 2020 were designated as training sets and the rest of the patients as test sets. Internal validation within the training sets (a series of temporal splitting within the training sets) and holdout validation on the test sets were then conducted.
holdout validation for the pediatric cases (Extended Data Fig. 4a and Supplementary Fig. 3). For the pediatric cohort, on day 10, HRs between patients classified as high-risk (risk score top 1/6) and lowrisk (risk score bottom 5/6) identified by daGOAT were 4.47 (95% CI, 1.14-17.50) and incalculable (due to the small sample size) in internal and holdout validations, respectively, in the ensuing twoweek window (Extended Data Figs. 3d and 4d).
We also investigated the differential contributions of individual dynamic features to aGVHD prediction. The importance score of a feature at a given time point was calculated as the decremental change of AUPRC (based on internal validation within the training set) at that time point if the feature was entirely ignored from day 1 through day 100. Interestingly, the importance scores of many features varied smoothly with time (that is, scores at neighboring time points were similar to one another; Fig. 2a,b). We conducted an ablation experiment with daGOAT by removing its smoothing component and then testing the truncated version of the model. As expected, daGOAT without smoothing performed worse in both the adult and pediatric cohorts (Fig. 2c).
We ranked all the dynamic features according to their maximum importance scores during days 8-30 (Fig. 2a,b and Supplementary Table 5). Spearman's rank correlation coefficients between feature importance and data density were 0.51 and 0.72 in the adult and pediatric cohorts, respectively. Although data densities were clearly influential in the rankings of the dynamic features, some top-ranked features nonetheless had lower data densities (especially in the adult cohort).
The ability of daGOAT to predict severe aGVHD depended on leveraging the multidimensionality of data (Fig. 2d,e). In the training sets, performance metrics peaked when ~20% top-ranked variables were used in the model. In holdout validations, however, it took at least 80% and 50% of the dynamic features for model performance to approach saturation in the adult and pediatric cohorts, respectively.
To explore plausible explanations for daGOAT's advantage over benchmarks in predicting severe aGVHD, we conducted simulation experiments in which three data characteristic parameters were manipulated systematically. The first was the complexity of the underlying process. A more complex process had a higher number of effector features that each had independent association with event onset. On the other hand, when the underlying process was more simple, most observed features were dummies and made no contribution to relative risk. The second parameter was the smoothness of the underlying process. When the underlying process was 'smooth' , each feature's contribution to relative risk could change over time, but there were no rapid up-and-down swings. The third parameter was the data missing rate. Data missing was expected to make model fitting more difficult.
Examining the simulation results, we found that daGOAT outperformed XGBoost when most of the observed variables were associated with event onsets, when the underlying event-generating process was smooth and when there was much data missing (Extended Data Fig. 5a).
Although this study focused on HSCT, we also tested whether our proposed approach could be generalized to a remotely unrelated scenario of dynamic event forecasting using multivariate time-series. More specifically, we asked whether waist-mounted smartphone inertial sensor data could be utilized by daGOAT to anticipate a person's postural change, that is, to predict if a sitting person was about to stand up. A publicly available smartphone inertial sensor dataset 14 was downloaded from the UCI Machine Learning Repository. The dataset contained time-series data (Δt = 1.28 s) of 30 human subjects that covered 561 features. Each discrete time point was associated with a label that indicated whether the person was sitting, standing up (transitioning from sitting to standing), standing or performing another activity at that moment. There was no missing value. Although we did not have direct insights into the underlying neurobehavioral process, we postulated that the relationship between prodromic subtle motions and human postural change was probably smooth.
Four models-daGOAT, Naïve Bayes, Random Forest and XGBoost-were tested on the smartphone inertial sensor dataset. For daGOAT, a '+' data segment would be akin to a 'severe aGVHD case' , and its associated 561-feature time-series would be its 'presymptomatic clinical data' . daGOAT's AUPRC and AUROC peaked at 0.29 and 0.73 (Extended Data Fig. 5b-e), respectively, ≥2.56 s before the observed postural transition, outperforming Naïve Bayes, Random Forest and XGBoost.
Although we caution that our simulation experiments and smartphone data analysis were far from encompassing all possible real-world scenarios, they nonetheless served as a conduit to understanding the possible mechanisms behind daGOAT's comparative advantage in severe aGVHD prediction.
In contrast to modeling in HSCT, machine learning research on dynamic risk monitoring based on high-density multidimensional time-series data has been particularly active in intensive care in recent years. Instead of banking on a small set of biomarkers, researchers have taken a holistic approach that considers time-series of a high number of features to forecast shock 15,16 and to make artificial intelligence-based recommendations for sepsis treatment 17 . Homogeneous ultrahigh data density in intensive care units is nevertheless an outlier situation. As of today, 'spotty data' (inadequate data densities) remain the norm in most real-world healthcare settings (outside of clinical trials), and this includes HSCT. Machine learning research in HSCT is furthermore hampered by smaller sample sizes.
When most of the observed features independently contribute to relative risk, there is little benefit for a model to distinguish between true effectors and dummies. Accordingly, daGOAT does not conduct any variable selection, whereas-despite the small sample size and the time-varying nature of feature contributions to relative risk-on each day XGBoost would have to pick a new set of key variables to grow trees. The comparatively good performance of our modeling approach suggests that it is feasible to predict severe aGVHD cost-effectively when taking a panoramic and  Supplementary Table 5. The average data density of each dynamic feature was calculated by dividing its total data volume (that is, the total number of available values after 'time-limited sample-and-hold' data imputations) by the total number of patients and by the total number of days (30 days). Right panels: orange bars, maximum importance score during days 8-30; black lines, average data density during days 1-30. c, Ablation experiment with daGOAT. Removing the smoothing component of the algorithm hurt daGOAT's performance at the peak performance days (adults, day 23; children, day 10). There was only one run (comparing 'without smoothing' and 'with smoothing') in each scenario, and the standard error was incalculable. Striped bars, AUPRCs; solid bars, AUROCs; brown, internal validation; green, holdout validation. d,e, Relationships between the number of top-ranked features included in the model and the model's performance metrics (AUPRC and AUROC) on the peak performance days (adults, day 23; children, day 10) in internal and holdout validations for the adult (d) and pediatric (e) cohorts. Crosses, AUPRCs; circles, AUROCs; brown, internal validation; green, holdout validation.  For deployment in clinical settings, the daGOAT model must be integrated into the hospital information system. On any given day we have approximately 100 patients who have recently undergone HSCT and are still hospitalized at the IHCAMS; these are the patients whose dynamic clinical data need to be updated daily. Our semi-automatic data process takes less than 30 min to extract the 100 patients' newly collected data on the latest day from the electronic health records and subsequently append the new incoming data to the data accumulated in previous days. daGOAT is fast to compute. On average, computing φ i (t) for 100 consecutive days for one patient takes ~0.5 s. Model fitting is also reasonably fast. Fitting daGOAT on our adult training set, for example, took less than 1 min using a typical desktop computer. In summary, daGOAT is easy to implement, provided that the hospital information system is sufficiently 'modern' .
Regrettably, this study was limited to data from one hematological center in China, and additional validation at other hospitals will be needed. The ultimate litmus test of our model would be testing whether we can reduce early mortality after transplantation by applying the model prospectively to administer intensified prophylactic immunosuppression to a targeted subset of allo-HSCT patients who are predicted to have high risk for developing severe aGVHD.

Methods
The aGOAT dataset. We focused on modeling severe aGVHD in HLA-mismatched allo-HSCT, because HLA mismatch is the most important factor associated with aGVHD 18 . It should be noted, however, that 97% of the HLA-mismatched adult and 100% of the HLA-mismatched pediatric instances were haploidentical in the final dataset (Supplementary Table 1). In the following, we describe, in detail, the process of compiling the aGOAT dataset ( Supplementary Fig. 1).
Post-transplant multidimensional time-series clinical data of 598 adult patients (age >16 years) who received HLA-mismatched allo-HSCT with stem cells sourced from peripheral blood, bone marrow or both between 1 April 2012 and 30 April 2021 and 54 pediatric patients (age ≤16 years) who received HLA-mismatched allo-HSCT with stem cells sourced from peripheral blood, bone marrow or both between 1 April 2018 and 31 March 2021 at the IHCAMS were able to be electronically retrieved and curated.
The medical records for each case were reviewed by two or three physicians to confirm the aGVHD diagnosis and grading (according to the MAGIC criteria 19 ). To avoid ambiguity, onset of aGVHD was uniformly defined as the day of initiating aGVHD treatment. After the physicians' review, 16 cases (ten adults and six children) were eliminated due to failure of neutrophil engraftment within 30 days of transplantation. An additional seven cases (four adults and three children) were eliminated because the recorded date of neutrophil engraftment (defined as the date of the first of three consecutive measurements spanning ≥3 days of achieving a sustained peripheral blood neutrophil count of >500 × 10 6 l −1 ) did not precede the recorded onset of aGVHD.
The final dataset contained 584 adult cases and 45 pediatric cases. The adult and pediatric cohorts had substantially different baseline distributions in age, primary diseases, stem cell sources, conditioning regimens and aGVHD prophylaxis regimens (Supplementary Table 1). Because the adult cohort and the pediatric cohort were treated at different divisions of the IHCAMS, our modeling efforts on the two cohorts were two independent validations of the daGOAT algorithm.
Sixteen percent of the adult cohort and 24% of the pediatric cohort suffered from severe aGVHD within 100 days. Moreover, the severe aGVHD instances in the pediatric cohort tended to experience onset much earlier than those in the adult cohort ( Fig. 1b and Supplementary Table 1). There was a substantial difference in three-year all-cause mortality between the patients with severe aGVHD and the other patients in the adult cohort (HR 3.30 (95% CI, 2.28-4.79); P < 0.001, log-rank test) (Fig. 1a), and a similar trend also appeared to exist in the pediatric cohort, although it did not reach statistical significance (HR 4.40 (95% CI, 0.58-33.47); P = 0.120, log-rank test) (Supplementary Fig. 2).
aGOAT encompassed a total of 194 dynamic variables for the adult cohort and 159 dynamic variables for the pediatric cohort, collected during the first 100 days after transplantation (Supplementary Table 2), including vital signs, daily fluid loss (due to diarrhea, vomiting and so on), complete blood counts, blood chemistry and electrolytes, peripheral blood/bone marrow immune cell profiles (measured by flow cytometry), plasma inflammatory factor levels and so on. The dynamic variables were not measured uniformly across all patients. Some dynamic variables such as vital signs were available nearly daily, whereas others such as blood immune cell profiles and plasma inflammatory factor levels were measured less frequently and not in all patients. In addition, 15 peri-transplantation variables were also included in aGOAT (Supplementary Table 4), including information related to primary disease, blood type, stem cell source, HLA mismatch, conditioning regimen before transplantation, use of antithymocyte globulin in conditioning, aGVHD prophylaxis regimen, transplantation year and so on.
Outlier values in vital signs (for example, exorbitant values for body temperature) were made blank. Whenever a dynamic variable was measured more than once (distinct samples) on one particular day for one patient, the average measurement value of that day was used for that day for that patient. We applied the 'time-limited sample-and-hold' approach commonly used in intensive care unit data analysis 17 to augment the aGOAT dataset (holding time set to three days after sampling), based on the hypothesis that most measurements were valid for three additional days. This augmented dataset was still very sparse in multiple categories of dynamic variables (Fig. 1b and Supplementary Table 3). No other missing-data imputation procedure was conducted to address the problem of nonuniform data measurement.
Validation of the MAGIC score and the Ann Arbor score. The MAGIC score was calculated as −11.263 + 1.844(log(ST2)) + 0.577(log(REG3α)). The Ann Arbor score was calculated as −9.169 + 0.598(log(TNFR1)) − 0.028(log(REG3α)) + 0.189(log(ST2)). All the used coefficient values were identical to those reported in the original reports 8,9 . The original reports did not specify the units for plasma biomarker measurements, and the two scores used different bases for the logarithm (MAGIC, 10; Ann Arbor, 2). None of these, however, affected the AUROC and AUPRC calculations.
daGOAT model. We designed the daGOAT algorithm with the motivation to leverage one presumed nature of post-HSCT time-series data: the underlying biological process for aGVHD onset is multidimensional and smooth with respect to time. By explicitly taking the temporal order of data into account, daGOAT borrows strengths from neighboring time points. Even if a feature is missing a value on one particular day, the model might still learn its contribution to relative risk on that day by interpolating between neighboring time points.
Our model integrates multidimensional time-series data to calculate risk for severe aGVHD onset between t + 1 and t + δ according to where ρ(z i ) and θ k (x ik (τ), τ) define the contribution of all peri-transplantation features z i and the contribution of the individual dynamic variable x ik (τ), respectively, to the relative risk of patient i developing severe aGVHD between t + 1 and t + δ. I ikτ = 0 when x ik (τ) lacks value for the ith patient, and I ikτ = 1 otherwise. Although equation (1) does not presume the size of the time step, updating φ i (t) daily was deemed sufficient for severe aGVHD prediction. We fitted daGOAT as follows. First, ρ(z i ) was set to be the log-odds ratio computed by the standard Naïve Bayes algorithm. Second, for every k and t, we computed the cutoff value c kt that maximized Shannon's mutual information between the kth dynamic variable at time t and severe aGVHD occurrence, then we set l k and u k to be the 25th and 75th percentile values among c k1 , …, c kT , respectively. This step computed the optimal cutoff values {l k , u k } to discretize the kth dynamic variable. Third, for every k and t we computed ρ (L) 1kt = P(x k (t) < l k |Severe aGVHD onset between t + 1 and t + δ), ρ (H) 1kt = P(x k (t) > u k |Severe aGVHD onset between t + 1 and t + δ), ρ (L) 0kt = P(x k (t) < l k |No severe aGVHD onset between t + 1 and t + δ), and ρ (H) 0kt = P(x k (t) > u k |No severe aGVHD onset between t + 1 and t + δ), 0k (t) and ρ (H) 0k (t) as 'smoothed' versions of ρ (L) 1kt , ρ (H) 1kt , ρ (L) 0kt and ρ (H) 0kt , respectively, by smoothing-spline fitting (smooth with respect to t). This step computed the discretized probability distribution of the kth dynamic variable that was smooth along the time axis. Note that we did not conduct interpolations on the raw data. True distributions of feature values in normal, prodromic and diseased states are unknown, and it is unclear how to best perform interpolations on the raw data for a wide range of variables expressed in various units (for example, should we interpolate values in the original scale or in the log scale?). Instead, our model was distribution-agnostic and performed interpolation on the estimated relative risk contribution terms, that is, in the probability space.
Data generation for simulation experiments. We generated multidimensional time-series data using a simplified version of equation (1): where φ i is the relative risk of the event of interest happening to individual i, p = 50 and T = 14. To further simplify the simulations (without hurting generalizability), we assumed each x ik (t) term had only two possible values: low and high. First, for each individual i and feature k, a smooth time-series x ik (t) was generated by a random walk that meandered between low and high (transition rates: 'low→low' or 'high→high' , 0.7; 'low→high' or 'high→low' , 0.3).
Second, we simulated values for θ k (x, t), the underlying process for event generation. When the underlying process was designated to be more complex (that is, a high percentage of observed variables were indeed correlated with event onset), we set θ k (x, t) = 0 for k ∈ [46, 50]. On the other hand, when the underlying process was designated to be simple (that is, most observed variables were dummies), we set θ k (x, t) = 0 for k ∈ [6,50]. For all the other k values, when the underlying process was designated to be smooth, we first generated a smooth timeseries α k (t) through a random walk with multiplicative Gaussian steps (mean = 1; s.d. = 0.05) starting from α k (1) = 1. On the other hand, when the underlying process was designated to be not smooth, α k (t) at each t was independently drawn from a uniform distribution between 0 and 1. The values of α k (t) were then linearly rescaled so that min t∈ [1, T] α k (t) = 0.3 and max t∈ [1, T] α k (t) = 0.7; then we defined θ k (high, t) = log ( and θ k (low, t) = −θ k (high, t).
Third, the generated values x ik (t) and θ k (x, t) were then plugged into equation (2) to calculate φ i . For each i, draw a random number γ from a uniform distribution between 0 and 1. If γ < φ i , the event of interest happened to individual i; otherwise, the event did not happen to individual i.
Finally, when the data missing rate was designated to be high, 60% of the x ik (t) terms were randomly marked as missing; otherwise, there was no data missing.
For each combination of data characteristic parameters, the simulation was repeated 150 times. Each run was conducted with n = 1,000 (80:20 random split for holdout validation) and a constant 15% percentage of positive cases within the simulated sample (a larger cohort of individuals was first generated and then randomly downsampled to the desired size).
The smartphone-based recognition of human activities and postural transitions dataset. The waist-mounted smartphone inertial sensor dataset 14 was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/ datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+ Transitions). Before download, the smartphone dataset had already been randomly partitioned into two subsets, with 70% of the subjects designated as the training set and the rest as the test set. From this dataset we extracted all 8.96-s time-series segments in which a person was continuously sitting and then classified each segment according to whether the person stood up within the next 5.12 s ('+') or not ('−'). The training set contained 789 data segments ('+' , n = 92; '−' , n = 697), and the test set contained 292 segments ('+' , n = 40; '−' , n = 252).
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this Article.

Data availability
Although there was no genetic polymorphism, gene expression or protein sequence data involved in this study, sharing of substantial clinical data generated from China's human genetic resources needs to abide by the Regulations of the People's Republic of China (PRC) on the Administration of Human Genetic Resources. A 'minimum dataset' for severe aGVHD that would be necessary to verify the research in this Article would include all dynamic features (from day 1 through day 100 post-HSCT) listed in Supplementary Table 2, all peri-transplantation  features listed in Supplementary Table 4, presence/absence of severe aGVHD within 100 days, and onset dates of severe aGVHD. On 21 February 2022, the PRC Human Genetic Resources Administration Office approved the compilation of a desensitized version of the 'minimum aGOAT dataset' for facilitating international research collaborations (Reference No. CJ0272 (2022)). At the time of the publication of the manuscript, the authors' application to deposit this desensitized dataset at the PRC National Genomics Data Center (NGDC) database was still under review. A mock-up dataset that can be used for demo runs of the daGOAT algorithm is available in a public Zenodo repository (https://doi.org/10.5281/ zenodo.6050675) 20 . After publication of this work, the Chinese government has approved the archiving of the aGOAT dataset at the PRC National Genomics Data Center (NGDC) in April, 2022 (ref. no. 2022BAT1224). Accordingly, the authors have made the dataset publicly accessible at the NGDC website (https://ngdc. cncb.ac.cn/omix/release/OMIX001095/). The waist-mounted smartphone inertial sensor dataset is available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+ Human+Activities+and+Postural+Transitions). Source data for Figs. 1 and 2 and Extended Data Figs. 1-5 are provided with this paper.
Corresponding author(s): Junren Chen, Erlie Jiang, Xiaofan Zhu Last updated by author(s): Feb 12, 2022 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code Data collection All data used in this study were retrieved from the electronic health record system at the Institute of Hematology, Chinese Academy of Medical Sciences (IHCAMS). Custom scripts used for accessing the hospital's internal database and for curating the retrieved data have to remain confidential. For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability While there was no genetic polymorphism, gene expression, or protein sequence data involved in this study, sharing of substantial clinical data generated from China's human genetic resources needs to abide by the Regulations of the People's Republic of China (PRC) on the Administration of Human Genetic Resources. A 'minimum dataset' for severe aGVHD that would be necessary to verify the research in this article would include all dynamic features (from day 1 through day 100 post-HSCT) listed in Supplementary